Provider matrix¶
Every supported provider, with the extra needed to install it, the env var (if any) it reads, the default endpoint, and any provider-specific constructor kwargs.
Text providers (aimu.client() / aimu.chat())¶
| Provider key (in model string) | Client class | Extra | API key | Default endpoint | Provider-specific kwargs | Async kind |
|---|---|---|---|---|---|---|
ollama |
OllamaClient |
aimu[ollama] |
— | local | model_keep_alive_seconds=60 |
native (ollama.AsyncClient) |
hf |
HuggingFaceClient |
aimu[hf] |
— (HF login for gated models) | local | model_kwargs={...} (passed to from_pretrained) |
wrapped sync via asyncio.to_thread |
anthropic |
AnthropicClient |
aimu[anthropic] |
ANTHROPIC_API_KEY |
api.anthropic.com | model_kwargs={...} |
native (AsyncAnthropic) |
openai |
OpenAIClient |
aimu[openai_compat] |
OPENAI_API_KEY |
api.openai.com | model_kwargs={...} |
native (AsyncOpenAI) |
gemini |
GeminiClient |
aimu[openai_compat] |
GOOGLE_API_KEY |
generativelanguage.googleapis.com | model_kwargs={...} |
native (AsyncOpenAI) |
lmstudio |
LMStudioOpenAIClient |
aimu[openai_compat] |
— | localhost:1234 | base_url= |
native (AsyncOpenAI) |
ollama-openai |
OllamaOpenAIClient |
aimu[openai_compat] |
— | localhost:11434 | base_url= |
native (AsyncOpenAI) |
hf-openai |
HFOpenAIClient |
aimu[openai_compat] |
— | localhost:8000 | base_url= |
native (AsyncOpenAI) |
vllm |
VLLMOpenAIClient |
aimu[openai_compat] |
— | localhost:8000 | base_url= |
native (AsyncOpenAI) |
llamaserver |
LlamaServerOpenAIClient |
aimu[openai_compat] |
— | localhost:8080 | base_url= |
native (AsyncOpenAI) |
sglang |
SGLangOpenAIClient |
aimu[openai_compat] |
— | localhost:30000 | base_url= |
native (AsyncOpenAI) |
llamacpp |
LlamaCppClient |
aimu[llamacpp] |
— | in-process | model_path= (required), n_ctx, n_gpu_layers, chat_format, chat_handler, verbose |
wrapped sync via asyncio.to_thread |
Image providers (aimu.image_client() / aimu.generate_image())¶
| Provider key (in model string) | Client class | Extra | API key | Default endpoint | Provider-specific kwargs | Async kind |
|---|---|---|---|---|---|---|
hf |
HuggingFaceImageClient |
aimu[hf] |
— (HF login for gated models) | local | model_kwargs={...} (passed to pipeline.from_pretrained) |
wrapped sync via asyncio.to_thread |
gemini |
GeminiImageClient |
aimu[google] |
GOOGLE_API_KEY |
generativelanguage.googleapis.com | model_kwargs={"api_key": "..."} |
wrapped sync via asyncio.to_thread |
Image clients share the provider:model_id string format with text — the namespaces don't collide because they're consumed by separate factories (ImageClient vs ModelClient). Per-call generation kwargs differ by provider: HF takes negative_prompt, width, height, num_inference_steps, guidance_scale, seed; Gemini takes aspect_ratio, image_size. All image clients accept num_images=, format= ("pil" / "path" / "bytes" / "data_url"), and output_dir= from the shared BaseImageClient.generate().
The async kind column says how aimu.aio reaches each provider. Native providers use the SDK's own async client; the async surface delivers real coroutine concurrency. Wrapped sync providers (HF, LlamaCpp) load model weights in-process — the async surface buys you event-loop integration (your handler doesn't block) but not coroutine-level concurrency (the GIL and CUDA stream serialize execution). For these providers, async clients are built by wrapping an existing sync client; see how-to: use async.
Passing provider kwargs¶
Provider-specific kwargs are forwarded through ModelClient and aimu.client():
import aimu
# Ollama: keep model warm
aimu.client("ollama:qwen3.5:9b", model_keep_alive_seconds=300)
# llama-cpp: required model_path
aimu.client("llamacpp:qwen3-8b", model_path="/path/to/qwen3-8b.gguf")
# LM Studio: non-default port
aimu.client("lmstudio:qwen3.5-9b", base_url="http://myserver:1234/v1")
Notes per provider¶
OpenAIClientoverrides_update_generate_kwargsfor o-series models (o1/o3/o4): renamesmax_tokens → max_completion_tokensand forcestemperature=1.AnthropicClientstoresself.messagesin OpenAI format; conversion to Anthropic's format happens at request time. Thinking is native (thinking={"type": "enabled", "budget_tokens": N}), not<think>tag parsing.GeminiClientuses Google's OpenAI-compatible endpoint. Gemini 2.5 thinking models emit<think>tags on this endpoint, so the shared_ThinkingParserworks as-is.OllamaClient(native API) supports vision via the message-levelimages=field; only inline base64 / data URLs work, http(s) URLs raiseValueError.LlamaCppClientloads GGUF files in-process. Vision needs anmmprojprojector via thechat_handler=kwarg (e.g.Llava15ChatHandler(clip_model_path=...)).HuggingFaceClientusesAutoProcessorfor vision when available (Gemma 3/4, Qwen 3.5/3.6 VL). The model'stool_call_formatenum value (XML/JSON_OBJECT/JSON_ARRAY/BRACKETED) tells the client how to parse tool calls.- OpenAI-compatible local servers (LM Studio, vLLM, SGLang, llama-server, HF Transformers Serve) all subclass
OpenAICompatClientand differ only in defaultbase_urland the format of their model id strings. HuggingFaceImageClientloads thediffuserspipeline lazily on the firstgenerate()call. The spec'spipeline_classfield names a class in thediffusersnamespace (e.g."StableDiffusionXLPipeline","FluxPipeline"); auto-moves to CUDA or MPS if available.GeminiImageClientuses Google's nativegoogle-genaiSDK (not the OpenAI-compat endpoint). Callsclient.models.generate_contentwithresponse_modalities=[Modality.IMAGE]; Nano Banana returns one image per call sonum_images > 1issues N requests. Aliases like"gemini:nano-banana"resolve togemini-2.5-flash-image.
See also¶
- Model matrix — every shipped model with capability flags.
- Environment variables — every env var AIMU reads.
- How-to: switch providers — practical patterns.