Switch providers¶
Every AIMU client implements the same BaseModelClient interface, so swapping backends is a one-line change. ModelClient is the factory; it accepts either a Model enum member or a "provider:model_id" string.
Use a model string¶
import aimu
client = aimu.client("anthropic:claude-sonnet-4-6")
client = aimu.client("ollama:qwen3.5:9b")
client = aimu.client("openai:gpt-4o-mini")
client = aimu.client("gemini:gemini-2.5-flash")
Model strings have the form "provider:model_id". Colons inside the model id (e.g. Ollama's qwen3.5:9b) are preserved.
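The colon-preserving behaviour is what you get from splitting on the first colon only. The following is a minimal sketch of that idea, not AIMU's actual parser; `split_model_string` is a hypothetical helper introduced here for illustration.

```python
def split_model_string(spec: str) -> tuple[str, str]:
    """Split "provider:model_id" on the first colon only (hypothetical helper)."""
    provider, sep, model_id = spec.partition(":")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected 'provider:model_id', got {spec!r}")
    return provider, model_id


# Later colons stay inside the model id.
print(split_model_string("ollama:qwen3.5:9b"))            # ('ollama', 'qwen3.5:9b')
print(split_model_string("anthropic:claude-sonnet-4-6"))  # ('anthropic', 'claude-sonnet-4-6')
```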
Use an enum member¶
For IDE autocomplete and type checking:
from aimu.models import ModelClient, OllamaModel, AnthropicModel
client = ModelClient(OllamaModel.QWEN_3_5_9B)
client = ModelClient(AnthropicModel.CLAUDE_SONNET_4_6)
Resolve a name, string, or enum uniformly¶
When you accept a model from a CLI flag or config and don't know which form it'll arrive in, resolve_model_enum (text) and resolve_image_model_enum (image) normalise all three to an enum member:
import aimu
from aimu.models import AnthropicModel
aimu.resolve_model_enum(AnthropicModel.CLAUDE_SONNET_4_6) # enum member → returned unchanged
aimu.resolve_model_enum("anthropic:claude-sonnet-4-6") # "provider:model_id" string
aimu.resolve_model_enum("CLAUDE_SONNET_4_6") # bare enum-member name
aimu.resolve_image_model_enum("FLUX_2_KLEIN_4B") # same, for image models
aimu.resolve_image_model_enum("gemini:nano-banana")
Pass the result straight to aimu.client(...) or aimu.image_client(...); both also accept an enum member.
A bare text name is often ambiguous: the same id ships under several providers (e.g. QWEN_3_8B is an Ollama, HuggingFace, and OpenAI-compat model). resolve_model_enum resolves the tie by preferring a provider where the model is already available locally (running Ollama → cached HuggingFace → reachable local server, tool-capable first), logging the choice at WARNING. If it isn't available anywhere, it raises ValueError listing the "provider:model_id" options, so pin the provider with the string form when you need a specific one. (resolve_image_model_enum has no local-availability notion and raises on the rare ambiguity.)
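The tie-break described above can be pictured as a priority scan over the candidate providers. This is a rough sketch under stated assumptions: `pick_provider`, the `LOCAL_PRIORITY` labels, and the model ids in the usage lines are all hypothetical, the availability set stands in for whatever probes AIMU runs, and the tool-capable preference is left out.

```python
import logging

logger = logging.getLogger("tie_break_sketch")

# Assumed priority labels that loosely follow the order in the text.
# The real provider list and availability checks are AIMU internals.
LOCAL_PRIORITY = ("ollama", "hf", "lmstudio")


def pick_provider(name: str, candidates: list[str], available: set[str]) -> str:
    """Return the first candidate whose provider is locally available."""
    for provider in LOCAL_PRIORITY:
        for spec in candidates:
            if spec.partition(":")[0] == provider and spec in available:
                logger.warning("%s is ambiguous; using %s", name, spec)
                return spec
    # Nothing is available locally: list the options so the caller can pin one.
    raise ValueError(f"{name} is ambiguous; pin one of: {sorted(candidates)}")


# Illustrative ids only; they are not taken from AIMU's model registry.
candidates = ["hf:Qwen/Qwen3-8B", "ollama:qwen3:8b", "lmstudio:qwen3-8b"]
print(pick_provider("QWEN_3_8B", candidates, {"hf:Qwen/Qwen3-8B", "lmstudio:qwen3-8b"}))
# hf:Qwen/Qwen3-8B
```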
Use whatever model is already available locally¶
Omit model= entirely and AIMU resolves a default: the AIMU_LANGUAGE_MODEL env var if set, otherwise the first already-available local model (running Ollama → cached HuggingFace → reachable local OpenAI-compat server, tool-capable preferred). A cloud provider is never auto-selected and weights are never downloaded.
To inspect or choose among what's available instead of taking the auto-pick:
aimu.available_text_models() # list[Model]: locally loadable models, provider-priority order
aimu.resolve_default_text_model_enum() # the single auto-pick, as an enum member
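The default-resolution order in this section (environment variable first, then the first locally available model) can be sketched as below. `default_model` is a hypothetical helper, not an AIMU function. Two behaviours are assumptions of the sketch rather than documented facts: an empty variable is treated as unset, and `None` is returned when nothing is available.

```python
import os
from typing import Mapping, Optional, Sequence


def default_model(
    available: Sequence[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Env var wins; otherwise the first locally available model; otherwise None."""
    configured = env.get("AIMU_LANGUAGE_MODEL")
    if configured:  # assumption: an empty string counts as "not set"
        return configured
    # `available` is assumed to already be in provider-priority order.
    return available[0] if available else None


print(default_model(["ollama:qwen3.5:9b"], env={"AIMU_LANGUAGE_MODEL": "anthropic:claude-sonnet-4-6"}))
# anthropic:claude-sonnet-4-6
print(default_model(["ollama:qwen3.5:9b"], env={}))
# ollama:qwen3.5:9b
```

Note that the environment variable can name a cloud model; only the automatic pick is restricted to local models.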
Provider keys¶
| Provider key | Extra | API key env var |
|---|---|---|
| ollama | aimu[ollama] | None |
| hf | aimu[hf] | None |
| llamacpp | aimu[llamacpp] | None (model_path= required) |
| anthropic | aimu[anthropic] | ANTHROPIC_API_KEY |
| openai | aimu[openai_compat] | OPENAI_API_KEY |
| gemini | aimu[openai_compat] | GOOGLE_API_KEY |
| lmstudio | aimu[openai_compat] | None (localhost:1234) |
| ollama-openai | aimu[openai_compat] | None (localhost:11434) |
| hf-openai | aimu[openai_compat] | None (localhost:8000) |
| vllm | aimu[openai_compat] | None (localhost:8000) |
| llamaserver | aimu[openai_compat] | None (localhost:8080) |
| sglang | aimu[openai_compat] | None (localhost:30000) |
See the provider matrix for full details.
Check what the provider supports¶
client = aimu.client("ollama:qwen3.5:9b")
client.is_tool_using_model # True
client.is_thinking_model # True
client.is_vision_model # False
Pass provider-specific kwargs¶
Extra kwargs are forwarded to the underlying client constructor:
# Ollama: keep the model warm for 5 minutes
aimu.client("ollama:qwen3.5:9b", model_keep_alive_seconds=300)
# llama.cpp: load a local GGUF file
aimu.client("llamacpp:qwen3-8b", model_path="/path/to/qwen3-8b.gguf")
# LM Studio: point at a non-default host
aimu.client("lmstudio:qwen3.5-9b", base_url="http://myserver:1234/v1")
Each provider client's constructor signature is in the API reference.
Timeouts and retries¶
Networked clients take timeout (seconds) and max_retries. Both are forwarded unchanged to the provider SDK, which applies its own request timeout and its own bounded retry on transient failures. AIMU adds no retry logic of its own.
import aimu
# Cloud and local OpenAI-compat servers: both supported
client = aimu.client("anthropic:claude-sonnet-4-6", timeout=30, max_retries=5)
client = aimu.client("openai:gpt-4o", timeout=30, max_retries=5)
client = aimu.client("vllm:Qwen/Qwen3-8B", timeout=30, max_retries=5)
These are SDK-native (the anthropic and openai SDKs implement timeout + retry), so the names and behavior match those SDKs exactly. Unset values fall back to each SDK's defaults.
Two caveats:
- Ollama (native provider) supports timeout but has no request-retry; passing max_retries raises ValueError. Use the ollama-openai provider (which routes through the OpenAI SDK) if you need retries against Ollama.
- In-process providers (hf:, llamacpp:) run locally with no HTTP request, so they don't accept timeout/max_retries (passing them raises TypeError).
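When the provider comes from configuration, one way to respect these caveats is to build the keyword arguments per provider before constructing the client. The sketch below encodes only the rules stated in this section; `network_kwargs` and the two sets are hypothetical names, not part of AIMU.

```python
from typing import Any, Optional

IN_PROCESS = {"hf", "llamacpp"}  # no HTTP request: neither kwarg is accepted
TIMEOUT_ONLY = {"ollama"}        # native Ollama: timeout yes, max_retries no


def network_kwargs(
    spec: str,
    timeout: Optional[float] = None,
    max_retries: Optional[int] = None,
) -> dict[str, Any]:
    """Return only the timeout/retry kwargs the provider in `spec` accepts."""
    provider = spec.partition(":")[0]
    kwargs: dict[str, Any] = {}
    if provider in IN_PROCESS:
        return kwargs
    if timeout is not None:
        kwargs["timeout"] = timeout
    if max_retries is not None and provider not in TIMEOUT_ONLY:
        kwargs["max_retries"] = max_retries
    return kwargs


print(network_kwargs("vllm:Qwen/Qwen3-8B", timeout=30, max_retries=5))  # {'timeout': 30, 'max_retries': 5}
print(network_kwargs("ollama:qwen3.5:9b", timeout=30, max_retries=5))   # {'timeout': 30}
print(network_kwargs("llamacpp:qwen3-8b", timeout=30, max_retries=5))   # {}
```

The result would then be splatted into the constructor call, for example aimu.client(spec, **network_kwargs(spec, 30, 5)). Silently dropping an unsupported option is a design choice of this sketch; letting the ValueError or TypeError surface is equally defensible when a misconfiguration should fail loudly.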
Provider failover¶
Per-client max_retries retries the same provider. To fall back to a different provider when one is down, wrap an ordered list of clients in a FallbackClient:
import aimu
from aimu.models import FallbackClient
client = FallbackClient([
aimu.client("anthropic:claude-sonnet-4-6", timeout=30, max_retries=2), # preferred
aimu.client("openai:gpt-4o", timeout=30, max_retries=2), # fallback
aimu.client("ollama:qwen3.5:9b"), # last resort, local
])
print(client.chat("Hello")) # tries Anthropic; on error falls over to OpenAI, then Ollama
The first client that answers wins. A client that raises hands off to the next, carrying the same conversation history, so a multi-turn chat survives a mid-conversation failover. When every client fails, FallbackExhaustedError is raised with the last error chained as its cause.
FallbackClient is itself a BaseModelClient, so it composes everywhere a plain client does:
from aimu.agents import Agent
agent = Agent(client, "You are a helpful assistant.", tools=[...]) # resilient agent
Notes:
- What triggers failover. By default any Exception. Narrow it with retry_on= (e.g. FallbackClient([...], retry_on=(TimeoutError, ConnectionError))) so permanent errors surface immediately instead of being masked.
- Capabilities (is_thinking_model, model, etc.) reflect the first client, so use capability-compatible clients in one set.
- Streaming fails over only before the first chunk is emitted; an error mid-stream propagates rather than replaying.
- Async: aio.AsyncFallbackClient is the one-for-one async twin.
Combine the two layers: max_retries/timeout give in-SDK retry against one provider, FallbackClient gives cross-provider failover on top.
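To make the failover semantics concrete, here is a toy re-creation of the control flow: try clients in order, hand off only on the exception types in retry_on, and chain the last error when all fail. It is a sketch, not AIMU's implementation. `FallbackSketch` and `AllFailed` are stand-ins invented for this example, the clients are plain callables, and conversation history and streaming are not modelled.

```python
class AllFailed(RuntimeError):
    """Stand-in for FallbackExhaustedError in this sketch."""


class FallbackSketch:
    def __init__(self, clients, retry_on=(Exception,)):
        self.clients = list(clients)
        self.retry_on = tuple(retry_on)

    def chat(self, prompt):
        last_error = None
        for client in self.clients:
            try:
                return client(prompt)      # first client that answers wins
            except self.retry_on as exc:   # other exception types propagate at once
                last_error = exc
        # Every client failed: chain the last error as the cause.
        raise AllFailed("every client failed") from last_error


def down(prompt):
    raise ConnectionError("provider unreachable")


def up(prompt):
    return f"echo: {prompt}"


client = FallbackSketch([down, up], retry_on=(TimeoutError, ConnectionError))
print(client.chat("Hello"))  # echo: Hello
```

With the narrowed retry_on shown here, an error such as a KeyError raised by the first client would surface immediately instead of triggering a handoff.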
Anthropic prompt caching¶
An agent resends the same large prefix to the model every turn: its system prompt and tool schemas. On Anthropic you can cache that prefix (cheaper, faster) by opting in with cache_prompt=True:
import aimu
from aimu.agents import Agent
client = aimu.client("anthropic:claude-sonnet-4-6", cache_prompt=True)
agent = Agent(client, "You are a helpful assistant with a long, detailed system prompt...", tools=[...])
agent.run("First question") # writes the cache (system prompt + tools)
agent.run("Second question") # reads it back — most of the prefix is cached
cache_prompt=True marks the system prompt and the tool definitions with Anthropic's cache_control ephemeral breakpoints at request time. Notes:
- Anthropic only. It's a provider-specific kwarg; passing it to another provider raises TypeError.
- Safe to leave on. Below Anthropic's minimum cacheable size (1024 tokens; 2048 for Haiku) the API simply doesn't cache, and no error is raised.
- Observe it. When the response reports caching, client.last_usage includes cache_creation_input_tokens (first call, writing the cache) and cache_read_input_tokens (later calls, reading it).
- Conversation messages aren't cached (they change every turn); the win is the stable system-prompt + tools prefix.
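A small helper can turn those two counters into a readable status line for logs. This sketch assumes the usage record behaves like a dict and that the keys are simply absent when no caching was reported; both are assumptions about shape, and `cache_status` is a hypothetical name.

```python
def cache_status(usage: dict) -> str:
    """Summarise prompt-cache activity from a usage mapping (assumed dict-like)."""
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    if read:
        return f"cache hit: {read} tokens read"
    if written:
        return f"cache write: {written} tokens stored"
    return "no caching reported"


print(cache_status({"cache_creation_input_tokens": 1800}))  # cache write: 1800 tokens stored
print(cache_status({"cache_read_input_tokens": 1800}))      # cache hit: 1800 tokens read
print(cache_status({}))                                     # no caching reported
```

In practice you would call it as cache_status(client.last_usage) after each agent.run(...), expecting a write on the first call and hits afterwards once the prefix exceeds the minimum cacheable size.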
Failure modes¶
aimu.client(...) raises ValueError in two cases: when the provider prefix is unknown, the message lists the available providers; when the prefix is valid but the model id isn't, the message lists that provider's model ids:
>>> aimu.client("foo:bar")
ValueError: Unknown provider 'foo'. Available providers (with installed deps): ['anthropic', 'ollama', ...]
>>> aimu.client("anthropic:claude-nonsense")
ValueError: Provider 'anthropic' has no model id 'claude-nonsense'. Available: ['claude-fable-5', 'claude-haiku-4-5', 'claude-opus-4-6', 'claude-opus-4-7', 'claude-opus-4-8', 'claude-sonnet-4-6']