Model matrix¶

This page lists every model enum member shipped with AIMU, along with its capability flags. It is maintained by hand and kept up to date with the enums in aimu/models/.

Legend: ✅ = supported, ✗ = not supported.

Anthropic (`AnthropicModel`)¶

Enum member	Model id	Tools	Thinking	Vision
`CLAUDE_FABLE_5`	`claude-fable-5`	✅	✅ (adaptive)	✅
`CLAUDE_OPUS_4_8`	`claude-opus-4-8`	✅	✅ (adaptive)	✅
`CLAUDE_OPUS_4_7`	`claude-opus-4-7`	✅	✅ (adaptive)	✅
`CLAUDE_OPUS_4_6`	`claude-opus-4-6`	✅	✅ (budget)	✅
`CLAUDE_SONNET_4_6`	`claude-sonnet-4-6`	✅	✅ (budget)	✅
`CLAUDE_HAIKU_4_5`	`claude-haiku-4-5`	✅	✅ (budget)	✅

AIMU requests Anthropic reasoning in one of two shapes, fixed per model by a ThinkingStyle on each AnthropicModel member (an Anthropic-specific enum, analogous to HuggingFace's ToolCallFormat):

- budget: thinking={"type": "enabled", "budget_tokens": N}; the model always thinks up to the budget. Used by Opus 4.6, Sonnet 4.6, and Haiku 4.5.
- adaptive: thinking={"type": "adaptive", "display": "summarized"}; the model decides per request whether and how much to think (it may not think at all on simple prompts), and temperature/top_p/top_k are not sent. Required by Opus 4.7+ and Fable 5, which reject the budget form with a 400.

Both styles surface reasoning as THINKING stream chunks and populate last_thinking. The Thinking column reflects the universal supports_thinking flag; the style only changes how the request is built, which is handled entirely inside AnthropicClient.
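The two request shapes described above can be sketched as a small helper. This is an illustration only: ThinkingStyle is named in the text, but its member names, the build_thinking_kwargs function, and the sampling argument are assumptions and not AIMU's actual code.

```python
from enum import Enum


class ThinkingStyle(Enum):
    # Member names are assumed for illustration.
    BUDGET = "budget"
    ADAPTIVE = "adaptive"


# Sampling parameters that the adaptive style does not send.
SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def build_thinking_kwargs(style, budget_tokens=None, sampling=None):
    """Return request kwargs for the given thinking style (hypothetical helper)."""
    kwargs = dict(sampling or {})
    if style is ThinkingStyle.BUDGET:
        if budget_tokens is None:
            raise ValueError("the budget style needs budget_tokens")
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    elif style is ThinkingStyle.ADAPTIVE:
        # Drop sampling parameters, as the text says they are not sent.
        for key in SAMPLING_KEYS:
            kwargs.pop(key, None)
        kwargs["thinking"] = {"type": "adaptive", "display": "summarized"}
    else:
        raise ValueError(f"unknown thinking style: {style!r}")
    return kwargs


budget_request = build_thinking_kwargs(ThinkingStyle.BUDGET, budget_tokens=2048)
adaptive_request = build_thinking_kwargs(
    ThinkingStyle.ADAPTIVE, sampling={"temperature": 0.7, "top_p": 0.9}
)
```

Keeping the choice in one function means callers only ever ask for "thinking on"; the per-model style decides the wire format.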

OpenAI (`OpenAIModel`)¶

Enum member	Model id	Tools	Thinking	Vision
`GPT_4O_MINI`	`gpt-4o-mini`	✅	✗	✅
`GPT_4O`	`gpt-4o`	✅	✗	✅
`GPT_4_1`	`gpt-4.1`	✅	✗	✅
`GPT_4_1_MINI`	`gpt-4.1-mini`	✅	✗	✅
`GPT_4_1_NANO`	`gpt-4.1-nano`	✅	✗	✅
`O4_MINI`	`o4-mini`	✅	✗	✅
`O3`	`o3`	✅	✗	✅
`O3_MINI`	`o3-mini`	✅	✗	✗

o-series models emit reasoning tokens that aren't exposed via the API, so thinking=False even though they reason internally. Pass reasoning_effort via generate_kwargs if needed.

Google Gemini (`GeminiModel`)¶

Enum member	Model id	Tools	Thinking	Vision
`GEMINI_2_0_FLASH`	`gemini-2.0-flash`	✅	✗	✅
`GEMINI_2_0_FLASH_LITE`	`gemini-2.0-flash-lite`	✅	✗	✅
`GEMINI_1_5_PRO`	`gemini-1.5-pro`	✅	✗	✅
`GEMINI_1_5_FLASH`	`gemini-1.5-flash`	✅	✗	✅
`GEMINI_2_5_PRO`	`gemini-2.5-pro`	✅	✅	✅
`GEMINI_2_5_FLASH`	`gemini-2.5-flash`	✅	✅	✅

Gemini 2.5 thinking models emit <think> tags on Google's OpenAI-compatible endpoint.
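As a rough sketch of what handling these tags involves, the snippet below separates <think> blocks from the visible answer in a complete response string. The split_thinking function is hypothetical and is not AIMU's parser; a streaming client would need an incremental version of the same idea.

```python
import re

# Non-greedy match so several <think> blocks are captured separately.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (thinking, answer) for text that may contain <think> blocks."""
    thoughts = [match.strip() for match in _THINK_RE.findall(text)]
    answer = _THINK_RE.sub("", text).strip()
    return "\n".join(thoughts), answer


thinking, answer = split_thinking("<think>2 + 2 is 4.</think>The answer is 4.")
```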

Ollama native (`OllamaModel`)¶

Enum member	Model id	Tools	Thinking	Vision
`QWEN_3_6_35B`	`qwen3.6:35b`	✅	✅	✗
`QWEN_3_6_27B`	`qwen3.6:27b`	✅	✅	✗
`QWEN_3_5_9B`	`qwen3.5:9b`	✅	✅	✗
`QWEN_3_32B`	`qwen3:32b`	✅	✅	✗
`QWEN_3_8B`	`qwen3:8b`	✅	✅	✗
`GEMMA_4_E4B`	`gemma4:e4b`	✅	✅	✅
`GEMMA_4_12B`	`gemma4:12b`	✅	✅	✅
`GEMMA_4_26B`	`gemma4:26b`	✅	✅	✅
`GEMMA_4_31B`	`gemma4:31b`	✅	✅	✅
`GEMMA_3_12B`	`gemma3:12b`	✗	✗	✅
`NEMOTRON_CASCADE_2_30B`	`nemotron-cascade-2:30b`	✅	✅	✗
`NEMOTRON_3_NANO_30B`	`nemotron-3-nano:30b`	✅	✅	✗
`GLM_4_7_FLASH_31B_Q4`	`glm-4.7-flash:q4_K_M`	✗	✅	✗
`GPT_OSS_20B`	`gpt-oss:20b`	✅	✅	✗
`MAGISTRAL_SMALL_24B`	`magistral:24b`	✅	✅	✗
`MINISTRAL_3_14B`	`ministral-3:14b`	✅	✗	✗
`PHI_4_MINI_3_8B`	`phi4-mini:3.8b`	✗	✗	✗
`PHI_4_14B`	`phi4:14b`	✗	✗	✗
`DEEPSEEK_R1_8B`	`deepseek-r1:8b`	✗	✅	✗
`SMOLLM2_1_7B`	`smollm2:1.7b`	✗	✗	✗
`LLAMA_3_2_3B`	`llama3.2:3b`	✗	✗	✗
`LLAMA_3_1_8B`	`llama3.1:8b`	✗	✗	✗

Some Ollama models can technically be asked for tools but produce unreliable tool calls; those are marked tools=False and documented in the enum source.

HuggingFace (`HuggingFaceModel`)¶

Enum member	Repo id	Tools	Thinking	Vision
`QWEN_3_6_27B`	`Qwen/Qwen3.6-27B-FP8`	✅	✅	✅
`QWEN_3_5_9B`	`Qwen/Qwen3.5-9B`	✅	✅	✅
`QWEN_3_8B`	`Qwen/Qwen3-8B`	✅	✅	✗
`GEMMA_4_E4B`	`google/gemma-4-E4B-it`	✅	✗	✅
`GEMMA_3_12B`	`google/gemma-3-12b-it`	✗	✗	✅
`GPT_OSS_20B`	`openai/gpt-oss-20b`	✅	✅	✗
`MAGISTRAL_SMALL`	`mistralai/Magistral-Small-2509`	✅	✗	✗
`MISTRAL_NEMO_12B`	`mistralai/Mistral-Nemo-Instruct-2407`	✅	✗	✗
`MISTRAL_7B`	`mistralai/Mistral-7B-Instruct-v0.3`	✅	✗	✗
`PHI_4_MINI_3_8B`	`microsoft/Phi-4-mini-instruct`	✗	✗	✗
`PHI_4_14B`	`microsoft/phi-4`	✗	✗	✗
`DEEPSEEK_R1_8B`	`deepseek-ai/DeepSeek-R1-Distill-Llama-8B`	✗	✅	✗
`SMOLLM3_3B`	`HuggingFaceTB/SmolLM3-3B`	✅	✅	✗
`LLAMA_3_2_3B`	`unsloth/Llama-3.2-3B-Instruct`	✅	✗	✗
`LLAMA_3_1_8B`	`meta-llama/Meta-Llama-3.1-8B-Instruct`	✅	✗	✗

_VL suffix variants load with AutoModelForImageTextToText for the vision encoder.

llama-cpp (`LlamaCppModel`)¶

Enum member	Hint id	Tools	Thinking	Vision
`LLAMA_3_1_8B`	`llama-3.1-8b`	✗	✗	✗
`LLAMA_3_2_3B`	`llama-3.2-3b`	✗	✗	✗
`MISTRAL_7B`	`mistral-7b`	✅	✗	✗
`QWEN_3_4B`	`qwen3-4b`	✅	✅	✗
`QWEN_3_8B`	`qwen3-8b`	✅	✅	✗
`DEEPSEEK_R1_7B`	`deepseek-r1-7b`	✗	✅	✗
`PHI_4_MINI`	`phi-4-mini`	✅	✗	✗

llama-cpp model ids are hints; the actual model is loaded from model_path= regardless. Capability flags are honoured by the client.

OpenAI-compatible local servers¶

OllamaOpenAIModel, LMStudioOpenAIModel, VLLMOpenAIModel, HFOpenAIModel, LlamaServerOpenAIModel, and SGLangOpenAIModel enumerate a shared set of common open models. Capability flags for a given member are the same across servers (except where footnoted); the model id format differs per server — LM Studio uses loaded model keys, Ollama uses name:tag, vLLM/SGLang/HF Serve use HuggingFace repo paths, llama-server uses GGUF filenames — so consult the enum source for each server's exact ids.

Enum member	Tools	Thinking	Vision	Servers
`LLAMA_3_1_8B`	✅ †	✗	✗	all
`LLAMA_3_2_3B`	✅ †	✗	✗	all except LM Studio
`MISTRAL_7B`	✅	✗	✗	all
`PHI_4_MINI`	✅	✗	✗	all
`QWEN_3_4B`	✅	✅	✗	all
`QWEN_3_8B`	✅	✅	✗	all
`QWEN_3_5_9B`	✅	✅	✗	Ollama, LM Studio
`DEEPSEEK_R1_8B`	✗	✅	✗	Ollama
`DEEPSEEK_R1_7B`	✗	✅	✗	all except Ollama
`GEMMA_3_12B`	✅ †	✗	✅	all except LM Studio
`GEMMA_4_E4B`	✅	✅	✅	all
`GEMMA_4_12B`	✅	✅	✅	all
`GEMMA_4_26B`	✅	✅	✅	all
`GEMMA_4_31B`	✅	✅	✅	all

† On OllamaOpenAIModel, LLAMA_3_1_8B, LLAMA_3_2_3B, and GEMMA_3_12B are marked tools=✗ — Ollama produces unreliable tool calls for these (matches the native OllamaModel policy). The HuggingFace-repo, GGUF, and LM Studio builds mark them tools=✅.
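One way to picture the footnote is a shared capability table with a per-server override. The sketch below is an assumption about structure, not the layout of openai_compat.py: the Capabilities dataclass, the SHARED table, and capabilities_for are all hypothetical names, and only a few members are included.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Capabilities:
    tools: bool
    thinking: bool
    vision: bool


# A subset of the shared matrix above.
SHARED = {
    "LLAMA_3_1_8B": Capabilities(tools=True, thinking=False, vision=False),
    "LLAMA_3_2_3B": Capabilities(tools=True, thinking=False, vision=False),
    "GEMMA_3_12B": Capabilities(tools=True, thinking=False, vision=True),
    "QWEN_3_8B": Capabilities(tools=True, thinking=True, vision=False),
}

# Members whose tool calls are treated as unreliable on Ollama (the † rows).
OLLAMA_TOOLS_OFF = {"LLAMA_3_1_8B", "LLAMA_3_2_3B", "GEMMA_3_12B"}


def capabilities_for(server, member):
    """Look up shared flags, then apply the Ollama tools override."""
    caps = SHARED[member]
    if server == "ollama" and member in OLLAMA_TOOLS_OFF:
        caps = replace(caps, tools=False)
    return caps


ollama_llama = capabilities_for("ollama", "LLAMA_3_1_8B")
vllm_llama = capabilities_for("vllm", "LLAMA_3_1_8B")
```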

Gemma 4 E4B/12B are natively audio-capable, but audio is left off for every OpenAI-compat server because audio input isn't reliably exposed by these local servers (see the inline comments in openai_compat.py for the per-server reason). 26B/31B have no native audio.

Image generation¶

Image clients use a different spec class than text (HuggingFaceImageSpec / GeminiImageSpec). The capability flags don't apply, so the matrix shows model-specific defaults instead.

HuggingFace diffusers (`HuggingFaceImageModel`)¶

Enum member	Repo id	Pipeline class	Default steps	Default size	img2img
`SD_1_5`	`runwayml/stable-diffusion-v1-5`	`StableDiffusionPipeline`	25	512×512	✓ (`strength=`)
`SDXL_BASE`	`stabilityai/stable-diffusion-xl-base-1.0`	`StableDiffusionXLPipeline`	30	1024×1024	✓ (`strength=`)
`SD_3_5_MEDIUM`	`stabilityai/stable-diffusion-3.5-medium`	`StableDiffusion3Pipeline`	28	1024×1024	✓ (`strength=`)
`FLUX_1_DEV`	`black-forest-labs/FLUX.1-dev`	`FluxPipeline`	28	1024×1024	✓ (`strength=`)
`FLUX_1_SCHNELL`	`black-forest-labs/FLUX.1-schnell`	`FluxPipeline`	4	1024×1024	✓ (`strength=`)
`FLUX_2_KLEIN_4B`	`black-forest-labs/FLUX.2-klein-4B`	`Flux2KleinPipeline`	4	1024×1024	✓ (unified)
`FLUX_2_KLEIN_9B`	`black-forest-labs/FLUX.2-klein-9B`	`Flux2KleinPipeline`	4	1024×1024	✓ (unified)

The img2img column indicates reference_image= support. strength= models derive output from a noisy version of the reference (0 = identical, 1 = ignore it; default 0.75). "unified" models (Flux2KleinPipeline) condition on the reference directly, with no strength parameter; width/height are derived from the reference.
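To make the strength scale concrete: common diffusers img2img pipelines run only a fraction of the scheduled steps, roughly num_inference_steps × strength. The helper below reproduces that arithmetic. Whether every AIMU pipeline follows it exactly is an assumption, and effective_steps is a hypothetical name.

```python
def effective_steps(num_inference_steps, strength):
    """Approximate number of denoising steps an img2img call actually runs."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


default_sd15 = effective_steps(25, 0.75)  # SD_1_5 defaults from the table
full_regen = effective_steps(25, 1.0)     # reference is ignored
untouched = effective_steps(25, 0.0)      # no denoising, output matches input
```

With the SD_1_5 defaults this gives 18 of the 25 steps, so lowering strength also shortens generation time.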

Spec defaults are starting points: pass num_inference_steps=, guidance_scale=, width=, height=, seed= to override per call. Power users can bypass the enum with a "hf:<repo_id>" string for any HuggingFace diffusers model (defaults to DiffusionPipeline auto-detect loader, img2img_pipeline_class=None).
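The "hf:<repo_id>" escape hatch amounts to a prefix check before enum lookup. A minimal sketch, assuming a hypothetical split_model_ref helper and return shape (AIMU's real resolver may differ):

```python
HF_PREFIX = "hf:"


def split_model_ref(ref):
    """Split a model reference into (kind, value).

    "hf:<repo_id>" gives ("hf", "<repo_id>"); anything else gives ("enum", ref).
    """
    if ref.startswith(HF_PREFIX):
        repo_id = ref[len(HF_PREFIX):].strip()
        if not repo_id:
            raise ValueError("expected a repo id after 'hf:'")
        return "hf", repo_id
    return "enum", ref


kind, value = split_model_ref("hf:stabilityai/sdxl-turbo")
```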

Google Gemini (`GeminiImageModel`)¶

Enum member	Model id	Notes
`NANO_BANANA`	`gemini-2.5-flash-image`	GA channel. Aspect ratio via `aspect_ratio=` (e.g. `"1:1"`, `"16:9"`).
`NANO_BANANA_PREVIEW`	`gemini-2.5-flash-image-preview`	Preview channel; kept for users who pinned it.

Short-name aliases like "gemini:nano-banana" resolve to the full model id at construction. Nano Banana's generate_content API returns one image per call; num_images > 1 issues N requests.
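The one-image-per-call behaviour implies a simple fan-out loop. The sketch below shows the pattern with a stub backend; generate_images and fake_generate_one are hypothetical stand-ins, not AIMU or Gemini SDK functions.

```python
def generate_images(generate_one, prompt, num_images=1):
    """Call a one-image-per-request backend num_images times."""
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    return [generate_one(prompt) for _ in range(num_images)]


calls = []


def fake_generate_one(prompt):
    # Stub backend: records the call and returns placeholder bytes.
    calls.append(prompt)
    return b"image-%d" % len(calls)


images = generate_images(fake_generate_one, "a red bicycle", num_images=3)
```

Because each image is a separate request, cost and latency scale linearly with num_images.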

Audio generation¶

Audio clients use HuggingFaceAudioSpec, distinct from the image and text spec classes. The matrix shows generation defaults rather than capability flags.

HuggingFace (`HuggingFaceAudioModel`)¶

Enum member	Repo id	Pipeline type	Default duration	Default steps
`MUSICGEN_SMALL`	`facebook/musicgen-small`	`musicgen`	10 s	N/A
`MUSICGEN_MEDIUM`	`facebook/musicgen-medium`	`musicgen`	10 s	N/A
`MUSICGEN_LARGE`	`facebook/musicgen-large`	`musicgen`	10 s	N/A
`AUDIOLDM2`	`cvssp/audioldm2`	`audioldm2`	10 s	200
`STABLE_AUDIO_OPEN`	`stabilityai/stable-audio-open-1.0`	`stable_audio`	10 s	200

Pipeline types:
- musicgen: token-autoregressive generation via HuggingFace transformers. Duration maps to token count (~50 tokens/s at 32 kHz); num_inference_steps does not apply. Single final AUDIO_GENERATING chunk when streaming.
- audioldm2 / stable_audio: latent diffusion via HuggingFace diffusers. Accepts num_inference_steps; emits one progress chunk per step plus a final chunk when streaming.
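The duration-to-token mapping for musicgen is simple arithmetic. A small sketch, using the approximate 50 tokens/s figure from the text; the constant and function names are hypothetical:

```python
MUSICGEN_TOKENS_PER_SECOND = 50  # approximate rate quoted above


def max_new_tokens_for(duration_s):
    """Convert a requested duration in seconds into a token budget."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return int(round(duration_s * MUSICGEN_TOKENS_PER_SECOND))


default_tokens = max_new_tokens_for(10)  # the 10 s default from the table
short_tokens = max_new_tokens_for(2.5)
```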

Override per call with duration_s=, num_inference_steps=, seed=, num_audio=. Power users can bypass the enum with "hf:<repo_id>" for any compatible model (pipeline type inferred from known repo prefixes, defaulting to musicgen).

Speech generation¶

Speech clients use SpeechSpec subclasses, distinct from image and audio spec classes. Speech is text-to-speech (TTS) only; speech-to-text will use a separate BaseTranscriptionClient surface.

HuggingFace (`HuggingFaceSpeechModel`)¶

Enum member	Repo id	Pipeline type	Sample rate	Default voice
`MMS_TTS_ENG`	`facebook/mms-tts-eng`	`tts_pipeline`	16 kHz	N/A
`SPEECHT5`	`microsoft/speecht5_tts`	`speecht5`	16 kHz	CMU Arctic xvectors idx 7306
`BARK`	`suno/bark`	`bark`	24 kHz	`v2/en_speaker_6`

Pipeline types:
- tts_pipeline: HuggingFace pipeline("text-to-speech"). Any compatible TTS pipeline model.
- speecht5: SpeechT5ForTextToSpeech + SpeechT5HifiGan vocoder + x-vector speaker embeddings. Default embedding loads from Matthijs/cmu-arctic-xvectors (index 7306) on first call. Pass voice="N" to use a different dataset index (0–1132); the dataset is cached on the client after the first lookup.
- bark: zero-shot voice cloning. Pass voice= a Bark voice code ("v2/en_speaker_6", "v2/en_speaker_9", etc.).

Power users can bypass the enum with "hf:<repo_id>" for any compatible model (pipeline type inferred from known repo prefixes, defaulting to tts_pipeline).

OpenAI (`OpenAISpeechModel`)¶

Requires OPENAI_API_KEY.

Enum member	Model id	Notes
`TTS_1`	`tts-1`	Fast, standard quality. Recommended for live narration.
`TTS_1_HD`	`tts-1-hd`	Slower, higher quality.

Available voices: alloy (default), echo, fable, onyx, nova, shimmer. Pass as voice= to generate(). OpenAI returns raw 24 kHz 16-bit PCM; encode_audio() handles WAV conversion.
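For readers curious what the PCM-to-WAV step involves, here is a minimal sketch using only the standard library. It is not AIMU's encode_audio(); pcm_to_wav_bytes is a hypothetical helper, and mono output is an assumption.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw little-endian PCM in a WAV container and return the bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)       # mono is assumed here
        wav.setsampwidth(sample_width)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)    # 24 kHz, as stated above
        wav.writeframes(pcm)
    return buffer.getvalue()


# One second of 16-bit mono silence at 24 kHz.
silence = b"\x00\x00" * 24000
wav_bytes = pcm_to_wav_bytes(silence)
```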

Override per call with voice=, speed=, num_audio=. Override per agent with make_speech_tool(client, voice=..., speed=...).
