Skip to content

aimu.models

Provider-agnostic model clients.

Factory and base class

aimu.models.ModelClient

ModelClient(model: Union[Model, ModelSpec, str], **kwargs: Any)

Bases: BaseModelClient

Public factory for provider-backed model clients.

Accepts either a provider's Model enum member or a "provider:model_id" string::

from aimu.models import ModelClient, OllamaModel

# Enum form
client = ModelClient(OllamaModel.QWEN_3_8B)

# String form (no enum import needed)
client = ModelClient("anthropic:claude-sonnet-4-6")
client = ModelClient("ollama:qwen3.5:9b")

Provider-specific kwargs are forwarded to the concrete client::

ModelClient(LlamaCppModel.QWEN_3_8B, model_path="/path/to/model.gguf")
ModelClient(OllamaModel.LLAMA_3_1_8B, model_keep_alive_seconds=120)
ModelClient(LMStudioOpenAIModel.LLAMA_3_2_3B, base_url="http://myserver:1234/v1")

last_usage property writable

last_usage: Optional[dict]

Token usage of the most recent non-streaming response, or None.

Shape: {"input_tokens", "output_tokens", "total_tokens"}. None after a streaming call or when the provider/server did not report usage.

aimu.models.BaseModelClient

BaseModelClient(model: Model, model_kwargs: Optional[dict] = None, system_message: Optional[str] = None)

Bases: _ChatStateMixin, ABC

Abstract base for all provider clients.

Subclasses implement :meth:generate, :meth:chat, and :meth:_update_generate_kwargs. Tool calling, message history, vision input, and streaming filters are handled here once for every provider.

generate

generate(prompt: str, generate_kwargs: Optional[dict[str, Any]] = None, stream: bool = False, images: Optional[list] = None, include: Optional[Iterable[Union[str, StreamingContentType]]] = None, audio: Optional[list] = None, schema: Optional[type] = None) -> Union[str, Any, Iterator[StreamChunk]]

Single-turn, stateless generation. See :meth:chat for the include filter semantics.

Parameters:

Name Type Description Default
prompt str

The text to generate from.

required
generate_kwargs Optional[dict[str, Any]]

Provider-specific generation parameters.

None
stream bool

If True, return an iterator of :class:StreamChunk instead of a string.

False
images Optional[list]

Optional list of images for vision-capable models, same accepted forms as :meth:chat (file path, pathlib.Path, bytes, http(s) URL, or data URL). Raises ValueError if the model does not support vision. Unlike :meth:chat, this does not touch self.messages; the call stays single-turn and stateless.

None
include Optional[Iterable[Union[str, StreamingContentType]]]

Optional iterable of stream phases to yield. Has no effect when stream=False.

None
audio Optional[list]

Optional list of audio clips for audio-capable models. Each entry may be a file path (str or pathlib.Path), raw bytes, a data:audio/...;base64,... URL, or an http(s) URL (fetched eagerly). Raises ValueError if the model does not support audio input. Like images, this does not touch self.messages. images and audio are mutually exclusive.

None
schema Optional[type]

Optional dataclass type or Pydantic v2 model. When set, returns a validated instance of that type instead of a string. See :meth:chat for the structured-output semantics (native enforcement when supports_structured_output is True, otherwise prompt-and-parse). Mutually exclusive with stream=True.

None

chat

chat(user_message: str, generate_kwargs: Optional[dict[str, Any]] = None, use_tools: bool = True, stream: bool = False, images: Optional[list] = None, include: Optional[Iterable[Union[str, StreamingContentType]]] = None, tools: Optional[list] = None, audio: Optional[list] = None, schema: Optional[type] = None) -> Union[str, Any, Iterator[StreamChunk]]

Multi-turn chat with persistent message history.

Parameters:

Name Type Description Default
user_message str

The text the user is sending this turn.

required
generate_kwargs Optional[dict[str, Any]]

Provider-specific generation parameters. Unknown keys are dropped per-provider; see each client for accepted names.

None
use_tools bool

If False, suppress tool calls even when the model supports tools.

True
stream bool

If True, return an iterator of :class:StreamChunk instead of a string.

False
images Optional[list]

Optional list of images for vision-capable models. Each entry may be a file path (str or pathlib.Path), raw bytes, an http(s):// URL, or a data:image/...;base64,... URL. Raises ValueError if the model does not support vision. Only used on the initial user turn.

None
include Optional[Iterable[Union[str, StreamingContentType]]]

Optional iterable of stream phases to yield. Defaults to all phases (THINKING, TOOL_CALLING, GENERATING, DONE). Has no effect when stream=False. Values may be :class:StreamingContentType members or their string equivalents ("thinking", "tool_calling", "generating", "done").

None
tools Optional[list]

Optional per-call override of the Python @tool callables. None (default) uses the client's configured self.tools; any other value (including [] to disable Python tools for this call) replaces them for this call only and is restored afterwards (MCP tools, being callables in self.tools via MCPClient.as_tools(), are included in the swap). Ignored when use_tools=False.

None
audio Optional[list]

Optional list of audio clips for audio-capable models. Same accepted forms as :meth:generate. Raises ValueError if the model does not support audio input. Audio blocks persist in self.messages for multi-turn context. images and audio are mutually exclusive per turn.

None
schema Optional[type]

Optional dataclass type or Pydantic v2 model. When set, returns a validated instance of that type instead of a string. If the model has supports_structured_output=True the provider enforces the schema natively (OpenAI response_format, Ollama format=, Anthropic forced-tool); otherwise the schema is appended to the prompt and the response is parsed. Raises ValueError on parse failure. Mutually exclusive with stream=True. On Anthropic (native, forced-tool) combining schema with active tools raises.

None

Types

aimu.models.ModelSpec dataclass

ModelSpec(id: str, tools: bool = False, thinking: bool = False, vision: bool = False, audio: bool = False, structured_output: bool = False, generation_kwargs: Optional[dict] = None)

Capability descriptor for a single model.

Holds the provider-side model id plus universal capability flags. Provider-specific extras (e.g. HuggingFace tool-call format) live on the provider's Model subclass, not here.

Equality and hash are by id only, so a ModelSpec can be used directly as an enum value even when generation_kwargs is a dict.

aimu.models.Model

Model(spec: ModelSpec)

Bases: Enum

Base enum for provider model catalogs.

Each member's value is a ModelSpec; capability flags are mirrored as plain attributes (supports_tools, supports_thinking, supports_vision, generation_kwargs) for direct read access. .value returns the provider id string so code can call e.g. ollama.pull(model.value).

aimu.models.StreamChunk

Bases: NamedTuple

A single chunk yielded by client.chat(stream=True), Agent.run(stream=True), image_client.generate(stream=True), or any streaming tool / workflow.

Fields

phase: content type of this chunk (THINKING, TOOL_CALLING, GENERATING, IMAGE_GENERATING, AUDIO_GENERATING, SPEECH_GENERATING, DONE) content: shape depends on phase: - str for THINKING / GENERATING (token). - dict {"name", "arguments", "response"} for TOOL_CALLING (arguments is the dict the model passed to the tool). - dict {"step", "total_steps", "image", "final", "result"} for IMAGE_GENERATING: step is 1-indexed, image is an optional PIL.Image (None unless preview_every opted in this step), final=True marks the terminal chunk for one image, and result carries the encoded output (path / bytes / data-url per format=) on the final chunk. - dict {"step", "total_steps", "final", "result", "duration_s"} for AUDIO_GENERATING: step is 1-indexed (1 of 1 for non-diffusers models), final=True marks the terminal chunk per audio item, and result carries the encoded output on the final chunk. - dict {"chunk_index", "total_chunks", "final", "result"} for SPEECH_GENERATING: total_chunks is None for streaming providers where the total is unknown upfront (OpenAI); 1 for single-pass providers (HuggingFace). final=True marks the terminal chunk; result carries the encoded output on the final chunk only. - str for DONE (usually empty). agent: name of the agent that produced this chunk, or None for a plain client.chat() / client.generate() call. Set automatically by Agent and workflow runners. iteration: zero-based iteration index inside the agent loop, or 0 for plain chat.

Use chunk.is_text() / chunk.is_tool_call() / chunk.is_image_progress() / chunk.is_audio_progress() / chunk.is_speech_progress() to dispatch on phase without repeating the equality check in user code.

is_text

is_text() -> bool

True if this chunk carries text (THINKING or GENERATING).

is_tool_call

is_tool_call() -> bool

True if this chunk carries a tool-call result.

is_image_progress

is_image_progress() -> bool

True if this chunk carries image-generation progress (IMAGE_GENERATING).

is_audio_progress

is_audio_progress() -> bool

True if this chunk carries audio-generation progress (AUDIO_GENERATING).

is_speech_progress

is_speech_progress() -> bool

True if this chunk carries speech-generation progress (SPEECH_GENERATING).

aimu.models.StreamingContentType

Bases: str, Enum

Resilience

aimu.models.FallbackClient

FallbackClient(clients: list, *, retry_on: tuple[type[BaseException], ...] = (Exception,), system_message: Optional[str] = None, name: Optional[str] = None)

Bases: _FallbackStateMixin, BaseModelClient

A BaseModelClient that delegates to an ordered list of clients with failover.

Parameters:

Name Type Description Default
clients list

Ordered client list; index 0 is preferred, later entries are fallbacks.

required
retry_on tuple[type[BaseException], ...]

Exception types that trigger failover to the next client. Defaults to (Exception,) (fail over on any error). Narrow it (e.g. to timeout/connection errors) to let permanent errors surface immediately instead of being masked.

(Exception,)
system_message Optional[str]

Optional system prompt for the shared conversation; synced into whichever client runs.

None
name Optional[str]

Optional label (currently informational only).

None

Conversation state (messages, system_message, tools) lives on the FallbackClient and is loaded into whichever client runs, so history is preserved across a failover. Capability flags (is_thinking_model etc.) and model reflect the first client; use capability-compatible clients in one fallback set.

Streaming caveat: failover only happens before the first chunk is emitted. If a client fails mid-stream (after yielding output), the error propagates rather than silently replaying from another client.

aimu.models.FallbackExhaustedError

FallbackExhaustedError(message: str, errors: list[tuple[Any, BaseException]])

Bases: RuntimeError

Raised when every client in a :class:FallbackClient failed.

The most recent client's exception is chained as __cause__; the full list of (client, exception) pairs is available on .errors for inspection.

Provider clients

aimu.models.OllamaClient

OllamaClient(model: OllamaModel, system_message: Optional[str] = None, model_keep_alive_seconds: int = 60, timeout: Optional[float] = None, max_retries: Optional[int] = None)

aimu.models.AnthropicClient

AnthropicClient(model: AnthropicModel, model_kwargs: Optional[dict] = None, system_message: Optional[str] = None, timeout: Optional[float] = None, max_retries: Optional[int] = None, cache_prompt: bool = False)

Bases: BaseModelClient

Client for Anthropic Claude models using the native anthropic SDK.

Reads ANTHROPIC_API_KEY from the environment (or a .env file). self.messages is always stored in OpenAI format; conversion to the Anthropic API format happens at call time.

aimu.models.HuggingFaceClient

HuggingFaceClient(model: HuggingFaceModel, model_kwargs: Optional[dict] = None, system_message: Optional[str] = None)

aimu.models.LlamaCppClient

LlamaCppClient(model: LlamaCppModel, model_path: str, n_ctx: int = 4096, n_gpu_layers: int = -1, chat_format: Optional[str] = None, chat_handler: Optional[Any] = None, verbose: bool = False, system_message: Optional[str] = None, model_kwargs: Optional[dict] = None)

aimu.models.OpenAIClient

OpenAIClient(model: OpenAIModel, system_message: Optional[str] = None, model_kwargs: Optional[dict] = None, timeout: Optional[float] = None, max_retries: Optional[int] = None)

Bases: OpenAICompatClient

Client for the OpenAI API (GPT and o-series models).

Reads OPENAI_API_KEY from the environment (or a .env file).

aimu.models.GeminiClient

GeminiClient(model: GeminiModel, system_message: Optional[str] = None, model_kwargs: Optional[dict] = None, timeout: Optional[float] = None, max_retries: Optional[int] = None)

Bases: OpenAICompatClient

Client for Google Gemini models via Google's OpenAI-compatible REST API.

Reads GOOGLE_API_KEY from the environment (or a .env file).

aimu.models.LMStudioOpenAIClient

LMStudioOpenAIClient(model: LMStudioOpenAIModel, base_url: str = LMSTUDIO_BASE_URL, **kwargs)

aimu.models.OllamaOpenAIClient

OllamaOpenAIClient(model: OllamaOpenAIModel, base_url: str = OLLAMA_BASE_URL, **kwargs)

aimu.models.HFOpenAIClient

HFOpenAIClient(model: HFOpenAIModel, base_url: str = HF_OPENAI_BASE_URL, **kwargs)

aimu.models.VLLMOpenAIClient

VLLMOpenAIClient(model: VLLMOpenAIModel, base_url: str = VLLM_BASE_URL, **kwargs)

aimu.models.LlamaServerOpenAIClient

LlamaServerOpenAIClient(model: LlamaServerOpenAIModel, base_url: str = LLAMASERVER_BASE_URL, **kwargs)

Bases: OpenAICompatClient

Client for llama.cpp's llama-server OpenAI-compatible REST API.

Start the server with

llama-server -m /path/to/model.gguf --port 8080

aimu.models.SGLangOpenAIClient

SGLangOpenAIClient(model: SGLangOpenAIModel, base_url: str = SGLANG_BASE_URL, **kwargs)

Bases: OpenAICompatClient

Client for SGLang's OpenAI-compatible REST API.

Start the server with

python -m sglang.launch_server --model-path --port 30000

aimu.models.OpenAICompatClient

OpenAICompatClient(model: Model, base_url: str, api_key: str = 'not-needed', system_message: Optional[str] = None, model_kwargs: Optional[dict] = None, timeout: Optional[float] = None, max_retries: Optional[int] = None)

Embedding clients

Text-to-vector clients, a parallel surface to the chat clients. See the Embed text guide.

aimu.models.EmbeddingClient

EmbeddingClient(model: EmbeddingModel | EmbeddingSpec | str, **kwargs: Any)

Bases: FactoryDelegate

Public factory for text-embedding provider clients.

Parallel to :class:aimu.models.TranscriptionClient. Accepts a provider's :class:EmbeddingModel enum member, an :class:EmbeddingSpec, or a "provider:model_id" string ("openai:...", "ollama:..." or "hf:..."). Provider-specific construction kwargs are passed directly, e.g. EmbeddingClient(model, device="cpu").

Examples::

from aimu.models import EmbeddingClient, OpenAIEmbeddingModel

client = EmbeddingClient(OpenAIEmbeddingModel.TEXT_EMBEDDING_3_SMALL)
client = EmbeddingClient("openai:text-embedding-3-small")
client = EmbeddingClient("ollama:nomic-embed-text")

Provider-specific construction kwargs are passed directly::

EmbeddingClient(HuggingFaceEmbeddingModel.BGE_SMALL_EN_V1_5, device="cpu")

embed

embed(texts: str | list[str], **kwargs: Any) -> Any

Embed text. Forwarded to the inner client's :meth:BaseEmbeddingClient.embed.

aimu.models.BaseEmbeddingClient

BaseEmbeddingClient(model: Any, model_kwargs: Optional[dict] = None)

Bases: ABC

Abstract base for text-embedding provider clients.

Subclasses implement :meth:_embed, which takes a non-empty list of strings and returns one vector (list[float]) per input. The public :meth:embed normalizes a single-string call to a single vector and a list call to a list of vectors, so every provider offers the same ergonomic surface.

dimensions property

dimensions: Optional[int]

The embedding vector width declared by the spec, or None if unspecified.

embed

embed(texts: Union[str, list[str]], **kwargs: Any) -> Union[list[float], list[list[float]]]

Embed one string or a list of strings.

A single str returns one vector (list[float]); a list returns a list of vectors (list[list[float]]), preserving order. An empty list returns []. Extra **kwargs are forwarded to the provider call.

aimu.models.resolve_embedding_model_string

resolve_embedding_model_string(model_str: str) -> EmbeddingModel

Look up an embedding-provider model enum from a "provider:model_id" string.

Only matches exact enum-member values; for ad-hoc model ids pass the "provider:..." string directly to :class:EmbeddingClient.

aimu.models.OpenAIEmbeddingClient

OpenAIEmbeddingClient(model: 'OpenAIEmbeddingModel | OpenAIEmbeddingSpec | str', model_kwargs: Optional[dict] = None)

Bases: BaseEmbeddingClient

Text-embedding client for the OpenAI API.

Pass an :class:OpenAIEmbeddingModel member, an :class:OpenAIEmbeddingSpec, or an "openai:<model_id>" string. model_kwargs may carry api_key= / base_url= overrides; otherwise OPENAI_API_KEY is read from the environment.

aimu.models.OllamaEmbeddingClient

OllamaEmbeddingClient(model: OllamaEmbeddingModel | OllamaEmbeddingSpec | str, model_kwargs: Optional[dict] = None)

Bases: BaseEmbeddingClient

Text-embedding client for a local Ollama server.

Pass an :class:OllamaEmbeddingModel member, an :class:OllamaEmbeddingSpec, or an "ollama:<model_id>" string. The model is pulled on construction (same as :class:OllamaClient).

aimu.models.HuggingFaceEmbeddingClient

HuggingFaceEmbeddingClient(model: 'HuggingFaceEmbeddingModel | HuggingFaceEmbeddingSpec | str', model_kwargs: dict | None = None)

Bases: BaseEmbeddingClient

Local text-embedding client backed by sentence-transformers.

Loads model weights lazily on the first :meth:embed call. Pass a :class:HuggingFaceEmbeddingModel member, a :class:HuggingFaceEmbeddingSpec, or a "hf:<repo_id>" string. Target a device with model_kwargs={"device": "cuda:1"}; other model_kwargs are forwarded to SentenceTransformer.

Note on retrieval-tuned models: E5 / BGE expect query/passage prefixes (e.g. "query: ...") for asymmetric retrieval. Pass already-prefixed strings when you need that; symmetric similarity does not.