`aimu.models`¶

Provider-agnostic model clients.

Factory and base class¶

aimu.models.ModelClient ¶

ModelClient(model: Union[Model, ModelSpec, str], **kwargs: Any)

Bases: BaseModelClient

Public factory for provider-backed model clients.

Accepts either a provider's Model enum member or a "provider:model_id" string::

from aimu.models import ModelClient, OllamaModel

# Enum form
client = ModelClient(OllamaModel.QWEN_3_8B)

# String form (no enum import needed)
client = ModelClient("anthropic:claude-sonnet-4-6")
client = ModelClient("ollama:qwen3.5:9b")

Provider-specific kwargs are forwarded to the concrete client::

ModelClient(LlamaCppModel.QWEN_3_8B, model_path="/path/to/model.gguf")
ModelClient(OllamaModel.LLAMA_3_1_8B, model_keep_alive_seconds=120)
ModelClient(LMStudioOpenAIModel.LLAMA_3_2_3B, base_url="http://myserver:1234/v1")

last_usage `property` `writable` ¶

last_usage: Optional[dict]

Token usage of the most recent non-streaming response, or None.

Shape: {"input_tokens", "output_tokens", "total_tokens"}. None after a streaming call or when the provider/server did not report usage.

last_structured `property` `writable` ¶

last_structured

Validated object from the most recent schema= call, or None.

For a streamed structured-output call it is populated only after the stream is fully consumed (mirrors :attr:last_usage).

aimu.models.BaseModelClient ¶

BaseModelClient(model: Model, model_kwargs: Optional[dict] = None, system_message: Optional[str] = None)

Bases: _ChatStateMixin, ABC

Abstract base for all provider clients.

Subclasses implement :meth:generate, :meth:chat, and :meth:_update_generate_kwargs. Tool calling, message history, vision input, and streaming filters are handled here once for every provider.

generate ¶

generate(prompt: str, generate_kwargs: Optional[dict[str, Any]] = None, stream: bool = False, images: Optional[list] = None, include: Optional[Iterable[Union[str, StreamingContentType]]] = None, audio: Optional[list] = None, schema: Optional[type] = None) -> Union[str, Any, Iterator[StreamChunk]]

Single-turn, stateless generation. See :meth:chat for the include filter semantics.

Parameters:

Name	Type	Description	Default
`prompt`	`str`	The text to generate from.	required
`generate_kwargs`	`Optional[dict[str, Any]]`	Provider-specific generation parameters.	`None`
`stream`	`bool`	If True, return an iterator of :class:`StreamChunk` instead of a string.	`False`
`images`	`Optional[list]`	Optional list of images for vision-capable models, same accepted forms as :meth:`chat` (file path, `pathlib.Path`, `bytes`, http(s) URL, or data URL). Raises `ValueError` if the model does not support vision. Unlike :meth:`chat`, this does not touch `self.messages`; the call stays single-turn and stateless.	`None`
`include`	`Optional[Iterable[Union[str, StreamingContentType]]]`	Optional iterable of stream phases to yield. Has no effect when `stream=False`.	`None`
`audio`	`Optional[list]`	Optional list of audio clips for audio-capable models. Each entry may be a file path (str or `pathlib.Path`), raw `bytes`, a `data:audio/...;base64,...` URL, or an http(s) URL (fetched eagerly). Raises `ValueError` if the model does not support audio input. Like `images`, this does not touch `self.messages`. `images` and `audio` are mutually exclusive.	`None`
`schema`	`Optional[type]`	Optional dataclass type or Pydantic v2 model. When set, returns a validated instance of that type instead of a string. See :meth:`chat` for the structured-output semantics (native enforcement when `supports_structured_output` is True, otherwise prompt-and-parse). With `stream=True` returns an iterator of :class:`StreamChunk` ending in a terminal `DONE` chunk whose `content` is `{"result": <object>}`; the validated object is also stored on `self.last_structured` after the stream is consumed.	`None`

chat ¶

chat(user_message: Optional[str] = None, generate_kwargs: Optional[dict[str, Any]] = None, use_tools: bool = True, stream: bool = False, images: Optional[list] = None, include: Optional[Iterable[Union[str, StreamingContentType]]] = None, tools: Optional[list] = None, audio: Optional[list] = None, schema: Optional[type] = None) -> Union[str, Any, Iterator[StreamChunk]]

One model turn against the persistent message history.

A single call issues exactly one model request. If the model requests tools, they are executed and their results appended, and the call returns (the model's response to the tool results comes on the next :meth:chat call — the multi-turn tool loop lives in :class:~aimu.agents.Agent, which wraps this method). Message history persists across calls on the same client.

Parameters:

Name	Type	Description	Default
`user_message`	`Optional[str]`	The text the user is sending this turn. Pass `None` (the default) to run a turn on the current messages without appending a new user turn — the continuation primitive the agent loop uses after a tool turn.	`None`
`generate_kwargs`	`Optional[dict[str, Any]]`	Provider-specific generation parameters. Unknown keys are dropped per-provider; see each client for accepted names.	`None`
`use_tools`	`bool`	If False, suppress tool calls even when the model supports tools.	`True`
`stream`	`bool`	If True, return an iterator of :class:`StreamChunk` instead of a string.	`False`
`images`	`Optional[list]`	Optional list of images for vision-capable models. Each entry may be a file path (str or `pathlib.Path`), raw `bytes`, an `http(s)://` URL, or a `data:image/...;base64,...` URL. Raises `ValueError` if the model does not support vision. Only used on the initial user turn.	`None`
`include`	`Optional[Iterable[Union[str, StreamingContentType]]]`	Optional iterable of stream phases to yield. Defaults to all phases (THINKING, TOOL_CALLING, GENERATING, DONE). Has no effect when `stream=False`. Values may be :class:`StreamingContentType` members or their string equivalents (`"thinking"`, `"tool_calling"`, `"generating"`, `"done"`).	`None`
`tools`	`Optional[list]`	Optional per-call override of the Python `@tool` callables. `None` (default) uses the client's configured `self.tools`; any other value (including `[]` to disable Python tools for this call) replaces them for this call only and is restored afterwards (MCP tools, being callables in `self.tools` via `MCPClient.as_tools()`, are included in the swap). Ignored when `use_tools=False`.	`None`
`audio`	`Optional[list]`	Optional list of audio clips for audio-capable models. Same accepted forms as :meth:`generate`. Raises `ValueError` if the model does not support audio input. Audio blocks persist in `self.messages` for multi-turn context. `images` and `audio` are mutually exclusive per turn.	`None`
`schema`	`Optional[type]`	Optional dataclass type or Pydantic v2 model. When set, returns a validated instance of that type instead of a string. If the model has `supports_structured_output=True` the provider enforces the schema natively (OpenAI `response_format`, Ollama `format=`, Anthropic forced-tool); otherwise the schema is appended to the prompt and the response is parsed. Raises `ValueError` on parse failure. With `stream=True` returns an iterator of :class:`StreamChunk`: thinking/generation chunks stream live, then a terminal `DONE` chunk carries `{"result": <object>}` and `self.last_structured` is set once the stream is consumed. (Anthropic streams the JSON as it is built but emits no thinking, since its forced-tool structured mode is incompatible with extended thinking.) On Anthropic (native, forced-tool) combining `schema` with active `tools` raises.	`None`

Types¶

aimu.models.ModelSpec `dataclass` ¶

ModelSpec(id: str, tools: bool = False, thinking: bool = False, vision: bool = False, audio: bool = False, structured_output: bool = False, generation_kwargs: Optional[dict] = None)

Capability descriptor for a single model.

Holds the provider-side model id plus universal capability flags. Provider-specific extras (e.g. HuggingFace tool-call format) live on the provider's Model subclass, not here.

Equality and hash are by id only, so a ModelSpec can be used directly as an enum value even when generation_kwargs is a dict.

aimu.models.Model ¶

Model(spec: ModelSpec)

Bases: Enum

Base enum for provider model catalogs.

Each member's value is a ModelSpec; capability flags are mirrored as plain attributes (supports_tools, supports_thinking, supports_vision, generation_kwargs) for direct read access. .value returns the provider id string so code can call e.g. ollama.pull(model.value).

aimu.models.StreamChunk ¶

Bases: NamedTuple

A single chunk yielded by client.chat(stream=True), Agent.run(stream=True), image_client.generate(stream=True), or any streaming tool / workflow.

Fields

phase: content type of this chunk (THINKING, TOOL_CALLING, GENERATING, IMAGE_GENERATING, AUDIO_GENERATING, SPEECH_GENERATING, DONE) content: shape depends on phase: - str for THINKING / GENERATING (token). - dict {"name", "arguments", "response"} for TOOL_CALLING (arguments is the dict the model passed to the tool). - dict {"step", "total_steps", "image", "final", "result"} for IMAGE_GENERATING: step is 1-indexed, image is an optional PIL.Image (None unless preview_every opted in this step), final=True marks the terminal chunk for one image, and result carries the encoded output (path / bytes / data-url per format=) on the final chunk. - dict {"step", "total_steps", "final", "result", "duration_s"} for AUDIO_GENERATING: step is 1-indexed (1 of 1 for non-diffusers models), final=True marks the terminal chunk per audio item, and result carries the encoded output on the final chunk. - dict {"chunk_index", "total_chunks", "final", "result"} for SPEECH_GENERATING: total_chunks is None for streaming providers where the total is unknown upfront (OpenAI); 1 for single-pass providers (HuggingFace). final=True marks the terminal chunk; result carries the encoded output on the final chunk only. - str for DONE (usually empty), or dict {"result": <object>} for the terminal chunk of a streamed structured-output call (schema= + stream=True), where result is the validated dataclass / Pydantic instance. agent: name of the agent that produced this chunk, or None for a plain client.chat() / client.generate() call. Set automatically by Agent and workflow runners. iteration: zero-based iteration index inside the agent loop, or 0 for plain chat.

Use chunk.is_text() / chunk.is_tool_call() / chunk.is_image_progress() / chunk.is_audio_progress() / chunk.is_speech_progress() / chunk.is_done() to dispatch on phase without repeating the equality check in user code.

is_text ¶

is_text() -> bool

True if this chunk carries text (THINKING or GENERATING).

is_tool_call ¶

is_tool_call() -> bool

True if this chunk carries a tool-call result.

is_image_progress ¶

is_image_progress() -> bool

True if this chunk carries image-generation progress (IMAGE_GENERATING).

is_audio_progress ¶

is_audio_progress() -> bool

True if this chunk carries audio-generation progress (AUDIO_GENERATING).

is_speech_progress ¶

is_speech_progress() -> bool

True if this chunk carries speech-generation progress (SPEECH_GENERATING).

is_done ¶

is_done() -> bool

True if this chunk is the terminal DONE marker.

For a streamed structured-output call (schema= + stream=True) the DONE chunk's content is {"result": <validated object>}.

aimu.models.StreamingContentType ¶

Bases: str, Enum

Resilience¶

aimu.models.FallbackClient ¶

FallbackClient(clients: list, *, retry_on: tuple[type[BaseException], ...] = (Exception,), system_message: Optional[str] = None, name: Optional[str] = None)

Bases: _FallbackStateMixin, BaseModelClient

A BaseModelClient that delegates to an ordered list of clients with failover.

Parameters:

Name	Type	Description	Default
`clients`	`list`	Ordered client list; index 0 is preferred, later entries are fallbacks.	required
`retry_on`	`tuple[type[BaseException], ...]`	Exception types that trigger failover to the next client. Defaults to `(Exception,)` (fail over on any error). Narrow it (e.g. to timeout/connection errors) to let permanent errors surface immediately instead of being masked.	`(Exception,)`
`system_message`	`Optional[str]`	Optional system prompt for the shared conversation; synced into whichever client runs.	`None`
`name`	`Optional[str]`	Optional label (currently informational only).	`None`

Conversation state (messages, system_message, tools) lives on the FallbackClient and is loaded into whichever client runs, so history is preserved across a failover. Capability flags (is_thinking_model etc.) and model reflect the first client; use capability-compatible clients in one fallback set.

Streaming caveat: failover only happens before the first chunk is emitted. If a client fails mid-stream (after yielding output), the error propagates rather than silently replaying from another client.

aimu.models.FallbackExhaustedError ¶

FallbackExhaustedError(message: str, errors: list[tuple[Any, BaseException]])

Bases: RuntimeError

Raised when every client in a :class:FallbackClient failed.

The most recent client's exception is chained as __cause__; the full list of (client, exception) pairs is available on .errors for inspection.

aimu.models.ModelConnectionError ¶

Bases: RuntimeError

Raised when a model client cannot reach its inference server (e.g. the server is down or the base_url is unreachable). Wraps the underlying provider/transport error, which is preserved on __cause__ so callers can walk the chain for the specific reason (e.g. "Connection refused"). Mirrors :class:aimu.tools.client.MCPConnectionError.

Provider clients¶

aimu.models.OllamaClient ¶

OllamaClient(model: OllamaModel, system_message: Optional[str] = None, model_keep_alive_seconds: int = 60, timeout: Optional[float] = None, max_retries: Optional[int] = None)

Bases: BaseModelClient

aimu.models.AnthropicClient ¶

AnthropicClient(model: AnthropicModel, model_kwargs: Optional[dict] = None, system_message: Optional[str] = None, timeout: Optional[float] = None, max_retries: Optional[int] = None, cache_prompt: bool = False)

Bases: BaseModelClient

Client for Anthropic Claude models using the native anthropic SDK.

Reads ANTHROPIC_API_KEY from the environment (or a .env file). self.messages is always stored in OpenAI format; conversion to the Anthropic API format happens at call time.

aimu.models.HuggingFaceClient ¶

HuggingFaceClient(model: HuggingFaceModel, model_kwargs: Optional[dict] = None, system_message: Optional[str] = None)

Bases: BaseModelClient

aimu.models.LlamaCppClient ¶

LlamaCppClient(model: LlamaCppModel, model_path: str, n_ctx: int = 4096, n_gpu_layers: int = -1, chat_format: Optional[str] = None, chat_handler: Optional[Any] = None, verbose: bool = False, system_message: Optional[str] = None, model_kwargs: Optional[dict] = None)

Bases: BaseModelClient

aimu.models.OpenAIClient ¶

OpenAIClient(model: OpenAIModel, system_message: Optional[str] = None, model_kwargs: Optional[dict] = None, timeout: Optional[float] = None, max_retries: Optional[int] = None)

Bases: OpenAICompatClient

Client for the OpenAI API (GPT and o-series models).

Reads OPENAI_API_KEY from the environment (or a .env file).

aimu.models.GeminiClient ¶

GeminiClient(model: GeminiModel, system_message: Optional[str] = None, model_kwargs: Optional[dict] = None, timeout: Optional[float] = None, max_retries: Optional[int] = None)

Bases: OpenAICompatClient

Client for Google Gemini models via Google's OpenAI-compatible REST API.

Reads GOOGLE_API_KEY from the environment (or a .env file).

aimu.models.LMStudioOpenAIClient ¶

LMStudioOpenAIClient(model: LMStudioOpenAIModel, base_url: str = LMSTUDIO_BASE_URL, **kwargs)

Bases: OpenAICompatClient

aimu.models.OllamaOpenAIClient ¶

OllamaOpenAIClient(model: OllamaOpenAIModel, base_url: str = OLLAMA_BASE_URL, **kwargs)

Bases: OpenAICompatClient

aimu.models.HFOpenAIClient ¶

HFOpenAIClient(model: HFOpenAIModel, base_url: str = HF_OPENAI_BASE_URL, **kwargs)

Bases: OpenAICompatClient

aimu.models.VLLMOpenAIClient ¶

VLLMOpenAIClient(model: VLLMOpenAIModel, base_url: str = VLLM_BASE_URL, **kwargs)

Bases: OpenAICompatClient

aimu.models.LlamaServerOpenAIClient ¶

LlamaServerOpenAIClient(model: LlamaServerOpenAIModel, base_url: str = LLAMASERVER_BASE_URL, **kwargs)

Bases: OpenAICompatClient

Client for llama.cpp's llama-server OpenAI-compatible REST API.

Start the server with

llama-server -m /path/to/model.gguf --port 8080

aimu.models.SGLangOpenAIClient ¶

SGLangOpenAIClient(model: SGLangOpenAIModel, base_url: str = SGLANG_BASE_URL, **kwargs)

Bases: OpenAICompatClient

Client for SGLang's OpenAI-compatible REST API.

Start the server with

python -m sglang.launch_server --model-path --port 30000

aimu.models.OpenAICompatClient ¶

OpenAICompatClient(model: Model, base_url: str, api_key: str = 'not-needed', system_message: Optional[str] = None, model_kwargs: Optional[dict] = None, timeout: Optional[float] = None, max_retries: Optional[int] = None)

Bases: BaseModelClient

Embedding clients¶

Text-to-vector clients, a parallel surface to the chat clients. See the Embed text guide.

aimu.models.EmbeddingClient ¶

EmbeddingClient(model: EmbeddingModel | EmbeddingSpec | str, **kwargs: Any)

Bases: FactoryDelegate

Public factory for text-embedding provider clients.

Parallel to :class:aimu.models.TranscriptionClient. Accepts a provider's :class:EmbeddingModel enum member, an :class:EmbeddingSpec, or a "provider:model_id" string ("openai:...", "ollama:..." or "hf:..."). Provider-specific construction kwargs are passed directly, e.g. EmbeddingClient(model, device="cpu").

Examples::

from aimu.models import EmbeddingClient, OpenAIEmbeddingModel

client = EmbeddingClient(OpenAIEmbeddingModel.TEXT_EMBEDDING_3_SMALL)
client = EmbeddingClient("openai:text-embedding-3-small")
client = EmbeddingClient("ollama:nomic-embed-text")

Provider-specific construction kwargs are passed directly::

EmbeddingClient(HuggingFaceEmbeddingModel.BGE_SMALL_EN_V1_5, device="cpu")

embed ¶

embed(texts: str | list[str], **kwargs: Any) -> Any

Embed text. Forwarded to the inner client's :meth:BaseEmbeddingClient.embed.

aimu.models.BaseEmbeddingClient ¶

BaseEmbeddingClient(model: Any, model_kwargs: Optional[dict] = None)

Bases: ABC

Abstract base for text-embedding provider clients.

Subclasses implement :meth:_embed, which takes a non-empty list of strings and returns one vector (list[float]) per input. The public :meth:embed normalizes a single-string call to a single vector and a list call to a list of vectors, so every provider offers the same ergonomic surface.

dimensions `property` ¶

dimensions: Optional[int]

The embedding vector width declared by the spec, or None if unspecified.

embed ¶

embed(texts: Union[str, list[str]], **kwargs: Any) -> Union[list[float], list[list[float]]]

Embed one string or a list of strings.

A single str returns one vector (list[float]); a list returns a list of vectors (list[list[float]]), preserving order. An empty list returns []. Extra **kwargs are forwarded to the provider call.

aimu.models.resolve_embedding_model_string ¶

resolve_embedding_model_string(model_str: str) -> EmbeddingModel

Look up an embedding-provider model enum from a "provider:model_id" string.

Only matches exact enum-member values; for ad-hoc model ids pass the "provider:..." string directly to :class:EmbeddingClient.

aimu.models.OpenAIEmbeddingClient ¶

OpenAIEmbeddingClient(model: 'OpenAIEmbeddingModel | OpenAIEmbeddingSpec | str', model_kwargs: Optional[dict] = None)

Bases: BaseEmbeddingClient

Text-embedding client for the OpenAI API.

Pass an :class:OpenAIEmbeddingModel member, an :class:OpenAIEmbeddingSpec, or an "openai:<model_id>" string. model_kwargs may carry api_key= / base_url= overrides; otherwise OPENAI_API_KEY is read from the environment.

aimu.models.OllamaEmbeddingClient ¶

OllamaEmbeddingClient(model: OllamaEmbeddingModel | OllamaEmbeddingSpec | str, model_kwargs: Optional[dict] = None)

Bases: BaseEmbeddingClient

Text-embedding client for a local Ollama server.

Pass an :class:OllamaEmbeddingModel member, an :class:OllamaEmbeddingSpec, or an "ollama:<model_id>" string. The model is pulled on construction (same as :class:OllamaClient).

aimu.models.HuggingFaceEmbeddingClient ¶

HuggingFaceEmbeddingClient(model: 'HuggingFaceEmbeddingModel | HuggingFaceEmbeddingSpec | str', model_kwargs: dict | None = None)

Bases: BaseEmbeddingClient

Local text-embedding client backed by sentence-transformers.

Loads model weights lazily on the first :meth:embed call. Pass a :class:HuggingFaceEmbeddingModel member, a :class:HuggingFaceEmbeddingSpec, or a "hf:<repo_id>" string. Target a device with model_kwargs={"device": "cuda:1"}; other model_kwargs are forwarded to SentenceTransformer.

Note on retrieval-tuned models: E5 / BGE expect query/passage prefixes (e.g. "query: ...") for asymmetric retrieval. Pass already-prefixed strings when you need that; symmetric similarity does not.

aimu.models¶

Factory and base class¶

aimu.models.ModelClient ¶

last_usage property writable ¶

last_structured property writable ¶

aimu.models.BaseModelClient ¶

generate ¶

chat ¶

Types¶

aimu.models.ModelSpec dataclass ¶

aimu.models.Model ¶

aimu.models.StreamChunk ¶

is_text ¶

is_tool_call ¶

is_image_progress ¶

is_audio_progress ¶

is_speech_progress ¶

is_done ¶

aimu.models.StreamingContentType ¶

Resilience¶

aimu.models.FallbackClient ¶

aimu.models.FallbackExhaustedError ¶

aimu.models.ModelConnectionError ¶

Provider clients¶

aimu.models.OllamaClient ¶

aimu.models.AnthropicClient ¶

aimu.models.HuggingFaceClient ¶

aimu.models.LlamaCppClient ¶

aimu.models.OpenAIClient ¶

aimu.models.GeminiClient ¶

aimu.models.LMStudioOpenAIClient ¶

aimu.models.OllamaOpenAIClient ¶

aimu.models.HFOpenAIClient ¶

aimu.models.VLLMOpenAIClient ¶

aimu.models.LlamaServerOpenAIClient ¶

aimu.models.SGLangOpenAIClient ¶

aimu.models.OpenAICompatClient ¶

Embedding clients¶

aimu.models.EmbeddingClient ¶

embed ¶

aimu.models.BaseEmbeddingClient ¶

dimensions property ¶

embed ¶

aimu.models.resolve_embedding_model_string ¶

aimu.models.OpenAIEmbeddingClient ¶

aimu.models.OllamaEmbeddingClient ¶

aimu.models.HuggingFaceEmbeddingClient ¶

`aimu.models`¶

last_usage `property` `writable` ¶

last_structured `property` `writable` ¶

aimu.models.ModelSpec `dataclass` ¶

dimensions `property` ¶