aimu.models¶
Provider-agnostic model clients.
Factory and base class¶
aimu.models.ModelClient ¶
Bases: BaseModelClient
Public factory for provider-backed model clients.
Accepts either a provider's Model enum member or a "provider:model_id" string::
from aimu.models import ModelClient, OllamaModel
# Enum form
client = ModelClient(OllamaModel.QWEN_3_8B)
# String form (no enum import needed)
client = ModelClient("anthropic:claude-sonnet-4-6")
client = ModelClient("ollama:qwen3.5:9b")
Provider-specific kwargs are forwarded to the concrete client::
ModelClient(LlamaCppModel.QWEN_3_8B, model_path="/path/to/model.gguf")
ModelClient(OllamaModel.LLAMA_3_1_8B, model_keep_alive_seconds=120)
ModelClient(LMStudioOpenAIModel.LLAMA_3_2_3B, base_url="http://myserver:1234/v1")
last_usage
property
writable
¶
Token usage of the most recent non-streaming response, or None.
Shape: {"input_tokens", "output_tokens", "total_tokens"}. None after a
streaming call or when the provider/server did not report usage.
aimu.models.BaseModelClient ¶
BaseModelClient(model: Model, model_kwargs: Optional[dict] = None, system_message: Optional[str] = None)
Bases: _ChatStateMixin, ABC
Abstract base for all provider clients.
Subclasses implement :meth:generate, :meth:chat, and :meth:_update_generate_kwargs.
Tool calling, message history, vision input, and streaming filters are handled here
once for every provider.
generate ¶
generate(prompt: str, generate_kwargs: Optional[dict[str, Any]] = None, stream: bool = False, images: Optional[list] = None, include: Optional[Iterable[Union[str, StreamingContentType]]] = None, audio: Optional[list] = None, schema: Optional[type] = None) -> Union[str, Any, Iterator[StreamChunk]]
Single-turn, stateless generation. See :meth:chat for the include filter semantics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
prompt
|
str
|
The text to generate from. |
required |
generate_kwargs
|
Optional[dict[str, Any]]
|
Provider-specific generation parameters. |
None
|
stream
|
bool
|
If True, return an iterator of :class: |
False
|
images
|
Optional[list]
|
Optional list of images for vision-capable models, same accepted forms as
:meth: |
None
|
include
|
Optional[Iterable[Union[str, StreamingContentType]]]
|
Optional iterable of stream phases to yield. Has no effect when |
None
|
audio
|
Optional[list]
|
Optional list of audio clips for audio-capable models. Each entry may be a
file path (str or |
None
|
schema
|
Optional[type]
|
Optional dataclass type or Pydantic v2 model. When set, returns a validated
instance of that type instead of a string. See :meth: |
None
|
chat ¶
chat(user_message: str, generate_kwargs: Optional[dict[str, Any]] = None, use_tools: bool = True, stream: bool = False, images: Optional[list] = None, include: Optional[Iterable[Union[str, StreamingContentType]]] = None, tools: Optional[list] = None, audio: Optional[list] = None, schema: Optional[type] = None) -> Union[str, Any, Iterator[StreamChunk]]
Multi-turn chat with persistent message history.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
user_message
|
str
|
The text the user is sending this turn. |
required |
generate_kwargs
|
Optional[dict[str, Any]]
|
Provider-specific generation parameters. Unknown keys are dropped per-provider; see each client for accepted names. |
None
|
use_tools
|
bool
|
If False, suppress tool calls even when the model supports tools. |
True
|
stream
|
bool
|
If True, return an iterator of :class: |
False
|
images
|
Optional[list]
|
Optional list of images for vision-capable models. Each entry may be a
file path (str or |
None
|
include
|
Optional[Iterable[Union[str, StreamingContentType]]]
|
Optional iterable of stream phases to yield. Defaults to all phases
(THINKING, TOOL_CALLING, GENERATING, DONE). Has no effect when |
None
|
tools
|
Optional[list]
|
Optional per-call override of the Python |
None
|
audio
|
Optional[list]
|
Optional list of audio clips for audio-capable models. Same accepted forms
as :meth: |
None
|
schema
|
Optional[type]
|
Optional dataclass type or Pydantic v2 model. When set, returns a validated
instance of that type instead of a string. If the model has
|
None
|
Types¶
aimu.models.ModelSpec
dataclass
¶
ModelSpec(id: str, tools: bool = False, thinking: bool = False, vision: bool = False, audio: bool = False, structured_output: bool = False, generation_kwargs: Optional[dict] = None)
Capability descriptor for a single model.
Holds the provider-side model id plus universal capability flags. Provider-specific
extras (e.g. HuggingFace tool-call format) live on the provider's Model subclass,
not here.
Equality and hash are by id only, so a ModelSpec can be used directly as an
enum value even when generation_kwargs is a dict.
aimu.models.Model ¶
Bases: Enum
Base enum for provider model catalogs.
Each member's value is a ModelSpec; capability flags are mirrored as plain
attributes (supports_tools, supports_thinking, supports_vision,
generation_kwargs) for direct read access. .value returns the provider id
string so code can call e.g. ollama.pull(model.value).
aimu.models.StreamChunk ¶
Bases: NamedTuple
A single chunk yielded by client.chat(stream=True), Agent.run(stream=True),
image_client.generate(stream=True), or any streaming tool / workflow.
Fields
phase: content type of this chunk (THINKING, TOOL_CALLING, GENERATING,
IMAGE_GENERATING, AUDIO_GENERATING, SPEECH_GENERATING, DONE)
content: shape depends on phase:
- str for THINKING / GENERATING (token).
- dict {"name", "arguments", "response"} for TOOL_CALLING
(arguments is the dict the model passed to the tool).
- dict {"step", "total_steps", "image", "final", "result"} for
IMAGE_GENERATING: step is 1-indexed, image is an optional
PIL.Image (None unless preview_every opted in this step),
final=True marks the terminal chunk for one image, and result
carries the encoded output (path / bytes / data-url per format=)
on the final chunk.
- dict {"step", "total_steps", "final", "result", "duration_s"} for
AUDIO_GENERATING: step is 1-indexed (1 of 1 for non-diffusers
models), final=True marks the terminal chunk per audio item, and
result carries the encoded output on the final chunk.
- dict {"chunk_index", "total_chunks", "final", "result"} for
SPEECH_GENERATING: total_chunks is None for streaming
providers where the total is unknown upfront (OpenAI); 1 for
single-pass providers (HuggingFace). final=True marks the
terminal chunk; result carries the encoded output on the final
chunk only.
- str for DONE (usually empty).
agent: name of the agent that produced this chunk, or None for a plain
client.chat() / client.generate() call. Set automatically by
Agent and workflow runners.
iteration: zero-based iteration index inside the agent loop, or 0 for plain chat.
Use chunk.is_text() / chunk.is_tool_call() / chunk.is_image_progress() /
chunk.is_audio_progress() / chunk.is_speech_progress() to dispatch on phase
without repeating the equality check in user code.
aimu.models.StreamingContentType ¶
Bases: str, Enum
Resilience¶
aimu.models.FallbackClient ¶
FallbackClient(clients: list, *, retry_on: tuple[type[BaseException], ...] = (Exception,), system_message: Optional[str] = None, name: Optional[str] = None)
Bases: _FallbackStateMixin, BaseModelClient
A BaseModelClient that delegates to an ordered list of clients with failover.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
clients
|
list
|
Ordered client list; index 0 is preferred, later entries are fallbacks. |
required |
retry_on
|
tuple[type[BaseException], ...]
|
Exception types that trigger failover to the next client. Defaults to
|
(Exception,)
|
system_message
|
Optional[str]
|
Optional system prompt for the shared conversation; synced into whichever client runs. |
None
|
name
|
Optional[str]
|
Optional label (currently informational only). |
None
|
Conversation state (messages, system_message, tools) lives on the
FallbackClient and is loaded into whichever client runs, so history is preserved
across a failover. Capability flags (is_thinking_model etc.) and model reflect
the first client; use capability-compatible clients in one fallback set.
Streaming caveat: failover only happens before the first chunk is emitted. If a client fails mid-stream (after yielding output), the error propagates rather than silently replaying from another client.
aimu.models.FallbackExhaustedError ¶
Bases: RuntimeError
Raised when every client in a :class:FallbackClient failed.
The most recent client's exception is chained as __cause__; the full list of
(client, exception) pairs is available on .errors for inspection.
Provider clients¶
aimu.models.OllamaClient ¶
OllamaClient(model: OllamaModel, system_message: Optional[str] = None, model_keep_alive_seconds: int = 60, timeout: Optional[float] = None, max_retries: Optional[int] = None)
Bases: BaseModelClient
aimu.models.AnthropicClient ¶
AnthropicClient(model: AnthropicModel, model_kwargs: Optional[dict] = None, system_message: Optional[str] = None, timeout: Optional[float] = None, max_retries: Optional[int] = None, cache_prompt: bool = False)
Bases: BaseModelClient
Client for Anthropic Claude models using the native anthropic SDK.
Reads ANTHROPIC_API_KEY from the environment (or a .env file). self.messages is always stored in OpenAI format; conversion to the Anthropic API format happens at call time.
aimu.models.HuggingFaceClient ¶
HuggingFaceClient(model: HuggingFaceModel, model_kwargs: Optional[dict] = None, system_message: Optional[str] = None)
Bases: BaseModelClient
aimu.models.LlamaCppClient ¶
LlamaCppClient(model: LlamaCppModel, model_path: str, n_ctx: int = 4096, n_gpu_layers: int = -1, chat_format: Optional[str] = None, chat_handler: Optional[Any] = None, verbose: bool = False, system_message: Optional[str] = None, model_kwargs: Optional[dict] = None)
Bases: BaseModelClient
aimu.models.OpenAIClient ¶
OpenAIClient(model: OpenAIModel, system_message: Optional[str] = None, model_kwargs: Optional[dict] = None, timeout: Optional[float] = None, max_retries: Optional[int] = None)
Bases: OpenAICompatClient
Client for the OpenAI API (GPT and o-series models).
Reads OPENAI_API_KEY from the environment (or a .env file).
aimu.models.GeminiClient ¶
GeminiClient(model: GeminiModel, system_message: Optional[str] = None, model_kwargs: Optional[dict] = None, timeout: Optional[float] = None, max_retries: Optional[int] = None)
Bases: OpenAICompatClient
Client for Google Gemini models via Google's OpenAI-compatible REST API.
Reads GOOGLE_API_KEY from the environment (or a .env file).
aimu.models.LMStudioOpenAIClient ¶
Bases: OpenAICompatClient
aimu.models.OllamaOpenAIClient ¶
Bases: OpenAICompatClient
aimu.models.HFOpenAIClient ¶
Bases: OpenAICompatClient
aimu.models.VLLMOpenAIClient ¶
Bases: OpenAICompatClient
aimu.models.LlamaServerOpenAIClient ¶
LlamaServerOpenAIClient(model: LlamaServerOpenAIModel, base_url: str = LLAMASERVER_BASE_URL, **kwargs)
Bases: OpenAICompatClient
Client for llama.cpp's llama-server OpenAI-compatible REST API.
Start the server with
llama-server -m /path/to/model.gguf --port 8080
aimu.models.SGLangOpenAIClient ¶
Bases: OpenAICompatClient
Client for SGLang's OpenAI-compatible REST API.
Start the server with
python -m sglang.launch_server --model-path
aimu.models.OpenAICompatClient ¶
OpenAICompatClient(model: Model, base_url: str, api_key: str = 'not-needed', system_message: Optional[str] = None, model_kwargs: Optional[dict] = None, timeout: Optional[float] = None, max_retries: Optional[int] = None)
Bases: BaseModelClient
Embedding clients¶
Text-to-vector clients, a parallel surface to the chat clients. See the Embed text guide.
aimu.models.EmbeddingClient ¶
Bases: FactoryDelegate
Public factory for text-embedding provider clients.
Parallel to :class:aimu.models.TranscriptionClient. Accepts a provider's
:class:EmbeddingModel enum member, an :class:EmbeddingSpec, or a
"provider:model_id" string ("openai:...", "ollama:..." or "hf:...").
Provider-specific construction kwargs are passed directly, e.g.
EmbeddingClient(model, device="cpu").
Examples::
from aimu.models import EmbeddingClient, OpenAIEmbeddingModel
client = EmbeddingClient(OpenAIEmbeddingModel.TEXT_EMBEDDING_3_SMALL)
client = EmbeddingClient("openai:text-embedding-3-small")
client = EmbeddingClient("ollama:nomic-embed-text")
Provider-specific construction kwargs are passed directly::
EmbeddingClient(HuggingFaceEmbeddingModel.BGE_SMALL_EN_V1_5, device="cpu")
embed ¶
Embed text. Forwarded to the inner client's :meth:BaseEmbeddingClient.embed.
aimu.models.BaseEmbeddingClient ¶
Bases: ABC
Abstract base for text-embedding provider clients.
Subclasses implement :meth:_embed, which takes a non-empty list of strings and
returns one vector (list[float]) per input. The public :meth:embed normalizes
a single-string call to a single vector and a list call to a list of vectors, so
every provider offers the same ergonomic surface.
dimensions
property
¶
The embedding vector width declared by the spec, or None if unspecified.
embed ¶
Embed one string or a list of strings.
A single str returns one vector (list[float]); a list returns a list of
vectors (list[list[float]]), preserving order. An empty list returns [].
Extra **kwargs are forwarded to the provider call.
aimu.models.resolve_embedding_model_string ¶
Look up an embedding-provider model enum from a "provider:model_id" string.
Only matches exact enum-member values; for ad-hoc model ids pass the
"provider:..." string directly to :class:EmbeddingClient.
aimu.models.OpenAIEmbeddingClient ¶
OpenAIEmbeddingClient(model: 'OpenAIEmbeddingModel | OpenAIEmbeddingSpec | str', model_kwargs: Optional[dict] = None)
Bases: BaseEmbeddingClient
Text-embedding client for the OpenAI API.
Pass an :class:OpenAIEmbeddingModel member, an :class:OpenAIEmbeddingSpec, or an
"openai:<model_id>" string. model_kwargs may carry api_key= / base_url=
overrides; otherwise OPENAI_API_KEY is read from the environment.
aimu.models.OllamaEmbeddingClient ¶
OllamaEmbeddingClient(model: OllamaEmbeddingModel | OllamaEmbeddingSpec | str, model_kwargs: Optional[dict] = None)
Bases: BaseEmbeddingClient
Text-embedding client for a local Ollama server.
Pass an :class:OllamaEmbeddingModel member, an :class:OllamaEmbeddingSpec, or an
"ollama:<model_id>" string. The model is pulled on construction (same as
:class:OllamaClient).
aimu.models.HuggingFaceEmbeddingClient ¶
HuggingFaceEmbeddingClient(model: 'HuggingFaceEmbeddingModel | HuggingFaceEmbeddingSpec | str', model_kwargs: dict | None = None)
Bases: BaseEmbeddingClient
Local text-embedding client backed by sentence-transformers.
Loads model weights lazily on the first :meth:embed call. Pass a
:class:HuggingFaceEmbeddingModel member, a :class:HuggingFaceEmbeddingSpec, or a
"hf:<repo_id>" string. Target a device with model_kwargs={"device": "cuda:1"};
other model_kwargs are forwarded to SentenceTransformer.
Note on retrieval-tuned models: E5 / BGE expect query/passage prefixes (e.g.
"query: ...") for asymmetric retrieval. Pass already-prefixed strings when you need
that; symmetric similarity does not.