Changelog
v0.10.1 (2026-06-24): cleanup (unified modality factory kwargs, keyword-only restore(), async SkillAgent parity + import-guard hardening)
Models
- Change the modality factory classes (`ImageClient`, `AudioClient`, `SpeechClient`, `TranscriptionClient`, `EmbeddingClient`) now take provider construction kwargs directly as `**kwargs`, matching `ModelClient(model, base_url=...)` and the top-level `aimu.image_client(model, variant="fp16")` helpers: `ImageClient(HuggingFaceImageModel.SDXL_BASE, variant="fp16")`. The old `model_kwargs={...}` argument is removed (pass the kwargs directly instead). The concrete provider clients (`HuggingFaceImageClient`, etc.) are unchanged and still take `model_kwargs=`.
- Fix optional-provider import guards (`aimu.models`, `ModelClient`, and their `aimu.aio` mirrors) now catch `ImportError` instead of bare `Exception`. A real error inside a provider module (a `SyntaxError`, an `AttributeError`, a broken transitive dependency) was previously swallowed and the provider silently reported as "dependency not installed," surfacing later as a confusing "no client for …" message; the real cause now propagates at import time.
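The difference between the two guard styles can be sketched in a few lines. This is an illustrative stand-in, not AIMU's actual guard code: `guarded`, `missing_provider`, and `broken_provider` are hypothetical names, and a plain callable stands in for importing a provider module.

```python
def guarded(loader):
    """Run a loader; treat only ImportError as 'dependency not installed'."""
    try:
        return loader(), True
    except ImportError:
        # The optional dependency is genuinely missing.
        return None, False


def missing_provider():
    # Simulates `import some_sdk` when the SDK is not installed.
    raise ModuleNotFoundError("No module named 'some_sdk'")


def broken_provider():
    # Simulates a real bug inside the provider module itself.
    raise AttributeError("bug inside provider module")


module, available = guarded(missing_provider)  # reported as unavailable

try:
    guarded(broken_provider)
    propagated = False
except AttributeError:
    propagated = True  # the real cause surfaces instead of being swallowed
```

With a bare `except Exception`, the second call would also have returned `(None, False)`, hiding the bug behind an "unavailable" flag.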
Agents and workflows
- Change the composite-runner `restore()` selectors are now keyword-only and give clear errors on a bad selector (sync + `aimu.aio`): `Chain.restore(messages, *, step=0)`, `Parallel.restore(messages, *, worker=0)`, `Router.restore(messages, *, route=None)`. `step`/`worker` out of range now raise `IndexError` with a descriptive message (Router already raised `KeyError` on an unknown route). Existing keyword calls are unaffected; only positional selector calls (e.g. `chain.restore(msgs, 1)`) need updating to `step=1`. The semantic names are kept rather than collapsed to a generic `target=`.
- Fix async `SkillAgent.run()` (`aimu.aio`) ignored `deps=` and `schema=`, which its sync twin and `aio.Agent.run()` both accept, so async skill users silently lost `ToolContext` dependency injection and structured output. The async override now mirrors `aio.Agent.run()` in full: `deps=`, `schema=` (mutually exclusive with `stream=True`), and the `final_answer_prompt` forced-wrap-up on both the streamed and non-streamed paths.
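A keyword-only selector with a range check might look like the sketch below. The `Chain` class here is a hypothetical stand-in with per-step histories, written only to show the call shapes; it is not the library's implementation.

```python
class Chain:
    """Hypothetical stand-in for a composite runner with one history per step."""

    def __init__(self, n_steps):
        self.histories = [[] for _ in range(n_steps)]

    def restore(self, messages, *, step=0):
        # The bare `*` makes `step` keyword-only: restore(msgs, 1) is a TypeError.
        if not 0 <= step < len(self.histories):
            raise IndexError(
                f"step={step} is out of range for a chain with "
                f"{len(self.histories)} steps"
            )
        self.histories[step] = list(messages)


chain = Chain(n_steps=2)
msgs = [{"role": "user", "content": "hi"}]
chain.restore(msgs, step=1)  # keyword selector: accepted

try:
    chain.restore(msgs, 1)  # positional selector: rejected by Python itself
    positional_rejected = False
except TypeError:
    positional_rejected = True
```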
v0.10.0 (2026-06-23): A2A interop + resilience (fallback, timeout/retry), Anthropic prompt caching, streaming usage, uniform restore
Models
- New streaming token usage: `client.last_usage` now populates after a fully-consumed `chat(stream=True)`/`generate(stream=True)`, where before it was reset to `None`. OpenAI-compat clients request it via `stream_options={"include_usage": True}` and read the terminal usage chunk; Ollama reads the final streamed part's eval counts; Anthropic reads `stream.get_final_message().usage` (which also carries the P1-A cache-token fields). Usage is set once the stream is drained (reading mid-stream still yields `None`), and matches the non-streaming semantics (final turn's counts). Hardened the OpenAI-compat stream loop against empty-`choices` chunks. In-process providers (HuggingFace, LlamaCpp) expose no streaming counts and still leave it `None`.
- New opt-in Anthropic prompt caching: `AnthropicClient`/`AsyncAnthropicClient` accept `cache_prompt=True` (threads through `aimu.client("anthropic:...", cache_prompt=True)`), which marks the system prompt and the tool definitions with `cache_control: {"type":"ephemeral"}` breakpoints at request time (the large, unchanging prefix an agent resends every turn). Markers are injected in the two format adapters, so all request paths (chat, tool-follow-up, streaming, structured) are covered. Below Anthropic's minimum cacheable size the API silently skips caching, so the flag is safe to leave on. `usage_from_anthropic` now also surfaces `cache_creation_input_tokens`/`cache_read_input_tokens` in `client.last_usage` when the response reports them, so cache creation/hits are observable (the base `input`/`output`/`total_tokens` keys are unchanged). Pure passthrough; no AIMU-side caching layer.
- New `FallbackClient` (sync) / `aio.AsyncFallbackClient` (async): wrap an ordered list of `BaseModelClient`s and fail over to the next on error. The first client that answers wins; a raising client (by default any `Exception`, narrowable via `retry_on=`) hands off to the next with the same conversation state, so multi-turn history is preserved across a failover; when all fail, `FallbackExhaustedError` is raised with the last error chained as `__cause__` (and all errors on `.errors`). Because it is a `BaseModelClient`, it drops into `Agent`, workflows, `Benchmark`, and `agent.as_model_client()` with no failover-specific wiring. Streaming fails over only before the first chunk is emitted. Pure policy layer (no backoff/sleep); pair with per-client `timeout`/`max_retries` for in-SDK retry plus cross-provider failover. Exported from `aimu`, `aimu.models`, and `aimu.aio`.
- New `timeout` and `max_retries` on the networked model clients (sync + `aimu.aio`), forwarded verbatim to the underlying SDK so requests get a bounded timeout and automatic retry on transient failures: `aimu.client("anthropic:claude-sonnet-4-6", timeout=30, max_retries=5)`. Supported by Anthropic, OpenAI, Gemini, and every local OpenAI-compat server (LM Studio, vLLM, llama-server, SGLang, Ollama-OpenAI, HF-Serve) via the `anthropic`/`openai` SDKs' native support. Ollama's native client supports `timeout` (the sync `OllamaClient` now holds an `ollama.Client` instance rather than calling module-level functions) but has no request-retry, so passing `max_retries` to it raises `ValueError` pointing at the `ollama-openai` provider. In-process providers (HuggingFace, LlamaCpp) are not networked and don't accept these kwargs. No retry/backoff machinery is implemented in AIMU; this is pure passthrough to the SDKs.
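The failover policy described for `FallbackClient` can be sketched with plain callables standing in for model clients. Everything here (`chat_with_fallback`, `FallbackExhausted`, `flaky`, `backup`) is a hypothetical illustration of the ordering and error-chaining rules, not the library's code.

```python
class FallbackExhausted(Exception):
    """Raised when every client fails; keeps all errors for inspection."""

    def __init__(self, errors):
        super().__init__(f"all {len(errors)} clients failed")
        self.errors = errors


def chat_with_fallback(clients, messages, retry_on=(Exception,)):
    errors = []
    for client in clients:
        try:
            # Every client sees the same conversation state.
            return client(messages)
        except retry_on as exc:
            errors.append(exc)  # hand off to the next client, no sleep/backoff
    # Chain the last error as __cause__, keep the full list on .errors.
    raise FallbackExhausted(errors) from (errors[-1] if errors else None)


def flaky(messages):
    raise TimeoutError("primary timed out")


def backup(messages):
    return f"echo: {messages[-1]['content']}"


history = [{"role": "user", "content": "ping"}]
reply = chat_with_fallback([flaky, backup], history)
```

Narrowing `retry_on` (for example to `(TimeoutError,)`) lets other exception types propagate immediately instead of triggering a failover.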
Tools
- New runtime tool-argument validation. Model-supplied tool-call arguments are now validated and lax-coerced against each `@tool` function's type hints before the tool runs (sync, `aimu.aio`, and the streaming / concurrent dispatch paths alike, via the shared `_ChatStateMixin._tool_call_kwargs`). A coercible mismatch is coerced (`"5"` → `5` for an `int` param); an uncoercible value, a missing required argument, or an unknown argument raises the new `ToolArgumentError`, which the dispatcher reports back to the model as a tool result so it can self-correct (distinct from a tool that runs and crashes). A Pydantic `TypeAdapter` per parameter is built once at decoration time, so dispatch stays cheap. The validator is exposed as `aimu.tools.coerce_tool_arguments(fn, arguments)`. MCP `as_tools()` wrappers carry no local type hints and pass through unchanged (their server validates). `pydantic>=2`, previously a transitive dependency, is now a declared core dependency.
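To make the three failure modes concrete, here is a deliberately crude standard-library sketch. It swaps the Pydantic `TypeAdapter` mentioned above for a hand-rolled `int`/`float` conversion, so it is far less thorough than the real validator; `coerce_arguments_sketch` and the local `ToolArgumentError` class are hypothetical stand-ins.

```python
import inspect


class ToolArgumentError(ValueError):
    """Stand-in for the error that is reported back to the model."""


def coerce_arguments_sketch(fn, arguments):
    params = inspect.signature(fn).parameters
    unknown = set(arguments) - set(params)
    if unknown:
        raise ToolArgumentError(f"unknown argument(s): {sorted(unknown)}")
    coerced = {}
    for name, param in params.items():
        if name not in arguments:
            if param.default is inspect.Parameter.empty:
                raise ToolArgumentError(f"missing required argument: {name}")
            continue  # optional parameter: let the default apply
        value, hint = arguments[name], param.annotation
        if hint in (int, float) and not isinstance(value, hint):
            try:
                value = hint(value)  # lax coercion, e.g. "5" -> 5
            except (TypeError, ValueError) as exc:
                raise ToolArgumentError(
                    f"{name}: cannot coerce {value!r} to {hint.__name__}"
                ) from exc
        coerced[name] = value
    return coerced


def add(a: int, b: int = 1) -> int:
    return a + b


kwargs = coerce_arguments_sketch(add, {"a": "5"})
total = add(**kwargs)
```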
Agents and workflows
- New `restore()` on every composite runner and full `aimu.aio` parity. The save/restore pattern (persist a failed run's `list[dict]`, reload, resume) now covers `Router.restore(messages, route=None)` (route key selects a handler; `None` restores the routing classifier), `Parallel.restore(messages, worker=0)` (index selects a worker), and `OrchestratorAgent.restore(messages)` (delegates to the inner orchestrator agent), in addition to the existing `Agent`/`Chain`/`EvaluatorOptimizer`. The async surface previously had no `restore()`; all six aio runners now mirror their sync twins. `restore()` stays per-class (signatures vary by selector), not on the `Runner` ABC.
- New `Runner.as_tool(*, name=None, description=None)` (sync and `aimu.aio`): wraps any agent or workflow as a `@tool`-style callable (`tool(task: str) -> str`) that delegates to `run()`. This is the seam that lets an autonomous `Agent` call any `Runner` (including a `Chain`/`Router`/`Parallel` workflow or a remote A2A agent), not just other agents. The name defaults to the runner's `name` (sanitised), the description to the first line of its `system_message` (or a generic fallback for workflows).
- Change `OrchestratorAgent.assemble(workers=...)` now accepts `list[Runner]` (was `list[Agent]`) on both surfaces, wrapping each worker via `Runner.as_tool()`. Worker dispatch can now target a workflow or a remote agent, not only an `Agent`. Existing `Agent`-only call sites are unaffected; the internal `_wrap_worker_as_tool` helper is removed in favour of `as_tool()`.
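The wrapping idea behind `as_tool()` can be illustrated with a toy runner. The `Runner` and `Shouter` classes below are hypothetical, and the regex-based name sanitising is an assumption about what "sanitised" could mean, not the library's rule.

```python
import re


class Runner:
    """Toy base class: anything with a name and a run(task) -> str method."""

    name = "runner"

    def run(self, task: str) -> str:
        raise NotImplementedError

    def as_tool(self, *, name=None, description=None):
        # Assumed sanitising rule: collapse non-word characters to underscores.
        tool_name = name or re.sub(r"\W+", "_", self.name).strip("_")

        def tool(task: str) -> str:
            return self.run(task)  # the tool simply delegates to run()

        tool.__name__ = tool_name
        tool.__doc__ = description or f"Delegate a task to {self.name}."
        return tool


class Shouter(Runner):
    name = "Shout It!"

    def run(self, task: str) -> str:
        return task.upper()


shout = Shouter().as_tool()
answer = shout("hello")
```

Because the result is an ordinary callable, it can sit in a tool list next to any other function-style tool.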
A2A interop (new optional a2a extra)
- New `aimu.agents.a2a`: Agent2Agent protocol interop, the agent-level analog of the MCP tool surface (`aimu.tools.MCPClient`/`python -m aimu.tools.mcp`). Install with `pip install 'aimu[a2a]'`; `aimu.agents.HAS_A2A` reports availability. A2A types never leak into `Runner`/`Agent` core; they adapt at the boundary.
- Consume: `RemoteAgent.connect(url)` resolves a remote agent card and returns a local `Runner`. Because it is a `Runner`, a remote A2A agent composes like any local one (into `Chain`/`Router`/`Parallel`/`OrchestratorAgent.assemble(workers=[...])`, or into an `Agent`'s tool list via `remote.as_tool()`), with no A2A-specific wiring. The sync client drives the async `a2a-sdk` through an anyio portal (mirroring `MCPClient`); `aimu.aio.a2a.RemoteAgent` uses it natively and supports incremental `message/stream` streaming.
- Expose: `serve_a2a(runner)` (blocking) / `build_a2a_app(runner)` (returns a Starlette ASGI app) wrap any `Runner` as an A2A server with an agent card at `/.well-known/agent-card.json`. CLI: `python -m aimu.agents.a2a --model ... --system ... --port 9000`.
- Pinned to the `a2a-sdk` `0.3.x` line (pydantic-native API matching the A2A ecosystem); the protobuf `1.x` line is a tracked future migration. Connection / call failures raise `A2AConnectionError`.
Documentation
- New notebook 23 - Composing Agents (A2A), explanation page A2A vs MCP, and how-to Connect agents (A2A).
v0.9.1 (2026-06-16): EvaluatorOptimizer revision-prompt fix
Agents and workflows
- Fix `EvaluatorOptimizer` (sync and `aimu.aio`) lost the draft it was revising. The revision prompt carried only the evaluator's feedback and the original task, so when the generator was an `Agent` with a system prompt (which resets its conversation on every `run()`) it could not see its prior response and effectively regenerated from scratch each round instead of revising. The revision prompt now re-supplies the previous output alongside the task and feedback.
v0.9.0 (2026-06-16): Tool dependency injection, structured-output agents, configurable evaluator & pretty_print
Tools
- New `aimu.ToolContext`: dependency injection for tools. A tool parameter annotated `ToolContext` (or `ToolContext[Deps]`) is filled by the agent at call time and excluded from the model-facing JSON schema, so the model never supplies it. This lets a tool reach shared state (a document store, cache, configuration) without module-level globals. `@aimu.tool` records the injected parameter names on `func.__tool_injected__`; both sync and async dispatch fill them via `_tool_call_kwargs()` from the client's `tool_context_deps`. Exported from `aimu` and `aimu.tools`.
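The injection mechanism can be sketched without the library. The `ToolContext`, `tool`, and `dispatch` definitions below are simplified, hypothetical stand-ins; only the `__tool_injected__` attribute name comes from the entry above, and `__model_params__` is invented for illustration.

```python
import inspect
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolContext:
    deps: Any = None


def tool(fn):
    params = inspect.signature(fn).parameters
    # Parameters annotated ToolContext are injected, never model-supplied.
    fn.__tool_injected__ = [
        n for n, p in params.items() if p.annotation is ToolContext
    ]
    # Hypothetical attribute: what the model-facing schema would list.
    fn.__model_params__ = [n for n in params if n not in fn.__tool_injected__]
    return fn


def dispatch(fn, model_args, deps):
    kwargs = dict(model_args)
    for name in fn.__tool_injected__:
        kwargs[name] = ToolContext(deps=deps)  # filled at call time
    return fn(**kwargs)


@tool
def lookup(key: str, ctx: ToolContext) -> str:
    return ctx.deps[key]


store = {"a": "alpha"}
result = dispatch(lookup, {"key": "a"}, deps=store)
```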
Agents and workflows
- New `Agent.deps` field + per-run `Agent.run(..., deps=...)` override (sync and `aimu.aio`): supplies the value injected as `ctx.deps` into tools that declare a `ToolContext` parameter. The per-run `deps=` takes precedence over the agent's `deps=` field; `_prepare_run()` publishes the effective value to the model client before each run. `None` (bare `client.chat()`) means `ctx.deps` is `None`. Forwarded by `SkillAgent`.
- New `Agent.run(..., schema=...)` (sync and `aimu.aio`): pass a dataclass or Pydantic v2 model to make the run a single structured-output turn that returns a validated instance instead of running the tool-calling loop. Useful for an agent whose job is to return a typed object (e.g. a critic's verdict). Mutually exclusive with `stream=True`.
- New `EvaluatorOptimizer` typed-verdict acceptance, replacing brittle substring matching. Acceptance is now decided by one of three mechanisms in priority order: `stop_when` (a predicate over the evaluator's output, either the raw text or the typed verdict when `verdict_schema` is set), `verdict_schema` (a dataclass / Pydantic model the evaluator must return via structured output; acceptance reads its `passed` bool and revision uses its `feedback` str, `passed_attr`/`feedback_attr` are configurable, and a malformed verdict raises rather than silently continuing), or `pass_keyword` (the default, unchanged; accept when the substring appears in the evaluator's text). Leaving the new fields unset preserves prior behaviour exactly.
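The priority order of the three acceptance mechanisms can be written as a small decision function. This is a sketch of the rule as described, not the library's code: `is_accepted` and the `Verdict` dataclass are hypothetical, and the `"PASS"` default keyword is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    feedback: str


def is_accepted(output, *, stop_when=None, verdict_schema=None,
                pass_keyword="PASS", passed_attr="passed"):
    # 1. An explicit predicate wins over everything else.
    if stop_when is not None:
        return bool(stop_when(output))
    # 2. A typed verdict is read by attribute; a malformed one raises.
    if verdict_schema is not None:
        if not isinstance(output, verdict_schema):
            raise TypeError(f"expected {verdict_schema.__name__}, got {output!r}")
        return bool(getattr(output, passed_attr))
    # 3. Fallback: substring match on the evaluator's raw text.
    return pass_keyword in output


by_keyword = is_accepted("PASS: reads well")
by_verdict = is_accepted(Verdict(False, "too long"), verdict_schema=Verdict)
by_predicate = is_accepted(
    Verdict(False, "too long"),
    stop_when=lambda v: len(v.feedback) < 20,
    verdict_schema=Verdict,
)
```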
Console output
- New `aimu.pretty_print(stream, *, file=None, show_thinking=False, show_tools=True)`: render the `StreamChunk` iterator from `client.chat(stream=True)`, `Agent.run(stream=True)`, or any workflow run to a readable transcript (tool calls flagged, generated text streamed inline, thinking optional), and return the concatenated generated text. Saves callers from re-implementing the `chunk.is_tool_call()`/`chunk.is_text()` dispatch loop. Exported from `aimu`.
Documentation
- New README "Agents and workflows", "Tools", "Output and utilities", and quick-start sections cover `ToolContext` injection, the configurable `EvaluatorOptimizer` acceptance (`pass_keyword`/`stop_when`/`verdict_schema`), and `pretty_print`, with a runnable example combining all three.
Examples
- Change Consolidated the loose `scripts/` directory and the `data/skills/` demo skills into a single top-level `examples/` tree, organized by theme: `examples/text-refinement/` (the `epic_*` family), `examples/image-refinement/` (the `hotdog_*` family), `examples/news-summarizer/`, and `examples/skills/` (`haiku-poet`, `unit-converter`). Each example directory has its own `README.md`, and `examples/README.md` indexes them. Files were moved with `git mv` (history preserved); `scripts/` and `data/` are removed.
- New `aimu.paths.examples` constant pointing at the `examples/` directory. `aimu.paths.skills` now resolves to `examples/skills` (was `data/skills`); the unused `aimu.paths.data` constant is removed.
- Change The example test suites (`test_epic_scripts.py`, `test_hotdog_scripts.py`) are now scoped out of the default `pytest` run via `testpaths = ["tests"]`. Run them explicitly with `pytest examples/`. The two refinement directories are on `pythonpath` so their shared-helper imports resolve.
- New Examples are surfaced from the README (`## Examples` section), the docs site (`docs/examples.md` + nav entry), and cross-linked from notebooks 07, 08, and 09. The two iterative-refinement how-to guides and `generate-images.md` now reference the `examples/` paths.
Models
- Fix `HuggingFaceModel.QWEN_3_6_27B` (and the Qwen 3.5/3.6 family) crashed at generation with `RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::BFloat16 != c10::Float8_e4m3fn`. These are unified multimodal FP8 checkpoints whose `quantization_config.modules_to_not_convert` skip-list is written against the multimodal module tree (`model.language_model.*`/`model.visual.*`). The text-only entries loaded via `AutoModelForCausalLM`, which builds a text-only tree (`model.layers.*`) the skip-list can't match, so layers meant to stay bf16 (router `mlp.gate`, `lm_head`, `linear_attn` projections) were mis-quantized. Qwen 3.5/3.6 now always load via `AutoModelForImageTextToText`.
- Change Merged the Qwen 3.5/3.6 text-only and `_VL` enum members into single `vision=True` entries (`QWEN_3_6_27B`, `QWEN_3_5_9B`); removed `QWEN_3_6_27B_VL` and `QWEN_3_5_9B_VL`. The two variants loaded the identical checkpoint via the identical loader (vision tower included either way), so the split no longer backed any loader or VRAM difference.
- Fix `HuggingFaceClient`'s module-level weight cache could collide: two enum members sharing a repo id and `model_kwargs` but loading via different classes (`AutoModelForCausalLM` vs `AutoModelForImageTextToText`) produced the same cache key, so the second silently received the first's model object. `_make_cache_key` now folds in a load-profile tag (mirroring how the image/audio/speech clients key on `pipeline_class`/`pipeline_type`).
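The cache-key collision and its fix come down to what goes into the key. The function below is a hypothetical sketch (the real `_make_cache_key` signature is not shown in this changelog), and the repo id is a made-up placeholder.

```python
def make_cache_key(repo_id, model_kwargs, load_profile=None):
    """Build a hashable key; load_profile distinguishes the loader class."""
    return (repo_id, tuple(sorted(model_kwargs.items())), load_profile)


repo = "example-org/example-model"  # placeholder repo id
kwargs = {"dtype": "bfloat16"}

# Without a load-profile tag, both loaders map to the same cache slot.
old_causal = make_cache_key(repo, kwargs)
old_multimodal = make_cache_key(repo, kwargs)

# With the tag folded in, each loader class gets its own slot.
new_causal = make_cache_key(repo, kwargs, "AutoModelForCausalLM")
new_multimodal = make_cache_key(repo, kwargs, "AutoModelForImageTextToText")
```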
v0.8.0 (2026-06-12): Embeddings, transcription, structured output, RAG & audio input
Models
- New `audio: bool = False` field on `ModelSpec`. Audio-capable text models expose `supports_audio` on their enum members, `is_audio_model` on their client instances, and an `AUDIO_MODELS` classproperty (parallel to `TOOL_MODELS`, `THINKING_MODELS`, `VISION_MODELS`).
- New Audio-capable models added to the catalog: `OpenAIModel` GPT-4o, GPT-4o-mini, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano; `GeminiModel` 2.0 Flash, 2.0 Flash Lite, 2.5 Pro, 2.5 Flash; `HuggingFaceModel.GEMMA_4_E4B`, `GEMMA_4_12B`, `NEMOTRON_H_8B`. Ollama models remain `audio=False` with inline comments noting where the underlying weights support audio (upgrade path once the Ollama API adds audio input).
ModelClient.chat() and ModelClient.generate()
- New `audio=` parameter on both `chat()` (stateful; turn persists in `self.messages`) and `generate()` (stateless one-shot; no history touched). Accepts any mix of: file path strings, `pathlib.Path`, raw bytes (WAV assumed), `https://` URLs (fetched eagerly), and `data:audio/...;base64,...` data URLs. Supported format strings: `wav`, `mp3`, `ogg`, `flac`, `m4a`, `webm`, inferred from file extension or MIME type.
- New Passing `audio=` to a model with `supports_audio=False` raises `ValueError` before any API call.
- New `images=` and `audio=` are mutually exclusive per turn; passing both raises `ValueError`.
- Internally normalised to OpenAI `input_audio` content blocks (`{"type": "input_audio", "input_audio": {"data": "<b64>", "format": "wav"}}`). Provider adaptation happens at request time: OpenAI/Gemini/OpenAI-compat pass through; Anthropic converts to `{"type": "audio", "source": {"type": "base64", ...}}`; HuggingFace decodes to float32 numpy arrays via `soundfile` and passes them to the `AutoProcessor`; Ollama raises with a clear message (API does not yet support audio).
- Mirrored on the async surface (`aimu.aio`): same signature on `aio.chat()` and `aio.generate()`.
- Fix `ModelClient._generate` (and the async `AsyncModelClient._generate`/`_chat`) now accept and forward `audio=`. They were missing the parameter while the base `generate()`/`chat()` always pass it, so every `aimu.client().generate()`/`aimu.chat(...)` call through the factory raised `TypeError: _generate() got an unexpected keyword argument 'audio'`. (Concrete provider clients were unaffected, which is why the live test suite, which constructs them directly, didn't surface it.)
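Normalising raw audio bytes into the `input_audio` block shape quoted above takes only a base64 step. The helper name `to_input_audio_block` is hypothetical, and the byte string is a placeholder rather than a valid WAV file.

```python
import base64


def to_input_audio_block(data: bytes, fmt: str = "wav") -> dict:
    """Wrap raw audio bytes in an OpenAI-style input_audio content block."""
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(data).decode("ascii"),
            "format": fmt,
        },
    }


raw = b"RIFF....WAVEfmt "  # placeholder bytes, not a playable clip
block = to_input_audio_block(raw)
round_trip = base64.b64decode(block["input_audio"]["data"])
```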
Documentation
- New `docs/how-to/handle-audio-input.md`: accepted input forms, model selection, stateful vs. stateless, async surface, per-provider adaptation.
- New `notebooks/05 - Audio Input.ipynb`: capability flags, all input forms, multiple clips per turn, stateful/stateless split, multi-turn conversations, capability check, mutual-exclusion demo, Gemini and HuggingFace sections, async surface.
Transcription (speech-to-text)
- New `aimu.transcription_client()`/`aimu.transcribe()` + `TranscriptionClient` factory + `BaseTranscriptionClient` ABC: a dedicated speech-to-text surface, parallel to TTS (`BaseSpeechClient`). Disjoint from the `audio=` parameter on text models, which handles audio analysis/QA by audio-capable chat models; this surface uses dedicated ASR models (Whisper family, gpt-4o-transcribe) optimised for transcription.
- New `OpenAITranscriptionClient` + `OpenAITranscriptionModel`: cloud ASR backed by `openai.audio.transcriptions.create()`. Models: `WHISPER_1`, `GPT_4O_TRANSCRIBE`, `GPT_4O_MINI_TRANSCRIBE`. Auth via `OPENAI_API_KEY`. Uses the same `openai` SDK already required by the `[openai_compat]` extra.
- New `HuggingFaceTranscriptionClient` + `HuggingFaceTranscriptionModel`: local ASR backed by `transformers.pipeline("automatic-speech-recognition")`. Models: `WHISPER_TINY`, `WHISPER_BASE`, `WHISPER_SMALL`, `WHISPER_MEDIUM`, `WHISPER_LARGE_V3`, `DISTIL_WHISPER_LARGE_V3`. Weight caching via module-level registry (same pattern as other HF clients).
- New `transcribe(audio, language=None, response_format="text", prompt=None, temperature=None) -> str | dict`. Accepted audio forms: file path, raw bytes, `https://` URL, `data:audio/...` URL, the same set as `audio=` on `chat()`. `response_format="verbose_json"` returns a dict with `text`, `segments` (start/end/text), `language`, `duration`. `response_format` defaults to `"text"` (plain string).
- New `AIMU_TRANSCRIPTION_MODEL` env var: sets the default model for `aimu.transcription_client()` and `aimu.transcribe()` when `model=` is omitted.
- New Async mirror under `aimu.aio`: `AsyncTranscriptionClient`, `aio.transcription_client(sync_client)`, `await aio.transcribe(audio, *, model, ...)`. Wraps sync via `asyncio.to_thread` (Decision 7, same as every other aio modality).
- New Built-in `transcribe_audio(audio_path: str) -> str` `@tool` in `aimu.tools.builtin`; `builtin.transcription` subgroup; included in `ALL_TOOLS`. Backed by a lazy `_transcription_client` singleton via `AIMU_TRANSCRIPTION_MODEL`. `make_transcription_tool(client)` binds a fresh tool to a caller-supplied client.
- New `docs/how-to/transcribe-audio.md` and `notebooks/21 - Transcription.ipynb`.
Embeddings (text-to-vector)
- New `aimu.embedding_client()`/`aimu.embed()` + `EmbeddingClient` factory + `BaseEmbeddingClient` ABC: a dedicated text-embedding surface, parallel to the other modality clients. `embed()` takes one string (returns `list[float]`) or a list (returns `list[list[float]]`, order preserved); an empty list returns `[]` without a provider call. `client.dimensions` reports the spec's vector width.
- New `OpenAIEmbeddingClient` + `OpenAIEmbeddingModel` (`text-embedding-3-small/large`, `text-embedding-ada-002`) via `openai.embeddings.create()`; auth via `OPENAI_API_KEY`.
- New `OllamaEmbeddingClient` + `OllamaEmbeddingModel` (`nomic-embed-text`, `mxbai-embed-large`, `bge-m3`, `all-minilm`) via `ollama.embed()`.
- New `HuggingFaceEmbeddingClient` + `HuggingFaceEmbeddingModel` (MiniLM-L6-v2, BGE small/base/large-en-v1.5, GTE-large, E5-large-v2, mxbai-embed-large-v1) backed by `sentence-transformers` so each model's own pooling/normalization config is honoured; lazy load + module-level weight cache (freed by `aimu.clear_hf_cache()`). Adds `sentence-transformers>=3` to the `[hf]` extra.
- New `SemanticMemoryStore(embedding_client=...)`: pluggable embedding model; default `None` keeps ChromaDB's built-in embedder (unchanged behaviour).
- New `AIMU_EMBEDDING_MODEL` env var sets the default model for `aimu.embedding_client()`/`aimu.embed()` when `model=` is omitted (raises if unset; no implicit download).
- New Async mirror: `aio.embedding_client(sync_client)`/`aio.embed()` wrap a sync client via `asyncio.to_thread`.
- Docs `docs/how-to/use-embeddings.md`, `notebooks/11 - Embeddings.ipynb`, API reference, and env-var reference.
Structured output
- New `schema=` on `chat()` and `generate()` (sync and async). Pass a dataclass type or a Pydantic v2 model; the call returns a validated instance of that type instead of a string. Mutually exclusive with `stream=True`.
- New `ModelSpec.structured_output` flag → `client.supports_structured_output` property and a `STRUCTURED_MODELS` classproperty (parallel to `tools`/`thinking`/`vision`/`audio`). Set on the OpenAI, Gemini, Ollama (all models), and Anthropic catalogs.
- Auto-escalate semantics: native provider enforcement when `supports_structured_output=True` (OpenAI `response_format` `json_schema`; Ollama `format=`; Anthropic forced-tool), otherwise the schema is appended to the prompt and the response is parsed. The branch is on the static capability flag, not on catching a runtime error, so a genuine provider failure surfaces rather than silently downgrading; parse failure raises `ValueError`.
- `self.messages` stays plain strings; the typed object is a return value only, so conversation history remains provider-portable.
- Composition: `schema=` works alongside `tools=` on OpenAI-compatible and parse-path providers. On Anthropic (native structured output is a forced tool) combining `schema=` with active tools raises `ValueError`.
- New `schema_to_json_schema()` (internal) converts a dataclass/Pydantic model to a JSON Schema, reusing the `@tool` decorator's Python-type → JSON-Schema mapping.
- Docs `docs/how-to/use-structured-output.md`.
- Deferred: `Agent.run(schema=...)`, a `strict=True` (native-or-raise) knob, and native HuggingFace/llama-cpp enforcement (those use the parse path).
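The parse path (schema in the prompt, response parsed into a typed object, `ValueError` on failure) can be sketched for the dataclass case. `parse_structured` and `Verdict` are hypothetical names, and the field check is a simplification: it validates key names only, not value types.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Verdict:
    passed: bool
    feedback: str


def parse_structured(text: str, schema):
    """Parse model text as JSON and build a dataclass instance from it."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    expected = {f.name for f in fields(schema)}
    if not isinstance(data, dict) or set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {data!r}")
    return schema(**data)


verdict = parse_structured('{"passed": true, "feedback": "ok"}', Verdict)
```

The conversation history would still hold the raw text; only the return value is typed.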
RAG primitives (retrieval-augmented generation)
- New `aimu.rag`: chunk/retrieve/rerank helpers as plain functions over the `MemoryStore` interface (no retriever/splitter/loader class hierarchy).
- New `split_text(text, *, chunk_size=1000, chunk_overlap=200, separators=None, length_function=len)`: recursive separator-based chunking (paragraphs → lines → sentences → words → characters) with overlap. `length_function` defaults to character count; pass a tokenizer's counter for token-aware chunking. Oversized unsplittable text hard-cuts at `chunk_size`.
- New `ingest(store, documents, *, chunk_size, chunk_overlap, separators, length_function) -> int`: splits one or many documents and stores each chunk via `store.store()`; returns the chunk count. `retrieve(store, query, *, n_results=5, **search_kwargs) -> list[str]` is a RAG-named pass-through to `store.search()` (forwards e.g. `max_distance=`). `format_context(chunks, *, separator="\n\n", numbered=False) -> str` joins chunks for prompt augmentation.
- New `rerank(query, documents, *, model="cross-encoder/ms-marco-MiniLM-L-6-v2", top_n=None)`: cross-encoder reranking via `sentence-transformers` (the `[hf]` extra); lazy-loaded and cached. Empty input returns `[]` without loading the model.
- New `make_retrieval_tool(store, *, n_results=5)` in `aimu.tools.builtin`: wraps `retrieve` + `format_context` as a `retrieve_context(query)` agent tool (returns numbered context).
- Docs `docs/how-to/use-rag.md` and the `aimu.rag` API reference.
- Loaders and per-chunk metadata are intentionally out of scope: ingestion sources are covered by `read_file`/`get_webpage` (or any text-returning library), and chunks are stored as plain strings per the `MemoryStore` contract.
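Recursive separator-based chunking can be sketched as follows. This simplified version is not `aimu.rag.split_text`: it omits `chunk_overlap` and `length_function` entirely, measures length in characters, and its default separator tuple is an assumption. The function name `split_text_sketch` is hypothetical.

```python
def split_text_sketch(text, *, chunk_size=1000,
                      separators=("\n\n", "\n", ". ", " ", "")):
    """Split on the coarsest separator first, recursing on oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Nothing left to split on: hard-cut at chunk_size.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too big: retry with finer separators.
            chunks.extend(
                split_text_sketch(piece, chunk_size=chunk_size, separators=rest)
            )
            current = ""
    if current:
        chunks.append(current)
    return chunks


words = split_text_sketch("aaaa bbbb cccc", chunk_size=9, separators=(" ", ""))
hard_cut = split_text_sketch("x" * 25, chunk_size=10)
```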
Token usage surfacing
- New `client.last_usage`: token counts for the most recent non-streaming `chat()`/`generate()`, as `{"input_tokens", "output_tokens", "total_tokens"}` (or `None` when the provider/server omits usage). Captured for Anthropic, OpenAI-compat (incl. OpenAI/Gemini/local servers), and Ollama, on both sync and async surfaces, and delegated through the `ModelClient`/`AsyncModelClient` wrappers. Reset to `None` on streaming calls (streaming usage capture is a separate follow-up) and by `reset()`. Token counts only; dollar cost is derivable but intentionally not computed (no maintained price table).
Anthropic models & adaptive thinking
- New `AnthropicModel` members: `CLAUDE_FABLE_5` (`claude-fable-5`), `CLAUDE_OPUS_4_8` (`claude-opus-4-8`), `CLAUDE_OPUS_4_7` (`claude-opus-4-7`), all `tools=True, thinking=True, vision=True`.
- New `ThinkingStyle` enum (`ENABLED`/`ADAPTIVE`) carried as a per-member extra on `AnthropicModel` (analogous to HuggingFace's `ToolCallFormat`). `AnthropicClient._thinking_kwargs()` builds the request accordingly: `ENABLED` → `{"type": "enabled", "budget_tokens": N}`; `ADAPTIVE` → `{"type": "adaptive", "display": "summarized"}` with `temperature`/`top_p`/`top_k` dropped. Opus 4.7+ and Fable 5 are adaptive-only (the `enabled` form 400s on them); Opus 4.6, Sonnet 4.6, and Haiku 4.5 use the budget form.
- Fix `CLAUDE_HAIKU_4_5` now correctly has `thinking=True`. Haiku 4.5 supports extended thinking via the `enabled`/`budget_tokens` form (previously omitted, so thinking tests silently skipped).
- Adaptive models decide per request whether to think and may emit none on simple prompts; the thinking tests use a multi-step reasoning prompt and assert thinking emission rather than an exact answer.
- Docs Updated the model matrix, provider matrix, add a new model, and CLAUDE.md (Thinking Models, AnthropicClient notes) to cover the two thinking styles and the new models.
- Docs Added an "Adaptive vs. budget thinking" section to `notebooks/01 - Model Client.ipynb` (section C) demonstrating `ThinkingStyle` and adaptive models skipping thinking on trivial prompts.
Dependencies
- Fix Pinned the `[hf]` extra's `kernels` to `>=0.12,<0.13`. It was unconstrained and resolved to `kernels 0.15.2`, which is outside the range `transformers` supports (`<0.13`); `transformers` constructs `kernels.LayerRepository(...)` at import time and 0.13+ made `revision`/`version` mandatory, so `from transformers import AutoProcessor` raised `ValueError`, silently flipping `HAS_HF` to `False` (HuggingFace clients unavailable) and erroring every HF test on import.
- Pinned the `[hf]` extra's `transformers` to `>=5,<6` (the major the model catalog targets: Qwen 3.6, Gemma 4, GPT-OSS) so a future major can't reintroduce this class of import-time breakage on resolve.
Async surface
- New Async→sync tool bridging for wrapped in-process clients. `AsyncHuggingFaceClient`/`AsyncLlamaCppClient` run their sync client's `_chat` tool-dispatch loop in a worker thread (via `asyncio.to_thread`), and that sync dispatcher refuses `async def` tools, but the async surface routinely attaches them (e.g. `await aio.MCPClient.as_tools()`). Each async tool is now wrapped as a sync callable that drives the coroutine back on the main event loop (`run_coroutine_threadsafe`) and blocks only the worker thread (no deadlock: the main loop is free, awaiting the `to_thread` future). Async-generator (streaming) tools bridge to sync generators; the OpenAI tool spec and dispatch name are preserved. Async agents using in-process models can now mix sync and async tools (including MCP) transparently.
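The bridging pattern uses only standard `asyncio` calls and can be shown in isolation. In this sketch `bridge_to_sync` and `fetch_length` are hypothetical names; the worker thread started by `asyncio.to_thread` stands in for the sync dispatch loop.

```python
import asyncio


def bridge_to_sync(async_tool, loop):
    """Wrap an async tool as a sync callable that runs it on `loop`."""

    def sync_tool(*args, **kwargs):
        # Schedule the coroutine on the main loop from the worker thread...
        future = asyncio.run_coroutine_threadsafe(
            async_tool(*args, **kwargs), loop
        )
        # ...and block only this worker thread until it finishes.
        return future.result()

    sync_tool.__name__ = async_tool.__name__  # keep the dispatch name
    return sync_tool


async def fetch_length(text: str) -> int:
    await asyncio.sleep(0)  # stands in for real async I/O
    return len(text)


async def main():
    loop = asyncio.get_running_loop()
    sync_tool = bridge_to_sync(fetch_length, loop)
    # The main loop stays free while awaiting the thread, so no deadlock.
    return await asyncio.to_thread(sync_tool, "hello")


result = asyncio.run(main())
```

Calling the wrapped tool directly on the event-loop thread would deadlock, which is why the pattern only makes sense from a worker thread.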
Fixes
- Change HuggingFace default `max_new_tokens` raised from `1024` to `4096`. The previous default truncated reasoning models (e.g. Qwen 3.5/3.6) mid-thinking, before the closing `</think>`, which left no room for the answer; the higher default leaves headroom for thinking plus a response. Override per call with `generate_kwargs={"max_tokens": N}`.
- Fix Streamed `chat()` on a HuggingFace thinking model no longer raises `RuntimeError: generator raised StopIteration` when a turn produces reasoning but is truncated before emitting an answer. The streaming path now mirrors the non-streaming one: it surfaces the buffered thinking and finishes with empty generated content instead of doing an unguarded `next()` on an empty token stream.
- Fix (tests) Mock-only audio/speech/image API tests previously replaced `transformers`/`soundfile`/`diffusers` in `sys.modules` with bare stubs at collection time and never restored them, breaking any live model test that ran later in the same session (`ModuleNotFoundError: Could not import module 'Qwen3_5ForCausalLM'`). The stubs are now installed via auto-restoring, `monkeypatch`-scoped fixtures, and the permanent install is skipped whenever the real dependency is importable.
v0.7.0 (2026-06-08): MCP tool unification, model resolvers, and agent improvements
Breaking changes
- Breaking Removed the `model_client.mcp_client` attribute. MCP tools now integrate through the single `model_client.tools` registry: call `MCPClient(...).as_tools()` (sync) or `await aio.MCPClient.connect(...).as_tools()` (async) to turn a server's tools into `@tool`-style callables, then add them to `tools` (constructor `Agent(tools=...)`, `client.tools = ...`, or the per-call `chat(tools=...)`/`run(tools=...)` override). Migration: replace `client.mcp_client = mcp` with `client.tools = mcp.as_tools()` (concatenate with `@tool` functions as needed, e.g. `builtin.web + mcp.as_tools()`). Two consequences: dispatch is now one by-name lookup over `tools`, so on a name collision the last entry wins (previously Python `@tool` always beat a same-named MCP tool; to preserve that, append the Python tool after `mcp.as_tools()`); and `MCPClient.get_tools()` is no longer called on every `chat()` (the tool list is snapshotted by `as_tools()`), so call `as_tools()` again to pick up server-side tool changes. `SkillAgent` and the internal dispatch (`_handle_tool_calls(tool_calls)`, `_call_plain_tool(tc, tc_id)`, both of which lost their `tools` parameter) were updated accordingly. The `MCPClient` class, its `get_tools()`/`call_tool()`/`ping()`, and the `aio.MCPClient` parallel are unchanged.
- Breaking `system_message` is no longer immutable after the first `chat()`. The setter is now always live: assigning it mid-conversation rewrites the `{"role": "system"}` entry in `messages` in place (re-conditioning the model on the new prompt while preserving history), inserts one if absent, or removes it on `None`. Before the first chat it still just seeds the value. The previous behaviour raised `RuntimeError`; code that caught that error to gate a `reset()` can now assign directly. To change the prompt and drop history, use `reset(system_message="new")`. Two consequences are accepted by design: the transcript becomes counterfactual (prior assistant turns predate the new prompt), and there is no longer a guard against silently re-conditioning a `ModelClient` shared by another agent's in-flight conversation, so don't share a live-conversation client across agents that each set `system_message`. The `_system_message_locked` flag has been removed. See System message lifecycle.
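The three setter outcomes (rewrite in place, insert if absent, remove on `None`) can be sketched over a plain message list. `set_system_message` is a hypothetical free function, and it assumes the system entry, when present, is the first message.

```python
def set_system_message(messages, text):
    """Apply the live-setter rules to a list of chat messages, in place."""
    has_system = bool(messages) and messages[0]["role"] == "system"
    if text is None:
        if has_system:
            del messages[0]  # remove the system entry, keep the history
    elif has_system:
        messages[0] = {"role": "system", "content": text}  # rewrite in place
    else:
        messages.insert(0, {"role": "system", "content": text})  # insert


msgs = [{"role": "user", "content": "hi"}]
set_system_message(msgs, "Be brief.")
after_insert = [dict(m) for m in msgs]
set_system_message(msgs, "Be formal.")
after_rewrite = [dict(m) for m in msgs]
set_system_message(msgs, None)
after_remove = [dict(m) for m in msgs]
```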
Models
- New aimu.resolve_model_enum(model) and aimu.resolve_image_model_enum(model): resolve a model to its Model/ImageModel enum member from any of three input forms: an enum member (returned unchanged), a "provider:model_id" string (delegates to resolve_model_string/resolve_image_model_string), or a bare enum-member name (e.g. "QWEN_3_8B", "FLUX_2_KLEIN_4B", "NANO_BANANA") looked up across every installed provider enum. Useful for CLIs/scripts that accept "enum, name, or string" uniformly. For text, an ambiguous bare name (the same id ships under many providers) is disambiguated the way the omitted-model default is: prefer a provider where the model is actually available locally (running Ollama → cached HuggingFace → reachable local OpenAI-compat server, tool-capable first), logged at WARNING; if it isn't available under any provider, ValueError lists the "provider:model_id" options. This availability probe runs only on the ambiguous path. resolve_image_model_enum has no local-availability notion (image catalogs don't collide) and raises on the rare ambiguity. Exported from aimu.models and top-level aimu.
- New aimu.available_text_models(*, include_hf_cache=True) for discovery: return locally available text models as Model enum members (running Ollama → cached HuggingFace → reachable local OpenAI-compat servers), in provider-priority order. Download-free and cloud-free. aimu.resolve_default_text_model_enum(*, include_hf_cache=True) returns the single auto-pick (env var → first available, tool-capable preferred) as an enum member, the enum-returning twin of the internal default resolver that backs client()/chat()/agent() when model= is omitted.
- New Gemma 4 12B added to every provider that can run it: OllamaModel.GEMMA_4_12B (gemma4:12b, tools/thinking/vision, with the shared Gemma sampling kwargs), HuggingFaceModel.GEMMA_4_12B (the instruction-tuned google/gemma-4-12b-it, tools/vision, processor parse_response path), and a GEMMA_4_12B member on every OpenAI-compat server enum (OllamaOpenAIModel, LMStudioOpenAIModel, VLLMOpenAIModel, HFOpenAIModel, LlamaServerOpenAIModel, SGLangOpenAIModel) plus LlamaCppModel. The server/llama.cpp entries are tools=True, matching the established GEMMA_3_12B convention for those catalogs. Resolvable via the usual "provider:model_id" strings (e.g. "ollama:gemma4:12b", "hf:google/gemma-4-12b-it", "vllm:google/gemma-4-12b-it").
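The "enum member, provider string, or bare name" resolution can be sketched as follows. The two enums, the PROVIDERS table, and the "qwen3:8b" id are hypothetical stand-ins for illustration. Unlike the behaviour described above, this sketch does not probe local availability and simply raises on an ambiguous bare name.

```python
from enum import Enum


class OllamaSketch(Enum):      # hypothetical provider catalog
    QWEN_3_8B = "qwen3:8b"     # assumed id, for illustration only
    GEMMA_4_12B = "gemma4:12b"


class HFSketch(Enum):          # hypothetical provider catalog
    GEMMA_4_12B = "google/gemma-4-12b-it"


PROVIDERS = {"ollama": OllamaSketch, "hf": HFSketch}


def resolve_enum_sketch(model):
    """Resolve an enum member, 'provider:model_id' string, or bare member name."""
    if isinstance(model, Enum):
        return model  # already a member: returned unchanged
    if ":" in model:
        provider, model_id = model.split(":", 1)  # ids may contain ':' themselves
        if provider in PROVIDERS:
            return PROVIDERS[provider](model_id)  # value lookup; ValueError if unknown
    # Bare member name: search every provider enum.
    matches = [enum[model] for enum in PROVIDERS.values() if model in enum.__members__]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown model {model!r}")
    options = [f"{p}:{e[model].value}" for p, e in PROVIDERS.items() if model in e.__members__]
    raise ValueError(f"ambiguous model {model!r}; use one of {options}")
```

Splitting on the first colon only matters because ids such as gemma4:12b contain a colon of their own.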
Tools
- New The tool decorator is re-exported at the top level as aimu.tool. @aimu.tool is now the single recommended/documented form across the README, tutorials, how-tos, and notebook examples. It's namespaced, so it can't be silently shadowed by another library's same-named tool decorator (LangChain, smolagents, etc.). from aimu.tools import tool remains valid and unchanged (same object); it's the natural form for code already inside aimu.tools. The ToolSignatureError message prefix is now @aimu.tool: to match. No behaviour change to decoration or dispatch.
- New MCPClient.as_tools() (sync) and aio.MCPClient.as_tools() (async) return a server's tools as @tool-style callables, each closing over the client, invoking call_tool() cross-process, and carrying __tool_spec__/__tool_is_async__/__tool_is_streaming__. Drop them straight into tools (client.tools = mcp.as_tools(), Agent(tools=builtin.web + mcp.as_tools())). This unifies MCP and in-process tools onto the single self.tools registry and one dispatch path; see the breaking-change note above for the migration from model_client.mcp_client. New shared helper aimu.tools.mcp_format.mcp_content_to_text(tool_response) flattens a call_tool result to a string.
- New Per-call tool override: chat(..., tools=None) and Agent.run(..., tools=None) (both sync and aimu.aio) accept a tools= list that replaces the client's configured self.tools for a single call/run, restored afterward. tools=None (default) keeps the existing behaviour; tools=[] disables tools for the call (MCP tools, being callables in self.tools via as_tools(), are included in the swap). On an Agent, the override applies to every turn of the agentic loop. Implemented as a scoped self.tools swap (_ChatStateMixin._tools_override) covering both request-spec building and dispatch; the agent threads it through each loop chat() call so no new agent state is introduced. Not safe across concurrent chat() calls on a shared client; same contract as self.messages. Not added to the Runner ABC / workflow classes.
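A scoped swap of this kind is commonly written as a context manager. The sketch below is an assumption about the general shape, not the library's code: SketchClient is hypothetical, and its chat() only reports which tools would be visible during the call.

```python
from contextlib import contextmanager


class SketchClient:
    """Hypothetical client holding a tool registry; not AIMU's actual class."""

    def __init__(self, tools=None):
        self.tools = list(tools or [])

    @contextmanager
    def _tools_override(self, tools):
        if tools is None:          # default: keep the configured tools
            yield
            return
        saved = self.tools
        self.tools = list(tools)   # [] disables tools for this call
        try:
            yield
        finally:
            self.tools = saved     # restored even if the call raises

    def chat(self, user_message, tools=None):
        with self._tools_override(tools):
            # A real client would build the request spec and dispatch here.
            return [t.__name__ for t in self.tools]


def search(query: str) -> str:
    return f"results for {query}"


def calculate(expr: str) -> str:
    return expr


client = SketchClient(tools=[search])
```

Because the swap mutates shared state for the duration of the call, two concurrent chat() calls on one client would see each other's override, which matches the concurrency caveat stated above.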
Agents and workflows
- New Agent.final_answer_prompt (opt-in, default None; sync and aimu.aio): guarantees a final answer when the agentic loop exhausts max_iterations while the model is still calling tools. Instead of returning whatever the last (possibly tool-only) turn produced (an empty or stub result), the agent sends this prompt once with tools disabled (chat(..., tools=[])), forcing the model to synthesize an answer from the context it has gathered. The trigger is the post-loop _last_turn_called_tools() check (no new counter); it fires only on the cap-with-pending-tools path (a natural finish, a turn with no tool calls, is unaffected) and the wrap-up turn is not counted against max_iterations. OrchestratorAgent._init_orchestrator() and OrchestratorAgent.assemble(..., final_answer_prompt=...) (sync + aio) forward it to the inner orchestrator agent, and it is accepted as a from_config key. Leaving it None preserves prior behaviour exactly.
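The forced wrap-up can be pictured as one extra turn after a capped loop. In this sketch the chat callable, its (text, called_tools) return shape, and the "continue" follow-up message are all hypothetical simplifications of a real agent loop.

```python
def run_agent_sketch(chat, task, max_iterations=3, final_answer_prompt=None):
    """Loop until a turn makes no tool calls or the iteration cap is hit.

    `chat(message, tools_enabled)` is a hypothetical callable returning
    (text, called_tools); it stands in for a real model client.
    """
    text, called_tools = "", False
    message = task
    for _ in range(max_iterations):
        text, called_tools = chat(message, True)
        if not called_tools:
            return text            # natural finish: no wrap-up needed
        message = "continue"
    # Cap reached while the model was still calling tools.
    if final_answer_prompt is not None and called_tools:
        text, _ = chat(final_answer_prompt, False)  # one extra turn, tools off
    return text


def always_calls_tools(message, tools_enabled):
    # Fake model: keeps calling tools whenever it is allowed to.
    if tools_enabled:
        return "", True
    return "final answer from gathered context", False
```

Without a final_answer_prompt the capped run returns the empty tool-only result; with one, the extra tools-off turn produces text.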
Fixes
- Fix SkillAgent skill injection no longer wipes conversation history when applied to an already-used client. It previously called reset() to unlock the setter (clearing messages); it now assigns system_message directly, which swaps the system entry in place.
v0.6.0 (2026-06-04): Output utilities, model weight caching, and experiment checkpointing
Breaking changes
- Breaking Renamed HuggingFaceImageModel.FLUX_DEV → FLUX_1_DEV and FLUX_SCHNELL → FLUX_1_SCHNELL for naming consistency with the FLUX_2_KLEIN_4B/FLUX_2_KLEIN_9B members. The underlying model id strings (black-forest-labs/FLUX.1-dev, black-forest-labs/FLUX.1-schnell) are unchanged. Update enum references; "hf:black-forest-labs/FLUX.1-dev" string-form usage is unaffected.
- Behavior change builtin.compute now includes execute_python alongside calculate. If you were passing tools=builtin.compute and want to exclude the sandboxed REPL, switch to tools=[builtin.calculate] explicitly. ALL_TOOLS and make_tools() are unchanged (opt-in only via python_sandbox=True).
Output utilities
- New aimu.parse_json_response(text, schema=None): extract JSON from any LLM response string using three extraction strategies (raw parse, fenced code block, {…} substring). Pass a dataclass class or Pydantic v2 BaseModel as schema to coerce the parsed dict into a typed object. Raises ValueError on all-strategy failure with the first 200 characters of the response included. Exported from aimu.models._json, aimu.models, and top-level aimu.
- New aimu.generate_json(client, prompt, schema=None, *, retries=2, generate_kwargs=None): call client.generate() and parse the result as JSON, retrying up to retries times on parse failure. Convenience wrapper around parse_json_response.
- New aimu.extract_tool_calls(messages): convert an OpenAI-format message list (e.g. agent.model_client.messages) into a flat list[dict] of {iteration, tool, arguments, result} records. Handles both arguments and parameters key names for cross-model compatibility. Replaces manual reconstruction boilerplate common in agentic scripts.
Model weight caching
- New All four in-process HuggingFace clients (HuggingFaceClient, HuggingFaceImageClient, HuggingFaceAudioClient, HuggingFaceSpeechClient) now maintain a module-level weight registry keyed on (spec.id, *sorted_model_kwargs). A second client instance with the same model and construction kwargs reuses already-loaded weights rather than calling from_pretrained() again. The text client checks on construction; the lazy-loading modality clients check on first load. LlamaCppClient has the same pattern with key (model_path, n_ctx, n_gpu_layers, chat_format).
- New aimu.clear_hf_cache(model=None): evict HuggingFace weight entries from all four modality registries and call gc.collect() + cuda.empty_cache(). Pass a model enum member to clear just that model; pass None to clear all.
- New aimu.clear_llamacpp_cache(model=None): same for LlamaCppClient.
Tools
- New execute_python(code) built-in tool in builtin.compute. Executes sandboxed Python in a fresh namespace per call, captures stdout, and returns the last expression value. Allowed imports: math, statistics, json, re, itertools, functools, datetime, and numpy/pandas/scipy/matplotlib when installed. Filesystem (open, os, pathlib) and subprocess access are blocked. Not included in ALL_TOOLS; opt in via tools=builtin.compute or make_tools(python_sandbox=True).
- New make_tools(..., python_sandbox=False): new python_sandbox= kwarg appends execute_python when True.
- New make_memory_tools(store) in aimu.tools.builtin: wraps any MemoryStore instance as three @tool-decorated functions (store_memory, search_memories, list_memories) for direct in-process agent use. Unlike the image/audio/speech built-in tools, there is no lazy singleton: the store is always explicit because persistence semantics (persist_path, backend, collection name) are meaningful caller choices. Works with SemanticMemoryStore, DocumentStore, or any MemoryStore subclass. For cross-process or multi-agent memory, the existing FastMCP servers (aimu.memory.mcp/aimu.memory.document_mcp) remain the recommended path.
- New builtin.make_tools(..., memory_store=None): new memory_store= kwarg appends make_memory_tools(store) to the assembled tool list when provided.
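The "fresh namespace, captured stdout, last expression value" behaviour described for execute_python can be approximated with the standard library. This is only a sketch of that mechanism: the function name and builtin allowlist are assumptions, and restricting builtins this way is not a real security boundary.

```python
import ast
import contextlib
import io

ALLOWED_IMPORTS = {"math", "statistics", "json", "re", "itertools", "functools", "datetime"}


def _guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    if name.split(".")[0] not in ALLOWED_IMPORTS:
        raise ImportError(f"import of {name!r} is not allowed")
    return __import__(name, globals, locals, fromlist, level)


SAFE_BUILTINS = {  # illustrative allowlist; note that open() is absent
    "__import__": _guarded_import,
    "print": print, "len": len, "range": range, "sum": sum,
    "min": min, "max": max, "abs": abs, "sorted": sorted,
}


def run_python_sketch(code: str):
    """Run code in a fresh namespace; return (stdout, last expression value)."""
    tree = ast.parse(code, mode="exec")
    last = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off a trailing expression so it can be evaluated for its value.
        last = ast.Expression(tree.body.pop().value)
    namespace = {"__builtins__": SAFE_BUILTINS}  # new dict on every call
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(tree, "<sandbox>", "exec"), namespace)
        value = None
        if last is not None:
            value = eval(compile(last, "<sandbox>", "eval"), namespace)
    return buffer.getvalue(), value
```

Compiling the trailing expression separately is what lets a REPL-style call return a value even though exec() itself always returns None.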
Agents and workflows
- New Agent.restore(messages): restore an agent from a saved list[dict] (OpenAI message format) for resuming after failure. Calls model_client.reset(), strips the leading system message to prevent duplication on the next chat(), and sets model_client.messages. The live partial state after a failed run is on agent.model_client.messages (not the post-run snapshot from agent.messages).
- New EvaluatorOptimizer.restore(messages): delegates to generator.restore().
- New Chain.restore(messages, step=0): restores the specified step's agent client.
Documentation
- New docs/how-to/using-llms-inside-tools.md: covers the history pollution problem, generate() for stateless in-tool LLM calls, the HuggingFace weight caching model (including clear_hf_cache()/clear_llamacpp_cache()), and the save/restore checkpointing pattern with a full try/except example.
v0.5.1 (2026-06-01): Image-to-image, FLUX.2 Klein, and curated model catalog
Image generation
- New Image-to-image (img2img) support: pass reference_image= to BaseImageClient.generate() (and all subclasses). Accepts a file path string, pathlib.Path, raw bytes, data URL, http(s) URL, or PIL Image. HuggingFace derives the img2img pipeline from the loaded txt2img pipeline via from_pipe() (shared weights, no extra VRAM). strength= (default 0.75) controls deviation from the reference for FLUX.1-style pipelines. width/height are ignored; output size is derived from the reference image. Gemini passes the reference as inline PNG data in a multipart request, enabling image editing.
- New HuggingFaceImageModel.FLUX_2_KLEIN_4B and FLUX_2_KLEIN_9B: FLUX.2 Klein by Black Forest Labs. 4-step distilled model with improved text rendering, better hand/face quality, and higher resolution support. Uses Flux2KleinPipeline (diffusers 0.37+), a unified pipeline that handles both txt2img and img2img natively (image= parameter, no strength). img2img_uses_strength=False on the spec distinguishes it from FLUX.1-style img2img.
- New HuggingFaceImageSpec.img2img_pipeline_class: diffusers class name for the img2img variant (e.g. "StableDiffusionImg2ImgPipeline"); None for ad-hoc "hf:<repo>" strings.
- New HuggingFaceImageSpec.img2img_uses_strength: True (default) for strength-based pipelines; False for unified pipelines like FLUX.2 Klein that condition on the reference image directly.
- New aimu.models._images._reference_image_to_pil(): shared helper used by both HF and Gemini image clients to normalise any reference image input form to a PIL Image.
- Changed scripts/hotdog_loop.py absorbs hotdog_climbing.py: the two scripts shared identical structure and differed only in their acceptance policy. Pass --strategy climbing for hill-climbing behaviour (keep best, revert on non-improvement); --strategy greedy (default) preserves the original loop behaviour. hotdog_climbing.py is removed.
- New scripts/hotdog_img2img.py: iterative hotdog refinement via img2img + strength annealing. Hill-climbs in image space (always refines from the best image, not the most recent) while annealing strength from high (explore) to low (polish). Detects and warns when the active model does not support strength (e.g. FLUX.2 Klein).
Negative prompts
- New ImageSpec.supports_negative_prompt capability flag. True by default; False for guidance-distilled / conversational models that have no negative-prompt parameter, such as HuggingFaceImageModel.FLUX_2_KLEIN_4B/_9B and the entire Gemini image family (GeminiImageSpec defaults it to False).
- Behavior BaseImageClient.generate() now raises ValueError if negative_prompt= is passed to a model whose spec sets supports_negative_prompt=False, instead of crashing deep in the pipeline (HuggingFace) or silently ignoring it (Gemini). Callers branch on spec.supports_negative_prompt and, for models that do not support it, fold the avoidance terms into the prose prompt. The hotdog scripts do this via a new negative_prompt_plan() helper (native kwarg → summarizer-folded positive constraints → prompt suffix, by model).
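Branching on the capability flag before calling generate() could look like the following. ImageSpecSketch and build_generate_kwargs are hypothetical names, and the "Avoid:" suffix is just one possible way to fold the constraint into prose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageSpecSketch:
    """Hypothetical spec carrying only the capability flag discussed above."""
    id: str
    supports_negative_prompt: bool = True


def build_generate_kwargs(spec, prompt, avoid=None):
    """Return kwargs for a generate() call, folding `avoid` into prose if needed."""
    if not avoid:
        return {"prompt": prompt}
    if spec.supports_negative_prompt:
        return {"prompt": prompt, "negative_prompt": avoid}  # native kwarg
    # No negative-prompt parameter: express the constraint in the prompt itself.
    return {"prompt": f"{prompt}. Avoid: {avoid}."}


sd = ImageSpecSketch("sd-style")
klein = ImageSpecSketch("klein-style", supports_negative_prompt=False)
```

Deciding this at the call site keeps the ValueError described above from ever being triggered for models without a negative-prompt parameter.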
Curated model catalog (breaking for unknown ids)
- Breaking Model id strings must name a model AIMU ships a spec for. Passing an arbitrary "hf:<unknown-repo>" / "gemini:<unknown-id>" / "openai:<unknown-id>" to an image, audio, or speech client now raises ValueError (listing available ids) instead of fabricating a spec with guessed capabilities. Text was always strict (resolve_model_string raises); this brings the other modalities in line. For a one-off custom model, construct the provider spec and pass the object (e.g. ImageClient(HuggingFaceImageSpec(...))), the explicit escape hatch.
- Fixed A "provider:model_id" string for a known model now resolves to the same spec object as the equivalent enum member, so capabilities are identical regardless of construction path. Previously the string form fabricated a default spec; e.g. "hf:black-forest-labs/FLUX.2-klein-4B" lost supports_negative_prompt=False/img2img_uses_strength=False, and "hf:suno/bark" lost BARK's default_voice.
- Removed The _REPO_PIPELINE_HINTS repo-prefix capability-guessing heuristics in the HuggingFace audio and speech clients (dead once unknown ids raise).
v0.5.0 (2026-05-31): Async, audio, speech, and default models
A feature release on top of the v0.4 redesign: a full async surface, two new output modalities (audio and speech), a cloud image provider, automatic default-model resolution, and streaming tools. No breaking changes to the v0.4 sync API.
Async surface (aimu.aio)
- New aimu.aio mirrors the entire public sync API one-for-one, with the same class names in a different namespace. Switch paradigms with one import line plus await. Exports chat, client, Agent, SkillAgent, Chain, Router, Parallel, EvaluatorOptimizer, PlanExecuteEvaluator, OrchestratorAgent, MCPClient. Imported by default, so from aimu import aio needs no separate install.
- New aio.Parallel and concurrent_tool_calls=True use asyncio.TaskGroup for structured concurrency: sibling cancellation on first failure, ExceptionGroup aggregation.
- New Native async providers: Anthropic, OpenAI, Gemini, Ollama, and every OpenAI-compatible endpoint. In-process providers (HuggingFace, LlamaCpp) wrap an existing sync client so weights load only once (aio.client(sync_client)).
- New async MCPClient built on FastMCP's native async Client (no anyio portal); construct via await MCPClient.connect(...). The sync MCPClient remains first-class.
- New @tool async detection (__tool_is_async__): async def tools are awaited directly; sync CPU-bound tools are routed through asyncio.to_thread so the event loop stays free.
- Note Streaming on the async surface returns AsyncIterator[StreamChunk] (consume with async for); the sync surface returns Iterator[StreamChunk]. The StreamChunk type itself is identical on both.
- Requirement Python 3.11+ is now required (the async surface uses asyncio.TaskGroup, asyncio.timeout, and native ExceptionGroup).
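The async-detection rule for tools (await coroutine functions, push sync callables to a worker thread) can be sketched with the standard library alone. The two tools and the call_tool helper are hypothetical, and asyncio.gather is used only to run both calls in this demo.

```python
import asyncio
import inspect
import time


async def fetch_title(url: str) -> str:      # async tool: awaited directly
    await asyncio.sleep(0)
    return f"title of {url}"


def word_count(text: str) -> int:            # sync tool: runs in a worker thread
    time.sleep(0.01)
    return len(text.split())


async def call_tool(func, **kwargs):
    """Await coroutine functions; push sync callables off the event loop."""
    if inspect.iscoroutinefunction(func):
        return await func(**kwargs)
    return await asyncio.to_thread(func, **kwargs)


async def main():
    return await asyncio.gather(
        call_tool(fetch_title, url="https://example.com"),
        call_tool(word_count, text="a b c"),
    )


results = asyncio.run(main())
```

Routing the blocking call through a thread means the sleep inside word_count does not stall other coroutines on the loop.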
Audio generation
- New aimu.audio_client()/aimu.generate_audio() + AudioClient factory + BaseAudioClient ABC, parallel to the text and image surfaces.
- New HuggingFace audio models: MusicGen small/medium/large (32 kHz, token-autoregressive), AudioLDM2 (16 kHz, diffusion), Stable Audio Open (44.1 kHz stereo, diffusion).
- New AUDIO_GENERATING StreamChunk phase + StreamChunk.is_audio_progress(). Streaming progress for diffusers-backed models.
- New encode_audio() output formats: numpy (default), bytes, data_url, path (WAV via soundfile).
- New Built-in generate_audio streaming tool + make_audio_tool(client, duration_s=); builtin.audio subgroup.
Speech (text-to-speech)
- New aimu.speech_client()/aimu.generate_speech() + SpeechClient factory + BaseSpeechClient ABC.
- New Providers: HuggingFace local (SpeechT5, MMS-TTS, BARK) and OpenAI cloud (tts-1, tts-1-hd).
- New SPEECH_GENERATING StreamChunk phase + StreamChunk.is_speech_progress(); OpenAI byte-chunk streaming.
- New Built-in generate_speech streaming tool + make_speech_tool(client, voice=, speed=); builtin.speech subgroup.
Image generation
- New Google Gemini "Nano Banana" cloud provider (GeminiImageClient, gemini-2.5-flash-image) under the [google] extra, dispatched via aimu.image_client("gemini:...").
- New aimu.image_client() accepts ad-hoc "hf:<repo_id>" and "gemini:<id>" strings in addition to enum members.
- New Streaming image generation: IMAGE_GENERATING chunks during denoising, with optional per-step latent previews via preview_every=N (HuggingFace diffusers).
- New Built-in generate_image streaming tool, make_image_tool(client, preview_every=), and make_describe_image_tool(client) (binds vision Q&A to a vision-capable chat client); builtin.image subgroup.
Default-model resolution
- New model= is now optional on aimu.chat()/aimu.client()/aimu.agent(). When omitted, AIMU resolves a text default: AIMU_LANGUAGE_MODEL ("provider:model_id") first, otherwise an already-available local model (running Ollama → cached HuggingFace model → running local OpenAI-compatible server), restricted to enum-known ids and preferring tool-capable ones. A cloud provider is never auto-selected and weights are never downloaded implicitly.
- New AIMU_IMAGE_MODEL/AIMU_AUDIO_MODEL/AIMU_SPEECH_MODEL provide defaults for the image/audio/speech entry points (env-var only; an unset var raises a clear ValueError).
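The text-default order (environment variable first, then an available local model, preferring tool-capable ones) can be sketched as below. The function and its available argument are hypothetical. Letting a tool-capable model win over an earlier-priority model without tool support is an assumed reading of "preferring tool-capable ones".

```python
import os


def resolve_default_sketch(available, env=None):
    """Pick a default: env var first, else the first available local model.

    `available` is a hypothetical list of (model_string, supports_tools)
    pairs already ordered by provider priority.
    """
    env = os.environ if env is None else env
    configured = env.get("AIMU_LANGUAGE_MODEL")
    if configured:
        return configured                       # explicit choice always wins
    for model, supports_tools in available:
        if supports_tools:
            return model                        # prefer tool-capable models
    if available:
        return available[0][0]                  # otherwise first available
    raise ValueError("no local model available; set AIMU_LANGUAGE_MODEL")
```

Nothing in this path downloads weights or contacts a cloud provider; it only chooses among models reported as already present.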
Tools and vision
- New Streaming tools: a generator-function @tool may yield StreamChunk objects mid-execution (flag __tool_is_streaming__); the agent forwards them through agent.run(stream=True). The tool's recorded response resolves from its return value, the last chunk's result, or str(last_chunk.content).
- New images= is now accepted on stateless generate() (one-shot vision Q&A that does not touch self.messages), in addition to stateful chat().
- New builtin.make_tools(base_client, image_client=None, audio_client=None, speech_client=None) assembles the full built-in tool list with automatic image/vision/audio/speech wiring.
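How a generator tool's recorded response might be resolved (return value, then the last chunk's result, then the stringified last content) is sketched here. ChunkSketch and drive are hypothetical. A real agent would forward each chunk to the caller as it arrives rather than collecting them in a list.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChunkSketch:
    """Hypothetical stand-in for a stream chunk."""
    content: Any
    result: Optional[str] = None


def render(prompt: str):
    """A generator tool: yields progress chunks, then returns its response."""
    for step in range(1, 4):
        yield ChunkSketch(content=f"step {step}/3")
    return f"rendered {prompt!r}"


def drive(tool_gen):
    """Collect chunks and resolve the recorded response in the order described."""
    chunks = []
    while True:
        try:
            chunks.append(next(tool_gen))
        except StopIteration as stop:
            returned = stop.value   # a generator's return value travels here
            break
    if returned is not None:
        response = returned
    elif chunks and chunks[-1].result is not None:
        response = chunks[-1].result
    elif chunks:
        response = str(chunks[-1].content)
    else:
        response = ""
    return chunks, response
```

A plain for loop would discard the generator's return value, which is why the sketch drives it with next() and reads StopIteration.value.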
Examples
- Changed The full-featured Streamlit chatbot (web/streamlit_chatbot.py) gains image, audio, and speech generation, plus optional TTS narration of completed responses.
v0.4 (2026-05-26): API redesign
Breaking changes across four areas, plus the new documentation site.
Top-level API
- New aimu.chat(user_message, *, model, ...): one-shot chat with a model string or enum.
- New aimu.client(model, *, system=None, **kwargs): one-line ModelClient factory.
- New aimu.resolve_model_string("provider:model_id"): model-string parser.
- New ModelClient now accepts a "provider:model_id" string in addition to enum members.
Model clients
- New ModelSpec frozen dataclass replaces positional enum tuples. All Model enums migrated.
- New client.reset(system_message="__keep__") clears history and unlocks the system-message setter.
- Breaking system_message is immutable after the first chat() call. The setter raises RuntimeError; call reset() to unlock.
- New include=[...] stream filter on chat() and generate() selects phases ("thinking", "tool_calling", "generating", "done").
- Internal Abstract methods renamed chat → _chat, generate → _generate. Concrete chat/generate on the base class apply the include filter and delegate.
- New Memory-aware GPU placement for HuggingFaceImageClient: on load it measures the pipeline size and each GPU's free VRAM (accounting for other processes), then pins to the freest GPU or falls back to model / sequential CPU offload so large models (SD3, FLUX) load without OOM. Override with model_kwargs={"device": "cuda:1"} or {"device_map": ...}. Audio/speech clients take the same {"device": ...} hint. Shared aimu/models/_hf_device.py helpers back all three.
- New ImageSpec.max_prompt_tokens records the model's text-encoder prompt budget (77 for CLIP, 256/512 for T5 models like SD3/FLUX, None for uncapped cloud models), exposed on BaseImageClient. Use it to size prompts to the model.
- Changed HuggingFaceImageClient now defaults torch_dtype per device (bf16 on CUDA, fp16 on MPS, fp32 on CPU) instead of "auto", which could silently load in fp32 and double VRAM. Pass model_kwargs={"torch_dtype": ...} to override.
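The include=[...] stream filter amounts to dropping chunks whose phase was not requested. The sketch below uses a hypothetical ChunkSketch type and plain string phases taken from the list above. How the base class wires the filter into chat() and generate() is not shown.

```python
from dataclasses import dataclass


@dataclass
class ChunkSketch:
    phase: str
    content: str


def filter_phases(chunks, include=None):
    """Yield only chunks whose phase is in `include`; None passes everything."""
    wanted = None if include is None else set(include)
    for chunk in chunks:
        if wanted is None or chunk.phase in wanted:
            yield chunk


stream = [
    ChunkSketch("thinking", "considering options"),
    ChunkSketch("tool_calling", "search(...)"),
    ChunkSketch("generating", "Hello"),
    ChunkSketch("done", ""),
]
visible = [c.content for c in filter_phases(stream, include=["generating"])]
```

Writing the filter as a generator keeps it lazy, so chunks still reach the consumer as they are produced.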
Agents
- Breaking Agent constructor signature changed: Agent(model_client, system_message=None, name=None, tools=None, ...). system_message is the second positional argument; name is optional (auto-derived).
- Breaking AgenticModelClient removed from the public API. Use agent.as_model_client() instead.
- Breaking OrchestratorAgent._setup_orchestrator renamed to _init_orchestrator.
- New OrchestratorAgent.assemble(client, system_message, workers=[...]) factory builds an orchestrator without subclassing.
- New Workflow factories: Chain.from_client(client, prompts), Router.from_client(client, classifier_prompt, handlers), Parallel.from_client(client, worker_prompts, aggregator_prompt=), PlanExecuteEvaluator.from_client(client, ...).
- Breaking BaseAgent and Workflow ABCs removed. All concrete agents and workflows inherit directly from Runner. The agent-vs-workflow split survives as a conceptual category in the docs.
- Breaking AgentChunk and ChainChunk collapsed into StreamChunk, with no back-compat aliases. chunk.agent_name → chunk.agent; chunk.step → chunk.iteration.
Tools
- New @tool raises ToolSignatureError at decoration time on unsupported signatures (*args/**kwargs, params with no type hint and no default).
- New Optional[T] and T | None unwrap to the inner type in tool specs.
- New Built-in tool subgroups: builtin.web, builtin.fs, builtin.compute, builtin.misc.
- New MCPClient raises MCPConnectionError (rather than silently failing) on construction or call failure. Added .ping() method.
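Decoration-time signature checks and Optional unwrapping of the kind listed above can be built from inspect and typing. Every name in this sketch is hypothetical, and it covers only the two rejection rules and the Optional[T] / T | None case.

```python
import inspect
import types
import typing


class ToolSignatureErrorSketch(TypeError):
    """Hypothetical stand-in for the error raised at decoration time."""


def unwrap_optional(annotation):
    """Map Optional[T] / T | None to T; leave other annotations unchanged."""
    origin = typing.get_origin(annotation)
    union_type = getattr(types, "UnionType", None)  # present on Python 3.10+
    if origin is typing.Union or (union_type is not None and origin is union_type):
        args = [a for a in typing.get_args(annotation) if a is not type(None)]
        if len(args) == 1:
            return args[0]
    return annotation


def check_signature(func):
    """Reject *args/**kwargs and parameters with neither a hint nor a default."""
    for param in inspect.signature(func).parameters.values():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            raise ToolSignatureErrorSketch(f"{func.__name__}: *args/**kwargs unsupported")
        if param.annotation is param.empty and param.default is param.empty:
            raise ToolSignatureErrorSketch(f"{func.__name__}: {param.name} needs a type hint")
    return func


def is_valid(func) -> bool:
    try:
        check_signature(func)
        return True
    except ToolSignatureErrorSketch:
        return False


@check_signature
def search(query: str, limit=5) -> str:
    return query[:limit]
```

Failing at decoration time surfaces a bad signature when the module is imported, rather than later when a model first tries to call the tool.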
Skills
- Breaking SkillManager raises SkillLoadError on malformed SKILL.md (instead of silently skipping).
- Breaking SkillManager.get_skill_body() raises SkillNotFoundError on unknown skill name (instead of returning a sentinel string).
- New Skill catalogue prompt includes script-derived tool names inline.
- Breaking Skill renamed to AgentSkill (no back-compat alias).
- New Skills logged at INFO on discovery.
Documentation
- New documentation site built with MkDocs Material and hosted on GitHub Pages.
- Diátaxis structure: tutorials, how-to guides, reference, explanation.
- README slimmed to landing-page size.
Earlier versions
This is the first formal changelog entry. Prior versions tracked changes via git history; consult git log on GitHub for v0.3.x and earlier.