AIMU¶

AI Modeling Utilities: a lightweight Python library for building AI-powered applications with a consistent, provider-agnostic interface across text, images, audio, and speech.

Language models are the primary building block, with the same interface extending to image generation, audio generation, and text-to-speech. AIMU separates autonomous agents from code-controlled workflows, and treats agents as composable units that can be used anywhere a plain model client is accepted. Tool integration is structural (not a plugin), semantic and document memory can be dropped in, and a prompt-tuning loop optimises prompts against labelled data without ML machinery.

Install¶

pip install "aimu[all]"

Or pick the providers you need: aimu[ollama], aimu[anthropic], aimu[openai_compat] (also enables OpenAI TTS), aimu[hf] (text + HF image + audio + TTS), aimu[google] (Google Nano Banana image), aimu[llamacpp].

Quick start¶

import aimu

# One-shot
text = aimu.chat("Hello", model="anthropic:claude-sonnet-4-6")

# Multi-turn
client = aimu.client("ollama:qwen3.5:9b", system="You are concise.")
client.chat("Hi there")
client.chat("What did I just say?")     # history preserved

That's the full mental model: a chat() function for one-shots, a client() factory for conversations, and provider:model_id strings to swap backends.
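
The provider:model_id strings above can contain more than one colon (for example "ollama:qwen3.5:9b"), which suggests the provider is everything before the first colon. The sketch below illustrates that reading. split_model_string and ModelRef are hypothetical names used only for illustration; they are not part of AIMU, and the library's actual parsing may differ.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    """Hypothetical container for the two halves of a model string."""
    provider: str
    model_id: str


def split_model_string(spec: str) -> ModelRef:
    """Split 'provider:model_id' on the first colon only (illustrative, not AIMU API)."""
    provider, sep, model_id = spec.partition(":")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected 'provider:model_id', got {spec!r}")
    return ModelRef(provider, model_id)


if __name__ == "__main__":
    for spec in ("anthropic:claude-sonnet-4-6", "ollama:qwen3.5:9b"):
        print(split_model_string(spec))
```

Splitting on the first colon only keeps tags such as qwen3.5:9b intact as the model id.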

Where to next¶

Tutorials: Hands-on walkthroughs. Start here if you're new: install to first working agent in 15 minutes.
How-to guides: Task-oriented recipes. "How do I swap providers / write a tool / stream output / benchmark models?"
Reference: The full API surface, capability matrices, environment variables, and CLI commands.
Explanation: The why. Architecture, design principles, the agent/workflow taxonomy, and what AIMU deliberately doesn't do.

What's in the box¶

Provider-agnostic clients: Ollama, HuggingFace, llama-cpp, Anthropic, OpenAI, Gemini, plus every OpenAI-compatible local server (LM Studio, vLLM, SGLang, llama-server, HF Transformers Serve).
Text-to-image and image-to-image: aimu.image_client() and aimu.generate_image() parallel the text surface. HuggingFace diffusers for local generation (SD 1.5 / SDXL / SD 3.5 / FLUX 1 / FLUX 2 Klein), Google Nano Banana for cloud. Pass reference_image= to any generate() call for img2img. Drops into any chat agent via the built-in generate_image tool. See how-to: generate images.
Text-to-audio: aimu.audio_client() and aimu.generate_audio() for music and sound generation (not TTS). HuggingFace MusicGen, AudioLDM2, and Stable Audio Open. See how-to: generate audio.
Text-to-speech: aimu.speech_client() and aimu.generate_speech() for TTS. HuggingFace MMS-TTS/BARK locally; OpenAI tts-1/tts-1-hd in the cloud. Live sentence-by-sentence narration in the Streamlit chatbot. See how-to: generate speech.
Agents and workflows: Agent for autonomous tool-using loops; Chain / Router / Parallel / EvaluatorOptimizer for code-controlled patterns from Anthropic's Building Effective Agents.
Tools: @tool decorator for plain Python functions, plus a synchronous MCPClient wrapper for cross-process tools.
Skills: filesystem-discovered SKILL.md files that auto-inject capabilities into a SkillAgent.
Memory: semantic facts (ChromaDB), path-based documents (Anthropic Memory API), and conversation history (TinyDB).
Prompt management: versioned SQLite catalog plus a hill-climbing tuner with classification, multi-class, extraction, and judged variants.
Evaluation: DeepEval integration and a multi-model benchmark harness with CSV / JSON / catalog export.
Optional async surface: aimu.aio mirrors the whole sync API with the same class names, so switching is one import away. Parallel and concurrent_tool_calls use asyncio.TaskGroup for structured concurrency. See async design.

Examples¶

The examples/ directory ships larger, real-world programs organized by theme: text-refinement/ and image-refinement/ (the same generate → judge → refine loop in two modalities, each implemented as a code loop, an Agent, an EvaluatorOptimizer workflow, and simulated annealing), news-summarizer/ (one task solved with Agent, Chain, Parallel, and OrchestratorAgent), and skills/ (demo skills for SkillAgent discovery). See the examples overview.
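
The generate → judge → refine loop mentioned above can be sketched as a plain code loop. This is an illustration under stated assumptions, not the code shipped in examples/: refine_loop, toy_generate, and toy_judge are hypothetical names, and the toy functions stand in for real model calls so the sketch runs offline.

```python
from typing import Callable


def refine_loop(
    task: str,
    generate: Callable[[str, str], str],
    judge: Callable[[str], tuple[float, str]],
    threshold: float = 0.8,
    max_rounds: int = 5,
) -> tuple[str, float]:
    """Generate a draft, score it, and regenerate with feedback until it passes."""
    feedback = ""
    best_draft, best_score = "", float("-inf")
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        score, feedback = judge(draft)
        if score > best_score:
            best_draft, best_score = draft, score
        if score >= threshold:
            break  # good enough, stop refining
    return best_draft, best_score


# Toy stand-ins; a real loop would call a model for both roles.
def toy_generate(task: str, feedback: str) -> str:
    return task if not feedback else f"{task} {feedback}"


def toy_judge(draft: str) -> tuple[float, str]:
    ok = draft.endswith("please")
    return (1.0, "") if ok else (0.2, "please")


if __name__ == "__main__":
    print(refine_loop("Summarize this", toy_generate, toy_judge))
```

Keeping the best draft seen so far means the loop still returns something useful when no draft reaches the threshold within max_rounds.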

Notebooks¶

The notebooks/ directory ships 26 runnable demos ordered to build up incrementally, from 01-model-client, 03-structured-output, and 06-tools through 07-agents, 11-embeddings, 13-rag, and the generative-modality and 22-async notebooks. They are authored as plain-text Quarto .qmd files (markdown with executable python cells); the numbered filenames are self-describing, so browse the directory to read or run them.

Web apps¶

The examples/web/ directory ships two Streamlit chat applications (install the UI stack with pip install "aimu[web]"). streamlit_chatbot_basic.py (~70 lines) is a minimal showcase (provider/model selector, streaming chat, built-in tools) that illustrates how little code a working AIMU chatbot takes. streamlit_chatbot.py is a full-featured version that adds image generation, audio generation, speech narration (live sentence-by-sentence TTS as the model streams), agentic mode, thinking display, and generation sliders; it is intended as an extensible starting point for more sophisticated apps. A Gradio variant is also included. The personal-assistant example also ships a WebSocket front end (web_assistant.py) that streams replies and pushes proactive messages to the browser.
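
Sentence-by-sentence narration over a streaming reply generally means buffering the streamed chunks and releasing each sentence once it is complete. The sketch below shows one simple way to do that with the standard library. It is an assumption about the general technique, not the chatbot's actual code: sentences_from_stream is a hypothetical helper and the sentence-boundary rule is deliberately naive.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., !, or ? followed by whitespace (a deliberately simple rule).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text chunks and yield each sentence once it is complete."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence; keep the tail buffered.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends


if __name__ == "__main__":
    stream = ["Hel", "lo there. How", " are you? Fi", "ne."]
    for sentence in sentences_from_stream(stream):
        print(sentence)  # each sentence could be handed to a TTS call here
```

A rule this simple will also split after abbreviations such as "e.g. ", so a production version would likely need a more careful boundary detector.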