Vision and streaming¶

In ~10 minutes you'll send an image to a vision-capable model and learn the full StreamChunk API along the way.

If you've done the first three tutorials, this one consolidates what you've seen and fills in the remaining I/O patterns.

Setup¶

You need a vision-capable model. The fastest free option is Ollama with Gemma 4:

ollama pull gemma4:e4b

Or use any cloud model that supports vision (GPT-4o, Claude 4.x, Gemini 1.5+/2.x).

import aimu

client = aimu.client("ollama:gemma4:e4b")
client.is_vision_model   # True

1. Send an image¶

Pass images=[...] to chat(). Each item can be a file path, raw bytes, an http(s) URL, or a data:image/... URL:

response = client.chat("What's in this image?", images=["./cat.jpg"])
print(response)

Multiple images? Just add to the list:

client.chat("Compare these.", images=["a.png", "b.png"])

For a one-shot "look once and answer" call that doesn't keep history, use generate(images=...) instead. It's the stateless sibling of chat():

caption = client.generate("Caption this image.", images=["./cat.jpg"])  # client.messages stays empty

client.messages keeps everything in OpenAI content-block format internally. Each provider adapts at request time; see how-to: handle vision for the per-provider details.

2. Stream the response¶

stream=True returns a StreamChunk iterator instead of a string:

for chunk in client.chat("Describe the scene in detail.", images=["./photo.jpg"], stream=True):
    if chunk.is_text():
        print(chunk.content, end="", flush=True)

3. The StreamChunk API¶

Each StreamChunk is a named tuple:

StreamChunk(phase, content, agent=None, iteration=0)

Field	Type	Meaning
`phase`	`StreamingContentType`	`THINKING` / `TOOL_CALLING` / `GENERATING` / `DONE`
`content`	`str` or `dict`	`str` for text phases; `{"name", "arguments", "response"}` for tool calls
`agent`	`str \\| None`	Agent name when emitted by an agent or workflow; `None` for plain `chat()`
`iteration`	`int`	Agent loop index or chain step; `0` for plain chat

Helpers cut the boilerplate:

chunk.is_text()        # True for THINKING, GENERATING
chunk.is_tool_call()   # True for TOOL_CALLING

4. Filter phases¶

To skip thinking output (e.g. show only the final response), pass include=[...]:

for chunk in client.chat("Solve this puzzle...", stream=True, include=["generating"]):
    print(chunk.content, end="")

Or to only show thinking (useful for debugging reasoning models):

for chunk in client.chat("Reason about this...", stream=True, include=["thinking"]):
    print(chunk.content, end="")

Accepts string names or StreamingContentType members.

5. Streaming through agents¶

The same chunk type flows through agents and workflows. Agent context shows up in chunk.agent and chunk.iteration:

from aimu.agents import Agent

agent = Agent(client, "You are a helpful image analyst.", name="vision-bot")

current_agent = None
for chunk in agent.run("Describe the scene", images=["./photo.jpg"], stream=True):
    if chunk.agent != current_agent:
        print(f"\n--- [{chunk.agent}] iteration {chunk.iteration} ---")
        current_agent = chunk.agent
    if chunk.is_text():
        print(chunk.content, end="")
    elif chunk.is_tool_call():
        print(f"\n[tool: {chunk.content['name']}]")

For chains, chunk.iteration is the chain step. For agent loops, it's the loop iteration. Same field, contextually meaningful.

6. Images through workflows¶

images= is threaded through every workflow's run():

Chain.run(task, images=[...]): forwarded only to step 0
Router.run(task, images=[...]): forwarded to the dispatched handler
Parallel.run(task, images=[...]): forwarded to every worker
EvaluatorOptimizer.run(task, images=[...]): forwarded only to the initial generator turn

from aimu.agents import Chain

chain = Chain.from_client(client, [
    "Describe what you see in detail.",
    "Summarise the description in one sentence.",
])
print(chain.run("Tell me about this photo.", images=["./photo.jpg"]))

Step 0 sees the image; step 1 sees only step 0's text output.

What's next¶

You've completed the tutorials. You now know:

The top-level API: aimu.chat(), aimu.client(), model strings
The agent loop and the @tool decorator
Four workflow patterns: Chain, Router, Parallel, EvaluatorOptimizer
Vision input and the StreamChunk API

For specific tasks, browse the how-to guides. For the design intent behind any of this, see explanation.