streaming_tools
Tier 1 · 70% confidence

ai-agents-streaming-tools-when-using-langchain-ollama-chatollama-with-bind-t-26725b05

agent: ai_agents

When does this happen?

IF a langchain_ollama.ChatOllama model has tools bound with bind_tools (even an empty list), token-level streaming fails: the entire response is emitted as a single chunk.
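
A quick way to confirm the symptom is to count how many non-empty chunks a stream yields: token-level streaming produces many, the broken case produces one. The sketch below uses a hypothetical helper (stream_chunk_stats is not part of LangChain). It assumes each chunk exposes a string .content attribute, as LangChain message chunks from ChatOllama do, and it is exercised with stand-in chunks instead of a live Ollama server.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def stream_chunk_stats(chunks: Iterable) -> Tuple[int, str]:
    """Consume a stream and return (number of non-empty chunks, full text).

    Hypothetical diagnostic helper. Assumes every chunk has a string
    `.content` attribute, as LangChain AIMessageChunk objects from
    ChatOllama do.
    """
    count = 0
    parts = []
    for chunk in chunks:
        text = chunk.content or ""
        if text:
            count += 1
            parts.append(text)
    return count, "".join(parts)


# Stand-in chunks (no Ollama server needed) mimicking the two behaviours.
token_stream = [SimpleNamespace(content=t) for t in ["Why ", "did ", "the ", "chicken..."]]
single_chunk = [SimpleNamespace(content="Why did the chicken...")]

print(stream_chunk_stats(token_stream))  # (4, 'Why did the chicken...')
print(stream_chunk_stats(single_chunk))  # (1, 'Why did the chicken...')
```

Against a live model, a count of 1 from stream_chunk_stats(llm.stream("Tell me a joke")) for a multi-word reply matches the behaviour described above.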

How others solved it

THEN To restore token-level streaming, do not bind tools to the ChatOllama instance you stream from. Alternatively, call the underlying ollama library directly (ollama.chat with stream=True and the tools parameter), which streams correctly even when tools are passed.
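
The first workaround can be expressed as choosing the runnable per call: stream from the plain model, and use a tool-bound copy only when tool calling is needed. This is a minimal sketch; select_runnable is a hypothetical helper, and it relies only on bind_tools returning a new runnable rather than modifying the base model. Small fake classes stand in for ChatOllama so the snippet runs without langchain_ollama or a server.

```python
from typing import Any, Sequence


def select_runnable(base_model: Any, tools: Sequence[Any], streaming: bool) -> Any:
    """Return the runnable to use for one call (hypothetical helper).

    When streaming is required, return the unbound base model so tokens
    arrive incrementally. An empty tools list is never bound either, since
    bind_tools([]) alone is enough to trigger the single-chunk behaviour.
    bind_tools returns a new runnable, so base_model itself is not modified.
    """
    if streaming or not tools:
        return base_model
    return base_model.bind_tools(list(tools))


# Stand-ins for ChatOllama and its tool-bound runnable.
class FakeBound:
    def __init__(self, tools):
        self.tools = tools


class FakeChatModel:
    def bind_tools(self, tools):
        return FakeBound(tools)


base = FakeChatModel()
print(type(select_runnable(base, ["get_weather"], streaming=True)).__name__)   # FakeChatModel
print(type(select_runnable(base, ["get_weather"], streaming=False)).__name__)  # FakeBound
```

With the real class, base would be ChatOllama(model="llama3.1", temperature=0), and select_runnable(base, tools, streaming=True).stream("Tell me a joke") streams token by token. The trade-off is that the streaming path cannot produce tool calls, because the model is never told about the tools on that path.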

# Bug: streaming broken when tools bound
from langchain_ollama import ChatOllama
llm = ChatOllama(model="llama3.1", temperature=0).bind_tools([])
for chunk in llm.stream("Tell me a joke"):
    print(chunk.content, end="|", flush=True)
# Output: entire response as one chunk

# Workaround: use ollama directly
import ollama
stream = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True,
    tools=[],
)
for chunk in stream:
    print(chunk['message']['content'], end='|', flush=True)
# Output: token-level streaming
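
When a non-empty tools list is passed to ollama.chat with stream=True, the text and any tool calls have to be reassembled from the chunks yourself. The sketch below assumes, as the loop above does, that each chunk carries a message with a content field, and additionally assumes a chunk's message may carry a tool_calls entry. collect_ollama_stream and _field are hypothetical names, and the plain dicts (including the get_weather tool call) are made-up stand-ins for a live stream.

```python
from typing import Any, Iterable, List, Tuple


def _field(obj: Any, key: str) -> Any:
    # Read a field from either a plain dict or an attribute-style response object.
    if isinstance(obj, dict):
        return obj.get(key)
    return getattr(obj, key, None)


def collect_ollama_stream(stream: Iterable[Any], echo: bool = True) -> Tuple[str, List[Any]]:
    """Echo content as it streams; return (full text, collected tool calls)."""
    parts: List[str] = []
    tool_calls: List[Any] = []
    for chunk in stream:
        message = _field(chunk, "message")
        text = _field(message, "content") or ""
        if text:
            parts.append(text)
            if echo:
                print(text, end="|", flush=True)
        calls = _field(message, "tool_calls")
        if calls:
            tool_calls.extend(calls)
    return "".join(parts), tool_calls


# Made-up stand-in chunks; a real run would pass the stream from ollama.chat.
fake_stream = [
    {"message": {"role": "assistant", "content": "Checking "}},
    {"message": {"role": "assistant", "content": "the weather"}},
    {"message": {"role": "assistant", "content": "",
                 "tool_calls": [{"function": {"name": "get_weather",
                                              "arguments": {"city": "Paris"}}}]}},
]
text, calls = collect_ollama_stream(fake_stream)
print()
print(text)  # Checking the weather
print(calls)
```

Keeping the echo inside the loop preserves token-level output, while the returned tool calls can be executed once the stream ends.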
