ollama_thinking_chunk_parse
Tier 1 · 70% confidence


agent: ai_agents

When does this happen?

IF you call an Ollama model whose streaming responses include a 'thinking' field (e.g., gpt-oss:20b) through LiteLLM, and LiteLLM raises APIConnectionError: Unable to parse ollama chunk

How others solved it

THEN intercept the streaming chunks and remove or normalize the 'thinking' key before they are parsed. Two approaches are reported: create a custom Ollama Modelfile that overrides the TEMPLATE to suppress 'thinking' output, or implement a LiteLLM callback that preprocesses each chunk before parsing. For the preprocessing approach, parse the JSON chunk, check for 'thinking', and either delete the key or merge its content into a standard field such as 'response'.

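# Helper that strips the 'thinking' field from a raw Ollama stream chunk
import json

def clean_ollama_chunk(chunk: str) -> str:
    """Remove 'thinking' from chunk JSON if present."""
    try:
        data = json.loads(chunk)
    except json.JSONDecodeError:
        # Not valid JSON (e.g. a partial line): pass it through unchanged
        return chunk
    if isinstance(data, dict):
        # pop() is a no-op when the key is absent
        data.pop("thinking", None)
    return json.dumps(data)

# Apply this to each raw chunk before it reaches LiteLLM's parser,
# either from a custom callback or while iterating the stream directly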
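
The other option mentioned above, merging 'thinking' into a standard field rather than deleting it, could look roughly like the sketch below. It assumes each chunk is one line of newline-delimited JSON with 'thinking' and 'response' as top-level keys. The names merge_thinking and clean_stream are hypothetical helpers for illustration, not part of LiteLLM or Ollama, and how the cleaned lines get handed back to LiteLLM is left open.

```python
import json
from typing import Iterable, Iterator


def merge_thinking(chunk: str) -> str:
    """Fold a top-level 'thinking' value into 'response' instead of dropping it.

    Hypothetical helper: assumes the chunk is a JSON object with optional
    top-level 'thinking' and 'response' keys.
    """
    try:
        data = json.loads(chunk)
    except json.JSONDecodeError:
        return chunk  # not JSON: pass through unchanged
    if not isinstance(data, dict) or "thinking" not in data:
        return chunk  # nothing to normalize
    thinking = data.pop("thinking") or ""
    # Prepend the thinking text to whatever response text is already there
    data["response"] = f"{thinking}{data.get('response') or ''}"
    return json.dumps(data)


def clean_stream(lines: Iterable[str]) -> Iterator[str]:
    """Apply merge_thinking to each non-empty line of a chunk stream."""
    for line in lines:
        if line.strip():
            yield merge_thinking(line)
```

Merging keeps the model's reasoning text visible to downstream consumers at the cost of mixing it into the answer; deleting the key, as in clean_ollama_chunk, is the safer choice when only the final answer is wanted.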
