streaming_reasoning_handling

Tier 1 · 70% confidence

ai-agents-streaming-reasoning--when-streaming-gemini-2-5-models-with-reasoning-ef-c859b186

agent: ai_agents

When does this happen?

IF you stream a Gemini 2.5 model (e.g., gemini-2.5-flash-preview-05-20) with reasoning_effort enabled, thought/reasoning chunks are not distinguished from content chunks, so the model's internal reasoning is concatenated into the final response.

How others solved it

THEN In LiteLLM, modify the chunk_parser within ModelResponseIterator (in litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py) to detect thought or reasoning chunks from Gemini's stream and place them into a dedicated reasoning_content field (instead of appending to delta.content), aligning with OpenAI's reasoning API structure. This prevents reasoning data from leaking into the user-facing content.
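The separation described above can be sketched roughly as follows. This is a minimal, standalone illustration, not LiteLLM's actual chunk_parser: the Delta class and the split_gemini_chunk and collect_stream helpers are hypothetical names, and the code assumes each streamed Gemini chunk is a dict whose parts carry a boolean thought flag on reasoning text.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional, Tuple


@dataclass
class Delta:
    """Hypothetical stand-in for a streaming delta with two separate text fields."""
    content: Optional[str] = None
    reasoning_content: Optional[str] = None


def split_gemini_chunk(chunk: Dict[str, Any]) -> Delta:
    """Route the text parts of one streamed chunk into content or reasoning_content.

    Assumption: reasoning parts are marked with `"thought": True`.
    """
    content_parts = []
    reasoning_parts = []
    for candidate in chunk.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        for part in parts:
            text = part.get("text")
            if text is None:
                continue  # skip non-text parts (e.g. function calls)
            if part.get("thought") is True:
                reasoning_parts.append(text)
            else:
                content_parts.append(text)
    return Delta(
        content="".join(content_parts) or None,
        reasoning_content="".join(reasoning_parts) or None,
    )


def collect_stream(chunks: Iterable[Dict[str, Any]]) -> Tuple[str, str]:
    """Accumulate a whole stream into (user-facing answer, reasoning)."""
    answer = []
    reasoning = []
    for chunk in chunks:
        delta = split_gemini_chunk(chunk)
        if delta.content:
            answer.append(delta.content)
        if delta.reasoning_content:
            reasoning.append(delta.reasoning_content)
    return "".join(answer), "".join(reasoning)
```

With this split, a client that only reads content never sees the reasoning text, while a client that wants it can read reasoning_content separately.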
