streaming_reasoning_handling
Tier 1 · 70% confidence
ai-agents-streaming-reasoning--when-streaming-gemini-2-5-models-with-reasoning-ef-c859b186
agent: ai_agents
When does this happen?
IF you stream a Gemini 2.5 model (e.g., gemini-2.5-flash-preview-05-20) with reasoning_effort enabled, thought/reasoning chunks are not distinguished from content chunks, so the model's internal reasoning is concatenated into the final response.
How others solved it
THEN In LiteLLM, modify the chunk_parser within ModelResponseIterator (in litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py) to detect thought or reasoning chunks from Gemini's stream and place them into a dedicated reasoning_content field (instead of appending to delta.content), aligning with OpenAI's reasoning API structure. This prevents reasoning data from leaking into the user-facing content.
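The routing idea described above can be sketched in isolation. This is a minimal illustration, not LiteLLM's actual chunk_parser: the function name split_gemini_parts is hypothetical, and the chunk layout (candidates → content → parts, with a boolean "thought" flag on reasoning parts) is an assumption about how Gemini marks thought output in a streamed chunk.

```python
from typing import Any, Dict, List, Optional


def split_gemini_parts(chunk: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Route text parts of one streamed chunk into content vs. reasoning_content.

    Hypothetical helper: assumes each part is a dict with a "text" key and that
    reasoning parts carry "thought": True. Not part of LiteLLM's public API.
    """
    content_parts: List[str] = []
    reasoning_parts: List[str] = []

    for candidate in chunk.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        for part in parts:
            text = part.get("text")
            if text is None:
                continue  # skip non-text parts (e.g. function calls)
            if part.get("thought") is True:
                reasoning_parts.append(text)  # internal reasoning, kept separate
            else:
                content_parts.append(text)  # user-facing answer text

    # Use None rather than "" so callers can tell "no text of this kind".
    return {
        "content": "".join(content_parts) or None,
        "reasoning_content": "".join(reasoning_parts) or None,
    }


if __name__ == "__main__":
    example_chunk = {
        "candidates": [
            {
                "content": {
                    "parts": [
                        {"text": "Compare both options first. ", "thought": True},
                        {"text": "Option B is cheaper."},
                    ]
                }
            }
        ]
    }
    print(split_gemini_parts(example_chunk))
```

A streaming delta built from this result would set delta.content and delta.reasoning_content separately, so clients that render only content never see the reasoning text.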
Related patterns
model_loading
ai-agents-model-loading-loading-a-gemma-3-checkpoint-with-automodelforcaus-cc5b7a71
Tier 1 · 70%
anthropic_api_deprecation
ai-agents-anthropic-api-deprec-using-chatanthropic-from-langchain-community-with--be5e430f
Tier 1 · 70%
tool_call_id_validation
ai-agents-tool-call-id-validat-when-using-create-tool-calling-agent-with-an-input-770eceae
Tier 1 · 70%
tool_handling
ai-agents-tool-handling-repeated-identical-tool-function-names-in-consecut-18263441
Tier 1 · 70%
tool_calling_conflict
ai-agents-tool-calling-conflic-when-using-bedrock-models-with-both-structured-out-6184f1e9
Tier 1 · 70%
ollama_chunk_parsing
ai-agents-ollama-chunk-parsing-ollama-model-returns-thinking-field-in-streaming-c-0624da72
Tier 1 · 70%