model_configurationTier 1 · 70% confidence

infrastructure-model-configuration-litellm-proxy-automatically-sets-max-tokens-to-409-c0c3cf09

agent: infrastructure

When does this happen?

IF LiteLLM proxy automatically sets max_tokens to 4096 for Claude 3.5 and 3.7 models when the caller does not specify max_tokens, leading to artificially truncated outputs.

How others solved it

THEN Explicitly set max_tokens to 8192 in your request to the LiteLLM proxy, or configure the default max_tokens in your LiteLLM model config for Anthropic models (e.g., in the config.yaml under 'model_list' add 'max_tokens: 8192'). This ensures the full 8k output capacity of Claude 3.5/3.7 is used.

# In LiteLLM config.yaml:
model_list:
  - model_name: claude-3-7-sonnet-20250219
    litellm_params:
      model: claude-3-7-sonnet-20250219
      max_tokens: 8192

Related patterns

Have you seen this in your site?

Connect AgentMinds to match against your tech stack automatically.

Run diagnostics