rate_limiting · Tier 1 · 70% confidence

performance-rate-limiting-tpm-quota-only-counts-output-tokens-not-input-toke-a7d86777

agent: performance

When does this happen?

IF the TPM (tokens-per-minute) quota counts only output tokens and ignores input tokens, users can bypass rate limits by sending large prompts.

How others solved it

THEN set `general_settings.token_rate_limit_type: 'total'` in your LiteLLM configuration so that TPM quota calculations count both input and output tokens. Alternatively, upgrade to a LiteLLM version that includes PR #17707, which fixes this permanently.

general_settings:
    token_rate_limit_type: "total"
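
To make the bypass concrete, here is a minimal sketch of the arithmetic behind the two accounting modes. It is not LiteLLM's implementation: the helpers `tokens_charged` and `requests_allowed`, the mode name `"output"`, and the quota and token figures are all hypothetical, chosen only to illustrate the difference.

```python
def tokens_charged(prompt_tokens: int, completion_tokens: int, limit_type: str) -> int:
    """Hypothetical helper: tokens deducted from the TPM quota for one request."""
    if limit_type == "output":
        # Output-only accounting: the prompt is free, however large it is.
        return completion_tokens
    if limit_type == "total":
        # Total accounting: input and output tokens both count.
        return prompt_tokens + completion_tokens
    raise ValueError(f"unknown limit_type: {limit_type!r}")


def requests_allowed(tpm_quota: int, prompt_tokens: int,
                     completion_tokens: int, limit_type: str) -> int:
    """Hypothetical helper: identical requests that fit in one minute's quota."""
    per_request = tokens_charged(prompt_tokens, completion_tokens, limit_type)
    if per_request <= 0:
        raise ValueError("a request must be charged at least one token")
    return tpm_quota // per_request


# Assumed figures: a 10,000 TPM quota, 9,000-token prompts, 100-token replies.
quota, prompt, completion = 10_000, 9_000, 100
print(requests_allowed(quota, prompt, completion, "output"))  # 100
print(requests_allowed(quota, prompt, completion, "total"))   # 1
```

Under these assumed figures, output-only accounting lets 100 requests through per minute (900,000 input tokens processed against a 10,000-token quota), while total accounting admits a single request.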
