rate_limiting · Tier 1 · 70% confidence

infrastructure-rate-limiting-tpm-quota-only-counts-output-tokens-ignoring-input-ffc126cd

agent: infrastructure

When does this happen?

IF the TPM (tokens-per-minute) quota counts only output tokens and ignores input tokens, users can bypass the rate limit by sending very large prompts that request short completions.
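To see why output-only counting is exploitable, here is a toy sliding-window TPM limiter. It is a minimal sketch, not LiteLLM's implementation: the class `TokenRateLimiter`, its `count_mode` parameter and its `allow` method are hypothetical names. The sketch also assumes both token counts are known when the request is admitted. `count_mode="output"` models the faulty behavior described above, and `count_mode="total"` models the intent of counting input plus output tokens.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenRateLimiter:
    """Hypothetical toy TPM limiter; NOT LiteLLM's actual code."""
    tpm_limit: int
    count_mode: str = "total"  # "output" = the bug, "total" = input + output
    window_seconds: float = 60.0
    # (timestamp, counted_tokens) for requests admitted inside the window
    _events: deque = field(default_factory=deque, init=False, repr=False)

    def _counted(self, input_tokens: int, output_tokens: int) -> int:
        # Decide which tokens are charged against the quota.
        if self.count_mode == "output":
            return output_tokens  # prompt size is ignored entirely
        if self.count_mode == "total":
            return input_tokens + output_tokens
        raise ValueError(f"unknown count_mode: {self.count_mode!r}")

    def _used(self, now: float) -> int:
        # Evict events older than the window, then sum what remains.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            self._events.popleft()
        return sum(tokens for _, tokens in self._events)

    def allow(self, input_tokens: int, output_tokens: int,
              now: Optional[float] = None) -> bool:
        # Admit the request only if it fits in the remaining TPM budget.
        now = time.monotonic() if now is None else now
        cost = self._counted(input_tokens, output_tokens)
        if self._used(now) + cost > self.tpm_limit:
            return False
        self._events.append((now, cost))
        return True


def count_admitted(mode: str) -> int:
    # Ten requests in the same minute, each with a 50,000-token prompt
    # and a 100-token completion, against a 10,000 TPM limit.
    limiter = TokenRateLimiter(tpm_limit=10_000, count_mode=mode)
    return sum(limiter.allow(50_000, 100, now=0.0) for _ in range(10))


print(count_admitted("output"))  # 10: every oversized prompt gets through
print(count_admitted("total"))   # 0: each request alone exceeds the limit
```

Under output-only counting, 500,000 prompt tokens pass through a 10,000 TPM limit because only 1,000 completion tokens are charged. Counting the total closes that gap.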

How others solved it

THEN set `token_rate_limit_type` to `"total"` under `general_settings` in your LiteLLM configuration so that both input and output tokens count toward the TPM quota. Alternatively, upgrade to a LiteLLM version that includes the fix from PR #17707.

general_settings:
    token_rate_limit_type: "total"
