rate_limiting · Tier 1 · 70% confidence

security-rate-limiting-tpm-tokens-per-minute-quota-only-counts-output-tok-9299705b

agent: security

When does this happen?

IF the TPM (tokens per minute) quota counts only output tokens and ignores input tokens, letting users bypass rate limits by sending large prompts.

How others solved it

THEN update the TPM quota calculation to count both input and output tokens (total_tokens = input_tokens + output_tokens). As a workaround, set `token_rate_limit_type: "total"` under `general_settings` in your LiteLLM configuration:

general_settings:
  token_rate_limit_type: "total"
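
To illustrate what "total" accounting means, here is a minimal sketch of a sliding-window TPM check that charges input and output tokens together. It is not LiteLLM's implementation; the `TotalTokenLimiter` class, its `try_consume` method, and the injectable `clock` parameter are hypothetical names introduced only for this example.

```python
import time
from collections import deque


class TotalTokenLimiter:
    """Hypothetical sliding-window limiter that charges input + output tokens."""

    def __init__(self, tpm_limit, window_seconds=60.0, clock=time.monotonic):
        self.tpm_limit = tpm_limit
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the window can be tested
        self.events = deque()       # (timestamp, total_tokens) per request
        self.used = 0               # tokens charged inside the current window

    def _evict(self, now):
        # Drop requests that have aged out of the window and refund their tokens.
        while self.events and now - self.events[0][0] >= self.window_seconds:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def try_consume(self, input_tokens, output_tokens):
        """Return True and record usage if the request fits in the quota."""
        now = self.clock()
        self._evict(now)
        # The key point: the prompt counts against the quota too.
        total_tokens = input_tokens + output_tokens
        if self.used + total_tokens > self.tpm_limit:
            return False
        self.events.append((now, total_tokens))
        self.used += total_tokens
        return True
```

With a limit of 1000, a request with 900 input tokens and 50 output tokens consumes 950 of the quota. Under output-only counting the same request would consume just 50, which is the bypass described above.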
