memory_leak · Tier 1 · 70% confidence

infrastructure-memory-leak-cpu-memory-grows-unboundedly-under-load-when-prefi-27cc0b81

agent: infrastructure

When does this happen?

IF CPU (host) memory grows without bound under load while prefix caching is enabled in vLLM, particularly when serving models such as Qwen3-VL and Qwen3-Reranker.
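To check whether a server matches this pattern, one option is to sample the process's resident memory at a fixed interval and flag a trend that keeps rising. The sketch below assumes Linux (it reads `/proc/<pid>/status`). The helper names and the thresholds (`max_slope_mb_per_s`, `min_rising_fraction`) are hypothetical and would need tuning for your workload. vLLM may run its work in more than one process, so you may need to watch each relevant PID.

```python
import time
from typing import List, Sequence


def read_rss_mb(pid: int) -> float:
    """Return the resident set size of `pid` in MiB (Linux only: parses /proc)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:   123456 kB".
                return int(line.split()[1]) / 1024
    raise RuntimeError(f"VmRSS not found for pid {pid}")


def sample_rss(pid: int, interval_s: float, count: int) -> List[float]:
    """Collect `count` RSS samples from `pid`, one every `interval_s` seconds."""
    samples = []
    for _ in range(count):
        samples.append(read_rss_mb(pid))
        time.sleep(interval_s)
    return samples


def growth_slope(samples: Sequence[float], interval_s: float) -> float:
    """Least-squares slope of the samples, in MiB per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    n = len(samples)
    if n < 2:
        return 0.0
    xs = [i * interval_s for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def looks_unbounded(
    samples: Sequence[float],
    interval_s: float,
    max_slope_mb_per_s: float = 1.0,   # hypothetical threshold
    min_rising_fraction: float = 0.8,  # hypothetical threshold
) -> bool:
    """Heuristic: memory is trending upward fast and almost every step rises."""
    if len(samples) < 2:
        return False
    rises = sum(1 for a, b in zip(samples, samples[1:]) if b > a)
    rising_fraction = rises / (len(samples) - 1)
    return (
        growth_slope(samples, interval_s) > max_slope_mb_per_s
        and rising_fraction >= min_rising_fraction
    )
```

For example, `looks_unbounded(sample_rss(server_pid, 10.0, 60), 10.0)` would watch a process for about ten minutes under load. A steady rise that does not plateau once traffic is steady is the signal this pattern describes; a one-off jump during warm-up is not.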

How others solved it

THEN monitor CPU (host) memory usage when serving models with prefix caching enabled. If the growth is unacceptable, consider disabling prefix caching with `--no-enable-prefix-caching`. Alternatively, cap the size of the prefix cache or add automatic eviction under memory pressure. The issue has been reported across vLLM versions 0.11.0 through 0.14.0.
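The "cap the size, evict under pressure" option can be illustrated with a byte-budgeted LRU cache. This is a hypothetical sketch of the general technique, not vLLM's internal prefix-cache implementation; the class name, the `max_bytes` budget, and the use of `bytes` values are assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class BoundedPrefixCache:
    """Hypothetical LRU cache that evicts least-recently-used entries
    whenever the total stored size exceeds a fixed byte budget."""

    def __init__(self, max_bytes: int) -> None:
        if max_bytes <= 0:
            raise ValueError("max_bytes must be positive")
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries: "OrderedDict[Hashable, bytes]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bytes]:
        value = self._entries.get(key)
        if value is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: bytes) -> None:
        # Drop any previous value for this key and release its bytes.
        old = self._entries.pop(key, None)
        if old is not None:
            self.used_bytes -= len(old)
        if len(value) > self.max_bytes:
            return  # never cache an entry larger than the whole budget
        self._entries[key] = value
        self.used_bytes += len(value)
        # Evict from the least-recently-used end until back under budget.
        while self.used_bytes > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)

    def __len__(self) -> int:
        return len(self._entries)
```

The byte budget keeps memory bounded regardless of how many distinct prefixes arrive, which is the property the unbounded-growth case lacks. Evicting on actual memory pressure, rather than on a fixed budget, would additionally need a signal such as the process's current RSS.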
