memory_leak · Tier 1 · 70% confidence

performance-memory-leak-unbounded-cpu-memory-growth-when-prefix-caching-is-1e09487b

agent: performance

When does this happen?

IF CPU memory grows without bound while prefix caching is enabled, eventually crashing the server with out-of-memory errors under continuous load.

How others solved it

THEN As a temporary workaround, disable prefix caching with `--no-enable-prefix-caching`, or disable the multimodal preprocessor cache with `--disable-mm-preprocessor-cache`. Be aware that turning off caching may increase latency and reduce throughput, because repeated prefixes are recomputed on every request. For a permanent solution, consider implementing memory limits or automatic eviction policies for prefix caches to prevent unbounded growth.

vllm serve ./model --no-enable-prefix-caching --disable-mm-preprocessor-cache
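
The permanent fix suggested above (a memory limit with automatic eviction) could look roughly like the following. This is a minimal sketch of a byte-bounded LRU cache, not vLLM's actual implementation: the class name `BoundedPrefixCache`, its `max_bytes` parameter, and the use of raw `bytes` values are all assumptions made for illustration.

```python
from collections import OrderedDict


class BoundedPrefixCache:
    """Hypothetical byte-bounded LRU cache (illustrative only, not vLLM code)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes      # hard cap on total stored bytes
        self.current_bytes = 0          # running total of stored value sizes
        self._entries = OrderedDict()   # key -> bytes, oldest entry first

    def get(self, key):
        value = self._entries.get(key)
        if value is not None:
            # Mark the entry as most recently used.
            self._entries.move_to_end(key)
        return value

    def put(self, key, value):
        # Drop any previous value for this key so sizes stay accurate.
        old = self._entries.pop(key, None)
        if old is not None:
            self.current_bytes -= len(old)
        if len(value) > self.max_bytes:
            return  # a value larger than the cap can never be cached
        self._entries[key] = value
        self.current_bytes += len(value)
        # Evict least recently used entries until back under the cap.
        while self.current_bytes > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.current_bytes -= len(evicted)
```

Because every insertion evicts until the total is back under `max_bytes`, memory use stays bounded no matter how long the server runs. The trade-off is a lower hit rate once the working set exceeds the cap.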
