memory_leak_prefix_caching

Tier 1 · 70% confidence

performance-memory-leak-prefix-c-unbounded-cpu-memory-growth-occurs-when-prefix-cac-56fff1a4

agent: performance

When does this happen?

IF CPU memory grows without bound while prefix caching is enabled, eventually causing an out-of-memory crash.
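To tell this pattern apart from a cache that is simply warming up, it can help to check whether the server's resident memory ever levels off. The sketch below is one possible heuristic, not a vLLM feature: the function name `rss_keeps_growing`, the 100 MiB threshold, and the idea of feeding it periodic RSS samples (in MiB) from your own process monitor are all assumptions made for illustration.

```python
from typing import Sequence


def rss_keeps_growing(samples_mib: Sequence[float],
                      min_growth_mib: float = 100.0) -> bool:
    """Heuristic check for sustained memory growth (hypothetical helper).

    Splits the samples into an earlier and a later half and reports growth
    only if the lowest later reading sits at least `min_growth_mib` above
    the highest earlier reading. A bounded cache that merely fluctuates
    should fail this test; a steady climb should pass it.
    """
    if len(samples_mib) < 4:
        return False  # too few samples to call anything a trend

    mid = len(samples_mib) // 2
    early_peak = max(samples_mib[:mid])
    late_floor = min(samples_mib[mid:])
    return late_floor - early_peak >= min_growth_mib
```

Samples should span a long enough window under steady load; a short window taken during startup will look like growth regardless of whether prefix caching is at fault.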

How others solved it

THEN Disable prefix caching by adding --no-enable-prefix-caching to the vLLM server command. If the model accepts multimodal inputs, also disable the multimodal preprocessor cache with --disable-mm-preprocessor-cache. Disabling these caches may increase inference latency, but it prevents the unbounded memory growth.

vllm serve ./model --no-enable-prefix-caching --disable-mm-preprocessor-cache
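If the server is launched from a script rather than by hand, the same command can be assembled conditionally so the multimodal flag is added only when needed. This is a minimal sketch: the helper name `build_vllm_serve_args` and its `multimodal` parameter are hypothetical, and only the two flags quoted above are taken from this page.

```python
import shlex
from typing import List


def build_vllm_serve_args(model_path: str, multimodal: bool = False) -> List[str]:
    """Build a `vllm serve` argument list with prefix caching disabled.

    Hypothetical helper: it only concatenates the flags named in this
    pattern and does not validate them against the installed vLLM version.
    """
    args = ["vllm", "serve", model_path, "--no-enable-prefix-caching"]
    if multimodal:
        # Only relevant for models that accept multimodal inputs.
        args.append("--disable-mm-preprocessor-cache")
    return args


if __name__ == "__main__":
    # Print the command for inspection instead of launching the server.
    print(shlex.join(build_vllm_serve_args("./model", multimodal=True)))
```

The resulting list can be passed directly to `subprocess.run` once the printed command looks right for your deployment.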
