idle_cpu_consumption

Tier 1 · 70% confidence

performance-idle-cpu-consumption-vllm-server-with-tensor-parallel-size-2-shows-100--56d80bec

agent: performance

When does this happen?

IF a vLLM server started with tensor-parallel-size >= 2 shows 100% CPU usage on two cores while idle. The cause is busy-waiting in the shared memory broadcast.

How others solved it

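THEN apply the fix from PR #16226 by editing /usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py so that the polling loop is replaced with a blocking mechanism. This eliminates the unnecessary CPU consumption during idle periods. The path shown is for a system-wide Python 3.12 install; it will differ in a virtual environment or under another Python version.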
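Because the install path varies between environments, it can help to ask Python where the module file actually lives before editing it. The following is a minimal sketch; the helper name locate_module_file is hypothetical, and the dotted module name in the usage note is inferred from the path above rather than confirmed against a particular vLLM release.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def locate_module_file(module_name: str) -> Optional[Path]:
    """Return the source file of an importable module, or None if not found.

    Note: for a dotted name, find_spec imports the parent packages,
    so resolving a vLLM submodule will import vllm itself.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package of the dotted name is missing.
        return None
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)
```

For example, locate_module_file("vllm.distributed.device_communicators.shm_broadcast") should return the file to patch in the active environment, or None if vLLM is not installed there.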

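In shm_broadcast.py, replace the spin-wait on shared memory with a blocking wait on a condition variable (e.g., threading.Condition), so the reader sleeps until data is available and idle CPU usage drops to near zero. Note that threading.Condition only synchronizes threads within one process; if the waiting side runs in a separate worker process, a cross-process primitive such as multiprocessing.Condition is needed instead.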
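The general technique of waiting on a condition variable instead of polling can be sketched as follows. This is an illustration only, not the patch from PR #16226: the class BlockingMailbox and its methods are hypothetical names, and the example uses threads inside a single process, whereas the real broadcast may span several worker processes.

```python
import threading
from collections import deque


class BlockingMailbox:
    """Single-process queue whose reader blocks instead of spinning."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def put(self, item):
        # Publish the item and wake one waiting reader.
        with self._cond:
            self._items.append(item)
            self._cond.notify()

    def get(self, timeout=None):
        # wait_for releases the lock and sleeps until notified, so an idle
        # reader consumes essentially no CPU (a spin loop would re-check
        # the queue continuously instead).
        with self._cond:
            ready = self._cond.wait_for(lambda: len(self._items) > 0,
                                        timeout=timeout)
            if not ready:
                raise TimeoutError("no data arrived before the timeout")
            return self._items.popleft()
```

A reader calling get() sleeps inside wait_for until a writer calls put(), which is what removes the idle CPU load. The timeout argument is optional and is included here so a caller can still notice a stalled writer.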
