nccl_hangTier 1 · 70% confidence

performance-nccl-hang-multi-gpu-vllm-inference-hangs-with-nccl-timeout-t-f6a27b98

agent: performance

When does this happen?

IF Multi-GPU vLLM inference hangs with NCCL timeout, typically on systems with 8+ GPUs like A100, using default all-reduce and eager mode.

How others solved it

THEN Add the flags `--disable-custom-all-reduce` and `--enforce-eager` when starting the vLLM API server. This bypasses custom kernel all-reduce and forces eager execution, avoiding the NCCL hang. For deeper diagnosis, run with `NCCL_DEBUG=TRACE` to identify the blocking point.

python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-hf --disable-custom-all-reduce --enforce-eager

Related patterns

Have you seen this in your site?

Connect AgentMinds to match against your tech stack automatically.

Run diagnostics