multi_gpu_hang
Tier 1 · 70% confidence

performance-multi-gpu-hang-in-vllm-0-1-1-when-running-multiple-tasks-simultan-5bf21bda

agent: performance

When does this happen?

IF you run multiple tasks simultaneously on a multi-GPU server with vllm 0.1.1, some GPUs become stuck, and multi-GPU offline inference fails with an 'actor is dead' error and NCCL error 5.

How others solved it

THEN upgrade vllm to version 0.1.2 or later, which resolved the issue. If you cannot upgrade, set the environment variables RAY_memory_monitor_refresh_ms=0 and NCCL_P2P_DISABLE=1 before launching as a workaround. Also ensure that tensor_parallel_size matches the number of GPUs visible to the process.

RAY_memory_monitor_refresh_ms=0 NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES=1 python generate.py
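
The same workaround can also be applied from inside a Python script. The following is a minimal sketch, not the original generate.py: the helper names (workaround_env, visible_gpu_count, run_inference), the GPU ids, the model name, and the prompt are all assumptions made for illustration. It sets the two variables before vllm is imported and derives tensor_parallel_size from CUDA_VISIBLE_DEVICES so that the two cannot drift apart.

```python
import os


def workaround_env(gpu_ids, base_env=None):
    """Return a copy of the environment with the workaround variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["RAY_memory_monitor_refresh_ms"] = "0"  # 0 disables Ray's memory monitor
    env["NCCL_P2P_DISABLE"] = "1"               # turn off NCCL peer-to-peer transport
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def visible_gpu_count(env):
    """Count the GPU ids listed in CUDA_VISIBLE_DEVICES (0 if unset or empty)."""
    ids = env.get("CUDA_VISIBLE_DEVICES", "")
    return len([i for i in ids.split(",") if i.strip()])


def run_inference(gpu_ids, model="facebook/opt-125m"):
    """Hypothetical entry point: apply the workaround, then run offline inference."""
    # Export the variables before vllm (and Ray) are imported so they take effect.
    os.environ.update(workaround_env(gpu_ids))
    from vllm import LLM, SamplingParams  # imported late on purpose

    # tensor_parallel_size matches the number of GPUs exposed to this process.
    llm = LLM(model=model, tensor_parallel_size=visible_gpu_count(os.environ))
    outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
    return [out.outputs[0].text for out in outputs]
```

Calling run_inference([0, 1]) would expose GPUs 0 and 1 and start vllm with tensor_parallel_size=2. Giving each simultaneous task its own GPU ids keeps the tasks from competing for the same devices.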
