multi_gpu_stall · Tier 1 · 70% confidence

performance-multi-gpu-stall-gpu-stuck-when-running-multiple-simultaneous-tasks-cef81de5

agent: performance

When does this happen?

IF the GPU gets stuck when running multiple simultaneous tasks on a multi-GPU (T4) setup with vllm 0.1.1.

How others solved it

THEN upgrade vllm to version 0.1.2 or later. If upgrading is not possible, set the environment variable NCCL_P2P_DISABLE=1 when launching the script. This disables NCCL peer-to-peer communication between GPUs, which may be the cause of the NCCL errors.

NCCL_P2P_DISABLE=1 python generate.py
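
If the variable cannot be set on the command line, the same workaround could be applied from inside the script. The following is a minimal sketch, not a confirmed fix: the helper names (parse_version, needs_p2p_workaround, apply_workaround_if_needed) are hypothetical, the version threshold comes from the advice above, and it assumes the variable is set before vllm or torch initialize NCCL, since NCCL is expected to read its environment at initialization.

```python
import os
from importlib import metadata

# First vllm release reported as fixed in the advice above.
FIXED_VERSION = (0, 1, 2)


def parse_version(text):
    """Turn a string like '0.1.1' into (0, 1, 1).

    Only the leading digits of the first three dot-separated parts are
    used, so suffixes such as 'rc1' or '.post1' are ignored.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_p2p_workaround(installed_version):
    """Return True when the given vllm version predates the fixed release."""
    return parse_version(installed_version) < FIXED_VERSION


def apply_workaround_if_needed(env=os.environ):
    """Set NCCL_P2P_DISABLE=1 if an affected vllm is installed.

    Returns True when the variable was set, False otherwise.
    """
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return False  # vllm is not installed, nothing to do
    if needs_p2p_workaround(installed):
        env["NCCL_P2P_DISABLE"] = "1"
        return True
    return False


# Call this at the very top of the script, before importing vllm or torch.
apply_workaround_if_needed()
```

Placing the call before any import of vllm keeps the behavior equivalent to the command-line form above. Upgrading remains the preferred fix; the sketch only covers the case where the installed version cannot be changed.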
