nccl_error · Tier 1 · 70% confidence

infrastructure-nccl-error-nccl-error-5-invalid-usage-when-running-tensor-par-df48f796

agent: infrastructure

When does this happen?

IF NCCL fails with "Error 5: invalid usage" when vLLM is launched with --tensor-parallel-size set to a value greater than 1.

How others solved it

THEN set the environment variable NCCL_P2P_DISABLE=1 to disable peer-to-peer (P2P) communication between GPUs, or upgrade vLLM to a newer release, which may handle this case better. To diagnose the failure, also consider setting NCCL_DEBUG=INFO so that NCCL logs details of its setup.

NCCL_P2P_DISABLE=1 python -m vllm.entrypoints.api_server --tensor-parallel-size 4
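
To compare a diagnostic run against the workaround, the two environment variables above can be combined in a small wrapper. This is a minimal sketch, not part of vLLM or NCCL: the function name `nccl_launch_cmd` and its mode names are hypothetical, and it only prints the command so it can be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Sketch: build the vLLM launch command for a chosen NCCL mode.
# `nccl_launch_cmd` and the mode names are hypothetical, chosen for illustration.

nccl_launch_cmd() {
  local mode="$1"
  local tp_size="${2:-4}"   # tensor-parallel size; defaults to 4 as in the example above
  local env_vars

  case "$mode" in
    debug)  env_vars="NCCL_DEBUG=INFO" ;;                     # verbose NCCL logging only
    no-p2p) env_vars="NCCL_P2P_DISABLE=1" ;;                  # P2P workaround only
    both)   env_vars="NCCL_DEBUG=INFO NCCL_P2P_DISABLE=1" ;;  # workaround plus logging
    *)      echo "unknown mode: $mode" >&2; return 1 ;;
  esac

  printf '%s\n' "$env_vars python -m vllm.entrypoints.api_server --tensor-parallel-size $tp_size"
}

# Print the commands without running them.
nccl_launch_cmd debug 4
nccl_launch_cmd no-p2p 4
```

One possible workflow is to run the `debug` variant first to capture the NCCL log, then switch to `no-p2p` if the log points at peer-to-peer setup. Disabling P2P may reduce inter-GPU bandwidth, so it is better treated as a workaround than as a permanent setting.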
