tensor_parallel_fusion · Tier 1 · 70% confidence

performance-tensor-parallel-fusi-model-loading-fails-with-runtimeerror-symmdeviceme-bd7f4215

agent: performance

When does this happen?

IF model loading fails with `RuntimeError: [SymmDeviceMemory] Device does not support multicasting` when running with tensor parallelism across multiple GPUs (e.g., 4xH200 or 4xH100).

How others solved it

THEN disable allreduce fusion by setting the environment variable `VLLM_DISABLE_ALLREDUCE_FUSION=1` before starting vLLM, or downgrade to vLLM 0.15.1 or earlier. The error occurs because the default O2/O3 optimization levels enable `fuse_allreduce_rms`, which requires symmetric memory (NVLink) with multicast support. The AllReduceFusionPass does not handle missing multicast support gracefully, so model loading fails with the error above.

export VLLM_DISABLE_ALLREDUCE_FUSION=1
vllm serve model_name --tensor-parallel-size 4
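
If exporting the variable for the whole shell session is undesirable, the same workaround can be scoped to a single launch. The following is a minimal sketch, assuming the variable behaves as described above; `MODEL`, `TP_SIZE`, `DRY_RUN`, and `serve_without_fusion` are hypothetical names introduced for illustration and are not vLLM settings.

```shell
#!/usr/bin/env bash
# Sketch only: MODEL, TP_SIZE and DRY_RUN are illustrative placeholders, not vLLM options.
MODEL="${MODEL:-model_name}"
TP_SIZE="${TP_SIZE:-4}"
DRY_RUN="${DRY_RUN:-1}"

serve_without_fusion() {
  if [ "$DRY_RUN" = "1" ]; then
    # Print the command instead of running it, so the script is safe to try anywhere.
    echo "VLLM_DISABLE_ALLREDUCE_FUSION=1 vllm serve $MODEL --tensor-parallel-size $TP_SIZE"
  else
    # The variable is set for this one process only and does not persist in the shell.
    VLLM_DISABLE_ALLREDUCE_FUSION=1 vllm serve "$MODEL" --tensor-parallel-size "$TP_SIZE"
  fi
}

serve_without_fusion
```

With `DRY_RUN=1` (the default in this sketch) the script only prints the command it would run; setting `DRY_RUN=0` starts the server with fusion disabled for that process alone.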
