multi_gpu_allreduce

Tier 1 · 70% confidence

performance-multi-gpu-allreduce-runtimeerror-symmdevicememory-device-does-not-supp-370471e8

agent: performance

When does this happen?

IF vLLM fails with "RuntimeError: [SymmDeviceMemory] Device does not support multicasting" when running with tensor parallelism on a multi-GPU setup (e.g., 4xH200).

How others solved it

THEN disable the fused allreduce RMS norm path by setting the environment variable VLLM_FUSE_ALLREDUCE_RMS=0 before starting vLLM, or downgrade vLLM to version 0.15.1 or earlier. Also verify that NVLink is enabled and that the system supports symmetric memory operations. If the error persists, check the GPU driver and interconnect configuration. For example:

export VLLM_FUSE_ALLREDUCE_RMS=0
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen3.5-397B-A17B-FP8 --tensor-parallel-size 4
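
The NVLink check mentioned above can be sketched with standard nvidia-smi subcommands. This is a minimal sketch, not part of the reported fix: the function name check_interconnect is hypothetical, and it assumes the NVIDIA driver utilities are installed on the host. It only prints the topology and link state for inspection; it does not decide whether multicast is supported.

```shell
# Hypothetical helper: print GPU interconnect details before launching vLLM.
# Assumes nvidia-smi (shipped with the NVIDIA driver) is on PATH.
check_interconnect() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; cannot inspect GPU topology" >&2
    return 1
  fi
  # Topology matrix: NV# entries mark GPU pairs connected by NVLink.
  nvidia-smi topo -m
  # Per-link NVLink state for each GPU.
  nvidia-smi nvlink --status
}
```

Run check_interconnect on the host before starting the server. In the topology matrix, GPU pairs that show PIX, PHB, or SYS instead of NV# are connected over PCIe rather than NVLink, which is worth ruling out before changing vLLM settings.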
