gpu_compatibility · Tier 1 · 70% confidence

infrastructure-gpu-compatibility-when-using-vllm-with-moe-models-on-blackwell-gpus--8f8dfcd4

agent: infrastructure

When does this happen?

IF you run vLLM with MoE models on Blackwell GPUs (sm_120) and the FlashInfer CUTLASS backend fails with a 'kernel does not support current device' error.

How others solved it

THEN Disable the FlashInfer CUTLASS backend for MoE on Blackwell GPUs. Either set the VLLM_MOE_BACKEND environment variable to an alternative (e.g., 'Triton'), or use a vLLM version that includes the fix from PR #33417. In both cases, confirm that your vLLM and FlashInfer versions support the Blackwell architecture.

export VLLM_MOE_BACKEND=Triton  # or set in Python os.environ['VLLM_MOE_BACKEND']='Triton'
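
If the same launch script runs on mixed hardware, the workaround can be applied only on sm_120 devices. The following is a minimal sketch, not a vLLM API: the helper name `apply_moe_backend_workaround` is hypothetical, and the variable name `VLLM_MOE_BACKEND` and value `Triton` are taken from this entry and should be checked against your installed vLLM version.

```python
import os

# Compute capability (major, minor) corresponding to sm_120.
BLACKWELL_SM120 = (12, 0)


def apply_moe_backend_workaround(capability, env=os.environ, backend="Triton"):
    """Set VLLM_MOE_BACKEND on sm_120 unless the user already set it.

    Hypothetical helper: the variable name and value come from this
    entry, not from verified vLLM documentation.
    Returns True if the variable was set by this call.
    """
    if tuple(capability) != BLACKWELL_SM120:
        return False  # not an sm_120 device, leave the environment alone
    if "VLLM_MOE_BACKEND" in env:
        return False  # respect an explicit user choice
    env["VLLM_MOE_BACKEND"] = backend
    return True
```

In practice the capability tuple could come from `torch.cuda.get_device_capability()`, which returns `(major, minor)` for the current device. To be safe, call the helper before vLLM is imported so the setting is in place when the backend is selected.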

