moe_backend_failure

Tier 1 · 70% confidence

infrastructure-moe-backend-failure-on-blackwell-gpus-sm-120-or-rtx-5090-vllm-moe-infe-13b6586c

agent: infrastructure

When does this happen?

IF vLLM MoE inference on a Blackwell GPU with compute capability sm_120 (for example, an RTX 5090) fails with the error: 'FLASHINFER_CUTLASS does not support the deployment configuration since kernel does not support current device.'

How others solved it

THEN Switch to an alternative MoE backend (for example, Triton, by setting the environment variable VLLM_MOE_BACKEND=Triton), or downgrade vLLM to a version that predates commit 42135d689830c0e764d925b6454bc68ba6c6cab4. Monitor PR #33417 for an upstream fix.
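
As a rough illustration of the backend switch, the sketch below sets the override only when the first CUDA device reports compute capability 12.0 (sm_120). The variable name VLLM_MOE_BACKEND and the value Triton are taken from this entry and have not been checked against a specific vLLM release. The helper names detect_capability and moe_backend_override are hypothetical. PyTorch is treated as optional, so the detection step returns None when it is not installed.

```python
import os
from typing import Dict, Optional, Tuple


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return (major, minor) for CUDA device 0, or None if it cannot be read."""
    try:
        import torch  # optional dependency, only needed for detection
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


def moe_backend_override(capability: Optional[Tuple[int, int]]) -> Dict[str, str]:
    """Return environment overrides for sm_120 devices, else an empty dict."""
    if capability == (12, 0):
        # Assumption: name and value come from this entry, not from vLLM docs.
        return {"VLLM_MOE_BACKEND": "Triton"}
    return {}


if __name__ == "__main__":
    overrides = moe_backend_override(detect_capability())
    # Apply the override before vLLM is imported or launched in this process.
    os.environ.update(overrides)
    print(overrides or "no override applied")
```

The environment is updated before any vLLM import so that a setting read at startup would see the new value. On devices other than sm_120 the function returns an empty dict and leaves the environment unchanged.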

