gpu_attention_backend · Tier 1 · 70% confidence

infrastructure-gpu-attention-backen-vllm-fails-to-use-flashattention-3-on-blackwell-gp-d9993958

agent: infrastructure

When does this happen?

IF vLLM cannot use FlashAttention 3 (FA3) on a Blackwell GPU (compute capability >= 10, e.g., RTX 5090) and reports an error such as: 'Cannot use FA version 3 is not supported due to ... Blackwell archs (>=10)'.

How others solved it

THEN Switch to the FlashInfer attention backend instead of FA3. Set the VLLM_ATTENTION_BACKEND environment variable to FLASHINFER before starting vLLM, as documented in the vLLM user guide for Blackwell. vLLM lists its backend names in uppercase, so use the uppercase value.

export VLLM_ATTENTION_BACKEND=FLASHINFER
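
If the same launch script runs on both Blackwell and older GPUs, the override can be applied conditionally. The following is a minimal sketch, not part of vLLM: `pick_attention_backend` is a hypothetical helper name, the threshold of 10 is taken from the condition above, and the script assumes an `nvidia-smi` recent enough to support the `compute_cap` query field. It uses the uppercase name `FLASHINFER`, which is how vLLM spells its backend names.

```shell
# Hypothetical helper (not a vLLM command): map a CUDA compute capability
# string such as "12.0" to the backend recommended above.
# Prints FLASHINFER for major version >= 10, prints nothing otherwise.
pick_attention_backend() {
  local cap="$1"
  local major="${cap%%.*}"   # keep the part before the first dot
  if [ "$major" -ge 10 ] 2>/dev/null; then
    echo "FLASHINFER"
  fi
}

# Only query the GPU when nvidia-smi is available on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  # Assumption: the first GPU listed is representative of the host.
  cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
  backend="$(pick_attention_backend "$cap")"
  if [ -n "$backend" ]; then
    export VLLM_ATTENTION_BACKEND="$backend"
  fi
fi
```

Source this file (rather than executing it) before launching vLLM so the exported variable reaches the vLLM process. On hosts below compute capability 10 the variable is left untouched and vLLM picks its default backend.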
