attention_backend_selection
Tier 1 · 70% confidence

performance-attention-backend-se-flashattention-3-is-not-supported-on-blackwell-gpu-061aef90

agent: performance

When does this happen?

IF FlashAttention 3 (FA3) is requested on a GPU that does not support it, namely a Blackwell GPU (compute capability >= 10.0) or a GPU with compute capability < 9.0, vLLM fails with a 'Cannot use FA version' error.
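The condition above reduces to a range check on the compute capability. The sketch below is a hypothetical helper (`fa3_supported` and `cc_to_int` are illustrative names, not part of vLLM). It assumes the rule exactly as stated here: FA3 is usable only when the compute capability is at least 9.0 and below 10.0.

```shell
#!/bin/sh
# Hypothetical helpers, not a vLLM API. They encode the rule stated above:
# FA3 is unavailable below compute capability 9.0 and on Blackwell (>= 10.0).

# Convert a "major.minor" compute capability (e.g. "8.6", "10.0") to an
# integer (86, 100) so it can be compared with POSIX test operators.
cc_to_int() {
    case "$1" in
        *.*) major=${1%%.*}; minor=${1#*.} ;;
        *)   major=$1; minor=0 ;;
    esac
    echo $((major * 10 + minor))
}

# Exit status 0 if FA3 is expected to work on this compute capability, 1 otherwise.
fa3_supported() {
    cc=$(cc_to_int "$1")
    [ "$cc" -ge 90 ] && [ "$cc" -lt 100 ]
}
```

For example, `fa3_supported 8.6` and `fa3_supported 10.0` both return a non-zero status, while `fa3_supported 9.0` succeeds.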

How others solved it

THEN on Blackwell GPUs, use the FlashInfer attention backend instead of FA3. On other unsupported architectures (compute capability < 9.0), set the environment variable VLLM_USE_AITER_UNIFIED_ATTENTION=1. Avoid setting VLLM_ATTENTION_BACKEND=3, as it triggers a fallback to the V0 engine.

# Blackwell (compute capability >= 10.0): select the FlashInfer backend (set env before launch)
export VLLM_ATTENTION_BACKEND=FLASHINFER
# Or, for architectures with compute capability < 9.0:
export VLLM_USE_AITER_UNIFIED_ATTENTION=1
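The choice between the two overrides can be automated in a launch script. The sketch below is a hypothetical helper (`attn_env_for` is an illustrative name, not part of vLLM) that prints the override this pattern recommends for a given compute capability. It assumes FlashInfer is selected via `VLLM_ATTENTION_BACKEND=FLASHINFER` and prints nothing for 9.x devices, where FA3 is expected to work.

```shell
#!/bin/sh
# Hypothetical launcher helper (not part of vLLM): prints the environment
# override recommended by this pattern for a "major.minor" compute capability.
attn_env_for() {
    # Turn "major.minor" into an integer, e.g. "8.6" -> 86, "10.0" -> 100.
    case "$1" in
        *.*) major=${1%%.*}; minor=${1#*.} ;;
        *)   major=$1; minor=0 ;;
    esac
    cc=$((major * 10 + minor))
    if [ "$cc" -ge 100 ]; then
        # Blackwell: FA3 is unavailable, so select the FlashInfer backend.
        echo "VLLM_ATTENTION_BACKEND=FLASHINFER"
    elif [ "$cc" -lt 90 ]; then
        # Below 9.0: use the override named in this pattern.
        echo "VLLM_USE_AITER_UNIFIED_ATTENTION=1"
    fi
    # Compute capability 9.x: FA3 is supported, so no override is printed.
}
```

On a machine with recent NVIDIA drivers, you might feed it the output of `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` and export the printed line before launching vLLM.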
