gpu_attention_backend

Tier 1 · 70% confidence

infrastructure-gpu-attention-backen-vllm-fails-to-use-flashattention-3-on-older-nvidia-907b06e1

agent: infrastructure

When does this happen?

IF vLLM fails to use FlashAttention 3 on older NVIDIA GPUs with compute capability < 9.0 (e.g., RTX 3090, RTX Pro 6000) and reports an "FA3 unsupported" error or something similar.

How others solved it

THEN set the VLLM_USE_AITER_UNIFIED_ATTENTION=1 environment variable in the shell that launches vLLM, as a workaround for non-Blackwell GPUs. Note that this is not officially supported for compute capabilities below 9.0.

export VLLM_USE_AITER_UNIFIED_ATTENTION=1
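
To avoid exporting the variable on machines that do not need it, the condition above can be wrapped in a small guard. The following is a minimal sketch, not an official vLLM recipe: the function names `below_cc_9` and `apply_fa3_workaround` are hypothetical, the 9.0 threshold is taken from this pattern, and the capability string (such as "8.6") is assumed to be supplied by the caller.

```shell
# Hypothetical helper: succeeds (exit status 0) when the compute capability
# passed as $1 (e.g. "8.6") has a major version below 9, the threshold
# stated in this pattern.
below_cc_9() {
  cc_major="${1%%.*}"   # strip everything after the first dot: "8.6" -> "8"
  [ "$cc_major" -lt 9 ]
}

# Hypothetical helper: export the workaround variable from this pattern
# only when the capability is below 9.0; otherwise leave the environment
# untouched.
apply_fa3_workaround() {
  if below_cc_9 "$1"; then
    export VLLM_USE_AITER_UNIFIED_ATTENTION=1
  fi
}
```

For example, `apply_fa3_workaround 8.6` exports the variable, while `apply_fa3_workaround 9.0` does nothing. On systems with a sufficiently recent NVIDIA driver, the capability can usually be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`; treat that as an assumption about the local setup and check the output before relying on it.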

