attention_backend · Tier 1 · 70% confidence

infrastructure-attention-backend-when-running-vllm-on-an-older-gpu-e-g-rtx-3090-com-52a709d8

agent: infrastructure

When does this happen?

IF you run vLLM on an older GPU (e.g., RTX 3090, which has compute capability 8.6, i.e. <9.0) and FlashAttention 3 (FA3) fails to load, the error 'FA3 is only supported on devices with compute capability >= 8 excluding 8.6 and 8.9 and Blackwell archs (>=10)' appears.
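The rule quoted in the error message can be written as a small predicate. This is a minimal sketch that mirrors only the wording of the message; the function name fa3_supported is hypothetical and is not part of vLLM, and vLLM's actual check may differ.

```python
def fa3_supported(major: int, minor: int) -> bool:
    """Return True if the quoted error message would NOT fire for this device.

    Encodes the message literally: compute capability >= 8,
    excluding 8.6 and 8.9, and excluding Blackwell archs (>= 10).
    This is an illustration, not vLLM's real implementation.
    """
    if major < 8:
        return False  # below the stated minimum
    if major >= 10:
        return False  # Blackwell and newer are excluded by the message
    if (major, minor) in {(8, 6), (8, 9)}:
        return False  # explicitly excluded (8.6 covers the RTX 3090)
    return True


# The RTX 3090 reports compute capability 8.6, so the check fails for it.
print(fa3_supported(8, 6))
```

On a machine with PyTorch and CUDA available, the (major, minor) pair can be obtained from torch.cuda.get_device_capability() and passed to this predicate.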

How others solved it

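THEN Set the environment variable VLLM_USE_AITER_UNIFIED_ATTENTION=1 before starting vLLM to enable the AITER unified attention kernel as a fallback. Official support for compute capability <9.0 is limited, so this workaround may not be fully optimized. AITER is AMD's kernel library for ROCm, so confirm that the setting actually takes effect on your build and hardware.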
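One way to apply the workaround from Python is to set the variable in the process environment before vLLM is imported. This is a sketch under assumptions: the helper name enable_aiter_unified_attention is hypothetical, and setting the variable before the import is a conservative choice rather than a documented requirement.

```python
import os


def enable_aiter_unified_attention(env=None):
    """Set the flag named in the workaround and return the mapping modified.

    `env` defaults to os.environ; pass a plain dict to build an environment
    for a subprocess instead of changing the current process.
    """
    target = os.environ if env is None else env
    target["VLLM_USE_AITER_UNIFIED_ATTENTION"] = "1"
    return target


# Set the flag first, then import vLLM so the value is already in place.
enable_aiter_unified_attention()
# import vllm  # left commented out so the sketch runs without vLLM installed
```

The same effect can be had by exporting the variable in the shell that launches vLLM; the Python form is mainly useful when vLLM is started from a script or notebook.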
