flash_attention_compatibility
Tier 1 · 70% confidence

performance-flash-attention-comp-flashattention-3-fa3-is-not-supported-on-nvidia-bl-0bdf1e8d

agent: performance

When does this happen?

IF vLLM is started on an NVIDIA Blackwell GPU (compute capability 10.0 or higher) with FlashAttention 3 (FA3) forced, for example via VLLM_FLASH_ATTN_VERSION=3 or a VLLM_ATTENTION_BACKEND setting that selects FA3, startup fails, because FA3 is not supported on Blackwell.
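A minimal pre-flight check for this condition, sketched in POSIX shell under a few assumptions: the function name fa3_conflict is hypothetical, it only inspects VLLM_FLASH_ATTN_VERSION (not VLLM_ATTENTION_BACKEND or vLLM's own backend selection), and the compute capability is read with `nvidia-smi --query-gpu=compute_cap`, which recent NVIDIA drivers support.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds (returns 0) when the GPU is
# Blackwell-class (compute capability major >= 10) and FA3 is forced
# through VLLM_FLASH_ATTN_VERSION=3. VLLM_ATTENTION_BACKEND is not inspected.
fa3_conflict() {
  local cap="$1"              # compute capability, e.g. "10.0", "12.0", "8.6"
  local major="${cap%%.*}"    # integer part before the first dot
  [ -n "$major" ] || return 1
  [ "$major" -ge 10 ] 2>/dev/null && [ "${VLLM_FLASH_ATTN_VERSION:-}" = "3" ]
}

# Check the first GPU if nvidia-smi is available on this host.
if command -v nvidia-smi >/dev/null 2>&1; then
  gpu_cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)"
  if fa3_conflict "$gpu_cap"; then
    echo "FA3 is forced on a compute capability $gpu_cap GPU; vLLM startup is expected to fail." >&2
  fi
fi
```

Run it in the same shell environment you would launch vLLM from, so it sees the same VLLM_FLASH_ATTN_VERSION value.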

How others solved it

THEN do not use FA3 on Blackwell GPUs. Use the FlashInfer attention kernel instead by setting the environment variable VLLM_USE_FLASHINFER_KERNELS=1. For GPUs with compute capability below 9.0 (e.g., RTX 3090), set VLLM_USE_AITER_UNIFIED_ATTENTION=1 as a fallback. In either case, remove any explicit VLLM_FLASH_ATTN_VERSION or VLLM_ATTENTION_BACKEND setting that forces FA3.

docker run --gpus all -e VLLM_USE_FLASHINFER_KERNELS=1 -p 8000:8000 vllm/vllm-openai:latest --model <model>
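The launch above can also be scripted so the attention setting follows the GPU's compute capability. This is a sketch, assuming the flag names exactly as reported in this pattern; attention_env_for is a hypothetical helper, and the script only prints the docker command rather than running it. Compute capability 9.x gets no extra setting because the pattern gives none for it.

```shell
#!/bin/sh
# Hypothetical helper: prints the attention setting this pattern recommends
# for a compute capability given as "major.minor" (flag names as reported above).
attention_env_for() {
  local cap="$1"
  local major="${cap%%.*}"    # integer part before the first dot
  if [ "$major" -ge 10 ]; then
    echo "VLLM_USE_FLASHINFER_KERNELS=1"        # Blackwell (>= 10.0): FlashInfer, not FA3
  elif [ "$major" -lt 9 ]; then
    echo "VLLM_USE_AITER_UNIFIED_ATTENTION=1"   # below 9.0, e.g. RTX 3090 (8.6)
  fi
  # 9.x prints nothing: the pattern gives no override for it.
}

# Dry run: print the launch command for a Blackwell GPU (compute capability 10.0).
# No -e VLLM_FLASH_ATTN_VERSION or -e VLLM_ATTENTION_BACKEND is passed,
# so this command does not force FA3 inside the container.
cap="10.0"
setting="$(attention_env_for "$cap")"
if [ -n "$setting" ]; then
  echo docker run --gpus all -e "$setting" -p 8000:8000 vllm/vllm-openai:latest --model '<model>'
else
  echo docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest --model '<model>'
fi
```

To match the local GPU, set cap from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` instead of hard-coding it.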

