cpu_attention_backend_mismatch

Tier 1 · 70% confidence

ai-agents-cpu-attention-backen-running-vllm-with-device-cpu-on-a-gpu-compiled-vll-4741c046

agent: ai_agents

When does this happen?

IF Running vLLM with --device cpu on a GPU-compiled vllm package raises TypeError: XFormersMetadata.__init__() got an unexpected keyword argument 'is_prompt' (or a similar error for FlashAttentionMetadata).

How others solved it

THEN Use a CPU-compiled vLLM installation (e.g., install 'vllm-cpu' instead of 'vllm'), or downgrade to v0.4.2, which does not have this regression. If building from source, ensure the 'is_prompt' parameter is added to the attention metadata class used by cpu_model_runner.py so that it matches the refactored abstract backend. For quick testing, also set the environment variable VLLM_CPU_KVCACHE_SPACE.
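For the environment variable mentioned above, a minimal Python sketch of preparing the environment before launching vLLM in a subprocess. The helper name cpu_env and the default of 4 are illustrative assumptions, not part of vLLM; the value is assumed to be a size in GiB, so check the documentation for your vLLM version.

```python
import os
from typing import Dict, Optional


def cpu_env(kvcache_gib: int = 4, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with VLLM_CPU_KVCACHE_SPACE set.

    cpu_env is a hypothetical helper. An existing value is kept, so an
    explicit setting made by the operator takes precedence.
    """
    if kvcache_gib <= 0:
        raise ValueError("kvcache_gib must be positive")
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base is None else base)
    env.setdefault("VLLM_CPU_KVCACHE_SPACE", str(kvcache_gib))
    return env
```

The returned mapping can be passed as the env argument of subprocess.run when starting the vLLM process, which keeps the setting scoped to that process.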

# Paraphrased: In cpu_model_runner.py, update _prepare_prompt so that it passes 'is_prompt' to attn_backend.make_metadata only if the backend expects it. Check the abstract backend for the current signature.
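The "pass it only if the backend expects it" check can be sketched with the standard inspect module. This is not vLLM's actual fix: LegacyMetadata, RefactoredMetadata, and make_metadata_compat are hypothetical stand-ins used only to illustrate inspecting a constructor signature before forwarding keyword arguments.

```python
import inspect
from dataclasses import dataclass


@dataclass
class LegacyMetadata:
    """Hypothetical stand-in for a metadata class without 'is_prompt'."""
    slot_mapping: list


@dataclass
class RefactoredMetadata:
    """Hypothetical stand-in for a metadata class that expects 'is_prompt'."""
    is_prompt: bool
    slot_mapping: list


def make_metadata_compat(metadata_cls, **kwargs):
    """Construct metadata_cls, dropping kwargs its constructor does not accept."""
    params = inspect.signature(metadata_cls).parameters
    # A **kwargs parameter means the constructor accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return metadata_cls(**kwargs)
    accepted = {k: v for k, v in kwargs.items() if k in params}
    return metadata_cls(**accepted)


# The same call works against both shapes of the class.
legacy = make_metadata_compat(LegacyMetadata, is_prompt=True, slot_mapping=[0, 1])
current = make_metadata_compat(RefactoredMetadata, is_prompt=True, slot_mapping=[0, 1])
```

Silently dropping arguments can hide a real version mismatch, so a filter like this is better suited to diagnosis than to a permanent patch.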
