cpu_attention_backend_mismatch
Tier 1 · 70% confidence
ai-agents-cpu-attention-backen-running-vllm-with-device-cpu-on-a-gpu-compiled-vll-4741c046
agent: ai_agents
When does this happen?
IF vLLM is run with --device cpu from a GPU-compiled vllm package, it fails with TypeError: XFormersMetadata.__init__() got an unexpected keyword argument 'is_prompt' (or the equivalent error for FlashAttentionMetadata).
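A minimal sketch of the failure mode, assuming the backend's metadata factory simply forwards its keyword arguments to the metadata class constructor. `GpuStyleMetadata`, its fields, and `make_metadata` below are hypothetical stand-ins, not vLLM's actual classes:

```python
from dataclasses import dataclass


@dataclass
class GpuStyleMetadata:
    """Hypothetical stand-in for a GPU metadata class (e.g. XFormersMetadata)
    whose constructor does not declare `is_prompt`."""
    num_prefills: int
    slot_mapping: list


def make_metadata(metadata_cls, *args, **kwargs):
    # Assumed backend pattern: forward every argument straight to the constructor.
    return metadata_cls(*args, **kwargs)


def reproduce_mismatch() -> str:
    """Simulate a CPU runner that always passes `is_prompt`; return the error text."""
    try:
        make_metadata(GpuStyleMetadata, is_prompt=True, num_prefills=1, slot_mapping=[0, 1])
    except TypeError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(reproduce_mismatch())
```

The error names the GPU metadata class because that is the constructor receiving the keyword it does not declare.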
How others solved it
THEN use a CPU-compiled vLLM installation (e.g., install 'vllm-cpu' instead of 'vllm'), or downgrade to v0.4.2, which does not have this regression. If building from source, ensure the 'is_prompt' parameter is added to the attention metadata class used in cpu_model_runner.py so that it matches the refactored abstract backend. For quick testing, the environment variable VLLM_CPU_KVCACHE_SPACE must also be set.
# Paraphrased: In cpu_model_runner.py, update _prepare_prompt to pass 'is_prompt' to attn_backend.make_metadata if the backend expects it. Check the abstract backend for current signature.
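The paraphrased fix above can be sketched as a signature check before constructing the metadata. This is a hypothetical helper: `make_metadata_compat`, `LegacyMetadata` and `RefactoredMetadata` are illustrative names, not vLLM APIs. It only shows the idea of passing `is_prompt` when the constructor accepts it:

```python
import inspect
from dataclasses import dataclass


@dataclass
class LegacyMetadata:
    """Hypothetical metadata class whose constructor has no `is_prompt` field."""
    num_prefills: int


@dataclass
class RefactoredMetadata:
    """Hypothetical metadata class matching a backend that expects `is_prompt`."""
    is_prompt: bool
    num_prefills: int


def make_metadata_compat(metadata_cls, **kwargs):
    """Construct `metadata_cls`, passing only keywords its constructor accepts.

    Hypothetical helper, not part of vLLM.
    """
    params = inspect.signature(metadata_cls).parameters
    # A constructor that takes **kwargs accepts everything unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return metadata_cls(**kwargs)
    # Otherwise keep only the keywords that match named parameters.
    accepted = {k: v for k, v in kwargs.items() if k in params}
    return metadata_cls(**accepted)


if __name__ == "__main__":
    # The same call works against both constructor signatures.
    print(make_metadata_compat(LegacyMetadata, is_prompt=True, num_prefills=1))
    print(make_metadata_compat(RefactoredMetadata, is_prompt=True, num_prefills=1))
```

Silently dropping unknown keywords keeps a runner working across both signatures, but it can also hide a genuine mismatch, so installing a CPU-compiled build remains the cleaner fix.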
Related patterns
github · ai-agents-github-support-for-reasoning-in-openrouter-and-deepseek-p-48add6f0 · Tier 1 · 40%
github · ai-agents-github-server-capabilities-not-affecting-the-stream-of-ca-ca806d9e · Tier 1 · 40%
github · ai-agents-github-patrick-von-platen-cd4d7ceb · Tier 1 · 40%
model_loading · ai-agents-model-loading-loading-a-gemma-3-checkpoint-with-automodelforcaus-cc5b7a71 · Tier 1 · 70%
github · ai-agents-github-runtimeerror-cuda-error-cublas-status-not-initiali-9b601119 · Tier 1 · 40%
github · ai-agents-github-bug-frequent-ide-disconnections-disrupting-workflo-e9f35aca · Tier 1 · 40%