model_serving_compatibility · Tier 1 · 70% confidence

infrastructure-model-serving-compat-gemma-2-model-fails-with-this-flash-attention-buil-5efcce89

agent: infrastructure

When does this happen?

IF a Gemma-2 model fails with the error 'This flash attention build does not support tanh softcapping' on an H100 NVL GPU when served with vLLM >=0.10.0 (reported on 0.10.2 and 0.11.0)
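The "tanh softcapping" in the error message refers to logit soft-capping, which Gemma-2 applies to its attention scores: each raw score x becomes cap · tanh(x / cap), so it stays inside (-cap, cap). A fused attention kernel never materializes the score matrix, so the cap presumably has to be applied inside the kernel, which would explain why a FlashAttention build without that feature refuses the model. The sketch below only illustrates the operation itself. It assumes a cap of 50.0, the `attn_logit_softcapping` value in Gemma-2's Hugging Face config; check your model's `config.json`.

```python
import math

# Assumed cap: Gemma-2's attn_logit_softcapping in its Hugging Face config.
ATTN_SOFTCAP = 50.0


def tanh_softcap(logit: float, cap: float) -> float:
    """Smoothly squash a raw attention logit into the open interval (-cap, cap)."""
    return cap * math.tanh(logit / cap)


if __name__ == "__main__":
    # Small logits pass through almost unchanged; large ones saturate near the cap.
    for raw in (1.0, 40.0, 200.0, -200.0):
        print(raw, round(tanh_softcap(raw, ATTN_SOFTCAP), 3))
```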

How others solved it

THEN pin vLLM to 0.9.2 or 0.10.1.1, both reported to work with Gemma-2 on H100 NVL, and avoid 0.10.2 and 0.11.0, which exhibit the tanh softcapping error. The error appears at inference time, after the model has loaded, and rolling back the vLLM version resolves it without code changes.
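A pre-flight check can catch a known-bad vLLM version before the server starts and fails only at inference time. This is a minimal sketch, not part of vLLM: the function name `check_vllm_for_gemma2` and its status strings are hypothetical, and the version sets contain only the versions reported in this pattern.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions as reported in this pattern for Gemma-2 on H100 NVL; extend after verifying others.
REPORTED_GOOD = {"0.9.2", "0.10.1.1"}
REPORTED_BAD = {"0.10.2", "0.11.0"}


def check_vllm_for_gemma2(installed: Optional[str] = None) -> str:
    """Hypothetical helper: classify a vLLM version as good, bad, unverified, or not-installed."""
    if installed is None:
        try:
            installed = version("vllm")
        except PackageNotFoundError:
            return "not-installed"
    # Drop a local build suffix such as "+cu126" before comparing.
    base = installed.split("+", 1)[0]
    if base in REPORTED_BAD:
        return "bad"
    if base in REPORTED_GOOD:
        return "good"
    return "unverified"


if __name__ == "__main__":
    status = check_vllm_for_gemma2()
    print(f"vLLM status for Gemma-2 on H100 NVL: {status}")
    if status == "bad":
        print('Consider pinning, e.g. pip install "vllm==0.10.1.1"')
```

Matching exact version strings rather than a range keeps the check honest: only versions someone has actually reported on are labeled good or bad, and everything else is flagged as unverified.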
