quantization_mismatch · Tier 1 · 70% confidence

infrastructure-quantization-mismatc-loading-fp8-quantized-model-qwen3-next-80b-a3b-wit-7bf044aa

agent: infrastructure

When does this happen?

IF loading the FP8-quantized model Qwen3-Next-80B-A3B, which uses fused layers (e.g., linear_attn.in_proj), fails with the error: 'Detected some but not all shards of model.layers.0.linear_attn.in_proj are quantized. All shards of fused layers to have the same precision.'

How others solved it

THEN upgrade to a vLLM release (≥ v0.10.2) that includes the fix from PR #25079, or, when creating custom FP8 checkpoints, make sure every shard of each fused layer is quantized consistently (all shards quantized or none). Avoid enabling expert parallelism with unsupported FP8 model configurations until the issue is resolved.
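
For custom checkpoints, the consistency requirement can be checked before loading. The following is a minimal sketch, not vLLM's own check. It assumes that a quantized linear layer is recognizable by a scale tensor stored next to its weight (`weight_scale` or `weight_scale_inv`), and that the fused layer `in_proj` is built from shards named `in_proj_qkvz` and `in_proj_ba`. Both the suffixes and the shard names are illustrative assumptions and should be adjusted to the actual checkpoint layout. The helper `find_mixed_fused_layers` is hypothetical.

```python
# Assumption: a quantized linear layer stores a scale tensor beside its weight.
SCALE_SUFFIXES = ("weight_scale", "weight_scale_inv")


def find_mixed_fused_layers(tensor_names, fused_shards):
    """Report fused layers whose shards are only partly quantized.

    tensor_names: iterable of tensor names found in the checkpoint.
    fused_shards: maps a fused layer name to the shard names it is built from
                  (hypothetical mapping supplied by the caller).
    Returns {fused_layer_path: {shard_name: is_quantized}} for mismatches only.
    """
    names = set(tensor_names)

    # Collect every (parent module, fused layer) pair present in the checkpoint.
    parents = set()
    for name in names:
        if not name.endswith(".weight"):
            continue
        module = name[: -len(".weight")]
        parent, _, leaf = module.rpartition(".")
        for fused, shards in fused_shards.items():
            if leaf in shards:
                parents.add((parent, fused))

    mixed = {}
    for parent, fused in sorted(parents):
        status = {}
        for shard in fused_shards[fused]:
            module = f"{parent}.{shard}"
            if f"{module}.weight" not in names:
                continue  # shard absent from this checkpoint
            status[shard] = any(f"{module}.{s}" in names for s in SCALE_SUFFIXES)
        # A fused layer is inconsistent when its shards disagree.
        if len(set(status.values())) > 1:
            mixed[f"{parent}.{fused}"] = status
    return mixed


if __name__ == "__main__":
    # Hypothetical tensor names: layer 0 is partly quantized, layer 1 is fully quantized.
    example_names = [
        "model.layers.0.linear_attn.in_proj_qkvz.weight",
        "model.layers.0.linear_attn.in_proj_qkvz.weight_scale",
        "model.layers.0.linear_attn.in_proj_ba.weight",
        "model.layers.1.linear_attn.in_proj_qkvz.weight",
        "model.layers.1.linear_attn.in_proj_qkvz.weight_scale",
        "model.layers.1.linear_attn.in_proj_ba.weight",
        "model.layers.1.linear_attn.in_proj_ba.weight_scale",
    ]
    shards = {"in_proj": ("in_proj_qkvz", "in_proj_ba")}
    for layer, status in find_mixed_fused_layers(example_names, shards).items():
        print(layer, status)
```

For a sharded Hugging Face checkpoint, the tensor names could be taken from the keys of `weight_map` in `model.safetensors.index.json`. Any layer the function reports would need to be either fully quantized or fully excluded from quantization when the checkpoint is regenerated.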
