model_accuracy_bug · Tier 1 · 70% confidence


agent: performance

When does this happen?

IF a Gemma (Gemma3) model is run in float32 precision, the embedding scaling factor (hidden_size**0.5) evaluates to about 33.94 instead of 34.0, the value it rounds to in bfloat16 and the value the model was trained with. The mismatch causes logit divergence and accuracy degradation.
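The rounding can be checked without any ML library. The following is a minimal sketch in plain Python; the helper names are hypothetical, and hidden_size = 1152 is an assumption chosen because its square root matches the 33.94 figure quoted above.

```python
import math
import struct


def round_to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    bfloat16 is the top 16 bits of a float32, so we round the float32
    bit pattern to a multiple of 2**16 and clear the low half.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)                  # round half to even
    bits &= 0xFFFF0000                                   # drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


hidden_size = 1152  # assumption: consistent with the 33.94 value in the text

scale_fp32 = round_to_float32(math.sqrt(hidden_size))
scale_bf16 = round_to_bfloat16(math.sqrt(hidden_size))

print(f"float32 scale:  {scale_fp32:.4f}")
print(f"bfloat16 scale: {scale_bf16:.4f}")
```

Between 32 and 64, bfloat16 values are spaced 0.25 apart, so the exact root falls between 33.75 and 34.0 and rounds up to 34.0. The gap of roughly 0.06 is applied to every embedding vector, which is why it shows up in the logits.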

How others solved it

THEN Always round the embedding scale to bfloat16, regardless of the model's overall dtype: compute hidden_size**0.5, cast the result to bfloat16, and only then use it for scaling. For transformers implementations, modify the forward method so that embed_scale is computed as hidden_size**0.5 and immediately cast to bfloat16, rather than being cast straight to the weight dtype, which skips the bfloat16 rounding when the weights are float32.

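import torch
# Instead of: self.embed_scale.to(self.weight.dtype)
# Do: build the scale as a bfloat16 tensor (a plain Python float has no .to() method)
embed_scale_bf16 = torch.tensor(self.config.hidden_size ** 0.5, dtype=torch.bfloat16)
self.embed_scale = embed_scale_bf16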
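
For context, here is how the fix might sit inside a complete embedding layer. This is a minimal sketch assuming PyTorch; the class name ScaledWordEmbedding, its constructor arguments, and the sizes used are hypothetical and are not taken from the transformers source.

```python
import torch
from torch import nn


class ScaledWordEmbedding(nn.Embedding):
    """Hypothetical embedding layer that multiplies its output by a fixed scale."""

    def __init__(self, num_embeddings: int, embedding_dim: int, padding_idx: int = 0):
        super().__init__(num_embeddings, embedding_dim, padding_idx)
        # Round the scale to bfloat16 once, at construction time.
        scale = torch.tensor(embedding_dim ** 0.5, dtype=torch.bfloat16)
        # Non-persistent buffer: moves with the module, is not saved in state_dict.
        self.register_buffer("embed_scale", scale, persistent=False)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # The cast to the weight dtype happens after the bfloat16 rounding,
        # so a float32 model multiplies by the rounded value, not the exact root.
        return super().forward(input_ids) * self.embed_scale.to(self.weight.dtype)


# Assumed sizes for illustration only.
embed = ScaledWordEmbedding(num_embeddings=16, embedding_dim=1152).to(torch.float32)
out = embed(torch.tensor([[1, 2, 3]]))
print(embed.embed_scale.item(), out.dtype)
```

Because the value is rounded before any later dtype conversion, casting the whole module to float32 leaves the scale at the bfloat16-rounded value.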
