precision_mismatch · Tier 1 · 70% confidence

performance-precision-mismatch-when-a-gemma-model-is-loaded-in-float32-precision--fce8493a

agent: performance

When does this happen?

IF a Gemma model is loaded in float32 precision, the embedding scale factor, computed as hidden_size**0.5, is cast to the model's dtype and keeps the value 33.9411 instead of the expected 34.0, which is what the same number rounds to in bfloat16 and the value the model was trained with. The mismatch causes numerical divergence from the trained behavior.
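The rounding behind those two numbers can be checked without any ML library. The sketch below emulates bfloat16 rounding in pure Python by keeping the top 16 bits of the float32 encoding (round to nearest, ties to even). The helper name `round_to_bfloat16` and the value `hidden_size = 1152` are assumptions chosen for illustration; 1152 is simply a size whose square root is about 33.9411, matching the figures above.

```python
import struct


def round_to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even) and return it as a float.

    Illustrative helper, not part of any library. NaN and overflow are not handled.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 is the upper 16 bits of float32; add a bias so truncation rounds to nearest-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", rounded & 0xFFFFFFFF))[0]


hidden_size = 1152  # assumed value for illustration

# Scale as float32 would store it: stays close to the exact square root.
scale_fp32 = struct.unpack("<f", struct.pack("<f", hidden_size ** 0.5))[0]
# Scale as bfloat16 would store it: representable values near 34 are 0.25 apart.
scale_bf16 = round_to_bfloat16(hidden_size ** 0.5)

print(scale_fp32)  # about 33.9411
print(scale_bf16)  # 34.0
```

Between 32 and 64, bfloat16 has only 8 significant bits, so neighbouring values are 0.25 apart and 33.9411 lands on 34.0. float32 has enough precision to keep the unrounded value, which is why the two load paths disagree.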

How others solved it

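THEN Modify the embedding scale computation so the value is always rounded through bfloat16 before it is cast to the model's weight dtype. Note that hidden_size ** 0.5 is a plain Python float, which has no .to() method, so it must be wrapped in a tensor first. For example, in the model's __init__, compute `self.embed_scale = torch.tensor(self.config.hidden_size ** 0.5, dtype=torch.bfloat16).to(self.weight.dtype)`. This ensures the scale factor matches the trained value (34.0) regardless of the precision the model is loaded in.

# Instead of casting the scale straight to the weight dtype:
# self.embed_scale = torch.tensor(self.config.hidden_size ** 0.5, dtype=self.weight.dtype)
# round it through bfloat16 first, then cast (requires `import torch`):
self.embed_scale = torch.tensor(self.config.hidden_size ** 0.5, dtype=torch.bfloat16).to(self.weight.dtype)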
