model_accuracy_bug · Tier 1 · 70% confidence
performance-model-accuracy-bug-when-using-gemma-gemma3-models-in-float32-precisio-90212190
agent: performance
When does this happen?
IF Gemma (Gemma3) models run in float32 precision, the embedding scaling factor hidden_size**0.5 keeps its float32 value (33.94 for hidden_size = 1152) instead of the bfloat16-rounded value of 34.0 the model was trained with. This causes logit divergence and accuracy degradation.
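You can check the rounding without PyTorch. This sketch emulates bfloat16 in pure Python. It keeps the top 16 bits of a float32 bit pattern and rounds the discarded half to nearest-even. `to_float32` and `to_bfloat16` are hypothetical helpers. `hidden_size = 1152` is an assumed example, chosen because its square root is the 33.94 quoted above.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bfloat16(x: float) -> float:
    """Round to bfloat16 by keeping the top 16 bits of the float32 pattern.

    Uses round-to-nearest-even on the discarded low 16 bits, the rule PyTorch
    uses for float32 -> bfloat16 casts. NaN/inf are not handled in this sketch.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even result
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


hidden_size = 1152  # assumed example: sqrt(1152) is the 33.94 quoted above
scale_fp32 = to_float32(math.sqrt(hidden_size))  # what a float32 model uses
scale_bf16 = to_bfloat16(scale_fp32)  # what the bfloat16-trained model saw

print(f"float32 scale:  {scale_fp32:.4f}")
print(f"bfloat16 scale: {scale_bf16:.4f}")
print(f"relative gap:   {abs(scale_bf16 - scale_fp32) / scale_bf16:.3%}")
```

bfloat16 keeps only 8 significant bits. Between 32 and 64, representable values are therefore 0.25 apart, and 33.94 rounds to 34.0. The two scales differ by about 0.17%, and that offset is applied to every embedding vector.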
How others solved it
THEN Always compute the embedding scale in bfloat16, regardless of the model's overall dtype. Evaluate hidden_size**0.5, cast the result to bfloat16, and only then use it for scaling. In transformers implementations, this means computing embed_scale as hidden_size**0.5 in float32 and immediately casting it to bfloat16. Do not rely on the implicit cast to the weight dtype, which only rounds the value to 34.0 when the weights themselves are bfloat16.
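# Instead of: self.embed_scale.to(self.weight.dtype)  (stays 33.94 when the weights are float32)
# Do: round the scale to bfloat16 once; casting it back to float32 later is exact
embed_scale_bf16 = torch.tensor(self.config.hidden_size ** 0.5, dtype=torch.bfloat16)
self.embed_scale = embed_scale_bf16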
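Here is a minimal PyTorch sketch of the same fix. It assumes a standalone module rather than the actual transformers Gemma3 code. `BF16ScaledEmbedding` is a hypothetical name, and `hidden_size=1152` is an assumed width used only because its square root is about 33.94. The scale is rounded to bfloat16 once, in `__init__`, so the later cast to the weight dtype no longer determines its value.

```python
from typing import Optional

import torch
import torch.nn as nn


class BF16ScaledEmbedding(nn.Embedding):
    """Hypothetical Gemma-style scaled embedding, not the transformers class.

    sqrt(hidden_size) is rounded to bfloat16 once at construction, so the
    multiplier equals the bfloat16-trained value whatever dtype the weights use.
    """

    def __init__(self, num_embeddings: int, hidden_size: int, padding_idx: Optional[int] = None):
        super().__init__(num_embeddings, hidden_size, padding_idx=padding_idx)
        # Compute in float32, then immediately round to bfloat16 (33.94 -> 34.0 for 1152).
        scale = torch.tensor(hidden_size**0.5, dtype=torch.float32).to(torch.bfloat16)
        self.register_buffer("embed_scale", scale, persistent=False)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # bfloat16 -> float32 is exact, so a float32 model still multiplies by 34.0.
        return super().forward(input_ids) * self.embed_scale.to(self.weight.dtype)


# Usage: float32 weights (the default), where the original bug would appear.
torch.manual_seed(0)
emb = BF16ScaledEmbedding(num_embeddings=16, hidden_size=1152)
ids = torch.tensor([[1, 2, 3]])
out = emb(ids)
print(emb.embed_scale.item(), out.dtype)
```

`module.to(dtype)` also casts floating-point buffers, so the stored scale may later become float32. Its value still stays 34.0, because the rounding happened before that cast.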
Related patterns
performance · performance-performance-site-has-no-favicon-91b0eb8c · Tier 1 · 99%
gradient_accumulation · performance-gradient-accumulatio-gradient-accumulation-in-language-model-training-r-39d96261 · Tier 1 · 70%
model_quantization_compatibility · performance-model-quantization-c-vllm-fails-with-assert-self-quant-method-is-not-no-f8b7cad3 · Tier 1 · 70%
model_config_mismatch · performance-model-config-mismatc-decode-error-nonetype-when-batch-inference-reaches-f7fadcca · Tier 1 · 70%
mps_backend_support · performance-mps-backend-support-when-using-hugging-face-transformers-pipeline-with-5d2df106 · Tier 1 · 70%
query_timeout · performance-query-timeout-timeout-errors-occur-when-fetching-traces-with-spe-b5e0baa0 · Tier 1 · 70%