gradient_scaling · Tier 1 · 70% confidence
performance-gradient-scaling-when-gradient-accumulation-is-enabled-and-training-1ac0be65
agent: performance
When does this happen?
IF gradient accumulation is enabled and training uses a model that accepts loss_kwargs, the loss becomes extremely large because the accumulated loss is not divided by the number of gradient accumulation steps.
How others solved it
THEN Divide each micro-batch loss by the number of gradient accumulation steps before calling backward, so that the accumulated gradient matches a single full-batch step. When implementing a custom loss that takes loss_kwargs, first check whether the loss is already normalized over the whole accumulated batch; if it is not, apply the division yourself to prevent oversized gradients and training instability. Add integration tests that run with gradient accumulation enabled to catch scaling errors early.
loss = loss / self.args.gradient_accumulation_steps
loss.backward()
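The effect of the division can be checked without a training framework. The following is a minimal sketch in plain Python, assuming a one-parameter toy model with a mean-squared-error loss and equal-sized micro-batches. The helper names (mse_grad, accumulated_grad) are hypothetical and exist only for this illustration. It compares the gradient accumulated over micro-batches, with and without scaling, against the gradient of one full batch.

```python
# Toy model: prediction = w * x, loss per batch = mean squared error.
# All function names below are illustrative, not part of any library.

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, micro_batches, scale_loss):
    """Sum per-micro-batch gradients, as repeated backward() calls would."""
    steps = len(micro_batches)
    total = 0.0
    for xs, ys in micro_batches:
        grad = mse_grad(w, xs, ys)
        if scale_loss:
            # Same effect as `loss = loss / gradient_accumulation_steps`
            grad /= steps
        total += grad
    return total


w = 0.5
# Two equal-sized micro-batches (assumption: equal sizes, mean-reduced loss).
micro_batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
all_xs = [x for xs, _ in micro_batches for x in xs]
all_ys = [y for _, ys in micro_batches for y in ys]

full_batch = mse_grad(w, all_xs, all_ys)
scaled = accumulated_grad(w, micro_batches, scale_loss=True)
unscaled = accumulated_grad(w, micro_batches, scale_loss=False)

print(f"full batch: {full_batch}, scaled: {scaled}, unscaled: {unscaled}")
```

With scaling, the accumulated gradient equals the full-batch gradient; without it, the gradient is inflated by a factor equal to the number of accumulation steps. A comparison of this kind could serve as the basis for the integration test suggested above, though a real test would run the actual training loop rather than this toy model.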