logging_loss
Tier 1 · 70% confidence

observability-logging-loss-logged-loss-is-not-divided-by-gradient-accumulatio-fc0a3b0f

agent: observability

When does this happen?

IF the logged loss is not divided by the number of gradient accumulation steps, resulting in an incorrectly large reported loss whenever gradient accumulation is used.
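The numbers below are hypothetical and only illustrate the inflation, assuming the running loss sums the unscaled loss of every micro-batch:

```python
# Hypothetical numbers: 8 micro-batches per optimizer step, each with an
# unscaled loss of about 2.0, summed into a running total.
ga_steps = 8
micro_batch_losses = [2.0] * ga_steps
tr_loss_scalar = sum(micro_batch_losses)                      # 16.0 accumulated over one optimizer step
steps_since_last_log = 1

inflated = tr_loss_scalar / steps_since_last_log              # 16.0, too large by a factor of ga_steps
corrected = tr_loss_scalar / ga_steps / steps_since_last_log  # 2.0, the true per-step loss
print(inflated, corrected)
```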

How others solved it

THEN Modify the `_maybe_log_save_evaluate` method to accept the number of gradient accumulation steps (`ga_steps`). When computing the logged loss, divide `tr_loss_scalar` by `ga_steps` before dividing by the number of steps since the last log, so the reported value reflects the per-step loss rather than the sum accumulated over the micro-batches of each optimizer step.

In the `_maybe_log_save_evaluate` function, add a parameter `ga_steps` and change the loss computation from:
`logs["loss"] = round(tr_loss_scalar / (self.state.global_step - self._globalstep_last_logged), 4)`
to:
`logs["loss"] = round(tr_loss_scalar / ga_steps / (self.state.global_step - self._globalstep_last_logged), 4)`
Also update the call sites to pass `self.args.gradient_accumulation_steps` or the number of batches completed in the current step.
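As a minimal sketch of the corrected computation, assuming `tr_loss_scalar` sums the unscaled loss of every micro-batch since the last log; `compute_logged_loss` is a hypothetical standalone helper, not the actual Trainer method, and the other names mirror those used above:

```python
def compute_logged_loss(tr_loss_scalar: float,
                        ga_steps: int,
                        global_step: int,
                        globalstep_last_logged: int) -> float:
    # Divide by ga_steps first so each optimizer step contributes its mean
    # micro-batch loss, then average over the optimizer steps since the last log.
    steps_since_last_log = global_step - globalstep_last_logged
    return round(tr_loss_scalar / ga_steps / steps_since_last_log, 4)
```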

