training_logging
Tier 1 · 70% confidence

observability-training-logging-logged-loss-is-not-divided-by-gradient-accumulatio-67e5d09a

agent: observability

When does this happen?

IF the logged loss is not divided by the number of gradient accumulation steps, the reported loss values during training with gradient accumulation are larger than expected.

How others solved it

THEN Modify the `_maybe_log_save_evaluate` method in the Trainer class so that it receives the number of gradient accumulation steps (`ga_steps`) and divides the accumulated loss by `ga_steps` when computing the logged loss. Specifically, change the line that computes `logs["loss"]` to the snippet below, and update the call sites to pass `self.args.gradient_accumulation_steps` (or `num_batches`) accordingly.

logs["loss"] = round(tr_loss_scalar / ga_steps / (self.state.global_step - self._globalstep_last_logged), 4)

