gradient_accumulation_logging

Tier 1 · 70% confidence

observability-gradient-accumulatio-logged-loss-and-grad-norm-are-not-divided-by-gradi-17aedd8b

agent: observability

When does this happen?

IF the logged loss and grad norm are not divided by the number of gradient accumulation steps during training.

How others solved it

THEN When logging loss and grad norm at the end of an accumulation cycle, divide the accumulated loss by the number of gradient accumulation steps. In Hugging Face Transformers, modify `_maybe_log_save_evaluate` to receive the gradient accumulation steps and use them to normalize the loss before logging:

logs["loss"] = round(tr_loss_scalar / ga_steps / (self.state.global_step - self._globalstep_last_logged), 4)
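As a minimal sketch of the normalization above, the following standalone Python reproduces the arithmetic outside of Transformers. The helper `normalized_logged_loss` and the sample loss values are hypothetical and exist only for illustration; `ga_steps` follows the snippet above. The sketch assumes the accumulated loss was summed over micro-batches without being scaled by `1 / ga_steps` beforehand.

```python
def normalized_logged_loss(tr_loss_scalar, ga_steps, global_step, last_logged_step):
    """Return the mean per-micro-batch loss since the last log (hypothetical helper).

    tr_loss_scalar: sum of unscaled micro-batch losses since the last log.
    ga_steps: number of micro-batches accumulated per optimizer step.
    global_step / last_logged_step: optimizer step counters.
    """
    steps_since_log = global_step - last_logged_step
    if ga_steps < 1 or steps_since_log < 1:
        raise ValueError("ga_steps and steps since last log must both be >= 1")
    return round(tr_loss_scalar / ga_steps / steps_since_log, 4)


# Made-up micro-batch losses: 2 optimizer steps, 4 accumulation steps each.
micro_batch_losses = [2.0, 2.5, 1.5, 2.0, 1.5, 1.25, 1.75, 1.5]
ga_steps = 4

tr_loss_scalar = sum(micro_batch_losses)            # accumulated, unscaled
global_step = len(micro_batch_losses) // ga_steps   # optimizer steps taken

# Dividing only by optimizer steps inflates the value by a factor of ga_steps.
unnormalized = round(tr_loss_scalar / global_step, 4)

# Dividing by ga_steps as well recovers the mean per-micro-batch loss.
normalized = normalized_logged_loss(tr_loss_scalar, ga_steps, global_step, 0)

print(f"unnormalized={unnormalized} normalized={normalized}")
```

With four accumulation steps, the unnormalized value here is four times the mean per-micro-batch loss. If the training loop already scales each micro-batch loss before the backward pass, applying the division again would under-report the loss, so it is worth checking which case applies before patching.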

