gradient_accumulation_logging
Tier 1 · 70% confidence
observability-gradient-accumulatio-logged-loss-and-grad-norm-are-not-divided-by-gradi-17aedd8b
agent: observability
When does this happen?
IF the logged loss and grad norm are not divided by the number of gradient accumulation steps during training.
How others solved it
THEN when logging loss and grad norm at the end of an accumulation cycle, divide the accumulated loss by the number of gradient accumulation steps. In Hugging Face Transformers, modify `_maybe_log_save_evaluate` to receive the gradient accumulation steps and use them to normalize the loss before logging:
logs["loss"] = round(tr_loss_scalar / ga_steps / (self.state.global_step - self._globalstep_last_logged), 4)
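The normalization above can be sketched outside the Trainer as a small standalone function. This is a minimal illustration, not the library's code: `normalized_logged_loss` and `last_logged_step` are hypothetical names, and it assumes the per-micro-batch losses were summed into `tr_loss_scalar` without being scaled by `1 / ga_steps` first. If the training loop already scales each micro-batch loss, dividing again would under-report the loss.

```python
def normalized_logged_loss(tr_loss_scalar: float, ga_steps: int,
                           global_step: int, last_logged_step: int) -> float:
    """Average loss per micro-batch since the last log (hypothetical helper)."""
    if ga_steps < 1:
        raise ValueError("ga_steps must be >= 1")
    steps_since_log = global_step - last_logged_step
    if steps_since_log < 1:
        raise ValueError("no optimizer steps since the last log")
    # Divide by accumulation steps first, then by optimizer steps since last log.
    return round(tr_loss_scalar / ga_steps / steps_since_log, 4)


# Toy run: 3 optimizer steps, 4 micro-batches each, every micro-batch loss is 2.0.
ga_steps = 4
micro_losses = [2.0] * (3 * ga_steps)
tr_loss_scalar = sum(micro_losses)  # summed without scaling (assumption)

# Dividing only by optimizer steps inflates the value by a factor of ga_steps.
unnormalized = round(tr_loss_scalar / 3, 4)
logs = {"loss": normalized_logged_loss(tr_loss_scalar, ga_steps,
                                       global_step=3, last_logged_step=0)}
print(unnormalized, logs["loss"])
```

In this toy run the unnormalized value is four times the true per-micro-batch loss, which is the symptom the pattern describes.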
Related patterns
otel_regression_span_processor
observability-otel-regression-span-using-phoenix-otel-register-with-auto-instrument-t-a6b71580
Tier 1 · 70%
tracing_disabling
observability-tracing-disabling-tracing-prompts-repeatedly-appear-during-crew-exec-15ec9c27
Tier 1 · 70%
async_generator_output
observability-async-generator-outp-when-using-observe-on-an-async-generator-function--b87414ca
Tier 1 · 70%
trace_name_overwrite
observability-trace-name-overwrite-when-using-start-as-current-span-with-trace-contex-d131777c
Tier 1 · 70%
version_upgrade_bug
observability-version-upgrade-bug-using-arize-phoenix-otel-version-0-10-0-with-regis-794aa48f
Tier 1 · 70%
streaming_cost_tracking
observability-streaming-cost-track-streaming-api-calls-via-litellm-proxy-missing-cost-db149eb2
Tier 1 · 70%