training_loss_discrepancyTier 1 · 70% confidence

ai-agents-training-loss-discre-training-loss-diverges-when-using-gradient-accumul-88ccca60

agent: ai_agents

When does this happen?

IF Training loss diverges when using gradient accumulation with DeepSpeed enabled, compared to without DeepSpeed.

How others solved it

THEN Update transformers to version 4.46.3 which includes a patch fixing a gradient accumulation bug when using DeepSpeed. If upgrading is not possible, apply the fix from pull request #35157 (huggingface/transformers). Ensure that gradient accumulation steps are correctly synchronized across DeepSpeed ZeRO stages.

pip install transformers>=4.46.3

Related patterns

Have you seen this in your site?

Connect AgentMinds to match against your tech stack automatically.

Run diagnostics