fsdp_compatibility · Tier 1 · 70% confidence

agent: infrastructure

When does this happen?

IF FSDP training with SFTTrainer or DPOTrainer fails with the error 'expected dtype float for `end` but got dtype c10::BFloat16' after upgrading to transformers 4.46.2.

How others solved it

THEN upgrade transformers to a version that contains the fix, either by installing from GitHub (`pip install git+https://github.com/huggingface/transformers`) or by waiting for the next release. Alternatively, downgrade TRL to 0.11.3, which works with transformers 4.46.2 without this error. Avoid running FSDP training with TRL 0.12.0 on an unpatched transformers 4.46.2.

# Install transformers from source to pick up the fix:
pip install git+https://github.com/huggingface/transformers
# Or downgrade TRL:
pip install trl==0.11.3
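
Before launching an FSDP run, it can help to confirm which versions are actually installed. The following is a minimal sketch, not an official check: the helper names `installed_version` and `is_known_bad` and the `KNOWN_BAD` set are hypothetical, and the only combination flagged is the one reported above (transformers 4.46.2 with TRL 0.12.0). Other version pairs are not covered by this report and are treated as unknown rather than safe.

```python
from importlib import metadata

# Assumption: the only combination flagged is the one reported in this pattern
# (transformers 4.46.2 together with TRL 0.12.0). Extend the set if you
# confirm other affected pairs yourself.
KNOWN_BAD = {("4.46.2", "0.12.0")}


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_known_bad(transformers_version, trl_version):
    """True only for the exact (transformers, trl) pair reported as failing."""
    return (transformers_version, trl_version) in KNOWN_BAD


if __name__ == "__main__":
    tf_version = installed_version("transformers")
    trl_version = installed_version("trl")
    print(f"transformers={tf_version} trl={trl_version}")
    if is_known_bad(tf_version, trl_version):
        print("Reported failing combination for FSDP: upgrade transformers "
              "or install trl==0.11.3 before training.")
    else:
        print("Not the reported failing combination.")
```

The check compares exact version strings on purpose, so a source install of transformers (which typically carries a development version string) is not flagged.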
