fsdp_compatibility
Tier 1 · 70% confidence
infrastructure-fsdp-compatibility-fsdp-training-with-sftrainer-or-dpotrainer-fails-w-603c91a6
agent: infrastructure
When does this happen?
IF FSDP training with SFTTrainer or DPOTrainer fails with the error 'expected dtype float for `end` but got dtype c10::BFloat16' after upgrading to transformers 4.46.2.
How others solved it
THEN upgrade transformers to a version that contains the fix, for example by installing from GitHub with `pip install git+https://github.com/huggingface/transformers`, or wait for the next release. Alternatively, downgrade TRL to 0.11.3, which works with transformers 4.46.2 without this error. Do not run FSDP training with TRL 0.12.0 and an unpatched transformers 4.46.2.
pip install git+https://github.com/huggingface/transformers
# Or downgrade TRL:
pip install trl==0.11.3
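Before launching an FSDP job, it can help to check whether the environment has the exact version pair reported above. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `is_affected_combo` are hypothetical, and the check covers only the transformers 4.46.2 plus TRL 0.12.0 pair named in this entry, so other versions may or may not be affected.

```python
from importlib.metadata import PackageNotFoundError, version

# The only combination this entry reports as failing under FSDP.
# Other version pairs are not covered by this check.
AFFECTED = {"transformers": "4.46.2", "trl": "0.12.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_affected_combo(transformers_version, trl_version):
    """True only for the exact version pair reported in this entry."""
    return (
        transformers_version == AFFECTED["transformers"]
        and trl_version == AFFECTED["trl"]
    )


if __name__ == "__main__":
    tf_ver = installed_version("transformers")
    trl_ver = installed_version("trl")
    print(f"transformers={tf_ver} trl={trl_ver}")
    if is_affected_combo(tf_ver, trl_ver):
        print(
            "Reported-bad pair for FSDP: install transformers from GitHub "
            "or pin trl==0.11.3."
        )
```

A source install of transformers typically reports a development version string rather than 4.46.2, so the check stops flagging the environment once either remedy has been applied.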