fsdp_activation_checkpointing

Tier 1 · 70% confidence

performance-fsdp-activation-chec-when-using-fsdp-with-activation-checkpointing-enab-0e7030a9

agent: performance

When does this happen?

IF FSDP is used with activation checkpointing enabled via `fsdp_config.activation_checkpointing`, training fails with a 'Recomputed tensor size does not match' error.

How others solved it

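THEN Pass `use_cache=False` in the model kwargs when loading the model. Checking only gradient checkpointing is not enough, because FSDP activation checkpointing also recomputes the forward pass, and the key/value cache is the likely reason the recomputed tensors differ from the originals. Change the condition to `use_cache=not (sft_config.gradient_checkpointing or sft_config.fsdp_config.activation_checkpointing)` so the cache is disabled in both cases. Also avoid setting `use_reentrant=True` in the gradient checkpointing kwargs, as it may cause convergence issues.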

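from transformers import AutoModelForCausalLM

# sft_config is the training configuration object built earlier in the script.
model_kwargs = dict(
    attn_implementation=sft_config.attn_implementation,
    torch_dtype=sft_config.torch_dtype,
    # Disable the KV cache whenever either form of checkpointing is active.
    use_cache=not (sft_config.gradient_checkpointing or sft_config.fsdp_config.activation_checkpointing),
)
model = AutoModelForCausalLM.from_pretrained(sft_config.model_name_or_path, **model_kwargs)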
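
The condition above reads `fsdp_config` through attribute access. In some setups `fsdp_config` is a plain dict or is unset, in which case that lookup would fail. The sketch below is one possible way to compute the same flag defensively; `resolve_use_cache` is a hypothetical helper name, not part of any library, and the config objects are stand-ins built with `SimpleNamespace` for illustration.

```python
from types import SimpleNamespace


def resolve_use_cache(config) -> bool:
    """Return False when gradient or FSDP activation checkpointing is enabled.

    Hypothetical helper: accepts a config whose `fsdp_config` is a dict,
    an object with attributes, or missing/None.
    """
    grad_ckpt = bool(getattr(config, "gradient_checkpointing", False))

    fsdp_config = getattr(config, "fsdp_config", None) or {}
    if isinstance(fsdp_config, dict):
        fsdp_ckpt = bool(fsdp_config.get("activation_checkpointing", False))
    else:
        fsdp_ckpt = bool(getattr(fsdp_config, "activation_checkpointing", False))

    # The KV cache is only safe when no checkpointing recomputes the forward pass.
    return not (grad_ckpt or fsdp_ckpt)


# Stand-in config: gradient checkpointing off, FSDP activation checkpointing on.
cfg = SimpleNamespace(
    gradient_checkpointing=False,
    fsdp_config={"activation_checkpointing": True},
)
print(resolve_use_cache(cfg))  # False
```

The result can then be passed as `use_cache=resolve_use_cache(sft_config)` in `model_kwargs`, assuming `sft_config` exposes the same two fields.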
