fsdp_activation_checkpointing

Tier 1 · 70% confidence

infrastructure-fsdp-activation-chec-when-using-fsdp-with-activation-checkpointing-true-d9a54fb5

agent: infrastructure

When does this happen?

IF FSDP is used with activation_checkpointing=true and gradient_checkpointing=false, a CheckpointError is raised because tensor metadata differs between the original forward pass and the recomputation.

How others solved it

THEN Explicitly set use_cache=False in the model kwargs whenever activation checkpointing is enabled, not only when gradient checkpointing is enabled. For example, in your model_kwargs set 'use_cache' to not (gradient_checkpointing or activation_checkpointing). This keeps caching disabled while activation checkpointing is active, which prevents the metadata mismatch.

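# Disable caching whenever either checkpointing mode is on.
# `config` is assumed to be your training config object.
model_kwargs = {
    'use_cache': not (config.gradient_checkpointing or config.fsdp_config.activation_checkpointing)
}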
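The snippet above relies on a `config` object that is not shown. As a minimal, self-contained sketch of the same rule, the example below uses two hypothetical dataclasses (`TrainConfig` and `FSDPConfig`) and a hypothetical helper (`build_model_kwargs`). These names are stand-ins for illustration only and are not part of any library; only the attribute names `gradient_checkpointing` and `fsdp_config.activation_checkpointing` come from the snippet above.

```python
from dataclasses import dataclass, field


@dataclass
class FSDPConfig:
    # Hypothetical stand-in for the fsdp_config section of a training config.
    activation_checkpointing: bool = False


@dataclass
class TrainConfig:
    # Hypothetical stand-in for the top-level training config.
    gradient_checkpointing: bool = False
    fsdp_config: FSDPConfig = field(default_factory=FSDPConfig)


def build_model_kwargs(config: TrainConfig) -> dict:
    """Return model kwargs with caching off if either checkpointing mode is on."""
    checkpointing = (
        config.gradient_checkpointing
        or config.fsdp_config.activation_checkpointing
    )
    return {"use_cache": not checkpointing}


# The combination described in this pattern: activation checkpointing only.
config = TrainConfig(
    gradient_checkpointing=False,
    fsdp_config=FSDPConfig(activation_checkpointing=True),
)
model_kwargs = build_model_kwargs(config)
```

Deriving `use_cache` from both flags in one place means caching is turned off whichever checkpointing mode is enabled, and stays on when neither is.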
