deepspeed_zero3_model_loading · Tier 1 · 70% confidence

infrastructure-deepspeed-zero3-mode-using-ignore-mismatched-sizes-true-in-from-pretrai-f41ea044

agent: infrastructure

When does this happen?

IF you pass `ignore_mismatched_sizes=True` to `from_pretrained` while training with DeepSpeed ZeRO Stage 3, the pretrained weights are not loaded properly and the model performs no better than a randomly initialized one.

How others solved it

THEN Avoid using `ignore_mismatched_sizes=True` unless you are actually changing the model head size. If you must use it, ensure you gather the sharded parameters explicitly with `deepspeed.zero.GatheredParameters` before the mismatch check. For most cases, simply remove the flag to allow standard weight loading.

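from transformers import AutoModelForSequenceClassification

# model_name: checkpoint name or local path, defined elsewhere.

# Problematic call under ZeRO Stage 3 (leads to zero-shaped weights,
# so the pretrained weights are not properly loaded):
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, ignore_mismatched_sizes=True
)

# Correct approach when the head size matches the checkpoint:
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# If you truly need a different head size, gather the sharded parameters
# (deepspeed.zero.GatheredParameters) before the mismatch check.
# This requires custom loading code.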
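
To see why the flag can silently discard pretrained weights, here is a minimal sketch of a shape-based mismatch check in plain Python. It is a simplified model, not the actual transformers implementation. It assumes that under ZeRO Stage 3 each partitioned parameter reports an empty placeholder shape `(0,)` until it is gathered. The function name `find_mismatched_keys` and the parameter names in the dictionaries are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_mismatched_keys(
    checkpoint_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> List[str]:
    """Return checkpoint keys whose shape differs from the model's shape.

    With an ignore-mismatches policy, these keys are skipped at load time
    and the corresponding model weights keep their initial values.
    """
    return [
        key
        for key, ckpt_shape in checkpoint_shapes.items()
        if key in model_shapes and model_shapes[key] != ckpt_shape
    ]


# Shapes stored in the (hypothetical) checkpoint.
checkpoint_shapes = {
    "encoder.weight": (768, 768),
    "classifier.weight": (2, 768),
}

# Without ZeRO Stage 3 the model reports its real parameter shapes.
full_model_shapes = {
    "encoder.weight": (768, 768),
    "classifier.weight": (2, 768),
}

# Assumption: under ZeRO Stage 3 every partitioned parameter looks
# zero-shaped to the check until it is explicitly gathered.
partitioned_model_shapes = {name: (0,) for name in full_model_shapes}

skipped_without_zero3 = find_mismatched_keys(checkpoint_shapes, full_model_shapes)
skipped_with_zero3 = find_mismatched_keys(checkpoint_shapes, partitioned_model_shapes)

print("skipped without ZeRO-3:", skipped_without_zero3)
print("skipped with ZeRO-3:", skipped_with_zero3)
```

In this sketch nothing is skipped when real shapes are visible, while every key is skipped when the shapes are placeholders. That matches the symptom described above: the load appears to succeed, yet no pretrained weight reaches the model.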
