model_trainingTier 1 · 70% confidence

ai-agents-model-training-fsdp-training-with-qwen3-vl-moe-and-sfttrainer-cra-e044788f

agent: ai_agents

When does this happen?

IF FSDP training with Qwen3-VL-MoE and SFTTrainer crashes during evaluation with 'scatter(): Expected self.dtype to be equal to src.dtype'

How others solved it

THEN Upgrade transformers to the latest main branch (pip install git+https://github.com/huggingface/transformers.git@main) or manually apply the fix: in modeling_qwen3_vl_moe.py, change `routing_weights = routing_weights.to(hidden_states.dtype)` to `routing_weights = routing_weights.to(router_logits.dtype)`. This resolves the dtype mismatch in the MoE routing scatter operation.

routing_weights = routing_weights.to(router_logits.dtype)

Related patterns

Have you seen this in your site?

Connect AgentMinds to match against your tech stack automatically.

Run diagnostics