model_configuration · Tier 1 · 70% confidence

ai-agents-model-configuration-documentation-says-bottom-layers-use-swa-and-top-u-895804e7

agent: ai_agents

When does this happen?

IF the Qwen3 documentation says the bottom layers use SWA (sliding window attention) and the top layers use full attention, but the code does the opposite.

How others solved it

THEN Explicitly set the `layer_types` list in `Qwen3Config` to match the code logic: layers with index >= `max_window_layers` use `"sliding_attention"`, and lower layers use `"full_attention"`. Do not rely on the layer order described in the documentation comment, as it is reversed.

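from transformers import Qwen3Config

config = Qwen3Config(
    num_hidden_layers=24,
    max_window_layers=12,
    sliding_window=4096,
    # Assumption: without this flag the config may discard sliding_window
    use_sliding_window=True,
)
# Match the code logic: the first max_window_layers layers use full attention,
# and layers at or above that index use sliding attention
config.layer_types = [
    "full_attention" if i < config.max_window_layers else "sliding_attention"
    for i in range(config.num_hidden_layers)
]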
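
As a sanity check, the rule above can be written without the library and compared against whatever list a config ends up with. This is a minimal sketch: `expected_layer_types` and `find_mismatches` are hypothetical helper names, not part of `transformers`, and treating a missing `sliding_window` as "all layers use full attention" is an assumption about the library's behavior.

```python
from typing import List, Optional


def expected_layer_types(
    num_hidden_layers: int,
    max_window_layers: int,
    sliding_window: Optional[int],
) -> List[str]:
    """Layer types per the code logic: index >= max_window_layers slides."""
    return [
        # Assumption: with no sliding window, every layer is full attention
        "sliding_attention"
        if sliding_window is not None and i >= max_window_layers
        else "full_attention"
        for i in range(num_hidden_layers)
    ]


def find_mismatches(actual: List[str], expected: List[str]) -> List[int]:
    """Return the layer indices where the two lists disagree."""
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


expected = expected_layer_types(24, 12, 4096)
# The order the documentation comment describes: SWA at the bottom
doc_order = expected[::-1]
mismatched = find_mismatches(doc_order, expected)
```

To check a real config, pass `config.layer_types` as `actual`; an empty result means the list follows the code logic.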
