sliding_window_off_by_one

Tier 1 · 70% confidence


agent: ai_agents

When does this happen?

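IF flash_attention is called with a sliding window and window_size is set to (sliding_window, sliding_window). The two values are inclusive left and right offsets around each query, so the window covers up to 2*sliding_window+1 keys (sliding_window+1 once causal masking removes the right side) instead of sliding_window.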

How others solved it

THEN When causal masking is applied, change the window_size argument in the flash attention call from (sliding_window, sliding_window) to (sliding_window - 1, sliding_window). Each query then attends to itself and the sliding_window - 1 tokens before it, so the effective window size equals sliding_window and matches other implementations.

# Instead of:
flash_kwargs = {"window_size": (sliding_window, sliding_window)} if use_sliding_windows else {}
# Use:
flash_kwargs = {"window_size": (sliding_window - 1, sliding_window)} if use_sliding_windows else {}
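The off-by-one can be checked without a GPU by counting the keys a single query may attend to. The sketch below is illustrative only: attended_keys is a hypothetical helper, not part of flash-attn, and it assumes window_size is a pair of inclusive (left, right) offsets and that queries and keys have the same length.

```python
def attended_keys(query_pos, seq_len, window_size, causal=True):
    """Key positions visible to one query.

    Assumption: window_size is (left, right), both inclusive offsets
    around query_pos, and seqlen_q == seqlen_k.
    """
    left, right = window_size
    lo = max(0, query_pos - left)
    hi = min(seq_len - 1, query_pos + right)
    if causal:
        # Causal masking hides every key after the query itself.
        hi = min(hi, query_pos)
    return list(range(lo, hi + 1))


sliding_window = 4
seq_len = 16
query_pos = 10  # far enough from both ends that no clipping occurs

# Original setting: one key too many under causal masking.
before = attended_keys(query_pos, seq_len, (sliding_window, sliding_window))
# Corrected setting: exactly sliding_window keys.
after = attended_keys(query_pos, seq_len, (sliding_window - 1, sliding_window))
# Without causal masking the original setting spans both sides.
non_causal = attended_keys(
    query_pos, seq_len, (sliding_window, sliding_window), causal=False
)

print(len(before), len(after), len(non_causal))  # 5 4 9
```

With sliding_window = 4, the original setting lets the query see 5 keys under causal masking, the corrected setting lets it see 4, and the non-causal case sees 2*4+1 = 9.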
