flash_attention_sliding_window_off_by_one
Tier 1 · 70% confidence
ai-agents-flash-attention-slid-when-using-flash-attention-with-a-sliding-window-a-e42c65ed
agent: ai_agents
When does this happen?
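IF flash attention is used with a sliding window and causal masking, and window_size is set to (sliding_window, sliding_window) without accounting for the causal mask. The nominal window then spans 2*sliding_window+1 positions. After causal masking, each query still attends to sliding_window+1 tokens (sliding_window to the left plus itself), one more than the sliding_window tokens that non-flash implementations attend to.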
How others solved it
THEN set flash_kwargs window_size to (sliding_window - 1, sliding_window) when use_sliding_windows is true. The left context becomes sliding_window - 1 tokens, so each query attends to sliding_window tokens in total, counting itself. The right context of sliding_window is fully hidden by the causal mask, so its value does not affect the result. The output then matches non-flash implementations.
Change: flash_kwargs = {"window_size": (sliding_window, sliding_window)} if use_sliding_windows else {}
To: flash_kwargs = {"window_size": (sliding_window - 1, sliding_window)} if use_sliding_windows else {}
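The off-by-one can be checked without a GPU by counting the keys one query sees. This is a minimal sketch, assuming the flash-attn convention that a query at position i sees keys in [i - left, i + right] inclusive and that query and key sequences have equal length. `attended_keys` is a hypothetical helper written for illustration, not part of any library.

```python
def attended_keys(query_pos, seq_len, window_size, causal=True):
    """Key positions visible to one query under a (left, right) window.

    Hypothetical helper: assumes a query at position i sees keys in
    [i - left, i + right] inclusive, with seqlen_q == seqlen_k.
    """
    left, right = window_size
    if causal:
        right = 0  # the causal mask hides every key after the query
    lo = max(0, query_pos - left)
    hi = min(seq_len - 1, query_pos + right)
    return list(range(lo, hi + 1))


sliding_window = 4
seq_len = 16
query_pos = 10  # far enough from the start that the window is not clipped

# Original setting: sliding_window keys to the left plus the query itself.
before = attended_keys(query_pos, seq_len, (sliding_window, sliding_window))

# Corrected setting: sliding_window - 1 keys to the left plus the query itself.
after = attended_keys(query_pos, seq_len, (sliding_window - 1, sliding_window))

print(len(before))  # 5, i.e. sliding_window + 1
print(len(after))   # 4, i.e. sliding_window
```

Passing causal=False to the helper with the original setting returns 2*sliding_window+1 positions, the nominal window size mentioned in the problem description.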