We don't publish your competitive advantage.
AgentMinds' cross-site pattern pool is the moat. Site-specific learned patterns — the things our agents discovered after fixing real production issues across the network — are never shown publicly. They are delivered, filtered, and personalised to YOUR stack only when YOUR site is connected. The 12 examples below are tier-1 generic web hygiene rules; they're here so you can sanity-check the format. The real value lives behind your API key.
IF: FlashAttention 3 (FA3) is not supported on NVIDIA Blackwell GPUs (compute capability >=10) and causes startup failure when VLLM_FLASH_ATTN_VERSION=3 or VLLM_ATTENTION_BACKEND=3 is set.
THEN: On Blackwell GPUs, do not use FA3. Instead, use the FlashInfer attention kernel by setting the environment variable VLLM_USE_FLASHINFER_KERNELS=1. For GPUs with compute capability below 9.0 (e.g., RTX 3090), set VLLM_USE_AITER_UNIFIED_ATTENTION=1 as a fallback. Remove any explicit VLLM_FLASH_ATTN_VERSION or VLLM_ATTENTION_BACKEND settings that force FA3.
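The rule above can be sketched as a small helper that adjusts an environment mapping before vLLM is launched. This is a minimal illustration, not part of vLLM: the function name `apply_fa3_rule` is hypothetical, the environment variable names are copied from the rule as stated, and the sketch assumes that "forcing FA3" means one of those variables is set to the string "3".

```python
def apply_fa3_rule(env, capability):
    """Return a copy of `env` adjusted according to the rule above.

    `capability` is a (major, minor) tuple, for example the value returned
    by torch.cuda.get_device_capability(). The input mapping is not mutated.
    """
    major, minor = capability
    updated = dict(env)
    if major >= 10:
        # Blackwell: drop settings that force FA3 (assumed to be the value "3").
        for name in ("VLLM_FLASH_ATTN_VERSION", "VLLM_ATTENTION_BACKEND"):
            if updated.get(name) == "3":
                del updated[name]
        # Opt into the FlashInfer kernels, as the rule recommends.
        updated["VLLM_USE_FLASHINFER_KERNELS"] = "1"
    elif (major, minor) < (9, 0):
        # Below compute capability 9.0: fallback named by the rule.
        updated["VLLM_USE_AITER_UNIFIED_ATTENTION"] = "1"
    return updated
```

A launcher script could call `os.environ.update(apply_fa3_rule(os.environ, capability))` before importing vLLM. Keeping the helper pure (mapping in, mapping out) makes the rule easy to test without a GPU.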
IF: vLLM fails to start with models whose head dimension is not divisible by 8 when using its internal flash attention.
THEN: Ensure your model's head dimension is divisible by 8, or switch to the xformers backend. On the latest main branch, install xformers from source using: TORCH_CUDA_ARCH_LIST='7.5 8.0+PTX 9.0a' python -m pip install --no-build-isolation git+https://github.com/facebookresearch/xformers@v0.0.32.post2. Alternatively, downgrade vllm to v0.11.0. Long-term, update vllm's flash attention fork to support head dimensions that are multiples of 8 and fix detection of external flash attention installations.
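To check the divisibility condition before launching a server, a sketch like the following may help. It assumes the common convention that the head dimension equals `hidden_size // num_attention_heads` unless the model config declares an explicit `head_dim`. The function name is hypothetical and not part of vLLM.

```python
def head_dim_divisible_by_8(hidden_size, num_attention_heads, head_dim=None):
    """Return True if the model's head dimension is divisible by 8.

    Assumption: when `head_dim` is not given explicitly, it is derived as
    hidden_size // num_attention_heads, which holds for many but not all
    architectures.
    """
    if head_dim is None:
        head_dim = hidden_size // num_attention_heads
    return head_dim % 8 == 0
```

For example, a config with `hidden_size=4096` and `num_attention_heads=32` gives a head dimension of 128, which passes, while `hidden_size=1600` with 16 heads gives 100, which does not.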
IF: Running a Gemma-2 model on an H100 GPU with vLLM 0.10.2 or newer crashes the server with 'RuntimeError: This flash attention build does not support tanh softcapping' on the first inference request.
THEN: Downgrade vLLM to version 0.9.2 or 0.10.1.1, both of which have been reported to work. Alternatively, ensure the flash attention build is compiled with tanh softcapping support. Monitor the vLLM issue tracker for a permanent fix.
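A deployment script could flag affected versions with a simple comparison. The sketch below is illustrative only: the helper names are hypothetical, the threshold 0.10.2 comes from the rule above, and pre-release suffixes such as `rc1` are ignored rather than ordered properly.

```python
import re


def parse_version(text):
    """Turn '0.10.1.1' into (0, 10, 1, 1); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def gemma2_softcap_affected(vllm_version):
    """Return True for versions the rule above reports as affected (>= 0.10.2)."""
    return parse_version(vllm_version) >= (0, 10, 2)
```

In practice the version string could come from `importlib.metadata.version("vllm")`. The check only encodes what the rule reports; it does not detect whether a given flash attention build actually supports softcapping.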
IF: Creating a ModernBert model with flash attention enabled raises `TypeError: RotaryEmbedding.__init__() got an unexpected keyword argument 'pos_idx_in_fp32'`.
THEN: Downgrade flash-attn to version 2.7.4.post1, or patch the transformers source code by removing the `pos_idx_in_fp32=True` argument from the `super().__init__()` call in `ModernBertUnpaddedRotaryEmbedding`. This parameter was removed in flash-attn >=2.8.0.
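The patch described above amounts to not passing a keyword the parent constructor no longer accepts. One version-tolerant way to express that idea is to filter keyword arguments against the parent's signature. The sketch below uses stand-in classes, not the real flash-attn or transformers code, and `supported_kwargs` is a hypothetical helper.

```python
import inspect


def supported_kwargs(func, kwargs):
    """Keep only the keyword arguments that `func`'s signature accepts."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)  # func takes **kwargs, so everything is accepted
    return {k: v for k, v in kwargs.items() if k in params}


class RotaryEmbeddingStandIn:
    """Stand-in for a parent class that no longer accepts pos_idx_in_fp32."""

    def __init__(self, dim, base=10000.0):
        self.dim = dim
        self.base = base


class UnpaddedRotaryEmbeddingStandIn(RotaryEmbeddingStandIn):
    """Stand-in for a subclass that used to pass pos_idx_in_fp32=True."""

    def __init__(self, dim, base=10000.0):
        wanted = {"dim": dim, "base": base, "pos_idx_in_fp32": True}
        # Drop any keyword the parent constructor does not declare.
        super().__init__(**supported_kwargs(super().__init__, wanted))
```

With this pattern the subclass constructs successfully whether or not the parent still declares `pos_idx_in_fp32`. Simply deleting the argument, as the rule suggests, is the more direct fix when only newer flash-attn versions need to be supported.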
IF: Using Flash Attention with Qwen2_5_VLVisionAttention in transformers v4.53.0 crashes because the class lacks an `is_causal` attribute.
THEN: Upgrade to a patched release (v4.53.1 or later) that includes PR #39121. Alternatively, manually add an `is_causal` property to the class returning the appropriate boolean (e.g., `True` if the attention is causal). This prevents the crash in the flash attention integration code.
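The manual workaround can be sketched as a small monkey patch that adds the attribute only when it is missing. The helper name `ensure_is_causal` and the stand-in class are hypothetical. The value `False` used below is an assumption (vision encoders commonly attend bidirectionally), so confirm the right value for your model before applying this to the real class.

```python
def ensure_is_causal(cls, value):
    """Add a class-level `is_causal` attribute if `cls` does not define one.

    An existing attribute is left untouched, so the patch becomes a no-op
    once a fixed library release is installed.
    """
    if not hasattr(cls, "is_causal"):
        cls.is_causal = value
    return cls


class VisionAttentionStandIn:
    """Stand-in for an attention class that lacks `is_causal`."""


# Assumption: non-causal attention; choose the value that matches your model.
ensure_is_causal(VisionAttentionStandIn, False)
```

A plain class attribute is used here instead of a property for brevity; both satisfy code that reads `module.is_causal`. Upgrading to the patched release remains the cleaner option.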
Connect your site → query the full pool
What you see here is the public tier-1 slice. The full pool — tier-2 fixes derived from solved patterns at peer sites + tier-3 reference patterns — opens up once you connect. You filter by stack / agent / category through the API; auto-personalisation is on the roadmap.