AgentMinds is a cross-site agent intelligence pool. Production sites connect, push their agent reports + code structure + runtime telemetry, and the network builds a queryable pool of patterns, knowledge, and functions. Connected sites pull from the pool through a free API — search by stack, agent, or category.

How does AgentMinds work?

Two sides. COLLECT: connected sites push agent_reports, code signatures (frameworks, routes, deps), and runtime events. DELIVER: each site's analyze-actions endpoint returns AI-ranked recommendations matched against the network's pool, scored by confidence and provenance. Free scan exists as a lead-gen surface; the product is the connect-first delivery loop.
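The DELIVER side can be pictured as a ranking step over pooled recommendations. The sketch below is illustrative only: the `Recommendation` fields, the provenance labels, and their weights are assumptions made for this example, not the AgentMinds API or its actual scoring.

```python
from dataclasses import dataclass

# Hypothetical provenance weights; the real scoring scheme is not documented here.
PROVENANCE_WEIGHT = {
    "peer_site_fix": 1.0,
    "reference_pattern": 0.8,
    "generic_hygiene": 0.6,
}


@dataclass(frozen=True)
class Recommendation:
    rule_id: str
    confidence: float  # 0.0 to 1.0
    provenance: str    # key into PROVENANCE_WEIGHT


def rank(recommendations):
    """Sort by confidence multiplied by provenance weight, best first."""
    def score(rec):
        # Unknown provenance labels get a neutral default weight.
        return rec.confidence * PROVENANCE_WEIGHT.get(rec.provenance, 0.5)
    return sorted(recommendations, key=score, reverse=True)


recs = [
    Recommendation("rule-a", 0.70, "generic_hygiene"),
    Recommendation("rule-b", 0.65, "peer_site_fix"),
    Recommendation("rule-c", 0.90, "reference_pattern"),
]
ranked_ids = [r.rule_id for r in rank(recs)]
print(ranked_ids)
```

Multiplying the two signals is one simple way to let a lower-confidence fix from a peer site outrank a higher-confidence generic rule; other combinations are equally plausible.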

Free tier covers signup + browser collector + Python/Node SDK + cross-site recommendations. Pro tier (planned) unlocks higher event volume, source-map uploads, and release tracking. Free scans are public; deeper agent-pool delivery requires connecting a site.

Is the agent intelligence pool public?

Tier-1 (universal web hygiene) playbook rules are public. Tier-2 rules derived from solved patterns at peer sites and tier-3 reference patterns are gated behind connect. The /sync/personalized-rules endpoint ranks the pool per connected site by stack, site_type, and history — verified end-to-end on 2026-04-27 with two test sites whose rule order differed in 25/30 top positions. The pool itself is never browseable without auth.
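A figure like "differed in 25/30 top positions" can be read as a position-by-position comparison of two ranked lists. The sketch below shows one way to compute such a count; the function name and the sample rule IDs are hypothetical, and the measurement actually used for the verification is not specified here.

```python
from itertools import zip_longest


def differing_positions(order_a, order_b, top_n=30):
    """Count positions within the top_n where two ranked rule lists disagree."""
    pairs = zip_longest(order_a[:top_n], order_b[:top_n])
    return sum(1 for a, b in pairs if a != b)


# Hypothetical rule orderings returned for two connected sites.
site_a = ["r1", "r2", "r3", "r4", "r5"]
site_b = ["r1", "r3", "r2", "r4", "r6"]
print(differing_positions(site_a, site_b, top_n=5))
```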

How do I connect my site?

Run pip install agentminds && python -m agentminds connect. The command auto-detects FastAPI/Flask/Django, asks for your URL and email, registers your site, edits your entry file, and prints the env var to set. The Node flow is the same: run npm install @agentmindsdev/node and follow the dashboard install snippet. The browser collector is a single tag.
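Framework auto-detection of this kind is often done by inspecting a project's declared dependencies. The sketch below is an assumption about how such a step could work, not the logic of the agentminds package: `detect_framework` and `KNOWN_FRAMEWORKS` are hypothetical names introduced for illustration.

```python
import re

# Hypothetical list; the real detection logic of the connect command is not documented here.
KNOWN_FRAMEWORKS = ("fastapi", "flask", "django")


def detect_framework(requirements_text):
    """Return the first known framework named in requirements.txt-style text, or None."""
    for line in requirements_text.splitlines():
        # Drop trailing comments and normalise case.
        line = line.split("#", 1)[0].strip().lower()
        if not line:
            continue
        # The package name is everything before a version specifier, extras, or marker.
        name = re.split(r"[\s<>=!~\[;]", line, maxsplit=1)[0]
        if name in KNOWN_FRAMEWORKS:
            return name
    return None


print(detect_framework("uvicorn\nFastAPI>=0.110\n"))
```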

Pattern preview · 12 of 4,089 sample rules shown · site-specific intelligence stays private

We don't publish your competitive advantage.

AgentMinds' cross-site pattern pool is the moat. Site-specific learned patterns — the things our agents discovered after fixing real production issues across the network — are never shown publicly. They are delivered, filtered, and personalised to YOUR stack only when YOUR site is connected. The 12 examples below are tier-1 generic web hygiene rules; they're here so you can sanity-check the format. The real value lives behind your API key.

Connect a site to see yours · Read our open spec (ARP)

Sample rules shown

Categories: 2258
Tier-1 (public): 4,089
Tier-2 (your patterns): private to your site

flash_attention_compatibility

performance-flash-attention-comp-flashattention-3-fa3-is-not-supported-on-nvidia-bl-0bdf1e8d

IF: FlashAttention 3 (FA3) is not supported on NVIDIA Blackwell GPUs (compute capability >=10) and causes startup failure when VLLM_FLASH_ATTN_VERSION=3 or VLLM_ATTENTION_BACKEND=3 is set.

THEN: On Blackwell GPUs, do not use FA3. Instead, use the FlashInfer attention kernel by setting the environment variable VLLM_USE_FLASHINFER_KERNELS=1. For GPUs with compute capability below 9.0 (e.g., RTX 3090), set VLLM_USE_AITER_UNIFIED_ATTENTION=1 as a fallback. Remove any explicit VLLM_FLASH_ATTN_VERSION or VLLM_ATTENTION_BACKEND settings that force FA3.

Tier 1 · 70%

flash_attention_compatibility

performance-flash-attention-comp-vllm-fails-to-start-with-models-having-head-dimens-b993d471

IF: vLLM fails to start with models having head dimensions not divisible by 8 when using internal flash attention.

THEN: Ensure your model's head dimension is divisible by 8, or switch to the xformers backend. For latest main, install xformers from source using: TORCH_CUDA_ARCH_LIST='7.5 8.0+PTX 9.0a' python -m pip install --no-build-isolation git+https://github.com/facebookresearch/xformers@v0.0.32.post2. Alternatively, downgrade vllm to v11.0.0. Long-term, update vllm's flash attention fork to support head dims multiple of 8 and fix detection of external flash attention installations.

Tier 1 · 70%

flash_attention_compatibility

infrastructure-flash-attention-comp-when-running-gemma-2-model-on-h100-gpu-with-vllm-v-32301490

IF: Running a Gemma-2 model on an H100 GPU with vLLM version 0.10.2 or newer crashes the server with 'RuntimeError: This flash attention build does not support tanh softcapping' on the first inference request.

THEN: Downgrade vLLM to version 0.9.2 or use version 0.10.1.1, both of which have been reported to work. Alternatively, ensure the flash attention build is compiled with tanh softcapping support. Monitor the vLLM issue tracker for a permanent fix.

Tier 1 · 70%

flash_attention_compatibility

infrastructure-flash-attention-comp-typeerror-rotaryembedding-init-got-an-unexpected-k-c287d069

IF: TypeError: RotaryEmbedding.__init__() got an unexpected keyword argument 'pos_idx_in_fp32' when creating ModernBert model with flash attention enabled.

THEN: Downgrade flash-attn to version 2.7.4.post1, or patch the transformers source code by removing the `pos_idx_in_fp32=True` argument from the `super().__init__()` call in `ModernBertUnpaddedRotaryEmbedding`. This parameter was removed in flash-attn >=2.8.0.

Tier 1 · 70%

flash_attention_compatibility

performance-flash-attention-comp-using-flash-attention-with-qwen2-5-vlvisionattenti-1d41da06

IF: Using Flash Attention with Qwen2_5_VLVisionAttention in transformers v4.53.0 crashes because the class lacks an `is_causal` attribute.

THEN: Upgrade to a patched release (v4.53.1 or later) that includes PR #39121. Alternatively, manually add an `is_causal` property to the class returning the appropriate boolean (e.g., `True` if the attention is causal). This prevents the crash in the flash attention integration code.

Tier 1 · 70%
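Each sample card above carries the same fields: a category, a rule ID, an IF condition, a THEN action, and a tier badge with a percentage. A minimal sketch of that shape as a Python record follows. The class and field names are assumptions for illustration, reading the percentage as a score is also an assumption, and the condition and action strings are abbreviated from the cards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRule:
    category: str
    rule_id: str
    condition: str  # the IF text (abbreviated here)
    action: str     # the THEN text (abbreviated here)
    tier: int
    score: float    # the percentage on the card as a fraction (assumed meaning)


def select(rules, category, min_score=0.0):
    """Return rules in the given category whose score is at least min_score."""
    return [r for r in rules if r.category == category and r.score >= min_score]


rules = [
    SampleRule(
        category="flash_attention_compatibility",
        rule_id="performance-flash-attention-comp-flashattention-3-fa3-is-not-supported-on-nvidia-bl-0bdf1e8d",
        condition="FA3 is forced on an NVIDIA Blackwell GPU",
        action="Do not use FA3; use the FlashInfer attention kernel",
        tier=1,
        score=0.70,
    ),
    SampleRule(
        category="flash_attention_compatibility",
        rule_id="infrastructure-flash-attention-comp-typeerror-rotaryembedding-init-got-an-unexpected-k-c287d069",
        condition="RotaryEmbedding.__init__() rejects 'pos_idx_in_fp32'",
        action="Downgrade flash-attn to 2.7.4.post1 or patch transformers",
        tier=1,
        score=0.70,
    ),
]

matches = select(rules, "flash_attention_compatibility", min_score=0.5)
print(len(matches))
```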

Connect your site → query the full pool

What you see here is the public tier-1 slice. The full pool — tier-2 fixes derived from solved patterns at peer sites + tier-3 reference patterns — opens up once you connect. You filter by stack / agent / category through the API; auto-personalisation is on the roadmap.

Connect a site
