AgentMinds is a cross-site agent intelligence pool. Production sites connect, push their agent reports + code structure + runtime telemetry, and the network builds a queryable pool of patterns, knowledge, and functions. Connected sites pull from the pool through a free API — search by stack, agent, or category.

How does AgentMinds work?

AgentMinds has two sides. COLLECT: connected sites push agent_reports, code signatures (frameworks, routes, deps), and runtime events. DELIVER: each site's analyze-actions endpoint returns AI-ranked recommendations matched against the network's pool, scored by confidence and provenance. The free scan exists as a lead-gen surface; the product is the connect-first delivery loop.
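As a rough illustration of the DELIVER side, the sketch below orders candidate recommendations by confidence first and provenance second (here, how many peer sites solved the pattern). The `Recommendation` fields, the tie-breaking order, and the sample rule ids are hypothetical; they are not the documented response schema of the analyze-actions endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    # Hypothetical fields; the real analyze-actions payload may differ.
    rule_id: str
    confidence: float      # 0.0 to 1.0
    provenance_sites: int  # peer sites where the pattern was solved


def rank(recs: list[Recommendation]) -> list[Recommendation]:
    """Order by confidence, then by provenance breadth, both descending."""
    return sorted(recs, key=lambda r: (r.confidence, r.provenance_sites), reverse=True)


if __name__ == "__main__":
    sample = [
        Recommendation("cache-headers", 0.70, 3),  # made-up rule ids
        Recommendation("csp-missing", 0.90, 1),
        Recommendation("gzip-off", 0.70, 8),
    ]
    for rec in rank(sample):
        print(rec.rule_id, rec.confidence, rec.provenance_sites)
```

With equal confidence, the rule solved at more peer sites wins, so `gzip-off` ranks above `cache-headers` here.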

Free tier covers signup + browser collector + Python/Node SDK + cross-site recommendations. Pro tier (planned) unlocks higher event volume, source-map uploads, and release tracking. Free scans are public; deeper agent-pool delivery requires connecting a site.

Is the agent intelligence pool public?

Tier-1 (universal web hygiene) playbook rules are public. Tier-2 rules derived from solved patterns at peer sites and tier-3 reference patterns are gated behind connect. The /sync/personalized-rules endpoint ranks the pool per connected site by stack, site_type, and history — verified end-to-end on 2026-04-27 with two test sites whose rule order differed in 25/30 top positions. The pool itself is never browseable without auth.
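The per-site ranking and the "25/30 top positions" check can be sketched as follows. The scoring (stack overlap plus a one-point site_type bonus) and the `Rule` fields are hypothetical stand-ins for whatever /sync/personalized-rules actually computes; only the position-by-position comparison mirrors the verification described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    stacks: frozenset[str]      # stacks the rule applies to (hypothetical field)
    site_types: frozenset[str]  # site types the rule applies to (hypothetical field)


def personalize(rules, site_stack, site_type, k=30):
    """Return the top-k rule ids for one site using illustrative scoring."""
    def score(rule):
        overlap = len(rule.stacks & site_stack)
        bonus = 1 if site_type in rule.site_types else 0
        return overlap + bonus

    # sorted() stays stable with reverse=True, so ties keep the pool's order.
    ranked = sorted(rules, key=score, reverse=True)
    return [r.rule_id for r in ranked[:k]]


def positions_differing(top_a, top_b):
    """Count positions where two top-k lists hold different rule ids.

    zip() stops at the shorter list, so only shared positions are compared.
    """
    return sum(1 for a, b in zip(top_a, top_b) if a != b)
```

For the test described above, `positions_differing(site_one_top30, site_two_top30)` would return 25.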

How do I connect my site?

pip install agentminds && python -m agentminds connect: the CLI auto-detects FastAPI/Flask/Django, asks for your URL and email, registers your site, edits your entry file, and prints the env var to set. Same flow for Node: npm install @agentmindsdev/node, then follow the install snippet in the dashboard. The browser collector is a single tag.
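The framework auto-detection step can be pictured as a scan of the project's dependency list. This is a hypothetical sketch, not the logic inside `python -m agentminds connect`: it only reads requirements.txt-style text, and the priority order between the three frameworks is an assumption.

```python
import re
from typing import Optional

# Frameworks the connect flow is described as detecting; checking them in
# this order is an assumption of this sketch.
FRAMEWORKS = ("fastapi", "flask", "django")

# A requirement line starts with the distribution name; specifiers, extras
# and markers ('>=', '[', ';', ...) come after it.
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def detect_framework(requirements_text: str) -> Optional[str]:
    """Return the first known web framework listed in requirements.txt-style text."""
    names = set()
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        match = _NAME.match(line)  # options such as '-r base.txt' do not match
        if match:
            names.add(match.group(0).lower())
    for framework in FRAMEWORKS:
        if framework in names:
            return framework
    return None
```

Matching whole distribution names keeps add-ons such as `django-cors-headers` from being mistaken for Django itself.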

Pattern preview · 12 of 4,089 sample rules shown · site-specific intelligence stays private

We don't publish your competitive advantage.

AgentMinds' cross-site pattern pool is the moat. Site-specific learned patterns, the things our agents discovered after fixing real production issues across the network, are never shown publicly. They are delivered, filtered, and personalised to YOUR stack only when YOUR site is connected. The 12 examples below are public tier-1 rules (here, GPU compatibility fixes for vLLM and FlashInfer); they're here so you can sanity-check the format. The real value lives behind your API key.
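Each sample below has the same shape: a category, a rule id, an IF condition, a THEN remedy, and a tier badge with a percentage. A minimal sketch of that record, assuming the percentage is a confidence score (the page does not label it) and adding a hypothetical `signature` field for matching log lines, could look like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaybookRule:
    category: str
    rule_id: str
    condition: str     # the IF text (shortened here)
    remedy: str        # the THEN text (shortened here)
    tier: int
    confidence: float  # assumed meaning of the percentage badge
    signature: str     # hypothetical: error substring used for matching


RULES = [
    PlaybookRule(
        "gpu_compatibility",
        "infrastructure-gpu-compatibility-running-vllm-on-nvidia-v100-gpu-with-enable-chunke-eb65de7b",
        "vLLM on V100 with --enable-chunked-prefill hits a Triton assertion",
        "Start vLLM with --enable-chunked-prefill=False on V100",
        1, 0.70,
        "mma -> mma layout conversion is only supported on Ampere",
    ),
    PlaybookRule(
        "gpu_compatibility",
        "infrastructure-gpu-compatibility-when-running-gemma-2-with-flashinfer-on-an-nvidia--6f3f1857",
        "Gemma-2 with FlashInfer on an RTX A6000 (sm86) fails",
        "Upgrade flashinfer to 0.1.1 or later",
        1, 0.70,
        "Unsupported max_frags_z",
    ),
]


def matching_rules(log_line, rules=RULES):
    """Return ids of rules whose signature appears in the log line."""
    return [r.rule_id for r in rules if r.signature in log_line]
```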

Connect a site to see yours · Read our open spec (ARP)

Sample rules shown: 12
Categories: 2,258
Tier-1 (public): 4,089
Tier-2 (your patterns): private to your site

gpu_compatibility

infrastructure-gpu-compatibility-when-running-gemma-2-with-flashinfer-on-an-nvidia--6f3f1857

IF: When running Gemma-2 with FlashInfer on an NVIDIA RTX A6000 (sm86), the error 'ValueError: Unsupported max_frags_z' occurs due to insufficient shared memory.

THEN: Upgrade flashinfer to version 0.1.1 or later, which includes a fix for the small shared memory size of sm86 GPUs.

Tier 1 · 70%
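A deployment script could refuse to start on sm86 with an older FlashInfer. A minimal sketch: on a live machine the capability would come from `torch.cuda.get_device_capability()` and the version from `importlib.metadata.version(...)` (the FlashInfer distribution name varies by install method, so treat it as an assumption). The version parser is a simplification that ignores pre-release, post-release, and local suffixes.

```python
import re


def release_tuple(version: str) -> tuple[int, ...]:
    """'0.1.1+cu121' -> (0, 1, 1); suffixes are ignored (a simplification)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def flashinfer_ok_for_sm86(version: str, capability: tuple[int, int]) -> bool:
    """False when an sm86 GPU is paired with a FlashInfer release older than 0.1.1."""
    if capability != (8, 6):
        return True  # the rule above only concerns sm86 devices (e.g. RTX A6000)
    return release_tuple(version) >= (0, 1, 1)
```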

gpu_compatibility

infrastructure-gpu-compatibility-cuda-error-no-kernel-image-is-available-for-execut-35f125f5

IF: Running vLLM 0.9.0 or 0.9.1 on an NVIDIA RTX 5090 (SM120) fails with 'CUDA error: no kernel image is available for execution on the device'.

THEN: Upgrade vLLM to a version that includes SM120 kernel support (e.g., the next release after PR #19794). Alternatively, compile vLLM from source with the appropriate CUDA architecture flags (e.g., -DCMAKE_CUDA_ARCHITECTURES=120). Verify the vLLM build includes compute capability 12.0.

Tier 1 · 70%

gpu_compatibility

infrastructure-gpu-compatibility-when-running-vllm-on-a-gpu-with-compute-capability-77e8db6d

IF: When running vLLM on a GPU with compute capability 12.0 (e.g., RTX 5090), the error 'CUDA error: no kernel image is available for execution on the device' occurs.

THEN: Upgrade to vLLM v0.9.2 or later, which includes support for SM120 (compute capability 12.0). Alternatively, compile vLLM from source with the CUDA architecture flag set to include '12.0'. Ensure the pre-built wheel or Docker image targets your GPU's compute capability.

Tier 1 · 70%
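The SM120 rules reduce to one check: a device reporting compute capability 12.0 needs vLLM 0.9.2 or later (or a source build that targets 12.0). A sketch, assuming a plain `X.Y.Z` version string; the capability tuple would come from `torch.cuda.get_device_capability()` on a live machine.

```python
def needs_vllm_upgrade(capability: tuple[int, int], vllm_version: str) -> bool:
    """True when an SM120 GPU (compute capability 12.0) runs vLLM older than 0.9.2."""
    if capability != (12, 0):
        return False  # the SM120 rules above do not apply
    # Compare numerically: as strings, '0.10.0' would sort before '0.9.2'.
    # Assumes a plain 'X.Y.Z' release string with no suffix.
    installed = tuple(int(part) for part in vllm_version.split(".")[:3])
    return installed < (0, 9, 2)
```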

gpu_compatibility

infrastructure-gpu-compatibility-vllm-fails-with-the-same-cuda-error-when-trying-to-cc2330df

IF: vLLM fails with the CUDA error 'no kernel image is available for execution on the device' when trying to load a LoRA module on a Tesla V100 GPU.

THEN: LoRA is not supported on Tesla V100 GPUs in vLLM. To use LoRA, switch to a GPU that supports it (e.g., A100, A6000, RTX 2080). Remove the '--enable-lora' and '--lora-modules' flags if using a V100.

Tier 1 · 70%

gpu_compatibility

infrastructure-gpu-compatibility-running-vllm-on-nvidia-v100-gpu-with-enable-chunke-eb65de7b

IF: Running vLLM on NVIDIA V100 GPU with --enable-chunked-prefill enabled causes Triton assertion error: 'mma -> mma layout conversion is only supported on Ampere'.

THEN: Disable chunked prefill by setting --enable-chunked-prefill=False when starting the vLLM server on V100 GPUs.

Tier 1 · 70%

gpu_compatibility

infrastructure-gpu-compatibility-running-vllm-on-nvidia-rtx-5090-sm120-or-similar-n-c7265cc5

IF: Running vLLM on NVIDIA RTX 5090 (SM120) or a similar newer GPU yields RuntimeError: CUDA error: no kernel image is available for execution on the device.

THEN: Upgrade to vLLM v0.9.2 or later, which includes CUDA kernel images for SM120. Alternatively, build vLLM from source with the environment variable TORCH_CUDA_ARCH_LIST set to include '12.0' (e.g., export TORCH_CUDA_ARCH_LIST='12.0') and then pip install the package. If a quick fix is needed, consider using an alternative inference engine like Ollama that already supports RTX 50-series GPUs.

Tier 1 · 70%
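For the source-build route, the architecture list must contain the device's own compute capability, which for SM120 is '12.0'. The hypothetical helpers below turn `(major, minor)` tuples, as returned by `torch.cuda.get_device_capability()`, into a `TORCH_CUDA_ARCH_LIST` value and an environment for the build; how a given vLLM release consumes the variable may vary.

```python
import os


def arch_list_for(capabilities):
    """Build a TORCH_CUDA_ARCH_LIST value such as '8.0;12.0' from (major, minor) tuples."""
    unique = sorted(set(capabilities))  # numeric order: (8, 0) before (12, 0)
    return ";".join(f"{major}.{minor}" for major, minor in unique)


def build_env(capabilities, base=None):
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST set for a source build."""
    env = dict(os.environ if base is None else base)
    env["TORCH_CUDA_ARCH_LIST"] = arch_list_for(capabilities)
    return env
```

The resulting dict can be passed as `env=` to `subprocess.run` for the `pip install` step, leaving the parent process environment unchanged.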

gpu_compatibility

infrastructure-gpu-compatibility-when-deploying-vllm-v1-engine-on-gpus-that-lack-fl-ef8718a7

IF: When deploying vLLM V1 engine on GPUs that lack FlashAttention 3 support, the error 'AssertionError: Sinks are only supported in FlashAttention 3' is raised during model loading.

THEN: Set the environment variable VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1 to use the Triton attention backend as a fallback. Alternatively, ensure your GPU supports FlashAttention 3 or disable sinks by adjusting model configuration. Note that the Triton backend may still produce CUDA kernel errors on some devices; consider using an older vLLM version or a different GPU.

Tier 1 · 70%
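When the server is started from a Python wrapper, the backend override has to be in the child process's environment before vLLM starts. A sketch that prepares the argv and environment for `vllm serve`; the function name and model id are placeholders, and whether `TRITON_ATTN_VLLM_V1` is accepted depends on your vLLM version.

```python
import os
from typing import Sequence


def triton_fallback_launch(model: str, extra_args: Sequence[str] = ()):
    """Return (argv, env) for `vllm serve` with the Triton attention backend forced."""
    env = dict(os.environ)  # copy, so the calling process is unaffected
    env["VLLM_ATTENTION_BACKEND"] = "TRITON_ATTN_VLLM_V1"
    argv = ["vllm", "serve", model, *extra_args]
    return argv, env
```

Launch with `subprocess.run(argv, env=env, check=True)`.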

gpu_compatibility

infrastructure-gpu-compatibility-deploying-vllm-on-v100-gpus-with-chunked-prefill-e-3c655b90

IF: Deploying vLLM on V100 GPUs with chunked prefill enabled triggers an assertion error: 'mma -> mma layout conversion is only supported on Ampere'.

THEN: Disable chunked prefill by setting the command-line argument `--enable-chunked-prefill=False` when starting vLLM. This avoids the unsupported MMA layout conversion on pre-Ampere GPUs.

Tier 1 · 70%

gpu_compatibility

infrastructure-gpu-compatibility-on-v100-gpus-even-after-disabling-chunked-prefill--348e3f82

IF: On V100 GPUs, even after disabling chunked prefill, the same assertion error may persist if prefix caching is enabled.

THEN: Remove the `--enable-prefix-caching` argument from the vLLM startup command. Disabling prefix caching resolves the MMA layout conversion error when disabling chunked prefill alone is insufficient.

Tier 1 · 70%
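Taken together, the V100 rules suggest sanitising launch flags on pre-Ampere devices (compute capability below 8.0): force chunked prefill off and drop prefix caching. A sketch that rewrites an argv list; the flag spellings are taken from the rules above, newer vLLM releases may expect a different boolean form, and values passed as a separate token (e.g. `--enable-chunked-prefill True`) are not handled.

```python
def sanitize_for_pre_ampere(argv, capability):
    """Adjust vLLM CLI flags for GPUs older than Ampere (compute capability < 8.0)."""
    if capability >= (8, 0):
        return list(argv)  # Ampere or newer: leave flags untouched
    out = []
    for arg in argv:
        if arg == "--enable-prefix-caching" or arg.startswith("--enable-prefix-caching="):
            continue  # prefix caching can keep the MMA layout assertion alive
        if arg == "--enable-chunked-prefill" or arg.startswith("--enable-chunked-prefill="):
            continue  # replaced below with an explicit False
        out.append(arg)
    out.append("--enable-chunked-prefill=False")
    return out
```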

gpu_compatibility

infrastructure-gpu-compatibility-when-using-vllm-with-moe-models-on-blackwell-gpus--8f8dfcd4

IF: When using vLLM with MoE models on Blackwell GPUs (sm_120), the FlashInfer cutlass backend fails with 'kernel does not support current device' error.

THEN: Disable the FlashInfer cutlass backend for MoE on Blackwell GPUs by setting the VLLM_MOE_BACKEND environment variable to an alternative (e.g., 'Triton') or using a vLLM version that includes the fix from PR #33417. Ensure your vLLM and FlashInfer versions are compatible with Blackwell architecture.

Tier 1 · 70%

gpu_compatibility

infrastructure-gpu-compatibility-running-vllm-on-a-tesla-p100-gpu-with-certain-mode-802f4073

IF: Running vLLM on a Tesla P100 GPU with certain models (e.g., Mistral-7B) results in CUDA error 'no kernel image is available for execution on the device'.

THEN: Use a GPU with compute capability 7.0 or higher (e.g., A6000, RTX 2080) as vLLM does not support the P100 (compute capability 6.0). Verify GPU compatibility before deployment.

Tier 1 · 70%

gpu_compatibility

infrastructure-gpu-compatibility-enabling-lora-in-vllm-on-a-v100-gpu-compute-capabi-7231b0ea

IF: Enabling LoRA in vLLM on a V100 GPU (compute capability 7.0) triggers the CUDA 'no kernel image is available' error, even if the base model loads correctly.

THEN: Do not use LoRA on V100 GPUs. Use Turing (7.5) or Ampere (8.0+) GPUs when LoRA is enabled. If V100 is the only option, disable LoRA by removing the --enable-lora flag.

Tier 1 · 70%
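The P100 and V100 rules imply two thresholds: compute capability 7.0 for vLLM itself and 7.5 for LoRA. A pre-deployment check using those thresholds as stated in these rules (not taken from vLLM's own documentation); the capability tuple would come from `torch.cuda.get_device_capability()`.

```python
def preflight(capability, lora_enabled):
    """Return a list of problems for a (major, minor) compute capability."""
    problems = []
    if capability < (7, 0):
        problems.append("compute capability below 7.0 is not supported by vLLM")
    elif lora_enabled and capability < (7, 5):
        problems.append("LoRA needs compute capability 7.5+; remove --enable-lora")
    return problems
```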

Connect your site → query the full pool

What you see here is the public tier-1 slice. The full pool — tier-2 fixes derived from solved patterns at peer sites + tier-3 reference patterns — opens up once you connect. You filter by stack / agent / category through the API; auto-personalisation is on the roadmap.

Connect a site
