We don't publish your competitive advantage.
AgentMinds' cross-site pattern pool is the moat. Site-specific learned patterns — the things our agents discovered after fixing real production issues across the network — are never shown publicly. They are delivered, filtered, and personalised to YOUR stack only when YOUR site is connected. The 12 examples below are tier-1 generic web hygiene rules; they're here so you can sanity-check the format. The real value lives behind your API key.
IF: When using vllm-openai Docker image version 0.9.0 on NVIDIA H100 GPUs with the Llama-4-Maverick FP8 model, loading fails with 'CUDA error: no kernel image is available for execution on the device'.
THEN: Downgrade to the vllm-openai Docker image version 0.8.5.post1 or earlier (e.g., v0.8.4). Alternatively, use the Llama-4-Scout model (FP8 or non-FP8), which works in v0.9.0. This issue appears to be specific to the Maverick architecture in v0.9.0 and is not present in prior releases.
IF: When using o1-preview, o1-mini, or Perplexity models that do not support the 'stop' parameter, crewAI's default call to litellm fails with an 'Unsupported parameter: stop' BadRequestError.
THEN: Before passing parameters to litellm, check whether the model supports the 'stop' parameter. If not (e.g., o1 series, Perplexity), remove 'stop' from the kwargs. This can be done by patching litellm.completion to delete the 'stop' key, or by updating crewAI's LLM class to conditionally omit the default stop=['\nObservation:'] for such models.
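The conditional-omit approach can be sketched as a small pre-processing step on the keyword arguments. This is a minimal sketch under stated assumptions, not crewAI's or litellm's actual code: the helper name `strip_unsupported_stop` and the prefix tuple `NO_STOP_MODEL_PREFIXES` are hypothetical, and the prefixes would need to match the model names you actually pass.

```python
from typing import Any, Dict

# Hypothetical list: model-name prefixes assumed to reject the 'stop' parameter.
NO_STOP_MODEL_PREFIXES = ("o1-preview", "o1-mini", "perplexity/")


def strip_unsupported_stop(model: str, kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs without 'stop' when the model is assumed not to support it."""
    cleaned = dict(kwargs)  # copy so the caller's dict is not mutated
    if model.startswith(NO_STOP_MODEL_PREFIXES):
        cleaned.pop("stop", None)  # no error if 'stop' was never set
    return cleaned


if __name__ == "__main__":
    params = {"temperature": 0.2, "stop": ["\nObservation:"]}
    print(strip_unsupported_stop("o1-mini", params))
    print(strip_unsupported_stop("gpt-4o", params))
```

The cleaned dictionary would then be unpacked into the completion call in place of the original kwargs. Matching on a prefix list is the simplest option; it has to be updated by hand as models are added.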
IF: When loading GLM-4.5-FP8 or similar models that require embedding support, the UnquantizedLinearMethod class raises NotImplementedError because it lacks the 'embedding' method.
THEN: Apply the fix from PR #22257 (https://github.com/vllm-project/vllm/pull/22257), which adds the missing 'embedding' method to UnquantizedLinearMethod, or upgrade to a vLLM version that includes this fix (e.g., >0.10.0).
IF: When running a GLM-4.5-FP8 model with vLLM 0.10.0, a NotImplementedError is raised: The class UnquantizedLinearMethod must implement the 'embedding' method.
THEN: Apply the fix from PR #22257 on GitHub (https://github.com/vllm-project/vllm/pull/22257), which adds the missing 'embedding' method to the UnquantizedLinearMethod class. Alternatively, upgrade vLLM to a later version that includes this patch. Until resolved, avoid serving GLM-4.5-FP8 models with vLLM 0.10.0.
IF: When using the GLM-4.5-FP8 model with vLLM 0.10.0, the error 'UnquantizedLinearMethod must implement the embedding method' occurs.
THEN: Upgrade vLLM to a version that includes the fix from PR #22257, or apply the patch manually. Ensure the model's linear method implementation includes an embedding method for unquantized layers.
IF: Upgrading transformers to 4.50.0 causes Florence2 and similar custom models to fail with 'ValueError: Unrecognized configuration class' when using AutoModelForCausalLM.
THEN: Downgrade transformers to version 4.49.0 or wait for an upstream fix. As a temporary workaround, pin the version with 'pip install transformers==4.49.0'.
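A pin can be backed by a startup check that warns when the affected release is installed. The sketch below uses only the standard library; the function names and the exact-match rule are assumptions for illustration, with 4.50.0 taken from the pattern above as the only affected version.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Assumption: only the release named in the pattern above is treated as affected.
AFFECTED_VERSION = "4.50.0"


def installed_version(package: str = "transformers") -> Optional[str]:
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_affected(installed: Optional[str], affected: str = AFFECTED_VERSION) -> bool:
    """True when the installed version matches the affected release exactly."""
    return installed is not None and installed.strip() == affected


if __name__ == "__main__":
    current = installed_version()
    if is_affected(current):
        print(f"transformers {current} is installed; consider pinning transformers==4.49.0")
    else:
        print(f"transformers version: {current}")
```

An exact match keeps the check conservative: it does not guess whether later releases carry the same regression.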
IF: Pre-built vllm wheels for gpt-oss only support sm90/sm100 (Hopper and Blackwell GPUs), causing failures on Ampere (A100, RTX 3090) and Ada Lovelace (L40S) GPUs.
THEN: Build vllm from source using the instructions in PR #22259, reinstall triton==3.4.0, and set the environment variable VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1. Note that even with this workaround, inference may fail with a CUDA kernel image error; official support is not yet available for these architectures.
IF: When loading the FP8 quantized version of the Qwen3-Next model (e.g., Qwen/Qwen3-Next-80B-A3B-Instruct-FP8) in vLLM, the engine fails to start with a ValueError: 'Detected some but not all shards of model.layers.0.linear_attn.in_proj are quantized. All shards of fused layers to have the same precision.'
THEN: Deploy the non-FP8 (BF16/FP16) version of the Qwen3-Next model instead. For example, use 'Qwen/Qwen3-Next-80B-A3B-Instruct' instead of the FP8 variant. Monitor the upstream vLLM issue tracker for a permanent fix that resolves the shard quantization inconsistency.
IF: Claude Code fails with a 500 error when using a Vercel AI Gateway model with thinking enabled, because the 'thinking' parameter is not supported for non-Anthropic models.
THEN: Set 'litellm.drop_params=True' in your LiteLLM configuration to drop unsupported parameters, or pass 'allowed_openai_params=["thinking"]' in the request to dynamically allow the thinking parameter. For the proxy, set 'drop_params: true' under 'litellm_settings' in your config.
IF: When loading a model with an unsupported quantization type (e.g., fp8) using AutoModelForCausalLM.from_pretrained, a ValueError 'Unknown quantization type' occurs.
THEN: Remove or modify the 'quantization_config' attribute in the model's config.json file before loading. Alternatively, patch the transformers quantization check to skip unknown types. For example, load the config, delete the key, save, then load the model normally.
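The load, delete, save sequence can be sketched with the standard library alone. This is an illustrative sketch, not part of transformers: the function name `strip_quantization_config` and the `.bak` backup file are assumptions, and it covers only the config edit. Whether the weights then load usefully depends on the checkpoint.

```python
import json
from pathlib import Path


def strip_quantization_config(model_dir: str, backup: bool = True) -> bool:
    """Remove 'quantization_config' from <model_dir>/config.json.

    Returns True if the key was present and removed, False otherwise.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if "quantization_config" not in config:
        return False  # nothing to change
    if backup:
        # Keep a copy of the original config so the edit can be undone.
        backup_path = config_path.with_name("config.json.bak")
        backup_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    del config["quantization_config"]
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return True
```

After the edit, the local directory would be passed to AutoModelForCausalLM.from_pretrained as usual. Writing a backup first is a deliberate choice, since the original quantization settings are otherwise lost.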
IF: Loading a Gemma3 model for text-only purposes fails in Transformers v4.49.0 because the architecture is not recognized.
THEN: Install Transformers from the main branch (future v4.50) or wait for the official release that adds Gemma3 support to AutoModelForCausalLM.
IF: Loading a bitsandbytes 4-bit quantized Llama model (e.g., unsloth/Llama-3.3-70B-Instruct-bnb-4bit) in vLLM causes a KeyError during weight loading due to unsupported parameter names like 'layers.0.mlp.down_proj.weight.absmax'.
THEN: Use a quantization format that vLLM officially supports, such as AWQ or GPTQ, instead of bitsandbytes. If the model is already quantized with bitsandbytes, either convert it to a supported format using external tools or wait for vLLM to add bitsandbytes support. Alternatively, serve the model with a different inference engine that supports bitsandbytes.
Connect your site → query the full pool
What you see here is the public tier-1 slice. The full pool — tier-2 fixes derived from solved patterns at peer sites + tier-3 reference patterns — opens up once you connect. You filter by stack / agent / category through the API; auto-personalisation is on the roadmap.