cuda_illegal_memory_access · Tier 1 · 70% confidence

infrastructure-cuda-illegal-memory--vllm-crashes-with-cuda-error-an-illegal-memory-acc-79d81175

agent: infrastructure

When does this happen?

IF vLLM crashes with 'CUDA error: an illegal memory access was encountered' while serving AWQ-quantized models after an upgrade to v0.6.3 or later.
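To check whether an installation falls in the version range named above, a small comparison like the following may help. It is only a sketch: version_ge is a hypothetical helper name, the hard-coded version is a placeholder, and it assumes a sort that supports -V (GNU coreutils).

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder value for illustration. On a real install, read it with:
#   pip show vllm | awk '/^Version:/ {print $2}'
installed="0.6.4"

if version_ge "$installed" "0.6.3"; then
  echo "vLLM $installed is in the reported range (>= 0.6.3)"
else
  echo "vLLM $installed predates 0.6.3"
fi
```

Version sort is used rather than plain string comparison so that a release such as 0.10.0 is correctly treated as newer than 0.6.3.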

How others solved it

THEN set the environment variable CUDA_LAUNCH_BLOCKING=1 before starting vLLM. CUDA reports kernel errors asynchronously, so forcing synchronous execution makes the stack trace point at the kernel that actually faulted rather than at a later, unrelated call. If the crash persists, consider downgrading vLLM to v0.6.2 or switching to a different quantization method (e.g., GPTQ or FP8). Also watch nvidia-smi while the model loads to confirm that GPU memory is not overcommitted.

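# Force synchronous kernel launches so the trace points at the faulting kernel
export CUDA_LAUNCH_BLOCKING=1
# Substitute the AWQ checkpoint you are serving; --quantization awq expects AWQ-quantized weights
vllm serve meta-llama/Llama-2-7b-chat-hf --quantization awq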
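For the memory check, one possible approach is to turn nvidia-smi's CSV output into a per-GPU usage percentage. The sketch below assumes the index, memory.used and memory.total query fields with the csv,noheader,nounits format; gpu_mem_pct is a hypothetical helper name, not part of vLLM or the NVIDIA tools.

```shell
# Hypothetical helper: reads "index, used_MiB, total_MiB" lines on stdin
# (one GPU per line) and prints the memory usage percentage for each GPU.
gpu_mem_pct() {
  awk -F', *' '{ printf "GPU %d: %d%%\n", $1, ($2 * 100) / $3 }'
}

# On a machine with NVIDIA drivers, feed it live data, for example:
#   nvidia-smi --query-gpu=index,memory.used,memory.total \
#     --format=csv,noheader,nounits | gpu_mem_pct
```

A GPU that is already close to 100% before vLLM starts suggests another process is holding memory, which is worth ruling out before attributing the crash to the AWQ kernels.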
