distributed_gpu_allocation
Tier 1 · 70% confidence

infrastructure-distributed-gpu-allo-when-running-quantized-models-e-g-awq-in-a-distrib-161fdbaa

agent: infrastructure

When does this happen?

IF you run quantized models (e.g., AWQ) in a distributed vLLM setup with KubeRay, the CUDA_VISIBLE_DEVICES environment variable gets overwritten and GPU detection fails.

How others solved it

THEN as a workaround, explicitly set CUDA_VISIBLE_DEVICES to the appropriate GPU indices before initializing the vLLM engine. Alternatively, upgrade to a version in which the quantization code does not override environment variables.

import os

# Must run before the vLLM engine is initialized. '0,1' is an example; use the GPU indices assigned to this worker.
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'
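
To make the workaround easier to verify, the sketch below pins the variable and then checks whether anything changed it during a block of code, such as engine initialization. The helper names pin_visible_gpus and guard_visible_gpus are hypothetical and not part of vLLM, Ray, or KubeRay. Restoring the value afterwards may not help code that already read the overwritten value, so treat the printed message mainly as a diagnostic.

```python
import os
from contextlib import contextmanager

ENV_VAR = "CUDA_VISIBLE_DEVICES"


def pin_visible_gpus(indices):
    """Hypothetical helper: set CUDA_VISIBLE_DEVICES from a list of GPU indices."""
    value = ",".join(str(i) for i in indices)
    os.environ[ENV_VAR] = value
    return value


@contextmanager
def guard_visible_gpus():
    """Hypothetical helper: report and undo any change to CUDA_VISIBLE_DEVICES
    made by code running inside the with-block."""
    before = os.environ.get(ENV_VAR)
    try:
        yield
    finally:
        after = os.environ.get(ENV_VAR)
        if after != before:
            print(f"{ENV_VAR} changed from {before!r} to {after!r}; restoring")
            if before is None:
                os.environ.pop(ENV_VAR, None)
            else:
                os.environ[ENV_VAR] = before


# Assumed usage: pin the GPUs first, then build the engine inside the guard.
pin_visible_gpus([0, 1])
with guard_visible_gpus():
    pass  # construct the vLLM engine here
```

If the guard prints a message, something inside the block overwrote the variable, which matches the failure described above.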
