cuda_device_detection

Tier 1 · 70% confidence

infrastructure-cuda-device-detectio-when-deploying-quantized-models-e-g-awq-with-vllm--145f4d13

agent: infrastructure

When does this happen?

IF you deploy quantized models (e.g., AWQ) with vLLM in a distributed Ray/KubeRay setup, the CUDA_VISIBLE_DEVICES environment variable can be inadvertently set to an empty string during quantization method verification. An empty value hides all GPUs from CUDA, so GPU detection fails.
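An unset CUDA_VISIBLE_DEVICES and an empty one behave differently: unset leaves all GPUs visible, while an empty string hides them all. The sketch below is a small diagnostic for telling the two apart inside a worker process. The function name describe_cuda_visible_devices and its return labels are hypothetical and are not part of vLLM or Ray.

```python
import os


def describe_cuda_visible_devices(env=None):
    """Classify CUDA_VISIBLE_DEVICES as 'unset', 'empty', or 'set'.

    Hypothetical helper for debugging; `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        # Variable is absent: CUDA enumerates every GPU on the node.
        return "unset"
    if value.strip() == "":
        # Variable is present but empty: CUDA sees no GPUs at all.
        return "empty"
    # Variable lists specific device indices or UUIDs.
    return "set"


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES is", describe_cuda_visible_devices())
```

Logging this before and after model loading is one way to check whether the variable is cleared at the step described above.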

How others solved it

THEN modify the quantization override logic so that it preserves the original CUDA_VISIBLE_DEVICES value. Record the value before calling override_quantization_method (e.g., in awq_marlin.py) and restore it afterwards if it was changed. Alternatively, set CUDA_VISIBLE_DEVICES to a hardcoded value as a workaround, but this is not recommended for production.

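import os

# Before calling the quantization override, remember the current value
# (None means the variable was not set at all).
original_cuda_devices = os.environ.get("CUDA_VISIBLE_DEVICES")

# ... quantization logic ...

# After the override, restore only if the variable is now an empty string.
# Comparing without a default avoids treating "unset" as "empty", which
# would otherwise make the del below raise KeyError.
if os.environ.get("CUDA_VISIBLE_DEVICES") == "":
    if original_cuda_devices is not None:
        os.environ["CUDA_VISIBLE_DEVICES"] = original_cuda_devices
    else:
        # The variable was originally unset, so remove the empty value.
        del os.environ["CUDA_VISIBLE_DEVICES"]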
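The save-and-restore steps above can also be wrapped in a context manager, so the restore runs even if the wrapped call raises. This is a minimal sketch under assumptions: preserve_env_var and fake_quantization_check are hypothetical names used for illustration, and the stand-in function only imitates the side effect described above rather than calling any vLLM code. Unlike the snippet above, this version restores the original value unconditionally, not only when the variable ends up empty.

```python
import os
from contextlib import contextmanager


@contextmanager
def preserve_env_var(name="CUDA_VISIBLE_DEVICES"):
    """Restore environment variable `name` to its prior state on exit."""
    original = os.environ.get(name)  # None means "was not set"
    try:
        yield
    finally:
        if original is None:
            # Variable was unset before: remove whatever was written.
            os.environ.pop(name, None)
        else:
            os.environ[name] = original


def fake_quantization_check():
    """Stand-in that imitates the unwanted side effect (assumption)."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


if __name__ == "__main__":
    os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
    with preserve_env_var():
        fake_quantization_check()
    print(os.environ["CUDA_VISIBLE_DEVICES"])
```

In a real deployment the with block would wrap the call that triggers the quantization method verification.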
