container_runtime_config · Tier 1 · 70% confidence

infrastructure-container-runtime-co-vllm-0-15-0-uses-spawn-multiprocessing-and-the-doc-8e578f0e

agent: infrastructure

When does this happen?

IF vLLM 0.15.0 uses spawn multiprocessing inside a Docker container that mounts /tmp/nvidia-mps and sets ipc: host, and the spawned workers deadlock during CUDA context initialization.

How others solved it

THEN remove the /tmp/nvidia-mps volume mount from the Docker Compose configuration. Without the mount, the container cannot see the host's MPS socket, so each spawned worker initializes its CUDA context directly on the GPU devices instead of routing through MPS.

volumes:
  - /models/vllm_cache:/root/.cache/huggingface
  - /models/encodings:/tmp/encodings:ro
#  - /tmp/nvidia-mps:/tmp/nvidia-mps  # remove this line
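
To confirm that no active reference to the MPS directory is left behind after the edit, a small check along these lines may help. It is a minimal sketch, not part of vLLM or Docker: the function name find_mps_mounts is hypothetical, and it does a plain text scan rather than parsing the YAML, so it flags any uncommented line that mentions /tmp/nvidia-mps (volume entries, but also environment variables that point at the same path).

```python
MPS_PATH = "/tmp/nvidia-mps"  # path taken from the volume mount shown above


def find_mps_mounts(compose_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every active line that mentions MPS_PATH.

    Hypothetical helper: a text scan only, it does not parse YAML.
    """
    hits = []
    for number, line in enumerate(compose_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and lines that are commented out entirely.
        if not stripped or stripped.startswith("#"):
            continue
        # Ignore anything that appears only in a trailing comment.
        active_part = stripped.split(" #", 1)[0]
        if MPS_PATH in active_part:
            hits.append((number, line))
    return hits
```

Passing the contents of the Compose file to find_mps_mounts and getting back an empty list suggests the mount is gone or commented out. The file name and location are whatever your deployment uses; none is assumed here.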
