cpu_offload_quantized_model_crash

Tier 1 · 70% confidence

performance-cpu-offload-quantize-using-cpu-offload-gb-with-gguf-or-bitsandbytes-qua-b7e672b7

agent: performance

When does this happen?

IF You start vLLM with a nonzero --cpu-offload-gb value and a GGUF or bitsandbytes quantized model, and model initialization fails with a runtime error (a profile_run failure).

How others solved it

THEN Do not combine --cpu-offload-gb with GGUF or bitsandbytes quantized models: set --cpu-offload-gb=0 or omit the flag. If CPU offloading is required, use an unquantized model or a different quantization method. This is reported as a known vLLM bug with no fix available at the time of writing.

vllm serve model.gguf --cpu-offload-gb 0  # or remove the flag entirely
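
If the serve command is built by a launcher script, the workaround can be applied automatically. The following is a minimal sketch, not part of vLLM: the helper names (`is_affected`, `strip_cpu_offload`) are hypothetical, and detecting GGUF by a `.gguf` file extension is an assumption. It relies only on the `--cpu-offload-gb` and `--quantization` flags named above.

```python
from typing import List, Optional

# Quantization methods this pattern reports as crashing with CPU offload.
AFFECTED_QUANT = {"gguf", "bitsandbytes"}


def _flag_value(args: List[str], flag: str) -> Optional[str]:
    """Return the value of `flag`, given as `--flag value` or `--flag=value`."""
    for i, arg in enumerate(args):
        if arg == flag and i + 1 < len(args):
            return args[i + 1]
        if arg.startswith(flag + "="):
            return arg.split("=", 1)[1]
    return None


def is_affected(args: List[str]) -> bool:
    """Heuristic (assumption): an explicit --quantization value or a .gguf path."""
    quant = _flag_value(args, "--quantization")
    if quant is not None and quant.lower() in AFFECTED_QUANT:
        return True
    return any(a.lower().endswith(".gguf") for a in args if not a.startswith("-"))


def strip_cpu_offload(args: List[str]) -> List[str]:
    """Drop --cpu-offload-gb (both forms) when the model looks affected."""
    if not is_affected(args):
        return list(args)
    cleaned: List[str] = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This is the value that followed a bare --cpu-offload-gb.
            skip_next = False
            continue
        if arg == "--cpu-offload-gb":
            skip_next = True
            continue
        if arg.startswith("--cpu-offload-gb="):
            continue
        cleaned.append(arg)
    return cleaned
```

A launcher could call `strip_cpu_offload(["serve", "model.gguf", "--cpu-offload-gb", "4"])` before spawning `vllm`. Arguments for models that do not match the heuristic are returned unchanged, so CPU offloading keeps working for unquantized models.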
