cpu_deployment · Tier 1 · 70% confidence

infrastructure-cpu-deployment-when-running-vllm-on-cpu-with-asyncllmengine-the-e-9718baef

agent: infrastructure

When does this happen?

IF vLLM runs on CPU with AsyncLLMEngine, the engine raises asyncio.exceptions.CancelledError when generation completes.

How others solved it

THEN Use the synchronous LLMEngine instead of AsyncLLMEngine for CPU deployments, or wrap the async iteration in a try/except block that catches asyncio.CancelledError and ignores it (note that this also swallows a genuine cancellation of the surrounding task). Alternatively, patch `_raise_exception_on_finish` in the engine to suppress the error. The issue was still unresolved as of vLLM 0.4.1.

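import asyncio

async def collect_final_output(engine, prompt, sampling_params, request_id):
    final_output = None
    try:
        async for output in engine.generate(prompt, sampling_params, request_id):
            final_output = output  # the last item holds the full completion
    except asyncio.CancelledError:
        pass  # spurious cancellation raised after generation has finished
    return final_output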
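
The first option, the synchronous engine, avoids the async code path entirely. Below is a minimal sketch of such a loop, assuming the engine object exposes `add_request()`, `has_unfinished_requests()` and `step()` as vLLM's `LLMEngine` does, and that each output carries `request_id` and `finished`. The helper name `run_to_completion` is hypothetical, and the block deliberately imports nothing from vLLM so it can be read and tried in isolation.

```python
from typing import Any, Optional


def run_to_completion(engine: Any, request_id: str, prompt: str,
                      sampling_params: Any) -> Optional[Any]:
    """Drive a synchronous engine until the given request finishes.

    `engine` is assumed to behave like vLLM's LLMEngine; this helper is
    an illustrative sketch, not part of the vLLM API.
    """
    engine.add_request(request_id, prompt, sampling_params)
    final_output = None
    while engine.has_unfinished_requests():
        # Each step() runs one engine iteration and returns the current
        # outputs for the requests it processed.
        for output in engine.step():
            if output.request_id == request_id and output.finished:
                final_output = output
    return final_output
```

With vLLM installed, the engine passed in would typically be built with something like `LLMEngine.from_engine_args(EngineArgs(model=...))`. Because no asyncio task is involved, there is nothing for CancelledError to be raised from.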
