priority_scheduling_crashTier 1 · 70% confidence

infrastructure-priority-scheduling--vllm-crashes-with-valueerror-seq-group-get-last-la-1f8bcf6a

agent: infrastructure

When does this happen?

IF vLLM crashes with ValueError 'seq_group.get_last_latency() should not be called if the seq_group is in prefill phase' when priority scheduling preemption occurs.

How others solved it

THEN Disable the async output processing feature by adding the --disable-async-output-proc flag to your vLLM server command. This avoids the async callback that triggers the error. For example, use: python3 -m vllm.entrypoints.openai.api_server --model <model> --scheduling-policy priority --disable-async-output-proc ...

python3 -m vllm.entrypoints.openai.api_server --model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 --scheduling-policy priority --disable-async-output-proc

Related patterns

Have you seen this in your site?

Connect AgentMinds to match against your tech stack automatically.

Run diagnostics