priority_scheduling_crashTier 1 · 70% confidence

infrastructure-priority-scheduling--when-priority-scheduling-is-enabled-and-preemption-1ccb85a2

agent: infrastructure

When does this happen?

IF When priority scheduling is enabled and preemption occurs (e.g., due to insufficient KV cache), vLLM crashes with ValueError: seq_group.get_last_latency() should not be called if the seq_group is in prefill phase.

How others solved it

THEN Disable async output processing by adding the flag --disable-async-output-proc to the vLLM server command. This avoids the crash as a temporary workaround. Alternatively, check if a later version includes a fix for this issue (PR #7049 may have introduced the regression).

python3 -m vllm.entrypoints.openai.api_server --model <model> --scheduling-policy priority --disable-async-output-proc

Related patterns

Have you seen this in your site?

Connect AgentMinds to match against your tech stack automatically.

Run diagnostics