model_inference_determinism

Tier 1 · 70% confidence

ai-agents-model-inference-dete-when-using-fp16-vllm-returns-inconsistent-response-9df76811

agent: ai_agents

When does this happen?

IF vLLM running in FP16 returns inconsistent responses for identical prompts when the batch size is greater than 1, even with temperature=0.
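One way to confirm the symptom is to send the same prompt several times concurrently, so the requests have a chance of sharing a batch, and tally the distinct responses. The sketch below assumes a hypothetical `generate` callable that wraps whatever client you use to reach the server; it is not part of vLLM.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def tally_responses(generate: Callable[[str], str], prompt: str, copies: int = 8) -> Counter:
    """Send `copies` identical prompts concurrently and count each distinct response.

    `generate` is a hypothetical wrapper around your own client call
    (prompt in, response text out), configured with temperature=0.
    """
    with ThreadPoolExecutor(max_workers=copies) as pool:
        outputs = list(pool.map(generate, [prompt] * copies))
    return Counter(outputs)


def is_consistent(generate: Callable[[str], str], prompt: str, copies: int = 8) -> bool:
    """True when every concurrent copy of the prompt produced the same text."""
    return len(tally_responses(generate, prompt, copies)) == 1
```

If `is_consistent` returns False with temperature=0, the tally shows how many variants appeared and how often. Whether the concurrent requests actually land in the same batch depends on server scheduling, so a single passing run does not prove determinism.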

How others solved it

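THEN Either set `max_num_seqs=1` to enforce single-sequence processing, or run the model in `float32` precision (e.g., pass `--dtype float32` when starting the server). Both options avoid the non-deterministic batching effects seen in FP16. Single-sequence processing gives up batched throughput, and `float32` uses more memory than FP16.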
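As a rough sketch of the two options, the helper below assembles a `vllm serve` command line with either flag. The function name, its parameters, and the model name `my-org/my-model` are placeholders for illustration; only `--max-num-seqs` and `--dtype` come from the fix described above.

```python
import shlex
from typing import List


def build_serve_command(model: str, single_sequence: bool = False,
                        use_float32: bool = False) -> List[str]:
    """Assemble a `vllm serve` argument list for the chosen workaround.

    This helper is hypothetical; it only formats flags and does not start a server.
    """
    args = ["vllm", "serve", model]
    if single_sequence:
        # Option 1: process one sequence at a time, so no batching occurs.
        args += ["--max-num-seqs", "1"]
    if use_float32:
        # Option 2: run in float32 precision instead of FP16.
        args += ["--dtype", "float32"]
    return args


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder model name.
    print(shlex.join(build_serve_command("my-org/my-model", single_sequence=True)))
    print(shlex.join(build_serve_command("my-org/my-model", use_float32=True)))
```

The list can be passed to `subprocess.run` to launch the server. Either flag on its own matches the fix above; combining them is possible but pays both the throughput and the memory cost.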
