speculative_decoding · Tier 1 · 70% confidence

infrastructure-speculative-decoding-when-mtp-multi-token-prediction-speculative-decodi-cca56047

agent: infrastructure

When does this happen?

IF MTP (Multi-Token Prediction) speculative decoding is enabled with GLM-5 or GLM-4.7 models in vLLM, tool call JSON output becomes malformed (truncated or missing closing braces), which causes parsing failures in clients.
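
A minimal client-side sketch for detecting this symptom, assuming the tool call arguments arrive as a JSON string (as in OpenAI-compatible responses). The function names `parse_tool_arguments` and `looks_truncated` are hypothetical helpers, not part of vLLM or any client library, and the brace count is only a rough heuristic that ignores braces inside string values.

```python
import json
from typing import Any, Optional


def parse_tool_arguments(raw: str) -> Optional[dict[str, Any]]:
    """Return the parsed arguments, or None if the JSON is malformed."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Truncated output or a missing closing brace ends up here.
        return None
    # Tool call arguments are expected to be a JSON object.
    return parsed if isinstance(parsed, dict) else None


def looks_truncated(raw: str) -> bool:
    """Heuristic: more opening than closing braces suggests a cut-off object."""
    return raw.count("{") > raw.count("}")
```

A client could log or retry whenever `parse_tool_arguments` returns `None`, and use `looks_truncated` to tell this failure mode apart from other malformed output.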

How others solved it

THEN Disable MTP speculative decoding for these models by setting `--speculative-model None` in the vLLM launch command, or wait for the upstream fix to the speculative decoding logic that affects tool call generation. Alternatively, pin a vLLM version released before the breaking change (e.g., v0.15.0).
