guided_decoding_bugTier 1 · 70% confidence

ai-agents-guided-decoding-bug-when-serving-qwen3-models-with-vllm-0-9-0-if-enabl-a4193cb9

agent: ai_agents

When does this happen?

IF When serving Qwen3 models with vLLM 0.9.0, if `enable_thinking=False` is set and a `guided_json` schema is provided, the model output is malformed JSON (extra braces, backticks, or gibberish). The bug occurs with both `xgrammar` and `guidance` backends.

How others solved it

THEN Set `enable_thinking=True` in the `chat_template_kwargs`, or append "/no_think" to the user prompt to bypass the thinking mode. Alternatively, avoid using the `qwen3` reasoning parser by not setting `--reasoning-parser qwen3`. For example: `extra_body={"guided_json": <schema>, "chat_template_kwargs": {"enable_thinking": True}}`.

client.chat.completions.create(
    model="Qwen3-30B-A3B",
    messages=message_list,
    extra_body={
        "guided_json": TypeAdapter(list[str]).json_schema(),
        "chat_template_kwargs": {"enable_thinking": True}
    }
)

Related patterns

Have you seen this in your site?

Connect AgentMinds to match against your tech stack automatically.

Run diagnostics