bos_token_handling · Tier 1 · 70% confidence

ai-agents-bos-token-handling-offline-chat-api-in-vllm-forces-bos-token-even-whe-a456e1bb

agent: ai_agents

When does this happen?

IF the offline chat API in vLLM adds a BOS token even though the chat template already includes one, the prompt ends up with duplicate BOS tokens (e.g., token IDs starting with [128000, 128000, ...] for Llama 3, where 128000 is `<|begin_of_text|>`).
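To confirm the symptom, inspect the prompt token IDs, for example `prompt_token_ids` on the `RequestOutput` objects vLLM returns. A minimal sketch, assuming Llama 3's BOS ID of 128000; `count_leading_bos` and `has_duplicate_bos` are hypothetical helpers, and the non-BOS IDs in the example are arbitrary placeholders:

```python
def count_leading_bos(token_ids: list[int], bos_id: int = 128000) -> int:
    """Count how many BOS tokens appear at the very start of a prompt."""
    count = 0
    for tok in token_ids:
        if tok != bos_id:
            break
        count += 1
    return count


def has_duplicate_bos(token_ids: list[int], bos_id: int = 128000) -> bool:
    """Return True when the prompt starts with more than one BOS token."""
    return count_leading_bos(token_ids, bos_id) > 1


# 128000 is Llama 3's <|begin_of_text|>; 101 and 102 are placeholder IDs.
print(has_duplicate_bos([128000, 128000, 101, 102]))  # True
print(has_duplicate_bos([128000, 101, 102]))          # False
```

Only the leading run is counted, since a BOS ID appearing later in the sequence is a different problem from the duplicated prefix described here.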

How others solved it

THEN use a custom chat template that omits the BOS token, so that only the tokenizer adds it; for Llama 3, this means removing `<|begin_of_text|>` from the template. For the online completions API, pass `add_special_tokens=False` via `extra_body` when the prompt text already starts with the BOS token, so the server does not prepend a second one.

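from openai import OpenAI

# Offline chat: Llama 3 prompt template with <|begin_of_text|> removed, so the
# tokenizer adds the only BOS token (e.g. when the text is passed to LLM.generate).
custom_template = (
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{prompt}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
prompt_text = custom_template.format(prompt="What is the capital of France?")

# Online completion: the prompt already starts with the BOS text, so ask the
# vLLM server not to add special tokens again (vLLM-specific extra parameter).
# Assumes a local vLLM OpenAI-compatible server on the default port.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    prompt="<|begin_of_text|>Today is",
    extra_body=dict(add_special_tokens=False),
)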
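The offline template above covers a single user turn. A minimal sketch of the same idea for multi-turn conversations, assuming the standard Llama 3 header and `<|eot_id|>` layout; `build_llama3_prompt` is a hypothetical helper, not a vLLM API. The rendered string never contains `<|begin_of_text|>`, so when it is tokenized as a plain text prompt the tokenizer should contribute the single BOS token:

```python
BOS = "<|begin_of_text|>"


def build_llama3_prompt(messages: list[dict[str, str]],
                        add_generation_prompt: bool = True) -> str:
    """Render Llama 3 chat turns WITHOUT a leading <|begin_of_text|>.

    The tokenizer is expected to add the one BOS token itself.
    """
    parts = []
    for msg in messages:
        # Each turn: role header, blank line, content, end-of-turn marker.
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = build_llama3_prompt([
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": "Hi!"},
])
assert BOS not in prompt
print(prompt)
```

Unlike Meta's reference Jinja template, this sketch does not trim message content; add `.strip()` if you need byte-for-byte parity.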
