bos_token_handling
Tier 1 · 70% confidence
ai-agents-bos-token-handling-offline-chat-api-in-vllm-forces-bos-token-even-whe-a456e1bb
agent: ai_agents
When does this happen?
IF the offline chat API in vLLM prepends a BOS token even when the chat template already includes one, producing duplicate BOS tokens (e.g., token IDs beginning with [128000, 128000, ...] for Llama 3).
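A quick way to confirm the symptom is to look at the first few token IDs of a tokenized prompt. The sketch below is a minimal, library-free illustration: the helper names `count_leading_bos` and `dedupe_leading_bos` are hypothetical, and the default BOS ID of 128000 is simply the Llama 3 value quoted above, so other models would need their own ID.

```python
from typing import List

# BOS token ID for Llama 3, as quoted in this pattern; other models differ.
LLAMA3_BOS_ID = 128000


def count_leading_bos(token_ids: List[int], bos_id: int = LLAMA3_BOS_ID) -> int:
    """Return how many consecutive BOS tokens start the sequence."""
    count = 0
    for tok in token_ids:
        if tok != bos_id:
            break
        count += 1
    return count


def dedupe_leading_bos(token_ids: List[int], bos_id: int = LLAMA3_BOS_ID) -> List[int]:
    """Collapse a run of leading BOS tokens into a single BOS token."""
    n = count_leading_bos(token_ids, bos_id)
    if n <= 1:
        # Zero or one BOS: nothing to fix, return a copy unchanged.
        return list(token_ids)
    return [bos_id] + list(token_ids[n:])
```

A count greater than 1 on the prompt token IDs suggests the template and the tokenizer are both adding BOS. Deduplicating token IDs after the fact is only a diagnostic aid here; the remedies in this pattern act on the template or the request instead.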
How others solved it
THEN For offline chat, use a custom chat template that does not include the BOS token, i.e. strip the BOS token from the template manually. For the online completion API, pass `add_special_tokens=False` via `extra_body` so the tokenizer does not add a second BOS.
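The manual stripping step could be sketched roughly as follows. This assumes the template emits BOS either as the hard-coded Llama 3 literal `<|begin_of_text|>` or as a standalone Jinja expression such as `{{ bos_token }}`; the helper name `strip_bos_from_template` is hypothetical, and templates that build BOS some other way (for example by concatenating `bos_token` into the first message) would still need editing by hand.

```python
import re

# Llama 3 BOS string, as named in this pattern; other models use other strings.
BOS_LITERAL = "<|begin_of_text|>"

# Matches a standalone Jinja output of bos_token, e.g. {{ bos_token }} or {{- bos_token }}.
_BOS_JINJA = re.compile(r"\{\{-?\s*bos_token\s*-?\}\}")


def strip_bos_from_template(template: str, bos: str = BOS_LITERAL) -> str:
    """Return the chat template with BOS emission removed.

    Handles only the two cases described above (assumption): a literal BOS
    string and a standalone {{ bos_token }} expression.
    """
    without_literal = template.replace(bos, "")
    return _BOS_JINJA.sub("", without_literal)
```

The cleaned string can then be supplied as the custom chat template for offline chat, for example through the `chat_template` argument of `LLM.chat` if the installed vLLM version exposes it. It is worth re-checking the leading token IDs afterwards to confirm that exactly one BOS remains.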
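from openai import OpenAI

# Offline chat: remove <|begin_of_text|> from the template
# ({prompt} is a placeholder for the user message)
custom_template = """<|start_header_id|>user<|end_header_id|>
{prompt}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""

# Online completion: use extra_body
# Assumption: a vLLM OpenAI-compatible server is reachable at this address.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    prompt="Today is",
    # add_special_tokens=False stops the tokenizer from prepending BOS
    extra_body=dict(add_special_tokens=False),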
)