bos_token_duplication · Tier 1 · 70% confidence
ai-agents-bos-token-duplicatio-when-using-vllm-offline-generate-llm-generate-or-o-914cfb51
agent: ai_agents
When does this happen?
IF When using vLLM offline generation (LLM.generate) or online completions (client.completions), the BOS token is always added during tokenization, even if the prompt already contains it. The model therefore receives a duplicated BOS token at the start of the sequence.
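To make the failure mode concrete, here is a pure-Python simulation. toy_encode is a hypothetical stand-in for a real tokenizer, assumed to behave as described above: BOS text inside the prompt is mapped to the BOS ID, and a BOS ID is then prepended unconditionally. Only the ID 128000 (Llama 3's <|begin_of_text|>) is a real value.

```python
# Toy simulation of the double-BOS effect. toy_encode is a hypothetical
# stand-in for a real tokenizer; only BOS_ID is a real Llama 3 value.
BOS_TEXT = "<|begin_of_text|>"
BOS_ID = 128000  # Llama 3 <|begin_of_text|> token ID


def toy_encode(prompt: str, add_special_tokens: bool = True) -> list[int]:
    """Tokenize like a tokenizer that parses special-token text and forces BOS."""
    ids: list[int] = []
    rest = prompt
    # A literal BOS string in the prompt text becomes the BOS ID.
    if rest.startswith(BOS_TEXT):
        ids.append(BOS_ID)
        rest = rest[len(BOS_TEXT):]
    # Placeholder for subword tokenization: one fake ID per character.
    ids.extend(ord(ch) for ch in rest)
    # Forced BOS: prepended without checking whether one is already present.
    if add_special_tokens:
        ids.insert(0, BOS_ID)
    return ids


def leading_bos_count(ids: list[int]) -> int:
    """Count consecutive BOS IDs at the start of a token sequence."""
    count = 0
    for tok in ids:
        if tok != BOS_ID:
            break
        count += 1
    return count


print(leading_bos_count(toy_encode(BOS_TEXT + "Hello")))         # 2 (duplicated)
print(leading_bos_count(toy_encode("Hello")))                    # 1
print(leading_bos_count(toy_encode(BOS_TEXT + "Hello", False)))  # 1
```

The third call mirrors the add_special_tokens=False workaround: the BOS supplied in the prompt is kept and no second one is added.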
How others solved it
THEN Document that prompts must not include the tokenizer's BOS token (e.g., <|begin_of_text|> for Llama 3), because vLLM adds it during tokenization. Additionally, consider adding debug logging of prompt token IDs in offline mode, similar to the token-ID logging the online server already does, so users can diagnose the problem. As a workaround for online completions, users can set add_special_tokens=False via extra_body in the OpenAI client.
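The documentation rule and the suggested offline diagnostics could look roughly like the sketch below. Both helpers, strip_leading_bos and log_prompt_ids, are hypothetical names and not part of vLLM; the BOS string and ID are Llama 3's and differ for other models. In offline mode, the IDs to check could come, for example, from the prompt_token_ids field of the RequestOutput objects that LLM.generate returns.

```python
import logging

# Llama 3 values; other models use different BOS strings and IDs.
BOS_TEXT = "<|begin_of_text|>"
BOS_ID = 128000

logger = logging.getLogger("bos_check")  # hypothetical logger name


def strip_leading_bos(prompt: str, bos_text: str = BOS_TEXT) -> str:
    """Drop BOS text from the start of a prompt so the forced BOS is the only one."""
    while bos_text and prompt.startswith(bos_text):
        prompt = prompt[len(bos_text):]
    return prompt


def log_prompt_ids(prompt_ids: list[int], bos_id: int = BOS_ID) -> bool:
    """Log prompt token IDs at DEBUG level; warn and return True on a doubled BOS."""
    logger.debug("prompt token ids: %s", prompt_ids)
    doubled = len(prompt_ids) >= 2 and prompt_ids[0] == bos_id and prompt_ids[1] == bos_id
    if doubled:
        logger.warning("prompt starts with a duplicated BOS token (id %d)", bos_id)
    return doubled


logging.basicConfig(level=logging.DEBUG)
print(strip_leading_bos(BOS_TEXT + "Hello, world!"))  # Hello, world!
print(log_prompt_ids([BOS_ID, BOS_ID, 42]))  # True (42 is an arbitrary non-BOS ID)
```

Stripping the BOS text on the client side keeps the default tokenization path untouched, whereas the logging helper only reports the problem and leaves the fix to the caller.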
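```python
# For online completion, disable the forced BOS token:
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server on localhost:8000; adjust as needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    prompt="<|begin_of_text|>Hello, world!",  # prompt already carries its own BOS
    extra_body={"add_special_tokens": False},
)
```
Related patterns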
github · ai-agents-github-support-for-reasoning-in-openrouter-and-deepseek-p-48add6f0 · Tier 1 · 40%
github · ai-agents-github-server-capabilities-not-affecting-the-stream-of-ca-ca806d9e · Tier 1 · 40%
github · ai-agents-github-patrick-von-platen-cd4d7ceb · Tier 1 · 40%
model_loading · ai-agents-model-loading-loading-a-gemma-3-checkpoint-with-automodelforcaus-cc5b7a71 · Tier 1 · 70%
github · ai-agents-github-runtimeerror-cuda-error-cublas-status-not-initiali-9b601119 · Tier 1 · 40%
github · ai-agents-github-bug-frequent-ide-disconnections-disrupting-workflo-e9f35aca · Tier 1 · 40%