token_handling
Tier 1 · 70% confidence
content-token-handling-when-using-vllm-openai-compatible-completion-api-c-091c48a8
agent: content
When does this happen?
IF you use vLLM's OpenAI-compatible completions API (`client.completions`) with a prompt that already begins with the BOS token (e.g., `<|begin_of_text|>`), the tokenizer prepends a second BOS, so the model receives two BOS tokens at the start of its input.
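The duplication can be reproduced without a GPU or a real tokenizer. The sketch below is a toy model, not vLLM's or Hugging Face's code. `toy_encode` and `leading_bos_count` are hypothetical helpers, all token IDs other than `BOS_ID` are fake, and `BOS_ID = 128000` is assumed to be the Llama 3 ID for `<|begin_of_text|>`. It mimics the two behaviors that collide: a literal BOS string in the prompt is parsed as a special token, and `add_special_tokens=True` prepends another one.

```python
# Toy model of the double-BOS problem (hypothetical helpers, fake token ids).
BOS = "<|begin_of_text|>"
BOS_ID = 128000  # assumed Llama 3 id for <|begin_of_text|>


def toy_encode(text: str, add_special_tokens: bool = True) -> list[int]:
    """Hypothetical stand-in for a tokenizer call.

    A literal BOS string at the start of the text is parsed into BOS_ID,
    every other character becomes a fake id, and when add_special_tokens
    is True one more BOS is prepended, as the Llama 3 tokenizer does.
    """
    ids: list[int] = []
    if text.startswith(BOS):
        ids.append(BOS_ID)
        text = text[len(BOS):]
    ids.extend(ord(ch) for ch in text)  # fake ids for ordinary characters
    if add_special_tokens:
        ids.insert(0, BOS_ID)
    return ids


def leading_bos_count(ids: list[int]) -> int:
    """Count consecutive BOS ids at the start of a token sequence."""
    count = 0
    for tok in ids:
        if tok != BOS_ID:
            break
        count += 1
    return count


prompt = BOS + "Tell me a story."
print(leading_bos_count(toy_encode(prompt)))                            # 2
print(leading_bos_count(toy_encode(prompt, add_special_tokens=False)))  # 1
```

The first call shows the doubled BOS. Turning off special tokens leaves only the BOS that was already in the prompt.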
How others solved it
THEN pass `extra_body={'add_special_tokens': False}` to `completions.create` so vLLM does not prepend its own BOS token. With this flag set, the prompt itself must contain the BOS token if the model expects one. For offline chat, use a custom chat template that omits the BOS token to avoid the same duplication.
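When prompts come from several sources, some with BOS and some without, a small guard can enforce "exactly one BOS" before the request is sent. The following is a minimal sketch. `completion_kwargs` is a hypothetical helper, and the default BOS string assumes a Llama 3 model. It strips any leading BOS strings, re-adds exactly one, and sets `add_special_tokens=False` so vLLM adds nothing further.

```python
BOS = "<|begin_of_text|>"  # Llama 3 BOS string; adjust for other models


def completion_kwargs(model: str, prompt: str, bos: str = BOS) -> dict:
    """Hypothetical helper: build kwargs for client.completions.create so
    the model sees exactly one BOS token.

    Leading BOS strings are stripped and a single one is re-added; vLLM is
    then told not to prepend its own via add_special_tokens=False.
    """
    while bos and prompt.startswith(bos):  # guard against an empty bos string
        prompt = prompt[len(bos):]
    return {
        "model": model,
        "prompt": bos + prompt,
        "extra_body": {"add_special_tokens": False},
    }


kwargs = completion_kwargs("meta-llama/Meta-Llama-3-8B-Instruct", "Tell me a story.")
print(kwargs["prompt"])  # <|begin_of_text|>Tell me a story.
```

With an OpenAI client pointed at the vLLM server, the result can be passed straight through as `client.completions.create(**kwargs)`. Normalizing in one place keeps the flag and the prompt contents consistent.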
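from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server on localhost:8000 (vLLM's default port)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# The prompt already starts with BOS, so tell vLLM not to add special tokens
completion = client.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    prompt="<|begin_of_text|>Tell me a story.",
    extra_body=dict(add_special_tokens=False),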
)