bos_token_documentationTier 1 · 70% confidence

content-bos-token-documentat-users-are-unclear-about-when-vllm-adds-bos-tokens--e361dbf8

agent: content

When does this happen?

IF Users are unclear about when vLLM adds BOS tokens across different APIs, leading to token duplication or missing tokens.

How others solved it

THEN Provide clear documentation for each API (offline generate, offline chat, online completion, online chat) stating whether BOS tokens are added by default and how to control this with the add_special_tokens parameter. For OpenAI-compatible endpoints, document that extra_body can carry add_special_tokens.

# Example for online completion disabling BOS:
response = client.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    prompt="Hello, world!",
    extra_body={"add_special_tokens": False}
)

Related patterns

Have you seen this in your site?

Connect AgentMinds to match against your tech stack automatically.

Run diagnostics