bos_token_duplication
Tier 1 · 70% confidence


agent: ai_agents

When does this happen?

IF: When using vLLM offline generation (LLM.generate) or online completion (client.completions), the BOS token is always added during tokenization, even if the prompt already contains it. The model therefore receives two BOS tokens at the start of the prompt.
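To make the failure mode concrete, here is a toy sketch that assumes a tokenizer which prepends BOS unconditionally when `add_special_tokens=True` (the default for Hugging Face tokenizers). `TOY_VOCAB` and `toy_encode` are hypothetical stand-ins, not vLLM or Hugging Face APIs. 128000 is used as the BOS ID because it is the ID of `<|begin_of_text|>` in the Llama 3 tokenizer; other models use different IDs.

```python
BOS_TEXT = "<|begin_of_text|>"
BOS_ID = 128000  # Llama 3 <|begin_of_text|>; other models differ

# Hypothetical toy vocabulary: one ID per whitespace-separated word.
TOY_VOCAB = {"Hello,": 1, "world!": 2}


def toy_encode(prompt: str, add_special_tokens: bool = True) -> list[int]:
    """Toy tokenizer (hypothetical), illustrating the forced-BOS behavior.

    A literal BOS string at the start of the prompt is parsed as BOS_ID.
    When add_special_tokens is True, BOS_ID is then prepended without
    checking whether the prompt already starts with BOS.
    """
    ids = []
    rest = prompt
    if rest.startswith(BOS_TEXT):
        ids.append(BOS_ID)  # BOS that the user wrote into the prompt
        rest = rest[len(BOS_TEXT):]
    ids.extend(TOY_VOCAB[word] for word in rest.split())
    if add_special_tokens:
        ids.insert(0, BOS_ID)  # BOS added by the tokenizer
    return ids


print(toy_encode("Hello, world!"))             # [128000, 1, 2]
print(toy_encode(BOS_TEXT + "Hello, world!"))  # [128000, 128000, 1, 2]
print(toy_encode(BOS_TEXT + "Hello, world!", add_special_tokens=False))  # [128000, 1, 2]
```

The second call reproduces the duplicated BOS; the third shows why disabling special tokens is only correct when the prompt already carries its own BOS.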

How others solved it

THEN: Document that prompts must not include the tokenizer's BOS token (e.g., <|begin_of_text|> for Llama 3), because vLLM adds it during tokenization. Additionally, consider logging prompt token IDs at debug level in offline mode, as is already done in online mode, so users can spot the duplicate. As a workaround for online completion, users whose prompts already contain BOS can set add_special_tokens=False via extra_body in the OpenAI client.

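```python
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server running on localhost:8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Online completion: the prompt already starts with BOS, so ask vLLM
# not to add special tokens (including BOS) when tokenizing it.
completion = client.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    prompt="<|begin_of_text|>Hello, world!",
    extra_body={"add_special_tokens": False},
)
```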
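The debug-logging suggestion above can be sketched as a small check on the prompt token IDs, plus a helper that strips a literal BOS string from prompts before they are sent. The function names and the `bos_check` logger name are hypothetical, not part of vLLM; the BOS text and ID must come from your model's tokenizer.

```python
import logging

logger = logging.getLogger("bos_check")  # hypothetical logger name


def count_leading_bos(token_ids: list[int], bos_id: int) -> int:
    """Return how many consecutive BOS tokens start the sequence."""
    count = 0
    for tid in token_ids:
        if tid != bos_id:
            break
        count += 1
    return count


def check_prompt_ids(token_ids: list[int], bos_id: int) -> bool:
    """Log the prompt token IDs; warn and return False if BOS is duplicated."""
    logger.debug("prompt token IDs: %s", token_ids)
    n = count_leading_bos(token_ids, bos_id)
    if n > 1:
        logger.warning("prompt starts with %d BOS tokens (expected 1)", n)
        return False
    return True


def strip_leading_bos_text(prompt: str, bos_text: str) -> str:
    """Remove literal BOS strings from the start of a prompt, so the
    tokenizer's own BOS is the only one."""
    while prompt.startswith(bos_text):
        prompt = prompt[len(bos_text):]
    return prompt
```

In offline mode, this assumes your vLLM version exposes `prompt_token_ids` on the `RequestOutput` objects returned by `LLM.generate`, so each result can be passed to `check_prompt_ids`; verify the attribute name against your installed version.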

