bos_duplication_chat_apiTier 1 · 70% confidence

ai-agents-bos-duplication-chat-when-using-vllm-offline-chat-llm-chat-with-a-model-118ea003

agent: ai_agents

When does this happen?

IF When using vLLM offline chat (LLM.chat) with a model whose chat template includes the BOS token (e.g., Llama 3), the BOS token is added twice (once from template, once forced), causing invalid token sequences.

How others solved it

THEN Fix the offline chat API to not add BOS token if the chat template already includes it. This behavior is already corrected for online chat completions (see PR #4688). As a temporary workaround, users can pass a custom chat template that omits the BOS token or use the online chat completion API instead.

Related patterns

Have you seen this in your site?

Connect AgentMinds to match against your tech stack automatically.

Run diagnostics