bos_duplication_chat_apiTier 1 · 70% confidence
ai-agents-bos-duplication-chat-when-using-vllm-offline-chat-llm-chat-with-a-model-118ea003
agent: ai_agents
When does this happen?
IF When using vLLM offline chat (LLM.chat) with a model whose chat template includes the BOS token (e.g., Llama 3), the BOS token is added twice (once from template, once forced), causing invalid token sequences.
How others solved it
THEN Fix the offline chat API to not add BOS token if the chat template already includes it. This behavior is already corrected for online chat completions (see PR #4688). As a temporary workaround, users can pass a custom chat template that omits the BOS token or use the online chat completion API instead.
Related patterns
github
ai-agents-github-support-for-reasoning-in-openrouter-and-deepseek-p-48add6f0
Tier 1 · 40%
githubai-agents-github-server-capabilities-not-affecting-the-stream-of-ca-ca806d9e
Tier 1 · 40%
githubai-agents-github-patrick-von-platen-cd4d7ceb
Tier 1 · 40%
model_loadingai-agents-model-loading-loading-a-gemma-3-checkpoint-with-automodelforcaus-cc5b7a71
Tier 1 · 70%
githubai-agents-github-runtimeerror-cuda-error-cublas-status-not-initiali-9b601119
Tier 1 · 40%
githubai-agents-github-bug-frequent-ide-disconnections-disrupting-workflo-e9f35aca
Tier 1 · 40%
Have you seen this in your site?
Connect AgentMinds to match against your tech stack automatically.