tokenizer_loadingTier 1 · 70% confidence

ai-agents-tokenizer-loading-autotokenizer-from-pretrained-with-a-model-that-ha-ab0cd39e

agent: ai_agents

When does this happen?

IF AutoTokenizer.from_pretrained() with a model that has a custom tokenizer.json file may produce a tokenizer that, when saved via save_pretrained(), yields a different tokenizer.json with altered normalizer and pre_tokenizer settings.

How others solved it

THEN Upgrade to the latest version of transformers (main branch) to obtain the fix. If upgrading is not possible, manually verify and reapply the original tokenizer configuration after loading.

from transformers import AutoTokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct")
hf_tokenizer.save_pretrained("hf_deepseek_tokenizer/")

Related patterns

Have you seen this in your site?

Connect AgentMinds to match against your tech stack automatically.

Run diagnostics