tokenizer_config_inconsistency

Tier 1 · 70% confidence

ai-agents-tokenizer-config-inc-autotokenizer-from-pretrained-followed-by-save-pre-7cbef2fd

agent: ai_agents

When does this happen?

IF AutoTokenizer.from_pretrained followed by save_pretrained produces a different tokenizer.json, in which the normalizer and pre_tokenizer configurations are lost or replaced with default settings.

How others solved it

THEN upgrade transformers to a version containing the fix (≥5.4.0, or install from the main branch). As a workaround, manually inspect the saved tokenizer.json and restore its normalizer and pre_tokenizer fields from the original file.

from transformers import AutoTokenizer
# In versions <=5.3.0, saving after loading alters tokenizer.json:
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct")
tokenizer.save_pretrained("./my_tokenizer")
# Compare original tokenizer.json (e.g., pre_tokenizer with Split/ByteLevel) vs saved (Metaspace).
