Why Token Counts Vary Between Models
The same text comes out to a different number of tokens on GPT, Claude, and Gemini. This example shows the spread on multilingual text, which is why an honest count is a range.
Overview
Every model has its own tokenizer, so "how many tokens" has no single answer, only a per-model answer. This example loads multilingual text, where the spread is largest: non-Latin scripts typically use more tokens per character, and each tokenizer handles them differently. The report puts all four model estimates side by side and explains why the headline number is a range, not a point. An honest counter shows the disagreement instead of hiding it behind one confident figure.
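One reason non-Latin scripts tend to cost more tokens per character can be seen without any tokenizer at all. Many modern tokenizers work from UTF-8 bytes (an assumption here, since the source does not describe the tokenizer designs involved), and non-Latin characters take two to four bytes each. The sketch below only measures bytes per character; it does not produce token counts, and the sample strings are illustrative rather than taken from the report.

```python
def bytes_per_char(text: str) -> float:
    """Average UTF-8 bytes per character (a rough proxy, not a token count)."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


# Illustrative samples chosen for this sketch, not taken from the source.
samples = {
    "English": "hello",
    "Russian": "привет",
    "Japanese": "こんにちは",
}

for name, text in samples.items():
    print(f"{name}: {len(text)} chars, {bytes_per_char(text):.1f} bytes/char")
```

The ratio rises from 1.0 for the English sample to 2.0 and 3.0 for the Russian and Japanese ones. How each tokenizer merges those bytes back into tokens differs, which is where the per-model disagreement comes from.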
Workflow
- Use varied text: Multilingual content shows the widest spread between tokenizers.
- Compare the models: Four estimates for the same text, side by side.
- Accept the range: No single true number; the range is the honest answer.
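The compare-and-accept steps can be sketched as a small helper that reduces per-model estimates to a range. The function name `summarize_range`, the model labels, and the numbers are all hypothetical placeholders for illustration; real estimates would come from each provider's own tokenizer.

```python
def summarize_range(estimates: dict[str, int]) -> dict[str, float]:
    """Reduce per-model token estimates to a low/high range and its spread."""
    if not estimates:
        raise ValueError("need at least one estimate")
    low = min(estimates.values())
    high = max(estimates.values())
    # Spread relative to the lowest estimate, as a percentage.
    spread_pct = (high - low) / low * 100 if low else 0.0
    return {"low": low, "high": high, "spread_pct": spread_pct}


# Hypothetical estimates for one piece of text; these are not measured values.
estimates = {"model_a": 120, "model_b": 150, "model_c": 138, "model_d": 180}
summary = summarize_range(estimates)
print(f"{summary['low']}-{summary['high']} tokens "
      f"(spread {summary['spread_pct']:.0f}%)")
```

Reporting the low and high values together, rather than an average, keeps the disagreement between tokenizers visible in the headline figure.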
Why This Works
- Each model's tokenizer is different, so the count genuinely varies
- Multilingual text exposes the largest, most instructive spread
- The range is presented as honesty, not hedging
Best for
- Understanding tokenizer differences
- Estimating non-English or mixed-language text
- Anyone expecting one true token number
Not for
- Exact per-model counts — use each provider's official tokenizer
- Context-window fit — use the Context Window Estimator