Prompt Engineering

Tokens vs Characters — Why They Differ

A token is not a character. This resource shows the gap on a real prompt (characters, tokens, and the ratio between them) so the difference stops being abstract.

Overview

People reach for character count because it is easy, then get surprised by token bills. This resource loads a short prompt with no assumed response so the focus stays on the units: roughly four characters per token in English, though the ratio swings with content type and language. Tokens are the model's unit and characters are the keyboard's; using one count to predict the other is where estimates go wrong. The report makes the ratio visible instead of leaving you to guess it.
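The rule of thumb in the overview can be sketched in a few lines of Python. This is a character-based approximation only, not a tokenizer: the function name `estimate_tokens` and the 3.0 ratio used for denser text are illustrative assumptions, not values taken from the report.

```python
import math


def estimate_tokens(char_count: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from a character count (heuristic, not a tokenizer)."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    return math.ceil(char_count / chars_per_token)


# The sample prompt in this resource is 274 characters long.
english_guess = estimate_tokens(274)       # assumes ~4 characters per token
denser_guess = estimate_tokens(274, 3.0)   # hypothetical ratio for denser text

print(english_guess, denser_guess)
```

With the default ratio the sketch gives 69 tokens for 274 characters, inside the report's 63–77 range. The lower, assumed ratio pushes the same text to 92 tokens, which is the kind of swing the overview warns about.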

How to use this resource

See both counts

Characters and tokens side by side on the same text.

Read the ratio

Roughly 4 characters per token in English, but only roughly.

Watch it move

Change the text type or language and the ratio shifts.

Why This Works

Seeing both counts on one text makes the gap concrete, not theoretical
The ~4:1 English ratio is shown, not assumed
Content-type detection demonstrates why the ratio is not fixed

Best for

Understanding why token bills surprise people
Learning the rough character-to-token ratio
Anyone using character count as a token proxy

Not for

Counting characters for a platform limit — use the Character Counter
Deciding context-window fit — use the Context Window Estimator

FAQ

What is the actual ratio of characters to tokens?

Roughly four characters per token in English, shown here rather than assumed. The report puts 'Characters: 274' next to 'Estimated tokens: ~70' and lists 'Tokens per character (approx): 0.3', a one-decimal rounding of about 0.26, which makes the gap concrete. The ESTIMATION NOTES define a token as a sub-word chunk, not a character, and warn that the ratio swings with content type and language.
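The quoted figures can be checked with plain arithmetic. The sketch below takes the three counts from the sample report and derives the ratios; the variable names are illustrative, and the counts are the report's estimates rather than tokenizer output.

```python
# Counts quoted from the sample report (estimates, not tokenizer output).
characters = 274
words = 45
estimated_tokens = 70

chars_per_token = characters / estimated_tokens   # compare with the ~4:1 rule of thumb
tokens_per_char = estimated_tokens / characters
tokens_per_word = estimated_tokens / words

print(f"characters per token: {chars_per_token:.1f}")
print(f"tokens per character: {tokens_per_char:.2f}")
print(f"tokens per word:      {tokens_per_word:.1f}")
```

Dividing characters by tokens gives about 3.9 characters per token for this sample, in line with the rough 4:1 figure for English.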

Why do people get surprised by token bills when they only counted characters?

Because characters are the keyboard's unit and tokens are the model's, so one doesn't predict the other cleanly. This sample keeps 'Assumed response: none (input only)' to hold focus on the units, showing 274 characters map to about 70 tokens at a rough 4-to-1 that shifts with language and code. These remain estimates, not exact tokenizer output.

Prompt Template

Copy it as-is, or use Open in Token Counter to load it pre-filled and customize it with your own context.

TOKEN COUNT REPORT

TOKEN ESTIMATE
- Estimated tokens: ~70 (range 63–77)
- Characters: 274
- Words: 45
- Detected content type: Prose
- Tokens per word (approx): 1.6
- Tokens per character (approx): 0.3

COST ESTIMATE — GPT-5
- Pricing (approximate, June 2026): input $1.25/1M tokens · output $10.00/1M tokens
- Input cost (this prompt): $0.000079–$0.000096
- Assumed response: none (input only)
- Per 1,000 calls: $0.0788–$0.0963

MODEL NOTES
- OpenAI tokenizer (o200k-class) — English averages roughly 4 characters per token.
- Same text, estimated per model: GPT-5 ~70 · Claude Opus ~73 · Claude Sonnet ~73 · Gemini Pro ~70 (a count, not a fit check — for "will it fit?" use the Context Window Estimator).

USAGE GUIDANCE
- For scale on GPT-5: a short prompt ≈ 71, a medium prompt ≈ 506, a large prompt ≈ 3,031 tokens.
- This text is closest to a short prompt.
- Output tokens usually cost several times more than input — add an assumed response above to see the fuller cost.

ESTIMATION NOTES
- A token is not a character and not a word — it is a sub-word chunk. English averages ~4 characters / ~0.75 words per token.
- Estimates vary by tokenizer: the same text tokenizes differently on GPT, Claude, and Gemini — that is why this is a range, not a single number.
- Language matters: CJK and many non-Latin scripts use more tokens per character than English.
- Code differs from prose: symbols, indentation, and punctuation push code to more tokens per character.
- These are character-based ESTIMATES, not tokenizer output. Pricing is approximate as of June 2026; providers change rates — verify before relying on a number.
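The cost lines in the report follow from the token range and the input price. Here is a minimal sketch of that arithmetic, using the figures above and Python's `decimal` module so the rounding is exact. The helper names `input_cost` and `fmt` are hypothetical, and the price is the report's approximate figure, not a verified rate.

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures copied from the report above (approximate pricing, estimated tokens).
INPUT_PRICE_PER_MILLION = Decimal("1.25")   # USD per 1M input tokens
TOKEN_RANGE = (63, 77)                      # low and high token estimates


def input_cost(tokens: int, calls: int = 1) -> Decimal:
    """Input-only cost in USD for `calls` requests of `tokens` tokens each."""
    return Decimal(tokens) * calls * INPUT_PRICE_PER_MILLION / Decimal(1_000_000)


def fmt(amount: Decimal, places: str) -> str:
    """Round half-up to the precision given by `places`, e.g. '0.0001'."""
    return str(amount.quantize(Decimal(places), rounding=ROUND_HALF_UP))


low, high = TOKEN_RANGE
per_call = (fmt(input_cost(low), "0.000001"), fmt(input_cost(high), "0.000001"))
per_thousand = (fmt(input_cost(low, 1000), "0.0001"), fmt(input_cost(high, 1000), "0.0001"))

print("per call:       ", per_call)
print("per 1,000 calls:", per_thousand)
```

The rounded results match the report's $0.000079–$0.000096 per call and $0.0788–$0.0963 per 1,000 calls. The 63–77 range works out to 70 plus or minus 10%, though the report does not say how that margin was chosen.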
