Fit Large Codebase Context — Code Tokenizes Differently
Code is denser in tokens than prose: symbols, indentation, and short identifiers all cost extra. Estimate code files with code ratios before pasting them.
Overview
The most common surprise in code-heavy prompts: a file that "looks small" consumes far more tokens than the same character count of prose, because tokenizers split symbols, casing, and indentation aggressively. This scenario loads a real reconciliation module and shows the engine's content-type detection at work — the text is classified as Code and estimated with code ratios, not prose ratios. The budget math then answers the practical questions: how many files of this size fit alongside the question, and how much room the review's answer needs.
Workflow
-
Let detection classify it
Braces, arrows, and indentation flip the estimate to code ratios automatically — no manual setting.
-
Budget per file
One file's estimate scales linearly — the headroom line says how many more fit.
-
Reserve review-sized output
Code questions get long answers; a Large response budget keeps the review from truncating.
Why This Works
- Code-aware ratios correct the systematic underestimate prose math produces
- Automatic type detection removes the setting nobody knows how to choose
- Per-file budgeting matches how code conversations actually grow
Best for
- Code review and refactoring prompts with pasted sources
- Developers feeding multiple files into one conversation
- Anyone surprised by code's token appetite
Not for
- Reviewing the pasted code's quality — that's the Code Review Prompt Generator
- Packaging the files with delimiters and labels — that's the Long Input Formatter
Use cases
- Budgeting how many files fit in a review prompt
- Estimating a module before pasting it for analysis
- Explaining why code "runs out of context" faster than prose