Estimate AI Response Budget — Reserve Room for the Answer
Truncated answers are usually a budgeting mistake: nothing was reserved for the response. See how the reserved output changes the whole calculation.
Overview
The context window is shared: every token the answer needs comes out of the same budget as the input. This scenario inverts the usual question of whether the input fits: the input is small, but the response budget is set to Maximum, which reserves the model's full output capacity. The breakdown makes the tradeoff visible. Reserving 128K tokens for the response on GPT-5 leaves 272K for input, not the full 400K. For long-form generation (full documents, big refactors, exhaustive analyses), this reservation is the difference between a complete answer and one that stops mid-sentence.
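The arithmetic behind that breakdown can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation: the function name `input_budget` is hypothetical, and the figures are the rounded ones quoted above, reading "K" as 1,000 tokens.

```python
def input_budget(context_window: int, reserved_output: int) -> int:
    """Tokens left for input once the response reservation is subtracted."""
    if reserved_output < 0 or reserved_output > context_window:
        raise ValueError("reserved_output must be between 0 and context_window")
    return context_window - reserved_output


# Figures quoted in the text; "K" is read as 1,000 tokens (an assumption).
CONTEXT_WINDOW = 400_000
MAX_OUTPUT = 128_000

# Maximum reservation: the input side shrinks to 272,000 tokens.
print(input_budget(CONTEXT_WINDOW, MAX_OUTPUT))

# A ~1K reservation leaves almost the whole window for input.
print(input_budget(CONTEXT_WINDOW, 1_000))
```

The point of the subtraction is that the reservation is a choice: the same window yields very different input budgets depending on how much room the answer is given.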
Workflow
- Name the answer size first: a quick classification needs ~1K tokens; a full document needs the maximum. Pick before sending.
- Watch the budget move: every reserved output token comes out of the input budget, and the breakdown shows both sides.
- Match budget to task: small for verdicts, large for documents. The habit prevents mid-sentence truncation.
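One way to turn the workflow into a habit is a small preset table consulted before each request. The sketch below is illustrative only: `RESPONSE_PRESETS` and `plan_request` are hypothetical names, the "small" and "large" values follow the ~1K and 128K figures on this page, and the "medium" value is an arbitrary midpoint chosen for the example.

```python
# Hypothetical presets. "small" and "large" follow the figures on this page;
# "medium" is an illustrative value, not a recommendation.
RESPONSE_PRESETS = {"small": 1_000, "medium": 16_000, "large": 128_000}


def plan_request(task_size: str, input_tokens: int,
                 context_window: int = 400_000) -> dict:
    """Reserve output for the task first, then check whether the input fits."""
    reserved = RESPONSE_PRESETS[task_size]
    available = context_window - reserved
    return {
        "reserved_output": reserved,
        "input_available": available,
        "fits": input_tokens <= available,
    }


# The same 300,000-token input fits with a small reservation
# but not with the maximum one.
print(plan_request("small", input_tokens=300_000))
print(plan_request("large", input_tokens=300_000))
```

Naming the answer size first means the check runs against the input budget that remains after the reservation, which is the number that actually determines whether the answer gets cut off.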
Why This Works
- Making the shared window explicit explains truncation's real cause
- Reserved-output framing turns a hidden constant into a chosen variable
- Task-sized budgets beat one default that fits nothing well
Best for
- Document generation and big-output workflows
- Anyone whose answers stop mid-thought
- Calibrating small/medium/large response habits
Not for
- Controlling the response's format and structure — that's the Structured Output tools
- Fixing truncation caused by provider output caps that apply regardless of context size
Use cases
- Planning long-form generation without truncation
- Understanding why huge inputs produce short answers
- Sizing response budgets per task type