SQL Performance Analysis — Surviving the Billion-Row Table

At 240M rows and growing: keyset pagination over OFFSET, partition pruning, and the PostgreSQL specifics — analysis aimed at the scale the table will reach, not the scale it passed.

Overview

Performance analysis at large scale is a different discipline: what matters is not today's timing but the growth curve it sits on. This prompt configures the Large Dataset Scaling goal on PostgreSQL for an audit-log table heading to a billion rows. It asks the assistant to reason at target scale explicitly, identify work that grows faster than the data, replace OFFSET pagination (whose cost grows with page depth) with keyset pagination, and evaluate partition pruning where the platform offers it. The PostgreSQL guidance keeps the analysis grounded: EXPLAIN (ANALYZE, BUFFERS) for evidence, BRIN indexes as a cheap option for huge append-only tables, bloat and autovacuum state at volumes where they dominate, and parallelism checked rather than assumed.
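To make "the curve it sits on" concrete, here is a back-of-envelope model of rows read per page. It is a sketch under simplifying assumptions, not a measurement: it assumes OFFSET reads and discards every skipped row, that keyset pagination reads only the page itself, and it ignores index traversal cost. The page size and function names are hypothetical.

```python
# Back-of-envelope model: rows the engine reads to serve one page.
# Assumptions (not measurements): OFFSET reads and discards every skipped
# row; keyset pagination seeks through an index and reads only the page.

PAGE_SIZE = 50  # hypothetical page size


def offset_rows_read(page_number: int, page_size: int = PAGE_SIZE) -> int:
    """Rows read for page N (1-based) using LIMIT page_size OFFSET (N-1)*page_size."""
    return page_number * page_size


def keyset_rows_read(page_number: int, page_size: int = PAGE_SIZE) -> int:
    """Rows read for any page with keyset pagination: just the page itself."""
    return page_size


def offset_walk_total(pages: int, page_size: int = PAGE_SIZE) -> int:
    """Total rows read walking pages 1..pages with OFFSET (quadratic in depth)."""
    return page_size * pages * (pages + 1) // 2


if __name__ == "__main__":
    for page in (1, 1_000, 1_000_000):
        print(page, offset_rows_read(page), keyset_rows_read(page))
```

Under this model the cost of a single deep page grows linearly with depth, and walking every page in order grows quadratically, which is the kind of worse-than-linear work the prompt asks the assistant to look for. Real numbers should come from EXPLAIN (ANALYZE, BUFFERS) on production-shaped data.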

How to use this resource

Name the target scale

Every conclusion states the row count it assumes: one billion is the design point, not today's 240M.

Kill OFFSET before it kills you

OFFSET reads and discards every skipped row, so cost grows with page depth; keyset pagination seeks to the next page through an index, so per-page cost stays roughly constant.

Prune instead of scan

Partitioning evaluated by what queries can actually prune — not by partitioning fashion.
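As a rough illustration of evaluating partitioning by what queries can prune, the sketch below models monthly range partitions and counts how many a query must still scan. It is a toy model in Python, not PostgreSQL's planner; the monthly layout, the two-year span, and the function names are assumptions made for the example.

```python
from datetime import date


def month_partitions(start: date, months: int) -> list[tuple[date, date]]:
    """Half-open [lower, upper) monthly ranges, as range partitioning would define them."""
    bounds = []
    year, month = start.year, start.month
    for _ in range(months):
        lower = date(year, month, 1)
        month += 1
        if month == 13:
            year, month = year + 1, 1
        bounds.append((lower, date(year, month, 1)))
    return bounds


def surviving_partitions(partitions, lo=None, hi=None):
    """Partitions a query must scan for the predicate lo <= key < hi.

    A bound of None means the query has no such predicate on the
    partition key; with both bounds missing, nothing can be pruned.
    """
    return [
        (p_lo, p_hi)
        for p_lo, p_hi in partitions
        if (hi is None or p_lo < hi) and (lo is None or p_hi > lo)
    ]


if __name__ == "__main__":
    parts = month_partitions(date(2023, 1, 1), 24)  # hypothetical two years of data
    windowed = surviving_partitions(parts, lo=date(2024, 3, 10), hi=date(2024, 4, 5))
    unfiltered = surviving_partitions(parts)
    print(len(windowed), "of", len(parts), "partitions scanned with a time window")
    print(len(unfiltered), "of", len(parts), "partitions scanned without one")
```

The point of the model is the second case: a query with no predicate on the partition key scans every partition, so partitioning buys it nothing. Whether a real query prunes is something to confirm in the plan, not assume.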

Why This Works

Target-scale reasoning catches what passes today and fails next year
Keyset-over-OFFSET is the highest-value pagination fix at scale
Pruning-based evaluation keeps partitioning honest about its wins

Best for

Append-heavy tables: audit logs, events, time series
Systems whose data outgrew their original query patterns
Capacity planning grounded in query mechanics

Not for

Small-table tuning where scale is not the issue — that's the Query Speed goal
Refactoring the application's data access layer — that's the Refactor Prompt Builder

Use cases

Preparing audit and event tables for the next order of magnitude
Replacing OFFSET pagination before deep pages time out
Evaluating partitioning with honest pruning analysis

FAQ

Does this prompt need my execution plan and row counts before it recommends anything?

It works without them but won't guess in their place. The QUERY CONTEXT states "Execution plan: not provided. Do not invent one," and marks every plan-dependent conclusion as pending EXPLAIN (ANALYZE, BUFFERS); row counts and indexes are likewise not assumed. Where evidence is missing, the first recommendations are the commands to gather it. The SQL Optimization Prompt tool builds this prompt; you run it in your assistant and supply the plan to get firm conclusions.

Why does it push keyset pagination over OFFSET for a large table?

Because the symptoms note that "pagination uses OFFSET and deep pages time out." Analysis priority 3 bounds the working set with "keyset pagination instead of OFFSET." Any rewrite it proposes "must return identical results," calling out NULL handling, duplicates, and ordering differences. It's an evidenced recommendation for you to review and test on production-shaped data, not a guaranteed fix.
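A minimal sketch of what "must return identical results" means for this rewrite, using Python's built-in sqlite3 so it runs anywhere. The table, column names, and data are hypothetical, and the sort key is assumed to be NOT NULL with a unique tie-breaker (id); without the tie-breaker, rows sharing a timestamp could be skipped or repeated at page boundaries.

```python
import sqlite3

PAGE_SIZE = 3  # tiny page size so the demo stays readable


def build_demo_db() -> sqlite3.Connection:
    """Create a hypothetical audit_log table with deliberately duplicated timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE audit_log "
        "(id INTEGER PRIMARY KEY, created_at TEXT NOT NULL, action TEXT)"
    )
    # Several rows share a created_at value, so created_at alone is not a total order.
    rows = [(i, f"2024-01-{1 + i // 4:02d}", f"action-{i}") for i in range(1, 11)]
    conn.executemany("INSERT INTO audit_log VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX audit_log_created_id ON audit_log (created_at, id)")
    return conn


def pages_by_offset(conn):
    """Page through the table with LIMIT/OFFSET."""
    pages, offset = [], 0
    while True:
        page = conn.execute(
            "SELECT id, created_at FROM audit_log "
            "ORDER BY created_at, id LIMIT ? OFFSET ?",
            (PAGE_SIZE, offset),
        ).fetchall()
        if not page:
            return pages
        pages.append(page)
        offset += PAGE_SIZE


def pages_by_keyset(conn):
    """Page through the table by seeking past the last (created_at, id) seen."""
    pages, cursor = [], None
    while True:
        if cursor is None:
            page = conn.execute(
                "SELECT id, created_at FROM audit_log "
                "ORDER BY created_at, id LIMIT ?",
                (PAGE_SIZE,),
            ).fetchall()
        else:
            last_created_at, last_id = cursor
            page = conn.execute(
                "SELECT id, created_at FROM audit_log "
                "WHERE created_at > ? OR (created_at = ? AND id > ?) "
                "ORDER BY created_at, id LIMIT ?",
                (last_created_at, last_created_at, last_id, PAGE_SIZE),
            ).fetchall()
        if not page:
            return pages
        pages.append(page)
        last_id, last_created_at = page[-1]
        cursor = (last_created_at, last_id)
```

On this static data, `pages_by_offset(build_demo_db())` and `pages_by_keyset(build_demo_db())` return the same pages. In PostgreSQL the seek predicate can also be written as a row comparison, `(created_at, id) > (?, ?)`. The two approaches diverge when rows are inserted or deleted between page fetches, which is one of the behavioral differences a rewrite should call out.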

Is this the right prompt for tuning a small, slow query?

Not really. This prompt configures the Large Dataset Scaling goal, reasoning toward one billion rows for an append-only audit-log table already at 240M. Its priorities are worse-than-linear work, spill risk, and partition pruning. Per the "Not for" list above, a small table where scale isn't the issue fits the Query Speed goal better, and refactoring the application's data-access layer is the Refactor Prompt Builder's territory.

Prompt Template

Copy it as-is, or use Open in SQL Optimization Prompt to load it pre-filled and customize it with your own context.

OPTIMIZATION OBJECTIVE
Make the audit-log queries survive the table reaching one billion rows.
Optimization goal: large dataset scaling — make the query survive growth, not just pass today.
Establish why this query is expensive before changing anything. Every recommendation must trace to evidence from this query, this plan, or this schema — no generic database advice.

DATABASE CONTEXT
Platform: PostgreSQL.
- Demand EXPLAIN (ANALYZE, BUFFERS) — plain EXPLAIN cannot show actual rows, loops, or buffer traffic.
- Check bloat and statistics: dead tuples inflate scans and stale statistics mislead the planner — verify last autovacuum and analyze times.
- Reason about planner costs: default random_page_cost assumes spinning disks; on SSDs it pushes the planner away from index scans it should take.
- Use PostgreSQL's index arsenal only where evidence supports it: partial indexes for hot subsets, expression indexes for computed predicates, BRIN for huge append-only tables.
- Check parallelism: whether the plan uses workers, and whether the query shape (functions marked PARALLEL UNSAFE, data-modifying CTEs) blocks them.

QUERY CONTEXT
Query:
```sql
[Paste the SQL query here]
```
Execution plan: not provided. Do not invent one. State the exact command to capture it — EXPLAIN (ANALYZE, BUFFERS) — and mark every plan-dependent conclusion as pending that evidence.
Tables and row counts: not provided — do not assume row counts. State the cardinality questions that block firm conclusions.
Existing indexes: not provided — do not assume any index exists. Before recommending new indexes, list what must be checked about the current ones.

PERFORMANCE SYMPTOMS
Already at 240M rows; queries degrade monthly; pagination uses OFFSET and deep pages time out.

OPTIMIZATION GOAL
Primary goal: Large Dataset Scaling.
Analysis priorities for this goal:
1. Reason at target scale: a plan that is fine at 100 thousand rows may own the server at 100 million — state the scale every conclusion assumes.
2. Identify work that grows faster than the data: join shapes that multiply, sorts that will spill, aggregates over ever-growing history.
3. Bound the working set: time-windowing, keyset pagination instead of OFFSET, and partition pruning where the platform offers it.
4. Check growth asymmetry: which tables grow and which stay constant — the right plan today may invert as the ratios change.

EVIDENCE REVIEW
Evidence mode: Standard Analysis.
- Prefer evidence from the provided query, plan, and schema; where evidence is missing, label the assumption explicitly.
- Carry at least two candidate bottlenecks until evidence separates them.

BOTTLENECK ANALYSIS
- Identify the bottlenecks and, for each: what it costs, why it matters for this workload, and the estimated share of the total cost.
- Expected analysis areas for this goal: Worse-than-linear operations; Spill risk at target volume; OFFSET pagination and unbounded scans; Partitioning and pruning opportunities.
- Distinguish the bottleneck from its symptom — a slow sort caused by a missing filter is a filter problem, not a sort problem.

OPTIMIZATION OPPORTUNITIES
- Order opportunities by expected impact, highest first; state the basis for each estimate.
- For each opportunity: the change, the expected effect, and the evidence that predicts it.
- Any query rewrite must return identical results — call out every difference in NULL handling, duplicates, and ordering, or state explicitly that there is none.
- Do not tune speculatively: a change without an evidenced problem is risk without benefit.

INDEX RECOMMENDATIONS
- Every index recommendation must name the exact predicates, joins, or sorts it serves — no index without a clause.
- Justify column order for composite indexes, and say whether the index should cover (and at what storage and write cost).
- State each index's write tax: which inserts and updates it slows, and whether the workload can afford that.
- Check existing indexes first: a near-miss index that could be extended beats a new overlapping one.

TRADEOFF ANALYSIS
- Every recommendation carries its costs: write amplification, storage, maintenance burden, plan-stability risk, staleness (for pre-aggregation).
- State when NOT to apply each recommendation — the workload shape under which it backfires.

ASSUMPTIONS
- List every assumption made about data volumes, value distributions, index state, or workload patterns.
- Mark each assumption VERIFIED (with its evidence) or UNVERIFIED (with the query or command that would resolve it).
- Any recommendation that depends on an UNVERIFIED assumption must be flagged as conditional on it.

NON-GOALS
- Do not invent execution plans.
- Do not assume indexes exist.
- Do not assume row counts.
- Do not recommend changes without justification.
- Separate facts from assumptions throughout.
- Explain the tradeoffs of every change that has them.
- Schema redesign and application-level changes are out of scope — unless the evidence shows no query-level fix exists, in which case say so and stop.

OUTPUT REQUIREMENTS
- Present recommendations as an ordered list, highest expected impact first, each with its evidence and its tradeoffs.
- For each recommendation, include the verification step: how to measure the improvement — EXPLAIN (ANALYZE, BUFFERS) before and after, on production-shaped data.
- Where essential evidence is missing, the first recommendations are the commands to gather it — not guesses in its place.
- End with the open questions: what could not be determined from the provided context, and what would settle each.
