Evaluate AI Prompt Quality with Scores

Put numbers on prompt quality: eight scored dimensions — clarity, specificity, structure, output control, completeness, risk, efficiency, readiness.

Overview

"Is this a good prompt?" is easier to answer when you can measure it against an alternative. Quality decomposes into checkable parts: is the wording concrete or vague, does it control the output's shape and length, does it cover audience and context, does it contradict itself, does every word earn its tokens? Score a prompt against a baseline variant and the abstract question becomes eight specific ones. The loaded pair compares a typical mid-quality prompt against a strong one so you can calibrate what each score band looks like.
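To make "checkable parts" concrete, here is a minimal Python sketch that turns a few of the questions above into yes/no checks. The cue lists and the function name `checklist` are hypothetical, hand-picked for this illustration; this is a crude keyword heuristic and not how Prompt Comparator computes its scores.

```python
import re

# Hypothetical cue lists, chosen by hand for this illustration only.
VAGUE_WORDS = {"improve", "better", "good", "detailed", "professional", "nice"}
OUTPUT_CUES = ("output:", "table", "bullet", "json", "format")
CONSTRAINT_CUES = ("do not", "don't", "avoid", "never")


def checklist(prompt: str) -> dict[str, bool]:
    """Answer a few yes/no quality questions about a prompt (keyword heuristic)."""
    text = prompt.lower()
    words = set(re.findall(r"[a-z']+", text))
    return {
        # Is the wording concrete, or does it lean on vague words?
        "avoids_vague_words": not (words & VAGUE_WORDS),
        # Does it say anything about the shape of the output?
        "controls_output_shape": any(cue in text for cue in OUTPUT_CUES),
        # Does it state at least one explicit constraint?
        "states_a_constraint": any(cue in text for cue in CONSTRAINT_CUES),
        # Very rough proxy for audience/context: the prompt says who it is "for".
        "mentions_who_it_is_for": " for " in text,
    }


# The two prompts from the loaded calibration pair.
PROMPT_A = (
    "Act as a career coach. Help me improve my resume. "
    "Give detailed feedback and make it sound professional."
)
PROMPT_B = (
    "Act as a career coach reviewing a resume for a senior backend engineer "
    "applying to fintech roles.\n"
    "Evaluate: impact statements (numbers over duties), keyword match for "
    "fintech, and length.\n"
    "Output: a table of issues with severity, then the three highest-impact rewrites.\n"
    "Do not rewrite the whole resume."
)

for name, prompt in (("A", PROMPT_A), ("B", PROMPT_B)):
    print(name, checklist(prompt))
```

Under these assumed cue lists, Prompt A fails all four checks and Prompt B passes all four. A keyword match is far cruder than a scored dimension, but it shows how an abstract question becomes several specific ones.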

How to use this resource

Compare the calibration pair

Run the loaded example with Model Readiness focus. Note which dimensions separate the mid prompt from the strong one.

Score your own prompt

Paste your prompt as A and the strong example (or your own improved draft) as B to see where yours lands.

Read the gaps, not just the number

The Risks / Gaps list is the actionable part: each entry names a missing quality dimension in plain words.

Iterate and re-compare

Apply two or three suggestions, re-compare, and watch which dimensions move. That's the feedback loop.
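If you record the eight scores by hand after each comparison, the "watch which dimensions move" step can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the 0-100 scale, the `moved_dimensions` helper, the 5-point threshold, and the placeholder numbers, which are not real comparator output.

```python
# The eight dimensions named on this page, as dictionary keys.
DIMENSIONS = (
    "clarity", "specificity", "structure", "output_control",
    "completeness", "risk", "efficiency", "readiness",
)


def moved_dimensions(before: dict, after: dict, threshold: int = 5) -> dict:
    """Return {dimension: change} for scores that moved by at least `threshold` points."""
    return {
        dim: after[dim] - before[dim]
        for dim in DIMENSIONS
        if abs(after[dim] - before[dim]) >= threshold
    }


# Placeholder scores typed in by hand (assumed 0-100 scale), NOT real tool output.
round_1 = {"clarity": 60, "specificity": 40, "structure": 50, "output_control": 30,
           "completeness": 45, "risk": 70, "efficiency": 65, "readiness": 50}
round_2 = {"clarity": 62, "specificity": 70, "structure": 55, "output_control": 80,
           "completeness": 45, "risk": 70, "efficiency": 60, "readiness": 66}

for dim, change in moved_dimensions(round_1, round_2).items():
    print(f"{dim}: {change:+d}")
```

With these placeholder numbers, the loop prints changes for specificity, structure, output control, efficiency, and readiness. The small drop in efficiency is a reminder that adding detail to a prompt can cost tokens, so a revision is worth checking in both directions.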

Why This Works

Decomposed scores turn 'be a better prompt writer' into specific, fixable habits
Comparing against a reference prompt anchors the scores — a number only means something next to another number
The same eight dimensions apply to every prompt type, so the skill transfers across tasks

Best for

Anyone who wants a working definition of prompt quality, not folklore
Reviewing prompts before they enter a shared library
Diagnosing why a prompt underperforms by seeing which dimension drags it down

Not for

Grading a single prompt in isolation — the comparator needs a second prompt as the reference point
Output evaluation — this scores the instructions, not the model's answer

Use cases

Benchmarking your everyday prompt against a deliberately strengthened version
Calibrating what an 80+ output-control score actually looks like in practice
Building intuition for which dimension your prompts habitually neglect

FAQ

Can I score a single prompt on its own with this, or do I need two?

You need two. The comparator anchors scores against a reference, so "a number only means something next to another number." The "Not for" section is explicit that grading one prompt in isolation doesn't work. The loaded pair sets a mid-quality "improve my resume" prompt (Prompt A) against a scoped version for a senior backend engineer applying to fintech roles (Prompt B). To score your own, paste your prompt as A and a strong draft as B.

Which part of the prompt comparator output should I act on?

The Risks / Gaps list, not just the headline number — each entry "names a missing quality dimension in plain words," like weak output control or thin completeness. The eight scored dimensions (clarity, specificity, structure, output control, completeness, risk, efficiency, readiness) show where a prompt drags; the gaps tell you what to add. Apply two or three, re-compare, and watch which dimensions move.

Customize This Resource

Prompt A

Copy it as-is, or use Open in Prompt Comparator to load it pre-filled and customize it with your own context.

Act as a career coach. Help me improve my resume. Give detailed feedback and make it sound professional.

Prompt B

Act as a career coach reviewing a resume for a senior backend engineer applying to fintech roles.
Evaluate: impact statements (numbers over duties), keyword match for fintech, and length.
Output: a table of issues with severity, then the three highest-impact rewrites.
Do not rewrite the whole resume.
