Conversation Quality Evaluation Prompt
Judge the whole conversation, not one reply — evaluate a multi-turn exchange for context retention, coherence, goal completion, and recovery from misunderstanding.
Overview
A chat agent can give good individual replies and still fail the conversation: it can lose context across turns, contradict itself, or never resolve the user's goal. This prompt evaluates the multi-turn exchange as a whole. It checks whether the agent holds context, stays coherent from turn to turn, recovers when it misunderstands, and reaches the user's goal. These are the qualities single-reply scoring can't see.
Why This Works
- Session-level qualities (retention, recovery) are invisible to single-reply scoring
- Goal completion measures what users actually care about, not reply polish
- Counting turns-to-resolution catches the agent that reaches the goal but takes more turns than it should
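
The turns-to-resolution point above can be made concrete with a small sketch. It assumes each conversation is a list of turns and that some earlier step (a human reviewer or a judge model) has already marked the agent turn that resolved the goal. The `Turn` class, the `resolved_goal` flag, and the `turns_to_resolution` function are hypothetical names chosen for illustration, not part of this prompt.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    role: str                    # "user" or "agent"
    text: str
    resolved_goal: bool = False  # hypothetical label set by a reviewer or judge model


def turns_to_resolution(turns: List[Turn]) -> Optional[int]:
    """Count agent turns up to and including the first one that resolved the goal.

    Returns None when no agent turn is marked as resolving the goal.
    """
    agent_turns = 0
    for turn in turns:
        if turn.role != "agent":
            continue
        agent_turns += 1
        if turn.resolved_goal:
            return agent_turns
    return None


# Illustrative conversation, invented for this sketch.
conversation = [
    Turn("user", "I need to reset my password."),
    Turn("agent", "Which account is this for?"),
    Turn("user", "My work email account."),
    Turn("agent", "I've sent a reset link to your work email.", resolved_goal=True),
]

print(turns_to_resolution(conversation))  # 2
```

A `None` result separates conversations that never reached the goal from those that reached it slowly, so goal completion and efficiency can be reported as two separate numbers.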
Best for
- Conversational agents and chatbots
- Support and assistant agents judged on whole sessions
- Teams scoring only single replies and missing session-level failures
Not for
- Single-turn output scoring — use the Agent Evaluation Scorecard
- Generating conversation test cases — use the Agent Test Scenario Prompt
Use cases
- Evaluating a multi-turn chatbot or support agent
- Catching context loss and contradictions across turns
- Measuring whether conversations actually resolve the goal