
Agent Test Scenario Prompt

Build the test set an agent has to pass — scenarios across the happy path, edges, and adversarial inputs, each paired with the expected behavior to grade against.


Overview

You can't tell whether an agent behaves correctly until you've defined what correct behavior looks like beyond the demo. This prompt generates a test scenario set covering happy-path tasks, edge cases, out-of-scope requests, adversarial attempts, missing-information cases, and known failure modes. Each scenario comes with its expected behavior (the right answer, the right refusal, the right escalation, or the right clarifying question), so every run produces a pass or fail, not a vibe.
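As a minimal sketch, assuming you keep the generated set as data rather than prose, each scenario can be a record that pairs an input with its expected behavior. The `Scenario` class, the `Expected` kinds, and the `grade` helper are hypothetical names for illustration, and the check only compares the kind of behavior; a real grader would also check the specifics in `detail`.

```python
from dataclasses import dataclass
from enum import Enum


class Expected(Enum):
    """Hypothetical kinds of correct behavior, mirroring the ones the prompt names."""
    ANSWER = "answer"
    REFUSAL = "refusal"
    ESCALATION = "escalation"
    CLARIFYING_QUESTION = "clarifying_question"


@dataclass(frozen=True)
class Scenario:
    category: str        # e.g. "HAPPY PATH", "ADVERSARIAL", "MISSING INFO"
    user_input: str      # the exact user message or situation
    expected: Expected   # what a correct agent does
    detail: str = ""     # gradable specifics, e.g. which facts it must ask for


def grade(scenario: Scenario, observed: Expected) -> bool:
    """Pass only if the agent produced the expected kind of behavior."""
    return observed is scenario.expected


s = Scenario("MISSING INFO", "Book me a flight.", Expected.CLARIFYING_QUESTION,
             "asks for dates and destination before booking")
print(grade(s, Expected.ANSWER))               # False: it guessed instead of asking
print(grade(s, Expected.CLARIFYING_QUESTION))  # True
```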

How to use this resource

Define what behaving looks like

Note what the agent should do on the happy path, at the edges, on out-of-scope and adversarial inputs, and when information is missing. Each scenario needs an expected behavior to grade against.
Open this resource in Test Case Prompt Generator

Load the prompt into Test Case Prompt Generator and fill in the agent's details: what it does, its instructions, and what "correct" means for it. It generates the scenario set, each scenario paired with its expected behavior, so every run produces a pass or fail.
Review the generated scenarios

Read each group of cases (happy path, edge, out of scope, adversarial, missing info, known failure modes) and confirm that every expected behavior matches what a correct agent would do.
Run the set and feed back the gaps

Run the agent against the scenarios, then use the failures to refine its prompt or instructions and re-test.
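The run-and-refine loop can be sketched as below, assuming the agent is callable as a plain function from message to reply and each scenario carries its own check. `run_suite`, `stub_agent`, and the keyword checks are hypothetical stand-ins; in practice the checks come from each scenario's expected behavior and the call goes to wherever your agent runs.

```python
from typing import Callable, NamedTuple


class Case(NamedTuple):
    name: str
    message: str
    check: Callable[[str], bool]  # True if the reply meets the expected behavior


def run_suite(agent: Callable[[str], str], cases: list[Case]) -> list[str]:
    """Run every case against the agent and return the names of the failures."""
    failures = []
    for case in cases:
        reply = agent(case.message)
        if not case.check(reply):
            failures.append(case.name)
    return failures


def stub_agent(message: str) -> str:
    # Stand-in for a real agent call: it always answers and never asks.
    return "Sure, here is your refund of $50."


cases = [
    Case("happy_refund", "I want a refund for order 123.",
         lambda r: "refund" in r.lower()),
    Case("missing_order_id", "I want a refund.",
         lambda r: r.strip().endswith("?")),  # expected: ask for the order number
]

print(run_suite(stub_agent, cases))  # ['missing_order_id']
```

Each name in the returned list points at a gap to fix in the agent's instructions before re-running the same cases.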

Why This Works

Expected behavior per scenario turns evaluation into pass/fail, not opinion
Adversarial and out-of-scope cases test what demos never do
Covering 'should ask, not guess' captures a behavior teams forget to test

Best for

Any agent heading toward production
Teams evaluating on a demo instead of a test set
Agents where wrong behavior is costly

Not for

Scoring the results — use the Agent Evaluation Scorecard
Building tests for deterministic code — use a unit test prompt

Use cases

Building the evaluation set for a new agent
Generating adversarial and edge-case tests
Defining expected behavior so runs can be graded

FAQ

How do the OUT OF SCOPE, ADVERSARIAL, and MISSING INFO scenario categories differ?

They test three separate failure surfaces. OUT OF SCOPE covers requests the agent should decline or hand off (and how); ADVERSARIAL covers prompt-injection, jailbreak, and manipulation it must resist; MISSING INFO covers cases where it should ask a clarifying question rather than guess. A refusal, a resisted attack, and a clarifying question are different correct behaviors, so keep them in distinct groups.
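One way to keep the three groups distinct when grading is to give each category its own pass check, so a reply that passes one group can fail another. The `passes` function, its keyword heuristics, and the `SECRET` string are hypothetical and deliberately crude; they only illustrate that a refusal, a resisted attack, and a clarifying question are checked differently.

```python
# Hypothetical string the agent must never reveal, e.g. text from its system prompt.
SECRET = "INTERNAL-POLICY-7"


def passes(category: str, reply: str) -> bool:
    """Crude, category-specific pass check; real checks come from each EXPECTED BEHAVIOR."""
    text = reply.lower()
    if category == "OUT OF SCOPE":
        # Correct behavior: decline and point the user somewhere else.
        return "can't help" in text and "contact" in text
    if category == "ADVERSARIAL":
        # Correct behavior: resist, so the protected content never appears.
        return SECRET.lower() not in text
    if category == "MISSING INFO":
        # Correct behavior: ask a question instead of guessing.
        return reply.rstrip().endswith("?")
    raise ValueError(f"unknown category: {category}")


refusal = "I can't help with legal advice; please contact a lawyer."
question = "Which account do you mean?"
print(passes("OUT OF SCOPE", refusal))   # True
print(passes("MISSING INFO", refusal))   # False: a refusal is not a clarifying question
print(passes("MISSING INFO", question))  # True
print(passes("OUT OF SCOPE", question))  # False
```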

Why does every scenario need an EXPECTED BEHAVIOR field instead of just the input?

Without a stated correct outcome, an agent run gives you a vibe, not a result. The template pairs each input with its expected behavior (the right answer, refusal, escalation, or clarifying question), so each run produces a graded pass or fail. The RULES also require the expected behavior to be objective enough that two graders would agree, which is what makes the set usable as an eval.
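A practical way to test the two-graders rule is to have two people grade the same run independently and measure how often they agree. The sketch below computes Cohen's kappa, a standard agreement statistic corrected for chance, over pass/fail verdicts; the template itself does not compute this, and the sample verdicts are made up for illustration.

```python
from collections import Counter


def cohens_kappa(a: list[bool], b: list[bool]) -> float:
    """Chance-corrected agreement between two graders' pass/fail verdicts."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty verdict lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Probability the graders agree by chance, given how often each says pass/fail.
    by_chance = sum((count_a[v] / n) * (count_b[v] / n) for v in (True, False))
    if by_chance == 1.0:
        # Both graders gave the same single verdict every time: perfect agreement.
        return 1.0
    return (observed - by_chance) / (1 - by_chance)


grader_1 = [True, True, False, True, False, True]
grader_2 = [True, True, False, False, False, True]
print(round(cohens_kappa(grader_1, grader_2), 3))  # 0.667
```

Low agreement on a category usually means its expected behaviors are too vague to grade; tighten those before trusting the pass rates.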

Can the Test Case Prompt Generator run my agent against these scenarios and score it?

No. It only builds the scenario set: happy-path, edge, out-of-scope, adversarial, missing-info, and known-failure cases, each carrying an expected behavior. You run your agent against them in ChatGPT, Claude, or wherever the agent lives, then mark each scenario pass or fail yourself. Generating the cases and grading a run are separate steps; this prompt does the first, not the second.
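Because grading is a separate, manual step, it can help to record each verdict and roll the verdicts up by category to see where the agent is weakest. The record shape below (category, scenario number, pass flag) and the `pass_rates` helper are assumptions for illustration, not an export format of the generator.

```python
from collections import defaultdict


def pass_rates(results: list[tuple[str, int, bool]]) -> dict[str, float]:
    """Roll manual pass/fail verdicts up into a pass rate per category."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [passed, total]
    for category, _scenario_no, passed in results:
        totals[category][0] += int(passed)
        totals[category][1] += 1
    return {cat: p / t for cat, (p, t) in totals.items()}


results = [
    ("HAPPY PATH", 1, True),
    ("HAPPY PATH", 2, True),
    ("ADVERSARIAL", 3, False),
    ("ADVERSARIAL", 4, True),
    ("MISSING INFO", 5, False),
]
print(pass_rates(results))
# {'HAPPY PATH': 1.0, 'ADVERSARIAL': 0.5, 'MISSING INFO': 0.0}
```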

Which KNOWN FAILURE MODES should I supply before generating the set?

Category 6 is agent-specific, so the generator can't invent your history — feed it anything that has broken before: a tool call it hallucinated, a hand-off it skipped, a jailbreak that slipped through, a format it mangled. List these in the INPUT alongside what the agent does and what 'correct' means, so the scenario set locks in a regression test for each one.
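To make each past breakage a standing regression test, you might keep a small incident log and format it into KNOWN FAILURE MODES entries before pasting them into the INPUT. The `Incident` fields, the `to_known_failure_lines` helper, and the output wording are hypothetical; the point is that every incident yields one input plus one gradable expected behavior.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    what_broke: str        # e.g. "Hallucinated a tool call"
    triggering_input: str  # the message that caused it
    correct_behavior: str  # what the agent should have done instead


def to_known_failure_lines(incidents: list[Incident]) -> str:
    """Format past incidents as KNOWN FAILURE MODES entries for the prompt INPUT."""
    lines = ["Known failure modes (each must become a regression scenario):"]
    for i, inc in enumerate(incidents, start=1):
        lines.append(
            f'{i}. {inc.what_broke}. Input: "{inc.triggering_input}". '
            f"Expected: {inc.correct_behavior}."
        )
    return "\n".join(lines)


incidents = [
    Incident("Skipped the hand-off", "I want to talk to a human",
             "transfer to a human agent and say so"),
    Incident("Mangled the output format", "Export my orders as CSV",
             "return valid CSV with a header row"),
]
print(to_known_failure_lines(incidents))
```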

Customize This Resource

Open this setup in Test Case Prompt Generator and generate the full test contract, then adjust the strategy, framework, coverage, and depth.


Prompt Template

Copy it as-is, or use Open in Test Case Prompt Generator to load it pre-filled and customize it with your own context.

ROLE
You are building a test scenario set to evaluate whether an AI agent behaves correctly.

INPUT
The agent:
[What the agent does, its instructions, and what 'correct' means for it]

GENERATE SCENARIOS
Produce a covering set across these categories, each with INPUT and EXPECTED BEHAVIOR:
1. HAPPY PATH: the core tasks it must do well.
2. EDGE CASES: empty, huge, malformed, ambiguous, or multi-part inputs.
3. OUT OF SCOPE: requests it should decline or hand off, and how.
4. ADVERSARIAL: prompt-injection, jailbreak, and manipulation attempts it must resist.
5. MISSING INFO: cases where it should ask a clarifying question rather than guess.
6. KNOWN FAILURE MODES: anything specific to this agent that has broken before.

FOR EACH SCENARIO
- Input: the exact user message / situation.
- Expected behavior: what a correct agent does (answer, refusal, escalation, clarifying question) — specific enough to grade against.

RULES
- Expected behavior must be objective enough that two graders agree.
- Cover what the agent should NOT do as carefully as what it should.

OUTPUT
A numbered scenario set grouped by category, each with input and expected behavior.
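If you reuse the template across several agents, you can fill its bracketed INPUT slot in code instead of by hand. This sketch assumes you keep the template text verbatim in a string and replace the single placeholder; the abbreviated `TEMPLATE` stands in for the full prompt above, and `fill_template` is a hypothetical helper.

```python
PLACEHOLDER = "[What the agent does, its instructions, and what 'correct' means for it]"

# Abbreviated copy of the template; paste the full text in practice.
TEMPLATE = """ROLE
You are building a test scenario set to evaluate whether an AI agent behaves correctly.

INPUT
The agent:
[What the agent does, its instructions, and what 'correct' means for it]

OUTPUT
A numbered scenario set grouped by category, each with input and expected behavior."""


def fill_template(template: str, agent_description: str) -> str:
    """Replace the single INPUT placeholder, failing loudly if it is missing or repeated."""
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("expected exactly one INPUT placeholder in the template")
    return template.replace(PLACEHOLDER, agent_description.strip())


prompt = fill_template(
    TEMPLATE,
    "Answers billing questions for a SaaS product; must not issue refunds over $100 "
    "without escalating; 'correct' means accurate answers grounded in the billing docs.",
)
print(prompt)
```

Failing on a missing or duplicated placeholder keeps an edited template from silently producing a prompt with no agent description in it.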






