AI Agents Evaluation Testing

Agent Test Scenario Prompt

Build the test set an agent has to pass — scenarios across the happy path, edges, and adversarial inputs, each paired with the expected behavior to grade against.

Overview

You can't tell whether an agent behaves until you've defined what behaving looks like across more than the demo. This prompt generates a test scenario set: normal cases, edge cases, ambiguous inputs, and adversarial attempts — each with the expected behavior (the right answer, the right refusal, the right escalation) so every run produces a pass/fail, not a vibe.

Why This Works

  • Expected behavior per scenario turns evaluation into pass/fail, not opinion
  • Adversarial and out-of-scope cases test what demos never do
  • Covering 'should ask, not guess' captures a behavior teams forget to test

Best for

  • Any agent heading toward production
  • Teams evaluating on a demo instead of a test set
  • Agents where wrong behavior is costly

Not for

  • Scoring the results — use the Agent Evaluation Scorecard
  • Building tests for deterministic code — use a unit test prompt

Use cases

  • Building the evaluation set for a new agent
  • Generating adversarial and edge-case tests
  • Defining expected behavior so runs can be graded

Tip: Save time by exploring related resources and tools that integrate with this workflow.

Explore all resources