Agent Failure Analysis Prompt

Turn a failed case into a fix — diagnose where in the agent's flow it went wrong, categorize the failure, and point at the prompt, tool, or context that caused it.

Overview

A failed case isn't useful feedback until you know why the agent failed: a bad instruction, wrong retrieval, a tool error, or a reasoning slip. This prompt analyzes the failure. It walks the agent's trace to find the first step where things went wrong, categorizes the failure type, identifies the likely root cause (prompt, context, tool, or model), and recommends where the fix belongs, so that evaluation feeds improvement instead of just producing a red mark.
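The trace walk described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the `Step` record, the step kinds, and the `LAYER_BY_KIND` mapping are hypothetical names invented for this example, not part of the prompt or of any particular agent framework.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical trace format: one record per agent step, with a pass/fail flag.
@dataclass
class Step:
    index: int
    kind: str   # assumed labels: "instruction", "retrieval", "tool_call", "reasoning"
    ok: bool
    note: str = ""

# Assumed mapping from step kind to the layer where a fix would belong.
LAYER_BY_KIND = {
    "instruction": "prompt",
    "retrieval": "context",
    "tool_call": "tool",
    "reasoning": "model",
}

@dataclass
class Diagnosis:
    first_failure: int  # index of the earliest failing step
    category: str       # the kind of step that failed
    layer: str          # prompt, context, tool, or model
    note: str

def diagnose(trace: list[Step]) -> Optional[Diagnosis]:
    """Return a diagnosis for the earliest failing step, or None if all steps passed."""
    for step in trace:
        if not step.ok:
            layer = LAYER_BY_KIND.get(step.kind, "unknown")
            return Diagnosis(step.index, step.kind, layer, step.note)
    return None

# Example: the reasoning step also fails, but only as a consequence of step 1.
trace = [
    Step(0, "instruction", True),
    Step(1, "retrieval", False, "retrieved an outdated policy document"),
    Step(2, "reasoning", False, "answer cites the outdated policy"),
]
result = diagnose(trace)
print(result)
```

Stopping at the first failing step is the point of the sketch: the later reasoning failure is a downstream symptom, so the suggested fix lands on the context layer rather than the model.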

Why This Works

  • Finding the first failure point stops you from fixing a downstream symptom instead of the cause
  • Attributing to a layer tells you where the fix actually belongs
  • Spotting a recurring pattern turns a one-off failure into a permanent regression test
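
As a rough sketch of the last point, recurring failures can be grouped by layer and category, and the cases behind each repeated pattern kept as regression tests. The case IDs, the tuple layout, and the `min_count` threshold below are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical diagnoses from several failed cases: (case_id, layer, category).
diagnoses = [
    ("case-01", "context", "wrong retrieval"),
    ("case-07", "context", "wrong retrieval"),
    ("case-12", "tool", "tool error"),
]

def recurring_patterns(items, min_count=2):
    """Return (layer, category) pairs seen at least min_count times."""
    counts = Counter((layer, category) for _, layer, category in items)
    return [pattern for pattern, n in counts.items() if n >= min_count]

def regression_cases(items, patterns):
    """Return the sorted case IDs whose diagnosis matches a recurring pattern."""
    return sorted(
        case_id for case_id, layer, category in items
        if (layer, category) in patterns
    )

patterns = recurring_patterns(diagnoses)
print(patterns)
print(regression_cases(diagnoses, patterns))
```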

Best for

  • Debugging agent failures during evaluation
  • Multi-step agents where failures cascade
  • Turning eval red marks into actionable fixes

Not for

  • Detecting that a failure occurred — use the Scorecard or Scenario prompts
  • Code-level debugging — use a debugging prompt

Use cases

  • Diagnosing why an agent failed a test case
  • Attributing a failure to prompt, retrieval, tool, or model
  • Deciding where a fix should go after a bad output
