
Structured Output

Extraction Prompt Generator

Define what the AI should pull out of a text — invoices, emails, résumés, tickets, contracts — and get an extraction prompt with field definitions, name-aware extraction rules, missing-data behavior, and an ambiguity policy. Runs entirely in your browser.

What should be pulled out of the text, and for what purpose? E.g. "Extract customer information from support emails."

Source Type

Changes the reading guidance in the prompt and the suggested fields below.

Output Format

For full contract control — strictness, schema types — use the JSON Output Prompt Builder.

Missing Data Behavior

What the model should do when a field isn't in the text — the #1 source of inconsistent extractions.

Ambiguity Handling

Strict extracts only what's explicitly stated; Best Guess allows text-grounded inferences.

Fields to Extract *

Each field names a piece of information, not a data type — the description tells the model what to look for. Reorder with ↑ ↓.

Suggested for General Text

Extraction Rules Preview (live, derived from your field names)

Workflow Playbooks

Playbooks that use this tool

Structured Output Workflows · 4 steps

AI Data Extraction Workflow

Turn messy text into structured data you can trust enough to feed another system — bound the source, extract the fields, force clean JSON, and validate before it flows downstream.

Structured Output Workflows · 3 steps

AI Classification Workflow

Build a text classification step you can automate on — pull out the unit to classify, assign a label from a fixed set, and validate the label is one you actually allow.

Documentation Workflows · 3 steps

AI Meeting Notes Workflow

Turn a meeting transcript into notes people actually use — a faithful summary, the action items pulled out and assigned, and a clean shareable format.

Operations Workflows · 4 steps

AI Customer Support Workflow

Run inbound support the same way every time — triage and route the ticket, pull the details that matter, draft a reply in a consistent voice, and log the resolution for the record.

Operations Workflows · 3 steps

AI Hiring Workflow

Run hiring the same way for every role — build a reusable job-description template, lay out a consistent screening sequence, and extract structured data from resumes instead of eyeballing each one.


How it works

Describe the extraction goal, pick the source type — email, invoice, résumé, support ticket, contract, meeting notes, product review, or general text — and define the fields to extract: a name, a required flag, and a description of what information the field holds. The source type changes the prompt's reading guidance and suggests fields you can add with one click. The engine derives name-aware extraction rules automatically — an email field gets "valid email address only", an amount field gets "numeric value only", a date field gets ISO formatting — and you see them live before generating. Choose what happens to missing data (empty, null, unknown, or skip) and how the model should treat ambiguity (strict, conservative, or best guess). Click Generate Extraction Prompt for the full prompt: source guidance, field definitions, extraction rules, missing-data behavior, ambiguity policy, and an example extraction in JSON, YAML, XML, or CSV. Nothing leaves your browser.
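To make the shape of the generated prompt concrete, here is a minimal sketch in Python of how field definitions and settings could be assembled into prompt sections. It is not the tool's actual engine: the class name `ExtractionField`, the function `build_prompt`, the section labels, and the setting strings are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractionField:
    # Hypothetical field definition: a name, a description, a required flag.
    name: str
    description: str
    required: bool = False


def build_prompt(goal: str, source_type: str, fields: List[ExtractionField],
                 missing: str = "null", ambiguity: str = "strict") -> str:
    """Assemble a plain-text extraction prompt from the form settings.

    The section labels below are illustrative, not the tool's real wording.
    """
    lines = [f"GOAL: {goal}", f"SOURCE TYPE: {source_type}", "", "FIELDS:"]
    for field in fields:
        flag = "required" if field.required else "optional"
        lines.append(f"- {field.name} ({flag}): {field.description}")
    lines += ["", f"MISSING DATA: {missing}", f"AMBIGUITY: {ambiguity}"]
    return "\n".join(lines)


if __name__ == "__main__":
    demo_fields = [
        ExtractionField("invoice_number", "The invoice identifier", required=True),
        ExtractionField("due_date", "Payment due date"),
    ]
    print(build_prompt("Extract invoice data", "invoice", demo_fields))
```

Keeping the description next to each field name mirrors the point made above: the description, not a type, tells the model what to look for.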

Use cases

  • Pulling invoice numbers, totals, and due dates out of emailed invoices
  • Turning résumés into screening records with consistent fields
  • Extracting ticket metadata from free-text customer messages
  • Mining contracts, meeting notes, and reviews for the fields that matter

Pro tips

  • Missing-data behavior is the most consequential setting on the page. Pipelines want Return Null (stable keys); spreadsheets want Leave Empty or Return Unknown (visible gaps); Skip Field only suits consumers that tolerate absent keys.
  • Field descriptions do the extraction work: "Decisions actually made — not topics discussed" filters better than any rule. Write descriptions that say what does NOT belong in the field.
  • Use Strict ambiguity when wrong data is worse than no data (finance, legal), and Best Guess when a blank field is worse than an imperfect one (lead enrichment).
  • Name fields the way the engine can help: total_amount gets a numeric-only rule, issue_date gets ISO formatting, skills gets list handling. The rules preview shows what each name earns before you generate.

FAQ

How is this different from the JSON Output Prompt Builder?

The JSON tool defines the output's structure — the contract any task returns its data in, with types and strictness levels. This tool defines what to extract: which information to pull from a text, how to read the source, what to do when a value is missing or ambiguous. The output format here is deliberately light; when you need full contract control, generate the extraction here and tighten the format there.

Why don't extraction fields have a type?

Because an extraction field names a piece of information, not a data shape. "total_amount" means "find the grand total in this text" — whether it serializes as a number is a formatting concern. The engine still infers sensible example values (numbers for amounts, lists for skills, true/false for reply_needed), but the field definition stays about meaning.

What does the missing-data setting actually change?

It changes the MISSING DATA block of the prompt. Leave Empty returns an empty string, Return Null keeps the key with null, Return Unknown writes the literal "unknown", and Skip Field omits the key. The engine adapts each option to what the output format can actually express: CSV has no null, so cells stay empty, and CSV columns can't be skipped, so the prompt instructs empty cells instead.
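The four behaviors and the CSV adaptation can be sketched on the consumer side as follows. This is an illustration under stated assumptions, not the tool's code: the policy keys (`"empty"`, `"null"`, `"unknown"`, `"skip"`), the format names, and the helpers `missing_marker` and `fill_missing` are hypothetical.

```python
# Sentinel meaning "omit this key entirely" (the Skip Field behavior).
SKIP = object()


def missing_marker(policy: str, fmt: str):
    """Return the value to write for a missing field, or SKIP to omit the key."""
    # CSV cannot express null and cannot drop a column, so both degrade
    # to an empty cell, as described in the answer above.
    if fmt == "csv" and policy in ("null", "skip"):
        return ""
    markers = {"empty": "", "null": None, "unknown": "unknown", "skip": SKIP}
    return markers[policy]  # raises KeyError for an unrecognized policy


def fill_missing(record: dict, field_names: list, policy: str, fmt: str) -> dict:
    """Apply the missing-data policy to every expected field not in record."""
    out = {}
    for name in field_names:
        if name in record:
            out[name] = record[name]
            continue
        marker = missing_marker(policy, fmt)
        if marker is not SKIP:
            out[name] = marker
    return out


if __name__ == "__main__":
    found = {"invoice_number": "INV-1"}
    expected = ["invoice_number", "due_date"]
    for policy in ("empty", "null", "unknown", "skip"):
        print(policy, fill_missing(found, expected, policy, "json"))
```

With `"null"` the JSON record always carries every key, which is the stable shape pipelines tend to want. With `"skip"` the consumer has to tolerate absent keys.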

Is classifying text the same as extracting from it?

No, and the boundary matters: pulling a value out of the text (a name, a total, a date) is extraction — this tool. Choosing a label from a closed set you define (spam/not-spam, positive/negative) is classification — that's the Data Classification Prompt in this category. A severity field that copies the customer's own words is extraction; deciding the severity yourself is classification.

What are the extraction rules the preview shows?

Name-aware rules the engine derives per field: email fields get "valid address only", phone fields get normalization, dates get ISO format, amounts get "numeric value only", identifiers get "exactly as written", list-like fields (skills, action_items, pros) get one-entry-per-item handling. Fields without a matching pattern rely on their descriptions — the preview tells you which.
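One plausible way to derive such rules is to match each field name against an ordered list of patterns. The sketch below assumes that approach. The regular expressions, the exact rule wording, and the function name `rule_for` are illustrative guesses, not the engine's real patterns.

```python
import re
from typing import Optional

# Ordered (pattern, rule) pairs: the first match wins, so "phone_number"
# gets the phone rule rather than the identifier rule.
NAME_RULES = [
    (re.compile(r"email"), "valid email address only"),
    (re.compile(r"phone"), "normalize the phone number"),
    (re.compile(r"(^|_)date($|_)"), "ISO date format"),
    (re.compile(r"(^|_)(amount|total|price)($|_)"), "numeric value only"),
    (re.compile(r"(^|_)(id|number)$"), "exactly as written"),
    (re.compile(r"^(skills|action_items|pros)$"), "one entry per item"),
]


def rule_for(field_name: str) -> Optional[str]:
    """Return the name-derived rule, or None when the field has no matching
    pattern and must rely on its description."""
    name = field_name.lower()
    for pattern, rule in NAME_RULES:
        if pattern.search(name):
            return rule
    return None


if __name__ == "__main__":
    for name in ("total_amount", "issue_date", "skills", "invoice_number", "summary"):
        print(f"{name}: {rule_for(name) or 'relies on description'}")
```

Anchoring patterns on underscores keeps a name like updated_by from accidentally picking up the date rule. This is the kind of edge case the live preview lets you catch before generating.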

Why does the source type matter?

Because reading an invoice is not reading meeting notes. The source type adds reading guidance to the prompt — "values follow printed labels", "prefer the newest message over quoted history", "action items may be phrased as commitments" — and suggests the fields that source usually yields. It's the difference between a generic scraper and a prompt that knows what it's looking at.
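As a rough illustration, the source-type guidance can be thought of as a lookup from source type to a reading instruction. The pairing of each quoted instruction with a source type, the dictionary keys, and the fallback sentence are assumptions made for this sketch. The page above does not state them explicitly.

```python
# Hypothetical mapping: the three instructions are quoted from the answer
# above, but which source type each belongs to is inferred, not confirmed.
READING_GUIDANCE = {
    "invoice": "Values follow printed labels.",
    "email": "Prefer the newest message over quoted history.",
    "meeting_notes": "Action items may be phrased as commitments.",
}

# Placeholder wording for source types without specific guidance.
DEFAULT_GUIDANCE = "Read the whole text before extracting."


def reading_guidance(source_type: str) -> str:
    """Return the reading instruction to place in the prompt's source section."""
    return READING_GUIDANCE.get(source_type, DEFAULT_GUIDANCE)


if __name__ == "__main__":
    for source in ("invoice", "email", "general_text"):
        print(f"{source}: {reading_guidance(source)}")
```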