Extract Data From Text with AI

Free text in, named fields out. The extraction prompt pattern that turns any unstructured text into consistent, parseable records.

Open in Extraction Prompt Generator

Overview

Asking a model to "pull out the important information" can produce different fields from run to run. Reliable extraction means naming the fields, describing what each one holds, and deciding two things up front: what happens when a value is missing, and how much inference is allowed. This resource loads a lead-form extraction: free text from a contact form goes into six named CRM fields, with a strict ambiguity policy and null for anything absent. It is the smallest complete example of a pattern that works on any text.

How to use this resource

Name the fields, not the wish

Replace "important details" with lead_name, email, use_case: fields the consumer can rely on existing.

Decide the missing-data behavior

Null keeps keys stable for pipelines; empty or "unknown" keeps gaps visible for humans. Decide once, in the prompt.

Set the ambiguity policy

Strict for data you'll act on, best guess for data you'll review. Left unspecified, the model drifts between the two, and that drift is where inconsistency comes from.
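The three decisions above can be captured as data and rendered into a prompt. What follows is a minimal sketch, not the generator's actual implementation: the Field class, the build_prompt function, and the wording of the two policy tables are all hypothetical names and texts chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One named field: what it is called and what it holds."""
    name: str
    description: str
    required: bool = False


# Hypothetical wording for each policy choice; adjust to your own prompt.
MISSING_DATA = {
    "null": "If a field is not found in the source, set it to null. Do not omit the key.",
    "unknown": 'If a field is not found in the source, set it to "unknown". Do not omit the key.',
}
AMBIGUITY = {
    "strict": "Extract only information explicitly stated in the source text. Do not infer values.",
    "best_guess": "If a value is implied but not stated, give your best guess.",
}


def build_prompt(task: str, fields: list[Field], missing: str = "null",
                 ambiguity: str = "strict") -> str:
    """Render the task, the named fields, and both policies as one prompt string."""
    lines = ["TASK", task, "", "FIELDS TO EXTRACT"]
    for field in fields:
        kind = "required" if field.required else "optional"
        lines.append(f"- {field.name} ({kind}): {field.description}")
    lines += ["", "MISSING DATA", f"- {MISSING_DATA[missing]}"]
    lines += ["", "AMBIGUITY POLICY", f"- {AMBIGUITY[ambiguity]}"]
    return "\n".join(lines)


prompt = build_prompt(
    "Extract lead details from the submitted form text.",
    [
        Field("lead_name", "Lead's full name", required=True),
        Field("email", "Lead's email address", required=True),
        Field("company", "Lead's company"),
    ],
)
print(prompt)
```

Keeping the policies in lookup tables means an unrecognized choice fails with a KeyError instead of silently producing a prompt with no rule at all.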

Why This Works

Named fields with descriptions turn a vague request into a checkable contract
Explicit missing-data rules eliminate the model's biggest improvisation point
The example extraction shows the exact shape, so the model imitates instead of inventing

Best for

Any pipeline that feeds model output into code or a spreadsheet
Teams tired of extraction results that change shape between runs
Texts with no fixed layout — notes, messages, form dumps

Not for

Defining the output format in depth — that's the JSON Output Prompt Builder
Assigning labels from a fixed set — that's classification, not extraction

Use cases

Turning free-text form submissions into CRM-ready records
Getting the same six fields out of every text, every run
Replacing "summarize the key info" prompts with named-field extraction

FAQ

How does this extraction prompt handle a field like company or timeline that isn't in the text?

It sets that field to null and keeps the key. The MISSING DATA rule says "If a field is not found in the source, set it to null — do not omit the key," and "Never invent or guess a value." With the AMBIGUITY POLICY — extract only what's "explicitly stated" — the six-field shape stays stable every run. You run the prompt; downstream code consumes the JSON.
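To see why a kept key with a null value is easier to consume than an omitted key, here is a small sketch of a downstream step that writes extractions to CSV. The to_csv function and the FIELDS list are hypothetical; the only assumption about the model output is that each item is a JSON string in the six-field shape.

```python
import csv
import io
import json

# Column order for the spreadsheet; the names mirror the six fields in the template.
FIELDS = ["lead_name", "email", "company", "team_size", "use_case", "timeline"]


def to_csv(json_records: list[str]) -> str:
    """Turn JSON extraction outputs into CSV text, one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for raw in json_records:
        record = json.loads(raw)
        # An omitted key is treated as a contract violation, not as "missing data".
        absent = [name for name in FIELDS if name not in record]
        if absent:
            raise ValueError(f"keys omitted from record: {absent}")
        # JSON null becomes Python None, which the csv module writes as an empty cell.
        writer.writerow(record)
    return buffer.getvalue()


raw = ('{"lead_name": "Jane Smith", "email": "user@example.com", "company": null, '
       '"team_size": 3, "use_case": "Automate weekly reporting", "timeline": null}')
print(to_csv([raw]))
```

With null in place, every row has the same six columns and the gaps show up as empty cells; a record that drops a key is rejected before it can shift or hide a column.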

Why does team_size come out as a number instead of words?

An explicit EXTRACTION RULE handles it: "team_size: Extract as a digit, not words," and the valid-extraction example shows "team_size": 3 unquoted. That keeps the field parseable for a spreadsheet or CRM rather than yielding "three." The prompt instructs the model on this; whether your assistant honors it perfectly is something you verify — the AI Output Validator is the separate step for that.
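A quick way to verify that rule on your side is to check the parsed type. This is a rough sketch with a hypothetical helper name, check_team_size; it is not how the AI Output Validator works, only an illustration of the kind of check involved.

```python
import json


def check_team_size(record: dict):
    """Return a description of the problem, or None if team_size looks fine."""
    value = record.get("team_size")
    if value is None:
        return None  # optional field: null is allowed by the MISSING DATA rule
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return f"team_size should be a whole JSON number, got {value!r}"
    return None


for raw in ('{"team_size": 3}', '{"team_size": "three"}', '{"team_size": "3"}'):
    print(raw, "->", check_team_size(json.loads(raw)))
```

Note that a quoted "3" is flagged as well: it parses as a string, so a spreadsheet or CRM import may still treat it as text.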

Is this extraction prompt the same as classifying text into categories?

No. Extraction pulls named values that exist in the text into fields like use_case and email; classification assigns a label from a fixed set, which the "Not for" section calls out as a different job. Here the model reads free text and returns the six defined fields as one JSON object with no extra fields. If you need fixed-label tagging instead of value extraction, this preset isn't the pattern.
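Because extracted values are supposed to exist in the text, one rough sanity check is to look for each string value in the source. The sketch below is a heuristic under that assumption, with a hypothetical function name; a field such as use_case may be legitimately reworded, so treat a flag as a prompt for review, not as proof of an invented value.

```python
def ungrounded_fields(record: dict, source_text: str) -> list[str]:
    """Return names of string fields whose value does not appear verbatim in the source."""
    # Compare case-insensitively and with whitespace runs collapsed to single spaces.
    haystack = " ".join(source_text.lower().split())
    flagged = []
    for name, value in record.items():
        if not isinstance(value, str):
            continue  # nulls and numbers are skipped by this check
        needle = " ".join(value.lower().split())
        if needle not in haystack:
            flagged.append(name)
    return flagged


source = ("Hi, I'm Jane Smith from Acme Corp. Reach me at user@example.com. "
          "We want to automate weekly reporting.")
record = {
    "lead_name": "Jane Smith",
    "email": "user@example.com",
    "company": "Acme Corp",
    "team_size": None,
    "use_case": "automate weekly reporting",
    "timeline": "Q3 2026",
}
print(ungrounded_fields(record, source))  # prints ['timeline']
```

Here timeline is flagged because the source never mentions a start date, which under the strict ambiguity policy should have produced null.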

Customize This Resource

Opens this setup in Extraction Prompt Generator. Generate to get the full extraction prompt — then adjust the fields, missing-data behavior, and ambiguity policy.

Prompt Template

Copy it as-is, or use Open in Extraction Prompt Generator to load it pre-filled and customize it with your own context.

TASK
Extract lead details from the submitted form text in the input for our CRM.

SOURCE
The input is unstructured free text.
Reading guidance: Read the entire text before extracting — relevant values may appear anywhere, in any order.

FIELDS TO EXTRACT
- lead_name (required): Lead's full name
- email (required): Lead's email address
- company (optional): Lead's company
- team_size (optional): Number of people on their team
- use_case (required): What they want to use the product for
- timeline (optional): When they plan to start

EXTRACTION RULES
- lead_name: The person's full name as written — do not expand initials or invent missing parts.
- email: A valid email address only — not the surrounding text.
- company: The organization's name as written, without surrounding commentary.
- team_size: Extract as a digit, not words.
- Extract values from the source text only — do not use outside knowledge to fill fields.
- Do not summarize or interpret the document — return only the fields listed above.

MISSING DATA
- If a field is not found in the source, set it to null — do not omit the key.
- Never invent or guess a value for a missing field.

AMBIGUITY POLICY
- Extract only information explicitly stated in the source text — do not infer values.
- If a value is implied but not stated, treat the field as missing.

OUTPUT FORMAT
Return the extracted data as a single valid JSON object using the field names below.

EXAMPLE OF A VALID EXTRACTION
{
  "lead_name": "Jane Smith",
  "email": "user@example.com",
  "company": "Acme Corp",
  "team_size": 3,
  "use_case": "Automate weekly reporting",
  "timeline": "Q3 2026"
}

OUTPUT RULES
- Return only the JSON object — no text before or after it.
- Do not wrap the output in markdown code fences.
- Use the field names exactly as defined — no renaming, no extra fields.
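On the consuming side, the template's contract can be checked mechanically before a record reaches the CRM. The following is a minimal sketch under stated assumptions: validate_extraction is a hypothetical helper, the required/optional split is copied from FIELDS TO EXTRACT, and flagging a null in a required field for review is a choice made here, since the template itself only says to return null.

```python
import json

# Copied from the FIELDS TO EXTRACT section of the template.
REQUIRED = {"lead_name", "email", "use_case"}
OPTIONAL = {"company", "team_size", "timeline"}


def validate_extraction(raw: str):
    """Parse model output and return (record, problems); record is None if unparseable."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Covers preambles, trailing text, and markdown fences around the JSON.
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["top-level value is not a JSON object"]

    problems = []
    expected = REQUIRED | OPTIONAL
    present = set(record)
    for key in sorted(expected - present):
        problems.append(f"missing key: {key}")
    for key in sorted(present - expected):
        problems.append(f"unexpected key: {key}")
    # Assumption: a null in a required field is worth a human look.
    for key in sorted(REQUIRED & present):
        if record[key] is None:
            problems.append(f"required field is null: {key}")
    return record, problems


output = ('{"lead_name": "Jane Smith", "email": "user@example.com", "company": "Acme Corp", '
          '"team_size": 3, "use_case": "Automate weekly reporting", "timeline": "Q3 2026"}')
record, problems = validate_extraction(output)
print(problems)  # prints [] for the example extraction above
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that went wrong with a single response.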
