Prompt Engineering

Missing Data in AI Extraction — Null, Unknown, or Skip

The most consequential setting in any extraction prompt: what the model does when a field isn't in the text. Four behaviors, and when each is right.


Overview

Every extraction eventually meets a text that doesn't contain the field, and what happens next is the difference between a reliable pipeline and silent garbage. Four contracts exist: Leave Empty (visible gap, human-friendly), Return Null (stable keys, pipeline-friendly), Return Unknown (a loud gap a reviewer can't miss), and Skip Field (lean output; consumers must check whether each key exists). This resource loads a lead-form extraction, a source that is sparse by nature, set to Return Unknown with a strict ambiguity policy. That is the configuration where missing data is most visible, so you can read the contract language and then swap behaviors to compare.
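To see how the four contracts differ in the actual output, here is a minimal Python sketch that renders the same partial lead (no company mentioned) under each behavior. The behavior keys, the render helper, and the sample values are hypothetical, and rendering Leave Empty as an empty string is an assumption. It illustrates the output shapes only, not the generator's code.

```python
import json

# Hypothetical sparse lead form: the company was never mentioned.
FIELDS = ["lead_name", "email", "company"]
FOUND = {"lead_name": "Jane Smith", "email": "user@example.com"}
BEHAVIORS = ["leave_empty", "return_null", "return_unknown", "skip_field"]


def render(behavior: str) -> dict:
    """Build the record each missing-data contract would produce.

    Assumption: Leave Empty is rendered as an empty string.
    """
    record = {}
    for field in FIELDS:
        if field in FOUND:
            record[field] = FOUND[field]
        elif behavior == "leave_empty":
            record[field] = ""         # visible gap, friendly in a sheet
        elif behavior == "return_null":
            record[field] = None       # key kept; serializes as JSON null
        elif behavior == "return_unknown":
            record[field] = "unknown"  # loud literal a reviewer notices
        elif behavior == "skip_field":
            continue                   # key omitted entirely
        else:
            raise ValueError(f"unrecognized behavior: {behavior}")
    return record


for behavior in BEHAVIORS:
    print(f"{behavior:<15}{json.dumps(render(behavior))}")
```

Only the company entry differs across the four results: an empty string, a JSON null, the literal "unknown", or no key at all.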

How to use this resource

Match the behavior to the consumer

Code parsing the output wants Return Null; a human scanning a sheet wants Leave Empty or Return Unknown; Skip Field needs existence-checking consumers.
Swap behaviors and diff the prompt

Load this setup, change Missing Data, and watch the MISSING DATA block rewrite itself. The behavior-specific contract is two lines (what to return for a missing field, and when to use it), and they do all the work.
Keep the never-invent rule

Whatever behavior you choose, one line stays constant across all four: "Never invent or guess a value for a missing field."
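Matching the behavior to the consumer is easiest to see from the reading side. This minimal sketch, built around a hypothetical is_missing helper, shows how code would test for a gap under each contract and why Skip Field forces an existence check.

```python
from typing import Any, Mapping

# Value that marks a gap for each contract that keeps the key.
GAP_VALUES = {"return_null": None, "leave_empty": "", "return_unknown": "unknown"}


def is_missing(record: Mapping[str, Any], field: str, behavior: str) -> bool:
    """Hypothetical consumer check: does `field` hold a gap under `behavior`?"""
    if behavior == "skip_field":
        # The key itself is absent, so record[field] would raise KeyError.
        return field not in record
    if behavior not in GAP_VALUES:
        raise ValueError(f"unrecognized behavior: {behavior}")
    gap = GAP_VALUES[behavior]
    value = record[field]  # every other contract keeps the key
    return value is None if gap is None else value == gap


skipped = {"lead_name": "Jane Smith", "email": "user@example.com"}
print(is_missing(skipped, "company", "skip_field"))  # True
print(skipped.get("company"))  # None, same as an explicit null
```

The last line is the trap behind pipelines whose consumers disagree about null vs missing keys: dict.get returns None for both an omitted key and an explicit null, so a consumer that relies on it cannot tell the two apart. Return Unknown has its own edge case: a source value that is literally "unknown" is indistinguishable from a gap.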

Why This Works

Naming the behavior in the prompt removes the model's single biggest improvisation point
"Unknown" exploits human attention: a literal string in a data column gets investigated
The never-invent rule survives every behavior choice — absence stays absence
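Naming the behavior tells the model what to do; it doesn't prove the model did it. One cheap way to audit the never-invent rule afterwards, offered here as an assumed heuristic rather than part of the generator, is a verbatim grounding check: every extracted string that isn't the declared gap should appear somewhere in the source. The ungrounded_fields helper and the sample lead are hypothetical.

```python
import re

MISSING = "unknown"


def ungrounded_fields(record: dict, source: str) -> list[str]:
    """Return fields whose string value does not appear verbatim in the source.

    Heuristic only: whitespace and case are normalized, non-string values
    (such as a team_size converted to a digit) are skipped, and the literal
    "unknown" is treated as a declared gap rather than a claim.
    """
    def norm(text: str) -> str:
        return re.sub(r"\s+", " ", text).strip().lower()

    haystack = norm(source)
    flagged = []
    for field, value in record.items():
        if not isinstance(value, str) or value == MISSING:
            continue
        if norm(value) not in haystack:
            flagged.append(field)
    return flagged


source = "Hi, I'm Jane Smith (jane@acme.test). We want to automate weekly reporting."
record = {
    "lead_name": "Jane Smith",
    "email": "jane@acme.test",
    "company": "Acme Corp",  # plausible, but never stated in the source
    "team_size": "unknown",
    "use_case": "Automate weekly reporting",
    "timeline": "unknown",
}
print(ungrounded_fields(record, source))  # ['company']
```

Treat flags as review prompts, not verdicts: a field the model may legitimately paraphrase, such as use_case, can be flagged even when it is faithful, and numeric fields converted from words are skipped entirely.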

Best for

Sparse sources — forms, short messages, partial documents
Teams debugging "where did this value come from?" incidents
Pipelines whose consumers disagree about null vs missing keys

Not for

Ambiguity handling — that's the neighboring policy; missing means absent, ambiguous means unclear
Format-level null semantics (JSON null vs an empty XML element), which the generator adapts to the output format automatically

Use cases

Choosing the right missing-data behavior before a pipeline ships
Making data gaps visible to human reviewers with "unknown"
Stopping models from inventing values for absent fields

FAQ

How do I decide between null, unknown, empty, or skip for missing extraction fields?

Match the behavior to whoever consumes the output. Code parsing the result wants Return Null for stable keys, a human scanning a sheet wants Leave Empty or Return Unknown, and Skip Field needs consumers that check whether each key exists. This loaded set uses Return Unknown, a literal "unknown" string a reviewer can't miss, so you can see the contract and then swap behaviors to compare.

What stops the model from guessing a value when a field isn't in the text?

Two reinforcing rules. The MISSING DATA block says to return the literal "unknown" only for genuinely absent information and never to invent or guess a value, and the AMBIGUITY POLICY adds that a value implied but not stated is treated as missing. The never-invent line is the constant across all four behaviors, so absence stays absence whichever contract you pick.

If I swap the missing-data behavior, what actually changes in the prompt?

Only the MISSING DATA block changes, and within it only the two behavior-specific lines: the two-line contract that does all the work. Extraction Prompt Generator regenerates those lines when you change the setting, while the never-invent line, FIELDS TO EXTRACT, EXTRACTION RULES, and the required/optional split (lead_name, email, and use_case are required) stay put. You still run the resulting prompt in your own assistant.
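A rough model of what the swap does, assuming the block is two behavior-specific lines plus the shared never-invent line. The Return Unknown lines are adapted from the template below (punctuation adjusted); the wording for the other three behaviors, the CONTRACTS table, and missing_data_block are hypothetical, not the tool's actual output.

```python
import difflib

NEVER_INVENT = "Never invent or guess a value for a missing field."

# Two behavior-specific lines per contract. Return Unknown is adapted from
# the template; the other wordings are hypothetical placeholders.
CONTRACTS = {
    "leave_empty": (
        'If a field is not found in the source, leave its value empty ("").',
        "Leave a value empty only for genuinely absent information.",
    ),
    "return_null": (
        "If a field is not found in the source, return null.",
        "Use null only for genuinely absent information.",
    ),
    "return_unknown": (
        'If a field is not found in the source, return the literal string "unknown".',
        'Use "unknown" only for genuinely absent information, never as a shortcut for skimming the text.',
    ),
    "skip_field": (
        "If a field is not found in the source, omit its key from the output.",
        "Omit a key only for genuinely absent information.",
    ),
}


def missing_data_block(behavior: str) -> str:
    """Assemble the MISSING DATA block: behavior lines plus the shared rule."""
    lines = [*CONTRACTS[behavior], NEVER_INVENT]
    return "MISSING DATA\n" + "\n".join(f"- {line}" for line in lines)


# Diff the block before and after switching Return Unknown -> Return Null.
old = missing_data_block("return_unknown").splitlines()
new = missing_data_block("return_null").splitlines()
for line in difflib.unified_diff(old, new, "unknown", "null", lineterm="", n=0):
    print(line)
```

The diff shows two lines removed and two added; the never-invent line never appears in it, which is the point of keeping it outside the behavior-specific contract.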

Customize This Resource

Load this setup in Extraction Prompt Generator and generate the full extraction prompt, then adjust the fields, missing-data behavior, and ambiguity policy.


Prompt Template

Copy it as-is, or load it pre-filled in Extraction Prompt Generator and customize it with your own context.

TASK
Extract lead details from the submitted form text in the input for our CRM.

SOURCE
The input is unstructured free text.
Reading guidance: Read the entire text before extracting — relevant values may appear anywhere, in any order.

FIELDS TO EXTRACT
- lead_name (required): Lead's full name
- email (required): Lead's email address
- company (optional): Lead's company
- team_size (optional): Number of people on their team
- use_case (required): What they want to use the product for
- timeline (optional): When they plan to start

EXTRACTION RULES
- lead_name: The person's full name as written — do not expand initials or invent missing parts.
- email: A valid email address only — not the surrounding text.
- company: The organization's name as written, without surrounding commentary.
- team_size: Extract as a digit, not words.
- Extract values from the source text only — do not use outside knowledge to fill fields.
- Do not summarize or interpret the document — return only the fields listed above.

MISSING DATA
- If a field is not found in the source, return the literal string "unknown".
- Use "unknown" only for genuinely absent information — never as a shortcut for skimming the text.
- Never invent or guess a value for a missing field.

AMBIGUITY POLICY
- Extract only information explicitly stated in the source text — do not infer values.
- If a value is implied but not stated, treat the field as missing.

OUTPUT FORMAT
Return the extracted data as a single valid JSON object using the field names below.

EXAMPLE OF A VALID EXTRACTION
{
  "lead_name": "Jane Smith",
  "email": "user@example.com",
  "company": "Acme Corp",
  "team_size": 3,
  "use_case": "Automate weekly reporting",
  "timeline": "Q3 2026"
}

OUTPUT RULES
- Return only the JSON object — no text before or after it.
- Do not wrap the output in markdown code fences.
- Use the field names exactly as defined — no renaming, no extra fields.
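Before the output reaches the CRM, a small check can enforce the OUTPUT RULES and surface required gaps. This is a minimal sketch with a hypothetical check_extraction helper, assuming Return Unknown as configured above. Note the type consequence of that choice: team_size is an integer when found but the string "unknown" when missing, so the consumer has to accept both.

```python
import json

FIELDS = ["lead_name", "email", "company", "team_size", "use_case", "timeline"]
REQUIRED = {"lead_name", "email", "use_case"}
MISSING = "unknown"


def check_extraction(raw: str) -> dict:
    """Validate one response against the template's output contract.

    Raises ValueError on a contract violation; otherwise returns the record
    and the required fields that came back as "unknown".
    """
    try:
        record = json.loads(raw)  # fails on code fences or text around the JSON
    except json.JSONDecodeError as exc:
        raise ValueError(f"not a bare JSON object: {exc}") from exc
    if not isinstance(record, dict):
        raise ValueError("top-level value must be a JSON object")

    expected, actual = set(FIELDS), set(record)
    if actual != expected:
        raise ValueError(
            f"missing keys {sorted(expected - actual)}, "
            f"extra keys {sorted(actual - expected)}"
        )

    for field in FIELDS:
        value = record[field]
        if value == MISSING:
            continue  # declared gap, allowed in any field
        if field == "team_size":
            # bool is a subclass of int, so reject it explicitly
            if not isinstance(value, int) or isinstance(value, bool):
                raise ValueError(f'team_size must be an integer or "unknown", got {value!r}')
        elif not isinstance(value, str) or not value.strip():
            raise ValueError(f'{field} must be a non-empty string or "unknown", got {value!r}')

    gaps = sorted(f for f in REQUIRED if record[f] == MISSING)
    return {"record": record, "required_gaps": gaps}


response = (
    '{"lead_name": "Jane Smith", "email": "user@example.com", '
    '"company": "unknown", "team_size": 3, '
    '"use_case": "unknown", "timeline": "unknown"}'
)
print(check_extraction(response)["required_gaps"])  # ['use_case']
```

Here use_case, a required field, came back as "unknown", so the record passes validation but lists use_case in required_gaps for a reviewer to follow up instead of slipping through silently.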
