Missing Data in AI Extraction — Null, Unknown, or Skip
The most consequential setting in any extraction prompt: what the model does when the field isn't in the text. Four behaviors, and when each is right.
Define what the AI should pull out of a text — invoices, emails, résumés, tickets, contracts — and get an extraction prompt with field definitions, name-aware extraction rules, missing-data behavior, and an ambiguity policy. Runs entirely in your browser.
What should be pulled out of the text, and for what purpose? E.g. "Extract customer information from support emails."
Each field names a piece of information, not a data type — the description tells the model what to look for. Reorder with ↑ ↓.
⚠ Skip Field + CSV would shift columns — the contract will instruct empty cells instead.
Parties, effective date, term, payment, termination notice, governing law: key terms into a contract register, with "unknown" marking every gap loudly.
Free text in, named fields out. The extraction prompt pattern that turns any unstructured text into consistent, parseable records.
Sender, company, request, deadline: pulled out of emails with quoted replies and signature blocks, using guidance that knows how email is actually read.
Invoice number, vendor, dates, total, currency: extracted into clean fields with strict no-inference rules, ready for accounts payable.
Decisions actually made, commitments actually given: extracted from fragmentary meeting notes that never label their action items.
Pros, cons, feature requests, rating: review text into feedback-board fields, with experienced-vs-wished kept strictly apart.
Candidate name, current role, years, skills, education: résumés into consistent screening records, with inference kept on a short leash.
Product, issue summary, stated severity, steps already tried: ticket fields extracted from free-text customer messages, without the model's own judgment leaking in.
The six sections a reliable extraction prompt needs: source guidance, field definitions, extraction rules, missing-data behavior, ambiguity policy, example.
Turn messy text into structured data you can trust enough to feed another system: bound the source, extract the fields, force clean JSON, and validate before it flows downstream.
Build a text classification step you can automate on: pull out the unit to classify, assign a label from a fixed set, and validate the label is one you actually allow.
Turn a meeting transcript into notes people actually use: a faithful summary, the action items pulled out and assigned, and a clean shareable format.
Turn a pile of reviews, surveys, or support comments into themes and priorities: extract the real signal, classify it by theme and sentiment, then summarize what's worth acting on.
Run inbound support the same way every time: triage and route the ticket, pull the details that matter, draft a reply in a consistent voice, and log the resolution for the record.
Run hiring the same way for every role: build a reusable job-description template, lay out a consistent screening sequence, and extract structured data from resumes instead of eyeballing each one.
Describe the extraction goal, pick the source type (email, invoice, résumé, support ticket, contract, meeting notes, product review, or general text), and define the fields to extract: a name, a required flag, and a description of what information the field holds. The source type changes the prompt's reading guidance and suggests fields you can add with one click. The engine derives name-aware extraction rules automatically (an email field gets "valid email address only", an amount field gets "numeric value only", a date field gets ISO formatting), and you see them live before generating. Choose what happens to missing data (empty, null, unknown, or skip) and how the model should treat ambiguity (strict, conservative, or best guess). Click Generate Extraction Prompt for the full prompt: source guidance, field definitions, extraction rules, missing-data behavior, ambiguity policy, and an example extraction in JSON, YAML, XML, or CSV. Nothing leaves your browser.
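To make the shape of the generated prompt concrete, here is a minimal sketch of how the sections listed above might be stitched together. It is not the tool's actual engine: `Field`, `build_prompt`, the section headers other than MISSING DATA, and the sample wording of each section are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """One extraction field: a name, a required flag, and a description."""
    name: str
    required: bool
    description: str


def build_prompt(goal, source_guidance, fields, rules, missing, ambiguity, example):
    """Join the goal and the six sections into one prompt string.

    Hypothetical helper: the header names and layout are assumptions.
    """
    field_lines = "\n".join(
        f"- {f.name} ({'required' if f.required else 'optional'}): {f.description}"
        for f in fields
    )
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    sections = [
        ("GOAL", goal),
        ("SOURCE GUIDANCE", source_guidance),
        ("FIELDS", field_lines),
        ("EXTRACTION RULES", rule_lines),
        ("MISSING DATA", missing),
        ("AMBIGUITY", ambiguity),
        ("EXAMPLE", example),
    ]
    return "\n\n".join(f"{title}\n{body}" for title, body in sections)


fields = [
    Field("sender_email", True, "Email address of the person who wrote the message"),
    Field("deadline", False, "Date the sender needs a response by"),
]

prompt = build_prompt(
    goal="Extract customer information from support emails.",
    source_guidance="Prefer the newest message over quoted history.",
    fields=fields,
    rules=["sender_email: valid email address only", "deadline: ISO date format"],
    missing="If a field is not in the text, return null for it.",
    ambiguity="Strict: extract only values stated explicitly in the text.",
    example='{"sender_email": "ana@example.com", "deadline": null}',
)
print(prompt)
```

Keeping each section as a separate labeled block means a change to one setting, such as the missing-data behavior, touches only one part of the prompt.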
The JSON tool defines the output's structure — the contract any task returns its data in, with types and strictness levels. This tool defines what to extract: which information to pull from a text, how to read the source, what to do when a value is missing or ambiguous. The output format here is deliberately light; when you need full contract control, generate the extraction here and tighten the format there.
Fields carry no data types because an extraction field names a piece of information, not a data shape. "total_amount" means "find the grand total in this text"; whether it serializes as a number is a formatting concern. The engine still infers sensible example values (numbers for amounts, lists for skills, true/false for reply_needed), but the field definition stays about meaning.
The missing-data setting controls the MISSING DATA block of the prompt. Leave Empty returns an empty string, Return Null keeps the key with null, Return Unknown writes the literal "unknown", and Skip Field omits the key. The engine adapts each to the output format honestly: CSV has no null, so cells stay empty; CSV columns can't be skipped, so the contract instructs empty cells instead.
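The four behaviors and the CSV fallbacks can be illustrated with a short sketch. It shows what each policy does to a record on the consuming side and is not the tool's own code: `fill_missing`, `to_csv_row`, the lowercase behavior names, and the sample invoice values are hypothetical.

```python
import csv
import io
import json

BEHAVIORS = {"empty", "null", "unknown", "skip"}


def fill_missing(extracted, field_names, behavior, fmt):
    """Apply a missing-data behavior to the fields absent from `extracted`."""
    if behavior not in BEHAVIORS:
        raise ValueError(f"unknown behavior: {behavior}")
    # CSV has no null and a fixed set of columns, so both "null" and
    # "skip" fall back to empty cells.
    if fmt == "csv" and behavior in {"null", "skip"}:
        behavior = "empty"
    placeholder = {"empty": "", "null": None, "unknown": "unknown"}
    record = {}
    for name in field_names:
        if name in extracted:
            record[name] = extracted[name]
        elif behavior != "skip":
            record[name] = placeholder[behavior]
        # With "skip", the key is simply left out of the record.
    return record


def to_csv_row(record, field_names):
    """Serialize one record as a CSV line, keeping the column order fixed."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([record[name] for name in field_names])
    return buffer.getvalue().strip()


fields = ["invoice_number", "vendor", "total_amount"]
found = {"invoice_number": "INV-1042", "total_amount": 1250.0}  # vendor not in the text

print(json.dumps(fill_missing(found, fields, "null", "json")))
# {"invoice_number": "INV-1042", "vendor": null, "total_amount": 1250.0}
print(json.dumps(fill_missing(found, fields, "skip", "json")))
# {"invoice_number": "INV-1042", "total_amount": 1250.0}
print(to_csv_row(fill_missing(found, fields, "skip", "csv"), fields))
# INV-1042,,1250.0
```

In the CSV case the empty cell keeps total_amount in the third column; dropping the vendor value outright would have moved it into the second.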
This tool does not classify, and the boundary matters: pulling a value out of the text (a name, a total, a date) is extraction, which is this tool's job. Choosing a label from a closed set you define (spam/not-spam, positive/negative) is classification, which is the job of the Data Classification Prompt in this category. A severity field that copies the customer's own words is extraction; deciding the severity yourself is classification.
The engine derives name-aware rules per field: email fields get "valid address only", phone fields get normalization, dates get ISO format, amounts get "numeric value only", identifiers get "exactly as written", and list-like fields (skills, action_items, pros) get one-entry-per-item handling. Fields without a matching pattern rely on their descriptions, and the preview tells you which.
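One way such name-based derivation could work is a keyword lookup over the parts of the field name. The sketch below is an assumption about the approach, not the engine's real matching logic: `RULES`, `derive_rule`, and the keyword sets are hypothetical, and the rule texts paraphrase the ones listed above.

```python
# Hypothetical keyword sets; the first matching entry wins.
RULES = [
    ({"email"}, "valid email address only"),
    ({"phone", "mobile"}, "normalized phone number"),
    ({"date", "deadline"}, "ISO date format"),
    ({"amount", "total", "price"}, "numeric value only"),
    ({"id", "number"}, "exactly as written"),
    ({"skills", "items", "pros", "cons"}, "one entry per item"),
]


def derive_rule(field_name):
    """Return the rule for a field name, or None if no keyword matches."""
    # Match on whole underscore-separated parts, so "updated" does not
    # accidentally match "date".
    tokens = set(field_name.lower().split("_"))
    for keywords, rule in RULES:
        if tokens & keywords:
            return rule
    return None


for name in ["sender_email", "total_amount", "invoice_number", "action_items", "candidate_name"]:
    rule = derive_rule(name)
    print(f"{name}: {rule or 'no pattern matched, relies on its description'}")
```

Returning `None` for unmatched names mirrors the preview behavior described above: such fields fall back on whatever their description says.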
The source type matters because reading an invoice is not reading meeting notes. It adds reading guidance to the prompt ("values follow printed labels", "prefer the newest message over quoted history", "action items may be phrased as commitments") and suggests the fields that source usually yields. It's the difference between a generic scraper and a prompt that knows what it's looking at.