On this page
TLDR
Structured outputs guarantee Claude returns JSON matching a schema instead of natural language. Force a tool with a JSON schema; the API enforces structure, not your parser. Schema design (nullable fields, 'unclear' enum values) prevents fabrication. schema.org + tools
What it is
Structured output is a deterministic contract between the model and your application: Claude returns data in a specific JSON shape, guaranteed to match a schema you define. Without structured output, Claude returns prose wrapped in whatever Markdown it picks: sometimes with explanatory text, sometimes with code blocks, sometimes with tangents. With structured output, you pass a JSON schema and Claude returns valid JSON matching it, or the request fails. Pattern: User → Prompt + Schema → Claude → Valid JSON.
The power of structured output is that it shifts the burden from output parsing (your problem) to token generation (Claude's problem). You don't extract fields from messy prose. You don't validate structure after the fact. Instead, you pass a schema (tool_use blocks with tool_choice: forced, or JSON-mode APIs) and Claude ensures conformance. Schema design is 80% of the work: bad schemas cause fabrication (e.g. inventing a refund reason when the contract is silent).
Three patterns exist: (1) tool_use blocks: pass {type: "tool", name: "extract_fields"} as a pseudo-tool, Claude must call it with your schema as input; (2) tool_choice forced: same, but with deterministic routing; (3) JSON-mode APIs (Anthropic Batch, some partners): Claude returns only JSON. Tool use is recommended, it's transparent, supports retries, and integrates naturally with agentic loops.
The validation-retry loop is the structural answer to schema violations and fabrication. When Claude returns invalid JSON or a fabricated value, you don't blame the model. You send the original document + the invalid extraction + the specific error back to Claude and ask it to correct. This loop typically runs 2-3 times, with each retry reducing errors by 70-80%. Without the loop, you silently accept garbage.
How it works
The tool_use + schema pattern uses Claude's native tool-calling. You define a tool with a JSON schema in input_schema: {type: "object", properties: {...}, required: [...]}. Claude sees the tool, understands the schema, and calls it with arguments matching it. The guarantee is structural: token generation is constrained so Claude can only output JSON matching the schema. If Claude tries invalid JSON, the model rejects it and re-generates.
tool_choice: forced adds determinism by removing the choice. Instead of asking "should I call this tool," you assert "you MUST." The request: {tool_choice: {type: "tool", name: "extract_fields"}}. Claude skips reasoning about whether to call and goes straight to execution. Saves a turn and guarantees the tool fires. Use forced for extraction pipelines where the tool is always the goal; use regular tool_use (with auto) when the model should decide.
The validation-retry loop structure: (1) Send user input + document + extraction schema. (2) Claude extracts and calls the tool with JSON. (3) Validate the JSON against the schema AND check for fabrication (required fields that are null, generic strings when the source is specific). (4) If valid: done. (5) If invalid: send back the original + failed extraction + specific error ("field 'refund_amount' should be null when document is silent, not '0.00'"). (6) Claude retries with feedback in context.
Fabrication happens when the schema doesn't protect against it. Example: {refund_reason: {type: "string"}} without a nullable field or enum. Claude sees a required string and fills it with something plausible, even if the contract never states a reason. The fix: {refund_reason: {type: ["string", "null"]}, confidence: {type: "number"}} and instruct Claude: "set null if silent." Nullable fields + escape hatches (enum with "unclear") are the schema anti-fabrication pattern.

Where you'll see it
Healthcare intake form extraction
Patient intake notes flow through tool_use with a JSON schema (patient_id, dob, complaint, meds[], allergies[]). Validation-retry: invalid date format → Claude sees the error → fixes input → resubmits. Error rate drops from 8% (prompt-only) to 0.3% with the schema + retry.
Refund authorization with policy hooks
Customer service agent extracts refund_amount, customer_id, reason via structured output. PostToolUse hook validates: amount within policy? customer eligible? Hook blocks illegal refunds before execution. Compliance reaches 100%; prompt-only achieves ~70%.
Research claim extraction with provenance
Literature review returns JSON: [{claim, evidence_tier, sources: [{title, url, date}]}]. Schema requires every claim to have ≥1 source. PreToolUse checks tier label; PostToolUse deduplicates. Output is deterministic and audit-ready for clinicians.
Code examples
from anthropic import Anthropic
import json
client = Anthropic()
extract_intake = {
"name": "save_patient_intake",
"description": "Save structured patient intake data with field validation.",
"input_schema": {
"type": "object",
"properties": {
"patient_id": {"type": "string"},
"dob": {"type": "string", "format": "date"},
"complaint": {"type": "string"},
"medications": {"type": "array", "items": {"type": "string"}},
"allergies": {"type": "array", "items": {"type": "string"}},
"balance": {"type": ["number", "null"]}, # nullable prevents fabrication
},
"required": ["patient_id", "dob", "complaint"],
},
}
def extract(intake_text: str) -> dict:
resp = client.messages.create(
model="claude-opus-4-5",
max_tokens=1024,
tools=[extract_intake],
# Force the tool, guarantees structured JSON, not free text
tool_choice={"type": "tool", "name": "save_patient_intake"},
messages=[{"role": "user", "content": f"Extract:\n{intake_text}"}],
)
if resp.stop_reason != "tool_use":
raise ValueError("expected tool_use, got " + resp.stop_reason)
tool_block = next(b for b in resp.content if b.type == "tool_use")
return tool_block.input # already structured, no parsing needed
Looks right, isn't
Each row pairs a plausible-looking pattern with the failure it actually creates. These are the shapes exam distractors are built from.
Just write 'output JSON' in the prompt, Claude is good enough.
Prompt-only JSON output is ~85% reliable. The model occasionally inserts narrative ('Sure, here is the JSON: ...') or omits fields. Forced tool_use with a schema is 100% structured.
Forced tool_use guarantees the output is correct.
Forced tool_use guarantees STRUCTURE, not CONTENT correctness. The model can still emit semantically wrong values. Add validation-retry for semantic checks.
Use non-nullable fields so missing data is treated as an error.
Non-nullable forces the model to fabricate when data is missing. Mark optional fields as nullable ([type, 'null']); add an 'unclear' enum value. Honest non-answers beat invented ones.
Anthropic supports OpenAI-style response_format: { type: "json_object" }.
Anthropic does NOT support `response_format`. The Anthropic-native pattern is forced tool_use with a JSON Schema. Sending response_format in an Anthropic request is silently ignored; sending it via a translation proxy (e.g. OpenRouter) might work but you've lost schema enforcement because tool_choice isn't applied. Stay on tool_use.
If the schema is correct, the output will always be valid JSON, no need to wrap in try/parse.
Forced tool_use guarantees the structure matches the schema, but edge cases still slip through: a number field with a value of Infinity, a string field with a 50KB blob exceeding your downstream limit, a date string that's syntactically valid but semantically 9999-12-31. Always validate after parsing, the schema is a filter, not a fortress.
Side-by-side
| Approach | Reliability | Audit | Best for |
|---|---|---|---|
| Prompt-only ('output JSON') | ~85% | Weak (free text) | Casual extraction; non-critical |
| tool_use + schema (forced) | ~99% structure | Good (logged input) | Most extraction tasks |
| + validation-retry | ~99%+ structure & semantics | Excellent (retry trail) | Healthcare, finance, legal |
| + PostToolUse policy hook | 100% policy compliance | Audit-grade (pre-exec gate) | Compliance-critical workflows |
| + Batch API for bulk | Same reliability, 50% cost | Same as forced tool_use | 1K+ documents in async pipeline |
| + Prompt caching of schema | Same reliability, ~90% input savings on cached prefix | Same as forced | High-throughput repeated extraction |
Decision tree
Do you need guaranteed JSON structure?
Do extraction errors have business consequences?
Is this a compliance-critical workflow (refund, medical, legal)?
Are you extracting from 1K+ documents in a non-interactive pipeline?
Will the same schema be reused across many requests in a session?
cache_control: {type: "ephemeral"} to the tools array. Cuts input cost ~90% for the cached schema on each subsequent extraction.Question patterns

101 V2 questions wired to this concept. Tap an answer to check it instantly — you'll see whether it's right and why — then expand the full breakdown for the mental model and all four rationales.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
95 additional questions for this concept live in the practice pillar. Take a mock exam ↗
Frequently asked
How does Anthropic's structured output compare to OpenAI's `response_format`?
tool_use block, not a top-level field). Don't try to send OpenAI's `response_format` to Anthropic, it's silently ignored.Can I nest objects deeply in the schema?
{patient_id, dob, complaint} over {patient: {info: {id, dob}, visit: {complaint}}}.What does the model do when a required field can't be extracted from the source?
nullable types or an enum: ["unclear", "not_provided"] value, it returns the honest non-answer. Schema design is the anti-fabrication lever, prompt instructions alone are unreliable.How do I extract an unknown number of items (a variable-length list)?
{type: "array", items: {...}} with no minItems/maxItems constraints. The model decides count based on the source. If you need a hard cap, set maxItems so the model stops emitting at the limit instead of hallucinating to fill space.Does prompt caching apply to the JSON Schema?
tools array (including schemas) can be marked with cache_control: {type: "ephemeral"}. ~90% input cost savings on the cached prefix for subsequent calls within 5 minutes. Standard pattern for high-volume extraction pipelines.What's the cost difference between prompt-only JSON and forced tool_use?
Can I combine extended thinking with structured outputs?
"any". The model thinks, then chooses the extraction tool autonomously. Reliability is slightly lower (~95% vs ~99%) but reasoning is much stronger for complex documents.How do I handle a field that should be a number OR a string OR a list?
type enum field gating which value field is populated; (2) oneOf schema, but Claude's schema support for oneOf is limited and can confuse the model. Discriminated union is more reliable.What happens if I send a malformed JSON Schema in the tool definition?
type value misspelled ("strring"), required referencing a property not in properties, circular $ref (not supported anyway). Validate schemas with a JSON Schema linter before deploying.How do I audit-trail a structured extraction in a regulated environment?
tool_use.input (pre-validation), validation errors (if any), final accepted output. The Anthropic message_id is your correlation key. Combined with a PostToolUse policy hook log, this is sufficient for SOC 2 / HIPAA audit.Work this with your AI
Work this concept hands-on with Claude Code, Codex, or claude.ai. Copy a prompt, paste it into your assistant, and practise in tandem. Each one keeps you active (explain it back, get drilled, or build) rather than just reading.
- Drill it like the exam (scenario MCQs)Practice in the exam's scenario-MCQ format with trap awareness.
- Explain it back (Feynman)Build durable, transferable understanding of a concept you can half-state.
- Test me, adapting the difficultyActive recall practice on a concept you think you know.
- Check my prerequisites firstBefore studying a concept that keeps not sticking.
- Find the high-leverage 20%When a domain feels too big and you are short on time.
