D4.4 · Domain 4 · Prompt Engineering · 20% of CCA-F

Structured Outputs.

9 min read·10 sections·Tier A

Structured outputs guarantee Claude returns JSON matching a schema instead of natural language. Force a tool with a JSON schema; the API enforces structure, not your parser. Schema design (nullable fields, 'unclear' enum values) prevents fabrication. schema.org + tools

Anti-fabrication patternDomain 4Tool-use enforcement
Structured Outputs, hero illustration featuring Loop mascot in a warm gallery scene.
Domain D4Prompt Engineering · 20%
On this page
01 · Summary

TLDR

Structured outputs guarantee Claude returns JSON matching a schema instead of natural language. Force a tool with a JSON schema; the API enforces structure, not your parser. Schema design (nullable fields, 'unclear' enum values) prevents fabrication. schema.org + tools

3
Anti-fabrication patterns
tool_use
Enforcement
2
Schema elements
D4
Exam domain
100%
Guarantee
02 · Definition

What it is

Structured output is a deterministic contract between the model and your application: Claude returns data in a specific JSON shape, guaranteed to match a schema you define. Without structured output, Claude returns prose wrapped in whatever Markdown it picks: sometimes with explanatory text, sometimes with code blocks, sometimes with tangents. With structured output, you pass a JSON schema and Claude returns valid JSON matching it, or the request fails. Pattern: User → Prompt + Schema → Claude → Valid JSON.

The power of structured output is that it shifts the burden from output parsing (your problem) to token generation (Claude's problem). You don't extract fields from messy prose. You don't validate structure after the fact. Instead, you pass a schema (tool_use blocks with tool_choice: forced, or JSON-mode APIs) and Claude ensures conformance. Schema design is 80% of the work: bad schemas cause fabrication (e.g. inventing a refund reason when the contract is silent).

Three patterns exist: (1) tool_use blocks: pass {type: "tool", name: "extract_fields"} as a pseudo-tool, Claude must call it with your schema as input; (2) tool_choice forced: same, but with deterministic routing; (3) JSON-mode APIs (Anthropic Batch, some partners): Claude returns only JSON. Tool use is recommended, it's transparent, supports retries, and integrates naturally with agentic loops.

The validation-retry loop is the structural answer to schema violations and fabrication. When Claude returns invalid JSON or a fabricated value, you don't blame the model. You send the original document + the invalid extraction + the specific error back to Claude and ask it to correct. This loop typically runs 2-3 times, with each retry reducing errors by 70-80%. Without the loop, you silently accept garbage.

03 · Mechanics

How it works

The tool_use + schema pattern uses Claude's native tool-calling. You define a tool with a JSON schema in input_schema: {type: "object", properties: {...}, required: [...]}. Claude sees the tool, understands the schema, and calls it with arguments matching it. The guarantee is structural: token generation is constrained so Claude can only output JSON matching the schema. If Claude tries invalid JSON, the model rejects it and re-generates.

tool_choice: forced adds determinism by removing the choice. Instead of asking "should I call this tool," you assert "you MUST." The request: {tool_choice: {type: "tool", name: "extract_fields"}}. Claude skips reasoning about whether to call and goes straight to execution. Saves a turn and guarantees the tool fires. Use forced for extraction pipelines where the tool is always the goal; use regular tool_use (with auto) when the model should decide.

The validation-retry loop structure: (1) Send user input + document + extraction schema. (2) Claude extracts and calls the tool with JSON. (3) Validate the JSON against the schema AND check for fabrication (required fields that are null, generic strings when the source is specific). (4) If valid: done. (5) If invalid: send back the original + failed extraction + specific error ("field 'refund_amount' should be null when document is silent, not '0.00'"). (6) Claude retries with feedback in context.

Fabrication happens when the schema doesn't protect against it. Example: {refund_reason: {type: "string"}} without a nullable field or enum. Claude sees a required string and fills it with something plausible, even if the contract never states a reason. The fix: {refund_reason: {type: ["string", "null"]}, confidence: {type: "number"}} and instruct Claude: "set null if silent." Nullable fields + escape hatches (enum with "unclear") are the schema anti-fabrication pattern.

Structured Outputs mechanics, painterly diagram featuring Loop mascot.
04 · In production

Where you'll see it

Healthcare intake form extraction

Patient intake notes flow through tool_use with a JSON schema (patient_id, dob, complaint, meds[], allergies[]). Validation-retry: invalid date format → Claude sees the error → fixes input → resubmits. Error rate drops from 8% (prompt-only) to 0.3% with the schema + retry.

Refund authorization with policy hooks

Customer service agent extracts refund_amount, customer_id, reason via structured output. PostToolUse hook validates: amount within policy? customer eligible? Hook blocks illegal refunds before execution. Compliance reaches 100%; prompt-only achieves ~70%.

Research claim extraction with provenance

Literature review returns JSON: [{claim, evidence_tier, sources: [{title, url, date}]}]. Schema requires every claim to have ≥1 source. PreToolUse checks tier label; PostToolUse deduplicates. Output is deterministic and audit-ready for clinicians.

05 · Implementation

Code examples

tool_use as a JSON-schema enforcer
from anthropic import Anthropic
import json

client = Anthropic()

extract_intake = {
    "name": "save_patient_intake",
    "description": "Save structured patient intake data with field validation.",
    "input_schema": {
        "type": "object",
        "properties": {
            "patient_id": {"type": "string"},
            "dob": {"type": "string", "format": "date"},
            "complaint": {"type": "string"},
            "medications": {"type": "array", "items": {"type": "string"}},
            "allergies": {"type": "array", "items": {"type": "string"}},
            "balance": {"type": ["number", "null"]},  # nullable prevents fabrication
        },
        "required": ["patient_id", "dob", "complaint"],
    },
}

def extract(intake_text: str) -> dict:
    resp = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        tools=[extract_intake],
        # Force the tool, guarantees structured JSON, not free text
        tool_choice={"type": "tool", "name": "save_patient_intake"},
        messages=[{"role": "user", "content": f"Extract:\n{intake_text}"}],
    )

    if resp.stop_reason != "tool_use":
        raise ValueError("expected tool_use, got " + resp.stop_reason)

    tool_block = next(b for b in resp.content if b.type == "tool_use")
    return tool_block.input  # already structured, no parsing needed
Forced tool_choice + JSON schema = guaranteed structure. Nullable types prevent fabrication when data is missing. No regex parsing of free text.
06 · Distractor patterns

Looks right, isn't

Each row pairs a plausible-looking pattern with the failure it actually creates. These are the shapes exam distractors are built from.

Looks right

Just write 'output JSON' in the prompt, Claude is good enough.

Actually wrong

Prompt-only JSON output is ~85% reliable. The model occasionally inserts narrative ('Sure, here is the JSON: ...') or omits fields. Forced tool_use with a schema is 100% structured.

Looks right

Forced tool_use guarantees the output is correct.

Actually wrong

Forced tool_use guarantees STRUCTURE, not CONTENT correctness. The model can still emit semantically wrong values. Add validation-retry for semantic checks.

Looks right

Use non-nullable fields so missing data is treated as an error.

Actually wrong

Non-nullable forces the model to fabricate when data is missing. Mark optional fields as nullable ([type, 'null']); add an 'unclear' enum value. Honest non-answers beat invented ones.

Looks right

Anthropic supports OpenAI-style response_format: { type: "json_object" }.

Actually wrong

Anthropic does NOT support `response_format`. The Anthropic-native pattern is forced tool_use with a JSON Schema. Sending response_format in an Anthropic request is silently ignored; sending it via a translation proxy (e.g. OpenRouter) might work but you've lost schema enforcement because tool_choice isn't applied. Stay on tool_use.

Looks right

If the schema is correct, the output will always be valid JSON, no need to wrap in try/parse.

Actually wrong

Forced tool_use guarantees the structure matches the schema, but edge cases still slip through: a number field with a value of Infinity, a string field with a 50KB blob exceeding your downstream limit, a date string that's syntactically valid but semantically 9999-12-31. Always validate after parsing, the schema is a filter, not a fortress.

07 · Compare

Side-by-side

ApproachReliabilityAuditBest for
Prompt-only ('output JSON')~85%Weak (free text)Casual extraction; non-critical
tool_use + schema (forced)~99% structureGood (logged input)Most extraction tasks
+ validation-retry~99%+ structure & semanticsExcellent (retry trail)Healthcare, finance, legal
+ PostToolUse policy hook100% policy complianceAudit-grade (pre-exec gate)Compliance-critical workflows
+ Batch API for bulkSame reliability, 50% costSame as forced tool_use1K+ documents in async pipeline
+ Prompt caching of schemaSame reliability, ~90% input savings on cached prefixSame as forcedHigh-throughput repeated extraction
08 · When to use

Decision tree

01

Do you need guaranteed JSON structure?

YesUse tool_use with a schema + tool_choice forced. Skip prompt-only.
NoPrompt is fine, accept the ~15% noise.
02

Do extraction errors have business consequences?

YesAdd a validation-retry loop. Feed errors back to Claude for self-correction.
NoSingle-pass tool_use is enough.
03

Is this a compliance-critical workflow (refund, medical, legal)?

YesAdd a PostToolUse hook before execution. Hooks enforce policy deterministically; prompts can't.
NoSchema + retry is sufficient.
04

Are you extracting from 1K+ documents in a non-interactive pipeline?

YesUse the Batch API with the same forced tool_use schema. 50% cost reduction, async results within 24h. Validation-retry runs at the application layer after batch completion.
NoSynchronous extraction with the standard Messages API.
05

Will the same schema be reused across many requests in a session?

YesAdd cache_control: {type: "ephemeral"} to the tools array. Cuts input cost ~90% for the cached schema on each subsequent extraction.
NoSkip caching; the schema is per-request anyway.
09 · On the exam

Question patterns

Structured Outputs exam trap, painterly cautionary scene featuring Loop mascot.

101 V2 questions wired to this concept. Tap an answer to check it instantly — you'll see whether it's right and why — then expand the full breakdown for the mental model and all four rationales.

Your code-review CI bot returns valid JSON with empty findings: [] even on PRs that clearly have issues. Why?

Tap your answer to check it.

A tool throws an exception inside your harness; on the next iteration Claude requests the same tool again with the same arguments. Why?

Tap your answer to check it.

A research subagent ran for 40 turns and returned a perfect summary, but the bill is huge. What is the architectural fix?

Tap your answer to check it.

You spawned 4 subagents in parallel; 3 finished and the 4th hangs forever. How do you debug?

Tap your answer to check it.

Your TypeScript handler branches on stop_reason for end_turn, tool_use, and max_tokens. What is missing for a production-safe handler?

Tap your answer to check it.

Why is using stop_reason preferred over counting tool_use blocks for loop control?

Tap your answer to check it.

95 additional questions for this concept live in the practice pillar. Take a mock exam ↗

10 · FAQ

Frequently asked

How does Anthropic's structured output compare to OpenAI's `response_format`?
Anthropic uses forced tool_use with a JSON Schema; OpenAI uses `response_format: {type: "json_schema"}`. Functionally equivalent for type enforcement, but the Anthropic pattern integrates natively with agentic loops (the response is a tool_use block, not a top-level field). Don't try to send OpenAI's `response_format` to Anthropic, it's silently ignored.
Can I nest objects deeply in the schema?
Yes, JSON Schema supports arbitrary nesting. But deep nesting (4+ levels) reduces extraction reliability, the model has to reason about more structure simultaneously. Flatten where possible: prefer {patient_id, dob, complaint} over {patient: {info: {id, dob}, visit: {complaint}}}.
What does the model do when a required field can't be extracted from the source?
Without escape hatches it fabricates. With nullable types or an enum: ["unclear", "not_provided"] value, it returns the honest non-answer. Schema design is the anti-fabrication lever, prompt instructions alone are unreliable.
How do I extract an unknown number of items (a variable-length list)?
Use {type: "array", items: {...}} with no minItems/maxItems constraints. The model decides count based on the source. If you need a hard cap, set maxItems so the model stops emitting at the limit instead of hallucinating to fill space.
Does prompt caching apply to the JSON Schema?
Yes, the entire tools array (including schemas) can be marked with cache_control: {type: "ephemeral"}. ~90% input cost savings on the cached prefix for subsequent calls within 5 minutes. Standard pattern for high-volume extraction pipelines.
What's the cost difference between prompt-only JSON and forced tool_use?
Tool definitions add input tokens (typically 200-1500 per tool). On a single call, prompt-only is cheaper. On 1000+ calls with the same schema, forced tool_use + caching is dramatically cheaper because the schema is a cached prefix and reliability is higher (fewer retries).
Can I combine extended thinking with structured outputs?
Yes, but you must use `tool_choice: "auto"`, not forced. Extended thinking is incompatible with forced or "any". The model thinks, then chooses the extraction tool autonomously. Reliability is slightly lower (~95% vs ~99%) but reasoning is much stronger for complex documents.
How do I handle a field that should be a number OR a string OR a list?
Two patterns: (1) discriminated union with a type enum field gating which value field is populated; (2) oneOf schema, but Claude's schema support for oneOf is limited and can confuse the model. Discriminated union is more reliable.
What happens if I send a malformed JSON Schema in the tool definition?
The API rejects with a 400 error before any inference. Common errors: type value misspelled ("strring"), required referencing a property not in properties, circular $ref (not supported anyway). Validate schemas with a JSON Schema linter before deploying.
How do I audit-trail a structured extraction in a regulated environment?
Log five fields per call: input document hash, tool definition hash, raw tool_use.input (pre-validation), validation errors (if any), final accepted output. The Anthropic message_id is your correlation key. Combined with a PostToolUse policy hook log, this is sufficient for SOC 2 / HIPAA audit.
11 · Practice with AI

Work this with your AI

Work this concept hands-on with Claude Code, Codex, or claude.ai. Copy a prompt, paste it into your assistant, and practise in tandem. Each one keeps you active (explain it back, get drilled, or build) rather than just reading.

  • Drill it like the exam (scenario MCQs)
    Practice in the exam's scenario-MCQ format with trap awareness.
  • Explain it back (Feynman)
    Build durable, transferable understanding of a concept you can half-state.
  • Test me, adapting the difficulty
    Active recall practice on a concept you think you know.
  • Check my prerequisites first
    Before studying a concept that keeps not sticking.
  • Find the high-leverage 20%
    When a domain feels too big and you are short on time.
Self-check

Test yourself

Three diagnostic questions on this primitive. Reveal each answer when you have a guess. Want a full 60-question mock? Open the mock hub →

Q1You prompt Claude for JSON and 15% of responses include explanatory text. How do you fix this for production?
Stop prompting for JSON; use tool_use with a JSON schema and tool_choice: forced. Token generation is constrained to match the schema; structure is guaranteed. Prompt-only approaches are probabilistic.
Q2Your schema has `{refund_reason: {type: "string"}}`. Claude returns `"reason: unable to determine"` when the contract is silent. What's wrong?
The schema forces a string, so Claude fabricates one when source is silent. Fix: {refund_reason: {type: ["string", "null"]}}. Or add an enum: ["refund", "replacement", "unclear"]. Give Claude a way to say "I don't know."
Q3A validation-retry loop runs 5 times and still fails. What's the next step?
Don't retry forever. After 2-3 retries, escalate to a human reviewer with the document and last-attempt extraction. Failure to converge usually means the source is genuinely ambiguous, not that Claude is broken.
Last reviewed: 2026-05-04·Refresh cadence: monthly
D4.4 · D4 · Prompt Engineering

Structured Outputs, complete.

You've covered the full ten-section breakdown for this primitive, definition, mechanics, code, false positives, comparison, decision tree, exam patterns, and FAQ. One technical primitive down on the path to CCA-F.

More platforms →