# Structured Outputs

> Structured outputs guarantee Claude returns JSON matching a schema instead of natural language. Force a tool with a JSON schema; the API enforces structure, not your parser. Schema design (nullable fields, 'unclear' enum values) prevents fabrication.

**Domain:** D4 · Prompt Engineering (20% of CCA-F exam)
**Canonical:** https://claudearchitectcertification.com/concepts/structured-outputs
**Last reviewed:** 2026-05-04

## Quick stats

- **Anti-fabrication patterns:** 3
- **Enforcement:** tool_use
- **Schema elements:** 2
- **Exam domain:** D4
- **Guarantee:** 100%

## What it is

Structured output is a deterministic contract between the model and your application: Claude returns data in a specific JSON shape, guaranteed to match a schema you define. Without structured output, Claude returns prose wrapped in whatever Markdown it picks: sometimes with explanatory text, sometimes with code blocks, sometimes with tangents. With structured output, you pass a JSON schema and Claude returns valid JSON matching it, or the request fails. Pattern: User → Prompt + Schema → Claude → Valid JSON.

The power of structured output is that it shifts the burden from output parsing (your problem) to token generation (Claude's problem). You don't extract fields from messy prose. You don't validate structure after the fact. Instead, you pass a schema (tool_use blocks with tool_choice: forced, or JSON-mode APIs) and Claude ensures conformance. Schema design is 80% of the work: bad schemas cause fabrication (e.g. inventing a refund reason when the contract is silent).

Three patterns exist: (1) tool_use with a schema: define a pseudo-tool (e.g. extract_fields) whose input_schema is the output shape you want, and Claude calls it with matching arguments; (2) tool_choice forced: the same, plus {tool_choice: {type: "tool", name: "extract_fields"}} for deterministic routing; (3) JSON-mode APIs (Anthropic Batch, some partners): Claude returns only JSON. Tool use is recommended: it's transparent, supports retries, and integrates naturally with agentic loops.

The validation-retry loop is the structural answer to schema violations and fabrication. When Claude returns invalid JSON or a fabricated value, you don't blame the model. You send the original document + the invalid extraction + the specific error back to Claude and ask it to correct. This loop typically runs 2-3 times, with each retry reducing errors by 70-80%. Without the loop, you silently accept garbage.

## How it works

The tool_use + schema pattern uses Claude's native tool-calling. You define a tool with a JSON schema in input_schema: {type: "object", properties: {...}, required: [...]}. Claude sees the tool, understands the schema, and calls it with arguments matching it. The guarantee is structural: token generation is constrained so Claude can only output JSON matching the schema. If Claude tries invalid JSON, the model rejects it and re-generates.

tool_choice: forced adds determinism by removing the choice. Instead of asking "should I call this tool," you assert "you MUST." The request: {tool_choice: {type: "tool", name: "extract_fields"}}. Claude skips reasoning about whether to call and goes straight to execution. Saves a turn and guarantees the tool fires. Use forced for extraction pipelines where the tool is always the goal; use regular tool_use (with auto) when the model should decide.

The validation-retry loop structure: (1) Send user input + document + extraction schema. (2) Claude extracts and calls the tool with JSON. (3) Validate the JSON against the schema AND check for fabrication (required fields that are null, generic strings when the source is specific). (4) If valid: done. (5) If invalid: send back the original + failed extraction + specific error ("field 'refund_amount' should be null when document is silent, not '0.00'"). (6) Claude retries with feedback in context.

Fabrication happens when the schema doesn't protect against it. Example: {refund_reason: {type: "string"}} without a nullable field or enum. Claude sees a required string and fills it with something plausible, even if the contract never states a reason. The fix: {refund_reason: {type: ["string", "null"]}, confidence: {type: "number"}} and instruct Claude: "set null if silent." Nullable fields + escape hatches (enum with "unclear") are the schema anti-fabrication pattern.
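The anti-fabrication pattern above can be sketched as a schema fragment; field names like refund_amount and confidence are illustrative, not canonical:

```python
# Anti-fabrication schema sketch: nullable type + "unclear" escape hatch.
refund_schema = {
    "type": "object",
    "properties": {
        # Nullable: the model may return null when the contract is silent.
        "refund_amount": {"type": ["number", "null"]},
        # Enum escape hatch: "unclear" is an honest non-answer.
        "refund_reason": {
            "type": "string",
            "enum": ["refund", "replacement", "unclear"],
        },
        # Confidence lets the harness route low-certainty extractions to review.
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["refund_amount", "refund_reason", "confidence"],
}
```

Pair this with an explicit instruction ("set refund_amount to null if the document is silent") so the schema and the prompt point the same way.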

## Where you'll see it in production

### Healthcare intake form extraction

Patient intake notes flow through tool_use with a JSON schema (patient_id, dob, complaint, meds[], allergies[]). Validation-retry: invalid date format → Claude sees the error → fixes input → resubmits. Error rate drops from 8% (prompt-only) to 0.3% with the schema + retry.

### Refund authorization with policy hooks

Customer service agent extracts refund_amount, customer_id, reason via structured output. PostToolUse hook validates: amount within policy? customer eligible? Hook blocks illegal refunds before execution. Compliance reaches 100%; prompt-only achieves ~70%.
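A minimal sketch of the policy gate, written as a plain function rather than the actual hook wiring; the limit and eligibility set are invented for illustration:

```python
MAX_AUTO_REFUND = 500.00  # assumed policy limit, not a real value
ELIGIBLE_CUSTOMERS = {"cus_abc123", "cus_def456"}  # assumed eligibility set

def post_tool_use_gate(tool_input: dict) -> tuple[bool, str]:
    """Deterministic policy check run after extraction, before execution.

    Returns (allowed, reason). A real deployment wires this into a
    PostToolUse hook; here it is a bare function for illustration.
    """
    if tool_input["refund_amount"] > MAX_AUTO_REFUND:
        return False, f"amount {tool_input['refund_amount']} exceeds policy limit"
    if tool_input["customer_id"] not in ELIGIBLE_CUSTOMERS:
        return False, f"customer {tool_input['customer_id']} not eligible"
    return True, "ok"
```

Because the gate is code, not prompt text, an out-of-policy refund is blocked every time, which is where the 100% compliance figure comes from.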

### Research claim extraction with provenance

Literature review returns JSON: [{claim, evidence_tier, sources: [{title, url, date}]}]. Schema requires every claim to have ≥1 source. PreToolUse checks tier label; PostToolUse deduplicates. Output is deterministic and audit-ready for clinicians.
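A sketch of the provenance schema; the tier labels are assumptions, and minItems: 1 is what enforces "every claim has at least one source":

```python
# Provenance schema sketch. Tier labels are illustrative assumptions.
claims_schema = {
    "type": "object",
    "properties": {
        "claims": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "claim": {"type": "string"},
                    "evidence_tier": {
                        "type": "string",
                        "enum": ["rct", "observational", "expert_opinion", "unclear"],
                    },
                    "sources": {
                        "type": "array",
                        "minItems": 1,  # every claim needs at least one source
                        "items": {
                            "type": "object",
                            "properties": {
                                "title": {"type": "string"},
                                "url": {"type": "string"},
                                "date": {"type": "string", "format": "date"},
                            },
                            "required": ["title", "url"],
                        },
                    },
                },
                "required": ["claim", "evidence_tier", "sources"],
            },
        }
    },
    "required": ["claims"],
}
```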

## Code examples

### tool_use as a JSON-schema enforcer

```python
from anthropic import Anthropic
import json

client = Anthropic()

extract_intake = {
    "name": "save_patient_intake",
    "description": "Save structured patient intake data with field validation.",
    "input_schema": {
        "type": "object",
        "properties": {
            "patient_id": {"type": "string"},
            "dob": {"type": "string", "format": "date"},
            "complaint": {"type": "string"},
            "medications": {"type": "array", "items": {"type": "string"}},
            "allergies": {"type": "array", "items": {"type": "string"}},
            "balance": {"type": ["number", "null"]},  # nullable prevents fabrication
        },
        "required": ["patient_id", "dob", "complaint"],
    },
}

def extract(intake_text: str) -> dict:
    resp = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        tools=[extract_intake],
        # Force the tool, guarantees structured JSON, not free text
        tool_choice={"type": "tool", "name": "save_patient_intake"},
        messages=[{"role": "user", "content": f"Extract:\n{intake_text}"}],
    )

    if resp.stop_reason != "tool_use":
        raise ValueError(f"expected tool_use, got {resp.stop_reason}")

    tool_block = next(b for b in resp.content if b.type == "tool_use")
    return tool_block.input  # already structured, no parsing needed

```

> Forced tool_choice + JSON schema = guaranteed structure. Nullable types prevent fabrication when data is missing. No regex parsing of free text.

### Validation-retry loop for semantic errors

```python
from datetime import datetime

def extract_with_retry(text: str, max_attempts: int = 3) -> dict:
    messages = [{"role": "user", "content": f"Extract intake:\n{text}"}]

    for attempt in range(max_attempts):
        resp = client.messages.create(
            model="claude-opus-4-5", max_tokens=1024,
            tools=[extract_intake],
            tool_choice={"type": "tool", "name": "save_patient_intake"},
            messages=messages,
        )
        tool_block = next(b for b in resp.content if b.type == "tool_use")
        data = tool_block.input

        # Schema enforces TYPES; we still need to validate semantics.
        error = validate_semantically(data)
        if not error:
            return data

        # Feed the error back to Claude, it can fix and retry
        messages.append({"role": "assistant", "content": resp.content})
        messages.append({
            "role": "user",
            "content": [{"type": "tool_result", "tool_use_id": tool_block.id,
                         "content": f"Validation error: {error}. Fix and resubmit."}],
        })

    raise RuntimeError(f"failed after {max_attempts} attempts")

def validate_semantically(data: dict) -> str | None:
    try:
        datetime.strptime(data["dob"], "%Y-%m-%d")
    except ValueError:
        return f"dob '{data['dob']}' is not YYYY-MM-DD format"
    if data.get("complaint", "").strip() == "":
        return "complaint is empty"
    return None

```

> Schema enforces types; validation-retry catches semantic errors (bad date format, empty fields). Claude sees the error and fixes its own output within the same loop.

## Looks-right vs actually-wrong

| Looks right | Actually wrong |
|---|---|
| Just write 'output JSON' in the prompt, Claude is good enough. | Prompt-only JSON output is ~85% reliable. The model occasionally inserts narrative ('Sure, here is the JSON: ...') or omits fields. Forced tool_use with a schema is 100% structured. |
| Forced tool_use guarantees the output is correct. | Forced tool_use guarantees STRUCTURE, not CONTENT correctness. The model can still emit semantically wrong values. Add validation-retry for semantic checks. |
| Use non-nullable fields so missing data is treated as an error. | Non-nullable forces the model to fabricate when data is missing. Mark optional fields as nullable ([type, 'null']); add an 'unclear' enum value. Honest non-answers beat invented ones. |
| Anthropic supports OpenAI-style response_format: { type: "json_object" }. | Anthropic does NOT support response_format. The Anthropic-native pattern is forced tool_use with a JSON Schema. Sending response_format in an Anthropic request is silently ignored; sending it via a translation proxy (e.g. OpenRouter) might work but you've lost schema enforcement because tool_choice isn't applied. Stay on tool_use. |
| If the schema is correct, the output will always be valid JSON, no need to wrap in try/parse. | Forced tool_use guarantees the structure matches the schema, but edge cases still slip through: a number field with a value of Infinity, a string field with a 50KB blob exceeding your downstream limit, a date string that's syntactically valid but semantically 9999-12-31. Always validate after parsing, the schema is a filter, not a fortress. |

## Comparison

| Approach | Reliability | Audit | Best for |
| --- | --- | --- | --- |
| Prompt-only ('output JSON') | ~85% | Weak (free text) | Casual extraction; non-critical |
| tool_use + schema (forced) | ~99% structure | Good (logged input) | Most extraction tasks |
| + validation-retry | ~99%+ structure & semantics | Excellent (retry trail) | Healthcare, finance, legal |
| + PostToolUse policy hook | 100% policy compliance | Audit-grade (pre-exec gate) | Compliance-critical workflows |
| + Batch API for bulk | Same reliability, 50% cost | Same as forced tool_use | 1K+ documents in async pipeline |
| + Prompt caching of schema | Same reliability, ~90% input savings on cached prefix | Same as forced | High-throughput repeated extraction |

## Decision tree

1. **Do you need guaranteed JSON structure?**
   - **Yes:** Use tool_use with a schema + tool_choice forced. Skip prompt-only.
   - **No:** Prompt is fine, accept the ~15% noise.

2. **Do extraction errors have business consequences?**
   - **Yes:** Add a validation-retry loop. Feed errors back to Claude for self-correction.
   - **No:** Single-pass tool_use is enough.

3. **Is this a compliance-critical workflow (refund, medical, legal)?**
   - **Yes:** Add a PostToolUse hook before execution. Hooks enforce policy deterministically; prompts can't.
   - **No:** Schema + retry is sufficient.

4. **Are you extracting from 1K+ documents in a non-interactive pipeline?**
   - **Yes:** Use the Batch API with the same forced tool_use schema. 50% cost reduction, async results within 24h. Validation-retry runs at the application layer after batch completion.
   - **No:** Synchronous extraction with the standard Messages API.

5. **Will the same schema be reused across many requests in a session?**
   - **Yes:** Add cache_control: {type: "ephemeral"} to the tools array. Cuts input cost ~90% for the cached schema on each subsequent extraction.
   - **No:** Skip caching; the schema is per-request anyway.
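The caching step might look like this, with cache_control attached to the last tool in the tools array; the tool body is a trimmed stand-in for the full save_patient_intake definition shown earlier:

```python
# cache_control on the last tool marks everything up to that point as a
# cacheable prefix (~5 min TTL). Schema trimmed for brevity.
cached_tools = [
    {
        "name": "save_patient_intake",
        "description": "Save structured patient intake data.",
        "input_schema": {
            "type": "object",
            "properties": {"patient_id": {"type": "string"}},
            "required": ["patient_id"],
        },
        "cache_control": {"type": "ephemeral"},
    }
]

# Pass tools=cached_tools on every call; from the second call onward the
# schema tokens are read from cache at the reduced input rate.
```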

## Exam-pattern questions

### Q1. You prompt Claude for JSON and 15% of responses include explanatory text. How do you fix this for production?

Stop prompting for JSON; use tool_use with a JSON schema and tool_choice: forced. Token generation is constrained to match the schema; structure is guaranteed. Prompt-only approaches are probabilistic.

### Q2. Your schema has {refund_reason: {type: "string"}}. Claude returns "reason: unable to determine" when the contract is silent. What's wrong?

The schema forces a string, so Claude fabricates one when source is silent. Fix: {refund_reason: {type: ["string", "null"]}}. Or add an enum: ["refund", "replacement", "unclear"]. Give Claude a way to say "I don't know."

### Q3. A validation-retry loop runs 5 times and still fails. What's the next step?

Don't retry forever. After 2-3 retries, escalate to a human reviewer with the document and last-attempt extraction. Failure to converge usually means the source is genuinely ambiguous, not that Claude is broken.

### Q4. You force tool_choice: forced for extraction. Sometimes Claude returns tool_use with empty input. Why?

Forced tool_choice guarantees the tool fires, not that input is valid. Empty input means the source has no extractable data. Validate input in your harness; return a structured error so Claude can ask for more context or escalate.

### Q5. Your extraction tool returns confidence: 0.4 for a critical field. What should production do?

Route low-confidence extractions to human review. Low confidence is a feature: it means Claude is honest about uncertainty. Auto-accept high-confidence; queue medium for batch review; escalate low immediately.

### Q6. You use tool_choice: forced with extended_thinking. The API returns 400. Why?

Extended thinking only supports tool_choice: auto (or none); forced ({type: "tool"}) and "any" return a 400. Use auto when thinking is on. If extraction must be guaranteed, drop thinking; if reasoning matters more, accept the possibility of a text fallback.

### Q7. Schema validation passes but the extracted customer_id is "customer_001" when the document says "cus_abc123". What's the gap?

Schema enforces structure (string), not content correctness. Add a regex pattern: "pattern": "^cus_[a-z0-9]+$". Or validate semantically in your harness and trigger validation-retry on mismatch.

### Q8. You're using Batch API for 1,000 extractions. 50 fail validation. What's the cost-aware retry strategy?

Batch doesn't auto-retry. Submit failures as a new batch with the original document + the validation error. Most converge on the second pass. Truly stubborn cases go to human review. Don't re-run all 1,000.
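The selective resubmission might be sketched as follows, assuming each batch result record carries its original document and validation error; the record shape is hypothetical:

```python
def build_retry_batch(results: list[dict]) -> list[dict]:
    """Turn only the failed validations into new batch requests.

    Each retry prompt includes the original document plus the specific
    error, so the model can self-correct on the second pass.
    """
    retries = []
    for r in results:
        if r.get("validation_error") is None:
            continue  # passed validation: do not re-run
        retries.append({
            "custom_id": r["custom_id"] + "-retry",
            "prompt": (
                f"Extract again.\nDocument:\n{r['document']}\n"
                f"Previous attempt failed: {r['validation_error']}"
            ),
        })
    return retries

results = [
    {"custom_id": "doc-1", "document": "...", "validation_error": None},
    {"custom_id": "doc-2", "document": "...", "validation_error": "dob not YYYY-MM-DD"},
]
# Only doc-2 is resubmitted; doc-1 already passed.
```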

## FAQ

### Q1. How does Anthropic's structured output compare to OpenAI's response_format?

Anthropic uses forced tool_use with a JSON Schema; OpenAI uses response_format: {type: "json_schema"}. Functionally equivalent for type enforcement, but the Anthropic pattern integrates natively with agentic loops (the response is a tool_use block, not a top-level field). Don't try to send OpenAI's response_format to Anthropic; it's silently ignored.

### Q2. Can I nest objects deeply in the schema?

Yes, JSON Schema supports arbitrary nesting. But deep nesting (4+ levels) reduces extraction reliability: the model has to reason about more structure simultaneously. Flatten where possible: prefer {patient_id, dob, complaint} over {patient: {info: {id, dob}, visit: {complaint}}}.

### Q3. What does the model do when a required field can't be extracted from the source?

Without escape hatches it fabricates. With a nullable type or an enum value like "unclear" or "not_provided", it returns an honest non-answer. Schema design is the anti-fabrication lever; prompt instructions alone are unreliable.

### Q4. How do I extract an unknown number of items (a variable-length list)?

Use {type: "array", items: {...}} with no minItems/maxItems constraints. The model decides count based on the source. If you need a hard cap, set maxItems so the model stops emitting at the limit instead of hallucinating to fill space.
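A minimal sketch of the capped array; the cap of 20 is arbitrary:

```python
# Variable-length list with a hard cap. Omit minItems/maxItems entirely
# when the model should decide the count from the source.
medications_schema = {
    "type": "array",
    "items": {"type": "string"},
    "maxItems": 20,  # stop emitting at the limit instead of padding with filler
}
```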

### Q5. Does prompt caching apply to the JSON Schema?

Yes, the entire tools array (including schemas) can be marked with cache_control: {type: "ephemeral"}. ~90% input cost savings on the cached prefix for subsequent calls within 5 minutes. Standard pattern for high-volume extraction pipelines.

### Q6. What's the cost difference between prompt-only JSON and forced tool_use?

Tool definitions add input tokens (typically 200-1500 per tool). On a single call, prompt-only is cheaper. On 1000+ calls with the same schema, forced tool_use + caching is dramatically cheaper because the schema is a cached prefix and reliability is higher (fewer retries).

### Q7. Can I combine extended thinking with structured outputs?

Yes, but you must use tool_choice: "auto", not forced. Extended thinking is incompatible with forced or "any". The model thinks, then chooses the extraction tool autonomously. Reliability is slightly lower (~95% vs ~99%) but reasoning is much stronger for complex documents.

### Q8. How do I handle a field that should be a number OR a string OR a list?

Two patterns: (1) discriminated union with a type enum field gating which value field is populated; (2) oneOf schema, but Claude's schema support for oneOf is limited and can confuse the model. Discriminated union is more reliable.
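The discriminated-union pattern could be sketched like this; the field names are illustrative:

```python
# Discriminated union sketch: "value_type" gates which value field is
# populated. The other value fields stay null.
flexible_value_schema = {
    "type": "object",
    "properties": {
        "value_type": {"type": "string", "enum": ["number", "text", "list"]},
        "number_value": {"type": ["number", "null"]},
        "text_value": {"type": ["string", "null"]},
        "list_value": {"type": ["array", "null"], "items": {"type": "string"}},
    },
    "required": ["value_type"],
}

def read_value(data: dict):
    """Trust the discriminator, not whichever fields happen to be non-null."""
    return data[f"{data['value_type']}_value"]
```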

### Q9. What happens if I send a malformed JSON Schema in the tool definition?

The API rejects with a 400 error before any inference. Common errors: type value misspelled ("strring"), required referencing a property not in properties, circular $ref (not supported anyway). Validate schemas with a JSON Schema linter before deploying.

### Q10. How do I audit-trail a structured extraction in a regulated environment?

Log five fields per call: input document hash, tool definition hash, raw tool_use.input (pre-validation), validation errors (if any), final accepted output. The Anthropic message_id is your correlation key. Combined with a PostToolUse policy hook log, this is sufficient for SOC 2 / HIPAA audit.

---

**Source:** https://claudearchitectcertification.com/concepts/structured-outputs
**Vault sources:** ACP-T03 §4.4; GAI-K04 §18 anti-fabrication schema
**Last reviewed:** 2026-05-04

**Evidence tiers** — 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed.
