P3.3 deep dive · D2 + D5 · Tool Design + Integration

Structured Claim-Source Mapping.

How verified claims get pinned to their source documents in a structured-output schema so the final report cannot fabricate attributions.

6 prose blocks · 4 decisions · 5 failure modes · 5 exam Qs

Every verified claim emits as a JSON object with claim_id, claim_text, source_url, source_passage, confidence, and notes. The schema is enforced at the verification step. Synthesis renders citations from the schema and cannot invent attributions because the source text is pinned in-record.

Domain 2 + 5 · Anti-fabrication · Schema-enforced
Scope
One sub-pattern of the parent. Slimmer surface, same exam rigour.
Exam
18% D2 · 15% D5. Each decision and failure mode names the canonical distractor.
Canonical pattern · D2 + D5
01 · The pattern

What it is

Structured claim-source mapping is the contract that says every claim in the final report is traceable to a specific passage in a specific source. The verification subagent emits JSON that pairs each claim with its provenance: claim_id, claim_text, source_url, source_passage (the actual quoted text, not just a URL), publication_date, and confidence. Synthesis reads only this schema and renders inline citations from it. No free-form attribution, no model-generated URLs.
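A minimal sketch of what one such record might look like. The field names follow the prose above; the values, including the URL, are hypothetical.

```python
# Hypothetical verification record; field names follow the schema described
# above, values (including the URL) are illustrative only.
claim_record = {
    "claim_id": "c-014",
    "claim_text": "45% of creative workers report using AI tools in their workflow.",
    "source_url": "https://example.org/pew-2024-creative-ai",  # placeholder URL
    "source_passage": (
        "Forty-five percent of surveyed creative workers said they had used "
        "an AI tool for work in the past year."
    ),
    "publication_date": "2024-03-12",
    "confidence": 0.82,
    "notes": "Figure covers any-use, not daily-use.",
}
```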

The architectural reason is anti-fabrication. A synthesis subagent that writes prose plus citations from memory will eventually produce a citation that looks plausible but doesn't exist. By forcing synthesis to render [1] from a record where source_passage is already pinned, you remove the model's ability to hallucinate the link. If the passage isn't in the schema, there is no citation to render. Either the record is complete or it gets surfaced as a data gap.

The schema also handles the conflicting-source case explicitly. When two sources disagree (45% Pew vs 12% McKinsey), the verification subagent emits a sources_reconciled array with both records pinned, plus a notes field that explains the apparent conflict (different timeframes, different definitions, different populations). Synthesis presents both with attribution; it does not pick a winner. Picking one is misinformation. Preserving both with context is journalism.
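Under the same assumptions, the conflicting-source case might serialize like this. It is a sketch: both passages and URLs are invented for illustration.

```python
# Hypothetical reconciliation record for the 45% Pew vs 12% McKinsey example.
conflict_record = {
    "claim_id": "c-021",
    "claim_text": (
        "Reported AI adoption among creative workers ranges from 12% to 45%, "
        "depending on how 'use' is defined."
    ),
    "verified": True,
    "confidence": 0.74,
    "sources_reconciled": [
        {
            "stat": "45%",
            "source_url": "https://example.org/pew-2024",  # placeholder
            "source_passage": "45% of creative workers have used AI tools at least once this year.",
            "publication_date": "2024-03-12",
            "context": "any-use, past 12 months",
        },
        {
            "stat": "12%",
            "source_url": "https://example.org/mckinsey-2024",  # placeholder
            "source_passage": "12% of creative professionals report daily use of generative AI.",
            "publication_date": "2024-06-01",
            "context": "daily-use only",
        },
    ],
    "notes": (
        "Both figures hold under their own definitions: any-use (Pew) vs "
        "daily-use (McKinsey). Present both with attribution."
    ),
}
```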

02 · How it runs

How it works

The verification subagent receives pooled findings from research subagents and a fact-check rubric. For each candidate claim, it confirms the source is credible and dated, extracts the verbatim source_passage that backs the claim, and assigns a confidence score. The output is JSON only: {verifications: [{claim_id, claim_text, verified, confidence, sources_reconciled: [{stat, source_url, source_passage, publication_date, context}], notes}]}. The schema is enforced via Pydantic in Python or Zod in TypeScript; malformed output fails the verification step and triggers a retry.
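A minimal Pydantic sketch of that enforcement, assuming the field names above. The class names are illustrative rather than a published spec; a Zod version would follow the same shape.

```python
from datetime import date
from typing import Optional

from pydantic import BaseModel, Field, HttpUrl


class SourceRecord(BaseModel):
    stat: str
    source_url: HttpUrl
    source_passage: str    # verbatim quoted text that backs the claim
    publication_date: date
    context: str           # timeframe / definition / population


class Verification(BaseModel):
    claim_id: str
    claim_text: str
    verified: bool
    confidence: float = Field(ge=0.0, le=1.0)
    sources_reconciled: list[SourceRecord] = []
    notes: Optional[str] = None


class VerificationOutput(BaseModel):
    verifications: list[Verification]


# Malformed subagent output raises a ValidationError here, which the
# orchestrator catches to fail the verification step and trigger a retry:
# result = VerificationOutput.model_validate_json(raw_subagent_output)
```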

Synthesis is read-only. It receives the verified-claims JSON, the coordinator's narrative prompt, and a tool list of [Read] only. It walks the verifications array in order, writes prose that flows logically, and emits inline [1], [2] citations that index into sources_reconciled. The render is mechanical. Synthesis is never asked to invent an attribution; if it tries, the verification record is the only thing the citation can point to.
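One way the mechanical part of that render can be wired, sketched over plain dicts shaped like the verification records above (the function name is hypothetical):

```python
def build_reference_list(verifications: list[dict]) -> list[str]:
    """Build the numbered reference list strictly from pinned records.

    Synthesis prose may only emit [1], [2], ... indexes; each index resolves
    to one entry below, so a citation cannot exist without a pinned
    source_passage behind it.
    """
    references = []
    for record in verifications:
        for src in record.get("sources_reconciled", []):
            references.append(
                f"[{len(references) + 1}] {src['source_url']} "
                f"({src['publication_date']}): \"{src['source_passage']}\""
            )
    return references
```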

Data gaps are first-class. When verification finds a claim it cannot confirm (no credible source, conflicting data without enough context, or a research-subagent timeout), it emits {verified: false, notes: 'no credible source within window'}. Synthesis is instructed to acknowledge that gap in prose: 'Adoption rates among independent musicians remain unverified across our sources.' Transparent gaps beat confident fabrication every time. This is the architectural detail that protects the report from looking complete when it isn't.
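A sketch of the gap-surfacing side, under the same assumptions; the wording of the acknowledgement is left to the synthesis prompt.

```python
def collect_gaps(verifications: list[dict]) -> list[str]:
    """List unverified claims so synthesis can acknowledge them in prose
    instead of dropping them silently."""
    return [
        f"Unverified: {record['claim_text']} ({record.get('notes') or 'no credible source'})"
        for record in verifications
        if not record.get("verified", False)
    ]
```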

03 · Configuration decisions

The 4 decisions

Each decision pairs the right answer with the most-tested distractor. The Why line explains the failure mode behind the wrong choice.

Decision: Two sources disagree (45% Pew vs 12% McKinsey)
Right answer: Preserve both in sources_reconciled with attribution + notes explaining the difference
Wrong answer: Pick the higher-confidence source and drop the other
Why: Both numbers are correct under their own definitions (any-use vs daily-use). Dropping one is misinformation. Preserving both with context is the journalistic and architectural move.

Decision: Synthesis needs a citation for a claim. Where does the URL come from?
Right answer: From the source_url field in the verification record
Wrong answer: Synthesis generates the citation from memory of training data
Why: Model-generated citations are the canonical fabrication failure. Schema-pinned citations cannot be invented. The model is rendering, not authoring.

Decision: A claim has no credible source. What happens?
Right answer: Emit {verified: false, notes: 'no credible source'}. Synthesis acknowledges the gap in prose
Wrong answer: Drop the claim silently from the final report
Why: Silent drops produce reports that look complete when they aren't. Acknowledged gaps are honest and let the reader judge confidence.

Decision: Should the schema include the verbatim source_passage?
Right answer: Yes. The exact quoted text that backs the claim
Wrong answer: Just the URL. The passage can be re-fetched at render time
Why: Re-fetching introduces a new failure mode (URL went 404, page changed). Pinning the passage at verification time freezes provenance. The record is self-contained.
04 · Failure modes

Where it breaks

5 failure pairs. Each one maps to an exam pattern. The fix is always architectural, never a prose plea to the model.

Free-form citations

Synthesis writes prose with (Pew, 2024) style citations from memory. Half the year tags are wrong; one URL doesn't resolve.

✅ Fix

Force synthesis to render [1], [2] indexes that resolve through the verified-claims JSON. The model is a renderer, not a citation author.

Conflict-flattening

Verification sees 45% Pew and 12% McKinsey, writes ~30% (averaged). The averaged number doesn't exist anywhere; the report is misinformation.

✅ Fix

Emit both in sources_reconciled with attribution + notes explaining timeframe/definition/population differences. Synthesis presents both with context.

Silent unverified-claim drop

Verification can't confirm a claim, drops it. Final report reads as if no such question was ever asked.

✅ Fix

Emit {verified: false, notes: 'unverified'} and instruct synthesis to acknowledge the gap. Transparency beats false completeness.

Passage-by-URL only

Schema stores source_url but not source_passage. A week later the URL returns 404 or the page is rewritten. Report claims become unverifiable.

✅ Fix

Pin the verbatim source_passage at verification time. The record is self-contained even if the source URL drifts.

Schema not enforced

Verification emits malformed JSON; synthesis parses what it can and improvises the rest. Fabrication slips back in via the missing fields.

✅ Fix

Validate verification output with Pydantic / Zod. Malformed output fails the step and triggers a retry. Synthesis never sees an incomplete record.
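A sketch of that enforcement loop, reusing the VerificationOutput model sketched in section 02. The retry count and the subagent-call function are hypothetical.

```python
from pydantic import ValidationError


def run_verification_step(call_verification_subagent, max_retries: int = 2):
    """Validate the verification subagent's JSON; retry on malformed output.

    `call_verification_subagent` is a hypothetical callable that invokes the
    subagent and returns its raw JSON string. Synthesis only ever receives a
    fully validated VerificationOutput, never a partial record.
    """
    last_error = None
    for _attempt in range(max_retries + 1):
        raw = call_verification_subagent()
        try:
            return VerificationOutput.model_validate_json(raw)
        except ValidationError as err:
            # Optionally feed `err` back into the retry prompt so the subagent
            # can correct the specific malformed fields.
            last_error = err
    raise last_error
```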

05 · Exam patterns

Exam patterns

5 V2 questions wired to this deep dive. Only the question stems appear below; the full items pair each stem with all 4 options, the rationale, the mental model under test, and the priority order across distractors.

A web-search subagent times out and returns an empty result list. The coordinator treats this as 'no information available' and moves forward. The final report is incomplete. What is the architectural fix?
A research report cites two conflicting statistics: '45% of creative workers use AI' (Pew) and '12% use AI daily' (McKinsey). Should synthesis pick the more likely one?
A synthesis subagent needs to verify ~100 facts in a final report. Calling verify_fact sequentially takes 60+ seconds. What is the architectural fix?
Research system decomposes 'impact of AI on creative industries' into visual arts, music, and writing. Web-search subagent finds excellent results for all three. Synthesis covers only visual arts. Where is the bug?
Subagent A (academic papers) finds a key direction. Subagent B (web search) needs that direction. A junior engineer suggests A call B directly to skip a coordinator round trip. Why is that wrong?
06 · Concepts in play

Concepts wired

4 primitives compose this sub-pattern. Each card links to the concept page where the primitive is taught in isolation.

07 · Sibling deep dives

Continue the parent

2 more sub-patterns under Multi-Agent Research System. Each one drills into a different load-bearing decision.

08 · FAQ

Frequently asked

Why pin the passage instead of just the URL?
URLs drift. Pages get rewritten, papers retracted, sites moved. Pinning the verbatim source_passage at verification time freezes provenance in the record itself. The record is self-contained even if the source URL later breaks.
How does the schema prevent fabrication if the model is still generating prose?
The model generates prose freely, but citation rendering is mechanical: [1] resolves through the verified-claims JSON to a record with source_url and source_passage. If a record doesn't exist, there's no [N] to render. The model can't invent attributions because it's rendering an array, not authoring citations.
What confidence score threshold should trigger a gap acknowledgement?
Domain-dependent, but a reasonable default: confidence below 0.6 triggers a notes field that synthesis surfaces; below 0.3 triggers an explicit gap acknowledgement; at or above 0.6, present the claim normally. Tune by calibrating against human-graded reports.
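As a sketch, those defaults might look like this; the thresholds are the suggested starting points above, not fixed values.

```python
def presentation_mode(confidence: float) -> str:
    """Map a confidence score to how synthesis should present the claim.

    Thresholds are illustrative defaults; calibrate against human-graded reports.
    """
    if confidence < 0.3:
        return "explicit_gap"      # acknowledge the gap in prose
    if confidence < 0.6:
        return "surface_notes"     # present the claim alongside its notes field
    return "present_normally"
```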

Structured Claim-Source Mapping, complete.

You've covered the full eight-section breakdown for this sub-pattern: the pattern, mechanics, configuration decisions, failure modes, exam patterns, concepts in play, sibling deep dives, and FAQ. One sub-pattern down on the path to CCA-F.
