What it is
Structured claim-source mapping is the contract that says every claim in the final report is traceable to a specific passage in a specific source. The verification subagent emits JSON that pairs each claim with its provenance: claim_id, claim_text, source_url, source_passage (the actual quoted text, not just a URL), publication_date, and confidence. Synthesis reads only this schema and renders inline citations from it. No free-form attribution, no model-generated URLs.
The architectural reason is anti-fabrication. A synthesis subagent that writes prose plus citations from memory will eventually produce a citation that looks plausible but doesn't exist. By forcing synthesis to render [1] from a record where source_passage is already pinned, you remove the model's ability to hallucinate the link. If the passage isn't in the schema, there is no citation to render. Either the record is complete or it gets surfaced as a data gap.
The schema also handles the conflicting-source case explicitly. When two sources disagree (45% Pew vs 12% McKinsey), the verification subagent emits a sources_reconciled array with both records pinned, plus a notes field that explains the apparent conflict (different timeframes, different definitions, different populations). Synthesis presents both with attribution; it does not pick a winner. Picking one is misinformation. Preserving both with context is journalism.
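A hedged sketch of what one reconciled record might look like for the Pew/McKinsey example, in the same JSON notation the schema uses. Every value in angle brackets is a placeholder, not real data, and the confidence number is purely illustrative:

```json
{
  "claim_id": "c-01",
  "claim_text": "Reported adoption varies widely depending on how use is defined",
  "verified": true,
  "confidence": 0.8,
  "sources_reconciled": [
    {
      "stat": "45%",
      "source_url": "<Pew report URL>",
      "source_passage": "<verbatim sentence from the Pew report containing the 45% figure>",
      "publication_date": "<Pew publication date>",
      "context": "any-use definition"
    },
    {
      "stat": "12%",
      "source_url": "<McKinsey report URL>",
      "source_passage": "<verbatim sentence from the McKinsey report containing the 12% figure>",
      "publication_date": "<McKinsey publication date>",
      "context": "daily-use definition"
    }
  ],
  "notes": "Figures differ because the sources use different definitions (any-use vs daily-use)."
}
```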
How it works
The verification subagent receives pooled findings from research subagents and a fact-check rubric. For each candidate claim, it confirms the source is credible and dated, extracts the verbatim source_passage that backs the claim, and assigns a confidence score. The output is JSON only: {verifications: [{claim_id, claim_text, verified, confidence, sources_reconciled: [{stat, source_url, source_passage, publication_date, context}], notes}]}. The schema is enforced via Pydantic in Python or Zod in TypeScript; malformed output fails the verification step and triggers a retry.
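A minimal Pydantic sketch of that schema, assuming Python. The field names mirror the JSON above; the types, optionality, and the 0-to-1 confidence bound are assumptions rather than a fixed spec:

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class SourceRecord(BaseModel):
    stat: str                       # figure as stated by this source, e.g. "45%"
    source_url: str
    source_passage: str             # verbatim quoted text that backs the claim
    publication_date: str           # kept as a string; parse to a date if preferred
    context: Optional[str] = None   # timeframe / definition / population notes


class Verification(BaseModel):
    claim_id: str
    claim_text: str
    verified: bool
    confidence: float = Field(ge=0.0, le=1.0)
    sources_reconciled: List[SourceRecord] = Field(default_factory=list)
    notes: Optional[str] = None


class VerificationOutput(BaseModel):
    verifications: List[Verification]
```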
Synthesis is read-only. It receives the verified-claims JSON, the coordinator's narrative prompt, and a tool list of [Read] only. It walks the verifications array in order, writes prose that flows logically, and emits inline [1], [2] citations that index into sources_reconciled. The render is mechanical. Synthesis is never asked to invent an attribution; if it tries, the verification record is the only thing the citation can point to.
Data gaps are first-class. When verification finds a claim it cannot confirm (no credible source, conflicting data without enough context, or a research-subagent timeout), it emits {verified: false, notes: 'no credible source within window'}. Synthesis is instructed to acknowledge that gap in prose: "Adoption rates among independent musicians remain unverified across our sources." Transparent gaps beat confident fabrication every time. This is the architectural detail that protects the report from looking complete when it isn't.
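A minimal sketch of the mechanical part of that render, assuming the verified-claims JSON has already been parsed into plain dicts. render_verified_claims is a hypothetical helper, not an SDK call; the real synthesis subagent wraps this indexing logic in flowing prose, but the citation indexes and gap acknowledgements come from the same walk:

```python
def render_verified_claims(verifications: list[dict]) -> tuple[list[str], list[str]]:
    """Walk the verifications array in order; cite only pinned records."""
    body, references = [], []
    for v in verifications:
        if not v.get("verified"):
            # Data gap: acknowledge it in prose instead of dropping the claim.
            body.append(f"{v['claim_text']} (unverified: {v.get('notes', 'no notes')}).")
            continue
        markers = []
        for src in v["sources_reconciled"]:
            # Each reference entry is built from the pinned record, never from memory.
            references.append(f"{src['source_url']} ({src['publication_date']}): {src['source_passage']}")
            markers.append(f"[{len(references)}]")
        body.append(f"{v['claim_text']} {''.join(markers)}")
    return body, references
```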
The 4 decisions
Each row pairs the right answer with the most-tested distractor. The Why column explains the failure mode behind the wrong choice.
| Decision | Right answer | Wrong answer | Why |
|---|---|---|---|
| Two sources disagree (45% Pew vs 12% McKinsey) | Preserve both in sources_reconciled with attribution + notes explaining the difference | Pick the higher-confidence source and drop the other | Both numbers are correct under their own definitions (any-use vs daily-use). Dropping one is misinformation. Preserving both with context is the journalistic and architectural move. |
| Synthesis needs a citation for a claim. Where does the URL come from? | From the source_url field in the verification record | Synthesis generates the citation from memory of training data | Model-generated citations are the canonical fabrication failure. Schema-pinned citations cannot be invented. The model is rendering, not authoring. |
| A claim has no credible source. What happens? | Emit {verified: false, notes: 'no credible source'}. Synthesis acknowledges the gap in prose | Drop the claim silently from the final report | Silent drops produce reports that look complete when they aren't. Acknowledged gaps are honest and let the reader judge confidence. |
| Should the schema include the verbatim source_passage? | Yes. The exact quoted text that backs the claim | Just the URL. The passage can be re-fetched at render time | Re-fetching introduces a new failure mode (URL went 404, page changed). Pinning the passage at verification time freezes provenance. The record is self-contained. |
Where it breaks
5 failure pairs. Each one maps to one exam pattern. The fix is always architectural, never a prose plea to the model.
Failure: Synthesis writes prose with (Pew, 2024) style citations from memory. Half the year tags are wrong; one URL doesn't resolve.
Fix: Force synthesis to render [1], [2] indexes that resolve through the verified-claims JSON. The model is a renderer, not a citation author.
Failure: Verification sees 45% Pew and 12% McKinsey, writes ~30% (averaged). The averaged number doesn't exist anywhere; the report is misinformation.
Fix: Emit both in sources_reconciled with attribution + notes explaining timeframe/definition/population differences. Synthesis presents both with context.
Failure: Verification can't confirm a claim, drops it. Final report reads as if no such question was ever asked.
Fix: Emit {verified: false, notes: 'unverified'} and instruct synthesis to acknowledge the gap. Transparency beats false completeness.
Failure: Schema stores source_url but not source_passage. A week later the URL returns 404 or the page is rewritten. Report claims become unverifiable.
Fix: Pin the verbatim source_passage at verification time. The record is self-contained even if the source URL drifts.
Failure: Verification emits malformed JSON; synthesis parses what it can and improvises the rest. Fabrication slips back in via the missing fields.
Fix: Validate verification output with Pydantic / Zod. Malformed output fails the step and triggers a retry. Synthesis never sees an incomplete record (a minimal retry sketch follows below).
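A hedged sketch of that parse-or-retry gate, assuming Python and Pydantic v2. run_verification_subagent is a placeholder for however your stack invokes the subagent; the schema model is passed in so the gate stays generic:

```python
from pydantic import BaseModel, ValidationError

MAX_ATTEMPTS = 3


def validated_output(run_verification_subagent, output_model: type[BaseModel]) -> BaseModel:
    """Parse-or-retry gate: synthesis never sees an incomplete record."""
    for _ in range(MAX_ATTEMPTS):
        raw_json = run_verification_subagent()  # placeholder: returns raw JSON text
        try:
            return output_model.model_validate_json(raw_json)  # pydantic v2 API
        except ValidationError:
            continue  # malformed output fails this attempt; retry the step
    raise RuntimeError("verification output never validated; refusing to synthesize")
```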
Exam patterns
5 V2 questions wired to this deep dive. Each shows all 4 options with rationale, the mental model under test, and the priority order across distractors.
Concepts wired
4 primitives compose this sub-pattern. Each card links to the concept page where the primitive is taught in isolation.
Continue the parent
2 more sub-patterns under Multi-Agent Research System. Each one drills into a different load-bearing decision.
Coordinator Routing
How the coordinator decomposes a research query and dispatches subtasks across the hub-and-spoke topology.
The coordinator owns semantic decomposition (not lexical), enumerates every relevant sub-domain before spawning anything, and dispatches research subagents in parallel via a synchronous-fork-then-join pattern. Coverage gaps live in the decomposition step, not in the subagents.
Subagent allowedTools and Isolation
How the parent agent enforces tool whitelists per subagent and how each subagent runs in a fresh context with no chat-history inheritance.
Every subagent declares an allowedTools list ([Read, WebSearch, Bash] for research, [Read] only for synthesis). The SDK enforces it. Each subagent runs in a fresh isolated context with no inherited messages. Every fact it needs is embedded in the task prompt. Tool overscoping and history inheritance are the canonical failure modes.
Frequently asked
Why pin the passage instead of just the URL?
Pinning the verbatim source_passage at verification time freezes provenance in the record itself. The record is self-contained even if the source URL later breaks.
How does the schema prevent fabrication if the model is still generating prose?
Each inline [1] resolves through the verified-claims JSON to a record with source_url and source_passage. If a record doesn't exist, there's no [N] to render. The model can't invent attributions because it's rendering an array, not authoring citations.
What confidence score threshold should trigger a gap acknowledgement?
Below 0.6 confidence, the verification record carries a notes field that synthesis surfaces; below 0.3, synthesis adds an explicit gap acknowledgement; at 0.6 and above, it presents the claim normally. Tune the thresholds by calibrating against human-graded reports.
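A small sketch of that policy, assuming Python; the 0.3 and 0.6 cut points are the illustrative values above, not fixed constants:

```python
def gap_policy(confidence: float) -> str:
    """Map a verification confidence score to how synthesis presents the claim."""
    if confidence < 0.3:
        return "acknowledge_gap"     # explicit gap sentence in the prose
    if confidence < 0.6:
        return "surface_notes"       # present the claim, surface the notes field
    return "present_normally"        # cite the claim as usual
```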