# Multi-Agent Research System

> A hub-and-spoke research system. The coordinator owns task decomposition (semantic, not lexical), spawns 3-5 research subagents in parallel with isolated contexts and scoped tools, routes findings through a verification subagent that preserves contradictions with attribution (45% Pew vs 12% McKinsey, both kept), and hands verified claims to a read-only synthesis subagent that emits cited Markdown. Subagents NEVER talk to each other directly. All communication routes through the coordinator. Timeouts return structured error context, not silence. The single most-tested distractor: blaming the subagent for narrow coverage when the coordinator's decomposition was the bug.

**Sub-marker:** P3.3
**Domains:** D1 · Agentic Architectures, D2 · Tool Design + Integration
**Exam weight:** 45% of CCA-F (D1 + D2)
**Build time:** 26 minutes
**Source:** 🟢 Official Anthropic guide scenario · in published exam guide and practice exam
**Canonical:** https://claudearchitectcertification.com/scenarios/multi-agent-research-system
**Last reviewed:** 2026-05-04

## In plain English

Think of this as how you ask one question and get back a properly cited briefing from many sources at once. A coordinator splits the question into the obvious sub-questions (visual arts, music, writing, film, performing arts. Not just the first one that comes to mind), spawns a small team of researchers in parallel, each works alone in their lane, then a separate fact-checker reconciles anything they disagree on, and a final writer turns the verified findings into a single readable report with citations. The whole point is that one big agent thinking by itself misses things; a small team with the right division of labour does not.

## Exam impact

Domain 1 (Agentic Architecture, 27%) tests coordinator vs subagent responsibilities, hub-and-spoke topology, and where decomposition actually lives. Domain 2 (Tool Design, 18%) tests scoped tool access per subagent, structured error context, and verify_fact scoping. This scenario is in the published guide AND the practice exam. Questions you drill here match the live exam closely. The 'who's at fault when coverage is narrow' question is the canonical exam distractor.

## The problem

### What the customer needs
- Complete coverage of the research topic. Every relevant sub-domain enumerated, none silently dropped.
- Reconciled contradictions preserved with attribution, not flattened into one 'most likely' number.
- Cited final report that traces every claim to a verifiable source and acknowledges data gaps.

### Why naive approaches fail
- Coordinator decomposes 'creative industries' into only visual arts. Misses music, writing, film, performing arts.
- Web-search subagent times out and returns empty results as success. Coordinator treats as 'no info' instead of 'needs retry'.
- Synthesis picks the 'more likely' statistic between 45% (Pew) and 12% (McKinsey). Drops the conflict, ships misinformation.

### Definition of done
- Topic-coverage gap rate = 0 (decomposition reviewed before spawn)
- Timeout-as-empty-results rate = 0 (structured error context required)
- Contradiction-preservation rate = 100% (both stats + sources retained)
- Subagent-to-subagent direct call rate = 0 (all routes through coordinator)

## Concepts in play

- 🟢 **Subagents** (`subagents`), Isolated parallel research workers
- 🟢 **Agentic loops** (`agentic-loops`), Coordinator + subagent loops
- 🟢 **Tool calling** (`tool-calling`), Scoped tool whitelist per subagent
- 🟢 **tool_choice** (`tool-choice`), auto on research, forced on verification
- 🟢 **stop_reason** (`stop-reason`), Coordinator + subagent termination
- 🟢 **Structured outputs** (`structured-outputs`), Findings + verifications JSON shape
- 🟢 **Evaluation** (`evaluation`), Verification subagent as fact-check gate
- 🟢 **Context window** (`context-window`), Subagent isolation prevents bloat

## Components

### Coordinator Agent, the hub of hub-and-spoke

Receives the user query, performs SEMANTIC decomposition (not lexical) into all relevant sub-domains, spawns research subagents in parallel with explicit task prompts, awaits all results, routes findings to verification, hands verified claims to synthesis. Owns every cross-subagent communication path.

**Configuration:** Decomposition is the load-bearing step. For 'impact of AI on creative industries' the coordinator must enumerate visual + music + writing + film + performing arts, not stop at the first sub-domain. Spawn pattern: asyncio.gather (Python) / Promise.all (TS).
**Concept:** `subagents`

### Research Subagent (parallel), scoped tools, isolated context

One subagent per sub-domain. Receives an explicit task prompt. No inherited history, no parent context. Runs research with a narrow tool whitelist (Read, WebSearch, Bash). Returns structured findings JSON: {claim, sources: [{url, date, confidence}]}. Never editorialises; reports facts as stated.

**Configuration:** system: "You are a research specialist. Find authoritative sources. Return JSON {findings: [{claim, sources: [...]}]}." tools: [Read, WebSearch, Bash]. messages: [{role: "user", content: task_from_coordinator}].
**Concept:** `tool-calling`

### Verification Subagent, fact-check + reconcile contradictions

Cross-checks all claims from research subagents. When two sources conflict (45% Pew vs 12% McKinsey), preserves both with their context and attribution rather than picking the 'more likely' one. Returns verified claims with confidence scores; the verification step is what protects the report from misinformation.

**Configuration:** Input: pooled claims from all research subagents. Output: {verifications: [{claim, verified, confidence, sources_reconciled, notes}]}. Notes field captures the context that explains apparent contradictions (different timeframes, definitions, populations).
**Concept:** `evaluation`

### Synthesis Subagent, read-only narrative generator

Receives verified claims + the coordinator's narrative prompt. Writes a cohesive Markdown report with inline citations [1], [2]. CRITICAL: tools restricted to Read only. No WebSearch, no Bash. This prevents re-research and keeps synthesis focused on stitching the verified facts into a story.

**Configuration:** system: "You are a synthesis specialist. Read verified findings and write a cited narrative. Do NOT research." tools: [Read]. Input: {verified_claims, narrative_prompt}. Output: Markdown with [n] citations.
**Concept:** `context-window`

### Error Propagation Layer, structured timeout context

When a subagent times out or hits a dead end, returns structured error context the coordinator can act on: {status: 'timeout', query, partial_results, alternatives}. Coordinator inspects status_code and either retries with a narrower scope, accepts partial data, or transparently marks the gap in the final report.

**Configuration:** On timeout: {status: 'timeout', query, partial, alternatives: ['narrower query', 'different keywords', ...]}. On no_results: {status: 'no_results', query, alternatives}. Never return [] as success. Silence loses the failure context.
**Concept:** `structured-outputs`

## Build steps

### 1. Build the coordinator's semantic decomposition

The decomposition step is where most coverage failures actually originate. Analyse the topic semantically and enumerate ALL relevant sub-domains before spawning anything. For 'creative industries', that means visual arts AND music AND writing AND film AND performing arts. Not the first one that comes to mind. The decomposition is the coordinator's load-bearing responsibility.

**Python:**

```python
from typing import List

def decompose_query(query: str) -> List[str]:
    """Semantic decomposition. Enumerate all relevant sub-domains.

    The exam-question distractor is to blame subagents for narrow
    coverage when the coordinator's decomposition was the bug.
    """
    q = query.lower()
    if "creative industries" in q:
        domains = [
            "visual arts (digital art, graphic design, photography)",
            "music production and composition",
            "writing (novels, journalism, screenwriting)",
            "film and video production",
            "performing arts (theater, dance)",
        ]
    elif "healthcare" in q:
        domains = [
            "clinical decision support",
            "medical imaging and diagnostics",
            "drug discovery and trials",
            "patient-facing communication",
            "administrative + revenue cycle",
        ]
    else:
        # Generic fallback. STILL decompose, never single-shot
        domains = [
            f"{query}. Recent academic literature",
            f"{query}. Industry case studies",
            f"{query}. Empirical adoption data",
        ]
    return [f"Find AI impact on {d}" for d in domains]
```

**TypeScript:**

```typescript
function decomposeQuery(query: string): string[] {
  // Semantic decomposition. Enumerate all relevant sub-domains.
  // The exam-question distractor is to blame subagents for narrow
  // coverage when the coordinator's decomposition was the bug.
  const q = query.toLowerCase();
  let domains: string[];
  if (q.includes("creative industries")) {
    domains = [
      "visual arts (digital art, graphic design, photography)",
      "music production and composition",
      "writing (novels, journalism, screenwriting)",
      "film and video production",
      "performing arts (theater, dance)",
    ];
  } else if (q.includes("healthcare")) {
    domains = [
      "clinical decision support",
      "medical imaging and diagnostics",
      "drug discovery and trials",
      "patient-facing communication",
      "administrative + revenue cycle",
    ];
  } else {
    // Generic fallback. STILL decompose, never single-shot
    domains = [
      `${query}. Recent academic literature`,
      `${query}. Industry case studies`,
      `${query}. Empirical adoption data`,
    ];
  }
  return domains.map((d) => `Find AI impact on ${d}`);
}
```

Concept: `subagents`

### 2. Define subagent system prompts and tool whitelists

Every subagent gets its own system prompt + scoped tool list. Research subagents get [Read, WebSearch, Bash]; verification gets [Read, WebSearch, Bash] + a fact-check rubric; synthesis gets [Read] only. That read-only restriction is the architectural detail that prevents synthesis from re-researching mid-narrative.

**Python:**

```python
RESEARCH_SUBAGENT_SYSTEM = """You are a research specialist.
1. Find authoritative sources on the assigned sub-domain.
2. Extract key claims with evidence.
3. Return structured JSON ONLY:
   {"findings": [
       {"claim": "...", "sources": [{"url": "...", "date": "YYYY-MM-DD", "confidence": 0.0-1.0}]}
   ]}
Do NOT synthesize, editorialize, or pick winners between conflicting sources.
Report facts as stated. Preserve contradictions for verification."""

VERIFICATION_SUBAGENT_SYSTEM = """You are a fact-checker.
1. Read pooled claims from research subagents.
2. Cross-check each claim against its sources.
3. When sources conflict, PRESERVE BOTH with attribution + context.
4. Return JSON:
   {"verifications": [
       {"claim": "...", "verified": true|false, "confidence": 0.0-1.0,
        "sources_reconciled": [...], "notes": "context about conflicts"}
   ]}"""

SYNTHESIS_SUBAGENT_SYSTEM = """You are a synthesis specialist.
1. Read verified findings (JSON).
2. Write a cohesive Markdown narrative with inline [n] citations.
3. Acknowledge data gaps transparently when verifications flag them.
4. Do NOT conduct new research. You have ONLY the Read tool."""
```

**TypeScript:**

```typescript
const RESEARCH_SUBAGENT_SYSTEM = `You are a research specialist.
1. Find authoritative sources on the assigned sub-domain.
2. Extract key claims with evidence.
3. Return structured JSON ONLY:
   {"findings": [
       {"claim": "...", "sources": [{"url": "...", "date": "YYYY-MM-DD", "confidence": 0.0-1.0}]}
   ]}
Do NOT synthesize, editorialize, or pick winners between conflicting sources.
Report facts as stated. Preserve contradictions for verification.`;

const VERIFICATION_SUBAGENT_SYSTEM = `You are a fact-checker.
1. Read pooled claims from research subagents.
2. Cross-check each claim against its sources.
3. When sources conflict, PRESERVE BOTH with attribution + context.
4. Return JSON:
   {"verifications": [
       {"claim": "...", "verified": true|false, "confidence": 0.0-1.0,
        "sources_reconciled": [...], "notes": "context about conflicts"}
   ]}`;

const SYNTHESIS_SUBAGENT_SYSTEM = `You are a synthesis specialist.
1. Read verified findings (JSON).
2. Write a cohesive Markdown narrative with inline [n] citations.
3. Acknowledge data gaps transparently when verifications flag them.
4. Do NOT conduct new research. You have ONLY the Read tool.`;
```

Concept: `tool-calling`

### 3. Spawn research subagents in parallel

All research subagents fire at once via async fan-out. Latency is max(subagents), not sum. The whole point of the architecture. Each subagent receives an explicit task prompt with the context it needs; nothing is inherited from the coordinator's history. Cost: N separate API calls. Worth it.

**Python:**

```python
import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic()

async def spawn_research(task: str) -> dict:
    """One isolated research subagent. No inherited history."""
    resp = await client.messages.create(
        model="claude-sonnet-4.5",
        max_tokens=2048,
        system=RESEARCH_SUBAGENT_SYSTEM,
        tools=RESEARCH_TOOLS,  # [Read, WebSearch, Bash]
        messages=[{"role": "user", "content": task}],
    )
    return {
        "task": task,
        "stop_reason": resp.stop_reason,
        "result": extract_json(resp),
    }

async def research_in_parallel(tasks: list[str]) -> list[dict]:
    """Fan out N subagents at once. Latency = max, not sum."""
    return await asyncio.gather(*(spawn_research(t) for t in tasks))

# Coordinator usage
tasks = decompose_query("impact of AI on creative industries")
results = asyncio.run(research_in_parallel(tasks))
```

**TypeScript:**

```typescript
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();

async function spawnResearch(task: string) {
  // One isolated research subagent. No inherited history.
  const resp = await client.messages.create({
    model: "claude-sonnet-4.5",
    max_tokens: 2048,
    system: RESEARCH_SUBAGENT_SYSTEM,
    tools: RESEARCH_TOOLS, // [Read, WebSearch, Bash]
    messages: [{ role: "user", content: task }],
  });
  return {
    task,
    stop_reason: resp.stop_reason,
    result: extractJson(resp),
  };
}

async function researchInParallel(tasks: string[]) {
  // Fan out N subagents at once. Latency = max, not sum.
  return Promise.all(tasks.map((t) => spawnResearch(t)));
}

// Coordinator usage
const tasks = decomposeQuery("impact of AI on creative industries");
const results = await researchInParallel(tasks);
```

Concept: `agentic-loops`

### 4. Return structured error context, never silence

When a subagent times out or hits no results, the WORST thing it can do is return []. The coordinator then can't tell whether 'no info exists' or 'we never got the data'. A critical distinction for the final report. Always return a structured error: status + query + partial_results + alternatives. The coordinator inspects status_code and decides: retry, narrow, or transparently mark the gap.

**Python:**

```python
def handle_subagent_error(error_type: str, query: str, partial: list | None = None) -> dict:
    """Structured error context. Coordinator consumes status_code to decide."""
    if error_type == "timeout":
        return {
            "status": "timeout",
            "query": query,
            "partial_results": partial or [],
            "alternatives": [
                "Narrow the query to a single sub-aspect",
                "Try synonyms or domain-specific terms",
                "Reduce the time horizon (e.g., last 12 months)",
            ],
        }
    if error_type == "no_results":
        return {
            "status": "no_results",
            "query": query,
            "alternatives": ["Broaden keywords", "Check spelling", "Try sister terms"],
        }
    if error_type == "rate_limited":
        return {"status": "rate_limited", "query": query, "retry_after_s": 60}
    return {"status": "unknown_error", "query": query}

# Coordinator inspects status_code
def coordinator_handle(result: dict, original_task: str):
    status = result.get("status")
    if status == "timeout":
        # Either retry with the first alternative or include a transparent gap
        return retry_with_narrower_query(result["alternatives"][0])
    if status == "no_results":
        return mark_data_unavailable(original_task)
    return result  # success
```

**TypeScript:**

```typescript
function handleSubagentError(
  errorType: string,
  query: string,
  partial?: unknown[],
): Record<string, unknown> {
  // Structured error context. Coordinator consumes status_code to decide.
  if (errorType === "timeout") {
    return {
      status: "timeout",
      query,
      partial_results: partial ?? [],
      alternatives: [
        "Narrow the query to a single sub-aspect",
        "Try synonyms or domain-specific terms",
        "Reduce the time horizon (e.g., last 12 months)",
      ],
    };
  }
  if (errorType === "no_results") {
    return {
      status: "no_results",
      query,
      alternatives: ["Broaden keywords", "Check spelling", "Try sister terms"],
    };
  }
  if (errorType === "rate_limited") {
    return { status: "rate_limited", query, retry_after_s: 60 };
  }
  return { status: "unknown_error", query };
}

// Coordinator inspects status_code
function coordinatorHandle(
  result: Record<string, unknown>,
  originalTask: string,
) {
  const status = result.status;
  if (status === "timeout") {
    const alts = result.alternatives as string[];
    return retryWithNarrowerQuery(alts[0]);
  }
  if (status === "no_results") {
    return markDataUnavailable(originalTask);
  }
  return result; // success
}
```

Concept: `structured-outputs`

### 5. Run the verification subagent and preserve contradictions

Pool all claims from research subagents and pass them to a single verification subagent. When two sources conflict (45% Pew vs 12% McKinsey), the verification subagent's job is NOT to pick a winner. It is to preserve both with their context (different definitions, different timeframes, different populations) and attribute each to its source. Picking one is misinformation; preserving both is journalism.

**Python:**

```python
def build_verification_task(pooled_claims: list[dict]) -> str:
    """Pool all research-subagent claims for fact-checking."""
    body = "\n".join(
        f"- Claim: {c['claim']}\n  Sources: {c['sources']}"
        for c in pooled_claims
    )
    return f"""Verify the following claims pooled from research subagents.

For each claim:
1. Check that sources are credible and dated.
2. If two or more sources CONFLICT, do NOT pick a winner.
   Preserve both with their context (timeframe, definition, population)
   and emit both in sources_reconciled with notes explaining the apparent
   contradiction. The reader gets BOTH numbers + WHY they differ.
3. Return JSON:
   {{"verifications": [
       {{"claim": "...", "verified": true|false, "confidence": 0.0-1.0,
         "sources_reconciled": [...], "notes": "..." }}
   ]}}

CLAIMS TO VERIFY:
{body}"""

# Coordinator pools claims and dispatches
all_claims = [c for r in research_results for c in r.get("findings", [])]
verification_task = build_verification_task(all_claims)
verified = client.messages.create(
    model="claude-sonnet-4.5",
    max_tokens=4096,
    system=VERIFICATION_SUBAGENT_SYSTEM,
    tools=VERIFICATION_TOOLS,
    messages=[{"role": "user", "content": verification_task}],
)
```

**TypeScript:**

```typescript
function buildVerificationTask(
  pooledClaims: Array<{ claim: string; sources: unknown[] }>,
): string {
  // Pool all research-subagent claims for fact-checking.
  const body = pooledClaims
    .map(
      (c) => `- Claim: ${c.claim}\n  Sources: ${JSON.stringify(c.sources)}`,
    )
    .join("\n");
  return `Verify the following claims pooled from research subagents.

For each claim:
1. Check that sources are credible and dated.
2. If two or more sources CONFLICT, do NOT pick a winner.
   Preserve both with their context (timeframe, definition, population)
   and emit both in sources_reconciled with notes explaining the apparent
   contradiction. The reader gets BOTH numbers + WHY they differ.
3. Return JSON:
   {"verifications": [
       {"claim": "...", "verified": true|false, "confidence": 0.0-1.0,
        "sources_reconciled": [...], "notes": "..." }
   ]}

CLAIMS TO VERIFY:
${body}`;
}

// Coordinator pools claims and dispatches
const allClaims = researchResults.flatMap(
  (r) => (r.findings as Array<{ claim: string; sources: unknown[] }>) ?? [],
);
const verificationTask = buildVerificationTask(allClaims);
const verified = await client.messages.create({
  model: "claude-sonnet-4.5",
  max_tokens: 4096,
  system: VERIFICATION_SUBAGENT_SYSTEM,
  tools: VERIFICATION_TOOLS,
  messages: [{ role: "user", content: verificationTask }],
});
```

Concept: `evaluation`

### 6. Run the synthesis subagent with READ-ONLY tools

Synthesis is the final step. It receives verified claims + the coordinator's narrative prompt, and emits Markdown with inline citations. The crucial detail: the synthesis subagent's tool list is [Read] only. No WebSearch, no Bash. That restriction prevents it from re-researching mid-narrative (a common failure mode where synthesis fact-checks itself again and inflates latency 3-5×).

**Python:**

```python
def synthesize(verified_claims: list[dict], narrative_prompt: str) -> str:
    """Read-only narrative generation. No re-research."""
    task = f"""Narrative prompt from coordinator:
{narrative_prompt}

Verified findings (JSON):
{json.dumps(verified_claims, indent=2)}

Write a Markdown report that:
1. Flows logically from finding to finding
2. Uses inline citations [1], [2], etc. matching sources_reconciled order
3. ACKNOWLEDGES data gaps transparently where verifications.notes flagged them
4. Does NOT re-research or speculate beyond the verified findings

You have ONLY the Read tool. No WebSearch, no Bash."""

    resp = client.messages.create(
        model="claude-sonnet-4.5",
        max_tokens=3000,
        system=SYNTHESIS_SUBAGENT_SYSTEM,
        tools=[READ_TOOL_ONLY],  # Critical: read-only
        messages=[{"role": "user", "content": task}],
    )
    return extract_text(resp)
```

**TypeScript:**

```typescript
async function synthesize(
  verifiedClaims: unknown[],
  narrativePrompt: string,
): Promise<string> {
  // Read-only narrative generation. No re-research.
  const task = `Narrative prompt from coordinator:
${narrativePrompt}

Verified findings (JSON):
${JSON.stringify(verifiedClaims, null, 2)}

Write a Markdown report that:
1. Flows logically from finding to finding
2. Uses inline citations [1], [2], etc. matching sources_reconciled order
3. ACKNOWLEDGES data gaps transparently where verifications.notes flagged them
4. Does NOT re-research or speculate beyond the verified findings

You have ONLY the Read tool. No WebSearch, no Bash.`;

  const resp = await client.messages.create({
    model: "claude-sonnet-4.5",
    max_tokens: 3000,
    system: SYNTHESIS_SUBAGENT_SYSTEM,
    tools: [READ_TOOL_ONLY], // Critical: read-only
    messages: [{ role: "user", content: task }],
  });
  return extractText(resp);
}
```

Concept: `context-window`

### 7. Route ALL communication through the coordinator

If subagent B needs a finding from subagent A, the answer is NOT to call A from B. The answer is: A finishes, returns to coordinator, coordinator passes the finding into B's task prompt. This single rule preserves isolation (each subagent has clean context), parallelism (when dependencies allow), and visibility (the coordinator owns the whole orchestration graph).

**Python:**

```python
# WRONG. Direct subagent-to-subagent communication
# class ResearcherA:
#     def __init__(self, researcher_b):
#         self.b = researcher_b  # ❌ creates hidden coupling
#     async def find_papers(self, topic):
#         finding = await self._search(topic)
#         await self.b.search_web(finding)  # ❌ breaks isolation

# RIGHT. Coordinator owns all routing
async def coordinated_dependent_research(topic: str):
    # Phase 1: A runs alone
    a_finding = await spawn_research(f"Find academic papers on {topic}")

    # Phase 2: coordinator passes A's finding into B's TASK PROMPT
    b_task = (
        f"Topic: {topic}. "
        f"Key research direction surfaced by paper search: "
        f"{a_finding['result']['top_direction']}. "
        f"Now search the web for industry coverage of that direction."
    )
    b_result = await spawn_research(b_task)

    # Phase 3: coordinator continues with both
    return {"a": a_finding, "b": b_result}
```

**TypeScript:**

```typescript
// WRONG. Direct subagent-to-subagent communication
// class ResearcherA {
//   constructor(private b: ResearcherB) {} // ❌ creates hidden coupling
//   async findPapers(topic: string) {
//     const finding = await this.search(topic);
//     await this.b.searchWeb(finding); // ❌ breaks isolation
//   }
// }

// RIGHT. Coordinator owns all routing
async function coordinatedDependentResearch(topic: string) {
  // Phase 1: A runs alone
  const aFinding = await spawnResearch(`Find academic papers on ${topic}`);

  // Phase 2: coordinator passes A's finding into B's TASK PROMPT
  const bTask =
    `Topic: ${topic}. Key research direction surfaced by paper search: ` +
    `${(aFinding.result as { top_direction: string }).top_direction}. ` +
    `Now search the web for industry coverage of that direction.`;
  const bResult = await spawnResearch(bTask);

  // Phase 3: coordinator continues with both
  return { a: aFinding, b: bResult };
}
```

Concept: `subagents`

### 8. Cap parallelism and add retry budgets

Parallel fan-out has diminishing returns past 5-7 subagents. API concurrency limits, context-window contention on the coordinator side, and rate-limit backpressure all kick in. Cap concurrency, set a retry budget per subagent (typically 2 retries with narrowed queries), and emit telemetry: spawn count, parallel max, retry rate, partial-data rate. These metrics are how you tune the system in production.

**Python:**

```python
import asyncio

MAX_PARALLEL = 5
RETRY_BUDGET = 2

semaphore = asyncio.Semaphore(MAX_PARALLEL)

async def spawn_with_retry(task: str, attempt: int = 0) -> dict:
    """Bounded-concurrency research with structured-error retry."""
    async with semaphore:
        result = await spawn_research(task)
        status = result.get("result", {}).get("status")
        if status == "timeout" and attempt < RETRY_BUDGET:
            narrower = narrow_query(result["result"].get("alternatives", [task])[0])
            telemetry.increment("subagent.retry", labels={"reason": "timeout"})
            return await spawn_with_retry(narrower, attempt + 1)
        return result

# Coordinator with bounded fan-out + retries
results = await asyncio.gather(*(spawn_with_retry(t) for t in tasks))
```

**TypeScript:**

```typescript
const MAX_PARALLEL = 5;
const RETRY_BUDGET = 2;

// Lightweight semaphore for bounded concurrency
class Semaphore {
  private q: Array<() => void> = [];
  constructor(private avail: number) {}
  async acquire(): Promise<void> {
    if (this.avail > 0) {
      this.avail--;
      return;
    }
    return new Promise((res) => this.q.push(res));
  }
  release(): void {
    const next = this.q.shift();
    if (next) next();
    else this.avail++;
  }
}

const sem = new Semaphore(MAX_PARALLEL);

async function spawnWithRetry(task: string, attempt = 0): Promise<unknown> {
  await sem.acquire();
  try {
    const result = await spawnResearch(task);
    const status = (result.result as { status?: string })?.status;
    if (status === "timeout" && attempt < RETRY_BUDGET) {
      const alts = (result.result as { alternatives?: string[] })?.alternatives ?? [task];
      telemetry.increment("subagent.retry", { reason: "timeout" });
      sem.release();
      return spawnWithRetry(narrowQuery(alts[0]), attempt + 1);
    }
    return result;
  } finally {
    sem.release();
  }
}

const results = await Promise.all(tasks.map((t) => spawnWithRetry(t)));
```

Concept: `evaluation`

## Decision matrix

| Decision | Right answer | Wrong answer | Why |
|---|---|---|---|
| Coverage gap appears in the final report | Audit the coordinator's decomposition first. It almost always lives there | Tune subagent prompts or upgrade their model | Decomposition is the coordinator's job. If 4 of 5 sub-domains were never enumerated, no amount of subagent quality recovers them. Fix decomposition; the rest follows. |
| Subagent timed out. What does it return? | Structured error: {status: 'timeout', query, partial_results, alternatives} | Empty list [], marked as success | Silence loses the failure context. The coordinator can't distinguish 'no info exists' from 'we never got the data', so the final report can't acknowledge the gap honestly. |
| Two sources disagree (45% Pew vs 12% McKinsey) | Verification preserves both with attribution + context (different definitions, different timeframes) | Synthesis picks the 'more likely' one and drops the other | Both numbers are correct under their respective definitions. Dropping one is misinformation. Preserving both with context is the journalistic move and the architectural one. |
| Subagent A's output is needed by Subagent B | A returns to coordinator; coordinator passes A's finding into B's task prompt | A calls B directly with the finding | Direct subagent-to-subagent calls break isolation, kill parallelism (B waits for A), and hide the dependency from the coordinator's view. Hub-and-spoke is the architecture for a reason. |

## Failure modes

| Anti-pattern | Failure | Fix |
|---|---|---|
| AP-10 · Narrow task decomposition | Coordinator decomposes 'creative industries' into visual arts only; report misses music, writing, film, performing arts. Subagents finished successfully. The bug is upstream. | Fix the coordinator's semantic decomposition. Enumerate ALL relevant sub-domains before spawning. The decomposition step is the coordinator's load-bearing responsibility. |
| AP-11 · Silent timeout returns empty as success | Web-search subagent times out, returns []. Coordinator treats as 'no information exists' instead of 'timed out'. Final report has a silent gap. | Return structured error: {status: 'timeout', query, partial_results, alternatives}. Coordinator inspects status_code, retries with a narrower scope, or marks the gap transparently in the report. |
| AP-12 · Latency bloat from over-broad verification | Synthesis subagent calls verify_fact for 100 claims sequentially. 80 are simple (Wikipedia), 20 complex. Total 60+ seconds for what should be 10. | Scope verify_fact narrowly: simple-claim batch verification (parallel, ~3s) + dedicated complex-verification subagent (parallel, pre-synthesis). Synthesis assumes facts are pre-verified. |
| AP-13 · Dropped contradictions | Two sources conflict (45% any-use Pew vs 12% daily-use McKinsey). Synthesis picks the 'more likely' one; the other is dropped. Report is misinformation. | Preserve both at the verification step with sources_reconciled + notes explaining the apparent conflict. Synthesis presents both with attribution. Reader sees both numbers and why they differ. |
| AP-14 · Direct subagent-to-subagent communication | Researcher A (papers) directly hands a finding to Researcher B (web). Isolation breaks; parallelism degrades to sequential; coordinator loses visibility. | Route everything through the coordinator. A returns to coordinator; coordinator constructs B's task prompt with A's finding embedded. Hub-and-spoke is non-negotiable. |

## Implementation checklist

- [ ] Coordinator's decomposition function reviewed for completeness BEFORE first spawn (`subagents`)
- [ ] Each subagent has its own system prompt + scoped tool whitelist (`tool-calling`)
- [ ] Synthesis subagent has Read-only tool list (no WebSearch, no Bash) (`context-window`)
- [ ] All subagent fan-out via async gather / Promise.all (`agentic-loops`)
- [ ] Structured error context on every timeout / no-results path (`structured-outputs`)
- [ ] Verification subagent preserves contradictions with attribution (`evaluation`)
- [ ] No direct subagent-to-subagent calls anywhere in the codebase (`subagents`)
- [ ] Coordinator inspects status_code on every subagent return
- [ ] Bounded concurrency (semaphore) to cap parallel API calls
- [ ] Per-subagent retry budget with narrowed-query alternatives
- [ ] Telemetry: spawn count, parallel max, retry rate, partial-data rate

## Cost &amp; latency

- **Research subagents (3-5 parallel):** ~$0.06-0.15 per query, 3 subagents × ~20K input + ~2K output ≈ $0.06; 5 subagents ≈ $0.10. Parallel: latency = max(subagents) ≈ 3-5s, cost = sum.
- **Verification subagent:** ~$0.03-0.05 per query, Reads pooled findings (~15K) + cross-checks (~10K) + emits verifications (~1K) ≈ $0.04. Single pass, serial.
- **Synthesis subagent:** ~$0.02-0.03 per query, Reads verified findings (~5K) + generates Markdown narrative (~3K). Read-only tools keep cost low; no re-research.
- **Retry overhead (timeouts):** ~$0.01-0.02 per retry, Narrowed retries (~10K input + 1K output). Cap at 2 retries per subagent to bound cost; beyond that, accept partial data and mark the gap.
- **p95 end-to-end latency:** ~10-14s, Decompose ~0.5s + parallel research ~5s + verification ~3s + synthesis ~3s + coordinator overhead. Subagents in parallel save ~10s vs sequential.

## Domain weights

- **D1 · Agentic Architectures (27%):** Coordinator + research/verification/synthesis subagent loops + hub-and-spoke routing
- **D2 · Tool Design + Integration (18%):** Per-subagent tool whitelist + structured error context + verification rubric

## Practice questions

### Q1. A research system decomposes 'impact of AI on creative industries' into three subtopics: visual arts, music, writing. The web-search subagent finds results for all three. The synthesis subagent produces a report covering only visual arts. Why?

The root cause is the coordinator's decomposition, not the subagents. Subagents finished successfully on what they were assigned; the coordinator only assigned visual-arts tasks. The fix is upstream: analyse the topic semantically and enumerate all relevant sub-domains (visual + music + writing + film + performing arts) before spawning. Decomposition is the coordinator's load-bearing responsibility, and the most-tested distractor on this scenario is to blame the subagents instead.

### Q2. A web-search subagent times out and returns an empty result list. The coordinator treats this as 'no information available' and moves forward. The final report is incomplete. What's the architectural fix?

Subagents must return structured error context on timeout, never silence. The shape: {status: 'timeout', query, partial_results, alternatives: ['narrower query', 'different keywords', ...]}. The coordinator inspects status_code and decides: retry with the first alternative, accept partial data, or transparently mark the gap in the synthesis. Returning [] as success conflates 'no info exists' with 'we never got the data'. And the report can't acknowledge a gap it can't see. Tagged to AP-11.

### Q3. A research report cites two conflicting statistics: '45% of creative workers use AI' (Pew) and '12% use AI daily' (McKinsey). Should synthesis pick the more likely one?

No. Both are correct. They measure different things (any-use vs daily-use) under different methodologies. The verification subagent's job is to preserve both with attribution + context, structured as sources_reconciled: [{stat: '45%', source: 'Pew', context: 'any use'}, {stat: '12%', source: 'McKinsey', context: 'daily use'}] plus a notes field explaining the apparent contradiction. Synthesis then presents both with attribution. Picking one is misinformation; preserving both is journalism. Tagged to AP-13.

### Q4. Subagent A (academic papers) finds a key research direction. Subagent B (web search) needs that finding to guide its queries. Should A pass it directly to B?

No. All cross-subagent communication routes through the coordinator. A returns its finding; the coordinator constructs B's task prompt with the finding embedded: Topic: ... Key direction from paper search: [A's finding]. Now search the web for that direction. Direct calls break isolation (B inherits A's context noise), kill parallelism (B waits for A even when it could run in parallel), and hide the dependency from the coordinator's orchestration graph. Hub-and-spoke is non-negotiable. Tagged to AP-14.

### Q5. A synthesis subagent needs to verify ~100 facts in a final report. Calling verify_fact sequentially takes 60+ seconds. What's the architectural fix?

Two layers. First, scope verify_fact narrowly: simple-claim verification (Wikipedia lookups, ~5ms each) batched in parallel completes ~80 claims in ~2s. Second, dedicate a separate verification subagent that runs before synthesis, processing complex multi-source reconciliation in parallel; synthesis then assumes facts are pre-verified and only reads. Total latency drops from 60s+ to ~3-5s. The general lesson: don't let the synthesis subagent re-research mid-narrative. Tagged to AP-12.

## FAQ

### Q1. Should subagents run in parallel or in sequence?

Parallel whenever possible. Independent research tasks (visual arts, music, writing) all run at once via asyncio.gather / Promise.all. Cost: N separate API calls. Latency: max(N) ≈ 5-8s, not sum. Sequence only when there's a true data dependency (B needs A's output). And even then, the coordinator handles the chaining; subagents never call each other.

### Q2. Can subagents inherit the coordinator's conversation history?

No. Subagents are isolated by design. That's the architectural win. The coordinator passes context explicitly in the subagent's task prompt: User asked: [query]. Key context so far: [pinned facts]. Your task: [focused research goal]. Subagent starts fresh with only what's in the task. Inheriting history defeats parallelism and bloats per-subagent context cost.

### Q3. What happens if multiple subagents return partial / timeout results?

Coordinator collects what came back, invokes synthesis with a narrative-prompt note: Research is incomplete due to timeouts on [X, Y]. The report should acknowledge gaps in those areas explicitly. Transparency beats silence. The reader sees a report that says 'we got these 3 sub-domains; the other 2 timed out' rather than a confidently-misleading report missing 2 whole sub-domains.

### Q4. Should the synthesis subagent have web-search access?

No. Read-only is the architectural detail. Synthesis stitches verified findings into a narrative; it does not re-research. If synthesis needs to verify a fact mid-sentence, that's a sign verification should have been broader upstream. Fix the verification phase, not the synthesis tool list. Read-only also caps the latency and cost of synthesis predictably.

### Q5. How do we handle contradictions surfaced by research subagents?

Don't resolve them at the subagent level. Pass conflicting findings to the verification subagent with sources intact. Verification reconciles: preserves both with attribution + a notes field explaining the conflict (different timeframes, definitions, populations, methodologies). Synthesis then presents both with context. The reader gets transparency; the system avoids fabricating false certainty.

### Q6. Can a subagent spawn another subagent (nested)?

In theory yes; in practice avoid it. Nested subagents increase latency, complicate context flow, and obscure the orchestration graph from the coordinator. Keep the hierarchy shallow: coordinator → leaf subagents. If you need 'meta-research' (one subagent's job is to figure out what to research), have the coordinator do that decomposition step explicitly.

### Q7. What's the maximum number of subagents to run in parallel?

No hard limit, diminishing returns past 5-7. API concurrency limits, rate-limit backpressure, and context-window contention on the coordinator side all start kicking in. Use a bounded semaphore (MAX_PARALLEL = 5), measure latency at different fan-outs, and tune to your workload. For 10+ tasks, consider Batch API or sequential task chains.

## Production readiness

- [ ] Decomposition function unit-tested with 5+ representative topic types
- [ ] Each subagent's tool whitelist verified. No Bash + Edit + Write together unless intended
- [ ] Structured error contract enforced via TypeScript / Pydantic schema
- [ ] Bounded concurrency (semaphore) tested under burst load (20+ queued tasks)
- [ ] Retry budget per subagent capped (default 2); telemetry on retry rate
- [ ] Verification subagent has fact-check rubric + contradiction-preservation rule documented
- [ ] Synthesis subagent's tool list is [Read] only; lint check prevents regression
- [ ] End-to-end test: contradictions survive from research → verification → synthesis with attribution

---

**Source:** https://claudearchitectcertification.com/scenarios/multi-agent-research-system
**Vault sources:** ACP-T05 §Scenario 3 (5 ✅/❌ pairs · official guide scenario); ACP-T08 §3.3 metadata; Course 16 Subagents. Lesson 03 designing effective subagents; ACP-T06 (5 practice Qs tagged to components); GAI-K05 CCA exam questions and scenarios (Scenario 3 walkthrough); COD-K04 Feynman architecture review (multi-agent patterns)
**Last reviewed:** 2026-05-04

**Evidence tiers**, 🟢 official Anthropic doc · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed.
