P3.4 · D1 + D2 · Process

Developer Productivity Agent.

Think of this as the agent that helps a developer rename a function across a 1,000-file codebase without losing their afternoon to it. Instead of one big agent reading every file (which fails because there's too much to remember), the work is split: one tiny helper finds the matching files with grep, another reads only those files, a third reviews the proposed changes from a fresh perspective so it doesn't just rubber-stamp its own work, and a human approves the final commit. The whole point is that productivity tasks are easier when you delegate to specialists rather than asking one agent to do everything alone.

22 min build · 5 components · 8 concepts

A team of specialised subagents that handle codebase exploration, code review, and doc generation, each with a narrow tool whitelist and isolated context. The lead agent uses Grep + Glob first to locate relevant files (never Read everything), delegates review to an independent reviewer subagent to avoid confirmation bias, and gates every destructive action behind the 4D Framework (Delegation → Description → Discernment → Diligence), where a human clicks approve. Distribution beats monoliths: 4-5 tools per subagent across 2-3 specialists routes more accurately than one 15-tool agent.

45% exam weight
Source: Official Anthropic guide scenario · in published exam guide
Stack: Claude Code · Git · MCP language servers (optional)
Needs: Built-in tools · subagents · 4D Framework
Exam: 45% of CCA-F (D1 + D2). 27% D1 · 18% D2. Highest-weight scenario on the test. Master this one and you've covered most of it.
End-to-end flow · 45% of CCA-F (D1 + D2)
01 · Problem framing

The problem

What the customer needs

  1. Find every reference to a function across a 1,000-file repo without reading every file.
  2. Review the agent's proposed changes with a fresh perspective so confirmation bias doesn't pass through.
  3. Generate up-to-date docs that reflect the actual code, not last quarter's spec.

Why naive approaches fail

  1. Monolithic single agent with 15 tools trying to do everything → routing accuracy drops 8% per tool past 5; the agent alternates between similar tools and misses obvious matches.
  2. Read every file first to 'understand the codebase' → context floods at file 80; the agent has lost track of the original task by file 200.
  3. Same session generates AND reviews the code → confirmation bias passes through; the reviewer agrees with the writer because it shares the writer's context.
Definition of done
  • Grep / Glob locates matched files first; Read opens only those files
  • Code review runs in an independent subagent with fresh context
  • Tool count ≤ 5 per subagent; specialists distributed across reader / reviewer / docgen
  • 4D Framework approval gate before any auto-merge or destructive action
  • Bash structured-output flags (--format json, --porcelain) replace fragile regex parsing
02 · Architecture

The system
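
In outline: a lead agent holds the codebase context and coordinates three specialist subagents (Reader: Grep + Glob + Read; Reviewer: Read + Grep + Bash test/lint, fresh context; Doc-Generator: Read + Write + Glob), with optional per-language MCP servers and the 4D approval gate in front of any code-changing action. Component detail follows in section 03.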

03 · Component detail

What each part does

Five components, each owning one concept; the underlying primitive is noted on each.

Built-in Tool Suite

Read · Write · Edit · Bash · Grep · Glob

The six built-in tools cover almost every productivity task. Use Grep + Glob to locate before Read; use Edit (not Write) for in-place changes to existing files; reserve Bash for actual commands (compile, test, run), never as a fallback for file I/O. The grep-then-read sequence is the single biggest token-efficiency lever.

Configuration

Tool whitelist per subagent: Reader=[Read,Grep,Glob]; Reviewer=[Read,Grep,Bash(test,lint)]; Doc-gen=[Read,Write,Glob]. Never grant 15 tools to one agent. Accuracy drops 8% per tool past 5.

Concept: tool-calling

Codebase Context Loader

imports · dependencies · architecture

Walks the repo at session start: parses package.json / pyproject.toml, reads README.md + .claude/CLAUDE.md, builds a dependency graph (top 50 imports), surfaces the architecture-decisions doc. Loaded once into the lead agent's context so it doesn't re-discover on every turn.

Configuration

Run on first invocation: project_type, key_dirs, top_imports[], architecture_summary. Persist as .claude/context-cache.json with hash-based invalidation. Refresh on package.json change.
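
A minimal sketch of that hash-based invalidation, assuming the cache carries a manifest_hash field (field and helper names hypothetical) and reusing profile() from step 01 of the build:

import hashlib, json
from pathlib import Path

MANIFEST_FILES = ("package.json", "pyproject.toml")  # refresh triggers (assumed)

def manifest_hash() -> str:
    """Hash the dependency manifests; any change invalidates the cache."""
    h = hashlib.sha256()
    for name in MANIFEST_FILES:
        p = Path(name)
        if p.exists():
            h.update(p.read_bytes())
    return h.hexdigest()

def load_context_cache() -> dict:
    cache_path = Path(".claude/context-cache.json")
    if cache_path.exists():
        cache = json.loads(cache_path.read_text())
        if cache.get("manifest_hash") == manifest_hash():
            return cache  # manifests unchanged: reuse without re-profiling
    cache = profile()  # re-discover (see step 01 in 'Build it')
    cache["manifest_hash"] = manifest_hash()
    cache_path.write_text(json.dumps(cache, indent=2))
    return cache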

Concept: claude-md-hierarchy

Code-Review Subagent

independent · fresh context

Spawned per change-set with [Read, Grep, Bash(test,lint)] only. No Edit, no Write. Fresh context, no inherited history from the writing agent. Reviews against .claude/rules/ and the codebase context. Returns { verdict, issues: [{ line, severity, message }], summary }. Independence is the architectural point. Same-session review just rubber-stamps.

Configuration

system: 'You are a code reviewer. Read only. Check against .claude/rules/ and existing patterns.' tools: [Read, Grep, Bash(npm test, npm run lint)]. Receives diff + context summary; returns structured verdict.

Concept: evaluation

Doc-Generator Subagent

writes from code, not from spec

Specialised subagent that reads source files and generates documentation that reflects what the code actually does, not what last quarter's spec said it would do. Writes README sections (Write) or external API docs (Write to docs/); its whitelist deliberately excludes Edit, so it never touches source files. Tools scoped to read source + write docs only.

Configuration

system: 'Generate docs from source. Use the code as ground truth.' tools: [Read, Write(docs/**, *.md), Glob]. Run after code-review-subagent passes; auto-attached as part of the PR.
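
A sketch of that post-review hook, assuming the spawn_specialist helper from step 03 of 'Build it'; changed_files is whatever file list the lead agent tracks:

def generate_docs_after_review(changed_files: list[str]) -> dict:
    """Run the doc-gen subagent only after the reviewer verdict is 'approve'."""
    task = (
        "Update documentation for these changed files, using the code as ground truth:\n"
        + "\n".join(changed_files)
        + "\nWrite README sections or docs/ pages as needed."
    )
    return spawn_specialist("docgen", task)  # whitelist: Read, Write(docs/**, *.md), Glob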

Concept: subagents

MCP Server Integrations

language servers · linters · formatters

Optional but high-leverage in multi-language repos. Hook in pyright for Python type info, tsc --noEmit for TS, eslint/ruff for lint, prettier/black for formatting. Each MCP server adds language-specific context the agent can query without re-implementing it. Selected per file extension; not all loaded at once.

Configuration

MCP registry per workspace: { '.ts': ['tsc', 'eslint'], '.py': ['pyright', 'ruff'], '.go': ['gopls'] }. Routed by file extension; agent calls mcp.lint(file) and gets structured diagnostics.

Concept: mcp
04 · One concrete run

Data flow
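
One concrete rename run, traced through the components above (tool calls match the build steps; file and token counts illustrative):

  1. User asks: rename FooBar → BazQux across the repo.
  2. Lead agent loads the cached profile from .claude/context-cache.json (~1K tokens).
  3. Reader subagent: Grep('\bFooBar\b', output_mode: files_with_matches) → 12 matched files.
  4. Reader subagent: Read only those 12 files (~24K tokens).
  5. Lead agent proposes the rename diff across the 12 files.
  6. Reviewer subagent (fresh context, no writer history) returns { verdict: 'approve', issues: [...] }.
  7. 4D gate: human reviews diff + reviewer verdict, clicks approve.
  8. Edit applies the diff; the Doc-Generator subagent refreshes the affected docs.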

05 · Build it

Eight steps to production

01

Profile the codebase before any task

On first invocation, run a one-time codebase-context loader: language(s), framework, top entry points, dependency graph. Cache to .claude/context-cache.json. The lead agent loads this on every subsequent task instead of re-discovering. Saves ~4K tokens per turn.

Profile the codebase before any task
# scripts/profile_codebase.py
import json
from pathlib import Path
from collections import Counter

def profile() -> dict:
    info = {"languages": [], "framework": None, "key_dirs": [], "top_imports": []}

    if Path("package.json").exists():
        pkg = json.loads(Path("package.json").read_text())
        info["languages"].append("typescript")
        deps = list((pkg.get("dependencies") or {}).keys())
        if "next" in deps:
            info["framework"] = "next.js"
        elif "react" in deps:
            info["framework"] = "react"

    if Path("pyproject.toml").exists():
        info["languages"].append("python")

    # Top 50 imported modules across the repo (cheap prefix scan, not a full parse)
    imports = Counter()
    for ext in ("ts", "py"):
        for f in Path(".").rglob(f"*.{ext}"):
            if "node_modules" in f.parts or ".venv" in f.parts:
                continue
            for line in f.read_text(errors="ignore").splitlines()[:50]:
                parts = line.split()
                if len(parts) >= 2 and parts[0] in ("import", "from"):
                    imports[parts[1].rstrip(",")] += 1
    info["top_imports"] = [m for m, _ in imports.most_common(50)]

    info["key_dirs"] = [
        p.name for p in Path(".").iterdir()
        if p.is_dir() and not p.name.startswith(".") and p.name not in ("node_modules", "dist")
    ]

    Path(".claude").mkdir(exist_ok=True)
    Path(".claude/context-cache.json").write_text(json.dumps(info, indent=2))
    return info

if __name__ == "__main__":
    print(json.dumps(profile(), indent=2))
↪ Concept: claude-md-hierarchy
02

Always Grep + Glob before Read

The single biggest exploration anti-pattern is Read on every file in a directory. Instead, Grep finds the symbol or pattern across the repo (returns matched files + line numbers); Glob narrows by path; only Read the files that matched. On a 1000-file repo searching for a function name, this is the difference between 1000 Reads and 12 Reads.

Always Grep + Glob before Read
# Wrong. Read everything, hope for the best
# for f in glob('**/*.ts'):
#     content = read_file(f)  # 1000 reads, context floods, agent gets lost

# Right. Grep first to locate, then Read only matched
def find_function_usages(name: str) -> list[dict]:
    """Grep + Glob first, Read only what matched."""
    # Step 1: Grep finds the pattern across the repo
    grep_result = run_tool("Grep", {
        "pattern": rf"\b{name}\b",
        "type": "ts",
        "output_mode": "files_with_matches",  # NOT 'content'. Too verbose
    })
    matched_files = grep_result["files"]  # e.g. 12 files, not 1000

    # Step 2: Glob filters further if needed
    if len(matched_files) > 30:
        # Too many. Narrow by path
        matched_files = [f for f in matched_files if "src/" in f]

    # Step 3: Read only the files that matched
    file_contents = {}
    for f in matched_files[:20]:  # cap at 20 to bound context
        file_contents[f] = run_tool("Read", {"file_path": f})

    return [{"file": f, "content": c} for f, c in file_contents.items()]
↪ Concept: context-window
03

Distribute tools across 2-3 specialised subagents

One 15-tool agent routes worse than three 5-tool specialists. Define Reader (Grep + Glob + Read), Reviewer (Read + Grep + Bash test/lint), Doc-Generator (Read + Write + Glob). Each runs in its own context. The lead agent coordinates and merges results. Routing accuracy stays high because each specialist has a clear domain.

Distribute tools across 2-3 specialised subagents
# Specialist subagent definitions
SPECIALISTS = {
    "reader": {
        "system": "You are a codebase reader. Use Grep+Glob to locate, "
                  "then Read only what matched. Never modify files.",
        "tools": ["Read", "Grep", "Glob"],
    },
    "reviewer": {
        "system": "You are an independent code reviewer. Fresh context, "
                  "no carry-over from the writer. Check against .claude/rules/.",
        "tools": ["Read", "Grep", "Bash(npm test, npm run lint, ruff check)"],
    },
    "docgen": {
        "system": "You generate docs from source. Use the code as ground truth.",
        "tools": ["Read", "Write(docs/**, *.md)", "Glob"],
    },
}

def spawn_specialist(role: str, task: str) -> dict:
    spec = SPECIALISTS[role]
    return client.messages.create(
        model="claude-sonnet-4.5",
        max_tokens=2048,
        system=spec["system"],
        tools=load_tools(spec["tools"]),
        messages=[{"role": "user", "content": task}],
    )

# Lead agent coordinates. (generate_renames and the JSON parsing of
# subagent replies are elided; treat the returns as parsed dicts.)
def refactor_function(name: str, new_name: str):
    locations = spawn_specialist("reader", f"Find all usages of {name}")
    proposed_diff = generate_renames(locations, new_name)
    review = spawn_specialist("reviewer", f"Review this diff: {proposed_diff}")
    if review["verdict"] != "approve":
        return {"status": "review_failed", "issues": review["issues"]}
    return {"status": "ready_to_apply", "diff": proposed_diff, "review": review}
↪ Concept: subagents
04

Spawn the reviewer in a SEPARATE session

If the reviewer shares context with the writer, it inherits the writer's assumptions and rubber-stamps. The fix is structural: the reviewer subagent runs with a fresh messages array, fresh system prompt, no inherited tool history. It sees only the diff + the project rules. The independence is the entire point. Confirmation bias is structural, not philosophical.

Spawn the reviewer in a SEPARATE session
def review_diff_independently(diff: str, file_paths: list[str]) -> dict:
    """Spawn a fresh-context reviewer. NO carry-over from the writer."""
    REVIEWER_SYSTEM = """You are an independent code reviewer.

You have NO context from the agent that wrote this diff. You see only:
1. The diff itself
2. The current state of the affected files
3. The project rules in .claude/rules/

Check for:
- Style conformance (matches .claude/rules/)
- Test coverage (does the diff change behavior without test changes?)
- Type safety (any unsafe casts or 'any' added?)
- API breakage (did a public signature change?)

Return JSON: { verdict: 'approve'|'request_changes', issues: [{ file, line, severity, message }], summary }"""

    # CRITICAL: fresh messages array, fresh system, fresh tool list
    return client.messages.create(
        model="claude-sonnet-4.5",
        max_tokens=2048,
        system=REVIEWER_SYSTEM,
        tools=[READ_TOOL, GREP_TOOL, BASH_TEST_LINT_TOOL],  # no Edit/Write
        messages=[{
            "role": "user",
            "content": f"Diff to review:\n{diff}\n\nAffected files: {file_paths}",
        }],
    )
↪ Concept: evaluation
05

Use a scratchpad file for cross-turn state

Long productivity tasks (rename across 50 files, refactor 3 services) span many turns. Don't try to keep all state in the message history. Write it to .claude/scratchpad.md. The lead agent reads it at the start of every turn, updates it after each subagent returns, and trims old entries. The scratchpad is the agent's working memory; the message history is the audit trail.

Use a scratchpad file for cross-turn state
# .claude/scratchpad.md. Agent working memory across turns
from pathlib import Path

SCRATCHPAD = Path(".claude/scratchpad.md")
MAX_SECTIONS = 20

def update_scratchpad(section: str, content: str):
    """Replace the named section in place (or append it), then trim old sections."""
    if not SCRATCHPAD.exists():
        SCRATCHPAD.write_text("# Agent scratchpad\n\n")
    text = SCRATCHPAD.read_text()
    marker = f"## {section}"
    if marker in text:
        # Replace the existing section, keeping whatever follows it
        before, _, rest = text.partition(marker)
        _, sep, after = rest.partition("\n## ")
        new = before + marker + "\n" + content + ("\n## " + after if sep else "")
    else:
        new = text + "\n" + marker + "\n" + content
    # Trim: keep the header plus only the newest MAX_SECTIONS sections
    header, *sections = new.split("\n## ")
    if len(sections) > MAX_SECTIONS:
        new = "\n## ".join([header] + sections[-MAX_SECTIONS:])
    SCRATCHPAD.write_text(new)

# Usage in the agent loop
update_scratchpad(
    "Rename Task: FooBar -> BazQux",
    "Files matched: 12. Already updated: 8. Remaining: src/old.ts, src/legacy.ts.",
)
↪ Concept: claude-md-hierarchy
06

Wire MCP servers per language extension

Multi-language repos benefit from language-specific MCP servers. pyright for Python types, tsc --noEmit for TS, eslint and prettier for JS, ruff for Python lint. Don't load all MCP servers at session start. Route by file extension when a tool needs them. Keeps context lean, reduces overhead.

Wire MCP servers per language extension
# Per-extension MCP routing
from pathlib import Path

MCP_BY_EXTENSION = {
    ".ts": ["tsc", "eslint", "prettier"],
    ".tsx": ["tsc", "eslint", "prettier"],
    ".py": ["pyright", "ruff", "black"],
    ".go": ["gopls", "gofmt"],
    ".rs": ["rust-analyzer", "rustfmt"],
}

def select_mcp_servers(file_paths: list[str]) -> list[str]:
    """Pick the minimum set of MCP servers needed for these files."""
    extensions = {Path(f).suffix for f in file_paths}
    servers = set()
    for ext in extensions:
        for server in MCP_BY_EXTENSION.get(ext, []):
            servers.add(server)
    return sorted(servers)

# In the lead agent before spawning the reviewer
def review_with_mcp(diff: str, files: list[str]):
    mcp_servers = select_mcp_servers(files)
    return spawn_specialist(
        "reviewer",
        f"Review diff: {diff}\nUse MCP servers: {mcp_servers}",
    )
↪ Concept: mcp
07

Implement the 4D approval gate

The 4D Framework: Delegation (agent proposes), Description (clear output), Discernment (reviewer verdict), Diligence (human approves). Code-changing actions are gated behind explicit human approval. The agent presents the diff + the reviewer's verdict + a one-paragraph summary; the human approves or rejects. No auto-merge for productivity tasks; the gate exists because the cost of a bad merge is far higher than the friction of one click.

Implement the 4D approval gate
from typing import Literal, TypedDict

class ApprovalRequest(TypedDict):
    delegation: str    # what the agent proposes (1 sentence)
    description: str   # the diff + summary of changes
    discernment: dict  # reviewer subagent's verdict + issues
    risk_level: Literal["low", "medium", "high"]

def request_approval(req: ApprovalRequest) -> Literal["approve", "reject"]:
    """4D gate. ALWAYS waits for human input. No auto-approve."""
    print("=" * 60)
    print(f"DELEGATION: {req['delegation']}")
    print(f"\nDESCRIPTION:\n{req['description']}")
    print(f"\nDISCERNMENT (review verdict): {req['discernment']['verdict']}")
    if req['discernment'].get('issues'):
        print(f"  Issues:")
        for issue in req['discernment']['issues']:
            print(f"  - {issue['severity']}: {issue['message']}")
    print(f"\nRISK LEVEL: {req['risk_level']}")
    print("=" * 60)
    print("DILIGENCE: Approve? [y/N]: ", end="")
    return "approve" if input().strip().lower() == "y" else "reject"

# Usage in the lead agent
def apply_refactor_safely(diff: str, files: list[str], reviewer_verdict: dict):
    decision = request_approval({
        "delegation": f"Apply rename refactor across {len(files)} files",
        "description": f"Diff:\n{diff}",
        "discernment": reviewer_verdict,
        "risk_level": "medium" if len(files) > 20 else "low",
    })
    if decision == "approve":
        run_tool("Edit", {"diff": diff})
    else:
        print("Refactor rejected by human; no files changed.")
↪ Concept: 4d-framework
08

Use Bash structured output, not regex

Tools that emit structured output (gh pr view --json files, git log --pretty=format:%H%x09%s, npm test --reporter json, eslint --format json) replace fragile prose-parsing. Even simple commands have structured-output flags (git status --porcelain, ls -la --time-style=long-iso). Always prefer them. Regex on free-form output is the reason most agent pipelines break in the third week of production.

Use Bash structured output, not regex
# Wrong. Regex on prose output
# output = run("git log --oneline -10")
# commits = re.findall(r"^([a-f0-9]{7,}) (.+)$", output, re.M)
# # Breaks when commit message contains a colon or special chars

# Right. Structured output, parse JSON
import json, subprocess

def recent_commits(n: int = 10) -> list[dict]:
    """git log with structured output, no regex needed."""
    raw = subprocess.check_output(
        ["git", "log", f"-{n}",
         "--pretty=format:%H%x09%h%x09%s%x09%an%x09%aI"],
        text=True,
    )
    return [
        dict(zip(["sha", "short", "subject", "author", "date"], line.split("\t")))
        for line in raw.splitlines() if line
    ]

def lint_results(file: str) -> list[dict]:
    """eslint --format json gives a structured contract."""
    # eslint exits non-zero when it finds problems, so don't use check_output
    proc = subprocess.run(
        ["npx", "eslint", "--format", "json", file],
        capture_output=True, text=True,
    )
    return json.loads(proc.stdout)  # array of { filePath, messages: [{ ruleId, severity, line, message }] }

def pr_diff_files() -> list[str]:
    """gh pr diff --name-only. One file per line, no parsing needed."""
    return subprocess.check_output(
        ["gh", "pr", "diff", "--name-only"],
        text=True,
    ).strip().split("\n")
↪ Concept: tool-calling
06 · Configuration decisions

The four decisions

Decision: Multi-file refactoring task (20+ files)
Right answer: Distribute work across 2-3 specialist subagents (reader → reviewer → applier); 4-5 tools each
Wrong answer: One monolithic 15-tool agent does everything in one session
Why: Routing accuracy drops ~8% per tool past 5; 15 tools degrades quality dramatically. Specialist subagents with narrow tool whitelists keep accuracy high and contexts clean.

Decision: Exploring a 1000-file codebase for usages of a function
Right answer: Grep + Glob first to locate matched files, then Read only those
Wrong answer: Read every file to 'understand the codebase' before deciding
Why: Read-everything-first floods context and the agent loses track of the original task. Grep returns matched files in seconds with structured output; Read narrows to ~10-20 files instead of ~1000.

Decision: Reviewing the agent's own code changes
Right answer: Spawn an independent reviewer subagent with fresh context, no inherited history
Wrong answer: Same session generates AND reviews the code
Why: Same-session review inherits the writer's assumptions and rubber-stamps. Confirmation bias is structural here, not philosophical. Independence at the architectural layer is the only fix.

Decision: Auto-merging the agent's code changes
Right answer: 4D Framework: Delegation → Description → Discernment → Diligence (human click)
Wrong answer: Auto-merge if reviewer subagent says 'approve'
Why: The cost of a bad merge (broken main, rolled-back deploy, lost time across the team) is far higher than the friction of one human click. The 4D gate exists because trust in agent output is best earned incrementally with humans in the loop on destructive actions.
07 · Failure modes

Where it breaks

Five failure pairs. Each one is one exam question. The fix is always architectural: deterministic gates, structured fields, pinned state.

Monolithic agent with 15+ tools

Single agent has 15 tools (Read, Write, Edit, Bash, Grep, Glob, WebSearch, gh, npm, jest, eslint, prettier, git, jq, sed). Tool selection accuracy drops; the agent calls Bash(cat) when Read would do, alternates between similar tools, and occasionally hangs comparing options.

AP-25
✅ Fix

Distribute tools across 2-3 specialised subagents (Reader, Reviewer, Doc-gen). Each subagent has 4-5 tools maximum. Routing accuracy stays high because each specialist has a clear tool set and a clear job.

Auto-merge without human approval

Agent generates a refactor diff, the reviewer subagent approves it, and the workflow auto-merges. A subtle regression ships to main. Rollback takes 40 min; the team's trust in the agent drops for the rest of the quarter.

AP-26
✅ Fix

4D Framework: Delegation (agent proposes) → Description (clear output) → Discernment (reviewer verdict) → Diligence (human clicks approve). The human gate is non-negotiable for code-changing actions.

Code review in the same session as code generation

Same Claude session writes the code AND reviews it. The reviewer agrees with the writer on every choice. Same context, same assumptions, same blind spots. Confirmation bias passes through; bugs reach production.

AP-27
✅ Fix

Spawn the reviewer as an independent subagent: fresh messages array, fresh system prompt, no inherited history. It sees only the diff + the project rules. Independence is structural, not philosophical.

Read-first codebase exploration

Agent runs Read on every file in src/ to 'understand the codebase'. By file 80, context is full of irrelevant content; the agent has lost the original question. Returns generic recommendations instead of specific matches.

AP-28
✅ Fix

Grep + Glob first to locate; Read only matched files. The exploration sequence is non-negotiable: locate → narrow → read. On a 1000-file repo, this is the difference between 1000 Reads and 12.

Fragile regex on tool output

Workflow parses git log --oneline output with regex to extract commit SHAs. A commit message containing parentheses or a colon breaks the regex; the workflow silently misses commits or attributes them to the wrong author.

AP-29
✅ Fix

Use structured-output flags (git log --pretty=format:%H%x09%s, npm test --reporter json, eslint --format json, gh pr view --json files). Parse JSON or fixed-delimiter output. Regex over prose is the reason most agent pipelines break in the third week.

08 · Budget

Cost & latency

Codebase exploration (Grep + Read 10-15 matched files)
~$0.003-0.008 per task

Grep is one tool call (~200 tokens output, files-with-matches mode). Read on 10-15 files at ~2K tokens each = ~25K input + ~500 output. ~$0.005 typical. Avoiding read-everything saves orders of magnitude on big repos.

Code-review subagent per change-set
~$0.012 per review

Reviewer reads the diff (~3K tokens) + affected files (~6K tokens) + .claude/rules (~500 tokens) + emits structured verdict (~1K). ~$0.012 at Sonnet 4.5 prices. Cheap insurance against a bad merge.

Doc-generation subagent (per source file)
~$0.024 per file

Doc-gen reads source (~4K) + writes JSDoc/docstring (~2K output) + writes README section if asked (~3K output). ~$0.024 typical. Worth running on every commit-back-to-main to keep docs current.

Full refactor workflow end-to-end
~$0.04-0.08 per refactor

Profile + Grep + Read 12 files + propose diff + Review + Approve + Apply. Sums to ~$0.05 typical. The 4D gate adds zero token cost (human input); reviewer subagent is the dominant line item.

MCP server overhead (when loaded)
~+10-15% per call

MCP context (language-server output, lint diagnostics) adds ~500-2K tokens per call. Routed per file extension so the cost only applies when language-specific context actually helps.

09 · Ship gates

Ship checklist

Two passes. Build-time gates verify the code; run-time gates verify the system in production.

Build-time

  1. Codebase context loader runs once per session, caches to .claude/context-cache.json · claude-md-hierarchy
  2. Grep + Glob exploration sequence enforced; never Read-first across many files · context-window
  3. Specialist subagents defined: Reader, Reviewer, Doc-gen. Each with 4-5 tools · subagents
  4. Reviewer subagent runs in fresh context (no inherited writer history) · evaluation
  5. Tool whitelists scoped per specialist (no Edit/Write on Reader; no Edit on Reviewer) · tool-calling
  6. Scratchpad file (.claude/scratchpad.md) persists cross-turn agent state
  7. MCP servers selected per file extension; not all loaded at session start · mcp
  8. 4D Framework approval gate before any auto-merge or destructive action · 4d-framework
  9. Bash structured-output flags used everywhere (--format json, --porcelain, etc.)
  10. Tool count audit per subagent: ≤ 5 tools, no exceptions (see the sketch after this list)
  11. Per-task telemetry: tool selection accuracy, reviewer verdict distribution, approval-gate latency
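
A minimal sketch of the audit in item 10, run against the SPECIALISTS registry from step 03 of 'Build it':

MAX_TOOLS = 5

def audit_tool_counts(specialists: dict) -> list[str]:
    """Return whitelist violations; wire into CI so a PR can't silently widen a whitelist."""
    return [
        f"{role}: {len(spec['tools'])} tools exceeds the {MAX_TOOLS}-tool cap"
        for role, spec in specialists.items()
        if len(spec["tools"]) > MAX_TOOLS
    ]

violations = audit_tool_counts(SPECIALISTS)
assert not violations, f"tool whitelist audit failed: {violations}"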

Run-time

  • Codebase context loader runs in <2s on first session; cached for subsequent sessions
  • Grep+Glob+Read sequence enforced (lint check on subagent system prompts)
  • Specialist subagent definitions reviewed via PR. Tool whitelists are an audit gate
  • Reviewer subagent independence verified by integration test (no shared message history; sketch after this list)
  • 4D Framework approval gate fires on every code-changing action; auto-merge disabled
  • Scratchpad TTL + size-bound: archive old entries past 20 sections
  • MCP server health checks per session start; degraded mode on outage
  • Telemetry: tool-selection accuracy, reviewer verdict distribution, approval-gate p95 latency
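
A sketch of the independence test above, assuming pytest's monkeypatch fixture, the review_diff_independently function from step 04, and that the *_TOOL constants are standard tool-schema dicts with a name field:

def test_reviewer_gets_fresh_context(monkeypatch):
    """The reviewer request must carry only the diff, never the writer's history."""
    captured = {}

    def fake_create(**kwargs):
        captured.update(kwargs)
        return {"verdict": "approve", "issues": [], "summary": "ok"}

    monkeypatch.setattr(client.messages, "create", fake_create)
    review_diff_independently("--- a/x.ts\n+++ b/x.ts\n", ["x.ts"])

    # Fresh messages array: exactly one user turn, no inherited assistant turns
    assert [m["role"] for m in captured["messages"]] == ["user"]
    # No write-capable tools on the reviewer
    names = [t["name"] for t in captured["tools"]]
    assert "Edit" not in names and "Write" not in names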
10 · Question patterns

Five exam-pattern questions

A developer-productivity agent explores a 1,000-file codebase looking for references to a function. It uses `Read` on every file and runs out of context after 200 files. How should you fix this?
Replace Read-everything with the Grep + Glob first, Read only matched sequence. Step 1: Grep(pattern: '\\bFunctionName\\b', output_mode: 'files_with_matches') returns the 12 files that actually contain the symbol. Step 2: Glob narrows further if needed (e.g. only src/**). Step 3: Read only the matched files. On a 1,000-file repo this is the difference between 1,000 Reads and 12. Orders of magnitude on tokens and the difference between a focused agent and a context-flooded one. Tagged to AP-28.
An agent auto-generates and auto-reviews code in a single Claude session. The reviewer always agrees with the writer, even when bugs slip through. What architectural change fixes this?
Spawn the reviewer as an independent subagent with a fresh messages array, fresh system prompt, and a tool whitelist scoped to [Read, Grep, Bash(test, lint)]. No Edit, no Write. The reviewer sees only the diff + the project rules; it has zero context from the writer. Confirmation bias is structural in same-session review; only architectural independence eliminates it. Tagged to AP-27.
You're building a documentation agent that processes 12 source files across React components, API routes, and database queries. Your first attempt is one agent with 15 tools. How many specialised subagents should you create?
Three specialists, 4-5 tools each: a Reader (Read + Grep + Glob) for codebase exploration, a Reviewer (Read + Grep + Bash test/lint) for verdicts, and a Doc-Generator (Read + Write + Glob) for output. Distribution beats monolithic 15-tool agents because routing accuracy drops ~8% per tool past 5. Each specialist has a clear domain and a clear tool set, and they coordinate via a lead agent. Tagged to AP-25.
Your productivity agent successfully rewrites a critical function and immediately auto-merges the change to main. A subtle regression ships to production. What gate was missing?
The 4D Framework approval gate. Delegation (agent proposes the change), Description (the diff + summary), Discernment (the reviewer subagent's verdict), and Diligence (a human clicks approve). The human-click step is non-negotiable for code-changing actions because the cost of a bad merge (broken main, rollback time, lost team trust) is orders of magnitude higher than the friction of one approval click. Auto-merge defeats the entire architecture. Tagged to AP-26.
An agent uses `git log --oneline` and parses the output with regex to extract commit SHAs and messages. After a commit message containing parentheses ships, the workflow silently misattributes commits. What's the rule?
Always use structured-output flags. Replace git log --oneline parsed by regex with git log --pretty=format:%H%x09%h%x09%s%x09%an%x09%aI (tab-delimited, no ambiguity). Or --pretty=format:%H then git show for full info. Same applies to npm test --reporter json, eslint --format json, and gh pr view --json files. Regex over prose is the reason most agent pipelines break in their third week. Structured output is a contract; regex on prose is a guess. Tagged to AP-29.
11 · FAQ

Frequently asked

Should I use Read or Grep to explore a large codebase?
Grep first, always. Grep finds the symbol or pattern across the whole repo in one tool call (returns matched files + line numbers). Then Read only the matched files. Read-everything-first is the single biggest token waste in productivity workflows. And it floods context, which makes the agent worse, not just slower.
Can an agent review code it just generated in the same session?
Not effectively. Same-session review inherits the writer's context, assumptions, and blind spots. The reviewer rubber-stamps because it shares the writer's mental model. Spawn an independent reviewer subagent with fresh context: it sees only the diff + the project rules, with no carry-over. Independence is structural, not philosophical.
How many tools should a developer-productivity agent have?
4-5 per subagent, distributed across 2-3 specialists. One 15-tool agent loses ~8% routing accuracy per tool past 5. By 15 tools, the agent is alternating between similar options and missing obvious matches. Three specialists with clear tool sets (Reader / Reviewer / Doc-gen) outperform one generalist on every dimension.
Do I need MCP servers for code-productivity workflows?
No, but they help in multi-language repos. Built-in tools (Read, Grep, Edit, Bash) cover most tasks. MCP servers add language-specific context: pyright for Python types, tsc --noEmit for TS, ruff for Python lint, eslint for JS. Route per file extension. Don't load all MCP servers at session start.
Should productivity agents auto-merge code?
No. The 4D Framework gate is non-negotiable for code-changing actions: Delegation → Description → Discernment → Diligence (human click). The cost of a bad merge is far higher than the friction of one click. If the team really wants automation, the right move is to auto-create the PR, run the reviewer subagent, and post the verdict. But the human still clicks merge.
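
A sketch of that compromise using the gh CLI (gh pr create and gh pr comment are real subcommands; the verdict dict follows the reviewer contract above, and the helper name is hypothetical):

import subprocess

def open_pr_for_human_merge(branch: str, title: str, verdict: dict):
    """Auto-create the PR and post the reviewer verdict; merging stays human."""
    subprocess.check_call([
        "gh", "pr", "create", "--head", branch, "--title", title,
        "--body", "Automated refactor. Reviewer verdict posted below; human merge required.",
    ])
    body = f"Reviewer verdict: {verdict['verdict']}\n" + "\n".join(
        f"- {i['severity']}: {i['message']}" for i in verdict.get("issues", [])
    )
    subprocess.check_call(["gh", "pr", "comment", branch, "--body", body])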
How do you handle multi-language codebases?
Per-extension routing for both subagents and MCP servers. The Reader subagent uses Grep with --type ts or --type py to narrow searches by language. The Reviewer subagent loads pyright for .py files and tsc for .ts files. The Doc-Generator emits language-appropriate doc syntax (JSDoc for JS/TS, docstrings for Python). One agent topology, multiple language plug-ins.
P3.4 · D1 + D2 · Agentic Architectures

Developer Productivity Agent, complete.

You've covered the full breakdown for this scenario: problem framing, architecture, component detail, one concrete run, build steps, configuration decisions, failure modes, budget, ship gates, exam patterns, and FAQ. One scenario down on the path to CCA-F.

Share your win →