# Developer Productivity Agent

> A team of specialised subagents that handle codebase exploration, code review, and doc generation, each with a narrow tool whitelist and isolated context. The lead agent uses Grep + Glob first to locate relevant files (never Read everything), delegates review to an independent reviewer subagent to avoid confirmation bias, and gates every destructive action behind the 4D Framework (Delegation → Description → Discernment → Diligence), where a human clicks approve. Distribution beats monoliths: 4-5 tools per subagent across 2-3 specialists routes more accurately than one 15-tool agent.

**Sub-marker:** P3.4
**Domains:** D1 · Agentic Architectures, D2 · Tool Design + Integration
**Exam weight:** 45% of CCA-F (D1 + D2)
**Build time:** 22 minutes
**Source:** 🟢 Official Anthropic guide scenario · in published exam guide
**Canonical:** https://claudearchitectcertification.com/scenarios/developer-productivity-agent
**Last reviewed:** 2026-05-04

## In plain English

Think of this as the agent that helps a developer rename a function across a 1,000-file codebase without losing their afternoon to it. Instead of one big agent reading every file (which fails because there's too much to remember), the work is split: one tiny helper finds the matching files with grep, another reads only those files, a third reviews the proposed changes from a fresh perspective so it doesn't just rubber-stamp its own work, and a human approves the final commit. The whole point is that productivity tasks are easier when you delegate to specialists rather than asking one agent to do everything alone.

## Exam impact

Domain 1 (Agentic Architecture, 27%) tests subagent distribution, the grep-then-read exploration sequence, and the 4D Framework approval gate. Domain 2 (Tool Design, 18%) tests built-in vs MCP tool selection (`Read` vs `Bash(cat ...)`), tool count per agent (4-5, not 15), and structured output for tool results. This scenario appears in both the published guide and the practice exam, making it high-yield for drilling.

## The problem

### What the customer needs
- Find every reference to a function across a 1,000-file repo without reading every file.
- Review the agent's proposed changes with a fresh perspective so confirmation bias doesn't pass through.
- Generate up-to-date docs that reflect the actual code, not last quarter's spec.

### Why naive approaches fail
- Monolithic single agent with 15 tools trying to do everything → routing accuracy drops 8% per tool past 5; the agent alternates between similar tools and misses obvious matches.
- Read every file first to 'understand the codebase' → context floods at file 80; the agent has lost track of the original task by file 200.
- Same session generates AND reviews the code → confirmation bias passes through; the reviewer agrees with the writer because it shares the writer's context.

### Definition of done
- Grep / Glob locates matched files first; Read opens only those files
- Code review runs in an independent subagent with fresh context
- Tool count ≤ 5 per subagent; specialists distributed across reader / reviewer / docgen
- 4D Framework approval gate before any auto-merge or destructive action
- Bash structured-output flags (--format json, --porcelain) replace fragile regex parsing

## Concepts in play

- 🟢 **Subagents** (`subagents`), Reader / Reviewer / Doc-generator specialists
- 🟢 **Tool calling** (`tool-calling`), 4-5 tools per subagent, distributed not monolithic
- 🟢 **Context window** (`context-window`), Grep first, read only what matched
- 🟢 **Evaluation** (`evaluation`), Independent reviewer subagent (no confirmation bias)
- 🟢 **4D Framework** (`4d-framework`), Delegation, Description, Discernment, Diligence
- 🟢 **Project memory** (`claude-md-hierarchy`), Codebase context loader + scratchpad files
- 🟢 **Model Context Protocol** (`mcp`), Language servers, linters, formatters
- 🟢 **Agentic loops** (`agentic-loops`), Lead agent's coordination loop

## Components

### Built-in Tool Suite, Read · Write · Edit · Bash · Grep · Glob

The six built-in tools cover almost every productivity task. Use Grep + Glob to locate before Read; use Edit (not Write) for in-place changes to existing files; reserve Bash for actual commands (compile, test, run), never as a fallback for file I/O. The grep-then-read sequence is the single biggest token-efficiency lever.

**Configuration:** Tool whitelist per subagent: Reader=[Read,Grep,Glob]; Reviewer=[Read,Grep,Bash(test,lint)]; Doc-gen=[Read,Write,Glob]. Never grant 15 tools to one agent. Accuracy drops 8% per tool past 5.
**Concept:** `tool-calling`
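
The budget above can be enforced mechanically at startup. A minimal sketch, assuming the whitelist names from this scenario; `validate_whitelists` and `MAX_TOOLS` are illustrative helpers, not part of any SDK:

```python
# Sketch: enforce the per-subagent tool budget at config-load time.
MAX_TOOLS = 5  # the guide's 5-tool budget, not an API limit

WHITELISTS = {
    "reader": ["Read", "Grep", "Glob"],
    "reviewer": ["Read", "Grep", "Bash(test,lint)"],
    "docgen": ["Read", "Write", "Glob"],
}

def validate_whitelists(whitelists: dict[str, list[str]]) -> list[str]:
    """Return a list of violations; empty means every subagent is in budget."""
    violations = []
    for role, tools in whitelists.items():
        if len(tools) > MAX_TOOLS:
            violations.append(f"{role}: {len(tools)} tools (max {MAX_TOOLS})")
        # The reviewer is read-only by design: no Edit, no Write
        if role == "reviewer" and ({"Edit", "Write"} & set(tools)):
            violations.append(f"{role}: reviewer must stay read-only")
    return violations
```

Failing fast on a fat whitelist keeps the "15-tool monolith" anti-pattern from creeping back in as tools get added over time.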

### Codebase Context Loader, imports · dependencies · architecture

Walks the repo at session start: parses package.json / pyproject.toml, reads README.md + .claude/CLAUDE.md, builds a dependency graph (top 50 imports), surfaces the architecture-decisions doc. Loaded once into the lead agent's context so it doesn't re-discover on every turn.

**Configuration:** Run on first invocation: project_type, key_dirs, top_imports[], architecture_summary. Persist as .claude/context-cache.json with hash-based invalidation. Refresh on package.json change.
**Concept:** `claude-md-hierarchy`

### Code-Review Subagent, independent · fresh context

Spawned per change-set with [Read, Grep, Bash(test,lint)] only. No Edit, no Write. Fresh context, no inherited history from the writing agent. Reviews against .claude/rules/ and the codebase context. Returns { verdict, issues: [{ line, severity, message }], summary }. Independence is the architectural point. Same-session review just rubber-stamps.

**Configuration:** system: 'You are a code reviewer. Read only. Check against .claude/rules/ and existing patterns.' tools: [Read, Grep, Bash(npm test, npm run lint)]. Receives diff + context summary; returns structured verdict.
**Concept:** `evaluation`

### Doc-Generator Subagent, writes from code, not from spec

Specialised subagent that reads source files and generates documentation that reflects what the code actually does, not what last quarter's spec said it would do. Writes JSDoc / docstrings inline (Edit), README sections (Write), or external API docs (Write to docs/). Tools scoped to read source + write docs only.

**Configuration:** system: 'Generate docs from source. Use the code as ground truth.' tools: [Read, Write(docs/, *.md), Glob]. Run after code-review-subagent passes; auto-attached as part of the PR.
**Concept:** `subagents`
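
The "run after code-review passes" ordering can be sketched as a small gate in the lead agent. This is a hedged sketch: `spawn_specialist` is a stub standing in for the real subagent spawner shown in the build steps, and the verdict shape follows the reviewer's structured-output contract above:

```python
# Sketch of the gating order: docgen runs only after the reviewer approves.
def spawn_specialist(role: str, task: str) -> dict:
    # Stub: a real implementation calls the API with the role's tool whitelist.
    if role == "reviewer":
        return {"verdict": "approve", "issues": [], "summary": "looks fine"}
    return {"role": role, "task": task}

def document_changeset(diff: str, changed_files: list[str]) -> dict:
    review = spawn_specialist("reviewer", f"Review this diff:\n{diff}")
    if review["verdict"] != "approve":
        # No docs for code that may still change
        return {"status": "docs_skipped", "reason": "review did not pass"}
    docs = spawn_specialist(
        "docgen",
        "Update docs, using the code as ground truth: " + ", ".join(changed_files),
    )
    return {"status": "docs_generated", "docs": docs, "attached_to_pr": True}
```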

### MCP Server Integrations, language servers · linters · formatters

Optional but high-leverage in multi-language repos. Hook in pyright for Python type info, tsc --noEmit for TS, eslint/ruff for lint, prettier/black for formatting. Each MCP server adds language-specific context the agent can query without re-implementing it. Selected per file extension; not all loaded at once.

**Configuration:** MCP registry per workspace: { '.ts': ['tsc', 'eslint'], '.py': ['pyright', 'ruff'], '.go': ['gopls'] }. Routed by file extension; agent calls mcp.lint(file) and gets structured diagnostics.
**Concept:** `mcp`

## Build steps

### 1. Profile the codebase before any task

On first invocation, run a one-time codebase-context loader: language(s), framework, top entry points, dependency graph. Cache to .claude/context-cache.json. The lead agent loads this on every subsequent task instead of re-discovering. Saves ~4K tokens per turn.

**Python:**

```python
# scripts/profile_codebase.py
import json
from pathlib import Path
from collections import Counter

def profile() -> dict:
    profile = {"languages": [], "framework": None, "key_dirs": [], "top_imports": []}

    if Path("package.json").exists():
        pkg = json.loads(Path("package.json").read_text())
        profile["languages"].append("typescript")
        deps = list((pkg.get("dependencies") or {}).keys())
        if "next" in deps: profile["framework"] = "next.js"
        elif "react" in deps: profile["framework"] = "react"

    if Path("pyproject.toml").exists():
        profile["languages"].append("python")

    # Top 50 imported modules across the repo
    imports = Counter()
    for ext in ("ts", "py"):
        for f in Path(".").rglob(f"*.{ext}"):
            if "node_modules" in f.parts or ".venv" in f.parts:
                continue
            for line in f.read_text(errors="ignore").splitlines()[:50]:
                parts = line.split()
                if len(parts) >= 2 and parts[0] in ("import", "from"):
                    imports[parts[1].rstrip(",")] += 1
    profile["top_imports"] = [m for m, _ in imports.most_common(50)]

    profile["key_dirs"] = [
        p.name for p in Path(".").iterdir()
        if p.is_dir() and not p.name.startswith(".") and p.name not in ("node_modules", "dist")
    ]

    Path(".claude").mkdir(exist_ok=True)
    Path(".claude/context-cache.json").write_text(json.dumps(profile, indent=2))
    return profile

if __name__ == "__main__":
    print(json.dumps(profile(), indent=2))
```

**TypeScript:**

```typescript
// scripts/profile-codebase.ts
import { readFileSync, writeFileSync, readdirSync, statSync } from "node:fs";
import { join } from "node:path";

interface Profile {
  languages: string[];
  framework: string | null;
  key_dirs: string[];
  top_imports: string[];
}

export function profile(): Profile {
  const out: Profile = { languages: [], framework: null, key_dirs: [], top_imports: [] };

  try {
    const pkg = JSON.parse(readFileSync("package.json", "utf8"));
    out.languages.push("typescript");
    const deps = Object.keys(pkg.dependencies ?? {});
    if (deps.includes("next")) out.framework = "next.js";
    else if (deps.includes("react")) out.framework = "react";
  } catch {}

  try {
    statSync("pyproject.toml");
    out.languages.push("python");
  } catch {}

  // Top imports. Walk repo, count import statements
  const counts = new Map<string, number>();
  function walk(dir: string) {
    for (const name of readdirSync(dir)) {
      if (name.startsWith(".") || name === "node_modules") continue;
      const p = join(dir, name);
      const st = statSync(p);
      if (st.isDirectory()) walk(p);
      else if (/\.(ts|tsx|js|py)$/.test(name)) {
        const lines = readFileSync(p, "utf8").split("\n").slice(0, 50);
        for (const line of lines) {
          const m = line.match(/^(import|from)\s+([\w./@-]+)/);
          if (m) counts.set(m[2], (counts.get(m[2]) ?? 0) + 1);
        }
      }
    }
  }
  walk(".");

  out.top_imports = [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, 50)
    .map(([k]) => k);

  out.key_dirs = readdirSync(".").filter(
    (n) => !n.startsWith(".") && !["node_modules", "dist"].includes(n),
  );

  writeFileSync(".claude/context-cache.json", JSON.stringify(out, null, 2));
  return out;
}
```

Concept: `claude-md-hierarchy`
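
The component configuration above calls for hash-based invalidation, which the profiler itself doesn't show. A minimal sketch, assuming the cache paths from this scenario; the `manifest_hash` key and `load_profile` helper are illustrative:

```python
# Sketch: reuse the cached profile only while package.json is unchanged.
import hashlib
import json
from pathlib import Path

CACHE = Path(".claude/context-cache.json")

def manifest_hash() -> str:
    """Hash package.json (or empty bytes if absent) as the invalidation key."""
    data = Path("package.json").read_bytes() if Path("package.json").exists() else b""
    return hashlib.sha256(data).hexdigest()

def load_profile(rebuild) -> dict:
    """Return the cached profile, rebuilding when package.json changed."""
    current = manifest_hash()
    if CACHE.exists():
        cached = json.loads(CACHE.read_text())
        if cached.get("manifest_hash") == current:
            return cached  # still valid; skip re-profiling
    profile = rebuild()  # e.g. the profile() function above
    profile["manifest_hash"] = current
    CACHE.parent.mkdir(exist_ok=True)
    CACHE.write_text(json.dumps(profile, indent=2))
    return profile
```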

### 2. Always Grep + Glob before Read

The single biggest exploration anti-pattern is Read on every file in a directory. Instead, Grep finds the symbol or pattern across the repo (returns matched files + line numbers); Glob narrows by path; only Read the files that matched. On a 1000-file repo searching for a function name, this is the difference between 1000 Reads and 12 Reads.

**Python:**

```python
# Wrong. Read everything, hope for the best
# for f in glob('**/*.ts'):
#     content = read_file(f)  # 1000 reads, context floods, agent gets lost

# Right. Grep first to locate, then Read only matched
def find_function_usages(name: str) -> list[dict]:
    """Grep + Glob first, Read only what matched."""
    # Step 1: Grep finds the pattern across the repo
    grep_result = run_tool("Grep", {
        "pattern": rf"\b{name}\b",
        "type": "ts",
        "output_mode": "files_with_matches",  # NOT 'content'. Too verbose
    })
    matched_files = grep_result["files"]  # e.g. 12 files, not 1000

    # Step 2: Glob filters further if needed
    if len(matched_files) > 30:
        # Too many. Narrow by path
        matched_files = [f for f in matched_files if "src/" in f]

    # Step 3: Read only the files that matched
    file_contents = {}
    for f in matched_files[:20]:  # cap at 20 to bound context
        file_contents[f] = run_tool("Read", {"file_path": f})

    return [{"file": f, "content": c} for f, c in file_contents.items()]
```

**TypeScript:**

```typescript
// Wrong. Read everything, hope for the best
// for (const f of glob.sync("**/*.ts")) {
//   const content = readFile(f); // 1000 reads, context floods, agent gets lost
// }

// Right. Grep first to locate, then Read only matched
async function findFunctionUsages(name: string) {
  // Step 1: Grep finds the pattern across the repo
  const grepResult = await runTool("Grep", {
    pattern: `\\b${name}\\b`,
    type: "ts",
    output_mode: "files_with_matches", // NOT 'content'. Too verbose
  });
  let matchedFiles: string[] = grepResult.files;

  // Step 2: Glob filters further if needed
  if (matchedFiles.length > 30) {
    matchedFiles = matchedFiles.filter((f) => f.includes("src/"));
  }

  // Step 3: Read only the files that matched
  const out: Array<{ file: string; content: string }> = [];
  for (const f of matchedFiles.slice(0, 20)) {
    const content = await runTool("Read", { file_path: f });
    out.push({ file: f, content });
  }
  return out;
}
```

Concept: `context-window`

### 3. Distribute tools across 2-3 specialised subagents

One 15-tool agent routes worse than three 5-tool specialists. Define Reader (Grep + Glob + Read), Reviewer (Read + Grep + Bash test/lint), Doc-Generator (Read + Write + Glob). Each runs in its own context. The lead agent coordinates and merges results. Routing accuracy stays high because each specialist has a clear domain.

**Python:**

```python
# Specialist subagent definitions
SPECIALISTS = {
    "reader": {
        "system": "You are a codebase reader. Use Grep+Glob to locate, "
                  "then Read only what matched. Never modify files.",
        "tools": ["Read", "Grep", "Glob"],
    },
    "reviewer": {
        "system": "You are an independent code reviewer. Fresh context, "
                  "no carry-over from the writer. Check against .claude/rules/.",
        "tools": ["Read", "Grep", "Bash(npm test, npm run lint, ruff check)"],
    },
    "docgen": {
        "system": "You generate docs from source. Use the code as ground truth.",
        "tools": ["Read", "Write(docs/**, *.md)", "Glob"],
    },
}

def spawn_specialist(role: str, task: str) -> dict:
    spec = SPECIALISTS[role]
    return client.messages.create(
        model="claude-sonnet-4.5",
        max_tokens=2048,
        system=spec["system"],
        tools=load_tools(spec["tools"]),
        messages=[{"role": "user", "content": task}],
    )

# Lead agent coordinates
def refactor_function(name: str, new_name: str):
    locations = spawn_specialist("reader", f"Find all usages of {name}")
    proposed_diff = generate_renames(locations, new_name)
    review = spawn_specialist("reviewer", f"Review this diff: {proposed_diff}")
    if review["verdict"] != "approve":
        return {"status": "review_failed", "issues": review["issues"]}
    return {"status": "ready_to_apply", "diff": proposed_diff, "review": review}
```

**TypeScript:**

```typescript
// Specialist subagent definitions
const SPECIALISTS = {
  reader: {
    system:
      "You are a codebase reader. Use Grep+Glob to locate, " +
      "then Read only what matched. Never modify files.",
    tools: ["Read", "Grep", "Glob"],
  },
  reviewer: {
    system:
      "You are an independent code reviewer. Fresh context, " +
      "no carry-over from the writer. Check against .claude/rules/.",
    tools: ["Read", "Grep", "Bash(npm test, npm run lint, ruff check)"],
  },
  docgen: {
    system: "You generate docs from source. Use the code as ground truth.",
    tools: ["Read", "Write(docs/**, *.md)", "Glob"],
  },
} as const;

async function spawnSpecialist(
  role: keyof typeof SPECIALISTS,
  task: string,
) {
  const spec = SPECIALISTS[role];
  return client.messages.create({
    model: "claude-sonnet-4.5",
    max_tokens: 2048,
    system: spec.system,
    tools: loadTools(spec.tools as readonly string[]),
    messages: [{ role: "user", content: task }],
  });
}

// Lead agent coordinates
async function refactorFunction(name: string, newName: string) {
  const locations = await spawnSpecialist("reader", `Find all usages of ${name}`);
  const proposedDiff = generateRenames(locations, newName);
  const review = await spawnSpecialist("reviewer", `Review this diff: ${proposedDiff}`);
  if (review.verdict !== "approve") {
    return { status: "review_failed" as const, issues: review.issues };
  }
  return { status: "ready_to_apply" as const, diff: proposedDiff, review };
}
```

Concept: `subagents`

### 4. Spawn the reviewer in a SEPARATE session

If the reviewer shares context with the writer, it inherits the writer's assumptions and rubber-stamps. The fix is structural: the reviewer subagent runs with a fresh messages array, fresh system prompt, no inherited tool history. It sees only the diff + the project rules. The independence is the entire point. Confirmation bias is structural, not philosophical.

**Python:**

```python
def review_diff_independently(diff: str, file_paths: list[str]) -> dict:
    """Spawn a fresh-context reviewer. NO carry-over from the writer."""
    REVIEWER_SYSTEM = """You are an independent code reviewer.

You have NO context from the agent that wrote this diff. You see only:
1. The diff itself
2. The current state of the affected files
3. The project rules in .claude/rules/

Check for:
- Style conformance (matches .claude/rules/)
- Test coverage (does the diff change behavior without test changes?)
- Type safety (any unsafe casts or 'any' added?)
- API breakage (did a public signature change?)

Return JSON: { verdict: 'approve'|'request_changes', issues: [{ file, line, severity, message }], summary }"""

    # CRITICAL: fresh messages array, fresh system, fresh tool list
    return client.messages.create(
        model="claude-sonnet-4.5",
        max_tokens=2048,
        system=REVIEWER_SYSTEM,
        tools=[READ_TOOL, GREP_TOOL, BASH_TEST_LINT_TOOL],  # no Edit/Write
        messages=[{
            "role": "user",
            "content": f"Diff to review:\n{diff}\n\nAffected files: {file_paths}",
        }],
    )
```

**TypeScript:**

```typescript
async function reviewDiffIndependently(diff: string, filePaths: string[]) {
  // Spawn a fresh-context reviewer. NO carry-over from the writer.
  const REVIEWER_SYSTEM = `You are an independent code reviewer.

You have NO context from the agent that wrote this diff. You see only:
1. The diff itself
2. The current state of the affected files
3. The project rules in .claude/rules/

Check for:
- Style conformance (matches .claude/rules/)
- Test coverage (does the diff change behavior without test changes?)
- Type safety (any unsafe casts or 'any' added?)
- API breakage (did a public signature change?)

Return JSON: { verdict: 'approve'|'request_changes', issues: [{ file, line, severity, message }], summary }`;

  // CRITICAL: fresh messages array, fresh system, fresh tool list
  return client.messages.create({
    model: "claude-sonnet-4.5",
    max_tokens: 2048,
    system: REVIEWER_SYSTEM,
    tools: [READ_TOOL, GREP_TOOL, BASH_TEST_LINT_TOOL], // no Edit/Write
    messages: [
      {
        role: "user",
        content: `Diff to review:\n${diff}\n\nAffected files: ${filePaths.join(", ")}`,
      },
    ],
  });
}
```

Concept: `evaluation`
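
The reviewer returns its verdict as JSON inside a text response, so the lead agent still has to extract and validate it. A minimal sketch against the verdict schema above; the fail-closed fallback (treat unparseable output as `request_changes`) is a suggested default, not part of the scenario:

```python
# Sketch: parse the reviewer's structured verdict, failing closed.
import json

def parse_review_verdict(raw_text: str) -> dict:
    """Extract and validate the reviewer's JSON verdict from model text."""
    fallback = {"verdict": "request_changes", "issues": [],
                "summary": "unparseable review output"}
    # The model may wrap the JSON in prose; grab the outermost object
    start, end = raw_text.find("{"), raw_text.rfind("}")
    if start == -1 or end <= start:
        return fallback
    try:
        verdict = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return fallback
    if verdict.get("verdict") not in ("approve", "request_changes"):
        return fallback  # unknown verdict: treat as not approved
    verdict.setdefault("issues", [])
    return verdict
```

Failing closed matters here: an unparseable review should block the change, not silently pass it through to the 4D gate.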

### 5. Use a scratchpad file for cross-turn state

Long productivity tasks (rename across 50 files, refactor 3 services) span many turns. Don't try to keep all state in the message history. Write it to .claude/scratchpad.md. The lead agent reads it at the start of every turn, updates it after each subagent returns, and trims old entries. The scratchpad is the agent's working memory; the message history is the audit trail.

**Python:**

```python
# .claude/scratchpad.md. Agent working memory across turns
from pathlib import Path

SCRATCHPAD = Path(".claude/scratchpad.md")

def update_scratchpad(section: str, content: str):
    """Replace the named section in place, or append it if missing."""
    if not SCRATCHPAD.exists():
        SCRATCHPAD.parent.mkdir(exist_ok=True)
        SCRATCHPAD.write_text("# Agent scratchpad\n\n")
    text = SCRATCHPAD.read_text()
    marker = f"## {section}"
    if marker in text:
        # Replace the existing section, keeping everything after the next heading
        before, _, rest = text.partition(marker)
        _, sep, after = rest.partition("\n## ")
        new = before + marker + "\n" + content + ("\n## " + after if sep else "\n")
    else:
        new = text + "\n" + marker + "\n" + content + "\n"
    SCRATCHPAD.write_text(new)

# Usage in the agent loop
update_scratchpad(
    "Rename Task: FooBar -> BazQux",
    "Files matched: 12. Already updated: 8. Remaining: src/old.ts, src/legacy.ts.",
)
```

**TypeScript:**

```typescript
// .claude/scratchpad.md. Agent working memory across turns
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";

const SCRATCHPAD = ".claude/scratchpad.md";

export function updateScratchpad(section: string, content: string) {
  if (!existsSync(SCRATCHPAD)) {
    mkdirSync(".claude", { recursive: true });
    writeFileSync(SCRATCHPAD, "# Agent scratchpad\n\n");
  }
  const text = readFileSync(SCRATCHPAD, "utf8");
  const marker = `## ${section}`;
  let next: string;
  if (text.includes(marker)) {
    // Replace existing section
    const before = text.slice(0, text.indexOf(marker));
    const rest = text.slice(text.indexOf(marker) + marker.length);
    const nextMarker = rest.indexOf("\n## ");
    const after = nextMarker >= 0 ? rest.slice(nextMarker) : "";
    next = `${before}${marker}\n${content}${after}`;
  } else {
    next = `${text}\n${marker}\n${content}`;
  }
  writeFileSync(SCRATCHPAD, next);
}

// Usage in the agent loop
updateScratchpad(
  "Rename Task: FooBar -> BazQux",
  "Files matched: 12. Already updated: 8. Remaining: src/old.ts, src/legacy.ts.",
);
```

Concept: `claude-md-hierarchy`

### 6. Wire MCP servers per language extension

Multi-language repos benefit from language-specific MCP servers: pyright for Python types, tsc --noEmit for TS, eslint and prettier for JS, ruff for Python lint. Don't load all MCP servers at session start; route by file extension when a tool needs them. This keeps context lean and reduces overhead.

**Python:**

```python
# Per-extension MCP routing
MCP_BY_EXTENSION = {
    ".ts": ["tsc", "eslint", "prettier"],
    ".tsx": ["tsc", "eslint", "prettier"],
    ".py": ["pyright", "ruff", "black"],
    ".go": ["gopls", "gofmt"],
    ".rs": ["rust-analyzer", "rustfmt"],
}

def select_mcp_servers(file_paths: list[str]) -> list[str]:
    """Pick the minimum set of MCP servers needed for these files."""
    extensions = {Path(f).suffix for f in file_paths}
    servers = set()
    for ext in extensions:
        for server in MCP_BY_EXTENSION.get(ext, []):
            servers.add(server)
    return sorted(servers)

# In the lead agent before spawning the reviewer
def review_with_mcp(diff: str, files: list[str]):
    mcp_servers = select_mcp_servers(files)
    return spawn_specialist(
        "reviewer",
        f"Review diff: {diff}\nUse MCP servers: {mcp_servers}",
    )
```

**TypeScript:**

```typescript
// Per-extension MCP routing
const MCP_BY_EXTENSION: Record<string, string[]> = {
  ".ts": ["tsc", "eslint", "prettier"],
  ".tsx": ["tsc", "eslint", "prettier"],
  ".py": ["pyright", "ruff", "black"],
  ".go": ["gopls", "gofmt"],
  ".rs": ["rust-analyzer", "rustfmt"],
};

export function selectMcpServers(filePaths: string[]): string[] {
  const exts = new Set(filePaths.map((f) => "." + (f.split(".").pop() ?? "")));
  const servers = new Set<string>();
  for (const ext of exts) {
    for (const s of MCP_BY_EXTENSION[ext] ?? []) servers.add(s);
  }
  return [...servers].sort();
}

// In the lead agent before spawning the reviewer
async function reviewWithMcp(diff: string, files: string[]) {
  const mcpServers = selectMcpServers(files);
  return spawnSpecialist(
    "reviewer",
    `Review diff: ${diff}\nUse MCP servers: ${mcpServers.join(", ")}`,
  );
}
```

Concept: `mcp`

### 7. Implement the 4D approval gate

The 4D Framework: Delegation (agent proposes) → Description (clear output) → Discernment (human reviews) → Diligence (human approves). Code-changing actions are gated behind explicit human approval. The agent presents the diff, the reviewer's verdict, and a one-paragraph summary; the human approves or rejects. No auto-merge for productivity tasks; the gate exists because the cost of a bad merge is far higher than the friction of one click.

**Python:**

```python
from typing import Literal, TypedDict

class ApprovalRequest(TypedDict):
    delegation: str    # what the agent proposes (1 sentence)
    description: str   # the diff + summary of changes
    discernment: dict  # reviewer subagent's verdict + issues
    risk_level: Literal["low", "medium", "high"]

def request_approval(req: ApprovalRequest) -> Literal["approve", "reject"]:
    """4D gate. ALWAYS waits for human input. No auto-approve."""
    print("=" * 60)
    print(f"DELEGATION: {req['delegation']}")
    print(f"\nDESCRIPTION:\n{req['description']}")
    print(f"\nDISCERNMENT (review verdict): {req['discernment']['verdict']}")
    if req['discernment'].get('issues'):
        print(f"  Issues:")
        for issue in req['discernment']['issues']:
            print(f"  - {issue['severity']}: {issue['message']}")
    print(f"\nRISK LEVEL: {req['risk_level']}")
    print("=" * 60)
    print("DILIGENCE: Approve? [y/N]: ", end="")
    return "approve" if input().strip().lower() == "y" else "reject"

# Usage in the lead agent
def apply_refactor_safely(diff: str, files: list[str], reviewer_verdict: dict):
    decision = request_approval({
        "delegation": f"Apply rename refactor across {len(files)} files",
        "description": f"Diff:\n{diff}",
        "discernment": reviewer_verdict,
        "risk_level": "medium" if len(files) > 20 else "low",
    })
    if decision == "approve":
        run_tool("Edit", {"diff": diff})
    else:
        print("Refactor rejected by human; no files changed.")
```

**TypeScript:**

```typescript
interface ApprovalRequest {
  delegation: string;     // what the agent proposes (1 sentence)
  description: string;    // the diff + summary of changes
  discernment: {          // reviewer subagent's verdict + issues
    verdict: "approve" | "request_changes";
    issues: Array<{ severity: string; message: string }>;
  };
  risk_level: "low" | "medium" | "high";
}

export async function requestApproval(req: ApprovalRequest): Promise<"approve" | "reject"> {
  // 4D gate. ALWAYS waits for human input. No auto-approve.
  console.log("=".repeat(60));
  console.log(`DELEGATION: ${req.delegation}`);
  console.log(`\nDESCRIPTION:\n${req.description}`);
  console.log(`\nDISCERNMENT (review verdict): ${req.discernment.verdict}`);
  if (req.discernment.issues?.length) {
    console.log("  Issues:");
    for (const issue of req.discernment.issues) {
      console.log(`  - ${issue.severity}: ${issue.message}`);
    }
  }
  console.log(`\nRISK LEVEL: ${req.risk_level}`);
  console.log("=".repeat(60));
  process.stdout.write("DILIGENCE: Approve? [y/N]: ");
  const answer = await new Promise<string>((res) => {
    process.stdin.once("data", (d) => res(d.toString()));
  });
  return answer.trim().toLowerCase() === "y" ? "approve" : "reject";
}

// Usage in the lead agent
export async function applyRefactorSafely(
  diff: string,
  files: string[],
  reviewerVerdict: ApprovalRequest["discernment"],
) {
  const decision = await requestApproval({
    delegation: `Apply rename refactor across ${files.length} files`,
    description: `Diff:\n${diff}`,
    discernment: reviewerVerdict,
    risk_level: files.length > 20 ? "medium" : "low",
  });
  if (decision === "approve") {
    await runTool("Edit", { diff });
  } else {
    console.log("Refactor rejected by human; no files changed.");
  }
}
```

Concept: `4d-framework`

### 8. Use Bash structured output, not regex

Tools that emit JSON (gh pr diff --json, git log --pretty=format:%H%x09%s, npm test --reporter json, eslint --format json) replace fragile prose-parsing. Even simple commands have structured-output flags (git status --porcelain, ls -la --time-style=long-iso). Always prefer them. Regex on free-form output is the reason most agent pipelines break in the third week of production.

**Python:**

```python
# Wrong. Regex on prose output
# output = run("git log --oneline -10")
# commits = re.findall(r"^([a-f0-9]{7,}) (.+)$", output, re.M)
# # Breaks when commit message contains a colon or special chars

# Right. Structured output, parse JSON
import json, subprocess

def recent_commits(n: int = 10) -> list[dict]:
    """git log with structured output, no regex needed."""
    raw = subprocess.check_output(
        ["git", "log", f"-{n}",
         "--pretty=format:%H%x09%h%x09%s%x09%an%x09%aI"],
        text=True,
    )
    return [
        dict(zip(["sha", "short", "subject", "author", "date"], line.split("\t")))
        for line in raw.splitlines() if line
    ]

def lint_results(file: str) -> list[dict]:
    """eslint --format json gives a structured contract."""
    # eslint exits 1 when it finds problems; the JSON is still on stdout,
    # so don't use check_output (which raises on nonzero exit)
    proc = subprocess.run(
        ["npx", "eslint", "--format", "json", file],
        capture_output=True, text=True,
    )
    return json.loads(proc.stdout)  # array of { filePath, messages: [{ ruleId, severity, line, message }] }

def pr_diff_files() -> list[str]:
    """gh pr diff --name-only. One file per line, no parsing needed."""
    return subprocess.check_output(
        ["gh", "pr", "diff", "--name-only"],
        text=True,
    ).strip().split("\n")
```

**TypeScript:**

```typescript
// Wrong. Regex on prose output
// const output = execSync("git log --oneline -10").toString();
// const commits = [...output.matchAll(/^([a-f0-9]{7,}) (.+)$/gm)];
// // Breaks when commit message contains a colon or special chars

// Right. Structured output, parse JSON
import { execSync } from "node:child_process";

interface Commit {
  sha: string;
  short: string;
  subject: string;
  author: string;
  date: string;
}

export function recentCommits(n = 10): Commit[] {
  // git log with structured output, no regex needed.
  const raw = execSync(
    `git log -${n} --pretty=format:%H%x09%h%x09%s%x09%an%x09%aI`,
    { encoding: "utf8" },
  );
  return raw
    .split("\n")
    .filter(Boolean)
    .map((line) => {
      const [sha, short, subject, author, date] = line.split("\t");
      return { sha, short, subject, author, date };
    });
}

export function lintResults(file: string) {
  // eslint --format json gives a structured contract.
  // eslint exits 1 when it finds problems; execSync throws, but the
  // JSON is still on the error's stdout.
  try {
    const raw = execSync(`npx eslint --format json ${file}`, { encoding: "utf8" });
    return JSON.parse(raw);
  } catch (err: any) {
    return JSON.parse(err.stdout); // array of { filePath, messages: [{ ruleId, severity, line, message }] }
  }
}

export function prDiffFiles(): string[] {
  // gh pr diff --name-only. One file per line, no parsing needed.
  return execSync("gh pr diff --name-only", { encoding: "utf8" })
    .trim()
    .split("\n");
}
```

Concept: `tool-calling`
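
The same pattern covers git status --porcelain, mentioned in the definition of done but not shown above. A minimal sketch; the parser is split out so it works on any captured output (rename entries keep their `old -> new` path verbatim):

```python
# Sketch: git status --porcelain emits one fixed-format line per file:
# a two-character XY status code, a space, then the path. No regex on prose.
import subprocess

def parse_porcelain(raw: str) -> list[dict]:
    """Parse `git status --porcelain` output into structured entries."""
    entries = []
    for line in raw.splitlines():
        if len(line) < 4:
            continue  # skip anything too short to be a status line
        entries.append({"status": line[:2].strip(), "path": line[3:]})
    return entries

def dirty_files() -> list[dict]:
    raw = subprocess.check_output(["git", "status", "--porcelain"], text=True)
    return parse_porcelain(raw)
```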

## Decision matrix

| Decision | Right answer | Wrong answer | Why |
|---|---|---|---|
| Multi-file refactoring task (20+ files) | Distribute work across 2-3 specialist subagents (reader → reviewer → applier); 4-5 tools each | One monolithic 15-tool agent does everything in one session | Routing accuracy drops ~8% per tool past 5; 15 tools degrades quality dramatically. Specialist subagents with narrow tool whitelists keep accuracy high and contexts clean. |
| Exploring a 1000-file codebase for usages of a function | Grep + Glob first to locate matched files, then Read only those | Read every file to 'understand the codebase' before deciding | Read-everything-first floods context and the agent loses track of the original task. Grep returns matched files in seconds with structured output; Read narrows to ~10-20 files instead of ~1000. |
| Reviewing the agent's own code changes | Spawn an independent reviewer subagent with fresh context, no inherited history | Same session generates AND reviews the code | Same-session review inherits the writer's assumptions and rubber-stamps. Confirmation bias is structural here, not philosophical. Independence at the architectural layer is the only fix. |
| Auto-merging the agent's code changes | 4D Framework: Delegation → Description → Discernment → Diligence (human click) | Auto-merge if reviewer subagent says 'approve' | The cost of a bad merge (broken main, rolled-back deploy, lost time across the team) is far higher than the friction of one human click. The 4D gate exists because trust in agent output is best earned incrementally with humans in the loop on destructive actions. |
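
The distribution rule in the first row can be enforced mechanically with a tool-count audit. A minimal sketch, assuming illustrative subagent names and tool lists (this is not an official API, just a lint you could run over your own topology config):

```typescript
// Illustrative subagent definitions; names and tool lists are examples only.
type Subagent = { name: string; tools: string[] };

const MAX_TOOLS = 5; // routing accuracy degrades past this point

export function auditToolCounts(agents: Subagent[]): string[] {
  // Returns a list of violations; an empty array means the topology passes.
  const violations: string[] = [];
  for (const agent of agents) {
    if (agent.tools.length > MAX_TOOLS) {
      violations.push(
        `${agent.name}: ${agent.tools.length} tools (max ${MAX_TOOLS})`,
      );
    }
  }
  return violations;
}

// Example topology: three specialists, each well under the limit.
const team: Subagent[] = [
  { name: "Reader", tools: ["Read", "Grep", "Glob"] },
  { name: "Reviewer", tools: ["Read", "Grep", "Bash(test)", "Bash(lint)"] },
  { name: "DocGen", tools: ["Read", "Write", "Glob"] },
];
```

Running this in CI against the subagent definitions makes the "≤ 5 tools, no exceptions" rule an audit gate rather than a convention.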

## Failure modes

| Anti-pattern | Failure | Fix |
|---|---|---|
| AP-25 · Monolithic agent with 15+ tools | Single agent has 15 tools (Read, Write, Edit, Bash, Grep, Glob, WebSearch, gh, npm, jest, eslint, prettier, git, jq, sed). Tool selection accuracy drops; the agent calls Bash(cat) when Read would do, alternates between similar tools, and occasionally hangs comparing options. | Distribute tools across 2-3 specialised subagents (Reader, Reviewer, Doc-gen). Each subagent has 4-5 tools maximum. Routing accuracy stays high because each specialist has a clear tool set and a clear job. |
| AP-26 · Auto-merge without human approval | Agent generates a refactor diff, the reviewer subagent approves it, and the workflow auto-merges. A subtle regression ships to main. Rollback takes 40 min; the team's trust in the agent drops for the rest of the quarter. | 4D Framework: Delegation (agent proposes) → Description (clear output) → Discernment (reviewer verdict) → Diligence (human clicks approve). The human gate is non-negotiable for code-changing actions. |
| AP-27 · Code review in the same session as code generation | Same Claude session writes the code AND reviews it. The reviewer agrees with the writer on every choice. Same context, same assumptions, same blind spots. Confirmation bias passes through; bugs reach production. | Spawn the reviewer as an independent subagent: fresh messages array, fresh system prompt, no inherited history. It sees only the diff + the project rules. Independence is structural, not philosophical. |
| AP-28 · Read-first codebase exploration | Agent runs Read on every file in src/ to 'understand the codebase'. By file 80, context is full of irrelevant content; the agent has lost the original question. Returns generic recommendations instead of specific matches. | Grep + Glob first to locate; Read only matched files. The exploration sequence is non-negotiable: locate → narrow → read. On a 1000-file repo, this is the difference between 1000 Reads and 12. |
| AP-29 · Fragile regex on tool output | Workflow parses git log --oneline output with regex to extract commit SHAs. A commit message containing parentheses or a colon breaks the regex; the workflow silently misses commits or attributes them to the wrong author. | Use structured output flags (git log --pretty=format:%H%x09%s, npm test --reporter json, eslint --format json, gh pr diff --json). Parse JSON or fixed-delimiter output. Regex over prose is the reason most agent pipelines break in the third week. |
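
The AP-28 fix is a pure narrowing pipeline: locate, narrow, then read. A sketch, with an in-memory map standing in for the repo and each function standing in for a Grep, Glob, or Read tool call (file names and the search symbol are illustrative):

```typescript
// In-memory stand-in for a repo: path -> content.
type Repo = Map<string, string>;

// Step 1 (Grep): one pass over the index returns only matching paths.
export function locate(repo: Repo, pattern: RegExp): string[] {
  return Array.from(repo.entries())
    .filter(([, content]) => pattern.test(content))
    .map(([path]) => path);
}

// Step 2 (Glob): narrow further by path prefix, e.g. only src/.
export function narrow(paths: string[], prefix: string): string[] {
  return paths.filter((p) => p.startsWith(prefix));
}

// Step 3 (Read): read only the narrowed set -- 12 files, not 1,000.
export function readMatched(repo: Repo, paths: string[]): string[] {
  return paths.map((p) => repo.get(p) ?? "");
}
```

The sequence only ever pulls matched-file content into context; everything before the final step operates on paths, which are cheap.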

## Implementation checklist

- [ ] Codebase context loader runs once per session, caches to .claude/context-cache.json (`claude-md-hierarchy`)
- [ ] Grep + Glob exploration sequence enforced; never Read-first across many files (`context-window`)
- [ ] Specialist subagents defined: Reader, Reviewer, Doc-gen. Each with 4-5 tools (`subagents`)
- [ ] Reviewer subagent runs in fresh context (no inherited writer history) (`evaluation`)
- [ ] Tool whitelists scoped per specialist (no Edit/Write on Reader; no Edit on Reviewer) (`tool-calling`)
- [ ] Scratchpad file (.claude/scratchpad.md) persists cross-turn agent state
- [ ] MCP servers selected per file extension; not all loaded at session start (`mcp`)
- [ ] 4D Framework approval gate before any auto-merge or destructive action (`4d-framework`)
- [ ] Bash structured-output flags used everywhere (--format json, --porcelain, etc.)
- [ ] Tool count audit per subagent: ≤ 5 tools, no exceptions
- [ ] Per-task telemetry: tool selection accuracy, reviewer verdict distribution, approval-gate latency
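
The scratchpad size bound can be sketched as a pure split over `## ` sections. Everything here is illustrative except the 20-section limit, which mirrors the production-readiness bound; a real implementation would also persist the archived half:

```typescript
// Keep the newest `limit` "## " sections of a scratchpad live and return
// the older ones for archiving. Any preamble before the first "## " heading
// counts as one section in this sketch.
export function boundScratchpad(
  markdown: string,
  limit = 20,
): { live: string; archived: string } {
  // Each section starts at a line beginning with "## ".
  const sections = markdown.split(/^(?=## )/m);
  const cut = Math.max(0, sections.length - limit);
  return {
    live: sections.slice(cut).join(""),
    archived: sections.slice(0, cut).join(""),
  };
}
```

Run it whenever the agent writes `.claude/scratchpad.md`; the live half stays small enough to re-load into context every turn.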

## Cost & latency

- **Codebase exploration (Grep + Read 10-15 matched files):** ~$0.003-0.008 per task. Grep is one tool call (~200 tokens output in files-with-matches mode); Read on 10-15 files at ~2K tokens each is ~25K input + ~500 output, ~$0.005 typical. Avoiding read-everything saves orders of magnitude on big repos.
- **Code-review subagent (per change-set):** ~$0.012 per review. The Reviewer reads the diff (~3K tokens) + affected files (~6K) + .claude/rules (~500) and emits a structured verdict (~1K), at Sonnet 4.5 prices. Cheap insurance against a bad merge.
- **Doc-generation subagent (per source file):** ~$0.024 per file. Doc-gen reads source (~4K), writes JSDoc/docstrings (~2K output), and writes a README section if asked (~3K output). Worth running on every commit back to main to keep docs current.
- **Full refactor workflow end-to-end:** ~$0.04-0.08 per refactor. Profile + Grep + Read 12 files + propose diff + review + approve + apply sums to ~$0.05 typical. The 4D gate adds zero token cost (human input); the reviewer subagent is the dominant line item.
- **MCP server overhead (when loaded):** ~+10-15% per call. MCP context (language-server output, lint diagnostics) adds ~500-2K tokens per call. Routed per file extension, so the cost only applies when language-specific context actually helps.
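
The per-task figures above are just token arithmetic. A sketch of the calculation; the prices in `PRICE` are placeholders, not current Anthropic rates (real per-million-token prices vary by model and change over time):

```typescript
// Illustrative USD prices per million tokens. Placeholders only -- check
// current model pricing before relying on these numbers.
const PRICE = { inputPerM: 3, outputPerM: 15 };

export function taskCost(
  inputTokens: number,
  outputTokens: number,
  price = PRICE,
): number {
  return (
    (inputTokens / 1e6) * price.inputPerM +
    (outputTokens / 1e6) * price.outputPerM
  );
}

// Exploration example from the list above: ~25K input + ~500 output.
const exploration = taskCost(25_000, 500);
```

Plugging each line item's token estimates into the same formula is how the per-task dollar figures are derived.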

## Domain weights

- **D1 · Agentic Architectures (27%):** Subagent distribution + lead agent loop + 4D Framework gate
- **D2 · Tool Design + Integration (18%):** Built-in tool selection + per-subagent tool whitelist + Bash structured output + MCP routing

## Practice questions

### Q1. A developer-productivity agent explores a 1,000-file codebase looking for references to a function. It uses Read on every file and runs out of context after 200 files. How should you fix this?

Replace Read-everything with the locate-then-read sequence: Grep + Glob first, Read only matched files. Step 1: Grep(pattern: '\\bFunctionName\\b', output_mode: 'files_with_matches') returns the 12 files that actually contain the symbol. Step 2: Glob narrows further if needed (e.g. only src/). Step 3: Read only the matched files. On a 1,000-file repo this is the difference between 1,000 Reads and 12. That's orders of magnitude saved on tokens, and the difference between a focused agent and a context-flooded one. Tagged to AP-28.

### Q2. An agent auto-generates and auto-reviews code in a single Claude session. The reviewer always agrees with the writer, even when bugs slip through. What architectural change fixes this?

Spawn the reviewer as an independent subagent with a fresh messages array, fresh system prompt, and a tool whitelist scoped to [Read, Grep, Bash(test, lint)]. No Edit, no Write. The reviewer sees only the diff + the project rules; it has zero context from the writer. Confirmation bias is structural in same-session review; only architectural independence eliminates it. Tagged to AP-27.
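
Independence here is just "construct the request from scratch". A minimal sketch of building the reviewer's payload with no inherited history; the message shape loosely follows the Anthropic Messages API, while the system prompt, tag wrappers, and field names are illustrative:

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Build the reviewer request from scratch: fresh system prompt, a messages
// array containing ONLY the diff and the project rules -- no writer history.
export function reviewerRequest(diff: string, projectRules: string) {
  return {
    system:
      "You are an independent code reviewer. Judge only the diff below " +
      "against the project rules. Respond with a structured verdict.",
    messages: [
      {
        role: "user",
        content: `<rules>\n${projectRules}\n</rules>\n\n<diff>\n${diff}\n</diff>`,
      },
    ] as Message[],
    // Tool whitelist scoped to read-only inspection: no Edit, no Write.
    allowedTools: ["Read", "Grep", "Bash(test)", "Bash(lint)"],
  };
}
```

Because the writer's messages array is never passed in, there is nothing for the reviewer to inherit; independence is enforced by construction, not by prompt wording.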

### Q3. You're building a documentation agent that processes 12 source files across React components, API routes, and database queries. Your first attempt is one agent with 15 tools. How many specialised subagents should you create?

Three specialists, 4-5 tools each: a Reader (Read + Grep + Glob) for codebase exploration, a Reviewer (Read + Grep + Bash test/lint) for verdicts, and a Doc-Generator (Read + Write + Glob) for output. Distribution beats monolithic 15-tool agents because routing accuracy drops ~8% per tool past 5. Each specialist has a clear domain and a clear tool set, and they coordinate via a lead agent. Tagged to AP-25.

### Q4. Your productivity agent successfully rewrites a critical function and immediately auto-merges the change to main. A subtle regression ships to production. What gate was missing?

The 4D Framework approval gate. Delegation (agent proposes the change), Description (the diff + summary), Discernment (the reviewer subagent's verdict), and Diligence (a human clicks approve). The human-click step is non-negotiable for code-changing actions because the cost of a bad merge (broken main, rollback time, lost team trust) is orders of magnitude higher than the friction of one approval click. Auto-merge defeats the entire architecture. Tagged to AP-26.
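
The gate can be encoded so that merge is unreachable until all four stages have fired in order, with Diligence being the human click. A sketch; the stage names come from the framework, everything else (class name, method shape) is illustrative:

```typescript
const STAGES = ["delegation", "description", "discernment", "diligence"] as const;
type Stage = (typeof STAGES)[number];

export class FourDGate {
  private done = new Set<Stage>();

  // Record a completed stage; stages must fire in framework order, and
  // "diligence" is only ever called from the human-approval handler.
  complete(stage: Stage): void {
    const idx = STAGES.indexOf(stage);
    if (idx > 0 && !this.done.has(STAGES[idx - 1])) {
      throw new Error(`${stage} requires ${STAGES[idx - 1]} first`);
    }
    this.done.add(stage);
  }

  // The merge path checks this; it cannot pass without the human click.
  canMerge(): boolean {
    return STAGES.every((s) => this.done.has(s));
  }
}
```

Auto-merge bugs become type-level impossible to the extent that the only code path calling `complete("diligence")` is the approval-button handler.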

### Q5. An agent uses git log --oneline and parses the output with regex to extract commit SHAs and messages. After a commit message containing parentheses ships, the workflow silently misattributes commits. What's the rule?

Always use structured-output flags. Replace git log --oneline parsed by regex with git log --pretty=format:%H%x09%h%x09%s%x09%an%x09%aI (tab-delimited, no ambiguity). Or --pretty=format:%H then git show for full info. Same applies to npm test --reporter json, eslint --format json, gh pr diff --json. Regex over prose is the reason most agent pipelines break in their third week. Structured output is a contract; regex on prose is a guess. Tagged to AP-29.

## FAQ

### Q1. Should I use Read or Grep to explore a large codebase?

Grep first, always. Grep finds the symbol or pattern across the whole repo in one tool call (returns matched files + line numbers). Then Read only the matched files. Read-everything-first is the single biggest token waste in productivity workflows. And it floods context, which makes the agent worse, not just slower.

### Q2. Can an agent review code it just generated in the same session?

Not effectively. Same-session review inherits the writer's context, assumptions, and blind spots. The reviewer rubber-stamps because it shares the writer's mental model. Spawn an independent reviewer subagent with fresh context: it sees only the diff + the project rules, with no carry-over. Independence is structural, not philosophical.

### Q3. How many tools should a developer-productivity agent have?

4-5 per subagent, distributed across 2-3 specialists. One 15-tool agent loses ~8% routing accuracy per tool past 5. By 15 tools, the agent is alternating between similar options and missing obvious matches. Three specialists with clear tool sets (Reader / Reviewer / Doc-gen) outperform one generalist on every dimension.

### Q4. Do I need MCP servers for code-productivity workflows?

No, but they help in multi-language repos. Built-in tools (Read, Grep, Edit, Bash) cover most tasks. MCP servers add language-specific context: pyright for Python types, tsc --noEmit for TS, ruff for Python lint, eslint for JS. Route per file extension. Don't load all MCP servers at session start.

### Q5. Should productivity agents auto-merge code?

No. The 4D Framework gate is non-negotiable for code-changing actions: Delegation → Description → Discernment → Diligence (human click). The cost of a bad merge is far higher than the friction of one click. If the team really wants automation, the right move is to auto-create the PR, run the reviewer subagent, and post the verdict. But the human still clicks merge.

### Q6. How do you handle multi-language codebases?

Per-extension routing for both subagents and MCP servers. The Reader subagent uses Grep with --type ts or --type py to narrow searches by language. The Reviewer subagent loads pyright for .py files and tsc for .ts files. The Doc-Generator emits language-appropriate doc syntax (JSDoc for JS/TS, docstrings for Python). One agent topology, multiple language plug-ins.
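
Per-extension routing can be sketched as a small lookup table. The checker names (pyright, tsc, ruff, eslint) come from the text above; the table shape and doc-style field are illustrative, and a real repo would extend the map:

```typescript
// Illustrative routing table: which language-specific tooling to load per
// file extension. Unknown extensions fall back to built-in tools only.
type Route = { typeChecker?: string; linter: string; docStyle: string };

const ROUTES: Record<string, Route> = {
  ".ts": { typeChecker: "tsc --noEmit", linter: "eslint", docStyle: "jsdoc" },
  ".js": { linter: "eslint", docStyle: "jsdoc" },
  ".py": { typeChecker: "pyright", linter: "ruff", docStyle: "docstring" },
};

export function routeFor(path: string): Route | null {
  const dot = path.lastIndexOf(".");
  return dot === -1 ? null : ROUTES[path.slice(dot)] ?? null;
}
```

The lead agent consults `routeFor` once per file and only then loads the matching MCP server, which is what keeps the +10-15% MCP overhead scoped to files that benefit from it.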

## Production readiness

- [ ] Codebase context loader runs in <2s on first session; cached for subsequent sessions
- [ ] Grep+Glob+Read sequence enforced (lint check on subagent system prompts)
- [ ] Specialist subagent definitions reviewed via PR. Tool whitelists are an audit gate
- [ ] Reviewer subagent independence verified by integration test (no shared message history)
- [ ] 4D Framework approval gate fires on every code-changing action; auto-merge disabled
- [ ] Scratchpad TTL + size-bound: archive old entries past 20 sections
- [ ] MCP server health checks per session start; degraded mode on outage
- [ ] Telemetry: tool-selection accuracy, reviewer verdict distribution, approval-gate p95 latency

---

**Source:** https://claudearchitectcertification.com/scenarios/developer-productivity-agent
**Vault sources:** ACP-T05 §Scenario 4 (5 ✅/❌ pairs · official guide scenario); ACP-T06 (5 practice Qs tagged to components); ACP-T07 §Lab 4 spec (built-in tools + codebase navigation + scratchpad); ACP-T08 §3.4 metadata; Course 02 Claude Code 101 · Lesson 01 What is Claude Code; GAI-K05 CCA exam questions and scenarios (Scenario 4 walkthrough); 4D Framework concept page
**Last reviewed:** 2026-05-04

**Evidence tiers:** 🟢 official Anthropic doc · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed.
