# Claude Code for CI/CD

> Claude Code as a headless PR reviewer in GitHub Actions. The workflow uses claude -p for non-interactive execution, runs per-file independent sessions (no shared context across a PR's 14 files, which prevents lost-in-the-middle), emits --output-format json for structured verdicts the next workflow step parses into PR comments, and explicitly declares allowed_tools in YAML (no wildcards in CI). Custom instructions in the workflow file inject the project's CLAUDE.md context. The most-tested distractor: believing one big session reviewing all 14 files is faster. It isn't; it's worse, because by file 14 the early conventions have dropped out of attention.

**Sub-marker:** P3.5
**Domains:** D3 · Agent Operations, D2 · Tool Design + Integration
**Exam weight:** 38% of CCA-F (D3 + D2)
**Build time:** 25 minutes
**Source:** 🟢 Official Anthropic guide scenario · in published exam guide and practice exam
**Canonical:** https://claudearchitectcertification.com/scenarios/claude-code-for-cicd
**Last reviewed:** 2026-05-04

## In plain English

Think of this as Claude Code working inside your CI without anyone watching. It runs every time a pull request opens, reviews each changed file in isolation, and posts inline comments back on the PR. No interactive prompts, no human at a keyboard. The agent is invoked as a one-shot command (`claude -p`), it returns a structured JSON verdict, and the workflow turns that JSON into the comments your team actually reads. The whole point is that PR review at scale needs a reviewer that does not get tired by file 14, and a CI-native agent that runs per file is exactly that.
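A minimal sketch of that loop as a Python wrapper around the CLI, assuming the flags used throughout this page and whitespace-free file paths:

```python
# Hypothetical sketch: one headless claude -p session per changed file.
# Assumes the `claude` and `gh` CLIs are installed and authenticated.
import json
import subprocess

changed_files = subprocess.run(
    ["gh", "pr", "diff", "--name-only"],
    capture_output=True, text=True, check=True,
).stdout.split()  # assumes no whitespace in paths

verdicts = []
for f in changed_files:
    # One isolated session per file: fresh context, structured output.
    result = subprocess.run(
        ["claude", "review", "pr", "-p",
         "--output-format", "json",
         "--files", f, "--max-turns", "6"],
        capture_output=True, text=True,
    )
    verdicts.append(json.loads(result.stdout))

print(f"{len(verdicts)} files reviewed, each in its own session")
```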

## Exam impact

Domain 3 (Agent Operations, 20%) tests the headless `-p` flag, custom_instructions, and per-file session boundaries. Domain 2 (Tool Design + Integration, 18%) tests explicit allowed_tools in CI (no wildcards) and structured-JSON output contracts. This is in the published exam guide AND practice exam. High-yield drilling territory. The 'why does file 14 contradict file 3' question is canonical.

## The problem

### What the customer needs
- PR review on every pull request without a human running the CLI manually. Claude Code triggered by GitHub Actions on pull_request.
- Per-file feedback that doesn't lose track of conventions established earlier in the diff. File 14 must agree with file 3 on style and tests.
- Structured output the workflow can parse into PR review comments, not free-form prose with brittle regex.

### Why naive approaches fail
- Single session for all 14 files → context pollutes → file 14 contradicts file 3 (lost-in-the-middle).
- Forgetting -p → workflow hangs waiting for interactive input → CI timeout after 6 hours, no review posted.
- Free-form text output → next step uses regex to extract issues → brittle, breaks on every Claude wording change (the two parsing styles are contrasted in the sketch below).
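
To make the last failure concrete, a hedged Python contrast between regex-over-prose and parsing the structured verdict (the prose string and the JSON payload are illustrative, not real Claude output):

```python
import json
import re

# Brittle: regex over free-form prose. One wording change and it breaks.
prose = "I found an issue on line 42: missing input validation."  # illustrative
match = re.search(r"issue on line (\d+): (.+)", prose)
line, message = (int(match.group(1)), match.group(2)) if match else (None, None)

# Durable: parse the structured contract from --output-format json.
verdict = json.loads(
    '{"file": "src/auth/login.ts", "verdict": "request_changes",'
    ' "issues": [{"line": 42, "severity": "blocker",'
    ' "message": "missing input validation"}], "summary": "1 blocker"}'
)
for issue in verdict["issues"]:
    print(issue["line"], issue["severity"], issue["message"])
```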

### Definition of done
- Per-file PR review fires on every pull_request event
- Each file reviewed in its own claude -p session (isolated context)
- Output is JSON (--output-format json); workflow parses with jq, posts comments via gh CLI
- allowed_tools is an explicit list in YAML (no * wildcard)
- Project context flows in via custom_instructions reading .github/claude-context.md

## Concepts in play

- 🟢 **CLAUDE.md hierarchy** (`claude-md-hierarchy`): project context injected via custom_instructions
- 🟢 **Context window** (`context-window`): per-file isolation prevents lost-in-the-middle
- 🟢 **Tool calling** (`tool-calling`): explicit allowed_tools whitelist in CI
- 🟢 **Structured outputs** (`structured-outputs`): --output-format json contract
- 🟢 **Subagents** (`subagents`): each per-file session is effectively a subagent
- 🟢 **Evaluation** (`evaluation`): two-pass, per-file local then PR-level integration
- 🟢 **Batch API** (`batch-api`): nightly audits for non-blocking checks
- 🟢 **Hooks** (`hooks`): PreToolUse hooks for CI cost guards (deny network egress)

## Components

### GitHub Actions Workflow · .github/workflows/claude.yml

Triggers on pull_request events. Authenticates with the Claude Code GitHub App. Loops over the changed files (via gh pr diff --name-only) and dispatches a per-file claude -p invocation. Owns retries, concurrency, and the eventual gh pr review post.

**Configuration:** on: pull_request. Steps: actions/checkout → install Claude Code CLI → for each changed file, run claude review pr -p --output-format json --custom-instructions .github/claude-context.md. Concurrency: max 4 parallel files; the rest queue.
**Concept:** `claude-md-hierarchy`

### claude -p Headless Invocation · non-interactive, one-shot per file

Runs Claude Code in headless mode. The -p flag disables the interactive REPL; the agent processes a single task, emits output to stdout, and exits. No human at a keyboard, no waiting on prompts. This is the CI primitive. Without -p, the workflow hangs.

**Configuration:** claude review pr -p --output-format json --custom-instructions ".github/claude-context.md" --files "src/auth/login.ts" --max-turns 6
**Concept:** `context-window`

### Per-File Session Isolation · one claude -p invocation per changed file

Each changed file gets its OWN headless session. No shared context across files. This is the single biggest architectural decision: 14 isolated sessions × small context > 1 session × 14 files of accumulated context. The latter triggers lost-in-the-middle by file 8-10; the former does not.

**Configuration:** Loop in workflow YAML: for f in $(gh pr diff --name-only); do claude review pr -p --files $f >> review-$f.json; done. Each file's review is independent; the parent workflow aggregates JSON.
**Concept:** `subagents`

### Structured JSON Output Contract · --output-format json + jq parsing

Claude emits a structured object per file: { file, verdict (approve | request_changes | comment), issues: [{ line, severity, message, suggestion? }], summary }. The workflow's next step parses with jq and posts inline comments via gh pr review --comment-line. No regex parsing of free-form prose.

**Configuration:** Schema (per file): { file: string, verdict: 'approve'|'request_changes'|'comment', issues: [{line: int, severity: 'blocker'|'nit'|'praise', message: string, suggestion?: string}], summary: string }
**Concept:** `structured-outputs`
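
For concreteness, a sample verdict that satisfies this schema, plus the minimal validation the aggregation step could run before posting (all field values are illustrative):

```python
# Illustrative verdict instance and a minimal schema check before posting.
sample = {
    "file": "src/auth/login.ts",
    "verdict": "request_changes",
    "issues": [
        {"line": 42, "severity": "blocker",
         "message": "Server action lacks zod validation",
         "suggestion": "Wrap the payload in loginSchema.parse()"},
        {"line": 7, "severity": "praise",
         "message": "Nice extraction of the session helper"},
    ],
    "summary": "1 blocker, 1 praise",
}

VERDICTS = {"approve", "request_changes", "comment"}
SEVERITIES = {"blocker", "nit", "praise"}

assert sample["verdict"] in VERDICTS
assert all(i["severity"] in SEVERITIES for i in sample["issues"])
```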

### PR Comment Poster + Allowed-Tools Gate · gh pr review + explicit tool whitelist

Final workflow step reads the aggregated JSON, runs gh pr review --request-changes --body $SUMMARY and gh pr review --comment-line N $MSG per issue. Critical: the claude -p invocation declares --allowed-tools Read,Grep,Glob,Bash (no wildcard, no Edit, no Write). CI agents never need write access to the repo; they read and report.

**Configuration:** --allowed-tools "Read,Grep,Glob,Bash(git diff,git log,gh pr diff)". Wildcards in CI are a red flag. They expand the blast radius of any prompt-injection in PR content.
**Concept:** `tool-calling`

## Build steps

### 1. Scaffold the GitHub Actions workflow

Create .github/workflows/claude.yml. Trigger on pull_request events. Install the Claude Code GitHub App via /install-github-app (one-time, generates the OAuth token stored as secrets.CLAUDE_CODE_OAUTH_TOKEN). Checkout the PR head ref, install the CLI, then dispatch the per-file review loop.

**YAML:**

```yaml
# .github/workflows/claude.yml
name: Claude PR review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
          fetch-depth: 2

      - name: Install Claude Code CLI
        run: npm i -g @anthropic-ai/claude-code

      - name: Per-file review
        env:
          CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          GH_TOKEN: ${{ github.token }}
        run: ./.github/scripts/per-file-review.sh
```


Concept: `claude-md-hierarchy`

### 2. Run claude -p with --output-format json per file

Loop over the changed files (gh pr diff --name-only). For each one, run claude review in headless mode (-p), with --output-format json, --allowed-tools explicit, and --custom-instructions pointing at a markdown file that holds the project's CLAUDE.md context. Capture each file's JSON output to disk; aggregate at the end.

**Bash:**

```bash
# .github/scripts/per-file-review.sh
#!/usr/bin/env bash
set -euo pipefail

REVIEW_DIR=$(mktemp -d)
echo "files=$REVIEW_DIR" >> "$GITHUB_OUTPUT"

# Per-file independent sessions. NOT one big session
for f in $(gh pr diff --name-only); do
  echo "::group::Reviewing $f"
  out="$REVIEW_DIR/$(echo "$f" | tr '/' '_').json"
  claude review pr \
    -p \
    --output-format json \
    --custom-instructions ".github/claude-context.md" \
    --allowed-tools "Read,Grep,Glob,Bash(git diff,git log,gh pr diff)" \
    --files "$f" \
    --max-turns 6 \
    > "$out" || echo '{"file":"'"$f"'","verdict":"comment","issues":[],"summary":"review failed"}' > "$out"
  echo "::endgroup::"
done

# Aggregate to one file the next step parses
jq -s '.' "$REVIEW_DIR"/*.json > review-aggregate.json
```

**TypeScript:**

```typescript
// .github/scripts/per-file-review.ts (Node.js variant)
import { execSync } from "node:child_process";
import { writeFileSync, mkdtempSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

const reviewDir = mkdtempSync(join(tmpdir(), "claude-review-"));
const files = execSync("gh pr diff --name-only", { encoding: "utf8" })
  .trim()
  .split("\n")
  .filter(Boolean);

// Per-file independent sessions. NOT one big session
for (const f of files) {
  const out = join(reviewDir, f.replaceAll("/", "_") + ".json");
  try {
    const json = execSync(
      `claude review pr -p --output-format json ` +
        `--custom-instructions ".github/claude-context.md" ` +
        `--allowed-tools "Read,Grep,Glob,Bash(git diff,git log,gh pr diff)" ` +
        `--files "${f}" --max-turns 6`,
      { encoding: "utf8" },
    );
    writeFileSync(out, json);
  } catch {
    writeFileSync(
      out,
      JSON.stringify({
        file: f,
        verdict: "comment",
        issues: [],
        summary: "review failed",
      }),
    );
  }
}
```

Concept: `structured-outputs`

### 3. Inject project context via custom_instructions

Don't pass the entire project CLAUDE.md to claude -p. Too much. Instead, create .github/claude-context.md (committed to the repo) with the CI-relevant slice: stack, code style, what counts as a blocker vs a nit, what to skip (generated files, lockfiles). The --custom-instructions flag injects this into every per-file session.

**Markdown:**

```markdown
<!-- .github/claude-context.md (committed; loaded by --custom-instructions) -->
# Project review rubric

Stack: Next.js 15 + TypeScript strict + Drizzle. We use Server Actions
instead of API routes; named exports only; 2-space indent.

Severity rubric:
- blocker = type unsafe, secret leaked, missing test for changed code,
  breaks API contract, server-action without zod validation
- nit     = naming inconsistency, missing JSDoc, slightly verbose
- praise  = clever simplification, good test, well-named refactor

Skip (do NOT review):
- pnpm-lock.yaml
- public/scenarios/*.svg (generated)
- any file under .next/, dist/, build/, node_modules/

Output format reminder:
Always emit JSON: { file, verdict, issues[], summary }.
verdict ∈ {approve, request_changes, comment}.
issues[].severity ∈ {blocker, nit, praise}.
```


Concept: `claude-md-hierarchy`

### 4. Lock allowed_tools. No wildcards in CI

In CI, the agent processes content from PR authors. Including authors outside your org. That content is untrusted. Wildcard --allowed-tools '*' lets a prompt-injection in a PR body or commit message escalate to write access on your repo. Always declare an explicit list: Read, Grep, Glob, and a NARROW Bash whitelist. Never Edit, Write, or open-ended Bash.

**Bash:**

```bash
# WRONG. Open invitation to prompt injection
# claude review pr -p --allowed-tools "*"
#
# WRONG. Bash with no whitelist
# claude review pr -p --allowed-tools "Read,Bash"
#
# RIGHT. Every tool explicit, Bash narrowly scoped to read-only commands
ALLOWED='Read,Grep,Glob,Bash(git diff,git log --oneline -20,gh pr diff,gh pr view)'
claude review pr -p \
  --allowed-tools "$ALLOWED" \
  --files "$f" \
  --max-turns 6 \
  --output-format json
#
# Periodically audit: grep workflows for any "*" or unscoped Bash
# rg -n 'allowed-tools.*"[^"]*\*' .github/workflows/
```

**TypeScript:**

```typescript
// WRONG. Open invitation to prompt injection
// execSync('claude review pr -p --allowed-tools "*"');
//
// WRONG. Bash with no whitelist
// execSync('claude review pr -p --allowed-tools "Read,Bash"');
//
// RIGHT. Every tool explicit, Bash narrowly scoped to read-only commands
const ALLOWED =
  "Read,Grep,Glob,Bash(git diff,git log --oneline -20,gh pr diff,gh pr view)";
execSync(
  `claude review pr -p --allowed-tools "${ALLOWED}" --files "${f}" --max-turns 6 --output-format json`,
);

// Periodically audit: grep workflows for any "*" or unscoped Bash
// rg -n 'allowed-tools.*"[^"]*\*' .github/workflows/
```

Concept: `tool-calling`

### 5. Parse JSON and post inline PR comments

Aggregate the per-file JSON outputs, then transform into gh pr review calls. One inline comment per issue (line + body), one summary review at the end (approve / request_changes / comment based on whether any blockers fired). The gh CLI handles the GitHub REST mechanics; you just feed it structured input.

**Bash:**

```bash
# .github/scripts/post-comments.sh
#!/usr/bin/env bash
set -euo pipefail

# Aggregate file from previous step
agg=review-aggregate.json

# Per-issue inline comments
jq -c '.[] | .file as $f | .issues[] | { file: $f, line: .line, body: .message }' "$agg" \
  | while read -r row; do
      file=$(echo "$row" | jq -r .file)
      line=$(echo "$row" | jq -r .line)
      body=$(echo "$row" | jq -r .body)
      gh pr review --comment --body "$body" --comment-line "$line" -- "$file"
    done

# Summary verdict
blockers=$(jq '[.[] | .issues[] | select(.severity=="blocker")] | length' "$agg")
if [ "$blockers" -gt 0 ]; then
  gh pr review --request-changes --body "$blockers blocker(s) found. See inline comments."
else
  gh pr review --approve --body "LGTM. Inline nits/praise where applicable."
fi
```

**TypeScript:**

```typescript
// .github/scripts/post-comments.ts
import { execSync } from "node:child_process";
import { readFileSync } from "node:fs";

interface Issue {
  line: number;
  severity: "blocker" | "nit" | "praise";
  message: string;
}
interface FileReview {
  file: string;
  verdict: "approve" | "request_changes" | "comment";
  issues: Issue[];
  summary: string;
}

const agg: FileReview[] = JSON.parse(
  readFileSync("review-aggregate.json", "utf8"),
);

// Per-issue inline comments
for (const fr of agg) {
  for (const issue of fr.issues) {
    execSync(
      `gh pr review --comment --body "${issue.message}" --comment-line ${issue.line} -- "${fr.file}"`,
    );
  }
}

// Summary verdict
const blockers = agg.flatMap((fr) =>
  fr.issues.filter((i) => i.severity === "blocker"),
);
if (blockers.length > 0) {
  execSync(
    `gh pr review --request-changes --body "${blockers.length} blocker(s) found. See inline comments."`,
  );
} else {
  execSync(
    `gh pr review --approve --body "LGTM. Inline nits/praise where applicable."`,
  );
}
```

Concept: `structured-outputs`

### 6. Cap concurrency and add a per-PR cost budget

Per-file fan-out invites parallelism, but uncapped parallelism can exhaust the GitHub Actions concurrent-job limit and stack up token spend. Cap at ~4 parallel files. Add a per-PR token budget (an env var checked before the loop) that aborts the review when the PR would exceed the cap. Cost predictability beats marginal latency wins.

**Bash:**

```bash
# In per-file-review.sh. Bounded concurrency via xargs -P
export REVIEW_DIR  # the subshells xargs spawns only inherit exported vars
gh pr diff --name-only \
  | xargs -n 1 -P 4 -I {} bash -c '
      file="$1"
      out="$REVIEW_DIR/$(echo "$file" | tr / _).json"
      claude review pr -p --output-format json --files "$file" \
        --custom-instructions ".github/claude-context.md" \
        --allowed-tools "Read,Grep,Glob,Bash(git diff,git log,gh pr diff)" \
        --max-turns 6 > "$out"
    ' _ {}

# Per-PR token budget guard (run before the loop)
TOKEN_BUDGET=${CLAUDE_PR_TOKEN_BUDGET:-200000}
estimated=$(gh pr diff --name-only | wc -l | awk -v b="$TOKEN_BUDGET" '{ print int(b / $1) }')  # budget per file
if [ "$estimated" -lt 5000 ]; then
  echo "::warning::PR has too many files for budget $TOKEN_BUDGET; consider splitting"
  exit 0
fi
```

**TypeScript:**

```typescript
// In per-file-review.ts. Bounded concurrency via simple semaphore
const MAX_PARALLEL = 4;
let inFlight = 0;
const queue: Array<() => Promise<void>> = files.map((f) => async () => {
  // ...claude review for f...
});

async function run() {
  while (queue.length || inFlight) {
    while (inFlight < MAX_PARALLEL && queue.length) {
      const job = queue.shift()!;
      inFlight++;
      job().finally(() => {
        inFlight--;
      });
    }
    await new Promise((r) => setTimeout(r, 50));
  }
}

// Per-PR token budget guard
const TOKEN_BUDGET = Number(process.env.CLAUDE_PR_TOKEN_BUDGET ?? 200_000);
const estimatedPerFile = Math.floor(TOKEN_BUDGET / files.length);
if (estimatedPerFile < 5000) {
  console.warn(
    `::warning::PR has too many files for budget ${TOKEN_BUDGET}; consider splitting`,
  );
  process.exit(0);
}

await run();
```

Concept: `context-window`

### 7. Use Batch API for nightly audits, sync API for blocking review

PR review is blocking. The developer is waiting; sync API is the right call. But you also want a nightly audit pass (drift detection, security regression scan) that doesn't need to finish in minutes. That's where the Batch API earns its 50% discount: submit overnight, results in 24h, review the next morning. Two different APIs for two different latency budgets.

**Python:**

```python
# Sync API. Pre-merge blocking review (latency matters)
# (this is what the per-file-review.sh script above uses)

# Batch API. Overnight audit job (latency doesn't matter, cost does)
import anthropic, json

client = anthropic.Anthropic()

# Build a batch from yesterday's merged PRs
requests = []
for pr in get_merged_prs(since="24h"):  # placeholder helper, e.g. a gh api wrapper
    for f in pr.changed_files:
        requests.append({
            "custom_id": f"audit-{pr.number}-{f.path}",
            "params": {
                "model": "claude-sonnet-4.5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": f.diff}],
            },
        })

batch = client.messages.batches.create(requests=requests)
print(f"Submitted batch {batch.id}; will be ready in ~24h")
# Tomorrow morning, fetch results, write to a Slack #audit-drift channel.
```

**TypeScript:**

```typescript
// Sync API. Pre-merge blocking review (latency matters)
// (this is what the per-file-review.ts script above uses)

// Batch API. Overnight audit job (latency doesn't matter, cost does)
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();

interface MergedPr { number: number; changed_files: Array<{ path: string; diff: string }>; }
declare function getMergedPrs(opts: { since: string }): MergedPr[];

const requests = [];
for (const pr of getMergedPrs({ since: "24h" })) {
  for (const f of pr.changed_files) {
    requests.push({
      custom_id: `audit-${pr.number}-${f.path}`,
      params: {
        model: "claude-sonnet-4.5",
        max_tokens: 1024,
        messages: [{ role: "user", content: f.diff }],
      },
    });
  }
}

const batch = await client.messages.batches.create({ requests });
console.log(`Submitted batch ${batch.id}; will be ready in ~24h`);
// Tomorrow morning, fetch results, write to a Slack #audit-drift channel.
```

Concept: `batch-api`

### 8. Add a CI cost-guard hook + alerting

Wrap the claude -p invocation in a PreToolUse hook that aborts the review if the PR is over a token-budget threshold (e.g. >100K tokens of diff). This protects you from a runaway 10-million-line PR exhausting your monthly Claude budget in one CI run. Pair with a workflow alert to a Slack channel when the hook fires.

**Python:**

```python
# .claude/hooks/ci_cost_guard.py
import sys, json, subprocess

MAX_TOKENS_PER_PR = 200_000

def main():
    payload = json.loads(sys.stdin.read())
    if payload["tool_name"] != "claude_review_pr":
        sys.exit(0)
    diff = subprocess.check_output(["gh", "pr", "diff"], text=True)
    estimated = len(diff) // 4  # ~4 chars per token rule of thumb
    if estimated > MAX_TOKENS_PER_PR:
        print(
            f"PR exceeds token budget: ~{estimated} tokens > "
            f"{MAX_TOKENS_PER_PR}. Split the PR or raise the limit "
            f"deliberately.",
            file=sys.stderr,
        )
        sys.exit(2)  # DENY
    sys.exit(0)

if __name__ == "__main__":
    main()
```

**TypeScript:**

```typescript
// .claude/hooks/ci-cost-guard.ts
import { readFileSync } from "node:fs";
import { execSync } from "node:child_process";

const MAX_TOKENS_PER_PR = 200_000;
const payload = JSON.parse(readFileSync(0, "utf8"));
if (payload.tool_name !== "claude_review_pr") process.exit(0);

const diff = execSync("gh pr diff", { encoding: "utf8" });
const estimated = Math.floor(diff.length / 4);
if (estimated > MAX_TOKENS_PER_PR) {
  process.stderr.write(
    `PR exceeds token budget: ~${estimated} tokens > ${MAX_TOKENS_PER_PR}. ` +
      `Split the PR or raise the limit deliberately.\n`,
  );
  process.exit(2); // DENY
}
process.exit(0);
```

Concept: `hooks`

## Decision matrix

| Decision | Right answer | Wrong answer | Why |
|---|---|---|---|
| Reviewing a 14-file PR | Per-file independent claude -p sessions, aggregated at the end | One claude -p session reviewing all 14 files together | Single-session review hits lost-in-the-middle by file 8-10; conventions established on file 3 drop out of attention by file 14. Per-file isolation eliminates that failure mode entirely. |
| Output format from claude -p in CI | --output-format json (parsed with jq → gh pr review) | Free-form prose, parsed downstream with regex | Regex over prose is brittle. Every Claude wording change breaks the pipeline. JSON contract is stable and the SDK guarantees the shape. |
| Tool access in CI | Explicit --allowed-tools list (Read, Grep, Glob, narrow Bash) | Wildcard --allowed-tools '*' or unscoped Bash | PR content is untrusted (external contributors, prompt injection vectors). Wildcards expand the blast radius; explicit whitelists cap it. CI agents never need write access. They read and report. |
| Pre-merge review vs nightly audit | Sync API for blocking pre-merge; Batch API for non-blocking overnight | Use one API for both | Pre-merge review is latency-critical (developer waiting); sync API is right. Nightly audit is cost-sensitive but not latency-bound; Batch API at 50% discount earns its keep. Two budgets, two APIs. |

## Failure modes

| Anti-pattern | Failure | Fix |
|---|---|---|
| AP-15 · Same session for all files | Workflow runs ONE claude -p over all 14 changed files. By file 14, lost-in-the-middle has dropped the conventions established on file 3. Inline comments on file 14 contradict the comments on file 3. | Per-file independent sessions. Loop the 14 files, run one claude -p invocation per file, aggregate the JSON outputs at the end. Each file gets a fresh, focused context. |
| AP-16 · Forgot the -p flag | Workflow invokes claude review without -p. The CLI starts in interactive mode, waits for input, and the GitHub Actions runner times out after 6 hours with no review posted. | Always pass -p for non-interactive headless execution. CI hangs are silent failures; the -p flag is what makes Claude Code CI-safe in the first place. |
| AP-17 · Unstructured text output | Workflow asks Claude to 'review the file and post inline comments'. Output is free-form prose. Next step uses regex to extract issues. Every Claude phrasing change breaks the regex; the workflow silently posts nothing. | Always pass --output-format json. The output is a structured contract: { file, verdict, issues[], summary }. Workflow parses with jq and posts via gh pr review --comment-line. |
| AP-18 · Wildcard --allowed-tools in CI | Workflow uses --allowed-tools '*' for convenience. A prompt-injection in a PR description tricks the agent into running Bash(rm -rf .) or writing a malicious file. Repo state corrupted; PR reviewer can't tell what happened. | Always declare an explicit allowed-tools list: Read,Grep,Glob,Bash(git diff,git log,gh pr diff). CI agents never need write tools. Wildcards expand the prompt-injection blast radius; explicit lists cap it. |
| AP-19 · No project context in CI | claude -p runs without --custom-instructions. Claude doesn't know the project's stack, code style, or what counts as a blocker. Reviews are generic; flags style-correct code as 'should use named exports' in a default-exports codebase. | Commit .github/claude-context.md with the CI-relevant slice of CLAUDE.md (stack, severity rubric, files to skip). Pass it via --custom-instructions .github/claude-context.md on every claude -p invocation. |

## Implementation checklist

- [ ] .github/workflows/claude.yml exists; triggers on pull_request opened+synchronize
- [ ] Claude Code GitHub App installed; CLAUDE_CODE_OAUTH_TOKEN in repo secrets
- [ ] Workflow runs claude review with -p (headless) on every invocation (`context-window`)
- [ ] Per-file loop. Each changed file in its own claude -p session (`subagents`)
- [ ] --output-format json on every invocation; jq parses the aggregate (`structured-outputs`)
- [ ] --allowed-tools is an explicit list (no wildcards, no Edit/Write) (`tool-calling`)
- [ ] .github/claude-context.md committed; passed via --custom-instructions (`claude-md-hierarchy`)
- [ ] Concurrency capped (xargs -P 4 or simple JS semaphore)
- [ ] PreToolUse cost-guard hook denies PRs over the token budget (`hooks`)
- [ ] Nightly audit job uses Batch API (50% discount, 24h SLA) (`batch-api`)
- [ ] PR comments posted via gh pr review --comment-line + summary verdict

## Cost &amp; latency

- **Per-PR review (avg 8 files, 4 parallel):** ~$0.05-0.12 per PR. 8 files × ~10K input tokens (file diff + context md) + ~2K output tokens (JSON verdict) ≈ $0.04-0.10, plus parallel overhead. Cap at ~$0.15 with the cost-guard hook to bound runaway 50-file PRs.
- **Nightly audit (Batch API, ~50 PRs/day):** ~$0.50-1.20 per night. 50 PRs × 8 files × ~5K tokens (audit-only, lighter prompt) at the Batch API's 50% discount. Results are ready the next morning; no developer is waiting.
- **Custom-instructions caching:** ~30% savings on a warm cache. .github/claude-context.md is stable across all per-file invocations within a single PR; mark it cache_control: ephemeral so the 5-minute TTL keeps it warm across files in one workflow run.
- **p95 PR-review latency:** ~45-90 seconds for an 8-file PR. Per-file ~8-15s × 4 parallel ≈ 30s, plus JSON aggregation and the gh pr review posts. Fast enough that the developer doesn't context-switch away while waiting.
- **Cost-guard hook savings:** prevents $X.XX runaways on outlier PRs. A 200-file PR (rare but real, e.g. lockfile bumps) without the hook would burn ~$3-6 in one workflow run; the hook denies before the loop starts, so the cost reverts to $0.
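
The per-PR figure is back-of-envelope arithmetic; a sketch with assumed placeholder prices (substitute your model's actual $/MTok):

```python
# Back-of-envelope cost check for the per-PR estimate above.
# Prices are ASSUMED placeholders (roughly small-model rates);
# substitute your model's actual $/MTok before relying on this.
price_in, price_out = 0.80, 4.00   # $ per million tokens, assumed

files, tok_in, tok_out = 8, 10_000, 2_000
raw = files * (tok_in / 1e6 * price_in + tok_out / 1e6 * price_out)
cached = raw * 0.7  # ~30% warm-cache savings on the stable context md
print(f"~${raw:.2f} raw, ~${cached:.2f} with warm cache per PR")
```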

## Domain weights

- **D3 · Agent Operations (20%):** claude -p flag + custom_instructions + .github/workflows/claude.yml + per-file isolation
- **D2 · Tool Design + Integration (18%):** Explicit allowed_tools whitelist + structured JSON output contract + Batch API split

## Practice questions

### Q1. A GitHub Actions CI pipeline runs Claude Code to review PRs. It processes all 14 modified files in a SINGLE claude -p session. After 8 files the output becomes repetitive and misses issues. By file 14, an inline comment contradicts a decision made on file 3. Why?

Lost-in-the-middle effect across the 14-file context. Single-session review accumulates context; by file 8-10 attention dilutes and conventions established on file 3 drop out. The fix is per-file independent sessions: loop the changed files, run one claude -p invocation per file, aggregate the JSON outputs at the end. Each file gets fresh, focused context. Tagged to AP-15.

### Q2. Your CI/CD workflow hangs when calling claude review pr. The GitHub Actions runner eventually times out after 6 hours with no review posted. What flag is missing?

The -p flag. Without it, Claude Code starts in interactive mode and waits for human input. Which never arrives in CI, so the runner just hangs. -p (or --print) puts the CLI into headless one-shot mode: it processes a single task, emits output to stdout, and exits. Always pass -p in any non-interactive context. Tagged to AP-16.

### Q3. A GitHub Action posts inline review comments on PR lines. Claude outputs free-form prose; the workflow uses regex to extract issues, line numbers, and severities. After every Claude wording update, the regex breaks and reviews silently fail. What architectural change fixes this for good?

Pass --output-format json. Claude emits a structured object per file: { file, verdict, issues: [{ line, severity, message, suggestion? }], summary }. The workflow parses with jq and posts via gh pr review --comment-line. Regex over prose is brittle by construction; structured-output contracts are the only durable answer. Tagged to AP-17.

### Q4. Your CI/CD pipeline uses --allowed-tools '*' because the team didn't want to maintain a list. A PR description includes a prompt-injection that tricks Claude into running Bash(rm -rf .). Repo state corrupted. What's the rule?

No wildcards in CI. Ever. PR content (description, commit messages, diff, file contents) is untrusted. An external contributor or a prompt-injection in any of those vectors can escalate. Always declare an explicit list: --allowed-tools "Read,Grep,Glob,Bash(git diff,git log,gh pr diff,gh pr view)". CI review agents never need Edit, Write, or unscoped Bash; they read and report. Tagged to AP-18.

### Q5. Your Claude Code CI action runs without project context. Reviews flag style-correct code as 'should use named exports' in a default-exports codebase. What YAML field provides the project-specific rubric?

--custom-instructions pointing at a committed markdown file (e.g. .github/claude-context.md). The file documents the project's stack, severity rubric (what counts as a blocker vs nit), and files to skip (lockfiles, generated assets). Don't pass the entire CLAUDE.md. Too much context. Pass the CI-relevant slice. Tagged to AP-19.

## FAQ

### Q1. Is claude -p the same as the SDK?

Functionally similar; ergonomically different. The SDK is for code that needs programmatic control (custom orchestration, response streaming, tool definitions in code). claude -p is for shell-driven workflows (CI, cron, dev scripts) where the input is a prompt and the output is a structured response. CI workflows almost always want -p; bespoke automation almost always wants the SDK.
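
For contrast, a rough sketch of the SDK side, assuming the Claude Agent SDK's Python query interface; treat the exact option and function names as assumptions and verify against the SDK docs:

```python
# Assumed sketch of the programmatic equivalent (Claude Agent SDK).
import anyio
from claude_agent_sdk import ClaudeAgentOptions, query

async def review_file(path: str) -> None:
    options = ClaudeAgentOptions(
        allowed_tools=["Read", "Grep", "Glob"],
        max_turns=6,
    )
    # Stream messages programmatically, unlike the one-shot -p flag.
    async for message in query(prompt=f"Review {path} for blockers", options=options):
        print(message)

anyio.run(review_file, "src/auth/login.ts")
```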

### Q2. What's the maximum file count per PR before this approach breaks down?

No hard limit, but ~30 files is the ergonomic ceiling for blocking pre-merge review. Past that, sync-API latency adds up (~30s × ceil(N/4) parallel). For 30+ file PRs, switch to Batch-API audit (results next morning) and use the sync API only for files that touch security-critical paths (auth/, payments/, infra/). Two-track review.
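
A sketch of that two-track router under stated assumptions (the critical-path prefixes and the 30-file ceiling are illustrative):

```python
# Illustrative two-track router: sync review for critical paths,
# batch audit for everything else on oversized PRs.
CRITICAL_PREFIXES = ("auth/", "payments/", "infra/")  # assumed paths
SYNC_FILE_CEILING = 30

def route(files: list[str]) -> tuple[list[str], list[str]]:
    if len(files) <= SYNC_FILE_CEILING:
        return files, []  # everything sync, nothing batched
    sync = [f for f in files if f.startswith(CRITICAL_PREFIXES)]
    batch = [f for f in files if f not in set(sync)]
    return sync, batch

sync_now, audit_overnight = route(["auth/login.ts"] + [f"src/f{i}.ts" for i in range(40)])
print(len(sync_now), "sync,", len(audit_overnight), "batched")
```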

### Q3. Can I run claude -p on the same PR every time it's updated?

Yes. That's the synchronize event. The workflow trigger should be on: pull_request: types: [opened, synchronize]. Synchronize fires on every push to the PR head ref. Idempotency: each run reviews the *current* diff, so comments from earlier runs linger even after they go stale. To dismiss outdated reviews, add a dismissal step that targets reviews on commits no longer at HEAD (one concrete way is sketched below).
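
A hedged Python sketch of that dismissal step via the GitHub REST reviews/dismissals endpoint (the repo slug and PR number are placeholders; assumes an authenticated gh):

```python
# Dismiss reviews attached to commits no longer at HEAD.
import json
import subprocess

REPO, PR = "owner/repo", 123  # placeholders

reviews = json.loads(subprocess.run(
    ["gh", "api", f"repos/{REPO}/pulls/{PR}/reviews"],
    capture_output=True, text=True, check=True,
).stdout)

head = subprocess.run(
    ["gh", "pr", "view", str(PR), "--json", "headRefOid", "-q", ".headRefOid"],
    capture_output=True, text=True, check=True,
).stdout.strip()

for r in reviews:
    if r.get("commit_id") != head and r.get("state") != "DISMISSED":
        subprocess.run(
            ["gh", "api", "-X", "PUT",
             f"repos/{REPO}/pulls/{PR}/reviews/{r['id']}/dismissals",
             "-f", "message=Superseded by a newer automated review"],
            check=True,
        )
```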

### Q4. How do I keep the cost predictable if the team merges 100+ PRs a day?

Three levers, in priority order: (1) PreToolUse cost-guard hook that denies any single PR over a token budget. Protects from outliers; (2) --max-turns capped at 4-6. Bounds the worst case per file; (3) Concurrency cap (xargs -P 4 or JS semaphore). Bounds parallel Claude API calls in flight. With all three, monthly cost variance stays inside ±10%.

### Q5. Should the CI workflow have write access to the repo?

No. Read-only on the repo, write-only on PR comments and reviews. GitHub Actions permissions: block: contents: read, pull-requests: write. The CI agent reads code, runs gh pr diff, posts comments. It never needs to push commits. Same logic that bans write tools in --allowed-tools applies at the GitHub permission layer.

### Q6. What's in .github/claude-context.md vs the project's main CLAUDE.md?

The CI slice, not the whole thing. Main CLAUDE.md targets developers running Claude Code interactively. It covers full stack, conventions, examples, troubleshooting (~300-500 lines). .github/claude-context.md is the trimmed CI rubric: stack one-liner, severity definitions, files to skip, output-format reminder (~40-80 lines). Smaller context = faster review + cheaper tokens.

### Q7. Can I run claude -p reviews with custom Skills?

Yes, and you should. Create .claude/skills/code-reviewer/SKILL.md with the team's review rubric, allowed tools, and output schema. The CI workflow invokes the Skill explicitly: claude review pr -p --skill code-reviewer .... Skills are version-controlled (live in the repo), so the rubric evolves with the codebase and CI behaviour stays in sync.
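
A hedged example of what that file could look like; the frontmatter fields follow the Agent Skills format, and the rubric body is illustrative:

```markdown
---
name: code-reviewer
description: Reviews PR diffs against the team rubric and emits the
  structured verdict JSON. Use for all automated PR review in CI.
---

# Code reviewer skill

Apply the severity rubric from .github/claude-context.md.
Emit one JSON object per file: { file, verdict, issues[], summary }.
Never suggest edits to lockfiles or generated assets.
```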

## Production readiness

- [ ] .github/workflows/claude.yml committed and triggers on pull_request
- [ ] Smoke test: open a 1-file PR; review posts within 60s with structured comments
- [ ] Stress test: open a 30-file PR; concurrency cap prevents runner exhaustion
- [ ] Cost-guard test: open a synthetic 200-file PR; hook denies before loop starts
- [ ] Schema lint: validate every JSON output against the file-review schema
- [ ] Permissions audit: workflow permissions: block is contents:read + pull-requests:write only
- [ ] Tools audit: no --allowed-tools '*' anywhere in .github/workflows/
- [ ] Nightly audit Batch API job runs and posts to #audit-drift on completion

---

**Source:** https://claudearchitectcertification.com/scenarios/claude-code-for-cicd
**Vault sources:** ACP-T05 §Scenario 5 (5 ✅/❌ pairs · official guide scenario); ACP-T08 §3.5 metadata; Course 04 Claude Code in Action · Lesson 12 GitHub integration; ACP-T06 (5 practice Qs tagged to components); ACP-T07 §Lab 5 spec (claude -p, JSON output, independent sessions); GAI-K05 CCA exam questions and scenarios
**Last reviewed:** 2026-05-04

**Evidence tiers:** 🟢 official Anthropic doc · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed.
