The problem
What the customer needs
- PR review on every pull request without a human running the CLI manually. Claude Code triggered by GitHub Actions on pull_request.
- Per-file feedback that doesn't lose track of conventions established earlier in the diff. File 14 must agree with file 3 on style and tests.
- Structured output the workflow can parse into PR review comments, not free-form prose with brittle regex.
Why naive approaches fail
- Single session for all 14 files → context pollutes → file 14 contradicts file 3 (lost-in-the-middle).
- Forgetting -p → workflow hangs waiting for interactive input → CI timeout after 6 hours, no review posted.
- Free-form text output → next step uses regex to extract issues → brittle, breaks on every Claude wording change.
- Per-file PR review fires on every pull_request event
- Each file reviewed in its own claude -p session (isolated context)
- Output is JSON (--output-format json); workflow parses with jq, posts comments via gh CLI
- allowed_tools is an explicit list in YAML (no * wildcard)
- Project context flows in via custom_instructions reading .github/claude-context.md
The system
What each part does
Five components, each owning one concept.
GitHub Actions Workflow
.github/workflows/claude.yml
Triggers on pull_request events. Authenticates with the Claude Code GitHub App. Loops over the changed files (via gh pr diff --name-only) and dispatches a per-file claude -p invocation. Owns retries, concurrency, and the eventual gh pr review post.
Configuration
on: pull_request. Steps: actions/checkout → install Claude Code CLI → for each changed file, run claude review pr -p --output-format json --custom-instructions .github/claude-context.md. Concurrency: max 4 parallel files; the rest queue.
claude -p Headless Invocation
non-interactive, one-shot per file
Runs Claude Code in headless mode. The -p flag disables the interactive REPL; the agent processes a single task, emits output to stdout, and exits. No human at a keyboard, no waiting on prompts. This is the CI primitive. Without -p, the workflow hangs.
Configuration
claude review pr -p --output-format json --custom-instructions ".github/claude-context.md" --files "src/auth/login.ts" --max-turns 6
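A hang is cheaper to catch in seconds than after a 6-hour runner timeout. The sketch below is one way to wrap any headless command so it has no stdin to wait on and a hard time limit. `run_headless` is a hypothetical helper name, and the command passed in is whatever invocation the workflow uses.

```python
import subprocess
import sys
from typing import Sequence


def run_headless(cmd: Sequence[str], timeout_s: float) -> tuple[bool, str]:
    """Run a command with no terminal attached and a hard time limit.

    Returns (ok, text): ok is True only for exit code 0; text is stdout on
    success, or a short failure reason otherwise.
    """
    try:
        proc = subprocess.run(
            list(cmd),
            stdin=subprocess.DEVNULL,  # nothing to read, so a prompt cannot block
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    if proc.returncode != 0:
        return False, f"exit code {proc.returncode}"
    return True, proc.stdout
```

In a workflow step, a call such as `run_headless(review_cmd, timeout_s=300)` would turn a forgotten -p into a fast, visible failure; the 300-second figure is only an example.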
Per-File Session Isolation
one claude -p invocation per changed file
Each changed file gets its OWN headless session. No shared context across files. This is the single biggest architectural decision: 14 isolated sessions × small context > 1 session × 14 files of accumulated context. The latter triggers lost-in-the-middle by file 8-10; the former does not.
Configuration
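Loop in workflow YAML: for f in $(gh pr diff --name-only); do claude review pr -p --files "$f" > "review-$(echo "$f" | tr / _).json"; done. Slashes in the path are flattened so each output lands in a single directory. Each file's review is independent; the parent workflow aggregates JSON.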
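The isolation property is easier to see with the CLI factored out. In this minimal sketch, `review_one` is a hypothetical callable that reviews exactly one path and returns that file's raw JSON text; it stands in for the per-file claude -p call. The cap of 4 workers mirrors the concurrency limit described above.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def fallback_review(path: str, reason: str) -> dict:
    """Placeholder result so one failed file never breaks the aggregate."""
    return {"file": path, "verdict": "comment", "issues": [], "summary": reason}


def review_all(
    files: Iterable[str],
    review_one: Callable[[str], str],
    max_parallel: int = 4,
) -> list[dict]:
    """Review each file in isolation, at most max_parallel at a time.

    review_one sees a single path and nothing about the other files,
    which is what keeps one file's context out of another's review.
    """

    def safe(path: str) -> dict:
        try:
            return json.loads(review_one(path))
        except Exception as exc:  # crash, timeout, or non-JSON output
            return fallback_review(path, f"review failed: {type(exc).__name__}")

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() preserves input order, so the aggregate is deterministic.
        return list(pool.map(safe, files))
```

Because failures become placeholder entries instead of exceptions, the aggregation step always receives one result per changed file.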
Structured JSON Output Contract
--output-format json + jq parsing
Claude emits a structured object per file: { file, verdict (approve | request_changes | comment), issues: [{ line, severity, message, suggestion? }], summary }. The workflow's next step parses with jq and posts inline comments through the GitHub REST API (gh api), because gh pr review itself has no per-line flag. No regex parsing of free-form prose.
Configuration
Schema (per file): { file: string, verdict: 'approve'|'request_changes'|'comment', issues: [{line: int, severity: 'blocker'|'nit'|'praise', message: string, suggestion?: string}], summary: string }
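A contract is only useful if something checks it. The following is a minimal hand-rolled validator for the schema above, written without third-party libraries; `validate_review` is a hypothetical name, and a team already using a JSON Schema tool would likely express the same rules there instead.

```python
from typing import Any

VERDICTS = {"approve", "request_changes", "comment"}
SEVERITIES = {"blocker", "nit", "praise"}


def validate_review(obj: Any) -> list[str]:
    """Return a list of schema violations; an empty list means valid."""
    if not isinstance(obj, dict):
        return ["review must be a JSON object"]
    errors: list[str] = []
    if not isinstance(obj.get("file"), str):
        errors.append("file must be a string")
    verdict = obj.get("verdict")
    if not (isinstance(verdict, str) and verdict in VERDICTS):
        errors.append("verdict must be one of " + ", ".join(sorted(VERDICTS)))
    if not isinstance(obj.get("summary"), str):
        errors.append("summary must be a string")
    issues = obj.get("issues")
    if not isinstance(issues, list):
        return errors + ["issues must be a list"]
    for i, issue in enumerate(issues):
        if not isinstance(issue, dict):
            errors.append(f"issues[{i}] must be an object")
            continue
        line = issue.get("line")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(line, int) or isinstance(line, bool):
            errors.append(f"issues[{i}].line must be an integer")
        severity = issue.get("severity")
        if not (isinstance(severity, str) and severity in SEVERITIES):
            errors.append(f"issues[{i}].severity is invalid")
        if not isinstance(issue.get("message"), str):
            errors.append(f"issues[{i}].message must be a string")
        if "suggestion" in issue and not isinstance(issue["suggestion"], str):
            errors.append(f"issues[{i}].suggestion must be a string")
    return errors
```

Running this on every per-file output before posting means a malformed review is dropped or retried rather than surfacing as a broken PR comment.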
PR Comment Poster + Allowed-Tools Gate
gh pr review + explicit tool whitelist
Final workflow step reads the aggregated JSON, posts the summary with gh pr review --request-changes --body "$SUMMARY", and posts one inline comment per issue through the GitHub REST API (gh api), since gh pr review has no per-line flag. Critical: the claude -p invocation declares an explicit --allowed-tools list: Read, Grep, Glob, and a narrowly scoped Bash (no wildcard, no unscoped Bash, no Edit, no Write). CI agents never need write access to the repo; they read and report.
Configuration
--allowed-tools "Read,Grep,Glob,Bash(git diff,git log,gh pr diff)". Wildcards in CI are a red flag. They expand the blast radius of any prompt-injection in PR content.
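The rule can be checked mechanically. This sketch audits an allowed-tools string in the spelling this guide uses (comma-separated names, with Bash commands listed in parentheses); `split_tools` and `audit_allowed_tools` are hypothetical helpers, and the parsing would need adjusting if your CLI version writes tool scopes differently.

```python
WRITE_TOOLS = {"Edit", "Write"}


def split_tools(spec: str) -> list[str]:
    """Split on commas that are not inside parentheses."""
    tools: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in spec:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            tools.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tools.append("".join(current).strip())
    return [t for t in tools if t]


def audit_allowed_tools(spec: str) -> list[str]:
    """Return the problems found in an allowed-tools string (empty = OK)."""
    problems: list[str] = []
    for tool in split_tools(spec):
        name = tool.split("(", 1)[0]
        if "*" in tool:
            problems.append(f"wildcard in {tool!r}")
        if name in WRITE_TOOLS:
            problems.append(f"write tool {name!r} is not needed for review")
        if tool == "Bash":
            problems.append("Bash has no command whitelist")
    return problems
```

A CI lint step could fail the build whenever `audit_allowed_tools` returns a non-empty list for any workflow in the repo.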
Data flow
Eight steps to production
Scaffold the GitHub Actions workflow
Create .github/workflows/claude.yml. Trigger on pull_request events. Install the Claude Code GitHub App via /install-github-app (one-time, generates the OAuth token stored as secrets.CLAUDE_CODE_OAUTH_TOKEN). Checkout the PR head ref, install the CLI, then dispatch the per-file review loop.
# .github/workflows/claude.yml
name: Claude PR review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
          fetch-depth: 2
      - name: Install Claude Code CLI
        run: npm i -g @anthropic-ai/claude-code
      - name: Per-file review
        env:
          CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          GH_TOKEN: ${{ github.token }}
        run: ./.github/scripts/per-file-review.sh
Run claude -p with --output-format json per file
Loop over the changed files (gh pr diff --name-only). For each one, run claude review in headless mode (-p), with --output-format json, --allowed-tools explicit, and --custom-instructions pointing at a markdown file that holds the project's CLAUDE.md context. Capture each file's JSON output to disk; aggregate at the end.
#!/usr/bin/env bash
# .github/scripts/per-file-review.sh
set -euo pipefail
REVIEW_DIR=$(mktemp -d)
echo "files=$REVIEW_DIR" >> "$GITHUB_OUTPUT"
# Per-file independent sessions. NOT one big session.
# read -r keeps paths with spaces intact; word-splitting $(...) would break them.
gh pr diff --name-only | while IFS= read -r f; do
  echo "::group::Reviewing $f"
  out="$REVIEW_DIR/$(echo "$f" | tr '/' '_').json"
  # </dev/null stops the CLI from consuming the file list on stdin.
  # On failure, jq writes a placeholder and escapes the path safely.
  claude review pr \
    -p \
    --output-format json \
    --custom-instructions ".github/claude-context.md" \
    --allowed-tools "Read,Grep,Glob,Bash(git diff,git log,gh pr diff)" \
    --files "$f" \
    --max-turns 6 \
    </dev/null > "$out" \
    || jq -n --arg f "$f" '{file: $f, verdict: "comment", issues: [], summary: "review failed"}' > "$out"
  echo "::endgroup::"
done
# Aggregate to one file the next step parses
jq -s '.' "$REVIEW_DIR"/*.json > review-aggregate.json
Inject project context via custom_instructions
Don't pass the entire project CLAUDE.md to claude -p. Too much. Instead, create .github/claude-context.md (committed to the repo) with the CI-relevant slice: stack, code style, what counts as a blocker vs a nit, what to skip (generated files, lockfiles). The --custom-instructions flag injects this into every per-file session.
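The skip rules that go into this file (lockfiles, generated assets, build output) can also be enforced deterministically before any tokens are spent, so the model never sees those files at all. A minimal sketch, assuming the skip patterns used in this guide's example rubric; `should_skip` is a hypothetical helper, and the lockfile pattern only matches at the repo root.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns taken from the example rubric; adjust to your repo.
SKIP_GLOBS = ("pnpm-lock.yaml", "public/scenarios/*.svg")
SKIP_DIRS = {".next", "dist", "build", "node_modules"}


def should_skip(path: str) -> bool:
    """True if a changed file should be excluded from review."""
    parts = PurePosixPath(path).parts
    # Any directory component (not the file name itself) in SKIP_DIRS.
    if any(part in SKIP_DIRS for part in parts[:-1]):
        return True
    return any(fnmatchcase(path, pattern) for pattern in SKIP_GLOBS)
```

Filtering the changed-file list with this before the review loop keeps the in-prompt skip list as a second line of defence rather than the only one.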
# .github/claude-context.md (committed; loaded by --custom-instructions)
# Project review rubric
# Stack: Next.js 15 + TypeScript strict + Drizzle. We use Server Actions
# instead of API routes; named exports only; 2-space indent.
#
# Severity rubric:
# blocker = type unsafe, secret leaked, missing test for changed code,
# breaks API contract, server-action without zod validation
# nit = naming inconsistency, missing JSDoc, slightly verbose
# praise = clever simplification, good test, well-named refactor
#
# Skip (do NOT review):
# - pnpm-lock.yaml
# - public/scenarios/*.svg (generated)
# - any file under .next/, dist/, build/, node_modules/
#
# Output format reminder:
# Always emit JSON: { file, verdict, issues[], summary }.
# verdict ∈ {approve, request_changes, comment}.
# issues[].severity ∈ {blocker, nit, praise}.
Lock allowed_tools. No wildcards in CI
In CI, the agent processes content from PR authors, including authors outside your org. That content is untrusted. A wildcard --allowed-tools '*' lets a prompt injection in a PR body or commit message escalate to write access on your repo. Always declare an explicit list: Read, Grep, Glob, and a NARROW Bash whitelist. Never Edit, Write, or open-ended Bash.
# WRONG. Open invitation to prompt injection
# claude review pr -p --allowed-tools "*"
#
# WRONG. Bash with no whitelist
# claude review pr -p --allowed-tools "Read,Bash"
#
# RIGHT. Every tool explicit, Bash narrowly scoped to read-only commands
ALLOWED='Read,Grep,Glob,Bash(git diff,git log --oneline -20,gh pr diff,gh pr view)'
claude review pr -p \
  --allowed-tools "$ALLOWED" \
  --files "$f" \
  --max-turns 6 \
  --output-format json
#
# Periodically audit: grep workflows for any "*" or unscoped Bash
# rg -n 'allowed-tools.*"[^"]*\*' .github/workflows/
Parse JSON and post inline PR comments
Aggregate the per-file JSON outputs, then transform into gh pr review calls. One inline comment per issue (line + body), one summary review at the end (approve / request_changes / comment based on whether any blockers fired). The gh CLI handles the GitHub REST mechanics; you just feed it structured input.
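The transformation itself is small enough to sketch in Python, independent of how the comments are finally posted. `plan_review` is a hypothetical helper: it flattens the aggregate into one comment record per issue, each paired with its own file, and derives the overall verdict from the blocker count, following the rule described above.

```python
def plan_review(aggregate: list[dict]) -> tuple[list[dict], str]:
    """Turn per-file reviews into (inline_comments, overall_verdict)."""
    comments: list[dict] = []
    blockers = 0
    for review in aggregate:
        for issue in review.get("issues", []):
            # Each issue is attached to the file it came from.
            comments.append({
                "path": review["file"],
                "line": issue["line"],
                "body": issue["message"],
            })
            if issue["severity"] == "blocker":
                blockers += 1
    verdict = "request_changes" if blockers else "approve"
    return comments, verdict
```

Keeping this step as pure data transformation makes it easy to unit test; only the final posting step needs network access.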
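#!/usr/bin/env bash
# .github/scripts/post-comments.sh
set -euo pipefail
# Aggregate file from previous step
agg=review-aggregate.json
# gh pr review has no per-line flag, so inline comments go through the REST API.
pr=$(gh pr view --json number -q .number)
sha=$(gh pr view --json headRefOid -q .headRefOid)
# Per-issue inline comments: pair every issue with its own file
jq -c '.[] | .file as $f | .issues[] | { file: $f, line: .line, body: .message }' "$agg" \
  | while IFS= read -r row; do
      file=$(jq -r .file <<<"$row")
      line=$(jq -r .line <<<"$row")
      body=$(jq -r .body <<<"$row")
      # A line outside the diff is rejected by the API; warn instead of aborting.
      gh api --silent "repos/{owner}/{repo}/pulls/$pr/comments" \
        -f body="$body" -f commit_id="$sha" -f path="$file" -F line="$line" -f side=RIGHT \
        || echo "::warning::could not comment on $file:$line"
    done
# Summary verdict
blockers=$(jq '[.[] | .issues[] | select(.severity=="blocker")] | length' "$agg")
if [ "$blockers" -gt 0 ]; then
  gh pr review --request-changes --body "$blockers blocker(s) found. See inline comments."
else
  gh pr review --approve --body "LGTM. Inline nits/praise where applicable."
fi
Cap concurrency and add a per-PR cost budget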
A plain per-file loop is sequential; fanning it out in parallel cuts wall-clock time. Uncapped parallelism, though, can overload the runner, hit API rate limits, and stack up token spend. Cap at ~4 parallel files. Add a per-PR token budget (an env var checked before the loop starts) that skips the review when the PR has too many files for the budget. Cost predictability beats marginal latency wins.
# In per-file-review.sh. Per-PR token budget guard (run before the loop)
TOKEN_BUDGET=${CLAUDE_PR_TOKEN_BUDGET:-200000}
file_count=$(gh pr diff --name-only | wc -l)
if [ "$file_count" -eq 0 ]; then
  echo "No changed files; nothing to review"
  exit 0
fi
per_file=$(( TOKEN_BUDGET / file_count ))  # budget share per file
if [ "$per_file" -lt 5000 ]; then
  echo "::warning::PR has too many files for budget $TOKEN_BUDGET; consider splitting"
  exit 0
fi
# Bounded concurrency via xargs -P. REVIEW_DIR is exported so child shells see it.
export REVIEW_DIR
gh pr diff --name-only \
  | xargs -P 4 -I {} bash -c '
      file="$1"
      out="$REVIEW_DIR/$(echo "$file" | tr / _).json"
      claude review pr -p --output-format json --files "$file" \
        --custom-instructions ".github/claude-context.md" \
        --allowed-tools "Read,Grep,Glob,Bash(git diff,git log,gh pr diff)" \
        --max-turns 6 > "$out"
    ' _ {}
Use Batch API for nightly audits, sync API for blocking review
PR review is blocking. The developer is waiting; sync API is the right call. But you also want a nightly audit pass (drift detection, security regression scan) that doesn't need to finish in minutes. That's where the Batch API earns its 50% discount: submit overnight, results in 24h, review the next morning. Two different APIs for two different latency budgets.
# Sync API. Pre-merge blocking review (latency matters)
# (this is what the per-file-review.sh script above uses)
# Batch API. Overnight audit job (latency doesn't matter, cost does)
import re

import anthropic

client = anthropic.Anthropic()
# Build a batch from yesterday's merged PRs.
# get_merged_prs is a placeholder for your own helper; it is not defined here.
requests = []
for pr in get_merged_prs(since="24h"):
    for f in pr.changed_files:
        # Assumption: custom_id accepts only letters, digits, "_" and "-",
        # up to 64 characters, so path separators and dots are replaced.
        # Very long paths may need a hash suffix to stay unique.
        safe_path = re.sub(r"[^A-Za-z0-9_-]", "_", f.path)
        requests.append({
            "custom_id": f"audit-{pr.number}-{safe_path}"[:64],
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": f.diff}],
            },
        })
batch = client.messages.batches.create(requests=requests)
print(f"Submitted batch {batch.id}; results are expected within 24h")
# Tomorrow morning, fetch results, write to a Slack #audit-drift channel.
Add a CI cost-guard hook + alerting
Register a PreToolUse hook for the claude -p session that aborts the review if the PR is over a token-budget threshold (e.g. >200K tokens of diff). This protects you from a runaway 10-million-line PR exhausting your monthly Claude budget in one CI run. Pair with a workflow alert to a Slack channel when the hook fires.
# .claude/hooks/ci_cost_guard.py
import json
import subprocess
import sys

MAX_TOKENS_PER_PR = 200_000


def main():
    # The hook receives a JSON payload on stdin describing the pending tool call.
    payload = json.loads(sys.stdin.read())
    if payload.get("tool_name") != "claude_review_pr":
        sys.exit(0)
    diff = subprocess.check_output(["gh", "pr", "diff"], text=True)
    estimated = len(diff) // 4  # ~4 chars per token rule of thumb
    if estimated > MAX_TOKENS_PER_PR:
        print(
            f"PR exceeds token budget: ~{estimated} tokens > "
            f"{MAX_TOKENS_PER_PR}. Split the PR or raise the limit "
            f"deliberately.",
            file=sys.stderr,
        )
        sys.exit(2)  # DENY
    sys.exit(0)


# Without this guard the script defines main() but never runs it.
if __name__ == "__main__":
    main()
The four decisions
| Decision | Right answer | Wrong answer | Why |
|---|---|---|---|
| Reviewing a 14-file PR | Per-file independent claude -p sessions, aggregated at the end | One claude -p session reviewing all 14 files together | Single-session review hits lost-in-the-middle by file 8-10; conventions established on file 3 drop out of attention by file 14. Per-file isolation eliminates that failure mode entirely. |
| Output format from claude -p in CI | --output-format json (parsed with jq → gh pr review) | Free-form prose, parsed downstream with regex | Regex over prose is brittle: every Claude wording change breaks the pipeline. A JSON contract is stable, and the workflow can validate each output against the schema before posting. |
| Tool access in CI | Explicit --allowed-tools list (Read, Grep, Glob, narrow Bash) | Wildcard --allowed-tools '*' or unscoped Bash | PR content is untrusted (external contributors, prompt injection vectors). Wildcards expand the blast radius; explicit whitelists cap it. CI agents never need write access. They read and report. |
| Pre-merge review vs nightly audit | Sync API for blocking pre-merge; Batch API for non-blocking overnight | Use one API for both | Pre-merge review is latency-critical (developer waiting); sync API is right. Nightly audit is cost-sensitive but not latency-bound; Batch API at 50% discount earns its keep. Two budgets, two APIs. |
Where it breaks
Five failure pairs. Each one is one exam question. The fix is always architectural: deterministic gates, structured fields, pinned state.
Workflow runs ONE claude -p over all 14 changed files. By file 14, lost-in-the-middle has dropped the conventions established on file 3. Inline comments on file 14 contradict the comments on file 3.
Per-file independent sessions. Loop the 14 files, run one claude -p invocation per file, aggregate the JSON outputs at the end. Each file gets a fresh, focused context.
Workflow invokes claude review without -p. The CLI starts in interactive mode, waits for input, and the GitHub Actions runner times out after 6 hours with no review posted.
AP-16: Always pass -p for non-interactive headless execution. CI hangs are silent failures; the -p flag is what makes Claude Code CI-safe in the first place.
Workflow asks Claude to 'review the file and post inline comments'. Output is free-form prose. Next step uses regex to extract issues. Every Claude phrasing change breaks the regex; the workflow silently posts nothing.
AP-17: Always pass --output-format json. The output is a structured contract: { file, verdict, issues[], summary }. Workflow parses with jq, posts inline comments through the GitHub REST API (gh api), and posts the summary via gh pr review.
Workflow uses --allowed-tools '*' for convenience. A prompt-injection in a PR description tricks the agent into running Bash(rm -rf .) or writing a malicious file. Repo state corrupted; PR reviewer can't tell what happened.
Always declare an explicit allowed-tools list: Read,Grep,Glob,Bash(git diff,git log,gh pr diff). CI agents never need write tools. Wildcards expand the prompt-injection blast radius; explicit lists cap it.
claude -p runs without --custom-instructions. Claude doesn't know the project's stack, code style, or what counts as a blocker. Reviews are generic: for example, it flags style-correct code with 'should use named exports' in a default-exports codebase.
Commit .github/claude-context.md with the CI-relevant slice of CLAUDE.md (stack, severity rubric, files to skip). Pass it via --custom-instructions .github/claude-context.md on every claude -p invocation.
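Several of these failures come down to a missing flag, which a lint can catch before the workflow ever runs. The sketch below checks one command line for the flags this guide relies on. The flag spellings are taken from this guide as written and are not verified against any particular CLI version; `lint_command` is a hypothetical helper.

```python
import shlex
from typing import Optional


def _value_of(argv: list[str], flag: str) -> Optional[str]:
    """Return the argument that follows flag, or None if absent."""
    if flag in argv:
        i = argv.index(flag)
        if i + 1 < len(argv):
            return argv[i + 1]
    return None


def lint_command(command: str) -> list[str]:
    """Return the CI-safety problems found in one review command line."""
    argv = shlex.split(command)
    problems: list[str] = []
    if "-p" not in argv:
        problems.append("missing -p: the CLI would wait for interactive input")
    if _value_of(argv, "--output-format") != "json":
        problems.append("missing --output-format json")
    tools = _value_of(argv, "--allowed-tools")
    if tools is None:
        problems.append("missing --allowed-tools")
    elif "*" in tools:
        problems.append("wildcard in --allowed-tools")
    if _value_of(argv, "--custom-instructions") is None:
        problems.append("missing --custom-instructions")
    return problems
```

Run over every command in the workflow scripts, an empty result for each line is a cheap build-time gate for four of the five failure modes above; single-session review still has to be checked by reading the loop.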
Cost & latency
8 files × ~10K input tokens (file diff + context-md) + ~2K output (JSON verdict) ≈ $0.04-0.10 + parallel overhead. Cap at ~$0.15 with the cost-guard hook to bound runaway 50-file PRs.
50 PRs × 8 files × ~5K tokens (audit-only, lighter prompt) at Batch API 50% discount. Result ready next morning; no developer waiting.
.github/claude-context.md is stable across all per-file invocations within a single PR. Mark it cache_control: ephemeral; 5-min TTL keeps it warm across files in one workflow run.
Per-file ~8-15s × 4 parallel = ~30s + JSON aggregation + gh pr review posts. Fast enough that the developer doesn't context-switch away while waiting.
A 200-file PR (rare but real, e.g. lockfile bumps) without the hook would burn ~$3-6 in one workflow run. The hook denies before the loop starts; cost reverts to $0.
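The arithmetic behind these estimates is simple enough to keep in a script and rerun when prices or file counts change. In this sketch the per-million-token prices are parameters, and the values in the usage note are placeholders, not real pricing; the 0.5 discount corresponds to the Batch API discount described above.

```python
def estimate_pr_cost(
    files: int,
    input_tokens_per_file: int,
    output_tokens_per_file: int,
    usd_per_mtok_in: float,
    usd_per_mtok_out: float,
    discount: float = 0.0,
) -> float:
    """Estimated USD cost of reviewing one PR.

    discount is a fraction in [0, 1]; 0.5 models a 50% batch discount.
    """
    input_cost = files * input_tokens_per_file * usd_per_mtok_in / 1_000_000
    output_cost = files * output_tokens_per_file * usd_per_mtok_out / 1_000_000
    return (input_cost + output_cost) * (1.0 - discount)
```

For example, `estimate_pr_cost(8, 10_000, 2_000, usd_per_mtok_in=1.0, usd_per_mtok_out=5.0)` prices an 8-file PR at placeholder rates; substitute the current published prices for your model before relying on the number.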
Ship checklist
Two passes. Build-time gates verify the code; run-time gates verify the system in production.
Build-time
- .github/workflows/claude.yml exists; triggers on pull_request opened+synchronize
- Claude Code GitHub App installed; CLAUDE_CODE_OAUTH_TOKEN in repo secrets
- Workflow runs claude review with -p (headless) on every invocation
- Per-file loop: each changed file in its own claude -p session
- --output-format json on every invocation; jq parses the aggregate
- --allowed-tools is an explicit list (no wildcards, no Edit/Write)
- .github/claude-context.md committed; passed via --custom-instructions
- Concurrency capped (xargs -P 4 or simple JS semaphore)
- PreToolUse cost-guard hook denies PRs over the token budget
- Nightly audit job uses Batch API (50% discount, 24h SLA)
- Inline PR comments posted via gh api (one per issue) + summary verdict via gh pr review
Run-time
- .github/workflows/claude.yml committed and triggers on pull_request
- Smoke test: open a 1-file PR; review posts within 60s with structured comments
- Stress test: open a 30-file PR; concurrency cap prevents runner exhaustion
- Cost-guard test: open a synthetic 200-file PR; hook denies before loop starts
- Schema lint: validate every JSON output against the file-review schema
- Permissions audit: workflow permissions: block is contents:read + pull-requests:write only
- Tools audit: no --allowed-tools '*' anywhere in .github/workflows/
- Nightly audit Batch API job runs and posts to #audit-drift on completion
Frequently asked
Is claude -p the same as the SDK?
claude -p is for shell-driven workflows (CI, cron, dev scripts) where the input is a prompt and the output is a structured response. CI workflows almost always want -p; bespoke automation almost always wants the SDK.
What's the maximum file count per PR before this approach breaks down?
Can I run claude -p on the same PR every time it's updated?
on: pull_request: types: [opened, synchronize]. Synchronize fires on every push to the PR head ref. Idempotency: each run reviews the *current* diff, so old comments stay until they're stale; if you want to dismiss outdated reviews, add a gh pr review --dismiss step that targets reviews on commits no longer at HEAD.
How do I keep the cost predictable if the team merges 100+ PRs a day?
xargs -P 4 or JS semaphore). Bounds parallel Claude API calls in flight. With all three, monthly cost variance stays inside ±10%.
Should the CI workflow have write access to the repo?
permissions: block: contents: read, pull-requests: write. The CI agent reads code, runs gh pr diff, posts comments. It never needs to push commits. Same logic that bans write tools in --allowed-tools applies at the GitHub permission layer.
What's in `.github/claude-context.md` vs the project's main CLAUDE.md?
.github/claude-context.md is the trimmed CI rubric: stack one-liner, severity definitions, files to skip, output-format reminder (~40-80 lines). Smaller context = faster review + cheaper tokens.
Can I run claude -p reviews with custom Skills?
.claude/skills/code-reviewer/SKILL.md with the team's review rubric, allowed tools, and output schema. The CI workflow invokes the Skill explicitly: claude review pr -p --skill code-reviewer .... Skills are version-controlled (live in the repo), so the rubric evolves with the codebase and CI behaviour stays in sync.