What does an architecture contract actually contain?

Four parts: a layered dependency graph (which layer may import which), naming conventions (route files, schema files, test files), the authorized refactor surface (directories the agent may modify), and the unauthorized zones (generated code, secret boundaries, vendor lockfiles). A useful contract fits in 30 lines. It is not a manifesto; it is a structured prefix the agent reads at the start of every session.

Why pass the contract as a system-prompt prefix instead of relying on CLAUDE.md?

Both work for the parent session; only the prefix works for subagents and remote agents. CLAUDE.md is a parent-session filesystem read (see /knowledge/subagent-claude-md-inheritance for the inheritance failure mode). Subagents start with empty context. A contract injected as a system-prompt prefix travels wherever the prompt travels; CLAUDE.md does not. For multi-agent pipelines, the prefix is the only reliable propagation mechanism.

What is scoped refactor authorization and why does it matter?

A declaration of which directories the agent may modify and which abstraction layers it may touch. Without it, the most common production failure is the agent refactoring 30 files to fix a one-file bug. With it, attempts to write outside the scope are rejected by a tool-call filter or a hook. The fix is structural, not behavioral; you cannot rely on the agent honoring a polite request to stay in scope when the most efficient path to passing the test crosses the boundary.

What is plan-confirm-execute and how does it map to Plan Mode?

Plan-confirm-execute is the orchestration pattern. The agent emits a structured plan (files to touch, abstractions to introduce, public-API changes). A human or orchestrator approves. Only then does the agent execute. Plan Mode is Claude Code's built-in implementation of the pattern. Per the /concepts/plan-mode page in the vault: 'The plan-mode loop has four decision points. Explore, Propose, Decide, Execute. Each phase is fully visible; you can exit, iterate, or reject without partial damage.'

When is Plan Mode overkill?

Single-file bugfixes with a clear stack trace; rename refactors driven by an IDE; mechanical codemods. Per the vault: 'The trigger threshold is 3+ steps involving architectural or design decisions. A single-file bugfix with a clear stack trace does not need plan mode.' Test: can you confidently describe the change without exploring the codebase first? If yes, skip Plan Mode. If no, require it.

How does retrieval over codebase embeddings help with architecture awareness?

It keeps the agent aware of cross-file contracts without dumping the whole repo into the context window. A retrieval query against an embedding index returns the 5-10 most relevant snippets for a given change. The agent sees the contract surface (public API, types, callers) without loading the implementations. Saves tokens, preserves attention budget, and prevents the 'agent invents a function that already exists' failure.

What is an architecture linter and which one should I use?

A static tool that checks code against declared module-boundary rules. Examples: Deptrac (PHP), ArchUnit (Java/Kotlin), dependency-cruiser (TypeScript), import-linter (Python). The pattern is the same across all of them: declare layers, declare allowed imports, fail CI on violations. The linter runs after the agent commits, catches drift, and gives the orchestrator a deterministic feedback signal it can pass back to the agent for repair.

What is the single highest-leverage step?

Plan-confirm-execute. The contract sets the rules, but the gate is where rules get enforced before the diff exists. A team that ships only the gate, with no formal contract, still avoids the worst failure mode (silent over-refactor). A team that ships only the contract, with no gate, ends up arguing with a finished diff every time.

How does this show up on the CCA-F exam?

Across three domains. D1 (Agentic Architectures, 27%) tests the orchestrator-subagent topology - plan-confirm-execute is a named pattern. D2 (Tool Design, 18%) tests the authorization scope you grant to tool calls - refactor authorization is the worked example. D5 (Context + Reliability, 15%) tests retrieval and memory-injected context. Stem pattern: 'A coding agent produces a refactor that violates module boundaries. Which workflow control would have prevented it?' Distractor: 'Use a more capable model.'

Can I get away with hooks instead of plan mode?

Partially. Hooks are reactive enforcement (PreToolUse denies an action; PostToolUse normalizes a result). They can block out-of-scope writes deterministically. But hooks fire on each tool call; they cannot evaluate a multi-step plan as a whole. The strongest stacks use both: Plan Mode for architectural decisions, hooks for per-call enforcement. Per the /concepts/system-prompts page: 'Deterministic enforcement beats prompt guidance for high-stakes decisions.'

How do I keep the architecture contract from drifting from reality?

Generate it from source. The dependency graph comes from the build system, not from documentation. The naming conventions come from a linter config, not from a README. The authorized refactor surface is a real file (a YAML manifest the orchestrator reads). When the build changes, the contract changes; when the contract changes, the agent's instructions change. No drift, because there is only one source.
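
As a rough illustration of "one source", the sketch below renders a prompt prefix from a manifest and prepends it to any task prompt. The manifest is shown as a plain Python dict rather than parsed YAML to keep the example dependency-free, and its key names (`layers`, `naming`, `authorized`, `read_only`) and both function names are hypothetical, not taken from any real tool.

```python
def render_contract(manifest: dict) -> str:
    """Render a manifest dict into the plain-text prefix injected into every prompt."""
    lines = ["# Architecture contract", "", "Layers (allowed imports flow top-to-bottom):"]
    for layer, targets in manifest["layers"].items():
        # An empty target list marks a layer that may not import any other layer.
        rule = "may import " + ", ".join(targets) if targets else "pure, no outward imports"
        lines.append(f"  {layer:<12}-> {rule}")
    sections = [
        ("Naming:", "naming"),
        ("Authorized refactor surface:", "authorized"),
        ("Unauthorized (read-only):", "read_only"),
    ]
    for title, key in sections:
        lines += ["", title] + [f"  - {item}" for item in manifest[key]]
    return "\n".join(lines)


def with_contract(manifest: dict, task_prompt: str) -> str:
    """Prefix a prompt (parent, subagent, or remote agent) with the same contract."""
    return render_contract(manifest) + "\n\n" + task_prompt


# Hypothetical manifest; in practice these values would be derived from the
# build graph and lint config rather than typed by hand.
MANIFEST = {
    "layers": {"app/": ["components/", "lib/"], "components/": ["lib/"], "lib/": []},
    "naming": ["Tests: *.test.ts colocated with source"],
    "authorized": ["components/templates/*"],
    "read_only": ["convex/_generated/"],
}
```

Because every prompt goes through `with_contract`, regenerating the manifest is the only step needed to update what each agent sees.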

What is the biggest mistake teams make when they first try this?

Treating the contract as documentation instead of as input. A README that says 'follow our architecture' does not reach the agent. The contract must be machine-readable, injected into the prompt, enforced at the gate, and validated in CI. Four touchpoints, not one. The teams that ship documentation and hope for the best are the same teams who report 'the agent does not respect our conventions' two weeks later.

Architecture-Aware, Refactor-Authorized Workflows: Getting Useful Results from AI Coding Agents

01 · TLDR

The short version

AI coding agents do not respect what you do not tell them. Five controls turn an unreliable agent into a useful one: an architecture contract injected as a system-prompt prefix, scoped refactor authorization, a plan-confirm-execute handshake, retrieval over codebase embeddings, and CI validation against an architecture linter. Plan Mode is Claude Code's implementation of the handshake. The pattern maps to D1 (orchestration topology), D2 (tool authorization), and D5 (memory and reliability). The exam distractor to reject is always "use a more capable model" - the defect is workflow scope, not capability.

02 · Why this matters in production

The over-refactor failure mode

A common production scene: the user asks the agent to fix a single failing test. Twenty minutes later the agent returns a PR that rewrites the service the test lives in, introduces a new abstraction it did not need, and renames a public export three callers depend on. The original test passes. Everything else is on fire. The bug is not the model. The bug is that the agent was authorized to touch every file and was never told what the architecture is, so it picked the shortest path to green tests - which crossed every boundary in the codebase.

Per the /concepts/plan-mode page in the vault: "Direct execution starts modifying files immediately on a clear request. Plan mode is for ambiguous or multi-faceted requests where there are 3+ viable architectures with tradeoffs. Without upfront analysis, rework is inevitable." The over-refactor is the rework. The fix is structural - you cannot prompt your way out of a workflow that lets the agent ship a 30-file diff before a human sees the plan.

03 · The mechanics

Five controls that compound, not five tactics to pick from

1. Architecture contract. A short, machine-readable declaration of layers, naming, and the authorized refactor surface. It lives as a system-prompt prefix so it travels to subagents (which do not inherit CLAUDE.md - see /knowledge/subagent-claude-md-inheritance). A useful contract fits in 30 lines.

# Architecture contract

Layers (allowed imports flow top-to-bottom):
  app/        -> may import components/, lib/, convex/_generated
  components/ -> may import lib/, convex/_generated
  convex/     -> may import lib/
  lib/        -> pure, no Next.js or Convex imports

Naming:
  - Route files: app/.../page.tsx
  - Convex schema: convex/schema.ts
  - Tests: *.test.ts colocated with source

Authorized refactor surface:
  - components/templates/*
  - lib/markup.tsx

Unauthorized (read-only):
  - convex/_generated/
  - .env.local, *.env

2. Scoped refactor authorization. The authorized surface is not just documentation; it is enforced at the tool layer. A PreToolUse hook reads the Write/Edit tool call, checks the path against the authorized list, and exits 0 to allow or 2 to deny, with a message routed back to the model. Per the /scenarios/agentic-tool-design pattern in the vault: "The single-most-effective lever for converting probabilistic prompt-only policies into 100%-deterministic gates."
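
A minimal sketch of the decision logic such a hook might run, using the paths from the contract above. It assumes the hook receives the tool call as a JSON object with a `tool_name` and a repo-relative `tool_input.file_path`; those field names, the glob lists, and the `decide` function are illustrative assumptions, so check them against your hook runtime before relying on them.

```python
import fnmatch
import posixpath

# Hypothetical lists mirroring the contract; a real hook would load them
# from the manifest the orchestrator reads.
AUTHORIZED = ["components/templates/*", "lib/markup.tsx"]
READ_ONLY = ["convex/_generated/*", ".env.local", "*.env"]
GUARDED_TOOLS = {"Write", "Edit"}


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message): 0 allows the tool call, 2 denies it."""
    if event.get("tool_name") not in GUARDED_TOOLS:
        return 0, ""  # only file-modifying tools are scoped
    raw = event.get("tool_input", {}).get("file_path", "")
    # Normalize so "a/../b" style traversal cannot dodge the pattern check.
    path = posixpath.normpath(raw)
    if posixpath.isabs(path) or path == ".." or path.startswith("../"):
        return 2, f"Denied: {raw} is not a repo-relative path inside the repository."
    if any(fnmatch.fnmatchcase(path, pat) for pat in READ_ONLY):
        return 2, f"Denied: {path} is read-only under the architecture contract."
    if any(fnmatch.fnmatchcase(path, pat) for pat in AUTHORIZED):
        return 0, ""
    return 2, f"Denied: {path} is outside the authorized refactor surface."
```

A hook script would parse stdin into `event`, call `decide`, print the message to stderr, and exit with the returned code. Checking the read-only list before the authorized list means a deny rule always wins when the two overlap.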

3. Plan-confirm-execute. The agent emits a structured plan first (files to touch, abstractions to introduce, public-API changes). A human or orchestrator approves. Only then does the agent write code. Per the /concepts/plan-mode page: "The plan-mode loop has four decision points. Explore: Claude reads the codebase, writes a diagnosis. Propose: Claude offers 2-3 options with explicit tradeoffs. Decide: you approve, request changes, or reject. Execute: only after approval, Claude modifies files and runs tests."
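
For orchestrators that implement the handshake themselves, the gate can be as small as the sketch below. It is not how Plan Mode is implemented; the `Plan` fields follow the three plan elements named above, and every class and function name is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    files_to_touch: list[str]
    new_abstractions: list[str] = field(default_factory=list)
    public_api_changes: list[str] = field(default_factory=list)
    approved: bool = False


class PlanNotApproved(Exception):
    """Raised when execution is attempted before the confirm step."""


def confirm(plan: Plan, approver) -> Plan:
    """Ask the approver (a human prompt or an orchestrator policy) for a decision."""
    plan.approved = bool(approver(plan))
    return plan


def execute(plan: Plan, apply_change) -> list:
    """Apply the change to each planned file, but only after approval."""
    if not plan.approved:
        raise PlanNotApproved("plan must be approved before any file is modified")
    return [apply_change(path) for path in plan.files_to_touch]


def no_public_api_changes(plan: Plan) -> bool:
    """Example orchestrator policy: reject any plan that changes a public API."""
    return not plan.public_api_changes
```

The point of the structure is that `execute` has no code path that skips the approval check, so a rejected plan never produces a partial diff.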

4. Retrieval over codebase embeddings. An embedding index over the repo. When the agent needs to understand a function it has not seen, it queries the index instead of dumping the whole file into context. Returns 5-10 relevant snippets - the contract surface, not the implementation. Saves tokens, preserves attention, prevents the "invented a function that already exists" failure mode. Memory-injected context is the D5 reliability lever.
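
The retrieval step can be sketched as a top-k lookup by cosine similarity. The `embed` function here is a toy word-count vector standing in for a real embedding model, and the snippet ids and text are hypothetical; only the index-then-rank shape is the point.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(snippets: dict[str, str]) -> dict[str, Counter]:
    """Map each snippet id (for example 'path:symbol') to its vector."""
    return {sid: embed(text) for sid, text in snippets.items()}


def retrieve(index: dict[str, Counter], query: str, k: int = 5) -> list[str]:
    """Return the ids of the k snippets most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda sid: cosine(index[sid], q), reverse=True)
    return ranked[:k]


# Hypothetical contract-surface snippets: signatures and short descriptions,
# not full implementations.
INDEX = build_index({
    "lib/markup.tsx:renderMarkup": "export function renderMarkup(source: string): ReactNode render markup to react nodes",
    "lib/dates.ts:formatDate": "export function formatDate(d: Date): string format a date for display",
    "convex/schema.ts:schema": "export default defineSchema tables users posts",
})
```

Indexing signatures and doc comments rather than function bodies is what keeps the returned snippets small enough to fit alongside the contract prefix.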

5. Architecture linter in CI. The contract is also a linter config. Dependency-cruiser, ArchUnit, Deptrac, import-linter - pick the one for your language. The linter fails the build on boundary violations. The orchestrator picks up the failure, feeds it back to the agent, and the agent fixes the drift in the next iteration. Deterministic enforcement at the CI boundary catches what the gate at the prompt layer missed.
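
The tools named above each have their own rule syntax; the language-neutral sketch below shows only the shared pattern (declare layers, declare allowed imports, report violations) using the layer rules from the contract above. The `ALLOWED` table, `layer_of`, and `violations` are illustrative names, and the import edges are assumed to come from the build graph.

```python
from typing import Optional

# Layer rules transcribed from the contract: layer -> layers it may import.
ALLOWED = {
    "app": {"components", "lib", "convex/_generated"},
    "components": {"lib", "convex/_generated"},
    "convex": {"lib"},
    "lib": set(),
}


def layer_of(path: str) -> Optional[str]:
    """Return the most specific contract layer that contains the path, if any."""
    known = set(ALLOWED) | {t for targets in ALLOWED.values() for t in targets}
    matches = [layer for layer in known if path == layer or path.startswith(layer + "/")]
    return max(matches, key=len) if matches else None


def violations(edges: list[tuple[str, str]]) -> list[str]:
    """Check (importing file, imported file) pairs against the layer rules."""
    found = []
    for src, dst in edges:
        src_layer, dst_layer = layer_of(src), layer_of(dst)
        if src_layer is None or dst_layer is None or src_layer == dst_layer:
            continue  # outside the contract, or an import within one layer
        if dst_layer not in ALLOWED.get(src_layer, set()):
            found.append(f"{src} ({src_layer}) must not import {dst} ({dst_layer})")
    return found
```

In CI, a non-empty result would fail the build, and the violation strings are the deterministic feedback the orchestrator passes back to the agent.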

The controls compound. The contract sets the rules. The scoped authorization enforces them per tool call. The handshake gates the plan. Retrieval keeps the agent informed. The CI linter catches everything else. Skip any one and the others get exercised harder; skip two and the failure mode you were trying to prevent starts shipping again.

04 · Decision rule and checklist

Seven steps to a useful coding-agent workflow

1. Generate the architecture contract from source. Build-system graph, lint config, real file paths. Not a README. Not a wiki page.
2. Inject the contract as a system-prompt prefix. Not as a CLAUDE.md hint; subagents do not see it. Prefix travels everywhere the prompt goes.
3. Wire a PreToolUse hook for scoped refactor authorization. Match on Write/Edit; check the path against the authorized list; deny with a clear message routed back to the model.
4. Require Plan Mode for any change of 3+ steps or with architectural impact. Skip it for trivial edits. The bar is "can I describe the change without exploring the codebase first"; if no, require plan.
5. Stand up an embedding index over the repo. Refresh on every merge. Expose as a retrieval tool the agent can call before reading files.
6. Add an architecture linter to CI. Same rules as the contract; failure on boundary violations; output piped back to the orchestrator.
7. Track over-refactor rate as a workflow KPI. Count diffs that touch more files than the plan declared. Trend it. If it climbs, your gate is leaking.
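
The over-refactor KPI is simple arithmetic once each change records what the plan declared and what the diff touched. A minimal sketch, with a hypothetical `ChangeRecord` shape that your orchestrator would need to populate from its own plan and diff data:

```python
from dataclasses import dataclass


@dataclass
class ChangeRecord:
    planned_files: set[str]  # files the approved plan declared
    touched_files: set[str]  # files the resulting diff actually modified


def is_over_refactor(rec: ChangeRecord) -> bool:
    """True when the diff touched more files than the plan declared."""
    return len(rec.touched_files) > len(rec.planned_files)


def undeclared_files(rec: ChangeRecord) -> set[str]:
    """Files in the diff that the plan never mentioned, useful as feedback."""
    return rec.touched_files - rec.planned_files


def over_refactor_rate(records: list[ChangeRecord]) -> float:
    """Share of changes that were over-refactors; 0.0 when there are no records."""
    if not records:
        return 0.0
    return sum(is_over_refactor(r) for r in records) / len(records)
```

A stricter variant would flag any change where `undeclared_files` is non-empty, which also catches diffs that swap one file for another without growing the count.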

05 · Common anti-patterns

Five recurring failures

Documentation-as-contract. A README that says "follow our architecture." Cause: confusing human-readable intent with machine-readable input. Fix: a YAML/JSON manifest the orchestrator reads and injects.
Hopeful scoping. The system prompt politely asks the agent to stay inside components/. Cause: no enforcement. Fix: a PreToolUse hook checking the path on every Write/Edit.
Plan Mode for everything or for nothing. Either every trivial edit blocks on approval, or nothing does. Cause: missing trigger heuristic. Fix: the 3-step threshold and the "can I describe it cold" test.
Codebase context dumped wholesale. The agent loads ten 2KB files to find one function signature. Cause: no retrieval layer. Fix: an embedding index that returns just the contract surface.
No CI gate. The contract lives in the prompt only; nothing catches drift after the merge. Cause: treating the prompt as the last line of defense. Fix: an architecture linter that fails the build deterministically.

06 · CCA-F exam mapping

How this shows up on the exam

Domains: D1 Agentic Architectures (27%) · D2 Tool Design (18%) · D5 Context + Reliability (15%)
What is tested: Whether you reach for a workflow control when the symptom is an over-refactor or boundary violation. The exam expects scope and handshake answers, not model-capability or window-size answers.
Stem pattern: A coding agent produces a refactor that violates module boundaries. Which single workflow control would have prevented it?
Distractor to reject: "Use a more capable model." The defect is workflow scope, not model capability. Model vs Design heuristic per ACP-T03 §6.
Second distractor: "Add more detailed prompt instructions." Prompt-only policies are probabilistic; deterministic enforcement (hooks, gates, linters) beats prompt guidance for high-stakes decisions.
Third distractor: "Increase the context window so the agent sees more files." Larger windows do not introduce module boundaries; retrieval over the contract surface does.

07 · Sources

Vault and external references

Vault: data/aeo/reports/2026-05-17-recommendations.md §Signal for architecture-aware agentic workflows - source of the five-control framing.
Vault: data/aeo/reports/2026-05-16-recommendations.md §Signal - earliest formulation of the architecture-aware-AI-coding-agent recommendation.
Vault: public/concepts/plan-mode.md §What it is / §How it works - authoritative Plan Mode four-phase loop and 3-step trigger threshold.
Vault: public/scenarios/agentic-tool-design.md §PreToolUse Hook - "The single-most-effective lever for converting probabilistic prompt-only policies into 100%-deterministic gates."
Vault: public/concepts/system-prompts.md §How it works - deterministic enforcement beats prompt guidance; two-layer enforcement model.
Vault: 06-knowledge/digital-marketing/aeo-geo/gregisenberg/aeg-k0755-be-a-10x-vibe-coder-claude-code-cursor-mcp.md - external validation of the "force AI to outline steps via Plan Mode before executing" workflow tactic.
Vault: 06-knowledge/coding/cod-k16-ohmyopenagent-architecture-review.md - architecture-review pattern with LSP + AST-grep + TDD verification feeding the refactor loop.
Vault: 02-tasks/acp-t08-route-content-and-design-specs.md §Template B v3 - canonical IA sequence: user request → plan mode → approval → skill invocation → tool calls → validation.
