# Claude Architect Certification Prep — Exam-Critical Subset --- --- > 25 technical-primitive concept pages — the minimum corpus to answer CCA-F prep questions. Skip scenarios + knowledge for token efficiency. --- --- Canonical pages: https://claudearchitectcertification.com/concepts --- Full corpus: https://claudearchitectcertification.com/llms-full.txt (concepts + scenarios + knowledge) --- Per-page twins: https://claudearchitectcertification.com/concepts/{slug}.md --- # Agentic Loops > An agentic loop turns one API response into a controlled, multi-turn system. The harness reads stop_reason, dispatches tools, appends observations, and decides whether to continue. The common failure is treating natural-language text as the termination signal. **Domain:** D1 · Agentic Architectures (27% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/agentic-loops **Last reviewed:** 2026-05-04 ## Quick stats - **Loop stages:** 4 - **Exam domain:** D1 - **Core exit lines:** 3 - **Text parsing tolerance:** 0 - **Scenario links:** 6 ## What it is A single call is one prompt and one response. An agentic loop is a harness pattern that repeats the call after every tool result. The user experience feels like one agent, but the engineering surface is a loop, a message list, and a termination check. ## Exam-pattern questions ### Q1. Your agentic loop keeps running after Claude has clearly finished its task. Which control was likely missed? The harness was checking content shape (text presence) instead of stop_reason. The most-tested distractor is "add a max_iterations cap". The right answer is "branch the loop on stop_reason 'end_turn'", structural field, never natural language. ### Q2. A subagent returns wrong facts about a customer the coordinator clearly mentioned earlier. Why? The coordinator forgot that subagents do not inherit conversation history. The distractor answer says "the subagent's max_tokens was too low". The actual fix is to pass every needed fact in the subagent's task string explicitly. ### Q3. An agent loop hits a budget ceiling and you keep raising max_iterations to make it pass. What is the real problem? max_iterations is a safety cap, not the primary exit condition. Repeatedly raising it masks the underlying bug, usually a missing tool_result append, ambiguous tool descriptions, or two tools the model alternates between. ### Q4. Your refund agent emits the words "I'm processing your refund now" but then makes another tool call. The harness exits early. Why? The harness is parsing the response text for completion phrases. Claude can return text and a tool_use block in the same response; only stop_reason is authoritative. The fix is to ignore the text and read stop_reason: "tool_use" to continue the loop. ### Q5. Long-document extraction stalls at chunk 18 every single time, regardless of model. What is happening? By turn 17 the message list contains 17 turns of tool_use plus tool_result blocks; turn 18 hits stop_reason: "max_tokens". The fix is context windowing: summarize old turns into a facts block, drop the verbose history, continue. Raising max_tokens does not fix it past a certain point. ### Q6. Your code-review CI bot returns valid JSON with empty findings: [] even on PRs that clearly have issues. Why? The bot is treating the presence of any text response as success. It hit stop_reason: "max_tokens" mid-review, but the code did not branch on that signal, so it serialized whatever was in the buffer, which happened to be empty. 
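A minimal sketch of that branch, assuming the official anthropic Python SDK (the model id and payload field names are illustrative):

```python
import anthropic

client = anthropic.Anthropic()

def run_review(messages: list, tools: list) -> dict:
    """Treat stop_reason == "max_tokens" as a partial result, not as success."""
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model id
        max_tokens=4096,
        messages=messages,
        tools=tools,
    )
    # Collect whatever text was generated before the cutoff.
    text = "".join(b.text for b in response.content if b.type == "text")
    return {
        "review_text": text,
        # Branch on the structural signal, never on whether the text looks complete.
        "truncated": response.stop_reason == "max_tokens",
    }
```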
The fix is to handle max_tokens as a partial-result signal and emit truncated: true. ### Q7. A tool throws an exception inside your harness; on the next iteration Claude requests the same tool again with the same arguments. Why? Either the exception escaped to the loop and was caught silently, or the harness wrote a generic "tool failed" to tool_result. Claude saw no actionable error and re-requested. The fix is to catch the exception and return a structured error with a hint, e.g. {"error": "customer_not_found", "hint": "verify cus_ prefix"}. ### Q8. Two of your tools have similar names (fetch_data and get_data). The model picks the wrong one 30% of the time. Best first fix? Rewrite the descriptions, not the model. Each description should be 4 lines: what it does, when to use it, edge cases, and ordering boundaries with other tools. Bigger models route slightly better, but vague descriptions are the root cause; fine-tuning is a last resort. ## FAQ ### Q1. How is this different from a normal API call? The harness sends the cumulative message list again after each tool result. State accumulates over turns. ### Q2. Where do hooks fit? Hooks validate or enforce around tool calls. They do not replace stop_reason-based termination. --- **Source:** https://claudearchitectcertification.com/concepts/agentic-loops **Vault sources:** ACP-T03 §4.1; GAI-K04 §1 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Subagents > Subagents are specialized, isolated agents spawned by a coordinator to handle domain-specific tasks while preserving conversation isolation. They do NOT inherit memory, the coordinator passes context explicitly in the prompt. Hub-and-spoke prevents context creep from parallel work. **Domain:** D1 · Agentic Architectures (27% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/subagents **Last reviewed:** 2026-05-04 ## Quick stats - **Coordinator duties:** 4 - **Tool categories:** 3 - **Inheritance modes:** 1 - **Exam domain:** D1 - **Scopes:** 2 ## What it is Subagents are markdown-defined agents that operate in isolated contexts, spawned on demand by a coordinator. Each has its own system prompt, tool permissions, and model selection. Unlike multi-turn continuation, subagents receive complete context passed explicitly by the coordinator and do not carry forward prior conversation state. They excel at parallelizing independent tasks, scoping tool access (e.g., read-only review agents), and keeping verbose work out of the main thread. ## Exam-pattern questions ### Q1. Your code-review subagent missed three critical issues that you can clearly see in the file. Why? The subagent received only the file path, not the file contents, and lacked a Read tool. Subagents do not inherit coordinator history or environment. Embed the file content (or grant Read) explicitly in the task string. ### Q2. Two subagents are returning conflicting reports about the same bug. How do you resolve it? You don't, the coordinator does. Subagents never communicate directly. Aggregate both reports in the coordinator, ask Claude to reconcile, or escalate to a human reviewer with both reports as evidence. ### Q3. A research subagent ran for 40 turns and returned a perfect summary, but the bill is huge. What's the architectural fix? Define a structured output format in the subagent's system prompt. Without one, subagents wander. 
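A hedged sketch of a coordinator spawning such a subagent with the anthropic Python SDK (the schema, names, and model id are illustrative):

```python
import anthropic

client = anthropic.Anthropic()

# The output contract lives in the subagent's system prompt.
SUBAGENT_SYSTEM = """You are a research subagent.
Return ONLY a JSON object shaped as:
{"findings": [{"claim": "...", "source": "..."}], "confidence": 0.0}
Stop as soon as the object is complete. No commentary."""

def spawn_research_subagent(task: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model id
        max_tokens=2048,
        system=SUBAGENT_SYSTEM,
        # Subagents inherit nothing: every fact they need goes into the task string.
        messages=[{"role": "user", "content": task}],
    )
    return "".join(b.text for b in response.content if b.type == "text")
```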
A clear output shape ({findings: [], confidence: number}) doubles as a stopping cue and caps token cost. ### Q4. Your retriever subagent has [Read, Grep, Glob, WebSearch, Bash, Edit] and accidentally modified a config file. What was the design error? Tool overscoping. A retriever should never have Edit. Restrict allowed-tools to [Read, Grep, Glob, WebSearch]. The minimum needed for the role. ### Q5. You spawned 4 subagents in parallel; 3 finished, the 4th hangs forever. How do you debug? Check the 4th subagent's task string. The most common cause is ambiguity, the subagent is asking for clarification or wandering. Add max_iterations and a deterministic output format to bound it. Don't retry blindly. ### Q6. Your coordinator passes the entire chat history to each subagent for "context." Subagents respond with confused outputs. Why? Subagents don't inherit history; passing it confuses them (they were not part of that conversation). Embed only the facts the subagent needs in a self-contained task string. Treat each subagent as starting from zero. ### Q7. A subagent returns stop_reason: "max_tokens" with a partial summary. Production code aborts. What should it do instead? Treat it as a partial result, not failure. Either accept the partial (with a confidence flag) or spawn a refined subagent ("focus on files X to Y only"). Aborting wastes the subagent's actual work. ### Q8. Your coordinator spawns Tool A, then waits for results, then spawns Tool B based on A's output. Why is this slower than ideal? It's sequential when it could be parallel for independent work. If Tool B doesn't strictly depend on Tool A's specific output, spawn both at once with Promise.all. The coordinator merges both outputs. ## FAQ ### Q1. Can subagents call other subagents? Yes, but not directly, the original coordinator must spawn the nested ones and manage all context passing. ### Q2. Subagent vs. separate Claude Code project? Subagents are lightweight isolated sessions inside one project. Separate projects are fully independent. Use subagents for in-project parallelism. ### Q3. When should work stay in the main thread vs. fork? If verbose or exploratory and not needed visible, fork. If reasoning needs to flow naturally in context, keep inline. --- **Source:** https://claudearchitectcertification.com/concepts/subagents **Vault sources:** ACP-T03 §4 (Domain 1); GAI-K04 §8 (hub-and-spoke); ASC-A01 Course 16 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Stop Reason > stop_reason is the authoritative struct field that says why Claude stopped, either end_turn (finished) or tool_use (wants to call a tool). Parsing natural-language phrases for termination is the most-tested distractor; it's unreliable and fails in production. Checking stop_reason is the only deterministic loop control. **Domain:** D1 · Agentic Architectures (27% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/stop-reason **Last reviewed:** 2026-05-04 ## Quick stats - **Core values:** 2 - **Canonical pattern:** 1 - **Anti-patterns:** 3 - **Exam domain:** D1 - **Reliability vs NL parsing:** 100% ## What it is stop_reason is a field in Claude's API response that signals why generation stopped. The two values you act on are "end_turn" (model is done) and "tool_use" (model wants tools executed). It is the authoritative loop-control signal. 
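The canonical harness, sketched with the official anthropic Python SDK (the model id and the toy dispatcher are placeholders, not part of the API):

```python
import anthropic

client = anthropic.Anthropic()

def run_tool(name: str, args: dict) -> str:
    """Stand-in dispatcher; a real harness maps tool names to handler functions."""
    return f"error: no handler registered for {name}"

def agent_loop(messages: list, tools: list, max_iterations: int = 20) -> str:
    for _ in range(max_iterations):  # safety cap only, never the primary exit
        response = client.messages.create(
            model="claude-sonnet-4-5",  # placeholder model id
            max_tokens=4096,
            messages=messages,
            tools=tools,
        )
        if response.stop_reason == "end_turn":
            # Structural termination signal: the model says it is finished.
            return "".join(b.text for b in response.content if b.type == "text")
        if response.stop_reason == "tool_use":
            # Append the assistant turn, run every requested tool, append the results.
            messages.append({"role": "assistant", "content": response.content})
            results = [
                {
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": run_tool(block.name, block.input),
                }
                for block in response.content
                if block.type == "tool_use"
            ]
            # Skip this append and the model re-requests the same tool forever.
            messages.append({"role": "user", "content": results})
            continue
        # max_tokens / stop_sequence are limit conditions, not normal agentic flow.
        raise RuntimeError(f"unhandled stop_reason: {response.stop_reason}")
    raise RuntimeError("max_iterations hit: find the loop bug, don't just raise the cap")
```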
Termination based on regex or substring matches against text is probabilistic and breaks in production. Structured field checking is the correct pattern. ## Exam-pattern questions ### Q1. Your loop checks response.text.includes('done') to decide termination. What can go wrong? Claude may say "I'm done now" while emitting a tool_use block in the same response. The text is preamble; the tool_use is the real next step. Branch on stop_reason, not text. ### Q2. A response has both a text block and a tool_use block. Which should you handle first? Branch on stop_reason. If it's tool_use, execute the tool call regardless of text presence. If it's end_turn, the text is the final response. Block-level inspection is unreliable; the field is authoritative. ### Q3. You set max_iterations = 5 to prevent infinite loops. The agent fails on legitimate 7-iteration tasks. What's the real fix? Find why the loop is unbounded: missing tool_result append, ambiguous tool descriptions, or two tools that look interchangeable. Caps mask bugs; stop_reason is the primary signal. Raise the cap to a safety buffer, not the primary control. ### Q4. An agent loop hits stop_reason: "max_tokens". Your code throws an error and exits. What should production code do? Treat max_tokens as a partial-result signal, not a failure. Save what was generated, then either raise max_tokens for a retry or chunk the input. The agent did real work; don't discard it. ### Q5. You set a custom stop_sequences parameter. The agent stops mid-task. Why? Your stop sequence matched an unintended substring. Inspect response.stop_sequence to see which one triggered. Either tighten the sequence (more specific) or remove it. Custom stop sequences are a sharp tool. ### Q6. Your TypeScript handler has if/else branches for end_turn, tool_use, and max_tokens. What's missing? A stop_sequence branch for completeness, and a default branch for unknown values (defensive). Use a discriminated union type so the compiler enforces exhaustive checks. ### Q7. An agent calls the same tool 5 times in a row with identical input. Why? You forgot to append the tool_result block to the message list, so Claude doesn't know the tool ran. Without the result, the model re-requests indefinitely until max_iterations saves you. ### Q8. Why is stop_reason better than counting tool_use blocks for loop control? Block counting requires you to enumerate response.content and check types. stop_reason is a single authoritative field set during generation, designed for control flow. It's faster, cleaner, and immune to multi-block responses. ## FAQ ### Q1. Can Claude return text alongside a tool_use block? Yes, both can appear in the same response. Always check stop_reason, not text presence. ### Q2. What if stop_reason is something else? Other values like 'stop_sequence' or 'max_tokens' indicate error or limit conditions, not normal agentic flow. ### Q3. Does stop_reason guarantee tool args are valid? No. It tells you Claude wanted to call. You must still validate arguments before execution. --- **Source:** https://claudearchitectcertification.com/concepts/stop-reason **Vault sources:** ACP-T03 §4.1; GAI-K04 §1 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Session State & Persistence > Session state is the running message list (and any external persistence) that gives an agentic loop continuity. The exam tests progressive summarization tradeoffs (token savings vs. 
precision loss) and when checkpointing or external store reads are required. Full content lands in SCRUM-21 follow-up. **Domain:** D1 · Agentic Architectures (27% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/session-state **Last reviewed:** 2026-05-04 ## Quick stats - **Persistence layers:** 3 - **Exam domain:** D1 - **Trap pattern:** summary loss - **Coverage tier:** B - **Linked scenarios:** 2 ## Exam-pattern questions ### Q1. Your support agent forgets the customer ID by turn 30. What's the architectural fix? Pin a CASE_FACTS block in the system prompt with customer_id, order_id, refund_amount. Re-read every turn. The most-tested distractor is "increase max_tokens"; the right answer is structural state management. ### Q2. A subagent returns wrong findings; coordinator rediscovers the same problems on the next subagent spawn. Why? Coordinator didn't capture obstacles_encountered from the first subagent. Add a structured field so the coordinator stores and acts on the discovery, then includes the resolution in the next spawn's task string. ### Q3. Long-document extraction dies at chapter 18 every run. What's happening? Message list contains 17 turns of tool_use + tool_result; chapter 18 hits stop_reason: "max_tokens". Fix: checkpoint state, reset the session, reload the checkpoint, continue. Don't increase the model size. ### Q4. Which is wrong: summarize verbose reasoning, or summarize critical facts? Summarizing facts is wrong. Pin them in a CASE_FACTS block. Reasoning chains are summarizable; transactional values (amounts, IDs) are not. ### Q5. Subagent spawns receive the parent's full conversation history. Why is this an anti-pattern? Bloats the subagent context and dilutes focus. Pass a focused prompt with extracted facts and the specific subtask. Subagents work better with less, not more. ### Q6. Cross-task context (vendor matrix path) lives in: project Instructions or session messages? Project Instructions. Sessions are task-scoped (discarded at end). Project Instructions persist across tasks; folder files persist evolving state. ### Q7. When you hit max_tokens, does increasing the model's window solve the problem? Temporarily, yes. Architecturally, no. Larger windows defer the problem; the fix is active state management (case-facts, checkpointing, summarization) regardless of window size. ### Q8. An escalation hands off the entire conversation transcript to a human. What's the better pattern? Structured escalation block: customer_id, order_id, amount, reason, partial_status, recommended_action. Compact (200-500 chars). Human triages in 10 seconds vs 5 minutes for transcript. --- **Source:** https://claudearchitectcertification.com/concepts/session-state **Vault sources:** ACP-T03 §5 progressive summarization; GAI-K04 §8 multi-agent memory; ASC-A01 Course 3 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Human-in-the-Loop Escalation > Escalation is the deterministic handoff path when policy thresholds, low confidence, or repeated failure conditions are hit. The exam pattern: prompt-only escalation policy is wrong; the correct architecture uses hooks or hard gates. Full content in SCRUM-21 follow-up. 
**Domain:** D1 · Agentic Architectures (27% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/escalation **Last reviewed:** 2026-05-04 ## Quick stats - **Trigger types:** 3 - **Exam domain:** D1 - **Right answer:** deterministic gate - **Coverage tier:** B - **Trap:** prompt-only policy ## Exam-pattern questions ### Q1. Refund agent uses prompt-only enforcement: "escalate refunds over $500". Production failures show 5% of refunds violate the policy. Fix? Replace with a PreToolUse hook that checks if amount > 500: escalate. Deterministic gate. Prompt-only is probabilistic; hook is 100% enforcement. ### Q2. Two vendors return contradictory market sizes. The agent picks the median and continues. Why is this wrong? Ambiguity is an escalation trigger. Agent must surface the conflict, not silently average. Block: {conflict, sources, recommendation: "human sourcing"}. Otherwise the report is silently inaccurate. ### Q3. After PreToolUse blocks a refund, the agent retries the same call on the next loop. Why? The escalation didn't update the database with refund_approval flag. Round-trip pattern: human approves → DB flag set → agent retries → hook sees flag → allows execution. Without the flag, indefinite re-escalation. ### Q4. User says "speak to a manager." Should the agent negotiate first or escalate immediately? Escalate immediately. No reasoning, no multi-turn negotiation. Structural keyword check: if "manager" in message: escalate_immediately. This is policy, not judgment. ### Q5. Permission failure (403 from infrastructure), should the agent retry or escalate? Escalate. Permission failures are non-retryable error category 4. Block routes to the relevant on-call engineer with task description and partial status. Retrying wastes time and obscures the real fix (privilege grant). ### Q6. Difference between an error and an escalation? Error = unhandled (agent crashes, tool fails unexpectedly). Escalation = handled (agent recognizes a designed condition, stops gracefully, submits structured block). Escalation is a designed path, not a failure. ### Q7. Should a customer's tone (anger, frustration) trigger escalation? No. Sentiment is not a trigger. Only explicit conditions: policy exception, permission failure, ambiguous input, explicit request for human. Sentiment is orthogonal to escalation rules. ### Q8. Customer-blocking workflow vs batch workflow: how does the escalation pattern differ? Customer-blocking: agent stops, async queue with 5-10 min SLA, user sees "escalated, response in ~5min". Batch: save block, continue with fallback (auto-approve up to $100, escalate the rest). Design determines the pattern, not the trigger. --- **Source:** https://claudearchitectcertification.com/concepts/escalation **Vault sources:** ACP-T03 §4.1 escalation protocol; ACP-T05 Scenario 1 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Tool Calling > Tool calling is how Claude decides to invoke external functions and pass structured arguments. Tools are routing infrastructure, good descriptions reduce the need for classifiers or few-shot examples. The quality of tool design is the primary lever for correct task routing, not model size. 
**Domain:** D2 · Tool Design + Integration (18% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/tool-calling **Last reviewed:** 2026-05-04 ## Quick stats - **Description components:** 5 - **Optimal tools/agent:** 4–5 - **Degradation threshold:** 18+ - **Exam domain:** D2 - **tool_choice modes:** 3 ## What it is Tool calling is Claude's ability to request execution of external functions by returning structured tool_use blocks with function name and arguments. Tools serve as routing infrastructure, they guide Claude's decisions. Each tool needs a JSON schema describing inputs, outputs, and behavior. Good descriptions (what it does, when to use, edge cases, boundaries) are the primary determinant of correct routing. ## Exam-pattern questions ### Q1. Your agent calls the wrong tool 30% of the time across 8 similar tools. What's the first fix? Audit tool descriptions. Add the four-part anatomy: what it does, when to use, edge cases, ordering boundaries. Vague descriptions cause misrouting; bigger models don't fix it. ### Q2. You add 25 tools and selection accuracy drops sharply. Why? Beyond ~18 tools, attention dilutes and Claude can't reliably select. Split into specialized subagents, each with 4-5 tools max. Or merge overlapping tools. ### Q3. A tool's input_schema is {data: any} and Claude returns null. What happened? The schema gave no constraint, so Claude has no deterministic target. Model guessed null. Specify exact fields with types and examples; constraints reduce hallucination. ### Q4. Your tool throws an exception. The agent retries the same call with the same input. Why? Your harness swallowed the exception and returned an empty tool_result. Claude saw no error and re-requested. Return {error: "reason", hint: "how to fix"} so Claude can recover. ### Q5. Two tools have similar names: fetch_user and get_user. The agent alternates between them. Fix? Disambiguate in descriptions: "fetch_user: use only when you have an email. get_user: use when you have a numeric ID." Or merge them into one tool with an enum input. ### Q6. You force tool_choice: any to guarantee structured output. The agent still returns empty results. Why? any forces a tool call but the model still picks which one, and might pick wrong if descriptions overlap. Either force a specific tool or fix description disambiguation. ### Q7. Your tool returns 50KB of JSON per call. The loop hits max_tokens after 3 iterations. Architectural fix? Tools should return what's needed, not everything. Add a fields parameter to the tool's input schema so Claude can request specific fields. Or use a resource (read-only catalog) for catalog data. ### Q8. A tool needs the customer ID but Claude calls it without one. The harness errors out. What should the schema enforce? Mark customer_id as required in input_schema. Claude won't call without it. Combine with a clear description: "call only after verify_customer returns a customer_id." ## FAQ ### Q1. Is tool calling the same as MCP? No. Tool calling is the mechanism. MCP is infrastructure that exposes pre-built tools through that mechanism. ### Q2. tool_choice 'any' vs specific tool? 'any' forces a tool call but lets Claude pick which. Specific forces a particular tool, used for mandatory first steps. ### Q3. Should every action be a tool? No. Use tools for external operations and actions you must control. Keep internal reasoning off-tool. 
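Putting the description anatomy and schema guidance together, a hedged sketch of one tool definition (the tool name, policy number, and fields are illustrative):

```python
refund_tool = {
    "name": "process_refund",
    "description": (
        "Processes a refund for a verified customer. "                 # what it does
        "Use only after verify_customer has returned a customer_id. "  # when to use / ordering
        "Refunds over $500 must be escalated, not processed here. "    # edge case
        "For order lookups use lookup_order, not this tool."           # boundary with other tools
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "e.g. cus_abc123"},
            "amount": {"type": "number", "description": "refund amount in USD"},
            "reason": {"type": "string", "enum": ["damaged", "late", "other"]},
        },
        "required": ["customer_id", "amount", "reason"],
    },
}
```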
--- **Source:** https://claudearchitectcertification.com/concepts/tool-calling **Vault sources:** ACP-T03 §4.2; GAI-K04 §11; ASC-A01 Course 6 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Tool Choice > tool_choice is the parameter that controls whether Claude can decide to use a tool ("auto"), must call any tool ("any"), or must call a specific tool by name. Use specific-tool forcing for mandatory-first-step operations like identity verification before refunds. **Domain:** D2 · Tool Design + Integration (18% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/tool-choice **Last reviewed:** 2026-05-04 ## Quick stats - **Modes:** 3 - **Default:** auto - **Forced use cases:** 2 - **Exam domain:** D2 - **Guarantee:** 100% ## What it is tool_choice is an API parameter that constrains Claude's tool decision. "auto" lets the model decide whether to call any tool (default). "any" forces a tool call from the available set. {"type":"tool","name":"X"} forces tool X. Forced modes guarantee execution for compliance, structured-output extraction, or mandatory verification gates. ## Exam-pattern questions ### Q1. You enable extended_thinking and set tool_choice: any. The API returns a 400 error. Why? Extended thinking is incompatible with any and forced tool_choice. Use auto only when thinking is enabled. The model reasons, then decides whether to call a tool. ### Q2. Your refund flow uses tool_choice: any to guarantee a tool call. The agent calls lookup_order instead of verify_customer first. Fix? Switch to forced ({type: "tool", name: "verify_customer"}). any lets the model pick; only forced guarantees a specific tool. Use forced for mandatory architecture gates. ### Q3. The user asks a simple question and your forced tool_choice calls process_refund with empty input. What went wrong? Forced tool_choice removes the model's option to return text. For ambiguous requests where text is a valid response, use auto. Forced is for mandatory operations only. ### Q4. Your support agent uses tool_choice: auto. 30% of the time it asks clarifying questions instead of calling tools. Is this a bug? Not a bug, that's auto behavior. The model decides whether a tool is needed. If you need guaranteed tool calls, switch to any (model picks) or forced (you pick). Each has a tradeoff. ### Q5. You set tool_choice: any but the agent calls a tool and immediately returns valid input. Then a clarifying tool... Why is it slower than expected? Each tool call is a full turn round-trip. If the same task can be done in fewer calls (e.g. one tool with more params instead of three sequential calls), redesign the tool surface for fewer turns. ### Q6. Your forced tool_choice produces tool calls with hallucinated arguments. How do you guard against this? Force the tool, then validate inputs in your harness. Return structured errors via tool_result if inputs are invalid. The model retries with corrected arguments. Forced tool_choice does not validate inputs. ### Q7. Why is tool_choice: any rejected when extended_thinking is on, but allowed without it? Extended thinking generates reasoning tokens before the final response. Forcing a tool call would constrain the post-thinking output, breaking the reasoning contract. Anthropic disabled the combination at the API layer. ### Q8. You want both deep reasoning and a guaranteed tool call. Is that possible? Not in a single turn. 
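A hedged sketch of the two-turn pattern with the anthropic Python SDK (the tool, prompts, and model id are illustrative):

```python
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # placeholder model id

decision_tool = {
    "name": "record_decision",
    "description": "Records the final refund decision as structured JSON.",
    "input_schema": {
        "type": "object",
        "properties": {"approve": {"type": "boolean"}, "reason": {"type": "string"}},
        "required": ["approve", "reason"],
    },
}

# Turn 1: extended thinking; tools available, but tool_choice stays on auto
# (any or forced alongside thinking returns a 400).
reasoning = client.messages.create(
    model=MODEL,
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 4000},
    tools=[decision_tool],
    tool_choice={"type": "auto"},
    messages=[{"role": "user", "content": "Should we refund order #1234? Think it through."}],
)
analysis = "".join(b.text for b in reasoning.content if b.type == "text")

# Turn 2: thinking off, tool forced, so the structured decision is now guaranteed.
decision = client.messages.create(
    model=MODEL,
    max_tokens=1024,
    tools=[decision_tool],
    tool_choice={"type": "tool", "name": "record_decision"},
    messages=[{"role": "user", "content": f"Record the refund decision given this analysis:\n{analysis}"}],
)
```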
Use extended thinking with auto for the reasoning step, then a follow-up turn with forced tool_choice if the model decides to act. Two turns, not one. ## FAQ ### Q1. Can I force multiple tools in sequence? Not in one call. tool_choice forces one call per request. Sequence by setting tool_choice on each successive call. ### Q2. What if I force a tool but the context doesn't fit? Claude still calls it, often with bad arguments. Only force when context guarantees a sensible call. ### Q3. Is forcing a tool the same as a hook? No. tool_choice is an API constraint on the model. Hooks run code before/after execution. Use both for full control. --- **Source:** https://claudearchitectcertification.com/concepts/tool-choice **Vault sources:** ACP-T03 §4.2 modes table; GAI-K04 §7 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Model Context Protocol > MCP is a communication standard that lets Claude access pre-built tools, resources, and prompts from specialized servers without you writing integration code. Connect to a GitHub MCP server instead of writing GitHub API tools yourself. Servers are configured in .mcp.json (project) or ~/.claude.json (user). **Domain:** D2 · Tool Design + Integration (18% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/mcp **Last reviewed:** 2026-05-04 ## Quick stats - **Primitives:** 3 - **Config levels:** 2 - **Transports:** 3 - **Exam domain:** D2 - **Pre-built sets:** 18+ ## What it is MCP decouples tool definition from application code. An MCP server wraps a service's API and exposes pre-built tools, resources (read-only data catalogs), and prompts (pre-written instructions). Your Claude application is an MCP client, connecting via stdio, SSE, or streamable HTTP. This eliminates repetitive schema authoring and error handling. Servers configure with environment variables for credentials. ## Exam-pattern questions ### Q1. You add a new MCP server to .mcp.json but Claude Code doesn't see its tools. What did you forget? Restart the Claude Code session. The framework caches the tool list at startup. Changes to .mcp.json require a restart, or a manual mcp restart command in newer versions. ### Q2. Two MCP servers expose tools with the same name (Read). What happens? The config is invalid. Tool names must be globally unique across all MCP servers. Prefix tools by server name (GithubRead, FileRead) or disable one of the conflicting servers. ### Q3. A cloud MCP server returns {error: "rate_limited"}. Your agent retries 5 times and exhausts budget. Better approach? Server should return structured rate-limit info: {error: "rate_limited", retry_after: 30}. Your harness reads this and waits before retrying. Don't hammer; respect the hint. ### Q4. You hardcoded a GitHub token in .mcp.json. A teammate clones the repo and your token leaks. Fix? Use ${GITHUB_TOKEN} env-var expansion. Each developer sets their own token in their environment. The .mcp.json is committed; the token is not. ### Q5. An MCP server resource lists 1000 items. Claude calls a tool 1000 times, one per item. Why? The resource is a catalog (read-only data Claude can scan), not a list of pre-staged actions. Reduce tool calls by structuring the resource so Claude can scan and pick targeted items, not iterate. ### Q6. Your custom MCP server crashes when Claude calls a tool with malformed JSON. What's the right error handling? Wrap tool execution in try/except. 
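As a hedged sketch, assuming the FastMCP helper from the official Python MCP SDK (the tool, backend call, and error codes are illustrative):

```python
import json
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orders")

def fetch_order(order_id: str) -> dict:
    """Stand-in backend call."""
    if not order_id.startswith("ord_"):
        raise ValueError("order ids start with ord_")
    return {"order_id": order_id, "status": "shipped"}

@mcp.tool()
def lookup_order(order_id: str) -> str:
    """Look up an order by its ord_-prefixed id."""
    try:
        return json.dumps(fetch_order(order_id))
    except Exception as exc:
        # Structured error instead of a crash: the model can read it and retry.
        return json.dumps({"error": "invalid_input", "detail": str(exc)})

if __name__ == "__main__":
    mcp.run()
```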
Return {error: "invalid_input", detail: "..."} as the tool_result. Claude reads it and retries with corrected input. Crashes leave the framework hanging. ### Q7. You want to add a custom internal API. Should you build a custom MCP server or expose tools directly via the SDK? MCP for shared/team use (multiple developers, multiple Claude sessions). Direct SDK tools for app-specific custom logic. MCP adds a small protocol overhead but enables reuse. ### Q8. An MCP server's tool description is auto-generated from the underlying API and reads like docstring noise. The model misroutes. Fix? Override the description in your MCP server config. The vendor's auto-generated description may be too generic. Write a 4-line description with what/when/edge-cases/boundaries. ## FAQ ### Q1. Do I have to use MCP? No. Write tools yourself if you prefer. MCP saves work when pre-built servers exist for the service you need. ### Q2. Can I create my own MCP server? Yes. Anthropic ships Python and Node.js SDKs; community implementations cover other languages. ### Q3. What languages can MCP servers be written in? Any. Python and Node.js are official; community SDKs cover the rest. --- **Source:** https://claudearchitectcertification.com/concepts/mcp **Vault sources:** ACP-T03 §4.2; GAI-K04 §13; ASC-A01 Courses 7 + 10 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # SDK Hooks (Pre/PostToolUse) > Hooks are deterministic code that runs before or after tool calls. They enforce policy that prompt-only patterns can't guarantee, refund limits, identity verification, audit logging. The exam pattern: prompt-only enforcement is wrong; hook is the deterministic gate. Full content in SCRUM-21 follow-up. **Domain:** D2 · Tool Design + Integration (18% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/hooks **Last reviewed:** 2026-05-04 ## Quick stats - **Hook types:** 2 - **Exam domain:** D2 - **Trap:** prompt-only policy - **Coverage tier:** B - **Right answer:** deterministic ## Exam-pattern questions ### Q1. System prompt says "never refund more than 500." Production shows 3% violation rate. What's the architectural fix? Replace prompt with a PreToolUse hook checking amount <= 500. Exits 2 (deny) on violation. Prompt is probabilistic; hook is deterministic. The exam tests this Prompt-vs-Hook heuristic repeatedly. ### Q2. PostToolUse hook detects a policy violation and exits 2. Did the violation happen? Yes. PostToolUse runs after the tool completes. The transaction already happened. Use PreToolUse to prevent operations; PostToolUse is for logging, normalizing, cascading after the fact. ### Q3. Hook exits with code 1 and the agent halts unexpectedly. Why? Exit 1 = internal hook error (halts the agent). Exit 2 = deny (Claude reads stderr error, retries with adjusted args). Use exit 1 only for hook bugs (JSON parse failure, uncaught exception). For policy denial, always exit 2. ### Q4. Matcher "process.*" is supposed to fire on process_refund and process_charge, but it fires on neither. Why? Matchers are literal pipe-separated strings, not regex. Use "process_refund|process_charge" to match both. Regex patterns are silently ignored; matching is exact-string by tool name. ### Q5. Why must hook commands use absolute paths? Relative paths are vulnerable to path-interception attacks (MITRE T1574.007). An attacker plants a malicious script earlier in PATH; your hook runs theirs. 
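A hedged .claude/settings.json sketch (hook structure as commonly documented for Claude Code hooks; the tool name and script path are illustrative):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "process_refund",
        "hooks": [
          { "type": "command", "command": "/home/user/hooks/check_refund_limit.sh" }
        ]
      }
    ]
  }
}
```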
Use /home/user/hooks/script.sh, not ./hooks/script.sh. Use ${HOME} or setup-script substitution for portability. ### Q6. Setting "async": true on a PreToolUse hook to avoid blocking. Why does this fail? PreToolUse cannot be async. Claude is waiting for the allow/deny decision before the tool runs. Async PreToolUse causes timeout. Async only works on PostToolUse (the tool already ran; the hook is reactive feedback). ### Q7. A hook needs the customer ID from prior conversation history. How does it get it? It doesn't. Hooks are stateless and isolated. They receive only tool_name, tool_input, session_id, hook_event_name. Pass anything you need explicitly via tool_input. The agent's job is to include needed context in tool calls. ### Q8. Two PreToolUse hooks match the same tool. What happens? The SDK runs them in sequence. Each hook gets stdin from the previous (or original tool_input). If the first exits 2 (deny), subsequent hooks don't run. Use multiple hooks to separate concerns: security check + quota check + audit. --- **Source:** https://claudearchitectcertification.com/concepts/hooks **Vault sources:** ACP-T03 §4.2 enforcement hierarchy; ACP-T03 §4.4 prompt vs hook trap; ASC-A01 Course 6 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Tool Evaluation & Testing > Evaluation is how you measure tool-call correctness against ground-truth datasets. Coverage in vault is thin; needs Phase 6 research for full authoring. **Domain:** D2 · Tool Design + Integration (18% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/evaluation **Last reviewed:** 2026-05-04 ## Quick stats - **Coverage tier:** C - **Exam domain:** D2 - **Status:** stub - **Vault depth:** thin - **Action:** research ## Exam-pattern questions ### Q1. You spot-check 5 examples manually. Production scores reveal 12% failure rate. What's the architectural gap? Manual spot-checks miss edge cases. A prompt working on 5 chosen examples might fail on 10% of real traffic. Evals run 50-100 diverse cases automatically and catch corner cases. Spot-checks are sanity, not proof. ### Q2. Tool-using agent's score is 95% on text-output evals. Production fails are still high. Why? Tool-call evals matter more for agents. An agent making the right calls but writing bad text succeeds at its job. Bad calls with great text fails. Always run tool-sequence evals; text quality is secondary. ### Q3. Eval score climbs from 80% to 87% after a prompt edit. Is the change ready to ship? Not yet. Run on fresh test cases you haven't trained against. Score plateau on a fixed suite can mask overfitting to the prompt. Production monitoring + shadow-mode evals reveal blind spots faster than test cases. ### Q4. A teammate says "Use Claude to grade the agent's outputs." When is this advice wrong? For deterministic properties (tool sequence, schema validation, rule compliance), use code-based grading. Deterministic, reproducible, no hallucination. Use Claude grading only for subjective cases (tone, completeness, empathy). ### Q5. You generate test cases by running the current agent and treating its output as expected. Why is this circular? You're measuring whether the agent repeats itself, not whether it's correct. Golden cases must be externally validated: human review, expert sign-off, or reference implementations. Grading against unverified data is theater. ### Q6. A prompt edit drops the score from 85% to 82%. What should you do? 
Roll back immediately. You've detected a regression. Analyze which cases regressed, decide if the trade-off is worth it, or iterate further. Evals let you make this call in minutes, not weeks. ### Q7. Your golden suite has 50 cases, but a production failure happens on a scenario not in the suite. What does this prove? Evals score only on test cases you designed. 95% on the suite is necessary but not sufficient. Pair evals with production monitoring; real users reveal blind spots faster than fixed test cases. ### Q8. Why are evals part of tool design, not just QA? Tool descriptions and schemas evolve; evals catch when an edit causes regressions. Run on every PR; failures gate the merge. Evals make tool design measurable, not just artisanal. --- **Source:** https://claudearchitectcertification.com/concepts/evaluation **Vault sources:** ACP-T03 §4.3 evaluation overview; ASC-A01 Course 6 evals lessons **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # CLAUDE.md Hierarchy > CLAUDE.md is persistent project memory loaded automatically by Claude Code. A three-level hierarchy (user → project → directory) lets you set global defaults, team standards, and per-directory rules. Project-level files are version-controlled and shared. Use @import to keep CLAUDE.md modular. **Domain:** D3 · Agent Operations (20% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/claude-md-hierarchy **Last reviewed:** 2026-05-04 ## Quick stats - **Hierarchy levels:** 3 - **File locations:** 2 - **Import syntax:** @import - **Exam domain:** D3 - **Hard rule limit:** ~300 lines ## What it is CLAUDE.md is a markdown file that gives Claude Code persistent project memory. It loads automatically at session start and is appended to the system context. Three scopes: user (~/.claude/CLAUDE.md), project (.claude/CLAUDE.md or root CLAUDE.md), and directory (per-folder file that loads when editing inside it). Project files are version-controlled and shared with your team. @import lets you split rules into modular files. ## Exam-pattern questions ### Q1. A teammate's PRs use a different code style than yours. They have rules in ~/.claude/CLAUDE.md but not in the repo. Fix? Move team rules to .claude/CLAUDE.md (project root, version-controlled). User-level rules are personal; only project rules are shared via git. Team standards belong in the repo. ### Q2. You add a rule to .claude/CLAUDE.md mid-session. Claude doesn't follow it. Why? CLAUDE.md is loaded once at session start. Mid-session edits don't take effect until restart. End the session, restart, and the new rules apply. ### Q3. Your testing rule file sits in .claude/rules/testing.md but doesn't auto-load when editing test files. What's missing? Check the YAML frontmatter paths glob. Make sure it matches your file paths exactly. Try `paths: ["**/*.test.tsx", "**/*.spec.ts"]` for broad coverage. ### Q4. You have @import ./shared.md in two CLAUDE.md files, and shared.md has @import ../main.md. Claude Code hangs on load. Why? Circular import. Claude Code does not detect cycles. Refactor: don't have child rule files import their parents. Keep the import graph acyclic. ### Q5. A new contributor follows the README but writes code that violates project conventions. They don't read CLAUDE.md. Architectural fix? CLAUDE.md auto-loads when they use Claude Code; no reading required. If they bypass Claude Code (write by hand), enforce conventions via linters and CI checks.
CLAUDE.md complements but doesn't replace tooling. ### Q6. Your project root CLAUDE.md is 1,500 lines and Claude's responses are getting slower. Why? CLAUDE.md is appended to every prompt. 1,500 lines is too much. Modularize: move detailed rules to .claude/rules/*.md with paths globs. Keep root CLAUDE.md to ~300 lines of universal team rules. ### Q7. A path-scoped rule file's `paths: ["src/**/*.ts"]` matches but the rules don't apply to a file in src/lib/utils.ts. What's wrong? Glob syntax: `**` matches multiple directory levels, so `src/**/*.ts` should match src/lib/utils.ts. If it still doesn't apply, check for typos in the glob or restart Claude Code. ### Q8. You want different rules for api/ (backend) and app/ (frontend). Best structure? Two path-scoped rule files: .claude/rules/backend.md with `paths: ["api/**"]`, .claude/rules/frontend.md with `paths: ["app/**"]`. Keep root CLAUDE.md for shared rules. Each rule auto-loads only when relevant. ## FAQ ### Q1. If I have project + directory CLAUDE.md, which wins? Both load. Directory rules apply only inside that directory; project rules apply everywhere. They are scoped, not conflicting. ### Q2. What goes in user-level CLAUDE.md? Personal preferences that don't vary by project: your indent style, commit message format. Not team standards. ### Q3. Does CLAUDE.md affect performance? Slightly; it appends to every prompt. Keep it concise; use @import if it grows past ~1000 lines. --- **Source:** https://claudearchitectcertification.com/concepts/claude-md-hierarchy **Vault sources:** ACP-T03 §4.3; GAI-K04 §3; ASC-A01 Courses 2 + 4 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Plan Mode > Plan Mode forces Claude Code to draft a multi-step plan and get it approved before executing. Use it for non-trivial changes (3+ steps); skip for trivial edits. The exam pattern: when do you require Plan Mode vs. direct execution. Full content in SCRUM-21 follow-up. **Domain:** D3 · Agent Operations (20% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/plan-mode **Last reviewed:** 2026-05-04 ## Quick stats - **Trigger threshold:** 3+ steps - **Exam domain:** D3 - **Decision factor:** blast radius - **Coverage tier:** B - **Skip case:** trivial edit ## Exam-pattern questions ### Q1. Single-file null-check: should you use plan mode? No. Plan mode has overhead. Reserve for 3+ step architectural decisions where there are multiple valid approaches and rework would be expensive. A clear bug-fix uses direct execution. ### Q2. 30-file refactor: direct execution or plan mode? Plan mode is mandatory. Blast radius too large to wing it. Without upfront analysis, you commit to a suboptimal architecture mid-refactor and rework hours of work. Plan mode's iteration cost (milliseconds) replaces rework cost (hours). ### Q3. The plan looks bad. Should you execute and fix the result later? Reject the plan and iterate (zero cost, same session). Tell Claude what's wrong; Claude revises. Executing a bad plan and reworking is expensive. Plan mode's value is in iteration cost being free, not in execution being safe. ### Q4. You're working in an unfamiliar codebase (different framework, different conventions). Plan mode optional? Mandatory. Unfamiliar code means hidden interdependencies. Plan mode makes them visible: Claude explores, identifies risks, proposes a strategy respecting framework conventions. Direct execution in unfamiliar code is a minefield. ### Q5.
Can you switch from direct execution to plan mode mid-task? Plan mode is a mode, not a patch. Switching mid-work commits to prior decisions made in direct mode. For complex tasks, start in plan mode. Switching late captures only future steps; the architectural commitments are already made. ### Q6. Plan mode in CI/CD pipelines: viable? No. Plan mode requires human approval (a decision gate). Only for interactive development. For automated CI/CD refactors, use direct execution with comprehensive testing. ### Q7. Plan mode and the -p flag: same purpose? Different. -p makes Claude Code non-interactive (CI/CD). Plan mode is interactive approval-gated. Different tools for different goals: plan mode for refactoring, -p for headless automation. ### Q8. Codebase is too large for plan mode to read entirely. How do you scope? Scope the request. Instead of "refactor the whole monolith," say "refactor the customer domain into a microservice." Plan mode explores only the relevant parts. For massive codebases, pre-summarize architecture in ARCHITECTURE.md. --- **Source:** https://claudearchitectcertification.com/concepts/plan-mode **Vault sources:** ACP-T03 §4.3 plan mode; GAI-K04 §15 decision table; ASC-A01 Course 4 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Skills > Skills are reusable, on-demand task workflows defined in markdown with YAML frontmatter. Unlike CLAUDE.md (always loaded), skills load only when Claude matches the description to the current task. Skills can restrict tool access and accept arguments. **Domain:** D3 · Agent Operations (20% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/skills **Last reviewed:** 2026-05-04 ## Quick stats - **Frontmatter fields:** 5 - **Scopes:** 2 - **Load pattern:** on-demand - **Exam domain:** D3 - **Context isolation:** fork ## What it is Skills are markdown files that teach Claude Code how to handle specific, repetitive tasks. Each has a YAML frontmatter (name, description, allowed-tools, argument-hint, context). The description drives Claude's matching, if you ask 'review this PR,' Claude matches the PR-review skill and loads it. Project skills (.claude/skills/) ship with the team; user skills (~/.claude/skills/) follow you. ## Exam-pattern questions ### Q1. Your skill description is "helps with code" and Claude activates it for every request. Fix? Description is too generic. Rewrite as task-specific: "validates Python security patterns; use when reviewing code for SQL injection or hardcoded secrets." Specific descriptions enable accurate matching. ### Q2. A skill has allowed-tools: Read, Grep, Glob, Bash and you ask Claude to edit a file. It refuses. Why? Skills enforce allowed-tools at activation time. While the skill is active, Claude can only use those tools. Either add Edit to allowed-tools, or invoke a different skill that has Edit. ### Q3. You wrote a 2,000-line SKILL.md. Claude responses are slow when the skill is active. Better structure? Use progressive disclosure. Keep SKILL.md under 500 lines. Move detailed references to references/, examples to examples/, scripts to scripts/. Link from SKILL.md: "if user asks about X, read references/guide.md." ### Q4. Your skill works on your machine but a teammate can't trigger it. Where's the file? Personal skills (~/.claude/skills/) don't sync. Move it to .claude/skills/ (project, version-controlled). Commit and push. 
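A hedged sketch of what the committed project skill might contain, using the five frontmatter fields listed above (the path .claude/skills/pr-review/SKILL.md and the contents are illustrative):

```markdown
---
name: pr-review
description: Reviews pull requests for security issues (SQL injection, hardcoded secrets). Use when asked to review a PR or a diff.
allowed-tools: Read, Grep, Glob
argument-hint: "<pr number or branch>"
context: fork
---

Review the referenced PR:
1. Read the changed files.
2. Flag SQL built by string concatenation and any hardcoded credentials.
3. Return findings as a table: file, line, severity, suggested fix.
```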
Teammates pull and the skill is available. ### Q5. Two skills match the same request. Which one activates? Both load into context simultaneously. If their instructions conflict, results are unpredictable. Make descriptions non-overlapping: each skill should describe a unique task with no semantic overlap. ### Q6. A skill has scripts in scripts/ but they don't execute when invoked. Why? Scripts must be marked executable (chmod +x) and the SKILL.md must reference them with the correct relative path. Test the script standalone first; then verify SKILL.md instructions are clear about when/how to run it. ### Q7. When does a skill trump CLAUDE.md, and when does CLAUDE.md trump a skill? CLAUDE.md is always-loaded, applying to every conversation. Skills load on-demand when their description matches. They're complementary: CLAUDE.md for universal team standards; skills for repeated task workflows. Neither trumps the other; they layer. ### Q8. Your CI pipeline invokes Claude with --skill ci-review but the skill isn't activated. What's wrong? The flag must match the skill's name field exactly (lowercase, hyphenated). Check the SKILL.md frontmatter. Also verify the skill file is in .claude/skills/ (not ~/.claude/skills/ for project-scoped CI). ## FAQ ### Q1. What does context: fork do? Runs the skill in an isolated subagent and returns a summary. Keeps the main conversation clean. ### Q2. Can I invoke a skill explicitly? Yes, type @skill skill-name or just the name. Most flows rely on auto-detection via description. ### Q3. Do subagents inherit my skills? No, list them in the subagent's frontmatter skills field if you want them available. --- **Source:** https://claudearchitectcertification.com/concepts/skills **Vault sources:** ACP-T03 §4.3 skills; GAI-K04 §6 frontmatter; ASC-A01 Course 15 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Checkpoints & Session Management > Checkpoints save and restore conversation state. Vault coverage is thin; full authoring requires Phase 6 research. **Domain:** D3 · Agent Operations (20% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/checkpoints **Last reviewed:** 2026-05-04 ## Quick stats - **Coverage tier:** C - **Exam domain:** D3 - **Status:** stub - **Vault depth:** thin - **Action:** research ## Exam-pattern questions ### Q1. Session crashes at turn 12. You restart fresh. What did you lose? All agent context: prior reasoning, accumulated facts, learned patterns. Files survive but agent context does not. Checkpoints preserve the full message thread, so resuming is precision-perfect. ### Q2. Long task: should you summarize at each checkpoint to save tokens? No. Summarization erases precision. Checkpoints preserve the full message list (verbatim). The exam tests this distinction repeatedly: summarization is the opposite of checkpointing. ### Q3. Checkpoints and fork_session: same pattern? Different. Checkpoints resume linearly (A → A+1 → A+2). fork_session branches from a common baseline (A → A1 AND A2). Linear resumption vs branching exploration. ### Q4. Two checkpoints from different sessions, can you merge them? No. Checkpoints are single-session snapshots. Merging is unsupported and corrupts context. To combine work, use manual synthesis or explicit hand-offs (output of A as input to B). ### Q5. When should you save a checkpoint? 
At meaningful milestones: daily boundaries, after a major phase completes, before a risky decision, after human-in-the-loop approval. For a 30-turn project, 3-5 checkpoints (one per phase or week). ### Q6. Checkpoint loaded; can you edit it before continuing? Technically yes, but don't. Editing corrupts message sequences and breaks Claude's understanding. If you need to correct something, load the checkpoint, acknowledge the issue in a new message, and continue. Claude adjusts. ### Q7. Restoring a checkpoint resets the conversation cost? No. Restoring is one logical session (no reset). Tokens are charged for the restored messages as if part of one continuous session. Plan accordingly: a 30-turn checkpoint costs 30 turns to resume. ### Q8. Can checkpoints be used with the Batch API? No. The Batch API processes asynchronously and returns results; it's not a continuous agent session. Checkpoints are for interactive multi-turn agent work that spans days or sessions. --- **Source:** https://claudearchitectcertification.com/concepts/checkpoints **Vault sources:** ASC-A01 Course 3 checkpoints **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # System Prompts & Instructions > System prompts establish role, constraints, format, and tool guidance. Anatomy: role, capability boundaries, style/format rules, tool guidance, examples. Full content in SCRUM-21 follow-up. **Domain:** D4 · Prompt Engineering (20% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/system-prompts **Last reviewed:** 2026-05-04 ## Quick stats - **Anatomy parts:** 5 - **Exam domain:** D4 - **Coverage tier:** B - **Common trap:** vague role - **Best length:** concise ## Exam-pattern questions ### Q1. System prompt says "be conservative with refunds." Production shows 5% policy violations. Why? Natural-language guidance is advice the model ignores under pressure. Encode the policy in tool code (if amount > 500: deny). System prompt = macro guidance; tools = enforcement. ### Q2. Vague system prompt vs precise five-section charter: how much accuracy difference? 30-60% reduction in misrouting and off-policy behavior. A precise charter (role, boundaries, format, tool guidance, examples) outperforms "be helpful" by orders of magnitude. Every section is load-bearing. ### Q3. Update the system prompt mid-conversation: what happens? Each call re-reads the system prompt; the new prompt takes effect on the next call. Creates inconsistency mid-conversation: turn 5 with old prompt vs turn 6 with new prompt produces shifting behavior. Define the full charter upfront. ### Q4. System prompt is 5,000 tokens long. What's the issue? Attention dilution. Tight 500-1500 token charters outperform verbose. The model allocates less attention per section when the prompt bloats. Every section must be load-bearing; prune the rest. ### Q5. Why are example patterns the highest-ROI section of a system prompt? A 2-3 example before/after pair reduces misclassification by 30-60%. Examples show Claude exactly what success looks like in concrete cases. Vague descriptions can't substitute for one worked example. ### Q6. System prompt + tool descriptions: which takes precedence on routing? Tool descriptions take precedence (structural enforcement via SDK). System prompt is linguistic guidance. Align them to avoid confusion: prompt says "verify first"; tool description says "verify_customer: Use first." Redundant but safe. ### Q7. 
Cache the system prompt: when is it not worth it? Below ~1024 tokens (cache overhead exceeds savings) or for one-off requests (no repeat reads). For loops repeating the same system + tools, caching saves ~88% on those tokens per turn. ### Q8. Use a system prompt instead of an agentic loop for complex tasks: viable? No. System prompt defines role and rules. Agentic loop is the control structure (the while block that re-sends messages on tool_use). They're orthogonal; you need both. --- **Source:** https://claudearchitectcertification.com/concepts/system-prompts **Vault sources:** ACP-T03 §4.4 prompt engineering; ACP-T04 §6.B system prompt anatomy; ASC-A01 Course 6 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Prompt Caching > Prompt caching reduces cost (~90%) on repeated context like long system prompts and tool definitions. Cache breakpoints, TTL, and cache_control field placement are exam patterns. Full content in SCRUM-21 follow-up. **Domain:** D4 · Prompt Engineering (20% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/prompt-caching **Last reviewed:** 2026-05-04 ## Quick stats - **Cost reduction:** ~90% - **Exam domain:** D4 - **Coverage tier:** B - **Trigger:** repeated context - **Field:** cache_control ## Exam-pattern questions ### Q1. Cache the entire message history to speed up long conversations: works? No. The message list grows every turn (new user + assistant messages). Caching requires immutable content. Cache the system prompt and tool definitions, not the messages. ### Q2. Daily news content marked cache_control: ephemeral. Why is this wrong? Ephemeral caching is for content that doesn't change within 5 minutes. Daily content causes cache misses on every change. Cache only immutable content; daily-update content gets no benefit. ### Q3. Cached content from one API key leaks to another. Security issue? No, this can't happen. Caching is per API key, per conversation. Each conversation gets its own cache. Cache isolation is a security feature. ### Q4. After 5 minutes the cache automatically extends? No. After 5 minutes of no access, the cache expires completely. The next call re-reads at full token cost. Plan for cache expiry in long-running loops. ### Q5. How much does caching save on a 1000-token system prompt called 10 times? ~81-88%. First call: 1000 tokens (full). Calls 2-10: 100 each (10% of cache price). Total: 1000 + 900 = ~1900 tokens vs 10,000 fresh. ### Q6. Caching vs the Batch API: same purpose? Different. Caching: 90% on reused content within a 5-min window (instant). Batch API: 50% on all tokens (24-hour wait). Caching for interactive loops; Batch for async bulk. ### Q7. Tool definitions: cacheable like the system prompt? Yes. Tool definitions are stable across turns. Mark with cache_control: ephemeral (or rely on SDK auto-caching). Combined with cached system prompt, saves ~90% on the fixed prefix per turn. ### Q8. Caching with subagents: each subagent gets its own cache? Yes. Each subagent caches its own system prompt and tools. Caches are separate (per subagent). Subagent A's cache doesn't affect subagent B. 
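Pulling Q1, Q5, and Q7 together, a hedged sketch of where cache_control goes (assuming the anthropic Python SDK; the model id, prompt, and tool are illustrative):

```python
import anthropic

client = anthropic.Anthropic()

LONG_SYSTEM_PROMPT = "You are a support agent. ... (a stable charter well past 1024 tokens)"

TOOLS = [
    {
        "name": "lookup_order",
        "description": "Looks up an order by id. Use before any refund action.",
        "input_schema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
        # Breakpoint on the last tool: the tool definitions become cacheable prefix.
        "cache_control": {"type": "ephemeral"},
    }
]

def call(messages: list):
    return client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model id
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # reused verbatim every turn
            }
        ],
        tools=TOOLS,
        messages=messages,  # the growing message list stays uncached
    )
```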
--- **Source:** https://claudearchitectcertification.com/concepts/prompt-caching **Vault sources:** ACP-T03 §5 prompt caching; ACP-T04 prompt-caching-optimization; ASC-A01 Course 6 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Batch API > Message Batches API: 50% discount for async, non-time-sensitive workloads. Vault coverage thin; needs Phase 6 research. **Domain:** D4 · Prompt Engineering (20% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/batch-api **Last reviewed:** 2026-05-04 ## Quick stats - **Discount:** 50% - **Exam domain:** D4 - **Coverage tier:** C - **Status:** stub - **Action:** research ## Exam-pattern questions ### Q1. Use Batch API for a CI/CD pre-merge check that must block the PR. Viable? No. Batch processes within 24 hours, not immediately. Pre-merge needs synchronous responses (minutes). Use the standard Messages API for latency-sensitive workflows. ### Q2. Customer-facing chatbot uses Batch API. What goes wrong? 24-hour latency is unacceptable for interactive UX. The customer waits 24 hours for a response; they leave. Use the synchronous API for real-time. Batch is for workloads where no one is waiting. ### Q3. Batch supports multi-turn tool calling, right? No. Batch is single-shot per request. No tool continuation, no agentic loops, no streaming. Each JSONL line is a separate messages.create() request. For tool loops, use the synchronous API. ### Q4. 50% savings means always use Batch. True? No. 50% savings only justify 24-hour latency if no one is waiting. For interactive tasks, the cost of waiting (user frustration, business loss) exceeds the 50% savings. Batch is for asynchronous workloads only. ### Q5. Batch API processes 50,000 requests faster than synchronous? No; cheaper, not faster. Batch processes within 24 hours; synchronous calls in parallel finish in minutes. If you need speed, use synchronous + concurrency. If you need cost, use Batch. ### Q6. What's the maximum batch size? Up to 100,000 requests (or 256 MB of requests) per batch. For larger workloads, split into multiple batches. Each batch is independent; submit them in parallel for higher throughput. ### Q7. How do you correlate requests with results? Use the custom_id field. Each request defines a custom_id; the result includes the same custom_id. Match them to pair request with response. Without custom_id, results are unordered and untraceable. ### Q8. Can you cancel a batch after submitting? Yes, via the batch cancellation endpoint, but it's best-effort: requests already being processed may still complete and be billed, while unprocessed requests come back with a canceled result. Don't treat cancellation as a cost-control mechanism. Test on small batches first, then scale up. --- **Source:** https://claudearchitectcertification.com/concepts/batch-api **Vault sources:** ACP-T03 §5 batches **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Structured Outputs > Structured outputs guarantee Claude returns JSON matching a schema instead of natural language. Force a tool with a JSON schema; the API enforces structure, not your parser. Schema design (nullable fields, 'unclear' enum values) prevents fabrication.
**Domain:** D4 · Prompt Engineering (20% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/structured-outputs **Last reviewed:** 2026-05-04 ## Quick stats - **Anti-fabrication patterns:** 3 - **Enforcement:** tool_use - **Schema elements:** 2 - **Exam domain:** D4 - **Guarantee:** 100% ## What it is Structured outputs are JSON responses guaranteed to match a schema by forcing Claude to call a tool instead of returning unstructured text. Define a tool with a JSON schema; set tool_choice to force it. Claude must return valid JSON. This eliminates parsing and makes downstream processing deterministic. Schema design prevents fabrication: nullable fields, 'unclear' enum values, structured claim-source mappings. ## Exam-pattern questions ### Q1. You prompt Claude for JSON and 15% of responses include explanatory text. How do you fix this for production? Stop prompting for JSON; use tool_use with a JSON schema and tool_choice: forced. Token generation is constrained to match the schema; structure is guaranteed. Prompt-only approaches are probabilistic. ### Q2. Your schema has {refund_reason: {type: "string"}}. Claude returns "reason: unable to determine" when the contract is silent. What's wrong? The schema forces a string, so Claude fabricates one when source is silent. Fix: {refund_reason: {type: ["string", "null"]}}. Or add an enum: ["refund", "replacement", "unclear"]. Give Claude a way to say "I don't know." ### Q3. A validation-retry loop runs 5 times and still fails. What's the next step? Don't retry forever. After 2-3 retries, escalate to a human reviewer with the document and last-attempt extraction. Failure to converge usually means the source is genuinely ambiguous, not that Claude is broken. ### Q4. You force tool_choice: forced for extraction. Sometimes Claude returns tool_use with empty input. Why? Forced tool_choice guarantees the tool fires, not that input is valid. Empty input means the source has no extractable data. Validate input in your harness; return a structured error so Claude can ask for more context or escalate. ### Q5. Your extraction tool returns confidence: 0.4 for a critical field. What should production do? Route low-confidence extractions to human review. Low confidence is a feature: it means Claude is honest about uncertainty. Auto-accept high-confidence; queue medium for batch review; escalate low immediately. ### Q6. You use tool_choice: forced with extended_thinking. The API returns 400. Why? Extended thinking is incompatible with forced or any tool_choice. Use auto when thinking is on. If extraction must be guaranteed, drop thinking; if reasoning matters more, accept text fallback. ### Q7. Schema validation passes but the extracted customer_id is "customer_001" when the document says "cus_abc123". What's the gap? Schema enforces structure (string), not content correctness. Add a regex pattern: "pattern": "^cus_[a-z0-9]+$". Or validate semantically in your harness and trigger validation-retry on mismatch. ### Q8. You're using Batch API for 1,000 extractions. 50 fail validation. What's the cost-aware retry strategy? Batch doesn't auto-retry. Submit failures as a new batch with the original document + the validation error. Most converge on the second pass. Truly stubborn cases go to human review. Don't re-run all 1,000. ## FAQ ### Q1. Does forcing a tool guarantee correctness? No. It guarantees the format. Content can still be wrong; validate semantics after. ### Q2. Use tool_choice for every extraction? 
If downstream code expects JSON, yes. For exploratory work, auto is fine. ### Q3. Can I use tool_choice for nested structures? Yes. Define nested objects in the schema with "type": "object" and "properties". --- **Source:** https://claudearchitectcertification.com/concepts/structured-outputs **Vault sources:** ACP-T03 §4.4; GAI-K04 §18 anti-fabrication schema **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Vision & Multimodal > Vision lets Claude process images alongside text. Vault coverage thin; needs Phase 6 research. **Domain:** D4 · Prompt Engineering (20% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/vision-multimodal **Last reviewed:** 2026-05-04 ## Quick stats - **Input type:** image - **Exam domain:** D4 - **Coverage tier:** C - **Status:** stub - **Action:** research ## Exam-pattern questions ### Q1. Vision request: should you use base64 inline or the Files API? Base64 for <10 images, Files API for batch (50+). Base64 inflates JSON ~33%; Files API stores the image once and references it via file_id, cutting upload bandwidth and request size on repeated reuse (the image itself is tokenized the same either way). ### Q2. High-res 4000×3000 images for OCR accuracy. Worth the cost? No. Higher resolution increases token cost linearly with no accuracy gain above 1200×1500. Diminishing returns set in at ~1000 px width. Downsample for cost; you don't lose precision. ### Q3. Claude returns invalid JSON from a vision extraction. Increase max_tokens? No. Schema-validation failure usually signals ambiguity in the image (poor scan, missing data), not token exhaustion. Retry with clarification or escalate to human review. More tokens won't help. ### Q4. Vision always produces accurate output. True? No. Claude hallucinates fields when tables are ambiguous or images degraded. Always validate JSON against your schema, track confidence per field, escalate low-confidence results. ### Q5. 100-page PDF: send as one mega-image or split by page? Split by page, process each separately. That's 100 requests, but each page stays within the per-image resolution limits; a single mega-image gets scaled down past legibility. Batch process pages in parallel for speed. ### Q6. Image size 1200×1500 vs 1024×768 for invoice extraction. Which? 1200×1500 for dense text and tables; 1024×768 for screenshots or clean documents. Match resolution to information density. Anything above 1500 wastes tokens. ### Q7. Vision and PII (faces, credit cards): can Claude redact in-place? No explicit redaction. Ask Claude to flag PII regions in JSON: {region: "top-left", pii_type: "credit_card"}. Redact client-side with image processing. Don't ship raw images with PII to logs. ### Q8. Token cost of a 1024×768 screenshot vs a 1200×1500 document page? Using the documented approximation tokens ≈ (width × height) / 750: ~1,050 tokens for the screenshot and roughly 1,600 for the document page (larger images are scaled down to stay near the per-image cap). Cost tracks pixel count, so plan token budget by resolution + density. --- **Source:** https://claudearchitectcertification.com/concepts/vision-multimodal **Vault sources:** ACP-T03 capabilities; ASC-A01 Course 6 vision **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Streaming > Streaming returns tokens as they're generated for low-latency UX. Vault coverage thin; needs Phase 6 research.
**Domain:** D4 · Prompt Engineering (20% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/streaming **Last reviewed:** 2026-05-04 ## Quick stats - **Use case:** low latency - **Exam domain:** D4 - **Coverage tier:** C - **Status:** stub - **Action:** research ## Exam-pattern questions ### Q1. Streaming reduces token cost: true? No. Streaming costs the same per token. It's a UX feature (responsive text), not a cost optimization. The total tokens generated and billed are identical to non-streaming. ### Q2. Stream connection drops mid-response. Retry the entire request? No. The drop doesn't lose what was already received. The client retains the buffered text. Retrying sends a new prompt and wastes tokens on duplicate work. Log the error, decide based on context. ### Q3. Tool_use blocks stream character-by-character: parse JSON as it arrives? No. The tool name and id arrive in the ContentBlockStart event; the input arrives as chunked input_json_delta events that must be fully accumulated before parsing. Parsing partial JSON fails. ### Q4. Display every event to the user immediately: good UX? No. Some events (MessageStart) are metadata, not displayable. Filter: display only ContentBlockDelta.text. Meta events go to logging or internal state tracking. ### Q5. Cancel a streaming request by closing the connection: stops the bill? No. Closing stops receiving, but the request is still processed server-side and billed up to that point. Cancellation is not a cost-saving mechanism; budget for completion before opening the stream. ### Q6. Latency of the first streamed token? ~100-200ms from request to first ContentBlockDelta.text event. Similar to non-streaming first-token time. Streaming optimizes time-to-screen, not time-to-first-token. ### Q7. Streaming response shorter than 1 second: noticeable UX improvement? No. Responses >3-5 seconds become noticeably more responsive with streaming. Shorter (<1 sec) shows negligible improvement. Use streaming for long-form responses, not quick queries. ### Q8. Emit streamed chunks to a browser client: which protocol? Server-Sent Events (SSE). Server: open a /stream endpoint, iterate the Claude stream, response.write(event) each chunk. Client: const es = new EventSource('/stream'); es.onmessage = .... Simple, reliable, browser-native. --- **Source:** https://claudearchitectcertification.com/concepts/streaming **Vault sources:** ACP-T03 capabilities **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Attention Engineering > Attention engineering is the discipline of placing critical context where the model attends most strongly: high in the prompt, in the system message, or in repeated facts blocks. Full content in SCRUM-21 follow-up. **Domain:** D4 · Prompt Engineering (20% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/attention-engineering **Last reviewed:** 2026-05-04 ## Quick stats - **Strong-attention zones:** 3 - **Exam domain:** D4 - **Coverage tier:** B - **Trap:** buried context - **Pattern:** place high ## Exam-pattern questions ### Q1. Critical fact buried in paragraph 5 of context. Agent misses it 40% of the time. Why? Lost-in-the-Middle effect. Mid-context positions get ~40-50% attention vs 90%+ at start/end. Move the fact to the top (system prompt or CASE_FACTS block) or end (current question). Position trumps content. ### Q2. Bold or capitalize important facts to boost attention? No.
The transformer doesn't give extra attention to Markdown emphasis or capitalization: **CUSTOMER_ID: 12345** and customer_id: 12345 carry the same attention weight. Position is what matters. ### Q3. Add important context at the very end for maximum emphasis? Recency bias helps, but only for the final question/request. Context placed at the end is treated as supporting detail, not fact. System prompt is always position 0 for constraints; case-facts at the top of the user message for transactional data. ### Q4. Put all context in the system prompt to ensure it's always attended? No. System prompts are limited (~2000 tokens effective); they're for role and constraints, not transactional facts. Use the system prompt for rules; a CASE_FACTS block in the user message for facts. ### Q5. Lost-in-the-Middle is a myth: it doesn't affect Claude? False. Empirically documented (Liu et al. 2023) and replicated across all transformers including Claude. Mid-context accuracy drops 40-50%. Mitigate with structural changes, not by ignoring the effect. ### Q6. Agent forgets a key fact despite it being in context. Increase max_tokens? No. Forgetting is an attention-weight issue, not a token budget issue. More tokens won't help. Restructure: move the fact to the top or end. ### Q7. When should you start windowing in an agentic loop? After 4-6 turns (8-12 messages). Beyond that, early turns degrade. Optimal: window at turn 5 or when the message list exceeds 10KB. Pre-emptive is better than reactive. ### Q8. What goes in the system prompt vs the CASE_FACTS block? System prompt: role, constraints, output schema (rules that don't change per instance). CASE_FACTS: customer ID, order ID, amount, dispute summary (transactional facts that change per instance). Both are needed; they layer. --- **Source:** https://claudearchitectcertification.com/concepts/attention-engineering **Vault sources:** ACP-T03 §15 attention engineering **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # 4D Framework > Delegation, Description, Discernment, Diligence: Anthropic's 4D framework for agent prompt design. Vault coverage thin; needs Phase 6 research. **Domain:** D4 · Prompt Engineering (20% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/4d-framework **Last reviewed:** 2026-05-04 ## Quick stats - **Pillars:** 4 - **Exam domain:** D4 - **Coverage tier:** C - **Status:** stub - **Action:** research ## Exam-pattern questions ### Q1. Should we use Claude to verify legal text before sending to a client? No, that's misapplied Delegation. Legal text requires human authority and accountability. Use Claude to draft for human review. Delegation assigns work to leverage each agent's strengths; lawyers review, Claude drafts. ### Q2. "Be helpful" as the prompt: what's missing? Description (Product, Process, Performance). "Be helpful" is vague. Specify the format (JSON, bullets), the reasoning steps (chain-of-thought), and the tone (friendly advisor, expert reviewer). Vague Description produces vague output. ### Q3. Discernment fails: agent's output is wrong but you ship anyway. What's the meta-failure? Skipping Diligence. Discernment detected the flaw; Diligence demands you don't deploy without addressing it. Either fix (loop back to Description) or escalate to a human. Shipping known-flawed output is the opposite of Diligence. ### Q4. Auto-approve refunds <$100 without human review: smart automation or bad practice? Bad practice without Diligence.
Even small amounts demand audit logging and periodic human review. Diligence is "I validated this before shipping"; auto-approval without validation skips that step. ### Q5. Description includes Product, Process, Performance. What's the difference? Product: desired output format (JSON schema, markdown, prose). Process: step-by-step instructions for reasoning (chain-of-thought, few-shot examples). Performance: tone, style, role (friendly, expert, advisor). All three together = strong Description. ### Q6. Skip Diligence on low-stakes tasks: ever acceptable? Sometimes. For brainstorming or summarizing a blog post, minimal Diligence is fine. For compliance, financial, legal, healthcare, all four Ds are mandatory. Match Diligence rigor to stakes. ### Q7. The 4D Framework is a technical architecture (like MVC)? No. It's a mental model for human-AI collaboration. Guides how you think about delegating, communicating, validating, deploying. Not a code pattern; a discipline. ### Q8. Discernment detects a flaw: where do you loop back? To Description. Refine the prompt: tighten Product (output format), clarify Process (reasoning steps), adjust Performance (tone). If refinement doesn't help, reconsider Delegation (is this task suited for AI at all?). --- **Source:** https://claudearchitectcertification.com/concepts/4d-framework **Vault sources:** ClaudeCertifications course materials **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Prompt Engineering Techniques > Prompt engineering is a craft of seven techniques that turn a brittle one-shot prompt into a production-grade contract: few-shot examples, iterative refinement against an eval suite, anti-fabrication schemas, structured templates, explicit do-don't constraints, output anchoring via tools, and test-driven prompt edits. The exam trap is treating any one technique as 'the answer'; production prompts compose all seven. **Domain:** D4 · Prompt Engineering (20% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/prompt-engineering-techniques **Last reviewed:** 2026-05-04 ## Quick stats - **Core techniques:** 7 - **Iterations to converge:** 5-15 - **Eval suite size:** 20-50 cases - **Few-shot example sweet spot:** 2-5 pairs - **Exam domain:** D4 ## What it is Prompt engineering is the discipline of shaping Claude's output by writing a prompt, measuring it against test cases, observing where it fails, and tightening the prompt until the eval score plateaus. It is not writing one clever sentence. A production prompt converges over 5-15 iterations against a frozen suite of 20-50 cases, of which 3-5 are known-failure cases that anchor the regression boundary. Per ASC-A01 Course 11 §16, prompt v1 scoring 7.66 climbs to 8.7 in v2 only because the eval gave a quantitative delta to chase. 
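As a sketch of what that frozen suite can look like on disk: the file name matches the sentiment_cases.jsonl read by the harness in the Code examples section below, and the known_failure flag is illustrative bookkeeping, not something the harness requires.
```python
import json

# Hypothetical frozen eval cases; "tweet"/"expected" are the fields the eval harness reads.
# The 3-5 deliberate failure cases anchor the regression boundary.
CASES = [
    {"tweet": "Best coffee I've had all week, genuinely.", "expected": "positive"},
    {"tweet": "idk what to make of this product", "expected": "unclear"},
    {"tweet": "I love how my flight was delayed three hours.", "expected": "negative", "known_failure": True},
    {"tweet": "Oh great, another 'minor' outage.", "expected": "negative", "known_failure": True},
    # ...grow toward 20-50 cases before trusting the score
]

with open("sentiment_cases.jsonl", "w") as f:
    for case in CASES:
        f.write(json.dumps(case) + "\n")
```
Freeze the file once written; every prompt revision is scored against the same cases, so the delta is attributable to the prompt edit alone.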
The seven load-bearing techniques are few-shot prompting (showing 2-5 input-output pairs so Claude can pattern-match the harder cases), iterative refinement (the Draft → Test → Observe → Refine → Anchor loop), anti-fabrication patterns (nullable fields, unclear enum values, citation-required answers), structured templates (role + objective + tone + tools + constraints + format + escalation), constraint elicitation (concrete do-don't lists with worked examples), output anchoring (force a tool_use schema, do not ask 'output JSON' in prose), and test-driven prompting (every change validated against a frozen eval suite, regression-free or roll back). The reason every Domain 4 question on the exam is a composition question, never a single-technique question, is that real production prompts are layered. A refund agent uses a structured system-prompt template, three few-shot examples covering the sarcasm-style edge case, an unclear enum option for the refund-reason field, a forced tool_choice for output anchoring, and a 50-case eval that gates every prompt PR. Strip any one layer and the leak rate climbs above 5%. Per ACP-T03 §4.4, natural-language 'output JSON' prompts leak structure ~15% of the time; tool-anchored prompts leak 0%. ## How it works Iterative refinement is the spine. You start with a baseline prompt that almost works, build a small eval set (20-50 cases is the sweet spot, of which 3-5 are deliberate failure cases you have already seen Claude get wrong), and run the prompt against the suite to get a score. Score is the only signal that matters; subjective 'this feels better' is theater. Per ASC-A01 Course 12 §22, the score itself isn't inherently good or bad. What matters is whether you can improve it by refining your prompts. Each refinement targets the lowest-scoring case: you write a tighter instruction or add an example, then re-run the entire suite to make sure no other case regressed. Few-shot prompting is the highest-ROI lever inside the loop. Per ASC-A01 Course 12 §29, you wrap each example in <example> and </example> XML tags, you choose 2-5 examples that cover the failure modes (sarcasm, ambiguity, edge formats), and you optionally add a one-line explanation of *why* the example is ideal. A 2-3 example before-after pair reduces misclassification by 30-60%, the biggest single-edit accuracy gain you can make. The mistake is using examples Claude already gets right; always pick examples that match the cases you have observed Claude fail on. Output anchoring and anti-fabrication are where natural-language prompting reaches its ceiling. Asking 'please output JSON' in prose leaks structure roughly 15% of the time under load. The fix is tool_use with a JSON schema and tool_choice: forced; the API constrains token generation to match the schema, so structure is guaranteed at 100%. Fabrication is a separate problem: schemas guarantee shape, not truth. Per ACP-T03 §4.4 and the structured-data-extraction scenario doc, when a source is genuinely silent, give the model two honest exits: nullable fields (type: ['string', 'null']) and an unclear / not_provided enum value. Without these escape hatches, required-string fields force fabrication and the leak rate climbs above 5%. ## Where you'll see it in production ### Refund agent prompt PR gate Every prompt change PR runs against a 60-case golden suite (20 happy path, 20 edge, 10 escalations, 10 known-failure). The CI fails the PR if the average score drops or any previously-passing case regresses.
Per ASC-A01 Course 11 §15, the alternative, 'test once and decide it's good enough', carries a significant risk of breaking in production. Three iterations typically take a prompt from 80% to 95%. ### Sarcasm-aware sentiment classifier A social-listening team's first prompt scored 71% on a sarcasm-heavy suite. Adding three <example>…</example> pairs that explicitly covered Plan-9-style ironic praise lifted the score to 88%. The team picked the examples directly from the eval's lowest-scoring cases, which per ASC-A01 Course 12 §29 is the canonical way to source few-shot examples. No model upgrade required. ### Contract-extraction anti-fabrication A legal-ops extractor was fabricating termination_clause text when the contract was silent. The fix was schema-level, not prompt-level: nullable strings plus an unclear enum option in the tool input schema. Per the structured-data-extraction scenario doc, fabrication rate dropped from 8% to under 0.5% once the model had an honest exit. The system prompt remained almost unchanged. ## Code examples ### Few-shot prompt with iterative-eval harness **Python:**
```python
from anthropic import Anthropic
import json

client = Anthropic()

SYSTEM = """You classify tweet sentiment as positive, negative, or unclear.
DO emit one of: positive | negative | unclear.
DON'T add commentary or explanation.
DON'T pick positive when the tone is sarcastic.
Here are example input-output pairs:
<example>
I love how my flight was delayed three hours.
negative
</example>
<example>
Best coffee I've had all week, genuinely.
positive
</example>
<example>
idk what to make of this product
unclear
</example>"""

def classify(tweet: str) -> str:
    resp = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=10,
        system=SYSTEM,
        messages=[{"role": "user", "content": tweet}],
    )
    return resp.content[0].text.strip().lower()

def run_eval(cases_path: str) -> float:
    cases = [json.loads(line) for line in open(cases_path)]
    correct = sum(1 for c in cases if classify(c["tweet"]) == c["expected"])
    score = correct / len(cases)
    print(f"Score: {score:.2%} ({correct}/{len(cases)})")
    return score

# Iterate: edit SYSTEM, re-run, compare. Ship only on regression-free improvement.
v1_score = run_eval("sentiment_cases.jsonl")
```
> Few-shot pairs cover the sarcasm failure case explicitly. The eval harness reports a single number per prompt version; iterate until the score plateaus.
**TypeScript:**
```typescript
import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "node:fs";

const client = new Anthropic();

const SYSTEM = `You classify tweet sentiment as positive, negative, or unclear.
DO emit one of: positive | negative | unclear.
DON'T add commentary or explanation.
DON'T pick positive when the tone is sarcastic.
<example>
I love how my flight was delayed three hours.
negative
</example>
<example>
Best coffee I've had all week, genuinely.
positive
</example>
<example>
idk what to make of this product
unclear
</example>`;

async function classify(tweet: string): Promise<string> {
  const resp = await client.messages.create({
    model: "claude-opus-4-5",
    max_tokens: 10,
    system: SYSTEM,
    messages: [{ role: "user", content: tweet }],
  });
  const block = resp.content[0];
  return block.type === "text" ? block.text.trim().toLowerCase() : "";
}

async function runEval(casesPath: string): Promise<number> {
  const cases = readFileSync(casesPath, "utf8")
    .trim()
    .split("\n")
    .map((l) => JSON.parse(l) as { tweet: string; expected: string });
  let correct = 0;
  for (const c of cases) {
    const out = await classify(c.tweet);
    if (out === c.expected) correct += 1;
  }
  const score = correct / cases.length;
  console.log(`Score: ${(score * 100).toFixed(1)}% (${correct}/${cases.length})`);
  return score;
}

await runEval("sentiment_cases.jsonl");
```
> Same shape in TypeScript. Each prompt revision becomes a new SYSTEM constant; you compare scores side-by-side.
### Output anchoring with anti-fabrication schema **Python:**
```python
from anthropic import Anthropic

client = Anthropic()

# Anti-fabrication: nullable + 'unclear' enum exit.
EXTRACT_TOOL = {
    "name": "extract_refund_decision",
    "description": "Extract refund decision from a customer ticket.",
    "input_schema": {
        "type": "object",
        "properties": {
            "decision": {
                "type": "string",
                "enum": ["approve", "deny", "escalate", "unclear"],
            },
            "refund_reason": {"type": ["string", "null"]},
            "amount_usd": {"type": ["number", "null"]},
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": ["decision", "confidence"],
    },
}

def extract(ticket: str) -> dict:
    resp = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=512,
        tools=[EXTRACT_TOOL],
        tool_choice={"type": "tool", "name": "extract_refund_decision"},
        messages=[{"role": "user", "content": ticket}],
    )
    for block in resp.content:
        if block.type == "tool_use":
            return block.input
    raise RuntimeError("forced tool_choice did not fire")

# 'unclear' + nullable fields let the model say 'I don't know' honestly.
result = extract("Customer wants a refund. No order ID provided.")
# Expected: {"decision": "escalate", "refund_reason": null, "amount_usd": null, "confidence": 0.3}
```
> Tool-anchored output is 100% structured. The 'unclear' enum and nullable fields cut fabrication on silent sources from 8% to under 0.5%.
**TypeScript:**
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const EXTRACT_TOOL: Anthropic.Tool = {
  name: "extract_refund_decision",
  description: "Extract refund decision from a customer ticket.",
  input_schema: {
    type: "object",
    properties: {
      decision: {
        type: "string",
        enum: ["approve", "deny", "escalate", "unclear"],
      },
      refund_reason: { type: ["string", "null"] },
      amount_usd: { type: ["number", "null"] },
      confidence: { type: "number", minimum: 0, maximum: 1 },
    },
    required: ["decision", "confidence"],
  },
};

async function extract(ticket: string): Promise<unknown> {
  const resp = await client.messages.create({
    model: "claude-opus-4-5",
    max_tokens: 512,
    tools: [EXTRACT_TOOL],
    tool_choice: { type: "tool", name: "extract_refund_decision" },
    messages: [{ role: "user", content: ticket }],
  });
  const tool = resp.content.find(
    (b): b is Anthropic.ToolUseBlock => b.type === "tool_use",
  );
  if (!tool) throw new Error("forced tool_choice did not fire");
  return tool.input;
}
```
> Same anti-fabrication pattern in TypeScript. The schema is the contract; prompt-level pleas to 'be honest' are not a substitute.
## Looks-right vs actually-wrong | Looks right | Actually wrong | |---|---| | Adding 'output valid JSON, do not include any other text' to the system prompt to enforce structure. | Natural-language prompting for structure leaks ~15% of the time under load. Per ACP-T03 §4.4, the only structural guarantee is tool_use with a JSON schema and tool_choice: forced.
Prompt instructions are advice, not enforcement. | | Spending three days hand-tuning the wording of the system prompt to fix a 12% accuracy gap. | Per ASC-A01 Course 12 §29, adding 2-3 well-chosen few-shot examples typically yields a 30-60% misclassification reduction in a single edit. Prose tweaks deliver diminishing returns; concrete examples deliver step-changes. | | Using Claude itself to grade 100% of your eval cases because it scales better than humans. | Per the evaluation concept (docid #f86eb3), model-based grading is not reproducible across model versions and fails on deterministic checks (tool sequences, JSON schemas). Use code-based grading for anything measurable; reserve model grading for subjective qualities like tone. | | Required-string field for refund_reason so the schema always returns a value the downstream code can parse. | Required strings force fabrication when the source is silent. Per the structured-data-extraction scenario, fabrication climbs above 5% without an unclear enum option or a nullable type. Always give the model an honest exit. | | Iterating the prompt 20+ times against the same 10 hand-picked test cases until the score reaches 100%. | Per the evaluation concept §Q3, that's overfitting. Score plateau on a tiny fixed suite masks production blind spots. Grow the suite to 50+ externally-validated cases and pair with shadow-mode production evals. | ## Decision tree 1. **Are you anchoring the output format?** - **Yes:** Use tool_use with a JSON schema and tool_choice: forced. Structure is guaranteed at 100%; no parsing risk. - **No:** Prompt-only 'output JSON' leaks ~15% under load. Switch to tool-anchored output before iterating further. 2. **Do you have a frozen 20-50 case eval suite, of which 3-5 are known-failure cases?** - **Yes:** Iterate: edit prompt, re-run suite, ship only on regression-free improvement. Per ASC-A01 Course 11 §16, target a measurable score delta per version. - **No:** Build the suite before tuning the prompt. Even a 10-case suite catches 80% of regressions; without it, every change is a coin flip. 3. **Is your accuracy gap concentrated in specific edge cases (sarcasm, ambiguity, format variance)?** - **Yes:** Add 2-3 few-shot examples, wrapped in <example>…</example> tags, that cover those exact cases. Per ASC-A01 Course 12 §29, this is the highest-ROI single edit. - **No:** If the gap is uniform, your problem is the structured template (role + objective + format + constraints), not the examples. Rewrite the system prompt skeleton first. 4. **Is the model fabricating values when the source is silent?** - **Yes:** Schema-level fix: nullable types and an unclear / not_provided enum option. Per the structured-data-extraction scenario, this drops fabrication from 8% to under 0.5%. - **No:** If fabrication only happens under load, check whether your schema requires fields the source rarely contains; relax the required list. ## Exam-pattern questions ### Q1. Your extraction prompt says 'output valid JSON'. 12% of responses include explanatory prose. Best fix? Stop prompting for JSON; use tool_use with a JSON schema and tool_choice: forced. The most-tested distractor is "add 'do not include any text outside the JSON' to the system prompt". Per ACP-T03 §4.4, prompt-only is probabilistic (~15% leak); tool-anchored output is structurally guaranteed (0% leak). ### Q2. Your sentiment classifier misses sarcastic tweets ~30% of the time. The cheapest fix? Add 2-3 few-shot examples that specifically demonstrate sarcasm, wrapped in <example>…</example> XML tags.
Per ASC-A01 Course 12 §29, this technique reduces misclassification by 30-60%. The distractor is "upgrade to a larger model"; the right answer is one prompt edit with concrete failure-case examples. ### Q3. A teammate edited the system prompt and shipped to production. A week later, accuracy dropped 6 points. What was missing from their workflow? An eval suite with regression detection. Per ASC-A01 Course 11 §16, every prompt edit must run against a 20-50 case suite to compare scores (e.g. v1 = 7.66, v2 = 8.7) before merging. Distractor: "more A/B testing in production". Right answer: gate prompt PRs on offline eval delta, not on subjective review. ### Q4. Your refund agent's refund_reason field returns fabricated reasons when the contract is silent. Schema-level fix? Make the field nullable AND add an unclear enum option: {type: ['string', 'null'], enum: ['refund', 'replacement', 'unclear', null]}. Per the structured-data-extraction scenario doc, required-string fields force fabrication when the source is silent. The distractor is "add a temperature: 0 setting"; the actual fix is giving the model an honest exit in the schema. ### Q5. Your prompt iterates 47 times and the eval score keeps oscillating between 78% and 82%. What's the underlying problem? The eval set is too small or not golden. Per the evaluation concept page, scores oscillating without a trend mean random variance is dominating the signal. Distractor: "use a more capable model". Right answer: grow the suite to 50+ externally-validated cases, then iterate. Without golden data, evals measure consistency, not correctness. ### Q6. Your system prompt uses vague guidance like 'be careful with refunds' and policy violations stay at 5%. Why doesn't the model improve when you add 'be especially careful'? Vague descriptions are floor-level guidance the model ignores under load. Per the system-prompts concept (docid #5ca2ae), encode the policy as a do-don't list with concrete examples (e.g. "DO escalate refunds > $500. DON'T auto-approve any refund without an order_id."). Distractor: "add stronger language like NEVER". Right answer: replace adjectives with concrete rules and worked examples. ### Q7. You add a great new few-shot example and the targeted case finally passes, but two unrelated cases regress. What do you do? Roll back, then iterate. Per the evaluation concept (docid #f86eb3), regression-free is a non-negotiable gate. The distractor is "ship anyway because the targeted case was the priority". Right answer: the new example is teaching Claude something that conflicts with the others; either add a counter-example for the regressing cases or rewrite the new example to be more specific. ### Q8. A new prompt scores 92% on your 50-case suite. You ship. Production failures climb. Why was the eval misleading? The suite was overfit to known cases. Per the evaluation concept page §Q3, score plateau on a fixed suite can mask overfitting; production users hit scenarios you didn't design for. Distractor: "you should have raised the bar to 95%". Right answer: pair offline evals with shadow-mode evaluation on real traffic, and grow the golden suite from production failures. ## FAQ ### Q1. Why XML tags for few-shot examples instead of plain bullets? Per ASC-A01 Course 12 §29, <example> and </example> tags create unambiguous boundaries Claude can attend to. Plain bullets blur into the surrounding instructions. ### Q2. How many few-shot examples is too many? Past 5-7 examples you hit attention dilution and token cost without meaningful gains.
Curate the set; replace weak examples instead of adding more. ### Q3. Should the eval grader be Claude or code? Use code-based grading for deterministic checks (tool sequence, JSON schema). Use model-based grading only for subjective qualities (tone, completeness). Per the evaluation concept page, code grading is reproducible; model grading is not. ### Q4. Can I skip the eval suite if my use case is internal-only? No. Without an eval, every prompt edit is a coin flip. Even a 10-case suite catches 80% of regressions. Build the suite before you tune the prompt. ### Q5. Where do constraints belong, system prompt or tool descriptions? Hard constraints (refund > $500 escalates) belong in tool code. System prompt restates them as guidance. Per the system-prompts concept, tool descriptions take precedence on routing; system prompt is linguistic guidance. --- **Source:** https://claudearchitectcertification.com/concepts/prompt-engineering-techniques **Vault sources:** ASC-A01 Course 12 §29 providing examples (#70748e); ASC-A01 Course 11 §16 typical eval workflow (#9d53f3); ASC-A01 Course 12 §22 code-based grading (#af5dc2); ACP-T03 §4.4 prompt engineering and anti-fabrication; Scenario: structured-data-extraction (#4fe21b); Concept: evaluation (#f86eb3); Concept: system-prompts (#5ca2ae) **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Context Window Management > Context window management is how you keep long conversations within limits without dropping critical facts. Patterns: case-facts blocks, progressive summarization, retrieval. Full content in SCRUM-21 follow-up. **Domain:** D5 · Context + Reliability (15% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/context-window **Last reviewed:** 2026-05-04 ## Quick stats - **Patterns:** 3 - **Exam domain:** D5 - **Coverage tier:** B - **Trap:** summary loss - **Right answer:** facts block ## Exam-pattern questions ### Q1. Context window resets after each turn? No. Cumulative across all turns. Messages accumulate; budget shrinks. Once you hit 200K total, new requests fail with context_length_exceeded. ### Q2. Streaming reduces context window cost? No. Streaming is a UX feature; it doesn't change token cost. Same total tokens billed, streamed or not. Use streaming for responsiveness, not cost. ### Q3. Windowing at turn 11 of a 12-turn loop. Worth it? Marginal value. You've already paid for turns 1-10 of accumulated context. Optimal: window at turn 5 if the conversation is verbose, or turn 3 if each turn is expensive. Windowing late is reactive; preemptive is better. ### Q4. Subagent receives the parent's full 100KB conversation history. What goes wrong? Context starvation. Subagent wastes 100K of its 200K budget on irrelevant history. Coordinator should extract a 2KB TASK_CONTEXT and pass only that. Subagents do better with less, not more. ### Q5. Increasing max_tokens fixes a context_length_exceeded error? No. 200K is the input budget; max_tokens is the output budget. Increasing output budget doesn't help an input limit. Window the input (summarize old turns into a CASE_FACTS block, drop verbose history). ### Q6. Cache the message history to keep all turns within budget? No. The message list grows every turn; caching requires immutable content. Cache the system prompt and tool definitions (constants); the growing message list is always fresh. ### Q7. 
1M-token window models: caching not needed, just use the bigger window? Bigger windows are not free. Cost scales with input tokens. Filling a 1M window costs 5x a 200K window even if it fits. Use a 200K window thoughtfully (case-facts + summary + recent turns) for most production workloads. ### Q8. Optimal windowing target: at what % of capacity? 50-60% capacity. If your loop generates 20K tokens/turn, window at turn 5 (100K used) to have 100K left for the next 5 turns. Below 50% wastes capacity; above 70% risks running out before the next window cycle. --- **Source:** https://claudearchitectcertification.com/concepts/context-window **Vault sources:** ACP-T03 §5 context; GAI-K04 §9 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed. --- # Case Facts Block > A case-facts block is an immutable set of transactional data (customer ID, order details, refund amount, policy limits) included at the top of every prompt during a multi-turn conversation. Unlike summaries (which lose precision), case facts are complete and never paraphrased. **Domain:** D5 · Context + Reliability (15% of CCA-F exam) **Canonical:** https://claudearchitectcertification.com/concepts/case-facts-block **Last reviewed:** 2026-05-04 ## Quick stats - **Typical facts:** 5–10 - **Turn threshold:** 20+ - **Position:** top - **Exam domain:** D5 - **Turns supported:** ∞ ## What it is A case-facts block is a structured, immutable set of core transactional data included in every prompt of a conversation. It contains facts that don't change (customer ID, order amount, policy limits), extracted once and reused. Unlike summarization (which compresses and loses detail), facts are preserved in full. Place at the top of every message so Claude always has accurate context without re-reading prior history. Critical in long conversations where context fill forces omission of prior reasoning. ## Exam-pattern questions ### Q1. A support conversation hits turn 30 and the model loses the customer ID. You're using progressive summarization. What's missing? A case-facts block at the top of every message. The summary compressed the customer ID into pronouns. The block survives summarization because it lives outside the summarizable history. ### Q2. Your case-facts block has customer_id, order_id, amount. By turn 20 the agent calls process_refund(amount="about $250"). Why? Either the block was not re-injected per turn, or it was paraphrased into the summary. The block must be immutable and at the top of every message. Verify with a sanity check helper before each messages.create() call. ### Q3. The user provides a corrected customer ID mid-conversation. How do you update the case-facts block? Only the user can correct facts. Listen for explicit corrections ("the customer ID is actually X") and replace the field. Add a new field if facts diverge (customer_id_revised). Never let the agent self-modify the block. ### Q4. You pass the case-facts block to a subagent in the task string. The subagent ignores it and asks for the customer ID. Why? Either the task string buried the block at the bottom (low attention) or the subagent's system prompt didn't reference it. Place the block at the top of the task and explicitly instruct: "use only the values in CASE_FACTS." ### Q5. Your case-facts block grows to 50 fields. Tokens add up. How do you trim? Include only essential transactional data: IDs, amounts, dates, status flags. 
Remove narrative or explanatory prose, those belong in the conversation. Aim for 50-500 tokens. ### Q6. After multi-session storage and retrieval, the case-facts block sometimes loses fields. What's the architecture fix? Store the authoritative facts in your application database, not just in the message list. Rebuild the block from DB on session resume. Backup ensures recovery if the block is corrupted in transit. ### Q7. You're using Anthropic's prompt caching. Should the case-facts block be cached? Yes, ideal candidate. The block is stable across turns (identity-level data) and contains no per-request secrets. Mark with cache_control: {type: "ephemeral"}. Saves ~90% on input cost for that section per turn. ### Q8. The conversation is 60 turns and you're hitting context limits. What's the right preservation strategy? Progressive summarization with the case-facts preservation helper. Extract the block, summarize middle turns into 3 sentences (excluding the block from summary input), rebuild: block + summary + last 3 turns. Block survives intact. ## FAQ ### Q1. What if the facts block is 100+ fields? Prioritize the critical ones. If genuinely that many, the case scope is too broad. Usually 5–10 fields cover 80% of decisions. ### Q2. Should facts include intermediate findings? No. Facts are immutable source data. Findings belong in the conversation. ### Q3. Can the facts block change between turns? No. That defeats the purpose. Identical across all turns. New constraints become NEW fields, never overwrites. --- **Source:** https://claudearchitectcertification.com/concepts/case-facts-block **Vault sources:** ACP-T03 §5; GAI-K04 §9 **Last reviewed:** 2026-05-04 **Evidence tiers**, 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed.
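A minimal sketch of the Q8 preservation strategy, assuming the Python SDK; summarize_turns and rebuild_messages are hypothetical helper names and the CASE_FACTS values are illustrative.
```python
from anthropic import Anthropic

client = Anthropic()

# Illustrative, immutable case-facts block; re-injected verbatim, never paraphrased.
CASE_FACTS = """<case_facts>
customer_id: cus_abc123
order_id: ord_789
amount_usd: 250.00
policy_limit_usd: 500.00
</case_facts>"""

def summarize_turns(middle_turns: list) -> str:
    """Compress the verbose middle of the conversation into ~3 sentences.
    The CASE_FACTS block is deliberately excluded from the summary input."""
    resp = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": "Summarize these turns in 3 sentences:\n"
            + "\n".join(str(t) for t in middle_turns),
        }],
    )
    return resp.content[0].text

def rebuild_messages(history: list, keep_recent: int = 4) -> list:
    """Rebuild: CASE_FACTS + summary + last N turns. The block survives intact.
    Assumes the kept slice starts with a plain-text user turn so roles still alternate."""
    middle, recent = history[:-keep_recent], history[-keep_recent:]
    summary = summarize_turns(middle)
    first, rest = recent[0], recent[1:]
    rebuilt_first = {
        "role": "user",
        "content": f"{CASE_FACTS}\n\nSUMMARY OF EARLIER TURNS:\n{summary}\n\n{first['content']}",
    }
    return [rebuilt_first, *rest]
```
> On session resume, rebuild CASE_FACTS from the application database (Q6) rather than trusting whatever survived in the message list.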