P3.1 · D1 + D2 + D5 · Process

Customer Support Resolution Agent.

Think of this as the AI teammate that sits behind your support inbox. When a customer writes in about a refund, a tech glitch, or an account question, this agent reads the message, looks up who the customer is, decides what they actually need, and either solves it on the spot or hands the case to a human with all the context already prepared. It exists because most support questions follow the same handful of patterns, and answering them in seconds (instead of hours) is the difference between a customer who stays and a customer who churns. Everything below is how that simple idea is wired up safely in production.

22 min build · 5 components · 8 concepts

A production refund/escalation agent built on Claude. The harness reads stop_reason, dispatches tools through a registry, gates risky operations with a PreToolUse hook (deterministic policy enforcement), pins customer state in a case-facts block, and routes blocked calls to a structured escalation queue. Three-domain coverage makes this the single highest-weight scenario on the exam.

60% exam weight
Source: Official Anthropic guide scenario
Stack
Python or TypeScript SDK · CRM · queue
Needs
Tool calling · stop_reason · hooks
Exam
60% of CCA-F (D1 + D2 + D5). 27% D1 · 18% D2 · 15% D5. Highest-weight scenario on the test. Master this one and you've covered most of it.
[Illustration: Loop, the ACP mascot, as a calm customer-support agent at a walnut desk with headset, notebook, and a speech bubble holding an inbound question.]
End-to-end flow · 60% of CCA-F (D1 + D2 + D5)
01 · Problem framing

The problem

What the customer needs

  1. Resolve the request in one turn without multiple agent transfers.
  2. Get identity verified before any account-modifying action.
  3. See a clear path to a human when the agent can't help.

Why naive approaches fail

  1. Single-agent chatbots forget the customer ID by turn 8 (no case-facts pinning).
  2. Prompt-only refund-cap policy leaks 3% of refunds above the limit (no deterministic hook).
  3. Sentiment-triggered escalation creates false positives: angry users with valid policy denials get escalated unnecessarily.
Definition of done
  • p95 resolution latency < 12 seconds end-to-end
  • Refund-cap violations = 0 (hook-enforced, not prompt-enforced)
  • Audit log entry per ticket with case-facts snapshot
  • CSAT ≥ 4.2/5 across resolved tickets
02 · Architecture

The system

03 · Component detail

What each part does

Five components, each owning one concept; each maps to an underlying primitive, detailed below.

Tool Registry

verify · lookup · process · escalate

Holds the 4-5 tools the specialist agent can call. Each tool has a clear description and JSON schema. Tool count stays low to keep routing accurate.

Configuration

tools: [verify_customer, lookup_order, process_refund, escalate_to_human]. tool_choice: auto. Each description is 4 lines: what it does, when to use, edge cases, ordering with peers.

Concept: tool-calling

PreToolUse Hook

policy gate · deterministic

Sits between Claude's tool_use request and actual tool execution. Enforces refund caps, escalation triggers, and time-of-day limits. Exits 2 (deny) on violation.

Configuration

Hook fires before process_refund. Reads tool_input.amount, compares to policy.refund_cap. Exit 2 with stderr message routes Claude to retry with adjusted args or escalate.

Concept: hooks

Case-Facts Block

pinned customer state

Pinned at the top of every system-prompt iteration. Holds customer_id, order_id, refund_amount, policy_limit. Survives summarization. Re-read every turn.

Configuration

system: f"CASE_FACTS: {customer_id} · {order_id} · ${amount} · cap=${cap}". Updated by hooks after state-changing tool calls.

Concept: case-facts-block

Specialist Agent

the agentic loop

Runs the messages.create() loop. Reads stop_reason after every response: end_turn → exit, tool_use → execute + append result + continue, max_tokens → save partial.

Configuration

while True: resp = client.messages.create(...). if resp.stop_reason == "end_turn": break. if resp.stop_reason == "tool_use": execute_tools(...).

Concept: agentic-loops

Escalation Queue

structured handoff

Receives blocked calls from PreToolUse hook + low-confidence + sentiment-triggered escalations. Each entry has a structured context block (cus_id, reason, partial_status, recommended_action).

Configuration

queue.push({customer_id, intent, partial_state, blocked_tool, reason, recommended_action}). Human triages in ~10s vs 5min for transcript review.

Concept: escalation
04 · One concrete run

Data flow

05 · Build it

Eight steps to production

01

Define the system prompt with case-facts

Anchor the agent's role + constraints + the case-facts block at the very top of the system prompt. The case-facts block is the immutable truth about this customer + order + policy.

Define the system prompt with case-facts
from anthropic import Anthropic

client = Anthropic()

def build_system_prompt(case_facts: dict) -> str:
    return f"""You are a customer support agent for ACME.

CASE_FACTS (immutable; re-read every turn):
- customer_id: {case_facts['customer_id']}
- order_id: {case_facts['order_id']}
- refund_amount: ${case_facts['amount']}
- policy_cap: ${case_facts['cap']}

Constraints:
- Verify customer before ANY account-modifying call.
- Refunds above policy_cap MUST escalate (a hook enforces this).
- Branch on stop_reason. Never on response text."""
↪ Concept: case-facts-block
02

Define the 4-tool registry

Keep the tool count at 4-5. Each tool description follows the same 4-line structure: what / when / edge cases / ordering. Descriptions are the primary lever for correct routing: when routing degrades, fix the descriptions, not the model.

Define the 4-tool registry
tools = [
    {
        "name": "verify_customer",
        "description": (
            "Look up a customer by customer_id and confirm they are active.\n"
            "Use this BEFORE any other tool that mentions the customer.\n"
            "Edge cases: returns 'not_found' if customer_id is missing.\n"
            "Always run before lookup_order or process_refund."
        ),
        "input_schema": {"type": "object", "properties": {
            "customer_id": {"type": "string"}
        }, "required": ["customer_id"]},
    },
    # ... lookup_order, process_refund, escalate_to_human
]
↪ Concept: tool-calling
03

Wire the PreToolUse policy hook

The hook is deterministic; prompt-only enforcement leaks ~3% of cases past the cap. Exit 2 to deny, exit 0 to allow; the harness reads stderr and routes Claude back with feedback.

Wire the PreToolUse policy hook
# .claude/hooks/refund_policy.py
import sys, json, os

POLICY_CAP = float(os.environ.get("REFUND_CAP", "500"))

def main():
    payload = json.loads(sys.stdin.read())
    if payload["tool_name"] != "process_refund":
        sys.exit(0)  # not our concern, allow
    amount = payload["tool_input"].get("amount", 0)
    if amount > POLICY_CAP:
        print(f"refund ${amount} exceeds cap ${POLICY_CAP}, escalate", file=sys.stderr)
        sys.exit(2)  # DENY
    sys.exit(0)  # allow

if __name__ == "__main__":
    main()
↪ Concept: hooks
04

Run the agent loop on stop_reason

Branch on the structured field, never the response text. end_turn → exit. tool_use → execute, append, continue. max_tokens → save partial. stop_sequence → custom termination.

Run the agent loop on stop_reason
def run_agent_loop(user_msg: str, case_facts: dict, max_iter: int = 15):
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_iter):
        resp = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            system=build_system_prompt(case_facts),
            tools=tools,
            messages=messages,
        )
        if resp.stop_reason == "end_turn":
            return extract_text(resp)
        if resp.stop_reason == "tool_use":
            tool_uses = [b for b in resp.content if b.type == "tool_use"]
            results = [execute_tool(t) for t in tool_uses]
            messages.append({"role": "assistant", "content": resp.content})
            messages.append({"role": "user", "content": results})
            continue
        if resp.stop_reason == "max_tokens":
            return {"status": "partial", "text": extract_text(resp)}
    return {"status": "iteration_cap"}
↪ Concept: stop-reason
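The loop above leans on two helpers, `execute_tool` and `extract_text`, that are easy to get wrong. A minimal sketch — the handler registry and its return payloads are illustrative assumptions, not part of the SDK; the key contract is that each tool_result echoes the tool_use_id so Claude can pair request and response:

```python
import json

# Hypothetical handlers; real implementations would call the CRM / payment API.
TOOL_HANDLERS = {
    "verify_customer": lambda args: {"status": "active", "customer_id": args["customer_id"]},
    "lookup_order":    lambda args: {"order_id": args.get("order_id"), "state": "delivered"},
}

def execute_tool(block) -> dict:
    """Dispatch one tool_use block and wrap the output as a tool_result block."""
    handler = TOOL_HANDLERS.get(block.name)
    if handler is None:
        # Unknown tool: report the error back to Claude rather than crashing the loop.
        return {"type": "tool_result", "tool_use_id": block.id,
                "content": f"unknown tool: {block.name}", "is_error": True}
    try:
        return {"type": "tool_result", "tool_use_id": block.id,
                "content": json.dumps(handler(block.input))}
    except Exception as exc:
        # Tool failures also flow back as tool_results so the agent can recover.
        return {"type": "tool_result", "tool_use_id": block.id,
                "content": str(exc), "is_error": True}

def extract_text(resp) -> str:
    """Concatenate the text blocks of a response, skipping tool_use blocks."""
    return "".join(b.text for b in resp.content if b.type == "text")
```

Returning errors as `is_error` tool_results (instead of raising) keeps the loop alive and lets Claude retry or escalate.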
05

Add the structured escalation block

When the hook denies or the agent reaches stop_reason with low confidence, push a structured block, not the transcript, to the human queue. Triage time drops from 5 minutes to 10 seconds.

Add the structured escalation block
def escalate(case_facts: dict, reason: str, partial: dict) -> dict:
    return {
        "customer_id": case_facts["customer_id"],
        "order_id": case_facts["order_id"],
        "intent": partial.get("intent", "unknown"),
        "partial_status": partial.get("last_action"),
        "blocker": reason,
        "recommended_action": derive_recommendation(reason),
        "evidence": [partial.get("last_tool_result")],
    }
↪ Concept: escalation
06

Wire the sentiment + confidence gates

Two final guards on the response: sentiment monitor (orthogonal to policy, distress alone never triggers a refund) + confidence threshold. Either gate can route to escalation.

Wire the sentiment + confidence gates
def post_response_gates(response: str, agent_confidence: float):
    sentiment = sentiment_score(response)
    if sentiment == "distressed" and agent_confidence < 0.7:
        return {"action": "escalate", "reason": "low_confidence_distressed"}
    if agent_confidence < 0.5:
        return {"action": "escalate", "reason": "low_confidence"}
    return {"action": "send"}
↪ Concept: escalation
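The gate above assumes a `sentiment_score` helper. In production this would be a trained classifier or a cheap model call; a keyword-based placeholder, purely illustrative (the marker list is an assumption, not from the guide):

```python
# Placeholder sentiment classifier — a real system would use a trained model
# or an LLM call; these keyword markers are illustrative only.
DISTRESS_MARKERS = ("furious", "unacceptable", "lawyer", "cancel everything", "worst")

def sentiment_score(text: str) -> str:
    """Return 'distressed' or 'neutral' for the post-response gate."""
    lowered = text.lower()
    if any(marker in lowered for marker in DISTRESS_MARKERS):
        return "distressed"
    return "neutral"
```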
07

Cache the system prompt for cost

The system prompt + tool definitions don't change between turns. Mark them with cache_control: ephemeral and pay roughly 90% less for those bytes on every turn.

Cache the system prompt for cost
resp = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=4096,
    system=[
        {
            "type": "text",
            "text": build_system_prompt(case_facts),
            "cache_control": {"type": "ephemeral"},
        },
    ],
    tools=tools,  # tools sit before system in the cache prefix, so the breakpoint covers them
    messages=messages,
)
↪ Concept: prompt-caching
08

Audit log every resolution

Every closed ticket writes a row: customer_id, agent_path, tool_calls, escalation_reason (if any), elapsed_ms, CSAT. This is your replay tool when production breaks at turn 18.

Audit log every resolution
from datetime import datetime, timezone  # timestamp for the audit row

def audit_log(case_facts: dict, agent_path: list, elapsed_ms: int, csat: int | None):
    db.audit.insert({
        "ts": datetime.now(timezone.utc),
        "customer_id": case_facts["customer_id"],
        "order_id": case_facts["order_id"],
        "tool_calls": [c["name"] for c in agent_path if c["type"] == "tool_use"],
        "stop_reasons": [c["stop_reason"] for c in agent_path],
        "elapsed_ms": elapsed_ms,
        "csat": csat,
    })
↪ Concept: evaluation
06 · Configuration decisions

The four decisions

Decision | Right answer | Wrong answer | Why
tool_choice | "auto" (default) | "any" or {type:"tool", name:"X"} | Customer requests are open-ended; let Claude pick. Force a tool only for mandatory first steps.
stop_reason handling | branch on the field; max_tokens = partial | parse response text for "done" | Text-shape parsing is the most-tested distractor; the structured field is authoritative.
Session state | case-facts block + threaded messages | progressive summarization of customer_id | Transactional values (IDs, amounts) must be pinned, never paraphrased.
Cache TTL | ephemeral on system + tools | no caching | System prompt + tool defs are stable across turns; ~90% cost reduction on those bytes.
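The tool_choice decision above, as the literal values passed to the Messages API (field shapes per the API; the surrounding `messages.create` call is elided):

```python
# Default: open-ended customer requests — let Claude pick from the registry.
open_ended = {"type": "auto"}

# Mandatory first step only: force identity verification before anything else.
forced_first = {"type": "tool", "name": "verify_customer"}
```

Pass `forced_first` only on the opening turn, then revert to `open_ended` so routing stays free.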
07 · Failure modes

Where it breaks

Five failure pairs; each one maps to an exam question. The fix is always architectural: deterministic gates, structured fields, pinned state.

Loop termination

Code checks response.text.includes('done') to decide termination.

AP-12
✅ Fix

Branch on stop_reason == 'end_turn'. Text + tool_use can co-exist in one response.

Refund cap enforcement

System prompt says 'never refund more than $500'. Production sees 3% violations.

AP-18
✅ Fix

PreToolUse hook checks tool_input.amount <= 500. Deterministic gate.

Escalation triggers

Customer raises voice → agent escalates regardless of policy.

AP-22
✅ Fix

Sentiment is orthogonal. Trigger only on policy exception, ambiguity, or explicit request.

Customer state retention

By turn 8, agent has summarized cus_42 → 'a customer wanting a refund'.

AP-35
✅ Fix

Pin CASE_FACTS block in system prompt. Re-read every turn. Never paraphrased.

Identity verification skip

Agent calls lookup_order first; pulls wrong record 12% of the time.

AP-08
✅ Fix

Programmatic prerequisite: tool descriptions order verify_customer first ("always run before lookup_order or process_refund"); a PreToolUse hook can also enforce it deterministically.

08 · Budget

Cost & latency

Per-conversation tokens
~3,200 input · 1,400 output

8 avg turns × (system + tools + accumulating history). Cache hits ~70% on system+tools.

Per-conversation cost
~$0.018 (Sonnet 4.5)

Pre-cache: ~$0.04. With ephemeral cache on system+tools: ~$0.018. ~55% reduction.

p95 latency
8.2 seconds

Streaming first token in ~150ms. Tool round-trips 1.5-2s each. 4 tool calls × 2s + 800ms compose.

Cache hit rate
≥ 70% on system+tools

5-min TTL on ephemeral. Continuous traffic keeps cache warm.
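The budget numbers above can be sanity-checked with a small estimator. Prices and the cache-read discount are parameters, not claims — plug in the current rate card:

```python
def conversation_cost(input_tokens: int, output_tokens: int,
                      cached_fraction: float = 0.0,
                      price_in_per_mtok: float = 3.0,
                      price_out_per_mtok: float = 15.0,
                      cache_read_discount: float = 0.9) -> float:
    """Rough per-conversation cost in dollars.

    cached_fraction: share of input tokens served from the prompt cache,
    billed at (1 - cache_read_discount) of the normal input rate.
    Prices default to assumed $/MTok rates; substitute the current ones.
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    cost_in = (fresh + cached * (1 - cache_read_discount)) * price_in_per_mtok / 1_000_000
    cost_out = output_tokens * price_out_per_mtok / 1_000_000
    return cost_in + cost_out

# No cache vs ~70% cache hits on system + tools:
baseline = conversation_cost(3200, 1400)
cached = conversation_cost(3200, 1400, cached_fraction=0.7)
```

Note that output tokens are unaffected by caching, so the realized saving depends on the input/output split of your traffic.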

09 · Ship gates

Ship checklist

Two passes. Build-time gates verify the code; run-time gates verify the system in production.

Build-time

  1. System prompt anatomy: role · constraints · case-facts · escalation trigger
  2. Case-facts block pinned + re-read every turn · concept: case-facts-block
  3. 4-tool registry with structured 4-line descriptions · concept: tool-calling
  4. PreToolUse hook for refund cap (deterministic) · concept: hooks
  5. Loop branches on stop_reason, never on text · concept: stop-reason
  6. Identity verification prerequisite enforced via tool ordering
  7. Structured escalation block (not transcript) on handoff · concept: escalation
  8. Sentiment + confidence post-response gates
  9. Prompt caching on system + tools (cache_control: ephemeral) · concept: prompt-caching
  10. Audit log written per closed ticket
  11. Conversation history bounded by case-facts windowing
  12. Iteration cap (max_iter=15) as a safety net, not the primary control

Run-time

  • Unit tests on every tool's input validation
  • Integration test: end-to-end refund flow against test CRM
  • Hook test: fire mock tool_input with amount > cap, verify exit 2
  • Sentiment classifier evaluated against ≥ 200 labeled tickets
  • Latency monitor: alert if p95 > 12s for ≥ 5 min
  • Cost monitor: alert if per-conversation cost > $0.03
  • Escalation queue dashboard with SLA breach alerts
  • Runbook: top-5 escalation reasons + recommended human actions
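The hook test in the checklist above can be written against the policy logic directly. A sketch that mirrors the hook's exit-code contract (0 = allow, 2 = deny) as a pure function, rather than shelling out to the script file — the function name is an assumption for the test's sake:

```python
POLICY_CAP = 500.0

def refund_policy_exit_code(payload: dict) -> tuple[int, str]:
    """Mirror of the PreToolUse hook body: returns (exit_code, stderr_message)."""
    if payload.get("tool_name") != "process_refund":
        return 0, ""  # not our concern, allow
    amount = payload.get("tool_input", {}).get("amount", 0)
    if amount > POLICY_CAP:
        return 2, f"refund ${amount} exceeds cap ${POLICY_CAP}, escalate"
    return 0, ""

# Mock tool_input above the cap must deny (exit 2); at or below must allow.
over = refund_policy_exit_code({"tool_name": "process_refund", "tool_input": {"amount": 750}})
under = refund_policy_exit_code({"tool_name": "process_refund", "tool_input": {"amount": 120}})
```

Keeping the policy check in a pure function and having the script wrap it in `sys.exit` makes the deterministic gate unit-testable without subprocess plumbing.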
10 · Question patterns

Five exam-pattern questions

Your refund agent uses prompt-only enforcement: 'never refund over $500'. Production logs show 3% of refunds violate the policy. What's the architectural fix?
Replace prompt-only enforcement with a PreToolUse hook that validates tool_input.amount <= 500. The hook exits 2 (deny) on violation, providing deterministic policy enforcement. Prompt-only is probabilistic and leaks ~3-5% in production. Tagged to AP-18 in the anti-pattern catalog.
Your agent loop terminates after 7 turns by checking `response.text.includes('done')`. The customer says they're stuck. What's wrong?
Text-parse termination is unreliable. Claude can return [text, tool_use] in the same response, where text is preamble and tool_use is the real next step. Branch on stop_reason == "end_turn". The text "I'm done" can appear while stop_reason is still tool_use. Tagged to AP-12.
By turn 8, the agent has lost the customer's order ID. What's the architectural fix?
Pin a CASE_FACTS block at the top of the system prompt with customer_id, order_id, amount, policy_cap. Re-read every turn. Transactional values (IDs, amounts) must never be summarized, only reasoning chains can be paraphrased. Tagged to AP-35.
An angry customer asks for a refund that exceeds the policy. Your agent escalates. Why is this wrong?
Sentiment is orthogonal to policy. Distress alone is not an escalation trigger. The hook should evaluate the policy violation independently. If amount > cap, hook denies (escalation by policy, not sentiment). If amount ≤ cap, agent processes regardless of customer mood. Tagged to AP-22.
You add a 6th tool to the registry and the agent's tool-selection accuracy drops 8%. What's happening?
Tool count past 4-5 degrades routing. Each new tool adds ambiguity; descriptions overlap; the model alternates. Either (a) consolidate tools (merge lookup_order_details + lookup_order_status → lookup_order), or (b) move rare-use tools to a sub-agent. The Anthropic guide caps the optimum at ~5 tools per agent.
11 · FAQ

Frequently asked

Why a separate hook for refund cap instead of putting it in the system prompt?
Prompt-only enforcement is probabilistic: Claude follows the rule ~95-97% of the time, leaving 3-5% leakage. Hooks are deterministic: they read structured tool_input fields and exit 2 to deny. For policy-bearing limits (refunds, escalation thresholds), determinism is required. Use prompts for tone and behavior; use hooks for hard policy.
How many tools should the registry have?
4-5 is the optimum per Anthropic's customer-support guide. Fewer means the agent has to compose multiple low-level calls into one task. More degrades selection accuracy: overlapping descriptions cause the model to alternate. If you need >5, split into specialist agents (e.g., refund agent + tech agent + account agent) and route between them with a triage classifier.
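Such a triage classifier can be sketched with keyword rules standing in for a real model call — the specialist names and marker lists here are illustrative assumptions:

```python
# Illustrative triage router for splitting past ~5 tools into specialist agents.
# The keyword rules stand in for a trained classifier or an LLM triage call.
SPECIALISTS = {
    "refund":  ("refund", "money back", "charge"),
    "tech":    ("error", "crash", "bug", "not working"),
    "account": ("password", "login", "email change"),
}

def triage(message: str, default: str = "account") -> str:
    """Route an inbound message to a specialist agent by intent keywords."""
    lowered = message.lower()
    for specialist, markers in SPECIALISTS.items():
        if any(m in lowered for m in markers):
            return specialist
    return default
```

The router's output selects which specialist's system prompt and tool registry to load; the case-facts block travels with the handoff.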
Should the system prompt include few-shot examples of past conversations?
Sparingly. 1-2 high-quality examples can lock tone and tool-use pattern. More than 3 starts crowding the cache and dilutes attention. Better leverage: pin a clear tool registry with detailed descriptions + a sharp CASE_FACTS block. Examples are for edge-case behavior; descriptions are for routing.
What's the difference between sentiment escalation and policy escalation?
Policy escalation: the agent hits a structural condition that requires a human (refund > cap, identity unverifiable, ambiguous request). Triggered by hooks or explicit conditions. Sentiment escalation: the customer shows distress. Sentiment is *orthogonal*: distress alone never warrants escalation. Combine them only as a tie-breaker (low confidence + distress = escalate).
How do I handle a customer who switches topics mid-conversation?
Re-route through triage. If the new intent maps to a different specialist (e.g., refund → tech), don't try to handle it inline. Push the original case-facts to the new specialist's task string + spawn (or context-switch) a new agent. Trying to handle multi-intent in one specialist agent erodes accuracy and pollutes the case-facts block.
What's a good escalation queue SLA?
5-10 minutes for customer-blocking flows; 2-4 hours for batch flows (overnight refund reconciliation). Mark each escalation with intent + urgency from the triage stage; route customer-blocking ones to the live queue, batch ones to a daily review. The structured block format is the same; only the SLA differs.
Should I cache the message history across turns?
No. The message list grows monotonically, so caching it has marginal value (each turn changes the cache key). Cache the system prompt + tool definitions instead: those are stable across turns and account for 60-80% of token cost on long conversations. The ~5-min TTL on ephemeral cache is sufficient for live chat traffic.
When should I use a sub-agent instead of expanding this one?
When (a) the new flow is parallelizable (e.g., research a customer's order history while another agent handles billing), (b) the new flow needs different tool scope (read-only research vs write-capable refund), or (c) the new flow generates verbose intermediate work that pollutes the main case-facts block. Use sub-agents for isolation; use this agent for inline reasoning.
How do I prevent infinite loops?
stop_reason is the primary control: branch on it, never on text. The iteration cap (max_iter=15) is a safety net, not the primary control. If you hit the cap regularly, the bug is upstream: a missing tool_result append, ambiguous tool descriptions, or two tools alternating. Raising the cap masks the real issue.
What should the audit log capture?
Per closed ticket: customer_id, order_id, the full tool_call sequence (just names + timestamps), stop_reason per turn, elapsed_ms, escalation_reason (if any), and csat (if surveyed). Skip the full transcript; the structured trace is enough to replay any failure. Store for 90 days minimum.
P3.1 · D1 + D2 + D5 · Process

Customer Support Resolution Agent, complete.

You've covered the full breakdown for this scenario: problem framing, architecture, component detail, a concrete run, the eight build steps, configuration decisions, failure modes, budget, ship gates, exam patterns, and FAQ. One scenario down on the path to CCA-F.
