What is late-stage summarization dollar-amount drift?

When an agent compresses long conversations mid-session, exact numerical values (prices, balances, totals) are lossy. The model paraphrases figures rather than preserving them verbatim, so downstream turns receive corrupted ground truth - e.g., $14,837.92 collapses into 'around $15,000' and is then cited with false confidence in later turns.

What is the root cause of dollar-amount drift after summarization?

Compressed summaries treat numeric tokens as ordinary semantic content. Precise figures become approximations because the summarizer optimizes for prose density, not arithmetic fidelity. The model then anchors on its own approximation as fact in subsequent turns.

Fix 1: How does structured memory anchoring prevent dollar-amount drift?

Extract all monetary values into a typed key-value store - a tool call result or system-prompt ledger - before summarization runs. Inject that ledger verbatim into every subsequent context window so figures are never derived from prose summaries. The summary is allowed to drift; the ledger never does.

Fix 2: How does summarization scope fencing preserve exact figures?

Instruct the summarizer to pass numeric entities through unsummarized using a sentinel pattern such as $14,837.92 . Prose stays compressed while figures stay exact. This is cheaper than ledger maintenance for one-off conversations but doesn't scale to multi-session memory.

What CCA-F exam signal does dollar-amount drift map to?

This is a D5 (Context + Reliability) signal. Exam questions test whether candidates distinguish semantic compression loss from retrieval failure. The correct fix is always re-injection of structured state - not longer context windows, not re-prompting the summary, not switching models.

CCA-F Reference · 20 Anti-Patterns · All 5 Domains

20 anti-patterns to avoid on the CCA-F exam.

CCA-F exam distractors are almost always anti-patterns dressed up to look correct. If you can spot the anti-pattern, you can eliminate 2-3 wrong answers before reading them in detail. 8 are critical (drop you below the 720 pass mark if missed), 10 are common, and 2 are edge cases.

Free referenceAll 5 domainsDistractor playbook

← Back to Exam Guide Test yourself: free 10Q diagnostic →

Domain 1 · 27% of CCA-F · 5 anti-patterns

Agentic Architectures

AP-LOOP-01

Critical

Parsing natural language for loop termination

Pattern-matching on phrases like 'I'm done' or 'task complete' to decide when an agentic loop should exit. The model's narration is descriptive, not authoritative - it'll drift, paraphrase, or hallucinate the exit phrase under load.

Check stop_reason: 'tool_use' vs 'end_turn'

The API's stop_reason field is the authoritative loop-termination signal. 'tool_use' means continue (run the requested tool); 'end_turn' means the model has signaled completion. Read the structured field; don't infer from prose.

Deep-dive: /concepts/agentic-loops →

AP-LOOP-02

Critical

Arbitrary iteration caps as the primary stopping rule

Setting max_iterations=10 as the main exit condition. Caps scale linearly with the bug - legitimate long tasks (multi-file refactors, deep research) get prematurely cut, and broken loops still burn budget up to the cap.

Let stop_reason drive the loop, use a cap as a safety net

stop_reason is the primary terminator. The cap is a guard against pathological loops, not the design. If you're hitting the cap regularly, your prompt or tool surface is the problem, not the cap.

Deep-dive: /concepts/agentic-loops →

AP-MULTI-01

Common

Subagents inherit the full parent conversation

Forwarding the entire parent context to every subagent. Burns tokens, contaminates the subagent's reasoning with irrelevant turns, and breaks context isolation guarantees that make hub-and-spoke architectures predictable.

Pass only the task brief; subagent runs in a clean context

Subagents get the explicit task description plus any narrow inputs they need (case facts, tool results). Context isolation is the whole reason to use a subagent - preserve it.

Deep-dive: /concepts/subagents →

AP-MULTI-02

Common

Sentiment-based escalation routing

Escalating to a human reviewer when the user 'sounds frustrated.' Sentiment ≠ complexity. Frustrated users with simple requests waste reviewer time; calm users with policy edge cases get missed.

Escalate on task complexity, policy gaps, and confidence signals

Structured criteria: policy gap detected, low retrieval coverage, customer explicitly asks for human, tool failure on retry. Sentiment is at best a secondary signal, never the primary trigger.

Deep-dive: /concepts/escalation →

AP-MULTI-03

Common

Self-reported confidence as an escalation signal

Asking the model 'how confident are you?' and routing on the answer. Self-reported confidence is uncalibrated and reflexively high; the model rates its own work charitably.

Programmatic checks: schema validation, retrieval-coverage, retry counts

Trigger escalation on structured signals - missing required JSON fields, retrieval coverage below threshold, ≥2 tool retries, policy-block hooks fired. These are external to the model's self-perception.

Deep-dive: /concepts/escalation →

Domain 2 · 18% of CCA-F · 4 anti-patterns

Tool Design + Integration

AP-TOOL-01

Critical

18+ tools registered on a single agent

Long tool lists degrade selection accuracy. The model wastes context evaluating irrelevant tools and confuses similarly-named ones (e.g. search_docs vs search_web vs search_knowledge).

4-5 tools per agent; spawn subagents for additional surfaces

Keep the tool surface narrow and use subagents (via the Task tool) for orthogonal capabilities. Each subagent has its own 4-5 tools scoped to its job.

Deep-dive: /concepts/tool-calling →

AP-TOOL-02

Critical

Generic error messages ('Operation failed')

Returning {error: 'failed'} from a tool. The model has no signal for whether to retry, switch tools, escalate, or fall back - it'll guess, usually wrong.

Structured errors: isError, errorCategory, isRetryable, context

Errors return a fixed schema: {isError: true, errorCategory: 'rate_limit' | 'auth' | 'not_found' | …, isRetryable: bool, retryAfterMs?: number, context: '…'}. The model can route on the category and respect the retry signal.

Deep-dive: /concepts/tool-calling →

AP-TOOL-03

Common

Silently returning empty results as success

A search tool that returns [] for both 'no matches found' and 'auth token expired.' The model believes the search worked and proceeds to wrong conclusions on missing data.

Distinguish 'access failed' from 'genuinely empty'

Empty-results-on-success is fine for actual zero matches. Auth failures, partial outages, and quota exhaustion must surface as errors with the appropriate errorCategory so the model can react.

Deep-dive: /concepts/tool-calling →

AP-MCP-01

Edge case

Mixing project and user MCP configurations

Editing .mcp.json AND ~/.claude.json with the same servers. Resolution order surprises teammates - a server enabled in the user config silently overrides the project config and ships unintended capabilities to CI.

Project-scoped MCP in .mcp.json (committed); user config for personal tooling

Production agents read .mcp.json which lives in the repo and is reviewable. The user-scoped ~/.claude.json is for personal tooling that should NEVER fire in CI or for other developers.

Deep-dive: /concepts/mcp →

Domain 3 · 20% of CCA-F · 4 anti-patterns

Agent Operations

AP-CC-01

Critical

Prompt-based enforcement for business rules

Telling Claude 'never refund over $500' in the system prompt. Probabilistic enforcement - under load or adversarial inputs, the model will violate the rule. Compliance teams notice.

Programmatic hooks (PreToolUse) for deterministic enforcement

Wrap the refund tool with a PreToolUse hook that returns an error for refunds > $500. Deterministic. Auditable. The model never sees the high-value path.

Deep-dive: /concepts/hooks →

AP-CC-02

Common

Ignoring the CLAUDE.md hierarchy

Hardcoding project rules in user-scoped ~/.claude.json or in ad-hoc system prompts. Other team members don't inherit them; CI runs with a different rule set than developers.

CLAUDE.md committed at repo root; .claude/rules/ for topic-specific

Hierarchy: user → project → directory, with .claude/rules/ for topic-specific imports via @import. Everything is in the repo and reviewable.

Deep-dive: /concepts/claude-md-hierarchy →

AP-CC-03

Common

Slash commands with allowed-tools: '*'

A custom slash command that grants all tools by default. PR descriptions containing prompt injections trigger Bash(rm -rf .) and corrupt the workspace.

Explicit allowed-tools per command; tightest possible scope

Each command in .claude/commands/<name>.md declares its allowed-tools as a narrow list. Add Bash sparingly and only when the command genuinely needs to shell out.

Deep-dive: /concepts/slash-commands →

AP-CC-04

Common

Same-session self-review of generated code

Asking the same Claude session that wrote the code to review it. The session retains its own reasoning bias and rationalizes the original choices instead of flagging them.

Separate session for review; fork_session or fresh process in CI

Generator and reviewer run in isolated sessions. The reviewer sees only the final code, not the reasoning trail that produced it.

Deep-dive: /concepts/session-state →

Domain 4 · 20% of CCA-F · 4 anti-patterns

Prompt Engineering

AP-PROMPT-01

Critical

Vague instructions ('be careful', 'use good judgment')

Soft instructions leave compliance to model intuition. Under load, on edge cases, or with adversarial inputs, 'use good judgment' resolves to whatever pattern dominated training.

Explicit criteria with concrete decision rules

Replace 'be careful with money' with 'For any amount > $500 OR currency != USD, return REQUIRES_REVIEW with the structured fields {amount, currency, reason}.' Testable, deterministic, debuggable.

Deep-dive: /concepts/prompt-engineering-techniques →

AP-PROMPT-02

Common

Few-shot examples omitted on ambiguous tasks

Skipping examples and relying on the instruction text alone for tasks like classification or extraction. Ambiguity in the instruction compounds with model variability.

2-4 few-shot examples for any ambiguous classification or extraction

Examples calibrate the model far better than additional instruction prose. 2 typical + 1 edge-case + 1 negative example covers most decision boundaries.

Deep-dive: /concepts/prompt-engineering-techniques →

AP-PROMPT-03

Critical

JSON output with no schema validation + retry loop

Asking the model 'return JSON' without a schema and parsing the first response. Malformed JSON, missing required fields, or wrong types crash downstream - silently corrupts data at scale.

tool_use with declared JSON schema + validation-retry loop

Force structured output via tool_use with a JSON schema. Validate every response; on failure append the specific schema error to the conversation and retry. 2-3 retries catches >99% of edge cases.

Deep-dive: /concepts/tool-choice →

AP-PROMPT-04

Edge case

tool_choice = 'auto' when only one tool is correct

Using auto when the task genuinely has one valid tool path. The model occasionally narrates an answer in text instead of calling the tool, breaking downstream consumers.

tool_choice = 'any' or forced specific tool when the call is required

If your pipeline requires the tool result, force the call. 'any' if any tool will do, or the explicit tool name if exactly one is correct.

Deep-dive: /concepts/tool-choice →

Domain 5 · 15% of CCA-F · 3 anti-patterns

Context + Reliability

AP-CTX-01

Critical

Progressive summarization losing case facts

Compressing earlier turns aggressively to save tokens. Customer-state facts (order numbers, plan tier, prior decisions) get summarized into prose and then lost - the model regenerates 'plausible' values.

Pinned case-facts block; only summarize narration

Case facts live in an immutable block at the top of the context, never summarized. Conversational narration can be compressed freely.

Deep-dive: /concepts/case-facts-block →

AP-CTX-02

Common

Aggregate accuracy metrics masking per-document failures

Reporting 'extraction accuracy = 94%' across all document types. A failure mode on invoices (which are 10% of volume but 100% of dollar value) gets hidden by the average.

Stratified metrics: accuracy per document type, per field, per confidence band

Aggregates hide failure modes. Always report broken down by the dimension that drives business impact - document type, field, language, customer tier.

Deep-dive: /concepts/context-window →

AP-CTX-03

Common

Context degradation in long-running sessions

Letting a single session run for hours of conversation without checkpointing. Early-turn details fade, later-turn contradictions compound, and the model loses thread on multi-step plans.

Periodic /compact, scratchpad files, crash-recovery manifests

Compact regularly. For long-running agentic work, write scratchpad files that survive session reset and include a manifest of completed steps so recovery is deterministic.

Deep-dive: /concepts/context-window →

Spotlight · D5 Context + Reliability

Late-Stage Summarization Dollar-Amount Drift: Causes and Fixes

One of the most common D5 traps: an agent reports wrong dollar amounts from earlier turns after a mid-session summarization. The diagnosis is always the same - semantic compression collapsed precise figures into prose approximations. The fix is structural, not prompt-level.

What is late-stage summarization dollar-amount drift?: When an agent compresses long conversations mid-session, exact numerical values (prices, balances, totals) are lossy. The model paraphrases figures rather than preserving them verbatim, so downstream turns receive corrupted ground truth - e.g., $14,837.92 collapses into 'around $15,000' and is then cited with false confidence in later turns.
What is the root cause of dollar-amount drift after summarization?: Compressed summaries treat numeric tokens as ordinary semantic content. Precise figures become approximations because the summarizer optimizes for prose density, not arithmetic fidelity. The model then anchors on its own approximation as fact in subsequent turns.
Fix 1: How does structured memory anchoring prevent dollar-amount drift?: Extract all monetary values into a typed key-value store - a tool call result or system-prompt ledger - before summarization runs. Inject that ledger verbatim into every subsequent context window so figures are never derived from prose summaries. The summary is allowed to drift; the ledger never does.
Fix 2: How does summarization scope fencing preserve exact figures?: Instruct the summarizer to pass numeric entities through unsummarized using a sentinel pattern such as <preserve>$14,837.92</preserve>. Prose stays compressed while figures stay exact. This is cheaper than ledger maintenance for one-off conversations but doesn't scale to multi-session memory.
What CCA-F exam signal does dollar-amount drift map to?: This is a D5 (Context + Reliability) signal. Exam questions test whether candidates distinguish semantic compression loss from retrieval failure. The correct fix is always re-injection of structured state - not longer context windows, not re-prompting the summary, not switching models.

Last updated May 17, 2026

Ready to test yourself?

Take the free 10-question diagnostic. Two questions per domain, no signup, about 10 minutes. Most distractors are one of the 20 patterns above - see how many you can spot.

Start the free diagnostic ← Back to Exam Guide

20 anti-patterns to avoid on the CCA-F exam.

Parsing natural language for loop termination

Check stop_reason: 'tool_use' vs 'end_turn'

Arbitrary iteration caps as the primary stopping rule

Let stop_reason drive the loop, use a cap as a safety net

Subagents inherit the full parent conversation

Pass only the task brief; subagent runs in a clean context

Sentiment-based escalation routing

Escalate on task complexity, policy gaps, and confidence signals

Self-reported confidence as an escalation signal

Programmatic checks: schema validation, retrieval-coverage, retry counts

18+ tools registered on a single agent

4-5 tools per agent; spawn subagents for additional surfaces

Generic error messages ('Operation failed')

Structured errors: isError, errorCategory, isRetryable, context

Silently returning empty results as success

Distinguish 'access failed' from 'genuinely empty'

Mixing project and user MCP configurations

Project-scoped MCP in .mcp.json (committed); user config for personal tooling

Prompt-based enforcement for business rules

Programmatic hooks (PreToolUse) for deterministic enforcement

Ignoring the CLAUDE.md hierarchy

CLAUDE.md committed at repo root; .claude/rules/ for topic-specific

Slash commands with allowed-tools: '*'

Explicit allowed-tools per command; tightest possible scope

Same-session self-review of generated code

Separate session for review; fork_session or fresh process in CI

Vague instructions ('be careful', 'use good judgment')

Explicit criteria with concrete decision rules

Few-shot examples omitted on ambiguous tasks

2-4 few-shot examples for any ambiguous classification or extraction

JSON output with no schema validation + retry loop

tool_use with declared JSON schema + validation-retry loop

tool_choice = 'auto' when only one tool is correct

tool_choice = 'any' or forced specific tool when the call is required

Progressive summarization losing case facts

Pinned case-facts block; only summarize narration

Aggregate accuracy metrics masking per-document failures

Stratified metrics: accuracy per document type, per field, per confidence band

Context degradation in long-running sessions

Periodic /compact, scratchpad files, crash-recovery manifests

Late-Stage Summarization Dollar-Amount Drift: Causes and Fixes

Ready to test yourself?

Related