Attention Engineering (D4, 20% of CCA-F) - Claude Architect Concept

01 · Summary

TLDR

Attention engineering is the discipline of placing critical context where the model attends most strongly: high in the prompt, in the system message, or in repeated facts blocks, with the current question at the end. (Community pattern.)

Strong-attention zones: 3

Exam domain: D4

Coverage tier: B

Trap: buried context

Pattern: place high

02 · Definition

What it is

Attention engineering is the discipline of placing critical information in high-attention zones of your prompt to compensate for the Lost-in-the-Middle effect. LLMs show a U-shaped curve in how well they use information across their context window. The beginning (the system prompt and roughly the first 10%) and the end (roughly the last 5%) receive disproportionately high attention. The middle (very roughly the 40-80% band) is effectively "lost": overlooked, underweighted, or forgotten. This is not a bug in one model; it is a property of how trained transformers attend.

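The phenomenon was documented empirically by Liu et al. (2023, "Lost in the Middle"). On multi-document question answering and key-value retrieval, models were most accurate when the relevant information sat at the start of the context, somewhat less accurate when it sat at the end, and least accurate when it sat in the middle. As a rough illustration, accuracy on a fact can fall from 90%+ at position 1 to 40-50% at position 50, then recover to 85%+ by position 100; the exact figures vary by model, task, and context length. If you have 200 facts and the key one is at position 50, the model may miss it entirely. This cascades into production failures: a refund amount missed in a contract paragraph, a critical API constraint overlooked in a tool description.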

Three architectural patterns counter the effect. System-prompt pinning: immutable instructions live in the system prompt, which is always position 0 and sits in a high-attention zone. Persistent case facts: transactional details live in a CASE_FACTS block at the top of the user message. Recency: place the current question or request at the end. The resulting layout is [CASE_FACTS] ... [CONTEXT] ... [CURRENT_QUESTION], with nothing critical buried in the middle.

The most-tested anti-pattern is treating this as a linguistic problem ("write important facts in bold") instead of a structural one. Bolding does not fix it: emphasis markers are just more tokens, and the fact stays exactly where it was in the sequence. The fix is position in the prompt sequence. The same effect explains why long agentic loops degrade: as later turns pile up, early turns are buried under the growing history and become effectively "invisible". Windowing mitigates this.

03 · Mechanics

How it works

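Attention scores are query-key dot products computed between every pair of tokens, and those scores decide how much of each token's value vector flows into the output. The dot-product formula itself does not treat the middle specially, so token 1 and token 100 start with equal opportunity. The attention patterns learned during training, however, appear to encode a recency bias: models learn to weight recent tokens (the end of the sequence) more heavily, because those are usually the most relevant (the current question, not the problem statement from paragraph 1). Combined with a similar pull toward the start of the sequence, this leaves middle positions with the lowest weight.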
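The mechanics above can be sketched in plain Python. This is a toy, not how Claude is implemented: `attention_weights` computes single-query scaled dot-product attention, and `u_shaped_bias` is a hypothetical additive bias standing in for the learned start/end preference. With identical token content, the unbiased weights are flat; adding the bias makes the middle position the least attended.

```python
import math


def softmax(scores):
    """Numerically stable softmax: turns raw scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, position_bias=None):
    """Scaled dot-product attention weights for one query over all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    if position_bias is not None:
        # Hypothetical stand-in for positional preferences learned in training.
        scores = [s + b for s, b in zip(scores, position_bias)]
    return softmax(scores)


def u_shaped_bias(n, strength=2.0):
    """Hypothetical bias: 0 at both ends, -strength at the exact middle."""
    if n < 2:
        return [0.0] * n
    mid = (n - 1) / 2
    return [-strength * (1 - abs(i - mid) / mid) for i in range(n)]


n = 9
query = [1.0, 0.0]
keys = [[1.0, 0.0] for _ in range(n)]  # identical content at every position

flat = attention_weights(query, keys)
biased = attention_weights(query, keys, u_shaped_bias(n))
print([round(w, 3) for w in flat])    # equal weights: no position is favored
print([round(w, 3) for w in biased])  # ends weighted most, middle least
```

The point of the toy is the split of responsibilities: the dot product only compares content, so any U-shape has to come from what the model learned, which is why prompt position is the lever you control.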

In practice: ask Claude a question that depends on a fact buried at position 50, and it is more likely to misread or miss that fact than the same fact at position 1 or 100. The model still attends to the fact, just with reduced weight, and this happens regardless of how "important" the fact feels.
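Attention weights are not exposed through the API, so the practical way to check position sensitivity on your own model and task is a needle-in-a-haystack probe: plant one fact at different depths and measure how often the answer comes back. A minimal sketch, assuming you supply `ask` (any function that sends a prompt and returns the model's text). `build_haystack`, `position_probe`, and `fake_ask` are hypothetical helpers; `fake_ask` only "sees" the first and last three context lines so the example runs offline.

```python
from typing import Callable, Dict, List


def build_haystack(needle: str, fillers: List[str], position: int) -> str:
    """Number the filler sentences and insert the needle at index `position`."""
    docs = fillers[:position] + [needle] + fillers[position:]
    return "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, start=1))


def position_probe(ask: Callable[[str], str], needle: str, question: str,
                   expected: str, fillers: List[str], positions: List[int],
                   trials: int = 5) -> Dict[int, float]:
    """Accuracy per needle position; `ask` sends a prompt and returns the reply text."""
    results = {}
    for pos in positions:
        context = build_haystack(needle, fillers, pos)
        prompt = f"{context}\n\nQuestion: {question}"  # question last, per recency
        hits = sum(expected in ask(prompt) for _ in range(trials))
        results[pos] = hits / trials
    return results


# Hypothetical wiring to the Anthropic Python SDK (needs an API key):
# from anthropic import Anthropic
# client = Anthropic()
# def ask(prompt):
#     resp = client.messages.create(model="claude-opus-4-5", max_tokens=100,
#                                   messages=[{"role": "user", "content": prompt}])
#     return resp.content[0].text


def fake_ask(prompt: str) -> str:
    """Offline stand-in model that only 'sees' the first and last three context lines."""
    lines = [line for line in prompt.splitlines() if line.startswith("[")]
    return " ".join(lines[:3] + lines[-3:])


fillers = [f"Filler fact number {i}." for i in range(20)]
needle = "The customer ID is 48213."
scores = position_probe(fake_ask, needle, "What is the customer ID?", "48213",
                        fillers, positions=[0, 10, 20], trials=1)
print(scores)  # {0: 1.0, 10: 0.0, 20: 1.0}
```

With a real model, use several trials per position and a range of depths; the shape of the resulting curve, not any single number, is what tells you whether your layout needs restructuring.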

To exploit the U-shape, place information strategically. The system prompt is always position 0, so use it for role, output format, and constraints. Structure user messages as [METADATA] [CASE_FACTS] [SUPPORTING_CONTEXT] [QUESTION]: the first sections get high attention from their position at the top, and the question gets high attention via recency. For tool descriptions, put the most important constraint first: "Do not refund >$500" belongs in the first line, not buried in paragraph 3.
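A minimal sketch of that layout. `build_user_message` is a hypothetical helper that emits the four sections in order. `REFUND_TOOL` follows the `name` / `description` / `input_schema` shape the Anthropic Messages API uses for tool definitions, but the tool itself and its fields are illustrative assumptions.

```python
from typing import Dict, List


def build_user_message(metadata: Dict[str, str], case_facts: Dict[str, str],
                       supporting_context: List[str], question: str) -> str:
    """Assemble [METADATA] [CASE_FACTS] [SUPPORTING_CONTEXT] [QUESTION], in that order."""
    def kv(d: Dict[str, str]) -> str:
        return "\n".join(f"- {k}: {v}" for k, v in d.items())

    sections = [
        "METADATA:\n" + kv(metadata),
        "CASE_FACTS:\n" + kv(case_facts),                           # near the top
        "SUPPORTING_CONTEXT:\n" + "\n\n".join(supporting_context),  # lower priority
        "QUESTION:\n" + question,                                   # last: recency
    ]
    return "\n\n".join(sections)


# Hypothetical tool in the Messages API tool shape: the hard constraint is the
# first sentence of the description, not a trailing note.
REFUND_TOOL = {
    "name": "issue_refund",
    "description": (
        "Do not refund >$500; escalate those requests to a manager instead. "
        "Issues a refund for one order to the customer's original payment method."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "amount": {"type": "number", "description": "Refund amount in USD"},
        },
        "required": ["order_id", "amount"],
    },
}

msg = build_user_message(
    metadata={"Channel": "email"},
    case_facts={"Customer ID": "12345", "Refund Amount": "$80"},
    supporting_context=["Ticket history: customer reports the item arrived damaged."],
    question="Can we refund this order today?",
)
print(msg)
```

Keeping the builder as a single function makes the ordering a property of the code rather than something each caller has to remember.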

In agentic loops, the effect shows up as early-turn fade, and context windowing is the fix. By turn 5 the message list is [user_msg, asst_1, tool_1, ..., asst_5]. Turn 1's user message is still at the start, but 10-15 intermediate messages now separate it from the current request, and the facts it carries compete with all of them. Solution: at turn 5, summarize turns 1-4 into PRIOR_WORK: {summary} alongside CASE_FACTS, drop the verbose history, and append turn 5. The summary now opens a short context, so both it and the current request sit in high-attention zones.
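A minimal sketch of the windowing step, assuming messages are plain dicts in the Messages API's role/content shape. `window_history` and the `summarize` callable are hypothetical; in production `summarize` would typically be its own model call over the dropped turns, while the demo uses a lambda so it runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]


def window_history(messages: List[Message], case_facts: Dict[str, str],
                   summarize: Callable[[List[Message]], str],
                   keep_last: int = 2) -> List[Message]:
    """Replace older turns with one CASE_FACTS + PRIOR_WORK user message.

    Keeps about the last `keep_last` messages verbatim. The kept tail is moved
    forward until it starts with an assistant message, so user/assistant
    alternation stays valid once the new user message is prepended.
    """
    if len(messages) <= keep_last + 1:
        return messages
    cut = len(messages) - keep_last
    while cut < len(messages) and messages[cut]["role"] != "assistant":
        cut += 1
    if cut == len(messages):
        return messages  # no safe cut point; leave history unchanged
    dropped, tail = messages[:cut], messages[cut:]
    facts = "\n".join(f"- {k}: {v}" for k, v in case_facts.items())
    header = {"role": "user",
              "content": f"CASE_FACTS:\n{facts}\n\nPRIOR_WORK: {summarize(dropped)}"}
    return [header] + tail


history = [
    {"role": "user", "content": "Customer 12345 wants a refund for order A-9."},
    {"role": "assistant", "content": "Looking up the order."},
    {"role": "user", "content": "Order found: $80, damaged item."},
    {"role": "assistant", "content": "Verified customer and order."},
    {"role": "user", "content": "Proceed with the refund?"},
]
compact = window_history(
    history,
    {"Customer ID": "12345", "Order ID": "A-9", "Refund Amount": "$80"},
    summarize=lambda msgs: f"{len(msgs)} earlier messages: customer verified, order found.",
)
for m in compact:
    print(m["role"], "|", m["content"])
```

Cutting at an assistant message also means a tool_result is never kept without the tool_use that produced it, since both fall on the same side of the cut.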


04 · In production

Where you'll see it

Customer support refund policy enforcement

Policy buried in paragraph 5 of the context: the agent misses it and approves an $800 refund. Fix: pin the policy to the system prompt, which is always position 0, so it is far less likely to be missed. In production, lost-in-the-middle errors dropped from 12% to under 0.5%.

Multi-step research with current file at top

A subagent receives 20 files, with the current file mentioned in the middle, and misses it 40% of the time. Fix: structure the task as CURRENT_FILE: {filename}\n\nSUPPORTING_FILES: {list}, so the current file is at the top.
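A sketch of that fix as a task-string builder. Subagents don't inherit the parent's history, so everything they need goes into the string, with the current file first and the instruction last. `build_subagent_task` and the file names are hypothetical.

```python
from typing import Dict, List


def build_subagent_task(current_file: str, supporting_files: List[str],
                        case_facts: Dict[str, str], instruction: str) -> str:
    """Self-contained subagent task string: current file first, instruction last."""
    facts = "\n".join(f"- {k}: {v}" for k, v in case_facts.items())
    supporting = "\n".join(f"- {name}" for name in supporting_files)
    return (
        f"CURRENT_FILE: {current_file}\n\n"
        f"CASE_FACTS:\n{facts}\n\n"
        f"SUPPORTING_FILES:\n{supporting}\n\n"
        f"TASK: {instruction}"
    )


task = build_subagent_task(
    current_file="src/billing/refunds.py",
    supporting_files=[f"src/module_{i}.py" for i in range(19)],
    case_facts={"Ticket": "T-881"},
    instruction="Find where refund limits are enforced.",
)
print(task.splitlines()[0])  # CURRENT_FILE: src/billing/refunds.py
```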

Escalation protocol buried in middle

The support instructions say "on payment decline, escalate to billing-team@" in paragraph 7 of 15. The agent misses it, contacts the customer instead, and breaches the SLA. Fix: extract escalation rules into the system prompt or a pinned ESCALATION_PROCEDURES block.

Long conversation degradation

An agent loops for 10 turns and builds up 15KB of message history. The customer ID appears in turn 1 and the current question in turn 10; by turn 8 the agent has lost track of the customer ID. Fix: window at turn 6. Extract CASE_FACTS: {customer_id, amount, dispute_summary} and drop turns 1-5.

05 · Implementation

Code examples

Prompt structuring + windowing for Lost-in-the-Middle

from anthropic import Anthropic

client = Anthropic()

WINDOW_TURN = 5  # compact the history when this turn starts


def case_facts_block(facts: dict) -> str:
    """Render transactional facts as one key-value pair per line."""
    return (
        "CASE_FACTS:\n"
        f"- Customer ID: {facts['customer_id']}\n"
        f"- Order ID: {facts['order_id']}\n"
        f"- Refund Amount: ${facts['amount']}\n"
        f"- Dispute: {facts['issue']}"
    )


def support_with_windowing(case_id: str, facts: dict, current_q: str) -> str:
    """Support agent with context windowing to prevent lost-in-the-middle."""
    messages = []
    max_turns = 15

    # Constraints go in the system prompt, which always comes first.
    # ESCALATE_TO_MANAGER is only named here; pass a real tool via tools= to enable it.
    system = """You are a support agent.
CONSTRAINTS (highest attention zone):
- Refunds >$500 require manager approval (escalate via ESCALATE_TO_MANAGER tool)
- Never refund full amount for "not satisfied"; partial only
- Always verify customer ID in CASE_FACTS before processing
"""

    for turn in range(1, max_turns + 1):
        # Turns 1-4 leave exactly 8 messages (4 user + 4 assistant) before turn 5.
        if turn == WINDOW_TURN and len(messages) >= 8:
            # Placeholder summary; in practice, generate it from the dropped turns.
            prior = "Customer verified, order found, attempting refund processing."
            messages = [{
                "role": "user",
                "content": f"{case_facts_block(facts)}\n- Prior Work: {prior}\n\n"
                           f"Current Question: {current_q}",
            }]
        elif turn == 1:
            messages.append({
                "role": "user",
                "content": f"{case_facts_block(facts)}\n\nQuestion: {current_q}",
            })
        else:
            messages.append({"role": "user", "content": current_q})

        resp = client.messages.create(
            model="claude-opus-4-5", max_tokens=1024, system=system, messages=messages,
        )
        # Join all text blocks instead of assuming content[0] is text.
        text = "".join(block.text for block in resp.content if block.type == "text")
        if resp.stop_reason == "end_turn":
            return text
        # Unfinished (e.g. stop_reason == "max_tokens"): keep the partial reply, continue.
        messages.append({"role": "assistant", "content": text})
        current_q = "Continue processing."

    return "Max turns exceeded."

Windowing at turn 5: drop verbose history, keep CASE_FACTS at top of new message. Constraints in system prompt (always position 0).

06 · Distractor patterns

Looks right, isn't

Each row pairs a plausible-looking pattern with the failure it actually creates. These are the shapes exam distractors are built from.

Looks right

Bold or capitalize important facts to make them stand out.

Actually wrong

Emphasis markers are just extra tokens; they do not move a fact out of the middle of the context, so **customer_id: 12345** buried mid-prompt is still at risk of being overlooked. Position is what matters.

Looks right

Add important context at the end for maximum emphasis.

Actually wrong

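The end gets high attention via recency, but that slot belongs to the final question or request. Context appended after the question pushes the question away from the end and is still treated as supporting detail. Constraints belong in the system prompt, which is position 0.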

Looks right

Put all context in the system prompt to ensure it's always attended.

Actually wrong

System prompts are for stable role and constraints, not per-case transactional facts. Stuffing case data into them makes them long enough to bury their own constraints. Use a CASE_FACTS block in the user message for facts.

Looks right

Lost-in-the-middle is a myth.

Actually wrong

Empirically documented (Liu et al. 2023) and observed across many transformer models; assume Claude is affected too. Mid-context accuracy can drop sharply, though the size of the drop varies by model, task, and context length. Mitigate with structural changes.

Looks right

If the agent forgets a key fact, increase max_tokens.

Actually wrong

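max_tokens caps the length of the response; it does not change how the model weighs its input. Forgetting is an attention-weight issue, not a token-budget issue, so more tokens won't help. Restructure: move the fact to the top or the end.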

07 · Compare

Side-by-side

Aspect	Lost-in-the-Middle	System-prompt pinning	Recency bias	Windowing
Problem	Mid-context facts overlooked	Constraints not enforced	Old context forgotten	Long history loses facts
Manifestation	40-50% drop at position 50	Policy violations	Early-turn facts forgotten by turn 8	Customer ID forgotten mid-loop
Fix	Move facts to top/bottom	Move rules to system prompt	Place question at end	Summarize + drop old
Cost	Restructure only	~10% more tokens	Reorder only	Summarization call (extra turn)
Effort	Low	Very low	Very low	Medium
Effectiveness	80-90% accuracy recovery	99%+ enforcement	High recency weighting	70%+ accuracy recovery

08 · When to use

Decision tree

01

Have constraints (policies, rules) that must always apply?

Yes: Pin to system prompt. System prompt is always position 0.

No: Use a CASE_FACTS block in the user message for facts.

02

Have transactional facts that must survive every turn?

Yes: Put a CASE_FACTS block at the top of every user message, or use windowing.

No: Simple prompt structure is fine.

03

Agent loop running >5 turns?

Yes: Implement windowing at turn 5 (summarize prior work, drop verbose history).

No: No windowing needed.

04

Is the answer to the question buried in the middle of the context?

Yes: Restructure by moving the answer-critical fact to the top (CASE_FACTS) or the end (QUESTION).

No: Prompt structure is sufficient.

05

Agent forgetting details despite them being in earlier turns?

Yes: Likely lost-in-the-middle; windowing should fix it.

No: Possibly a tool-description or instruction-clarity issue.
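The five questions can be encoded as a small function, a sketch that returns every action whose question is answered "yes". `PromptSituation` and `recommend` are hypothetical names; the ">5 turns" threshold comes from question 03, and questions 03 and 05 share the windowing action.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromptSituation:
    has_constraints: bool            # 01: policies/rules that must always apply
    has_transactional_facts: bool    # 02: facts that must survive every turn
    loop_turns: int                  # 03: turns in the agent loop so far
    answer_buried_mid_context: bool  # 04: answer-critical fact sits mid-context
    forgets_earlier_details: bool    # 05: agent loses facts from earlier turns


def recommend(s: PromptSituation) -> List[str]:
    """Walk the decision tree and collect the recommended actions."""
    actions = []
    if s.has_constraints:
        actions.append("Pin constraints to the system prompt.")
    if s.has_transactional_facts:
        actions.append("Put a CASE_FACTS block at the top of every user message.")
    if s.loop_turns > 5 or s.forgets_earlier_details:
        actions.append("Window at turn 5: summarize prior work, drop verbose history.")
    if s.answer_buried_mid_context:
        actions.append("Move the answer-critical fact to the top (CASE_FACTS) "
                       "or the end (QUESTION).")
    if not actions:
        actions.append("Simple prompt structure is fine.")
    return actions


for action in recommend(PromptSituation(True, True, 8, False, True)):
    print(action)
```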

09 · On the exam

Question patterns


52 V2 questions are wired to this concept. Representative patterns:

Your coordinator passes the entire chat history to each subagent for context. Subagents respond with confused outputs. Why?

Your support agent forgets the customer ID by turn 30 of a long conversation. What is the architectural fix?

Which of these is wrong: summarizing verbose reasoning chains, or summarizing critical transactional facts?

Subagent spawns receive the parent's full conversation history. Why is this an anti-pattern?

Your agent calls the wrong tool 30% of the time across 8 similar tools. What is the first fix?

You add 25 tools to a single agent and selection accuracy drops sharply. Why?

46 additional questions for this concept live in the practice pillar.

10 · FAQ

Frequently asked

Does Lost-in-the-Middle affect Claude differently than other models?

The effect has been observed across many transformer models, so assume Claude is affected too. The U-shaped pattern is common; its magnitude varies by model and context length.

What percentage drop in middle?

It depends on context size, model, and task. As a rough illustration, accuracy can fall from 90%+ for a fact at position 1 or 100 to 40-50% at position 50.

Fix Lost-in-the-Middle by increasing max_tokens?

No. max_tokens limits the length of the response, not how the input is weighed; mid-context forgetting is an attention-weight issue. More tokens won't restore mid-context accuracy. Restructure the prompt.

Always use a CASE_FACTS block?

Yes, if the conversation involves transactional details. For pure reasoning tasks, it matters less.

When to start windowing?

After 4-6 turns (8-12 messages); beyond that, early turns degrade. A reasonable default: window at turn 5, or earlier if the message list exceeds 10KB.
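A sketch of that trigger. Two assumptions are worth flagging: `turn` counts from 1 again after each window (so windowing recurs every 5 turns instead of firing on every turn after turn 5), and "10KB" is measured as the UTF-8 size of the JSON-serialized message list. `should_window` is a hypothetical helper.

```python
import json
from typing import Dict, List

WINDOW_AT_TURN = 5           # the guide's default: window at turn 5
WINDOW_AT_BYTES = 10 * 1024  # ~10KB of history, measured as UTF-8 JSON (assumption)


def should_window(messages: List[Dict[str, object]], turn: int) -> bool:
    """True at turn 5 (counted since the last window) or when history passes ~10KB."""
    history_bytes = len(json.dumps(messages).encode("utf-8"))
    return turn >= WINDOW_AT_TURN or history_bytes > WINDOW_AT_BYTES


small = [{"role": "user", "content": "Customer 12345 wants a refund."}]
large = [{"role": "user", "content": "x" * 11_000}]
print(should_window(small, 2), should_window(small, 5), should_window(large, 1))
# False True True
```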

How to write a good CASE_FACTS block?

Use key-value pairs, not prose: write "- Customer ID: 12345", not "The customer has ID 12345". Keep it scannable, one fact per line.
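A small formatter enforces that shape. `format_case_facts` is a hypothetical helper; collapsing whitespace inside each value is a design choice that guarantees one fact per line even when a value arrives with embedded newlines.

```python
from typing import Mapping


def format_case_facts(facts: Mapping[str, object]) -> str:
    """Render facts as a CASE_FACTS block: one '- Key: value' line per fact."""
    lines = ["CASE_FACTS:"]
    for key, value in facts.items():
        # Collapse newlines and repeated spaces so each fact stays on one line.
        text = " ".join(str(value).split())
        lines.append(f"- {key}: {text}")
    return "\n".join(lines)


block = format_case_facts({"Customer ID": 12345, "Order ID": "A-9",
                           "Dispute": "Item arrived\ndamaged"})
print(block)
# CASE_FACTS:
# - Customer ID: 12345
# - Order ID: A-9
# - Dispute: Item arrived damaged
```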

What goes in system prompt vs CASE_FACTS?

System prompt: role, constraints, output schema. CASE_FACTS: customer ID, order ID, amount, dispute summary, and any other transactional facts that change per instance.

Does formatting (bold, capitals) help mid-context facts?

Not reliably. Emphasis markers are just extra tokens and do not move a fact out of the middle of the context. Position is what matters.

Use a CASE_FACTS block in subagent calls?

Yes. Pass it explicitly in the subagent's task string, at the top. Subagents don't inherit history.

How long should CASE_FACTS be?

Compact, 50-500 tokens. Essential transactional data only. Narrative belongs in conversation.

11 · Practice with AI

Work this with your AI

Work this concept hands-on with Claude Code, Codex, or claude.ai. Copy a prompt, paste it into your assistant, and practise in tandem. Each one keeps you active (explain it back, get drilled, or build) rather than just reading.

Drill it like the exam (scenario MCQs)
Practice in the exam's scenario-MCQ format with trap awareness.
Explain it back (Feynman)
Build durable, transferable understanding of a concept you can half-state.
Test me, adapting the difficulty
Active recall practice on a concept you think you know.
Check my prerequisites first
Before studying a concept that keeps not sticking.
Find the high-leverage 20%
When a domain feels too big and you are short on time.
