# Attention Engineering

> Attention engineering is the discipline of placing critical context where the model attends most strongly: high in the prompt, in the system message, or in repeated facts blocks.

**Domain:** D4 · Prompt Engineering (20% of CCA-F exam)
**Canonical:** https://claudearchitectcertification.com/concepts/attention-engineering
**Last reviewed:** 2026-05-04

## Quick stats

- **Strong-attention zones:** 3
- **Exam domain:** D4
- **Coverage tier:** B
- **Trap:** buried context
- **Pattern:** place high

## What it is

Attention engineering is the discipline of placing critical information in high-attention zones of your prompt to compensate for the transformer's Lost-in-the-Middle effect. LLMs exhibit a U-shaped attention curve across the context window: the beginning (system prompt, roughly the first 10%) and the end (roughly the last 5%) receive disproportionately high attention. The middle 40-80% is effectively "lost": overlooked, underweighted, or forgotten. This is not a bug; it is a property of how transformers attend.

The empirical phenomenon was documented by Liu et al. (2023, "Lost in the Middle"). Models drop from 90%+ retrieval accuracy on a fact at position 1 to 40-50% at position 50, then recover to 85%+ by position 100. If you have 200 facts and the key one sits at position 50, the model may miss it entirely. This cascades into production failures: a missed refund amount in a contract paragraph, an overlooked API constraint in a tool description.

Three architectural patterns counter the effect. System-prompt pinning: immutable instructions live in the system prompt (always position 0, highest attention). Persistent case facts: transactional details live in a CASE_FACTS block at the top of the user message. Recency bias: place the current question/request at the end. The pattern: [CASE_FACTS] ... [CONTEXT] ... [CURRENT_QUESTION], not buried in the middle.
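
A minimal sketch of that layout (the helper and field names are illustrative, not from any SDK):

```python
def build_prompt(facts_block: str, context: str, question: str) -> str:
    """Place high-value content in the high-attention zones: top and bottom."""
    return (
        f"CASE_FACTS:\n{facts_block}\n\n"   # top of message: high attention
        f"CONTEXT:\n{context}\n\n"          # middle: supporting detail only
        f"CURRENT_QUESTION: {question}"     # end: high attention via recency
    )

prompt = build_prompt(
    "- Customer ID: 12345\n- Refund Amount: $420",
    "Order history, shipping notes, prior tickets...",
    "Is this refund within policy?",
)
```

The question goes last so recency bias works for you; the facts go first so they never drift into the middle.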

The most-tested anti-pattern is treating this as a linguistic problem ("write important facts in bold") instead of a structural one. Bolding does nothing; the transformer doesn't parse Markdown emphasis. The fix is the fact's position in the prompt sequence. The effect also explains why long agentic loops degrade: early turns become effectively "invisible" as later turns pile up, not because the facts are gone but because their attention weight drops. Windowing mitigates this.

## How it works

The attention mechanism computes query-key-value dot products across every token; positional encoding gives token 1 and token 100 an equal starting opportunity. However, attention patterns learned during training encode a recency bias: the model learned to weight recent tokens (the end of the sequence) more heavily, because those are usually the most relevant (the current question, not the problem statement from paragraph 1). Middle positions get the lowest weight.

Evidence: ask Claude a question that depends on a fact buried at position 50. The attention weights for position 50 will have lower magnitude than those for position 1 or 100. The model still attends, but with reduced weight, which leads to misinterpretation or forgetting. This happens regardless of how "important" the fact feels.

To exploit the U-shape, place information strategically. System prompts are position 0, always highest, use for role + output format + constraints. For user messages, structure: [METADATA] [CASE_FACTS] [SUPPORTING_CONTEXT] [QUESTION]. The first section gets high attention; the last (question) gets high via recency. For tool descriptions, put the most important constraint first: "Do not refund >$500" should appear in the first line, not buried in paragraph 3.
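
The tool-description rule can be sketched as a plain tool definition; the tool name and schema here are hypothetical:

```python
refund_tool = {
    "name": "process_refund",
    # The hard constraint is the FIRST line of the description, not paragraph 3.
    "description": (
        "Do not refund more than $500 without manager approval.\n"
        "Processes a refund for a verified order: looks up the order, "
        "validates the dispute reason, and issues a partial or full refund."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "amount": {"type": "number"},
        },
        "required": ["order_id", "amount"],
    },
}
```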

In agentic loops, this manifests as context windowing. After turn 5, the message list is [user_msg, asst_1, tool_1, ..., asst_5]. Turn 1's user message is at position 1 (high attention) but semantically displaced by 10-15 intermediate messages. Solution: after turn 5, summarize turns 1-4 into PRIOR_WORK: {summary}, drop verbose history, append turn 5. The summary is now position 1 (high attention via recency).
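
A sketch of that transform, assuming a `summarize` callable (e.g. a cheap model call) supplied by the caller:

```python
def window_messages(messages, current_question, summarize, window_at=5):
    """Collapse verbose history into a PRIOR_WORK block once the loop runs long.

    `summarize` is any callable that turns the old message list into a short
    string; in production it might be a cheap summarization call.
    """
    if len(messages) < 2 * (window_at - 1):
        return messages  # history still short enough to attend to directly
    summary = summarize(messages)
    return [{
        "role": "user",
        "content": f"PRIOR_WORK: {summary}\n\nCurrent Question: {current_question}",
    }]
```

The summary lands at position 1 of a fresh message list, so it gets high attention, and the current question sits at the end for recency.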

## Where you'll see it in production

### Customer support refund policy enforcement

Policy buried in paragraph 5 of context: agent misses it, approves $800 refund. Fix: pin policy to system prompt. Always position 0, never missed. Production: lost-in-middle errors drop from 12% to <0.5%.
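
A sketch of the pinning fix; the policy text is illustrative:

```python
REFUND_POLICY = "Refunds over $500 require manager approval; never auto-approve."

def build_system_prompt(policy: str) -> str:
    # Pin the policy into the system prompt (always position 0) instead of
    # appending it to retrieved context, where it could drift into the middle.
    return f"You are a support agent.\n\nPOLICY:\n{policy}\n\nFollow POLICY on every turn."

system = build_system_prompt(REFUND_POLICY)
```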

### Multi-step research with current file at top

Subagent receives 20 files; the current file is mentioned in the middle. The subagent misses it 40% of the time. Fix: structure the task as CURRENT_FILE: {filename}\n\nSUPPORTING_FILES: {list}. The current file is now at the top.
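
A sketch of that task-string structure (the function name and file paths are illustrative):

```python
def build_subagent_task(current_file: str, supporting_files: list[str], instruction: str) -> str:
    """Put the file under analysis at the top; supporting files are context only."""
    supporting = "\n".join(f"- {f}" for f in supporting_files)
    return (
        f"CURRENT_FILE: {current_file}\n\n"
        f"SUPPORTING_FILES:\n{supporting}\n\n"
        f"TASK: {instruction}"
    )

task = build_subagent_task("src/auth.py", ["src/db.py", "src/session.py"], "Audit token validation.")
```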

### Escalation protocol buried in middle

Customer support instructions: "on payment decline, escalate to billing-team@" in paragraph 7 of 15. Agent misses, attempts customer contact, breaches SLA. Fix: extract escalation rules to system prompt or pinned ESCALATION_PROCEDURES block.

### Long conversation degradation

Agent loops 10 turns, 15KB message history. Customer ID in turn 1; current question in turn 10. Customer ID forgotten midway through turn 8. Fix: windowing at turn 6. Extract CASE_FACTS: {customer_id, amount, dispute_summary}, drop turns 1-5.

## Code examples

### Prompt structuring + windowing for Lost-in-the-Middle

**Python:**

```python
from anthropic import Anthropic
client = Anthropic()

def support_with_windowing(case_id: str, facts: dict, current_q: str):
    """Support agent with context windowing to prevent lost-in-the-middle."""
    messages = []
    turn = 0
    max_turns = 15

    system = """You are a support agent.
CONSTRAINTS (highest attention zone):
- Refunds >$500 require manager approval (escalate via ESCALATE_TO_MANAGER tool)
- Never refund full amount for "not satisfied"; partial only
- Always verify customer ID in CASE_FACTS before processing
"""

    while turn < max_turns:
        turn += 1

        # Window at turn 5: drop verbose history, keep CASE_FACTS at the top
        if turn == 5 and len(messages) >= 8:
            # In production, `prior` would come from a cheap summarization call
            prior = "Customer verified, order found, attempting refund processing."
            messages = [{
                "role": "user",
                "content": f"""CASE_FACTS:
- Customer ID: {facts['customer_id']}
- Order ID: {facts['order_id']}
- Refund Amount: ${facts['amount']}
- Dispute: {facts['issue']}
- Prior Work: {prior}

Current Question: {current_q}""",
            }]
        else:
            if turn == 1:
                content = f"""CASE_FACTS:
- Customer ID: {facts['customer_id']}
- Order ID: {facts['order_id']}
- Refund Amount: ${facts['amount']}
- Dispute: {facts['issue']}

Question: {current_q}"""
            else:
                content = current_q
            messages.append({"role": "user", "content": content})

        resp = client.messages.create(
            model="claude-opus-4-5", max_tokens=1024, system=system, messages=messages,
        )
        if resp.stop_reason == "end_turn":
            return resp.content[0].text
        messages.append({"role": "assistant", "content": resp.content[0].text})
        current_q = "Continue processing."

    return "Max turns exceeded."
```

> Windowing at turn 5: drop verbose history, keep CASE_FACTS at top of new message. Constraints in system prompt (always position 0).

**TypeScript:**

```typescript
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();

interface Facts { customer_id: string; order_id: string; amount: number; issue: string; }

async function supportWithWindowing(caseId: string, facts: Facts, currentQ: string) {
  const messages: Anthropic.MessageParam[] = [];
  let turn = 0;
  const maxTurns = 15;

  const system = `You are a support agent.
CONSTRAINTS:
- Refunds >$500 require approval (escalate)
- Never refund full for "not satisfied" (partial only)
- Verify CASE_FACTS customer ID before processing
`;

  while (turn < maxTurns) {
    turn += 1;

    if (turn === 5 && messages.length >= 8) {
      // Window: reset history to CASE_FACTS + current question
      // (in production, priorSummary would come from a summarization call)
      const priorSummary = "Customer verified, order found, refund processing.";
      messages.splice(0, messages.length, {
        role: "user" as const,
        content: `CASE_FACTS:
- Customer ID: ${facts.customer_id}
- Order ID: ${facts.order_id}
- Refund Amount: \$${facts.amount}
- Dispute: ${facts.issue}
- Prior Work: ${priorSummary}

Current Question: ${currentQ}`,
      });
    } else {
      const content = turn === 1
        ? `CASE_FACTS:\n- Customer ID: ${facts.customer_id}\n- Order ID: ${facts.order_id}\n- Refund Amount: \$${facts.amount}\n- Dispute: ${facts.issue}\n\nQuestion: ${currentQ}`
        : currentQ;
      messages.push({ role: "user", content });
    }

    const resp = await client.messages.create({ model: "claude-opus-4-5", max_tokens: 1024, system, messages });
    if (resp.stop_reason === "end_turn") {
      return resp.content[0].type === "text" ? resp.content[0].text : "";
    }
    messages.push({ role: "assistant", content: resp.content[0].type === "text" ? resp.content[0].text : "" });
    currentQ = "Continue.";
  }
  return "Max turns exceeded.";
}
```

> Same windowing at turn 5: reset to CASE_FACTS + current question. Constraints in system prompt.

## Looks-right vs actually-wrong

| Looks right | Actually wrong |
|---|---|
| Bold or capitalize important facts to make them stand out. | The transformer doesn't parse Markdown emphasis or capitalization: **customer_id: 12345** and customer_id: 12345 get the same attention weight. Position is what matters. |
| Add important context at the end for maximum emphasis. | End is high-attention via recency, but only for final question/request. Context at end is treated as supporting detail. System prompt is position 0 for constraints. |
| Put all context in the system prompt to ensure it's always attended. | System prompts are limited (~2000 tokens effective); they're for role and constraints, not transactional facts. Use CASE_FACTS block in user message for facts. |
| Lost-in-the-middle is a myth. | Empirically documented (Liu et al. 2023). Replicated across all transformers including Claude. Mid-context accuracy drops 40-50%. Mitigate with structural changes. |
| If the agent forgets a key fact, increase max_tokens. | Forgetting is an attention-weight issue, not a token-budget one. More tokens won't help. Restructure: move the fact to the top or end. |

## Comparison

| Aspect | Lost-in-the-Middle | System-prompt pinning | Recency bias | Windowing |
| --- | --- | --- | --- | --- |
| Problem | Mid-context facts overlooked | Constraints not enforced | Old context forgotten | Long history loses facts |
| Manifestation | 40-50% drop at position 50 | Policy violations | Early-turn facts forgotten by turn 8 | Customer ID forgotten mid-loop |
| Fix | Move facts to top/bottom | Move rules to system prompt | Place question at end | Summarize + drop old |
| Cost | Restructure only | ~10% more tokens | Reorder only | Summarization call (extra turn) |
| Effort | Low | Very low | Very low | Medium |
| Effectiveness | 80-90% accuracy recovery | 99%+ enforcement | High recency weighting | 70%+ accuracy recovery |

## Decision tree

1. **Have constraints (policies, rules) that must always apply?**
   - **Yes:** Pin to system prompt. System prompt is always position 0.
   - **No:** Use CASE_FACTS block in user message for facts.

2. **Have transactional facts that must survive every turn?**
   - **Yes:** CASE_FACTS block at top of every user message. Or windowing.
   - **No:** Simple prompt structure fine.

3. **Agent loop running >5 turns?**
   - **Yes:** Implement windowing at turn 5: summarize prior work, drop verbose history.
   - **No:** No windowing needed.

4. **Answer to the question buried in middle context?**
   - **Yes:** Restructure: move answer-critical fact to top (CASE_FACTS) or end (QUESTION).
   - **No:** Prompt structure sufficient.

5. **Agent forgetting details despite them being in earlier turns?**
   - **Yes:** Lost-in-the-middle. Windowing will fix.
   - **No:** Possibly tool-description or instruction clarity issue.

## Exam-pattern questions

### Q1. Critical fact buried in paragraph 5 of context. Agent misses it 40% of the time. Why?

Lost-in-the-Middle effect. Mid-context positions get ~40-50% attention vs 90%+ at start/end. Move the fact to the top (system prompt or CASE_FACTS block) or end (current question). Position trumps content.

### Q2. Bold or capitalize important facts to boost attention?

No. The transformer doesn't parse Markdown emphasis or capitalization: **customer_id: 12345** and customer_id: 12345 get the same attention weight. Position is what matters.

### Q3. Add important context at the very end for maximum emphasis?

Recency bias helps, but only for the final question/request. Context placed at the end is treated as supporting detail, not fact. System prompt is always position 0 for constraints; case-facts at top of user message for transactional data.

### Q4. Put all context in the system prompt to ensure it's always attended?

No. System prompts are limited (~2000 tokens effective); they're for role and constraints, not transactional facts. Use system prompt for rules; CASE_FACTS block in user message for facts.

### Q5. Lost-in-the-Middle is a myth: it doesn't affect Claude?

False. Empirically documented (Liu et al. 2023) and replicated across all transformers including Claude. Mid-context accuracy drops 40-50%. Mitigate with structural changes, not by ignoring the effect.

### Q6. Agent forgets a key fact despite it being in context. Increase max_tokens?

No. Forgetting is an attention-weight issue, not a token budget issue. More tokens won't help. Restructure: move the fact to the top or end.

### Q7. When should you start windowing in an agentic loop?

After 4-6 turns (8-12 messages). Beyond that, early turns degrade. Optimal: window at turn 5 or when message list >10KB. Pre-emptive is better than reactive.

### Q8. What goes in the system prompt vs the CASE_FACTS block?

System prompt: role, constraints, output schema (rules that don't change per instance). CASE_FACTS: customer ID, order ID, amount, dispute summary (transactional facts that change per instance). Both are needed; they layer.

## FAQ

### Q1. Does Lost-in-the-Middle affect Claude differently than other models?

All transformers exhibit it, including Claude. U-shaped curve is universal; magnitudes vary.

### Q2. What percentage drop in middle?

Empirically, 40-50% at position 50 vs 90%+ at position 1 or 100. Depends on context size and model.

### Q3. Fix Lost-in-the-Middle by increasing max_tokens?

No. Attention weight issue, not token budget. More tokens won't restore mid-context accuracy. Restructure prompt.

### Q4. Always use a CASE_FACTS block?

Yes if conversation involves transactional details. For pure reasoning, less critical.

### Q5. When to start windowing?

After 4-6 turns (8-12 messages). Beyond that, early turns degrade. Optimal: window at turn 5 or when message list >10KB.

### Q6. How to write a good CASE_FACTS block?

Key-value pairs, no prose. - Customer ID: 12345 not "The customer has ID 12345". Scannable; one fact per line.
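
A sketch of that formatting rule:

```python
def format_case_facts(facts: dict) -> str:
    """One fact per line, key-value, no prose."""
    return "CASE_FACTS:\n" + "\n".join(f"- {k}: {v}" for k, v in facts.items())

block = format_case_facts({"Customer ID": "12345", "Refund Amount": "$420"})
```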

### Q7. What goes in system prompt vs CASE_FACTS?

System: role, constraints, output schema. CASE_FACTS: customer ID, order ID, amount, dispute summary, transactional facts that change per instance.

### Q8. Does formatting (bold, capitals) help mid-context facts?

No. Transformers don't parse Markdown. Position is all that matters.

### Q9. Use a CASE_FACTS block in subagent calls?

Yes. Pass it explicitly in the subagent's task string, at the top. Subagents don't inherit history.

### Q10. How long should CASE_FACTS be?

Compact, 50-500 tokens. Essential transactional data only. Narrative belongs in conversation.

---

**Source:** https://claudearchitectcertification.com/concepts/attention-engineering
**Vault sources:** ACP-T03 §15 attention engineering
**Last reviewed:** 2026-05-04

**Evidence tiers** — 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed.
