TLDR
Attention engineering is the discipline of placing critical context where the model attends most strongly: high in the prompt, in the system message, or in repeated facts blocks. (Community pattern.)
What it is
Attention engineering is the discipline of placing critical information in the high-attention zones of your prompt to compensate for the Lost-in-the-Middle effect. LLMs show a U-shaped curve in how strongly they attend to and use information across their context window. The beginning (system prompt, roughly the first 10%) and the end (roughly the last 5%) receive disproportionately high attention, while the middle (roughly the 40-80% range) is effectively "lost" (overlooked, underweighted, or forgotten). This is not a bug; it's a property of how transformers attend.
The phenomenon was documented empirically by Liu et al. (2023, "Lost in the Middle"): models use a relevant fact best when it sits at the start or end of the input, and accuracy drops sharply when it sits in the middle. As a rough illustration, retrieval accuracy on a fact might fall from 90%+ at position 1 to 40-50% at position 50, then recover to 85%+ by position 100. If you have 200 facts and the key one is at position 50, the model might miss it entirely. This cascades into production failures: missing a refund amount in a contract paragraph, or overlooking a critical API constraint in a tool description.
Three architectural patterns counter the effect. System-prompt pinning: immutable instructions live in the system prompt (always position 0, highest attention). Persistent case facts: transactional details live in a CASE_FACTS block at the top of the user message. Recency bias: the current question or request goes at the end. The resulting layout is [CASE_FACTS] ... [CONTEXT] ... [CURRENT_QUESTION], with nothing critical buried in the middle.
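The three patterns combine into one request layout. Below is a minimal sketch, assuming plain-string prompts; the helper name `build_request` and the section labels are illustrative, not a required API. It returns a system string plus a messages list in the shape the Anthropic Messages API accepts (`system=` plus a list of `{"role", "content"}` dicts).

```python
def build_request(constraints, case_facts, context, question):
    """Assemble a request that keeps critical content out of the middle.

    - constraints -> system prompt (position 0, pinned on every call)
    - case_facts  -> top of the user message
    - context     -> middle (supporting detail only)
    - question    -> end of the user message (recency)
    """
    system = "CONSTRAINTS:\n" + "\n".join(f"- {c}" for c in constraints)
    facts = "CASE_FACTS:\n" + "\n".join(f"- {k}: {v}" for k, v in case_facts.items())
    user = f"{facts}\n\nCONTEXT:\n{context}\n\nCURRENT_QUESTION: {question}"
    return system, [{"role": "user", "content": user}]


system, messages = build_request(
    constraints=["Refunds >$500 require manager approval"],
    case_facts={"Customer ID": "12345", "Refund Amount": "$800"},
    context="(long order history, policy excerpts, prior emails...)",
    question="Can this refund be approved?",
)
print(system)
print(messages[0]["content"])
```

Calling the same function on every turn re-pins the constraints and CASE_FACTS each time instead of letting them drift toward the middle of a growing history.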
The most-tested anti-pattern is treating this as a linguistic problem ("write important facts in bold") instead of a structural one. Bolding doesn't fix it: emphasis markers are just a few extra tokens and don't move the fact out of the low-attention zone. The fix is position in the prompt sequence. The same effect explains why agentic loops degrade as they grow: early turns become effectively "invisible" as later turns pile up. Windowing mitigates this.
How it works
The attention mechanism computes query-key dot products between every pair of tokens and uses the resulting weights to mix value vectors. The dot-product formula itself doesn't favor any position, so in principle token 1 and token 100 compete on equal terms. However, the attention patterns learned during training encode a recency bias: the model learns to weight recent tokens (the end of the sequence) more heavily because those are usually the most relevant (the current question, not the problem statement from paragraph 1). Combined with strong attention to the first tokens, this leaves middle positions with the lowest weight.
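To make the mechanism concrete, here is a toy scaled dot-product attention computation in pure Python. With identical keys, every position gets exactly the same weight; adding a U-shaped positional bias to the logits pushes weight toward the ends and away from the middle. The bias function is a hypothetical stand-in for whatever positional biases a trained model has learned, not a description of Claude's internals.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, position_bias=None):
    """Scaled dot-product attention weights of one query over all keys."""
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    if position_bias is not None:
        logits = [x + b for x, b in zip(logits, position_bias)]
    return softmax(logits)


def u_shaped_bias(n, strength=2.0):
    """Hypothetical bias: 0 at both ends, -strength at the center (needs n >= 2)."""
    center = (n - 1) / 2
    return [-strength * (1 - abs(i - center) / center) for i in range(n)]


n = 101
query = [1.0, 0.0, 1.0, 0.0]
keys = [[0.5, 0.5, 0.5, 0.5]] * n  # identical content at every position

flat = attention_weights(query, keys)
biased = attention_weights(query, keys, u_shaped_bias(n))
print(f"no bias:   pos 0={flat[0]:.4f}  pos 50={flat[50]:.4f}  pos 100={flat[100]:.4f}")
print(f"with bias: pos 0={biased[0]:.4f}  pos 50={biased[50]:.4f}  pos 100={biased[100]:.4f}")
```

The point of the toy is that the same content gets less weight purely because of where it sits, which is the property the rest of this page works around.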
Evidence: ask Claude a question that depends on a fact buried at position 50. In the U-shaped picture, the attention weight on position 50 is lower than on position 1 or 100: the model still attends to it, but with reduced weight, which leads to misinterpretation or forgetting. You can't read Claude's attention weights through the API; what you observe is the downstream drop in accuracy. It happens regardless of how "important" the fact feels.
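Because the effect shows up in behavior rather than in visible weights, the practical test is a position sweep: move the same fact through a list of filler facts and check whether the answer survives at each position. A minimal sketch, assuming a hypothetical `ask(prompt) -> str` callable that wraps whatever model call you use; `fake_ask` below only stands in for it so the harness runs offline.

```python
from typing import Callable, Dict, List


def build_haystack(needle: str, fillers: List[str], position: int) -> str:
    """Insert the needle fact at the given 0-based position among filler facts."""
    lines = fillers[:position] + [needle] + fillers[position:]
    return "\n".join(f"{i + 1}. {line}" for i, line in enumerate(lines))


def position_sweep(
    needle: str,
    fillers: List[str],
    question: str,
    expected: str,
    ask: Callable[[str], str],
    positions: List[int],
) -> Dict[int, bool]:
    """For each position, record whether the answer contained the expected string."""
    results = {}
    for pos in positions:
        prompt = f"FACTS:\n{build_haystack(needle, fillers, pos)}\n\nQUESTION: {question}"
        results[pos] = expected in ask(prompt)
    return results


fillers = [f"Order {n} shipped on time." for n in range(1000, 1199)]  # 199 fillers
needle = "The refund limit for this account is $500."  # 200 facts in total


def fake_ask(prompt: str) -> str:
    # Stand-in for a real model call; it always "finds" the fact.
    return "$500" if "$500" in prompt else "unknown"


print(position_sweep(needle, fillers, "What is the refund limit?", "$500", fake_ask, [0, 50, 100, 199]))
```

With a real model behind `ask`, running each position several times and comparing hit rates is what reveals the mid-context dip.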
To exploit the U-shape, place information deliberately. The system prompt occupies position 0 and gets the highest attention, so use it for role, output format, and constraints. Structure user messages as [METADATA] [CASE_FACTS] [SUPPORTING_CONTEXT] [QUESTION]: the first section gets high attention from its position at the top, and the last (the question) gets high attention via recency. In tool descriptions, put the most important constraint first: "Do not refund >$500" belongs in the first line, not buried in paragraph 3.
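For tool descriptions, the ordering rule looks like this in the tool-definition shape the Anthropic Messages API uses (`name`, `description`, `input_schema`). The `issue_refund` tool and the `first_line_mentions` check are hypothetical illustrations, not part of any SDK.

```python
REFUND_TOOL = {
    "name": "issue_refund",  # hypothetical tool
    "description": (
        # The critical constraint goes on the very first line.
        "Do not refund >$500; escalate those to a manager instead.\n"
        "Issues a refund to the customer's original payment method. "
        "Requires a verified customer ID and order ID."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "order_id": {"type": "string"},
            "amount": {"type": "number", "description": "Refund amount in USD"},
        },
        "required": ["customer_id", "order_id", "amount"],
    },
}


def first_line_mentions(tool: dict, phrase: str) -> bool:
    """Lint check: is the critical constraint in the description's first line?"""
    return phrase in tool["description"].splitlines()[0]


print(first_line_mentions(REFUND_TOOL, ">$500"))
```

To use the definition, pass `tools=[REFUND_TOOL]` to `client.messages.create`; a check like `first_line_mentions` can run in CI so a later edit doesn't push the constraint down into the body text.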
In agentic loops, the effect shows up as history degradation, and the fix is context windowing. After turn 5, the message list is [user_msg, asst_1, tool_1, ..., asst_5]. Turn 1's user message still sits at position 1, a nominally high-attention slot, but it is semantically displaced by 10-15 intermediate messages. Solution: at turn 5, summarize turns 1-4 into PRIOR_WORK: {summary}, drop the verbose history, and append turn 5. The summary now sits at the top of a short context, where it gets high attention, and the current turn sits at the end, where recency applies.
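A minimal sketch of that windowing step, assuming plain-text messages (no tool blocks) and a caller-supplied `summarize` function; `build_windowed_messages` and `toy_summarize` are hypothetical names. Collapsing the old turns into a single user message also sidesteps any question of role alternation in the trimmed history.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # {"role": "user" | "assistant", "content": str}


def build_windowed_messages(
    history: List[Message],
    case_facts: Dict[str, str],
    current_question: str,
    summarize: Callable[[List[Message]], str],
    window_at: int = 5,
) -> List[Message]:
    """Return the messages to send for the next turn.

    history holds the completed turns (alternating user/assistant, ending with
    an assistant message). From turn `window_at` on, the completed turns are
    collapsed into one user message: CASE_FACTS first, a PRIOR_WORK summary,
    and the current question last.
    """
    completed_turns = sum(1 for m in history if m["role"] == "assistant")
    if completed_turns < window_at - 1:
        # Short history: keep it verbatim and put the new question at the end.
        return history + [{"role": "user", "content": current_question}]
    facts = "\n".join(f"- {k}: {v}" for k, v in case_facts.items())
    content = (
        f"CASE_FACTS:\n{facts}\n\n"
        f"PRIOR_WORK: {summarize(history)}\n\n"
        f"CURRENT_QUESTION: {current_question}"
    )
    return [{"role": "user", "content": content}]


def toy_summarize(history: List[Message]) -> str:
    # Hypothetical stand-in; in practice this would be a separate summarization call.
    return f"{len(history) // 2} earlier turns: customer verified, order found."


history: List[Message] = []
for i in range(1, 5):  # turns 1-4 completed
    history += [{"role": "user", "content": f"user {i}"}, {"role": "assistant", "content": f"asst {i}"}]
windowed = build_windowed_messages(history, {"Customer ID": "12345"}, "Process the refund.", toy_summarize)
print(windowed[0]["content"])
```

The trade-off is the extra summarization call: a real `summarize` costs one more request per window, in exchange for a short context where nothing important sits in the middle.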

Where you'll see it
Customer support refund policy enforcement
Policy buried in paragraph 5 of the context: the agent misses it and approves an $800 refund. Fix: pin the policy to the system prompt, which is always at position 0 and so keeps the policy in the highest-attention zone. Reported production impact: lost-in-the-middle errors dropped from 12% to <0.5%.
Multi-step research with current file at top
A subagent receives 20 files, and the current file is mentioned in the middle; it gets missed 40% of the time. Fix: structure the task as CURRENT_FILE: {filename}\n\nSUPPORTING_FILES: {list}, so the current file is at the top.
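A minimal sketch of that restructuring, assuming the subagent prompt is a plain string; `build_subagent_task` is a hypothetical helper, and file contents are left out to keep it short.

```python
from typing import List


def build_subagent_task(current_file: str, supporting_files: List[str], task: str) -> str:
    """Lead with the file under work; supporting files follow; the task goes last."""
    others = [name for name in supporting_files if name != current_file]  # don't list it twice
    supporting = "\n".join(f"- {name}" for name in others)
    return f"CURRENT_FILE: {current_file}\n\nSUPPORTING_FILES:\n{supporting}\n\nTASK: {task}"


files = [f"src/module_{i:02d}.py" for i in range(20)]
prompt = build_subagent_task("src/module_10.py", files, "Summarize what module_10 exports.")
print(prompt.splitlines()[0])
```

Filtering the current file out of the supporting list keeps it from reappearing in the middle, where it would compete with its own pinned mention.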
Escalation protocol buried in middle
Customer support instructions place "on payment decline, escalate to billing-team@" in paragraph 7 of 15. The agent misses it, tries to contact the customer instead, and breaches the SLA. Fix: extract the escalation rules into the system prompt or a pinned ESCALATION_PROCEDURES block.
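Extracting the rules can be as simple as rendering them into the system prompt right after the role line. A minimal sketch; the rule wording, the trigger keys, and `build_system_prompt` are hypothetical, not taken from any real instructions document.

```python
BASE_ROLE = "You are a customer support agent."

ESCALATION_PROCEDURES = {
    # Hypothetical rules extracted from the long instructions document.
    "payment_decline": "Escalate to the billing team. Do not contact the customer first.",
    "refund_over_500": "Escalate to a manager before approving.",
}


def build_system_prompt(role: str, procedures: dict) -> str:
    """Pin escalation rules in the system prompt, right after the role line."""
    rules = "\n".join(f"- {trigger}: {action}" for trigger, action in procedures.items())
    return f"{role}\n\nESCALATION_PROCEDURES (always apply):\n{rules}"


system = build_system_prompt(BASE_ROLE, ESCALATION_PROCEDURES)
print(system)
```

Send the returned string as `system=` on every call so the procedures stay at position 0 however long the conversation gets.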
Long conversation degradation
An agent loops for 10 turns and accumulates 15KB of message history. The customer ID appears in turn 1; the current question arrives in turn 10. By turn 8 the customer ID has been forgotten. Fix: window at turn 6 by extracting CASE_FACTS {customer_id, amount, dispute_summary} and dropping turns 1-5.
Code examples
```python
from anthropic import Anthropic

client = Anthropic()

SYSTEM = """You are a support agent.
CONSTRAINTS (highest attention zone):
- Refunds >$500 require manager approval (escalate via ESCALATE_TO_MANAGER tool)
- Never refund full amount for "not satisfied"; partial only
- Always verify customer ID in CASE_FACTS before processing
"""
# Note: the ESCALATE_TO_MANAGER tool definition is omitted from this sketch.


def case_facts_block(case_id: str, facts: dict) -> str:
    """Render transactional facts as a scannable block for the top of a user message."""
    return (
        "CASE_FACTS:\n"
        f"- Case ID: {case_id}\n"
        f"- Customer ID: {facts['customer_id']}\n"
        f"- Order ID: {facts['order_id']}\n"
        f"- Refund Amount: ${facts['amount']}\n"
        f"- Dispute: {facts['issue']}\n"
    )


def support_with_windowing(case_id: str, facts: dict, current_q: str) -> str:
    """Support agent with context windowing to prevent lost-in-the-middle."""
    messages = []
    max_turns = 15
    for turn in range(1, max_turns + 1):
        if turn == 1:
            # CASE_FACTS first (high attention), question last (recency).
            content = f"{case_facts_block(case_id, facts)}\nQuestion: {current_q}"
            messages.append({"role": "user", "content": content})
        elif turn == 5 and len(messages) >= 8:
            # Window at turn 5: turns 1-4 produced 8 messages. Replace them with
            # pinned CASE_FACTS plus a short summary of prior work.
            # Placeholder summary; in practice this would be a summarization call.
            prior = "Customer verified, order found, attempting refund processing."
            messages = [{
                "role": "user",
                "content": (
                    f"{case_facts_block(case_id, facts)}- Prior Work: {prior}\n\n"
                    f"Current Question: {current_q}"
                ),
            }]
        else:
            messages.append({"role": "user", "content": current_q})

        resp = client.messages.create(
            model="claude-opus-4-5", max_tokens=1024, system=SYSTEM, messages=messages,
        )
        # Join the text blocks instead of assuming content[0] is text.
        text = "".join(block.text for block in resp.content if block.type == "text")
        if resp.stop_reason == "end_turn":
            return text
        # Any other stop reason (e.g. "max_tokens") continues the loop.
        messages.append({"role": "assistant", "content": text})
        current_q = "Continue processing."
    return "Max turns exceeded."
```

Looks right, isn't
Each row pairs a plausible-looking pattern with the failure it actually creates. These are the shapes exam distractors are built from.
| Looks right | Why it isn't |
|---|---|
| Bold or capitalize important facts to make them stand out. | Emphasis doesn't change position. `**customer_id: 12345**` buried mid-context is still buried and gets no meaningful boost over plain `customer_id: 12345`. Position is what matters. |
| Add important context at the end for maximum emphasis. | The end is high-attention via recency, but reserve it for the final question or request; context placed there is treated as supporting detail. Constraints belong in the system prompt at position 0. |
| Put all context in the system prompt to ensure it's always attended. | System prompts are best kept focused (this guide's rule of thumb: ~2000 effective tokens); they're for role and constraints, not transactional facts. Use a CASE_FACTS block in the user message for facts. |
| Lost-in-the-middle is a myth. | It is empirically documented (Liu et al. 2023) and was observed across the models tested, including Claude models. Mid-context accuracy can fall by roughly 40-50 points. Mitigate with structural changes. |
| If the agent forgets a key fact, increase max_tokens. | max_tokens caps the length of the model's output; it doesn't change how the input is attended. Forgetting is an attention-weight issue, not a token-budget issue. Restructure: move the fact to the top or end. |
Side-by-side
| Aspect | Lost-in-the-Middle | System-prompt pinning | Recency bias | Windowing |
|---|---|---|---|---|
| Problem | Mid-context facts overlooked | Constraints not enforced | Old context forgotten | Long history loses facts |
| Manifestation | 40-50% drop at position 50 | Policy violations | Early-turn facts forgotten by turn 8 | Customer ID forgotten mid-loop |
| Fix | Move facts to top/bottom | Move rules to system prompt | Place question at end | Summarize + drop old |
| Cost | Restructure only | ~10% more tokens | Reorder only | Summarization call (extra turn) |
| Effort | Low | Very low | Very low | Medium |
| Effectiveness | 80-90% accuracy recovery | 99%+ enforcement | High recency weighting | 70%+ accuracy recovery |
Decision tree
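- Have constraints (policies, rules) that must always apply? Pin them in the system prompt.
- Have transactional facts that must survive every turn? Put them in a CASE_FACTS block at the top of the user message.
- Agent loop running >5 turns? Window: summarize earlier turns into PRIOR_WORK and keep CASE_FACTS.
- Answer to the question buried in middle context? Move it to the top and put the question at the end.
- Agent forgetting details despite them being in earlier turns? Window the history and restate those details in CASE_FACTS; raising max_tokens won't help.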
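The decision tree can also be read as a small rule table. A minimal sketch, assuming each question has already been answered as a boolean or a turn count; `recommend` is a hypothetical helper and its fix strings paraphrase this page.

```python
from typing import List


def recommend(
    has_standing_rules: bool,
    has_transactional_facts: bool,
    loop_turns: int,
    key_info_mid_context: bool,
    forgetting_earlier_details: bool,
) -> List[str]:
    """Map the decision-tree questions to the structural fixes on this page."""
    fixes = []
    if has_standing_rules:
        fixes.append("Pin rules in the system prompt")
    if has_transactional_facts:
        fixes.append("Put facts in a CASE_FACTS block at the top of the user message")
    if loop_turns > 5 or forgetting_earlier_details:
        fixes.append("Window: summarize old turns into PRIOR_WORK, keep CASE_FACTS")
    if key_info_mid_context:
        fixes.append("Move the key info to the top; put the question at the end")
    return fixes


print(recommend(True, True, 8, False, False))
```

None of the branches suggests formatting or a larger max_tokens: every fix is a change of position.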
Frequently asked
Does Lost-in-the-Middle affect Claude differently than other models?
What percentage drop in middle?
Fix Lost-in-the-Middle by increasing max_tokens?
Always use a CASE_FACTS block?
When to start windowing?
How to write a good CASE_FACTS block?
Write "- Customer ID: 12345", not "The customer has ID 12345": scannable, one fact per line.
What goes in system prompt vs CASE_FACTS?
Does formatting (bold, capitals) help mid-context facts?
Use a CASE_FACTS block in subagent calls?
How long should CASE_FACTS be?