On this page
TLDR
Session state is the running message list (and any external persistence) that gives an agentic loop continuity. The exam tests progressive summarization tradeoffs (token savings vs. precision loss) and when checkpointing or external store reads are required. agentic-systems research
What it is
A session is the persistent context window in which an agentic task runs. It holds the message list, accumulated file reads, tool calls, and responses. The mental model: a session is a mutable, bounded space (typically 100K to 200K tokens) where the model builds and refines its understanding. Unlike chat conversations that survive indefinitely, a session is scoped to a task.
Session state management is the architectural constraint that makes or breaks long-running agentic systems. The model reads the entire session history on every turn, so every tool call, every file read, every intermediate result consumes tokens. As the session grows, the model eventually hits stop_reason: "max_tokens" mid-task. The fix is progressive state management: track what to keep (case facts), what to compress (verbose reasoning), and what to page (file contents).
Three layers govern state lifecycle in production. In-session: as work progresses, intermediate outputs can be summarized; critical facts (customer ID, refund amount) stay in a case-facts block. Inter-session: when a task ends, context can persist via the project Instructions panel or working files. Subagent handoff: a subagent runs its own session, then returns only a structured summary, not its full reasoning.
The deepest anti-pattern is progressive summarization without immutable facts. Repeatedly summarizing erases specific transactional details. A refund of $247.83 becomes "around $250". A customer ID becomes "the account mentioned earlier". The fix is structural: extract immutable case facts once at the start, preserve them verbatim through every summarization pass. Summaries compress reasoning, never facts.
How it works
Session state begins with initialization. In Claude Code, a session starts when you create a task; the harness reads project Instructions, loads global preferences, and appends the user's task description. For subagents, initialization is different: the parent agent writes a focused prompt, and the subagent boots with a custom system prompt. Every piece of context the subagent needs must be explicitly passed in the delegating prompt, never assumed.
As the session runs, state accumulates and competes. Each tool call appends a request and a result to the message list. By turn 8, the list contains 7 turns of requests, 7 turns of responses, and all intermediate outputs. Each turn the model reads this entire history. The session's effective capacity shrinks. The model's reasoning quality also degrades, a phenomenon called lost in the middle: key facts buried in a 30-turn conversation get overlooked.
State checkpoints occur at natural boundaries. When a research subagent finishes, it returns a structured summary (findings, sources, obstacles_encountered) instead of its 30-turn conversation. The parent appends the summary, discards the subagent's full context, and continues. For long document-processing tasks that hit max_tokens, the harness should summarize processed chunks into a checkpoint file, reset the session, reload the checkpoint, and continue.
Persistence across sessions happens via project Instructions and working files. If a subagent discovers a workaround (an API needs a special flag), it reports this in obstacles_encountered; the parent decides whether to add it to project Instructions for future tasks. Running notes, glossaries, and context that should survive across tasks live in the project folder, not in the conversation. This is how context carries across sessions without re-explaining the same things each time.

Where you'll see it
Customer support state through 12-turn refund flow
Turn 1 calls get_customer, turn 2 calls lookup_order, turn 3 calls get_refund_policy. By turn 8 the context is half full. Critical facts (customer_id, order_id, refund_amount) are scattered across turns. The fix is to pin a case-facts block at the context start. Every turn, the model reads this block first; the facts never degrade.
Multi-agent research with structured handoffs
A coordinator delegates research_market_size, analyze_competitors, assess_regulatory_risk to three subagents. Each returns a summary with structured sources array. The coordinator can compare sources and resolve contradictions. Without structured provenance, the coordinator either picks majority vote (discarding outliers) or escalates without information.
Long document processing hits max_tokens at chapter 18
A 150-page contract needs entity extraction. By turn 15 the session has read 150 pages. Turn 16 fails with stop_reason: "max_tokens". The harness writes accumulated entities to a checkpoint, resets the session with a fresh instruction "Resume entity extraction. Previously extracted: ...", and continues. Checkpointing prevents restart-from-scratch on long batch tasks.
Cowork projects carry context across tasks
Day 1: synthesize three vendor proposals into an evaluation matrix. Day 2: update the Q2 headcount plan. Tasks 1 and 2 are separate sessions. But the project's Instructions say vendor matrix lives in ./proposals/matrix.xlsx. When the analyst says "compare new budget against approved vendor costs," the model reads instructions and knows where to look. State persists in Instructions and folder, not in the conversation.
Code examples
from anthropic import Anthropic
import json
client = Anthropic()
def run_support_task(customer_msg: str, case_facts: dict):
"""Pin immutable facts in system prompt, never summarize them."""
system_prompt = f"""You are a customer support agent.
CASE FACTS (immutable, update never):
{json.dumps(case_facts, indent=2)}
Always refer to these facts. Do not summarize them. If they change,
explicitly request an update. Do not infer changes from conversation."""
messages = [{"role": "user", "content": customer_msg}]
for turn in range(10):
resp = client.messages.create(
model="claude-opus-4-5",
max_tokens=2048,
system=system_prompt,
messages=messages,
)
if resp.stop_reason == "end_turn":
return {"status": "ok", "turns": turn + 1}
messages.append({"role": "assistant", "content": resp.content})
# ... handle tool_use, append tool_result ...
return {"status": "max_iterations"}
# Initialize with immutable facts
facts = {
"customer_id": "cust_12345",
"order_id": "ord_98765",
"refund_amount_requested": 247.83,
"policy_max_refund": 500.00,
}
run_support_task("Refund order #98765", facts)Looks right, isn't
Each row pairs a plausible-looking pattern with the failure it actually creates. These are the shapes exam distractors are built from.
Summarize the conversation every 5 turns to save tokens.
Summarization erases specific facts (amounts, IDs, dates). Immutable case facts must be extracted once and pinned, never summarized. Summarize the reasoning, not the facts.
Use a larger model or increase max_tokens when a session runs out of context.
Larger models have larger windows but the problem recurs at higher scale. The fix is active state management: checkpoint, trim, page. Token limits enforce discipline; raising them hides the root cause.
Store subagent context in the parent's message list so the parent can see the reasoning.
Subagents return summaries, not full context. The parent's context should stay clean. If you need reasoning, ask the subagent to include obstacles_encountered in its output format.
When a task completes, save the entire conversation as a backup.
The conversation is a tool-littered record. Save structured results (extracted entities, decisions, recommendations) in clean format. The conversation has no lasting value; the decisions do.
Pass full session history to a subagent so it has all context.
Full history bloats subagent context and dilutes focus. Pass a focused prompt with extracted facts and the specific subtask. Subagents work better with less, not more.
Side-by-side
| Pattern | Session Duration | Primary Risk | Fix |
|---|---|---|---|
| Pinned case-facts block | Any length | Facts eroded by summarization | Extract once, pin in system context |
| Subagent summary | Multi-agent | Parent rediscovers same problems | Structured obstacles_encountered field |
| Checkpoint and resume | Long batch tasks | max_tokens mid-task | Save state, reset session, reload |
| Lost-in-the-middle mitigation | 10+ turns | Key facts buried | Anchor at top, trim verbose outputs |
| Project Instructions persistence | Multi-task | Shorthand fails across tasks | Persist in Instructions and folder files |
| Structured escalation block | Any length | Human spends time decoding context | Pre-structured handoff with all facts |
Decision tree
Will the task run more than 8 turns?
Is this single-agent or multi-agent?
obstacles_encountered. Parent stays clean via summaries.Does the work span multiple tasks (Task 1 ends, Task 2 begins)?
Is there an irreversible action (payment, deletion)?
Is the session at risk of hitting max_tokens?
Question patterns

66 V2 questions wired to this concept. Tap an answer to check it instantly — you'll see whether it's right and why — then expand the full breakdown for the mental model and all four rationales.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
60 additional questions for this concept live in the practice pillar. Take a mock exam ↗
Frequently asked
What's the difference between summarization and pinning case facts?
customer_id, refund_amount) in original precision. Both can coexist: summarize reasoning, pin the facts.When should subagent context be discarded?
Is max_tokens a session-state problem or model selection?
max_tokens hides the issue. The real problem is active state management: checkpoint early, trim outputs, page data. A well-managed session stays in budget.If a subagent discovers a workaround, where does it go?
obstacles_encountered. The parent reads it and decides whether to add it to project Instructions for future tasks.How does Cowork's Instructions panel differ from system prompt?
What happens to context when a Cowork task completes?
Is progressive summarization always bad?
When should I send an escalation block instead of full transcript?
How does checkpointing differ from saving the message list?
What is lost-in-the-middle and how do I prevent it?
Work this with your AI
Work this concept hands-on with Claude Code, Codex, or claude.ai. Copy a prompt, paste it into your assistant, and practise in tandem. Each one keeps you active (explain it back, get drilled, or build) rather than just reading.
- Drill it like the exam (scenario MCQs)Practice in the exam's scenario-MCQ format with trap awareness.
- Explain it back (Feynman)Build durable, transferable understanding of a concept you can half-state.
- Test me, adapting the difficultyActive recall practice on a concept you think you know.
- Check my prerequisites firstBefore studying a concept that keeps not sticking.
- Find the high-leverage 20%When a domain feels too big and you are short on time.
