Template L · Concept · D1 Agentic Architectures

Agentic Loops.

01 · Summary

TLDR

An agentic loop turns one API response into a controlled, multi-turn system. The harness reads stop_reason, dispatches tools, appends observations, and decides whether to continue. The common failure is treating natural-language text as the termination signal.

At a glance: 4 loop stages · exam domain D1 · 3 core exit lines · 0 text-parsing tolerance · 6 scenario links
02 · Definition

What it is

An agentic loop is a harness pattern that turns a single API call into a controlled, multi-turn system. The mental model: send a messages.create() request, read the response, branch on stop_reason, and either exit or append the tool result and call again. The user perceives one fluid agent conversation. The engineering surface is a plain while-loop, a growing message list, and a single termination check, nothing more, nothing less.

What makes the loop agentic rather than ordinary is that the model decides the next step, not the developer. Each turn, Claude inspects the message history and the available tools, then either returns a final response (stop_reason: "end_turn") or requests one or more tool executions (stop_reason: "tool_use"). Your harness executes the tools, appends tool_result blocks, and loops. The model's autonomy is bounded by the tools you provided, the budget you allotted (max_tokens, max_iterations), and the stop conditions you respect.

The contract is structural, not linguistic. Claude communicates termination through the stop_reason field, a small enum (end_turn, tool_use, max_tokens, stop_sequence). It does not communicate termination through the natural language of the response. A response can contain the words "I'm done" while stop_reason is still tool_use. A response can be silent on completion while stop_reason is already end_turn. Always read the field, never the text.
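A minimal sketch of why the contract matters, using a plain dict shaped like a Messages API response (the response body and tool names are invented for illustration):

```python
# A response can say "I'm done" in its text while the structural
# signal still requests another tool call. Only stop_reason decides.
fake_response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "I'm done verifying; processing the refund now."},
        {"type": "tool_use", "id": "toolu_01", "name": "process_refund",
         "input": {"order_id": "ord_123"}},
    ],
}

def should_exit(resp: dict) -> bool:
    """Correct check: read the field, never the text."""
    return resp["stop_reason"] == "end_turn"

def should_exit_buggy(resp: dict) -> bool:
    """Text-shape parsing: the classic early-exit bug."""
    return "done" in resp["content"][0]["text"].lower()

assert should_exit(fake_response) is False        # keep looping
assert should_exit_buggy(fake_response) is True   # would exit early
```

The same response satisfies the buggy check and fails the correct one, which is exactly the refund-agent failure described below.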

Production agentic loops fail in three predictable ways: text-shape parsing (treating the presence or absence of text as a completion signal), iteration caps as primary control (using max_iterations = 10 instead of fixing the root cause of a runaway loop), and missing tool_result appends (forgetting to push the tool execution result back into the message list, which makes Claude re-request the same tool indefinitely). Each is a one-line bug, and each is the most-tested distractor pattern on Domain 1 of the exam.

03 · Mechanics

How it works

The loop runs in four conceptual stages per iteration. First, Think: Claude reads the cumulative message list, evaluates the request against available tools, and forms a plan. Second, Act: if a tool call is needed, Claude emits a tool_use block with a name and structured input. Third, Observe: your harness executes the tool, captures the result (or error), and appends a tool_result block to the message list. Fourth, Decide: Claude inspects the new history and either continues calling tools or returns a final answer. The stop_reason field surfaces the decision.

The message list grows monotonically. Every assistant response is appended; every tool execution result is appended. By turn 8, the list contains the full conversation: the original user message, 7 assistant responses, and 7 batches of tool results. This is what makes loops context-bound: you can't run forever, because each turn enlarges the prompt, and eventually you hit max_tokens or context limits. Plan the budget before you start the loop, not when it's already three turns in.
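The growth is easy to model with plain lists, no API call needed (block contents here are placeholders):

```python
messages = [{"role": "user", "content": "Extract entities from the contract."}]

for turn in range(1, 8):
    # Each iteration appends one assistant turn and one batch of tool results.
    messages.append({"role": "assistant",
                     "content": [{"type": "tool_use", "id": f"toolu_{turn}",
                                  "name": "mark_entity",
                                  "input": {"chunk": turn}}]})
    messages.append({"role": "user",
                     "content": [{"type": "tool_result",
                                  "tool_use_id": f"toolu_{turn}",
                                  "content": "spans for this chunk"}]})

# After 7 turns: 1 original user message + 7 assistant turns + 7 result batches.
assert len(messages) == 15
```

Every one of those 15 entries is re-sent on the next request, which is where the token budget goes.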

Tool execution is your responsibility, not the model's. The tool_use block tells you what Claude wants to call (name, input JSON); your harness wraps the actual function call in try/except, validates inputs, and returns either the result or a structured error message. Claude does not see exceptions; it sees whatever string you place in tool_result.content. If you swallow errors silently, Claude will retry the same broken tool call. Always return structured error messages so Claude can recover.
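One way to guarantee Claude always sees a structured string is a small wrapper; `run_tool` and the example `lookup_order` tool are illustrative, not part of any SDK:

```python
import json

def run_tool(fn, tool_input: dict) -> str:
    """Run a tool and always return a JSON string for tool_result.content."""
    try:
        return json.dumps({"ok": True, "result": fn(**tool_input)})
    except Exception as e:
        # Structured, actionable error: Claude can read it and adjust arguments.
        return json.dumps({"ok": False, "error": type(e).__name__,
                           "detail": str(e)})

def lookup_order(order_id: str) -> dict:
    if not order_id.startswith("ord_"):
        raise ValueError("order_id must start with 'ord_'")
    return {"order_id": order_id, "status": "shipped"}

assert json.loads(run_tool(lookup_order, {"order_id": "ord_42"}))["ok"] is True
assert json.loads(run_tool(lookup_order, {"order_id": "42"}))["ok"] is False
```

The failure branch still returns a string, so the tool_result append never gets skipped on error.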

Termination is a single field check, repeated per iteration. end_turn means the model is done; you exit the loop and present the final text. tool_use means the model wants more tools; you execute, append, continue. max_tokens means the model exhausted its output budget mid-task; you save the partial result and decide whether to chunk or raise the limit. stop_sequence means a custom stop string matched (only relevant if you set one). Anything else is an error worth logging, but in practice, you'll see only these four values.
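The four-way branch can be written as a plain mapping; the action names are illustrative labels, not API values:

```python
# The four documented stop_reason values, each mapped to a harness action.
ACTIONS = {
    "end_turn": "exit_and_return_text",
    "tool_use": "execute_tools_and_continue",
    "max_tokens": "save_partial_and_decide",
    "stop_sequence": "handle_custom_stop",
}

def next_action(stop_reason: str) -> str:
    # Anything outside the known enum is worth logging, not crashing on.
    return ACTIONS.get(stop_reason, "log_unexpected_stop_reason")

assert next_action("end_turn") == "exit_and_return_text"
assert next_action("unknown_reason") == "log_unexpected_stop_reason"
```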

04 · In production

Where you'll see it

Customer support refund flow

A support agent receives a refund request. Turn 1 calls verify_customer. Turn 2 calls lookup_order. Turn 3 calls process_refund with policy hooks attached. Each turn, the harness inspects stop_reason. The most frequent production bug is checking the response text for the word "done" instead: Claude often says "I'm processing your refund" (text) while still emitting a tool_use block (the real next step), so the loop exits early and the refund never executes. The fix is one line: branch on stop_reason, not on content[0].type.

Multi-agent research orchestration

A coordinator delegates papers, regulations, and competitor analysis to three research subagents in parallel. The coordinator's loop polls each subagent's stop_reason. When a subagent returns end_turn, the coordinator collects the structured result; when tool_use, it forwards the call to the right tool registry. The trap here is a missing tool_result append between subagent turns: the coordinator forgets to push the result back into the subagent's message list, the subagent re-requests the same tool, and the loop runs until max_iterations saves it. The cap masks the real bug.

Headless CI code-review bot

GitHub Actions invokes Claude with claude -p on a PR diff. The agent calls Read, Grep, and Bash repeatedly to inspect changed files, run tests, and produce a structured review JSON. On large PRs, stop_reason: "max_tokens" arrives mid-review; the bot must emit a partial JSON summary with verdict: "hold" and a truncated: true flag, then exit cleanly. Production fails when the bot treats text presence as success: it emits JSON with an empty findings: [] and a stale verdict: "approved" field, silently green-lighting a PR it never finished reading.
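A sketch of the clean-exit path for that scenario; the verdict and truncated field names follow the description above, but the exact schema is this bot's own convention, not a standard:

```python
import json

def finalize_review(stop_reason: str, findings: list) -> str:
    """Serialize the review, downgrading to 'hold' on a truncated run."""
    if stop_reason == "max_tokens":
        # Partial review: never green-light a PR the agent didn't finish reading.
        return json.dumps({"verdict": "hold", "truncated": True,
                           "findings": findings})
    return json.dumps({"verdict": "approve" if not findings else "request_changes",
                       "truncated": False, "findings": findings})

partial = json.loads(finalize_review("max_tokens", []))
assert partial["verdict"] == "hold" and partial["truncated"] is True
```

The key design choice: the truncation branch runs before any verdict logic, so an empty findings list can never be mistaken for a clean approval.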

Long-document entity extraction

A 200-page contract gets entity-extracted by a single agent loop. Each turn calls mark_entity on a document chunk; the harness accumulates marked spans. Around chunk 18, stop_reason: "max_tokens" arrives because the assistant message list now contains 17 turns of tool_use + tool_result and there's no room for a 19th. The fix is context windowing: at every checkpoint, summarize earlier turns into a single facts block, drop the verbose history, and continue. Without windowing the loop dies at the same point in every document, regardless of model size.
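A minimal windowing pass over the message list; the summary here is a stub string, where in practice you would ask the model to produce the facts block:

```python
def window_messages(messages: list, keep_last: int = 4) -> list:
    """Collapse all but the last `keep_last` turns into one summary message."""
    if len(messages) <= keep_last + 1:
        return messages
    head, tail = messages[0], messages[-keep_last:]
    n_dropped = len(messages) - 1 - keep_last
    facts = {"role": "user",
             "content": f"[Facts from {n_dropped} earlier turns: "
                        "accumulated entity spans so far]"}
    return [head, facts] + tail

long_history = [{"role": "user", "content": "extract"}] + [
    {"role": "assistant", "content": f"turn {i}"} for i in range(30)]

# Original user msg + 1 summary + last 4 turns = 6 entries, however long the run.
assert len(window_messages(long_history)) == 6
```

Run at each checkpoint, this keeps the prompt size roughly constant instead of monotonically growing, which is why the loop no longer dies at the same chunk every time.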

Operations runbook executor

An on-call assistant runs a deployment runbook against a Kubernetes cluster. Each turn calls one of kubectl_describe, read_logs, restart_pod, or verify_health. The loop halts on stop_reason: "end_turn" after the assistant confirms the service is healthy. The common production bug is a missing tool_result append: the runbook calls restart_pod, the harness restarts the pod, but forgets to push the result back. The model then re-issues the same restart on the next turn and bounces the same pod three times before a human intervenes.

Conversational data analyst

A finance team asks an analytics assistant questions in natural language. The loop branches: simple aggregates resolve in two turns (one query_warehouse call plus a summary); cohort and funnel analyses span eight to twelve turns with intermediate build_view and validate_metric calls. The assistant terminates only when stop_reason: "end_turn" arrives with a chart spec attached. Iteration caps would silently truncate the cohort analyses; using stop_reason lets simple and complex questions share the same loop.

05 · Implementation

Code examples

Canonical agentic loop with stop_reason control
import json

from anthropic import Anthropic

client = Anthropic()

def run_agent(user_msg: str, tools: list, system: str, max_iter: int = 15):
    messages = [{"role": "user", "content": user_msg}]

    for i in range(max_iter):
        resp = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=2048,
            system=system,
            tools=tools,
            messages=messages,
        )

        # CRITICAL: branch on stop_reason. Never on text content.
        if resp.stop_reason == "end_turn":
            text = "".join(b.text for b in resp.content if b.type == "text")
            return {"status": "ok", "text": text, "iterations": i + 1}

        if resp.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": resp.content})
            tool_results = []
            for block in resp.content:
                if block.type == "tool_use":
                    try:
                        # execute_tool: your own dispatch from tool name to function
                        result = execute_tool(block.name, block.input)
                    except Exception as e:
                        result = {"error": str(e)}
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result),
                    })
            messages.append({"role": "user", "content": tool_results})
            continue

        if resp.stop_reason == "max_tokens":
            return {"status": "partial", "iterations": i + 1,
                    "reason": "token_limit"}

        return {"status": "error", "stop_reason": resp.stop_reason}

    return {"status": "error", "reason": "iteration_cap_exceeded"}
Branch on stop_reason. Wrap tool execution in try/except; Claude needs structured errors to recover.
06 · Distractor patterns

Looks right, isn't

Each row pairs a plausible-looking pattern with the failure it actually creates. These are the shapes exam distractors are built from.

Looks right

Check if the response text contains "done" or "complete" to know when to stop.

Actually wrong

Natural-language phrases vary across generations and across models. Claude may say "I'm done now" while still emitting a tool_use block in the same response. Only stop_reason is authoritative; the text is the user-facing rendering, not the control signal.

Looks right

Set max_iterations = 10 to prevent runaway loops.

Actually wrong

Caps mask the real failure (missing tool_result append, ambiguous tool description, two tools that look interchangeable). A legitimate refund flow may need 12 iterations on a complex case. The cap converts a bug into a silent partial success. Fix the cause; don't bound the symptom.

Looks right

If response.content[0].type == "text", the agent is finished.

Actually wrong

Claude returns text and tool_use blocks in the same response. A typical assistant turn looks like [text, tool_use, tool_use]; the leading text is preamble ("I'll start by checking the order"), not termination. Only stop_reason == "end_turn" is the exit signal.

Looks right

When stop_reason: "max_tokens" arrives, the loop crashed and we should retry from scratch.

Actually wrong

max_tokens is a normal partial-result signal: the model hit the output budget mid-task. Save the partial work, raise max_tokens for the retry, or chunk the input. Restarting from scratch loses real progress and burns budget twice.

Looks right

Wrap the entire tool-execution block in try/except and return a generic "tool failed" string to Claude.

Actually wrong

Generic errors strip the information Claude needs to recover. Return structured errors like {"error": "customer_id not found", "hint": "verify format starts with cus_"}. Claude reads them, adjusts arguments, and retries successfully; that is the whole point of the loop.

07 · Compare

Side-by-side

| Aspect | Agentic Loop | Single-Turn Call | Subagent-Delegated | Workflow Orchestrator |
|---|---|---|---|---|
| Termination signal | stop_reason='end_turn' | Single response, exit | Subagent loop; coordinator polls | Step-graph completion |
| Tool execution | Multi-turn, results fed back | No tools | Subagent owns tools; returns summary | Coordinator routes per-step |
| Context fill | Grows monotonically per turn | None | Main stays clean; subagent context discarded | Per-step contexts; reset between |
| Failure mode | Text-shape parsing, missing tool_result | n/a | Coordinator assumes inheritance | Step config drift over time |
| Best for | Multi-step tasks (3–15 turns) | Simple Q&A, classification | Verbose work, parallel research | Fixed multi-step pipelines |
| Cost shape | Linear in turns × tokens | Constant per call | Coordinator + N subagent budgets | Per-step + orchestration overhead |
08 · When to use

Decision tree

01

Does the task need multiple tool calls to reach a result?

Yes → Use an agentic loop. Branch on stop_reason; never check text shape.
No → Use a single-turn call: messages.create() once, extract the response.
02

Is the work verbose or independent (e.g., reading 50 files)?

Yes → Delegate to a subagent. Pass explicit context as a task string; the coordinator stays clean.
No → Run inline. The coordinator handles the loop directly.
03

Are you hitting stop_reason: "max_tokens" mid-loop?

Yes → Implement context windowing: summarize old turns into a facts block, drop the verbose history, continue.
No → Token budget is healthy. Carry on.
04

Are tools throwing exceptions inside the harness?

Yes → Wrap each tool in try/except. Return structured errors as tool_result so Claude can recover.
No → Don't add error handling for failures that can't happen; trust the framework.
05

Is the same tool being called repeatedly with no progress?

Yes → Check that you're appending the tool_result block. Without it, Claude re-requests indefinitely.
No → The loop is converging. Let it run.
09 · On the exam

Question patterns

Your agentic loop keeps running after Claude has clearly finished its task. Which control was likely missed?
The harness was checking content shape (text presence) instead of stop_reason. The most-tested distractor is "add a max_iterations cap". The right answer is "branch the loop on stop_reason == 'end_turn'": a structural field, never natural language.
A subagent returns wrong facts about a customer the coordinator clearly mentioned earlier. Why?
The coordinator forgot that subagents do not inherit conversation history. The distractor answer says "the subagent's max_tokens was too low". The actual fix is to pass every needed fact in the subagent's task string explicitly.
An agent loop hits a budget ceiling and you keep raising `max_iterations` to make it pass. What is the real problem?
max_iterations is a safety cap, not the primary exit condition. Repeatedly raising it masks the underlying bug, usually a missing tool_result append, ambiguous tool descriptions, or two tools the model alternates between.
Your refund agent emits the words "I'm processing your refund now" but then makes another tool call. The harness exits early. Why?
The harness is parsing the response text for completion phrases. Claude can return text and a tool_use block in the same response; only stop_reason is authoritative. The fix is to ignore the text and read stop_reason: "tool_use" to continue the loop.
Long-document extraction stalls at chunk 18 every single time, regardless of model. What is happening?
By turn 17 the message list contains 17 turns of tool_use plus tool_result blocks; turn 18 hits stop_reason: "max_tokens". The fix is context windowing: summarize old turns into a facts block, drop the verbose history, continue. Raising max_tokens does not fix it past a certain point.
Your code-review CI bot returns valid JSON with empty `findings: []` even on PRs that clearly have issues. Why?
The bot is treating the presence of any text response as success. It hit stop_reason: "max_tokens" mid-review, but the code did not branch on that signal, so it serialized whatever was in the buffer, which happened to be empty. The fix is to handle max_tokens as a partial-result signal and emit truncated: true.
A tool throws an exception inside your harness; on the next iteration Claude requests the same tool again with the same arguments. Why?
Either the exception escaped to the loop and was caught silently, or the harness wrote a generic "tool failed" to tool_result. Claude saw no actionable error and re-requested. The fix is to catch the exception and return a structured error with a hint, e.g. {"error": "customer_not_found", "hint": "verify cus_ prefix"}.
Two of your tools have similar names (`fetch_data` and `get_data`). The model picks the wrong one 30% of the time. Best first fix?
Rewrite the descriptions, not the model. Each description should be 4 lines: what it does, when to use it, edge cases, and ordering boundaries with other tools. Bigger models route slightly better, but vague descriptions are the root cause; fine-tuning is a last resort.
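A sketch of that four-line description format applied to one of the ambiguous pair; the schema and wording are illustrative, only the tool names come from the question:

```python
fetch_data = {
    "name": "fetch_data",
    "description": (
        "Fetches raw rows from the external partner API by record ID. "         # what
        "Use when the record is NOT yet in the local warehouse. "               # when
        "Edge case: returns at most 100 rows per call. "                        # edges
        "For anything already ingested locally, call get_data instead."         # ordering
    ),
    "input_schema": {"type": "object",
                     "properties": {"record_id": {"type": "string"}},
                     "required": ["record_id"]},
}

# The description answers: what, when, edge cases, boundary with sibling tools.
assert "get_data" in fetch_data["description"]
```

Writing the same four lines for get_data, with the boundary stated from its side, is what actually resolves the 30% misrouting.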
10 · FAQ

Frequently asked

How is an agentic loop different from a normal API call?
A normal call is one prompt → one response. An agentic loop is a while block that re-sends the cumulative message list after every tool result. State accumulates across turns; the model decides each next step until stop_reason: "end_turn".
Where do hooks (PreToolUse, PostToolUse) fit relative to the loop?
Hooks run inside the tool_use branch, before or after your harness executes the tool. They validate arguments, enforce policy, or log audit data. They do not replace stop_reason-based termination; that's the loop's responsibility.
What's the difference between `stop_reason` and `stop_sequence`?
stop_reason is the field that says *why* the model stopped (end_turn, tool_use, max_tokens, stop_sequence). stop_sequence is the matched string when stop_reason == "stop_sequence". You only see it if you passed a stop_sequences parameter in the request.
Should I always call `tool_use` and never use `tool_choice: "any"` or forced?
Use tool_choice: "auto" (the default) for open-ended loops. Use forced ({type: "tool", name: X}) only for mandatory first steps like identity verification. "any" and forced are incompatible with extended thinking; see /concepts/tool-choice.
Can a single response have both text and tool_use blocks?
Yes. A typical assistant message is [text, tool_use] or [text, tool_use, tool_use]. The text is preamble ("Let me check that for you"); the tool_use is the action. Always check `stop_reason`, not block presence.
What's the right value for max_iterations?
It depends on the task. 5 is too tight for multi-step refunds; 50 is paranoid. A sensible default: 15 for support flows, 30 for research. The cap is a safety net, not the primary control; use stop_reason for that.
How do I debug a loop that runs forever?
Log every iteration's stop_reason and the names of any tool_use blocks. The two most common causes are: (1) you're not appending tool_result to the message list, so the model keeps requesting the same tool; (2) two tools have overlapping descriptions and the model alternates between them.
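A per-iteration trace line is often all the debugging you need; this helper works on any response-shaped dict (the sample response is invented):

```python
def trace(iteration: int, resp: dict) -> str:
    """One log line per turn: the stop_reason plus any requested tool names."""
    tools = [b["name"] for b in resp["content"] if b.get("type") == "tool_use"]
    return f"iter={iteration} stop_reason={resp['stop_reason']} tools={tools}"

resp = {"stop_reason": "tool_use",
        "content": [{"type": "text", "text": "Checking the order..."},
                    {"type": "tool_use", "name": "lookup_order",
                     "id": "toolu_01", "input": {}}]}

assert trace(3, resp) == "iter=3 stop_reason=tool_use tools=['lookup_order']"
```

Both failure signatures are visible in this output: a missing tool_result append shows the same tool name on every line, and overlapping descriptions show two names alternating.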
Should I retry on `stop_reason: "max_tokens"`?
Don't retry blindly; you'll burn budget. Save the partial work, raise max_tokens if the input was just slightly over, or chunk the input if it's structurally too large. Treat it as normal partial completion, not failure.
How do agentic loops interact with prompt caching?
The system prompt and tool definitions can be cached; they don't change between turns. Mark them with cache_control: {type: "ephemeral"} and pay roughly 90% less for that section on every turn. The growing message list changes every turn, so its tail gains far less from caching.
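The request shape looks roughly like this; the block only builds the payload (no API call), cache_control placement follows Anthropic's published prompt-caching pattern, and the prompt text and tool schema are illustrative:

```python
request_kwargs = {
    "model": "claude-opus-4-5",
    "max_tokens": 2048,
    # System prompt and tool definitions are stable across turns: cache them.
    "system": [{"type": "text",
                "text": "You are a refund agent. Follow policy strictly.",
                "cache_control": {"type": "ephemeral"}}],
    "tools": [{"name": "lookup_order",
               "description": "Look up an order by ID.",
               "input_schema": {"type": "object", "properties": {}},
               "cache_control": {"type": "ephemeral"}}],
    # The growing message list changes every turn, so it carries no cache marker.
    "messages": [{"role": "user", "content": "Refund order ord_42."}],
}

assert request_kwargs["system"][0]["cache_control"] == {"type": "ephemeral"}
```

On each loop iteration you re-send the same system and tools sections, so the cached prefix hits every turn.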
What if my agent needs to keep state across user sessions?
Loops handle in-session state. Cross-session state (e.g., conversation history that survives a page refresh) lives in your database; you reload it into messages at session start. The loop doesn't manage persistence; that's your application's job.