On this page
TLDR
stop_reason is the authoritative struct field that says why Claude stopped, either end_turn (finished) or tool_use (wants to call a tool). Parsing natural-language phrases for termination is the most-tested distractor; it's unreliable and fails in production. Checking stop_reason is the only deterministic loop control. messages API enum
What it is
stop_reason is a four-value enum field on every messages.create() response that signals why Claude stopped generating. The values you act on: end_turn (model is done), tool_use (model wants tools executed), max_tokens (output budget exhausted), and stop_sequence (custom stop string matched). It is the authoritative termination signal in any agentic loop.
The contract is structural, not linguistic. A response can contain the words "I'm done" while stop_reason is still tool_use, the text is preamble ("let me verify the customer first"), the tool_use is the real action. Conversely, a response can be silent on completion while stop_reason: "end_turn" signals it. Always read the field; never read the text.
The two production branches you write code for every day are end_turn (exit the loop) and tool_use (execute tools, append results, continue). Everything else is either a graceful partial (max_tokens) or a rare edge case (stop_sequence). The exam drills these four branches relentlessly because text-shape parsing is the #1 bug: developers check content[0].type === "text" to decide "the agent is done," but a typical assistant message is [text, tool_use, tool_use].
`max_tokens` is not an error, it's a normal partial-result signal. When the output budget is exhausted mid-task, Claude emits stop_reason: "max_tokens" with whatever was generated so far. Save the partial work, then either raise max_tokens for a retry or chunk the input. Treating it as a crash loses real progress. The exam tests whether you recognize partial completion as a design decision, not a failure mode.
How it works
Every messages.create() call returns a Message object with a stop_reason field. The value is set by Claude during generation: if the model finishes its response, stop_reason is end_turn; if it requests a tool, the first tool_use block is appended to content and stop_reason becomes tool_use; if output reaches max_tokens, generation stops there. The field is always present and always one of these four values.
The canonical branching pattern: if stop_reason == "end_turn" then exit else if stop_reason == "tool_use" then execute and continue else if stop_reason == "max_tokens" then save partial else if stop_reason == "stop_sequence" then inspect match. Missing a branch is a logic bug, it crashes silently or retries indefinitely. TypeScript's discriminated unions make this safer: the compiler catches missed cases.
Tool execution happens only when stop_reason == "tool_use". In that case, content is an array containing one or more tool_use blocks (with name, id, input JSON). Your harness iterates, executes the tool, appends a tool_result block to the message list. The list grows: [user_msg, assistant_resp_1, tool_results_1, assistant_resp_2, tool_results_2, ...]. Growth is bounded by context size and by the stop_reason signal.
The max_tokens parameter sets the output budget per turn, not globally. If you ask for 2048 tokens and the model runs out at 1800, stop_reason: "max_tokens" arrives. The next call with max_tokens=4096 gets a fresh budget. This distinction matters: `max_tokens` is per-turn, not cumulative. Many developers confuse this and assume a single max_tokens value caps the entire loop. It does not.

Where you'll see it
Chatbot turn termination
User asks for an account balance. Agent calls get_balance. Response: stop_reason='tool_use'. Agent executes, appends result. Next turn: stop_reason='end_turn' with the answer. If code instead checks content[0].type === 'text', it exits at any text presence, even mid-tool-use.
Long-document extraction with token limits
Extracting entities from a 200-page contract. Mid-extraction, stop_reason='max_tokens' arrives. Code must save state, return partial result, and either chunk or raise the limit. Treating max_tokens as success silently truncates output.
Headless CI run
Claude in -p mode emits a single response. CI script reads stop_reason from the JSON output. end_turn → success path; max_tokens → truncated; stop_sequence → matched a custom guard. The script must branch all four outcomes.
Code examples
from anthropic import Anthropic
client = Anthropic()
def handle_response(resp) -> dict:
"""Branch on stop_reason. Never inspect content shape for termination."""
if resp.stop_reason == "end_turn":
# Normal completion. Extract text and exit loop.
text = "".join(b.text for b in resp.content if b.type == "text")
return {"status": "ok", "text": text}
if resp.stop_reason == "tool_use":
# Continue loop: execute tools, append results, resend.
return {"status": "continue", "tool_calls": [
{"id": b.id, "name": b.name, "input": b.input}
for b in resp.content if b.type == "tool_use"
]}
if resp.stop_reason == "max_tokens":
# Token budget exhausted. Return partial; chunk on retry.
partial = "".join(b.text for b in resp.content if b.type == "text")
return {"status": "partial", "text": partial,
"next_action": "chunk_input_or_raise_max_tokens"}
if resp.stop_reason == "stop_sequence":
# A custom stop_sequence matched. Treat as intentional termination.
return {"status": "stopped", "matched_sequence": resp.stop_sequence}
# Unknown stop_reason, log and fail safely.
return {"status": "error", "stop_reason": resp.stop_reason}
Looks right, isn't
Each row pairs a plausible-looking pattern with the failure it actually creates. These are the shapes exam distractors are built from.
If response has no tool_use blocks, the agent is done.
A response can have both text and tool_use blocks. stop_reason is the authoritative field. If stop_reason='tool_use', continue the loop even if text is present.
Treat max_tokens as a failure and abort.
max_tokens is a normal partial-result signal. Save what was returned, then either raise max_tokens or chunk the input. Aborting loses the partial work.
Stop the agent preemptively when token usage approaches the cap.
Preemptive stops lose real work and behave unpredictably. Let the agent finish its turn and read stop_reason, Claude already manages this gracefully.
Streaming responses don't have a stop_reason, so handle them differently.
Streaming responses do carry stop_reason, it arrives in the final `message_delta` event of the SSE stream. The shape is {type: "message_delta", delta: {stop_reason: "end_turn"}}. Code that only inspects content deltas and ignores the closing message_delta will miss the termination signal entirely.
When stop_reason: "refusal" arrives, log it and abort the loop.
`refusal` is a fifth value that lands when Claude declines for safety reasons; it is not interchangeable with `end_turn`. Aborting silently hides a policy event from your audit log. Treat it like its own branch: surface to the user, capture in telemetry, and never retry blindly with the same prompt, you'll burn budget on guaranteed refusals.
Side-by-side
| stop_reason | Meaning | Next action | Production risk |
|---|---|---|---|
| end_turn | Agent finished naturally | Exit loop, present text | Low, success path |
| tool_use | Agent wants to call a tool | Execute, append result, continue | High if loop checks text instead |
| max_tokens | Output token budget hit | Return partial, plan retry/chunk | Medium, surfaces undersized max_tokens |
| stop_sequence | Custom stop string matched | Exit, inspect for intent | Low, only if you set it intentionally |
| refusal | Safety policy declined the response | Surface to user; do not retry blindly | Medium, signals prompt or policy issue |
| pause_turn | Long-running tool (e.g. server tools) paused mid-turn | Resume by re-sending the message list | Low, only with extended/server tools |
Decision tree
Are you building an agentic loop with tools?
Could max_tokens fire on long inputs?
Are you setting a custom stop_sequence parameter?
Are you streaming with stream: true or the SDK's stream helper?
stop_reason from the final message_delta event, not from content_block_delta events. Buffer the full stream and inspect final_message.stop_reason.response.stop_reason directly off the message object.Are you in a regulated domain (medical, legal, finance) where refusals matter?
refusal branch. Log to your audit pipeline; never retry the same prompt automatically; surface a sanitized message to the end user.Question patterns

27 V2 questions wired to this concept. Tap an answer to check it instantly — you'll see whether it's right and why — then expand the full breakdown for the mental model and all four rationales.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
21 additional questions for this concept live in the practice pillar. Take a mock exam ↗
Frequently asked
Can `stop_reason` change between streaming chunks and the final response?
message_delta event. Intermediate content_block_delta events have no stop_reason field. If you read it before the stream closes, you'll get null, that's the bug.How do `stop_reason` and `stop_sequence` (the parameter) relate?
stop_sequences is a request parameter (an array of strings). stop_reason is a response field. When one of your sequences matches, stop_reason === "stop_sequence" and response.stop_sequence holds the matched string. Without setting the parameter, you'll never see this value.If `stop_reason: "max_tokens"` arrives mid-tool_use, is the tool call still valid?
} of the input JSON). Truncated tool calls have malformed input, your harness must validate before executing. A partially-streamed tool_use should be discarded; raise max_tokens and re-send.What's the cost difference between `end_turn` and `max_tokens` exits?
cache_control.Should I set `max_tokens` as high as possible to avoid `max_tokens` stop_reason?
max_tokens reserves capacity but doesn't charge for unused tokens, however, higher values increase latency because the model plans a longer generation. Set it to a realistic ceiling for the task (e.g. 2048 for chat, 8192 for extraction), not the absolute max.How do I detect that a loop is stuck because I forgot to append `tool_result`?
stop_reason: "tool_use" repeats with the same `tool_use.id` across turns. The model is waiting for that specific result. Log the id per iteration; if the same id appears twice, your harness skipped the append. Without tool_result, Claude has no way to advance.Does `stop_reason` exist on the Batch API responses?
stop_reason in its result object. Batch jobs that include tool-using messages are unusual (you can't continue a loop async), but the field is still populated for diagnostics.What happens to `stop_reason` when extended thinking is enabled?
thinking block before the response, but stop_reason reflects the final generation state. Thinking blocks don't change termination semantics, they're additional content, not a separate signal.Can a single response have `stop_reason: "end_turn"` AND a `tool_use` block?
tool_use block is present, stop_reason is always tool_use. The mutual exclusion is structural: Claude commits to one or the other per turn. If you see end_turn with a tool block in your code, it's a parsing bug, you're inspecting the wrong response object.How do I test stop_reason branches in unit tests without burning API budget?
Message object is plain JSON: {stop_reason, content, usage, ...}. Construct fixtures for each value (end_turn, tool_use, max_tokens, stop_sequence, refusal) and feed them through your handler. Never use real API calls in tests for control-flow logic, the test isn't validating Claude's behavior, it's validating your branching.Work this with your AI
Work this concept hands-on with Claude Code, Codex, or claude.ai. Copy a prompt, paste it into your assistant, and practise in tandem. Each one keeps you active (explain it back, get drilled, or build) rather than just reading.
- Drill it like the exam (scenario MCQs)Practice in the exam's scenario-MCQ format with trap awareness.
- Explain it back (Feynman)Build durable, transferable understanding of a concept you can half-state.
- Test me, adapting the difficultyActive recall practice on a concept you think you know.
- Check my prerequisites firstBefore studying a concept that keeps not sticking.
- Find the high-leverage 20%When a domain feels too big and you are short on time.
