On this page
TLDR
tool_choice is the parameter that controls whether Claude can decide to use a tool ("auto"), must call any tool ("any"), or must call a specific tool by name. Use specific-tool forcing for mandatory-first-step operations like identity verification before refunds. messages API tool_choice
What it is
tool_choice is a request-level parameter that controls whether Claude must, may, or must not call a tool. Three modes: {type: "auto"} (default, model decides), {type: "any"} (model must call any tool), and {type: "tool", name: "X"} (forced to a specific tool). It bridges between probabilistic (the model decides) and deterministic (you decide) tool execution. The wrong choice leaves 30% of your output as unparseable text when determinism is required.
Without override, Claude reads the message history, evaluates available tools, and decides whether a tool is needed at all. In refund flows this is dangerous: 30% of the time Claude asks clarifying questions instead of calling verify_customer. With {type: "tool", name: "verify_customer"} that 30% becomes 0%. The tradeoff is inflexibility: forced tool_choice removes legitimate text-only responses. Use forced sparingly, only for mandatory architectural checkpoints.
The incompatibility with extended thinking is architectural, not cosmetic. When thinking: {type: "enabled"} is active, tool_choice: "any" and forced both return 400 validation errors. You must fall back to tool_choice: "auto". Reasoning-heavy tasks cannot guarantee tool calls; the model reasons, then decides whether to act. The fix is reframing: use extended thinking for analysis, not for deterministic control of execution.
Production fails when tool_choice is mistaken for a safety valve instead of a contract enforcer. tool_choice: "any" does not make your loop robust if tool descriptions are vague, the model still misroutes. It ensures a tool is called, not that the right tool is called. Combining unclear descriptions + forced tool_choice = guaranteed wrong action. Fix descriptions first, then use tool_choice to enforce when, not what.
How it works
When you call messages.create() with tool_choice, the request is validated before the forward pass. If thinking is enabled and tool_choice is anything other than auto, the API returns a 400 error immediately, no model inference happens. This check happens before token accounting, so invalid combinations fail instantly and cheaply. Valid combinations proceed to the transformer.
Inside the forward pass, the model sees three elements: the system prompt, the messages, and your tool_choice specification. With auto, Claude generates and samples a stop_reason: either end_turn (text is enough) or tool_use (tools needed). With any, the model is nudged to produce a tool_use block, sampling weights shift to favor tool calls over text termination. With forced, tool generation is implicit; sampling never considers end_turn.
The stop_reason is independent of tool_choice. A forced tool call still returns stop_reason: "tool_use". The tool's name is guaranteed to be X; the input might be empty or malformed, but the structure is assured. This is why forced tool_choice is safe for mandatory architecture gates: it guarantees the attempt, not the validity. Argument validation is still your harness's job.
Tool execution remains your responsibility. Even with tool_choice: "any", a tool call might have invalid input (missing required fields, wrong types). Your harness catches it, returns a structured error, and lets Claude retry. The difference is confidence: with auto, you code for text-or-tool branches and handle tool absence; with any, you code only for tool branches. Wrong inputs are handled identically in both.

Where you'll see it
Mandatory verification before refunds
Refund workflow forces verify_customer as the first call via tool_choice = {type:'tool', name:'verify_customer'}. Removes probabilistic skipping. Combined with PostToolUse policy hooks for amount limits, prompt-only enforcement of the same policy lands ~70% reliable in production.
Guaranteed structured extraction from unknown documents
Document type may be invoice / receipt / contract. Use tool_choice='any' with three extraction tools (one per type). Model picks the right one if descriptions are clear. 'auto' would return free text 30% of the time.
Subagent must take action
Research subagent must call at least one tool (search / load / query). Use tool_choice='any' so empty-text returns become impossible. Subagent always emits a tool call before returning.
Code examples
from anthropic import Anthropic
client = Anthropic()
# Mode 1: 'auto' (default), model picks tool OR returns text
resp_auto = client.messages.create(
model="claude-opus-4-5", max_tokens=512, tools=tools,
tool_choice={"type": "auto"},
messages=[{"role": "user", "content": "Help me with a refund."}],
)
# 30% chance: text only ("Sure, can you tell me your order number?")
# 70% chance: tool_use block with verify_customer or lookup_order
# Mode 2: 'any', model MUST call a tool, picks which one
resp_any = client.messages.create(
model="claude-opus-4-5", max_tokens=512, tools=tools,
tool_choice={"type": "any"},
messages=[{"role": "user", "content": "Help me with a refund."}],
)
# 100% chance: tool_use block. Model picks (likely verify_customer first).
# Mode 3: forced, model MUST call this specific tool
resp_forced = client.messages.create(
model="claude-opus-4-5", max_tokens=512, tools=tools,
tool_choice={"type": "tool", "name": "verify_customer"},
messages=[{"role": "user", "content": "Help me with a refund."}],
)
# 100% chance: tool_use block calling verify_customer.
Looks right, isn't
Each row pairs a plausible-looking pattern with the failure it actually creates. These are the shapes exam distractors are built from.
Use 'any' to guarantee deterministic output.
'any' forces a tool call but the model still picks which one. If two tool descriptions overlap, the wrong one fires. For true determinism, use forced ({type:'tool', name:X}).
Forced tool_choice removes the need for hooks.
Forced ensures the tool is called, it does NOT validate inputs. Add PreToolUse hooks for argument validation (e.g., refund amount within policy).
Use 'any' or forced when extended_thinking is on for safer behavior.
Both modes fail validation when thinking is enabled. Only 'auto' is compatible. If you need both thinking and a guaranteed tool call, drop thinking or accept text fallback.
Set tool_choice: "any" once at session start and Claude will keep calling tools every turn.
`tool_choice` is per-request, not per-session. Each messages.create() call evaluates it independently. If turn 5 omits tool_choice, the default auto returns and Claude may emit text. Pin it on every call that needs the constraint, or wrap the SDK in a helper that injects it automatically.
Setting disable_parallel_tool_use: true is unrelated to tool_choice.
It's actually nested inside the tool_choice object: {type: "auto", disable_parallel_tool_use: true}. The flag forces sequential tool calls, useful when downstream side effects must happen in order. Most exam-tested gotcha: developers expect a top-level parameter, never find it, and ship buggy parallel writes.
Side-by-side
| Mode | Behavior | Use case | Extended thinking |
|---|---|---|---|
| auto (default) | Model picks: tool OR text | Open-ended assistance | Compatible |
| any | Must call a tool; model picks which | Subagent must act; structured-only flows | Incompatible |
| {type:'tool', name:X} | Must call this exact tool | Mandatory first steps (verify before refund) | Incompatible |
| none | Tools available but model returns text | Rare: only allow tools as fallback | Compatible |
| auto + disable_parallel_tool_use | Picks 0 or 1 tool, never multiple in parallel | Sequential side-effect tools (writes, payments) | Compatible |
| any + disable_parallel_tool_use | Must call exactly one tool, no parallelism | Mandatory single action; no fan-out | Incompatible |
Decision tree
Is a SPECIFIC tool mandatory (verify-before-refund, identity gate)?
Must the model take SOME tool action (any tool)?
Are you using extended_thinking?
Do you need to prevent Claude from calling multiple tools in one turn?
disable_parallel_tool_use: true to the tool_choice object. Critical when tool results have ordered side effects (payments, writes, sends).Are you in an agentic loop where the FIRST turn must be a tool but later turns can return text?
Question patterns

31 V2 questions wired to this concept. Tap an answer to check it instantly — you'll see whether it's right and why — then expand the full breakdown for the mental model and all four rationales.
Tap your answer to check it.
fetch_user and get_user. The agent alternates between them. Fix?Tap your answer to check it.
tool_choice: any to guarantee structured output. The agent still returns empty results. Why?Tap your answer to check it.
Tap your answer to check it.
tool_choice: any. The API returns 400. Why?Tap your answer to check it.
tool_choice: any to guarantee a tool call. The agent calls lookup_order instead of verify_customer first. Fix?Tap your answer to check it.
25 additional questions for this concept live in the practice pillar. Take a mock exam ↗
Frequently asked
What's the latency cost of `tool_choice: "any"` vs `"auto"`?
"any" removes the model's option to short-circuit with text, so you almost always pay for one extra round-trip (the tool execution + result append). On simple queries that "auto" would answer in one turn, "any" doubles your latency.Can I dynamically change `tool_choice` based on conversation state?
"auto" once verified, switch to "any" for action-taking phases. Wrap the SDK call in a state machine that sets tool_choice per turn based on a flag like case.verified.Does `tool_choice: "none"` actually exist?
{type: "none"} makes tools available but tells the model not to call them. Used rarely, mostly when you want the model to discuss tools without invoking them (e.g. a help-text turn explaining what tools the agent has). For production, tool_choice simply absent is more idiomatic.If forced tool_choice + extended thinking errors out, can I work around it?
"auto" to plan, second call uses forced without thinking to execute; (2) drop thinking and use a stronger system prompt to compensate. The exam answer is: there's no flag to override, the incompatibility is structural.Does `tool_choice` affect prompt caching?
tool_choice parameter is part of the request body but not part of the cached prefix. You can vary it freely across calls and still hit the same cached tools and system prefix.What happens if I set `tool_choice: {type: "tool", name: "X"}` but tool X isn't in the `tools` array?
Why does `"any"` sometimes still pick the same tool repeatedly even with multiple tools available?
tool_choice. If two descriptions overlap, the model has a slight bias toward the first matching one. Audit overlap; the fix is description rewriting, not adding more tools.Can I use forced tool_choice with the Batch API?
tool_choice. Useful for bulk extraction pipelines: 1000 documents, all forced to the same extract_fields tool. The 50% batch discount applies, structured-extraction batches are one of the cheapest LLM patterns.How do I debug a forced tool_choice that returns malformed input?
tool_use.input is wrong, fix two things: (1) tighten the JSON Schema with enum, format, and explicit required; (2) rewrite the description to spell out input rules ("customer_id MUST start with cus_"). Then add a PreToolUse validator hook as the safety net.Is there a way to force a tool only when certain text appears in the user message?
tool_choice, you'd compute that in your application code. Inspect the user message, set tool_choice accordingly, then send. The API has no conditional `tool_choice`, all decision logic lives in your harness, which is the right separation of concerns.Work this with your AI
Work this concept hands-on with Claude Code, Codex, or claude.ai. Copy a prompt, paste it into your assistant, and practise in tandem. Each one keeps you active (explain it back, get drilled, or build) rather than just reading.
- Drill it like the exam (scenario MCQs)Practice in the exam's scenario-MCQ format with trap awareness.
- Explain it back (Feynman)Build durable, transferable understanding of a concept you can half-state.
- Test me, adapting the difficultyActive recall practice on a concept you think you know.
- Check my prerequisites firstBefore studying a concept that keeps not sticking.
- Find the high-leverage 20%When a domain feels too big and you are short on time.
