On this page
TLDR
System prompts establish role, constraints, format, and tool guidance. Anatomy: role, capability boundaries, style/format rules, tool guidance, examples. messages API system field
What it is
A system prompt is the foundational instruction that shapes Claude's entire behavior for a conversation. Sent with every messages.create() call, it lives at the top of the message list and persists across all turns. Unlike user messages (ephemeral within a turn), the system prompt anchors the role, boundaries, format, tool guidance, and example patterns for every response. It is the charter document that governs what Claude will and won't do.
The anatomy of a production system prompt has five load-bearing sections: role definition (who Claude is, what it's accountable for), task boundaries (what's in scope, what's forbidden), output format (response structure, JSON schemas), tool guidance (when to use which tools, ordering), and example patterns (2-3 concrete before/after pairs). A vague "be helpful" fails at scale; a precise charter reduces hallucination, misrouting, and off-policy behavior by 30-60% depending on stakes.
The contract is structural, not linguistic. A system prompt is not persuasion ("please be thoughtful") or hope ("avoid errors"). It is a policy document the model reads and uses to allocate attention. Every phrase is load-bearing. "Do not refund above $500" is ignored without a hook; if amount > 500: deny in tool code is enforced 100%. Deterministic enforcement beats prompt guidance for high-stakes decisions.
Production failures cluster around two gaps: role-boundary confusion ("be conservative" without specifying what that means) and missing example patterns (the model guesses style from tone). The exam drills both. A common distractor: "upgrade the model to fix policy compliance." The real fix is a system prompt that embeds policy into tool descriptions, not as linguistic guidance.
How it works
Every messages.create() call includes a system parameter. Claude reads it first, before the message list. It shapes internal planning: what goals to optimize, which tools are available, what to do with ambiguity. The system prompt is stateless across turns: each turn re-reads it, so updates are immediately visible. Unlike user context that accumulates, the system prompt is a constant reset.
The five-section anatomy is signal compression. Role tells Claude its job ("You are a refund agent..."). Boundaries enumerate allowed/forbidden actions. Format specifies the shape (JSON with {status, amount}). Tool guidance is the when-to-use logic ("Always call verify_customer first"). Examples show before/after pairs: customer requests $600 refund → worked example showing the reasoning.
The system prompt interacts with tool descriptions to form a two-layer enforcement model. System prompt is macro policy (roles, broad guardrails). Tool descriptions are micro policy (when to call X, what inputs are valid). Without both, enforcement is incomplete. A system prompt saying "always verify first" plus a tool description saying "verify_customer: Use first" is redundant but safe. Vague tool descriptions create misrouting.
Caching the system prompt is the highest-leverage optimization. The system prompt is read-only across turns, so mark with cache_control: {type: "ephemeral"}. Cache persists 5 minutes, every subsequent call pays ~90% less. For a 1000-token system prompt called 10 times, save 9000 tokens. Caching is free and automatic; just add the annotation.

Where you'll see it
Customer support refund agent
Five-section system prompt: role ("refund policy enforcer"), boundaries ("never >$500 lifetime"), format (JSON), tool guidance ("verify → lookup → process"), examples ($600 denied because lifetime $300 + $600 = $900). Agent respects $500 cap 99%+ of time. Without system prompt, model guesses.
Multi-turn research synthesis with citation discipline
Coordinator's system prompt defines: role (orchestrator), boundaries (cite sources; never aggregate without attribution), format (JSON array of findings with provenance), tool guidance (subagents do NOT inherit history), examples. Re-read every turn, citation discipline holds across 6-turn conversation.
Structured data extraction with validation
Invoice pipeline's system prompt covers role (extract without hallucination), boundaries (return null if absent; never invent), format (JSON schema), tool guidance (use mark_entity per field), examples (3 invoices showing correct extraction). Cached, so 1000 extraction requests pay once for the prompt.
Code review bot in CI/CD
GitHub Actions invokes Claude with claude -p. System prompt defines role (security + perf reviewer), boundaries (only flag with test cases; no speculation), format (JSON {verdict, critical_issues, minor_issues}), tool guidance (Read + Grep, no Edit), examples (2 PR diffs). CI parses JSON deterministically.
Code examples
from anthropic import Anthropic
client = Anthropic()
SYSTEM_PROMPT = """# Refund Agent Charter
## 1. Role Definition
You are a customer support agent for refund processing. Accountable for policy compliance and customer experience.
## 2. Task Boundaries
- ALWAYS call verify_customer first
- Refunds <= $500 lifetime: ALLOWED
- Refunds > $500 lifetime: DENIED, escalate to human
- Never negotiate the $500 policy
## 3. Output Format
JSON: { status: "approved"|"denied"|"escalated", amount, reason, customer_message }
## 4. Tool Guidance
Order: verify_customer -> lookup_order -> process_refund
Skip verification = escalate.
## 5. Example Patterns
Example 1 (approve): $247.83 + prior $150 = $397.83 < $500 -> approve
Example 2 (deny): $600 + prior $300 = $900 > $500 -> deny + escalate
Example 3 (explicit): "speak to manager" -> escalate immediately
"""
def run_agent(user_msg: str, tools: list):
messages = [{"role": "user", "content": user_msg}]
for i in range(10):
resp = client.messages.create(
model="claude-opus-4-5",
max_tokens=1024,
# Cache the system prompt: ~90% savings on subsequent calls
system=[{
"type": "text",
"text": SYSTEM_PROMPT,
"cache_control": {"type": "ephemeral"},
}],
tools=tools,
messages=messages,
)
if resp.stop_reason == "end_turn":
return resp.content[0].text
# ... handle tool_use, append tool_result ...
return "max_iterations"Looks right, isn't
Each row pairs a plausible-looking pattern with the failure it actually creates. These are the shapes exam distractors are built from.
Write the policy as natural-language guidance: 'be conservative with refunds.'
Natural-language guidance is advice the model ignores under load. Only deterministic tool code (checking $500 in process_refund) enforces 100%.
Update the system prompt mid-conversation to clarify ambiguity.
Each call re-reads the system prompt; updating is possible but creates inconsistency. Define the full charter upfront.
System prompt can replace tool governance; don't burden tool descriptions.
Two-layer enforcement: system prompt = macro policy, tool descriptions = micro policy. Omitting either creates gaps.
Longer system prompts give Claude more context.
5,000+ token prompts cause attention dilution. Tight 500-1500 token charters outperform verbose. Every section must be load-bearing.
Skip example patterns; they're noise.
Example patterns are the highest-ROI section. A before/after pair reduces misclassification by 30-60%. 2-3 examples is right.
Side-by-side
| Aspect | System Prompt | Tool Description | User Instruction | Cached vs Fresh |
|---|---|---|---|---|
| Scope | Entire conversation | Per-tool when/how | Single-turn | Cached: constant |
| Persistence | Constant across turns | Constant | Turn-specific | Cached: 5-min TTL |
| Enforcement | Policy guidance (medium) | Structural (high) | Linguistic hint (low) | Cached: ~90% cheaper |
| Update timing | Next call | Next call | Immediate | Cached: post-expiry |
| Best for | Role, boundaries, examples | Routing, validation | Clarification | High-volume loops |
| Token cost | 100% fresh, 10% cached | Not cached | Per turn | Cached: 10 vs 100 |
Decision tree
Building an agentic loop running 5+ turns?
cache_control: ephemeral. Save 90%.Is there a policy that must be enforced 100% of the time?
Struggling with the agent misrouting?
Multi-step task with mandatory ordering?
Need output in specific format?
Question patterns

56 V2 questions wired to this concept. Tap an answer to check it instantly — you'll see whether it's right and why — then expand the full breakdown for the mental model and all four rationales.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
Tap your answer to check it.
tool_choice: any to guarantee a tool call. The agent calls lookup_order instead of verify_customer first. Fix?Tap your answer to check it.
tool_choice calls process_refund with empty input. What went wrong?Tap your answer to check it.
50 additional questions for this concept live in the practice pillar. Take a mock exam ↗
Frequently asked
Difference between system prompt and user message?
How long should a system prompt be?
Can I have multiple system prompts in the same conversation?
messages.create() call. To shift roles, append a user message or spawn a subagent.How does caching work?
system with cache_control: ephemeral. Cached 5 minutes. Subsequent calls pay ~90% less. After 5 min or new conversation, expires.System prompts or tool descriptions for policy enforcement?
What if I update the system prompt between calls?
Can system prompts replace few-shot examples in tool descriptions?
Do I re-send the system prompt every call?
cache_control: ephemeral, you only pay once per 5-minute window.What if system prompt conflicts with tool descriptions?
Use system prompt instead of agentic loop?
Work this with your AI
Work this concept hands-on with Claude Code, Codex, or claude.ai. Copy a prompt, paste it into your assistant, and practise in tandem. Each one keeps you active (explain it back, get drilled, or build) rather than just reading.
- Drill it like the exam (scenario MCQs)Practice in the exam's scenario-MCQ format with trap awareness.
- Explain it back (Feynman)Build durable, transferable understanding of a concept you can half-state.
- Test me, adapting the difficultyActive recall practice on a concept you think you know.
- Check my prerequisites firstBefore studying a concept that keeps not sticking.
- Find the high-leverage 20%When a domain feels too big and you are short on time.
