D4.1 · Domain 4 · Prompt Engineering · 20% of CCA-F

System Prompts & Instructions.

9 min read·10 sections·Tier A

01 · Summary

TLDR

System prompts establish role, constraints, format, and tool guidance. Anatomy: role, capability boundaries, style/format rules, tool guidance, examples. In the Messages API, the system prompt is passed in the system field.

Anatomy parts: 5
Exam domain: D4
Coverage tier: B
Common trap: vague role
Best length: concise
02 · Definition

What it is

A system prompt is the foundational instruction that shapes Claude's behavior for a whole conversation. It is passed in the top-level system parameter of each messages.create() call, separate from the messages list, and because the API is stateless it must be sent again on every call to persist across turns. Unlike a user message, which addresses one turn and then sits in the history, the system prompt anchors the role, boundaries, format, tool guidance, and example patterns for every response. It is the charter document that governs what Claude will and won't do.

The anatomy of a production system prompt has five load-bearing sections: role definition (who Claude is, what it's accountable for), task boundaries (what's in scope, what's forbidden), output format (response structure, JSON schemas), tool guidance (when to use which tools, ordering), and example patterns (2-3 concrete before/after pairs). A vague "be helpful" fails at scale; this guide credits a precise charter with cutting hallucination, misrouting, and off-policy behavior by 30-60%, depending on stakes.

The contract is structural, not linguistic. A system prompt is not persuasion ("please be thoughtful") or hope ("avoid errors"). It is a policy document the model reads and uses to allocate attention, so every phrase should be load-bearing. Even so, prompt text is guidance, not a guarantee: "Do not refund above $500" can still be violated on some fraction of calls, whereas a check such as if amount > 500: deny inside the tool code runs on every call. Deterministic enforcement beats prompt guidance for high-stakes decisions.
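The difference between guidance and enforcement can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical process_refund tool handler and the $500 lifetime cap from the example policy; the function name, arguments, and return type are illustrative and not part of any SDK.

```python
from dataclasses import dataclass

LIFETIME_REFUND_CAP = 500.00  # policy value from the example; constant name is hypothetical


@dataclass
class RefundDecision:
    status: str   # "approved" or "denied"
    amount: float
    reason: str


def process_refund(amount: float, prior_refunds_total: float) -> RefundDecision:
    """Hypothetical tool handler: the cap is checked in code, not in the prompt."""
    if amount <= 0:
        return RefundDecision("denied", amount, "amount must be positive")
    lifetime_total = prior_refunds_total + amount
    if lifetime_total > LIFETIME_REFUND_CAP:
        # This branch runs on every call, whatever the model was told or decided.
        return RefundDecision(
            "denied", amount, f"lifetime total {lifetime_total:.2f} exceeds cap"
        )
    return RefundDecision("approved", amount, "within lifetime cap")
```

Whatever arguments the model sends, a request that pushes the lifetime total past the cap comes back denied, which is the property a prompt alone cannot guarantee.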

Production failures cluster around two gaps: role-boundary confusion ("be conservative" without specifying what that means) and missing example patterns (the model guesses style from tone). The exam drills both. A common distractor: "upgrade the model to fix policy compliance." The real fix is structural: state the policy precisely in the system prompt, repeat the relevant rule in the tool descriptions, and enforce hard limits in tool code instead of relying on linguistic guidance.

03 · Mechanics

How it works

A messages.create() call carries the system prompt in its system parameter. Claude reads it first, before the message list, and it shapes planning: which goals to optimize, how to use the available tools, what to do with ambiguity. The API keeps no state between calls, so the system prompt is re-sent and re-read on every turn, and an updated prompt takes effect on the very next call. Unlike user context, which accumulates, the system prompt is a constant reset.
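To make the shape of a request concrete, here is a small sketch that only builds the keyword arguments and does not call the API. The build_request helper and the sample conversation are assumptions for illustration; the model name is the one used in this guide's own code example.

```python
from typing import Any

SYSTEM_PROMPT = "You are a refund agent. Always call verify_customer first."


def build_request(history: list[dict[str, Any]],
                  system_prompt: str = SYSTEM_PROMPT) -> dict[str, Any]:
    """Assemble keyword arguments for one messages.create() call.

    The system prompt travels in the top-level `system` field on every call;
    only user and assistant turns go in `messages`.
    """
    return {
        "model": "claude-opus-4-5",
        "max_tokens": 1024,
        "system": system_prompt,
        "messages": list(history),  # copy, so the caller can keep appending
    }


history = [{"role": "user", "content": "I'd like a refund for order 1042."}]
turn_1 = build_request(history)

history.append({"role": "assistant", "content": "Let me verify your account first."})
history.append({"role": "user", "content": "My email is sam@example.com."})
turn_2 = build_request(history)
```

The message list grows from one entry to three while the system field stays identical. Each dictionary could be passed on as client.messages.create(**turn_2).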

The five-section anatomy is signal compression. Role tells Claude its job ("You are a refund agent..."). Boundaries enumerate allowed/forbidden actions. Format specifies the shape (JSON with {status, amount}). Tool guidance is the when-to-use logic ("Always call verify_customer first"). Examples show before/after pairs: customer requests $600 refund → worked example showing the reasoning.
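One way to keep all five sections present is to assemble the prompt from named parts and refuse to build it when one is empty. This is a sketch under assumptions: build_system_prompt and the section keys are hypothetical names, and the headings mirror the charter layout used later in this guide.

```python
SECTION_TITLES = {
    "role": "Role Definition",
    "boundaries": "Task Boundaries",
    "format": "Output Format",
    "tool_guidance": "Tool Guidance",
    "examples": "Example Patterns",
}  # dict order is the order in which sections are emitted


def build_system_prompt(sections: dict[str, str]) -> str:
    """Join the five sections into one prompt, failing loudly if any is missing."""
    missing = [key for key in SECTION_TITLES if not sections.get(key, "").strip()]
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    blocks = [
        f"## {number}. {SECTION_TITLES[key]}\n{sections[key].strip()}"
        for number, key in enumerate(SECTION_TITLES, start=1)
    ]
    return "\n\n".join(blocks)


prompt = build_system_prompt({
    "role": "You are a refund agent accountable for policy compliance.",
    "boundaries": "Refunds over $500 lifetime: deny and escalate.",
    "format": "JSON with {status, amount}.",
    "tool_guidance": "Always call verify_customer first.",
    "examples": "$600 + prior $300 = $900 > $500 -> deny + escalate",
})
```

Failing at build time turns a forgotten examples section into a deployment error instead of a production surprise.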

The system prompt interacts with tool descriptions to form a two-layer enforcement model. System prompt is macro policy (roles, broad guardrails). Tool descriptions are micro policy (when to call X, what inputs are valid). Without both, enforcement is incomplete. A system prompt saying "always verify first" plus a tool description saying "verify_customer: Use first" is redundant but safe. Vague tool descriptions create misrouting.
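The two layers can be sketched side by side. The tool definition below uses the Messages API tool shape (name, description, input_schema); the description wording, the email field, and the tools_missing_from_prompt helper are assumptions made for illustration.

```python
# Micro policy: the tool description says when to call it and what input is valid.
VERIFY_CUSTOMER_TOOL = {
    "name": "verify_customer",
    "description": (
        "Verify the customer's identity by email. Use this FIRST, before "
        "lookup_order or process_refund."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Customer's account email"},
        },
        "required": ["email"],
    },
}

# Macro policy: the system prompt states the overall ordering.
SYSTEM_PROMPT_EXCERPT = "Order: verify_customer -> lookup_order -> process_refund"


def tools_missing_from_prompt(system_prompt: str, tools: list[dict]) -> list[str]:
    """Return names of tools the system prompt never mentions (a simple alignment lint)."""
    return [tool["name"] for tool in tools if tool["name"] not in system_prompt]
```

A lint like this does not prove the two layers agree, but it catches the common case where a tool is added and the charter is never updated to mention it.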

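Caching the system prompt is the highest-leverage optimization. The system prompt is identical across turns, so mark it with cache_control: {type: "ephemeral"}. By default the cached prefix lives for 5 minutes, and each cache hit refreshes that window. Cached reads are billed at roughly 10% of the base input price, but the first call pays a cache-write premium of about 25%, so caching is not free. For a 2,000-token system prompt called 10 times inside the window, the prompt costs about 4,300 base-token equivalents (2,500 to write, then 9 × 200 to read) instead of 20,000, a saving of roughly 78%. Caching is opt-in, not automatic, and prompts below the model's minimum cacheable length (1,024 tokens on many models, higher on some) are processed without caching.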
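The caching arithmetic can be worked through with a short cost model. This is a sketch, not a billing calculator: it assumes a 1.25x multiplier for the first (cache-write) call and a 0.10x multiplier for later cache reads, assumes every call lands inside the cache window, and measures cost in base-input-token equivalents. The function and constant names are hypothetical.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # assumed premium for writing a 5-minute cache entry
CACHE_READ_MULTIPLIER = 0.10   # assumed rate for reading a cached prefix


def system_prompt_cost(prompt_tokens: int, calls: int, cached: bool) -> float:
    """Cost of the system prompt alone, in base-input-token equivalents."""
    if calls <= 0:
        return 0.0
    if not cached:
        return float(prompt_tokens * calls)
    first_call = prompt_tokens * CACHE_WRITE_MULTIPLIER
    later_calls = prompt_tokens * CACHE_READ_MULTIPLIER * (calls - 1)
    return first_call + later_calls


uncached = system_prompt_cost(2000, 10, cached=False)
with_cache = system_prompt_cost(2000, 10, cached=True)
savings = 1 - with_cache / uncached  # fraction saved over the 10 calls
```

Under these assumptions a single call is more expensive with caching than without it, and the saving only approaches 90% as the number of calls inside the window grows.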

04 · In production

Where you'll see it

Customer support refund agent

Five-section system prompt: role ("refund policy enforcer"), boundaries ("never >$500 lifetime"), format (JSON), tool guidance ("verify → lookup → process"), examples ($600 denied because lifetime $300 + $600 = $900). Agent respects $500 cap 99%+ of time. Without system prompt, model guesses.

Multi-turn research synthesis with citation discipline

Coordinator's system prompt defines: role (orchestrator), boundaries (cite sources; never aggregate without attribution), format (JSON array of findings with provenance), tool guidance (subagents do NOT inherit history), examples. Because the prompt is re-sent every turn, citation discipline holds across a 6-turn conversation.
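Because subagents do not inherit the coordinator's history, anything they need has to be packed into their own request. The sketch below shows one way to do that; build_subagent_request and the claim/source field names are hypothetical and chosen only for illustration.

```python
def build_subagent_request(task: str,
                           findings: list[dict[str, str]],
                           system_prompt: str) -> dict:
    """Build a fresh request for a subagent that starts with no history.

    Every piece of context, including provenance, is written into the single
    user message, because nothing from the coordinator's conversation carries over.
    """
    context_lines = [f"- {f['claim']} (source: {f['source']})" for f in findings]
    brief = task + "\n\nContext from the coordinator:\n" + "\n".join(context_lines)
    return {
        "system": system_prompt,
        "messages": [{"role": "user", "content": brief}],
    }


request = build_subagent_request(
    task="Summarize pricing changes and cite each claim.",
    findings=[{"claim": "Plan A rose 10%", "source": "vendor-report.pdf"}],
    system_prompt="You are a research subagent. Cite a source for every claim.",
)
```

The subagent sees exactly one user message, so the sources survive only if the coordinator writes them into the brief.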

Structured data extraction with validation

Invoice pipeline's system prompt covers role (extract without hallucination), boundaries (return null if absent; never invent), format (JSON schema), tool guidance (use mark_entity per field), examples (3 invoices showing correct extraction). With caching, 1000 extraction requests inside the cache window pay the full prompt price once and the reduced cached-read rate after that.
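The "return null if absent; never invent" boundary can be backed by a check outside the model. The sketch below is a deliberately crude guard, assuming hypothetical field names and a literal-substring test against the source text; a real pipeline would need normalization for dates and amounts.

```python
import json

REQUIRED_FIELDS = ("invoice_number", "total", "due_date")  # hypothetical schema


def validate_extraction(raw_json: str, source_text: str) -> dict:
    """Keep an extracted value only if it literally appears in the source text.

    Missing fields and values that cannot be found in the source become None,
    so downstream code never sees an invented value.
    """
    data = json.loads(raw_json)
    cleaned = {}
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        if value is not None and str(value) not in source_text:
            value = None  # treat unverifiable values as absent
        cleaned[field] = value
    return cleaned


source = "Invoice INV-77  Total: 120.50"
model_output = '{"invoice_number": "INV-77", "total": "120.50", "due_date": "2024-01-01"}'
result = validate_extraction(model_output, source)
```

Here the due date is dropped because it does not appear anywhere in the invoice text, while the two grounded fields pass through unchanged.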

Code review bot in CI/CD

GitHub Actions invokes Claude with claude -p. System prompt defines role (security + perf reviewer), boundaries (only flag with test cases; no speculation), format (JSON {verdict, critical_issues, minor_issues}), tool guidance (Read + Grep, no Edit), examples (2 PR diffs). CI parses JSON deterministically.
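Deterministic parsing on the CI side might look like the following sketch. The three keys come from the format described above; the verdict values ("pass" and "fail"), the exit-code convention, and the parse_review name are assumptions for illustration.

```python
import json

REQUIRED_KEYS = {"verdict", "critical_issues", "minor_issues"}


def parse_review(raw: str) -> tuple[int, dict]:
    """Turn the model's output into (exit_code, review).

    0 = pass, 1 = review failed, 2 = output was malformed and cannot be trusted.
    """
    try:
        review = json.loads(raw)
    except json.JSONDecodeError:
        return 2, {"error": "output was not valid JSON"}
    if not isinstance(review, dict):
        return 2, {"error": "output was not a JSON object"}
    missing = REQUIRED_KEYS - review.keys()
    if missing:
        return 2, {"error": f"missing keys: {sorted(missing)}"}
    if review["verdict"] == "fail" or review["critical_issues"]:
        return 1, review
    return 0, review
```

A CI step could call sys.exit on the returned code. Treating malformed output as its own failure mode keeps a formatting slip from being mistaken for a clean review.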

05 · Implementation

Code examples

Five-section system prompt with caching
from anthropic import Anthropic

client = Anthropic()

SYSTEM_PROMPT = """# Refund Agent Charter

## 1. Role Definition
You are a customer support agent for refund processing. Accountable for policy compliance and customer experience.

## 2. Task Boundaries
- ALWAYS call verify_customer first
- Refunds <= $500 lifetime: ALLOWED
- Refunds > $500 lifetime: DENIED, escalate to human
- Never negotiate the $500 policy

## 3. Output Format
JSON: { status: "approved"|"denied"|"escalated", amount, reason, customer_message }

## 4. Tool Guidance
Order: verify_customer -> lookup_order -> process_refund
Skip verification = escalate.

## 5. Example Patterns
Example 1 (approve): $247.83 + prior $150 = $397.83 < $500 -> approve
Example 2 (deny): $600 + prior $300 = $900 > $500 -> deny + escalate
Example 3 (explicit): "speak to manager" -> escalate immediately
"""

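def run_agent(user_msg: str, tools: list, tool_handlers: dict, max_turns: int = 10) -> str:
    """Run the tool-use loop until Claude stops requesting tools."""
    # tool_handlers is an assumed mapping of tool name -> Python callable,
    # supplied by the caller; it is not part of the SDK.
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):
        resp = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=1024,
            # Mark the system prompt as a cacheable prefix. Cache hits are billed
            # at a reduced rate; prompts below the model's minimum cacheable
            # length are processed without caching.
            system=[{
                "type": "text",
                "text": SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},
            }],
            tools=tools,
            messages=messages,
        )
        if resp.stop_reason != "tool_use":
            # end_turn, max_tokens, etc.: return whatever text was produced
            return "".join(b.text for b in resp.content if b.type == "text")
        # Record the assistant turn, then answer every tool_use block
        messages.append({"role": "assistant", "content": resp.content})
        results = []
        for block in resp.content:
            if block.type == "tool_use":
                output = tool_handlers[block.name](**block.input)
                results.append({"type": "tool_result",
                                "tool_use_id": block.id,
                                "content": str(output)})
        messages.append({"role": "user", "content": results})
    return "max_iterations"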
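Five sections: role, boundaries, format, tool guidance, examples. With cache_control: ephemeral on system, turns after the first read the cached prompt at roughly 10% of the base input price, provided the prompt meets the model's minimum cacheable length.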
06 · Distractor patterns

Looks right, isn't

Each row pairs a plausible-looking pattern with the failure it actually creates. These are the shapes exam distractors are built from.

Looks right

Write the policy as natural-language guidance: 'be conservative with refunds.'

Actually wrong

Natural-language guidance is advice the model can fail to follow, especially in long or adversarial conversations. Only deterministic tool code (checking the $500 cap inside process_refund) enforces the policy on every call.

Looks right

Update the system prompt mid-conversation to clarify ambiguity.

Actually wrong

Each call re-reads the system prompt; updating is possible but creates inconsistency. Define the full charter upfront.

Looks right

System prompt can replace tool governance; don't burden tool descriptions.

Actually wrong

Two-layer enforcement: system prompt = macro policy, tool descriptions = micro policy. Omitting either creates gaps.

Looks right

Longer system prompts give Claude more context.

Actually wrong

Prompts of 5,000+ tokens risk attention dilution. Tight 500-1500 token charters outperform verbose ones. Every section must be load-bearing.

Looks right

Skip example patterns; they're noise.

Actually wrong

Example patterns are the highest-ROI section. A before/after pair reduces misclassification by 30-60%. 2-3 examples is right.

07 · Compare

Side-by-side

| Aspect | System Prompt | Tool Description | User Instruction | Cached vs Fresh |
|---|---|---|---|---|
| Scope | Entire conversation | Per-tool when/how | Single-turn | Cached: constant |
| Persistence | Constant across turns | Constant | Turn-specific | Cached: 5-min TTL |
| Enforcement | Policy guidance (medium) | Structural (high) | Linguistic hint (low) | Cached: ~90% cheaper |
| Update timing | Next call | Next call | Immediate | Cached: post-expiry |
| Best for | Role, boundaries, examples | Routing, validation | Clarification | High-volume loops |
| Token cost | 100% fresh, 10% cached | Not cached | Per turn | Cached: 10 vs 100 |
08 · When to use

Decision tree

01

Building an agentic loop running 5+ turns?

Yes: Use a five-section system prompt with cache_control: ephemeral. Turns after the first read the prompt at roughly 10% of the base input price.
No: Single-turn call; the caching gain is marginal.
02

Is there a policy that must be enforced 100% of the time?

Yes: Encode it in tool code (if amount > 500: deny), not in the system prompt. System prompts are guidance; tools are enforcement.
No: System prompt guidance is acceptable.
03

Struggling with the agent misrouting?

Yes: Fix the tool descriptions, not the system prompt. Add when-to-use clarity and edge cases.
No: Routing is healthy; focus on examples.
04

Multi-step task with mandatory ordering?

Yes: Document the order in the tool guidance section. The prompt is re-read every turn, so the ordering stays in force.
No: Ordering is flexible.
05

Need output in a specific format?

Yes: Define it in the output format section. Pair it with example patterns showing the exact format.
No: Natural language is fine; a format section is still useful for consistency.
09 · On the exam

Question patterns

56 V2 questions wired to this concept.

Two of your tools have similar names (fetch_data and get_data). The model picks the wrong one 30% of the time. What is the best first fix?

A research subagent ran for 40 turns and returned a perfect summary, but the bill is huge. What is the architectural fix?

Cross-task context like a vendor matrix path should live in: project Instructions or session messages?

Your agent calls the wrong tool 30% of the time across 8 similar tools. What is the first fix?

Your refund flow uses tool_choice: any to guarantee a tool call. The agent calls lookup_order instead of verify_customer first. Fix?

The user asks a simple question and your forced tool_choice calls process_refund with empty input. What went wrong?

50 additional questions for this concept live in the practice pillar.

10 · FAQ

Frequently asked

Difference between system prompt and user message?
System prompt is constant and re-read every turn (the charter). User message is a single-turn request that accumulates in history.
How long should a system prompt be?
500-1500 tokens. Every section must be load-bearing: role, boundaries, format, tool guidance, examples.
Can I have multiple system prompts in the same conversation?
No. One system per messages.create() call. To shift roles, append a user message or spawn a subagent.
How does caching work?
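Mark system with cache_control: ephemeral. The cached prefix lasts 5 minutes by default, and each hit refreshes that window. Calls that hit the cache pay ~90% less for the prompt; the first call pays a write premium. The entry expires after 5 minutes without use, and any change to the prompt text misses the cache.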
System prompts or tool descriptions for policy enforcement?
Both. System = macro policy, tool descriptions = micro policy. Use deterministic tool code for 100% enforcement.
What if I update the system prompt between calls?
New prompt used on next call. Creates inconsistency mid-conversation. Define charter upfront.
Can system prompts replace few-shot examples in tool descriptions?
Partially. System can include 2-3 example patterns. Tool descriptions can include input examples. Both useful; different purposes.
Do I re-send the system prompt every call?
Yes. With cache_control: ephemeral, you pay the full (write) price once, then the reduced cached-read rate for as long as the cache stays warm.
What if system prompt conflicts with tool descriptions?
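Conflicts make behavior unpredictable, so do not rely on one layer winning. This guide treats tool descriptions as the more structural layer and the system prompt as linguistic guidance; keep the two aligned.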
Use system prompt instead of agentic loop?
No. System prompt defines role; agentic loop is the control structure. Need both.
11 · Practice with AI

Work this with your AI

Work this concept hands-on with Claude Code, Codex, or claude.ai. Copy a prompt, paste it into your assistant, and practise in tandem. Each one keeps you active (explain it back, get drilled, or build) rather than just reading.

  • Drill it like the exam (scenario MCQs)
    Practice in the exam's scenario-MCQ format with trap awareness.
  • Explain it back (Feynman)
    Build durable, transferable understanding of a concept you can half-state.
  • Test me, adapting the difficulty
    Active recall practice on a concept you think you know.
  • Check my prerequisites first
    Before studying a concept that keeps not sticking.
  • Find the high-leverage 20%
    When a domain feels too big and you are short on time.
Self-check

Test yourself

Three diagnostic questions on this primitive.

Q1. System prompt says "be conservative with refunds." Production shows 5% policy violations. Why?
Natural-language guidance is advice the model can fail to follow under pressure. Encode the policy in tool code (if amount > 500: deny). System prompt = macro guidance; tools = enforcement.
Q2. Vague system prompt vs precise five-section charter: how much accuracy difference?
A 30-60% reduction in misrouting and off-policy behavior, by this guide's figures. A precise charter (role, boundaries, format, tool guidance, examples) clearly outperforms "be helpful." Every section is load-bearing.
Q3. Update the system prompt mid-conversation: what happens?
Each call re-reads the system prompt, so the new prompt takes effect on the next call. That creates inconsistency mid-conversation: turn 5 under the old prompt and turn 6 under the new one produce shifting behavior. Define the full charter upfront.
Last reviewed: 2026-05-04·Refresh cadence: monthly
D4.1 · D4 · Prompt Engineering

System Prompts & Instructions, complete.

You've covered the full ten-section breakdown for this primitive: definition, mechanics, code, false positives, comparison, decision tree, exam patterns, and FAQ. One technical primitive down on the path to CCA-F.