System Prompts & Instructions (D4, 20% of CCA-F) - Claude Architect Concept

01 · Summary

TLDR

System prompts establish role, constraints, format, and tool guidance. Anatomy: role, capability boundaries, style/format rules, tool guidance, examples. messages API system field

5

Anatomy parts

D4

Exam domain

B

Coverage tier

vague role

Common trap

concise

Best length

02 · Definition

What it is

A system prompt is the foundational instruction that shapes Claude's entire behavior for a conversation. Sent with every messages.create() call, it lives at the top of the message list and persists across all turns. Unlike user messages (ephemeral within a turn), the system prompt anchors the role, boundaries, format, tool guidance, and example patterns for every response. It is the charter document that governs what Claude will and won't do.

The anatomy of a production system prompt has five load-bearing sections: role definition (who Claude is, what it's accountable for), task boundaries (what's in scope, what's forbidden), output format (response structure, JSON schemas), tool guidance (when to use which tools, ordering), and example patterns (2-3 concrete before/after pairs). A vague "be helpful" fails at scale; a precise charter reduces hallucination, misrouting, and off-policy behavior by 30-60% depending on stakes.

The contract is structural, not linguistic. A system prompt is not persuasion ("please be thoughtful") or hope ("avoid errors"). It is a policy document the model reads and uses to allocate attention. Every phrase is load-bearing. "Do not refund above $500" is ignored without a hook; if amount > 500: deny in tool code is enforced 100%. Deterministic enforcement beats prompt guidance for high-stakes decisions.

Production failures cluster around two gaps: role-boundary confusion ("be conservative" without specifying what that means) and missing example patterns (the model guesses style from tone). The exam drills both. A common distractor: "upgrade the model to fix policy compliance." The real fix is a system prompt that embeds policy into tool descriptions, not as linguistic guidance.

03 · Mechanics

How it works

Every messages.create() call includes a system parameter. Claude reads it first, before the message list. It shapes internal planning: what goals to optimize, which tools are available, what to do with ambiguity. The system prompt is stateless across turns: each turn re-reads it, so updates are immediately visible. Unlike user context that accumulates, the system prompt is a constant reset.

The five-section anatomy is signal compression. Role tells Claude its job ("You are a refund agent..."). Boundaries enumerate allowed/forbidden actions. Format specifies the shape (JSON with {status, amount}). Tool guidance is the when-to-use logic ("Always call verify_customer first"). Examples show before/after pairs: customer requests $600 refund → worked example showing the reasoning.

The system prompt interacts with tool descriptions to form a two-layer enforcement model. System prompt is macro policy (roles, broad guardrails). Tool descriptions are micro policy (when to call X, what inputs are valid). Without both, enforcement is incomplete. A system prompt saying "always verify first" plus a tool description saying "verify_customer: Use first" is redundant but safe. Vague tool descriptions create misrouting.

Caching the system prompt is the highest-leverage optimization. The system prompt is read-only across turns, so mark with cache_control: {type: "ephemeral"}. Cache persists 5 minutes, every subsequent call pays ~90% less. For a 1000-token system prompt called 10 times, save 9000 tokens. Caching is free and automatic; just add the annotation.

System Prompts & Instructions mechanics, painterly diagram featuring Loop mascot.

04 · In production

Where you'll see it

Customer support refund agent

Five-section system prompt: role ("refund policy enforcer"), boundaries ("never >$500 lifetime"), format (JSON), tool guidance ("verify → lookup → process"), examples ($600 denied because lifetime $300 + $600 = $900). Agent respects $500 cap 99%+ of time. Without system prompt, model guesses.

Multi-turn research synthesis with citation discipline

Coordinator's system prompt defines: role (orchestrator), boundaries (cite sources; never aggregate without attribution), format (JSON array of findings with provenance), tool guidance (subagents do NOT inherit history), examples. Re-read every turn, citation discipline holds across 6-turn conversation.

Structured data extraction with validation

Invoice pipeline's system prompt covers role (extract without hallucination), boundaries (return null if absent; never invent), format (JSON schema), tool guidance (use mark_entity per field), examples (3 invoices showing correct extraction). Cached, so 1000 extraction requests pay once for the prompt.

Code review bot in CI/CD

GitHub Actions invokes Claude with claude -p. System prompt defines role (security + perf reviewer), boundaries (only flag with test cases; no speculation), format (JSON {verdict, critical_issues, minor_issues}), tool guidance (Read + Grep, no Edit), examples (2 PR diffs). CI parses JSON deterministically.

05 · Implementation

Code examples

Five-section system prompt with caching

from anthropic import Anthropic

client = Anthropic()

SYSTEM_PROMPT = """# Refund Agent Charter

## 1. Role Definition
You are a customer support agent for refund processing. Accountable for policy compliance and customer experience.

## 2. Task Boundaries
- ALWAYS call verify_customer first
- Refunds <= $500 lifetime: ALLOWED
- Refunds > $500 lifetime: DENIED, escalate to human
- Never negotiate the $500 policy

## 3. Output Format
JSON: { status: "approved"|"denied"|"escalated", amount, reason, customer_message }

## 4. Tool Guidance
Order: verify_customer -> lookup_order -> process_refund
Skip verification = escalate.

## 5. Example Patterns
Example 1 (approve): $247.83 + prior $150 = $397.83 < $500 -> approve
Example 2 (deny): $600 + prior $300 = $900 > $500 -> deny + escalate
Example 3 (explicit): "speak to manager" -> escalate immediately
"""

def run_agent(user_msg: str, tools: list):
    messages = [{"role": "user", "content": user_msg}]
    for i in range(10):
        resp = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=1024,
            # Cache the system prompt: ~90% savings on subsequent calls
            system=[{
                "type": "text",
                "text": SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},
            }],
            tools=tools,
            messages=messages,
        )
        if resp.stop_reason == "end_turn":
            return resp.content[0].text
        # ... handle tool_use, append tool_result ...
    return "max_iterations"

Five sections: role, boundaries, format, tool guidance, examples. Cache_control: ephemeral on system saves ~90% per turn after the first.

06 · Distractor patterns

Looks right, isn't

Each row pairs a plausible-looking pattern with the failure it actually creates. These are the shapes exam distractors are built from.

Looks right

Write the policy as natural-language guidance: 'be conservative with refunds.'

Actually wrong

Natural-language guidance is advice the model ignores under load. Only deterministic tool code (checking $500 in process_refund) enforces 100%.

Looks right

Update the system prompt mid-conversation to clarify ambiguity.

Actually wrong

Each call re-reads the system prompt; updating is possible but creates inconsistency. Define the full charter upfront.

Looks right

System prompt can replace tool governance; don't burden tool descriptions.

Actually wrong

Two-layer enforcement: system prompt = macro policy, tool descriptions = micro policy. Omitting either creates gaps.

Looks right

Longer system prompts give Claude more context.

Actually wrong

5,000+ token prompts cause attention dilution. Tight 500-1500 token charters outperform verbose. Every section must be load-bearing.

Looks right

Skip example patterns; they're noise.

Actually wrong

Example patterns are the highest-ROI section. A before/after pair reduces misclassification by 30-60%. 2-3 examples is right.

07 · Compare

Side-by-side

Aspect	System Prompt	Tool Description	User Instruction	Cached vs Fresh
Scope	Entire conversation	Per-tool when/how	Single-turn	Cached: constant
Persistence	Constant across turns	Constant	Turn-specific	Cached: 5-min TTL
Enforcement	Policy guidance (medium)	Structural (high)	Linguistic hint (low)	Cached: ~90% cheaper
Update timing	Next call	Next call	Immediate	Cached: post-expiry
Best for	Role, boundaries, examples	Routing, validation	Clarification	High-volume loops
Token cost	100% fresh, 10% cached	Not cached	Per turn	Cached: 10 vs 100

08 · When to use

Decision tree

01

Building an agentic loop running 5+ turns?

YesUse a five-section system prompt with cache_control: ephemeral. Save 90%.

NoSingle-turn call: caching marginal.

02

Is there a policy that must be enforced 100% of the time?

YesEncode in tool code (if amount > 500: deny), not system prompt. System prompts are guidance; tools are enforcement.

NoSystem prompt guidance is acceptable.

03

Struggling with the agent misrouting?

YesFix tool descriptions, not system prompt. Add when-to-use clarity and edge-cases.

NoRouting healthy; focus on examples.

04

Multi-step task with mandatory ordering?

YesDocument in tool guidance section. Re-read every turn.

NoOrdering flexible.

05

Need output in specific format?

YesDefine in output format section. Pair with example patterns showing exact format.

NoNatural-language fine; format section still useful for consistency.

09 · On the exam

Question patterns

System Prompts & Instructions exam trap, painterly cautionary scene featuring Loop mascot.

56 V2 questions wired to this concept. Tap an answer to check it instantly — you'll see whether it's right and why — then expand the full breakdown for the mental model and all four rationales.

Two of your tools have similar names (fetch_data and get_data). The model picks the wrong one 30% of the time. What is the best first fix?

Tap your answer to check it.

A research subagent ran for 40 turns and returned a perfect summary, but the bill is huge. What is the architectural fix?

Tap your answer to check it.

Cross-task context like a vendor matrix path should live in: project Instructions or session messages?

Tap your answer to check it.

Your agent calls the wrong tool 30% of the time across 8 similar tools. What is the first fix?

Tap your answer to check it.

Your refund flow uses tool_choice: any to guarantee a tool call. The agent calls lookup_order instead of verify_customer first. Fix?

Tap your answer to check it.

The user asks a simple question and your forced tool_choice calls process_refund with empty input. What went wrong?

Tap your answer to check it.

50 additional questions for this concept live in the practice pillar. Take a mock exam ↗

10 · FAQ

Frequently asked

Difference between system prompt and user message?

System prompt is constant and re-read every turn (the charter). User message is a single-turn request that accumulates in history.

How long should a system prompt be?

500-1500 tokens. Every section must be load-bearing: role, boundaries, format, tool guidance, examples.

Can I have multiple system prompts in the same conversation?

No. One system per messages.create() call. To shift roles, append a user message or spawn a subagent.

How does caching work?

Mark system with cache_control: ephemeral. Cached 5 minutes. Subsequent calls pay ~90% less. After 5 min or new conversation, expires.

System prompts or tool descriptions for policy enforcement?

Both. System = macro policy, tool descriptions = micro policy. Use deterministic tool code for 100% enforcement.

What if I update the system prompt between calls?

New prompt used on next call. Creates inconsistency mid-conversation. Define charter upfront.

Can system prompts replace few-shot examples in tool descriptions?

Partially. System can include 2-3 example patterns. Tool descriptions can include input examples. Both useful; different purposes.

Do I re-send the system prompt every call?

Yes. With cache_control: ephemeral, you only pay once per 5-minute window.

What if system prompt conflicts with tool descriptions?

Tool descriptions take precedence (structural). System prompt is linguistic. Align them to avoid confusion.

Use system prompt instead of agentic loop?

No. System prompt defines role; agentic loop is the control structure. Need both.

11 · Practice with AI

Work this with your AI

Work this concept hands-on with Claude Code, Codex, or claude.ai. Copy a prompt, paste it into your assistant, and practise in tandem. Each one keeps you active (explain it back, get drilled, or build) rather than just reading.

Drill it like the exam (scenario MCQs)
Practice in the exam's scenario-MCQ format with trap awareness.
Explain it back (Feynman)
Build durable, transferable understanding of a concept you can half-state.
Test me, adapting the difficulty
Active recall practice on a concept you think you know.
Check my prerequisites first
Before studying a concept that keeps not sticking.
Find the high-leverage 20%
When a domain feels too big and you are short on time.

System Prompts & Instructions.

TLDR

What it is

How it works

Where you'll see it

Customer support refund agent

Multi-turn research synthesis with citation discipline

Structured data extraction with validation

Code review bot in CI/CD

Code examples

Looks right, isn't

Side-by-side

Decision tree

Building an agentic loop running 5+ turns?

Is there a policy that must be enforced 100% of the time?

Struggling with the agent misrouting?

Multi-step task with mandatory ordering?

Need output in specific format?

Question patterns

Frequently asked

Work this with your AI

Test yourself

System Prompts & Instructions, complete.

System Prompts & Instructions.

TLDR

What it is

How it works

Where you'll see it

Customer support refund agent

Multi-turn research synthesis with citation discipline

Structured data extraction with validation

Code review bot in CI/CD

Code examples

Looks right, isn't

Side-by-side

Decision tree

Building an agentic loop running 5+ turns?

Is there a policy that must be enforced 100% of the time?

Struggling with the agent misrouting?

Multi-step task with mandatory ordering?

Need output in specific format?

Question patterns

Frequently asked

Work this with your AI

Test yourself

System Prompts & Instructions, complete.

Share this primitive