# System Prompts & Instructions

> System prompts establish role, constraints, format, and tool guidance. Anatomy: role, capability boundaries, style/format rules, tool guidance, examples.

**Domain:** D4 · Prompt Engineering (20% of CCA-F exam)
**Canonical:** https://claudearchitectcertification.com/concepts/system-prompts
**Last reviewed:** 2026-05-04

## Quick stats

- **Anatomy parts:** 5
- **Exam domain:** D4
- **Coverage tier:** B
- **Common trap:** vague role
- **Best length:** concise

## What it is

A system prompt is the foundational instruction that shapes Claude's entire behavior for a conversation. Sent with every messages.create() call as the separate system parameter (it is not part of the message list), it persists across all turns. Unlike user messages (ephemeral within a turn), the system prompt anchors the role, boundaries, format, tool guidance, and example patterns for every response. It is the charter document that governs what Claude will and won't do.

The anatomy of a production system prompt has five load-bearing sections: role definition (who Claude is, what it's accountable for), task boundaries (what's in scope, what's forbidden), output format (response structure, JSON schemas), tool guidance (when to use which tools, ordering), and example patterns (2-3 concrete before/after pairs). A vague "be helpful" fails at scale; a precise charter reduces hallucination, misrouting, and off-policy behavior by 30-60% depending on stakes.

The contract is structural, not linguistic. A system prompt is not persuasion ("please be thoughtful") or hope ("avoid errors"). It is a policy document the model reads and uses to allocate attention, and every phrase is load-bearing. A prompt line like "Do not refund above $500" can still be ignored when nothing structural backs it up; a code-level check (if amount > 500: deny) in the tool is enforced 100% of the time. Deterministic enforcement beats prompt guidance for high-stakes decisions.
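
A minimal sketch of that contrast, assuming a hypothetical process_refund handler and a lifetime_refunds figure supplied by the lookup step; the guard in code fires every time, no matter how the prompt was worded:

```python
# Hypothetical tool handler: the $500 policy is enforced in code, not in prose.
LIFETIME_CAP = 500.00

def process_refund(customer_id: str, amount: float, lifetime_refunds: float) -> dict:
    """Execute the refund tool call; the cap check holds 100% of the time."""
    if lifetime_refunds + amount > LIFETIME_CAP:
        # Deterministic denial, returned as the tool_result the model must report.
        return {"status": "denied", "reason": f"lifetime total would exceed ${LIFETIME_CAP:.0f}"}
    return {"status": "approved", "amount": amount, "customer_id": customer_id}
```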

Production failures cluster around two gaps: role-boundary confusion ("be conservative" without specifying what that means) and missing example patterns (the model guesses style from tone). The exam drills both. A common distractor: "upgrade the model to fix policy compliance." The real fix is a system prompt that embeds policy into tool descriptions, not as linguistic guidance.

## How it works

Every messages.create() call includes a system parameter. Claude reads it first, before the message list. It shapes internal planning: which goals to optimize, which tools are available, and what to do with ambiguity. The system prompt is stateless across turns: each call re-sends and re-reads it, so an updated prompt takes effect on the very next call. Unlike user context, which accumulates, the system prompt is a constant reset.

The five-section anatomy is signal compression. Role tells Claude its job ("You are a refund agent..."). Boundaries enumerate allowed/forbidden actions. Format specifies the shape (JSON with {status, amount}). Tool guidance is the when-to-use logic ("Always call verify_customer first"). Examples show before/after pairs: customer requests $600 refund → worked example showing the reasoning.

The system prompt interacts with tool descriptions to form a two-layer enforcement model. System prompt is macro policy (roles, broad guardrails). Tool descriptions are micro policy (when to call X, what inputs are valid). Without both, enforcement is incomplete. A system prompt saying "always verify first" plus a tool description saying "verify_customer: Use first" is redundant but safe. Vague tool descriptions create misrouting.
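
A sketch of the micro-policy layer, using hypothetical verify_customer and process_refund definitions for the tools parameter; the when-to-use logic lives in each description and mirrors the macro policy in the system prompt:

```python
# Tool definitions passed as tools=TOOLS; descriptions carry the micro policy.
TOOLS = [
    {
        "name": "verify_customer",
        "description": (
            "Verify the customer's identity. ALWAYS call this first, before any "
            "other tool. If verification fails, stop and escalate to a human."
        ),
        "input_schema": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
    {
        "name": "process_refund",
        "description": (
            "Issue a refund only after verify_customer and lookup_order succeed. "
            "Refunds that push the lifetime total over $500 are rejected in code."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "amount": {"type": "number"},
            },
            "required": ["customer_id", "amount"],
        },
    },
]
```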

Caching the system prompt is the highest-leverage optimization. The system prompt is read-only across turns, so mark it with cache_control: {type: "ephemeral"}. The cache persists for 5 minutes (refreshed on each hit), and every subsequent read pays ~90% less. For a 1000-token system prompt called 10 times within that window, the nine repeat reads save roughly 8,000 tokens' worth of input cost; the first call pays a small cache-write premium (~25% over normal input). Beyond that premium, caching requires no other changes; just add the annotation.
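
To confirm the cache is actually being hit, the usage block on each response reports cache writes and reads. A minimal sketch, reusing the model ID from the examples below; the short placeholder text here is only illustrative, since blocks under the ~1024-token minimum are not cached:

```python
from anthropic import Anthropic

client = Anthropic()

resp = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=256,
    system=[{
        "type": "text",
        "text": "You are a refund agent. ...",  # full five-section charter in practice
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Refund $40 for order 1234?"}],
)

# First call in the window writes the cache; later calls read it.
print(resp.usage.cache_creation_input_tokens)  # > 0 on the cache write
print(resp.usage.cache_read_input_tokens)      # > 0 on subsequent hits
```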

## Where you'll see it in production

### Customer support refund agent

Five-section system prompt: role ("refund policy enforcer"), boundaries ("never >$500 lifetime"), format (JSON), tool guidance ("verify → lookup → process"), examples ($600 denied because lifetime $300 + $600 = $900). Agent respects $500 cap 99%+ of time. Without system prompt, model guesses.

### Multi-turn research synthesis with citation discipline

Coordinator's system prompt defines: role (orchestrator), boundaries (cite sources; never aggregate without attribution), format (JSON array of findings with provenance), tool guidance (subagents do NOT inherit history), examples. Re-read every turn, citation discipline holds across 6-turn conversation.
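
A sketch of the "subagents do not inherit history" rule, with a hypothetical SUBAGENT_PROMPT and run_subagent helper; each subagent gets its own charter and a fresh message list containing only the delegated task:

```python
from anthropic import Anthropic

client = Anthropic()

SUBAGENT_PROMPT = (
    "You are a research subagent. Return a JSON array of findings; every finding "
    "must include a source URL. Omit any claim you cannot attribute."
)

def run_subagent(task: str) -> str:
    """The subagent sees only its charter and the task, never the coordinator's history."""
    resp = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        system=SUBAGENT_PROMPT,                         # its own charter
        messages=[{"role": "user", "content": task}],   # fresh history
    )
    return resp.content[0].text
```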

### Structured data extraction with validation

Invoice pipeline's system prompt covers role (extract without hallucination), boundaries (return null if absent; never invent), format (JSON schema), tool guidance (use mark_entity per field), examples (3 invoices showing correct extraction). Cached, so 1000 extraction requests pay the full prompt cost once and ~10% thereafter.
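
A sketch of the validation side, assuming a hypothetical REQUIRED_FIELDS list; the system prompt promises null for absent fields, and the pipeline checks that promise before accepting the output:

```python
import json

REQUIRED_FIELDS = ["invoice_number", "vendor", "total", "due_date"]

def validate_extraction(raw_json: str) -> dict:
    """Parse the model's JSON reply and verify every required key is present.
    A null value is acceptable; a missing key or an invented extra key is not."""
    data = json.loads(raw_json)
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    unexpected = [k for k in data if k not in REQUIRED_FIELDS]
    if missing or unexpected:
        raise ValueError(f"schema violation: missing={missing}, unexpected={unexpected}")
    return data
```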

### Code review bot in CI/CD

GitHub Actions invokes Claude with claude -p. System prompt defines role (security + perf reviewer), boundaries (only flag with test cases; no speculation), format (JSON {verdict, critical_issues, minor_issues}), tool guidance (Read + Grep, no Edit), examples (2 PR diffs). CI parses JSON deterministically.

## Code examples

### Five-section system prompt with caching

**Python:**

```python
from anthropic import Anthropic

client = Anthropic()

SYSTEM_PROMPT = """# Refund Agent Charter

## 1. Role Definition
You are a customer support agent for refund processing. Accountable for policy compliance and customer experience.

## 2. Task Boundaries
- ALWAYS call verify_customer first
- Refunds <= $500 lifetime: ALLOWED
- Refunds > $500 lifetime: DENIED, escalate to human
- Never negotiate the $500 policy

## 3. Output Format
JSON: { status: "approved"|"denied"|"escalated", amount, reason, customer_message }

## 4. Tool Guidance
Order: verify_customer -> lookup_order -> process_refund
Skip verification = escalate.

## 5. Example Patterns
Example 1 (approve): $247.83 + prior $150 = $397.83 < $500 -> approve
Example 2 (deny): $600 + prior $300 = $900 > $500 -> deny + escalate
Example 3 (explicit): "speak to manager" -> escalate immediately
"""

def run_agent(user_msg: str, tools: list):
    messages = [{"role": "user", "content": user_msg}]
    for i in range(10):
        resp = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=1024,
            # Cache the system prompt: ~90% savings on subsequent calls
            system=[{
                "type": "text",
                "text": SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},
            }],
            tools=tools,
            messages=messages,
        )
        if resp.stop_reason == "end_turn":
            return resp.content[0].text
        # ... handle tool_use, append tool_result ...
    return "max_iterations"
```

> Five sections: role, boundaries, format, tool guidance, examples. cache_control: {"type": "ephemeral"} on the system block saves ~90% per turn after the first.

**TypeScript:**

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const SYSTEM_PROMPT = `# Refund Agent Charter

## 1. Role
You are a refund agent. Accountable for policy + customer experience.

## 2. Boundaries
- ALWAYS verify_customer first
- <= $500: ALLOWED. > $500: DENIED + escalate
- Never negotiate

## 3. Format
JSON: { status, amount, reason, customer_message }

## 4. Tool Guidance
verify_customer -> lookup_order -> process_refund

## 5. Examples
Approve: $247 + prior $150 = $397 < $500
Deny: $600 + prior $300 = $900 > $500
Escalate: explicit "want a manager"
`;

async function runAgent(userMsg: string, tools: Anthropic.Tool[]) {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: userMsg }];
  for (let i = 0; i < 10; i++) {
    const resp = await client.messages.create({
      model: "claude-opus-4-5",
      max_tokens: 1024,
      system: [{
        type: "text",
        text: SYSTEM_PROMPT,
        cache_control: { type: "ephemeral" },
      }] as any,
      tools,
      messages,
    });
    if (resp.stop_reason === "end_turn") {
      return resp.content[0].type === "text" ? resp.content[0].text : "";
    }
    // ... handle tool_use, append tool_result ...
  }
  return "max_iterations";
}
```

> Same five-section pattern in TypeScript. Array syntax required for cache_control. Cache TTL = 5 minutes default.

## Looks-right vs actually-wrong

| Looks right | Actually wrong |
|---|---|
| Write the policy as natural-language guidance: 'be conservative with refunds.' | Natural-language guidance is advice the model ignores under load. Only deterministic tool code (checking $500 in process_refund) enforces 100%. |
| Update the system prompt mid-conversation to clarify ambiguity. | Each call re-reads the system prompt; updating is possible but creates inconsistency. Define the full charter upfront. |
| System prompt can replace tool governance; don't burden tool descriptions. | Two-layer enforcement: system prompt = macro policy, tool descriptions = micro policy. Omitting either creates gaps. |
| Longer system prompts give Claude more context. | 5,000+ token prompts cause attention dilution. Tight 500-1500 token charters outperform verbose ones. Every section must be load-bearing. |
| Skip example patterns; they're noise. | Example patterns are the highest-ROI section. A before/after pair reduces misclassification by 30-60%. 2-3 examples is right. |

## Comparison

| Aspect | System Prompt | Tool Description | User Instruction | Cached vs Fresh |
| --- | --- | --- | --- | --- |
| Scope | Entire conversation | Per-tool when/how | Single-turn | Cached: constant |
| Persistence | Constant across turns | Constant | Turn-specific | Cached: 5-min TTL |
| Enforcement | Policy guidance (medium) | Structural (high) | Linguistic hint (low) | Cached: ~90% cheaper |
| Update timing | Next call | Next call | Immediate | Cached: post-expiry |
| Best for | Role, boundaries, examples | Routing, validation | Clarification | High-volume loops |
| Token cost | 100% fresh, ~10% cached | Not cached | Per turn | Cached ~10% vs fresh 100% |

## Decision tree

1. **Building an agentic loop running 5+ turns?**
   - **Yes:** Use a five-section system prompt with cache_control: ephemeral. Saves ~90% on those tokens after the first call.
   - **No:** Single-turn call: caching marginal.

2. **Is there a policy that must be enforced 100% of the time?**
   - **Yes:** Encode in tool code (if amount > 500: deny), not system prompt. System prompts are guidance; tools are enforcement.
   - **No:** System prompt guidance is acceptable.

3. **Struggling with the agent misrouting?**
   - **Yes:** Fix tool descriptions, not the system prompt. Add when-to-use clarity and edge cases.
   - **No:** Routing healthy; focus on examples.

4. **Multi-step task with mandatory ordering?**
   - **Yes:** Document in tool guidance section. Re-read every turn.
   - **No:** Ordering flexible.

5. **Need output in specific format?**
   - **Yes:** Define in output format section. Pair with example patterns showing exact format.
   - **No:** Natural-language fine; format section still useful for consistency.

## Exam-pattern questions

### Q1. System prompt says "be conservative with refunds." Production shows 5% policy violations. Why?

Natural-language guidance is advice the model ignores under pressure. Encode the policy in tool code (if amount > 500: deny). System prompt = macro guidance; tools = enforcement.

### Q2. Vague system prompt vs precise five-section charter: how much accuracy difference?

30-60% reduction in misrouting and off-policy behavior. A precise charter (role, boundaries, format, tool guidance, examples) substantially outperforms "be helpful." Every section is load-bearing.

### Q3. Update the system prompt mid-conversation: what happens?

Each call re-reads the system prompt; the new prompt takes effect on the next call. Creates inconsistency mid-conversation: turn 5 with old prompt vs turn 6 with new prompt produces shifting behavior. Define the full charter upfront.

### Q4. System prompt is 5,000 tokens long. What's the issue?

Attention dilution. Tight 500-1500 token charters outperform verbose ones. The model allocates less attention per section when the prompt bloats. Every section must be load-bearing; prune the rest.

### Q5. Why are example patterns the highest-ROI section of a system prompt?

Two or three before/after examples reduce misclassification by 30-60%. Examples show Claude exactly what success looks like in concrete cases. Vague descriptions can't substitute for one worked example.

### Q6. System prompt + tool descriptions: which takes precedence on routing?

Tool descriptions take precedence (structural enforcement via SDK). System prompt is linguistic guidance. Align them to avoid confusion: prompt says "verify first"; tool description says "verify_customer: Use first." Redundant but safe.

### Q7. Cache the system prompt: when is it not worth it?

Below the ~1024-token minimum cacheable length (the block simply isn't cached) or for one-off requests (no repeat reads to discount). For loops repeating the same system + tools, caching saves ~90% on those tokens per turn.

### Q8. Use a system prompt instead of an agentic loop for complex tasks: viable?

No. System prompt defines role and rules. Agentic loop is the control structure (the while block that re-sends messages on tool_use). They're orthogonal; you need both.

## FAQ

### Q1. Difference between system prompt and user message?

System prompt is constant and re-read every turn (the charter). User message is a single-turn request that accumulates in history.

### Q2. How long should a system prompt be?

500-1500 tokens. Every section must be load-bearing: role, boundaries, format, tool guidance, examples.

### Q3. Can I have multiple system prompts in the same conversation?

No. One system per messages.create() call. To shift roles, append a user message or spawn a subagent.

### Q4. How does caching work?

Mark system with cache_control: ephemeral. Cached for 5 minutes, with the TTL refreshed on each hit. Subsequent calls pay ~90% less on the cached tokens; the cache expires after 5 minutes without a hit.

### Q5. System prompts or tool descriptions for policy enforcement?

Both. System = macro policy, tool descriptions = micro policy. Use deterministic tool code for 100% enforcement.

### Q6. What if I update the system prompt between calls?

New prompt used on next call. Creates inconsistency mid-conversation. Define charter upfront.

### Q7. Can system prompts replace few-shot examples in tool descriptions?

Partially. System can include 2-3 example patterns. Tool descriptions can include input examples. Both useful; different purposes.

### Q8. Do I re-send the system prompt every call?

Yes. You re-send it on every call, but with cache_control: ephemeral you pay full price only on the first call in each 5-minute window and ~10% on reads after that.

### Q9. What if system prompt conflicts with tool descriptions?

Tool descriptions take precedence (structural). System prompt is linguistic. Align them to avoid confusion.

### Q10. Use system prompt instead of agentic loop?

No. System prompt defines role; agentic loop is the control structure. Need both.

---

**Source:** https://claudearchitectcertification.com/concepts/system-prompts
**Vault sources:** ACP-T03 §4.4 prompt engineering; ACP-T04 §6.B system prompt anatomy; ASC-A01 Course 6
**Last reviewed:** 2026-05-04

**Evidence tiers** — 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed.
