You'll walk away with
- How client.messages.create() works, including system prompts, temperature, streaming, and structured outputs
- How to build a prompt-evaluation pipeline with model-based and code-based grading on a curated test set
- Six prompt-engineering techniques: clarity, specificity, XML tags, examples, role, decomposition
- How tool use works end-to-end: schemas, message blocks, tool results, multi-turn tool calling, fine-grained streaming
- How RAG composes (chunking, embeddings, BM25, multi-index reranking) and how prompt caching, vision, citations stack on top
- How MCP exposes tools/resources/prompts, and when to choose workflows (chaining, routing, parallelization) versus full agents
Read these first
Lesson outline
Every lesson from Building with the Claude API, with our one-line simplification. The Skilljar course is the source; we summarize.
| # | Skilljar lesson | Our simplification |
|---|---|---|
| 1 | Welcome to the course | Course frame: bottom-up API tour from first request to full agent architectures. |
| 2 | Overview of Claude models | Opus, Sonnet, Haiku tiers; pick by capability vs. cost vs. latency tradeoff. |
| 3 | Accessing the API | Anthropic Console, direct API, AWS Bedrock, Google Vertex; same model, different access path. |
| 4 | Getting an API key | Generate a key from console.anthropic.com; never commit it; use .env plus python-dotenv. |
| 5 | Making a request | client.messages.create(model, max_tokens, messages); the three required params; max_tokens is a safety cap, not a target. |
| 6 | Multi-turn conversations | Append assistant responses to your messages list; the model is stateless, the conversation is your responsibility. |
| 7 | Chat exercise | Hands-on: build a minimal CLI chatbot with multi-turn message history. |
| 8 | System prompts | Pass system="..." to set role, behavior, constraints; system prompts are not in the messages array. |
| 9 | System prompts exercise | Hands-on: write a system prompt that turns Claude into a Socratic math tutor without giving direct answers. |
| 10 | Temperature | 0.0 = deterministic and focused; 1.0 = creative and varied. Default 1.0; lower for extraction, higher for ideation. |
| 11 | Course satisfaction survey | Mid-course survey checkpoint, no technical content. |
| 12 | Response streaming | Use stream=True and iterate events; ship tokens to the UI as they arrive instead of waiting for the full response. |
| 13 | Structured data | Ask for JSON in the prompt and parse; for strict schemas use tool-calling as a structured-output mechanism. |
| 14 | Structured data exercise | Hands-on: extract structured fields (name, date, amount) from unstructured invoice text. |
| 15 | Quiz on accessing Claude with the API | Knowledge check covering messages, system prompts, temperature, streaming, structured data. |
| 16 | Prompt evaluation | Don't ship a prompt without evaluating it; evals are the difference between a demo and a system. |
| 17 | A typical eval workflow | Generate test dataset, run prompt against each, grade outputs, score, iterate. |
| 18 | Generating test datasets | Use Claude itself to generate diverse test inputs covering edge cases, then hand-curate. |
| 19 | Running the eval | Run prompt against the test set in parallel with rate-limit handling; collect raw outputs. |
| 20 | Model-based grading | Use a grader prompt (often a stronger model) to score outputs against rubric; cheap and flexible. |
| 21 | Code-based grading | Programmatic checks: regex, schema validation, exact match. Use when criteria are mechanical, model-grade when subjective. |
| 22 | Exercise on prompt evals | Hands-on: build an eval pipeline for a meal-plan prompt with mixed code-based and model-based grading. |
| 23 | Quiz on prompt evaluation | Knowledge check covering eval workflow, dataset generation, model vs. code grading. |
| 24 | Prompt engineering | Iterative loop: goal → initial prompt → eval → apply technique → re-eval; repeat until you hit your bar. |
| 25 | Being clear and direct | Tell Claude exactly what you want; ambiguity is the most common failure mode. |
| 26 | Being specific | Concrete constraints beat abstract instructions; "three bullet points, max 15 words each" beats "be concise". |
| 27 | Structure with XML tags | Wrap inputs in <document>, <example>, <question> tags; Claude is trained to attend to XML structure. |
| 28 | Providing examples | Multishot beats explanation; one or two well-chosen examples shape output style faster than five paragraphs of description. |
| 29 | Exercise on prompting | Hands-on: take a weak prompt and apply the four engineering techniques to lift its eval score. |
| 30 | Quiz on prompt engineering techniques | Knowledge check covering clarity, specificity, XML, examples, role, decomposition. |
| 31 | Introducing tool use | Tool use lets Claude call your functions; you describe the tool, Claude decides when to invoke it, you run it and return the result. |
| 32 | Project overview | Course-long project frame: build a customer-data agent using tool use end-to-end. |
| 33 | Tool functions | Write the actual Python functions Claude will call; pure logic, no Anthropic-specific glue. |
| 34 | Tool schemas | JSON schema with name, description, input_schema; the description is what Claude uses to decide when to call. |
| 35 | Handling message blocks | Response content is a list of blocks: text, tool_use, thinking. Iterate the list, don't index [0]. |
| 36 | Sending tool results | After running the tool, send a tool_result block in the next user message with the matching tool_use_id. |
| 37 | Multi-turn conversations with tools | Loop: Claude requests tool → you run it → you append result → Claude responds or requests another tool. |
| 38 | Implementing multiple turns | Hands-on: write the agent loop with stop_reason == 'tool_use' as the continuation signal. |
| 39 | Using multiple tools | Pass an array of tools; Claude picks based on tool descriptions; use tool_choice to force or restrict selection. |
| 40 | Fine-grained tool calling | Stream tool inputs token-by-token as Claude generates them; useful for long arguments and progressive UIs. |
| 41 | The text edit tool | Anthropic-defined tool that lets Claude edit files via structured ops (view, create, str_replace, insert). |
| 42 | The web search tool | Anthropic-hosted web search tool; Claude issues search queries and you get cited results back. |
| 43 | Quiz on tool use with Claude | Knowledge check covering schemas, message blocks, tool results, multi-tool selection, fine-grained streaming. |
| 44 | Introducing retrieval augmented generation | RAG = retrieve relevant context from your data, stuff it into the prompt, let Claude answer with citations. |
| 45 | Text chunking strategies | Split documents into chunks; size and overlap matter; semantic boundaries beat naive token splits. |
| 46 | Text embeddings | Vectorize chunks with an embedding model; cosine similarity finds semantically near chunks at query time. |
| 47 | The full RAG flow | Ingest → chunk → embed → store. Query → embed → search → rerank → stuff prompt → generate. |
| 48 | Implementing the RAG flow | Hands-on: build the ingest and query path with a vector DB and embedding API. |
| 49 | BM25 lexical search | Keyword-based ranking that complements semantic search; catches exact terms semantic search misses. |
| 50 | A multi-index RAG pipeline | Combine dense (embeddings) + sparse (BM25) retrieval and rerank; the production-quality pattern. |
| 51 | Extended thinking | thinking={'type': 'enabled', 'budget_tokens': N} gives Claude scratchpad reasoning before answering; use for hard problems. |
| 52 | Image support | Pass images as base64 or URL in message content; Claude reads diagrams, screenshots, charts, photos natively. |
| 53 | PDF support | Upload PDFs as document content blocks; Claude reads text and visual layout (tables, figures) together. |
| 54 | Citations | Citations API attaches source spans to Claude's claims so users can verify; built-in for document content blocks. |
| 55 | Prompt caching | Mark a prefix as cache_control to cache it; subsequent calls reuse the cached prefix at ~10% cost. |
| 56 | Rules of prompt caching | Cache breakpoints must be deterministic, ordered, and exact; caching breaks on any change above the breakpoint. |
| 57 | Prompt caching in action | Hands-on: cache a long system prompt + RAG context block; measure cost and latency reduction. |
| 58 | Code execution and the files API | Anthropic-hosted Python sandbox + file storage; Claude can run code and read its output back. |
| 59 | Quiz on features of Claude | Knowledge check covering extended thinking, vision, PDFs, citations, caching, code execution. |
| 60 | Introducing MCP | Model Context Protocol: open standard for LLM clients to discover and use tools, resources, and prompts from servers. |
| 61 | MCP clients | Clients (Claude Desktop, Claude Code, Cursor) speak MCP; same server works across all of them. |
| 62 | Project setup | Hands-on project frame: build an MCP server that exposes customer data to any MCP client. |
| 63 | Defining tools with MCP | Same tool concept as the API, exposed via the MCP tools/list and tools/call methods. |
| 64 | The server inspector | MCP Inspector is a dev tool that connects to your server and lets you call tools/resources/prompts manually. |
| 65 | Implementing a client | Build a Python MCP client that lists server tools and routes Claude's tool calls to the server. |
| 66 | Defining resources | Resources expose read-only context (files, records) the client can fetch on demand; not tool calls. |
| 67 | Accessing resources | Client lists resources, fetches by URI, includes content in the prompt; cleaner than ad-hoc context loading. |
| 68 | Defining prompts | MCP prompts are reusable prompt templates the server publishes; client invokes by name with arguments. |
| 69 | Prompts in the client | Hands-on: list and invoke server prompts from the Python client. |
| 70 | MCP review | Recap: tools = actions, resources = read-only context, prompts = reusable templates. Three primitives, one protocol. |
| 71 | Quiz on Model Context Protocol | Knowledge check covering tools, resources, prompts, client/server architecture. |
| 72 | Anthropic apps | Tour of Anthropic-built apps: Claude.ai, Claude Code, Computer Use; how each layers on top of the API. |
| 73 | Claude Code setup | Install Claude Code, point it at the repo, see CLAUDE.md and slash commands in their native habitat. |
| 74 | Claude Code in action | Live walkthrough of Claude Code editing a real codebase end-to-end; same harness available via SDK. |
| 75 | Enhancements with MCP servers | Add MCP servers (GitHub, Postgres, Sentry) to Claude Code or Claude Desktop and watch capability bloom. |
| 76 | Agents and workflows | Two architectures: workflows (predefined steps, model fills gaps) vs. agents (model drives flow, tool use loops). |
| 77 | Parallelization workflows | Run multiple LLM calls in parallel and aggregate; lowers latency on independent subtasks. |
| 78 | Chaining workflows | Sequential pipeline: output of one call is input to the next; classic prompt chain. |
| 79 | Routing workflows | Classify input, route to specialized prompt or tool; cheaper and more accurate than one giant prompt. |
| 80 | Agents and tools | True agents loop on tool use until stop_reason != 'tool_use'; the model decides when it's done. |
| 81 | Environment inspection | Give the agent a tool to inspect its environment (list files, read state) before it acts. |
| 82 | Workflows vs agents | Use workflows when steps are predictable; use agents when the path depends on what the model finds. |
| 83 | Quiz on agents and workflows | Knowledge check covering parallelization, chaining, routing, agent loops, when to choose each. |
| 84 | Final assessment | Course-final assessment integrating everything from messages through agents. |
| 85 | Course wrap-up | Recap and pointers to MCP Advanced, Subagents, Claude Code in Action as next courses. |
The course in 7 paragraphs
Building with the Claude API is the spine of the entire Skilljar catalog. Eighty-five lessons, fourteen sections, organized as a strict bottom-up build: messages → system prompts → streaming → structured outputs → evals → prompt engineering → tool use → RAG → features → MCP → agents and workflows. Each section assumes you have the prior. The course is heavy on hands-on Jupyter-notebook exercises and rewards code-along; passive watching loses ~60% of the value. If you can take only one Skilljar course before the exam, take this one, because it is the canonical reference every other course assumes.
The API surface itself collapses to one function and three primitives. The function is client.messages.create(model, max_tokens, messages, ...). The primitives layered on top are system prompt (role and behavior), temperature (0.0 deterministic to 1.0 creative), and streaming (token-by-token via stream=True). The conversation is *your* responsibility; Claude is stateless, you append assistant responses back into the messages list yourself. `max_tokens` is a safety cap, not a target; Claude doesn't try to fill it. Structured outputs come for free if you ask for JSON in the prompt, but for strict schemas the production-quality pattern is to use tool-calling as a structured-output mechanism, which the course transitions to in Section 6.
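Here is that base layer as one runnable sketch, assuming the anthropic Python SDK and an ANTHROPIC_API_KEY in your environment (loaded via .env + python-dotenv or exported directly); the model alias and prompts are placeholders, not course code.

```python
# Minimal multi-turn sketch with the anthropic Python SDK.
# Assumes ANTHROPIC_API_KEY is set in the environment; model name and prompts are placeholders.
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

messages = [{"role": "user", "content": "Give me three names for a coffee shop."}]

response = client.messages.create(
    model="claude-sonnet-4-0",
    max_tokens=1024,          # a safety cap, not a target
    system="You are a terse branding assistant.",
    temperature=0.2,          # lower = focused extraction, 1.0 = varied ideation
    messages=messages,
)

# Claude is stateless: keep the conversation going by appending the assistant turn yourself.
messages.append({"role": "assistant", "content": response.content})
messages.append({"role": "user", "content": "Make the second one more playful."})

followup = client.messages.create(
    model="claude-sonnet-4-0",
    max_tokens=1024,
    system="You are a terse branding assistant.",
    messages=messages,
)
print(followup.content[0].text)
```

The streaming lesson swaps this for stream=True and iterates the events as tokens arrive; nothing else about the call changes.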
Evaluation is the gate between a demo and a system, and prompt engineering is what you do once that gate is open. Section 4 (Lessons 16-23) walks the eval workflow in five steps: generate a diverse test dataset (use Claude itself, then hand-curate), run the prompt against each input in parallel with rate-limit handling, grade outputs (model-based for subjective criteria, code-based for mechanical ones), score, iterate. Section 5 (Lessons 24-30) layers prompt engineering on top: be clear and direct (ambiguity is the dominant failure mode), be specific (concrete constraints beat abstract instructions), use XML tags (<document>, <example>, <question>; Claude is trained to attend to them), provide examples (multishot beats explanation), define a role, and decompose multi-step reasoning. The five-step engineering loop (goal → prompt → eval → apply technique → re-eval) is *only* possible if you have the eval pipeline from Section 4 in place; without evals, every prompt change is vibes-based. Pair Lesson 27 (XML tags) with the docs.claude.com prompt-engineering guide; both are required reading for the exam's prompt-engineering domain.
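A compressed sketch of that eval loop, mixing one code-based check with one model-based grader. The test cases, rubric, thresholds, and grader-model alias below are illustrative placeholders, not the course's dataset.

```python
# Sketch of the Section 4 eval loop: run a prompt over a small test set, grade each
# output with a mechanical check plus a model-based grader, and report the scores.
import json
from anthropic import Anthropic

client = Anthropic()
MODEL = "claude-sonnet-4-0"
GRADER_MODEL = "claude-opus-4-0"   # pin the grader to a stronger model; alias illustrative

test_cases = [
    {"input": "2 adults, vegetarian, 30-minute dinners", "must_include": "vegetarian"},
    {"input": "1 adult, high protein, no dairy", "must_include": "protein"},
]

def run_prompt(user_input: str) -> str:
    resp = client.messages.create(
        model=MODEL,
        max_tokens=500,
        system="You are a meal-planning assistant. Answer in three bullet points.",
        messages=[{"role": "user", "content": user_input}],
    )
    return resp.content[0].text

def code_grade(output: str, case: dict) -> bool:
    # Mechanical criteria: keyword presence and a length bound.
    return case["must_include"].lower() in output.lower() and len(output) < 1200

def model_grade(output: str, case: dict) -> int:
    # Model-based grading against a rubric; for strict JSON use tool-calling (Section 6).
    grader = client.messages.create(
        model=GRADER_MODEL,
        max_tokens=100,
        messages=[{"role": "user", "content":
            f"Rubric: relevant, specific, exactly three bullets.\nInput: {case['input']}\n"
            f'Output: {output}\nReply with JSON like {{"score": 1-5}} and nothing else.'}],
    )
    return json.loads(grader.content[0].text)["score"]

results = [{"passed": code_grade(out := run_prompt(c["input"]), c),
            "score": model_grade(out, c)} for c in test_cases]
print(f"code-pass rate: {sum(r['passed'] for r in results) / len(results):.0%}")
print(f"mean model score: {sum(r['score'] for r in results) / len(results):.1f}")
```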
Tool use is the longest section (Lessons 31-43) and the most tested concept on the exam. The mental model: you describe tools to Claude (name, description, JSON input schema), Claude decides when to invoke them, you actually run them, you send the result back via a tool_result block, repeat. The agent loop is just `while stop_reason == 'tool_use'`; when stop_reason flips to end_turn, the model is done. Three sub-skills the course emphasizes: writing good tool descriptions (the description is what Claude reads to choose), iterating message *blocks* not message *strings* (responses are arrays of text, tool_use, thinking blocks), and matching `tool_use_id` between request and result. Fine-grained tool calling (Lesson 40) streams tool inputs as they're generated; important for progressive UIs but optional for first builds.
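The loop itself fits on one screen. A minimal sketch, not the course notebook; the get_weather tool, its schema, and the model alias are hypothetical stand-ins.

```python
# Minimal tool-use loop: describe the tool, let Claude request it, run it yourself,
# return a tool_result with the matching tool_use_id, and loop while stop_reason == "tool_use".
from anthropic import Anthropic

client = Anthropic()

def get_weather(city: str) -> str:          # the tool itself: pure Python, no SDK glue
    return f"72F and sunny in {city}"

tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city. Use when the user asks about weather.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

messages = [{"role": "user", "content": "What's the weather in Oslo?"}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-0",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content})

    if response.stop_reason != "tool_use":   # end_turn: the model is done
        break

    # Iterate the content *blocks*; run each requested tool and return a matching tool_result.
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            output = get_weather(**block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,     # must match the tool_use block's id
                "content": output,
            })
    messages.append({"role": "user", "content": tool_results})

print("".join(b.text for b in messages[-1]["content"] if b.type == "text"))
```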
RAG is presented as a composition pattern, not a product. Lessons 44-50 walk the full pipeline: chunk documents (semantic boundaries beat naive token splits), embed chunks (vectorize with an embedding model), store in a vector DB, retrieve at query time by cosine similarity, rerank, stuff into the prompt, generate. The production move is multi-index retrieval: combine dense (embeddings, captures meaning) + sparse (BM25, captures exact terms) and rerank. The course's RAG section is video-heavy and short on code, so the actual implementation work happens in Lesson 48; treat the rest as conceptual scaffolding. Pair this section with the Citations and Prompt Caching lessons from Section 8, because production RAG always involves both.
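A skeleton of the query path under those assumptions; embed() is a placeholder for whichever embedding provider you use, and the naive keyword scorer stands in for a real BM25 implementation.

```python
# Query-time half of the RAG flow: embed the query, score chunks by cosine similarity,
# blend in keyword scores (stand-in for BM25), stuff the top chunks into the prompt, generate.
import math
from anthropic import Anthropic

client = Anthropic()

def embed(text: str) -> list[float]:
    raise NotImplementedError("call your embedding provider here")  # placeholder

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def keyword_score(query: str, chunk: str) -> float:
    terms = query.lower().split()
    return sum(t in chunk.lower() for t in terms) / len(terms)

def answer(query: str, chunks: list[str], chunk_vectors: list[list[float]]) -> str:
    q_vec = embed(query)
    scored = [(0.5 * cosine(q_vec, v) + 0.5 * keyword_score(query, c), c)
              for c, v in zip(chunks, chunk_vectors)]
    top = [c for _, c in sorted(scored, reverse=True)[:3]]      # crude rerank: top 3 blended
    context = "\n\n".join(f"<document>{c}</document>" for c in top)
    resp = client.messages.create(
        model="claude-sonnet-4-0",
        max_tokens=700,
        messages=[{"role": "user", "content":
            f"{context}\n\nAnswer using only the documents above:\n<question>{query}</question>"}],
    )
    return resp.content[0].text
```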
Section 8 (Features of Claude) is where the exam-tested optimization knobs live. Prompt caching (Lessons 55-57) marks a prefix as cache_control and reuses it across calls at ~10% cost; the rules are strict: cache breakpoints must be deterministic, ordered, and exact, and any change above the breakpoint invalidates everything. Extended thinking gives Claude an explicit reasoning budget (thinking={'type': 'enabled', 'budget_tokens': N}) for hard problems. Vision (image and PDF) is a content-block extension; pass image or document blocks alongside text. Citations attach source spans to claims; built-in for document blocks. Code execution and the Files API give Claude a Python sandbox. All five of these are highly testable on the certification and stack with tool use and RAG; learning them together is faster than learning them separately.
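A sketch stacking two of those knobs in a single call, assuming a system prompt long enough to clear the minimum cacheable size; the prompt text, budget numbers, and model alias are placeholders.

```python
# Cache a long system prompt with cache_control and give Claude an extended-thinking budget.
# LONG_SYSTEM_PROMPT is a placeholder (e.g. a policy document or RAG context block) that
# must exceed the minimum cacheable size (~1024 tokens on most models).
from anthropic import Anthropic

client = Anthropic()
LONG_SYSTEM_PROMPT = "..."  # placeholder for the long, stable prefix

response = client.messages.create(
    model="claude-sonnet-4-0",
    max_tokens=8000,                                       # must exceed the thinking budget
    system=[{
        "type": "text",
        "text": LONG_SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},            # everything up to here gets cached
    }],
    thinking={"type": "enabled", "budget_tokens": 4000},   # scratchpad reasoning budget
    messages=[{"role": "user", "content": "Summarize the three riskiest clauses."}],
)

# Confirm cache behavior: creation tokens on the first call, read tokens on repeat calls.
usage = response.usage
print("cache_creation_input_tokens:", getattr(usage, "cache_creation_input_tokens", None))
print("cache_read_input_tokens:", getattr(usage, "cache_read_input_tokens", None))
```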
MCP and agents close the course. Lessons 60-71 build a working MCP server and client from scratch, with three primitives: tools (actions Claude can take), resources (read-only context Claude can fetch by URI), prompts (reusable templates the server publishes). The MCP Inspector is a dev tool that connects to your server and lets you call everything manually; use it before you wire up any client. The agents section (Lessons 76-83) frames the architectural choice cleanly: workflows for predictable, predefined paths (parallelization, chaining, routing) and agents for paths that depend on what the model finds (loop on tool_use, model decides when done). The course's final claim is that workflows and agents are not opposites; production systems usually mix them, with agents inside specific workflow steps. This is the architect-role conceptual takeaway; the certification's D1 domain hinges on getting it right.
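A minimal server sketch showing all three primitives, written against the Python MCP SDK's FastMCP helper; the customer data is a hypothetical in-memory dict, and a real server would hit your actual store.

```python
# Minimal MCP server: one tool (action), one resource (read-only context), one prompt (template).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("customer-data")

CUSTOMERS = {"42": {"name": "Ada", "plan": "pro"}}  # placeholder data

@mcp.tool()
def lookup_customer(customer_id: str) -> dict:
    """Return the customer record for a given id."""
    return CUSTOMERS.get(customer_id, {})

@mcp.resource("customers://{customer_id}")
def customer_resource(customer_id: str) -> str:
    """Read-only context: the raw customer record as text, fetched by URI."""
    return str(CUSTOMERS.get(customer_id, {}))

@mcp.prompt()
def summarize_customer(customer_id: str) -> str:
    """Reusable prompt template the client can invoke by name."""
    return f"Summarize the account health of customer {customer_id} in three bullets."

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; point the MCP Inspector at this script first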
The 8 load-bearing themes across 85 lessons
Don't try to memorize every lesson. Anchor on these eight themes; the lessons fill them in.
- Messages + parameters: client.messages.create(), system prompt, temperature, max_tokens, streaming. The base layer everything else sits on. Concept: system-prompts ↗
- Structured outputs: JSON-in-prompt for casual use, tool-call-as-output for strict schemas. Lessons 13-14, then revisited in Section 6 (sketch after this list). Concept: structured-outputs ↗
- Evals: generate dataset, run, grade (model + code), score, iterate. Section 4 is the gate from demo to system. Concept: evaluation ↗
- Prompt engineering: clear, specific, XML-tagged, exemplified, role-defined, decomposed. Six techniques applied iteratively. Concept: prompt-engineering-techniques ↗
- Tool use: schema → tool_use block → run → tool_result → loop. The longest section and most tested concept. Concept: tool-calling ↗
- RAG: chunk, embed, retrieve, rerank, stuff, generate. Multi-index (BM25 + dense) is the production pattern. Scenario: long-document-processing ↗
- Features (caching, vision, citations, thinking, code exec): the optimization knobs. Highly testable; learn them together because they stack. Concept: prompt-caching ↗
- MCP + agents/workflows: MCP exposes tools/resources/prompts via protocol. Workflows (predictable) vs. agents (path depends on findings). Concept: mcp ↗
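On the structured-outputs theme, the strict-schema pattern looks like this in practice. A sketch only: the invoice fields and tool name are illustrative, not the course's exercise code.

```python
# Strict-schema structured output via tool use: define one tool whose input_schema *is*
# the output schema you want, force it with tool_choice, and read the tool_use block's input.
from anthropic import Anthropic

client = Anthropic()

extract_tool = {
    "name": "record_invoice",
    "description": "Record the structured fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name":   {"type": "string"},
            "date":   {"type": "string", "description": "ISO 8601 date"},
            "amount": {"type": "number"},
        },
        "required": ["name", "date", "amount"],
    },
}

response = client.messages.create(
    model="claude-sonnet-4-0",
    max_tokens=300,
    tools=[extract_tool],
    tool_choice={"type": "tool", "name": "record_invoice"},  # force this tool
    messages=[{"role": "user",
               "content": "Invoice from Acme Corp dated 2025-03-14 for $1,250.00"}],
)

invoice = next(b.input for b in response.content if b.type == "tool_use")
print(invoice)  # e.g. {'name': 'Acme Corp', 'date': '2025-03-14', 'amount': 1250.0}
```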
5 concrete eval-pipeline moves from Lessons 16-23
Section 4 is short on glamour but heavy on exam relevance. These five moves are the eval blueprint.
- Generate test data with Claude: use a stronger model with a clear rubric to generate diverse test inputs covering edge cases. Then hand-curate. Concept: evaluation ↗
- Run with concurrency control: start with max_concurrent_tasks=3 to avoid rate limits; raise once you know your quota. Async + retry on 429 (see the sketch after this list). Concept: evaluation ↗
- Use model-based grading for subjective criteria: tone, helpfulness, completeness; these are model-grader territory. Pin the grader to a stronger model than the generator. Concept: evaluation ↗
- Use code-based grading for mechanical criteria: schema validation, regex match, length bounds, keyword presence. Cheap, deterministic, no model in the grading loop. Concept: evaluation ↗
- Iterate with measurable deltas: each prompt change should lift the eval score, not just feel better. Without this discipline, prompt engineering is vibes-based tuning. Concept: evaluation ↗
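The concurrency move from the second bullet, sketched with asyncio and the SDK's async client; the semaphore size, backoff schedule, and test inputs are placeholders.

```python
# Concurrency-controlled eval runner: a semaphore caps parallel calls at 3, and
# RateLimitError (HTTP 429) triggers a simple exponential-backoff retry.
import asyncio
from anthropic import AsyncAnthropic, RateLimitError

client = AsyncAnthropic()
semaphore = asyncio.Semaphore(3)   # max_concurrent_tasks=3; raise once you know your quota

async def run_one(test_input: str, retries: int = 3) -> str:
    async with semaphore:
        for attempt in range(retries):
            try:
                resp = await client.messages.create(
                    model="claude-sonnet-4-0",
                    max_tokens=500,
                    messages=[{"role": "user", "content": test_input}],
                )
                return resp.content[0].text
            except RateLimitError:
                await asyncio.sleep(2 ** attempt)   # back off on 429, then retry
        raise RuntimeError(f"rate-limited after {retries} attempts: {test_input!r}")

async def run_eval(test_inputs: list[str]) -> list[str]:
    return await asyncio.gather(*(run_one(t) for t in test_inputs))

outputs = asyncio.run(run_eval(["case 1", "case 2", "case 3", "case 4"]))
```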
Workflows vs. agents: when to choose each
Lessons 76-82 frame the choice; this is the D1 architect-role decision the exam tests.
- Use a workflow when the steps are predictable: sequential chain (chaining), parallel fan-out (parallelization), classify-then-route (routing). Cheaper, more debuggable, lower variance (a routing sketch follows this list). Concept: agentic-loops ↗
- Use an agent when the path depends on findings: loop on stop_reason == 'tool_use'. The model decides when to stop. Higher variance, higher capability ceiling. Concept: agentic-loops ↗
- Mix them in real systems: agents inside specific workflow steps. The agent does the open-ended sub-task; the workflow handles the deterministic before/after. Scenario: multi-agent-research-system ↗
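The routing workflow from the first bullet, sketched as two calls: a cheap, deterministic classification followed by a specialized prompt. The route labels and system prompts are hypothetical.

```python
# Routing workflow: classify the input with a short, temperature-0 call, then route
# to a specialized system prompt for the real answer.
from anthropic import Anthropic

client = Anthropic()
MODEL = "claude-sonnet-4-0"

ROUTES = {
    "billing": "You are a billing specialist. Resolve invoice and payment questions.",
    "technical": "You are a support engineer. Diagnose product issues step by step.",
    "other": "You are a general support agent. Answer briefly and politely.",
}

def route(ticket: str) -> str:
    resp = client.messages.create(
        model=MODEL,
        max_tokens=10,
        temperature=0.0,   # classification wants determinism
        messages=[{"role": "user", "content":
            f"Classify this ticket as exactly one of: billing, technical, other.\n"
            f"<ticket>{ticket}</ticket>\nReply with only the label."}],
    )
    label = resp.content[0].text.strip().lower()
    return label if label in ROUTES else "other"

def handle(ticket: str) -> str:
    resp = client.messages.create(
        model=MODEL,
        max_tokens=600,
        system=ROUTES[route(ticket)],
        messages=[{"role": "user", "content": ticket}],
    )
    return resp.content[0].text
```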
6 takeaways with cross-pillar bridges
- client.messages.create() is the only function; model, max_tokens, and messages are the three required parameters. Claude is stateless; the conversation is your job.
- Evals (model-based + code-based grading on a curated test set) are the gate between a demo and a system; without them, prompt iteration is vibes-based.
- Tool use is a four-step loop: describe (schema) → request (tool_use block) → run → return (tool_result block); continue while stop_reason == 'tool_use'.
- Production-quality RAG means multi-index retrieval: dense (embeddings) + sparse (BM25) plus a reranker; chunking strategy matters more than embedding model choice.
- Prompt caching reuses a deterministic, ordered, exact prefix at ~10% cost; pair it with extended thinking, vision, citations, and code execution as the five exam-tested optimization knobs.
- MCP exposes three primitives over an open protocol: tools (actions), resources (read-only context), and prompts (reusable templates); the same server runs across Claude Desktop, Claude Code, and Cursor.
How this maps to the CCA-F exam
3 hand-picked extras
These amplify the Skilljar course beyond what the course itself covers. Each was picked for a specific reason.
- Building effective agents (Anthropic engineering): Anthropic's canonical engineering essay distinguishing workflows (chaining, routing, parallelization) from agents. Lessons 76-82 are the course version; this is the source of truth. Read source ↗
- Prompt engineering overview (Anthropic documentation): canonical reference for the six prompt-engineering techniques the course teaches in Lessons 24-30. Keep open while building production prompts. Read source ↗
- Prompt caching (Anthropic documentation): authoritative reference for cache breakpoint rules, eligible content, TTL behavior, and pricing math. The course's caching lessons are video-heavy; the doc is what you reference while shipping. Read source ↗
Concepts in this course
- System prompts: Section 3 primitive, used everywhere. Concept: system-prompts ↗
- Tool calling: longest section (31-43), most tested concept. Concept: tool-calling ↗
- Structured outputs: tool-call-as-output is the strict-schema pattern. Concept: structured-outputs ↗
- Evaluation: Section 4, the gate from demo to system. Concept: evaluation ↗
- Prompt engineering techniques: Section 5, the six techniques. Concept: prompt-engineering-techniques ↗
- Prompt caching: Section 8 optimization; cached reads at ~10% of normal input cost. Concept: prompt-caching ↗
- Vision and multimodal: Section 8 image and PDF support. Concept: vision-multimodal ↗
- Streaming: Section 3, plus fine-grained tool streaming in Section 6. Concept: streaming ↗
- MCP: Section 9, full protocol walkthrough. Concept: mcp ↗
- Agentic loops: Section 11, workflows vs. agents architectural choice. Concept: agentic-loops ↗
Where you'll see this in production
- Long document processing: RAG section (44-50) plus PDF support and citations. Scenario: long-document-processing ↗
- Structured data extraction: Section 3 structured outputs + Section 6 tool-call-as-output. Scenario: structured-data-extraction ↗
- Agentic tool design: Section 6 tool use end-to-end, the most exam-relevant scenario. Scenario: agentic-tool-design ↗
- Customer support resolution agent: Section 11 agents + tool use composition. Scenario: customer-support-resolution-agent ↗
Other course mirrors you may want next
8 questions answered
Phrased the way real students search. Tagged by intent so you can scan to what you actually need.
Definition: What does client.messages.create() do in the Anthropic Python SDK?
It sends one request to a Claude model and returns one response. Three parameters are required: model (e.g. claude-sonnet-4-0), max_tokens (a safety cap, not a target), and messages (a list of {role, content} dicts). Optional params like system, temperature, stream, tools, and tool_choice shape the call. Claude is stateless; every call sends the full conversation, and you append assistant responses back into messages yourself.

Comparison: When should I use a system prompt versus putting the same instructions in the user message?
Put role, behavior, and standing constraints in the system prompt and the task itself in the user message. System prompts go in the system parameter (not inside messages) and are weighted slightly higher in attention, which makes them the right place for guardrails and persona.

How-to: How does tool use actually work end-to-end with the Claude API?
Step 1: call messages.create() with tools=[...] describing each tool with name, description, and input_schema. Step 2: Claude returns a response containing a tool_use content block with the tool name and arguments; stop_reason will be tool_use. Step 3: you run the tool yourself and capture its output. Step 4: append a user message with a tool_result block matching the tool_use_id, then call messages.create() again. Loop while stop_reason == 'tool_use'.

Troubleshoot: Why is my prompt caching not actually saving any tokens?
Three usual causes. First, any change above the cache_control block invalidates the entire cache. Second, the cached content is below the minimum cache size (1024 tokens for most models). Third, calls are spaced more than the cache TTL apart (default 5 minutes for ephemeral caching). Check the cache_creation_input_tokens vs. cache_read_input_tokens counts in the response usage to confirm hits.

Comparison: What is the difference between a workflow and an agent in the Claude API?
A workflow follows a predefined sequence of steps (chaining, routing, parallelization) with the model filling in each step; an agent loops on tool use and decides its own path, stopping when stop_reason is no longer tool_use. Use workflows when the steps are predictable, agents when the path depends on what the model finds; production systems usually mix them.

Scope: Do I need a vector database to do RAG with Claude?

Comparison: Is MCP the same thing as tool use in the Claude API?
Not quite. With plain tool use, you pass tool definitions in the messages.create() call and run them yourself. MCP is a protocol that lets a Claude client (Desktop, Claude Code, Cursor) discover and call tools, fetch resources, and use prompts from an external server. MCP servers expose tools that *become* tool-use entries inside the API call the client makes. So MCP is upstream of tool use: it's how the tools get registered with the client; tool use is how they get called.