Pillar 4 · Knowledge · Intermediate

Building with the Claude API: Foundations to Agents.

Building with the Claude API is the comprehensive bottom-up tour of the Anthropic SDK: messages, system prompts, streaming, structured outputs, evals, prompt engineering, tool use, RAG, MCP, prompt caching, vision, citations, and finally agents and workflows. Eighty-five lessons across fourteen sections, organized so each capability builds on the prior. Treat it as the canonical API reference course; every other course in the catalog assumes you know what is here.

85 Skilljar lessons · ~480 min on Skilljar · D2 + D1 + D4 + D5

Mirrors Anthropic's Building with the Claude API course on Skilljar.

Original course · 85 lessons · ~480 min
Building with the Claude API
Take it on Anthropic Skilljar ↗
Building with the Claude API: Foundations to Agents, painterly hero showing the course's central concept with the Loop mascot as guide.
01 · What you'll learn

You'll walk away with

  1. How client.messages.create() works, including system prompts, temperature, streaming, and structured outputs
  2. How to build a prompt-evaluation pipeline with model-based and code-based grading on a curated test set
  3. Six prompt-engineering techniques: clarity, specificity, XML tags, examples, role, decomposition
  4. How tool use works end-to-end: schemas, message blocks, tool results, multi-turn tool calling, fine-grained streaming
  5. How RAG composes (chunking, embeddings, BM25, multi-index reranking) and how prompt caching, vision, citations stack on top
  6. How MCP exposes tools/resources/prompts, and when to choose workflows (chaining, routing, parallelization) versus full agents
02 · Prerequisites

Read these first

03 · The course mirror

Lesson outline

Every lesson from Building with the Claude API with our one-line simplification. The Skilljar course is the source; we summarize.

# · Skilljar lesson · Our simplification
1 · Welcome to the course · Course frame: bottom-up API tour from first request to full agent architectures.
2 · Overview of Claude models · Opus, Sonnet, Haiku tiers; pick by capability vs. cost vs. latency tradeoff.
3 · Accessing the API · Anthropic Console, direct API, AWS Bedrock, Google Vertex; same model, different access path.
4 · Getting an API key · Generate a key from console.anthropic.com; never commit it; use .env plus python-dotenv.
5 · Making a request · client.messages.create(model, max_tokens, messages); the three required params; max_tokens is a safety cap, not a target.
6 · Multi-turn conversations · Append assistant responses to your messages list; the model is stateless, the conversation is your responsibility.
7 · Chat exercise · Hands-on: build a minimal CLI chatbot with multi-turn message history.
8 · System prompts · Pass system="..." to set role, behavior, constraints; system prompts are not in the messages array.
9 · System prompts exercise · Hands-on: write a system prompt that turns Claude into a Socratic math tutor without giving direct answers.
10 · Temperature · 0.0 = deterministic and focused; 1.0 = creative and varied. Default 1.0; lower for extraction, higher for ideation.
11 · Course satisfaction survey · Mid-course survey checkpoint, no technical content.
12 · Response streaming · Use stream=True and iterate events; ship tokens to the UI as they arrive instead of waiting for the full response.
13 · Structured data · Ask for JSON in the prompt and parse; for strict schemas use tool-calling as a structured-output mechanism.
14 · Structured data exercise · Hands-on: extract structured fields (name, date, amount) from unstructured invoice text.
15 · Quiz on accessing Claude with the API · Knowledge check covering messages, system prompts, temperature, streaming, structured data.
16 · Prompt evaluation · Don't ship a prompt without evaluating it; evals are the difference between a demo and a system.
17 · A typical eval workflow · Generate test dataset, run prompt against each, grade outputs, score, iterate.
18 · Generating test datasets · Use Claude itself to generate diverse test inputs covering edge cases, then hand-curate.
19 · Running the eval · Run prompt against the test set in parallel with rate-limit handling; collect raw outputs.
20 · Model-based grading · Use a grader prompt (often a stronger model) to score outputs against a rubric; cheap and flexible.
21 · Code-based grading · Programmatic checks: regex, schema validation, exact match. Use when criteria are mechanical, model-grade when subjective.
22 · Exercise on prompt evals · Hands-on: build an eval pipeline for a meal-plan prompt with mixed code-based and model-based grading.
23 · Quiz on prompt evaluation · Knowledge check covering eval workflow, dataset generation, model vs. code grading.
24 · Prompt engineering · Iterative loop: goal → initial prompt → eval → apply technique → re-eval; repeat until you hit your bar.
25 · Being clear and direct · Tell Claude exactly what you want; ambiguity is the most common failure mode.
26 · Being specific · Concrete constraints beat abstract instructions; "three bullet points, max 15 words each" beats "be concise".
27 · Structure with XML tags · Wrap inputs in <document>, <example>, <question> tags; Claude is trained to attend to XML structure.
28 · Providing examples · Multishot beats explanation; one or two well-chosen examples shape output style faster than five paragraphs of description.
29 · Exercise on prompting · Hands-on: take a weak prompt and apply the four engineering techniques to lift its eval score.
30 · Quiz on prompt engineering techniques · Knowledge check covering clarity, specificity, XML, examples, role, decomposition.
31 · Introducing tool use · Tool use lets Claude call your functions; you describe the tool, Claude decides when to invoke it, you run it and return the result.
32 · Project overview · Course-long project frame: build a customer-data agent using tool use end-to-end.
33 · Tool functions · Write the actual Python functions Claude will call; pure logic, no Anthropic-specific glue.
34 · Tool schemas · JSON schema with name, description, input_schema; the description is what Claude uses to decide when to call.
35 · Handling message blocks · Response content is a list of blocks: text, tool_use, thinking. Iterate the list, don't index [0].
36 · Sending tool results · After running the tool, send a tool_result block in the next user message with the matching tool_use_id.
37 · Multi-turn conversations with tools · Loop: Claude requests tool → you run it → you append result → Claude responds or requests another tool.
38 · Implementing multiple turns · Hands-on: write the agent loop with stop_reason == 'tool_use' as the continuation signal.
39 · Using multiple tools · Pass an array of tools; Claude picks based on tool descriptions; use tool_choice to force or restrict selection.
40 · Fine-grained tool calling · Stream tool inputs token-by-token as Claude generates them; useful for long arguments and progressive UIs.
41 · The text edit tool · Anthropic-defined tool that lets Claude edit files via structured ops (view, create, str_replace, insert).
42 · The web search tool · Anthropic-hosted web search tool; Claude issues search queries and you get cited results back.
43 · Quiz on tool use with Claude · Knowledge check covering schemas, message blocks, tool results, multi-tool selection, fine-grained streaming.
44 · Introducing retrieval augmented generation · RAG = retrieve relevant context from your data, stuff it into the prompt, let Claude answer with citations.
45 · Text chunking strategies · Split documents into chunks; size and overlap matter; semantic boundaries beat naive token splits.
46 · Text embeddings · Vectorize chunks with an embedding model; cosine similarity finds semantically near chunks at query time.
47 · The full RAG flow · Ingest → chunk → embed → store. Query → embed → search → rerank → stuff prompt → generate.
48 · Implementing the RAG flow · Hands-on: build the ingest and query path with a vector DB and embedding API.
49 · BM25 lexical search · Keyword-based ranking that complements semantic search; catches exact terms semantic search misses.
50 · A multi-index RAG pipeline · Combine dense (embeddings) + sparse (BM25) retrieval and rerank; the production-quality pattern.
51 · Extended thinking · thinking={'type': 'enabled', 'budget_tokens': N} gives Claude scratchpad reasoning before answering; use for hard problems.
52 · Image support · Pass images as base64 or URL in message content; Claude reads diagrams, screenshots, charts, photos natively.
53 · PDF support · Upload PDFs as document content blocks; Claude reads text and visual layout (tables, figures) together.
54 · Citations · Citations API attaches source spans to Claude's claims so users can verify; built-in for document content blocks.
55 · Prompt caching · Mark a prefix with cache_control to cache it; subsequent calls reuse the cached prefix at ~10% cost.
56 · Rules of prompt caching · Cache breakpoints must be deterministic, ordered, and exact; caching breaks on any change above the breakpoint.
57 · Prompt caching in action · Hands-on: cache a long system prompt + RAG context block; measure cost and latency reduction.
58 · Code execution and the files API · Anthropic-hosted Python sandbox + file storage; Claude can run code and read its output back.
59 · Quiz on features of Claude · Knowledge check covering extended thinking, vision, PDFs, citations, caching, code execution.
60 · Introducing MCP · Model Context Protocol: open standard for LLM clients to discover and use tools, resources, and prompts from servers.
61 · MCP clients · Clients (Claude Desktop, Claude Code, Cursor) speak MCP; the same server works across all of them.
62 · Project setup · Hands-on project frame: build an MCP server that exposes customer data to any MCP client.
63 · Defining tools with MCP · Same tool concept as the API, exposed via the MCP tools/list and tools/call methods.
64 · The server inspector · MCP Inspector is a dev tool that connects to your server and lets you call tools/resources/prompts manually.
65 · Implementing a client · Build a Python MCP client that lists server tools and routes Claude's tool calls to the server.
66 · Defining resources · Resources expose read-only context (files, records) the client can fetch on demand; not tool calls.
67 · Accessing resources · Client lists resources, fetches by URI, includes content in the prompt; cleaner than ad-hoc context loading.
68 · Defining prompts · MCP prompts are reusable prompt templates the server publishes; the client invokes by name with arguments.
69 · Prompts in the client · Hands-on: list and invoke server prompts from the Python client.
70 · MCP review · Recap: tools = actions, resources = read-only context, prompts = reusable templates. Three primitives, one protocol.
71 · Quiz on Model Context Protocol · Knowledge check covering tools, resources, prompts, client/server architecture.
72 · Anthropic apps · Tour of Anthropic-built apps: Claude.ai, Claude Code, Computer Use; how each layers on top of the API.
73 · Claude Code setup · Install Claude Code, point it at the repo, see CLAUDE.md and slash commands in their native habitat.
74 · Claude Code in action · Live walkthrough of Claude Code editing a real codebase end-to-end; the same harness is available via SDK.
75 · Enhancements with MCP servers · Add MCP servers (GitHub, Postgres, Sentry) to Claude Code or Claude Desktop and watch capability bloom.
76 · Agents and workflows · Two architectures: workflows (predefined steps, model fills gaps) vs. agents (model drives flow, tool-use loops).
77 · Parallelization workflows · Run multiple LLM calls in parallel and aggregate; lowers latency on independent subtasks.
78 · Chaining workflows · Sequential pipeline: output of one call is input to the next; the classic prompt chain.
79 · Routing workflows · Classify input, route to a specialized prompt or tool; cheaper and more accurate than one giant prompt.
80 · Agents and tools · True agents loop on tool use until stop_reason != 'tool_use'; the model decides when it's done.
81 · Environment inspection · Give the agent a tool to inspect its environment (list files, read state) before it acts.
82 · Workflows vs agents · Use workflows when steps are predictable; use agents when the path depends on what the model finds.
83 · Quiz on agents and workflows · Knowledge check covering parallelization, chaining, routing, agent loops, when to choose each.
84 · Final assessment · Course-final assessment integrating everything from messages through agents.
85 · Course wrap-up · Recap and pointers to MCP Advanced, Subagents, Claude Code in Action as next courses.
04 · Our simplification

The course in 7 paragraphs

Building with the Claude API is the spine of the entire Skilljar catalog. Eighty-five lessons, fourteen sections, organized as a strict bottom-up build: messages → system prompts → streaming → structured outputs → evals → prompt engineering → tool use → RAG → features → MCP → agents and workflows. Each section assumes you have the prior. The course is heavy on hands-on Jupyter-notebook exercises and rewards code-along; passive watching loses ~60% of the value. If you can take only one Skilljar course before the exam, take this one, because it is the canonical reference every other course assumes.

The API surface itself collapses to one function and three primitives. The function is client.messages.create(model, max_tokens, messages, ...). The primitives layered on top are system prompt (role and behavior), temperature (0.0 deterministic to 1.0 creative), and streaming (token-by-token via stream=True). The conversation is *your* responsibility; Claude is stateless, you append assistant responses back into the messages list yourself. `max_tokens` is a safety cap, not a target; Claude doesn't try to fill it. Structured outputs come for free if you ask for JSON in the prompt, but for strict schemas the production-quality pattern is to use tool-calling as a structured-output mechanism, which the course transitions to in Section 6.
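A minimal sketch of that base layer, assuming the anthropic Python package and an ANTHROPIC_API_KEY in the environment; the model alias and prompts here are illustrative, not lifted from the course notebooks:

```python
# Minimal sketch of the base layer: create, multi-turn append, streaming.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

messages = [{"role": "user", "content": "Name three uses for prompt caching."}]

response = client.messages.create(
    model="claude-sonnet-4-0",
    max_tokens=1024,           # a safety cap, not a target
    system="You are a concise API tutor.",
    temperature=0.2,           # low for focused, extraction-style output
    messages=messages,
)

# Claude is stateless: append the assistant turn yourself to continue.
messages.append({"role": "assistant", "content": response.content})
messages.append({"role": "user", "content": "Expand on the second one."})

# Streaming variant: tokens arrive as they are generated.
with client.messages.stream(
    model="claude-sonnet-4-0",
    max_tokens=1024,
    messages=messages,
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```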

Evaluation is the gate between a demo and a system, and prompt engineering is what you do once that gate is open. Section 4 (Lessons 16-23) walks the eval workflow in five steps: generate a diverse test dataset (use Claude itself, then hand-curate), run the prompt against each input in parallel with rate-limit handling, grade outputs (model-based for subjective criteria, code-based for mechanical ones), score, iterate. Section 5 (Lessons 24-30) layers prompt engineering on top: be clear and direct (ambiguity is the dominant failure mode), be specific (concrete constraints beat abstract instructions), use XML tags (<document>, <example>, <question>; Claude is trained to attend to them), provide examples (multishot beats explanation), define a role, and decompose multi-step reasoning. The five-step engineering loop (goal → prompt → eval → apply technique → re-eval) is *only* possible if you have the eval pipeline from Section 4 in place; without evals, every prompt change is vibes-based. Pair Lesson 27 (XML tags) with the docs.claude.com prompt-engineering guide; both are required reading for the exam's prompt-engineering domain.
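To make the two grading styles concrete, here is a hedged sketch of one code-based and one model-based grader. The invoice fields, rubric shape, and grader prompt are hypothetical stand-ins, not the course's own eval harness:

```python
# Sketch of Lessons 20-21: mechanical checks in code, subjective checks by model.
import json
import re

import anthropic

client = anthropic.Anthropic()

def code_based_grade(output: str) -> bool:
    """Mechanical check: is the output valid JSON with the fields we demanded?"""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return {"name", "date", "amount"}.issubset(data)

# Hypothetical rubric-scoring prompt for the grader model.
GRADER_PROMPT = """Score the RESPONSE from 1-5 against the RUBRIC.
Reply with only the integer score.

<rubric>{rubric}</rubric>
<response>{response}</response>"""

def model_based_grade(output: str, rubric: str) -> int:
    """Subjective check: a (typically stronger) pinned model scores against a rubric."""
    result = client.messages.create(
        model="claude-sonnet-4-0",  # pin the grader model explicitly
        max_tokens=8,
        messages=[{"role": "user",
                   "content": GRADER_PROMPT.format(rubric=rubric, response=output)}],
    )
    match = re.search(r"[1-5]", result.content[0].text)
    return int(match.group()) if match else 1
```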

Tool use is the longest section (Lessons 31-43) and the most tested concept on the exam. The mental model: you describe tools to Claude (name, description, JSON input schema), Claude decides when to invoke them, you actually run them, you send the result back via a tool_result block, repeat. The agent loop is just `while stop_reason == 'tool_use'`; when stop_reason flips to end_turn, the model is done. Three sub-skills the course emphasizes: writing good tool descriptions (the description is what Claude reads to choose), iterating message *blocks* not message *strings* (responses are arrays of text, tool_use, thinking blocks), and matching `tool_use_id` between request and result. Fine-grained tool calling (Lesson 40) streams tool inputs as they're generated; important for progressive UIs but optional for first builds.
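Here is a minimal sketch of that loop. The get_weather tool, its schema, and the stub implementation are hypothetical; the block iteration, tool_use_id matching, and stop_reason check follow the pattern the lessons describe:

```python
# Minimal agent loop: describe -> tool_use -> run -> tool_result -> repeat.
import json

import anthropic

client = anthropic.Anthropic()

TOOLS = [{
    "name": "get_weather",
    "description": "Get the current weather for a city. Use whenever the "
                   "user asks about weather conditions.",  # Claude reads this to choose
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def get_weather(city: str) -> str:
    return json.dumps({"city": city, "temp_c": 18})  # stub implementation

messages = [{"role": "user", "content": "What's the weather in Osaka?"}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-0", max_tokens=1024,
        tools=TOOLS, messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason != "tool_use":
        break  # end_turn: the model is done
    # Iterate the blocks, never index [0]; run each requested tool.
    results = [
        {"type": "tool_result",
         "tool_use_id": block.id,              # must match the request
         "content": get_weather(**block.input)}
        for block in response.content if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})

print(next(b.text for b in response.content if b.type == "text"))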

RAG is presented as a composition pattern, not a product. Lessons 44-50 walk the full pipeline: chunk documents (semantic boundaries beat naive token splits), embed chunks (vectorize with an embedding model), store in a vector DB, retrieve at query time by cosine similarity, rerank, stuff into the prompt, generate. The production move is multi-index retrieval: combine dense (embeddings, captures meaning) + sparse (BM25, captures exact terms) and rerank. The course's RAG section is video-heavy and short on code, so the actual implementation work happens in Lesson 48; treat the rest as conceptual scaffolding. Pair this section with the Citations and Prompt Caching lessons from Section 8, because production RAG always involves both.
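A toy sketch of the multi-index idea, assuming an embed() helper backed by some third-party embedding API (Anthropic doesn't ship one) and the rank_bm25 package; the naive score fusion at the end stands in for a real reranker:

```python
# Toy hybrid retrieval: dense (cosine over embeddings) + sparse (BM25), fused.
import math

from rank_bm25 import BM25Okapi

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def hybrid_search(query: str, chunks: list[str], chunk_vecs, embed, k: int = 5):
    # Sparse leg: exact-term matching that embeddings can miss.
    bm25 = BM25Okapi([c.split() for c in chunks])
    sparse = bm25.get_scores(query.split())
    # Dense leg: semantic similarity against precomputed chunk vectors.
    qvec = embed(query)  # assumed helper: text -> vector
    dense = [cosine(qvec, v) for v in chunk_vecs]

    # Naive fusion: normalize each leg to [0, 1] and sum. Production systems
    # use reciprocal rank fusion or a dedicated reranker model instead.
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo or 1) for x in xs]

    fused = [s + d for s, d in zip(norm(sparse), norm(dense))]
    ranked = sorted(zip(fused, chunks), reverse=True)
    return [c for _, c in ranked[:k]]
```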

Section 8 (Features of Claude) is where the exam-tested optimization knobs live. Prompt caching (Lessons 55-57) marks a prefix with cache_control and reuses it across calls at ~10% cost; the rules are strict: cache breakpoints must be deterministic, ordered, and exact, and any change above the breakpoint invalidates everything. Extended thinking gives Claude an explicit reasoning budget (thinking={'type': 'enabled', 'budget_tokens': N}) for hard problems. Vision (image and PDF) is a content-block extension; pass image or document blocks alongside text. Citations attach source spans to claims; built-in for document blocks. Code execution and the Files API give Claude a Python sandbox. All five of these are highly testable on the certification and stack with tool use and RAG; learning them together is faster than learning them separately.
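A sketch combining two of those knobs on one call: a cached system prefix plus an extended-thinking budget. The token counts and model alias are illustrative; note that cached content must clear the model's minimum cache size and max_tokens must exceed the thinking budget:

```python
# One call with a cached system prefix and an extended-thinking budget.
import anthropic

client = anthropic.Anthropic()

LONG_SYSTEM = "..."  # imagine >1024 tokens of stable instructions + RAG context

response = client.messages.create(
    model="claude-sonnet-4-0",
    max_tokens=4096,  # must exceed the thinking budget below
    system=[{
        "type": "text",
        "text": LONG_SYSTEM,
        "cache_control": {"type": "ephemeral"},  # everything up to here is cached
    }],
    thinking={"type": "enabled", "budget_tokens": 2048},  # scratchpad reasoning
    messages=[{"role": "user", "content": "Walk through the hardest edge case."}],
)

# Confirm caching works: creation tokens on the first call, read tokens after.
print(response.usage.cache_creation_input_tokens,
      response.usage.cache_read_input_tokens)
```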

MCP and agents close the course. Lessons 60-71 build a working MCP server and client from scratch, with three primitives: tools (actions Claude can take), resources (read-only context Claude can fetch by URI), prompts (reusable templates the server publishes). The MCP Inspector is a dev tool that connects to your server and lets you call everything manually; use it before you wire up any client. The agents section (Lessons 76-83) frames the architectural choice cleanly: workflows for predictable, predefined paths (parallelization, chaining, routing) and agents for paths that depend on what the model finds (loop on tool_use, model decides when done). The course's final claim is that workflows and agents are not opposites; production systems usually mix them, with agents inside specific workflow steps. This is the architect-role conceptual takeaway; the certification's D1 domain hinges on getting it right.
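A minimal server sketch showing the three primitives side by side, using the Python MCP SDK's FastMCP helper; the customer data is a hypothetical stand-in:

```python
# Minimal MCP server: one tool, one resource, one prompt.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("customer-data")

@mcp.tool()
def lookup_customer(customer_id: str) -> str:
    """Return the record for a customer ID."""  # tools = actions
    return f"customer {customer_id}: active, plan=pro"

@mcp.resource("customers://recent")
def recent_customers() -> str:
    """Read-only context the client can fetch by URI."""  # resources = context
    return "id=1 Alice\nid=2 Bob"

@mcp.prompt()
def summarize_customer(customer_id: str) -> str:
    """Reusable template the server publishes."""  # prompts = templates
    return f"Summarize everything we know about customer {customer_id}."

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; point the MCP Inspector at it
```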

05 · Listicle pattern

The 8 load-bearing themes across 85 lessons

Don't try to memorize every lesson. Anchor on these eight themes; the lessons fill them in.

  1. Messages + parameters

    client.messages.create(), system prompt, temperature, max_tokens, streaming. The base layer everything else sits on.

    Concept: system-prompts
  2. Structured outputs

    JSON-in-prompt for casual, tool-call-as-output for strict schemas. Lessons 13-14, then revisited in Section 6.

    Concept: structured-outputs
  3. Evals

    Generate dataset, run, grade (model + code), score, iterate. Section 4 is the gate from demo to system.

    Concept: evaluation
  4. Prompt engineering

    Clear, specific, XML-tagged, exemplified, role-defined, decomposed. Six techniques applied iteratively.

    Concept: prompt-engineering-techniques
  5. Tool use

    Schema → tool_use block → run → tool_result → loop. The longest section and most tested concept.

    Concept: tool-calling
  6. RAG

    Chunk, embed, retrieve, rerank, stuff, generate. Multi-index (BM25 + dense) is the production pattern.

    Scenario: long-document-processing
  7. Features (caching, vision, citations, thinking, code exec)

    The optimization knobs. Highly testable; learn together because they stack.

    Concept: prompt-caching
  8. MCP + agents/workflows

    MCP exposes tools/resources/prompts via protocol. Workflows (predictable) vs. agents (path depends on findings).

    Concept: mcp
06 · Listicle pattern

5 concrete eval-pipeline moves from Lessons 16-23

Section 4 is short on glamour but heavy on exam relevance. These five moves are the eval blueprint.

  1. Generate test data with Claude

    Use a stronger model with a clear rubric to generate diverse test inputs covering edge cases. Then hand-curate.

    Concept: evaluation
  2. Run with concurrency control

    Start with max_concurrent_tasks=3 to avoid rate limits; raise it once you know your quota. Async fan-out with retry on 429; a minimal sketch follows this list.

    Concept: evaluation
  3. Use model-based grading for subjective criteria

    Tone, helpfulness, completeness; these are model-grader territory. Pin the grader to a stronger model than the generator.

    Concept: evaluation
  4. Use code-based grading for mechanical criteria

    Schema validation, regex match, length bounds, keyword presence. Cheap, deterministic, no model in the grading loop.

    Concept: evaluation
  5. Iterate with measurable deltas

    Each prompt change should lift the eval score, not just feel better. Without this discipline, prompt engineering is vibes-based tuning.

    Concept: evaluation
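
A hedged sketch of move 2, bounded concurrency with backoff on 429s; the semaphore size, retry schedule, and prompt template are illustrative:

```python
# Bounded-concurrency eval runner with exponential backoff on rate limits.
import asyncio

import anthropic

client = anthropic.AsyncAnthropic()
semaphore = asyncio.Semaphore(3)  # start low; raise once you know your quota

async def run_one(prompt: str, test_input: str) -> str:
    # Assumes the prompt template contains an {input} placeholder.
    async with semaphore:
        for attempt in range(5):
            try:
                response = await client.messages.create(
                    model="claude-sonnet-4-0", max_tokens=1024,
                    messages=[{"role": "user",
                               "content": prompt.format(input=test_input)}],
                )
                return response.content[0].text
            except anthropic.RateLimitError:
                await asyncio.sleep(2 ** attempt)  # back off on 429
        raise RuntimeError("exhausted retries")

async def run_eval(prompt: str, dataset: list[str]) -> list[str]:
    return await asyncio.gather(*(run_one(prompt, x) for x in dataset))
```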
07 · Listicle pattern

Workflows vs. agents: when to choose each

Lessons 76-82 frame the choice; this is the D1 architect-role decision the exam tests.

  1. Use a workflow when the steps are predictable

    Sequential chain (chaining), parallel fan-out (parallelization), classify-then-route (routing). Cheaper, more debuggable, lower variance; a minimal routing sketch follows this list.

    Concept: agentic-loops
  2. Use an agent when the path depends on findings

    Loop on stop_reason == 'tool_use'. The model decides when to stop. Higher variance, higher capability ceiling.

    Concept: agentic-loops
  3. Mix them in real systems

    Agents inside specific workflow steps. The agent does the open-ended sub-task; the workflow handles deterministic before/after.

    Scenario: multi-agent-research-system
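
A minimal routing-workflow sketch for move 1: one cheap classify call, then dispatch to a specialized prompt. The route labels, system prompts, and model choices are illustrative:

```python
# Routing workflow: classify the input, then dispatch to a specialized prompt.
import anthropic

client = anthropic.Anthropic()

ROUTES = {
    "billing": "You are a billing specialist. Resolve the issue precisely.",
    "technical": "You are a support engineer. Debug step by step.",
    "other": "You are a helpful generalist support agent.",
}

def route(ticket: str) -> str:
    r = client.messages.create(
        model="claude-3-5-haiku-latest",  # cheap, fast tier for classification
        max_tokens=8,
        messages=[{"role": "user", "content":
                   f"Classify this ticket as billing, technical, or other. "
                   f"Reply with one word.\n\n<ticket>{ticket}</ticket>"}],
    )
    label = r.content[0].text.strip().lower()
    return label if label in ROUTES else "other"

def handle(ticket: str) -> str:
    r = client.messages.create(
        model="claude-sonnet-4-0", max_tokens=1024,
        system=ROUTES[route(ticket)],  # the path was chosen before this call
        messages=[{"role": "user", "content": ticket}],
    )
    return r.content[0].text
```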
08 · Key takeaways

6 takeaways with cross-pillar bridges

client.messages.create() is the only function; model, max_tokens, and messages are the three required parameters. Claude is stateless; the conversation is your job.

Evals (model-based + code-based grading on a curated test set) are the gate between a demo and a system; without them prompt iteration is vibes-based.

Tool use is a four-step loop: describe (schema) → request (tool_use block) → run → return (tool_result block); continue while stop_reason == 'tool_use'.

Production-quality RAG means multi-index retrieval: dense (embeddings) + sparse (BM25) plus a reranker; chunking strategy matters more than embedding model choice.

Prompt caching reuses a deterministic, ordered, exact prefix at ~10% cost; pair it with extended thinking, vision, citations, and code execution as the five exam-tested optimization knobs.

MCP exposes three primitives over an open protocol: tools (actions), resources (read-only context), and prompts (reusable templates); the same server runs across Claude Desktop, Claude Code, Cursor.

09 · Exam mapping

How this maps to the CCA-F exam

Domains
D2 Tool Design + Integration · D1 Agentic Architectures · D4 Prompt Engineering · D5 Context + Reliability
Blueprint
18% (D2) + heavy spillover into D1 / D4 / D5
What it advances
The most exam-relevant single course in the catalog. Direct prep for D2 tool design, D1 agentic loops, D4 prompt engineering, and D5 context features (caching, batch, streaming, citations, vision). If you can only take one Skilljar course before the exam, this is it.
10 · Curated supplementary sources

3 hand-picked extras

These amplify the Skilljar course beyond what the course itself covers. Each was picked for a specific reason.

11 · Concepts wired

Concepts in this course

12 · Scenarios in play

Where you'll see this in production

13 · Sibling Knowledge

Other course mirrors you may want next

14 · AEO FAQ

8 questions answered

Phrased as the way real students search. Tagged by intent so you can scan to what you actually need.

Definition · What does client messages create do in the Anthropic Python SDK?
`client.messages.create()` is the single API function for all Claude generation. You pass model (e.g. claude-sonnet-4-0), max_tokens (a safety cap, not a target), and messages (a list of {role, content} dicts). Optional params like system, temperature, stream, tools, and tool_choice shape the call. Claude is stateless; every call sends the full conversation, you append assistant responses back into messages yourself.
Comparison · When should I use a system prompt versus putting the same instructions in the user message?
Use a system prompt for stable role and behavior that doesn't change turn-to-turn; "you are a math tutor", "respond in JSON", "never reveal internal IDs". Put turn-specific instructions in the user message. System prompts are passed as the system parameter (not inside messages) and are weighted slightly higher in attention, which makes them the right place for guardrails and persona.
How-to · How does tool use actually work end-to-end with the Claude API?
Four steps. Step 1: pass tools=[...] describing each tool with name, description, and input_schema. Step 2: Claude returns a response containing a tool_use content block with the tool name and arguments; stop_reason will be tool_use. Step 3: you run the tool yourself and capture its output. Step 4: append a user message with a tool_result block matching the tool_use_id, then call messages.create() again. Loop while stop_reason == 'tool_use'.
Troubleshoot · Why is my prompt caching not actually saving any tokens?
Three common causes. First, your cache breakpoint isn't above truly stable content; any change above the cache_control block invalidates the entire cache. Second, the cached content is below the minimum cache size (1024 tokens for most models). Third, calls are spaced more than the cache TTL apart (default 5 minutes for ephemeral caching). Check the cache_creation_input_tokens vs. cache_read_input_tokens counts in the response usage to confirm hits.
Comparison · What is the difference between a workflow and an agent in the Claude API?
A workflow has predefined steps; the model fills in the content of each step but does not choose the path. Examples: chaining (step A → step B → step C), routing (classify input, dispatch to specialized prompt), parallelization (fan out, aggregate). An agent loops on tool use; the model decides what to call next based on what it has found, and decides when to stop. Use workflows for predictability and lower cost; use agents when the path depends on what the model discovers.
Scope · Do I need a vector database to do RAG with Claude?
Not for small corpora. If your data fits in the context window (with prompt caching to keep it cheap), you can skip retrieval entirely and stuff everything in. For larger corpora, yes; and the production pattern is multi-index: a vector DB for semantic similarity *plus* BM25 keyword search, with results reranked. Pinecone, Weaviate, pgvector, Turbopuffer all work; Anthropic doesn't ship a vector DB.
Comparison · Is MCP the same thing as tool use in the Claude API?
Related but distinct. Tool use is an API feature where you describe tools in your messages.create() call and run them yourself. MCP is a protocol that lets a Claude client (Desktop, Claude Code, Cursor) discover and call tools, fetch resources, and use prompts from an external server. MCP servers expose tools that *become* tool-use entries inside the API call the client makes. So MCP is upstream of tool use: it's how the tools get registered with the client; tool use is how they get called.
Scope · How long does it take to complete the Building with the Claude API course?
Skilljar estimates ~8 hours of video and exercises across 85 lessons, plus several hands-on coding exercises that double the wall-clock time if you actually code along. The course is heavy on Jupyter-notebook walkthroughs; passive watching loses ~60% of the value. Plan two full work-day blocks if you want to internalize tool use, RAG, and the agent/workflow distinction at exam-prep depth.
Last reviewed: 2026-05-06 · Refresh cadence: 60 days; the Anthropic API surface evolves quickly and the Skilljar course is updated frequently · View on Skilljar ↗
K · Intermediate · D2 · Tool Design + Integration

Building with the Claude API: Foundations to Agents, complete.

You've covered the full fourteen-section mirror for this course: outline, simplification, listicle patterns, takeaways, exam mapping, and FAQ. One course mirror down on the path to CCA-F.

Share your win →