Pillar 4 · Knowledge · Intermediate

Claude with Google Cloud Vertex AI: Deployment + GCP Integration.

This 93-lesson course teaches the same Claude API surface as claude-api-foundations but accessed through Google Cloud's Vertex AI rather than the direct Anthropic API. The deployment-specific substance is the AnthropicVertex SDK, gcloud Application Default Credentials auth, Model Garden enablement, project + region binding, and regional model availability. Everything else (prompt engineering, evals, tool use, RAG, MCP, agents) mirrors Course 6 lesson-for-lesson.

93 Skilljar lessons · ~480 min on Skilljar · D5 + D2

Mirrors Anthropic's Claude with Google Cloud's Vertex AI course on Skilljar.

Original course · 93 lessons · ~480 min
Claude with Google Cloud's Vertex AI
Take it on Anthropic Skilljar ↗
Claude with Google Cloud Vertex AI: Deployment + GCP Integration, painterly hero showing the course's central concept with the Loop mascot as guide.
01 · What you'll learn

You'll walk away with

  1. How to enable Claude models in Vertex AI Model Garden and authenticate via gcloud Application Default Credentials
  2. How the AnthropicVertex Python SDK differs from the direct Anthropic SDK (project_id, region, model name format)
  3. Which Claude features ride on top of Vertex unchanged (prompt caching, vision, PDF support, citations, extended thinking, batch)
  4. How regional model availability and quota management work on Vertex compared to the direct API
  5. How IAM, VPC Service Controls, and Cloud Logging fit into a Vertex-hosted Claude deployment
  6. When to choose Vertex AI vs the direct Anthropic API vs Amazon Bedrock for a given workload
02 · Prerequisites

Read these first

03 · The course mirror

Lesson outline

Every lesson from Claude with Google Cloud's Vertex AI with our one-line simplification. The Skilljar course is the source; we summarize.

# · Skilljar lesson · Our simplification
1 · Welcome to the course · Course intro and what Vertex AI adds vs the direct Anthropic API.
2 · Overview of Claude models · Model family overview; same models, accessed via Vertex.
3 · Accessing the API · Request lifecycle: client to server to Vertex to model and back; never call from the browser.
4 · Vertex AI Setup · DEPLOYMENT-SPECIFIC: enable Anthropic models in Model Garden, install the gcloud CLI, run gcloud auth application-default login.
5 · Making a request · DEPLOYMENT-SPECIFIC: pip install anthropic[vertex], instantiate AnthropicVertex(region=..., project_id=...), model id format claude-sonnet-4@20250514.
6 · Multi-turn conversations · Append assistant + user messages to maintain dialogue state; mirrors Course 6.
7 · Chat exercise · Hands-on: build a minimal chat loop against Vertex.
8 · System prompts · Set role and behavior via the system parameter; identical semantics to the direct API.
9 · System prompts exercise · Hands-on: experiment with system prompt variations.
10 · Temperature · Sampling control from 0 to 1; lower for deterministic, higher for creative.
11 · Course satisfaction survey · Mid-course feedback prompt.
12 · Response streaming · Stream tokens as they generate via SSE; reduces TTFT for chat UIs.
13 · Controlling model output · max_tokens, stop_sequences, and response-shaping basics.
14 · Structured data · Coax JSON via prompting; introduce schema thinking before tool use.
15 · Structured data exercise · Hands-on: extract structured fields with prompted JSON.
16 · Quiz on accessing Claude with the API · Section quiz on auth, request shape, and response handling.
17 · Prompt evaluation · Why systematic eval beats vibes; same content as Course 6.
18 · A typical eval workflow · Test set + grader + iteration loop; the canonical pattern.
19 · Generating test datasets · Use Claude itself (via Vertex) to synthesize test inputs at scale.
20 · Running the eval · Loop the test set through your prompt and capture outputs.
21 · Model-based grading · LLM-as-judge: rubric-driven grading with a separate Claude call.
22 · Code-based grading · Deterministic graders for format, regex, and schema validation.
23 · Exercise on prompt evals · Hands-on: stand up a small eval harness.
24 · Quiz on prompt evaluation · Section quiz on the eval workflow.
25 · Prompt engineering · Intro to the canonical Anthropic prompt-engineering techniques.
26 · Being clear and direct · State the task plainly; ambiguity costs more than verbosity.
27 · Being specific · Specificity collapses the response space; vague asks invite drift.
28 · Structure with XML tags · XML tags as structural anchors Claude attends to reliably.
29 · Providing examples · Few-shot examples in the prompt steer style and format.
30 · Exercise on prompting · Hands-on: refactor a weak prompt using the four techniques.
31 · Quiz on prompt engineering techniques · Section quiz on the prompting toolkit.
32 · Introducing tool use · Tools let Claude call your functions; same protocol on Vertex.
33 · Project overview · Multi-tool project setup for the section.
34 · Tool functions · Define the Python functions Claude will call.
35 · Tool schemas · JSON schema for tool inputs; the model only sees this schema.
36 · Handling message blocks · Iterate the response content blocks (text + tool_use) and dispatch.
37 · Sending tool results · Append tool_result messages and re-call the model to continue.
38 · Multi-turn conversations with tools · Maintain the agentic loop across multiple tool calls.
39 · Implementing multiple turns · Hands-on: code the loop with stop_reason guards.
40 · Using multiple tools · Register a tool list; let Claude pick the right one per turn.
41 · The batch tool · Submit many requests at once for cost-and-throughput wins.
42 · Tools for structured data · Tool schemas as the cleanest path to structured outputs.
43 · The text edit tool · Built-in text-editor tool for file-edit workflows.
44 · The web search tool · Built-in web search tool (note: availability differs across deployment platforms; check the Vertex docs).
45 · Quiz on tool use with Claude · Section quiz on tool-use mechanics.
46 · Introducing retrieval-augmented generation · RAG = retrieve relevant docs and add them to context; a Knowledge mitigation.
47 · Text chunking strategies · Fixed-size, sentence-boundary, and semantic chunking tradeoffs.
48 · Text embeddings · Dense vector representations for semantic search; on GCP often Vertex-hosted embedding models.
49 · The full RAG flow · Embed -> store -> retrieve top-k -> augment prompt -> generate.
50 · Implementing the RAG flow · Hands-on RAG pipeline.
51 · BM25 lexical search · Keyword/term-frequency retrieval as a complement to embedding search.
52 · A multi-index RAG pipeline · Combine BM25 + embeddings with reciprocal rank fusion.
53 · Reranking results · Cross-encoder rerank of top-k retrievals before context insertion.
54 · Contextual retrieval · Anthropic's contextual retrieval: prepend a chunk-aware summary before embedding.
55 · Quiz on retrieval-augmented generation · Section quiz on RAG components.
56 · Extended thinking · Reasoning mode where the model thinks before answering; surfaced as a content block.
57 · Image support · Vision: pass images as base64 or URL content blocks.
58 · PDF support · Native PDF inputs; large docs feed the Working Memory cliff.
59 · Citations · Built-in citations: the model returns spans tying claims back to source docs.
60 · Prompt caching · Cache stable prefixes (system prompt, RAG context) for major cost wins.
61 · Rules of prompt caching · TTLs, breakpoints, minimum sizes; cache-hit accounting on Vertex.
62 · Prompt caching in action · Hands-on: measure cache-hit savings on a realistic workload.
63 · Quiz on features of Claude · Section quiz on cross-cutting features.
64 · Introducing MCP · Model Context Protocol: a standard for connecting tools/data to Claude.
65 · MCP clients · Claude Desktop, Claude Code, custom clients; all speak the same protocol.
66 · Project setup · Stand up an MCP server scaffold for the section.
67 · Defining tools with MCP · Expose tools through MCP rather than per-app tool schemas.
68 · The server inspector · Anthropic's MCP inspector for debugging server output.
69 · Implementing a client · Build a custom MCP client around Claude on Vertex.
70 · Defining resources · MCP resources: data sources the model can pull from.
71 · Accessing resources · Wire resources into the client and let Claude read them.
72 · Defining prompts · MCP prompts: server-provided prompt templates.
73 · Prompts in the client · Surface MCP-served prompts in the client UI.
74 · MCP review · Recap of the tools/resources/prompts split.
75 · Quiz on Model Context Protocol · Section quiz on MCP architecture.
76 · Anthropic apps · Overview of Claude Desktop and Claude Code as Vertex-compatible clients.
77 · Claude Code setup · Install + configure Claude Code; works against Vertex-backed deployments.
78 · Claude Code in action · Live coding session demoing common workflows.
79 · Enhancements with MCP servers · Plug MCP servers into Claude Code for repo/db/issue access.
80 · Parallelizing Claude Code · git worktrees + multiple sessions for parallel feature work.
81 · Automated debugging · Subagent-driven bug repro and fix loop.
82 · Computer use · Computer-use tool: Claude controls a virtual desktop via screenshots + actions.
83 · How computer use works · Action loop: screenshot -> reason -> click/type -> repeat.
84 · Agents and workflows · Distinction: workflows are scripted, agents choose their own path.
85 · Parallelization workflows · Fan-out workflow: same task across many inputs in parallel.
86 · Chaining workflows · Sequential workflow: output of step N feeds step N+1.
87 · Routing workflows · Classifier-driven routing to specialized downstream prompts.
88 · Agents and tools · Agentic loop with a curated tool whitelist; same primitives on Vertex.
89 · Environment inspection · Let the agent probe its environment before acting.
90 · Workflows vs agents · Choose a workflow when steps are known; choose an agent when the path is open.
91 · Quiz on agents and workflows · Section quiz on agent design.
92 · Final assessment quiz · End-of-course assessment across all sections.
93 · Course wrap-up · Recap and pointers to deeper Vertex + Anthropic resources.
04 · Our simplification

The course in 7 paragraphs

This course is the platform-agnostic Claude API course (Course 6 `claude-api-foundations`) wrapped in Google Cloud: same prompt engineering, same eval workflow, same tool-use mechanics, same RAG patterns, same MCP protocol, same agent design. If you have done Course 6, roughly 85 of the 93 lessons will feel familiar verbatim. The ~8 lessons that *justify a separate Knowledge page* are the deployment seam: how you authenticate, how you address models, how regions and quotas work, and how Vertex's enterprise controls (IAM, VPC-SC, Cloud Logging) fit on top. This page focuses on those seams; for everything else, lean on claude-api-foundations as the canonical reference.

Authentication on Vertex is gcloud-mediated, not API-key-mediated. You install the gcloud CLI, run gcloud init and gcloud auth application-default login, set a project with gcloud config set project YOUR_PROJECT_ID, and from then on the AnthropicVertex SDK picks up Application Default Credentials automatically. There is no ANTHROPIC_API_KEY. The implication for your architecture is that auth is bound to a Google Cloud identity (a user account in dev, a service account in production), which means your IAM model becomes the security boundary. Grant roles/aiplatform.user to the service account, scope it to the project, and rotate via Google Cloud's normal service-account-key lifecycle (or Workload Identity Federation if you are running outside GCP).

The SDK surface differs in three load-bearing ways. First, the import: from anthropic import AnthropicVertex instead of from anthropic import Anthropic. Second, the constructor: AnthropicVertex(region="global", project_id="your-project-id"); region and project_id are mandatory and bind every request to a specific Vertex tenant. Third, the model id format: Vertex uses `claude-sonnet-4@20250514` rather than `claude-sonnet-4-20250514` (an @ replaces the final dash before the date stamp). Everything below those three lines (messages.create, content blocks, tool schemas, streaming, prompt caching) is byte-for-byte identical to the direct API. pip install "anthropic[vertex]" pulls in the right extras.
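The model-id delta is mechanical enough to capture in a one-line rewrite. `to_vertex_model_id` is a hypothetical convenience helper (the SDKs don't ship one); the constructor lines are shown as comments because they need live GCP credentials:

```python
import re

def to_vertex_model_id(direct_id: str) -> str:
    """Rewrite a direct-API model id into Vertex form by swapping the
    final dash before the 8-digit date stamp for an '@'."""
    return re.sub(r"-(\d{8})$", r"@\1", direct_id)

# The constructor delta, not executed here (needs gcloud ADC + a GCP project):
#   from anthropic import AnthropicVertex
#   client = AnthropicVertex(region="global", project_id="your-project-id")
#   client.messages.create(
#       model=to_vertex_model_id("claude-sonnet-4-20250514"), ...)
```

Ids already in Vertex form pass through unchanged, so the helper is safe to apply unconditionally.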

Regional model availability is a real operational concern. Not every Claude model is hosted in every Vertex region; new models often launch in us-east5 or us-central1 first and roll out elsewhere over weeks. The Vertex region="global" setting routes to the nearest available region and is usually the right default for production unless you have data-residency constraints. If you do need a specific region (EU data residency, regulated workloads), check the Model Garden listing for that region before you commit; a model that exists in `us-east5` will return a `not found` error in `europe-west4` even though both are valid Vertex regions. Quotas are per-project per-region and are managed in the Cloud Console under Quotas & system limits; default quotas are conservative and you will likely raise them before production traffic.

Enterprise controls ride on top of Vertex unchanged. This is the main reason customers choose Vertex over the direct Anthropic API: VPC Service Controls confine traffic to a security perimeter, Cloud Audit Logs capture every messages.create invocation with caller identity, Customer-Managed Encryption Keys (CMEK) wrap inputs and outputs, and Private Service Connect avoids public-internet egress. None of those are Anthropic features per se; they are GCP features that Vertex inherits because Claude is served as a first-class Vertex AI model. The compliance story is Vertex's, not Anthropic's: SOC 2, ISO 27001, HIPAA BAA (where applicable), FedRAMP for government workloads. If your org has a Google Cloud landing zone, deploying Claude on Vertex slots into existing policy controls instead of standing up a parallel data-flow review.

Feature parity is high but not perfect, and the gaps move over time. Prompt caching, vision, PDF support, citations, extended thinking, and the batch API generally land on Vertex within weeks of the direct API release; the message-format protocol is identical. The exceptions tend to be at the *tool* layer: the built-in web search tool and computer use have shipped with deployment-specific availability gates, so check the Anthropic on Vertex docs before depending on them. Pricing is set by Google Cloud (not Anthropic), typically per 1M input/output tokens at parity with the direct API, and is billed through your GCP invoice. Cache hits and batch requests get the same multipliers you see on the direct API.

When to choose Vertex vs the direct Anthropic API vs Bedrock: pick Vertex when your stack is already on Google Cloud; the auth, billing, IAM, audit, and data-residency stories all consolidate, and you avoid a second vendor relationship. Pick the direct Anthropic API when you want the fastest access to new models and features, simpler key-based auth, and no cloud lock-in. Pick Bedrock (covered in claude-with-bedrock) when your stack is on AWS, for the symmetric reasons. The application code is roughly 95% portable across all three; the differences are auth, model id format, the SDK constructor, and operational integrations. Picking a deployment platform is mostly an organizational decision, not a technical one, and the exam expects you to recognize that.
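The portability claim can be made concrete with a hypothetical config factory: only the client construction differs per platform, and everything downstream is shared application code. The class names match the real anthropic SDK clients, but the model ids and auth labels here are illustrative:

```python
def client_config(platform: str, **kw) -> dict:
    """Hypothetical factory capturing the per-platform seams: which client
    class, which auth mechanism, and which model-id format."""
    if platform == "direct":
        return {"class": "Anthropic", "auth": "ANTHROPIC_API_KEY",
                "model": "claude-sonnet-4-20250514"}
    if platform == "vertex":
        # Only Vertex needs region + project_id at construction time.
        return {"class": "AnthropicVertex", "auth": "gcloud ADC",
                "model": "claude-sonnet-4@20250514",
                "region": kw.get("region", "global"),
                "project_id": kw["project_id"]}
    if platform == "bedrock":
        return {"class": "AnthropicBedrock", "auth": "AWS credentials",
                "model": "anthropic.claude-sonnet-4-20250514-v1:0"}
    raise ValueError(f"unknown platform: {platform}")
```

Note how the same logical model is addressed three different ways; that addressing difference, not the application code, is what migration between platforms actually costs.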

05 · Listicle pattern

5 things that change when you move from the direct Anthropic API to Vertex

If you already know the direct API (Course 6), these are the deltas you actually need to internalize. Everything else is unchanged.

  1. Auth: gcloud ADC instead of API key

    No ANTHROPIC_API_KEY. Run gcloud auth application-default login in dev; use a service account with roles/aiplatform.user in prod. Auth is bound to a Google Cloud identity, which means IAM is your security boundary.

  2. SDK constructor: `AnthropicVertex(region, project_id)`

    from anthropic import AnthropicVertex and pass region and project_id. region="global" is a sensible default unless data residency dictates otherwise. Install with pip install "anthropic[vertex]".

  3. Model id format uses `@` not `-`

    claude-sonnet-4@20250514 on Vertex, claude-sonnet-4-20250514 on the direct API. A small but easy-to-trip-on difference; copy from the Model Garden listing rather than from the Anthropic docs.

  4. Regional availability is real

    Not every model is in every region. New models tend to launch in us-east5 first. Check Model Garden for your target region before you commit. region="global" routes to nearest available.

  5. Enterprise controls come from GCP

    VPC-SC, CMEK, Cloud Audit Logs, Private Service Connect, and the GCP compliance posture (SOC 2, ISO 27001, HIPAA BAA, FedRAMP) all apply because Claude is served as a Vertex model. Compliance story is GCP's, not Anthropic's directly.

06 · Key takeaways

6 takeaways with cross-pillar bridges

Vertex deployment is the same Claude API surface as claude-api-foundations plus a different auth and addressing model; about 85 of 93 lessons mirror Course 6 verbatim.

Authentication is gcloud Application Default Credentials, not an API key; production uses a service account with roles/aiplatform.user and IAM is the security boundary.

The AnthropicVertex SDK requires region and project_id, and the model id format uses @ (e.g. claude-sonnet-4@20250514) instead of a trailing dash.

Regional model availability is a real operational gate; new models launch in specific regions first and region="global" is the sensible production default unless data residency requires otherwise.

VPC Service Controls, CMEK, Cloud Audit Logs, and the GCP compliance posture (SOC 2, HIPAA BAA, FedRAMP) ride on top of Vertex unchanged; that is the main reason enterprises pick Vertex over the direct API.

Application code is roughly 95% portable across direct API, Vertex, and Bedrock; choosing a deployment platform is mostly an organizational decision (where your cloud landing zone lives), not a technical one.

07 · Exam mapping

How this maps to the CCA-F exam

Domains
D5 Context + Reliability · D2 Tool Design + Integration
Blueprint
15% (D5) + 18% (D2)
What it advances
Maps directly to D5 task statements about deploying Claude in customer-managed cloud environments, IAM-bound auth, regional model availability, and quota management on Vertex AI. The API/prompt/tool/RAG/MCP/agent content is identical to Course 6, so this page focuses on what differs at the deployment seam.
08 · Curated supplementary sources

2 hand-picked extras

These amplify the Skilljar course beyond what the course itself covers. Each was picked for a specific reason.

09 · Concepts wired

Concepts in this course

10 · Scenarios in play

Where you'll see this in production

11 · Sibling Knowledge

Other course mirrors you may want next

12 · AEO FAQ

8 questions answered

Phrased as the way real students search. Tagged by intent so you can scan to what you actually need.

ComparisonWhat is the difference between using Claude through the Anthropic API and through Google Vertex AI?
The application code is roughly 95% identical; the differences are at the deployment seam. Vertex uses gcloud Application Default Credentials instead of an ANTHROPIC_API_KEY, requires AnthropicVertex(region, project_id) instead of Anthropic(), uses @ in the model id (e.g. claude-sonnet-4@20250514), and inherits Google Cloud's IAM, audit, VPC-SC, and compliance controls. Choose Vertex when your stack is on GCP; choose direct API for simpler auth and fastest access to new features.
How-toHow do I authenticate with Claude on Vertex AI?
Install the gcloud CLI, run gcloud init and gcloud auth login, set your project with gcloud config set project YOUR_PROJECT_ID, then run gcloud auth application-default login. The AnthropicVertex SDK picks up Application Default Credentials automatically. There is no API key; auth is bound to a Google Cloud identity (your user account in dev, a service account with roles/aiplatform.user in production).
TroubleshootWhy does my Vertex AI request return a model-not-found error when the model exists?
Almost always a regional availability mismatch. Not every Claude model is hosted in every Vertex region, and new models often launch in us-east5 or us-central1 first. Check the Model Garden listing for your target region before you commit. The fix is usually to switch to `region="global"`, which routes to the nearest available region, unless data residency dictates a specific region. Also confirm the model id format; Vertex uses claude-sonnet-4@20250514 with an @, not a trailing dash.
ScopeDoes prompt caching work on Claude through Vertex AI?
Yes. Prompt caching, vision, PDF support, citations, extended thinking, and the batch API all work on Vertex with the same TTLs, breakpoint rules, and pricing multipliers as the direct API. The message-format protocol is identical. The features that occasionally lag are at the tool layer; the built-in web search tool and computer use have shipped with deployment-specific availability gates, so check the Anthropic on Vertex docs before depending on them.
ComparisonShould I use Vertex AI or the direct Anthropic API for my production deployment?
Pick Vertex when your stack is already on Google Cloud; the auth, billing, IAM, audit, VPC-SC, and data-residency stories all consolidate into your existing GCP landing zone. Pick the direct Anthropic API when you want the fastest access to new models and features, simpler key-based auth, and no cloud lock-in. The application code is portable both ways, so this is mostly an organizational decision (where does your security review live, who pays the invoice) rather than a technical one.
How-toHow do I install the right Anthropic SDK for Vertex AI in Python?
Run `pip install "anthropic[vertex]"`; the [vertex] extras pull in the Google Auth dependencies needed to connect to Vertex. Then import AnthropicVertex (not Anthropic) and instantiate it with region and project_id. The same messages.create API works on both clients, so application code below the constructor line is unchanged.
How-toWhat IAM permissions does my service account need to call Claude on Vertex?
At minimum, roles/aiplatform.user on the project. For production, scope tightly: grant only that role on only the project hosting your AI workloads, and prefer Workload Identity Federation over long-lived service-account keys if your code runs outside GCP. Auth is bound to identity, so any IAM policy you apply to the service account flows through to your Claude calls, including organizational policies on which regions are allowed and which models are enabled.
ScopeCan I use HIPAA, SOC 2, or FedRAMP-covered Claude through Vertex?
Yes; the compliance posture is Google Cloud's, and Claude on Vertex inherits it. SOC 2, ISO 27001, HIPAA BAA (where applicable), and FedRAMP for government workloads all extend to Anthropic models served through Vertex AI. The compliance story is Vertex's, not Anthropic's directly, which is one of the main reasons regulated industries pick Vertex over the direct API. Confirm the specific certifications in the Google Cloud Compliance Resource Center for your target region before going to production.
Last reviewed: 2026-05-06 · Refresh cadence: 60 days; deployment-platform features (regional availability, IAM scopes, feature parity) shift faster than the platform-agnostic API surface, so refresh more aggressively than Course 6 · View on Skilljar ↗
K · Intermediate · D5 · Context + Reliability

Claude with Google Cloud Vertex AI: Deployment + GCP Integration, complete.

You've covered the full ten-section breakdown for this primitive: definition, mechanics, code, false positives, comparison, decision tree, exam patterns, and FAQ. One technical primitive down on the path to CCA-F.
