Pillar 4 · Knowledge · Intermediate

Claude with Google Cloud Vertex AI: Deployment + GCP Integration.

This 93-lesson course teaches the same Claude API surface as claude-api-foundations but accessed through Google Cloud's Vertex AI rather than the direct Anthropic API. The deployment-specific substance is the AnthropicVertex SDK, gcloud Application Default Credentials auth, Model Garden enablement, project + region binding, and regional model availability. Everything else (prompt engineering, evals, tool use, RAG, MCP, agents) mirrors Course 6 lesson-for-lesson.

93 Skilljar lessons · ~480 min on Skilljar · D5 + D2

Mirrors Anthropic's Claude with Google Cloud's Vertex AI course on Skilljar.

Original course · 93 lessons · ~480 min
Claude with Google Cloud's Vertex AI
Take it on Anthropic Skilljar ↗
Claude with Google Cloud Vertex AI: Deployment + GCP Integration, painterly hero showing the course's central concept with the Loop mascot as guide.
01 · What you'll learn

You'll walk away with

  1. How to enable Claude models in Vertex AI Model Garden and authenticate via gcloud Application Default Credentials
  2. How the AnthropicVertex Python SDK differs from the direct Anthropic SDK (project_id, region, model name format)
  3. Which Claude features ride on top of Vertex unchanged (prompt caching, vision, PDF support, citations, extended thinking, batch)
  4. How regional model availability and quota management work on Vertex compared to the direct API
  5. How IAM, VPC Service Controls, and Cloud Logging fit into a Vertex-hosted Claude deployment
  6. When to choose Vertex AI vs the direct Anthropic API vs Amazon Bedrock for a given workload
02 · Prerequisites

Read these first

03 · The course mirror

Lesson outline

Every lesson from Claude with Google Cloud's Vertex AI with our one-line simplification. The Skilljar course is the source; we summarize.

# · Skilljar lesson · Our simplification
1 · Welcome to the course · Course intro and what Vertex AI adds vs the direct Anthropic API.
2 · Overview of Claude models · Model family overview; same models, accessed via Vertex.
3 · Accessing the API · Request lifecycle: client to server to Vertex to model and back; never call from the browser.
4 · Vertex AI Setup · DEPLOYMENT-SPECIFIC: enable Anthropic models in Model Garden, install the gcloud CLI, run gcloud auth application-default login.
5 · Making a request · DEPLOYMENT-SPECIFIC: pip install anthropic[vertex], instantiate AnthropicVertex(region=..., project_id=...), model id format claude-sonnet-4@20250514.
6 · Multi-turn conversations · Append assistant + user messages to maintain dialogue state; mirrors Course 6.
7 · Chat exercise · Hands-on: build a minimal chat loop against Vertex.
8 · System prompts · Set role and behavior via the system parameter; identical semantics to the direct API.
9 · System prompts exercise · Hands-on: experiment with system prompt variations.
10 · Temperature · Sampling control from 0 to 1; lower for deterministic, higher for creative.
11 · Course satisfaction survey · Mid-course feedback prompt.
12 · Response streaming · Stream tokens as they generate via SSE; reduces TTFT for chat UIs.
13 · Controlling model output · max_tokens, stop_sequences, and response-shaping basics.
14 · Structured data · Coax JSON via prompting; introduce schema thinking before tool use.
15 · Structured data exercise · Hands-on: extract structured fields with prompted JSON.
16 · Quiz on accessing Claude with the API · Section quiz on auth, request shape, and response handling.
17 · Prompt evaluation · Why systematic eval beats vibes; same content as Course 6.
18 · A typical eval workflow · Test set + grader + iteration loop; the canonical pattern.
19 · Generating test datasets · Use Claude itself (via Vertex) to synthesize test inputs at scale.
20 · Running the eval · Loop the test set through your prompt and capture outputs.
21 · Model-based grading · LLM-as-judge: rubric-driven grading with a separate Claude call.
22 · Code-based grading · Deterministic graders for format, regex, and schema validation.
23 · Exercise on prompt evals · Hands-on: stand up a small eval harness.
24 · Quiz on prompt evaluation · Section quiz on the eval workflow.
25 · Prompt engineering · Intro to the canonical Anthropic prompt-engineering techniques.
26 · Being clear and direct · State the task plainly; ambiguity costs more than verbosity.
27 · Being specific · Specificity collapses the response space; vague asks invite drift.
28 · Structure with XML tags · XML tags as structural anchors Claude attends to reliably.
29 · Providing examples · Few-shot examples in the prompt steer style and format.
30 · Exercise on prompting · Hands-on: refactor a weak prompt using the four techniques.
31 · Quiz on prompt engineering techniques · Section quiz on the prompting toolkit.
32 · Introducing tool use · Tools let Claude call your functions; same protocol on Vertex.
33 · Project overview · Multi-tool project setup for the section.
34 · Tool functions · Define the Python functions Claude will call.
35 · Tool schemas · JSON schema for tool inputs; the model only sees this schema.
36 · Handling message blocks · Iterate the response content blocks (text + tool_use) and dispatch.
37 · Sending tool results · Append tool_result messages and re-call the model to continue.
38 · Multi-turn conversations with tools · Maintain the agentic loop across multiple tool calls.
39 · Implementing multiple turns · Hands-on: code the loop with stop_reason guards.
40 · Using multiple tools · Register a tool list; let Claude pick the right one per turn.
41 · The batch tool · Submit many requests at once for cost-and-throughput wins.
42 · Tools for structured data · Tool schemas as the cleanest path to structured outputs.
43 · The text edit tool · Built-in text-editor tool for file-edit workflows.
44 · The web search tool · Built-in web search tool (note: availability differs across deployment platforms; check the Vertex docs).
45 · Quiz on tool use with Claude · Section quiz on tool-use mechanics.
46 · Introducing retrieval-augmented generation · RAG = retrieve relevant docs and add them to context; a Knowledge mitigation.
47 · Text chunking strategies · Fixed-size, sentence-boundary, and semantic chunking tradeoffs.
48 · Text embeddings · Dense vector representations for semantic search; on GCP often Vertex-hosted embedding models.
49 · The full RAG flow · Embed -> store -> retrieve top-k -> augment prompt -> generate.
50 · Implementing the RAG flow · Hands-on RAG pipeline.
51 · BM25 lexical search · Keyword/term-frequency retrieval as a complement to embedding search.
52 · A multi-index RAG pipeline · Combine BM25 + embeddings with reciprocal rank fusion.
53 · Reranking results · Cross-encoder rerank of top-k retrievals before context insertion.
54 · Contextual retrieval · Anthropic's contextual retrieval: prepend a chunk-aware summary before embedding.
55 · Quiz on retrieval-augmented generation · Section quiz on RAG components.
56 · Extended thinking · Reasoning mode where the model thinks before answering; surfaced as a content block.
57 · Image support · Vision: pass images as base64 or URL content blocks.
58 · PDF support · Native PDF inputs; large docs feed the Working Memory cliff.
59 · Citations · Built-in citations: the model returns spans tying claims back to source docs.
60 · Prompt caching · Cache stable prefixes (system prompt, RAG context) for major cost wins.
61 · Rules of prompt caching · TTLs, breakpoints, minimum sizes; cache-hit accounting on Vertex.
62 · Prompt caching in action · Hands-on: measure cache-hit savings on a realistic workload.
63 · Quiz on features of Claude · Section quiz on cross-cutting features.
64 · Introducing MCP · Model Context Protocol: a standard for connecting tools/data to Claude.
65 · MCP clients · Claude Desktop, Claude Code, custom clients; all speak the same protocol.
66 · Project setup · Stand up an MCP server scaffold for the section.
67 · Defining tools with MCP · Expose tools through MCP rather than per-app tool schemas.
68 · The server inspector · Anthropic's MCP inspector for debugging server output.
69 · Implementing a client · Build a custom MCP client around Claude on Vertex.
70 · Defining resources · MCP resources: data sources the model can pull from.
71 · Accessing resources · Wire resources into the client and let Claude read them.
72 · Defining prompts · MCP prompts: server-provided prompt templates.
73 · Prompts in the client · Surface MCP-served prompts in the client UI.
74 · MCP review · Recap of the tools/resources/prompts split.
75 · Quiz on Model Context Protocol · Section quiz on MCP architecture.
76 · Anthropic apps · Overview of Claude Desktop and Claude Code as Vertex-compatible clients.
77 · Claude Code setup · Install + configure Claude Code; works against Vertex-backed deployments.
78 · Claude Code in action · Live coding session demoing common workflows.
79 · Enhancements with MCP servers · Plug MCP servers into Claude Code for repo/db/issue access.
80 · Parallelizing Claude Code · git worktrees + multiple sessions for parallel feature work.
81 · Automated debugging · Subagent-driven bug repro and fix loop.
82 · Computer use · Computer-use tool: Claude controls a virtual desktop via screenshots + actions.
83 · How computer use works · Action loop: screenshot -> reason -> click/type -> repeat.
84 · Agents and workflows · Distinction: workflows are scripted, agents choose their own path.
85 · Parallelization workflows · Fan-out workflow: same task across many inputs in parallel.
86 · Chaining workflows · Sequential workflow: output of step N feeds step N+1.
87 · Routing workflows · Classifier-driven routing to specialized downstream prompts.
88 · Agents and tools · Agentic loop with a curated tool whitelist; same primitives on Vertex.
89 · Environment inspection · Let the agent probe its environment before acting.
90 · Workflows vs agents · Choose a workflow when steps are known; choose an agent when the path is open.
91 · Quiz on agents and workflows · Section quiz on agent design.
92 · Final assessment quiz · End-of-course assessment across all sections.
93 · Course wrap-up · Recap and pointers to deeper Vertex + Anthropic resources.
04 · Our simplification

The course in 7 paragraphs

This course is the platform-agnostic Claude API course (Course 6 `claude-api-foundations`) wrapped in Google Cloud: same prompt engineering, same eval workflow, same tool-use mechanics, same RAG patterns, same MCP protocol, same agent design. If you have done Course 6, roughly 85 of the 93 lessons will feel familiar verbatim. The ~8 lessons that *justify a separate Knowledge page* are the deployment seam: how you authenticate, how you address models, how regions and quotas work, and how Vertex's enterprise controls (IAM, VPC-SC, Cloud Logging) fit on top. This page focuses on those seams; for everything else, lean on claude-api-foundations as the canonical reference.

Authentication on Vertex is gcloud-mediated, not API-key-mediated. You install the gcloud CLI, run gcloud init and gcloud auth application-default login, set a project with gcloud config set project YOUR_PROJECT_ID, and from then on the AnthropicVertex SDK picks up Application Default Credentials automatically. There is no ANTHROPIC_API_KEY. The implication for your architecture is that auth is bound to a Google Cloud identity (a user account in dev, a service account in production), which means your IAM model becomes the security boundary. Grant roles/aiplatform.user to the service account, scope it to the project, and rotate via Google Cloud's normal service-account-key lifecycle (or Workload Identity Federation if you are running outside GCP).

The SDK surface differs in three load-bearing ways. First, the import: from anthropic import AnthropicVertex instead of from anthropic import Anthropic. Second, the constructor: AnthropicVertex(region="global", project_id="your-project-id"); region and project_id are mandatory and bind every request to a specific Vertex tenant. Third, the model id format: Vertex uses `claude-sonnet-4@20250514` rather than `claude-sonnet-4-20250514` (an @ replaces the final dash before the date stamp). Everything below those three lines (messages.create, content blocks, tool schemas, streaming, prompt caching) is byte-for-byte identical to the direct API. pip install "anthropic[vertex]" pulls in the right extras.
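The model-id delta is mechanical enough to capture in a one-line rewrite. `to_vertex_model_id` is a hypothetical convenience helper (the SDKs don't ship one); the constructor lines are shown as comments because they need live GCP credentials:

```python
import re

def to_vertex_model_id(direct_id: str) -> str:
    """Rewrite a direct-API model id into Vertex form by swapping the
    final dash before the 8-digit date stamp for an '@'."""
    return re.sub(r"-(\d{8})$", r"@\1", direct_id)

# The constructor delta, not executed here (needs gcloud ADC + a GCP project):
#   from anthropic import AnthropicVertex
#   client = AnthropicVertex(region="global", project_id="your-project-id")
#   client.messages.create(
#       model=to_vertex_model_id("claude-sonnet-4-20250514"), ...)
```

Ids already in Vertex form pass through unchanged, so the helper is safe to apply unconditionally.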

Regional model availability is a real operational concern. Not every Claude model is hosted in every Vertex region; new models often launch in us-east5 or us-central1 first and roll out elsewhere over weeks. The Vertex region="global" setting routes to the nearest available region and is usually the right default for production unless you have data-residency constraints. If you do need a specific region (EU data residency, regulated workloads), check the Model Garden listing for that region before you commit; a model that exists in `us-east5` will return a `not found` error in `europe-west4` even though both are valid Vertex regions. Quotas are per-project per-region and are managed in the Cloud Console under Quotas & system limits; default quotas are conservative and you will likely raise them before production traffic.

Enterprise controls ride on top of Vertex unchanged. This is the main reason customers choose Vertex over the direct Anthropic API: VPC Service Controls confine traffic to a security perimeter, Cloud Audit Logs capture every messages.create invocation with caller identity, Customer-Managed Encryption Keys (CMEK) wrap inputs and outputs, and Private Service Connect avoids public-internet egress. None of those are Anthropic features per se; they are GCP features that Vertex inherits because Claude is served as a first-class Vertex AI model. The compliance story is Vertex's, not Anthropic's: SOC 2, ISO 27001, HIPAA BAA (where applicable), FedRAMP for government workloads. If your org has a Google Cloud landing zone, deploying Claude on Vertex slots into existing policy controls instead of standing up a parallel data-flow review.

Feature parity is high but not perfect, and the gaps move over time. Prompt caching, vision, PDF support, citations, extended thinking, and the batch API generally land on Vertex within weeks of the direct API release; the message-format protocol is identical. The exceptions tend to be at the *tool* layer: the built-in web search tool and computer use have shipped with deployment-specific availability gates, so check the Anthropic on Vertex docs before depending on them. Pricing is set by Google Cloud (not Anthropic), typically per 1M input/output tokens at parity with the direct API, and is billed through your GCP invoice. Cache hits and batch requests get the same multipliers you see on the direct API.

When to choose Vertex vs the direct Anthropic API vs Bedrock: pick Vertex when your stack is already on Google Cloud; the auth, billing, IAM, audit, and data-residency stories all consolidate, and you avoid a second vendor relationship. Pick the direct Anthropic API when you want the fastest access to new models and features, simpler key-based auth, and no cloud lock-in. Pick Bedrock (covered in claude-with-bedrock) when your stack is on AWS, for the symmetric reasons. The application code is roughly 95% portable across all three; the differences are auth, model id format, the SDK constructor, and operational integrations. Picking a deployment platform is mostly an organizational decision, not a technical one, and the exam expects you to recognize that.
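The portability claim can be made concrete with a hypothetical config factory: only the client construction differs per platform, and everything downstream is shared application code. The class names match the real anthropic SDK clients, but the model ids and auth labels here are illustrative:

```python
def client_config(platform: str, **kw) -> dict:
    """Hypothetical factory capturing the per-platform seams: which client
    class, which auth mechanism, and which model-id format."""
    if platform == "direct":
        return {"class": "Anthropic", "auth": "ANTHROPIC_API_KEY",
                "model": "claude-sonnet-4-20250514"}
    if platform == "vertex":
        # Only Vertex needs region + project_id at construction time.
        return {"class": "AnthropicVertex", "auth": "gcloud ADC",
                "model": "claude-sonnet-4@20250514",
                "region": kw.get("region", "global"),
                "project_id": kw["project_id"]}
    if platform == "bedrock":
        return {"class": "AnthropicBedrock", "auth": "AWS credentials",
                "model": "anthropic.claude-sonnet-4-20250514-v1:0"}
    raise ValueError(f"unknown platform: {platform}")
```

Note how the same logical model is addressed three different ways; that addressing difference, not the application code, is what migration between platforms actually costs.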

05 · Listicle pattern

5 things that change when you move from the direct Anthropic API to Vertex

If you already know the direct API (Course 6), these are the deltas you actually need to internalize. Everything else is unchanged.

  1. Auth: gcloud ADC instead of API key

    No ANTHROPIC_API_KEY. Run gcloud auth application-default login in dev; use a service account with roles/aiplatform.user in prod. Auth is bound to a Google Cloud identity, which means IAM is your security boundary.

  2. SDK constructor: `AnthropicVertex(region, project_id)`

    from anthropic import AnthropicVertex and pass region and project_id. region="global" is a sensible default unless data residency dictates otherwise. Install with pip install "anthropic[vertex]".

  3. Model id format uses `@` not `-`

    claude-sonnet-4@20250514 on Vertex, claude-sonnet-4-20250514 on the direct API. A small but easy-to-trip-on difference; copy from the Model Garden listing rather than from the Anthropic docs.

  4. Regional availability is real

    Not every model is in every region. New models tend to launch in us-east5 first. Check Model Garden for your target region before you commit. region="global" routes to nearest available.

  5. Enterprise controls come from GCP

    VPC-SC, CMEK, Cloud Audit Logs, Private Service Connect, and the GCP compliance posture (SOC 2, ISO 27001, HIPAA BAA, FedRAMP) all apply because Claude is served as a Vertex model. Compliance story is GCP's, not Anthropic's directly.

06 · Key takeaways

6 takeaways with cross-pillar bridges

Vertex deployment is the same Claude API surface as claude-api-foundations plus a different auth and addressing model; about 85 of 93 lessons mirror Course 6 verbatim.

Authentication is gcloud Application Default Credentials, not an API key; production uses a service account with roles/aiplatform.user and IAM is the security boundary.

The AnthropicVertex SDK requires region and project_id, and the model id format uses @ (e.g. claude-sonnet-4@20250514) instead of a trailing dash.

Regional model availability is a real operational gate; new models launch in specific regions first and region="global" is the sensible production default unless data residency requires otherwise.

VPC Service Controls, CMEK, Cloud Audit Logs, and the GCP compliance posture (SOC 2, HIPAA BAA, FedRAMP) ride on top of Vertex unchanged; that is the main reason enterprises pick Vertex over the direct API.

Application code is roughly 95% portable across direct API, Vertex, and Bedrock; choosing a deployment platform is mostly an organizational decision (where your cloud landing zone lives), not a technical one.

07 · Exam mapping

How this maps to the CCA-F exam

Domains
D5 Context + Reliability · D2 Tool Design + Integration
Blueprint
15% (D5) + 18% (D2)
What it advances
Maps directly to D5 task statements about deploying Claude in customer-managed cloud environments, IAM-bound auth, regional model availability, and quota management on Vertex AI. The API/prompt/tool/RAG/MCP/agent content is identical to Course 6, so this page focuses on what differs at the deployment seam.
08 · Curated supplementary sources

2 hand-picked extras

These amplify the Skilljar course beyond what the course itself covers. Each was picked for a specific reason.

09 · Concepts wired

Concepts in this course

10 · Scenarios in play

Where you'll see this in production

11 · Sibling Knowledge

Other course mirrors you may want next

12 · AEO FAQ

8 questions answered

Phrased as the way real students search. Tagged by intent so you can scan to what you actually need.

ComparisonWhat is the difference between using Claude through the Anthropic API and through Google Vertex AI?
The application code is roughly 95% identical; the differences are at the deployment seam. Vertex uses gcloud Application Default Credentials instead of an ANTHROPIC_API_KEY, requires AnthropicVertex(region, project_id) instead of Anthropic(), uses @ in the model id (e.g. claude-sonnet-4@20250514), and inherits Google Cloud's IAM, audit, VPC-SC, and compliance controls. Choose Vertex when your stack is on GCP; choose direct API for simpler auth and fastest access to new features.
How-toHow do I authenticate with Claude on Vertex AI?
Install the gcloud CLI, run gcloud init and gcloud auth login, set your project with gcloud config set project YOUR_PROJECT_ID, then run gcloud auth application-default login. The AnthropicVertex SDK picks up Application Default Credentials automatically. There is no API key; auth is bound to a Google Cloud identity (your user account in dev, a service account with roles/aiplatform.user in production).
TroubleshootWhy does my Vertex AI request return a model-not-found error when the model exists?
Almost always a regional availability mismatch. Not every Claude model is hosted in every Vertex region, and new models often launch in us-east5 or us-central1 first. Check the Model Garden listing for your target region before you commit. The fix is usually to switch to `region="global"`, which routes to the nearest available region, unless data residency dictates a specific region. Also confirm the model id format; Vertex uses claude-sonnet-4@20250514 with an @, not a trailing dash.
ScopeDoes prompt caching work on Claude through Vertex AI?
Yes. Prompt caching, vision, PDF support, citations, extended thinking, and the batch API all work on Vertex with the same TTLs, breakpoint rules, and pricing multipliers as the direct API. The message-format protocol is identical. The features that occasionally lag are at the tool layer; the built-in web search tool and computer use have shipped with deployment-specific availability gates, so check the Anthropic on Vertex docs before depending on them.
ComparisonShould I use Vertex AI or the direct Anthropic API for my production deployment?
Pick Vertex when your stack is already on Google Cloud; the auth, billing, IAM, audit, VPC-SC, and data-residency stories all consolidate into your existing GCP landing zone. Pick the direct Anthropic API when you want the fastest access to new models and features, simpler key-based auth, and no cloud lock-in. The application code is portable both ways, so this is mostly an organizational decision (where does your security review live, who pays the invoice) rather than a technical one.
How-toHow do I install the right Anthropic SDK for Vertex AI in Python?
Run `pip install "anthropic[vertex]"`; the [vertex] extras pull in the Google Auth dependencies needed to connect to Vertex. Then import AnthropicVertex (not Anthropic) and instantiate it with region and project_id. The same messages.create API works on both clients, so application code below the constructor line is unchanged.
How-toWhat IAM permissions does my service account need to call Claude on Vertex?
At minimum, roles/aiplatform.user on the project. For production, scope tightly: grant only that role on only the project hosting your AI workloads, and prefer Workload Identity Federation over long-lived service-account keys if your code runs outside GCP. Auth is bound to identity, so any IAM policy you apply to the service account flows through to your Claude calls, including organizational policies on which regions are allowed and which models are enabled.
ScopeCan I use HIPAA, SOC 2, or FedRAMP-covered Claude through Vertex?
Yes; the compliance posture is Google Cloud's, and Claude on Vertex inherits it. SOC 2, ISO 27001, HIPAA BAA (where applicable), and FedRAMP for government workloads all extend to Anthropic models served through Vertex AI. The compliance story is Vertex's, not Anthropic's directly, which is one of the main reasons regulated industries pick Vertex over the direct API. Confirm the specific certifications in the Google Cloud Compliance Resource Center for your target region before going to production.
Last reviewed: 2026-05-06 · Refresh cadence: 60 days; deployment-platform features (regional availability, IAM scopes, feature parity) shift faster than the platform-agnostic API surface, so refresh more aggressively than Course 6 · View on Skilljar ↗
K · Intermediate · D5 · Context + Reliability

Claude with Google Cloud Vertex AI: Deployment + GCP Integration, complete.

You've covered the full ten-section breakdown for this primitive: definition, mechanics, code, false positives, comparison, decision tree, exam patterns, and FAQ. One technical primitive down on the path to CCA-F.
