# Claude with Google Cloud Vertex AI: Deployment + GCP Integration

> This 93-lesson course teaches the same Claude API surface as claude-api-foundations but accessed through Google Cloud's Vertex AI rather than the direct Anthropic API. The deployment-specific substance is the AnthropicVertex SDK, gcloud Application Default Credentials auth, Model Garden enablement, project + region binding, and the regional model-availability model. Everything else (prompt engineering, evals, tool use, RAG, MCP, agents) mirrors Course 6 lesson-for-lesson.

**Domain:** D5 · Context + Reliability (15%)
**Difficulty:** intermediate
**Skilljar course:** Claude with Google Cloud's Vertex AI (93 lessons)
**Canonical:** https://claudearchitectcertification.com/knowledge/claude-with-vertex
**Last reviewed:** 2026-05-06

## Exam mapping

**Blueprint share:** 15% (D5) + 18% (D2)

Maps directly to D5 task statements about deploying Claude in customer-managed cloud environments, IAM-bound auth, regional model availability, and quota management on Vertex AI. The API/prompt/tool/RAG/MCP/agent content is identical to Course 6, so this page focuses on what differs at the deployment seam.

## What you'll learn

- How to enable Claude models in Vertex AI Model Garden and authenticate via gcloud Application Default Credentials
- How the AnthropicVertex Python SDK differs from the direct Anthropic SDK (project_id, region, model name format)
- Which Claude features ride on top of Vertex unchanged (prompt caching, vision, PDF support, citations, extended thinking, batch)
- How regional model availability and quota management work on Vertex compared to the direct API
- How IAM, VPC Service Controls, and Cloud Logging fit into a Vertex-hosted Claude deployment
- When to choose Vertex AI vs the direct Anthropic API vs Amazon Bedrock for a given workload

## Prerequisites

- **Claude API Foundations (the platform-agnostic content)** (knowledge · `claude-api-foundations`)
- **Claude 101: First Principles** (knowledge · `claude-101`)

## Lesson outline

### 1. Welcome to the course

Course intro and what Vertex AI adds vs the direct Anthropic API.

### 2. Overview of Claude models

Model family overview; same models, accessed via Vertex.

### 3. Accessing the API

Request lifecycle: client to server to Vertex to model and back; never call from browser.

### 4. Vertex AI Setup

DEPLOYMENT-SPECIFIC: enable Anthropic models in Model Garden, install gcloud CLI, run gcloud auth application-default login.

### 5. Making a request

DEPLOYMENT-SPECIFIC: pip install anthropic[vertex], instantiate AnthropicVertex(region=..., project_id=...), model id format claude-sonnet-4@20250514.

### 6. Multi-turn conversations

Append assistant + user messages to maintain dialogue state; mirrors Course 6.

### 7. Chat exercise

Hands-on: build a minimal chat loop against Vertex.

### 8. System prompts

Set role and behavior via the system parameter; identical semantics to direct API.

### 9. System prompts exercise

Hands-on: experiment with system prompt variations.

### 10. Temperature

Sampling control 0 to 1; lower for deterministic, higher for creative.

### 11. Course satisfaction survey

Mid-course feedback prompt.

### 12. Response streaming

Stream tokens as they generate via SSE; reduces TTFT for chat UIs.

### 13. Controlling model output

max_tokens, stop_sequences, and response shaping basics.

### 14. Structured data

Coax JSON via prompting; introduce schema thinking before tool use.

### 15. Structured data exercise

Hands-on: extract structured fields with prompted JSON.

### 16. Quiz on accessing Claude with the API

Section quiz on auth, request shape, and response handling.

### 17. Prompt evaluation

Why systematic eval beats vibes; same content as Course 6.

### 18. A typical eval workflow

Test set + grader + iteration loop; canonical pattern.

### 19. Generating test datasets

Use Claude itself (via Vertex) to synthesize test inputs at scale.

### 20. Running the eval

Loop the test set through your prompt and capture outputs.

### 21. Model-based grading

LLM-as-judge: rubric-driven grading with a separate Claude call.

### 22. Code-based grading

Deterministic graders for format, regex, schema validation.

### 23. Exercise on prompt evals

Hands-on: stand up a small eval harness.

### 24. Quiz on prompt evaluation

Section quiz on eval workflow.

### 25. Prompt engineering

Intro to the canonical Anthropic prompt-engineering techniques.

### 26. Being clear and direct

State the task plainly; ambiguity costs more than verbosity.

### 27. Being specific

Specificity collapses the response space; vague asks invite drift.

### 28. Structure with XML tags

XML tags as structural anchors Claude attends to reliably.

### 29. Providing examples

Few-shot examples in the prompt steer style and format.

### 30. Exercise on prompting

Hands-on: refactor a weak prompt using the four techniques.

### 31. Quiz on prompt engineering techniques

Section quiz on the prompting toolkit.

### 32. Introducing tool use

Tools let Claude call your functions; same protocol on Vertex.

### 33. Project overview

Multi-tool project setup for the section.

### 34. Tool functions

Define the Python functions Claude will call.

### 35. Tool schemas

JSON schema for tool inputs; the model only sees this schema.

### 36. Handling message blocks

Iterate the response content blocks (text + tool_use) and dispatch.

### 37. Sending tool results

Append tool_result messages and re-call the model to continue.

### 38. Multi-turn conversations with tools

Maintain the agentic loop across multiple tool calls.

### 39. Implementing multiple turns

Hands-on: code the loop with stop_reason guards.

### 40. Using multiple tools

Register a tool list; let Claude pick the right one per turn.

### 41. The batch tool

Submit many requests at once for cost-and-throughput wins.

### 42. Tools for structured data

Tool schemas as the cleanest path to structured outputs.

### 43. The text edit tool

Built-in text-editor tool for file-edit workflows.

### 44. The web search tool

Built-in web search tool (note: availability differs across deployment platforms; check Vertex docs).

### 45. Quiz on tool use with Claude

Section quiz on tool-use mechanics.

### 46. Introducing retrieval-augmented generation

RAG = retrieve relevant docs and add them to context; a knowledge-gap mitigation.

### 47. Text chunking strategies

Fixed-size, sentence-boundary, and semantic chunking tradeoffs.

### 48. Text embeddings

Dense vector representations for semantic search; on GCP often Vertex-hosted embedding models.

### 49. The full RAG flow

Embed -> store -> retrieve top-k -> augment prompt -> generate.

### 50. Implementing the RAG flow

Hands-on RAG pipeline.

### 51. BM25 lexical search

Keyword/term-frequency retrieval as a complement to embedding search.

### 52. A multi-index RAG pipeline

Combine BM25 + embeddings with reciprocal rank fusion.

### 53. Reranking results

Cross-encoder rerank of top-k retrievals before context insertion.

### 54. Contextual retrieval

Anthropic's contextual retrieval: prepend a chunk-aware summary before embedding.

### 55. Quiz on retrieval-augmented generation

Section quiz on RAG components.

### 56. Extended thinking

Reasoning mode where the model thinks before answering; surfaced as a content block.

### 57. Image support

Vision: pass images as base64 or URL content blocks.

### 58. PDF support

Native PDF inputs; large docs push you toward the Working Memory cliff.

### 59. Citations

Built-in citations: model returns spans tying claims back to source docs.

### 60. Prompt caching

Cache stable prefixes (system prompt, RAG context) for major cost wins.

### 61. Rules of prompt caching

TTLs, breakpoints, minimum sizes; cache-hit accounting on Vertex.

### 62. Prompt caching in action

Hands-on: measure cache-hit savings on a realistic workload.

### 63. Quiz on features of Claude

Section quiz on cross-cutting features.

### 64. Introducing MCP

Model Context Protocol: standard for connecting tools/data to Claude.

### 65. MCP clients

Claude Desktop, Claude Code, custom clients; all speak the same protocol.

### 66. Project setup

Stand up an MCP server scaffold for the section.

### 67. Defining tools with MCP

Expose tools through MCP rather than per-app tool schemas.

### 68. The server inspector

Anthropic's MCP inspector for debugging server output.

### 69. Implementing a client

Build a custom MCP client around Claude on Vertex.

### 70. Defining resources

MCP resources: read-only data sources the server exposes for clients to pull into context.

### 71. Accessing resources

Wire resources into the client and let Claude read them.

### 72. Defining prompts

MCP prompts: server-provided prompt templates.

### 73. Prompts in the client

Surface MCP-served prompts in client UI.

### 74. MCP review

Recap of tools/resources/prompts split.

### 75. Quiz on Model Context Protocol

Section quiz on MCP architecture.

### 76. Anthropic apps

Overview of Claude Desktop and Claude Code as Vertex-compatible clients.

### 77. Claude Code setup

Install + configure Claude Code; works against Vertex-backed deployments.

### 78. Claude Code in action

Live coding session demoing common workflows.

### 79. Enhancements with MCP servers

Plug MCP servers into Claude Code for repo/db/issue access.

### 80. Parallelizing Claude Code

git worktrees + multiple sessions for parallel feature work.

### 81. Automated debugging

Subagent-driven bug repro and fix loop.

### 82. Computer use

Computer-use tool: Claude controls a virtual desktop via screenshots + actions.

### 83. How computer use works

Action loop: screenshot -> reason -> click/type -> repeat.

### 84. Agents and workflows

Distinction: workflows are scripted, agents choose their own path.

### 85. Parallelization workflows

Fan-out workflow: same task across many inputs in parallel.

### 86. Chaining workflows

Sequential workflow: output of step N feeds step N+1.

### 87. Routing workflows

Classifier-driven routing to specialized downstream prompts.

### 88. Agents and tools

Agentic loop with a curated tool whitelist; same primitives on Vertex.

### 89. Environment inspection

Let the agent probe its environment before acting.

### 90. Workflows vs agents

Choose workflow when steps are known; choose agent when path is open.

### 91. Quiz on agents and workflows

Section quiz on agent design.

### 92. Final assessment quiz

End-of-course assessment across all sections.

### 93. Course wrap-up

Recap and pointers to deeper Vertex + Anthropic resources.

## Our simplification

This course is the platform-agnostic Claude API course (Course 6 claude-api-foundations) wrapped in Google Cloud; same prompt engineering, same eval workflow, same tool-use mechanics, same RAG patterns, same MCP protocol, same agent design. If you have done Course 6, roughly 85 of the 93 lessons will feel familiar verbatim. The ~8 lessons that *justify a separate Knowledge page* are the deployment seam: how you authenticate, how you address models, how regions and quotas work, and how Vertex's enterprise controls (IAM, VPC-SC, Cloud Logging) fit on top. This page focuses on those seams; for everything else, lean on claude-api-foundations as the canonical reference.

Authentication on Vertex is gcloud-mediated, not API-key-mediated. You install the gcloud CLI, run gcloud init and gcloud auth application-default login, set a project with gcloud config set project YOUR_PROJECT_ID, and from then on the AnthropicVertex SDK picks up Application Default Credentials automatically. There is no ANTHROPIC_API_KEY. The implication for your architecture is that auth is bound to a Google Cloud identity (a user account in dev, a service account in production), which means your IAM model becomes the security boundary. Grant roles/aiplatform.user to the service account, scope it to the project, and rotate via Google Cloud's normal service-account-key lifecycle (or Workload Identity Federation if you are running outside GCP).
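The setup described above condenses to a one-time local sequence (the project id is a placeholder):

```shell
# One-time local setup; requires the gcloud CLI to be installed.
gcloud init                                  # pick an account and default project
gcloud config set project YOUR_PROJECT_ID    # placeholder project id
gcloud auth application-default login        # writes ADC for the SDK to pick up
```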

The SDK surface differs in three load-bearing ways. First, the import: from anthropic import AnthropicVertex instead of from anthropic import Anthropic. Second, the constructor: AnthropicVertex(region="global", project_id="your-project-id"); the region and project_id are mandatory and bind every request to a specific Vertex tenant. Third, the model id format: Vertex uses claude-sonnet-4@20250514 rather than claude-sonnet-4-20250514 (note the @ instead of trailing dash). Everything below those three lines (messages.create, content blocks, tool schemas, streaming, prompt caching) is byte-for-byte identical to the direct API. pip install "anthropic[vertex]" pulls the right extras.
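Those three deltas can be sketched in a few lines. This is a minimal illustration, not the course's reference code: the project id and prompt are placeholders, and actually calling `hello_vertex` requires `pip install "anthropic[vertex]"` plus working Application Default Credentials.

```python
def vertex_client_kwargs(project_id: str, region: str = "global") -> dict:
    # Both fields are mandatory on AnthropicVertex; there is no api_key.
    return {"region": region, "project_id": project_id}


def hello_vertex(project_id: str) -> str:
    """One round-trip; needs gcloud ADC and the anthropic[vertex] extras."""
    from anthropic import AnthropicVertex  # deferred: requires the vertex extras

    client = AnthropicVertex(**vertex_client_kwargs(project_id))
    message = client.messages.create(
        model="claude-sonnet-4@20250514",  # Vertex id: @ before the date, not a dash
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello from Vertex"}],
    )
    return message.content[0].text
```

Everything from `messages.create` down is the same call you would make against the direct API; only the constructor line changes.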

Regional model availability is a real operational concern. Not every Claude model is hosted in every Vertex region; new models often launch in us-east5 or us-central1 first and roll out elsewhere over weeks. The Vertex region="global" setting routes to the nearest available region and is usually the right default for production unless you have data-residency constraints. If you do need a specific region (EU data residency, regulated workloads), check the Model Garden listing for that region before you commit; a model that exists in us-east5 will return a not found error in europe-west4 even though both are valid Vertex regions. Quotas are per-project per-region and are managed in the Cloud Console under Quotas & system limits; default quotas are conservative and you will likely raise them before production traffic.
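A region-selection policy along these lines keeps the fallback explicit. The availability map below is illustrative only; real availability must come from the Model Garden listing for your project.

```python
# Illustrative availability data only; in practice, read the Model Garden
# listing for your project rather than hard-coding a map like this.
AVAILABILITY = {
    "us-east5": {"claude-sonnet-4@20250514", "claude-opus-4@20250514"},
    "europe-west4": {"claude-sonnet-4@20250514"},
}


def pick_region(model: str, preferred: list[str],
                availability: dict[str, set[str]]) -> str:
    """Return the first preferred region hosting `model`, else "global".

    region="global" routes to the nearest region that serves the model,
    so it is the safe fallback when no preferred region qualifies.
    """
    for region in preferred:
        if model in availability.get(region, set()):
            return region
    return "global"
```

With data-residency constraints, pass only the permitted regions as `preferred` and treat a `"global"` result as a deployment blocker rather than a fallback.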

Enterprise controls ride on top of Vertex unchanged. This is the main reason customers choose Vertex over the direct Anthropic API: VPC Service Controls confine traffic to a security perimeter, Cloud Audit Logs capture every messages.create invocation with caller identity, Customer-Managed Encryption Keys (CMEK) wrap inputs and outputs, and Private Service Connect avoids public-internet egress. None of those are Anthropic features per se; they are GCP features that Vertex inherits because Claude is served as a first-class Vertex AI model. The compliance story is Vertex's, not Anthropic's: SOC 2, ISO 27001, HIPAA BAA (where applicable), FedRAMP for government workloads. If your org has a Google Cloud landing zone, deploying Claude on Vertex slots into existing policy controls instead of standing up a parallel data-flow review.

Feature parity is high but not perfect, and the gaps move over time. Prompt caching, vision, PDF support, citations, extended thinking, and the batch API generally land on Vertex within weeks of the direct API release; the message-format protocol is identical. The exceptions tend to be at the *tool* layer: the built-in web search tool and computer use have shipped with deployment-specific availability gates, so check the Anthropic on Vertex docs before depending on them. Pricing is set by Google Cloud (not Anthropic), typically per 1M input/output tokens at parity with the direct API, billed through your GCP invoice. Cache hits and batch requests get the same multipliers you see on the direct API.
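Because the message-format protocol is identical, a cached request is built the same way on both platforms. A sketch of the kwarg shape, using the `cache_control` marker on the stable prefix (the helper name and prompt strings are illustrative):

```python
def cached_request_kwargs(system_prompt: str, rag_context: str, question: str) -> dict:
    # cache_control marks the stable prefix (system prompt + retrieved context)
    # so repeat calls hit the cache; pass these kwargs to messages.create
    # alongside the model id. The shape is the same on Vertex and direct API.
    return {
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": rag_context,
                        "cache_control": {"type": "ephemeral"},
                    },
                    {"type": "text", "text": question},
                ],
            }
        ],
    }
```

Only the trailing question varies per call; everything before the last content block is the cacheable prefix.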

When to choose Vertex vs the direct Anthropic API vs Bedrock: pick Vertex when your stack is already on Google Cloud; the auth, billing, IAM, audit, and data-residency stories all consolidate, and you avoid a second vendor relationship. Pick the direct Anthropic API when you want fastest access to new models and features, simpler key-based auth, and no cloud lock-in. Pick Bedrock (covered in claude-with-bedrock) when your stack is on AWS, for the symmetric reasons. The application code is roughly 95% portable across all three; the differences are auth, model id format, the SDK constructor, and operational integrations. Picking a deployment platform is mostly an organizational decision, not a technical one, and the exam expects you to recognize that.

## Patterns

### 5 things that change when you move from the direct Anthropic API to Vertex

If you already know the direct API (Course 6), these are the deltas you actually need to internalize. Everything else is unchanged.

- **Auth: gcloud ADC instead of API key.** No ANTHROPIC_API_KEY. Run gcloud auth application-default login in dev; use a service account with roles/aiplatform.user in prod. Auth is bound to a Google Cloud identity, which means IAM is your security boundary.
- **SDK constructor: `AnthropicVertex(region, project_id)`.** from anthropic import AnthropicVertex and pass region and project_id. region="global" is a sensible default unless data residency dictates otherwise. Install with pip install "anthropic[vertex]".
- **Model id format uses `@` not `-`.** claude-sonnet-4@20250514 on Vertex, claude-sonnet-4-20250514 on the direct API. A small but easy-to-trip-on difference; copy from the Model Garden listing rather than from the Anthropic docs.
- **Regional availability is real.** Not every model is in every region. New models tend to launch in us-east5 first. Check Model Garden for your target region before you commit. region="global" routes to nearest available.
- **Enterprise controls come from GCP.** VPC-SC, CMEK, Cloud Audit Logs, Private Service Connect, and the GCP compliance posture (SOC 2, ISO 27001, HIPAA BAA, FedRAMP) all apply because Claude is served as a Vertex model. Compliance story is GCP's, not Anthropic's directly.
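The model-id delta above is mechanical enough to automate for config that must target both platforms. A sketch of a helper, assuming the date suffix is always eight digits (treat the Model Garden listing, not this rewrite, as the source of truth):

```python
import re


def to_vertex_model_id(direct_id: str) -> str:
    """Rewrite a direct-API model id into Vertex form.

    claude-sonnet-4-20250514 -> claude-sonnet-4@20250514: the final dash
    before the date suffix becomes an '@'. Ids without a date suffix are
    returned unchanged.
    """
    return re.sub(r"-(\d{8})$", r"@\1", direct_id)
```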

## Key takeaways

- Vertex deployment is the same Claude API surface as claude-api-foundations plus a different auth and addressing model; about 85 of 93 lessons mirror Course 6 verbatim. (`tool-calling`)
- Authentication is gcloud Application Default Credentials, not an API key; production uses a service account with roles/aiplatform.user and IAM is the security boundary. (`system-prompts`)
- The AnthropicVertex SDK requires region and project_id, and the model id format uses @ (e.g. claude-sonnet-4@20250514) instead of a trailing dash. (`tool-calling`)
- Regional model availability is a real operational gate; new models launch in specific regions first and region="global" is the sensible production default unless data residency requires otherwise. (`claude-for-operations`)
- VPC Service Controls, CMEK, Cloud Audit Logs, and the GCP compliance posture (SOC 2, HIPAA BAA, FedRAMP) ride on top of Vertex unchanged; that is the main reason enterprises pick Vertex over the direct API. (`evaluation`)
- Application code is roughly 95% portable across direct API, Vertex, and Bedrock; choosing a deployment platform is mostly an organizational decision (where your cloud landing zone lives), not a technical one. (`tool-calling`)

## Concepts in play

- **Tool calling** (`tool-calling`): Same protocol on Vertex; tool schemas are byte-identical to direct API
- **Prompt caching** (`prompt-caching`): Available on Vertex with same TTL/breakpoint rules as direct API
- **Batch API** (`batch-api`): Available on Vertex; useful for cost-sensitive bulk workloads
- **MCP** (`mcp`): Protocol-level, deployment-independent; works with Vertex-backed Claude
- **Vision and multimodal** (`vision-multimodal`): Image and PDF inputs work identically on Vertex

## Scenarios in play

- **Claude for operations** (`claude-for-operations`): Vertex's IAM, audit, and VPC-SC controls are the operational substrate this scenario depends on for enterprise rollouts
- **Structured data extraction** (`structured-data-extraction`): Common Vertex workload; route extraction jobs through Cloud Run / Cloud Functions backed by Claude on Vertex

## Curated sources

- **Claude on Google Cloud Vertex AI (Anthropic API documentation)** (anthropic-blog, 2025-09-01): Canonical reference for the SDK surface, model id formats, and feature-availability table on Vertex. Pair with Skilljar Lessons 4-5 when you start authenticating against a real GCP project.
- **Anthropic Claude in Vertex AI Model Garden** (anthropic-blog, 2025-10-15): Google's own integration docs covering Model Garden enablement, regional availability, quota management, and the IAM permissions required to invoke Claude on Vertex.

## FAQ

### Q1. What is the difference between using Claude through the Anthropic API and through Google Vertex AI?

The application code is roughly 95% identical; the differences are at the deployment seam. Vertex uses gcloud Application Default Credentials instead of an ANTHROPIC_API_KEY, requires AnthropicVertex(region, project_id) instead of Anthropic(), uses @ in the model id (e.g. claude-sonnet-4@20250514), and inherits Google Cloud's IAM, audit, VPC-SC, and compliance controls. Choose Vertex when your stack is on GCP; choose direct API for simpler auth and fastest access to new features.

### Q2. How do I authenticate with Claude on Vertex AI?

Install the gcloud CLI, run gcloud init and gcloud auth login, set your project with gcloud config set project YOUR_PROJECT_ID, then run gcloud auth application-default login. The AnthropicVertex SDK picks up Application Default Credentials automatically. There is no API key; auth is bound to a Google Cloud identity (your user account in dev, a service account with roles/aiplatform.user in production).

### Q3. Why does my Vertex AI request return a model-not-found error when the model exists?

Almost always a regional availability mismatch. Not every Claude model is hosted in every Vertex region, and new models often launch in us-east5 or us-central1 first. Check the Model Garden listing for your target region before you commit. The fix is usually to switch to region="global", which routes to the nearest available region, unless data residency dictates a specific region. Also confirm the model id format; Vertex uses claude-sonnet-4@20250514 with an @, not a trailing dash.

### Q4. Does prompt caching work on Claude through Vertex AI?

Yes. Prompt caching, vision, PDF support, citations, extended thinking, and the batch API all work on Vertex with the same TTLs, breakpoint rules, and pricing multipliers as the direct API. The message-format protocol is identical. The features that occasionally lag are at the tool layer; the built-in web search tool and computer use have shipped with deployment-specific availability gates, so check the Anthropic on Vertex docs before depending on them.

### Q5. Should I use Vertex AI or the direct Anthropic API for my production deployment?

Pick Vertex when your stack is already on Google Cloud; the auth, billing, IAM, audit, VPC-SC, and data-residency stories all consolidate into your existing GCP landing zone. Pick the direct Anthropic API when you want the fastest access to new models and features, simpler key-based auth, and no cloud lock-in. The application code is portable both ways, so this is mostly an organizational decision (where does your security review live, who pays the invoice) rather than a technical one.

### Q6. How do I install the right Anthropic SDK for Vertex AI in Python?

Run pip install "anthropic[vertex]"; the [vertex] extras pull in the Google Auth dependencies needed to connect to Vertex. Then import AnthropicVertex (not Anthropic) and instantiate it with region and project_id. The same messages.create API works on both clients, so application code below the constructor line is unchanged.

### Q7. What IAM permissions does my service account need to call Claude on Vertex?

At minimum, roles/aiplatform.user on the project. For production, scope tightly: grant only that role on only the project hosting your AI workloads, and prefer Workload Identity Federation over long-lived service-account keys if your code runs outside GCP. Auth is bound to identity, so any IAM policy you apply to the service account flows through to your Claude calls, including organizational policies on which regions are allowed and which models are enabled.
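The grant described above looks roughly like this (project id and service-account name are placeholders):

```shell
# Grant the Vertex invoker role to the workload's service account.
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:claude-app@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```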

### Q8. Can I use HIPAA, SOC 2, or FedRAMP-covered Claude through Vertex?

Yes; the compliance posture is Google Cloud's, and Claude on Vertex inherits it. SOC 2, ISO 27001, HIPAA BAA (where applicable), and FedRAMP for government workloads all extend to Anthropic models served through Vertex AI. The compliance story is Vertex's, not Anthropic's directly, which is one of the main reasons regulated industries pick Vertex over the direct API. Confirm the specific certifications in the Google Cloud Compliance Resource Center for your target region before going to production.

---

**Source:** https://claudearchitectcertification.com/knowledge/claude-with-vertex
**Vault sources:** Course_12/Lesson_03_accessing-the-api.md; Course_12/Lesson_04_vertex-ai-setup.md; Course_12/Lesson_05_making-a-request.md; Course_12/Lesson_60_prompt-caching.md; Course_12/Lesson_61_rules-of-prompt-caching.md; Course_12/Lesson_64_introducing-mcp.md; Course_12/Lesson_84_agents-and-workflows.md
**Last reviewed:** 2026-05-06
