Batch API (D4, 20% of CCA-F) - Claude Architect Concept

01 · Summary

TLDR

The Message Batches API gives a 50% discount for asynchronous, non-time-sensitive workloads. A full deep-dive guide is coming soon.

Discount: 50%

Exam domain: D4

Coverage tier: C

Status: stub

Action: research

02 · Definition

What it is

The Batch API is an asynchronous endpoint that processes requests in bulk within a 24-hour window at a 50% cost discount. Instead of calling messages.create() one request at a time, you submit a list of requests (up to 100,000 per batch, or 256 MB, whichever comes first), then poll for results, which come back as a JSONL file. The trade-off is latency: responses arrive within 24 hours, not seconds. The mental model: no one is waiting, so optimize for cost, not speed.

The use-case filter is strict: asynchronous workloads only. If a human or system is waiting (chatbot turn, CI/CD pre-merge check), Batch API is wrong; standard synchronous Messages API is correct. But if you have 10,000 documents to extract, a nightly report, or a queue that can finish by tomorrow, Batch API's 50% savings justify the delay.

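The request format is simple: each entry has a custom_id (a string you define) and a params object holding the same fields you would pass to messages.create(). Results come back as JSONL, one JSON object per line, each carrying the same custom_id plus a result that holds either the Message (including token usage) or an error. Each request produces exactly one response: no streaming, and no tool loop inside the batch. A request can define tools, but a tool_use response is simply returned; running the tool and sending the follow-up turn is up to you. For complex agentic flows, use the synchronous API in a loop; for request-response pairs at scale, Batch is ideal.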
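As a minimal sketch of the request shape described above, the snippet below builds one entry per document, each with a caller-chosen custom_id and the parameters a messages.create() call would take. The helper names build_requests and to_jsonl, the doc-{id} naming scheme, and the sample documents are illustrative assumptions, not part of any SDK.

```python
import json

def build_requests(documents, model="claude-opus-4-5", max_tokens=512):
    """Turn {doc_id: text} into batch request entries keyed by custom_id."""
    requests = []
    for doc_id, text in documents.items():
        requests.append({
            # Hypothetical id scheme; ids should be unique within a batch.
            "custom_id": f"doc-{doc_id}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": text}],
            },
        })
    return requests

def to_jsonl(entries):
    """Serialize entries one JSON object per line (handy for logging or audits)."""
    return "\n".join(json.dumps(entry) for entry in entries)

requests = build_requests({
    "a1": "Summarize: first contract.",
    "b2": "Summarize: second contract.",
})
print(to_jsonl(requests))
```

Keeping the custom_id derived from your own primary key (here the document id) means results can be written straight back to the right record without a lookup table.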

Production failures cluster around one gap: applying Batch to latency-sensitive workflows. A team tries Batch for CI/CD pre-merge checks (which must complete in minutes) and gets frustrated, or wires it into a customer-facing feature and hits the 24-hour wait. Recognize the correct use cases: overnight reports, bulk data processing, non-urgent analysis, customer-success retrospectives.

03 · Mechanics

How it works

The Batch workflow has three stages. Prepare: build a list of requests, each with a custom_id and a params object that is a valid messages.create() request body. Submit: send the list via messages.batches.create(). The API returns a batch object with an id and an initial processing_status of in_progress. Poll: query messages.batches.retrieve(batch_id). When processing_status changes to ended, download the results JSONL.
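The poll stage can be sketched without touching the network by injecting the status lookup. In this hedged example, wait_until_ended and its parameters are hypothetical names; get_status stands in for a wrapper around the retrieve call, and the terminal value defaults to "ended" on the assumption that this is the status the API reports when processing finishes (override it if your SDK version differs).

```python
import time

def wait_until_ended(get_status, terminal="ended", poll_seconds=30.0,
                     timeout_seconds=24 * 3600,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns `terminal` or the timeout elapses.

    Returns True if the terminal status was seen, False on timeout.
    """
    deadline = clock() + timeout_seconds
    while True:
        if get_status() == terminal:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)

# Stand-in status source instead of a live API call.
statuses = iter(["in_progress", "in_progress", "ended"])
done = wait_until_ended(lambda: next(statuses), poll_seconds=0,
                        sleep=lambda seconds: None)
print(done)
```

Passing sleep and clock as arguments keeps the loop testable; in production the defaults apply and a poll interval of 30 to 60 seconds is usually plenty for a job nobody is waiting on.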

The economics are stark: Batch requests cost 50% of synchronous calls. A claude-opus-4-5 request that costs 1 unit synchronously costs 0.5 units in Batch. Flat 50% applies to all tokens (input and output), all models. The catch is latency: processing happens "in the next 24 hours," not immediately.
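To make the discount concrete, here is a small worked calculation. The per-million-token prices and the request volume are hypothetical placeholders chosen for easy arithmetic, not published pricing; only the flat 50% factor comes from the text above.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_mtok, output_price_per_mtok, batch=False):
    """Dollar cost of one request; batch=True halves both input and output cost."""
    base = (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000
    return base * 0.5 if batch else base

# Hypothetical workload: 10,000 requests, 2,000 input and 500 output tokens each,
# at placeholder prices of $4 and $20 per million tokens.
sync_total = 10_000 * request_cost(2_000, 500, 4.0, 20.0)
batch_total = 10_000 * request_cost(2_000, 500, 4.0, 20.0, batch=True)
print(f"synchronous: ${sync_total:.2f}, batch: ${batch_total:.2f}")
```

Because the discount is a flat multiplier, the saving scales linearly with volume, which is why it only becomes interesting for large, non-urgent workloads.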

Each request is independent (no multi-turn loops, no tool continuation). If you need an agent to call a tool and act on the result, either (a) run the loop in your own client code against the synchronous API, where you execute each tool and send the result back, or (b) reduce the task to a self-contained request that needs no tool continuation, which Batch can handle. Batch is for request-response pairs, not interactive loops.

Results are returned as JSONL with one line per request, not necessarily in submission order. Each line has the custom_id and a result whose type is succeeded, errored, canceled, or expired; a succeeded result carries the Message object, including usage. Iterate, match by custom_id, and decide next steps (store in a DB, follow up, log errors). The results are immutable: you can download them multiple times, and nothing changes once the batch has finished processing.
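Matching by custom_id can be sketched as below. The line layout used here (a result object with a type, plus a message on success) is my understanding of the results format and should be treated as an assumption; index_results and the sample lines are made up for illustration, and the sample is deliberately out of submission order.

```python
import json

def index_results(results_jsonl):
    """Split results JSONL into {custom_id: text} successes and {custom_id: type} failures."""
    succeeded, failed = {}, {}
    for line in results_jsonl.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        result = entry["result"]
        if result["type"] == "succeeded":
            blocks = result["message"]["content"]
            # Concatenate text blocks; other block types are ignored in this sketch.
            succeeded[entry["custom_id"]] = "".join(
                block["text"] for block in blocks if block.get("type") == "text"
            )
        else:
            failed[entry["custom_id"]] = result["type"]
    return succeeded, failed

sample = "\n".join([
    json.dumps({"custom_id": "invoice-1",
                "result": {"type": "errored",
                           "error": {"type": "invalid_request_error"}}}),
    json.dumps({"custom_id": "invoice-0",
                "result": {"type": "succeeded",
                           "message": {"content": [
                               {"type": "text", "text": "{\"vendor\": \"Acme\"}"}]}}}),
])
ok, bad = index_results(sample)
print(ok, bad)
```

Indexing into dictionaries rather than relying on line position is what makes the order of the results file irrelevant.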

04 · In production

Where you'll see it

Overnight document extraction

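50,000 invoices. Build 50,000 requests and submit via Batch, split across several batches if the per-batch limit requires it. Next morning, results are ready. If the synchronous run would cost $4,000, the 50% discount saves $2,000. No customer is waiting; it is a nightly job.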

Bulk entity extraction from contracts

10,000 contracts. Submit Tuesday evening, results Wednesday morning. The 50% savings amortize the engineering overhead. Synchronous would cost twice as much, and you would still have to manage rate limits and retries across 10,000 calls yourself.

Customer success retrospectives

After every 30-day cohort, analyze 500 conversations for sentiment, NPS drivers, churn signals. Submit Monday, results Tuesday. Non-urgent, huge savings.

Overnight question bank generation

Education platform generates 1000 practice questions. One request per topic, custom_id is the topic. Next morning, 1000 questions ready. 50% off.
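Workloads like the ones above can exceed whatever per-batch request cap applies, in which case they are submitted as several batches. The sketch below shows one way to split them; split_into_batches is a hypothetical helper, and the cap is passed in as a parameter (10,000 here is only an example value) rather than hard-coded, since the documented limit can change.

```python
def split_into_batches(requests, max_per_batch):
    """Yield consecutive slices of at most max_per_batch requests."""
    if max_per_batch < 1:
        raise ValueError("max_per_batch must be at least 1")
    for start in range(0, len(requests), max_per_batch):
        yield requests[start:start + max_per_batch]

# 50,000 placeholder entries split under an assumed cap of 10,000 per batch.
placeholder_requests = [{"custom_id": f"invoice-{i}"} for i in range(50_000)]
chunks = list(split_into_batches(placeholder_requests, max_per_batch=10_000))
print(len(chunks), len(chunks[0]))
```

Each chunk would be submitted as its own batch and tracked by its own batch id; because custom_id values stay unique across chunks, results from all batches can be merged into one index afterwards.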

05 · Implementation

Code examples

Submit, poll, retrieve a batch

import time

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def prepare_requests(invoices):
    # Each entry pairs a custom_id with the Messages API params for that request.
    return [
        {
            "custom_id": f"invoice-{i}",
            "params": {
                "model": "claude-opus-4-5",
                "max_tokens": 1024,
                "system": "Extract invoice fields. Return JSON only.",
                "messages": [{"role": "user", "content": inv["content"]}],
            },
        }
        for i, inv in enumerate(invoices)
    ]

def submit_batch(requests):
    # Requests are sent as a list in the call itself; there is no file upload step.
    batch = client.messages.batches.create(requests=requests)
    return batch.id

def poll(batch_id, max_wait=3600):
    # "ended" is the terminal status; it covers success, errors, cancellation, and expiry.
    start = time.time()
    while time.time() - start < max_wait:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return True
        time.sleep(30)
    return False

def retrieve(batch_id):
    # Results may arrive in any order, so key everything by custom_id.
    succeeded, failed = {}, {}
    for entry in client.messages.batches.results(batch_id):
        if entry.result.type == "succeeded":
            succeeded[entry.custom_id] = entry.result.message.content[0].text
        else:
            failed[entry.custom_id] = entry.result.type  # errored, canceled, or expired
    return succeeded, failed

# Full workflow
invoices = [{"content": "Vendor: Acme, $247.83, 2026-05-01"}]  # extend with real invoices
batch_id = submit_batch(prepare_requests(invoices))
if poll(batch_id):
    succeeded, failed = retrieve(batch_id)
    print(f"{len(succeeded)} extracted, {len(failed)} failed, 50% cost savings")

Four steps: prepare requests → submit → poll → retrieve. custom_id correlates requests with results.

06 · Distractor patterns

Looks right, isn't

Each row pairs a plausible-looking pattern with the failure it actually creates. These are the shapes exam distractors are built from.

Looks right

Use Batch API for a CI/CD pre-merge check that must block the PR.

Actually wrong

Batch processes within 24 hours, not immediately. Pre-merge needs synchronous responses (minutes). Use standard Messages API for latency-sensitive workflows.

Looks right

Use Batch API for a feature that shows results to users in real time.

Actually wrong

If a human is waiting (chatbot, UI, real-time), 24-hour window is unacceptable. Use synchronous. Batch is for no one waiting.

Looks right

Batch API supports multi-turn tool calling.

Actually wrong

Each batch request returns exactly one response. A request may define tools, but the batch never executes a tool or sends a follow-up turn, so there is no tool continuation. For tool loops, use the synchronous API.

Looks right

50% savings means always use Batch over synchronous.

Actually wrong

50% savings only justifies 24-hour latency if no one is waiting. For interactive tasks, the cost of waiting (user frustration) exceeds the savings.

Looks right

Batch API is faster for 50,000 requests.

Actually wrong

Batch is cheaper, not faster. Batch processes within 24 hours; synchronous in parallel finishes in minutes.

07 · Compare

Side-by-side

Aspect	Synchronous Messages API	Batch API	Caching	Agentic Loop
Latency	Immediate (seconds)	Up to 24 hours	Immediate, reuses cache	Immediate per turn
Cost (share of standard price)	100%	50%	~10% on cached reads (90% off reused content)	100% per turn (unless cached)
Use case	Interactive, real-time	Non-urgent bulk	Repeated prompts	Multi-turn reasoning
Throughput	Rate-limited, sequential	Bulk, batched	Per-conversation	Per-iteration
Tool calling	Supported	Tools allowed, but no tool loop inside the batch	Tool definitions can be cached	Full support
custom_id needed	No	Yes	No	No

08 · When to use

Decision tree

01

Is a human or system waiting in real time?

Yes: Synchronous. Latency is non-negotiable. No Batch.

No: Consider Batch if non-urgent and high-volume.

02

Have 100+ requests to process?

Yes: Batch's 50% savings justify the engineering overhead.

No: Synchronous is simpler for small workloads.

03

Can you wait 24 hours for results?

Yes: Batch is ideal. Submit, poll, retrieve.

No: Use synchronous or reduce batch size.

04

Need tool calling or multi-turn reasoning?

Yes: Batch doesn't support tool continuation. Embed the loop in client code or use synchronous.

No: Batch is viable. Single-turn only.

05

Cost is primary, latency flexible?

Yes: Batch (50% savings).

No: Synchronous (interactive). Cost is secondary to UX.
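The decision tree can be condensed into a single function, sketched below. Workload, choose_api, and the field names are hypothetical, and the 100-request threshold is simply the figure from question 02. Question 05 is folded into the others here: a workload that reaches the final line is one where nobody is waiting and latency is flexible, so cost decides.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    someone_waiting: bool   # a human or system blocks on the response
    request_count: int
    can_wait_24h: bool
    needs_tool_loop: bool   # multi-turn tool continuation is required

def choose_api(workload: Workload, min_batch_size: int = 100) -> str:
    """Walk the questions in order and return "batch" or "synchronous"."""
    if workload.someone_waiting:
        return "synchronous"
    if workload.request_count < min_batch_size:
        return "synchronous"
    if not workload.can_wait_24h:
        return "synchronous"
    if workload.needs_tool_loop:
        return "synchronous"
    return "batch"

nightly_extraction = Workload(someone_waiting=False, request_count=10_000,
                              can_wait_24h=True, needs_tool_loop=False)
pre_merge_check = Workload(someone_waiting=True, request_count=10_000,
                           can_wait_24h=False, needs_tool_loop=False)
print(choose_api(nightly_extraction), choose_api(pre_merge_check))
```

Every branch that fails falls back to synchronous, which mirrors the guide's framing: Batch is the exception you opt into only when all the conditions hold.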

09 · On the exam

Question patterns

25 V2 questions wired to this concept.

Can you use checkpoints with the Batch API?

A team marks daily-refreshed news content with cache_control ephemeral. Why is this the wrong layer to cache?

A 1000-token system prompt is reused across 10 calls inside the cache window. Roughly how much does prompt caching save versus 10 fresh calls?

Are prompt caching and the Batch API two ways to do the same thing?

A team wants to use the Batch API for a CI/CD pre-merge check that must block the PR. Is this viable?

A customer-facing chatbot is wired to the Batch API to save money. What goes wrong in production?

19 additional questions for this concept live in the practice pillar.

10 · FAQ

Frequently asked

How much does Batch cost?

50% of synchronous. A request that costs $1 sync costs $0.50 in Batch. Flat applies to all tokens, all models.

How long do results take?

Up to 24 hours. Anthropic processes "within next 24 hours". Plan for 24, treat earlier as bonus.

Maximum batch size?

Up to 100,000 requests or 256 MB per batch, whichever is reached first. Split larger workloads into multiple batches.

Submit multiple batches in parallel?

Yes. Each gets a unique batch_id. Submit 10, poll all. Processing happens in parallel.

What if a request fails in the batch?

That request's result comes back with type errored and an error object. Handle it individually (retry or skip). Batch processing doesn't stop on errors; the other requests are unaffected.

Can I cancel a batch after submitting?

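Yes. Call messages.batches.cancel(batch_id); the batch moves to canceling and then ended. Requests that finished before the cancellation keep their results, and the rest come back as canceled. Still, plan carefully before submitting.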

Does Batch support streaming?

No. Non-streaming, single-turn requests only.

Vision or file uploads?

Yes. Include base64 images or file references. Anthropic processes normally.

How do I correlate requests with results?

Use `custom_id` field. Each request includes one (you define), result includes the same. Match to pair.

Better than Caching?

Complementary. Caching saves 90% on reused fixed content (5-min). Batch saves 50% on all tokens (24-hour). Use caching for interactive, Batch for async bulk.

11 · Practice with AI

Work this with your AI

Work this concept hands-on with Claude Code, Codex, or claude.ai. Copy a prompt, paste it into your assistant, and practise in tandem. Each one keeps you active (explain it back, get drilled, or build) rather than just reading.

Drill it like the exam (scenario MCQs)
Practice in the exam's scenario-MCQ format with trap awareness.
Explain it back (Feynman)
Build durable, transferable understanding of a concept you can half-state.
Test me, adapting the difficulty
Active recall practice on a concept you think you know.
Check my prerequisites first
Before studying a concept that keeps not sticking.
Find the high-leverage 20%
When a domain feels too big and you are short on time.
