# Batch API

> Message Batches API: 50% discount for async, non-time-sensitive workloads. Vault coverage thin; needs Phase 6 research.

**Domain:** D4 · Prompt Engineering (20% of CCA-F exam)
**Canonical:** https://claudearchitectcertification.com/concepts/batch-api
**Last reviewed:** 2026-05-04

## Quick stats

- **Discount:** 50%
- **Exam domain:** D4
- **Coverage tier:** C
- **Status:** stub
- **Action:** research

## What it is

The Batch API is an asynchronous endpoint that processes requests in bulk within a 24-hour window at a 50% cost discount. Instead of issuing messages.create() calls one by one, you prepare up to 10,000 requests, submit them as a single batch, and poll for results. The trade-off is latency: responses arrive within 24 hours, not milliseconds. The mental model: no one is waiting, so optimize for cost, not speed.

The use-case filter is strict: asynchronous workloads only. If a human or system is waiting (chatbot turn, CI/CD pre-merge check), Batch API is wrong; standard synchronous Messages API is correct. But if you have 10,000 documents to extract, a nightly report, or a queue that can finish by tomorrow, Batch API's 50% savings justify the delay.

The request format is simple: each entry pairs a custom_id (a string you define) with a standard messages.create() request body. Results come back as JSONL, one object per line, carrying the same custom_id, the response, and token usage, so you can correlate each result with its request. Requests are single-turn only: no streaming and no multi-turn tool loops. For complex agentic flows, use the synchronous API in a loop; for request-response pairs at scale, Batch is ideal.
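
For concreteness, here is the shape of one request object and its matching result entry (a sketch; the invoice ID and field values are illustrative):

```python
# One batch request: your custom_id plus a standard messages.create() body.
request = {
    "custom_id": "invoice-0042",
    "params": {
        "model": "claude-opus-4-5",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Extract fields from: ..."}],
    },
}

# The matching result entry (one JSONL line) echoes the custom_id; on
# success it carries the full Message object and token usage (trimmed here).
result_entry = {
    "custom_id": "invoice-0042",
    "result": {"type": "succeeded", "message": {"role": "assistant"}},
}
```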

Production failures cluster around one gap: applying Batch to latency-sensitive workflows. A team tries it for CI/CD pre-merge checks (which must complete in minutes) and gets frustrated, or ships it behind a customer-facing feature and hits the 24-hour wait. Recognize the correct use cases: overnight reports, bulk data processing, non-urgent analysis, customer-success retrospectives.

## How it works

The Batch workflow has three stages. Prepare: build up to 10,000 request objects, each with a custom_id and a valid messages.create() request body. Submit: send them via messages.batches.create(); the API returns a batch_id with an initial processing status of in_progress. Poll: query messages.batches.retrieve(batch_id); when the status changes to ended, stream the results JSONL via messages.batches.results().

The economics are stark: Batch requests cost 50% of synchronous calls. A claude-opus-4-5 request that costs 1 unit synchronously costs 0.5 units in Batch. Flat 50% applies to all tokens (input and output), all models. The catch is latency: processing happens "in the next 24 hours," not immediately.
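
The arithmetic is worth sanity-checking once. A minimal sketch; the per-million-token prices below are placeholders, not published rates:

```python
# Placeholder prices per million tokens; substitute your model's real rates.
INPUT_PER_MTOK, OUTPUT_PER_MTOK = 15.00, 75.00
BATCH_DISCOUNT = 0.5  # flat 50% on both input and output tokens

def job_cost(n_requests, in_tok, out_tok, batch=False):
    per_req = (in_tok * INPUT_PER_MTOK + out_tok * OUTPUT_PER_MTOK) / 1_000_000
    return n_requests * per_req * (BATCH_DISCOUNT if batch else 1.0)

sync = job_cost(10_000, 2_000, 500)              # synchronous baseline
batch = job_cost(10_000, 2_000, 500, batch=True)  # same job, batched
print(f"sync ${sync:,.2f} vs batch ${batch:,.2f} -> saved ${sync - batch:,.2f}")
```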

Each request is independent: there is no multi-turn continuation, so a tool_use response cannot be followed up within the same batch. If a task needs an agentic loop (call a tool, feed the result back, retry), either restructure the work so each unit is a self-contained request-response pair, or don't use Batch and run the loop against the synchronous API. Batch is for request-response pairs, not interactive loops.

Results are returned as JSONL with one entry per input request. Each entry carries the custom_id, a result (including the response Message object on success), and token usage. Entries are not guaranteed to arrive in submission order, so iterate, match by custom_id, and decide next steps (store in a DB, queue a follow-up, log errors). The results file is immutable: you can download it multiple times, but the batch itself is finished once its status is ended.
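
A minimal sketch of that matching step; `store`, `retry_queue`, and `log` are hypothetical sinks, and `entries` is whatever your results download yields:

```python
def dispatch(entries, store, retry_queue, log):
    # Entries may arrive out of submission order; custom_id is the join key.
    for entry in entries:
        if entry.result.type == "succeeded":
            store(entry.custom_id, entry.result.message)
        else:  # errored, canceled, or expired: handle each one individually
            log(entry.custom_id, entry.result)
            retry_queue.append(entry.custom_id)
```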

## Where you'll see it in production

### Overnight document extraction

50,000 invoices, split into five batches of 10,000 requests each, submitted in the evening. Next morning, results are ready. The 50% discount saves roughly $2,000 versus synchronous. No customer is waiting; it's a nightly job.

### Bulk entity extraction from contracts

10,000 contracts. Submit Tuesday evening, results by Wednesday morning. The 50% savings easily cover the engineering overhead, and synchronous would cost twice as much for a job nobody is waiting on.

### Customer success retrospectives

After every 30-day cohort, analyze 500 conversations for sentiment, NPS drivers, churn signals. Submit Monday, results Tuesday. Non-urgent, huge savings.

### Overnight question bank generation

An education platform generates 1,000 practice questions overnight: one request per topic, with the topic as the custom_id. Next morning, 1,000 questions are ready at half the synchronous cost.
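
A sketch of the request builder for this scenario (topic list and prompt wording are illustrative):

```python
def question_requests(topics, per_topic=10):
    # custom_id is the topic itself, so results map straight back to topics.
    return [
        {
            "custom_id": topic,
            "params": {
                "model": "claude-opus-4-5",
                "max_tokens": 2048,
                "messages": [{
                    "role": "user",
                    "content": f"Write {per_topic} practice questions on {topic}.",
                }],
            },
        }
        for topic in topics
    ]
```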

## Code examples

### Submit, poll, retrieve a batch

**Python:**

```python
from anthropic import Anthropic
import time

client = Anthropic()

def prepare_requests(invoices):
    # One request object per invoice; custom_id is the correlation key.
    return [
        {
            "custom_id": f"invoice-{i}",
            "params": {
                "model": "claude-opus-4-5",
                "max_tokens": 1024,
                "system": "Extract invoice fields. Return JSON only.",
                "messages": [{"role": "user", "content": inv["content"]}],
            },
        }
        for i, inv in enumerate(invoices)
    ]

def submit_batch(requests):
    batch = client.messages.batches.create(requests=requests)
    return batch.id

def poll(batch_id, max_wait=24 * 3600):
    # "ended" is terminal; per-request successes and errors live in the results.
    start = time.time()
    while time.time() - start < max_wait:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return True
        time.sleep(30)
    return False

def retrieve(batch_id):
    # Results stream back as JSONL entries, not necessarily in submission order.
    return {
        entry.custom_id: entry.result  # result.type: succeeded/errored/canceled/expired
        for entry in client.messages.batches.results(batch_id)
    }

# Full workflow
invoices = [{"content": "Vendor: Acme, $247.83, 2026-05-01"}, ...]
batch_id = submit_batch(prepare_requests(invoices))
if poll(batch_id):
    results = retrieve(batch_id)
    print(f"{len(results)} processed at 50% of synchronous cost")
```

> Three stages: prepare request objects → submit → poll → retrieve results. custom_id correlates requests with results.

**TypeScript:**

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

interface Invoice { content: string; }

function prepareRequests(invoices: Invoice[]) {
  // One request object per invoice; custom_id is the correlation key.
  return invoices.map((inv, i) => ({
    custom_id: `invoice-${i}`,
    params: {
      model: "claude-opus-4-5",
      max_tokens: 1024,
      system: "Extract invoice fields. Return JSON only.",
      messages: [{ role: "user" as const, content: inv.content }],
    },
  }));
}

async function submitBatch(requests: ReturnType<typeof prepareRequests>) {
  const batch = await client.messages.batches.create({ requests });
  return batch.id;
}

async function poll(batchId: string, maxWaitMs = 24 * 3_600_000) {
  // "ended" is terminal; per-request successes and errors live in the results.
  const start = Date.now();
  while (Date.now() - start < maxWaitMs) {
    const batch = await client.messages.batches.retrieve(batchId);
    if (batch.processing_status === "ended") return true;
    await new Promise((r) => setTimeout(r, 30_000));
  }
  return false;
}

async function retrieve(batchId: string) {
  // Results stream back as JSONL entries, not necessarily in submission order.
  const results: Record<string, unknown> = {};
  const stream = await client.messages.batches.results(batchId);
  for await (const entry of stream) {
    results[entry.custom_id] = entry.result;
  }
  return results;
}
```

> Same three-stage pattern in TypeScript. async/await everywhere. custom_id is your correlation key.

## Looks-right vs actually-wrong

| Looks right | Actually wrong |
|---|---|
| Use Batch API for a CI/CD pre-merge check that must block the PR. | Batch processes within 24 hours, not immediately. Pre-merge needs synchronous responses (minutes). Use standard Messages API for latency-sensitive workflows. |
| Use Batch API for a feature that shows results to users in real time. | If a human is waiting (chatbot, UI, real-time), the 24-hour window is unacceptable. Use synchronous. Batch is for workloads where no one is waiting. |
| Batch API supports multi-turn tool calling. | Batch is single-turn per request. Each request is a separate messages.create() call with no tool continuation. For tool loops, use the synchronous API. |
| 50% savings means always use Batch over synchronous. | 50% savings only justifies 24-hour latency if no one is waiting. For interactive tasks, the cost of waiting (user frustration) exceeds the savings. |
| Batch API is faster for 50,000 requests. | Batch is cheaper, not faster. Batch processes within 24 hours; synchronous in parallel finishes in minutes. |

## Comparison

| Aspect | Synchronous Messages API | Batch API | Caching | Agentic Loop |
| --- | --- | --- | --- | --- |
| Latency | Immediate (ms) | Up to 24 hours | Immediate, reuses cache | Immediate per turn |
| Cost | 100% | 50% | 10% on cached reads (90% savings) | 100% per turn (unless cached) |
| Use case | Interactive, real-time | Non-urgent bulk | Repeated prompts | Multi-turn reasoning |
| Throughput | Rate-limited, sequential | Bulk, batched | Per-conversation | Per-iteration |
| Tool calling | Supported | Single-turn only, no continuation | Supported (tool definitions cacheable) | Full support |
| custom_id needed | No | Yes | No | No |

## Decision tree

1. **Is a human or system waiting in real time?**
   - **Yes:** Synchronous. Latency non-negotiable. No Batch.
   - **No:** Consider Batch if non-urgent and high-volume.

2. **Do you have 100+ requests to process?**
   - **Yes:** Batch's 50% savings justify the engineering overhead.
   - **No:** Synchronous simpler for small workloads.

3. **Can you wait 24 hours for results?**
   - **Yes:** Batch is ideal. Submit, poll, retrieve.
   - **No:** Use synchronous; smaller batches may finish sooner, but the 24-hour ceiling still applies.

4. **Need tool calling or multi-turn reasoning?**
   - **Yes:** Batch doesn't support tool continuation. Run the loop client-side against the synchronous API.
   - **No:** Batch viable. Single-turn only.

5. **Cost is primary, latency flexible?**
   - **Yes:** Batch (50% savings).
   - **No:** Synchronous (interactive). Cost is secondary to UX. (A routing sketch of this tree follows below.)
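
A minimal routing sketch of the tree above (the 100-request threshold is this guide's rule of thumb, not an API limit):

```python
def choose_api(someone_waiting, n_requests, can_wait_24h, needs_tool_loop):
    if someone_waiting or needs_tool_loop or not can_wait_24h:
        return "synchronous"  # steps 1, 3, 4: latency or tool loops rule Batch out
    if n_requests >= 100:
        return "batch"        # steps 2 and 5: volume makes the 50% savings worth it
    return "synchronous"      # small jobs: simplicity beats the discount
```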

## Exam-pattern questions

### Q1. Use Batch API for a CI/CD pre-merge check that must block the PR. Viable?

No. Batch processes within 24 hours, not immediately. Pre-merge needs synchronous responses (minutes). Use the standard Messages API for latency-sensitive workflows.

### Q2. Customer-facing chatbot uses Batch API. What goes wrong?

24-hour latency is unacceptable for interactive UX. The customer won't wait a day for a response; they leave. Use the synchronous API for real-time. Batch is for workloads where no one is waiting.

### Q3. Batch supports multi-turn tool calling, right?

No. Batch is single-turn per request: no tool continuation, no agentic loops, no streaming. Each request is a separate, self-contained messages.create() call. For tool loops, use the synchronous API.

### Q4. 50% savings means always use Batch. True?

No. 50% savings only justify 24-hour latency if no one is waiting. For interactive tasks, the cost of waiting (user frustration, business loss) exceeds the 50% savings. Batch is for asynchronous workloads only.

### Q5. Batch API processes 50,000 requests faster than synchronous?

No: cheaper, not faster. Batch processes within 24 hours; synchronous calls in parallel finish in minutes. If you need speed, use synchronous plus concurrency. If you need cost savings, use Batch.

### Q6. What's the maximum batch size?

Up to 10,000 requests per batch. For larger workloads, split into multiple batches. Each batch is independent; submit them in parallel for higher throughput.
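
A chunking sketch, reusing the `submit_batch` helper from the code examples above:

```python
MAX_PER_BATCH = 10_000

def submit_in_chunks(requests):
    # Each slice becomes an independent batch; poll each batch_id separately.
    return [
        submit_batch(requests[i:i + MAX_PER_BATCH])
        for i in range(0, len(requests), MAX_PER_BATCH)
    ]
```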

### Q7. How do you correlate requests with results?

Use the custom_id field. Each request defines a custom_id, and its result carries the same value; match them to pair request with response. Results are not guaranteed to return in submission order, so custom_id is the only reliable join key.

### Q8. Can you cancel a batch after submitting?

Yes, via the cancel endpoint (messages.batches.cancel(batch_id)). Requests already processed still return results and are billed; unprocessed requests come back marked canceled. Still, plan carefully: test on small batches first, then scale to 10,000.

## FAQ

### Q1. How much does Batch cost?

50% of synchronous. A request that costs $1 sync costs $0.50 in Batch. The flat discount applies to all tokens (input and output) and all models.

### Q2. How long do results take?

Up to 24 hours. Anthropic processes batches "within the next 24 hours." Plan for 24; treat anything earlier as a bonus.

### Q3. Maximum batch size?

Up to 10,000 requests per batch. Split larger workloads into multiple batches.

### Q4. Submit multiple batches in parallel?

Yes. Each gets a unique batch_id. Submit 10, poll all. Processing happens in parallel.

### Q5. What if a request fails in the batch?

The failed entry's result carries an error instead of a message. Handle entries individually (retry or skip); one failure doesn't stop the rest of the batch.

### Q6. Can I cancel a batch after submitting?

Yes, via the cancel endpoint. Already-processed requests still return results and are billed; the rest are marked canceled. Plan carefully anyway.

### Q7. Does Batch support streaming?

No. Non-streaming, single-turn requests only.

### Q8. Vision or file uploads?

Yes. Include base64-encoded images (or file references) exactly as in a standard messages.create() request; they're processed normally.

### Q9. How do I correlate requests with results?

Use custom_id field. Each request includes one (you define), result includes the same. Match to pair.

### Q10. Better than Caching?

Complementary. Caching saves 90% on reused fixed content (5-minute default TTL); Batch saves 50% on all tokens (24-hour window). Use caching for interactive workloads, Batch for async bulk.

---

**Source:** https://claudearchitectcertification.com/concepts/batch-api
**Vault sources:** ACP-T03 §5 batches
**Last reviewed:** 2026-05-04

**Evidence tiers** — 🟢 official Anthropic doc / API contract · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed.
