# Invoice Processing Agent

> An AP-automation agent that wraps four guarantees around invoice approval. (1) Forced tool_use with a strict JSON schema (vendor_id, invoice_number, line_items[], total_amount, currency ISO 4217, due_date ISO 8601, PO_reference nullable) prevents fabrication. (2) Validation-retry loop confirms sum(line_items) == total_amount, currency in ISO 4217 enum, due_date >= invoice_date. (3) Three-way match reconciles invoice with the purchase order and the goods receipt; variance > 2% routes to human review. (4) PreToolUse hook on approve_payment denies if invoice_amount > vendor_authorization_cap, or if vendor on blocklist, or if (vendor_id, invoice_number) was seen in the last 90 days (duplicate detection). PostToolUse audit log captures every approval and rejection. The most-tested distractor: prompt-only field extraction leaks ~15% on edge invoices; forced tool_choice is the only credible architecture.

**Sub-marker:** P3.14
**Domains:** D2 · Tool Design + Integration, D5 · Context + Reliability
**Exam weight:** 33% of CCA-F (D2 + D5)
**Build time:** 26 minutes
**Source:** 🟠 Applied scenario. Community-derived from AP-automation patterns; composes P3.6 + P3.7 + P3.8 in a finance workflow.
**Canonical:** https://claudearchitectcertification.com/scenarios/invoice-processing-agent
**Last reviewed:** 2026-05-05

## In plain English

Think of this as the agent that handles your accounts-payable inbox without the team paying the same invoice twice. A vendor emails a PDF or scanned image. The agent extracts the structured fields (vendor, invoice number, line items, total, currency, due date, PO reference) using a strict schema so it cannot make values up. It then checks the math (line totals must equal the header total) and looks up the matching purchase order and goods receipt to make sure the three documents agree. Next it asks a deterministic policy hook whether the vendor still has authorization headroom, whether the vendor is blocklisted, and whether this exact invoice has been seen in the last 90 days. Only then does it approve payment. Anything ambiguous routes to a human AP analyst with a structured exception block. The whole point is that AP automation is one wrong cap-policy or duplicate-detection check away from a real money loss.

## Exam impact

Domain 2 (Tool Design, 18%) tests forced tool_choice + JSON schema authoring + the PreToolUse cap and duplicate-detection hooks. Domain 5 (Context, 15%) tests CASE_FACTS pinning across multi-page invoices, schema caching, and Batch API for overnight bulk runs. Composes the patterns from P3.6 (structured-data-extraction), P3.7 (agentic-tool-design), and P3.8 (long-document-processing) into a single deployable workflow. The 'why does my prompt-only AP agent leak 15% on edge invoices?' question is the canonical exam distractor.

## The problem

### What the customer needs
- Schema-conformant extraction on every invoice: vendor, number, line items, total, currency, due date, PO reference. No prose wrapping; downstream systems must parse cleanly.
- Three-way match before approval: invoice, purchase order, goods receipt all agree on amount, vendor, and quantities.
- Cap-policy enforcement that cannot be bypassed by clever invoice phrasing: vendor authorization caps, duplicate detection, blocklisted-vendor checks.
- Audit-grade trail of every approval and rejection so finance can replay any decision in a quarterly close.

### Why naive approaches fail
- Prompt-only 'output JSON' extraction: ~15% leakage on edge invoices (handwritten notes, mixed languages, credit memos, rotated scans).
- Single-pass extraction with no semantic validation: line totals do not match the header; corrupted records ship downstream.
- No three-way match: the agent approves an invoice for goods that were never received, or against a PO that does not exist.
- Cap policy in the system prompt: ~3% of approvals exceed authorization cap because prompts leak under unusual phrasing.
- No duplicate-invoice check: the same invoice number gets paid twice when the vendor re-sends after a delivery confirmation.

### Definition of done
- Forced tool_choice: { type: 'tool', name: 'extract_invoice' } on every extraction call.
- JSON schema requires vendor_id, invoice_number, invoice_date (ISO 8601), line_items[], total_amount, currency (ISO 4217 enum), due_date (ISO 8601); PO_reference is nullable.
- Validation-retry loop confirms sum(line_items[].total) == total_amount (within 0.01), currency in enum, due_date >= invoice_date.
- Three-way match service reconciles invoice + PO + GRN; variance > 2% routes to human review.
- PreToolUse hook on approve_payment: deny on cap exceeded, vendor blocklisted, or duplicate (vendor_id, invoice_number) in the last 90 days.
- PostToolUse audit log writes every approval / rejection / hook decision.

## Concepts in play

- 🟢 **Structured outputs** (`structured-outputs`), Forced tool_use as the structural contract
- 🟢 **Tool calling** (`tool-calling`), Schema lives in tools[0].input_schema
- 🟢 **tool_choice** (`tool-choice`), Forced for guaranteed extraction
- 🟢 **Evaluation** (`evaluation`), Semantic validation + three-way match
- 🟢 **Hooks** (`hooks`), PreToolUse cap + duplicate gate; PostToolUse audit
- 🟠 **Case-facts block** (`case-facts-block`), Vendor + invoice_number pinned across multi-page runs
- 🟢 **Prompt caching** (`prompt-caching`), Schema + vendor master cached for sustained traffic
- 🟢 **Batch API** (`batch-api`), Overnight bulk processing at 50% off
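The case-facts block in the list above is named but not shown in this section. A minimal sketch, assuming a plain-text block that is re-sent at the top of every page's prompt so vendor and invoice number stay pinned across a multi-page run. The `CaseFacts` class, its field names, and the `<case_facts>` tag are hypothetical illustrations, not a format defined by this guide or by the API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CaseFacts:
    """Facts established on earlier pages that later pages must not contradict."""
    vendor_id: str
    invoice_number: str
    currency: str
    pages_seen: List[int] = field(default_factory=list)

    def render(self) -> str:
        # Rendered fresh for every page so the pinned values never scroll out of context.
        pages = ", ".join(str(p) for p in self.pages_seen) or "none"
        return (
            "<case_facts>\n"
            f"vendor_id: {self.vendor_id}\n"
            f"invoice_number: {self.invoice_number}\n"
            f"currency: {self.currency}\n"
            f"pages_already_processed: {pages}\n"
            "</case_facts>"
        )


def page_prompt(facts: CaseFacts, page_number: int) -> str:
    """Build the text part of the user turn for one page of a multi-page invoice."""
    prompt = (
        f"{facts.render()}\n\n"
        f"Extract the line items on page {page_number}. "
        "Treat the case facts as authoritative; do not re-derive them."
    )
    facts.pages_seen.append(page_number)
    return prompt


# Hypothetical values for illustration only.
facts = CaseFacts(vendor_id="V-1001", invoice_number="INV-2042", currency="USD")
first = page_prompt(facts, 1)
second = page_prompt(facts, 2)
print(second)
```

The block is rebuilt on every page rather than appended once, so the pinned vendor and invoice number sit next to the page being read even after earlier turns have been trimmed.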

## Components

### Invoice JSON Schema, the contract, in tools[0].input_schema

The output shape lives inside a tool definition, not as freeform text. Required: vendor_id, invoice_number, invoice_date (ISO 8601 string), line_items[], total_amount, currency (ISO 4217 enum), due_date (ISO 8601 string). Optional and nullable: PO_reference, tax_amount, notes. Every numeric field has minimum: 0. Every line item has description, quantity, unit_price, total.

**Configuration:** tools = [{ name: 'extract_invoice', input_schema: { type: 'object', properties: { vendor_id: {type: 'string'}, invoice_number: {type: 'string'}, invoice_date: {type: 'string', format: 'date'}, total_amount: {type: 'number', minimum: 0}, currency: {type: 'string', enum: ['USD', 'EUR', 'GBP', 'INR', 'JPY', 'unclear']}, due_date: {type: 'string', format: 'date'}, line_items: {type: 'array', items: {...}}, PO_reference: {type: ['string', 'null']} }, required: ['vendor_id', 'invoice_number', 'invoice_date', 'total_amount', 'currency', 'due_date', 'line_items'] } }]
**Concept:** `structured-outputs`

### Forced tool_use Extractor, tool_choice: { type: 'tool', name: 'extract_invoice' }

Forces the model to fire extract_invoice with arguments matching the schema. No prose preamble, no probabilistic adherence. Vision-capable invocation reads the PDF or image; the model emits a structured tool_use. Pair with few-shot examples that show currency: 'unclear' on a truly ambiguous source.

**Configuration:** tool_choice: { type: 'tool', name: 'extract_invoice' }. Use auto only on triage-style flows. Forced is for mandatory extraction.
**Concept:** `tool-choice`

### Validation-Retry Loop, sum check, currency enum, date sanity

Schema enforces shape. Code enforces meaning. After parse: sum(line_items[].total) == total_amount (within a 0.01 tolerance for rounding); currency in the enum; due_date in YYYY-MM-DD format; due_date >= invoice_date. On failure, feed the specific error back to the model ('line totals sum to 4950 but header total is 5000'); typical convergence in 1-2 retries.

**Configuration:** loop: extract -> parse -> validate_semantically -> on failure, append { role: 'user', content: tool_result with is_error: true and a specific error } -> retry. Max retries: 3. After 3, route to human review.
**Concept:** `evaluation`

### Three-Way Match Service, invoice + PO + goods receipt

Queries the PO master and the goods-receipt ledger by PO_reference. Compares amount (variance <= 2% OK for FX rounding and small price changes), vendor identity (normalized vendor name fuzzy match), line-item count (must match), and date sanity (invoice date >= PO date; receipt date >= PO date). Variance above thresholds returns a structured exception; invoice is held pending human review.

**Configuration:** match(invoice, po, grn) -> { match: bool, variance_pct, mismatched_fields[], routed_to: 'auto-approve' | 'human-review' }. Threshold: amount variance > 2% -> human-review. Vendor mismatch -> human-review. Line-item count mismatch -> human-review.
**Concept:** `evaluation`

### PreToolUse Cap and Duplicate Hook, deterministic policy gate before approve_payment

Sits between the model's tool_use for approve_payment and actual execution. Reads tool_input.vendor_id, tool_input.amount, tool_input.invoice_number. Three checks. (1) Cap: vendor_ytd_spend + amount <= vendor_authorization_cap. (2) Blocklist: vendor not in the active blocklist. (3) Duplicate: no row in the audit log with the same (vendor_id, invoice_number) in the last 90 days. Any check fails and the hook exits 2 with a structured stderr message; the agent observes the deny as tool_result is_error: true and routes to a structured exception block for the AP analyst.

**Configuration:** matcher: 'approve_payment'. Hook exits 2 with stderr { reason: 'cap_exceeded' | 'vendor_blocklisted' | 'duplicate_detected', detail: ..., recommended_action: ... }. SDK forwards stderr to the model as a tool_result with is_error: true.
**Concept:** `hooks`
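The matcher configuration above has to be registered somewhere before the hook runs. A minimal sketch, assuming a Claude Code style settings file, a hook script at `.claude/hooks/invoice_approval.py`, and a tool exposed under the bare name `approve_payment`. Tools served over MCP typically carry a prefixed name, so the matcher value is an assumption to adjust. The snippet only builds and prints the JSON fragment; writing it to a settings file is left out.

```python
import json

# Assumed values: adjust the matcher and command to your own tool name and script path.
HOOK_MATCHER = "approve_payment"
HOOK_COMMAND = "python3 .claude/hooks/invoice_approval.py"


def build_hook_settings(matcher: str, command: str) -> dict:
    """Return a settings fragment that runs `command` before any tool matching `matcher`."""
    return {
        "hooks": {
            "PreToolUse": [
                {
                    "matcher": matcher,
                    "hooks": [{"type": "command", "command": command}],
                }
            ]
        }
    }


settings = build_hook_settings(HOOK_MATCHER, HOOK_COMMAND)
print(json.dumps(settings, indent=2))
```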

## Build steps

### 1. Author the invoice JSON schema as a tool definition

Define the output shape in tools[0].input_schema. Every required field listed in required[]. Currency is an enum that includes an 'unclear' escape hatch. PO_reference is ['string', 'null'] because cash invoices and credit memos have no PO. Every numeric field has minimum: 0. Line items are an array with description, quantity, unit_price, total. The schema is the contract; everything downstream depends on it being right.

**Python:**

```python
from anthropic import Anthropic
client = Anthropic()

EXTRACT_INVOICE_TOOL = {
    "name": "extract_invoice",
    "description": "Extract a structured invoice record from a PDF or image.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor_id": {"type": "string"},
            "invoice_number": {"type": "string"},
            "invoice_date": {"type": "string", "format": "date"},
            "due_date": {"type": "string", "format": "date"},
            "currency": {
                "type": "string",
                "enum": ["USD", "EUR", "GBP", "INR", "JPY", "unclear"],
            },
            "total_amount": {"type": "number", "minimum": 0},
            "tax_amount": {"type": ["number", "null"], "minimum": 0},
            "PO_reference": {"type": ["string", "null"]},
            "line_items": {
                "type": "array",
                "minItems": 1,
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number", "minimum": 0},
                        "unit_price": {"type": "number", "minimum": 0},
                        "total": {"type": "number", "minimum": 0},
                    },
                    "required": ["description", "quantity", "unit_price", "total"],
                },
            },
        },
        "required": [
            "vendor_id", "invoice_number", "invoice_date", "due_date",
            "currency", "total_amount", "line_items",
        ],
    },
}
```

**TypeScript:**

```typescript
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();

const EXTRACT_INVOICE_TOOL: Anthropic.Tool = {
  name: "extract_invoice",
  description: "Extract a structured invoice record from a PDF or image.",
  input_schema: {
    type: "object",
    properties: {
      vendor_id: { type: "string" },
      invoice_number: { type: "string" },
      invoice_date: { type: "string", format: "date" },
      due_date: { type: "string", format: "date" },
      currency: {
        type: "string",
        enum: ["USD", "EUR", "GBP", "INR", "JPY", "unclear"],
      },
      total_amount: { type: "number", minimum: 0 },
      tax_amount: { type: ["number", "null"], minimum: 0 },
      PO_reference: { type: ["string", "null"] },
      line_items: {
        type: "array",
        minItems: 1,
        items: {
          type: "object",
          properties: {
            description: { type: "string" },
            quantity: { type: "number", minimum: 0 },
            unit_price: { type: "number", minimum: 0 },
            total: { type: "number", minimum: 0 },
          },
          required: ["description", "quantity", "unit_price", "total"],
        },
      },
    },
    required: [
      "vendor_id", "invoice_number", "invoice_date", "due_date",
      "currency", "total_amount", "line_items",
    ],
  },
};
```

Concept: `structured-outputs`

### 2. Force tool_choice and run extraction with vision input

Set tool_choice: { type: 'tool', name: 'extract_invoice' } so the model has no choice but to fire the tool with arguments matching the schema. Pass the invoice as a vision input (PDF page rasterized to image, or direct image upload). The model emits a structured tool_use; the harness extracts tool_use.input as the candidate record.

**Python:**

```python
import base64

def extract_invoice(invoice_image_bytes: bytes, mime_type: str = "image/png") -> dict:
    image_b64 = base64.b64encode(invoice_image_bytes).decode("ascii")
    resp = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        tools=[EXTRACT_INVOICE_TOOL],
        tool_choice={"type": "tool", "name": "extract_invoice"},
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {"type": "base64", "media_type": mime_type, "data": image_b64},
                    },
                    {"type": "text", "text": "Extract this invoice into the schema."},
                ],
            }
        ],
    )
    for block in resp.content:
        if block.type == "tool_use" and block.name == "extract_invoice":
            return block.input
    raise RuntimeError("forced tool_choice did not yield tool_use")
```

**TypeScript:**

```typescript
async function extractInvoice(
  invoiceBytes: Uint8Array,
  mimeType: "image/png" | "image/jpeg" = "image/png",
) {
  const imageB64 = Buffer.from(invoiceBytes).toString("base64");
  const resp = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 2048,
    tools: [EXTRACT_INVOICE_TOOL],
    tool_choice: { type: "tool", name: "extract_invoice" },
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image",
            source: { type: "base64", media_type: mimeType, data: imageB64 },
          },
          { type: "text", text: "Extract this invoice into the schema." },
        ],
      },
    ],
  });
  for (const block of resp.content) {
    if (block.type === "tool_use" && block.name === "extract_invoice") {
      return block.input as Record<string, unknown>;
    }
  }
  throw new Error("forced tool_choice did not yield tool_use");
}
```

Concept: `tool-choice`
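The step above mentions both PDFs and images as inputs, while the code handles images only. A minimal sketch of choosing the content block by MIME type, assuming PDFs are sent as a base64 `document` block and raster scans as an `image` block. The helper name `invoice_content_block` and the allow-list of image types are assumptions for illustration; check the model's documented input formats before relying on them.

```python
import base64

# Assumed allow-list of raster formats; anything else is rejected up front.
IMAGE_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def invoice_content_block(data: bytes, mime_type: str) -> dict:
    """Wrap raw invoice bytes in a base64 content block keyed by MIME type."""
    encoded = base64.b64encode(data).decode("ascii")
    source = {"type": "base64", "media_type": mime_type, "data": encoded}
    if mime_type == "application/pdf":
        return {"type": "document", "source": source}
    if mime_type in IMAGE_TYPES:
        return {"type": "image", "source": source}
    raise ValueError(f"unsupported invoice format: {mime_type!r}")


# Placeholder bytes; a real call would pass the file contents.
block = invoice_content_block(b"%PDF-1.7 sample", "application/pdf")
print(block["type"])
```

The returned dict would take the place of the hard-coded image block in the `content` list, with the text instruction following it unchanged.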

### 3. Wrap extraction in a validation-retry loop

Schema guarantees structure; semantics need code. After parsing, validate: sum(line_items[].total) equals total_amount within 0.01 tolerance; currency in the enum; due_date format and >= invoice_date. On failure, feed the specific error back via tool_result with is_error: true so the model sees what was wrong; retry up to 3 times. Most failures converge in 1-2 retries because the model now knows what the validator rejected.

**Python:**

```python
from datetime import date

def validate(record: dict) -> list[str]:
    errors = []
    items_sum = sum(it.get("total", 0) for it in record.get("line_items", []))
    if abs(items_sum - record.get("total_amount", 0)) > 0.01:
        errors.append(
            f"line items sum to {items_sum:.2f} but total_amount is "
            f"{record['total_amount']:.2f}; reconcile"
        )
    if record.get("currency") not in {"USD", "EUR", "GBP", "INR", "JPY", "unclear"}:
        errors.append(f"currency {record.get('currency')!r} not in ISO 4217 enum")
    try:
        inv_date = date.fromisoformat(record.get("invoice_date", ""))
        due_date = date.fromisoformat(record.get("due_date", ""))
        if due_date < inv_date:
            errors.append(
                f"due_date {due_date} is before invoice_date {inv_date}"
            )
    except ValueError as e:
        errors.append(f"date parse failed: {e}")
    return errors

def extract_with_retry(invoice_image_bytes: bytes, max_retries: int = 3) -> dict:
    image_b64 = base64.b64encode(invoice_image_bytes).decode("ascii")
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text", "text": "Extract this invoice into the schema."},
        ],
    }]
    for attempt in range(max_retries):
        resp = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            tools=[EXTRACT_INVOICE_TOOL],
            tool_choice={"type": "tool", "name": "extract_invoice"},
            messages=messages,
        )
        tool_use = next(b for b in resp.content if b.type == "tool_use")
        record = tool_use.input
        errors = validate(record)
        if not errors:
            return record
        messages.append({"role": "assistant", "content": resp.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": "Validation failed: " + "; ".join(errors) + ". Re-extract.",
                "is_error": True,
            }],
        })
    raise ValueError(f"extraction did not converge in {max_retries} attempts")
```

**TypeScript:**

```typescript
function validate(record: Record<string, any>): string[] {
  const errors: string[] = [];
  const itemsSum = (record.line_items as Array<{ total: number }>)?.reduce(
    (s, it) => s + (it.total ?? 0),
    0,
  ) ?? 0;
  if (Math.abs(itemsSum - (record.total_amount ?? 0)) > 0.01) {
    errors.push(
      `line items sum to ${itemsSum.toFixed(2)} but total_amount is ` +
        `${(record.total_amount ?? 0).toFixed(2)}; reconcile`,
    );
  }
  const validCurrencies = new Set(["USD", "EUR", "GBP", "INR", "JPY", "unclear"]);
  if (!validCurrencies.has(record.currency)) {
    errors.push(`currency ${JSON.stringify(record.currency)} not in ISO 4217 enum`);
  }
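  // new Date() never throws on bad input; it yields an Invalid Date (NaN time).
  const inv = new Date(record.invoice_date);
  const due = new Date(record.due_date);
  if (Number.isNaN(inv.getTime()) || Number.isNaN(due.getTime())) {
    errors.push(
      `date parse failed: invoice_date=${JSON.stringify(record.invoice_date)}, ` +
        `due_date=${JSON.stringify(record.due_date)}`,
    );
  } else if (due < inv) {
    errors.push(`due_date ${record.due_date} is before invoice_date ${record.invoice_date}`);
  }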
  return errors;
}
```

Concept: `evaluation`
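The sum check in this step compares floats with a 0.01 tolerance. A variant sketch, assuming amounts are compared as `Decimal` values rounded to two places so that binary float drift across many line items cannot trip the tolerance. The names `to_money` and `totals_reconcile` and the two-decimal assumption are illustrative (JPY, for instance, has no minor unit).

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

CENT = Decimal("0.01")


def to_money(value) -> Decimal:
    """Convert a JSON number to a two-decimal Decimal via str() to avoid float artifacts."""
    return Decimal(str(value)).quantize(CENT, rounding=ROUND_HALF_UP)


def totals_reconcile(record: dict, tolerance: Decimal = CENT) -> Optional[str]:
    """Return an error message if line totals and header total disagree, else None."""
    items_sum = sum((to_money(it["total"]) for it in record["line_items"]), Decimal("0"))
    header = to_money(record["total_amount"])
    if abs(items_sum - header) > tolerance:
        return f"line items sum to {items_sum} but total_amount is {header}; reconcile"
    return None


# Made-up records: one that reconciles, one that is off by 50.
good = {"total_amount": 0.3, "line_items": [{"total": 0.1}, {"total": 0.2}]}
bad = {"total_amount": 5000, "line_items": [{"total": 4950}]}
print(totals_reconcile(good))
print(totals_reconcile(bad))
```

The error string keeps the same shape as the float-based validator, so it can be fed back to the model through the same tool_result path.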

### 4. Run a three-way match against PO and goods receipt

Query the PO master by PO_reference and the goods-receipt ledger by the same key. Compare amount (variance <= 2% OK for FX rounding and minor price changes), vendor identity (normalized fuzzy match on vendor name), and line-item count (must match exactly). Variance above any threshold routes to human review with a structured exception block; otherwise auto-proceed.

**Python:**

```python
def three_way_match(invoice: dict, po: dict, grn: dict) -> dict:
    """Reconcile invoice with purchase order and goods receipt note."""
    issues = []
    inv_amount = invoice["total_amount"]
    po_amount = po.get("total_amount", 0)
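    if po_amount > 0:
        variance_pct = abs(inv_amount - po_amount) / po_amount * 100
        if variance_pct > 2.0:
            issues.append(
                f"amount variance {variance_pct:.2f}% exceeds 2% threshold"
            )
    else:
        # A missing or zero PO amount cannot be reconciled; never auto-approve it.
        issues.append("PO total_amount missing or zero; cannot compute variance")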

    if normalize_vendor(invoice["vendor_id"]) != normalize_vendor(po["vendor_id"]):
        issues.append(
            f"vendor mismatch: invoice {invoice['vendor_id']!r} "
            f"vs PO {po['vendor_id']!r}"
        )

    if len(invoice["line_items"]) != len(grn.get("line_items", [])):
        issues.append(
            f"line-item count mismatch: invoice {len(invoice['line_items'])} "
            f"vs GRN {len(grn.get('line_items', []))}"
        )

    return {
        "match": len(issues) == 0,
        "issues": issues,
        "routed_to": "auto-approve" if not issues else "human-review",
    }

def normalize_vendor(name: str) -> str:
    return "".join(ch.lower() for ch in name if ch.isalnum())
```

**TypeScript:**

```typescript
interface Invoice { vendor_id: string; total_amount: number; line_items: unknown[]; }
interface PO { vendor_id: string; total_amount: number; }
interface GRN { line_items: unknown[]; }

export function threeWayMatch(invoice: Invoice, po: PO, grn: GRN) {
  // Reconcile invoice with purchase order and goods receipt note.
  const issues: string[] = [];
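  if (po.total_amount > 0) {
    const variancePct =
      (Math.abs(invoice.total_amount - po.total_amount) / po.total_amount) * 100;
    if (variancePct > 2.0) {
      issues.push(`amount variance ${variancePct.toFixed(2)}% exceeds 2% threshold`);
    }
  } else {
    // A missing or zero PO amount cannot be reconciled; never auto-approve it.
    issues.push("PO total_amount missing or zero; cannot compute variance");
  }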
  if (normalizeVendor(invoice.vendor_id) !== normalizeVendor(po.vendor_id)) {
    issues.push(
      `vendor mismatch: invoice ${JSON.stringify(invoice.vendor_id)} ` +
      `vs PO ${JSON.stringify(po.vendor_id)}`,
    );
  }
  if (invoice.line_items.length !== (grn.line_items?.length ?? 0)) {
    issues.push(
      `line-item count mismatch: invoice ${invoice.line_items.length} ` +
      `vs GRN ${grn.line_items?.length ?? 0}`,
    );
  }
  return {
    match: issues.length === 0,
    issues,
    routed_to: issues.length === 0 ? "auto-approve" : "human-review",
  };
}

function normalizeVendor(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}
```

Concept: `evaluation`
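The schema allows `PO_reference` to be null, and a three-way match has nothing to query in that case. A minimal sketch of a pre-match routing step, assuming PO-less invoices always go to human review rather than being auto-approved. The function `route_before_match` and the two lookup callables are hypothetical stand-ins for the PO master and the goods-receipt ledger.

```python
from typing import Callable, Optional


def route_before_match(
    invoice: dict,
    lookup_po: Callable[[str], Optional[dict]],
    lookup_grn: Callable[[str], Optional[dict]],
) -> dict:
    """Decide whether a three-way match can run, and fetch its inputs if so."""
    po_ref = invoice.get("PO_reference")
    if not po_ref:
        # Cash invoices and credit memos carry no PO; assumed policy is human review.
        return {"routed_to": "human-review", "reason": "no PO_reference on invoice"}
    po = lookup_po(po_ref)
    if po is None:
        return {"routed_to": "human-review", "reason": f"PO {po_ref!r} not found"}
    grn = lookup_grn(po_ref)
    if grn is None:
        return {"routed_to": "human-review", "reason": f"no goods receipt for PO {po_ref!r}"}
    return {"routed_to": "three-way-match", "po": po, "grn": grn}


# In-memory stand-ins for the two ledgers (made-up data).
PO_MASTER = {"PO-77": {"vendor_id": "V-1001", "total_amount": 5000.0}}
GRN_LEDGER = {"PO-77": {"line_items": [{"sku": "A"}]}}

no_po = route_before_match({"PO_reference": None}, PO_MASTER.get, GRN_LEDGER.get)
with_po = route_before_match({"PO_reference": "PO-77"}, PO_MASTER.get, GRN_LEDGER.get)
print(no_po["routed_to"], with_po["routed_to"])
```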

### 5. Wire the PreToolUse cap and duplicate-detection hook

Hook on approve_payment. Three checks. (1) Cap: vendor_ytd_spend + amount <= vendor_authorization_cap. (2) Blocklist: vendor not on the active blocklist. (3) Duplicate: no audit-log row with the same (vendor_id, invoice_number) in the last 90 days. Any check fails and the hook exits 2 with a structured stderr message; the agent observes the deny and routes to an exception block for the AP analyst. Deterministic, no prompt-injection bypass.

**Python:**

```python
# .claude/hooks/invoice_approval.py
import sys, json, os, sqlite3
from datetime import date, timedelta

DB = sqlite3.connect(os.environ.get("AUDIT_DB", "audit.sqlite3"))

def vendor_cap_check(vendor_id: str, amount: float) -> str | None:
    row = DB.execute(
        "SELECT cap, ytd_spend FROM vendor_master WHERE vendor_id = ?",
        (vendor_id,),
    ).fetchone()
    if not row:
        return f"vendor {vendor_id!r} not in master; escalate"
    cap, ytd = row
    if ytd + amount > cap:
        remaining = cap - ytd
        return (
            f"vendor cap exceeded: ytd_spend={ytd:.2f} + amount={amount:.2f} > "
            f"cap={cap:.2f}; cap_remaining={remaining:.2f}"
        )
    return None

def blocklist_check(vendor_id: str) -> str | None:
    row = DB.execute(
        "SELECT 1 FROM vendor_blocklist WHERE vendor_id = ?", (vendor_id,)
    ).fetchone()
    if row:
        return f"vendor {vendor_id!r} on active blocklist"
    return None

def duplicate_check(vendor_id: str, invoice_number: str) -> str | None:
    cutoff = (date.today() - timedelta(days=90)).isoformat()
    row = DB.execute(
        "SELECT approved_at FROM audit_log WHERE vendor_id = ? "
        "AND invoice_number = ? AND approved_at >= ? ORDER BY approved_at DESC LIMIT 1",
        (vendor_id, invoice_number, cutoff),
    ).fetchone()
    if row:
        return (
            f"duplicate detected: same (vendor_id, invoice_number) approved on "
            f"{row[0]}; reject this submission"
        )
    return None

def main():
    payload = json.loads(sys.stdin.read())
    if payload["tool_name"] != "approve_payment":
        sys.exit(0)
    inp = payload["tool_input"]
    for check in (
        vendor_cap_check(inp["vendor_id"], inp["amount"]),
        blocklist_check(inp["vendor_id"]),
        duplicate_check(inp["vendor_id"], inp["invoice_number"]),
    ):
        if check:
            print(check, file=sys.stderr)
            sys.exit(2)
    sys.exit(0)

if __name__ == "__main__":
    main()
```

**TypeScript:**

```typescript
// .claude/hooks/invoice-approval.ts
import { readFileSync } from "node:fs";
import Database from "better-sqlite3";

const db = new Database(process.env.AUDIT_DB ?? "audit.sqlite3");

function vendorCapCheck(vendorId: string, amount: number): string | null {
  const row = db
    .prepare("SELECT cap, ytd_spend FROM vendor_master WHERE vendor_id = ?")
    .get(vendorId) as { cap: number; ytd_spend: number } | undefined;
  if (!row) return `vendor ${JSON.stringify(vendorId)} not in master; escalate`;
  if (row.ytd_spend + amount > row.cap) {
    const remaining = row.cap - row.ytd_spend;
    return (
      `vendor cap exceeded: ytd_spend=${row.ytd_spend.toFixed(2)} + ` +
      `amount=${amount.toFixed(2)} > cap=${row.cap.toFixed(2)}; ` +
      `cap_remaining=${remaining.toFixed(2)}`
    );
  }
  return null;
}

function blocklistCheck(vendorId: string): string | null {
  const row = db.prepare("SELECT 1 FROM vendor_blocklist WHERE vendor_id = ?").get(vendorId);
  return row ? `vendor ${JSON.stringify(vendorId)} on active blocklist` : null;
}

function duplicateCheck(vendorId: string, invoiceNumber: string): string | null {
  const cutoff = new Date(Date.now() - 90 * 86400_000).toISOString().slice(0, 10);
  const row = db
    .prepare(
      "SELECT approved_at FROM audit_log WHERE vendor_id = ? AND invoice_number = ? " +
        "AND approved_at >= ? ORDER BY approved_at DESC LIMIT 1",
    )
    .get(vendorId, invoiceNumber, cutoff) as { approved_at: string } | undefined;
  return row
    ? `duplicate detected: same (vendor_id, invoice_number) approved on ${row.approved_at}; reject this submission`
    : null;
}

const payload = JSON.parse(readFileSync(0, "utf8"));
if (payload.tool_name !== "approve_payment") process.exit(0);
const inp = payload.tool_input;
for (const check of [
  vendorCapCheck(inp.vendor_id, inp.amount),
  blocklistCheck(inp.vendor_id),
  duplicateCheck(inp.vendor_id, inp.invoice_number),
]) {
  if (check) {
    process.stderr.write(check + "\n");
    process.exit(2);
  }
}
process.exit(0);
```

Concept: `hooks`
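The duplicate check in this step can be exercised without a production database. A minimal sketch, assuming an `audit_log` table with the three columns the hook queries (`vendor_id`, `invoice_number`, and `approved_at` stored as ISO dates). The table definition and the `is_duplicate` helper are assumptions for illustration, since the source does not show the actual schema.

```python
import sqlite3
from datetime import date, timedelta


def is_duplicate(db: sqlite3.Connection, vendor_id: str, invoice_number: str,
                 today: date, window_days: int = 90) -> bool:
    """True if the same (vendor_id, invoice_number) was approved inside the window."""
    cutoff = (today - timedelta(days=window_days)).isoformat()
    row = db.execute(
        "SELECT 1 FROM audit_log WHERE vendor_id = ? AND invoice_number = ? "
        "AND approved_at >= ? LIMIT 1",
        (vendor_id, invoice_number, cutoff),
    ).fetchone()
    return row is not None


# In-memory database with an assumed minimal schema and made-up rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE audit_log (vendor_id TEXT, invoice_number TEXT, approved_at TEXT)")
today = date(2026, 5, 5)
db.execute("INSERT INTO audit_log VALUES (?, ?, ?)",
           ("V-1001", "INV-2042", (today - timedelta(days=30)).isoformat()))
db.execute("INSERT INTO audit_log VALUES (?, ?, ?)",
           ("V-1001", "INV-1000", (today - timedelta(days=120)).isoformat()))

print(is_duplicate(db, "V-1001", "INV-2042", today))  # approved 30 days ago
print(is_duplicate(db, "V-1001", "INV-1000", today))  # approved 120 days ago
print(is_duplicate(db, "V-2002", "INV-2042", today))  # same number, different vendor
```

Passing `today` in explicitly keeps the window boundary testable; ISO-formatted dates compare correctly as strings, which is what the `>=` in the query relies on.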

### 6. Cache the schema and the vendor master

The schema is the largest stable token cost (~1500 tokens for invoice extraction). The vendor master (caps, blocklist, name normalization rules) is also stable per session. Mark both with cache_control: ephemeral so a 5-minute TTL keeps them warm across sustained AP traffic. Realistic savings: ~80% on cached portions, ~50% reduction on overall steady-state cost.

**Python:**

```python
def extract_with_cache(invoice_image_bytes: bytes, vendor_master_blob: str) -> dict:
    image_b64 = base64.b64encode(invoice_image_bytes).decode("ascii")
    resp = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system=[
            {
                "type": "text",
                "text": (
                    "You are an AP-automation extraction agent. Return only "
                    "structured tool_use; never prose."
                ),
                "cache_control": {"type": "ephemeral"},
            },
            {
                "type": "text",
                "text": vendor_master_blob,
                "cache_control": {"type": "ephemeral"},
            },
        ],
        tools=[
            {**EXTRACT_INVOICE_TOOL, "cache_control": {"type": "ephemeral"}},
        ],
        tool_choice={"type": "tool", "name": "extract_invoice"},
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
                {"type": "text", "text": "Extract this invoice into the schema."},
            ],
        }],
    )
    print(f"cache_creation: {resp.usage.cache_creation_input_tokens}")
    print(f"cache_read:     {resp.usage.cache_read_input_tokens}")
    return next(b.input for b in resp.content if b.type == "tool_use")
```

**TypeScript:**

```typescript
async function extractWithCache(
  invoiceBytes: Uint8Array,
  vendorMasterBlob: string,
) {
  const imageB64 = Buffer.from(invoiceBytes).toString("base64");
  const resp = await client.messages.create({
    model: "claude-sonnet-4-5",
    max_tokens: 2048,
    system: [
      {
        type: "text",
        text:
          "You are an AP-automation extraction agent. Return only structured " +
          "tool_use; never prose.",
        cache_control: { type: "ephemeral" },
      },
      {
        type: "text",
        text: vendorMasterBlob,
        cache_control: { type: "ephemeral" },
      },
    ],
    tools: [
      { ...EXTRACT_INVOICE_TOOL, cache_control: { type: "ephemeral" } },
    ],
    tool_choice: { type: "tool", name: "extract_invoice" },
    messages: [
      {
        role: "user",
        content: [
          {
            type: "image",
            source: { type: "base64", media_type: "image/png", data: imageB64 },
          },
          { type: "text", text: "Extract this invoice into the schema." },
        ],
      },
    ],
  });
  console.log(`cache_creation: ${resp.usage.cache_creation_input_tokens}`);
  console.log(`cache_read:     ${resp.usage.cache_read_input_tokens}`);
  const tu = resp.content.find((b) => b.type === "tool_use");
  return tu?.type === "tool_use" ? tu.input : null;
}
```

Concept: `prompt-caching`
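The two usage fields printed in this step can be folded into a single hit-ratio figure for monitoring. A minimal sketch, assuming the usage object exposes `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`. The helper name `cache_hit_ratio` is illustrative, and `SimpleNamespace` stands in for a real response so the example runs offline; the token counts are made up.

```python
from types import SimpleNamespace


def cache_hit_ratio(usage) -> float:
    """Share of prompt tokens served from cache on one response (0.0 to 1.0)."""
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    created = getattr(usage, "cache_creation_input_tokens", 0) or 0
    uncached = getattr(usage, "input_tokens", 0) or 0
    total = read + created + uncached
    return read / total if total else 0.0


# Stand-ins for resp.usage on a cold call and on a warm call.
cold = SimpleNamespace(input_tokens=900, cache_creation_input_tokens=2100, cache_read_input_tokens=0)
warm = SimpleNamespace(input_tokens=900, cache_creation_input_tokens=0, cache_read_input_tokens=2100)

print(f"cold: {cache_hit_ratio(cold):.2f}")
print(f"warm: {cache_hit_ratio(warm):.2f}")
```

A ratio that stays near zero under sustained traffic suggests the cached prefix is changing between calls or the TTL is lapsing between invoices.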

### 7. Use Batch API for overnight bulk runs

Use the synchronous API when inbox-arrival latency matters. For nightly backfills (10K invoices), the Batch API gives a flat 50% discount in exchange for asynchronous results within a 24-hour processing window. Combined with schema and vendor-master caching (per-100-item sub-batches keep the ephemeral cache warm), bulk extraction cost drops ~75% versus naive sync. Resubmit failures the next morning as a fresh batch, with the specific validation error included in the new request's message.

**Python:**

```python
def submit_bulk_extraction(invoices: list[dict]) -> str:
    """Submit a batch of invoice extractions for overnight processing."""
    requests = []
    for inv in invoices:
        image_b64 = base64.b64encode(inv["image_bytes"]).decode("ascii")
        requests.append({
            "custom_id": f"extract-{inv['id']}",
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 2048,
                "tools": [EXTRACT_INVOICE_TOOL],
                "tool_choice": {"type": "tool", "name": "extract_invoice"},
                "messages": [{
                    "role": "user",
                    "content": [
                        {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
                        {"type": "text", "text": "Extract this invoice into the schema."},
                    ],
                }],
            },
        })
    batch = client.messages.batches.create(requests=requests)
    print(f"Batch {batch.id} submitted with {len(requests)} extractions")
    return batch.id

def harvest_batch(batch_id: str):
    batch = client.messages.batches.retrieve(batch_id)
    if batch.processing_status != "ended":
        return {"status": "not_ready"}
    accepted, rejected = [], []
    for r in client.messages.batches.results(batch_id):
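        if r.result.type == "succeeded":
            # Forced tool_choice should always yield a tool_use block; guard anyway.
            tu = next((b for b in r.result.message.content if b.type == "tool_use"), None)
            # validate() returns a list of errors, so an empty list means the record passed.
            if tu is not None and not validate(tu.input):
                accepted.append(tu.input)
                continue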
        rejected.append(r.custom_id)
    return {"accepted": accepted, "rejected_for_retry": rejected}
```

**TypeScript:**

```typescript
async function submitBulkExtraction(
  invoices: Array<{ id: string; image_bytes: Uint8Array }>,
) {
  const requests = invoices.map((inv) => {
    const imageB64 = Buffer.from(inv.image_bytes).toString("base64");
    return {
      custom_id: `extract-${inv.id}`,
      params: {
        model: "claude-sonnet-4-5",
        max_tokens: 2048,
        tools: [EXTRACT_INVOICE_TOOL],
        tool_choice: { type: "tool", name: "extract_invoice" } as const,
        messages: [
          {
            role: "user" as const,
            content: [
              {
                type: "image" as const,
                source: {
                  type: "base64" as const,
                  media_type: "image/png" as const,
                  data: imageB64,
                },
              },
              { type: "text" as const, text: "Extract this invoice into the schema." },
            ],
          },
        ],
      },
    };
  });
  const batch = await client.messages.batches.create({ requests });
  console.log(`Batch ${batch.id} submitted with ${requests.length} extractions`);
  return batch.id;
}

async function harvestBatch(batchId: string) {
  const batch = await client.messages.batches.retrieve(batchId);
  if (batch.processing_status !== "ended") return { status: "not_ready" };
  const accepted: unknown[] = [];
  const rejected: string[] = [];
  // results() resolves to an async iterable, so await it before iterating.
  for await (const r of await client.messages.batches.results(batchId)) {
    if (r.result.type === "succeeded") {
      const tu = r.result.message.content.find((b) => b.type === "tool_use");
      if (
        tu?.type === "tool_use" &&
        validate(tu.input as Record<string, unknown>).length === 0
      ) {
        accepted.push(tu.input);
        continue;
      }
    }
    rejected.push(r.custom_id);
  }
  return { accepted, rejected_for_retry: rejected };
}
```

Concept: `batch-api`
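
The paragraph for this step also mentions 100-item sub-batches and resubmitting failures with the specific error, which the listings above do not show. A minimal sketch of both, assuming two hypothetical helpers (`chunked` and `build_retry_messages`) and assuming the error strings come from whatever validator the pipeline uses; the feedback wording is illustrative only.

```python
from typing import Iterator


def chunked(items: list, size: int = 100) -> Iterator[list]:
    """Yield consecutive sub-batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_retry_messages(image_b64: str, errors: list[str]) -> list[dict]:
    """Build the user message for a retry request, naming the specific errors.

    Hypothetical helper: the feedback wording is an assumption.
    """
    feedback = "The previous extraction failed validation: " + "; ".join(errors)
    return [{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text",
             "text": feedback + ". Extract this invoice into the schema again."},
        ],
    }]
```

Each sub-batch from `chunked` could be passed to `submit_bulk_extraction`, and the retry messages could replace the `messages` value in the next morning's batch request for a rejected `custom_id`.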

### 8. Audit-log every approval, rejection, and hook decision

PostToolUse hook on every approve_payment call. Append a row to durable storage: timestamp, vendor_id, invoice_number, amount, currency, three-way-match outcome, hook decisions (cap, blocklist, duplicate), final routing (approved | human-review | denied). Retain at least 7 years for audit compliance. The audit log is the replay tool when finance asks 'why did we approve this in May?' three months later.

**Python:**

```python
import datetime
import json
import sqlite3

AUDIT_DB = sqlite3.connect("audit.sqlite3")
AUDIT_DB.execute("""
CREATE TABLE IF NOT EXISTS audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- ts alone can collide under parallel writes
    ts TEXT NOT NULL,
    vendor_id TEXT,
    invoice_number TEXT,
    amount REAL,
    currency TEXT,
    match_outcome TEXT,
    hook_decisions TEXT,
    final_routing TEXT,
    approved_at TEXT
)
""")

def audit(invoice: dict, match_result: dict, hook_decisions: dict, routing: str):
    AUDIT_DB.execute(
        "INSERT INTO audit_log (ts, vendor_id, invoice_number, amount, currency, "
        "match_outcome, hook_decisions, final_routing, approved_at) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
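            # utcnow() is deprecated since Python 3.12; use an aware UTC timestamp.
            datetime.datetime.now(datetime.timezone.utc).isoformat().replace("+00:00", "Z"),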
            invoice["vendor_id"],
            invoice["invoice_number"],
            invoice["total_amount"],
            invoice["currency"],
            json.dumps(match_result),
            json.dumps(hook_decisions),
            routing,
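            # Use the UTC date so approved_at agrees with ts (date.today() is local time).
            datetime.datetime.now(datetime.timezone.utc).date().isoformat() if routing == "approved" else None,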
        ),
    )
    AUDIT_DB.commit()
```

**TypeScript:**

```typescript
import Database from "better-sqlite3";

const auditDb = new Database("audit.sqlite3");
auditDb.exec(`
CREATE TABLE IF NOT EXISTS audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- ts alone can collide under parallel writes
    ts TEXT NOT NULL,
    vendor_id TEXT,
    invoice_number TEXT,
    amount REAL,
    currency TEXT,
    match_outcome TEXT,
    hook_decisions TEXT,
    final_routing TEXT,
    approved_at TEXT
)
`);

interface InvoiceRecord { vendor_id: string; invoice_number: string; total_amount: number; currency: string; }

export function audit(
  invoice: InvoiceRecord,
  matchResult: Record<string, unknown>,
  hookDecisions: Record<string, unknown>,
  routing: "approved" | "human-review" | "denied",
) {
  auditDb
    .prepare(
      "INSERT INTO audit_log (ts, vendor_id, invoice_number, amount, currency, " +
        "match_outcome, hook_decisions, final_routing, approved_at) " +
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    )
    .run(
      new Date().toISOString(),
      invoice.vendor_id,
      invoice.invoice_number,
      invoice.total_amount,
      invoice.currency,
      JSON.stringify(matchResult),
      JSON.stringify(hookDecisions),
      routing,
      routing === "approved" ? new Date().toISOString().slice(0, 10) : null,
    );
}
```

Concept: `evaluation`
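
The replay use case ("why did we approve this in May?") comes down to a keyed lookup on the audit table. A minimal sketch, assuming the `audit_log` columns defined above; the `replay` function name and the shape of the returned dictionaries are hypothetical.

```python
import json
import sqlite3


def replay(conn: sqlite3.Connection, vendor_id: str, invoice_number: str) -> list[dict]:
    """Return every audit row for one invoice, oldest first, with JSON columns decoded."""
    rows = conn.execute(
        "SELECT ts, amount, currency, match_outcome, hook_decisions, final_routing "
        "FROM audit_log WHERE vendor_id = ? AND invoice_number = ? ORDER BY ts",
        (vendor_id, invoice_number),
    ).fetchall()
    return [
        {
            "ts": ts,
            "amount": amount,
            "currency": currency,
            "match_outcome": json.loads(match),      # stored as a JSON string
            "hook_decisions": json.loads(hooks),     # stored as a JSON string
            "final_routing": routing,
        }
        for ts, amount, currency, match, hooks, routing in rows
    ]
```

An index on (vendor_id, invoice_number) would likely be needed to keep this lookup fast over seven years of rows.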

## Decision matrix

| Decision | Right answer | Wrong answer | Why |
|---|---|---|---|
| Output shape guarantee on extraction | Forced tool_choice with input_schema as the contract | Prompt 'output JSON' or 'respond with valid JSON only' | Prompt-only is probabilistic (~85% adherence); ~15% leakage on edge invoices (handwritten notes, mixed languages, credit memos, rotated scans). Forced tool_use is structural (100% adherence). The cost is identical; the reliability gap is decisive in finance. |
| Vendor authorization cap enforcement | PreToolUse hook reads vendor_ytd_spend, exits 2 on violation | System prompt: 'never approve above the vendor cap' | Prompts leak ~3% in production. Hooks are deterministic. For policy-bearing limits (cap, duplicate, blocklist), the deterministic gate is the only credible architecture. Prompt-only enforcement is a finding waiting to be flagged in the next audit. |
| Same invoice arriving twice (vendor re-sends after delivery) | PreToolUse duplicate-detection hook keyed on (vendor_id, invoice_number) over last 90 days | Trust the model to notice duplicates in conversation context | Context memory is unreliable across multi-turn or batch runs. The hook is stateless, queries the audit log, and prevents race conditions when two parallel extractions hit the same invoice within seconds. |
| Bulk overnight processing of 10K invoices | Batch API + schema and vendor-master caching | Sync API in a tight loop or sync API without caching | Batch API gives a flat 50% discount with a 24-hour SLA. Caching adds another ~80% off the schema and vendor-master tokens. Combined: ~75% savings versus naive sync. Sync API is reserved for inbox-arrival latency. |

## Failure modes

| Anti-pattern | Failure | Fix |
|---|---|---|
| AP-INV-01 · Prompt-only field extraction | Prompt 'extract this invoice as JSON' leaks ~15% on edge invoices. Downstream parser breaks every seventh document; AP analyst spends the morning re-keying invoices the agent botched. | Forced tool_choice: { type: 'tool', name: 'extract_invoice' } plus a strict JSON schema in tools[0].input_schema. The model has no choice but to fire the tool with arguments matching the schema. 100% structural adherence. |
| AP-INV-02 · No semantic validation | Single-pass extraction with no math check. The model returns a structurally-valid record where line totals sum to 4950 but the header total says 5000. Bad data ships downstream; quarterly close finds the discrepancy three months later. | Validation-retry loop. After parse, validate sum(line_items[].total) == total_amount (within 0.01 tolerance), currency in ISO 4217 enum, due_date >= invoice_date. On failure, feed the specific error back; retry up to 3 times; route to human review if still failing. |
| AP-INV-03 · No three-way match | Agent approves an invoice that has no matching purchase order, or where the goods receipt was for fewer items, or where the vendor name on the invoice does not match the vendor on the PO. AP pays for goods never received, or pays the wrong vendor. | Three-way match service queries PO master and goods-receipt ledger. Compares amount (variance <= 2% OK), normalized vendor name, line-item count. Variance above thresholds routes to human review with a structured exception block. |
| AP-INV-04 · Cap policy in the system prompt | System prompt: 'never approve more than the vendor authorization cap'. Production logs show ~3% of approvals exceed the cap because the prompt language leaks under unusual phrasing or when the agent is processing many invoices in one session. | PreToolUse hook on approve_payment reads tool_input.vendor_id and tool_input.amount, queries the vendor master for vendor_ytd_spend + cap, exits 2 on violation with a structured message including cap_remaining. Deterministic, not probabilistic. |
| AP-INV-05 · No duplicate-invoice check | Vendor re-sends the same invoice number after delivery confirmation, or the same invoice is uploaded twice through different channels (email + portal). The agent approves both. AP discovers the duplicate payment in next month's reconciliation. | PreToolUse hook queries the audit log for any row with the same (vendor_id, invoice_number) in the last 90 days. On match, exits 2 with the prior approval date. Stateless, auditable, prevents race conditions in parallel runs. |

## Implementation checklist

- [ ] Invoice JSON schema lives in tools[0].input_schema with required and nullable fields explicit (`structured-outputs`)
- [ ] tool_choice forced to extract_invoice on every extraction call (`tool-choice`)
- [ ] Currency field is an enum with an 'unclear' escape hatch (`structured-outputs`)
- [ ] Validation-retry loop with sum check, currency enum, date sanity (`evaluation`)
- [ ] Three-way match service against PO master and goods-receipt ledger (`evaluation`)
- [ ] PreToolUse cap-and-duplicate hook on approve_payment (`hooks`)
- [ ] Schema and vendor master cached with cache_control: ephemeral (`prompt-caching`)
- [ ] Batch API for nightly bulk runs (greater than 100 invoices) (`batch-api`)
- [ ] PostToolUse audit log writes every approval, rejection, and hook decision; 7-year retention
- [ ] Stratified accuracy reporting by vendor, currency, document type
- [ ] Human-review queue for invoices that fail validation, three-way match, or hook checks

## Cost & latency

- **Per-invoice synchronous extraction (cached schema):** ~$0.001 to $0.003. Schema ~1500 tokens at cache-read price, plus image vision tokens (~1000-2000), plus ~150 output tokens. Sustained AP traffic with cache hits >= 70% keeps effective cost predictable.
- **Three-way match service:** ~$0 token cost; ~10-30 ms latency. Pure SQL queries against PO master and goods-receipt ledger. No LLM call. Latency is dominated by the database round-trip.
- **PreToolUse hook overhead:** ~$0; ~5 ms latency. Subprocess reads stdin JSON, runs three SQL queries (vendor cap, blocklist, duplicate), exits 0 or 2. No LLM call. Latency below the noise floor of any tool dispatch.
- **Batch overnight (10K invoices, batch + caching):** ~75% off naive sync. The Batch API's flat 50% discount multiplies with the schema and vendor-master cache discount (~80% off the cached portion). 10K invoices at typical complexity drop from ~$30 sync uncached to ~$8 batch cached.
- **Validation-retry overhead:** ~+25% on records that retry. 5-10% of records retry once; 1-2% retry twice. Specific-error feedback converges quickly. Pipeline cost up ~5% to gain ~99% schema-conformance plus ~99% semantic-conformance.
- **Per-1000-invoices total (steady state):** ~$1.00 to $3.00. Sync cached extraction at scale. Adding human review of unconverged records adds operator-time cost but recovers the long tail of edge invoices.
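
The batch-plus-caching figure above can be checked with a few lines of arithmetic. This is a sketch under a stated assumption: the share of naive spend that falls on cacheable tokens (schema and vendor master) is not given in the source, so the 60% used here is an illustrative input chosen to show how the two discounts multiply.

```python
def combined_cost_fraction(batch_discount: float, cache_discount: float,
                           cacheable_share: float) -> float:
    """Fraction of the naive sync cost that remains after both discounts.

    cacheable_share is the assumed share of naive spend on cacheable tokens;
    it is an illustrative input, not a figure from the source.
    """
    after_cache = 1.0 - cache_discount * cacheable_share
    return (1.0 - batch_discount) * after_cache


remaining = combined_cost_fraction(batch_discount=0.50, cache_discount=0.80,
                                   cacheable_share=0.60)
print(f"remaining fraction of naive cost: {remaining:.2f}")
print(f"10K invoices against a $30 naive baseline: ${30 * remaining:.2f}")
```

With these inputs about 26% of the naive cost remains, roughly $7.80 against the $30 baseline, which is in line with the ~$8 and ~75% figures quoted above. A lower cacheable share would shrink the combined saving toward the plain 50% batch discount.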

## Domain weights

- **D2 · Tool Design + Integration (18%):** Invoice schema. Forced tool_choice. PreToolUse cap and duplicate hook. Three-way match contract.
- **D5 · Context + Reliability (15%):** Schema caching. Vendor master caching. Multi-page invoice CASE_FACTS. Batch API integration.

## Practice questions

### Q1. Your invoice extraction agent uses prompt-only extraction. Production logs show ~15% of records arrive with prose wrapping ('Sure, here is the JSON:') and the downstream parser breaks. What is the architectural fix?

Move the schema into tools[0].input_schema and set tool_choice: { type: 'tool', name: 'extract_invoice' }. This forces the model to emit a structured tool_use call whose arguments follow the schema. No prose wrapping, no preamble. The 15% prose-wrapping leak collapses to 0% because the response is a tool_use block rather than free text, so the parser reads tool_use.input directly. Keep the validation-retry loop behind it to check the values themselves. Tagged to AP-INV-01.

### Q2. You notice line-item totals do not match the invoice header total. The model extracted all items correctly but the math is wrong. How do you prevent this from shipping bad data downstream?

Add a validation-retry loop. After parsing, compute sum(line_items[].total) in code. If it differs from total_amount by more than 0.01, send the specific error back to the model in a tool_result with is_error: true ('line items sum to 4950 but header total says 5000'); retry up to 3 times. About 95% of records converge on the second attempt because the model now sees what was wrong. Pair with currency enum and date sanity checks. Tagged to AP-INV-02.
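
The check half of that loop can be sketched as a function that returns specific error strings, which is the shape the batch harvest code expects (an empty list means the record passed). The field names follow the schema described in this scenario; `invoice_date`, the currency subset, and the message wording are assumptions for illustration.

```python
from datetime import date

# Assumed subset of ISO 4217 for illustration; a real enum would list every accepted code.
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP", "JPY"}
TOLERANCE = 0.01  # tolerance named in the text


def validate(record: dict) -> list[str]:
    """Return a list of specific error messages; an empty list means the record passed."""
    errors = []
    line_sum = round(sum(item["total"] for item in record["line_items"]), 2)
    # Round the difference so float noise does not trip the 0.01 tolerance.
    if round(abs(line_sum - record["total_amount"]), 2) > TOLERANCE:
        errors.append(
            f"line items sum to {line_sum:g} but header total says {record['total_amount']:g}"
        )
    if record["currency"] not in ALLOWED_CURRENCIES:
        errors.append(f"currency {record['currency']!r} is not in the allowed enum")
    if date.fromisoformat(record["due_date"]) < date.fromisoformat(record["invoice_date"]):
        errors.append("due_date is earlier than invoice_date")
    return errors
```

Each returned string is what would go back to the model in the tool_result with is_error: true. Production code handling money would probably use `decimal.Decimal` rather than floats.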

### Q3. An invoice arrives with no matching purchase order in your system. The vendor claims the PO was issued verbally last quarter. Should the agent approve based on the vendor reputation?

No. The three-way match service must find an active PO before the agent can approve. No PO means the invoice routes to human review with a structured exception block; the AP analyst either creates a retroactive PO and reprocesses, or rejects the invoice. The agent never bypasses the three-way match; verbal POs are not a valid input to the workflow. Tagged to AP-INV-03.
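
A minimal sketch of that gate, assuming the PO and goods receipt have already been looked up (or came back as `None`). The function name, the dictionary fields, and the reason codes are hypothetical; the 2% threshold is the one stated in this scenario. Vendor-name comparison is omitted to keep the example short.

```python
from typing import Optional


def three_way_match(invoice: dict, po: Optional[dict], receipt: Optional[dict],
                    max_variance: float = 0.02) -> dict:
    """Compare invoice, purchase order, and goods receipt; return a routing decision."""
    if po is None:
        return {"routing": "human-review", "reason": "no_matching_po"}
    if receipt is None:
        return {"routing": "human-review", "reason": "no_goods_receipt"}
    if po["total_amount"] <= 0:
        return {"routing": "human-review", "reason": "invalid_po_amount"}
    variance = abs(invoice["total_amount"] - po["total_amount"]) / po["total_amount"]
    if variance > max_variance:
        return {"routing": "human-review", "reason": "amount_variance",
                "variance": round(variance, 4)}
    # Goods receipt covering fewer items than the invoice bills for.
    if receipt["line_item_count"] < len(invoice["line_items"]):
        return {"routing": "human-review", "reason": "receipt_short"}
    return {"routing": "matched", "variance": round(variance, 4)}
```

The dictionary returned on failure is one possible shape for the structured exception block the analyst sees.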

### Q4. Your system prompt says 'never approve invoices above the vendor authorization cap'. Production logs show ~3% of approvals still exceed the cap. What is the architectural fix?

Move the constraint to a PreToolUse hook on approve_payment. The hook reads tool_input.vendor_id and tool_input.amount, queries the vendor master for vendor_ytd_spend + cap, exits 2 on violation with a structured stderr message including cap_remaining. Deterministic, not probabilistic. Pair with blocklist and duplicate checks in the same hook. Prompts leak; hooks do not. Tagged to AP-INV-04.
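
The cap check can be sketched as a pure decision function plus a thin entry point that reads the tool call from stdin and exits 0 or 2. The vendor record fields (`cap`, `ytd_spend`), the `vendor_lookup` callable, and the JSON message layout are assumptions; the reading of the rule here is that the call is blocked when the invoice amount exceeds the cap minus year-to-date spend.

```python
import json
import sys


def check_cap(tool_input: dict, vendor: dict) -> tuple[int, str]:
    """Return (exit_code, stderr_message): 0 allows the call, 2 blocks it."""
    cap_remaining = vendor["cap"] - vendor["ytd_spend"]
    if tool_input["amount"] > cap_remaining:
        message = json.dumps({
            "decision": "deny",
            "reason": "vendor_cap_exceeded",
            "vendor_id": tool_input["vendor_id"],
            "cap_remaining": cap_remaining,
        })
        return 2, message
    return 0, ""


def main(vendor_lookup) -> None:
    """Hook entry point: read the tool call JSON from stdin, then exit 0 or 2.

    vendor_lookup is a hypothetical callable mapping vendor_id to a vendor record.
    """
    payload = json.load(sys.stdin)
    tool_input = payload["tool_input"]
    code, message = check_cap(tool_input, vendor_lookup(tool_input["vendor_id"]))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

Keeping the decision in `check_cap` makes the cap-exceeded and all-pass cases unit-testable without spawning the hook subprocess.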

### Q5. A vendor re-sends the same invoice 30 days later because they did not see the payment confirmation. How does your agent prevent paying the same invoice twice?

PreToolUse hook on approve_payment queries the audit log for any row with the same (vendor_id, invoice_number) in the last 90 days. On match, exits 2 with the prior approval date and routes to a structured exception block. The check is stateless, auditable, and prevents race conditions when two parallel extractions hit the same invoice within seconds. The 90-day window is a configurable policy. Tagged to AP-INV-05.
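
A minimal sketch of the lookup, assuming the `audit_log` table from the audit-log step with `approved_at` stored as an ISO date. The `find_duplicate` name is hypothetical, and restricting the match to rows whose final_routing is 'approved' is an assumption made so the hook can report a prior approval date.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional


def find_duplicate(conn: sqlite3.Connection, vendor_id: str, invoice_number: str,
                   now: Optional[datetime] = None, window_days: int = 90) -> Optional[str]:
    """Return the prior approval date if this invoice was approved inside the window."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=window_days)).date().isoformat()
    row = conn.execute(
        "SELECT approved_at FROM audit_log "
        "WHERE vendor_id = ? AND invoice_number = ? "
        "AND final_routing = 'approved' AND approved_at >= ? "
        "ORDER BY approved_at DESC LIMIT 1",
        (vendor_id, invoice_number, cutoff),  # ISO dates compare correctly as strings
    ).fetchone()
    return row[0] if row else None
```

A non-`None` result would map to exit code 2 in the hook. A check-then-insert like this may still need a unique index or a transaction to be safe when two runs hit the same invoice at once.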

## FAQ

### Q1. How do you handle handwritten or scanned invoices with poor image quality?

Vision-capable extraction handles most cases. For edge invoices (rotated scans, faded ink, handwritten amendments), the validation-retry loop catches arithmetic mismatches and the three-way match catches structural issues. Records that fail after 3 retries route to human review with the original image attached. Stratified accuracy reporting by document-type quickly surfaces vendors whose invoices need a layout-aware preprocessing step.

### Q2. What happens if a vendor has multiple naming variations (Apple Inc, APPLE, Apple, Inc.)?

The vendor master holds the canonical vendor_id and a list of name variations. The extraction schema requires the model to extract the vendor as text; a normalization step (lowercase, strip punctuation, fuzzy match against the vendor master) resolves it to a vendor_id. The duplicate-detection hook keys on vendor_id, not the raw name, so naming variation does not break uniqueness.
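
The normalization step might look like the sketch below, using the standard-library `difflib` for the fuzzy match. The in-memory `VENDOR_NAMES` table, the vendor IDs, and the 0.85 cutoff are all illustrative assumptions; a real vendor master would live in a database.

```python
import difflib
import string
from typing import Optional

# Hypothetical vendor master: normalized name variation -> canonical vendor_id.
VENDOR_NAMES = {
    "apple inc": "V-APPLE",
    "apple": "V-APPLE",
    "acme supplies ltd": "V-ACME",
}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = name.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


def resolve_vendor_id(raw_name: str, cutoff: float = 0.85) -> Optional[str]:
    """Resolve extracted vendor text to a vendor_id, or None if nothing is close enough."""
    key = normalize(raw_name)
    if key in VENDOR_NAMES:  # exact match on a known variation
        return VENDOR_NAMES[key]
    close = difflib.get_close_matches(key, list(VENDOR_NAMES), n=1, cutoff=cutoff)
    return VENDOR_NAMES[close[0]] if close else None
```

Returning `None` for an unresolved name gives the pipeline a clear signal to route the invoice to human review instead of guessing a vendor.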

### Q3. Can the agent process multi-currency invoices in one workflow?

Yes. The schema enforces currency as an ISO 4217 enum. The cap policy and duplicate detection key on vendor_id and amount in the invoice currency; the cap can be denominated per-vendor in the vendor master. For consolidated reporting, a daily FX-rate table converts to a base currency at audit-log write time.

### Q4. How do you handle credit memos (negative invoices)?

Credit memos use the same schema with total_amount representing the credit (positive number) and a separate document_type enum field that distinguishes invoice from credit_memo. The PreToolUse hook treats credit memos as vendor_ytd_spend - amount (effectively decreasing YTD spend). Three-way match runs against the original invoice and the credit-memo reason code instead of a PO and GRN.

### Q5. Should the agent auto-approve, or always route to human review?

Auto-approve only when all gates pass: schema valid, semantic validation passed, three-way match within thresholds, PreToolUse hook approved (cap, blocklist, duplicate). Any failure routes to human review with a structured exception block. Auto-approval rate at steady state is typically 75-85%; the remaining 15-25% needs an analyst's eye. The point of the agent is not to remove the analyst; it is to make the analyst's queue much smaller and every queued invoice well-explained.
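
The all-gates rule can be written as one small routing function. This is a sketch with hypothetical names: the gate labels, the `route` signature, and the layout of the exception block are assumptions, and the inputs are taken to be the outputs of the earlier checks.

```python
def route(schema_valid: bool, validation_errors: list, match_result: dict,
          hook_exit_code: int) -> dict:
    """Auto-approve only when every gate passes; otherwise build an exception block."""
    failed = []
    if not schema_valid:
        failed.append("schema")
    if validation_errors:
        failed.append("semantic_validation")
    if match_result.get("routing") != "matched":
        failed.append("three_way_match")
    if hook_exit_code != 0:
        failed.append("pre_tool_use_hook")
    if failed:
        return {
            "routing": "human-review",
            "exception": {
                "failed_gates": failed,
                "validation_errors": validation_errors,
                "match": match_result,
            },
        }
    return {"routing": "approved"}
```

Listing every failed gate, not only the first, is a design choice that keeps each queued invoice well-explained for the analyst.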

### Q6. How long do you retain the audit log?

At least 7 years for financial-record compliance (US SOX, EU equivalent). Append-only schema; immutable rows; indexed by vendor_id, invoice_number, and date. Replay tool reconstructs any approval decision in seconds when finance asks 'why did we approve this in May?' three months later.

### Q7. Is Batch API worth using for fewer than 1000 invoices a night?

Sometimes. Batch API gives a flat 50% discount but with a 24-hour SLA. For under 500 invoices, the batch overhead and the latency may not be worth it; sync extraction is the simpler choice when AP needs same-day processing, even though it costs more per token. For nightly backfills of historical invoices or large vendor consolidations (more than 1000 documents), Batch API earns its keep.

## Production readiness

- [ ] JSON schema versioned in source control; PR-reviewed before deploy
- [ ] Vendor master kept current; cap and blocklist updates flow through change control
- [ ] Validation-retry loop unit-tested for line-total mismatch, currency drift, date inversion
- [ ] Three-way match service tested against synthetic PO + GRN cases including 1.9% and 2.1% variance edge cases
- [ ] PreToolUse hook unit-tested for cap exceeded, blocklisted vendor, duplicate within 90 days, all three pass
- [ ] PostToolUse audit log retention confirmed at 7 years; index on (vendor_id, invoice_number, date)
- [ ] Schema cache hit rate monitored; alert if drops below 50%
- [ ] Stratified accuracy dashboard by vendor and document type; alert on any vendor below 90% pass rate
- [ ] Human-review queue with SLA documented and on-call for invoices held more than 48 hours
- [ ] Batch API job for nightly backfill with auto-resubmit on transient failures

---

**Source:** https://claudearchitectcertification.com/scenarios/invoice-processing-agent
**Vault sources:** P3.6 structured-data-extraction (canonical extraction patterns); P3.7 agentic-tool-design (PreToolUse hook lifecycle); P3.8 long-document-processing (multi-page invoice handling); P3.5 claude-code-for-cicd (Batch API for bulk runs); concepts/vision-multimodal (PDF and image input); concepts/structured-outputs (forced tool_choice patterns); concepts/hooks (PreToolUse + PostToolUse policy enforcement)
**Last reviewed:** 2026-05-05

**Evidence tiers:** 🟢 official Anthropic doc · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed.
