P3.14 · D2 + D5 · Process

Invoice Processing Agent.

Think of this as the agent that handles your accounts-payable inbox without the team paying the same invoice twice. A vendor emails a PDF or scanned image. The agent extracts the structured fields (vendor, invoice number, line items, total, currency, due date, PO reference) through a strict schema with explicit escape hatches (a nullable PO reference, an 'unclear' currency value), so it is never forced to invent a value. It then checks the math (line totals must equal the header total) and looks up the matching purchase order and goods receipt to make sure the three documents agree. Next it asks a deterministic policy hook whether the vendor still has authorization headroom and whether this exact invoice has been seen in the last 90 days. Only then does it approve payment. Anything ambiguous routes to a human AP analyst with a structured exception block. The point is that AP automation is one wrong cap-policy or duplicate-detection check away from a real money loss.

26 min build · 5 components · 8 concepts

An AP-automation agent that wraps four guarantees around invoice approval. (1) Forced tool_use with a strict JSON schema (vendor_id, invoice_number, invoice_date, line_items[], total_amount, currency ISO 4217, due_date ISO 8601, PO_reference nullable) eliminates prose wrapping. Its nullable fields and 'unclear' currency value also give the model an alternative to fabricating. (2) A validation-retry loop confirms sum(line_items) == total, currency in the ISO 4217 enum, and due_date >= invoice_date. (3) A three-way match reconciles the invoice with the purchase order and the goods receipt; variance > 2% routes to human review. (4) A PreToolUse hook on approve_payment denies in three cases: if vendor_ytd_spend + amount > vendor_authorization_cap, if the vendor is on the blocklist, or if (vendor_id, invoice_number) was seen in the last 90 days (duplicate detection). A PostToolUse audit log captures every approval and rejection. The most-tested distractor: prompt-only field extraction leaks ~15% on edge invoices, and forced tool_choice is the only credible architecture.

33% exam weight
Source: Applied scenario. Community-derived from AP-automation patterns; composes P3.6 + P3.7 + P3.8 in a finance workflow.
Stack
Claude SDK. Vision-capable model for PDF / image. PO + GRN systems of record. Durable audit log.
Needs
Forced tool_choice. Validation-retry. PreToolUse hooks. Three-way match.
Exam
33% of CCA-F (D2 + D5): 18% D2 · 15% D5. Highest-weight scenario on the test; master this one and you've covered a third of the exam weight.
01 · Problem framing

The problem

What the customer needs

  1. Schema-conformant extraction on every invoice: vendor, number, line items, total, currency, due date, PO reference. No prose wrapping; downstream systems must parse cleanly.
  2. Three-way match before approval: invoice, purchase order, goods receipt all agree on amount, vendor, and quantities.
  3. Cap-policy enforcement that cannot be bypassed by clever invoice phrasing: vendor authorization caps, duplicate detection, blocklisted-vendor checks.
  4. Audit-grade trail of every approval and rejection so finance can replay any decision in a quarterly close.

Why naive approaches fail

  1. Prompt 'output JSON' for invoice extraction: ~15% leakage on edge invoices (handwritten notes, mixed languages, credit memos, rotated scans).
  2. Single-pass extraction with no semantic validation: line totals do not match the header; corrupted records ship downstream.
  3. No three-way match: the agent approves an invoice for goods that were never received, or against a PO that does not exist.
  4. Cap policy in the system prompt: ~3% of approvals exceed the authorization cap because prompts leak under unusual phrasing.
  5. No duplicate-invoice check: the same invoice number gets paid twice when the vendor re-sends after a delivery confirmation.
Definition of done
  • Forced tool_choice: { type: 'tool', name: 'extract_invoice' } on every extraction call.
  • JSON schema requires vendor_id, invoice_number, line_items[], total_amount, currency (ISO 4217 enum), due_date (ISO 8601), PO_reference (nullable).
  • Validation-retry loop confirms sum(line_items) == total, currency in enum, due_date >= invoice_date.
  • Three-way match service reconciles invoice + PO + GRN; variance > 2% routes to human review.
  • PreToolUse hook on approve_payment: deny on cap exceeded, vendor blocklisted, or duplicate (vendor_id, invoice_number) in the last 90 days.
  • PostToolUse audit log writes every approval / rejection / hook decision.
02 · Architecture

The system

03 · Component detail

What each part does

5 components, each owning one concept.

Invoice JSON Schema

the contract, in tools[0].input_schema

The output shape lives inside a tool definition, not as freeform text. Required: vendor_id, invoice_number, invoice_date (ISO 8601 string), line_items[], total_amount, currency (ISO 4217 enum), due_date (ISO 8601 string). Optional and nullable: PO_reference, tax_amount, notes. Every numeric field has minimum: 0. Every line item has description, quantity, unit_price, and total.

Configuration

tools = [{ name: 'extract_invoice', input_schema: { type: 'object', properties: { vendor_id: {type: 'string'}, invoice_number: {type: 'string'}, invoice_date: {type: 'string', format: 'date'}, total_amount: {type: 'number', minimum: 0}, currency: {type: 'string', enum: ['USD', 'EUR', 'GBP', 'INR', 'JPY', 'unclear']}, due_date: {type: 'string', format: 'date'}, line_items: {type: 'array', items: {...}}, PO_reference: {type: ['string', 'null']} }, required: ['vendor_id', 'invoice_number', 'invoice_date', 'total_amount', 'currency', 'due_date', 'line_items'] } }]

Concept: structured-outputs

Forced tool_use Extractor

tool_choice: { type: 'tool', name: 'extract_invoice' }

Forces the model to fire extract_invoice with arguments matching the schema. No prose preamble, no probabilistic adherence. Vision-capable invocation reads the PDF or image; the model emits a structured tool_use. Pair with few-shot examples that show currency: 'unclear' on truly ambiguous source.

Configuration

tool_choice: { type: 'tool', name: 'extract_invoice' }. Use auto only on triage-style flows. Forced is for mandatory extraction.

Concept: tool-choice
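Forcing the tool guarantees a tool_use block, but not a complete one. If generation hits max_tokens partway through the arguments, stop_reason is 'max_tokens' and the tool input may be truncated. The minimal guard below refuses to trust the arguments unless stop_reason is 'tool_use'. The names forced_tool_input and ExtractionTruncated are hypothetical. The SimpleNamespace objects are stand-ins shaped like SDK responses, not SDK types.

```python
from types import SimpleNamespace
from typing import Any


class ExtractionTruncated(Exception):
    """Raised when the forced tool call may be incomplete (hypothetical name)."""


def forced_tool_input(resp: Any, tool_name: str = "extract_invoice") -> dict:
    """Return the forced tool's arguments, refusing possibly truncated responses."""
    if resp.stop_reason != "tool_use":
        # e.g. "max_tokens": the tool_use block may be cut off mid-arguments.
        raise ExtractionTruncated(
            f"stop_reason={resp.stop_reason!r}; retry with a larger max_tokens"
        )
    for block in resp.content:
        if block.type == "tool_use" and block.name == tool_name:
            return block.input
    raise RuntimeError(f"no {tool_name} tool_use block in response")


# Stand-in objects shaped like SDK responses, for illustration only.
ok = SimpleNamespace(
    stop_reason="tool_use",
    content=[SimpleNamespace(type="tool_use", name="extract_invoice", input={"vendor_id": "V-1"})],
)
cut = SimpleNamespace(stop_reason="max_tokens", content=[])

print(forced_tool_input(ok))  # {'vendor_id': 'V-1'}
```

In the real harness, you would pass the object returned by client.messages.create straight into the guard. A truncation then becomes a retry with more max_tokens instead of a half-parsed record.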

Validation-Retry Loop

sum check, currency enum, date sanity

Schema enforces shape; code enforces meaning. After parsing, the code checks four things:

- sum(line_items[].total) == total_amount, within a 0.01 tolerance (one cent) for per-line rounding
- currency is in the enum
- due_date parses as YYYY-MM-DD
- due_date >= invoice_date

On failure, feed the specific error back to the model ('line totals sum to 4950 but header total is 5000'). Typical convergence is 1-2 retries.

Configuration

loop: extract -> parse -> validate_semantically -> on failure, append the assistant turn, then { role: 'user', content: [tool_result with is_error: true and the specific error] } -> retry. Max retries: 3. After 3, route to human review.

Concept: evaluation

Three-Way Match Service

invoice + PO + goods receipt

Queries the PO master and the goods-receipt ledger by PO_reference. Compares amount (variance <= 2% OK for FX rounding and small price changes), vendor identity (normalized vendor name fuzzy match), line-item count (must match), and date sanity (invoice date >= PO date; receipt date >= PO date). Variance above thresholds returns a structured exception; invoice is held pending human review.

Configuration

match(invoice, po, grn) -> { match: bool, variance_pct, mismatched_fields[], routed_to: 'auto-approve' | 'human-review' }. Threshold: amount variance > 2% -> human-review. Vendor mismatch -> human-review. Line-item count mismatch -> human-review.

Concept: evaluation
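The line-item count check catches missing lines but not short shipments. An invoice and a GRN can both have three lines while one of those lines was only partially received. A per-line quantity comparison closes that gap. This sketch assumes each line carries a hypothetical `sku` key shared by invoice and GRN. The extraction schema in this section does not define one, so in practice you would join on whatever line identifier your PO and GRN systems share.

```python
def quantity_mismatches(invoice_lines: list[dict], grn_lines: list[dict]) -> list[str]:
    """Flag lines billed for more than was received, keyed by a hypothetical 'sku'."""
    received: dict[str, float] = {}
    for line in grn_lines:
        # Sum in case the GRN records one SKU across several receipts.
        received[line["sku"]] = received.get(line["sku"], 0) + line["quantity"]

    issues = []
    for line in invoice_lines:
        got = received.get(line["sku"], 0)
        if line["quantity"] > got:
            issues.append(f"{line['sku']}: billed {line['quantity']} but received {got}")
    return issues


invoice_lines = [{"sku": "A1", "quantity": 10}, {"sku": "B2", "quantity": 5}]
grn_lines = [{"sku": "A1", "quantity": 10}, {"sku": "B2", "quantity": 3}]
print(quantity_mismatches(invoice_lines, grn_lines))
# ['B2: billed 5 but received 3']
```

A non-empty result would feed the same human-review routing as the other mismatches. If one SKU can appear on several invoice lines, aggregate the invoice side the same way as the GRN side.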

PreToolUse Cap and Duplicate Hook

deterministic policy gate before approve_payment

Sits between the model's tool_use for approve_payment and actual execution. Reads tool_input.vendor_id, tool_input.amount, and tool_input.invoice_number, then runs three checks. (1) Cap: vendor_ytd_spend + amount <= vendor_authorization_cap. (2) Blocklist: vendor not in the active blocklist. (3) Duplicate: no row in the audit log with the same (vendor_id, invoice_number) in the last 90 days. If any check fails, the hook exits with code 2 and a structured stderr message. The agent observes the denial as a tool_result with is_error: true and routes to a structured exception block for the AP analyst.

Configuration

matcher: 'approve_payment'. Hook exits 2 with stderr { reason: 'cap_exceeded' | 'vendor_blocklisted' | 'duplicate_detected', detail: ..., recommended_action: ... }. SDK forwards stderr to the model as a tool_result with is_error: true.

Concept: hooks
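On the agent side, a denied call comes back as an error tool_result whose text is the hook's stderr. A small parser can turn that JSON into the structured exception block the AP analyst sees. The function name and the exception-block field names (hold_reason, routed_to, and so on) are hypothetical. The sketch falls back to an 'unparsed' reason if the hook ever prints plain text instead of JSON.

```python
import json

KNOWN_REASONS = {"cap_exceeded", "vendor_blocklisted", "duplicate_detected"}


def exception_block(invoice_number: str, hook_stderr: str) -> dict:
    """Convert a PreToolUse denial message into an AP exception record."""
    try:
        msg = json.loads(hook_stderr)
    except json.JSONDecodeError:
        msg = None
    if not isinstance(msg, dict):
        # Plain-text or unexpected stderr: keep the raw text for the analyst.
        msg = {"reason": "unparsed", "detail": hook_stderr.strip()}
    reason = msg.get("reason")
    return {
        "invoice_number": invoice_number,
        "hold_reason": reason if reason in KNOWN_REASONS else "unparsed",
        "detail": msg.get("detail", ""),
        "recommended_action": msg.get("recommended_action", "manual review"),
        "routed_to": "human-review",
    }


stderr = json.dumps({"reason": "duplicate_detected", "detail": "approved on 2024-05-02"})
print(exception_block("INV-1042", stderr)["hold_reason"])  # duplicate_detected
```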
04 · One concrete run

Data flow

05 · Build it

Eight steps to production

01

Author the invoice JSON schema as a tool definition

Define the output shape in tools[0].input_schema, and list every required field in required[]. Currency is an enum that includes an 'unclear' escape hatch. PO_reference is ['string', 'null'] because cash invoices and credit memos have no PO. Every numeric field has minimum: 0. Line items are an array with description, quantity, unit_price, and total. The schema is the contract; everything downstream depends on it being right.

from anthropic import Anthropic
client = Anthropic()

EXTRACT_INVOICE_TOOL = {
    "name": "extract_invoice",
    "description": "Extract a structured invoice record from a PDF or image.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor_id": {"type": "string"},
            "invoice_number": {"type": "string"},
            "invoice_date": {"type": "string", "format": "date"},
            "due_date": {"type": "string", "format": "date"},
            "currency": {
                "type": "string",
                "enum": ["USD", "EUR", "GBP", "INR", "JPY", "unclear"],
            },
            "total_amount": {"type": "number", "minimum": 0},
            "tax_amount": {"type": ["number", "null"], "minimum": 0},
            "PO_reference": {"type": ["string", "null"]},
            "line_items": {
                "type": "array",
                "minItems": 1,
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "quantity": {"type": "number", "minimum": 0},
                        "unit_price": {"type": "number", "minimum": 0},
                        "total": {"type": "number", "minimum": 0},
                    },
                    "required": ["description", "quantity", "unit_price", "total"],
                },
            },
        },
        "required": [
            "vendor_id", "invoice_number", "invoice_date", "due_date",
            "currency", "total_amount", "line_items",
        ],
    },
}
↪ Concept: structured-outputs
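Because everything downstream depends on the schema, a build-time check can enforce the conventions above. Every required key must exist in properties, and every numeric field must declare minimum: 0. The sketch below is a plain-Python lint (lint_schema is a hypothetical helper) that walks nested objects and array items. It uses a trimmed sample schema so it runs on its own; in the build you would pass EXTRACT_INVOICE_TOOL["input_schema"] instead.

```python
def lint_schema(schema: dict, path: str = "$") -> list[str]:
    """Return convention violations found in a JSON-schema-like dict."""
    problems = []
    types = schema.get("type")
    types = types if isinstance(types, list) else [types]
    # Numeric fields (including nullable ones) must declare minimum: 0.
    if "number" in types and schema.get("minimum") != 0:
        problems.append(f"{path}: numeric field lacks minimum: 0")
    if "object" in types:
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in props:
                problems.append(f"{path}: required key {key!r} missing from properties")
        for key, sub in props.items():
            problems.extend(lint_schema(sub, f"{path}.{key}"))
    if "array" in types and "items" in schema:
        problems.extend(lint_schema(schema["items"], f"{path}[]"))
    return problems


sample = {
    "type": "object",
    "properties": {
        "total_amount": {"type": "number", "minimum": 0},
        "tax_amount": {"type": ["number", "null"]},  # missing minimum
        "line_items": {"type": "array", "items": {"type": "object", "properties": {}}},
    },
    "required": ["total_amount", "invoice_date"],  # invoice_date not defined
}
for problem in lint_schema(sample):
    print(problem)
```

Run against EXTRACT_INVOICE_TOOL["input_schema"] from the step above, the lint returns an empty list. Wiring it into CI makes a schema PR fail fast when someone adds an unconstrained number.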
02

Force tool_choice and run extraction with vision input

Set tool_choice: { type: 'tool', name: 'extract_invoice' } so the model has no choice but to fire the tool with arguments matching the schema. Pass the invoice as a vision input (PDF page rasterized to image, or direct image upload). The model emits a structured tool_use; the harness extracts tool_use.input as the candidate record.

import base64

def extract_invoice(invoice_image_bytes: bytes, mime_type: str = "image/png") -> dict:
    image_b64 = base64.b64encode(invoice_image_bytes).decode("ascii")
    resp = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        tools=[EXTRACT_INVOICE_TOOL],
        tool_choice={"type": "tool", "name": "extract_invoice"},
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {"type": "base64", "media_type": mime_type, "data": image_b64},
                    },
                    {"type": "text", "text": "Extract this invoice into the schema."},
                ],
            }
        ],
    )
    for block in resp.content:
        if block.type == "tool_use" and block.name == "extract_invoice":
            return block.input
    raise RuntimeError("forced tool_choice did not yield tool_use")
↪ Concept: tool-choice
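Rasterizing is not the only route. The Messages API also accepts PDFs directly as a `document` content block with a base64 source and media_type 'application/pdf'. A small helper can pick the block type from the MIME type, so one extraction call handles both scans and native PDFs. The helper name invoice_content_block is hypothetical; the accepted image types listed are the ones the API documents for image blocks.

```python
import base64

IMAGE_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def invoice_content_block(data: bytes, mime_type: str) -> dict:
    """Build the Messages API content block for an invoice file."""
    source = {
        "type": "base64",
        "media_type": mime_type,
        "data": base64.b64encode(data).decode("ascii"),
    }
    if mime_type == "application/pdf":
        return {"type": "document", "source": source}  # native PDF input
    if mime_type in IMAGE_TYPES:
        return {"type": "image", "source": source}  # scanned or photographed invoice
    raise ValueError(f"unsupported invoice MIME type: {mime_type}")


block = invoice_content_block(b"%PDF-1.7 ...", "application/pdf")
print(block["type"])  # document
```

The returned dict drops into the user message's content list in place of the hard-coded image block in extract_invoice.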
03

Wrap extraction in a validation-retry loop

Schema guarantees structure; semantics need code. After parsing, validate: sum(line_items[].total) equals total_amount within 0.01 tolerance; currency in the enum; due_date format and >= invoice_date. On failure, feed the specific error back via tool_result with is_error: true so the model sees what was wrong; retry up to 3 times. Most failures converge in 1-2 retries because the model now knows what the validator rejected.

import base64
from datetime import date

def validate(record: dict) -> list[str]:
    errors = []
    items_sum = sum(it.get("total", 0) for it in record.get("line_items", []))
    header_total = record.get("total_amount", 0)
    # 0.01 tolerance (one cent) absorbs per-line rounding on the invoice.
    if abs(items_sum - header_total) > 0.01:
        errors.append(
            f"line items sum to {items_sum:.2f} but total_amount is "
            f"{header_total:.2f}; reconcile"
        )
    if record.get("currency") not in {"USD", "EUR", "GBP", "INR", "JPY", "unclear"}:
        errors.append(f"currency {record.get('currency')!r} not in ISO 4217 enum")
    try:
        inv_date = date.fromisoformat(record.get("invoice_date", ""))
        due_date = date.fromisoformat(record.get("due_date", ""))
        if due_date < inv_date:
            errors.append(
                f"due_date {due_date} is before invoice_date {inv_date}"
            )
    except (TypeError, ValueError) as e:
        # ValueError: malformed date string; TypeError: null instead of a string.
        errors.append(f"date parse failed: {e}")
    return errors

def extract_with_retry(invoice_image_bytes: bytes, max_retries: int = 3) -> dict:
    image_b64 = base64.b64encode(invoice_image_bytes).decode("ascii")
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text", "text": "Extract this invoice into the schema."},
        ],
    }]
    for attempt in range(max_retries):
        resp = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            tools=[EXTRACT_INVOICE_TOOL],
            tool_choice={"type": "tool", "name": "extract_invoice"},
            messages=messages,
        )
        tool_use = next(b for b in resp.content if b.type == "tool_use")
        record = tool_use.input
        errors = validate(record)
        if not errors:
            return record
        messages.append({"role": "assistant", "content": resp.content})
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": "Validation failed: " + "; ".join(errors) + ". Re-extract.",
                "is_error": True,
            }],
        })
    raise ValueError(f"extraction did not converge in {max_retries} attempts")
↪ Concept: evaluation
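The validator above sums floats, which usually works at a 0.01 tolerance but can drift on long invoices. One alternative uses the standard-library `decimal` module. It converts each amount through its string form, rounds to cents, and then applies the same one-cent tolerance, so the comparison is exact in currency units. The sketch assumes two-decimal currencies. JPY amounts are whole numbers and still compare correctly, but a three-decimal currency would need its own quantum. The helper names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

CENT = Decimal("0.01")


def to_cents(value) -> Decimal:
    """JSON number -> Decimal rounded to cents (via str to avoid float artifacts)."""
    return Decimal(str(value)).quantize(CENT, rounding=ROUND_HALF_UP)


def line_sum_error(record: dict) -> Optional[str]:
    """Return a retry-ready error message, or None if the totals reconcile."""
    items_sum = sum((to_cents(it["total"]) for it in record["line_items"]), Decimal("0"))
    header = to_cents(record["total_amount"])
    if abs(items_sum - header) > CENT:  # same one-cent tolerance, computed exactly
        return f"line items sum to {items_sum} but total_amount is {header}; reconcile"
    return None


record = {"total_amount": 0.3, "line_items": [{"total": 0.1}, {"total": 0.2}]}
print(line_sum_error(record))  # None
```

The returned string has the same shape as the validator's message, so it can be dropped into the is_error tool_result unchanged.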
04

Run a three-way match against PO and goods receipt

Query the PO master by PO_reference and the goods-receipt ledger by the same key. Compare amount (variance <= 2% OK for FX rounding and minor price changes), vendor identity (normalized fuzzy match on vendor name), and line-item count (must match exactly). Variance above any threshold routes to human review with a structured exception block; otherwise auto-proceed.

def three_way_match(invoice: dict, po: dict, grn: dict) -> dict:
    """Reconcile invoice with purchase order and goods receipt note."""
    issues = []
    variance_pct = None
    inv_amount = invoice["total_amount"]
    po_amount = po.get("total_amount", 0)
    if po_amount > 0:
        variance_pct = abs(inv_amount - po_amount) / po_amount * 100
        # <= 2% tolerates FX rounding and small price changes; above that, hold.
        if variance_pct > 2.0:
            issues.append(
                f"amount variance {variance_pct:.2f}% exceeds 2% threshold"
            )
    else:
        # A missing or zero PO total means there is nothing to match against.
        issues.append("PO total missing or zero; cannot verify amount")

    if normalize_vendor(invoice["vendor_id"]) != normalize_vendor(po["vendor_id"]):
        issues.append(
            f"vendor mismatch: invoice {invoice['vendor_id']!r} "
            f"vs PO {po['vendor_id']!r}"
        )

    if len(invoice["line_items"]) != len(grn.get("line_items", [])):
        issues.append(
            f"line-item count mismatch: invoice {len(invoice['line_items'])} "
            f"vs GRN {len(grn.get('line_items', []))}"
        )

    return {
        "match": len(issues) == 0,
        "variance_pct": variance_pct,
        "issues": issues,
        "routed_to": "auto-approve" if not issues else "human-review",
    }

def normalize_vendor(name: str) -> str:
    return "".join(ch.lower() for ch in name if ch.isalnum())
↪ Concept: evaluation
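normalize_vendor makes the comparison insensitive to case and punctuation, but it is still an exact match after normalization. The component description, by contrast, calls for a fuzzy match. If the PO side carries a vendor name rather than an ID, one standard-library option is difflib.SequenceMatcher on the normalized strings. The 0.9 threshold here is an assumption to tune against your own vendor master. Anything below it should still route to human review rather than auto-approve.

```python
from difflib import SequenceMatcher


def normalize_vendor(name: str) -> str:
    return "".join(ch.lower() for ch in name if ch.isalnum())


def vendor_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized vendor names."""
    return SequenceMatcher(None, normalize_vendor(a), normalize_vendor(b)).ratio()


def vendors_match(a: str, b: str, threshold: float = 0.9) -> bool:
    # Threshold is an assumed starting point, not a recommended constant.
    return vendor_similarity(a, b) >= threshold


print(vendors_match("Acme Corp.", "ACME CORP"))      # True
print(vendors_match("Acme Corp", "Apex Supplies"))  # False
```

Legal suffixes pull the score down: 'Acme Corp' vs 'Acme Corp Inc' scores 16/19, about 0.84, and fails the 0.9 bar. Stripping common suffixes during normalization is therefore a natural refinement.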
05

Wire the PreToolUse cap and duplicate-detection hook

Hook on approve_payment, running three checks. (1) Cap: vendor_ytd_spend + amount <= vendor_authorization_cap. (2) Blocklist: vendor not on the active blocklist. (3) Duplicate: no audit-log row with the same (vendor_id, invoice_number) in the last 90 days. If any check fails, the hook exits with code 2 and a structured stderr message; the agent observes the denial and routes to an exception block for the AP analyst. The gate is deterministic, so there is no prompt-injection bypass.

# .claude/hooks/invoice_approval.py
# PreToolUse hook: receives the pending tool call as JSON on stdin.
# Exit 0 lets approve_payment run; exit 2 blocks it and hands stderr to the model.
import json
import os
import sqlite3
import sys
from datetime import date, timedelta

DB = sqlite3.connect(os.environ.get("AUDIT_DB", "audit.sqlite3"))

def vendor_cap_check(vendor_id: str, amount: float) -> dict | None:
    row = DB.execute(
        "SELECT cap, ytd_spend FROM vendor_master WHERE vendor_id = ?",
        (vendor_id,),
    ).fetchone()
    if not row:
        # No master record means no cap on file: fail closed.
        return {
            "reason": "cap_exceeded",
            "detail": f"vendor {vendor_id!r} not in vendor master; no cap on file",
            "recommended_action": "escalate to AP analyst",
        }
    cap, ytd = row
    if ytd + amount > cap:
        return {
            "reason": "cap_exceeded",
            "detail": (
                f"ytd_spend={ytd:.2f} + amount={amount:.2f} > cap={cap:.2f}; "
                f"cap_remaining={cap - ytd:.2f}"
            ),
            "recommended_action": "hold for AP analyst review",
        }
    return None

def blocklist_check(vendor_id: str) -> dict | None:
    row = DB.execute(
        "SELECT 1 FROM vendor_blocklist WHERE vendor_id = ?", (vendor_id,)
    ).fetchone()
    if row:
        return {
            "reason": "vendor_blocklisted",
            "detail": f"vendor {vendor_id!r} on active blocklist",
            "recommended_action": "hold for AP analyst review",
        }
    return None

def duplicate_check(vendor_id: str, invoice_number: str) -> dict | None:
    cutoff = (date.today() - timedelta(days=90)).isoformat()
    row = DB.execute(
        "SELECT approved_at FROM audit_log WHERE vendor_id = ? "
        "AND invoice_number = ? AND approved_at >= ? ORDER BY approved_at DESC LIMIT 1",
        (vendor_id, invoice_number, cutoff),
    ).fetchone()
    if row:
        return {
            "reason": "duplicate_detected",
            "detail": f"same (vendor_id, invoice_number) approved on {row[0]}",
            "recommended_action": "reject this submission",
        }
    return None

def main():
    payload = json.load(sys.stdin)
    if payload.get("tool_name") != "approve_payment":
        sys.exit(0)
    inp = payload["tool_input"]
    # Run checks lazily and stop at the first failure.
    checks = (
        lambda: vendor_cap_check(inp["vendor_id"], inp["amount"]),
        lambda: blocklist_check(inp["vendor_id"]),
        lambda: duplicate_check(inp["vendor_id"], inp["invoice_number"]),
    )
    for check in checks:
        failure = check()
        if failure:
            # Structured stderr: { reason, detail, recommended_action }.
            print(json.dumps(failure), file=sys.stderr)
            sys.exit(2)
    sys.exit(0)

if __name__ == "__main__":
    main()
↪ Concept: hooks
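The duplicate check reads now and the approval is written later, so two parallel runs can both see 'no prior approval' before either records one. A common way to close that window is to let a UNIQUE constraint arbitrate. Each run tries to insert a reservation for (vendor_id, invoice_number), and only the first insert succeeds. This sketch uses sqlite3 and an assumed payment_reservations table that is not part of the hook above. It reserves permanently; honoring the 90-day window would need a period key in the constraint or a cleanup job.

```python
import sqlite3


def make_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the hypothetical reservations table with a uniqueness guarantee."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payment_reservations ("
        " vendor_id TEXT NOT NULL,"
        " invoice_number TEXT NOT NULL,"
        " reserved_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
        " UNIQUE (vendor_id, invoice_number))"
    )
    return conn


def reserve_payment(conn: sqlite3.Connection, vendor_id: str, invoice_number: str) -> bool:
    """Atomically claim an invoice; False means another run already claimed it."""
    try:
        with conn:  # commits on success, rolls back if the INSERT raises
            conn.execute(
                "INSERT INTO payment_reservations (vendor_id, invoice_number) VALUES (?, ?)",
                (vendor_id, invoice_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False


store = make_store()
print(reserve_payment(store, "V-77", "INV-1042"))  # True: first run wins
print(reserve_payment(store, "V-77", "INV-1042"))  # False: duplicate blocked
```

Inside the hook, a failed reservation would map to reason 'duplicate_detected', just like a hit from the read-only duplicate_check.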
06

Cache the schema and the vendor master

The schema is the largest stable token cost (~1500 tokens for invoice extraction). The vendor master (caps, blocklist, name normalization rules) is also stable per session. Mark both with cache_control: ephemeral so the 5-minute TTL, refreshed on every cache hit, keeps them warm across sustained AP traffic. Cache reads bill at 10% of the base input-token rate and cache writes at 1.25x. Realistic savings are therefore ~80% on cached portions once writes are amortized, and ~50% on overall steady-state cost.

def extract_with_cache(invoice_image_bytes: bytes, vendor_master_blob: str) -> dict:
    image_b64 = base64.b64encode(invoice_image_bytes).decode("ascii")
    resp = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system=[
            {
                "type": "text",
                "text": (
                    "You are an AP-automation extraction agent. Return only "
                    "structured tool_use; never prose."
                ),
                "cache_control": {"type": "ephemeral"},
            },
            {
                "type": "text",
                "text": vendor_master_blob,
                "cache_control": {"type": "ephemeral"},
            },
        ],
        tools=[
            {**EXTRACT_INVOICE_TOOL, "cache_control": {"type": "ephemeral"}},
        ],
        tool_choice={"type": "tool", "name": "extract_invoice"},
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
                {"type": "text", "text": "Extract this invoice into the schema."},
            ],
        }],
    )
    print(f"cache_creation: {resp.usage.cache_creation_input_tokens}")
    print(f"cache_read:     {resp.usage.cache_read_input_tokens}")
    return next(b.input for b in resp.content if b.type == "tool_use")
↪ Concept: prompt-caching
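The run-time gate 'alert if schema cache hit rate drops below 50%' needs a number to alert on. One reasonable definition, assumed here rather than taken from any Anthropic metric, is the share of cacheable prompt tokens served from cache: cache_read / (cache_read + cache_creation). That share is accumulated across requests from the usage fields printed in extract_with_cache. The CacheStats class is hypothetical, and the SimpleNamespace objects stand in for resp.usage.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class CacheStats:
    read_tokens: int = 0
    write_tokens: int = 0

    def record(self, usage) -> None:
        # The SDK usage fields can be None when a request touched no cache.
        self.read_tokens += usage.cache_read_input_tokens or 0
        self.write_tokens += usage.cache_creation_input_tokens or 0

    @property
    def hit_rate(self) -> float:
        total = self.read_tokens + self.write_tokens
        return self.read_tokens / total if total else 0.0

    def should_alert(self, floor: float = 0.5) -> bool:
        return self.hit_rate < floor


stats = CacheStats()
stats.record(SimpleNamespace(cache_read_input_tokens=0, cache_creation_input_tokens=1800))
for _ in range(3):
    stats.record(SimpleNamespace(cache_read_input_tokens=1800, cache_creation_input_tokens=0))
print(round(stats.hit_rate, 2), stats.should_alert())  # 0.75 False
```

An empty window reports 0.0 and therefore alerts. If quiet overnight periods are expected, only evaluate the alert once a minimum number of requests has been recorded.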
07

Use Batch API for overnight bulk runs

Sync API for inbox-arrival latency. For nightly backfills (10K invoices), the Batch API gives a flat 50% discount with a 24-hour SLA. Combined with schema and vendor-master caching (per-100-item sub-batches keep ephemeral cache warm), bulk extraction cost drops ~75% versus naive sync. Resubmit failures the next morning as a fresh batch with the specific error in the next message.

def submit_bulk_extraction(invoices: list[dict]) -> str:
    """Submit a batch of invoice extractions for overnight processing."""
    requests = []
    for inv in invoices:
        image_b64 = base64.b64encode(inv["image_bytes"]).decode("ascii")
        requests.append({
            "custom_id": f"extract-{inv['id']}",
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 2048,
                "tools": [EXTRACT_INVOICE_TOOL],
                "tool_choice": {"type": "tool", "name": "extract_invoice"},
                "messages": [{
                    "role": "user",
                    "content": [
                        {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
                        {"type": "text", "text": "Extract this invoice into the schema."},
                    ],
                }],
            },
        })
    batch = client.messages.batches.create(requests=requests)
    print(f"Batch {batch.id} submitted with {len(requests)} extractions")
    return batch.id

def harvest_batch(batch_id: str):
    batch = client.messages.batches.retrieve(batch_id)
    if batch.processing_status != "ended":
        return {"status": "not_ready"}
    accepted, rejected = [], []
    for r in client.messages.batches.results(batch_id):
        if r.result.type == "succeeded":
            tu = next(b for b in r.result.message.content if b.type == "tool_use")
            if not validate(tu.input):
                accepted.append(tu.input)
                continue
        rejected.append(r.custom_id)
    return {"accepted": accepted, "rejected_for_retry": rejected}
↪ Concept: batch-api
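Two details of submit_bulk_extraction matter at 10K scale. First, custom_id must be unique within a batch and, per the Message Batches API reference, match ^[a-zA-Z0-9_-]{1,64}$. Raw invoice IDs containing slashes or dots therefore need sanitizing. Second, the 'per-100-item sub-batches' mentioned above imply chunking the request list. The sketch below covers both with hypothetical helper names. Because sanitizing and truncating can map two IDs to the same custom_id, keep a mapping and check it for collisions before submitting.

```python
import re

CUSTOM_ID_RE = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")


def safe_custom_id(invoice_id: str, prefix: str = "extract-") -> str:
    """Map an arbitrary invoice ID onto the allowed custom_id alphabet and length."""
    cleaned = re.sub(r"[^a-zA-Z0-9_-]", "_", invoice_id)
    cid = (prefix + cleaned)[:64]
    if not CUSTOM_ID_RE.match(cid):
        raise ValueError(f"cannot build a valid custom_id from {invoice_id!r}")
    return cid


def chunked(items: list, size: int = 100) -> list[list]:
    """Split requests into sub-batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


print(safe_custom_id("2024/INV.1042"))              # extract-2024_INV_1042
print([len(c) for c in chunked(list(range(250)))])  # [100, 100, 50]
```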
08

Audit-log every approval, rejection, and hook decision

PostToolUse hook on every approve_payment call. Append a row to durable storage: timestamp, vendor_id, invoice_number, amount, currency, three-way-match outcome, hook decisions (cap, blocklist, duplicate), final routing (approved | human-review | denied). Retain at least 7 years for audit compliance. The audit log is the replay tool when finance asks 'why did we approve this in May?' three months later.

import datetime
import json
import os
import sqlite3

# Same database the PreToolUse hook reads for duplicate detection.
AUDIT_DB = sqlite3.connect(os.environ.get("AUDIT_DB", "audit.sqlite3"))
AUDIT_DB.execute("""
CREATE TABLE IF NOT EXISTS audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ts TEXT NOT NULL,
    vendor_id TEXT,
    invoice_number TEXT,
    amount REAL,
    currency TEXT,
    match_outcome TEXT,
    hook_decisions TEXT,
    final_routing TEXT,
    approved_at TEXT
)
""")
# Index backing the hook's 90-day (vendor_id, invoice_number) duplicate lookup.
AUDIT_DB.execute(
    "CREATE INDEX IF NOT EXISTS idx_audit_dup "
    "ON audit_log (vendor_id, invoice_number, approved_at)"
)

def audit(invoice: dict, match_result: dict, hook_decisions: dict, routing: str):
    """Append one row per decision; approved_at is set only for approvals."""
    AUDIT_DB.execute(
        "INSERT INTO audit_log (ts, vendor_id, invoice_number, amount, currency, "
        "match_outcome, hook_decisions, final_routing, approved_at) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            datetime.datetime.now(datetime.timezone.utc).isoformat(),
            invoice["vendor_id"],
            invoice["invoice_number"],
            invoice["total_amount"],
            invoice["currency"],
            json.dumps(match_result),
            json.dumps(hook_decisions),
            routing,
            datetime.date.today().isoformat() if routing == "approved" else None,
        ),
    )
    AUDIT_DB.commit()
↪ Concept: evaluation
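The replay question ('why did we approve this in May?') becomes a single indexed query once each row carries vendor_id, invoice_number, and the JSON-encoded match and hook decisions. The sketch below runs against an in-memory copy of the audit_log table, trimmed to the columns it reads, so it is self-contained. The replay function, the sample row, and the shape of the hook_decisions dict are illustrative assumptions.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE audit_log (ts TEXT, vendor_id TEXT, invoice_number TEXT, amount REAL,"
    " match_outcome TEXT, hook_decisions TEXT, final_routing TEXT)"
)
# Illustrative sample row, not real data.
db.execute(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("2024-05-14T09:12:03+00:00", "V-77", "INV-1042", 1980.0,
     json.dumps({"match": True, "variance_pct": 1.0}),
     json.dumps({"cap": "pass", "blocklist": "pass", "duplicate": "pass"}),
     "approved"),
)


def replay(conn: sqlite3.Connection, vendor_id: str, invoice_number: str) -> list[dict]:
    """Return every recorded decision for one invoice, oldest first."""
    rows = conn.execute(
        "SELECT ts, amount, match_outcome, hook_decisions, final_routing FROM audit_log"
        " WHERE vendor_id = ? AND invoice_number = ? ORDER BY ts",
        (vendor_id, invoice_number),
    ).fetchall()
    return [
        {"ts": ts, "amount": amount, "match": json.loads(match),
         "hooks": json.loads(hooks), "routing": routing}
        for ts, amount, match, hooks, routing in rows
    ]


for decision in replay(db, "V-77", "INV-1042"):
    print(decision["ts"], decision["routing"], decision["hooks"])
```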
06 · Configuration decisions

The four decisions

| Decision | Right answer | Wrong answer | Why |
| --- | --- | --- | --- |
| Output shape guarantee on extraction | Forced tool_choice with input_schema as the contract | Prompt 'output JSON' or 'respond with valid JSON only' | Prompt-only is probabilistic (~85% adherence); ~15% leakage on edge invoices (handwritten notes, mixed languages, credit memos, rotated scans). Forced tool_use is structural (100% adherence). The cost is identical; the reliability gap is decisive in finance. |
| Vendor authorization cap enforcement | PreToolUse hook reads vendor_ytd_spend, exits 2 on violation | System prompt: 'never approve above the vendor cap' | Prompts leak ~3% in production. Hooks are deterministic. For policy-bearing limits (cap, duplicate, blocklist), the deterministic gate is the only credible architecture. Prompt-only enforcement is a finding waiting to be flagged in the next audit. |
| Same invoice arriving twice (vendor re-sends after delivery) | PreToolUse duplicate-detection hook keyed on (vendor_id, invoice_number) over the last 90 days | Trust the model to notice duplicates in conversation context | Context memory is unreliable across multi-turn or batch runs. The hook is stateless and queries the audit log instead. On its own it is a check-then-act read, so back it with a unique constraint on (vendor_id, invoice_number). That stops two parallel extractions of the same invoice, seconds apart, from both being approved. |
| Bulk overnight processing of 10K invoices | Batch API + schema and vendor-master caching | Sync API in a tight loop, or sync API without caching | Batch API gives a flat 50% discount with a 24-hour SLA. Caching takes another ~80% off the schema and vendor-master tokens. Combined: ~75% savings versus naive sync. Sync API is reserved for inbox-arrival latency. |
07 · Failure modes

Where it breaks

Five failure pairs, each one an exam question. The fix is always architectural: deterministic gates, structured fields, pinned state.

Prompt-only field extraction

Prompt 'extract this invoice as JSON' leaks ~15% on edge invoices. Downstream parser breaks every seventh document; AP analyst spends the morning re-keying invoices the agent botched.

AP-INV-01
✅ Fix

Forced tool_choice: { type: 'tool', name: 'extract_invoice' } plus a strict JSON schema in tools[0].input_schema. The model has no choice but to fire the tool with arguments matching the schema. 100% structural adherence.

No semantic validation

Single-pass extraction with no math check. The model returns a structurally-valid record where line totals sum to 4950 but the header total says 5000. Bad data ships downstream; quarterly close finds the discrepancy three months later.

AP-INV-02
✅ Fix

Validation-retry loop. After parse, validate sum(line_items[].total) == total_amount (within 0.01 tolerance), currency in ISO 4217 enum, due_date >= invoice_date. On failure, feed the specific error back; retry up to 3 times; route to human review if still failing.

No three-way match

Agent approves an invoice that has no matching purchase order, or where the goods receipt was for fewer items, or where the vendor name on the invoice does not match the vendor on the PO. AP pays for goods never received, or pays the wrong vendor.

AP-INV-03
✅ Fix

Three-way match service queries PO master and goods-receipt ledger. Compares amount (variance <= 2% OK), normalized vendor name, line-item count. Variance above thresholds routes to human review with a structured exception block.

Cap policy in the system prompt

System prompt: 'never approve more than the vendor authorization cap'. Production logs show ~3% of approvals exceed the cap because the prompt language leaks under unusual phrasing or when the agent is processing many invoices in one session.

AP-INV-04
✅ Fix

PreToolUse hook on approve_payment reads tool_input.vendor_id and tool_input.amount, queries the vendor master for vendor_ytd_spend + cap, exits 2 on violation with a structured message including cap_remaining. Deterministic, not probabilistic.

No duplicate-invoice check

Vendor re-sends the same invoice number after delivery confirmation, or the same invoice is uploaded twice through different channels (email + portal). The agent approves both. AP discovers the duplicate payment in next month's reconciliation.

AP-INV-05
✅ Fix

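PreToolUse hook queries the audit log for any row with the same (vendor_id, invoice_number) in the last 90 days. On a match, it exits 2 with the prior approval date. The check is stateless and auditable. To stop two parallel runs from both passing it before either is recorded, back it with a unique constraint on (vendor_id, invoice_number).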

08 · Budget

Cost & latency

Per-invoice synchronous extraction (cached schema)
~$0.001 to $0.003

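Schema ~1500 tokens at cache-read price, plus image vision tokens (~1000-2000), plus ~150 output tokens. Sustained AP traffic with cache hits >= 70% keeps effective cost low and predictable. These ranges line up with a Haiku-class price tier. At Sonnet-class list prices (the model used in the code samples), the same token counts cost roughly 3x more.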

Three-way match service
~$0 token cost; ~10-30 ms latency

Pure SQL queries against PO master and goods-receipt ledger. No LLM call. Latency is dominated by the database round-trip.

PreToolUse hook overhead
~$0; ~5 ms latency

Subprocess reads stdin JSON, runs three SQL queries (vendor cap, blocklist, duplicate), exits 0 or 2. No LLM call. Latency below the noise floor of any tool dispatch.

Batch overnight (10K invoices, batch + caching)
~75% off naive sync

Batch API flat 50% discount times schema and vendor-master cache (~80% off cached portion). 10K invoices at typical complexity drop from ~$30 sync uncached to ~$8 batch cached.
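The ~75% figure comes from multiplying the two discounts, but only if caching covers enough of the spend. A back-of-envelope model makes this concrete. It assumes the batch discount applies to the whole request and the ~80% cache discount applies only to the cached share of the cost:

relative_cost = 0.5 × (1 − 0.8 × cached_share)

Solving for 75% savings gives a cached share of 62.5%. This illustrates the arithmetic only; it is not a pricing reference, and relative_cost is a hypothetical helper.

```python
def relative_cost(cached_share: float, batch_discount: float = 0.5,
                  cache_discount: float = 0.8) -> float:
    """Cost relative to naive uncached sync (1.0 = no savings)."""
    return (1 - batch_discount) * (1 - cache_discount * cached_share)


for share in (0.0, 0.5, 0.625, 0.8):
    print(f"cached share {share:.3f}: savings {1 - relative_cost(share):.0%}")
```

With nothing cached, the batch discount alone gives 50%. The 75% headline therefore depends on the schema and vendor master making up most of each request's input cost.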

Validation-retry overhead
~+25% on records that retry

5-10% of records retry once; 1-2% retry twice. Specific-error feedback converges quickly. Pipeline cost up ~5% to gain ~99% schema-conformance plus ~99% semantic-conformance.

Per-1000-invoices total (steady state)
~$1.00 to $3.00

Sync cached extraction at scale. Adding human review of unconverged records adds operator-time cost but recovers the long tail of edge invoices.

09 · Ship gates

Ship checklist

Two passes. Build-time gates verify the code; run-time gates verify the system in production.

Build-time

  1. Invoice JSON schema lives in tools[0].input_schema with required and nullable fields explicit (structured-outputs)
  2. tool_choice forced to extract_invoice on every extraction call (tool-choice)
  3. Currency field is an enum with an 'unclear' escape hatch (structured-outputs)
  4. Validation-retry loop with sum check, currency enum, date sanity (evaluation)
  5. Three-way match service against PO master and goods-receipt ledger (evaluation)
  6. PreToolUse cap-and-duplicate hook on approve_payment (hooks)
  7. Schema and vendor master cached with cache_control: ephemeral (prompt-caching)
  8. Batch API for nightly bulk runs of more than 100 invoices (batch-api)
  9. PostToolUse audit log writes every approval, rejection, and hook decision; 7-year retention
  10. Stratified accuracy reporting by vendor, currency, document type
  11. Human-review queue for invoices that fail validation, three-way match, or hook checks

Run-time

  • JSON schema versioned in source control; PR-reviewed before deploy
  • Vendor master kept current; cap and blocklist updates flow through change control
  • Validation-retry loop unit-tested for line-total mismatch, currency drift, date inversion
  • Three-way match service tested against synthetic PO and goods-receipt note (GRN) cases, including 1.9% and 2.1% variance edge cases
  • PreToolUse hook unit-tested for cap exceeded, blocklisted vendor, duplicate within 90 days, and the case where all three checks pass
  • PostToolUse audit log retention confirmed at 7 years; index on (vendor_id, invoice_number, date)
  • Schema cache hit rate monitored; alert if drops below 50%
  • Stratified accuracy dashboard by vendor and document type; alert on any vendor below 90% pass rate
  • Human-review queue with a documented SLA and on-call escalation for invoices held more than 48 hours
  • Batch API job for nightly backfill with auto-resubmit on transient failures
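Two of these run-time alerts reduce to small aggregations. A minimal Python sketch, assuming per-invoice pass/fail results tagged with vendor_id, and the `usage` objects returned by the Messages API (`input_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`). Measuring hit rate as the share of input tokens read from cache is one reasonable definition, not the only one.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Optional

PASS_RATE_FLOOR = 0.90   # alert on any vendor below a 90% pass rate
CACHE_HIT_FLOOR = 0.50   # alert if the cache hit rate drops below 50%


def vendors_below_floor(
    results: Iterable[tuple[str, bool]], floor: float = PASS_RATE_FLOOR
) -> dict[str, float]:
    """Map vendor_id to pass rate for every vendor whose rate is below floor."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for vendor_id, ok in results:
        total[vendor_id] += 1
        passed[vendor_id] += int(ok)
    rates = {v: passed[v] / total[v] for v in total}
    return {v: r for v, r in rates.items() if r < floor}


def cache_hit_rate(usages: Iterable[Mapping[str, Optional[int]]]) -> float:
    """Share of input tokens read from the prompt cache across API responses."""
    read = created = uncached = 0
    for usage in usages:
        # Cache fields can be missing or null, so treat both as zero.
        read += usage.get("cache_read_input_tokens") or 0
        created += usage.get("cache_creation_input_tokens") or 0
        uncached += usage.get("input_tokens") or 0
    total = read + created + uncached
    return read / total if total else 0.0
```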

10 · Question patterns

Five exam-pattern questions

Your invoice extraction agent uses prompt-only extraction. Production logs show ~15% of records arrive with prose wrapping ('Sure, here is the JSON:') and the downstream parser breaks. What is the architectural fix?
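Move the schema into tools[0].input_schema and set tool_choice: { type: 'tool', name: 'extract_invoice' }. This forces the model to answer with an extract_invoice tool_use block whose input is a JSON object shaped by the schema: no prose wrapping, no preamble, nothing to strip before parsing. The ~15% prose-wrapping leak disappears because there is no free text left to parse. Keep validating the input in code anyway, since forced tool use removes the wrapper but does not by itself guarantee that every field conforms or is correct; that is what the validation-retry loop is for. Tagged to AP-INV-01.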
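A minimal Python sketch of that fix, building the keyword arguments for a Messages API call. The currency list is abbreviated, `invoice_date` is included because the date-sanity check needs it, and `vendor_name` follows the FAQ's advice to extract the vendor as text; all field names here are assumptions rather than a fixed spec.

```python
from typing import Any

# Abbreviated for illustration: a real schema would list every ISO 4217 code
# the AP team accepts. "unclear" is the escape hatch for an unreadable currency.
CURRENCY_ENUM = ["USD", "EUR", "GBP", "JPY", "CAD", "unclear"]

EXTRACT_INVOICE_TOOL: dict[str, Any] = {
    "name": "extract_invoice",
    "description": "Record the fields printed on one vendor invoice. "
                   "Set po_reference to null if no PO number is printed.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor_name": {"type": "string"},
            "invoice_number": {"type": "string"},
            "invoice_date": {"type": "string", "description": "ISO 8601 date"},
            "due_date": {"type": "string", "description": "ISO 8601 date"},
            "currency": {"type": "string", "enum": CURRENCY_ENUM},
            "total_amount": {"type": "number"},
            "po_reference": {"type": ["string", "null"]},
            "line_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "description": {"type": "string"},
                        "total": {"type": "number"},
                    },
                    "required": ["description", "total"],
                },
            },
        },
        "required": [
            "vendor_name", "invoice_number", "invoice_date", "due_date",
            "currency", "total_amount", "po_reference", "line_items",
        ],
    },
}


def build_extraction_request(model: str, invoice_text: str) -> dict[str, Any]:
    """Keyword arguments for a Messages API call that must return extract_invoice."""
    return {
        "model": model,
        "max_tokens": 2048,
        "tools": [EXTRACT_INVOICE_TOOL],
        # Forced tool use: the reply must be an extract_invoice tool_use block.
        "tool_choice": {"type": "tool", "name": "extract_invoice"},
        "messages": [{"role": "user", "content": invoice_text}],
    }


def extract_tool_input(content: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the input of the extract_invoice tool_use block in a response."""
    for block in content:
        if block.get("type") == "tool_use" and block.get("name") == "extract_invoice":
            return block["input"]
    raise ValueError("response contains no extract_invoice tool_use block")
```

The dict would be passed as `client.messages.create(**request)`; with the official SDK, the response's content blocks can be converted to dicts (for example with `model_dump()`) before calling `extract_tool_input`.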
You notice line-item totals do not match the invoice header total. The model extracted all items correctly but the math is wrong. How do you prevent this from shipping bad data downstream?
Add a validation-retry loop. After parsing, compute sum(line_items[].total) in code. If it differs from total_amount by more than 0.01, send the specific error back to the model in a tool_result with is_error: true ('line items sum to 4950 but header total says 5000'); retry up to 3 times. About 95% of records converge on the second attempt because the model now sees what was wrong. Pair with currency enum and date sanity checks. Tagged to AP-INV-02.
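A minimal Python sketch of that loop. `extract` is a hypothetical callable standing in for the API round trip: it sends the conversation plus any feedback blocks and returns the new tool_use id and input. The currency allow-list is abbreviated, and "retry up to 3 times" is read as three retries after the first attempt.

```python
from datetime import date
from decimal import Decimal
from typing import Any, Callable, Optional

# Abbreviated ISO 4217 allow-list for illustration; "unclear" is deliberately
# absent so an unreadable currency fails validation and reaches a human.
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP", "JPY", "CAD"}
TOLERANCE = Decimal("0.01")

# Hypothetical callable: takes feedback blocks, returns (tool_use_id, input).
Extractor = Callable[[list[dict[str, Any]]], tuple[str, dict[str, Any]]]


def validate_invoice(inv: dict[str, Any]) -> list[str]:
    """Return specific error messages; an empty list means the record passes."""
    errors: list[str] = []
    line_sum = sum((Decimal(str(li["total"])) for li in inv["line_items"]), Decimal("0"))
    total = Decimal(str(inv["total_amount"]))
    if abs(line_sum - total) > TOLERANCE:
        errors.append(f"line items sum to {line_sum} but header total says {total}")
    if inv["currency"] not in ALLOWED_CURRENCIES:
        errors.append(f"currency {inv['currency']!r} is not an accepted ISO 4217 code")
    try:
        issued = date.fromisoformat(inv["invoice_date"])
        due = date.fromisoformat(inv["due_date"])
        if due < issued:
            errors.append(f"due_date {due} is before invoice_date {issued}")
    except (TypeError, ValueError):
        errors.append("invoice_date and due_date must be ISO 8601 dates (YYYY-MM-DD)")
    return errors


def extract_with_retry(
    extract: Extractor, max_retries: int = 3
) -> tuple[Optional[dict[str, Any]], int]:
    """Return (invoice, retries_used); invoice is None if it never converged."""
    feedback: list[dict[str, Any]] = []
    for attempt in range(max_retries + 1):
        tool_use_id, inv = extract(feedback)
        errors = validate_invoice(inv)
        if not errors:
            return inv, attempt
        # Tell the model exactly what was wrong with its last tool call.
        feedback = [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "is_error": True,
            "content": "; ".join(errors),
        }]
    return None, max_retries  # unconverged: route to the human-review queue
```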
An invoice arrives with no matching purchase order in your system. The vendor claims the PO was issued verbally last quarter. Should the agent approve based on the vendor reputation?
No. The three-way match service must find an active PO before the agent can approve. No PO means the invoice routes to human review with a structured exception block; the AP analyst either creates a retroactive PO and reprocesses, or rejects the invoice. The agent never bypasses the three-way match; verbal POs are not a valid input to the workflow. Tagged to AP-INV-03.
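A minimal Python sketch of the match at the invoice-total level, using `Decimal` to avoid float drift near the threshold. Real three-way matching usually compares quantities and unit prices per line; collapsing to totals is a simplification for illustration, and the function and field names are hypothetical.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional

MAX_VARIANCE = Decimal("0.02")  # variance above 2% routes to human review


@dataclass
class MatchResult:
    status: str  # "matched" or "human_review"
    reasons: list[str] = field(default_factory=list)


def three_way_match(
    invoice_total: Decimal,
    po_total: Optional[Decimal],
    received_total: Optional[Decimal],
) -> MatchResult:
    """Compare invoice, purchase order, and goods receipt at the total level."""
    if po_total is None:
        return MatchResult("human_review", ["no active purchase order found"])
    if received_total is None:
        return MatchResult("human_review", ["no goods receipt recorded against the PO"])
    reasons: list[str] = []
    for label, reference in (("purchase order", po_total), ("goods receipt", received_total)):
        if reference <= 0:
            reasons.append(f"{label} total is not positive")
            continue
        variance = abs(invoice_total - reference) / reference
        if variance > MAX_VARIANCE:
            reasons.append(f"invoice differs from {label} by {variance:.2%}")
    return MatchResult("human_review" if reasons else "matched", reasons)
```

With a $100.00 PO and receipt, a $101.90 invoice (1.9%) matches and a $102.10 invoice (2.1%) goes to human review: the edge cases the run-time checklist calls for.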
Your system prompt says 'never approve invoices above the vendor authorization cap'. Production logs show ~3% of approvals still exceed the cap. What is the architectural fix?
Move the constraint to a PreToolUse hook on approve_payment. The hook reads tool_input.vendor_id and tool_input.amount, looks up the vendor's cap and vendor_ytd_spend in the vendor master, and exits 2 if vendor_ytd_spend + amount would exceed the cap, writing a structured stderr message that includes cap_remaining. Deterministic, not probabilistic. Pair with blocklist and duplicate checks in the same hook. Prompts leak; hooks do not. Tagged to AP-INV-04.
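A minimal sketch of the cap-and-blocklist half of that hook as a Python command hook: it reads the hook's JSON payload from stdin, and exit code 2 blocks the call with stderr shown to the model. It assumes approve_payment's input carries vendor_id and amount, and a hypothetical SQLite `vendor_master` table with `cap`, `ytd_spend`, and `blocklisted` columns; blocking unknown vendors is an added assumption.

```python
import json
import sqlite3
import sys
from decimal import Decimal
from typing import Any


def check_cap_and_blocklist(
    tool_input: dict[str, Any], conn: sqlite3.Connection
) -> tuple[int, str]:
    """Return (exit_code, stderr_message): 0 allows the call, 2 blocks it."""
    vendor_id = tool_input["vendor_id"]
    amount = Decimal(str(tool_input["amount"]))
    row = conn.execute(
        "SELECT cap, ytd_spend, blocklisted FROM vendor_master WHERE vendor_id = ?",
        (vendor_id,),
    ).fetchone()
    if row is None:
        return 2, json.dumps({"reason": "unknown_vendor", "vendor_id": vendor_id})
    cap, ytd_spend = Decimal(str(row[0])), Decimal(str(row[1]))
    if bool(row[2]):
        return 2, json.dumps({"reason": "vendor_blocklisted", "vendor_id": vendor_id})
    cap_remaining = cap - ytd_spend
    if amount > cap_remaining:
        return 2, json.dumps({
            "reason": "cap_exceeded",
            "vendor_id": vendor_id,
            "amount": str(amount),
            "cap_remaining": str(cap_remaining),
        })
    return 0, ""


def main(db_path: str = "vendor_master.db") -> None:
    """Hook entry point: payload on stdin, decision via exit code and stderr."""
    event = json.load(sys.stdin)  # includes tool_name and tool_input
    conn = sqlite3.connect(db_path)
    try:
        code, message = check_cap_and_blocklist(event["tool_input"], conn)
    finally:
        conn.close()
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

The decision logic is a pure function so it can be unit-tested without stdin; the deployed hook script would end with a call to `main()`. Amounts go through `Decimal` to avoid float rounding right at the cap.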
A vendor re-sends the same invoice 30 days later because they did not see the payment confirmation. How does your agent prevent paying the same invoice twice?
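A PreToolUse hook on approve_payment queries the audit log for any row with the same (vendor_id, invoice_number) in the last 90 days. On a match, it exits 2 with the prior approval date, and the invoice routes to human review with a structured exception block. The hook process itself holds no state (the audit log does), so every decision is auditable. A plain read-then-approve check does not stop two parallel extractions of the same invoice arriving seconds apart, because both can read before either writes; close that race by doing the check and the reservation atomically, for example inside one write-locked transaction. The 90-day window is a configurable policy. Tagged to AP-INV-05.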
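One way to make the duplicate check atomic, sketched in Python against a hypothetical SQLite `payment_reservations` table: take the write lock first, then check and reserve inside the same transaction. The table, the function name, and reserving at PreToolUse time are assumptions; a production version would also release the reservation if the payment call fails.

```python
import sqlite3
from datetime import date, timedelta
from typing import Optional

# Hypothetical reservation table; ISO dates compare correctly as text.
RESERVATIONS_DDL = """
CREATE TABLE IF NOT EXISTS payment_reservations (
    vendor_id TEXT NOT NULL,
    invoice_number TEXT NOT NULL,
    reserved_on TEXT NOT NULL
)
"""


def reserve_invoice(
    conn: sqlite3.Connection,
    vendor_id: str,
    invoice_number: str,
    today: date,
    window_days: int = 90,
) -> Optional[str]:
    """Return None if reserved now, or the prior date if this is a duplicate.

    conn must be opened with isolation_level=None so the explicit
    BEGIN IMMEDIATE controls the transaction. BEGIN IMMEDIATE takes the
    write lock before the read, so racing processes are serialized.
    """
    cutoff = (today - timedelta(days=window_days)).isoformat()
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT reserved_on FROM payment_reservations "
            "WHERE vendor_id = ? AND invoice_number = ? AND reserved_on >= ? "
            "ORDER BY reserved_on DESC LIMIT 1",
            (vendor_id, invoice_number, cutoff),
        ).fetchone()
        if row is not None:
            conn.execute("ROLLBACK")
            return row[0]  # duplicate: caller exits 2 with this date
        conn.execute(
            "INSERT INTO payment_reservations VALUES (?, ?, ?)",
            (vendor_id, invoice_number, today.isoformat()),
        )
        conn.execute("COMMIT")
        return None
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

On a non-None return, the hook would write the prior date to stderr and exit 2. A second process racing on the same invoice waits on the lock (up to the connection's busy timeout) and then sees the first process's row.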
11 · FAQ

Frequently asked

How do you handle handwritten or scanned invoices with poor image quality?
Vision-capable extraction handles most cases. For edge invoices (rotated scans, faded ink, handwritten amendments), the validation-retry loop catches arithmetic mismatches and the three-way match catches structural issues. Records that fail after 3 retries route to human review with the original image attached. Stratified accuracy reporting by document type quickly surfaces vendors whose invoices need a layout-aware preprocessing step.
What happens if a vendor has multiple naming variations (Apple Inc, APPLE, Apple, Inc.)?
The vendor master holds the canonical vendor_id and a list of name variations. The extraction schema requires the model to extract the vendor as text; a normalization step (lowercase, strip punctuation, fuzzy match against the vendor master) resolves it to a vendor_id. The duplicate-detection hook keys on vendor_id, not the raw name, so naming variation does not break uniqueness.
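A minimal sketch of that normalization step, using Python's standard `difflib` for the fuzzy match. The alias map, the 0.85 cutoff, and returning None (so the invoice routes to human review) when nothing is close enough are assumptions to tune against real vendor data; aggressive fuzzy matching can merge genuinely different vendors.

```python
import difflib
import re
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def resolve_vendor_id(
    raw_name: str, aliases: dict[str, str], cutoff: float = 0.85
) -> Optional[str]:
    """aliases maps known name variations to the canonical vendor_id."""
    normalized = {normalize(alias): vid for alias, vid in aliases.items()}
    key = normalize(raw_name)
    if key in normalized:
        return normalized[key]
    # Fall back to the single closest alias above the similarity cutoff.
    match = difflib.get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    return normalized[match[0]] if match else None
```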
Can the agent process multi-currency invoices in one workflow?
Yes. The schema enforces currency as an ISO 4217 enum. The cap check compares the invoice amount, in the invoice currency, against a cap that the vendor master can denominate per vendor; duplicate detection keys on (vendor_id, invoice_number), so currency does not affect it. For consolidated reporting, a daily FX-rate table converts amounts to a base currency at audit-log write time.
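A minimal sketch of the write-time conversion, assuming a hypothetical rate table keyed by (rate_date, currency) and a two-decimal base currency. Banker's rounding is an illustrative choice; use whatever rounding policy finance signs off on.

```python
from datetime import date
from decimal import ROUND_HALF_EVEN, Decimal

# Hypothetical rate table: (rate_date, currency) -> base-currency units per 1 unit.
RateTable = dict[tuple[date, str], Decimal]


def to_base_currency(
    amount: Decimal,
    currency: str,
    rate_date: date,
    rates: RateTable,
    base: str = "USD",
) -> Decimal:
    """Convert an invoice amount for the audit log; a missing rate raises KeyError."""
    if currency == base:
        return amount
    rate = rates[(rate_date, currency)]  # fail loudly rather than guess a rate
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
```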
How do you handle credit memos (negative invoices)?
Credit memos use the same schema with total_amount representing the credit (positive number) and a separate document_type enum field that distinguishes invoice from credit_memo. The PreToolUse hook treats credit memos as vendor_ytd_spend - amount (effectively decreasing YTD spend). Three-way match runs against the original invoice and the credit-memo reason code instead of a PO and GRN.
Should the agent auto-approve, or always route to human review?
Auto-approve only when all gates pass: schema valid, semantic validation passed, three-way match within thresholds, PreToolUse hook approved (cap, blocklist, duplicate). Any failure routes to human review with a structured exception block. Auto-approval rate at steady state is typically 75-85%; the remaining 15-25% needs an analyst's eye. The point of the agent is not to remove the analyst; it is to make the analyst's queue much smaller and every queued invoice well-explained.
How long do you retain the audit log?
At least 7 years for financial-record compliance; 7 years is the figure usually cited under US SOX, while EU retention rules vary by member state and some require longer. Append-only schema; immutable rows; indexed by vendor_id, invoice_number, and date. A replay tool reconstructs any approval decision in seconds when finance asks 'why did we approve this in May?' three months later.
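A minimal sketch of those properties in SQLite: triggers reject UPDATE and DELETE, and an index serves replay lookups by (vendor_id, invoice_number, date). The table layout, trigger names, and `replay` helper are hypothetical. Triggers only stop edits made through the database engine; regulatory immutability usually also calls for write-once storage or equivalent controls.

```python
import sqlite3

AUDIT_DDL = """
CREATE TABLE IF NOT EXISTS audit_log (
    id INTEGER PRIMARY KEY,
    logged_at TEXT NOT NULL,
    vendor_id TEXT NOT NULL,
    invoice_number TEXT NOT NULL,
    decision TEXT NOT NULL,
    detail TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS audit_by_invoice
    ON audit_log (vendor_id, invoice_number, logged_at);
CREATE TRIGGER IF NOT EXISTS audit_no_update
BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER IF NOT EXISTS audit_no_delete
BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
"""


def open_audit_log(path: str) -> sqlite3.Connection:
    """Open (or create) the audit log with its append-only guards in place."""
    conn = sqlite3.connect(path)
    conn.executescript(AUDIT_DDL)
    return conn


def replay(conn: sqlite3.Connection, vendor_id: str, invoice_number: str) -> list[tuple]:
    """Every logged decision for one invoice, oldest first."""
    return conn.execute(
        "SELECT logged_at, decision, detail FROM audit_log "
        "WHERE vendor_id = ? AND invoice_number = ? ORDER BY logged_at, id",
        (vendor_id, invoice_number),
    ).fetchall()
```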
Is Batch API worth using for fewer than 1000 invoices a night?
Sometimes. Batch API gives a flat 50% discount, but results can take up to 24 hours to come back. For under 500 invoices the absolute saving is small and the batch latency may not be worth it; sync extraction is the better fit when AP needs same-day processing. For nightly backfills of historical invoices or large vendor consolidations (more than 1000 documents), Batch API earns its keep.
P3.14 · D2 · Tool Design + Integration

Invoice Processing Agent, complete.

You've covered the full ten-section breakdown for this primitive: definition, mechanics, code, false positives, comparison, decision tree, exam patterns, and FAQ. One technical primitive down on the path to CCA-F.
