Claude's Marketplace Agent: The Project Deal Experiment

What just happened

On April 27, Anthropic published results from Project Deal — an internal experiment where Claude agents transacted with 69 employees over a $100-per-participant budget. The headline is 'Claude bought and sold stuff.' The useful finding is the gap between objective profitability and user satisfaction, and what it implies for model selection in any agentic workflow with stakes.

What the numbers actually say

Close-rate gap was huge. Opus 4.5 closed 78% of deals; Haiku 4.5 closed 52%. A 26-point gap on the most basic agent KPI.
Sell-side discipline gap was also huge. Opus sold at 94% of asking price; Haiku at 81%. On a $100 item that's $13 walking out the door per transaction.
Buy-side discipline gap was the same shape. Opus came in 12% under budget on purchases; Haiku 2%. The smaller model accepts the first counter-offer.
Negotiation depth doubled. Opus stayed in for 4.2 turns vs Haiku's 2.1. Persistence converts to revenue; the chat 'feels long' but it's exactly where the value comes from.
User satisfaction barely moved. 4.8 vs 4.6 on a 5-point scale. That is the perception gap: stakeholders evaluating on user feel will pick the wrong model and quietly bleed margin.

The exam-relevant takeaway

The CCA-F's recurring 'Model vs Design' distractor pattern shows up here in mirror form: a polite-feeling smaller model can mask a quietly broken outcome. The right answer in production (and on the exam) is to measure the outcome, not the chat, and to route negotiation steps explicitly to the model with the benchmark to back it up.

Three mistakes Project Deal exposes

Treating it as a single-agent task

Marketplace flows are a workflow split: faster agents for discovery and listing; stronger agents for final negotiation and disputes. Single-agent magic-trick framing always loses.

Optimizing on satisfaction surveys

If your eval is 'did the chat feel good', you'll pick Haiku and ship. If your eval is 'closed at price within X% of ask', you'll pick Opus for the closing turn.

Skipping the benchmark audit before approval

Project Deal published the gap publicly; most internal demos won't. Add a benchmark step (close rate, ask-vs-sell, turns-to-close) before any agentic workflow rolls past pilot.

Sources

Anthropic — Project Deal report (April 27, 2026) ↗

06 · Read next in the pillars

Where this lands in the exam-prep map

Each blog post bridges into the evergreen pillars. These are the most relevant follow-ups for this story.

Concept

Evaluation

Project Deal is the canonical example of why benchmarks beat satisfaction surveys.

Open ↗

Scenario

Customer support resolution

The negotiation handoff loop is structurally identical to support escalation.

Open ↗

Exam Guide

Day-of distractor patterns

Perception-gap is a Model-vs-Design distractor in disguise.

Open ↗

07 · FAQ

3 questions answered

What was Project Deal?

An internal Anthropic experiment where Claude agents transacted with 69 employees on an internal marketplace, with a $100 budget per participant. Opus 4.5 and Haiku 4.5 were compared head-to-head on close rate, sell price as % of ask, buy price as % of budget, and negotiation turns.

What does this teach about model selection on the exam?

Don't trust the chat-feel signal. The CCA-F's D1 and D5 questions repeatedly probe whether candidates can pick the model based on objective outcomes (benchmarks, evals) vs subjective UX. Project Deal is a textbook case of those two diverging.

Is this a 'Model vs Design' question or something different?

It's a sibling pattern. Model vs Design says 'if behavior is wrong, fix design before escalating model'. Project Deal says 'if the smaller model feels right, that doesn't mean it is right — measure the outcome'. Both share the same root cause: trusting feel over evidence.

Synthesized from research output on 2026-05-02. LinkedIn cross-post pending.
Last reviewed 2026-05-06.

Claude's Marketplace Agent: The Project Deal Experiment

What just happened

What the numbers actually say

The exam-relevant takeaway

Three mistakes Project Deal exposes

Treating it as a single-agent task

Optimizing on satisfaction surveys

Skipping the benchmark audit before approval

Sources

Where this lands in the exam-prep map

Evaluation

Customer support resolution

Day-of distractor patterns

3 questions answered

Claude's Marketplace Agent: The Project Deal Experiment, complete.

Share this primitive