Pillar 9 · Blog · 2026-05-02

Claude's Marketplace Agent: The Project Deal Experiment

Anthropic's Project Deal had Claude agents negotiate real transactions with 69 employees on a $100-per-participant budget. Opus 4.5 closed deals at 78%; Haiku 4.5 at 52%. Yet user satisfaction barely moved (4.8 vs 4.6): the perception-gap trap. The smaller model felt fine while quietly overpaying and underselling.


What just happened

On April 27, Anthropic published results from Project Deal, an internal experiment where Claude agents transacted with 69 employees, each working from a $100 budget. The headline is 'Claude bought and sold stuff.' The useful finding is the gap between objective profitability and user satisfaction, and what it implies for model selection in any agentic workflow with stakes.

What the numbers actually say

  1. Close-rate gap was huge. Opus 4.5 closed 78% of deals; Haiku 4.5 closed 52%. A 26-point gap on the most basic agent KPI.
  2. Sell-side discipline gap was also huge. Opus sold at 94% of asking price; Haiku at 81%. On a $100 item that's $13 walking out the door per transaction.
  3. Buy-side discipline gap was the same shape. Opus came in 12% under budget on purchases; Haiku 2%. The smaller model accepts the first counter-offer.
  4. Negotiation depth doubled. Opus stayed in for 4.2 turns vs Haiku's 2.1. Persistence converts to revenue; the chat feels long, but those extra turns are exactly where the value comes from.
  5. User satisfaction barely moved. 4.8 vs 4.6 on a 5-point scale. That is the perception gap: stakeholders evaluating on user feel will pick the wrong model and quietly bleed margin.
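The four metrics above are straightforward to compute from a transaction log. A minimal sketch, assuming a hypothetical `Deal` record shape (field names are mine, not from the experiment):

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Deal:
    closed: bool      # did the deal close?
    ask: float        # seller's asking price
    sold_at: float    # final sale price (0 if not closed)
    turns: int        # negotiation turns before close or walk-away

def kpis(deals: list[Deal]) -> dict:
    """Compute Project Deal-style metrics over a log of sell-side deals."""
    closed = [d for d in deals if d.closed]
    return {
        "close_rate": len(closed) / len(deals),
        "pct_of_ask": mean(d.sold_at / d.ask for d in closed),
        "avg_turns": mean(d.turns for d in deals),
    }

# Toy log: two closes at 94% of ask, one walk-away.
deals = [Deal(True, 100, 94, 4), Deal(False, 80, 0, 2), Deal(True, 50, 47, 5)]
print(kpis(deals))
```

Buy-side discipline (price paid vs budget) would be the mirror image: mean of `paid / budget` over closed purchases.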

The exam-relevant takeaway

The CCA-F's recurring 'Model vs Design' distractor pattern shows up here in mirror form: a polite-feeling smaller model can mask a quietly broken outcome. The right answer in production (and on the exam) is to measure the outcome, not the chat, and to route negotiation steps explicitly to the model with benchmark results to back it up.

Three mistakes Project Deal exposes

Treating it as a single-agent task

Marketplace flows are a workflow split: faster agents for discovery and listing, stronger agents for final negotiation and disputes. Framing the whole flow as one agent pulling off a magic trick loses on both cost and outcome.
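That split can be made explicit with a step-to-model routing table. A sketch under stated assumptions: the model names and step labels here are illustrative placeholders, not an Anthropic API.

```python
# Hypothetical router mapping workflow steps to model tiers.
# Model names and step labels are illustrative, not real identifiers.
ROUTES = {
    "discovery": "small-fast-model",    # cheap, high-volume browsing
    "listing": "small-fast-model",      # templated, low-stakes writing
    "negotiation": "large-model",       # stakes concentrate here
    "dispute": "large-model",           # ditto
}

def model_for(step: str) -> str:
    # Default to the stronger model for unrecognized steps:
    # failing expensive is safer than failing cheap at the closing turn.
    return ROUTES.get(step, "large-model")
```

The default matters: an unrouted step should fall through to the capable model, not the cheap one.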

Optimizing on satisfaction surveys

If your eval is 'did the chat feel good?', you'll pick Haiku and ship it. If your eval is 'closed at a price within X% of ask', you'll pick Opus for the closing turn.

Skipping the benchmark audit before approval

Project Deal published the gap publicly; most internal demos won't. Add a benchmark step (close rate, ask-vs-sell, turns-to-close) before any agentic workflow rolls past pilot.
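A benchmark gate can be a simple threshold check run on pilot metrics before promotion. A sketch, assuming the threshold values below (they are my placeholders, loosely anchored to the Project Deal numbers, not published pass bars):

```python
# Hypothetical pre-approval gate; thresholds are assumptions to set per org.
THRESHOLDS = {"close_rate": 0.70, "pct_of_ask": 0.90, "max_avg_turns": 6.0}

def audit(metrics: dict) -> list[str]:
    """Return the failed checks; an empty list means the pilot may promote."""
    failures = []
    if metrics["close_rate"] < THRESHOLDS["close_rate"]:
        failures.append("close_rate")
    if metrics["pct_of_ask"] < THRESHOLDS["pct_of_ask"]:
        failures.append("pct_of_ask")
    if metrics["avg_turns"] > THRESHOLDS["max_avg_turns"]:
        failures.append("avg_turns")
    return failures
```

Against the published figures, an Opus-shaped pilot (0.78, 0.94, 4.2 turns) passes cleanly, while a Haiku-shaped pilot (0.52, 0.81, 2.1 turns) fails on both close rate and sell discipline, despite its near-identical satisfaction score.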


FAQ

What was Project Deal?
An internal Anthropic experiment where Claude agents transacted with 69 employees on an internal marketplace, with a $100 budget per participant. Opus 4.5 and Haiku 4.5 were compared head-to-head on close rate, sell price as % of ask, buy price as % of budget, and negotiation turns.
What does this teach about model selection on the exam?
Don't trust the chat-feel signal. The CCA-F's D1 and D5 questions repeatedly probe whether candidates can pick the model based on objective outcomes (benchmarks, evals) vs subjective UX. Project Deal is a textbook case of those two diverging.
Is this a 'Model vs Design' question or something different?
It's a sibling pattern. Model vs Design says 'if behavior is wrong, fix design before escalating model'. Project Deal says 'if the smaller model feels right, that doesn't mean it is right — measure the outcome'. Both share the same root cause: trusting feel over evidence.

Last reviewed 2026-05-06.


That wraps Claude's Marketplace Agent: the Project Deal numbers, the perception gap they expose, and the three mistakes to audit for before any agentic workflow rolls past pilot. One more pattern down on the path to CCA-F.