What just happened
On April 27, Anthropic published results from Project Deal, an internal experiment in which Claude agents transacted with 69 employees on a $100-per-participant budget. The headline is 'Claude bought and sold stuff.' The useful finding is the gap between objective profitability and user satisfaction, and what it implies for model selection in any agentic workflow with stakes.
What the numbers actually say
- Close-rate gap was huge. Opus 4.5 closed 78% of deals; Haiku 4.5 closed 52%. A 26-point gap on the most basic agent KPI.
- Sell-side discipline gap was also huge. Opus sold at 94% of asking price; Haiku at 81%. On a $100 item, that's $13 walking out the door per transaction.
- Buy-side discipline gap was the same shape. Opus came in 12% under budget on purchases; Haiku only 2%. The smaller model effectively accepts the first counter-offer.
- Negotiation depth doubled. Opus stayed in for 4.2 turns vs Haiku's 2.1. Persistence converts to revenue: the chat 'feels long', but that length is exactly where the value comes from.
- User satisfaction barely moved. 4.8 vs 4.6 on a 5-point scale. That is the perception gap: stakeholders evaluating on user feel will pick the wrong model and quietly bleed margin.
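Those gaps compound: close rate multiplies realized price. A back-of-envelope sketch using only the published numbers (treating the two rates as independent is an assumption the post doesn't make):

```python
# Back-of-envelope: expected sell-side revenue per $100 listing,
# combining the published close rate and ask-vs-sell discipline.
# Assumes the two rates are independent, which the post does not claim.
ASK = 100.0

models = {
    "Opus 4.5":  {"close_rate": 0.78, "pct_of_ask": 0.94},
    "Haiku 4.5": {"close_rate": 0.52, "pct_of_ask": 0.81},
}

for name, m in models.items():
    expected = m["close_rate"] * m["pct_of_ask"] * ASK
    print(f"{name}: {expected:.2f} expected revenue per listing")

# -> Opus 4.5: 73.32 expected revenue per listing
# -> Haiku 4.5: 42.12 expected revenue per listing
```

Per listing, the stronger model's edge is not 0.2 satisfaction points; it is roughly $31 of expected revenue.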
The exam-relevant takeaway
The CCA-F's recurring 'Model vs Design' distractor pattern shows up here in mirror form: a polite-feeling smaller model can mask a quietly broken outcome. The right answer in production (and on the exam) is to measure the outcome, not the chat, and to route negotiation steps explicitly to the model with benchmark numbers to back it up.
Three mistakes Project Deal exposes
Treating it as a single-agent task
Marketplace flows call for a workflow split: faster agents for discovery and listing, stronger agents for final negotiation and disputes. Framing the whole flow as a single-agent magic trick loses every time.
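A minimal sketch of that split, assuming hypothetical `run_fast_agent` and `run_strong_agent` wrappers around your Haiku- and Opus-class calls; the stage-to-model mapping is the only real content here:

```python
# Minimal workflow-split sketch: route each marketplace stage to the
# model tier the benchmark justifies. The two runners are hypothetical
# wrappers, not a real API.
from typing import Callable

def run_fast_agent(task: str, context: dict) -> dict:
    raise NotImplementedError  # e.g. a Haiku-class call

def run_strong_agent(task: str, context: dict) -> dict:
    raise NotImplementedError  # e.g. an Opus-class call

# Discovery and listing tolerate a weaker model; negotiation and
# disputes are where the close-rate and price-discipline gaps bite.
STAGE_ROUTES: dict[str, Callable[[str, dict], dict]] = {
    "discovery":   run_fast_agent,
    "listing":     run_fast_agent,
    "negotiation": run_strong_agent,
    "dispute":     run_strong_agent,
}

def route(stage: str, task: str, context: dict) -> dict:
    return STAGE_ROUTES[stage](task, context)
```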
Optimizing on satisfaction surveys
If your eval is 'did the chat feel good', you'll pick Haiku and ship. If your eval is 'closed at price within X% of ask', you'll pick Opus for the closing turn.
Skipping the benchmark audit before approval
Project Deal published the gap publicly; most internal demos won't. Add a benchmark step (close rate, ask-vs-sell, turns-to-close) before any agentic workflow rolls past pilot.
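A minimal sketch of that benchmark step, assuming per-transaction records with illustrative field names; the thresholds are placeholders to calibrate against your own pilot baseline:

```python
# Minimal pre-pilot benchmark gate over per-transaction records.
# Field names and thresholds are illustrative placeholders.
from statistics import mean

def benchmark(transactions: list[dict]) -> dict:
    closed = [t for t in transactions if t["closed"]]
    return {
        "close_rate": len(closed) / len(transactions),
        "ask_vs_sell": mean(t["sale_price"] / t["ask_price"] for t in closed),
        "turns_to_close": mean(t["turns"] for t in closed),
    }

def passes_gate(metrics: dict) -> bool:
    # Placeholder thresholds; set them from your own pilot baseline.
    return metrics["close_rate"] >= 0.70 and metrics["ask_vs_sell"] >= 0.90

txns = [
    {"closed": True,  "ask_price": 100, "sale_price": 95, "turns": 4},
    {"closed": True,  "ask_price": 100, "sale_price": 92, "turns": 5},
    {"closed": False, "ask_price": 100, "sale_price": 0,  "turns": 2},
]
m = benchmark(txns)
print(m, "PASS" if passes_gate(m) else "FAIL")
# -> close_rate 0.67, ask_vs_sell 0.935, turns_to_close 4.5 -> FAIL
```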
Where this lands in the exam-prep map
Each blog post bridges into the evergreen pillars. These are the most relevant follow-ups for this story.
- Concept: Evaluation. Project Deal is the canonical example of why benchmarks beat satisfaction surveys.
- Scenario: Customer support resolution. The negotiation handoff loop is structurally identical to support escalation.
- Exam Guide: Day-of distractor patterns. Perception-gap is a Model-vs-Design distractor in disguise.
3 questions answered
What was Project Deal?
An internal Anthropic experiment, published April 27, in which Claude agents transacted with 69 employees on a $100-per-participant budget.
What does this teach about model selection on the exam?
Measure the outcome, not the chat: the two models were nearly tied on satisfaction (4.8 vs 4.6) but far apart on close rate and price discipline, so route stake-bearing steps to the model the benchmarks support.
Is this a 'Model vs Design' question or something different?
It is the same distractor pattern in mirror form: the perception gap makes the weaker model look adequate, and the correct design answer is explicit routing backed by benchmarks.
Last reviewed 2026-05-06.
