Blog · 2026-06-03 · 4 min read

When Should You Use Opus vs. Sonnet vs. Haiku (CCA-F D5)?

Quick answer

Route work by task tier; do not default to the biggest model. Use Haiku for high-volume, low-stakes work, Sonnet for daily work, and Opus for accuracy-critical, multi-step tasks. Using one model for everything is not simplicity; it is a convenience tax. For CCA-F D5, the skill is matching model to task under real context and reliability limits.

What changed

The reflex of the last cycle was simple: pick the most capable model and use it for everything. That reads as caution. It is actually waste, because most work does not need the flagship and most budgets cannot afford to run it everywhere.

The shift now is tiered specialization: a clear hierarchy of models chosen per task by cost, speed, and the reliability the task demands. Claude ships in distinct tiers (Haiku, Sonnet, and Opus) built for different operating points.

The mental model:

  • Haiku is the throughput layer. Real-time classification, log analysis, routing, low-stakes triage.
  • Sonnet is the daily driver. Standard professional work, content, routine coding.
  • Opus is the final stretch. Accuracy-critical, multi-step work where one error breaks the system.
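One way to make the mental model concrete is a plain lookup from task kind to tier. This is a minimal sketch: the task labels are hypothetical names taken from the examples in the list, and the tier values are plain labels, not real API model identifiers.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"    # throughput layer
    SONNET = "sonnet"  # daily driver
    OPUS = "opus"      # final stretch


# Hypothetical task labels, lifted from the examples in the list above.
TASK_TIER = {
    "classification": Tier.HAIKU,
    "log_analysis": Tier.HAIKU,
    "routing": Tier.HAIKU,
    "triage": Tier.HAIKU,
    "content": Tier.SONNET,
    "routine_coding": Tier.SONNET,
    "accuracy_critical_multistep": Tier.OPUS,
}


def tier_for(task_kind: str) -> Tier:
    """Return the tier for a known task kind.

    Unknown kinds raise instead of silently falling back to the flagship,
    so every new kind of work gets an explicit tier decision.
    """
    try:
        return TASK_TIER[task_kind]
    except KeyError:
        raise ValueError(f"no tier assigned for task kind: {task_kind!r}") from None


print(tier_for("log_analysis").value)  # haiku
```

Raising on unknown work is a design choice in this sketch, not a rule from the exam: it keeps "use the biggest model" from becoming the accidental default.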

One model for everything vs. task-tier routing

| Dimension | Biggest model everywhere | Task-Tier Routing |
| --- | --- | --- |
| Cost | Flagship price on trivial work | Cheap tier for volume, flagship for the hard part |
| Latency | Slow on tasks that need speed | Fast where speed matters |
| Reliability focus | Uniform: stakes are invisible | Concentrated on the critical step |
| Volume work | Cost-prohibitive at scale | Viable on the throughput tier |
| Strategy | "Always pick the biggest" | Smallest model that meets the bar |
| Right setting | A one-off where cost is irrelevant | Anything at scale |

How tier routing actually works

Routing is two decisions, made per unit of work.

  • Tier (which model). Match the task's accuracy and reliability bar to the smallest model that clears it. Do not reach for the flagship by default.
  • Escalation (when to climb). Let a cheap tier handle the common case and escalate only the hard or high-stakes items. A small model can even triage and route to a larger one.
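The two decisions can be sketched as two small functions. Everything numeric here is an assumption for illustration: the relative costs, the measured accuracies, and the 0.8 confidence threshold are made-up values that you would replace with figures from your own evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierProfile:
    name: str
    relative_cost: float      # hypothetical, unitless
    measured_accuracy: float  # hypothetical, from your own evaluation


# Ordered cheapest first so the first match is the smallest sufficient tier.
TIERS = [
    TierProfile("haiku", 1.0, 0.90),
    TierProfile("sonnet", 4.0, 0.96),
    TierProfile("opus", 20.0, 0.99),
]


def choose_tier(required_accuracy: float) -> TierProfile:
    """Decision 1: the smallest tier whose measured accuracy clears the bar."""
    for tier in TIERS:
        if tier.measured_accuracy >= required_accuracy:
            return tier
    # Nothing clears the bar: fall back to the strongest tier.
    # The caller should treat this as a cue to add verification.
    return TIERS[-1]


def should_escalate(confidence: float, high_stakes: bool, threshold: float = 0.8) -> bool:
    """Decision 2: climb only for high-stakes or low-confidence items."""
    return high_stakes or confidence < threshold


print(choose_tier(0.95).name)       # sonnet
print(should_escalate(0.9, False))  # False
```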

Worked example - "process a queue of support tickets."

  1. Triage on the cheap tier: classify and route every ticket on the throughput model.
  2. Handle the routine on the mid tier: standard replies and lookups go to the daily driver.
  3. Escalate the hard cases to the strong tier: ambiguous or high-stakes tickets, where an error is expensive, get the flagship.
  4. Verify the escalations: the strong tier handles the tickets where reliability matters most, so spend verification effort on its output rather than on the triage step.

That is Task-Tier Routing: cheap by default, strong where it counts.
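The four steps might look like the following in code. This is a sketch under stated assumptions: `triage` is a keyword stand-in for a throughput-tier call, and `handle_routine`, `handle_hard`, and `verify` are hypothetical callables you would back with real model calls and checks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Ticket:
    text: str
    high_stakes: bool = False


@dataclass
class Outcome:
    tier: str
    reply: str
    verified: bool


def triage(ticket: Ticket) -> str:
    """Step 1. Stand-in for a throughput-tier classifier.

    A keyword rule keeps the sketch runnable; a real system would call
    the cheap model here.
    """
    ambiguous = "not sure" in ticket.text.lower()
    return "hard" if ticket.high_stakes or ambiguous else "routine"


def process_queue(
    tickets: List[Ticket],
    handle_routine: Callable[[Ticket], str],
    handle_hard: Callable[[Ticket], str],
    verify: Callable[[Ticket, str], bool],
) -> List[Outcome]:
    outcomes = []
    for ticket in tickets:
        if triage(ticket) == "routine":
            # Step 2: routine work goes to the mid tier, no extra check.
            outcomes.append(Outcome("sonnet", handle_routine(ticket), verified=False))
        else:
            # Step 3: hard cases go to the strong tier.
            reply = handle_hard(ticket)
            # Step 4: verification is spent only on the escalations.
            outcomes.append(Outcome("opus", reply, verified=verify(ticket, reply)))
    return outcomes


queue = [
    Ticket("How do I reset my password?"),
    Ticket("Chargeback dispute", high_stakes=True),
]
results = process_queue(
    queue,
    handle_routine=lambda t: "routine reply",
    handle_hard=lambda t: "careful reply",
    verify=lambda t, reply: bool(reply),
)
print([(r.tier, r.verified) for r in results])  # [('sonnet', False), ('opus', True)]
```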

A name for the trap: the Convenience Tax

The Convenience Tax is the premium you pay for routing all work to one big model because it is easier than choosing. It shows up as a larger bill, slower responses, and reliability effort spread thin across work that did not need it. You stop paying the Convenience Tax by routing per task tier and escalating selectively, not by buying a bigger default.
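A rough way to put a number on the tax is to compare the flagship-everywhere bill with the routed bill for the same workload. The relative costs and the traffic mix below are hypothetical, not published prices, and the sketch ignores the small overhead of the triage call itself.

```python
# Hypothetical relative cost per request, not real pricing.
RELATIVE_COST = {"haiku": 1.0, "sonnet": 4.0, "opus": 20.0}


def convenience_tax(mix: dict, flagship: str = "opus") -> float:
    """Cost of sending every request to the flagship, minus the routed cost.

    `mix` maps tier name to the number of requests that tier would handle
    under task-tier routing.
    """
    total_requests = sum(mix.values())
    flagship_bill = total_requests * RELATIVE_COST[flagship]
    routed_bill = sum(count * RELATIVE_COST[tier] for tier, count in mix.items())
    return flagship_bill - routed_bill


# Assumed mix: mostly volume work, a small accuracy-critical tail.
mix = {"haiku": 700, "sonnet": 250, "opus": 50}
print(convenience_tax(mix))  # 17300.0
```

With these assumed numbers the flagship-only bill is 20000 cost units and the routed bill is 2700, so the tax is the 17300 difference.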

Why it matters for CCA-F

This sits in D5 (Context Management & Reliability), which is 15% of the exam and connects to context windows, prompt caching, and evaluation.

Our read: D5 questions reward right-sizing under cost and reliability limits, not maximizing capability.

  • Old instinct: the biggest model is the safe choice.
  • D5 instinct: the smallest model that meets the bar, with the flagship reserved for the reliability-critical step.

The distractor pattern to memorize. On D5 scenarios about runaway cost, latency, or reliability, the trap answer is "upgrade to the biggest model." The architecturally correct move is one of:

  1. Route by task tier (smallest sufficient model per unit of work), or
  2. Escalate selectively (cheap tier triages, strong tier finishes), or
  3. Reserve the flagship for the critical step (and verify there, not everywhere).

See developer productivity agent for a mixed-tier workflow in practice.

How to apply it

  1. Stop defaulting to the flagship. Start every task from the smallest tier that could plausibly meet the bar.
  2. Triage cheap, finish strong. Let a throughput model route, and escalate only the hard cases.
  3. Tie tier to stakes. Reserve the strongest model for work where one error breaks the system.
  4. Evaluate before you trust a downgrade. Prove the cheaper tier meets the bar on the real task.
  5. Pair routing with caching. Cheap tier plus caching is where the cost savings compound.
  6. Measure the Convenience Tax. If everything runs on the flagship, that uniform bill is the tax.
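Step 4 is the one most often skipped, so here is a minimal sketch of it. It assumes you have a small labeled sample of the real task and a hypothetical `predict` callable that wraps the cheaper tier; the function reports whether the tier meets the bar and which inputs failed, so those can be escalated.

```python
from typing import Callable, List, Tuple


def meets_bar(
    predict: Callable[[str], str],
    labeled_examples: List[Tuple[str, str]],
    required_accuracy: float,
) -> Tuple[bool, float, List[str]]:
    """Score a candidate tier on labeled examples from the real task.

    Returns (passes, accuracy, failing_inputs).
    """
    if not labeled_examples:
        raise ValueError("need at least one labeled example")
    failures = [
        text for text, expected in labeled_examples if predict(text) != expected
    ]
    accuracy = 1 - len(failures) / len(labeled_examples)
    return accuracy >= required_accuracy, accuracy, failures


# Stand-in for the cheaper tier: a fixed lookup with one wrong answer.
canned = {"a": "billing", "b": "login", "c": "billing", "d": "other"}
examples = [("a", "billing"), ("b", "login"), ("c", "billing"), ("d", "legal")]

passes, accuracy, failures = meets_bar(canned.get, examples, required_accuracy=0.9)
print(passes, accuracy, failures)  # False 0.75 ['d']
```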

The meta-skill and the D5 exam skill are the same: capability per dollar comes from routing work to the right tier, not from a bigger default.

FAQ

When should you use Opus vs. Sonnet vs. Haiku?
Haiku for high-volume, low-stakes work like classification, routing, and triage. Sonnet for daily professional work and standard coding. Opus for accuracy-critical, multi-step tasks where one mistake breaks the system. The rule is task tier, not "always pick the biggest."

Why is using the biggest model for everything a mistake?
It pays flagship cost and latency for work a smaller tier handles fine. That is a convenience tax, not a strategy. Worse, it hides where reliability actually matters, because everything gets the same treatment regardless of stakes.

What is Task-Tier Routing?
Matching each unit of work to the smallest model that meets its accuracy and reliability bar: a cheap tier for volume, a mid tier for daily work, the strongest tier for the final accuracy-critical stretch. Often a small model triages and escalates only the hard cases.

How does context management affect tier choice?
Long-context, reliability-sensitive jobs (where losing a detail mid-document is costly) are exactly where the strongest tier earns its price. Short, low-stakes work does not need it. Tier choice is partly a context-and-reliability decision, which is why it sits in D5.

How do you know a cheaper tier is good enough?
Evaluate it on the actual task; do not assume. Route to the smaller tier, measure accuracy and failure cost against your bar, and escalate only what fails. Routing without evaluation is guessing dressed up as optimization.

How does this show up on the CCA-F exam (D5)?
D5 (Context Management & Reliability) is 15% of the exam. Expect scenarios about runaway cost or latency where the trap answer is "upgrade to the biggest model." The correct answer is to route by task tier, escalate selectively, and reserve the strongest model for the reliability-critical step.

Synthesized from research output on 2026-06-03.
Last reviewed 2026-06-03.
