When Should You Use Opus vs. Sonnet vs. Haiku (CCA-F D5)?

Quick answer

Route work by task tier; do not default to the biggest model. Use Haiku for high-volume, low-stakes work, Sonnet for daily work, and Opus for accuracy-critical, multi-step tasks. Using one model for everything is not simplicity; it is a convenience tax. For CCA-F D5, the skill is matching the model to the task under real context and reliability limits.

What changed

The reflex of the last cycle was simple: pick the most capable model and use it for everything. That reads as caution. It is actually waste, because most work does not need the flagship and most budgets cannot afford to run it everywhere.

The shift now is tiered specialization: a clear hierarchy of models chosen per task by cost, speed, and the reliability the task demands (🟢 first-hand: Claude ships in distinct tiers, Haiku, Sonnet, and Opus, built for different operating points).

The mental model:

Haiku is the throughput layer. Real-time classification, log analysis, routing, low-stakes triage.
Sonnet is the daily driver. Standard professional work, content, routine coding.
Opus is the final stretch. Accuracy-critical, multi-step work where one error breaks the system.

One model for everything vs. task-tier routing

Dimension	Biggest model everywhere	Task-Tier Routing
Cost	Flagship price on trivial work	Cheap tier for volume, flagship for the hard part
Latency	Slow on tasks that need speed	Fast where speed matters
Reliability focus	Uniform: stakes are invisible	Concentrated on the critical step
Volume work	Cost-prohibitive at scale	Viable on the throughput tier
Strategy	"Always pick the biggest"	Smallest model that meets the bar
Right setting	A one-off where cost is irrelevant	Anything at scale

How tier routing actually works

Routing is two decisions, made per unit of work.

Tier (which model). Match the task's accuracy and reliability bar to the smallest model that clears it. Do not reach for the flagship by default.
Escalation (when to climb). Let a cheap tier handle the common case and escalate only the hard or high-stakes items. A small model can even triage and route to a larger one.
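
A minimal sketch of those two decisions, assuming you already have a pass rate for each tier measured on your own task. The `Tier` class, the `pick_tier` and `should_escalate` helpers, and every number below are hypothetical placeholders for illustration, not real pricing or benchmark figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    relative_cost: float  # hypothetical cost units, not real pricing
    pass_rate: float      # hypothetical: measure this on your own eval set


# Illustrative placeholder values only.
TIERS = [
    Tier("haiku", relative_cost=1.0, pass_rate=0.90),
    Tier("sonnet", relative_cost=4.0, pass_rate=0.96),
    Tier("opus", relative_cost=20.0, pass_rate=0.99),
]


def pick_tier(required_pass_rate: float, tiers=TIERS) -> Tier:
    """Tier decision: the cheapest tier whose measured pass rate clears the bar."""
    for tier in sorted(tiers, key=lambda t: t.relative_cost):
        if tier.pass_rate >= required_pass_rate:
            return tier
    # No tier clears the bar: fall back to the one with the highest pass rate.
    return max(tiers, key=lambda t: t.pass_rate)


def should_escalate(confidence: float, high_stakes: bool, threshold: float = 0.8) -> bool:
    """Escalation decision: climb when the item is high-stakes or the cheap tier is unsure."""
    return high_stakes or confidence < threshold
```

With these placeholder numbers, `pick_tier(0.95)` selects the mid tier rather than the flagship, which is the "smallest model that clears the bar" rule in code form.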

Worked example - "process a queue of support tickets."

Triage on the cheap tier: classify and route every ticket on the throughput model.
Handle the routine on the mid tier: standard replies and lookups go to the daily driver.
Escalate the hard cases to the strong tier: ambiguous or high-stakes tickets, where an error is expensive, get the flagship.
Verify the escalations: the strong tier handles the work where reliability matters most, so spend your verification effort on its output rather than on the triage step.

That is Task-Tier Routing: cheap by default, strong where it counts.
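
The ticket queue above can be sketched as a small pipeline. This is an illustration under stated assumptions: `triage` is a keyword rule standing in for a cheap-tier classification call, and the `Ticket` class, the keyword list, and the output fields are all hypothetical. No real model API is called.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    id: int
    text: str


# Hypothetical markers of a high-stakes ticket; tune these for your own queue.
RISKY_WORDS = ("refund", "legal", "outage", "security")


def triage(ticket: Ticket) -> str:
    """Stand-in for the throughput tier: label a ticket 'routine' or 'hard'."""
    text = ticket.text.lower()
    return "hard" if any(word in text for word in RISKY_WORDS) else "routine"


def route(ticket: Ticket) -> dict:
    """Send routine tickets to the mid tier and hard ones to the strong tier."""
    label = triage(ticket)
    return {
        "ticket_id": ticket.id,
        "triaged_by": "haiku",
        "handled_by": "opus" if label == "hard" else "sonnet",
        # Verification effort is spent only on escalated tickets.
        "needs_verification": label == "hard",
    }


def process_queue(tickets: list[Ticket]) -> list[dict]:
    return [route(t) for t in tickets]
```

A usage sketch: `process_queue([Ticket(1, "How do I reset my password?"), Ticket(2, "I want a refund and will pursue legal action")])` sends the first ticket to the mid tier and marks only the second for escalation and verification.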

A name for the trap: the Convenience Tax

The Convenience Tax - the premium you pay for routing all work to one big model because it is easier than choosing. It shows up as a larger bill, slower responses, and reliability effort spread thin across work that did not need it. You stop paying the Convenience Tax by routing per task tier and escalating selectively, not by buying a bigger default.
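
One way to put a number on the Convenience Tax is to compare the all-flagship bill with the routed bill for the same workload. The sketch below assumes a flat per-item cost for each tier. The `convenience_tax` function and all volumes and costs are hypothetical placeholders, not real pricing.

```python
def convenience_tax(volume_by_tier: dict, cost_per_item: dict, flagship: str = "opus") -> float:
    """Extra spend from sending every item to the flagship instead of routing by tier."""
    total_items = sum(volume_by_tier.values())
    all_flagship_bill = total_items * cost_per_item[flagship]
    routed_bill = sum(count * cost_per_item[tier] for tier, count in volume_by_tier.items())
    return all_flagship_bill - routed_bill


# Hypothetical workload: how many items each tier would handle under routing.
volume = {"haiku": 800, "sonnet": 150, "opus": 50}
# Hypothetical relative cost per item, in arbitrary units.
cost = {"haiku": 1.0, "sonnet": 4.0, "opus": 20.0}

tax = convenience_tax(volume, cost)
```

With these placeholder numbers the all-flagship bill is 20,000 units against a routed bill of 2,400, so the tax is 17,600 units. Only the shape of the calculation is meant to carry over; the inputs must come from your own usage data.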

Why it matters for CCA-F

This sits in D5 - Context Management & Reliability, which is 15% of the exam and connects to context window, prompt caching, and evaluation.

The proprietary read: D5 questions reward right-sizing under cost and reliability limits, not maximizing capability.

Old instinct: the biggest model is the safe choice.
D5 instinct: the smallest model that meets the bar, with the flagship reserved for the reliability-critical step.

The distractor pattern to memorize. On D5 scenarios about runaway cost, latency, or reliability, the trap answer is "upgrade to the biggest model." The architecturally correct move is one of:

Route by task tier (smallest sufficient model per unit of work), or
Escalate selectively (cheap tier triages, strong tier finishes), or
Reserve the flagship for the critical step (and verify there, not everywhere).

See the Developer productivity agent scenario for a mixed-tier workflow in practice.

How to apply it

Stop defaulting to the flagship. Start every task from the smallest tier that could plausibly meet the bar.
Triage cheap, finish strong. Let a throughput model route, and escalate only the hard cases.
Tie tier to stakes. Reserve the strongest model for work where one error breaks the system.
Evaluate before you trust a downgrade. Prove the cheaper tier meets the bar on the real task.
Pair routing with caching. Cheap tier plus caching is where the cost savings compound.
Measure the Convenience Tax. If everything runs on the flagship, that uniform bill is the tax.

The meta-skill, and the D5 exam skill, is the same: capability per dollar comes from routing work to the right tier, not from a bigger default.
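
"Evaluate before you trust a downgrade" can be sketched as a small pass-rate check. This assumes exact-match grading on labeled cases, which is a simplification; `pass_rate`, `safe_to_downgrade`, and the stub model below are hypothetical names for illustration, and a real check would call the cheaper tier in place of the stub.

```python
from typing import Callable, Sequence, Tuple

Case = Tuple[str, str]  # (input, expected output)


def pass_rate(model_fn: Callable[[str], str], cases: Sequence[Case]) -> float:
    """Fraction of cases where the model's output exactly matches the expected one."""
    if not cases:
        raise ValueError("need at least one evaluation case")
    passed = sum(1 for prompt, expected in cases if model_fn(prompt) == expected)
    return passed / len(cases)


def safe_to_downgrade(cheap_fn: Callable[[str], str], cases: Sequence[Case], bar: float) -> bool:
    """True only if the cheaper tier meets the bar on the real task's cases."""
    return pass_rate(cheap_fn, cases) >= bar


def stub_cheap_model(prompt: str) -> str:
    """Stub standing in for a cheaper tier: uppercases, but fails on 'd'."""
    return "?" if prompt == "d" else prompt.upper()


cases = [("a", "A"), ("b", "B"), ("c", "C"), ("d", "D")]
```

Here the stub passes three of four cases, so it clears a 0.7 bar but not a 0.9 bar. The bar comes from the stakes of the task, not from the model.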

01 · Read next in the pillars

Where this lands in the exam-prep map

Each blog post bridges into the evergreen pillars. These are the most relevant follow-ups for this story.

Concept: Context window
Tier choice interacts with context: a long, reliability-sensitive job is where the strongest tier earns its cost. D5 lives here.

Concept: Prompt caching
Routing cheap work to a smaller tier pairs with caching to control cost. Both are levers in the same budget decision.

Concept: Evaluation
You only know a smaller tier is good enough by evaluating it on the task. Routing without eval is guessing.

Scenario: Developer productivity agent
A scenario where mixed task tiers in one workflow make tier routing concrete: triage cheap, finish strong.

Exam Guide: CCA-F exam guide
D5 (Context Management & Reliability) is 15% of the exam and rewards right-sizing the model to the task and its reliability bar.

02 · FAQ

6 questions answered

When should you use Opus vs Sonnet vs Haiku?

Haiku for high-volume, low-stakes work like classification, routing, and triage. Sonnet for daily professional work and standard coding. Opus for accuracy-critical, multi-step tasks where one mistake breaks the system. The rule is task tier, not 'always pick the biggest.'

Why is using the biggest model for everything a mistake?

It pays flagship cost and latency for work a smaller tier handles fine. That is a convenience tax, not a strategy. Worse, it hides where reliability actually matters, because everything gets the same treatment regardless of stakes.

What is Task-Tier Routing?

Matching each unit of work to the smallest model that meets its accuracy and reliability bar: a cheap tier for volume, a mid tier for daily work, the strongest tier for the final accuracy-critical stretch. Often a small model triages and escalates only the hard cases.

How does context management affect tier choice?

Long-context, reliability-sensitive jobs (where losing a detail mid-document is costly) are exactly where the strongest tier earns its price. Short, low-stakes work does not need it. Tier choice is partly a context-and-reliability decision, which is why it sits in D5.

How do you know a cheaper tier is good enough?

Evaluate it on the actual task; do not assume. Route to the smaller tier, measure accuracy and failure cost against your bar, and escalate only what fails. Routing without evaluation is guessing dressed up as optimization.

How does this show up on the CCA-F exam (D5)?

D5 (Context Management & Reliability) is 15% of the exam. Expect scenarios about runaway cost or latency where the trap answer is 'upgrade to the biggest model.' The correct answer is to route by task tier, escalate selectively, and reserve the strongest model for the reliability-critical step.

Synthesized from research output on 2026-06-03. LinkedIn cross-post pending.
Last reviewed 2026-06-03.
