Quick answer
Route work by task tier; do not default to the biggest model. Use Haiku for high-volume, low-stakes work, Sonnet for daily work, and Opus for accuracy-critical, multi-step tasks. Using one model for everything is not simplicity; it is a convenience tax. For CCA-F D5, the skill is matching the model to the task under real context and reliability limits.
What changed
The reflex of the last cycle was simple: pick the most capable model and use it for everything. That reads as caution. It is actually waste, because most work does not need the flagship and most budgets cannot afford to run it everywhere.
The shift now is tiered specialization: a clear hierarchy of models chosen per task by cost, speed, and the reliability the task demands (🟢 first-hand: Claude ships in distinct tiers, Haiku, Sonnet, and Opus, built for different operating points).
The mental model:
- Haiku is the throughput layer. Real-time classification, log analysis, routing, low-stakes triage.
- Sonnet is the daily driver. Standard professional work, content, routine coding.
- Opus is the final stretch. Accuracy-critical, multi-step work where one error breaks the system.
One model for everything vs. task-tier routing
| Dimension | Biggest model everywhere | Task-Tier Routing |
|---|---|---|
| Cost | Flagship price on trivial work | Cheap tier for volume, flagship for the hard part |
| Latency | Slow on tasks that need speed | Fast where speed matters |
| Reliability focus | Uniform: stakes are invisible | Concentrated on the critical step |
| Volume work | Cost-prohibitive at scale | Viable on the throughput tier |
| Strategy | "Always pick the biggest" | Smallest model that meets the bar |
| Right setting | A one-off where cost is irrelevant | Anything at scale |
How tier routing actually works
Routing is two decisions, made per unit of work.
- Tier (which model). Match the task's accuracy and reliability bar to the smallest model that clears it. Do not reach for the flagship by default.
- Escalation (when to climb). Let a cheap tier handle the common case and escalate only the hard or high-stakes items. A small model can even triage and route to a larger one.
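The two decisions above can be sketched as two small functions. This is a minimal illustration, not a reference implementation: the tier names are the labels used in this post rather than API model identifiers, and the pass rates in `ASSUMED_PASS_RATE` are placeholder numbers that you would replace with results from your own evaluation.

```python
from typing import Dict, List

# Tiers ordered cheapest first. These are tier labels from the text,
# not API model identifiers.
TIERS: List[str] = ["haiku", "sonnet", "opus"]

# Hypothetical pass rates per tier on some task. Placeholder values,
# not measurements; substitute numbers from your own evaluation.
ASSUMED_PASS_RATE: Dict[str, float] = {
    "haiku": 0.90,
    "sonnet": 0.96,
    "opus": 0.99,
}


def pick_tier(required_pass_rate: float) -> str:
    """Tier decision: return the smallest tier that clears the bar."""
    for tier in TIERS:
        if ASSUMED_PASS_RATE[tier] >= required_pass_rate:
            return tier
    # Nothing clears the bar, so fall back to the strongest tier.
    return TIERS[-1]


def escalate(tier: str) -> str:
    """Escalation decision: climb one tier, staying put at the top."""
    index = TIERS.index(tier)
    return TIERS[min(index + 1, len(TIERS) - 1)]
```

Under these assumed numbers, a task with a 0.95 bar starts on the mid tier, and only an item that fails there climbs to the flagship.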
Worked example - "process a queue of support tickets."
- Triage on the cheap tier: classify and route every ticket on the throughput model.
- Handle the routine on the mid tier: standard replies and lookups go to the daily driver.
- Escalate the hard cases to the strong tier: ambiguous or high-stakes tickets, where an error is expensive, get the flagship.
- Verify the escalations: the strong tier handles the tickets where reliability matters most, so concentrate output checks there rather than on the triage step.
That is Task-Tier Routing: cheap by default, strong where it counts.
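The ticket queue above might look like this in code. It is a sketch under stated assumptions: `triage` uses keyword rules as a stand-in for the throughput-tier classifier, the `ROUTE` table is hypothetical, and `call_model` and `verify` are caller-supplied functions, so nothing here depends on a real client library.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical mapping from triage label to tier. Tier names are labels
# from the text, not API model identifiers.
ROUTE: Dict[str, str] = {
    "routine": "sonnet",
    "ambiguous": "opus",
    "high_stakes": "opus",
}


def triage(ticket: str) -> str:
    """Stand-in for the cheap-tier classifier.

    Keyword rules keep the sketch runnable; a real system would call
    the throughput model here instead.
    """
    text = ticket.lower()
    if any(word in text for word in ("refund", "legal", "outage")):
        return "high_stakes"
    if len(text.split()) < 3:
        return "ambiguous"  # too short to classify with confidence
    return "routine"


def process_queue(
    tickets: List[str],
    call_model: Callable[[str, str], str],
    verify: Callable[[str], bool],
) -> List[Dict[str, object]]:
    """Triage every ticket, route it to a tier, verify only escalations."""
    results: List[Dict[str, object]] = []
    for ticket in tickets:
        label = triage(ticket)
        tier = ROUTE[label]
        reply = call_model(tier, ticket)
        # Verification effort is spent only on escalated tickets.
        verified: Optional[bool] = verify(reply) if tier == "opus" else None
        results.append(
            {"ticket": ticket, "label": label, "tier": tier,
             "reply": reply, "verified": verified}
        )
    return results
```

Passing the model call and the verifier in as functions keeps the routing logic testable without network access; in production they would wrap whichever client and checks you actually use.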
A name for the trap: the Convenience Tax
The Convenience Tax - the premium you pay for routing all work to one big model because it is easier than choosing. It shows up as a larger bill, slower responses, and reliability effort spread thin across work that did not need it. You stop paying the Convenience Tax by routing per task tier and escalating selectively, not by buying a bigger default.
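To make the tax concrete, here is a back-of-the-envelope calculation. The per-task costs in `ASSUMED_COST` are arbitrary illustrative units, not real prices, and the sketch ignores the small extra cost of the triage call itself.

```python
from typing import Dict

# Hypothetical relative cost per task for each tier, in arbitrary units.
# These are illustrative placeholders, not published prices.
ASSUMED_COST: Dict[str, float] = {"haiku": 1.0, "sonnet": 4.0, "opus": 20.0}


def convenience_tax(task_counts: Dict[str, int], flagship: str = "opus") -> float:
    """Flagship-everywhere bill minus the tier-routed bill.

    task_counts maps a tier name to the number of tasks that tier is
    sufficient for.
    """
    total_tasks = sum(task_counts.values())
    flagship_bill = total_tasks * ASSUMED_COST[flagship]
    routed_bill = sum(
        count * ASSUMED_COST[tier] for tier, count in task_counts.items()
    )
    return flagship_bill - routed_bill


# Example workload: 1,000 tasks, most of them low-stakes.
workload = {"haiku": 800, "sonnet": 150, "opus": 50}
print(convenience_tax(workload))  # 17600.0 under the assumed costs
```

With these assumed numbers the flagship-only bill is 20,000 units against 2,400 routed, so the tax is 17,600 units. The exact ratio depends entirely on your own prices and task mix.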
Why it matters for CCA-F
This sits in D5 - Context Management & Reliability, which is 15% of the exam and connects to context window, prompt caching, and evaluation.
The proprietary read: D5 questions reward right-sizing under cost and reliability limits, not maximizing capability.
- Old instinct: the biggest model is the safe choice.
- D5 instinct: the smallest model that meets the bar, with the flagship reserved for the reliability-critical step.
The distractor pattern to memorize. On D5 scenarios about runaway cost, latency, or reliability, the trap answer is "upgrade to the biggest model." The architecturally correct move is one of:
- Route by task tier (smallest sufficient model per unit of work), or
- Escalate selectively (cheap tier triages, strong tier finishes), or
- Reserve the flagship for the critical step (and verify there, not everywhere).
See the developer productivity agent scenario for a mixed-tier workflow in practice.
How to apply it
- Stop defaulting to the flagship. Start every task from the smallest tier that could plausibly meet the bar.
- Triage cheap, finish strong. Let a throughput model route, and escalate only the hard cases.
- Tie tier to stakes. Reserve the strongest model for work where one error breaks the system.
- Evaluate before you trust a downgrade. Prove the cheaper tier meets the bar on the real task.
- Pair routing with caching. Cheap tier plus caching is where the cost savings compound.
- Measure the Convenience Tax. If everything runs on the flagship, that uniform bill is the tax.
The meta-skill, and the D5 exam skill, is the same: capability per dollar comes from routing work to the right tier, not from a bigger default.
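The "evaluate before you trust a downgrade" step can be sketched as a small check. This assumes exact-match grading against labeled cases, which is a simplification; `model_fn` is a hypothetical callable wrapping whichever tier you are testing.

```python
from typing import Callable, List, Tuple

Case = Tuple[str, str]  # (prompt, expected answer)


def pass_rate(model_fn: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of cases where the model output exactly matches the label."""
    if not cases:
        raise ValueError("need at least one evaluation case")
    passed = sum(1 for prompt, expected in cases if model_fn(prompt) == expected)
    return passed / len(cases)


def safe_to_downgrade(
    cheaper_model_fn: Callable[[str], str],
    cases: List[Case],
    required_pass_rate: float,
) -> bool:
    """True only if the cheaper tier meets the bar on the real task."""
    return pass_rate(cheaper_model_fn, cases) >= required_pass_rate
```

The cases should come from the real workload rather than a generic benchmark; for open-ended tasks you would swap exact match for a grader suited to the output.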
Where this lands in the exam-prep map
Each blog post bridges into the evergreen pillars. These are the most relevant follow-ups for this story.
- Concept: Context window. Tier choice interacts with context: a long, reliability-sensitive job is where the strongest tier earns its cost. D5 lives here.
- Concept: Prompt caching. Routing cheap work to a smaller tier pairs with caching to control cost. Both are levers in the same budget decision.
- Concept: Evaluation. You only know a smaller tier is good enough by evaluating it on the task. Routing without eval is guessing.
- Scenario: Developer productivity agent. A scenario where mixed task tiers in one workflow make tier routing concrete: triage cheap, finish strong.
- Exam Guide: CCA-F exam guide. D5 (Context Management & Reliability) is 15% of the exam and rewards right-sizing the model to the task and its reliability bar.
6 questions answered
When should you use Opus vs Sonnet vs Haiku?
Why is using the biggest model for everything a mistake?
What is Task-Tier Routing?
How does context management affect tier choice?
How do you know a cheaper tier is good enough?
How does this show up on the CCA-F exam (D5)?
Synthesized from research output on 2026-06-03.
Last reviewed 2026-06-03.
