TL;DR
- Hermes Agent is a coordinator from Nous Research that routes work between Claude Code and Codex under one persistent project memory
- The pattern is Specialist Routing: pick the cheaper specialist per task instead of running everything through one big model
- Claude Opus 4.7 wins SWE-bench Verified (87.6%); Codex GPT-5.5 wins Terminal-Bench (82.7%) - different benchmarks, different specialists
- This is the canonical CCA-F D1 hub-and-spoke pattern, now surfaced in production tooling
- Worth ~3-4 questions on the exam (D1 = 27% of total)
Quick answer
Hermes Agent is an orchestration layer from Nous Research that coordinates multiple specialist coding agents - most commonly Claude Code and Codex - under one persistent context. It is not another coder. It is the coordinator that picks the right specialist per task. The pattern is called Specialist Routing, and it is the production-shaped version of the CCA-F D1 hub-and-spoke pattern.
What just happened
Hermes Agent shipped v0.8 in late April 2026, and the design choice it codifies matters more than the version bump. Hermes is no longer a single-model wrapper. It is a persistent coordinator that maintains long-term project memory and routes tasks to specialist workers.
Two of those workers are Claude Code (the reasoning-heavy "Lead Engineer" powered by Claude Opus 4.7) and Codex (the rapid-scaffolding "Autonomous Worker" powered by GPT-5.5). The pattern Hermes operationalizes is exactly the agent orchestration shape the CCA-F D1 blueprint calls hub-and-spoke with subagent isolation - except now you can see it in production tooling instead of just in exam prose.
That alignment is why this release is exam-relevant, not just news. Production teams reaching for orchestration are reaching for the same primitives the certification tests.
Why this matters now
The AI coding-agent space spent 2025 in a "single agent does everything" model - Cursor, Copilot, Cody, even Claude Code each tried to be the one tool you reach for. Through 2026 that consolidation is reversing. Hermes' release is part of a wave: Aider's MoE routing, Cline's specialist-mode toggles, and now Hermes' explicit Claude-Code-plus-Codex orchestration. The bet is the same one the CCA-F D1 blueprint already tests: one big model is rarely the right answer when several smaller specialists are cheaper and better at narrow tasks.
The "use multiple specialists" framing has a real failure mode, though. Each specialist needs an isolated context, a clear task prompt, and a return contract - and that integration overhead is non-trivial. Hermes addresses those problems with explicit features (Kanban for task isolation, Task Completion Judgment for the return contract, immutable skills to protect against self-improvement loops). A team that just wires Claude Code + Codex together without those guardrails will burn more time on coordination than it saves on inference. The CCA-F's D1 distractors are calibrated to this exact trap.
For teams building in production, the Hermes pattern argues for a coordinator-first stack rather than a model-first stack. Pick your coordinator (Hermes, Aider's MoE mode, or your own subagent dispatcher) before you pick which models to coordinate. The coordinator is the API your team uses; the workers behind it are interchangeable. Treat the coordinator as the load-bearing piece, version-pin it, and add observability around its task assignments. That's also what the canonical multi-agent-research-system scenario teaches.
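Below is a minimal Python sketch of the coordinator-first shape described above. It also covers the isolated-context and return-contract requirements from the previous paragraph. It is not Hermes' API: TaskSpec, TaskResult, Worker, Coordinator, FakeWorker and the worker names "claude-code" and "codex" are all hypothetical. Routing is a plain task-type lookup. Each task ID is pinned to the first worker that receives it, which acts as a Kanban-style lock, and every assignment is logged.

```python
import logging
from dataclasses import dataclass
from typing import Protocol

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("coordinator")


@dataclass(frozen=True)
class TaskSpec:
    task_id: str
    task_type: str            # routing key, e.g. "refactor" or "scaffold"
    prompt: str               # clear, self-contained task prompt
    context: tuple[str, ...]  # only the files/notes this task needs


@dataclass(frozen=True)
class TaskResult:             # the return contract every worker must honour
    task_id: str
    worker: str
    summary: str
    changed_files: tuple[str, ...]


class Worker(Protocol):
    name: str

    def run(self, spec: TaskSpec) -> TaskResult: ...


class Coordinator:
    """Owns project memory and routing; never edits code itself."""

    def __init__(self, routes: dict[str, Worker]) -> None:
        self.routes = routes                   # task_type -> interchangeable worker
        self.memory: list[TaskResult] = []     # long-term project memory
        self.assignments: dict[str, str] = {}  # task_id -> worker name (Kanban-style pin)

    def dispatch(self, spec: TaskSpec) -> TaskResult:
        worker = self.routes[spec.task_type]
        pinned = self.assignments.setdefault(spec.task_id, worker.name)
        if pinned != worker.name:
            raise RuntimeError(f"{spec.task_id} is already pinned to {pinned}")
        log.info("assign %s (%s) -> %s", spec.task_id, spec.task_type, worker.name)
        result = worker.run(spec)              # worker sees spec only, never self.memory
        self.memory.append(result)
        return result


class FakeWorker:
    """Stand-in for a real agent; echoes the task back as its result."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, spec: TaskSpec) -> TaskResult:
        return TaskResult(spec.task_id, self.name, f"done: {spec.prompt}", spec.context)
```

Swapping one scaffolding agent for another means changing a single entry in routes, and nothing that calls dispatch has to change. That is the coordinator-first argument in miniature. The per-assignment log line is the cheapest form of the observability recommended above.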
The CCA-F has at least four kinds of D1 questions that this pattern lights up directly: hub-and-spoke vs flat agent ensembles (correct: hub-and-spoke), context inheritance in subagents (correct: subagents do not inherit), cost-aware model routing (correct: route cheap work to cheaper specialists), and verification loops at handoffs (correct: structured HITL with rollback below threshold). Knowing Hermes is the worked example for all four lets you map any D1 question to a known production pattern within 60 seconds.
The open questions are around skill marketplaces and protocol drift. Hermes deliberately uses self-generated skills rather than a public marketplace - that's a security-driven choice, but it limits ecosystem velocity. If a marketplace standard emerges (MCP-style, but for agent skills rather than tools), the next generation of orchestrators may bypass Hermes entirely. The competing bet is OpenClaw (marketplace-heavy, faster ecosystem, more CVEs). Whichever wins shapes what D1 looks like on the 2027 version of the exam.
5 things that matter for the exam
- Put the orchestrator in charge of context, not coding. Hermes owns long-term memory and task hand-offs. The workers do not share context with each other - they share context only with the coordinator. That is the canonical D1 subagent isolation rule, surfaced as a real product feature.
- Split work by benchmark strength. Claude Opus 4.7 hits 87.6% on SWE-bench Verified - best for refactors, security review, architecture. Codex GPT-5.5 hits 82.7% on Terminal-Bench 2.0 - best for scaffolding, tests, lint, terminal-heavy chores. Different benchmarks, different specialists.
- Lock task boundaries to prevent context bleed. Hermes ships a Kanban feature that pins a task to one agent. Without explicit task boundaries, both workers touch everything and context leakage tanks reliability. Same lesson as the canonical subagent isolation rule on the exam.
- Watch the token-drain refactor loop. Claude Code can consume up to 4× more tokens than Codex on the same task, and the dollar gap can be wider still: one reported refactor cost $155 with Opus vs $15 with Codex, roughly 10×. Use Codex first for scaffolding; use Claude only for final architectural review. Cost-aware routing is part of D1 model selection.
- Run the night-shift in Docker, with a PTY for risky commands. Autonomous overnight runs are now common. Isolate the workers in Docker and run hermes start --pty so Claude Code pauses for human confirmation on rm -rf, git push --force, and other destructive commands. This is hooks-as-product, and it is exam-relevant for D3.
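The confirmation step in the last item can be approximated by a pre-execution check like the one below. This is a hypothetical sketch, not how hermes start --pty works internally. The names is_destructive and run_with_gate are illustrative. The rules cover only the two examples in the text: rm with recursive and force flags, and git push with --force or -f. The confirm and execute callables are injected, so a terminal prompt, a chat approval or a test stub can sit behind them.

```python
import shlex
from typing import Callable


def is_destructive(command: str) -> bool:
    """Flag the two destructive patterns named above: rm -rf and git push --force."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    prog, args = tokens[0], tokens[1:]
    # Collapse short flags so "-rf", "-fr" and "-r -f" all look the same.
    short = "".join(a[1:] for a in args if a.startswith("-") and not a.startswith("--"))
    if prog == "rm":
        recursive = "r" in short.lower() or "--recursive" in args
        force = "f" in short or "--force" in args
        return recursive and force
    if prog == "git" and args[:1] == ["push"]:
        return "--force" in args or "-f" in args
    return False


def run_with_gate(
    command: str,
    confirm: Callable[[str], bool],
    execute: Callable[[str], str],
) -> str:
    """Ask a human before destructive commands; run everything else directly."""
    if is_destructive(command) and not confirm(command):
        return f"blocked: {command}"
    return execute(command)
```

In an interactive session, confirm could be lambda cmd: input(f"Run {cmd!r}? [y/N] ").strip().lower() == "y". A deny-list like this is easy to slip past. For example, git -C repo push --force is not caught, because the first argument is -C rather than push. Treat it as a last line of defense inside the Docker sandbox, not a replacement for it.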
3 production patterns the Hermes release codifies
- The Coordinator pattern. One agent owns long-term project memory; specialist workers spin up per task with fresh context and a narrow tool whitelist. The coordinator never codes - it routes. Mirrors the CCA-F D1 expectation that subagents do not inherit parent context.
- The Specialist-by-benchmark pattern. Don't pick one model and route everything to it. Pick the model per task type based on benchmark strength: Opus for reasoning-heavy multi-file edits, Codex for terminal-heavy single-shot commands. The same idea applies to Haiku for cheap classification.
- The Verification-loop pattern. After workers complete, the coordinator runs a Task Completion Judgment check. If the score is below 0.9, it rolls back changes and escalates to a human. This is the structured HITL handoff the CCA-F D5 questions probe - operationalized.
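The verification loop boils down to four steps: checkpoint, let the worker edit, score, then keep the changes or roll them back. The Python below is a minimal sketch of that flow. The workspace is assumed to be an in-memory dict of file contents. The callables judge, apply_changes and escalate are hypothetical stand-ins for Hermes' Task Completion Judgment, a worker run and the human handoff. Only the 0.9 threshold comes from the text above.

```python
from dataclasses import dataclass
from typing import Callable

THRESHOLD = 0.9  # the cut-off quoted above; tune per project


@dataclass
class Outcome:
    accepted: bool
    score: float
    escalated: bool


def verify_and_commit(
    workspace: dict[str, str],
    apply_changes: Callable[[dict[str, str]], None],
    judge: Callable[[dict[str, str]], float],
    escalate: Callable[[float], None],
    threshold: float = THRESHOLD,
) -> Outcome:
    """Checkpoint, let the worker edit, score the result, then keep or roll back."""
    snapshot = dict(workspace)      # checkpoint before the worker touches anything
    apply_changes(workspace)        # worker edits the workspace in place
    score = judge(workspace)        # completion score in [0, 1]
    if score >= threshold:
        return Outcome(accepted=True, score=score, escalated=False)
    workspace.clear()
    workspace.update(snapshot)      # roll back to the checkpoint
    escalate(score)                 # hand off to a human with the score attached
    return Outcome(accepted=False, score=score, escalated=True)
```

A real setup would more likely checkpoint with git, using a stash or a throwaway branch, rather than copying a dict. The part that carries over is the control flow: nothing below the threshold is kept without a human seeing it.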
Three orchestration anti-patterns
- Skill overwriting on the self-improving loop. Hermes' self-improvement can clobber manually tuned skills. Mark them immutable: true in the skill markdown frontmatter - same protective discipline as pinned case-facts blocks in long Claude sessions.
- Letting Claude Code do the cheap work. If you route lint + scaffolding through Opus 4.7, you are paying premium prices for cheap output. This is the classic "bigger model" distractor the exam plants in D1 questions: the right fix is routing, not escalation.
- Skipping version pinning. Hermes shipped v0.1 → v0.8 in months. Pin your version in CI/CD; treat protocol drift as a real failure mode. The same discipline applies to MCP servers and the tools/list handshake.
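Here is one way a skill writer could honour the immutable flag from the first anti-pattern. The sketch assumes skills are markdown files whose frontmatter is flat key: value lines between --- fences, and a real YAML parser would be safer. The helpers parse_frontmatter, may_overwrite and write_skill are hypothetical, not Hermes functions.

```python
from pathlib import Path
from typing import Optional


def parse_frontmatter(text: str) -> dict[str, str]:
    """Read flat 'key: value' lines between the opening and closing '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def may_overwrite(existing: Optional[str]) -> bool:
    """A skill can be replaced unless its current frontmatter says immutable: true."""
    if existing is None:
        return True                                 # brand-new skill, nothing to protect
    return parse_frontmatter(existing).get("immutable", "").lower() != "true"


def write_skill(path: Path, new_text: str) -> bool:
    """Write a (re)generated skill; return False if a pinned skill blocked the write."""
    current = path.read_text(encoding="utf-8") if path.exists() else None
    if not may_overwrite(current):
        return False                                # keep the manually tuned version
    path.write_text(new_text, encoding="utf-8")
    return True
```

The self-improvement loop would call write_skill rather than writing files directly. A tuned skill marked immutable: true then survives even when the loop proposes a "better" version.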
How this shows up on the exam
Domain 1 (Agentic Architecture & Orchestration) - 27% of the exam, the biggest single domain. Expect at least two questions whose correct answer is some version of "pick the smaller specialist agent for the cheap step and route to the larger reasoning agent only for the architectural step." The distractors will suggest "increase model size", "lengthen context window", or "add a system prompt that lists every step" - all wrong. Hermes' Specialist Routing is the canonical answer pattern surfaced in production form.
For study-next, pair this post with the Subagents concept page (isolation rule), the Multi-agent research system scenario (the canonical D1 build-along), and the Day-of distractor patterns in the Exam Guide (especially the Model vs Design trap). Hermes is the reference implementation; the exam will probe the principle behind it.
Sources
- Anthropic Claude Opus 4.7 release notes (April 2026)
- Nous Research Hermes Agent docs
- CatDoes - token-economics comparison (April 30, 2026)
- Reddit r/hermesagent - Nightshift pattern (April 29, 2026)
- Blake Crosley AI Research - SWE-bench / Terminal-Bench data (April 23, 2026)
Where this lands in the exam-prep map
Each blog post bridges into the evergreen pillars. These are the most relevant follow-ups for this story.
- Concept: Subagents. Hermes is the textbook coordinator; the workers are subagents with isolated context.
- Scenario: Multi-agent research. Same hub-and-spoke pattern at production research scale.
- Knowledge: Claude Code 101. Foundation course on the daily-driver workflow Hermes is composing.
- Exam Guide: Day-of distractor patterns. The 'pick the bigger model' distractor is exactly what Specialist Routing solves.
7 questions answered
What is Hermes Agent and how is it different from Claude Code?
What is Specialist Routing in agent design?
Why does Claude Opus 4.7 beat Codex on SWE-bench but lose on Terminal-Bench?
Does Hermes Agent map to a CCA-F exam domain?
How does subagent context isolation actually work?
What is the Hermes Nightshift pattern?
Should I use Hermes for the exam or Claude Code directly?
Synthesized from research output on 2026-05-02. LinkedIn cross-post pending.
Last reviewed 2026-05-06.
