Pillar 9 · Blog · 2026-05-02

Hermes Agent Orchestrates Claude Code + Codex

Hermes positions itself as the orchestrator, not another coder. Claude Opus 4.7 leads SWE-bench Verified (87.6%) and Codex GPT-5.5 leads Terminal-Bench 2.0 (82.7%); Hermes' job is to pick the right specialist per task. This is Specialist Routing: the canonical D1 hub-and-spoke pattern arriving in production tooling.

Tags: D1 · D3 · hermes · orchestration · claude-code
Image: Painterly walnut relay desk: two sealed envelopes labelled Claude Code and Codex passing between three Loops; the central Loop signs a coordination ledger.

What just happened

Hermes Agent — Nous Research's orchestrator — shipped v0.8 in late April, and the design choice it codifies matters more than the version bump. Hermes is no longer a single-model wrapper; it is a persistent coordinator that maintains long-term project memory and routes tasks to specialist workers. Two of those workers are Claude Code (the reasoning-heavy 'Lead Engineer') and Codex (the rapid-scaffolding 'Autonomous Worker'). The pattern Hermes is operationalizing is the same one the CCA-F D1 blueprint calls hub-and-spoke with subagent isolation.
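The hub-and-spoke shape described above can be sketched in a few lines of Python. This is a minimal illustration, not Hermes' actual implementation: `Coordinator`, `brief_for`, and `dispatch` are hypothetical names, and the two workers are stand-in functions rather than real Claude Code or Codex sessions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Coordinator:
    """Hypothetical hub: owns project memory and hands each worker a brief."""
    workers: Dict[str, Callable[[str], str]]
    memory: List[str] = field(default_factory=list)  # long-term project notes

    def brief_for(self, task: str) -> str:
        # A worker sees only the task plus coordinator-curated notes,
        # never another worker's raw transcript.
        notes = "; ".join(self.memory[-3:]) or "none"
        return f"Task: {task}\nProject notes: {notes}"

    def dispatch(self, worker_name: str, task: str) -> str:
        result = self.workers[worker_name](self.brief_for(task))
        # Only a short summary flows back into the shared memory.
        self.memory.append(f"{worker_name} finished '{task}'")
        return result


# Stand-in workers (assumptions): real ones would launch an agent session.
def fake_claude_code(brief: str) -> str:
    return f"[claude-code] reviewed -> {brief.splitlines()[0]}"


def fake_codex(brief: str) -> str:
    return f"[codex] scaffolded -> {brief.splitlines()[0]}"


hub = Coordinator(workers={"claude-code": fake_claude_code, "codex": fake_codex})
first = hub.dispatch("codex", "scaffold the API tests")
second = hub.dispatch("claude-code", "review the auth module")
print(first)
print(second)
print(hub.memory)
```

The point of the shape is that neither worker function ever receives the other's output directly; anything they need to know about each other has to pass through `memory`, which the coordinator controls.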

5 things that matter for the exam

  1. Put the orchestrator in charge of context, not coding. Hermes owns long-term memory and task hand-offs. The workers don't share context with each other, only with the coordinator. That is the canonical D1 isolation rule.
  2. Split work by benchmark strength. Claude Opus 4.7 hits 87.6% on SWE-bench Verified — best for refactors, security review, architecture. Codex GPT-5.5 hits 82.7% on Terminal-Bench 2.0 — best for scaffolding, tests, lint, terminal-heavy chores.
  3. Lock task boundaries to prevent context bleed. Hermes ships a 'Kanban' feature that pins a task to one agent. Without it, both workers touch everything and context leakage tanks reliability. Same lesson as the subagent isolation rule.
  4. Watch the token-drain refactor loop. Claude Code can consume up to 4× more tokens than Codex on the same task, and the bill can diverge further: one reported case came to $155 vs $15, roughly a 10× gap. Use Codex first for scaffolding; use Claude only for final architectural review.
  5. Run the night-shift in Docker, with a PTY for risky commands. Autonomous overnight runs are now common. Use Docker isolation and hermes start --pty so Claude Code pauses for human confirmation on rm, git push, and other destructive commands.
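To make item 5 concrete, here is a rough sketch of the kind of check a confirm-before-destructive-commands pause implies. It does not describe how `hermes start --pty` works internally, which this post does not cover; `needs_confirmation` and `run_guarded` are hypothetical helpers, and the command list contains only the two examples named above.

```python
import shlex

# rm and git push come from the list above; anything else added here
# would be a local policy choice, not something taken from Hermes.
DESTRUCTIVE_PROGRAMS = {"rm"}
DESTRUCTIVE_GIT_SUBCOMMANDS = {"push"}


def needs_confirmation(command: str) -> bool:
    """Return True if a shell command should pause for a human."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    program = tokens[0]
    if program in DESTRUCTIVE_PROGRAMS:
        return True
    if program == "git" and len(tokens) > 1:
        return tokens[1] in DESTRUCTIVE_GIT_SUBCOMMANDS
    return False


def run_guarded(command: str, confirm) -> str:
    """Hypothetical gate: ask `confirm` before anything destructive."""
    if needs_confirmation(command) and not confirm(command):
        return "skipped"
    return "approved"  # a real runner would execute the command here


# Overnight run with nobody at the keyboard: every prompt is declined.
print(run_guarded("rm -rf build", confirm=lambda cmd: False))
print(run_guarded("git status", confirm=lambda cmd: False))
```

A first-token check like this is deliberately naive: it would miss `sudo rm`, `git -C repo push`, or a destructive command hidden inside a pipeline, which is one reason to keep the Docker isolation as the outer safety layer.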

Three orchestration anti-patterns

Skill overwriting on the self-improving loop

Hermes' self-improvement loop can overwrite manually tuned skills. To protect them, set immutable: true in each skill's Markdown frontmatter.
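As an illustration of what that flag is meant to guarantee, the sketch below refuses to replace a skill whose frontmatter carries `immutable: true`. Everything except the flag itself is an assumption: the helper names, the `name` field, and the naive `key: value` parsing are hypothetical, and Hermes' own loader may behave differently.

```python
def read_frontmatter(markdown: str) -> dict:
    """Parse simple `key: value` pairs from a leading --- block (naive, no YAML library)."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as no frontmatter


def is_immutable(markdown: str) -> bool:
    return read_frontmatter(markdown).get("immutable", "").lower() == "true"


def propose_update(current: str, proposed: str) -> str:
    """Keep the current skill text when it is marked immutable."""
    return current if is_immutable(current) else proposed


# Hypothetical hand-tuned skill file.
tuned_skill = """---
name: release-checklist
immutable: true
---
Run the smoke tests before tagging.
"""

print(propose_update(tuned_skill, "auto-generated rewrite") == tuned_skill)
```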

Letting Claude Code do the cheap work

If you route lint and scaffolding through Opus 4.7, you are paying premium prices for cheap output. This is the classic 'bigger model' distractor in CCA-F D1 questions.
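A quick back-of-the-envelope calculation shows why the routing matters. The sketch below reuses the single reported data point from this post ($155 vs $15 for the same task) as if it were a typical per-task cost, which is a strong simplifying assumption; `blended_cost` and the 20% share are illustrative only.

```python
CLAUDE_COST = 155.0  # USD, the reported Claude Code case cited in this post
CODEX_COST = 15.0    # USD, the reported Codex case cited in this post


def blended_cost(tasks: int, claude_share: float) -> float:
    """Total spend if `claude_share` of tasks go to Claude Code, the rest to Codex."""
    if not 0.0 <= claude_share <= 1.0:
        raise ValueError("claude_share must be between 0 and 1")
    claude_tasks = tasks * claude_share
    codex_tasks = tasks - claude_tasks
    return claude_tasks * CLAUDE_COST + codex_tasks * CODEX_COST


all_claude = blended_cost(10, 1.0)  # 10 * 155 = 1550.0
routed = blended_cost(10, 0.2)      # 2 * 155 + 8 * 15 = 430.0
print(all_claude, routed)
```

Under these assumptions, sending only two of ten tasks to the premium model cuts the bill to a bit over a quarter of the all-Claude figure.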

Skipping version pinning

Hermes shipped v0.1 → v0.8 in months. Pin your version in CI/CD; treat protocol drift as a real failure mode.
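One way to enforce a pin in a CI step is a small version comparison like the one below. It is a generic sketch: `matches_pin` is a hypothetical helper, the version strings are examples, and how you obtain the installed version is left out because this post does not say how Hermes reports it.

```python
def matches_pin(reported: str, pinned: str) -> bool:
    """True if `reported` starts with the same dotted components as `pinned`."""
    reported_parts = reported.lstrip("v").split(".")
    pinned_parts = pinned.lstrip("v").split(".")
    return reported_parts[: len(pinned_parts)] == pinned_parts


PINNED = "0.8"  # the release discussed in this post

for candidate in ("v0.8.1", "0.9.0"):
    status = "ok" if matches_pin(candidate, PINNED) else "drift: fail the build"
    print(candidate, status)
```

Comparing dotted components rather than raw string prefixes avoids treating a version such as 0.80 as a match for a 0.8 pin.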

06 · FAQ

3 questions answered

What is Specialist Routing?
A coordinator agent (here, Hermes) inspects each incoming task and routes it to the worker best-suited by benchmark and cost. Claude Code for reasoning-heavy tasks; Codex for terminal/scaffolding tasks. The coordinator owns context; workers run in isolation. This is the canonical CCA-F D1 hub-and-spoke pattern.
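As a rough sketch of that routing decision, the snippet below maps task categories to workers and pins each task to its first owner, in the spirit of the Kanban-style pinning mentioned earlier. The category names come from this post; the `ROUTES` table, the `Board` class, and the cheaper-worker default are illustrative assumptions, not Hermes' real configuration.

```python
from typing import Dict

# Categories taken from the post; the mapping itself is an illustrative assumption.
ROUTES: Dict[str, str] = {
    "refactor": "claude-code",
    "security-review": "claude-code",
    "architecture": "claude-code",
    "scaffolding": "codex",
    "tests": "codex",
    "lint": "codex",
    "terminal": "codex",
}
DEFAULT_WORKER = "codex"  # assumption: unknown work starts with the cheaper worker


class Board:
    """Hypothetical task board that pins each task to exactly one worker."""

    def __init__(self) -> None:
        self.pins: Dict[str, str] = {}

    def assign(self, task_id: str, category: str) -> str:
        # A pinned task keeps its owner, so two workers never touch the same task.
        if task_id not in self.pins:
            self.pins[task_id] = ROUTES.get(category, DEFAULT_WORKER)
        return self.pins[task_id]


board = Board()
print(board.assign("T1", "refactor"))  # claude-code
print(board.assign("T1", "lint"))      # claude-code (still pinned to its first owner)
print(board.assign("T2", "lint"))      # codex
```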
Why does Claude Code beat Codex on SWE-bench but lose on Terminal-Bench?
Different optimization targets. SWE-bench Verified rewards multi-file reasoning and architectural correctness — Claude Opus 4.7's strengths. Terminal-Bench 2.0 rewards single-shot terminal commands and rapid tool-use — Codex GPT-5.5's strengths. Specialist Routing exploits both rather than picking one.
Does this map to a CCA-F domain?
Yes — Domain 1 (Agentic Architecture, 27% of the exam). Hub-and-spoke with specialist subagents is core D1 material. Expect distractors that suggest 'use the bigger model' when the right answer is 'route the work to the cheaper specialist.'

Synthesized from research output on 2026-05-02. LinkedIn cross-post pending.
Last reviewed 2026-05-06.
