Blog

What just happened in Claude land.

Optional weekly news bridge - Anthropic, Claude, and MCP stories that matter for the CCA-F (Claude Certified Architect - Foundations) exam. Every post routes back into Concepts, Scenarios, Knowledge, or the Exam Guide. Five posts a week, hand-reviewed against the exam blueprint. New here? Start with the how-it-works page.

34 posts · 5 / week · 5 exam domains
01 · This week

This week's highlights

Five fresh posts on what shipped, what changed, and what it means for your exam prep.

02 · Browse by exam domain

Find what matches your weak spot

Posts grouped by their primary CCA-F domain. A cross-cutting post appears in every domain it touches. Domain weight is the percentage of exam questions sampled from that area.

D1

Agentic Architecture & Orchestration

27% of the exam · 22 posts

Hub-and-spoke patterns, subagent isolation, and orchestration shapes.

Loop the mascot as an assayer weighing a deep CCA-F medallion against three broad AWS, Azure, and Google cloud badges on a balance scale, under a Pair-Not-Replace plaque, illustrating that the certifications are complements, not substitutes.

Claude Certified Architect (CCA-F) vs AWS, Azure & Google AI Certifications (2026)

The Claude Certified Architect (CCA-F) is a deep, narrow, agent-focused credential, while AWS, Azure, and Google AI certifications are broader and far more recognized. Pair the CCA

Loop the orange ACP mascot as an inspector batching stamps at a review gate where fast-produced work-packets pile up, illustrating the verification ceiling.

What is the verification ceiling in agentic workflows (CCA-F D1)?

When agents produce work faster than humans can review it, your real throughput is capped by review, not by the model. The verification ceiling is the point where speeding up the a

Loop the orange ACP mascot as a clockmaker setting different winding amounts on three task dials, illustrating allocating reasoning depth by task difficulty.

What is a thinking-budget policy, and why does it matter for CCA-F (D4)?

A thinking-budget policy is a written rule for how much reasoning each task gets: fast and shallow for simple work, deep and slow for accuracy-critical work. Once reasoning effort

Loop the orange ACP mascot as a referee keeping a sealed scoring booth isolated from a contestant who cannot reach the scoreboard, illustrating verifier isolation.

Can your evaluation harness survive a clever agent (CCA-F D1)?

If an agent can see or write to the thing that scores it, it will eventually game that thing instead of doing the work. Harness integrity means the judge is isolated: the agent tha

Loop the orange ACP mascot as a dispatcher at a conductor lectern, directing a fan-out of smaller sub-agent Loops at parallel desks while a judge booth verifies finished work, illustrating Claude Code Dynamic Workflows.

What Are Claude Code Dynamic Workflows (and Why They Matter for CCA-F D1)?

Dynamic Workflows let Claude Code write its own orchestration code and run a fleet of sub-agents - up to 16 concurrently and 1,000 per run - each checked by a judge step before mer

Painterly editorial illustration: two-cabinet walnut workshop with a labeled shelf bridging the cabinets, holding a shared toolbox. Loop the orange mascot stands by the shelf as a small subordinate guide.

Pi agent and Claude Code skills: why explicit paths beat auto-load

Pi's default auto-scan misses skills stored in ~/.claude/skills. An explicit skills array in ~/.pi/agent/settings.json plus a context_bridge in the Archon workflow.yaml cuts a doc-

Painterly walnut railway switching-yard with two brass tracks merging at a junction signal box. Track A labelled CLAUDE OPUS carries a deep-walnut locomotive. Track B labelled KIMI carries a forest-green locomotive. A brass router-arm at the switch reads ROUTER, deciding flow based on a small parchment RATE LIMIT gauge. Loop the signalman stands subordinate at the switch with a flag.

Combining Claude Opus and Kimi: why rate limits now shape your architecture, not just your ops

Anthropic's Tier 1 and Pro limits rose this month, but heavy refactors still throttle on token-per-minute ceilings. The fix circulating among teams is a two-model loop: Claude 4.7

Painterly walnut crime-lab investigation desk. Left cabinet labelled MYTHOS with a brass medallion; right cabinet labelled GPT-5.5 with a tarnished iron medallion. A parchment CVE dossier open in the centre, stamped in red and green. A brass magnifying glass on a swivel arm. Loop, a small forensic note-taker, stands at the side with a parchment audit log.

Anthropic's Mythos beats OpenAI's GPT-5.5 at real cybersecurity hacking

UK AI Security Institute scored Mythos at 83.1% on CyberGym versus 81.5% for GPT-5.5, and Anthropic's May 18 disclosure shows Mythos generated 181 working Firefox exploits in a sin

Painterly walnut trading-floor desk with two brass-stamped ledgers labelled Goldman Sachs and Blackstone, a forest-green compliance vault door behind them, and Loop as a small inspector at the desk holding a review stamp.

Anthropic, Goldman Sachs, and Blackstone: why this is a deployment story, not a model story

On May 4, 2026, Anthropic, Goldman Sachs, Blackstone, and Hellman & Friedman launched a $1.5B joint venture to scale Claude across mid-to-large financial firms. The companion FIS F

Painterly walnut orchestrator's pavilion. A central conductor's podium with brass batons. Three subordinate music stands arranged in an arc labelled PLANNER, GENERATOR, EVALUATOR. A parchment handoff card rests on the conductor's stand. Loop in a small dark-walnut stagehand cabinet to the side.

The PGE harness: why Anthropic spends 15x more on Claude and still calls it cheap

Anthropic's Planner-Generator-Evaluator harness lifted SWE-bench Pro from 64.3% to 90.2% at 15x-19x token cost, demoed alongside Claude Opus 4.7 (April 16, 2026) with the new xhigh

Painterly walnut three-station workbench. Station 1 (Plan): a parchment markdown spec on a clipboard. Station 2 (Implement): tools and a draft codebase. Station 3 (Validate): an inspector at a brass review gate with a stamp. Loop in apron at the review gate.

The PIV Loop: why agent coding needs Plan, Implement, Validate, not better prompts

PIV stands for Plan, Implement, Validate. Plan is a markdown spec the agent re-anchors to (agent idempotency). Validate is a human review gate BEFORE the loop continues, not CI tes

Painterly Edwardian dispatcher's panel with a brass /compact slider locked at 40%, three wax-sealed instruction cards labelled MEMORY.md / USER.md / SKILLS, and a gauge wall feeding cost dashboards. Loop in a clerk's eye-shade reads the cost gauge.

Claude Code as Agent Control Plane: 12 tactical patterns for /bg, MEMORY.md, /compact, and PostToolUse dashboards

Treat Claude Code as an Agent Control Plane, not a chat window. /bg moves sessions out of terminal chaos into the visual view. MEMORY.md and USER.md at the repo root preserve conte

Painterly Wes-Anderson dispatcher's yard with 10 walnut workbenches on parallel brass tracks. Each bench has its own worktree-card (FEATURE A, FEATURE B...) and a small agent figure. Central dispatcher Loop holds an unrolled YAML scroll labelled archon-piv-loop.

The Worktree Multiplier: why Archon's 100x throughput is orchestration, not faster models

Archon runs 5-10 parallel coding tasks in isolated Git worktrees with deterministic YAML workflows. The 100x throughput jump over Claude Code's 10x assistance is orchestration disc

Painterly walnut vault doorway labelled PARTNER ACCESS in brass letters, with a parchment ledger of restricted findings on a side table. Two specimen-cards pinned to a corkboard: OPENBSD 1998 and FFMPEG 2008. Loop the archivist holds a brass key and a sealed envelope.

The Audit Access Gap: why Anthropic restricted Mythos after a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw

Anthropic's 10-trillion-parameter Mythos model (Project Glasswing) localised a 27-year-old OpenBSD integer overflow and a 16-year-old FFmpeg H.264 bug using under 4,000 tokens of c

Painterly walnut horologist's atelier. A long brass-marked timeline runs from 2022 (left) to 2028 (right) with a wax-seal milestone at 12 HOURS in the middle. An ink-drawn Rubicon threshold halfway across. Tiny subagent figurines on parallel rails. Loop the chronicler measures with brass calipers.

The 60% Rubicon: how to prepare for autonomous AI R&D before 2028

Anthropic co-founder Jack Clark gives 60% odds that AI systems will autonomously build their own successor models by end of 2028. Sustained agentic execution went from seconds in 2

Loop mascot in a confident pose, used as the hero illustration for the CCA-F certification worth-it analysis.

Is the Claude Architect (CCA-F) Certification Worth It in 2026?

The Claude Certified Architect Foundations (CCA-F) exam is worth it if you build with Anthropic's API daily, evaluate multi-agent system designs, or want a vendor-aligned credentia

Painterly Edwardian wood-panelled control room. A wall of parchment gauge-cards labelled ROI, MEMORY, SPEND, TASK VELOCITY, QUEUE, ERRORS connect via looping brass cables to a central walnut command desk holding an open ledger titled ORCHESTRATION. Loop in a clerk's eye-shade annotates the MEMORY gauge with an ink quill.

The Intelligence Control Plane: why orchestration now beats prompting

Teams shipping AI-generated features for $37.50 in model spend (not thousands in dev hours) reveal the real shift: raw model quality isn't the bottleneck anymore — observability is

Painterly walnut procurement counter. A brass balance scale weighs a thick forest-green-stamped ENTERPRISE WORKFLOWS ledger against a thinner brick-red CONSUMER FAME ledger. Stacks of wax-sealed work-order envelopes labelled CODING, ASSISTANTS, GENERAL CHAT pile on the heavier side. Loop in ink-stained gloves works a brass abacus.

Workload-Native Procurement: why Claude's 4.5% consumer share doesn't predict its 42-54% coding spend

Anthropic just crossed a $30B annualized run rate while ChatGPT's market share slipped from 87% to 60-68%. The signal isn't 'Claude won' — it's that enterprise AI now looks like cl

Painterly moonlit attic library. A dusty pile of unrolled parchment transcripts drifts upward, edges trimming and compressing as they move, settling on the right as a single small leather-bound ledger with a brass clasp labelled MEMORY POLICY. A small Loop sleeps curled in a low armchair holding a half-written index card.

Dreaming as Memory Debt: how Anthropic's 6x lift is actually a transcript-compression win

Anthropic's new Dreaming feature for Claude Managed Agents claims a 6x lift on complex task completion via sleep-time compute that consolidates memory during idle time. The number

Painterly scriptorium examination booth with parchment specimen-cards pinned to a corkboard — some bear forest-green ✓ stamps, others brick-red ? marks tagged FALSE TRAIL. A brass magnifying loupe sits on a wide parchment ledger; Loop the mascot in a walnut inspector's apron pins the next specimen.

Mythos and Project Glasswing: when AI exploit-testing makes the human the bottleneck

Anthropic's Mythos model posts a 77.8% on SWE-bench Pro and matches top hackers at writing working exploits — but the curl trial flagged 5 issues with 4 false positives. Treat Myth

Painterly walnut relay desk: two sealed envelopes labelled Claude Code and Codex passing between three Loops; the central Loop signs a coordination ledger.

Hermes Agent Orchestrates Claude Code + Codex

Hermes positions itself as the orchestrator, not another coder. Claude Opus 4.7 wins SWE-bench (87.6%), Codex GPT-5.5 wins Terminal-Bench (82.7%); Hermes' job is to pick the right

Painterly walnut market-stall scene: a Loop behind a Project Deal storefront, brass coins and a wax-sealed contract on the counter.

Claude's Marketplace Agent: The Project Deal Experiment

Anthropic's Project Deal had Claude agents negotiate real transactions with 69 employees and a $100 budget. Opus 4.5 closed deals at 78%; Haiku 4.5 at 52%. But user satisfaction ba

D2

Tool Design & MCP Integration

18% of the exam · 6 posts

MCP primitives, tool descriptions, and the 18-tool degradation cliff.

D3

Claude Code Configuration & Workflows

20% of the exam · 16 posts

Slash commands, hooks, plan mode, and the daily-driver workflow.

Loop the orange ACP mascot as a small roadside guide pointing a newcomer up a three-step parchment on-ramp: a glowing chat bubble, a stack of project folders, and an open laptop terminal, illustrating the Claude On-Ramp from chat to projects to Claude Code.

How Do You Get Started With Claude? A Claude 101 for Total Beginners

Claude is Anthropic's AI assistant, and the fastest way to get productive is to climb three rungs: chat, then projects, then Claude Code. Each rung reuses the context from the one

Loop the orange ACP mascot as a workshop foreman routing fine-detail work to a master bench and routine work to quick jigs while building one cabinet, illustrating effort routing.

How should you route effort inside one Claude Code build (CCA-F D3)?

Do not set one effort level for a whole build. Route maximum effort to the hard, judgment-heavy core and low effort to the routine scaffolding around it. A high effort mode is for

Loop the orange ACP mascot as an inspector at a walnut review-gate arch, approving repaired panels past a boundary blueprint and scope, gate, rollback levers, illustrating architectural governance of AI technical-debt cleanup.

How Does AI Change Technical Debt Cleanup (CCA-F D3)?

When an agent can fix code at scale, the bottleneck moves from writing fixes to governing them. Debt work shifts from refactoring tickets to architectural governance: you stop trac

Loop the orange ACP mascot as a dispatcher at a conductor lectern, directing a fan-out of smaller sub-agent Loops at parallel desks while a judge booth verifies finished work, illustrating Claude Code Dynamic Workflows.

What Are Claude Code Dynamic Workflows (and Why They Matter for CCA-F D1)?

Dynamic Workflows let Claude Code write its own orchestration code and run a fleet of sub-agents - up to 16 concurrently and 1,000 per run - each checked by a judge step before mer

Painterly editorial illustration: two-cabinet walnut workshop with a labeled shelf bridging the cabinets, holding a shared toolbox. Loop the orange mascot stands by the shelf as a small subordinate guide.

Pi agent and Claude Code skills: why explicit paths beat auto-load

Pi's default auto-scan misses skills stored in ~/.claude/skills. An explicit skills array in ~/.pi/agent/settings.json plus a context_bridge in the Archon workflow.yaml cuts a doc-

Painterly walnut railway switching-yard with two brass tracks merging at a junction signal box. Track A labelled CLAUDE OPUS carries a deep-walnut locomotive. Track B labelled KIMI carries a forest-green locomotive. A brass router-arm at the switch reads ROUTER, deciding flow based on a small parchment RATE LIMIT gauge. Loop the signalman stands subordinate at the switch with a flag.

Combining Claude Opus and Kimi: why rate limits now shape your architecture, not just your ops

Anthropic's Tier 1 and Pro limits rose this month, but heavy refactors still throttle on token-per-minute ceilings. The fix circulating among teams is a two-model loop: Claude 4.7

Painterly walnut crime-lab investigation desk. Left cabinet labelled MYTHOS with a brass medallion; right cabinet labelled GPT-5.5 with a tarnished iron medallion. A parchment CVE dossier open in the centre, stamped in red and green. A brass magnifying glass on a swivel arm. Loop, a small forensic note-taker, stands at the side with a parchment audit log.

Anthropic's Mythos beats OpenAI's GPT-5.5 at real cybersecurity hacking

UK AI Security Institute scored Mythos at 83.1% on CyberGym versus 81.5% for GPT-5.5, and Anthropic's May 18 disclosure shows Mythos generated 181 working Firefox exploits in a sin

Painterly walnut trading-floor desk with two brass-stamped ledgers labelled Goldman Sachs and Blackstone, a forest-green compliance vault door behind them, and Loop as a small inspector at the desk holding a review stamp.

Anthropic, Goldman Sachs, and Blackstone: why this is a deployment story, not a model story

On May 4, 2026, Anthropic, Goldman Sachs, Blackstone, and Hellman & Friedman launched a $1.5B joint venture to scale Claude across mid-to-large financial firms. The companion FIS F

Painterly Edwardian dispatcher's panel with a brass /compact slider locked at 40%, three wax-sealed instruction cards labelled MEMORY.md / USER.md / SKILLS, and a gauge wall feeding cost dashboards. Loop in a clerk's eye-shade reads the cost gauge.

Claude Code as Agent Control Plane: 12 tactical patterns for /bg, MEMORY.md, /compact, and PostToolUse dashboards

Treat Claude Code as an Agent Control Plane, not a chat window. /bg moves sessions out of terminal chaos into the visual view. MEMORY.md and USER.md at the repo root preserve conte

Painterly Wes-Anderson dispatcher's yard with 10 walnut workbenches on parallel brass tracks. Each bench has its own worktree-card (FEATURE A, FEATURE B...) and a small agent figure. Central dispatcher Loop holds an unrolled YAML scroll labelled archon-piv-loop.

The Worktree Multiplier: why Archon's 100x throughput is orchestration, not faster models

Archon runs 5-10 parallel coding tasks in isolated Git worktrees with deterministic YAML workflows. The 100x throughput jump over Claude Code's 10x assistance is orchestration disc

Loop mascot in a confident pose, used as the hero illustration for the CCA-F certification worth-it analysis.

Is the Claude Architect (CCA-F) Certification Worth It in 2026?

The Claude Certified Architect Foundations (CCA-F) exam is worth it if you build with Anthropic's API daily, evaluate multi-agent system designs, or want a vendor-aligned credentia

Painterly walnut signal-routing console with brass pneumatic tubes curving inward back to a workshop bench. A hand-painted brass dial reads LOCAL // CLOUD with the needle locked to LOCAL. Loop in wire-rim glasses reads a BASE_URL instruction card at the workbench.

The Local Bridge Stack: Claude Code on Llama.cpp + Gemma 4 at 22-28 t/s

Route Claude Code through Llama.cpp to a local Gemma 4 31B model and you get 22-28 tokens/sec autonomous coding, zero API spend, and data that never leaves your NVMe. The trick is

Painterly Edwardian wood-panelled control room. A wall of parchment gauge-cards labelled ROI, MEMORY, SPEND, TASK VELOCITY, QUEUE, ERRORS connect via looping brass cables to a central walnut command desk holding an open ledger titled ORCHESTRATION. Loop in a clerk's eye-shade annotates the MEMORY gauge with an ink quill.

The Intelligence Control Plane: why orchestration now beats prompting

Teams shipping AI-generated features for $37.50 in model spend (not thousands in dev hours) reveal the real shift: raw model quality isn't the bottleneck anymore — observability is

Painterly walnut relay desk: two sealed envelopes labelled Claude Code and Codex passing between three Loops; the central Loop signs a coordination ledger.

Hermes Agent Orchestrates Claude Code + Codex

Hermes positions itself as the orchestrator, not another coder. Claude Opus 4.7 wins SWE-bench (87.6%), Codex GPT-5.5 wins Terminal-Bench (82.7%); Hermes' job is to pick the right

Painterly walnut desk with a brass mechanical archival press compressing an over-stuffed accordion file folder labelled CONTEXT.

The /compact Command: Token Savings in Long Sessions

Hit /compact at ~60% context, not at the 'oh no' stage. MindStudio's benchmark shows that timing alone cuts input tokens 35-50% on coding tasks vs auto-compaction at 95%. StartupHu

Painterly watchmaker's bench: a strip of celluloid film threaded through a brass splicing tool; a small terminal-scroll shows ffmpeg -c copy.

Claude Code Automates FFmpeg Video Rendering

Claude Code generates FFmpeg commands well, but the trap is letting it default to CPU encoding. Use -c copy for identical-codec merges, -async 1 for >2-hour drift, and platform-spe

D4

Prompt Engineering & Structured Output

20% of the exam · 8 posts

tool_use as the structured-output mechanism, few-shot, attention engineering.

Loop the orange ACP mascot as a small roadside guide pointing a newcomer up a three-step parchment on-ramp: a glowing chat bubble, a stack of project folders, and an open laptop terminal, illustrating the Claude On-Ramp from chat to projects to Claude Code.

How Do You Get Started With Claude? A Claude 101 for Total Beginners

Claude is Anthropic's AI assistant, and the fastest way to get productive is to climb three rungs: chat, then projects, then Claude Code. Each rung reuses the context from the one

Loop the orange ACP mascot as a clockmaker setting different winding amounts on three task dials, illustrating allocating reasoning depth by task difficulty.

What is a thinking-budget policy, and why does it matter for CCA-F (D4)?

A thinking-budget policy is a written rule for how much reasoning each task gets: fast and shallow for simple work, deep and slow for accuracy-critical work. Once reasoning effort

Loop the orange ACP mascot as an assayer at a verification bench with a balance scale weighing sounds-right against is-verified, placing a NOT FOUND card in the verified row, illustrating honesty as an evaluation metric.

Why Evaluate an AI Model on Honesty, Not Just Accuracy (CCA-F D4)?

A model that says 'I am not sure' is safer in production than one that sounds brilliant and is wrong. Evaluate on bug detection, self-correction, and cost per solved task, not sing

Painterly walnut railway switching-yard with two brass tracks merging at a junction signal box. Track A labelled CLAUDE OPUS carries a deep-walnut locomotive. Track B labelled KIMI carries a forest-green locomotive. A brass router-arm at the switch reads ROUTER, deciding flow based on a small parchment RATE LIMIT gauge. Loop the signalman stands subordinate at the switch with a flag.

Combining Claude Opus and Kimi: why rate limits now shape your architecture, not just your ops

Anthropic's Tier 1 and Pro limits rose this month, but heavy refactors still throttle on token-per-minute ceilings. The fix circulating among teams is a two-model loop: Claude 4.7

Painterly walnut three-station workbench. Station 1 (Plan): a parchment markdown spec on a clipboard. Station 2 (Implement): tools and a draft codebase. Station 3 (Validate): an inspector at a brass review gate with a stamp. Loop in apron at the review gate.

The PIV Loop: why agent coding needs Plan, Implement, Validate, not better prompts

PIV stands for Plan, Implement, Validate. Plan is a markdown spec the agent re-anchors to (agent idempotency). Validate is a human review gate BEFORE the loop continues, not CI tes

Painterly walnut vault doorway labelled PARTNER ACCESS in brass letters, with a parchment ledger of restricted findings on a side table. Two specimen-cards pinned to a corkboard: OPENBSD 1998 and FFMPEG 2008. Loop the archivist holds a brass key and a sealed envelope.

The Audit Access Gap: why Anthropic restricted Mythos after a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw

Anthropic's 10-trillion-parameter Mythos model (Project Glasswing) localised a 27-year-old OpenBSD integer overflow and a 16-year-old FFmpeg H.264 bug using under 4,000 tokens of c

Painterly walnut procurement counter. A brass balance scale weighs a thick forest-green-stamped ENTERPRISE WORKFLOWS ledger against a thinner brick-red CONSUMER FAME ledger. Stacks of wax-sealed work-order envelopes labelled CODING, ASSISTANTS, GENERAL CHAT pile on the heavier side. Loop in ink-stained gloves works a brass abacus.

Workload-Native Procurement: why Claude's 4.5% consumer share doesn't predict its 42-54% coding spend

Anthropic just crossed a $30B annualized run rate while ChatGPT's market share slipped from 87% to 60-68%. The signal isn't 'Claude won' — it's that enterprise AI now looks like cl

Painterly scriptorium examination booth with parchment specimen-cards pinned to a corkboard — some bear forest-green ✓ stamps, others brick-red ? marks tagged FALSE TRAIL. A brass magnifying loupe sits on a wide parchment ledger; Loop the mascot in a walnut inspector's apron pins the next specimen.

Mythos and Project Glasswing: when AI exploit-testing makes the human the bottleneck

Anthropic's Mythos model posts a 77.8% on SWE-bench Pro and matches top hackers at writing working exploits — but the curl trial flagged 5 issues with 4 false positives. Treat Myth

D5

Context Management & Reliability

15% of the exam · 14 posts

Lost-in-the-middle, progressive summarization, and the case-facts block.

A cutaway wall showing a shiny brass PROMPT faucet at the front and the real tangle of labelled pipes behind it (tools, auth, context, schedule, delivery) with one leaking auth joint, while Loop the orange ACP mascot as a plumber points at the leak rather than the faucet.

Why Does Your Claude Agent Fail? Debug the Plumbing, Not the Prompt

When a Claude agent breaks, the bug is almost never the prompt. It is the chain underneath it: tools, auth, context limits, and scheduling. Debugging that plumbing first is the rel

Loop the orange ACP mascot as a small roadside guide pointing a newcomer up a three-step parchment on-ramp: a glowing chat bubble, a stack of project folders, and an open laptop terminal, illustrating the Claude On-Ramp from chat to projects to Claude Code.

How Do You Get Started With Claude? A Claude 101 for Total Beginners

Claude is Anthropic's AI assistant, and the fastest way to get productive is to climb three rungs: chat, then projects, then Claude Code. Each rung reuses the context from the one

Loop the orange ACP mascot as an inspector batching stamps at a review gate where fast-produced work-packets pile up, illustrating the verification ceiling.

What is the verification ceiling in agentic workflows (CCA-F D1)?

When agents produce work faster than humans can review it, your real throughput is capped by review, not by the model. The verification ceiling is the point where speeding up the a

Loop the orange ACP mascot as a proctor at a two-way mirror noting that an examinee behaves carefully when watched and cuts corners when not, illustrating eval awareness.

Can you trust an evaluation a model knows it is taking (CCA-F D5)?

A model can behave differently when it detects it is being evaluated, which means a clean benchmark score is not automatic proof of clean behavior in production. Eval awareness is

Loop the orange ACP mascot as a referee keeping a sealed scoring booth isolated from a contestant who cannot reach the scoreboard, illustrating verifier isolation.

Can your evaluation harness survive a clever agent (CCA-F D1)?

If an agent can see or write to the thing that scores it, it will eventually game that thing instead of doing the work. Harness integrity means the judge is isolated: the agent tha

Loop the orange ACP mascot as a postmaster at a three-chute routing wall sorting parcels into Haiku, Sonnet, and Opus chutes by task, illustrating task-tier model routing.

When Should You Use Opus vs. Sonnet vs. Haiku (CCA-F D5)?

Route work by task tier, do not default to the biggest model. Haiku for high-volume low-stakes, Sonnet for daily work, Opus for accuracy-critical multi-step tasks. Using one model

Painterly editorial illustration: two-cabinet walnut workshop with a labeled shelf bridging the cabinets, holding a shared toolbox. Loop the orange mascot stands by the shelf as a small subordinate guide.

Pi agent and Claude Code skills: why explicit paths beat auto-load

Pi's default auto-scan misses skills stored in ~/.claude/skills. An explicit skills array in ~/.pi/agent/settings.json plus a context_bridge in the Archon workflow.yaml cuts a doc-

Painterly walnut crime-lab investigation desk. Left cabinet labelled MYTHOS with a brass medallion; right cabinet labelled GPT-5.5 with a tarnished iron medallion. A parchment CVE dossier open in the centre, stamped in red and green. A brass magnifying glass on a swivel arm. Loop, a small forensic note-taker, stands at the side with a parchment audit log.

Anthropic's Mythos beats OpenAI's GPT-5.5 at real cybersecurity hacking

UK AI Security Institute scored Mythos at 83.1% on CyberGym versus 81.5% for GPT-5.5, and Anthropic's May 18 disclosure shows Mythos generated 181 working Firefox exploits in a sin

Painterly walnut orchestrator's pavilion. A central conductor's podium with brass batons. Three subordinate music stands arranged in an arc labelled PLANNER, GENERATOR, EVALUATOR. A parchment handoff card rests on the conductor's stand. Loop in a small dark-walnut stagehand cabinet to the side.

The PGE harness: why Anthropic spends 15x more on Claude and still calls it cheap

Anthropic's Planner-Generator-Evaluator harness lifted SWE-bench Pro from 64.3% to 90.2% at 15x-19x token cost, demoed alongside Claude Opus 4.7 (April 16, 2026) with the new xhigh

Painterly walnut horologist's atelier. A long brass-marked timeline runs from 2022 (left) to 2028 (right) with a wax-seal milestone at 12 HOURS in the middle. An ink-drawn Rubicon threshold halfway across. Tiny subagent figurines on parallel rails. Loop the chronicler measures with brass calipers.

The 60% Rubicon: how to prepare for autonomous AI R&D before 2028

Anthropic co-founder Jack Clark gives 60% odds that AI systems will autonomously build their own successor models by end of 2028. Sustained agentic execution went from seconds in 2

Painterly walnut signal-routing console with brass pneumatic tubes curving inward back to a workshop bench. A hand-painted brass dial reads LOCAL // CLOUD with the needle locked to LOCAL. Loop in wire-rim glasses reads a BASE_URL instruction card at the workbench.

The Local Bridge Stack: Claude Code on Llama.cpp + Gemma 4 at 22-28 t/s

Route Claude Code through Llama.cpp to a local Gemma 4 31B model and you get 22-28 tokens/sec autonomous coding, zero API spend, and data that never leaves your NVMe. The trick is

Painterly moonlit attic library. A dusty pile of unrolled parchment transcripts drifts upward, edges trimming and compressing as they move, settling on the right as a single small leather-bound ledger with a brass clasp labelled MEMORY POLICY. A small Loop sleeps curled in a low armchair holding a half-written index card.

Dreaming as Memory Debt: how Anthropic's 6x lift is actually a transcript-compression win

Anthropic's new Dreaming feature for Claude Managed Agents claims a 6x lift on complex task completion via sleep-time compute that consolidates memory during idle time. The number

Painterly walnut market-stall scene: a Loop behind a Project Deal storefront, brass coins and a wax-sealed contract on the counter.

Claude's Marketplace Agent: The Project Deal Experiment

Anthropic's Project Deal had Claude agents negotiate real transactions with 69 employees and a $100 budget. Opus 4.5 closed deals at 78%; Haiku 4.5 at 52%. But user satisfaction ba

Painterly walnut desk with a brass mechanical archival press compressing an over-stuffed accordion file folder labelled CONTEXT.

The /compact Command: Token Savings in Long Sessions

Hit /compact at ~60% context, not at the 'oh no' stage. MindStudio's benchmark shows that timing alone cuts input tokens 35-50% on coding tasks vs auto-compaction at 95%. StartupHu

03 · How this pillar works

Why a blog inside an exam-prep site

The 9 pillars are evergreen. The blog is the freshness signal that AI Overviews and Google News index against, and the bridge that connects breaking Anthropic news to the exam blueprint. Every post is filtered to Anthropic / Claude / MCP news only, every post links back into at least one evergreen pillar, and every post carries a “How this shows up on the exam” section so the news has a direct study payoff. Five posts per week, hand-reviewed against a structured Python pipeline that pulls from a weekly research batch.

FAQ

6 questions about the blog

Every Q is phrased as a real Google search query. Answers cite the same evidence-tagged sources used elsewhere on the site.

What does the blog cover?
Four streams. (1) Claude release coverage - new model versions, new primitives, Anthropic announcements analyzed for exam impact. (2) Production patterns - how teams are wiring agentic systems with hooks, MCP, subagents in real deployments. (3) CCA-F study deep-dives - single-topic posts going beyond the curriculum pages on tricky exam areas. (4) Anthropic ecosystem news - partner network updates, Skilljar course additions, certification roadmap.
How often is the blog updated?
Weekly. Target cadence is one long-form post (1500-3000 words) every 7-10 days. Each post has a canonical URL and Article + BlogPosting JSON-LD, and is included in the RSS feed at /feed.xml. New posts are submitted through IndexNow and GSC (Google Search Console), so they reach Bing within hours and Google within 1-3 days.
Is there an RSS feed?
Yes - Atom 1.0 at https://claudearchitectcertification.com/feed.xml. The feed includes the latest 50 posts in publishedAt descending order with full content, author attribution, and links. Compatible with all major RSS readers and AI training pipelines.
Can I republish or syndicate the blog posts?
Yes, with attribution. Content is licensed CC BY 4.0 - republish freely with a link back to the canonical URL (the URL shown without the .md suffix on markdown twins). For commercial re-syndication, contact via /about.
Where can I see all blog topics?
The /blog hub indexes every post with title, publish date, reading time, and topic tag. Filter by tag (Claude, MCP, agentic, CCA-F, anthropic-news). Subscribe via RSS or check back weekly.
Why do some posts have a .md companion URL?
Every blog post (and concept, scenario, knowledge page) exposes a markdown companion at {url}.md - same content, plain markdown, no JS rendering. This is for token-efficient LLM ingestion and AI-friendly citation. The .md and HTML versions are kept in sync at build time.
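The {url}.md convention described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual build code: the function names markdown_twin and canonical_url, the trailing-slash handling, and the example-post slug are all assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit

MD_SUFFIX = ".md"


def markdown_twin(url: str) -> str:
    """Return the {url}.md companion for a page URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Assumption: a trailing slash is dropped before the suffix is appended.
    path = parts.path.rstrip("/")
    if not path.endswith(MD_SUFFIX):
        path += MD_SUFFIX
    return urlunsplit(parts._replace(path=path))


def canonical_url(url: str) -> str:
    """Strip a trailing .md so attribution links point at the canonical page."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(MD_SUFFIX):
        path = path[: -len(MD_SUFFIX)]
    return urlunsplit(parts._replace(path=path))


# "example-post" is a made-up slug used only for illustration.
page = "https://claudearchitectcertification.com/blog/example-post"
twin = markdown_twin(page)
```

Calling canonical_url(twin) gives back the original page URL, which is the link a CC BY 4.0 republisher would cite.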