This week's highlights
Five fresh posts on what shipped, what changed, and what it means for your exam prep.

FeaturedD1D22026-06-18
Claude Certified Architect (CCA-F) vs AWS, Azure & Google AI Certifications (2026)
The Claude Certified Architect (CCA-F) is a deep, narrow, agent-focused credential, while AWS, Azure, and Google AI certifications are broader and far more recognized. Pair the CCA-F with a cloud cert, don't replace one with the other: the CCA-F wins on agentic architecture, MCP, and Claude Code signal; the cloud certs win on breadth and hiring-market reach. At a $99 fee and a scenario-based 60-question exam, it is the cheapest way to prove Claude-specific agent-design skill.
Read the post ↗
Why Does Your Claude Agent Fail? Debug the Plumbing, Not the Prompt
When a Claude agent breaks, the bug is almost never the prompt. It is the chain underneath it: tools, auth, co…

How Do You Get Started With Claude? A Claude 101 for Total Beginners
Claude is Anthropic's AI assistant, and the fastest way to get productive is to climb three rungs: chat, then…

What is the verification ceiling in agentic workflows (CCA-F D1)?
When agents produce work faster than humans can review it, your real throughput is capped by review, not by th…

Can you trust an evaluation a model knows it is taking (CCA-F D5)?
A model can behave differently when it detects it is being evaluated, which means a clean benchmark score is n…
Find what matches your weak spot
Posts grouped by their primary CCA-F domain. A cross-cutting post appears in every domain it touches. Domain weight is the percentage of exam questions sampled from that area.
Agentic Architecture & Orchestration
27% of the exam· 22 posts
Hub-and-spoke patterns, subagent isolation, and orchestration shapes.

Claude Certified Architect (CCA-F) vs AWS, Azure & Google AI Certifications (2026)
The Claude Certified Architect (CCA-F) is a deep, narrow, agent-focused credential, while AWS, Azure, and Google AI certifications are broader and far more recognized. Pair the CCA

What is the verification ceiling in agentic workflows (CCA-F D1)?
When agents produce work faster than humans can review it, your real throughput is capped by review, not by the model. The verification ceiling is the point where speeding up the a

What is a thinking-budget policy, and why does it matter for CCA-F (D4)?
A thinking-budget policy is a written rule for how much reasoning each task gets: fast and shallow for simple work, deep and slow for accuracy-critical work. Once reasoning effort

Can your evaluation harness survive a clever agent (CCA-F D1)?
If an agent can see or write to the thing that scores it, it will eventually game that thing instead of doing the work. Harness integrity means the judge is isolated: the agent tha

What Are Claude Code Dynamic Workflows (and Why They Matter for CCA-F D1)?
Dynamic Workflows let Claude Code write its own orchestration code and run a fleet of sub-agents - up to 16 concurrently and 1,000 per run - each checked by a judge step before mer

Pi agent and Claude Code skills: why explicit paths beat auto-load
Pi's default auto-scan misses skills stored in ~/.claude/skills. An explicit skills array in ~/.pi/agent/settings.json plus a context_bridge in the Archon workflow.yaml cuts a doc-

Combining Claude Opus and Kimi: why rate limits now shape your architecture, not just your ops
Anthropic's Tier 1 and Pro limits rose this month, but heavy refactors still throttle on token-per-minute ceilings. The fix circulating among teams is a two-model loop: Claude 4.7

Anthropic's Mythos beats OpenAI's GPT-5.5 at real cybersecurity hacking
UK AI Security Institute scored Mythos at 83.1% on CyberGym versus 81.5% for GPT-5.5, and Anthropic's May 18 disclosure shows Mythos generated 181 working Firefox exploits in a sin

Anthropic, Goldman Sachs, and Blackstone: why this is a deployment story, not a model story
On May 4, 2026, Anthropic, Goldman Sachs, Blackstone, and Hellman & Friedman launched a $1.5B joint venture to scale Claude across mid-to-large financial firms. The companion FIS F

The PGE harness: why Anthropic spends 15x more on Claude and still calls it cheap
Anthropic's Planner-Generator-Evaluator harness lifted SWE-bench Pro from 64.3% to 90.2% at 15x-19x token cost, demoed alongside Claude Opus 4.7 (April 16, 2026) with the new xhigh

The PIV Loop: why agent coding needs Plan, Implement, Validate, not better prompts
PIV stands for Plan, Implement, Validate. Plan is a markdown spec the agent re-anchors to (agent idempotency). Validate is a human review gate BEFORE the loop continues, not CI tes

Claude Code as Agent Control Plane: 12 tactical patterns for /bg, MEMORY.md, /compact, and PostToolUse dashboards
Treat Claude Code as an Agent Control Plane, not a chat window. /bg moves sessions out of terminal chaos into the visual view. MEMORY.md and USER.md at the repo root preserve conte

The Worktree Multiplier: why Archon's 100x throughput is orchestration, not faster models
Archon runs 5-10 parallel coding tasks in isolated Git worktrees with deterministic YAML workflows. The 100x throughput jump over Claude Code's 10x assistance is orchestration disc

The Audit Access Gap: why Anthropic restricted Mythos after a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw
Anthropic's 10-trillion-parameter Mythos model (Project Glasswing) localised a 27-year-old OpenBSD integer overflow and a 16-year-old FFmpeg H.264 bug using under 4,000 tokens of c

The 60% Rubicon: how to prepare for autonomous AI R&D before 2028
Anthropic co-founder Jack Clark gives 60% odds that AI systems will autonomously build their own successor models by end of 2028. Sustained agentic execution went from seconds in 2

Is the Claude Architect (CCA-F) Certification Worth It in 2026?
The Claude Certified Architect Foundations (CCA-F) exam is worth it if you build with Anthropic's API daily, evaluate multi-agent system designs, or want a vendor-aligned credentia

The Intelligence Control Plane: why orchestration now beats prompting
Teams shipping AI-generated features for $37.50 in model spend (not thousands in dev hours) reveal the real shift: raw model quality isn't the bottleneck anymore — observability is

Workload-Native Procurement: why Claude's 4.5% consumer share doesn't predict its 42-54% coding spend
Anthropic just crossed a $30B annualized run rate while ChatGPT's market share slipped from 87% to 60-68%. The signal isn't 'Claude won' — it's that enterprise AI now looks like cl

Dreaming as Memory Debt: how Anthropic's 6x lift is actually a transcript-compression win
Anthropic's new Dreaming feature for Claude Managed Agents claims a 6x lift on complex task completion via sleep-time compute that consolidates memory during idle time. The number

Mythos and Project Glasswing: when AI exploit-testing makes the human the bottleneck
Anthropic's Mythos model posts a 77.8% on SWE-bench Pro and matches top hackers at writing working exploits — but the curl trial flagged 5 issues with 4 false positives. Treat Myth

Hermes Agent Orchestrates Claude Code + Codex
Hermes positions itself as the orchestrator, not another coder. Claude Opus 4.7 wins SWE-bench (87.6%), Codex GPT-5.5 wins Terminal-Bench (82.7%); Hermes' job is to pick the right

Claude's Marketplace Agent: The Project Deal Experiment
Anthropic's Project Deal had Claude agents negotiate real transactions with 69 employees and a $100 budget. Opus 4.5 closed deals at 78%; Haiku 4.5 at 52%. But user satisfaction ba
Tool Design & MCP Integration
18% of the exam· 6 posts
MCP primitives, tool descriptions, and the 18-tool degradation cliff.

Claude Certified Architect (CCA-F) vs AWS, Azure & Google AI Certifications (2026)
The Claude Certified Architect (CCA-F) is a deep, narrow, agent-focused credential, while AWS, Azure, and Google AI certifications are broader and far more recognized. Pair the CCA

Why Does Your Claude Agent Fail? Debug the Plumbing, Not the Prompt
When a Claude agent breaks, the bug is almost never the prompt. It is the chain underneath it: tools, auth, context limits, and scheduling. Debugging that plumbing first is the rel

Why Should You Expose Fewer MCP Tools to an Agent (CCA-F D2)?
Give an agent only the tools a task needs, not every tool you have. A large tool set burns context on descriptions before any work happens and widens the blast radius if the agent

The PGE harness: why Anthropic spends 15x more on Claude and still calls it cheap
Anthropic's Planner-Generator-Evaluator harness lifted SWE-bench Pro from 64.3% to 90.2% at 15x-19x token cost, demoed alongside Claude Opus 4.7 (April 16, 2026) with the new xhigh

Is the Claude Architect (CCA-F) Certification Worth It in 2026?
The Claude Certified Architect Foundations (CCA-F) exam is worth it if you build with Anthropic's API daily, evaluate multi-agent system designs, or want a vendor-aligned credentia

From UX to AX: How MCP Reshapes Developer Workflows
Naxia Global benchmarked MCP-native systems vs API-wrapper agents: tool-discovery latency dropped from 450ms to 110ms, complex-task completion jumped from 62% to 89%. Agent Experie
Claude Code Configuration & Workflows
20% of the exam· 16 posts
Slash commands, hooks, plan mode, and the daily-driver workflow.

How Do You Get Started With Claude? A Claude 101 for Total Beginners
Claude is Anthropic's AI assistant, and the fastest way to get productive is to climb three rungs: chat, then projects, then Claude Code. Each rung reuses the context from the one

How should you route effort inside one Claude Code build (CCA-F D3)?
Do not set one effort level for a whole build. Route maximum effort to the hard, judgment-heavy core and low effort to the routine scaffolding around it. A high effort mode is for

How Does AI Change Technical Debt Cleanup (CCA-F D3)?
When an agent can fix code at scale, the bottleneck moves from writing fixes to governing them. Debt work shifts from refactoring tickets to architectural governance: you stop trac

What Are Claude Code Dynamic Workflows (and Why They Matter for CCA-F D1)?
Dynamic Workflows let Claude Code write its own orchestration code and run a fleet of sub-agents - up to 16 concurrently and 1,000 per run - each checked by a judge step before mer

Pi agent and Claude Code skills: why explicit paths beat auto-load
Pi's default auto-scan misses skills stored in ~/.claude/skills. An explicit skills array in ~/.pi/agent/settings.json plus a context_bridge in the Archon workflow.yaml cuts a doc-

Combining Claude Opus and Kimi: why rate limits now shape your architecture, not just your ops
Anthropic's Tier 1 and Pro limits rose this month, but heavy refactors still throttle on token-per-minute ceilings. The fix circulating among teams is a two-model loop: Claude 4.7

Anthropic's Mythos beats OpenAI's GPT-5.5 at real cybersecurity hacking
UK AI Security Institute scored Mythos at 83.1% on CyberGym versus 81.5% for GPT-5.5, and Anthropic's May 18 disclosure shows Mythos generated 181 working Firefox exploits in a sin

Anthropic, Goldman Sachs, and Blackstone: why this is a deployment story, not a model story
On May 4, 2026, Anthropic, Goldman Sachs, Blackstone, and Hellman & Friedman launched a $1.5B joint venture to scale Claude across mid-to-large financial firms. The companion FIS F

Claude Code as Agent Control Plane: 12 tactical patterns for /bg, MEMORY.md, /compact, and PostToolUse dashboards
Treat Claude Code as an Agent Control Plane, not a chat window. /bg moves sessions out of terminal chaos into the visual view. MEMORY.md and USER.md at the repo root preserve conte

The Worktree Multiplier: why Archon's 100x throughput is orchestration, not faster models
Archon runs 5-10 parallel coding tasks in isolated Git worktrees with deterministic YAML workflows. The 100x throughput jump over Claude Code's 10x assistance is orchestration disc

Is the Claude Architect (CCA-F) Certification Worth It in 2026?
The Claude Certified Architect Foundations (CCA-F) exam is worth it if you build with Anthropic's API daily, evaluate multi-agent system designs, or want a vendor-aligned credentia

The Local Bridge Stack: Claude Code on Llama.cpp + Gemma 4 at 22-28 t/s
Route Claude Code through Llama.cpp to a local Gemma 4 31B model and you get 22-28 tokens/sec autonomous coding, zero API spend, and data that never leaves your NVMe. The trick is

The Intelligence Control Plane: why orchestration now beats prompting
Teams shipping AI-generated features for $37.50 in model spend (not thousands in dev hours) reveal the real shift: raw model quality isn't the bottleneck anymore — observability is

Hermes Agent Orchestrates Claude Code + Codex
Hermes positions itself as the orchestrator, not another coder. Claude Opus 4.7 wins SWE-bench (87.6%), Codex GPT-5.5 wins Terminal-Bench (82.7%); Hermes' job is to pick the right

The /compact Command: Token Savings in Long Sessions
Hit /compact at ~60% context, not at the 'oh no' stage. MindStudio's benchmark shows that timing alone cuts input tokens 35-50% on coding tasks vs auto-compaction at 95%. StartupHu

Claude Code Automates FFmpeg Video Rendering
Claude Code generates FFmpeg commands well, but the trap is letting it default to CPU encoding. Use -c copy for identical-codec merges, -async 1 for >2-hour drift, and platform-spe
Prompt Engineering & Structured Output
20% of the exam· 8 posts
tool_use as the structured-output mechanism, few-shot, attention engineering.

How Do You Get Started With Claude? A Claude 101 for Total Beginners
Claude is Anthropic's AI assistant, and the fastest way to get productive is to climb three rungs: chat, then projects, then Claude Code. Each rung reuses the context from the one

What is a thinking-budget policy, and why does it matter for CCA-F (D4)?
A thinking-budget policy is a written rule for how much reasoning each task gets: fast and shallow for simple work, deep and slow for accuracy-critical work. Once reasoning effort

Why Evaluate an AI Model on Honesty, Not Just Accuracy (CCA-F D4)?
A model that says 'I am not sure' is safer in production than one that sounds brilliant and is wrong. Evaluate on bug detection, self-correction, and cost per solved task, not sing

Combining Claude Opus and Kimi: why rate limits now shape your architecture, not just your ops
Anthropic's Tier 1 and Pro limits rose this month, but heavy refactors still throttle on token-per-minute ceilings. The fix circulating among teams is a two-model loop: Claude 4.7

The PIV Loop: why agent coding needs Plan, Implement, Validate, not better prompts
PIV stands for Plan, Implement, Validate. Plan is a markdown spec the agent re-anchors to (agent idempotency). Validate is a human review gate BEFORE the loop continues, not CI tes

The Audit Access Gap: why Anthropic restricted Mythos after a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw
Anthropic's 10-trillion-parameter Mythos model (Project Glasswing) localised a 27-year-old OpenBSD integer overflow and a 16-year-old FFmpeg H.264 bug using under 4,000 tokens of c

Workload-Native Procurement: why Claude's 4.5% consumer share doesn't predict its 42-54% coding spend
Anthropic just crossed a $30B annualized run rate while ChatGPT's market share slipped from 87% to 60-68%. The signal isn't 'Claude won' — it's that enterprise AI now looks like cl

Mythos and Project Glasswing: when AI exploit-testing makes the human the bottleneck
Anthropic's Mythos model posts a 77.8% on SWE-bench Pro and matches top hackers at writing working exploits — but the curl trial flagged 5 issues with 4 false positives. Treat Myth
Context Management & Reliability
15% of the exam· 14 posts
Lost-in-the-middle, progressive summarization, and the case-facts block.

Why Does Your Claude Agent Fail? Debug the Plumbing, Not the Prompt
When a Claude agent breaks, the bug is almost never the prompt. It is the chain underneath it: tools, auth, context limits, and scheduling. Debugging that plumbing first is the rel

How Do You Get Started With Claude? A Claude 101 for Total Beginners
Claude is Anthropic's AI assistant, and the fastest way to get productive is to climb three rungs: chat, then projects, then Claude Code. Each rung reuses the context from the one

What is the verification ceiling in agentic workflows (CCA-F D1)?
When agents produce work faster than humans can review it, your real throughput is capped by review, not by the model. The verification ceiling is the point where speeding up the a

Can you trust an evaluation a model knows it is taking (CCA-F D5)?
A model can behave differently when it detects it is being evaluated, which means a clean benchmark score is not automatic proof of clean behavior in production. Eval awareness is

Can your evaluation harness survive a clever agent (CCA-F D1)?
If an agent can see or write to the thing that scores it, it will eventually game that thing instead of doing the work. Harness integrity means the judge is isolated: the agent tha

When Should You Use Opus vs. Sonnet vs. Haiku (CCA-F D5)?
Route work by task tier, do not default to the biggest model. Haiku for high-volume low-stakes, Sonnet for daily work, Opus for accuracy-critical multi-step tasks. Using one model

Pi agent and Claude Code skills: why explicit paths beat auto-load
Pi's default auto-scan misses skills stored in ~/.claude/skills. An explicit skills array in ~/.pi/agent/settings.json plus a context_bridge in the Archon workflow.yaml cuts a doc-

Anthropic's Mythos beats OpenAI's GPT-5.5 at real cybersecurity hacking
UK AI Security Institute scored Mythos at 83.1% on CyberGym versus 81.5% for GPT-5.5, and Anthropic's May 18 disclosure shows Mythos generated 181 working Firefox exploits in a sin

The PGE harness: why Anthropic spends 15x more on Claude and still calls it cheap
Anthropic's Planner-Generator-Evaluator harness lifted SWE-bench Pro from 64.3% to 90.2% at 15x-19x token cost, demoed alongside Claude Opus 4.7 (April 16, 2026) with the new xhigh

The 60% Rubicon: how to prepare for autonomous AI R&D before 2028
Anthropic co-founder Jack Clark gives 60% odds that AI systems will autonomously build their own successor models by end of 2028. Sustained agentic execution went from seconds in 2

The Local Bridge Stack: Claude Code on Llama.cpp + Gemma 4 at 22-28 t/s
Route Claude Code through Llama.cpp to a local Gemma 4 31B model and you get 22-28 tokens/sec autonomous coding, zero API spend, and data that never leaves your NVMe. The trick is

Dreaming as Memory Debt: how Anthropic's 6x lift is actually a transcript-compression win
Anthropic's new Dreaming feature for Claude Managed Agents claims a 6x lift on complex task completion via sleep-time compute that consolidates memory during idle time. The number

Claude's Marketplace Agent: The Project Deal Experiment
Anthropic's Project Deal had Claude agents negotiate real transactions with 69 employees and a $100 budget. Opus 4.5 closed deals at 78%; Haiku 4.5 at 52%. But user satisfaction ba

The /compact Command: Token Savings in Long Sessions
Hit /compact at ~60% context, not at the 'oh no' stage. MindStudio's benchmark shows that timing alone cuts input tokens 35-50% on coding tasks vs auto-compaction at 95%. StartupHu
Why a blog inside an exam-prep site
The 9 pillars are evergreen. The blog is the freshness signal that AI Overviews and Google News index against, and the bridge that connects breaking Anthropic news to the exam blueprint. Every post is filtered to Anthropic / Claude / MCP news only, every post links back into at least one evergreen pillar, and every post carries a “How this shows up on the exam” section so the news has a direct study payoff. Five posts per week, hand-reviewed against a structured Python pipeline that pulls from a weekly research batch.