P3.11 · D3 + D2 · Process

Agent Skills for Enterprise KM.

Think of this as the way a 5,000-person company stops re-inventing the same agent prompt fifteen times. Every team writes its own Skills (refund handling, expense reporting, deployment runbooks), and they all live in one shared library, organised by team and version, just like a code repo. When an agent on any team needs to do something, it searches the library, finds the right Skill, checks that the user is allowed to use it (finance Skills are not for the support team), and runs it. The whole point is that knowledge gets re-used safely at enterprise scale, not copy-pasted into a hundred system prompts.

26 min build·5 components·8 concepts

An enterprise-scale Skills registry. Each Skill is a markdown file with frontmatter (name + version + description + tags + dependencies + access_level), stored in .claude/skills/{team}/{name}.md so naming collisions become structural impossibilities. The registry indexer rebuilds on every commit, the search service surfaces the right Skill from 200+, semver gates breaking changes, and a permission-aware layer enforces ACLs before invocation (support agents cannot invoke finance Skills, no matter how cleverly prompted). Empirically confirmed on the real CCA-F exam by multiple pass-takers as one of the highest-leverage beyond-guide scenarios.

38% exam weight
Source: Beyond-guide scenario · empirically witnessed on the real CCA-F exam
Stack: Claude Code · Git · search service (full-text or embeddings)
Needs: Skills frontmatter · Git semver · file ACLs
Exam: 38% of CCA-F (D3 + D2). 20% D3 · 18% D2. Highest-weight scenario on the test. Master this one and you've covered more than a third of the exam.
End-to-end flow · 38% of CCA-F (D3 + D2)
01 · Problem framing

The problem

What the customer needs

  1. One source of truth across 15 teams. No copy-pasted Skill prompts drifting in 15 different repos.
  2. Discoverable at enterprise scale. An agent on the marketing team finds the right finance Skill in seconds, not hours.
  3. Permission-aware. Finance's budget-approval Skill must be unreachable from support's agent, no matter how the support agent is prompted.

Why naive approaches fail

  1. 200+ Skills in one flat folder → collision week 1 (refund-resolver exists in support/, growth/, AND finance/, all mean different things).
  2. No semver → v2 silently breaks v1 callers when frontmatter shape changes; agents start failing silently across the org.
  3. No ACL → support agent invokes finance/budget-approval because the Skill description sounded relevant; policy violation at scale.
Definition of done
  • Naming collision rate = 0 (team namespace prefix enforced)
  • Breaking-change incidents = 0 (semver in frontmatter, callers pin major)
  • Cross-team unauthorized invocation rate = 0 (ACL check before execution)
  • Skill-discovery p95 latency < 200ms (embeddings or full-text index)
  • Reindex SLA < 60s from commit to searchable
02 · Architecture

The system

03 · Component detail

What each part does

Five components, each owning one concept.

Skill Definition File

.claude/skills/{team}/{name}.md

The unit of enterprise knowledge. Markdown body holds the instructions; YAML frontmatter holds the metadata the registry indexes (name, version, description, tags, depends_on, access_level). Lives in version control next to code, reviewed via PR like any other team artifact.

Configuration

Path convention: .claude/skills/{team}/{name}.md. Frontmatter required: name, version (semver), description, tags, access_level. Optional: depends_on, deprecated, owners. Body: the actual prompt + examples. Reviewed in PRs.

Concept: skills

Shared Registry & Indexer

rebuilds on every commit

A CI job that walks .claude/skills/**/*.md, parses frontmatter, validates schema, builds a searchable index, and publishes it to the registry service. Idempotent. Fast (sub-minute on 500 Skills). When a Skill commit lands, the index is fresh within 60s and the new version is discoverable.

Configuration

Triggered on push to main. Steps: glob skills, parse YAML, validate (semver, ACL, deps exist), upload index to registry. Reindex SLA: <60s. Failed parses fail the CI; bad Skills never reach the registry.

Concept: structured-outputs

Search Service

embeddings-based at scale

Indexes Skill descriptions + tags + frontmatter. Agents query in natural language ('find a Skill for processing customer refunds') and get the top-k matches with their metadata. Full-text works at <50 Skills; embeddings (OpenAI / Voyage) become essential past 100; org-wide deployments use a hybrid (embeddings for recall, full-text for precision).

Configuration

POST /search { query, k=5, filters: { team?, access_level?, tag? } } → [{slug, version, description, score}]. Latency p95 < 200ms. Cache embeddings keyed by (skill_slug, content_hash); recompute only on content change.

Concept: context-window

Git-Based Versioning

semver in frontmatter + Git tags

Every Skill carries a semver version in its frontmatter; every release tags Git so older versions stay reachable. Callers pin a MAJOR version (support/refund-resolver:1.x); the registry serves the latest patch within that major. Breaking changes bump the major; old callers keep working until they migrate.

Configuration

Frontmatter: version: 1.2.3. Caller: depends_on: ['support/refund-resolver:1.x']. Registry resolves to latest patch within pinned major. Deprecated versions stay queryable for 6 months before archive.

Concept: tool-calling

Access Control Layer

permission-aware invocation

Sits between Skill discovery and Skill execution. Reads the calling agent's role + the Skill's access_level (public | team | role-restricted | sensitive). Denies invocation when the agent's role isn't in the allowlist. Returns a structured permission-denied error. The agent observes it and can request access via the org's standard flow, not bypass it.

Configuration

Pre-invocation: { agent_role, skill_acl } → { allowed: bool, reason }. ACL stored in frontmatter access_level + team-level org config. Denied: structured error { error: 'ACL_DENIED', skill, reason, request_url }.

Concept: evaluation
04 · One concrete run

Data flow

05 · Build it

Eight steps to production

01

Lay out the team-namespaced directory

Create .claude/skills/{team}/{name}.md per team. Even on day one with 5 Skills, namespace from the start. Retrofitting a flat layout into namespaces at 100 Skills is painful. The directory IS the registry's source of truth.

# Repository layout
# .claude/
# └── skills/
#     ├── support/
#     │   ├── refund-resolver.md       # support/refund-resolver
#     │   └── escalation-router.md     # support/escalation-router
#     ├── platform/
#     │   ├── deploy-runbook.md
#     │   └── incident-triage.md
#     ├── data/
#     │   ├── query-builder.md
#     │   └── pii-redactor.md
#     └── finance/
#         └── budget-approval.md       # access_level: sensitive

# Bootstrap script for a fresh repo
import os
TEAMS = ["support", "platform", "data", "growth", "finance"]
for t in TEAMS:
    os.makedirs(f".claude/skills/{t}", exist_ok=True)
    with open(f".claude/skills/{t}/.gitkeep", "w") as f:
        pass
print("namespace-by-team layout ready; commit and start authoring.")
↪ Concept: skills
02

Define the Skill frontmatter schema

Every Skill carries the same YAML frontmatter shape, validated by the indexer. Required: name, version (semver), description, tags, access_level. Optional: depends_on, deprecated, owners. Schema lives in the repo so PRs that break it fail CI before merging.

# .claude/skills/_schema.yaml: the frontmatter contract
# Validated by the indexer; PRs that violate this schema fail CI.

required:
  - name              # team/skill-name (e.g. support/refund-resolver)
  - version           # semver: MAJOR.MINOR.PATCH
  - description       # 1-2 sentence description, search-indexed
  - tags              # array, search-indexed
  - access_level      # public | team | role-restricted | sensitive

optional:
  - depends_on        # ['support/case-facts:1.x', ...]
  - deprecated        # 'use support/refund-resolver:2.x instead'
  - owners            # ['@support-team', '@jane.doe']

# Example Skill: .claude/skills/support/refund-resolver.md
---
name: support/refund-resolver
version: 1.2.3
description: |
  Resolves customer refund requests up to $500 using the case-facts
  block and escalation queue. For amounts above cap, escalates.
tags: [refund, customer-support, payment]
access_level: team
depends_on:
  - support/case-facts:1.x
  - shared/escalation-queue:2.x
owners:
  - "@support-team"
---

# Body: the actual instructions and examples ...
↪ Concept: structured-outputs
03

Build the registry indexer

A CI job walks .claude/skills/**/*.md, parses each Skill's frontmatter, validates the schema, resolves dependencies, and writes a searchable index. Runs on every push to main; reindex SLA <60s on 500 Skills. Bad Skills (broken schema, missing dep, semver violation) fail the CI. They never reach the registry.

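# scripts/index_skills.py: runs in CI on push to main
import glob, hashlib, json, sys
from pathlib import Path

import semver  # third-party: pip install semver
import yaml    # third-party: pip install pyyaml

REQUIRED = {"name", "version", "description", "tags", "access_level"}
ACCESS_LEVELS = {"public", "team", "role-restricted", "sensitive"}

def parse(path: Path) -> dict:
    text = path.read_text(encoding="utf-8")
    if not text.startswith("---"):
        raise ValueError(f"{path}: missing frontmatter")
    parts = text.split("---", 2)
    if len(parts) < 3:
        raise ValueError(f"{path}: unterminated frontmatter")
    _, fm, body = parts
    try:
        meta = yaml.safe_load(fm)
    except yaml.YAMLError as e:
        # Re-raise as ValueError so the CI handler below reports it
        raise ValueError(f"{path}: invalid YAML: {e}") from e
    if not isinstance(meta, dict):
        raise ValueError(f"{path}: frontmatter must be a mapping")
    missing = REQUIRED - set(meta)
    if missing:
        raise ValueError(f"{path}: missing keys: {sorted(missing)}")
    if meta["access_level"] not in ACCESS_LEVELS:
        raise ValueError(f"{path}: bad access_level: {meta['access_level']}")
    semver.VersionInfo.parse(str(meta["version"]))  # raises ValueError if invalid
    # Hash frontmatter + body: the search service embeds description + tags,
    # so edits to either must change the cache key as well.
    meta["body_hash"] = hashlib.sha256((fm + body).encode()).hexdigest()[:12]
    meta["path"] = str(path)
    return meta

def build_index() -> list[dict]:
    paths = sorted(glob.glob(".claude/skills/**/*.md", recursive=True))
    skills = [parse(Path(p)) for p in paths]
    # Reject two Skills that claim the same name and version
    seen = set()
    for s in skills:
        key = (s["name"], s["version"])
        if key in seen:
            raise ValueError(f"{s['path']}: duplicate skill {key[0]}@{key[1]}")
        seen.add(key)
    # Resolve dependencies: every depends_on must exist
    names = {s["name"] for s in skills}
    for s in skills:
        for dep in s.get("depends_on") or []:
            dep_name = dep.split(":")[0]
            if dep_name not in names:
                raise ValueError(f"{s['name']}: missing dep {dep_name}")
    return skills

if __name__ == "__main__":
    try:
        index = build_index()
        out = Path("dist/skill-registry.json")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(json.dumps(index, indent=2))
        # Uploading the file to the registry service is a separate CI step
        print(f"indexed {len(index)} skills; wrote {out}")
    except ValueError as e:
        print(f"::error::{e}", file=sys.stderr)
        sys.exit(1)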
↪ Concept: structured-outputs
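The indexer takes the frontmatter `name` at face value. One way to make the team namespace structural rather than conventional is to derive the expected name from the file path and reject any mismatch. The sketch below assumes POSIX-style paths as produced by the glob on Linux CI runners; `expected_name` and `check_name_matches_path` are hypothetical helpers, not part of the original script.

```python
from pathlib import PurePosixPath

SKILLS_ROOT = PurePosixPath(".claude/skills")

def expected_name(path: str) -> str:
    """Derive 'team/skill-name' from '.claude/skills/{team}/{name}.md'."""
    # relative_to raises ValueError if the file is outside the skills root
    rel = PurePosixPath(path).relative_to(SKILLS_ROOT)
    if len(rel.parts) != 2 or rel.suffix != ".md":
        raise ValueError(f"{path}: expected .claude/skills/{{team}}/{{name}}.md")
    team, filename = rel.parts
    return f"{team}/{PurePosixPath(filename).stem}"

def check_name_matches_path(meta: dict) -> None:
    """Raise ValueError when frontmatter 'name' disagrees with the file location."""
    want = expected_name(meta["path"])
    if meta["name"] != want:
        raise ValueError(
            f"{meta['path']}: name is {meta['name']!r}, expected {want!r}"
        )
```

Calling `check_name_matches_path(meta)` at the end of `parse` would stop a file under `growth/` from registering itself as `support/refund-resolver`.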
04

Add semantic search over the registry

At <50 Skills, full-text on description+tags is enough. Past 100, agents need to discover by intent rather than keyword: a query like 'give the customer their money back' should match refund-resolver even though it never uses the word 'refund'. Embed each Skill's description + tags and search a vector index over them; cache embeddings keyed by body_hash so re-embedding only fires on content change.

# scripts/search_service.py
import json
from pathlib import Path

import numpy as np

# Index loaded from registry
SKILLS = json.loads(Path("dist/skill-registry.json").read_text())

# Embedding cache keyed by (skill name, body_hash)
_emb_cache: dict[tuple[str, str], list[float]] = {}

def embed_text(text: str) -> list[float]:
    """Stand-in for any embeddings provider (Voyage, OpenAI, etc.)."""
    # In production, call your provider's embeddings endpoint here;
    # batch-embed once at index time and cache the vectors.
    raise NotImplementedError

def index_skill(s: dict):
    key = (s["name"], s["body_hash"])
    if key not in _emb_cache:
        text = f"{s['description']} {' '.join(s['tags'])}"
        _emb_cache[key] = embed_text(text)

def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0  # guard against zero vectors

def search(query: str, k: int = 5,
           team: str | None = None,
           access_level: str | None = None) -> list[dict]:
    qv = embed_text(query)
    scored = []
    for s in SKILLS:
        if team and not s["name"].startswith(f"{team}/"):
            continue
        if access_level and s["access_level"] != access_level:
            continue
        index_skill(s)
        sim = cosine(qv, _emb_cache[(s["name"], s["body_hash"])])
        scored.append((sim, s))
    scored.sort(key=lambda x: -x[0])
    return [
        {**s, "score": round(score, 3)}
        for score, s in scored[:k]
    ]

# Example
# results = search("handle a customer refund up to $500", k=3, team="support")
↪ Concept: context-window
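For small registries, and as a fallback when the embeddings provider is unavailable, a plain keyword scorer over description + tags may be enough. The following is a minimal sketch, not a production ranker: `tokenize` and `keyword_search` are hypothetical names, and the score is simply the fraction of query tokens that appear in the Skill's text.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_search(skills: list[dict], query: str, k: int = 5) -> list[dict]:
    """Rank skills by the share of query tokens found in description + tags."""
    q = tokenize(query)
    if not q:
        return []
    scored = []
    for s in skills:
        doc = tokenize(f"{s['description']} {' '.join(s['tags'])}")
        overlap = len(q & doc) / len(q)
        if overlap > 0:  # drop skills that share no token with the query
            scored.append((overlap, s))
    scored.sort(key=lambda pair: -pair[0])
    return [{**s, "score": round(score, 3)} for score, s in scored[:k]]
```

A search endpoint could wrap the embeddings path in a `try`/`except` and call `keyword_search(SKILLS, query, k)` when it fails, which returns results in the same shape.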
05

Pin versions on every dependency edge

Every depends_on in a Skill's frontmatter pins a MAJOR version (support/case-facts:1.x), not an exact version. The registry resolves to the latest release (minor or patch) within the pinned major. When case-facts ships a breaking change, it bumps to v2. Old callers continue against v1.x; new callers opt in to v2 explicitly. This mirrors the version-range pinning familiar from npm and pip, applied to Skills.

# scripts/resolve_deps.py: given a Skill spec, resolve its depends_on graph
import json
from pathlib import Path

import semver  # third-party: pip install semver

SKILLS = json.loads(Path("dist/skill-registry.json").read_text())
INDEX: dict[str, list[dict]] = {}
for s in SKILLS:
    INDEX.setdefault(s["name"], []).append(s)
for entries in INDEX.values():
    # Oldest first, so the last matching entry is the newest
    entries.sort(key=lambda s: semver.VersionInfo.parse(s["version"]))

def _satisfies(version: str, constraint: str) -> bool:
    """True if version meets every space-separated comparator ('>=2.0.0 <3.0.0')."""
    v = semver.VersionInfo.parse(version)
    return all(v.match(expr) for expr in constraint.split())

def resolve(spec: str) -> dict:
    """spec: 'team/skill', 'team/skill:1.x', or 'team/skill:>=2.0.0 <3.0.0'."""
    name, _, constraint = spec.partition(":")
    versions = INDEX.get(name, [])
    if not versions:
        raise LookupError(f"unknown skill: {name}")

    if constraint in ("", "*"):
        candidates = versions  # unpinned: any version matches
    elif constraint.endswith(".x"):
        major = int(constraint.split(".")[0])
        candidates = [
            v for v in versions
            if semver.VersionInfo.parse(v["version"]).major == major
        ]
    else:
        candidates = [v for v in versions if _satisfies(v["version"], constraint)]

    if not candidates:
        raise LookupError(f"{name}: no version satisfies {constraint}")
    return candidates[-1]  # latest matching

def topo_resolve(skill_spec: str, seen: set | None = None) -> list[dict]:
    """Resolve full dep graph, dependencies first; each Skill appears once."""
    if seen is None:
        seen = set()
    skill = resolve(skill_spec)
    if skill["name"] in seen:
        return []  # already emitted (shared dependency or back-edge)
    seen.add(skill["name"])
    out = []
    for dep_spec in skill.get("depends_on") or []:
        out.extend(topo_resolve(dep_spec, seen))
    out.append(skill)
    return out
↪ Concept: tool-calling
06

Enforce ACLs before invocation

Permission-aware RAG isn't built into Claude. You implement it. Read the calling agent's role + the Skill's access_level, run a hard check before invoking, and return a structured error on deny. This is a deterministic gate, not a prompt-language constraint; the Skill's body never executes if ACL fails.

# scripts/acl_gate.py
from typing import Literal, TypedDict
from urllib.parse import quote

class Skill(TypedDict):
    name: str
    access_level: Literal["public", "team", "role-restricted", "sensitive"]
    owners: list[str]
    allowed_roles: list[str]  # consulted only for role-restricted Skills

class AgentContext(TypedDict):
    role: str           # e.g. 'support-agent', 'finance-agent'
    teams: list[str]    # ['support', 'shared']
    elevated: bool      # has the user explicitly elevated to invoke sensitive skills?

ROLE_ACL = {
    # Each access_level → rule deciding whether the calling agent may invoke
    "public": lambda ctx, s: True,
    "team": lambda ctx, s: any(s["name"].startswith(f"{t}/") for t in ctx["teams"]),
    "role-restricted": lambda ctx, s: ctx["role"] in s.get("allowed_roles", []),
    "sensitive": lambda ctx, s: ctx["elevated"] and any(
        s["name"].startswith(f"{t}/") for t in ctx["teams"]
    ),
}

def check(ctx: AgentContext, skill: Skill) -> dict:
    """Returns {allowed, reason, request_url?}."""
    rule = ROLE_ACL.get(skill["access_level"])
    # Unknown access_level has no rule, so the gate fails closed
    if rule is not None and rule(ctx, skill):
        return {"allowed": True, "reason": "access_granted"}
    return {
        "allowed": False,
        "reason": f"agent role={ctx['role']} cannot invoke {skill['name']} (access_level={skill['access_level']})",
        "request_url": f"https://internal.example.com/skills/request-access?skill={quote(skill['name'], safe='')}",
    }

# Usage in the agent loop
def invoke_skill(ctx: AgentContext, skill_spec: str, payload: dict):
    from resolve_deps import resolve
    skill = resolve(skill_spec)
    decision = check(ctx, skill)
    if not decision["allowed"]:
        return {"error": "ACL_DENIED", "skill": skill["name"], **decision}
    # ...actually invoke the Skill body...
↪ Concept: evaluation
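Because the gate is deterministic, it can be covered by a small table of cases, one or more per access level. The sketch below restates the four rules in a condensed, hypothetical `is_allowed` function so the example runs on its own; the roles, teams, and Skill names in the table are illustrative only.

```python
def is_allowed(ctx: dict, skill: dict) -> bool:
    """Condensed form of the four access_level rules; unknown levels fail closed."""
    in_team = any(skill["name"].startswith(f"{t}/") for t in ctx["teams"])
    level = skill["access_level"]
    if level == "public":
        return True
    if level == "team":
        return in_team
    if level == "role-restricted":
        return ctx["role"] in skill.get("allowed_roles", [])
    if level == "sensitive":
        return bool(ctx["elevated"]) and in_team
    return False

SUPPORT = {"role": "support-agent", "teams": ["support", "shared"], "elevated": False}
FINANCE = {"role": "finance-agent", "teams": ["finance"], "elevated": True}

# (agent context, skill, expected decision)
CASES = [
    (SUPPORT, {"name": "shared/faq", "access_level": "public"}, True),
    (SUPPORT, {"name": "support/refund-resolver", "access_level": "team"}, True),
    (SUPPORT, {"name": "finance/budget-approval", "access_level": "sensitive"}, False),
    (FINANCE, {"name": "finance/budget-approval", "access_level": "sensitive"}, True),
    (SUPPORT, {"name": "data/pii-redactor", "access_level": "role-restricted",
               "allowed_roles": ["data-agent"]}, False),
    (SUPPORT, {"name": "support/unknown", "access_level": "bogus"}, False),
]

def run_cases() -> list[tuple]:
    """Return the cases whose outcome differs from the expectation."""
    return [(c, s, want) for c, s, want in CASES if is_allowed(c, s) != want]
```

An empty list from `run_cases()` means every row behaved as expected; the third row is the scenario from the problem statement, a support agent reaching for a finance Skill.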
07

Wire the agent's Skill discovery into its tool loop

Expose two tools to every agent: search_skills(query, filters) and invoke_skill(name, version_constraint, payload). The agent finds Skills by intent, the ACL gate runs inside invoke_skill, and the Skill body executes only on allow. The agent never sees the registry's raw 200+ entries, only the top-k matches for its query, filtered by access_level.

# Skills are exposed as two tools to every agent
TOOLS = [
    {
        "name": "search_skills",
        "description": (
            "Find a Skill in the enterprise registry by natural-language query. "
            "Returns up to k matches with name, version, description, score. "
            "Use BEFORE invoke_skill so you have a name+version to invoke."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "k": {"type": "integer", "default": 5},
                "team": {"type": "string"},  # optional filter
                "access_level": {"type": "string"},  # optional filter
            },
            "required": ["query"],
        },
    },
    {
        "name": "invoke_skill",
        "description": (
            "Invoke a Skill from the registry. ACL is checked before the "
            "Skill body executes; if denied, returns ACL_DENIED with a "
            "request_url for access. Always pin a major version (e.g. 1.x)."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "version_constraint": {"type": "string"},  # pinned major, e.g. "1.x"
                "payload": {"type": "object"},
            },
            "required": ["name", "version_constraint", "payload"],
        },
    },
]
↪ Concept: tool-calling
08

Track usage + deprecation lifecycle

Once Skills are in production, the registry needs to know which Skills are hot, which are stale, which have known broken versions. Log every invoke_skill call with name, version, agent role, outcome. Surface a deprecation notice in search_skills results when an old version is queried. Auto-archive Skills with zero invocations in 6 months.

# scripts/usage_tracker.py
import json
from collections import Counter
from datetime import datetime, timedelta, timezone
from pathlib import Path

LOG_PATH = Path("logs/skill-invocations.jsonl")
REGISTRY_PATH = Path("dist/skill-registry.json")

# Append-only log of every invoke_skill call
def log_invocation(name: str, version: str, agent_role: str, outcome: str):
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),  # timezone-aware UTC
        "name": name,
        "version": version,
        "agent_role": agent_role,
        "outcome": outcome,  # 'success' | 'acl_denied' | 'error'
    }
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(record) + "\n")

def _parse_ts(ts: str) -> datetime:
    """Parse a logged timestamp; also accepts the older '...Z' suffix form."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

# Nightly: surface deprecation candidates + hot Skills
def nightly_report():
    cutoff = datetime.now(timezone.utc) - timedelta(days=180)
    lines = LOG_PATH.read_text().splitlines() if LOG_PATH.exists() else []
    invocations = [json.loads(line) for line in lines if line.strip()]
    recent = [r for r in invocations if _parse_ts(r["ts"]) > cutoff]
    hot = Counter((r["name"], r["version"]) for r in recent).most_common(20)
    invoked_names = {r["name"] for r in recent}
    all_names = {s["name"] for s in json.loads(REGISTRY_PATH.read_text())}
    cold = sorted(all_names - invoked_names)

    print("=== Hot Skills (last 180d) ===")
    for (name, ver), count in hot:
        print(f"  {name}:{ver}  {count}")
    print(f"\n=== Cold Skills (deprecation candidates) ({len(cold)}) ===")
    for name in cold:
        print(f"  {name}")
↪ Concept: evaluation
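Surfacing a deprecation notice in search results can be a small post-processing step over the hits. The sketch below assumes each hit carries the Skill's frontmatter fields, including the optional `deprecated` note; `annotate_deprecations` and the `warning` key are hypothetical names.

```python
def annotate_deprecations(results: list[dict]) -> list[dict]:
    """Copy each search hit, adding a 'warning' when its frontmatter has 'deprecated'."""
    annotated = []
    for hit in results:
        out = dict(hit)  # do not mutate the caller's objects
        note = hit.get("deprecated")
        if note:
            out["warning"] = f"{hit['name']}:{hit['version']} is deprecated: {note}"
        annotated.append(out)
    return annotated
```

Hits without a `deprecated` field pass through unchanged, so the step can be applied to every `search_skills` response.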
06 · Configuration decisions

The four decisions

Decision: Org has 200+ Skills across 15 teams
Right answer: Team-namespaced directory ({team}/{name}) + shared registry + embeddings search + ACL layer
Wrong answer: Flat folder, full-text search, no ACL ('we'll add permissions later')
Why: Naming collisions, version drift, and cross-team ACL violations all become structural impossibilities at the directory + frontmatter level. Retrofitting them at 200 Skills costs an order of magnitude more than starting clean.

Decision: Skill `case-facts` is shipping a breaking change
Right answer: Bump major (v2.0.0); existing callers stay on v1.x until they migrate; deprecation notice in v1's frontmatter
Wrong answer: Edit v1 in place; tell teams to update their callers
Why: Semver + Git tags let old callers keep working while new callers opt into v2 deliberately. Editing in place breaks every agent in the org silently, a class of incident that's painful to debug because the symptoms surface in agent loops, not in the Skill itself.

Decision: Support agent's prompt suggests calling `finance/budget-approval`
Right answer: ACL gate denies pre-execution; structured ACL_DENIED error with request_url returned to the agent
Wrong answer: Trust the prompt; finance Skills not in support agent's tool list
Why: Prompt-only restriction leaks under prompt injection or clever phrasing. A deterministic ACL gate that runs before the Skill body executes is the only real boundary. Tool-list restriction is the second layer; ACL is the first.

Decision: Agent needs to find a Skill but doesn't know its exact name
Right answer: search_skills(query): embeddings/full-text search returns top-k matches with metadata
Wrong answer: Show all 200+ Skills in the agent's tool list
Why: 200 tools in a single agent's tool list destroys routing accuracy (per Scenario P3.1's tool-count rule). Search-by-intent surfaces only the top-k relevant matches; the agent picks one and invokes it. Two tools (search + invoke) cover the whole space.
07 · Failure modes

Where it breaks

Five failure pairs. Each one is one exam question. The fix is always architectural: deterministic gates, structured fields, pinned state.

Unbounded skill count in flat layout

200+ Skills in .claude/skills/ flat folder. Naming collisions appear in week 1 (refund-resolver exists in support, growth, and finance contexts, all meaning different things). Discovery becomes a grep contest.

AP-20
✅ Fix

Team-namespaced layout: .claude/skills/{team}/{name}.md. Collisions become structurally impossible. support/refund-resolver and growth/refund-resolver are distinct paths. Past 50 Skills, add an embeddings-based search service.

No versioning

Skill case-facts ships a breaking change (frontmatter shape changes). Every agent in the org that depends on it starts failing silently. No way to roll back a single Skill's update.

AP-21
✅ Fix

Semver in frontmatter (version: 1.2.3) + Git tags. Callers pin major (case-facts:1.x); registry resolves to latest patch. Breaking changes bump major, callers migrate deliberately.

Naming collisions across teams

Two teams independently author a refund-resolver Skill. Both end up in .claude/skills/refund-resolver.md (last commit wins). Agents call the wrong one; nobody notices for weeks.

AP-22
✅ Fix

Team namespace prefix: support/refund-resolver vs growth/refund-resolver. The directory layout enforces uniqueness; the indexer rejects duplicates. PR review surfaces collisions before merge.

No access control

Support agent's prompt is cleverly engineered (or injected via PR content) to invoke finance/budget-approval. The Skill executes; an unauthorized $50K budget request is approved. Audit log shows the agent did it; ACL log shows nothing because there is no ACL.

AP-23
✅ Fix

ACL gate (access_level: public | team | role-restricted | sensitive) on every Skill, checked pre-invocation. Denied calls return a structured ACL_DENIED error; the agent observes it and either escalates or routes differently. Deterministic, not prompt-based.

Skills as one-off prompts

Each agent's system prompt copy-pastes the relevant Skill content inline. When the Skill changes, 12 agents need updating. Nobody updates them all; behavior drifts over months.

AP-24
✅ Fix

Skills are reusable, composable, versioned units. Agents reference them via invoke_skill('support/refund-resolver:1.x', payload). One source of truth; one Skill update propagates to every caller automatically.

08 · Budget

Cost & latency

Skill execution (avg 800 tokens)
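~$0.0036 per invocation

Skill body ~500 tokens system + ~200 input + ~100 output. At Sonnet 4.5 list pricing ($3 per million input tokens, $15 per million output tokens), that is 700 × $3/M + 100 × $15/M ≈ $0.0036. Most Skills are narrow, focused units, so there is no inflation from generic prompt scaffolding.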
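The per-invocation estimate is easy to recompute for a different model or token mix, because input and output tokens are billed at different rates. A small sketch follows; the prices passed in ($3 and $15 per million tokens) are assumptions to be replaced with current list prices, and `invocation_cost` is a hypothetical helper.

```python
def invocation_cost(input_tokens: int, output_tokens: int,
                    usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost in USD of one call, pricing input and output tokens separately."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000

# 500 system + 200 user tokens in, 100 tokens out; prices are assumptions
cost = invocation_cost(700, 100, usd_per_m_input=3.0, usd_per_m_output=15.0)
print(f"${cost:.4f} per invocation")
```

Although output is only an eighth of the tokens here, its higher rate makes it a large share of the total, so Skills that produce long responses cost noticeably more than the token count alone suggests.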

Search service (embeddings)
~$0.0001 per query

Voyage / OpenAI embedding ~512 dims at fractional cost per query. Embeddings cached by body_hash so re-embedding only fires on content change. At 1M queries/month, ~$100.

Reindex CI job (per push to main)
~$0 (compute) + ~$0.01 (embedding refresh)

Indexer is pure parsing on GitHub Actions free tier. Only cost is re-embedding Skills with changed content. Typically <5% of the registry per push.

ACL check overhead
~+0.01ms per invocation, ~0% token cost

ACL is a deterministic dictionary lookup against frontmatter + agent role. No LLM call. Latency is unmeasurable in the pipeline; cost is in maintenance, not execution.

Annual registry hosting (5K Skills, 20K queries/day)
~$3K-8K/year

Embeddings store + search service + reindex compute. Small relative to the per-invocation Skill execution cost which dominates total spend at scale.

09 · Ship gates

Ship checklist

Two passes. Build-time gates verify the code; run-time gates verify the system in production.

Build-time

  1. Team-namespaced directory layout: .claude/skills/{team}/{name}.md (concept: skills)
  2. Frontmatter schema documented and validated by the indexer (name, version, description, tags, access_level required) (concept: structured-outputs)
  3. Semver enforced: every Skill has a valid semver in frontmatter (concept: tool-calling)
  4. CI indexer runs on push to main; reindex SLA <60s on 500 Skills
  5. Search service deployed with p95 <200ms; embeddings cached by body_hash (concept: context-window)
  6. Two tools exposed to every agent: search_skills + invoke_skill (concept: tool-calling)
  7. ACL gate runs PRE-invocation; denied calls return structured ACL_DENIED (concept: evaluation)
  8. Dependency resolution: callers pin major; registry resolves to latest patch
  9. Deprecation lifecycle: zero-invocation Skills auto-flagged at 180d
  10. Usage log appended on every invoke_skill call (jsonl)
  11. PR review on every Skill change, including the frontmatter shape

Run-time

  • All Skills have valid frontmatter (CI fails the merge if not)
  • Indexer reindex SLA <60s on the live registry
  • Search p95 latency <200ms under steady-state load
  • ACL gate unit-tested per access_level (public, team, role-restricted, sensitive)
  • Dep resolver tested with cycle, missing-dep, and major-bump scenarios
  • Usage log persisted append-only (jsonl) with retention policy documented
  • Nightly deprecation report runs; cold Skills surfaced to owners
  • Prod deploy of search service has fallback to full-text on embeddings outage
10 · Question patterns

Five exam-pattern questions

An enterprise has 200+ Skills across 15 teams. Skill-name collisions occur weekly (`refund-resolver` exists in support/, growth/, AND finance/, all meaning different things). How should you structure the registry to prevent this structurally?
Adopt a team namespace prefix convention enforced by the directory layout: .claude/skills/{team}/{name}.md, so the canonical name is support/refund-resolver vs growth/refund-resolver. Collisions become impossible at the filesystem level (different paths) and at the registry level (the indexer rejects duplicate name fields in frontmatter). Pair with PR review on every Skill change to catch deliberate naming drift before merge. Tagged to AP-22.
A Skill for customer-support refund processing is updated frequently. Last week, an in-place edit broke 12 dependent agents silently. How do you prevent this?
Semver in frontmatter plus Git tags. Every Skill carries a version: MAJOR.MINOR.PATCH; every release tags the Git history. Callers pin a major (support/refund-resolver:1.x); the registry resolves to the latest patch within that major. Breaking changes bump the major (v2.0.0); existing callers continue against v1.x until they migrate deliberately. The in-place edit becomes structurally impossible. The indexer rejects two Skills with the same name and version. Tagged to AP-21.
Finance team has sensitive Skills (e.g. `budget-approval`). The support team's agent must NEVER invoke them, no matter how cleverly prompted (or prompt-injected). How do you enforce this architecturally?
Every Skill carries an access_level in frontmatter (public | team | role-restricted | sensitive). An ACL gate runs before Skill invocation: read the calling agent's role + the Skill's access_level, deny pre-execution if not allowed, return structured {error: 'ACL_DENIED', skill, reason, request_url}. The agent observes the denial and either escalates or routes differently; it cannot bypass the gate. This is deterministic, not prompt-based; cleverness in the prompt cannot defeat a hard pre-invocation check. Tagged to AP-23.
An agent on the marketing team needs to discover the right Skill from 50+ available. Searching by exact name is slow and requires the agent to already know what's there. What infrastructure should you add?
An embeddings-based search service keyed over description + tags with optional filters by team and access_level. The agent calls search_skills('process customer refund up to $500', k=5) and gets the top-5 matches with {name, version, description, score}. Pair with a full-text fallback for exact-keyword queries. Cache embeddings keyed by body_hash so re-embedding only fires on content change. p95 query latency stays <200ms even at 5,000 Skills.
A Skill captures enterprise knowledge (policies, procedures) for support. Should it be a single 500-line markdown file or modular across multiple files with `depends_on`?
Modular composition via depends_on. The Skill's frontmatter declares its dependencies (depends_on: ['support/case-facts:1.x', 'shared/escalation-queue:2.x']); the registry resolves them topologically at invocation time. Benefits: each unit is independently versioned (case-facts evolves separately from escalation-queue), reusable across multiple parent Skills, and easier to PR-review (smaller files). The dep resolver enforces no cycles and that every referenced version exists.
11 · FAQ

Frequently asked

What's the maximum number of Skills per organization?
Unbounded with the right infrastructure. Per-project (a single agent's working set), keep <12 for discoverability. Per-team, low hundreds is comfortable with a search service. Org-wide, thousands work with embeddings + namespaces + ACLs. The bottleneck is rarely raw Skill count. It's how the agent finds the right one and how the org governs change.
Can a Skill depend on other Skills?
Yes, declared in frontmatter. depends_on: ['support/case-facts:1.x', 'shared/escalation-queue:2.x']. The registry validates dependencies exist at index time (CI fails on missing dep) and resolves them topologically at invocation time. Avoid cycles. The dep resolver detects them and rejects.
How do you version Skills without breaking existing agents?
Semver in frontmatter + callers pin major. A Skill at v1.2.3 keeps backward compatibility for all v1.x callers. When a breaking change is needed, bump to v2.0.0; existing callers continue against v1.x until they migrate deliberately. Deprecation notices in the v1 frontmatter point to v2; the registry surfaces the warning in search_skills results.
Is permission-aware RAG built into Claude?
No. You implement it. Claude's tool layer doesn't know about your org's roles. Implement an ACL gate that runs pre-invocation: read agent role + Skill access_level, deny if not allowed, return structured ACL_DENIED. The Skill body never executes if the ACL check fails. This is the same pattern as authorization middleware in any HTTP service. Deterministic, not LLM-judged.
Should sensitive Skills be versioned differently?
No. Same versioning, different access control. Versioning is about backward compatibility; access control is about who can invoke. They're orthogonal. A sensitive Skill ships v1.2.3 just like a public one; the ACL gate gates who can call it, regardless of version.
How do you find the right Skill from 200+?
Two tools, one query. First, search_skills(query, k=5, filters): embeddings search returns top-k matches by intent. Second, invoke_skill(name, version_constraint, payload): runs the chosen Skill after the ACL check. The agent never gets raw access to the registry; it queries through the search tool. This keeps the agent's tool list small (just 2 tools) while exposing the entire Skills library.
What happens to old Skill versions when a new major ships?
They stay queryable for 6 months by default. The deprecation lifecycle: ship v2.0.0 → mark v1.x with a deprecation note in frontmatter → registry serves v1.x to existing callers but flags the deprecation in search results → after 6 months of zero invocations, auto-archive. Active Skills stay forever; truly cold ones get cleaned up.
