The problem
What the customer needs
- One source of truth for refactoring, test generation, doc generation. Not 12 copy-pasted prompts in 12 repos.
- Risk-free exploration. A refactoring Skill must propose changes without touching the working tree.
- Reusable across repos. A Skill written for the React team should work on the Python team's repo with parameter changes only.
- Versioned upgrades. A breaking change to a Skill must NOT silently break agents on the prior version.
Why naive approaches fail
- Skills built as IDE plugins first. Other editors get a parallel implementation that drifts. The CLI never exists.
- Skills with unrestricted tool access. A test-generation Skill accidentally calls Edit on a real source file.
- Skills hardcoded to one codebase. A team has to re-author the Skill for every new repo.
- Skills versioned by edit-in-place. A v2 frontmatter change silently breaks 12 agents.
- Skills live in .claude/skills/{team}/{name}.md with frontmatter (name, version, description, parameters, allowed-tools).
- CLI invocation: claude skills invoke <skill> --param key=value. IDE extensions wrap the CLI; they do not bypass it.
- Exploratory Skills run inside context: fork. The parent session is untouched.
- allowed-tools is an explicit whitelist on every Skill. Edit and Bash are not on the list unless required.
- Parameters are declared in frontmatter and validated by the CLI before invocation.
- Git tags pin versions: skill-refactor@1.2.3. Callers reference the major (@1.x); the registry resolves the latest patch.
The system
What each part does
5 components, each owns a concept.
Skill Definition File
.claude/skills/{team}/{name}.md
The unit of dev-tooling Skills. Markdown body holds the instructions; YAML frontmatter holds the metadata: name, version (semver), description, parameters (with types and defaults), allowed-tools (whitelist), context_mode (session or fork). Lives in version control. Reviewed via PR.
Configuration
Path: .claude/skills/{team}/{name}.md. Required frontmatter: name, version, description, when_to_use, parameters, allowed-tools, context_mode. Optional: deprecated, owners, requires_human_confirm.
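A minimal sketch of the parse-time check this card implies, assuming the frontmatter has already been loaded into a dict. The helper name check_frontmatter is hypothetical, and when_to_use is treated as required here, as the authoring step does.

```python
# Hypothetical validator; key names follow the frontmatter described above.
REQUIRED_KEYS = {
    "name", "version", "description", "when_to_use",
    "parameters", "allowed-tools", "context_mode",
}
OPTIONAL_KEYS = {"deprecated", "owners", "requires_human_confirm"}
VALID_CONTEXT_MODES = {"session", "fork"}


def check_frontmatter(fm: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape is valid."""
    problems = []
    keys = set(fm)
    missing = REQUIRED_KEYS - keys
    unknown = keys - REQUIRED_KEYS - OPTIONAL_KEYS
    if missing:
        problems.append(f"missing required keys: {sorted(missing)}")
    if unknown:
        problems.append(f"unknown keys: {sorted(unknown)}")
    mode = fm.get("context_mode")
    if mode is not None and mode not in VALID_CONTEXT_MODES:
        problems.append(
            f"context_mode must be one of {sorted(VALID_CONTEXT_MODES)}, got {mode!r}"
        )
    tools = fm.get("allowed-tools")
    if tools is not None and not isinstance(tools, list):
        problems.append("allowed-tools must be a list of tool names")
    return problems
```

Returning a list of problems rather than raising on the first one lets a CI job report every issue in a Skill file in a single pass.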
Skill Frontmatter (Attention Engineering)
metadata routes the LLM to the right Skill
The frontmatter is read into the agent's system prompt at invocation; the LLM forward-pass uses it to decide whether the Skill fits the request. It is NOT a regex classifier. Good frontmatter (clear description, accurate when_to_use, well-typed parameters) lifts routing accuracy substantially.
Configuration
name: refactor-fn. version: 1.2.3. description: 'Rename a function and update every call site.'. when_to_use: 'When the user asks to rename a function across the repo.'. parameters: { directory: string, old_name: string, new_name: string }. allowed-tools: [Read, Grep, Glob, Edit].
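One way to picture "read into the agent's system prompt" is a small catalog renderer. This is only a sketch: the function name render_skill_catalog and the text layout are assumptions, not the CLI's actual format.

```python
def render_skill_catalog(skills: list[dict]) -> str:
    """Render frontmatter dicts as a plain-text catalog for a system prompt."""
    lines = ["Available Skills:"]
    for fm in sorted(skills, key=lambda s: s["name"]):
        rendered = []
        for pname, spec in fm.get("parameters", {}).items():
            # Accept both the short form (name: type) and the long form
            # (name: {type: ..., required: ...}).
            ptype = spec["type"] if isinstance(spec, dict) else spec
            rendered.append(f"{pname}: {ptype}")
        lines.append(f"- {fm['name']}@{fm['version']}: {fm['description']}")
        lines.append(f"  when_to_use: {fm['when_to_use']}")
        lines.append(f"  parameters: {', '.join(rendered) or 'none'}")
    return "\n".join(lines)
```

The model only sees what this text says, which is why a vague description or a missing when_to_use directly weakens routing.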
context: fork Isolation
child session runs in isolation, parent untouched
When a Skill is exploratory (refactoring, test-gen, doc-gen), the CLI spawns a child session with context: fork. The child has its own conversation history and its own working tree view. Whatever the Skill explores or proposes stays in the child until the parent receives the final tool_result and decides whether to merge. Lighter than a full subagent, sufficient for one-Skill scope.
Configuration
context_mode: fork in the Skill frontmatter. CLI spawns an isolated session per invocation. Parent receives only the tool_result payload (proposed diff, generated test file, doc string). Parent decides whether to apply.
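The isolation contract can be sketched in a few lines. Session and run_in_fork are hypothetical names, the child runner is passed in as a plain callable, and only conversation history is modeled here (working-tree isolation is not).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Session:
    """Illustrative stand-in for a conversation: just a message list."""
    messages: list[dict] = field(default_factory=list)


def run_in_fork(parent: Session, skill_prompt: str,
                run_child: Callable[[Session], str]) -> dict:
    """Run a Skill in a fresh child session and hand back only its payload."""
    # The child starts from its own message list; nothing is shared with parent.
    child = Session(messages=[{"role": "user", "content": skill_prompt}])
    payload = run_child(child)
    # The parent's history is never written to here; the caller decides
    # whether to apply the proposed change.
    return {"content": payload, "is_error": False}
```

Because the function never mutates the parent, a test for the fork guarantee can simply compare the parent's messages before and after the call.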
allowed-tools Whitelist
explicit, structural, deny-by-default
Every Skill declares its allowed-tools array in frontmatter. The CLI enforces the whitelist at tool_use interception: any call to a non-whitelisted tool fails with is_error: true. By default, Edit and Bash are NOT on the list. A code-gen Skill that only needs to read files lists [Read, Grep, Glob]; a refactoring Skill that needs to write changes adds Edit. Side-effect prevention is structural, not prompt-based.
Configuration
allowed-tools: [Read, Grep, Glob]. SDK-side enforcement: tool_use calls outside this list return tool_result with is_error: true. The Skill body cannot escalate its own tool list.
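A sketch of what enforcement at tool_use interception could look like. The function intercept_tool_use and the handlers mapping are hypothetical; the block fields (tool_use_id, content, is_error) follow the tool_result shape the text refers to.

```python
from typing import Callable


def intercept_tool_use(tool_use: dict, allowed_tools: list[str],
                       handlers: dict[str, Callable]) -> dict:
    """Gate one tool_use block against the Skill's whitelist."""
    name = tool_use["name"]
    if name not in allowed_tools:
        # Deny by default: the call never reaches a handler.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use["id"],
            "content": f"tool '{name}' is not in allowed-tools",
            "is_error": True,
        }
    output = handlers[name](**tool_use["input"])
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": str(output),
    }
```

The check happens in code before any handler runs, which is what makes the guarantee structural rather than a matter of prompt wording.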
IDE/CLI Integration Wrapper
CLI-first; IDE is a thin shell over the CLI
The CLI is the canonical entry point. IDE extensions (VSCode, JetBrains, Neovim) shell out to the CLI rather than re-implementing Skill invocation logic. This means a Skill update in the registry propagates to every editor immediately. New editors get supported by writing a 200-line shell-out extension, not a full Skill engine.
Configuration
VSCode extension binds keybinds and context-menu items to claude skills invoke <skill> --param .... The extension's only job is to translate UI events to CLI calls and stream output back to the editor.
Data flow
Eight steps to production
Lay out the team-namespaced directory
Create .claude/skills/{team}/{name}.md per team. The directory IS the registry's source of truth. Even on day one, namespace from the start. Retrofitting a flat layout into namespaces at 50 Skills is painful.
import os

TEAMS = ["frontend", "backend", "data", "shared"]

for t in TEAMS:
    os.makedirs(f".claude/skills/{t}", exist_ok=True)
    # An empty .gitkeep lets Git track the otherwise-empty team directory.
    with open(f".claude/skills/{t}/.gitkeep", "w") as f:
        pass

print("namespace-by-team layout ready; commit and start authoring.")

Author the Skill with full frontmatter
Required keys: name, version, description, when_to_use, parameters (with types and defaults), allowed-tools (explicit whitelist), context_mode (session for state-shared, fork for isolated). Body holds the prompt. The CLI validates frontmatter at parse time; invalid Skills are rejected.
# .claude/skills/frontend/refactor-component.md
SKILL_TEMPLATE = """---
name: frontend/refactor-component
version: 1.2.3
description: |
  Rename a React component and update every import + usage in the repo.
  Runs in context: fork so the working tree is untouched until you approve.
when_to_use: |
  When the user asks to rename a React component or move it between files.
parameters:
  old_name:
    type: string
    description: Current PascalCase component name (e.g. UserCard)
    required: true
  new_name:
    type: string
    description: Target PascalCase component name (e.g. UserProfile)
    required: true
  directory:
    type: string
    default: src/
    # Quoted: an unquoted value containing ": " is not valid YAML.
    description: "Directory to scope the search (default: src/)"
allowed-tools:
  - Read
  - Grep
  - Glob
  - Edit
context_mode: fork
---
You are renaming a React component across this repository.
Steps:
1. Glob {directory}/**/*.{tsx,ts,jsx,js} to find candidate files.
2. Grep for {old_name} across the matched files; collect file + line.
3. For each match, Read the file and decide whether the occurrence is the
   component (the import/export/JSX-tag) or a coincidental string.
4. Edit each file. Update the filename if the file is currently {old_name}.tsx.
5. Return a structured summary: files touched, occurrences changed.
"""

Spawn the Skill in context: fork
When context_mode: fork is set, the CLI runs the Skill in a child session with its own conversation history, its own tool whitelist, and its own working tree view. The parent session is untouched. The child returns a tool_result with the proposed change; the parent decides whether to apply.
from anthropic import Anthropic
import yaml

client = Anthropic()

def parse_skill(path: str) -> dict:
    with open(path) as f:
        text = f.read()
    if not text.startswith("---"):
        raise ValueError(f"{path}: missing frontmatter")
    # Split into ['', frontmatter, body] on the first two '---' markers.
    _, fm, body = text.split("---", 2)
    return {"frontmatter": yaml.safe_load(fm), "body": body.strip()}

def invoke_skill_in_fork(skill_path: str, params: dict, user_message: str) -> dict:
    """Spawn a child session for the Skill. Parent state untouched."""
    skill = parse_skill(skill_path)
    fm = skill["frontmatter"]
    rendered_body = skill["body"]
    for k, v in params.items():
        rendered_body = rendered_body.replace("{" + k + "}", str(v))
    # A fresh messages list: the child sees none of the parent's history.
    child_response = client.messages.create(
        model="claude-sonnet-4-5",  # API model IDs use hyphens, not dots
        max_tokens=4096,
        system=rendered_body,
        # load_tools_from_whitelist is defined in the next step.
        tools=load_tools_from_whitelist(fm["allowed-tools"]),
        messages=[{"role": "user", "content": user_message}],
    )
    return {
        "skill_name": fm["name"],
        "skill_version": fm["version"],
        "child_stop_reason": child_response.stop_reason,
        "child_content": child_response.content,
    }

Enforce allowed-tools at the SDK boundary
The frontmatter declares the whitelist; the CLI enforces it. Any tool_use call that targets a non-whitelisted tool fails with is_error: true. The Skill body cannot escalate its own tool list. This is structural, not prompt-based: a clever prompt cannot trick the SDK into calling Edit on a Skill that does not whitelist Edit.
from anthropic.types import ToolParam

# Placeholder definitions; real entries need full descriptions and input schemas.
ALL_TOOLS: dict[str, ToolParam] = {
    "Read": {"name": "Read", "description": "...", "input_schema": {"type": "object"}},
    "Edit": {"name": "Edit", "description": "...", "input_schema": {"type": "object"}},
    "Bash": {"name": "Bash", "description": "...", "input_schema": {"type": "object"}},
    "Grep": {"name": "Grep", "description": "...", "input_schema": {"type": "object"}},
    "Glob": {"name": "Glob", "description": "...", "input_schema": {"type": "object"}},
    "Write": {"name": "Write", "description": "...", "input_schema": {"type": "object"}},
}
KNOWN_TOOLS = set(ALL_TOOLS)

def load_tools_from_whitelist(whitelist: list[str]) -> list[ToolParam]:
    # Reject typos and unknown names up front instead of silently dropping them.
    unknown = set(whitelist) - KNOWN_TOOLS
    if unknown:
        raise ValueError(f"Skill frontmatter references unknown tools: {unknown}")
    return [ALL_TOOLS[name] for name in whitelist]

# A refactor Skill whitelist
TOOLS_FOR_REFACTOR = load_tools_from_whitelist(["Read", "Grep", "Glob", "Edit"])
# A test-gen Skill whitelist (no Edit, no Bash)
TOOLS_FOR_TESTGEN = load_tools_from_whitelist(["Read", "Grep", "Glob"])

Parameterize for cross-repo reuse
A good Skill is generic across repos. The Skill body uses {param_name} placeholders; the CLI fills them in from --param key=value arguments at invocation time. Required vs optional parameters are declared in the frontmatter; the CLI rejects invocations that miss required params before any LLM call.
import jsonschema

def validate_params(skill_fm: dict, params: dict) -> None:
    # Build a JSON Schema from the frontmatter's parameter declarations.
    schema = {"type": "object", "properties": {}, "required": []}
    for name, spec in skill_fm.get("parameters", {}).items():
        schema["properties"][name] = {"type": spec["type"]}
        if spec.get("required"):
            schema["required"].append(name)
    try:
        jsonschema.validate(instance=params, schema=schema)
    except jsonschema.ValidationError as e:
        raise ValueError(f"invalid params for skill {skill_fm['name']}: {e.message}") from e

# CLI: claude skills invoke <skill> --param key=value
def parse_cli_params(argv: list[str]) -> dict:
    # Values arrive as strings; coerce them before validating non-string types.
    params = {}
    i = 0
    while i < len(argv):
        if argv[i] == "--param" and i + 1 < len(argv):
            k, _, v = argv[i + 1].partition("=")
            params[k] = v
            i += 2
        else:
            i += 1
    return params

Build the IDE wrapper as a thin shell over the CLI
VSCode (or JetBrains, or Neovim) extension is the smallest possible shell over the CLI. It registers commands and keybinds, captures the developer's selection, builds a claude skills invoke shell command, runs it, and streams the output back into the editor.
# VSCode extension (TypeScript): Python summary of what it does.
# editor_state and emit stand in for the editor's API; both are illustrative.
import subprocess

def vscode_command_refactor_component(editor_state, new_name, emit=print):
    """The user invoked the 'Claude: Refactor This Component' palette item."""
    selection = editor_state.get_selected_text()
    workspace_root = editor_state.get_workspace_root()
    cli_args = [
        "claude", "skills", "invoke", "frontend/refactor-component",
        "--param", f"old_name={selection}",
        "--param", f"new_name={new_name}",  # collected from an input box
        "--param", f"directory={workspace_root}/src/",
    ]
    proc = subprocess.Popen(
        cli_args, cwd=workspace_root, stdout=subprocess.PIPE, text=True
    )
    for line in proc.stdout:  # stream CLI output back into the editor panel
        emit(line.rstrip("\n"))
    return proc.wait()

Discover Skills via the CLI registry
claude skills list walks .claude/skills/**/*.md and ~/.claude/skills/**/*.md, parses frontmatter, and prints a discoverable table. IDE extensions call this and feed the result into command palettes.
import glob
from pathlib import Path

import yaml

def list_skills(roots: list[str]) -> list[dict]:
    out = []
    for root in roots:
        for path in glob.glob(f"{root}/**/*.md", recursive=True):
            try:
                text = Path(path).read_text()
                _, fm, _ = text.split("---", 2)
                meta = yaml.safe_load(fm)
                out.append({
                    "name": meta["name"],
                    "version": meta["version"],
                    "description": meta["description"].strip().split("\n")[0],
                    "parameters": list(meta.get("parameters", {}).keys()),
                    "path": path,
                })
            except (ValueError, KeyError, TypeError, AttributeError, yaml.YAMLError):
                # Skip Markdown files that lack valid Skill frontmatter.
                continue
    return sorted(out, key=lambda s: s["name"])

def cmd_skills_list(team: str | None = None):
    skills = list_skills([".claude/skills", str(Path.home() / ".claude/skills")])
    if team:
        skills = [s for s in skills if s["name"].startswith(f"{team}/")]
    for s in skills:
        print(f"{s['name']}@{s['version']:<8} {s['description']}")

Version Skills via Git tags; pin majors
Each Skill carries a semver in frontmatter. Each release tags the Git history (git tag skill-refactor@1.2.3). Callers pin a major (@1.x); the registry resolves to the latest patch within that major. Edit-in-place is forbidden by PR review.
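import re
import subprocess
from pathlib import Path

import semver

def update_frontmatter_version(skill_path: str, new_version: str) -> None:
    # Rewrite the first top-level 'version:' line in the Skill file.
    text = Path(skill_path).read_text()
    text = re.sub(r"(?m)^version:.*$", f"version: {new_version}", text, count=1)
    Path(skill_path).write_text(text)

def tag_skill_release(skill_path: str, new_version: str):
    semver.VersionInfo.parse(new_version)  # raises ValueError if not valid semver
    # parse_skill is the helper from the fork step above.
    skill_name = parse_skill(skill_path)["frontmatter"]["name"]
    tag = f"{skill_name.replace('/', '-')}@{new_version}"
    update_frontmatter_version(skill_path, new_version)
    subprocess.run(["git", "add", skill_path], check=True)
    subprocess.run(["git", "commit", "-m", f"chore(skill): bump {skill_name} to {new_version}"], check=True)
    subprocess.run(["git", "tag", "-a", tag, "-m", f"{skill_name} {new_version}"], check=True)
    print(f"tagged {tag}")

def resolve_caller_pin(skill_name: str, pin: str) -> str:
    if pin.endswith(".x"):
        major = int(pin.split(".")[0])
        tags = subprocess.check_output(
            ["git", "tag", "-l", f"{skill_name.replace('/', '-')}@{major}.*"],
            text=True,
        ).strip().split("\n")
        versions = [t.split("@", 1)[1] for t in tags if t]
        if not versions:
            raise LookupError(f"no released version of {skill_name} matches {pin}")
        return max(versions, key=lambda v: semver.VersionInfo.parse(v))
    return pin

The four decisions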
| Decision | Right answer | Wrong answer | Why |
|---|---|---|---|
| IDE-first or CLI-first? | CLI-first. IDE extensions wrap the CLI. | IDE-first. Skills built into a VSCode plugin and ported to other editors as parallel implementations. | CLI-first is portable across every editor and shell workflow. New editors get supported with a 200-line shell-out wrapper instead of a parallel Skill engine. One source of truth; one update path. |
| Skill or slash Command? | Skill if the work needs isolated exploration (context: fork) or reusable parameters. Command if the work has session-wide effects. | Use Command for everything because it is simpler. | Skills give you isolation (fork), reusable parameter schemas, and discoverability via claude skills list. Commands are inline and per-session. The wrong choice produces fragile workflows that break when copied between repos. |
| Tool access in a refactoring Skill | Explicit allowed-tools whitelist (Read, Grep, Glob, Edit). No Bash. Edit only because the Skill genuinely needs it. | Unrestricted tools. The agent will be careful. | Whitelisting is structural and SDK-enforced. Prompt-based caution is probabilistic and leaks under unusual phrasing. A test-gen Skill that does not list Edit literally cannot Edit, no matter how the prompt phrases the request. |
| Skill reusability across repos | Parameterize: directory, language, target_pattern. The Skill body uses {placeholders}. The CLI fills them in. | Hardcode paths and language. Fork the Skill per repo. | Parameterization scales linearly with use-cases. Hardcoding scales linearly with repos and produces drift. Once you have 5 forks of the same Skill, the next breaking change requires updating all 5. |
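The parameterization row can be made concrete with a small renderer. This is a sketch under assumptions: render_body is a hypothetical helper, and the spec layout (type, required, default) mirrors the frontmatter used in the authoring step.

```python
import re


def render_body(body: str, param_specs: dict, supplied: dict) -> str:
    """Fill {placeholders} from supplied values, falling back to defaults."""
    extra = set(supplied) - set(param_specs)
    if extra:
        raise ValueError(f"undeclared parameters: {sorted(extra)}")
    values = {}
    for name, spec in param_specs.items():
        if name in supplied:
            values[name] = supplied[name]
        elif "default" in spec:
            values[name] = spec["default"]
        elif spec.get("required"):
            raise ValueError(f"missing required parameter: {name}")

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Leave unknown {tokens} untouched; glob sets like {tsx,ts} never
        # match the pattern because of the comma.
        return str(values[key]) if key in values else match.group(0)

    return re.sub(r"\{(\w+)\}", substitute, body)
```

The same Skill body then serves a React repo and a Python repo by changing only the supplied values, for example directory=app/ instead of the src/ default.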
Where it breaks
Five failure pairs. Each one is one exam question. The fix is always architectural: deterministic gates, structured fields, pinned state.
The team builds Skills as a VSCode extension first. Six months later, JetBrains and Neovim users are stuck or get a parallel re-implementation that drifts. Updates ship to one editor at a time.
AP-DEVTOOLS-01: CLI-first architecture. The CLI is the canonical entry point. IDE extensions are ~200-line shells over the CLI. New editors get supported with a tiny wrapper. The CLI stays the source of truth.
A test-generation Skill is granted full tool access. A clever prompt-injection in source comments tricks it into calling Edit on a real source file and overwriting the working tree.
AP-DEVTOOLS-02: allowed-tools whitelist on every Skill. Explicit list. No Bash, no Edit unless the Skill genuinely needs them. SDK enforces the whitelist; the Skill body cannot escalate.
A refactor Skill hardcodes directory=src/ and language=tsx. Backend team needs the same Skill on app/ with language=py and forks the file. Now there are 5 forks across teams.
AP-DEVTOOLS-03: Parameterize: declare directory, language, target_pattern in frontmatter. The Skill body uses {placeholders}. The CLI substitutes them at invocation. One Skill, any number of repos.
The team has no clear criterion. Some workflows are Skills, some are Commands, the choice is ad-hoc. New developers cannot predict which to author for a new use case.
AP-DEVTOOLS-04: Explicit decision tree. Skill if context: fork is needed (exploration without touching parent state) or if parameters make it reusable. Command if the work has session-wide effects.
Skills are edited in place. A v2 frontmatter change ships; 12 agents that depended on the v1 shape silently break. Nobody knows which Skill regression caused the failure.
AP-DEVTOOLS-05: Git semver tagging. skill-refactor@1.2.3. Callers pin major (@1.x); the registry resolves to the latest patch. Breaking changes bump the major and ship as @2.0.0.
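The major-pin rule in the last pair can be sketched as a pure function, separate from Git. The names parse_version and resolve_pin are hypothetical, and the sketch assumes plain MAJOR.MINOR.PATCH strings with no pre-release suffixes.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers so comparison is numeric."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def resolve_pin(available: list[str], pin: str) -> str:
    """Resolve a major pin such as '1.x' to the newest release in that major."""
    if not pin.endswith(".x"):
        return pin  # exact pins pass through unchanged
    major = int(pin.split(".")[0])
    candidates = [v for v in available if parse_version(v)[0] == major]
    if not candidates:
        raise LookupError(f"no release matches pin {pin}")
    # Compare as integer tuples so 1.10.0 sorts above 1.2.3.
    return max(candidates, key=parse_version)
```

Comparing integer tuples rather than strings matters here: a string comparison would rank 1.2.3 above 1.10.0. A caller pinned to 1.x never receives 2.0.0, which is the property that keeps a breaking release from reaching v1 callers.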
Cost & latency
Skill body ~500 tokens system + parameters ~50 tokens + child working tokens ~1500-3000 input + ~500 output. Sonnet 4.5 pricing.
Fork setup writes a fresh system prompt and instantiates child message history. No LLM-side cost beyond a few extra tokens.
Editor event triggers shell-out to the CLI; CLI parses Skill; spawns child session; child returns. The Claude API call dominates.
Frontmatter and body per Skill ~5-15 KB. 100 Skills with full Git history of 5 versions per Skill ~10 MB checked into the repo.
Combined Claude tokens, CLI overhead, registry lookup. At 1000 invocations per day across the team, ~$4 per day, ~$120 per month.
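The daily figure depends heavily on the per-token rates, so it is worth recomputing from the token budget. A minimal sketch: daily_cost_usd is a hypothetical helper, the token counts come from the breakdown above, and the rates in the example call are assumptions for illustration only; check current pricing before relying on any total.

```python
def daily_cost_usd(input_tokens: int, output_tokens: int,
                   invocations_per_day: int,
                   usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Estimate daily spend from per-invocation tokens and per-million-token rates."""
    per_call = (
        input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out
    ) / 1_000_000
    return per_call * invocations_per_day


# Lower bound of the budget above: 500 (body) + 50 (params) + 1500 (working)
# input tokens and 500 output tokens per invocation.
# ASSUMED rates in USD per million tokens; substitute current prices.
ASSUMED_IN_RATE, ASSUMED_OUT_RATE = 3.0, 15.0
estimate = daily_cost_usd(2050, 500, 1000, ASSUMED_IN_RATE, ASSUMED_OUT_RATE)
```

Prompt caching, a smaller model, or shorter child sessions each change the result, so the function takes every input explicitly rather than hardcoding any of them.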
Ship checklist
Two passes. Build-time gates verify the code; run-time gates verify the system in production.
Build-time
- Team-namespaced layout: .claude/skills/{team}/{name}.md
- Frontmatter schema (name, version, description, when_to_use, parameters, allowed-tools, context_mode) validated by the CLI at parse time
- Every Skill declares allowed-tools explicitly. Edit and Bash are NOT default
- Exploratory Skills use context_mode: fork
- Parameters declared with types in frontmatter. CLI validates before invocation
- CLI is the canonical entry point. IDE extensions are thin wrappers over the CLI
- claude skills list returns name, version, description, when_to_use, parameters
- Git semver tags per release. Callers pin major
- PR review on every Skill change. Frontmatter shape changes require a major bump
- Per-Skill regression eval set. Runs in CI on every Skill PR
Run-time
- Skills directory structure: team-namespaced; PR-reviewed on every change
- Frontmatter schema validated in CI; PRs that violate the schema fail to merge
- allowed-tools audit: lint blocks any Skill that grants Edit + Bash + Write together
- context_mode: fork tested end-to-end: parent state must be unchanged after a fork-mode invocation
- Parameter validation tested with missing required, wrong type, and extra-key cases
- Git tag automation: release script bumps version, tags, pushes; PR review gates the bump
- IDE wrapper extensions kept under 300 lines; review enforces 'thin wrapper' principle
- Per-Skill regression eval suite; runs in CI on every Skill PR
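The allowed-tools audit in the list above is simple enough to sketch. The function name lint_allowed_tools is hypothetical, and it assumes the caller has already collected each Skill's whitelist from its frontmatter.

```python
RISKY_COMBINATION = {"Edit", "Bash", "Write"}


def lint_allowed_tools(skills: dict[str, list[str]]) -> list[str]:
    """Return the names of Skills whose whitelist grants Edit, Bash, and Write together."""
    return sorted(
        name
        for name, tools in skills.items()
        # Subset test: flag only when all three risky tools are present.
        if RISKY_COMBINATION <= set(tools)
    )
```

A CI job could fail the build whenever this returns a non-empty list and print the offending Skill names.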
Five exam-pattern questions
You are designing a Skill for TypeScript refactoring. The agent should explore changes without affecting the working directory. Which feature isolates the exploration: `context: fork` or `allowed-tools`?
context: fork. Setting context_mode: fork in the Skill frontmatter spawns a child session with its own conversation history and its own working tree view; the parent session is untouched. The child returns a tool_result with the proposed change and the parent decides whether to apply. allowed-tools is a separate axis: it restricts which tools the child can call. The two compose. For exploration that may need to write changes, you set context_mode: fork AND list Edit in allowed-tools. Tagged to AP-DEVTOOLS-04.
A Skill needs parameters for `directory`, `target_pattern`, `backup_location`. How should these parameters be defined to make the Skill reusable across different codebases?
Declare them in the frontmatter with types, and reference them in the Skill body as {directory}, {target_pattern}, {backup_location} placeholders that the CLI fills in from --param key=value arguments. Required parameters that are missing cause the CLI to reject the invocation before any Claude API call. Result: one generic Skill that works on any repo by passing different parameters. Tagged to AP-DEVTOOLS-03.
Your IDE integration uses Skills. When should a developer use a Skill vs a slash Command?
Use a Skill when (a) the work needs isolated exploration (context: fork), or (b) the work has reusable parameters that vary across invocations. Use a Command when the work has session-wide effects (persisting state into the current conversation, sharing context with subsequent Commands). Skills are versioned, parameterized, discoverable. Commands are inline and per-session.
A team shares a test-gen Skill. It currently has no version identifier. How should you version the Skill?
Add version: 1.0.0 to the frontmatter and cut a Git tag skill-test-gen@1.0.0 on release. Callers pin a major version: claude skills invoke shared/test-gen@1.x. The registry resolves the pin to the latest patch within the major. Breaking changes bump the major to 2.0.0; existing v1.x callers continue to work; new callers explicitly opt in to v2. Tagged to AP-DEVTOOLS-05.
A Skill can technically execute Bash, Edit, and Read. A developer wants to run a Skill that should NOT modify files. How do you prevent the Skill from calling Edit?
Set allowed-tools: [Read, Grep, Glob] in the Skill frontmatter. Edit is omitted. The CLI loads only the listed tools into the child session; any tool_use call to Edit returns tool_result with is_error: true. This is structural: the Skill body cannot escalate its own whitelist, no matter how the prompt phrases the request.
Frequently asked
Can a Skill modify files?
Only if its allowed-tools includes Edit (or Write). Refactoring Skills typically grant Edit. Exploratory Skills (test-gen, code-gen, doc-gen) often deny it: they propose changes via the tool_result payload and let the parent session decide whether to apply.
How does the IDE know which Skills are available?
The extension calls claude skills list (which walks .claude/skills/**/*.md and ~/.claude/skills/**/*.md) and feeds the result into its command palette. The CLI is the source of truth.
What is the difference between `context: fork` and a Subagent?
context: fork is lightweight isolation for a single Skill invocation: fresh messages, scoped tools, parent untouched. A full Subagent is a separate agent loop with its own task and full autonomy. Use fork for one-shot exploration. Use Subagent for delegated work that needs its own multi-turn loop.
How do I version a Skill?
Set version: MAJOR.MINOR.PATCH in the frontmatter; git tag skill-name@1.2.3 on release. Callers pin a major (@1.x); the registry resolves to the latest patch. Breaking changes bump the major; existing callers stay on v1.x until they migrate.
Can a Skill call another Skill?
Yes, if allowed-tools permits it. The composing Skill lists invoke_skill as an allowed tool. Composition enables shared building blocks. Avoid deep nesting (depth > 2): debugging multi-level Skill chains is painful.
Does Skill frontmatter override the agent's decision-making?
No. The frontmatter is read into the agent's system prompt at invocation and informs the model's choice of Skill; it is not a regex classifier that forces a route.
What happens if a Skill fails?
The Skill returns a tool_result with is_error: true and a structured error payload. The parent agent observes the error and decides: retry with different parameters, propose an alternative Skill, or escalate. Failures do NOT propagate to the parent's working tree because of context: fork.
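The three options in the last answer (retry, alternative Skill, escalate) can be sketched as a fixed policy. In practice the parent agent makes this choice itself; the function below, with its hypothetical name handle_skill_result and its retry limit, is only a deterministic illustration of the same branches.

```python
from typing import Optional


def handle_skill_result(result: dict, attempt: int, max_retries: int = 1,
                        fallback_skill: Optional[str] = None) -> dict:
    """Map a child's tool_result onto the parent's next action."""
    if not result.get("is_error"):
        # Success: the parent still decides whether to apply the proposal.
        return {"action": "review", "payload": result["content"]}
    if attempt < max_retries:
        return {"action": "retry", "reason": result["content"]}
    if fallback_skill is not None:
        return {"action": "try_alternative", "skill": fallback_skill}
    return {"action": "escalate", "reason": result["content"]}
```

None of the branches touches the working tree, which matches the fork guarantee: an error ends up as data for the parent to inspect, not as a half-applied change.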