# Agent Skills for Developer Tooling

> A CLI-first Skills architecture for developer tooling. Each Skill is a markdown file with frontmatter (name, version, parameters, allowed-tools); the CLI invokes them via claude skills invoke <skill> --param key=value; risky exploration runs inside context: fork so the working tree is untouched; the allowed-tools whitelist denies Edit and Bash by default and grants only what the Skill needs (Read, Grep, Glob); parameterization (directory, language, target_pattern) makes Skills reusable across repos; Git semver tagging (skill-name@1.2.3) pins versions so a v2 release does not silently break v1 callers; IDE extensions are thin wrappers over the CLI, not a parallel implementation. The most-tested distractor: building Skills as a parallel IDE-only feature instead of a CLI-first primitive.

**Sub-marker:** P3.12
**Domains:** D3 · Agent Operations, D2 · Tool Design + Integration
**Exam weight:** 38% of CCA-F (D3 + D2)
**Build time:** 22 minutes
**Source:** 🟡 Beyond-guide scenario. OP-claimed (Reddit 1s34iyl). Architecture matches Anthropic public guidance.
**Canonical:** https://claudearchitectcertification.com/scenarios/agent-skills-for-developer-tooling
**Last reviewed:** 2026-05-04

## In plain English

Think of this as the way you give every developer on the team a small library of pre-baked Claude workflows they can invoke from the command line. A refactoring Skill explores a function in an isolated child session and proposes the change without ever touching the working tree. A test-generation Skill reads a source file and writes a matching test file. A code-generation Skill scaffolds a new component in your team's exact style. Each Skill has a tight whitelist of tools it can use, a parameter shape that makes it reusable across repos, and a Git tag that pins which version each invocation runs against. The whole point is that developer tooling Skills work like npm packages built on top of Claude rather than fragile prompt copy-paste.

## Exam impact

Domain 3 (Claude Code Configuration, 20%) tests Skill frontmatter, `context: fork` semantics, allowed-tools whitelist, and the Skill-vs-Command decision tree. Domain 2 (Tool Design, 18%) tests Skill parameterization and Git semver tagging. Beyond-guide but architecturally well-grounded. The 'why is the IDE plugin a wrapper, not the source?' question is the canonical exam distractor.

## The problem

### What the customer needs
- One source of truth for refactoring, test generation, doc generation. Not 12 copy-pasted prompts in 12 repos.
- Risk-free exploration. A refactoring Skill must propose changes without touching the working tree.
- Reusable across repos. A Skill written for the React team should work on the Python team's repo with parameter changes only.
- Versioned upgrades. A breaking change to a Skill must NOT silently break agents on the prior version.

### Why naive approaches fail
- Skills built as IDE plugins first. Other editors get a parallel implementation that drifts. The CLI never exists.
- Skills with unrestricted tool access. A test-generation Skill accidentally calls Edit on a real source file.
- Skills hardcoded to one codebase. A team has to re-author the Skill for every new repo.
- Skills versioned by edit-in-place. A v2 frontmatter change silently breaks 12 agents.

### Definition of done
- Skills live in .claude/skills/{team}/{name}.md with frontmatter (name, version, description, parameters, allowed-tools).
- CLI invocation: claude skills invoke <skill> --param key=value. IDE extensions wrap the CLI; they do not bypass it.
- Exploratory Skills run inside context: fork. The parent session is untouched.
- allowed-tools is an explicit whitelist on every Skill. Edit and Bash are not on the list unless required.
- Parameters are declared in frontmatter and validated by the CLI before invocation.
- Git tags pin versions: skill-refactor@1.2.3. Callers reference the major (@1.x); registry resolves the latest patch.

## Concepts in play

- 🟢 **Skills** (`skills`), Markdown plus frontmatter as the unit of reusable workflow
- 🟢 **Project memory** (`claude-md-hierarchy`), Skills extend project-level CLAUDE.md across teams
- 🟢 **Tool calling** (`tool-calling`), Skill invocation as a structured tool call from the CLI
- 🟢 **Subagents** (`subagents`), context: fork is a lightweight subagent for one Skill invocation
- 🟢 **Attention engineering** (`attention-engineering`), Frontmatter routes the LLM to the right Skill
- 🟢 **Evaluation** (`evaluation`), Per-Skill regression evals catch frontmatter drift
- 🟢 **Structured outputs** (`structured-outputs`), Skill parameter contract is the schema
- 🟢 **Context window** (`context-window`), context: fork keeps parent context lean

## Components

### Skill Definition File, .claude/skills/{team}/{name}.md

The unit of dev-tooling Skills. Markdown body holds the instructions; YAML frontmatter holds the metadata: name, version (semver), description, parameters (with types and defaults), allowed-tools (whitelist), context_mode (session or fork). Lives in version control. Reviewed via PR.

**Configuration:** Path: .claude/skills/{team}/{name}.md. Required frontmatter: name, version, description, parameters, allowed-tools, context_mode. Optional: deprecated, owners, requires_human_confirm.
**Concept:** `skills`

### Skill Frontmatter (Attention Engineering), metadata routes the LLM to the right Skill

The frontmatter is read into the agent's system prompt at invocation; the LLM forward-pass uses it to decide whether the Skill fits the request. It is NOT a regex classifier. Good frontmatter (clear description, accurate when_to_use, well-typed parameters) lifts routing accuracy substantially.

**Configuration:** name: refactor-fn. version: 1.2.3. description: 'Rename a function and update every call site.'. when_to_use: 'When the user asks to rename a function across the repo.'. parameters: { directory: string, old_name: string, new_name: string }. allowed-tools: [Read, Grep, Glob, Edit].
**Concept:** `attention-engineering`

### context: fork Isolation, child session runs in isolation, parent untouched

When a Skill is exploratory (refactoring, test-gen, doc-gen), the CLI spawns a child session with context: fork. The child has its own conversation history and its own working tree view. Whatever the Skill explores or proposes stays in the child until the parent receives the final tool_result and decides whether to merge. Lighter than a full subagent, sufficient for one-Skill scope.

**Configuration:** context_mode: fork in the Skill frontmatter. CLI spawns an isolated session per invocation. Parent receives only the tool_result payload (proposed diff, generated test file, doc string). Parent decides whether to apply.
**Concept:** `subagents`

### allowed-tools Whitelist, explicit, structural, deny-by-default

Every Skill declares its allowed-tools array in frontmatter. The CLI enforces the whitelist at tool_use interception: any call to a non-whitelisted tool fails with is_error: true. By default, Edit and Bash are NOT on the list. A code-gen Skill that only needs to read files lists [Read, Grep, Glob]; a refactoring Skill that needs to write changes adds Edit. Side-effect prevention is structural, not prompt-based.

**Configuration:** allowed-tools: [Read, Grep, Glob]. SDK-side enforcement: tool_use calls outside this list return tool_result with is_error: true. The Skill body cannot escalate its own tool list.
**Concept:** `tool-calling`

### IDE/CLI Integration Wrapper, CLI-first; IDE is a thin shell over the CLI

The CLI is the canonical entry point. IDE extensions (VSCode, JetBrains, Neovim) shell out to the CLI rather than re-implementing Skill invocation logic. This means a Skill update in the registry propagates to every editor immediately. New editors get supported by writing a 200-line shell-out extension, not a full Skill engine.

**Configuration:** VSCode extension binds keybinds and context-menu items to claude skills invoke <skill> --param .... The extension's only job is to translate UI events to CLI calls and stream output back to the editor.
**Concept:** `claude-md-hierarchy`

## Build steps

### 1. Lay out the team-namespaced directory

Create .claude/skills/{team}/{name}.md per team. The directory IS the registry's source of truth. Even on day one, namespace from the start. Retrofitting a flat layout into namespaces at 50 Skills is painful.

**Python:**

```python
import os
TEAMS = ["frontend", "backend", "data", "shared"]
for t in TEAMS:
    os.makedirs(f".claude/skills/{t}", exist_ok=True)
    with open(f".claude/skills/{t}/.gitkeep", "w") as f:
        pass
print("namespace-by-team layout ready; commit and start authoring.")
```

**TypeScript:**

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
const teams = ["frontend", "backend", "data", "shared"];
for (const t of teams) {
  mkdirSync(`.claude/skills/${t}`, { recursive: true });
  writeFileSync(`.claude/skills/${t}/.gitkeep`, "");
}
console.log("namespace-by-team layout ready; commit and start authoring.");
```

Concept: `skills`

### 2. Author the Skill with full frontmatter

Required keys: name, version, description, when_to_use, parameters (with types and defaults), allowed-tools (explicit whitelist), context_mode (session for state-shared, fork for isolated). Body holds the prompt. The CLI validates frontmatter at parse time; invalid Skills are rejected.

**Python:**

```python
# .claude/skills/frontend/refactor-component.md
SKILL_TEMPLATE = """---
name: frontend/refactor-component
version: 1.2.3
description: |
  Rename a React component and update every import + usage in the repo.
  Runs in context: fork so the working tree is untouched until you approve.
when_to_use: |
  When the user asks to rename a React component or move it between files.
parameters:
  old_name:
    type: string
    description: Current PascalCase component name (e.g. UserCard)
    required: true
  new_name:
    type: string
    description: Target PascalCase component name (e.g. UserProfile)
    required: true
  directory:
    type: string
    default: src/
    description: Directory to scope the search (default: src/)
allowed-tools:
  - Read
  - Grep
  - Glob
  - Edit
context_mode: fork
---

You are renaming a React component across this repository.

Steps:
1. Glob {directory}/**/*.{tsx,ts,jsx,js} to find candidate files.
2. Grep for {old_name} across the matched files; collect file + line.
3. For each match, Read the file and decide whether the occurrence is the
   component (the import/export/JSX-tag) or a coincidental string.
4. Edit each file. Update the filename if the file is currently {old_name}.tsx.
5. Return a structured summary: files touched, occurrences changed.
"""
```

**TypeScript:**

```typescript
// .claude/skills/frontend/refactor-component.md
const SKILL_TEMPLATE = `---
name: frontend/refactor-component
version: 1.2.3
description: |
  Rename a React component and update every import + usage in the repo.
  Runs in context: fork so the working tree is untouched until you approve.
when_to_use: |
  When the user asks to rename a React component or move it between files.
parameters:
  old_name:
    type: string
    required: true
  new_name:
    type: string
    required: true
  directory:
    type: string
    default: src/
allowed-tools:
  - Read
  - Grep
  - Glob
  - Edit
context_mode: fork
---

You are renaming a React component across this repository.

Steps:
1. Glob {directory}/**/*.{tsx,ts,jsx,js} to find candidate files.
2. Grep for {old_name} across the matched files.
3. Decide whether each occurrence is the component or a coincidental string.
4. Edit each file. Update the filename if the file is currently {old_name}.tsx.
5. Return a structured summary: files touched, occurrences changed.
`;
```

Concept: `attention-engineering`

### 3. Spawn the Skill in context: fork

When context_mode: fork is set, the CLI runs the Skill in a child session with its own conversation history, its own tool whitelist, and its own working tree view. The parent session is untouched. The child returns a tool_result with the proposed change; the parent decides whether to apply.

**Python:**

```python
from anthropic import Anthropic
import yaml
client = Anthropic()

def parse_skill(path: str) -> dict:
    text = open(path).read()
    if not text.startswith("---"):
        raise ValueError(f"{path}: missing frontmatter")
    _, fm, body = text.split("---", 2)
    return {"frontmatter": yaml.safe_load(fm), "body": body.strip()}

def invoke_skill_in_fork(skill_path: str, params: dict, user_message: str) -> dict:
    """Spawn a child session for the Skill. Parent state untouched."""
    skill = parse_skill(skill_path)
    fm = skill["frontmatter"]

    rendered_body = skill["body"]
    for k, v in params.items():
        rendered_body = rendered_body.replace("{" + k + "}", str(v))

    child_response = client.messages.create(
        model="claude-sonnet-4.5",
        max_tokens=4096,
        system=rendered_body,
        tools=load_tools_from_whitelist(fm["allowed-tools"]),
        messages=[{"role": "user", "content": user_message}],
    )
    return {
        "skill_name": fm["name"],
        "skill_version": fm["version"],
        "child_stop_reason": child_response.stop_reason,
        "child_content": child_response.content,
    }
```

**TypeScript:**

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "node:fs";
import { parse as parseYaml } from "yaml";

const client = new Anthropic();

function parseSkill(path: string) {
  const text = readFileSync(path, "utf8");
  if (!text.startsWith("---")) throw new Error(`${path}: missing frontmatter`);
  const [, fm, body] = text.split("---", 3);
  return { frontmatter: parseYaml(fm) as Record<string, unknown>, body: body.trim() };
}

export async function invokeSkillInFork(
  skillPath: string,
  params: Record<string, string>,
  userMessage: string,
) {
  const skill = parseSkill(skillPath);
  const fm = skill.frontmatter as { name: string; version: string; "allowed-tools": string[] };
  let renderedBody = skill.body;
  for (const [k, v] of Object.entries(params)) {
    renderedBody = renderedBody.replaceAll(`{${k}}`, v);
  }
  const childResponse = await client.messages.create({
    model: "claude-sonnet-4.5",
    max_tokens: 4096,
    system: renderedBody,
    tools: loadToolsFromWhitelist(fm["allowed-tools"]),
    messages: [{ role: "user", content: userMessage }],
  });
  return {
    skill_name: fm.name,
    skill_version: fm.version,
    child_stop_reason: childResponse.stop_reason,
    child_content: childResponse.content,
  };
}
```

Concept: `subagents`

### 4. Enforce allowed-tools at the SDK boundary

The frontmatter declares the whitelist; the CLI enforces it. Any tool_use call that targets a non-whitelisted tool fails with is_error: true. The Skill body cannot escalate its own tool list. This is structural, not prompt-based: a clever prompt cannot trick the SDK into calling Edit on a Skill that does not whitelist Edit.

**Python:**

```python
from anthropic.types import Tool

ALL_TOOLS: dict[str, Tool] = {
    "Read":  {"name": "Read",  "description": "...", "input_schema": {}},
    "Edit":  {"name": "Edit",  "description": "...", "input_schema": {}},
    "Bash":  {"name": "Bash",  "description": "...", "input_schema": {}},
    "Grep":  {"name": "Grep",  "description": "...", "input_schema": {}},
    "Glob":  {"name": "Glob",  "description": "...", "input_schema": {}},
    "Write": {"name": "Write", "description": "...", "input_schema": {}},
}
KNOWN_TOOLS = set(ALL_TOOLS)

def load_tools_from_whitelist(whitelist: list[str]) -> list[Tool]:
    unknown = set(whitelist) - KNOWN_TOOLS
    if unknown:
        raise ValueError(f"Skill frontmatter references unknown tools: {unknown}")
    return [ALL_TOOLS[name] for name in whitelist]

# A refactor Skill whitelist
TOOLS_FOR_REFACTOR = load_tools_from_whitelist(["Read", "Grep", "Glob", "Edit"])
# A test-gen Skill whitelist (no Edit, no Bash)
TOOLS_FOR_TESTGEN = load_tools_from_whitelist(["Read", "Grep", "Glob"])
```

**TypeScript:**

```typescript
import type Anthropic from "@anthropic-ai/sdk";

const ALL_TOOLS: Record<string, Anthropic.Tool> = {
  Read:  { name: "Read",  description: "...", input_schema: { type: "object", properties: {} } },
  Edit:  { name: "Edit",  description: "...", input_schema: { type: "object", properties: {} } },
  Bash:  { name: "Bash",  description: "...", input_schema: { type: "object", properties: {} } },
  Grep:  { name: "Grep",  description: "...", input_schema: { type: "object", properties: {} } },
  Glob:  { name: "Glob",  description: "...", input_schema: { type: "object", properties: {} } },
  Write: { name: "Write", description: "...", input_schema: { type: "object", properties: {} } },
};
const KNOWN_TOOLS = new Set(Object.keys(ALL_TOOLS));

export function loadToolsFromWhitelist(whitelist: string[]): Anthropic.Tool[] {
  const unknown = whitelist.filter((t) => !KNOWN_TOOLS.has(t));
  if (unknown.length > 0) {
    throw new Error(`Skill frontmatter references unknown tools: ${unknown.join(", ")}`);
  }
  return whitelist.map((name) => ALL_TOOLS[name]);
}

const TOOLS_FOR_REFACTOR = loadToolsFromWhitelist(["Read", "Grep", "Glob", "Edit"]);
const TOOLS_FOR_TESTGEN = loadToolsFromWhitelist(["Read", "Grep", "Glob"]);
```

Concept: `tool-calling`

### 5. Parameterize for cross-repo reuse

A good Skill is generic across repos. The Skill body uses {param_name} placeholders; the CLI fills them in from --param key=value arguments at invocation time. Required vs optional parameters are declared in the frontmatter; the CLI rejects invocations that miss required params before any LLM call.

**Python:**

```python
import jsonschema

def validate_params(skill_fm: dict, params: dict) -> None:
    schema = {"type": "object", "properties": {}, "required": []}
    for name, spec in skill_fm.get("parameters", {}).items():
        schema["properties"][name] = {"type": spec["type"]}
        if spec.get("required"):
            schema["required"].append(name)
    try:
        jsonschema.validate(instance=params, schema=schema)
    except jsonschema.ValidationError as e:
        raise ValueError(f"invalid params for skill {skill_fm['name']}: {e.message}")

# CLI: claude skills invoke <skill> --param key=value
def parse_cli_params(argv: list[str]) -> dict:
    params = {}
    i = 0
    while i < len(argv):
        if argv[i] == "--param" and i + 1 < len(argv):
            k, _, v = argv[i + 1].partition("=")
            params[k] = v
            i += 2
        else:
            i += 1
    return params
```

**TypeScript:**

```typescript
import Ajv from "ajv";
const ajv = new Ajv();

export function validateParams(
  skillFm: { name: string; parameters?: Record<string, { type: string; required?: boolean }> },
  params: Record<string, unknown>,
): void {
  const schema: Record<string, unknown> = {
    type: "object",
    properties: {} as Record<string, unknown>,
    required: [] as string[],
  };
  for (const [name, spec] of Object.entries(skillFm.parameters ?? {})) {
    (schema.properties as Record<string, unknown>)[name] = { type: spec.type };
    if (spec.required) (schema.required as string[]).push(name);
  }
  const validate = ajv.compile(schema);
  if (!validate(params)) {
    throw new Error(`invalid params for skill ${skillFm.name}: ${ajv.errorsText(validate.errors)}`);
  }
}

export function parseCliParams(argv: string[]): Record<string, string> {
  const params: Record<string, string> = {};
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--param" && i + 1 < argv.length) {
      const [k, ...rest] = argv[i + 1].split("=");
      params[k] = rest.join("=");
      i++;
    }
  }
  return params;
}
```

Concept: `structured-outputs`

### 6. Build the IDE wrapper as a thin shell over the CLI

VSCode (or JetBrains, or Neovim) extension is the smallest possible shell over the CLI. It registers commands and keybinds, captures the developer's selection, builds a claude skills invoke shell command, runs it, and streams the output back into the editor.

**Python:**

```python
# VSCode extension (TypeScript) - pseudo-Python summary of what it does
def vscode_command_refactor_component(editor_state):
    """The user invoked the 'Claude: Refactor This Component' palette item."""
    selection = editor_state.get_selected_text()
    workspace_root = editor_state.get_workspace_root()

    cli_args = [
        "claude", "skills", "invoke", "frontend/refactor-component",
        "--param", f"old_name={selection}",
        "--param", "new_name=AskUserViaInputBox",
        "--param", f"directory={workspace_root}/src/",
    ]
    proc = run_subprocess(cli_args, cwd=workspace_root)
    stream_to_editor_panel(proc.stdout)
```

**TypeScript:**

```typescript
import * as vscode from "vscode";
import { spawn } from "node:child_process";

export function activate(ctx: vscode.ExtensionContext) {
  const cmd = vscode.commands.registerCommand("claude.refactorComponent", async () => {
    const editor = vscode.window.activeTextEditor;
    if (!editor) return;
    const selection = editor.document.getText(editor.selection);
    const newName = await vscode.window.showInputBox({ prompt: `New name for ${selection}` });
    if (!newName) return;
    const ws = vscode.workspace.workspaceFolders?.[0]?.uri.fsPath ?? ".";

    const proc = spawn(
      "claude",
      ["skills", "invoke", "frontend/refactor-component",
       "--param", `old_name=${selection}`,
       "--param", `new_name=${newName}`,
       "--param", `directory=${ws}/src/`],
      { cwd: ws },
    );
    const out = vscode.window.createOutputChannel("Claude refactor");
    proc.stdout.on("data", (d) => out.append(d.toString()));
    proc.stderr.on("data", (d) => out.append(d.toString()));
    out.show();
  });
  ctx.subscriptions.push(cmd);
}
```

Concept: `claude-md-hierarchy`

### 7. Discover Skills via the CLI registry

claude skills list walks .claude/skills//*.md and ~/.claude/skills//*.md, parses frontmatter, and prints a discoverable table. IDE extensions call this and feed the result into command palettes.

**Python:**

```python
import glob, yaml
from pathlib import Path

def list_skills(roots: list[str]) -> list[dict]:
    out = []
    for root in roots:
        for path in glob.glob(f"{root}/**/*.md", recursive=True):
            try:
                text = Path(path).read_text()
                _, fm, _ = text.split("---", 2)
                meta = yaml.safe_load(fm)
                out.append({
                    "name": meta["name"],
                    "version": meta["version"],
                    "description": meta["description"].strip().split("\n")[0],
                    "parameters": list(meta.get("parameters", {}).keys()),
                    "path": path,
                })
            except (ValueError, KeyError):
                pass
    return sorted(out, key=lambda s: s["name"])

def cmd_skills_list(team: str | None = None):
    skills = list_skills([".claude/skills", str(Path.home() / ".claude/skills")])
    if team:
        skills = [s for s in skills if s["name"].startswith(f"{team}/")]
    for s in skills:
        print(f"{s['name']}@{s['version']:<8}  {s['description']}")
```

**TypeScript:**

```typescript
import { glob } from "glob";
import { parse as parseYaml } from "yaml";
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

interface SkillSummary {
  name: string;
  version: string;
  description: string;
  parameters: string[];
  path: string;
}

export async function listSkills(roots: string[]): Promise<SkillSummary[]> {
  const out: SkillSummary[] = [];
  for (const root of roots) {
    const paths = await glob(`${root}/**/*.md`);
    for (const path of paths) {
      try {
        const text = readFileSync(path, "utf8");
        const [, fm] = text.split("---", 3);
        const meta = parseYaml(fm) as { name: string; version: string; description: string; parameters?: Record<string, unknown> };
        out.push({
          name: meta.name,
          version: meta.version,
          description: meta.description.trim().split("\n")[0],
          parameters: Object.keys(meta.parameters ?? {}),
          path,
        });
      } catch { /* skip malformed */ }
    }
  }
  return out.sort((a, b) => a.name.localeCompare(b.name));
}

export async function cmdSkillsList(opts: { team?: string } = {}) {
  let skills = await listSkills([".claude/skills", join(homedir(), ".claude/skills")]);
  if (opts.team) skills = skills.filter((s) => s.name.startsWith(`${opts.team}/`));
  for (const s of skills) {
    console.log(`${s.name}@${s.version.padEnd(8)}  ${s.description}`);
  }
}
```

Concept: `evaluation`

### 8. Version Skills via Git tags; pin majors

Each Skill carries a semver in frontmatter. Each release tags the Git history (git tag skill-refactor@1.2.3). Callers pin a major (@1.x); the registry resolves to the latest patch within that major. Edit-in-place is forbidden by PR review.

**Python:**

```python
import subprocess, semver

def tag_skill_release(skill_path: str, new_version: str):
    semver.VersionInfo.parse(new_version)
    skill_name = parse_skill(skill_path)["frontmatter"]["name"]
    tag = f"{skill_name.replace('/', '-')}@{new_version}"
    update_frontmatter_version(skill_path, new_version)
    subprocess.run(["git", "add", skill_path], check=True)
    subprocess.run(["git", "commit", "-m", f"chore(skill): bump {skill_name} to {new_version}"], check=True)
    subprocess.run(["git", "tag", "-a", tag, "-m", f"{skill_name} {new_version}"], check=True)
    print(f"tagged {tag}")

def resolve_caller_pin(skill_name: str, pin: str) -> str:
    if pin.endswith(".x"):
        major = int(pin.split(".")[0])
        tags = subprocess.check_output(
            ["git", "tag", "-l", f"{skill_name.replace('/', '-')}@{major}.*"],
            text=True,
        ).strip().split("\n")
        versions = [t.split("@", 1)[1] for t in tags if t]
        return max(versions, key=lambda v: semver.VersionInfo.parse(v))
    return pin
```

**TypeScript:**

```typescript
import { execSync } from "node:child_process";
import semver from "semver";

export function tagSkillRelease(skillPath: string, newVersion: string) {
  if (!semver.valid(newVersion)) throw new Error(`invalid semver ${newVersion}`);
  const skillName = (parseSkill(skillPath).frontmatter as { name: string }).name;
  const tag = `${skillName.replaceAll("/", "-")}@${newVersion}`;
  updateFrontmatterVersion(skillPath, newVersion);
  execSync(`git add ${skillPath}`, { stdio: "inherit" });
  execSync(`git commit -m "chore(skill): bump ${skillName} to ${newVersion}"`, { stdio: "inherit" });
  execSync(`git tag -a ${tag} -m "${skillName} ${newVersion}"`, { stdio: "inherit" });
  console.log(`tagged ${tag}`);
}

export function resolveCallerPin(skillName: string, pin: string): string {
  if (pin.endsWith(".x")) {
    const major = Number(pin.split(".")[0]);
    const raw = execSync(`git tag -l "${skillName.replaceAll("/", "-")}@${major}.*"`, { encoding: "utf8" });
    const tags = raw.trim().split("\n").filter(Boolean);
    const versions = tags.map((t) => t.split("@")[1]);
    return versions.sort(semver.rcompare)[0];
  }
  return pin;
}

declare function parseSkill(path: string): { frontmatter: Record<string, unknown>; body: string };
declare function updateFrontmatterVersion(path: string, version: string): void;
```

Concept: `tool-calling`

## Decision matrix

| Decision | Right answer | Wrong answer | Why |
|---|---|---|---|
| IDE-first or CLI-first? | CLI-first. IDE extensions wrap the CLI. | IDE-first. Skills built into a VSCode plugin and ported to other editors as parallel implementations. | CLI-first is portable across every editor and shell workflow. New editors get supported with a 200-line shell-out wrapper instead of a parallel Skill engine. One source of truth; one update path. |
| Skill or slash Command? | Skill if the work needs isolated exploration (context: fork) or reusable parameters. Command if the work has session-wide effects. | Use Command for everything because it is simpler. | Skills give you isolation (fork), reusable parameter schemas, and discoverability via claude skills list. Commands are inline and per-session. The wrong choice produces fragile workflows that break when copied between repos. |
| Tool access in a refactoring Skill | Explicit allowed-tools whitelist (Read, Grep, Glob, Edit). No Bash. Edit only because the Skill genuinely needs it. | Unrestricted tools. The agent will be careful. | Whitelisting is structural and SDK-enforced. Prompt-based caution is probabilistic and leaks under unusual phrasing. A test-gen Skill that does not list Edit literally cannot Edit, no matter how the prompt phrases the request. |
| Skill reusability across repos | Parameterize: directory, language, target_pattern. The Skill body uses {placeholders}. The CLI fills them in. | Hardcode paths and language. Fork the Skill per repo. | Parameterization scales linearly with use-cases. Hardcoding scales linearly with repos and produces drift. Once you have 5 forks of the same Skill, the next breaking change requires updating all 5. |

## Failure modes

| Anti-pattern | Failure | Fix |
|---|---|---|
| AP-DEVTOOLS-01 · Skills in IDE without CLI foundation | The team builds Skills as a VSCode extension first. Six months later, JetBrains and Neovim users are stuck or get a parallel re-implementation that drifts. Updates ship to one editor at a time. | CLI-first architecture. The CLI is the canonical entry point. IDE extensions are ~200-line shells over the CLI. New editors get supported with a tiny wrapper. The CLI stays the source of truth. |
| AP-DEVTOOLS-02 · Unlimited tool access in skill context | A test-generation Skill is granted full tool access. A clever prompt-injection in source comments tricks it into calling Edit on a real source file and overwriting the working tree. | allowed-tools whitelist on every Skill. Explicit list. No Bash, no Edit unless the Skill genuinely needs them. SDK enforces the whitelist; the Skill body cannot escalate. |
| AP-DEVTOOLS-03 · Skill designed for one codebase only | A refactor Skill hardcodes directory=src/ and language=tsx. Backend team needs the same Skill on app/ with language=py and forks the file. Now there are 5 forks across teams. | Parameterize: declare directory, language, target_pattern in frontmatter. The Skill body uses {placeholders}. The CLI substitutes them at invocation. One Skill, infinite repos. |
| AP-DEVTOOLS-04 · Skills vs Commands ambiguity | The team has no clear criterion. Some workflows are Skills, some are Commands, the choice is ad-hoc. New developers cannot predict which to author for a new use case. | Explicit decision tree. Skill if context: fork is needed (exploration without touching parent state) or if parameters make it reusable. Command if the work has session-wide effects. |
| AP-DEVTOOLS-05 · Shared Skills without version control | Skills are edited in place. A v2 frontmatter change ships; 12 agents that depended on the v1 shape silently break. Nobody knows which Skill regression caused the failure. | Git semver tagging. skill-refactor@1.2.3. Callers pin major (@1.x); the registry resolves to the latest patch. Breaking changes bump the major and ship as @2.0.0. |

## Implementation checklist

- [ ] Team-namespaced layout: .claude/skills/{team}/{name}.md (`skills`)
- [ ] Frontmatter schema (name, version, description, when_to_use, parameters, allowed-tools, context_mode) validated by the CLI at parse time (`structured-outputs`)
- [ ] Every Skill declares allowed-tools explicitly. Edit and Bash are NOT default (`tool-calling`)
- [ ] Exploratory Skills use context_mode: fork (`subagents`)
- [ ] Parameters declared with types in frontmatter. CLI validates before invocation (`structured-outputs`)
- [ ] CLI is the canonical entry point. IDE extensions are thin wrappers over the CLI (`claude-md-hierarchy`)
- [ ] claude skills list returns name, version, description, when_to_use, parameters (`evaluation`)
- [ ] Git semver tags per release. Callers pin major (`tool-calling`)
- [ ] PR review on every Skill change. Frontmatter shape changes require a major bump
- [ ] Per-Skill regression eval set. Runs in CI on every Skill PR

## Cost &amp; latency

- **Per-Skill invocation (in fork mode):** ~$0.004 to $0.012, Skill body ~500 tokens system + parameters ~50 tokens + child working tokens ~1500-3000 input + ~500 output. Sonnet 4.5 pricing.
- **context: fork overhead:** ~50 tokens per invocation, Fork setup writes a fresh system prompt and instantiates child message history. No LLM-side cost beyond a few extra tokens.
- **IDE wrapper integration latency p95:** ~2 to 3 seconds, Editor event triggers shell-out to the CLI; CLI parses Skill; spawns child session; child returns. The Claude API call dominates.
- **Skill registry storage:** ~10 MB for 100 Skills across 5 versions each, Frontmatter and body per Skill ~5-15 KB. 100 Skills with full Git history of 5 versions per Skill ~10 MB checked into the repo.
- **Cost per invocation at production scale:** ~$0.004, Combined Claude tokens, CLI overhead, registry lookup. At 1000 invocations per day across the team, ~$4 per day, ~$120 per month.

## Domain weights

- **D3 · Agent Operations (20%):** Skill definition file. Frontmatter schema. context: fork. allowed-tools whitelist.
- **D2 · Tool Design + Integration (18%):** Parameter contract. CLI invocation shape. Git semver tagging. IDE wrapper protocol.

## Practice questions

### Q1. You are designing a Skill for TypeScript refactoring. The agent should explore changes without affecting the working directory. Which feature isolates the exploration: context: fork or allowed-tools?

context: fork. Setting context_mode: fork in the Skill frontmatter spawns a child session with its own conversation history and its own working tree view; the parent session is untouched. The child returns a tool_result with the proposed change and the parent decides whether to apply. allowed-tools is a separate axis: it restricts which tools the child can call. The two compose. For exploration that may need to write changes, you set context_mode: fork AND list Edit in allowed-tools. Tagged to AP-DEVTOOLS-04.

### Q2. A Skill needs parameters for directory, target_pattern, backup_location. How should these parameters be defined to make the Skill reusable across different codebases?

Declare them in the Skill frontmatter with types and defaults. The Skill body uses {directory}, {target_pattern}, {backup_location} placeholders that the CLI fills in from --param key=value arguments. Required parameters that are missing cause the CLI to reject the invocation before any Claude API call. Result: one generic Skill that works on any repo by passing different parameters. Tagged to AP-DEVTOOLS-03.

### Q3. Your IDE integration uses Skills. When should a developer use a Skill vs a slash Command?

Use a Skill when (a) the work needs isolated exploration that should not touch the parent session (context: fork), or (b) the work has reusable parameters that vary across invocations. Use a Command when the work has session-wide effects (persisting state into the current conversation, sharing context with subsequent Commands). Skills are versioned, parameterized, discoverable. Commands are inline and per-session.

### Q4. A team shares a test-gen Skill. It currently has no version identifier. How should you version the Skill?

Git semver tagging. Add version: 1.0.0 to the frontmatter and cut a Git tag skill-test-gen@1.0.0 on release. Callers pin a major version: claude skills invoke shared/test-gen@1.x. The registry resolves the pin to the latest patch within the major. Breaking changes bump the major to 2.0.0; existing v1.x callers continue to work; new callers explicitly opt in to v2. Tagged to AP-DEVTOOLS-05.

### Q5. A Skill can technically execute Bash, Edit, and Read. A developer wants to run a Skill that should NOT modify files. How do you prevent the Skill from calling Edit?

Set allowed-tools: [Read, Grep, Glob] in the Skill frontmatter. Edit is omitted. The CLI loads only the listed tools into the child session; any tool_use call to Edit returns tool_result with is_error: true. This is structural: the Skill body cannot escalate its own whitelist, no matter how the prompt phrases the request.

## FAQ

### Q1. Can a Skill modify files?

Only if allowed-tools includes Edit (or Write). Refactoring Skills typically grant Edit. Exploratory Skills (test-gen, code-gen, doc-gen) often deny it: they propose changes via the tool_result payload and let the parent session decide whether to apply.

### Q2. How does the IDE know which Skills are available?

The IDE extension calls claude skills list (which walks .claude/skills//*.md and ~/.claude/skills//*.md) and feeds the result into its command palette. The CLI is the source of truth.

### Q3. What is the difference between context: fork and a Subagent?

context: fork is lightweight isolation for a single Skill invocation: fresh messages, scoped tools, parent untouched. A full Subagent is a separate agent loop with its own task and full autonomy. Use fork for one-shot exploration. Use Subagent for delegated work that needs its own multi-turn loop.

### Q4. How do I version a Skill?

version: MAJOR.MINOR.PATCH in the frontmatter; git tag skill-name@1.2.3 on release. Callers pin a major (@1.x); the registry resolves to the latest patch. Breaking changes bump the major; existing callers stay on v1.x until they migrate.

### Q5. Can a Skill call another Skill?

Yes, if both are in allowed-tools. The composing Skill lists invoke_skill as an allowed tool. Composition enables shared building blocks. Avoid deep nesting (depth > 2): debugging multi-level Skill chains is painful.

### Q6. Does Skill frontmatter override the agent's decision-making?

No. Frontmatter is attention engineering, not a regex classifier. The LLM forward-pass reads the frontmatter into context and uses it to decide whether the Skill is the right fit. Good frontmatter lifts routing accuracy substantially without removing the model's agency.

### Q7. What happens if a Skill fails?

The child session returns a tool_result with is_error: true and a structured error payload. The parent agent observes the error and decides: retry with different parameters, propose an alternative Skill, or escalate. Failures do NOT propagate to the parent's working tree because of context: fork.

## Production readiness

- [ ] Skills directory structure: team-namespaced; PR-reviewed on every change
- [ ] Frontmatter schema validated in CI; PRs that violate the schema fail to merge
- [ ] allowed-tools audit: lint blocks any Skill that grants Edit + Bash + Write together
- [ ] context_mode: fork tested end-to-end: parent state must be unchanged after a fork-mode invocation
- [ ] Parameter validation tested with missing required, wrong type, and extra-key cases
- [ ] Git tag automation: release script bumps version, tags, pushes; PR review gates the bump
- [ ] IDE wrapper extensions kept under 300 lines; review enforces 'thin wrapper' principle
- [ ] Per-Skill regression eval suite; runs in CI on every Skill PR

---

**Source:** https://claudearchitectcertification.com/scenarios/agent-skills-for-developer-tooling
**Vault sources:** ACP-T05 Scenario 12 (yellow beyond-guide; OP-claimed Reddit 1s34iyl); ACP-T07 Lab 12 spec (Skills for developer tooling); ACP-T08 section 3.12 (CLI-first, IDE wrappers, version control); Course 15 Introduction to Agent Skills (overview + lesson 5 sharing skills); Course 01 Claude 101 lesson 7 (working with Skills); ACP-T06 (5 practice Qs tagged to components)
**Last reviewed:** 2026-05-04

**Evidence tiers**, 🟢 official Anthropic doc · 🟡 partial doc / inferred · 🟠 community-derived · 🔴 disputed.