How to Prevent MCP Tool Descriptions from Consuming Excess Context Window

MCP tool descriptions can silently eat 25% of your context window. Audit your token budget, apply progressive disclosure, prune redundant schema fields, and inject only task-relevant tool subsets per turn.

D2 Tool Design · D5 Context + Reliability · HowTo · 5 steps

01 · TLDR

The short version

MCP tool descriptions live in your context window and re-cost on every turn. Audit each description with Anthropic's count_tokens, cap per-tool at around 200 tokens, and hold the whole manifest under 25% of the window. Use progressive disclosure (one-line surface, deep detail in the schema or on demand) and dynamic registration (load only the tools the current task phase needs). Above 18 tools in one registration, routing accuracy degrades regardless of how clean each description is. On the CCA-F, this is a D2 plus D5 topic; the distractor to reject is "switch to a larger model."

02 · Why this matters in production

The silent context tax

A 200-token tool description does not feel expensive. A manifest of 30 of them does: 6,000 tokens spent before the user has said a word, paid again on every single turn, paid even on turns where Claude calls none of them. On a 200K window that is 3% per turn; on a 32K window or a constrained Bedrock deployment it is closer to 19%. Over a 50-turn agentic loop that comes to 300,000 manifest tokens, which can exceed the conversation itself. The user notices that the agent feels expensive, but nobody traces the cost back to the manifest.
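The arithmetic behind those figures can be checked in a few lines. This is a worked example using only the numbers from the paragraph above (30 tools, 200 tokens each, 50 turns); the variable names are illustrative, and the loop total assumes the manifest is re-sent uncached on every turn.

```python
# Worked arithmetic for the manifest tax, using the figures from the text.
TOOLS = 30
TOKENS_PER_DESCRIPTION = 200
TURNS = 50

manifest_tokens = TOOLS * TOKENS_PER_DESCRIPTION  # tokens paid on each turn


def share_of_window(manifest: int, window: int) -> float:
    """Fraction of the context window the manifest occupies on one turn."""
    return manifest / window


share_200k = share_of_window(manifest_tokens, 200_000)
share_32k = share_of_window(manifest_tokens, 32_000)

# Assumption: no caching, so the full manifest is re-sent on every turn.
loop_total = manifest_tokens * TURNS

print(f"{manifest_tokens} tokens/turn: {share_200k:.1%} of 200K, {share_32k:.2%} of 32K")
print(f"{loop_total} manifest tokens over {TURNS} turns")
```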

The failure mode is silent because no single component is broken. The MCP server works, the model works, the orchestrator works. The descriptions are just verbose. Per the /concepts/context-window page in the vault: "The 200K window applies to total input tokens, including system prompt, all messages, all tool definitions." Tool definitions are not an extra; they sit in the same budget as your conversation history. Pay attention to them or pay for them.

03 · The mechanics

How the manifest is loaded and what to do about it

When Claude's tool-use loop fires, the framework serializes every registered tool into the request payload. Each tool contributes a name, a description, and a JSON Schema for inputs. All three count toward your context budget. The model reads the entire manifest as part of its attention pass for every turn - cacheable, yes, but still consuming attention every time the model has to decide which tool to call.
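To see where a single tool's cost comes from, it can help to break the serialized definition into its three parts. The sketch below is a rough illustration, not a real tokenizer: `rough_tokens` assumes about four characters per token, which is only a ballpark, and the `input_schema` key follows the Anthropic Messages API naming (MCP servers call it `inputSchema`). The `required` list is an assumption added to make the schema complete.

```python
import json


def rough_tokens(text: str) -> int:
    """Very rough estimate (~4 characters per token).

    Assumption for illustration only; use a real token counter for budgeting.
    """
    return max(1, len(text) // 4)


def tool_cost_breakdown(tool: dict) -> dict:
    """Estimate how many tokens each part of a tool definition contributes."""
    parts = {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": json.dumps(tool["input_schema"], separators=(",", ":")),
    }
    breakdown = {part: rough_tokens(text) for part, text in parts.items()}
    breakdown["total"] = sum(breakdown.values())
    return breakdown


process_refund = {
    "name": "process_refund",
    "description": "Refund a billed charge. Requires verify_customer first.",
    "input_schema": {
        "type": "object",
        "properties": {
            "charge_id": {"type": "string", "description": "Stripe charge ID, ch_..."},
            "amount": {"type": "integer", "description": "Amount in cents, max 50000"},
        },
        "required": ["charge_id", "amount"],  # assumed, not stated in the source
    },
}

breakdown = tool_cost_breakdown(process_refund)
print(breakdown)
```

Even with a one-line description, the schema tends to dominate the estimate, which is why schema fields are worth auditing alongside the description text.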

Per the /concepts/mcp page in the vault: "Each server is interrogated: 'what tools do you expose?' The server responds with names, descriptions, and JSON schemas. All tools merge into a single namespace; Claude sees them as a unified set." That unified set is exactly the cost surface. Three independent levers shrink it.

First lever: progressive disclosure at the description layer. Strip the default description to one line - verb first, scope second, no filler. Move the long-form guidance into the JSON Schema parameter descriptions, which the model loads with the tool but does not have to scan as its primary routing signal. Move deep examples into a help endpoint the model can call on demand. The shape is three tiers: minimal hint, medium detail, deep guidance.

# Tier 1 (in tool description, ~30 tokens):
"Refund a billed charge. Requires verify_customer first."

# Tier 2 (in inputSchema parameter descriptions, ~80 tokens):
properties:
  charge_id: { type: "string", description: "Stripe charge ID, ch_..." }
  amount: { type: "integer", description: "Amount in cents, max 50000" }
  reason: { type: "string", enum: ["duplicate","fraudulent","requested"] }

# Tier 3 (in get_tool_help("process_refund"), loaded on demand):
"Refunds over $500 require manager approval (see policy doc). For partial refunds, use the original
charge currency. Idempotency key is generated from charge_id + amount."
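One way the three tiers might look in code is sketched below. The tool and help text are taken from the example above; the `get_tool_help` tool definition, the `TOOL_HELP` lookup table, and the `required` list are assumptions made for illustration, not a documented MCP or Anthropic feature.

```python
import json

# Tier 1 (description) and Tier 2 (parameter descriptions) ship in the manifest.
PROCESS_REFUND = {
    "name": "process_refund",
    "description": "Refund a billed charge. Requires verify_customer first.",
    "input_schema": {
        "type": "object",
        "properties": {
            "charge_id": {"type": "string", "description": "Stripe charge ID, ch_..."},
            "amount": {"type": "integer", "description": "Amount in cents, max 50000"},
            "reason": {"type": "string", "enum": ["duplicate", "fraudulent", "requested"]},
        },
        "required": ["charge_id", "amount", "reason"],  # assumed for the sketch
    },
}

# Tier 3 stays out of the manifest and is only returned when asked for.
TOOL_HELP = {
    "process_refund": (
        "Refunds over $500 require manager approval (see policy doc). "
        "For partial refunds, use the original charge currency. "
        "Idempotency key is generated from charge_id + amount."
    ),
}

# Hypothetical help tool: one short entry that unlocks deep guidance on demand.
GET_TOOL_HELP = {
    "name": "get_tool_help",
    "description": "Return detailed usage guidance for a named tool.",
    "input_schema": {
        "type": "object",
        "properties": {"tool_name": {"type": "string"}},
        "required": ["tool_name"],
    },
}


def get_tool_help(tool_name: str) -> str:
    """Handler for the help tool; returns Tier 3 text or a short fallback."""
    return TOOL_HELP.get(tool_name, f"No extended help for {tool_name!r}.")


MANIFEST = [PROCESS_REFUND, GET_TOOL_HELP]
print(len(json.dumps(MANIFEST)), "characters in the manifest")
print(get_tool_help("process_refund"))
```

The design trade-off is one extra tool entry in the manifest in exchange for keeping every tool's long-form guidance out of the per-turn payload.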

Second lever: pruning. The pruning checklist is short. Cut examples (Claude pattern-matches from types); cut verbose enum docs (five values do not need five sentences); cut filler ("This tool allows you to..."); cut defensive caveats (move them to the error contract per /knowledge/mcp-error-contracts-retry-behavior). Each cut compounds across many turns and many tools.

Third lever: dynamic registration. The orchestrator owns which tools the model sees. Research phase? Register search and read tools. Drafting phase? Swap in write tools. Per the /scenarios/agentic-tool-design scenario in the vault, aim for 4-5 tools per specialist agent; routing accuracy starts to slip above that. The 18-tool degradation principle (per ACP-T03 §5) is the empirical hard ceiling - past 18 tools in one registration, the cost of wrong-tool picks outweighs the benefit of having the right tool present.
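A minimal sketch of phase-aware registration follows. The phase names, tool names, and the `tools_for_phase` helper are all hypothetical; the only values taken from the text are the 18-tool ceiling and the 4-5 tool guideline.

```python
from typing import Dict, List

MAX_TOOLS_PER_REGISTRATION = 18  # hard ceiling cited in the text
TARGET_TOOLS_PER_AGENT = 5       # upper end of the 4-5 guideline

# Hypothetical phase map: which tool names each task phase needs.
PHASE_TOOLS: Dict[str, List[str]] = {
    "research": ["search_docs", "read_file"],
    "draft": ["read_file", "write_file"],
}


def tools_for_phase(phase: str, catalog: Dict[str, dict],
                    phase_map: Dict[str, List[str]] = PHASE_TOOLS) -> List[dict]:
    """Return only the tool definitions the given phase should register."""
    names = phase_map[phase]  # raises KeyError for an unknown phase
    if len(names) > MAX_TOOLS_PER_REGISTRATION:
        raise ValueError(f"{phase!r} registers {len(names)} tools; ceiling is 18")
    missing = [name for name in names if name not in catalog]
    if missing:
        raise KeyError(f"tools not in catalog: {missing}")
    return [catalog[name] for name in names]


# Hypothetical catalog with minimal definitions.
CATALOG = {
    name: {"name": name, "description": f"{name} tool", "input_schema": {"type": "object"}}
    for name in ["search_docs", "read_file", "write_file"]
}

research_tools = tools_for_phase("research", CATALOG)
print([tool["name"] for tool in research_tools])
```

The orchestrator would call something like `tools_for_phase` once per phase change and pass the result as the `tools` list for that turn, so the model never sees tools outside the current phase.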

Layer caching on top. Mark the system prompt and tool manifest with cache_control: ephemeral. After the first turn, subsequent turns read the cached prefix at roughly 10% of the normal input-token price - per the /concepts/system-prompts page: "For a 1000-token system prompt called 10 times, save 9000 tokens." The same multiplier applies to the manifest. Caching is the cost optimization; pruning and progressive disclosure are the attention optimization. You need both.
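The sketch below shows one way to place the cache markers when building a request for the Anthropic Messages API, plus the cost arithmetic for the 6,000-token manifest over 50 turns. The helper names are hypothetical. It assumes a marker on the last tool covers the whole tool list and a marker on the system block covers the system prompt; the cost function uses the roughly-10% read ratio from the text and ignores any surcharge on the initial cache write, so treat the result as an estimate.

```python
import copy
from typing import Dict, List


def with_cached_prefix(system_text: str, tools: List[dict]) -> Dict[str, list]:
    """Build `system` and `tools` request fields with ephemeral cache markers.

    The input tool list is copied, not mutated.
    """
    cached_tools = copy.deepcopy(tools)
    if cached_tools:
        # Marking the final tool is assumed to cache the full tool prefix.
        cached_tools[-1]["cache_control"] = {"type": "ephemeral"}
    return {
        "system": [
            {"type": "text", "text": system_text, "cache_control": {"type": "ephemeral"}}
        ],
        "tools": cached_tools,
    }


def effective_input_tokens(prefix_tokens: int, turns: int, read_ratio: float = 0.10) -> float:
    """Token-equivalent cost of a cached prefix: one full pass, then cheap reads."""
    if turns <= 0:
        return 0.0
    return prefix_tokens + (turns - 1) * prefix_tokens * read_ratio


tools = [
    {"name": "search_docs", "description": "Search docs.", "input_schema": {"type": "object"}},
    {"name": "read_file", "description": "Read a file.", "input_schema": {"type": "object"}},
]
request_fields = with_cached_prefix("You are a support agent.", tools)

uncached = 6_000 * 50
cached = effective_input_tokens(6_000, 50)
print(f"uncached: {uncached}, cached (estimate): {cached:.0f}")
```

The returned dictionary could be unpacked into a `messages.create(...)` call alongside `model`, `max_tokens`, and `messages`. Caching lowers the bill but the model still reads the full manifest, which is why the pruning levers above still matter.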

04 · Decision rule and checklist

Eight checks before you ship a new MCP server

  1. Run count_tokens on the manifest. Know the number before you hit production. Re-run after every new tool.
  2. Cap per-tool description at around 200 tokens. If a tool needs more, push it into the input schema or a help endpoint.
  3. Hold the full manifest under 25% of the window. Past that, every turn pays a manifest tax that crowds out the conversation.
  4. Cut examples, verbose enums, filler, and defensive caveats. The pruning checklist is non-negotiable on a new server review.
  5. Cap a single registration at 4-5 tools per specialist agent. Above 18 in any one registration, routing accuracy degrades regardless of description quality.
  6. Implement dynamic tool subsets per phase. Research, draft, write, ship - each phase loads only what it needs.
  7. Mark the manifest with cache_control: ephemeral. Free win on recurring per-turn cost.
  8. Keep deep guidance in a help endpoint, not in the manifest. The model can call for detail; it cannot un-load detail from the manifest.
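Several of these checks are mechanical and can run in CI. The sketch below encodes checks 1, 2, 3, and 5 as a single audit function. The thresholds come from the checklist; `audit_manifest` and `rough_tokens` are hypothetical helpers, and the default counter is a crude four-characters-per-token heuristic. In practice the `count` argument would wrap the count_tokens measurement the checklist calls for.

```python
import json
from typing import Callable, List

PER_TOOL_CAP = 200          # check 2: tokens per description
MANIFEST_SHARE_CAP = 0.25   # check 3: share of the context window
HARD_TOOL_CEILING = 18      # check 5: tools in one registration


def rough_tokens(text: str) -> int:
    """Crude stand-in for a real token counter (~4 characters per token)."""
    return max(1, len(text) // 4)


def audit_manifest(tools: List[dict], window: int,
                   count: Callable[[str], int] = rough_tokens) -> List[str]:
    """Return human-readable findings; an empty list means the checks pass."""
    findings = []
    total = 0
    for tool in tools:
        description_tokens = count(tool["description"])
        if description_tokens > PER_TOOL_CAP:
            findings.append(
                f"{tool['name']}: description is ~{description_tokens} tokens (cap {PER_TOOL_CAP})"
            )
        total += count(json.dumps(tool))
    if total > MANIFEST_SHARE_CAP * window:
        findings.append(f"manifest is ~{total} tokens, over 25% of a {window}-token window")
    if len(tools) > HARD_TOOL_CEILING:
        findings.append(f"{len(tools)} tools registered (ceiling {HARD_TOOL_CEILING})")
    return findings


clean = [{"name": "process_refund", "description": "Refund a billed charge.",
          "input_schema": {"type": "object"}}]
print(audit_manifest(clean, window=200_000))
```

Failing the build when the list is non-empty gives the "re-run after every new tool" step from check 1 for free.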

05 · Common anti-patterns

Five tool-description mistakes that bloat manifests

  1. The verbose enum. A status enum with five values each documented in a paragraph. Cause: copy-paste from the source API doc. Fix: name values clearly, drop the per-value sentence; if disambiguation matters, add one line at the parameter level.
  2. The filler opener. "This tool allows you to perform a refund operation against...". Cause: API-doc-style writing. Fix: start with the verb. "Refund a billed charge."
  3. The defensive caveat. "Note: this may fail if the network is unavailable or the charge does not exist or the policy denies...". Cause: trying to be helpful. Fix: those belong in the error contract; let the failure speak when it happens.
  4. Examples in every description. 80 tokens per tool, multiplied by 30 tools, for zero routing-accuracy gain. Cause: cargo cult from blog posts. Fix: examples in the help endpoint, not the manifest. Only keep an inline example when it disambiguates two similar tools.
  5. Static full-catalog registration. All 30 tools registered for every turn. Cause: simpler orchestrator. Fix: phase-aware subsets; the orchestrator complexity is worth the 3x context savings.
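A few of these anti-patterns can be caught with a simple lint pass over description text. The sketch below is a heuristic only: the regular expressions and the `lint_description` helper are assumptions chosen to match the example phrases above, and they will miss paraphrases or flag the occasional legitimate sentence.

```python
import re
from typing import List

# Heuristic patterns; tune them to your own descriptions.
FILLER_OPENER = re.compile(
    r"^\s*(?:this tool (?:allows|lets|enables) you to|use this tool to)",
    re.IGNORECASE,
)
DEFENSIVE_CAVEAT = re.compile(r"\bnote:|\b(?:may|might|can) fail\b", re.IGNORECASE)
INLINE_EXAMPLE = re.compile(r"\be\.g\.|\bfor example\b|\bexample:", re.IGNORECASE)


def lint_description(description: str) -> List[str]:
    """Return the anti-pattern flags a tool description triggers."""
    flags = []
    if FILLER_OPENER.search(description):
        flags.append("filler-opener")
    if DEFENSIVE_CAVEAT.search(description):
        flags.append("defensive-caveat")
    if INLINE_EXAMPLE.search(description):
        flags.append("inline-example")
    return flags


for text in [
    "This tool allows you to perform a refund operation against...",
    "Note: this may fail if the network is unavailable",
    "Refund a billed charge.",
]:
    print(lint_description(text), "<-", text)
```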

06 · CCA-F exam mapping

How this shows up on the exam

Domains: D2 Tool Design + Integration (18%) · D5 Context + Reliability (15%)
What is tested: Whether you can identify a context-budget intervention that preserves tool access. The exam expects you to reach for progressive disclosure or dynamic subsets, not for model upgrades or window expansion.
Stem pattern: An agent with N registered tools runs out of context on the third turn. Which intervention reduces context consumption without losing tool access?
Distractor to reject: "Switch to a larger model." Larger windows defer the problem; progressive disclosure and dynamic subsets solve it.
Second distractor: "Increase the maximum tool count." Past 18 tools the issue is routing accuracy, not budget; adding more tools makes the problem worse.
Third distractor: "Move the tool list into the system prompt instead of as parameters." Both surfaces are inside the same context budget; the move does not save tokens.

07 · Sources

Vault and external references

  • Vault: data/aeo/reports/2026-05-17-recommendations.md §Signal 2 (tool descriptions) - source of the 25% threshold and progressive disclosure pattern.
  • Vault: data/aeo/reports/2026-05-16-recommendations.md §Signal 2 - earliest formulation of the dynamic-registration recommendation.
  • Vault: public/concepts/mcp.md §How it works - manifest serialization and the unified-namespace cost surface.
  • Vault: public/concepts/context-window.md - 200K total budget includes tool definitions; count_tokens for measurement.
  • Vault: public/concepts/system-prompts.md §How it works - the two-layer enforcement model (macro policy in system prompt, micro policy in tool descriptions) and the prompt-caching mechanics.
  • Vault: public/scenarios/agentic-tool-design.md - the 4-5 tools per specialist rule and triage routing pattern.
  • Vault: 02-tasks/acp-t03-claude-architect-competitive-research.md §5 - the 18-tool degradation principle as the empirical ceiling.
  • Vault: public/knowledge/mcp-foundations.md §Lesson outline - the three MCP primitives (tools, resources, prompts) and how resources can absorb what would otherwise be tool definitions.