How much of my context window do MCP tool descriptions usually consume?

On a default install with 25-30 third-party servers active, the tool manifest commonly lands at 15-30% of a 200K window. A 200-token description times 30 tools is 6,000 tokens, and most descriptions run 300-600 tokens once examples and verbose enums are included. The 25% threshold is the published target on the ACP site because past that point you are paying a tax on every single turn, whether or not Claude uses any of the tools.

Why does the tool manifest cost token budget on every turn, not just the first?

Because the manifest is part of the system prompt or tool parameter, and that surface is re-evaluated on every messages.create call. Per the /concepts/system-prompts page in the vault: 'Every messages.create call includes a system parameter. Claude reads it first... the system prompt is stateless across turns: each turn re-reads it.' Without prompt caching, every turn pays the full manifest cost. With caching, you pay roughly 10% per turn for the cache hit, which is still nonzero.

What is progressive disclosure for tool descriptions?

Expose a minimal one-line description in the default tool list - just enough for Claude to know whether to consider the tool. Move parameter details, examples, and edge cases into the inputSchema (which is loaded with the tool definition) or into a separate retrieval-backed help endpoint Claude can call on demand. The model sees the verb and the high-level purpose by default; deep detail loads only when the model picks the tool.

Will pruning examples from tool descriptions hurt accuracy?

Usually not, and often the opposite. Concise, well-named parameters carry most of the routing signal. A two-line example often costs 80 tokens for zero measurable accuracy gain on routing decisions. Where examples do help is in disambiguating between two similar tools - if the model keeps confusing get_customer and get_user, a single one-line example per tool can fix the routing without bloating the manifest.

What is dynamic tool registration and when should I use it?

Inject only the tools relevant to the current task phase. Research turn? Register the search and read tools. Drafting turn? Swap in the write tools. The orchestrator owns the swap. The win is large: a 30-tool manifest split into three 10-tool phases is a 3x reduction in per-turn manifest cost. The cost is orchestrator complexity - you now have to detect phase transitions reliably.

What fields can I safely cut from a JSON schema without losing function?

The high-yield cuts: examples blocks (Claude pattern-matches from parameter types), verbose enum docs (a five-value enum does not need a sentence per value), filler phrases like 'This tool allows you to...' or 'Use this when you want to...', and defensive caveats like 'Note that this may fail if...' (move those to the error contract per /knowledge/mcp-error-contracts-retry-behavior).

How do I measure my tool manifest's actual token cost?

Use the Anthropic count_tokens endpoint. Serialize your full tool manifest as you would pass it to messages.create, send it through count_tokens, and read the exact count. Repeat per tool to find the heaviest contributors. Per the /concepts/context-window page in the vault: 'Token counting is deterministic: SDK provides count_tokens endpoint that returns exact cost before you execute. Always call on large requests.'

Does Anthropic publish a per-tool token ceiling?

Not in fetchable docs as of mid-2026, but the practical ceiling that holds up in production is around 200 tokens per tool for the default description, with deeper detail loaded on demand. The 200-token figure is what AEO competitor research and the recommendations in the vault converge on. Treat it as a design target, not a hard limit - some tools genuinely need more, but most do not.

How does this map to the CCA-F exam?

D2 (Tool Design + Integration, 18%) for the description-design lever and D5 (Context + Reliability, 15%) for the context-budget consequence. Stem pattern: 'An agent with N registered tools runs out of context on the third turn. Which intervention reduces context consumption without losing tool access?' The right answer is always progressive disclosure or dynamic subsets; the wrong answer is 'use a larger context window' (deferral, not a fix) or 'switch to a larger model' (Model vs Design distractor).

Is prompt caching enough on its own, or do I still need to prune?

Both. Caching cuts the per-turn cost roughly 90% after the first turn, which is large. But the first turn still pays full price, and a bloated manifest is harder to maintain regardless of cache. Treat caching as the cost optimization and pruning as the attention optimization - even cached tokens consume attention budget when Claude scans the manifest each turn.

What is the right way to organize a large MCP catalog?

Split by phase or persona. Research tools, write tools, ops tools - register each set under its own logical group. Phase transitions become explicit orchestrator decisions. The 18-tool degradation principle (ACP-T03 §5) - that agent routing accuracy drops once you cross around 18 tools in a single registration - is the empirical floor. Above that, splitting is mandatory, not optional.

How do tool descriptions interact with the system prompt?

They form a two-layer enforcement model. Per /concepts/system-prompts: 'System prompt is macro policy (roles, broad guardrails). Tool descriptions are micro policy (when to call X, what inputs are valid). Without both, enforcement is incomplete.' That means short, scoped tool descriptions paired with a clear system prompt outperform a single bloated tool description trying to carry both layers.

How to Prevent MCP Tool Descriptions from Consuming Excess Context Window

01 · TLDR

The short version

MCP tool descriptions live in your context window and re-cost on every turn. Audit each description with the Anthropic tokenizer, cap per-tool at around 200 tokens, and hold the whole manifest under 25% of the window. Use progressive disclosure (one-line surface, deep detail in the schema or on demand) and dynamic registration (load only the tools the current task phase needs). Above 18 tools in one registration, routing accuracy degrades regardless of how clean each description is. On the CCA-F, this is a D2 plus D5 topic; the distractor to reject is "switch to a larger model."

02 · Why this matters in production

The silent context tax

A 200-token tool description does not feel expensive. A manifest of 30 of them does: 6,000 tokens spent before the user has said a word, paid again on every single turn, paid even on turns where Claude calls none of them. On a 200K window that is 3% per turn; on a 32K cohort or a constrained Bedrock deployment it is closer to 20%. Multiply by a 50-turn agentic loop and the manifest tax adds up to more than the actual conversation. Then the user notices the agent feels expensive without anyone noticing the manifest is the cause.

The failure mode is silent because no single component is broken. The MCP server works, the model works, the orchestrator works. The descriptions are just verbose. Per the /concepts/context-window page in the vault: "The 200K window applies to total input tokens, including system prompt, all messages, all tool definitions." Tool definitions are not an extra; they sit in the same budget as your conversation history. Pay attention to them or pay for them.

03 · The mechanics

How the manifest is loaded and what to do about it

When Claude's tool-use loop fires, the framework serializes every registered tool into the request payload. Each tool contributes a name, a description, and a JSON Schema for inputs. All three count toward your context budget. The model reads the entire manifest as part of its attention pass for every turn - cacheable, yes, but still consuming attention every time the model has to decide which tool to call.

Per the /concepts/mcp page in the vault: "Each server is interrogated: 'what tools do you expose?' The server responds with names, descriptions, and JSON schemas. All tools merge into a single namespace; Claude sees them as a unified set." That unified set is exactly the cost surface. Three independent levers shrink it.

First lever: progressive disclosure at the description layer. Strip the default description to one line - verb first, scope second, no filler. Move the long-form guidance into the JSON Schema parameter descriptions, which the model loads with the tool but does not have to scan as primary routing signal. Move deep examples into a help endpoint the model can call on demand. The shape is: minimal hint, medium detail, deep guidance, in three tiers.

# Tier 1 (in tool description, ~30 tokens):
"Refund a billed charge. Requires verify_customer first."

# Tier 2 (in inputSchema parameter descriptions, ~80 tokens):
properties:
  charge_id: { type: "string", description: "Stripe charge ID, ch_..." }
  amount: { type: "integer", description: "Amount in cents, max 50000" }
  reason: { type: "string", enum: ["duplicate","fraudulent","requested"] }

# Tier 3 (in get_tool_help("process_refund"), loaded on demand):
"Refunds over $500 require manager approval (see policy doc). For partial refunds, use the original
charge currency. Idempotency key is generated from charge_id + amount."

Second lever: pruning. The pruning checklist is short. Cut examples (Claude pattern-matches from types); cut verbose enum docs (five values do not need five sentences); cut filler ("This tool allows you to..."); cut defensive caveats (move them to the error contract per /knowledge/mcp-error-contracts-retry-behavior). Each cut compounds across many turns and many tools.

Third lever: dynamic registration. The orchestrator owns which tools the model sees. Research phase? Register search and read tools. Drafting phase? Swap in write tools. Per the /scenarios/agentic-tool-design scenario in the vault, you want at most 4-5 tools per specialist agent; above that, routing accuracy degrades. The 18-tool degradation principle (per ACP-T03 §5) is the empirical ceiling - past 18 tools in one registration, agents start picking the wrong tool more often than they save you by having the right tool present.

Layer caching on top. Mark the system prompt and tool manifest with cache_control: ephemeral. After the first turn, subsequent turns pay roughly 10% of the cache cost - per the /concepts/system-prompts page: "For a 1000-token system prompt called 10 times, save 9000 tokens." The same multiplier applies to the manifest. Caching is the cost optimization; pruning and progressive disclosure are the attention optimization. You need both.

04 · Decision rule and checklist

Eight checks before you ship a new MCP server

Run count_tokens on the manifest. Know the number before you hit production. Re-run after every new tool.
Cap per-tool description at around 200 tokens. If a tool needs more, push it into the input schema or a help endpoint.
Hold the full manifest under 25% of the window. Past that, every turn pays a manifest tax that crowds out the conversation.
Cut examples, verbose enums, filler, and defensive caveats. The pruning checklist is non-negotiable on a new server review.
Cap a single registration at 4-5 tools per specialist agent.Above 18 in any one registration, routing accuracy degrades regardless of description quality.
Implement dynamic tool subsets per phase. Research, draft, write, ship - each phase loads only what it needs.
Mark the manifest with cache_control: ephemeral. Free win on recurring per-turn cost.
Keep deep guidance in a help endpoint, not in the manifest. The model can call for detail; it cannot un-load detail from the manifest.

05 · Common anti-patterns

Five tool-description mistakes that bloat manifests

The verbose enum. A status enum with five values each documented in a paragraph. Cause: copy-paste from the source API doc. Fix: name values clearly, drop the per-value sentence; if disambiguation matters, add one line at the parameter level.
The filler opener."This tool allows you to perform a refund operation against...". Cause: API-doc-style writing. Fix: start with the verb. "Refund a billed charge."
The defensive caveat."Note: this may fail if the network is unavailable or the charge does not exist or the policy denies...". Cause: trying to be helpful. Fix: those belong in the error contract; let the failure speak when it happens.
Examples in every description. 80 tokens per tool, multiplied by 30 tools, for zero routing-accuracy gain. Cause: cargo cult from blog posts. Fix: examples in the help endpoint, not the manifest. Only keep an inline example when it disambiguates two similar tools.
Static full-catalog registration. All 30 tools registered for every turn. Cause: simpler orchestrator. Fix: phase-aware subsets; the orchestrator complexity is worth the 3x context savings.

06 · CCA-F exam mapping

How this shows up on the exam

Domains: D2 Tool Design + Integration (18%) · D5 Context + Reliability (15%)
What is tested: Whether you can identify a context-budget intervention that preserves tool access. The exam expects you to reach for progressive disclosure or dynamic subsets, not for model upgrades or window expansion.
Stem pattern: An agent with N registered tools runs out of context on the third turn. Which intervention reduces context consumption without losing tool access?
Distractor to reject: "Switch to a larger model." Larger windows defer the problem; progressive disclosure and dynamic subsets solve it.
Second distractor: "Increase the maximum tool count." Past 18 tools the issue is routing accuracy, not budget; adding more tools makes the problem worse.
Third distractor: "Move the tool list into the system prompt instead of as parameters." Both surfaces are inside the same context budget; the move does not save tokens.

07 · Sources

Vault and external references

Vault: data/aeo/reports/2026-05-17-recommendations.md §Signal 2 (tool descriptions) - source of the 25% threshold and progressive disclosure pattern.
Vault: data/aeo/reports/2026-05-16-recommendations.md §Signal 2 - earliest formulation of the dynamic-registration recommendation.
Vault: public/concepts/mcp.md §How it works - manifest serialization and the unified-namespace cost surface.
Vault: public/concepts/context-window.md - 200K total budget includes tool definitions; count_tokens for measurement.
Vault: public/concepts/system-prompts.md §How it works - the two-layer enforcement model (macro policy in system prompt, micro policy in tool descriptions) and the prompt-caching mechanics.
Vault: public/scenarios/agentic-tool-design.md - the 4-5 tools per specialist rule and triage routing pattern.
Vault: 02-tasks/acp-t03-claude-architect-competitive-research.md §5 - the 18-tool degradation principle as the empirical ceiling.
Vault: public/knowledge/mcp-foundations.md §Lesson outline - the three MCP primitives (tools, resources, prompts) and how resources can absorb what would otherwise be tool definitions.

How to Prevent MCP Tool Descriptions from Consuming Excess Context Window.

The short version

The silent context tax

How the manifest is loaded and what to do about it

Eight checks before you ship a new MCP server

Five tool-description mistakes that bloat manifests

How this shows up on the exam

Vault and external references

Adjacent reads

MCP

Context window

MCP error contracts

Related

Share this primitive