The short version
MCP injects tool results verbatim into the model's context. A 50 KB JSON payload routinely costs 12,000+ tokens. Four mitigations in order of leverage: server-side field projection (strip what is not needed), pagination via follow-up tool calls, chunk-and-summarize on the client when pagination is not possible, and streaming with hard truncation thresholds when the transport supports it. Silent truncation is the worst outcome - always surface oversize as an explicit error contract. On the CCA-F this is a D2 plus D5 topic; the distractor to reject is "tell the model to ignore unused fields."
The silent context drain
An agent integrates the GitHub MCP server, calls list_issues with default parameters, and the response includes every field on every issue: title, body, labels, assignees, milestones, reactions, links to comments, embedded HTML. The raw payload is 40 KB; the tokenized version is closer to 14,000 tokens. The agent has burned 7% of a 200K window on a single tool call before doing any work. A dozen or so calls like that and it is out of context. The user sees a generic "I ran out of room to think about this" failure and nobody traces it back to the issue list.
Per the /concepts/mcp page in the vault: "The result is wrapped in a tool_result block and appended to the message list. Claude never knows where the tool ran or what authentication was used." That verbatim injection is the cost surface. The model cannot "skim" the payload; it cannot decline to read it; the tokens are there until the conversation rolls them out. Shrinking the payload before it enters context is the only structural fix.
Four patterns, in order of leverage, with examples
1. Server-side field projection. Highest leverage. The server accepts a fields parameter (or applies a default policy filter) and returns only what the agent needs. A representative SaaS API response typically drops from 8,000 tokens to about 400 with just three fields preserved. Cost: a few lines of orchestrator code. Benefit: the largest single token reduction available.
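A minimal sketch of the projection step, assuming the tool handler already has the full record as a Python dict. The names `DEFAULT_FIELDS`, `project`, and `handle_get_issue` are hypothetical and not part of any MCP SDK; the sample record is invented for illustration.

```python
from typing import Any, Optional, Sequence

# Hypothetical default policy: the only fields this agent needs per issue.
DEFAULT_FIELDS = ("id", "status", "summary")


def project(record: dict[str, Any], fields: Sequence[str] = DEFAULT_FIELDS) -> dict[str, Any]:
    """Return a new dict holding only the requested top-level fields.

    Fields missing from the record are skipped rather than raising.
    """
    return {name: record[name] for name in fields if name in record}


def handle_get_issue(record: dict[str, Any], fields: Optional[Sequence[str]] = None) -> dict[str, Any]:
    """Hypothetical tool handler: honor a caller-supplied `fields` list,
    otherwise fall back to the default policy filter."""
    return project(record, tuple(fields) if fields else DEFAULT_FIELDS)


# Invented sample record standing in for a full upstream API response.
full_issue = {
    "id": 8814,
    "status": "open",
    "summary": "Refund flow breaks on canceled subscriptions; bug in webhook handler.",
    "body": "## Steps to reproduce",
    "user": {"login": "example"},
    "reactions": {"+1": 3},
}

projected = handle_get_issue(full_issue)
```

Defaulting to the narrow field set, and making the wide set something the caller must ask for, keeps the cheap path the one the agent takes when it passes no arguments.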
# Before: full payload, ~8,000 tokens
{
"id": 8814,
"title": "Refund flow breaks on canceled subscriptions",
"body": "## Steps to reproduce ...", # 2,400 tokens of markdown
"user": { ...20 fields... }, # 600 tokens
"labels": [ ...12 labels with metadata... ],
"comments": [ ...embedded 30 comments... ],
"reactions": { ...11 emoji counts... },
...
}
# After: projected to {id, status, summary}, ~400 tokens
{
"id": 8814,
"status": "open",
"summary": "Refund flow breaks on canceled subscriptions; bug in webhook handler."
}
2. Pagination via follow-up tool calls. When the response is naturally a list, expose a cursor or offset plus a has_more flag. The agent fetches one page; if it needs more, it calls again with the next cursor. Per-page cost stays bounded; total cost grows with the actual fetch volume, not with the worst-case payload. Pagination compounds with projection: paginate the list, project the items.
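One way the server side of this could look, as a sketch only: the cursor here is an offset wrapped in base64-encoded JSON so the agent treats it as opaque. `PAGE_SIZE`, `encode_cursor`, `decode_cursor`, and `list_items` are hypothetical names, and a real server might well key the cursor on something more stable than an offset.

```python
import base64
import json
from typing import Any, Optional

PAGE_SIZE = 3  # assumption: small pages keep per-call cost bounded


def encode_cursor(offset: int) -> str:
    """Wrap the next offset in an opaque, URL-safe token."""
    raw = json.dumps({"offset": offset}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: Optional[str]) -> int:
    """Recover the offset; a missing cursor means 'start at the beginning'."""
    if cursor is None:
        return 0
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["offset"]


def list_items(items: list[dict[str, Any]], cursor: Optional[str] = None,
               page_size: int = PAGE_SIZE) -> dict[str, Any]:
    """Return one page plus the next_cursor / has_more pair."""
    start = decode_cursor(cursor)
    end = start + page_size
    has_more = end < len(items)
    return {
        "items": items[start:end],
        "next_cursor": encode_cursor(end) if has_more else None,
        "has_more": has_more,
    }


# Invented data: seven items, so the third page is the short last one.
ALL_ITEMS = [{"id": n} for n in range(1, 8)]
page1 = list_items(ALL_ITEMS)
page2 = list_items(ALL_ITEMS, page1["next_cursor"])
page3 = list_items(ALL_ITEMS, page2["next_cursor"])
```

The model decides after each page whether to call again, so the loop lives in the conversation rather than in the server.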
# tool_result page 1
{
"items": [ {id:1,...}, {id:2,...}, {id:3,...} ], # 3 items, ~600 tokens
"next_cursor": "eyJ...",
"has_more": true
}
# tool_result page 2 (after model decides to keep fetching)
{
"items": [ {id:4,...}, {id:5,...}, {id:6,...} ],
"next_cursor": "eyJ...",
"has_more": true
}
3. Chunk-and-summarize. When the response is one large document (a contract, a transcript, a long article) and pagination is not natural, slice the payload into chunks and run a summarization step between them. The model reasons over the summary, not the raw chunks. Best implemented server-side when possible - a deterministic top-N extractor is cheaper than a client-side LLM summarization call. Save client-side LLM summarization for cases where deterministic summarization would be wrong (long-form prose, narrative).
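A rough sketch of the deterministic variant, under stated assumptions: chunks are packed from blank-line-separated paragraphs, and the "top-N extractor" is a simple word-frequency scorer chosen for illustration, not a method the source prescribes. All function names are hypothetical.

```python
import re
from collections import Counter


def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Pack paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in re.split(r"\n\s*\n", text.strip()):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def top_sentences(chunk: str, n: int = 2) -> list[str]:
    """Keep the n sentences whose words are most frequent within the chunk."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]
    freq = Counter(re.findall(r"[a-z]{4,}", chunk.lower()))

    def score(sentence: str) -> int:
        return sum(freq[w] for w in set(re.findall(r"[a-z]{4,}", sentence.lower())))

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]  # restore document order


def summarize(text: str, max_chars: int = 2000, n: int = 2) -> list[dict]:
    """Return one compact entry per chunk instead of the raw document."""
    return [
        {"chunk": i, "key_sentences": top_sentences(c, n)}
        for i, c in enumerate(chunk_text(text, max_chars))
    ]


sample = "Alpha one. Alpha two.\n\nBeta one. Beta two.\n\nGamma one."
summary = summarize(sample, max_chars=25, n=1)
```

A frequency scorer like this is only reasonable for structured or repetitive text; for narrative prose it will pick poorly, which is the case the paragraph above reserves for an LLM summarizer.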
4. Stream and accumulate. When the MCP transport supports streaming, accumulate state on the client incrementally. Pair with a token-counting middleware and a hard truncation threshold. When the threshold trips, return an explicit error contract (retryable: true, errorCode: RESULT_TRUNCATED) so the orchestrator can choose to paginate or project. Per /knowledge/mcp-error-contracts-retry-behavior, never truncate silently - the worst failure mode is a model reasoning over a payload it does not know is incomplete.
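The accumulate-then-trip behavior might be sketched like this. Only `retryable` and `errorCode: RESULT_TRUNCATED` come from the text above; every other field name is an assumption, and `rough_token_estimate` is a placeholder heuristic that should be swapped for a real token counter.

```python
from typing import Callable, Iterable


def rough_token_estimate(text: str) -> int:
    """Placeholder only: about 4 characters per token, rounded up.

    Bytes are not a reliable proxy for tokens; inject a real counter in practice.
    """
    return (len(text) + 3) // 4


def accumulate(chunks: Iterable[str], max_tokens: int,
               count_tokens: Callable[[str], int] = rough_token_estimate) -> dict:
    """Accumulate streamed chunks until done or until the token ceiling trips.

    On overflow, stop reading and return an explicit error contract instead of
    a silently shortened payload.
    """
    parts: list[str] = []
    used = 0
    for chunk in chunks:
        used += count_tokens(chunk)
        if used > max_tokens:
            return {
                "isError": True,                 # assumed field name
                "errorCode": "RESULT_TRUNCATED",
                "retryable": True,
                "tokensSeen": used,              # assumed field name
                "maxTokens": max_tokens,         # assumed field name
            }
        parts.append(chunk)
    return {"isError": False, "content": "".join(parts), "tokens": used}


stream = ["a" * 40] * 3  # invented stream: three chunks of ~10 tokens each
ok = accumulate(stream, max_tokens=30)
too_big = accumulate(stream, max_tokens=25)
```

Returning no partial content on overflow is a deliberate choice in this sketch: the orchestrator gets an unambiguous signal and can retry with projection or pagination.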
The patterns compound. Projection plus pagination handles 90% of production cases. Add chunk-and-summarize for the long-document outliers. Reserve streaming for the high-throughput, low-latency cases where the orchestrator wants to start processing before the full payload lands. Skip projection and the others paper over the symptom without fixing the cause.
Seven checks for every MCP tool that returns JSON
- Measure with count_tokens, not eyeballed bytes. Tokens are the actual cost; bytes are not a reliable proxy.
- Apply server-side field projection by default. Pick the fields the agent actually needs; drop everything else at the source.
- Cap the per-call result size. Set a hard token ceiling on the tool response; return an explicit error contract above it.
- Expose pagination on any naturally-listy response. Cursor or offset plus a has_more flag.
- Choose chunk-and-summarize over pagination for one-shot documents. Server-side deterministic summarization first; client-side LLM only when the structure demands it.
- Never truncate silently. Surface oversize as a retryable error so the orchestrator can paginate or project.
- Measure again after any schema change. A "harmless" new field can quietly add 30% to every response.
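The first and last checks can be turned into a small regression test. This is a sketch under assumptions: `growth_ratio` and `check_schema_change` are hypothetical helpers, the 10% budget is arbitrary, and the token counter is injected so a real one can be plugged in; the demo uses character length purely as a deterministic stand-in.

```python
import json
from typing import Any, Callable


def growth_ratio(before: Any, after: Any, count_tokens: Callable[[str], int]) -> float:
    """Relative token growth of a sample response across a schema change."""
    old = count_tokens(json.dumps(before))
    new = count_tokens(json.dumps(after))
    if old == 0:
        raise ValueError("baseline payload counted as zero tokens")
    return (new - old) / old


def check_schema_change(before: Any, after: Any, count_tokens: Callable[[str], int],
                        max_growth: float = 0.10) -> dict:
    """Flag a schema change whose sample response grows past max_growth."""
    ratio = growth_ratio(before, after, count_tokens)
    return {"growth": ratio, "within_budget": ratio <= max_growth}


# Stand-in counter for the demo only; not a token count.
char_count = len

old_response = {"id": 1}
new_response = {"id": 1, "x": 2}  # invented "harmless" new field

unchanged = check_schema_change(old_response, old_response, char_count)
grown = check_schema_change(old_response, new_response, char_count)
```

Run against a few recorded responses per tool, a check like this catches the quiet growth before it reaches production context windows.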
Five recurring oversized-result mistakes
- Ignore-the-fields prompt. Telling Claude to skip irrelevant fields. Cause: confusing model attention with token cost. Fix: strip fields at the server.
- Silent truncation. Server caps at 4,000 tokens, drops the rest, model never knows. Cause: a defensive default that hides the symptom. Fix: explicit error contract per /knowledge/mcp-error-contracts-retry-behavior.
- Full payload on every call. No projection, no pagination, no cap. Cause: shipping the first thing that worked. Fix: projection as the default; pagination on lists.
- Client-side LLM summarization for everything. Paying for a summarization model call where a deterministic top-N filter would do. Cause: cargo cult from RAG patterns. Fix: deterministic summarization first; LLM only for prose.
- No size cap. A schema change adds a new field; every response grows by 30%. Cause: no per-call ceiling. Fix: a hard token cap on every tool response, enforced server-side.
How this shows up on the exam
Vault and external references
- Vault: data/aeo/reports/2026-05-17-recommendations.md §Signal 1 - source of the four canonical mitigation patterns ranked by leverage.
- Vault: data/aeo/reports/2026-05-16-recommendations.md §Signal 1 - earliest formulation; same four-pattern recommendation.
- Vault: data/aeo/reports/2026-05-16-page-type-mix.md §Signal 1 - this question is the highest-frequency MCP-related search across competitor surfaces.
- Vault: public/concepts/mcp.md §How it works - tool_result is appended to the message list verbatim, which is why payload shape matters.
- Vault: public/concepts/context-window.md §How it works - count_tokens is deterministic; always measure large requests.
- Vault: public/concepts/attention-engineering.md §How it works - the U-shaped attention curve and why unused middle content degrades retrieval fidelity.
- Vault: public/concepts/stop-reason.md §How it works - max_tokens stop reason as the explicit truncation signal, vs silent stream cutoff.
- Vault: data/aeo/reports/2026-05-17-competitor-teardown.md - external competitor coverage; reference for what other prep sites get wrong on this question.