The number-one architecture question in the Claude Code community right now: "Should this be a skill or an MCP server?" People ship both wrong all the time — skills that should be MCP servers (because they need to hold state) and MCP servers that should be skills (because they're just teaching a workflow). Picking the wrong abstraction either wastes context or breaks every time the user's environment changes.
This post is the decision framework, with worked token-cost math so you can see the trade-off in numbers.
The One-Sentence Rule
- A skill teaches the agent how to do something. It's text the model reads on demand.
- An MCP server gives the agent something to call. It's a process the model invokes for capability.
If the answer to "what does this provide?" is knowledge, write a skill. If the answer is action against a live system, write (or use) an MCP server.
Everything else in this post is the long version of that sentence.
The Token-Budget Math
Every MCP server you connect adds its tool descriptions to the system prompt, every single request. Not when invoked — always. The model has to know the tool exists in order to consider calling it.
Here's a real-world worst case I measured in May 2026 on a working developer's project:
| MCP server | Tools exposed | System prompt tokens added |
|---|---|---|
| github | 8 | 2,140 |
| linear | 6 | 1,520 |
| postgres | 5 | 1,180 |
| filesystem | 4 | 980 |
| slack | 6 | 1,540 |
| sentry | 5 | 1,420 |
| stripe | 4 | 1,260 |
| chrome-devtools | 12 | 3,180 |
| Total | 50 | ~13,220 |
That's 13,000+ tokens of tool descriptions burned into the prompt before the user types "hi". On a 200K-context model that's only 6.6% — fine. On the cache-hit path, it has more subtle costs: every request must re-validate the same big tool block, the model has to attend over more low-value tokens, and the rate of "tool-confusion" errors (model picks github.search_issues when it should have picked linear.search_issues) goes up roughly linearly with the number of similarly-named tools.
Now compare a skill. A skill is loaded only when its trigger fires. A typical skill body is 800–2,500 tokens. If you have 200 skills installed, 199 of them cost zero tokens on any given turn.
The asymmetry is dramatic:
- 50 MCP tools, always loaded: ~13K tokens/turn × every turn.
- 200 skills, lazy-loaded: ~1.5K tokens/turn on average (assuming 1 skill activates).
That doesn't mean MCP is bad. It means MCP's cost model rewards having few servers with many tools, and skills' cost model rewards having many skills with focused scope. Architect accordingly.
When to Use a Skill
A skill is right when:
- The need is "how", not "what". Teaching the agent to follow a process, apply a convention, or avoid a known footgun.
- The work is stateless. No long-lived connection, no in-memory cache, no session that needs to persist across calls.
- The trigger is specific. You can describe in 3–5 keywords or file patterns when this should activate.
- The instructions are stable. Procedures that don't change every week. Stable enough to write down once.
Examples of natural skills:
nuxt-deploy-caddy— deploy a Nuxt app to a known host. Workflow knowledge; no live API.pg-migrations-safe— Postgres migration review checklist. Static heuristics.react-server-vs-client— when to use which in App Router. A teaching skill.fastapi-pydantic-patterns— common request/response patterns. Procedural knowledge.
If the same job were an MCP server, you'd pay the tool-description cost on every prompt — even the ones that have nothing to do with Nuxt or Postgres.
When to Use an MCP Server
An MCP server is right when:
- You need to call something external. A GitHub API, a database, a browser, a build system.
- State needs to persist between calls. A database connection. A logged-in browser session. An OAuth token.
- The integration is heavy enough to amortize. If your "tool" is one HTTP request, a skill that explains how to use
fetchis cheaper. If your tool is "open Chrome, navigate, inspect DOM, screenshot", that's an MCP server's job. - Multiple skills will want to use it. An MCP server is shared infrastructure — many skills can invoke the same
github.create_issuetool.
Examples of natural MCP servers:
- chrome-devtools — drives a real Chrome over the DevTools Protocol.
- github — actions against the GitHub API with stored OAuth.
- postgres — connects to a running DB, runs queries, returns rows.
- stripe — calls into the Stripe API for live data.
In each case, the model is asking for a result (DOM snapshot, list of issues, query rows, charge total) that requires hitting a live system. No amount of context-engineering knowledge replaces "actually call the API".
The Hybrid Pattern: Skill That Uses an MCP Server
The strongest setups combine both. A skill teaches the agent a workflow; that workflow calls MCP tools as steps.
Example: a pr-review-loop skill that says "use the github.get_pr_diff MCP tool to fetch the diff, then run these review heuristics, then post a review with github.create_pr_review". The skill is the playbook; the MCP server is the API.
This pattern compounds well: one MCP server can be referenced by ten skills, so the per-server token cost amortizes across many workflows. It also keeps the judgment (review heuristics) in editable markdown — which is much easier to iterate on than tool-schema JSON.
Decision Table
| Need | Best abstraction |
|---|---|
| "Teach the agent how to write a Pydantic validator." | Skill |
| "Let the agent query my production DB." | MCP server |
| "Stop the agent from running unsafe SQL migrations." | Skill |
| "Let the agent take a screenshot of a live page." | MCP server |
| "Codify our team's PR review checklist." | Skill |
| "Let the agent create a GitHub issue." | MCP server |
| "Deploy Nuxt to our Caddy host." | Skill (calls SSH; SSH is built-in, not MCP) |
| "Let the agent search through our Slack history." | MCP server |
| "Translate this design spec into a React component." | Skill |
| "Let the agent run our test suite and parse the output." | Skill (built-in bash) or MCP (if special harness) |
The Sub-Agent Footnote
A third pattern often gets dragged into this debate: sub-agents. They are not a substitute for either skills or MCP. Sub-agents are for parallel decomposition — spawn N copies of yourself, give each a slice of work, merge the results. They have separate context budgets and they cannot share state with the parent except via their final output.
A reasonable rule:
- Skill when the agent should know how to do X.
- MCP when the agent should be able to call X.
- Sub-agent when the agent should do X five times in parallel and merge.
Confusing skills with sub-agents is rare because sub-agents are awkward to invoke; confusing skills with MCP is common because both feel like "teach the agent a capability". The token-budget math above is the cleanest way to keep them straight.
Wrap-Up
The most expensive context tokens are the ones you don't realize you're paying. Every MCP tool description is silently in every prompt. Skills are silent until you need them.
Default to skills. Reach for MCP when the workflow genuinely requires holding state or calling a live external system. And measure: count how many tool descriptions are in your system prompt — if you're north of 5,000 tokens before the user types, you've over-MCP'd your setup.
For the broader picture on what skills are and how to install them, see the Claude Code skills guide. For a complete walkthrough of writing your own, see how to create a Claude Code skill. And browse the live skill index at orangebot.ai/skills for inspiration.