
The 944-Token Mistake: Why Your Agent.md File Is Making Claude Dumber

Updated Apr 17, 2026 · 10 min read

A contrarian take on context engineering. Most people burning tokens on CLAUDE.md files don't need them. Skills, progressive disclosure, and the recursive loop that actually ships.

I watched Greg Isenberg and Ras Mic break down how Claude agents and skills actually work — and the whole thing hinges on one number. A skill file whose full body is 944 tokens costs 944 tokens on every turn if you put it in agent.md. Put the same content in a skill, and Claude loads 53 tokens until the moment it decides it needs the rest.

That's the gap. Most people pay the full rent every turn for context the model almost never uses. Below is the contrarian playbook — what to strip out, how to actually build skills that survive week three, and why you should start with one agent, not fifteen.

The Models Are Good Now — Context Is What's Left

For the last two years the debate was "which model is better at X." That debate is mostly over. Claude Opus 4.6 and the newest GPT are genuinely strong at coding, reasoning, and research tasks. The remaining variance doesn't come from the model — it comes from the harness around it: the system prompt, what's in the codebase, what tools are wired up, and crucially, what you stuff into its context window on every single turn.

The context window has a hard ceiling — roughly 250,000 tokens before Claude Code or Codex start compacting. Once you're past about 70% full, quality drops measurably. Not because the model "gets tired," but because attention degrades and retrieval over long contexts becomes noisier. You want to live in the fresh-to-70% band.

This changes the optimization problem. Ten years of software engineering trained us to add — more docs, more config, more instructions, more rules. With LLMs, the dominant move is to subtract. Less is more. The job is to keep the context lean and relevant, and let the model do what it's already good at.

The 944-Token Problem

Here's the setup most Claude Code users have. They create an AGENT.md or CLAUDE.md file. They fill it with everything they want Claude to remember: the tech stack, the coding conventions, the tone of voice, the folder structure, the review checklist, the deployment process. A thousand lines isn't unusual.

That file loads into the context window on every single turn. If it's 7,000 tokens, you pay 7,000 tokens to ask Claude the time. You pay it again to ask what file to edit. You pay it again when Claude replies. By turn twenty, you've burned more than a hundred thousand tokens on instructions that were needed in maybe three of those turns.
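The arithmetic in that paragraph, spelled out as a back-of-envelope sketch (the 7,000-token file size and 20-turn session are the figures from above):

```python
# Always-on context: the whole CLAUDE.md is resent on every single turn.
always_on_tokens = 7_000   # size of the CLAUDE.md file
turns = 20                 # turns in one working session

total = always_on_tokens * turns
print(total)  # 140000 tokens spent on the same instructions, most turns unused
```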

Two things break at that point:

  1. Cost. Tokens aren't free. You're paying for context you're not using.
  2. Quality. You hit the 70% fill line sooner. The model gets measurably worse at the actual task because the window is bloated with reference material.

Ras Mic's test: take one skill file, count the tokens of the full body, count the tokens of just the name and description. The example in the video comes out to 944 tokens vs. 53 tokens. 18x difference — on every turn, for every session, for every user on your team.
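The accounting behind that comparison can be sketched in a few lines. The 944 and 53 counts are from the video; the 2-of-20 hit rate (how often the agent actually opens the skill) is my illustrative assumption:

```python
full_body = 944   # tokens of the entire skill file
stub = 53         # tokens of just the name + description
turns = 20
hits = 2          # turns where the agent actually reads the body (assumed)

# In agent.md you pay full rent every turn.
in_agent_md = full_body * turns
# As a skill you pay the stub every turn, plus the body only on hits.
as_skill = stub * turns + (full_body - stub) * hits

print(in_agent_md, as_skill)        # 18880 vs 2842
print(round(full_body / stub))      # 18, the per-turn difference
```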

Progressive Disclosure, In Plain English

Skills fix this with a pattern called progressive disclosure. The agent sees only the skill's name and description by default. The body of the file — the actual instructions, the step-by-step, the code samples — stays on disk until the agent looks at a task, scans its available skills, matches a description to the job, and then reads the full file.

Think of it like a library. You don't read every book on the shelf before writing an essay. You look at the spines, pick the ones that matter for the topic, and open those. Skills are spines. Agent.md is every book piled on your desk every morning whether you need it or not.

Three practical consequences:

  • A skill you never need costs almost nothing. You can have fifty skills available and pay for none of them on a turn that doesn't match.
  • Specificity beats volume. A sharp description — "use when generating a weekly sponsor report from eight data sources" — lets the model route correctly. Vague descriptions ("helpful stuff") cause misses.
  • The 5% exception still exists. If you genuinely have proprietary information the model must reference on every turn — a house style, a compliance constraint, a legal disclaimer — that belongs in agent.md. Ras Mic's claim: 95% of users don't have that. I think he's right. Check before you write.
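On disk, the pattern is just a markdown file with two always-visible frontmatter fields, typically at `.claude/skills/<name>/SKILL.md`. A minimal sketch (the folder name, description text, and body steps are illustrative; the `name`/`description` frontmatter is the part Claude scans on every turn):

```markdown
---
name: sponsor-report
description: Use when generating the weekly sponsor report from eight
  analytics data sources. Covers screening criteria and rejection rules.
---

# Weekly sponsor report

Only the frontmatter above is loaded by default; everything below
this line is read on demand, when the description matches the task.

1. Pull the exports from the eight data sources...
2. Apply the screening criteria, including the rejection cases...
3. Write results to the tracking spreadsheet in the accepted format...
```

Note how the description is specific enough to route on: it names the trigger ("weekly sponsor report") and the scope ("eight data sources"), not "helpful stuff".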

Build Skills the Right Way (Don't Jump to the skill.md)

Here's where most teams lose the plot. They decide to build a skill, open the editor, and write skill.md from scratch — maybe with the AI's help, maybe from a template. The skill gets written before the workflow has ever actually succeeded.

Ras Mic's methodology, which lines up with what I see on client engagements:

  1. Identify the workflow. Not the abstract goal — the concrete, end-to-end thing you want to happen. "Screen a sponsor email, research the company, mark them in a spreadsheet, send me a Slack if they pass."
  2. Walk through it with the agent by hand. Step by step, in a normal conversation. Tell it to research. Look at what it comes back with. Correct it. Tell it the criteria it missed. Let it try again. This is slow on purpose.
  3. Only create the skill after a successful run. The agent now has the full context of what "right" looks like — what sources it checked, what logic it applied, what output format you accepted. Ask it to write the skill based on that successful run.
  4. Don't hand-write it. The AI is better at capturing its own working steps than you are. Your job is taste and correction, not transcription.

The failure mode Ras Mic calls out: people write a skill called "sponsor research," run it, the agent marks every company as legitimate, and the user concludes the technology is broken. The technology isn't broken. The skill had no criteria for rejection because the workflow had never actually executed a rejection. Models mimic. If you give them nothing to mimic, they'll mimic your optimism.

The Recursive Loop That Actually Works

Even a well-built skill fails on the sixth real-world edge case it's never seen. The question is what you do in that moment.

Most people: complain, rewrite the skill by hand, lose faith. Better: treat each failure as a free labeled example, and close the loop.

The pattern:

  1. The skill runs and fails. Something specific broke — a bad API call, a missing field, a wrong currency.
  2. Ask the agent what went wrong. Literally: "why did that fail? what was the error?" Claude will usually tell you precisely — "I got a 503 from the analytics endpoint" or "I couldn't parse the response because it returned XML instead of JSON."
  3. Tell the agent to fix the immediate issue. Normal debugging loop.
  4. Tell the agent to update the skill so the same failure can't happen again. Add the edge case. Add the fallback. Add the retry. Add the validation.

Five iterations of this loop on a non-trivial skill — Ras Mic describes an eight-data-source YouTube analytics report — and the skill runs cleanly every time, end to end, in about ten minutes of the agent's working time.

What you're building in that loop isn't just a prompt file. You're building institutional knowledge, captured as the agent's own diary of its past mistakes. It compounds. After a few months the skill reads like a runbook written by someone who's actually done the job a hundred times — because something effectively has.

Scale for Productivity, Not for Flash

There's a seductive move early in any agentic setup: spin up fifteen sub-agents, assign each a domain — marketing, research, email, CRM, code review — and call yourself agentic. Looks great in a screenshot. Very little actually works.

The reason is that the sub-agents are being asked to execute workflows the user has never executed themselves. No skills. No successful runs. No feedback loops. The graph is impressive; the throughput is zero.

Start with one agent. Give it everything — your sponsor email, your spreadsheet, your research — and build up reliable skills by walking the workflow one job at a time. Once a domain has a few proven skills and a predictable rhythm, then spin up a sub-agent for it. The sub-agent inherits proven context instead of aspirational context.

This is also a hedge against the current state of agent-framework churn. Multi-agent orchestration tools are evolving fast. Paperclip, LangGraph, the next thing — they all look great on the landing page. If the foundation under them is a pile of untested skills, switching frameworks just moves the same dysfunction into a new visualization layer.

What to Do Monday

If you've been building with Claude Code and something feels off — the agent is getting dumber deep into a session, or it's confidently wrong about things you told it, or costs are climbing faster than output — pick exactly one of these to try this week:

  1. Open your CLAUDE.md or AGENT.md. For each section, ask: does the model need this on every turn, or only when a specific task shows up? Move the task-specific pieces to skills. Delete the pieces the model already knows ("use React", "this is a TypeScript project" — it can see that).
  2. Pick one workflow you do at least weekly. Don't write a skill for it. Walk it through with the agent end-to-end, correcting in real time. Only after one clean run, ask Claude to write the skill based on that run.
  3. Next time a skill fails, don't edit it manually. Ask the agent what broke, have it fix the underlying issue, then have it update the skill so that class of failure can't repeat.
  4. Resist the multi-agent urge. If you have fewer than five battle-tested skills on one agent, don't spawn a second one. Productivity first, graph later.

The meta-point from the conversation — and it lines up with what I see across client engagements — is that the durable skill you're building isn't the skill.md file. It's your own judgment about what belongs where, what's worth codifying, and what's cheaper to let the model figure out every time. The .md files are the artifact. The judgment is the moat.


Source: How AI agents & Claude skills work (Clearly Explained) — Greg Isenberg in conversation with Ras Mic, April 2026. Worth the 35 minutes if you use Claude Code daily.


Context-engineered skills are one half of the picture; the other half is what you wire them into. See the full playbook for AI automations for business — the research agents, outreach pipelines, and support triage workflows global teams actually ship in production. For hands-on help, the AI automation service exists for exactly that.

FAQ

Common questions.

What is context engineering in Claude Code?

Context engineering is the practice of deciding what information the model needs on every turn versus what it should load only when a specific task appears. Content in CLAUDE.md or AGENT.md is loaded on every single turn; content in a skill file is loaded only when Claude decides it needs it. For any project beyond a handful of files, the difference is significant: often thousands of tokens per turn and a measurable gap in output quality.

What is a Claude Code skill?

A Claude Code skill is a markdown file (typically under .claude/skills/ in your repository) that encapsulates a specific task, workflow, or pattern. It lives dormant until Claude decides a given turn needs it — at which point its content is loaded into the context. A skill has a short description (usually under 100 tokens) and a longer body that only loads on demand. This is the progressive disclosure pattern that keeps agents fast and cheap without sacrificing capability.

What is the difference between CLAUDE.md and skills?

CLAUDE.md (or AGENT.md) is always-on context — every token in it is sent on every turn. Skills are on-demand context — the short description is seen every turn, the full body is loaded only when relevant. The rule of thumb: put project-level invariants (tech stack, coding style, hard rules) in CLAUDE.md; put task-specific workflows, runbooks, and templates in skills. Anything the model could plausibly figure out from the code itself should be in neither.

Why is my Claude agent getting dumber deep into a session?

Usually because the context window has filled up with CLAUDE.md bloat, irrelevant tool outputs, or stale conversation history. The fix is structural, not a prompt rewrite: move task-specific content from CLAUDE.md into skills, prune tool output that the model does not need to remember, and start fresh sessions when the current one has drifted. Quality degradation deep in a long session is almost always a context-engineering problem, not a model problem.

How do I build a Claude Code skill properly?

Do not start by writing a skill.md. Instead, walk the workflow end-to-end with Claude at least once, correcting in real time. Only after one clean run ask Claude to write the skill from that run. This way the skill captures how the workflow actually works, not how you imagined it would. Then, every time the skill fails, do not edit it manually — have the agent diagnose the failure and update the skill itself. That recursive loop is where the real leverage compounds.

Should I use multiple agents or one agent with many skills?

Start with one agent and many skills. Multi-agent orchestration is impressive on a landing page and rarely productive in practice — most multi-agent setups fail because the sub-agents are executing workflows the user has never validated. Build a single agent with a few proven skills, ship real work with it, and only spin up a sub-agent once a specific domain has enough battle-tested skills to justify its own context. Productivity first, graph later.



Written by

David Dacruz

Digital architect in Ericeira, Portugal. 42 alumni. I write about building at the intersection of AI, web3, and what actually ships.