I watched a video on how to never hit Claude's session limit again and the whole thing rests on a single reframe: the limit is not the problem. Waste is. Most people hitting the ceiling on Claude Code are paying Opus prices for Haiku work, dragging forty turns of stale context into every new prompt, and treating subagents as an org-chart gimmick instead of what they actually are — a token strategy.
This is the practical playbook I use to stay under the limits while shipping more than I ever have. None of it is clever. All of it is discipline.
The Limit Isn't the Problem — Waste Is
Here is the shape of the complaint: "I hit Claude's session limit by 2pm every day." Almost always, the fix is not a bigger plan. It is a quieter harness.
Three patterns show up over and over when I audit someone's setup.
The Opus-for-everything tax. They set Opus as default and never switch. Every `ls`, every file read, every "what does this function do" burns the most expensive model in the fleet. A week of this and the quota is gone by Wednesday.
The forever-conversation tax. They never start a new chat. Turn thirty references turn three. The tool outputs from that ten-minute codebase exploration at 9am are still sitting in the context at 4pm. Every single turn is paying rent on work that finished hours ago.
The bloated CLAUDE.md tax. I wrote about this in the 944-token mistake — most of what people pack into CLAUDE.md is loaded on every single turn and used on almost none of them. It is the single most common reason a session gets dumb and expensive at the same time.
Fix those three and your "session limit problem" usually disappears without changing plans. The rest of this piece is how.
Route the Turn to the Right Model
The first discipline is boring and it is the highest-ROI move on this list: stop using Opus for everything.
A reasonable distribution on a normal week of building:
- Haiku for the 60% that is navigation and summarisation. File walking, grep, log reading, "which files reference this function," "summarise what this directory does." Haiku is fast, cheap, and its summaries are good enough to make the next decision with. You are not losing quality — you are losing price.
- Sonnet as the default workhorse. Writing code, multi-step reasoning, anything that mixes code and judgement. Most of the real building fits here.
- Opus for the 10% that is genuinely hard. Architecture decisions, tricky debugging, long-form writing, anything where a single bad answer costs you an afternoon. This is where the model's weight earns its price.
The trap is that Opus feels safer. Of course the smartest model gives the best answer. But it gives the best answer to every question, including "what is the time" — and you pay for that. Developers who live inside the quota have trained themselves to route the turn to the cheapest model that can finish it. It is the same instinct a good engineer has for picking the simplest tool that works.
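One way to make that routing stick, rather than relying on willpower, is to set the cheap model as the project default and upgrade per-turn. A sketch of `.claude/settings.json`, assuming the `model` key and a `haiku` alias your Claude Code version accepts (check `/model` in-session for the exact names it supports):

```json
{
  "model": "haiku"
}
```

Then `/model sonnet` or `/model opus` mid-session when a turn genuinely earns it, and the next session drops back to the cheap default on its own.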
Subagents Are a Token Strategy, Not an Org Chart
The second discipline is where most people leave the biggest savings on the table: subagents.
A subagent is a separate Claude instance spawned from your main conversation, with its own context window, its own tool allowlist, and — crucially — its own model choice. It does a job, returns a summary, and disappears. The tool outputs, the file reads, the back-and-forth it took to get the answer never touch your main conversation.
That last part is the whole point. When you ask your main agent to "explore the payments module and tell me how refunds flow," the exploration itself can chew through twenty file reads and five grep results. All of that sits in your context window for the rest of the day.
Delegate the same question to a subagent and you get back a clean paragraph. The main context stays sharp. Your token bill stays low.
Two settings compound the savings:
- `model: haiku` on exploration-heavy subagents. Most exploration is summarisation work. Haiku does it well and does it cheap. A subagent reading thirty files on Opus is a waste of tokens that could have been Sonnet work on the main thread.
- Tool scoping. Give the subagent only the tools it actually needs — Read, Grep, Glob for a code explorer; WebFetch for a research task. A scoped subagent is a faster subagent, and a faster subagent is a cheaper one.
The rule I settled on: if a task will generate more than a screenful of tool output before it produces a usable answer, it belongs in a subagent. Otherwise it belongs in the main thread. Over a week, this one habit has done more for my quota than any model change. I wrote up the skill-and-subagent patterns in full in Claude Code skills and subagents — subagents are the second half of the harness discipline the first half of this post keeps pointing at.
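Here is what that looks like as a file, a sketch under the assumption that your Claude Code version reads agents from `.claude/agents/` with `name`, `description`, `tools`, and `model` frontmatter (the agent itself is hypothetical):

```markdown
---
name: code-explorer
description: Read-only codebase exploration. Use for "how does X work" and "where is Y referenced" questions.
tools: Read, Grep, Glob
model: haiku
---

Explore the codebase to answer the question you are given.
Reply with a short summary: the relevant files, the flow between
them, and anything surprising. Do not paste full file contents back.
```

The scoped tool list and the Haiku model are the two settings described above; the summary-only instruction is what keeps the payload returned to the main thread small.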
/clear Is a Production Tool, Not a Reset Button
Most people use /clear the way you use Ctrl-L in a terminal — when the screen gets messy and they want a tidy feeling. That is fine, but it is underusing the command by an order of magnitude.
`/clear` wipes the conversation history and keeps your harness — CLAUDE.md, project files, skills, settings. The moment your next prompt makes perfect sense to a brand-new Claude, the previous conversation is just ballast.
Some concrete moments I run `/clear`:
- Finished a bug hunt, starting a feature. The thirty tool outputs from "why is this test failing" are no longer useful. Clear, then start the feature fresh.
- Topic shifted. You were working in the billing module; you are now touching the email worker. The billing context is just noise in the new task.
- The model is getting confidently wrong. If Claude keeps referring to an older version of the file it read twenty turns ago, your context is rotting. Clear and show it the file again. Faster than debugging the confusion.
- Before any serious architecture or writing turn. The turn you most want to run on Opus deserves the cleanest context you can give it. Clear the junk, frame the question, then ask.
A lot of the quality problem people blame on the model is actually a context problem. Fresh context, well-framed question — the model gets sharper and cheaper in the same move.
Split Your Day Into Windows
Anthropic's usage window is rolling, not a daily reset. That small fact changes the optimal shape of your day.
If you sit down at 9am and run one continuous session until 5pm, you are stacking every token you burn into the same rolling window. You hit the ceiling, and you hit it hard. The ceiling stays hit for hours.
Split the same work into a morning block (9–11), an afternoon block (1–3), and an evening block (5–7) and something quietly magical happens: by the time you come back from lunch, some of the morning's usage has already aged out of the window. You start the afternoon with breathing room. Same total work, none of the limit hits.
This is a workflow change, not a Claude change. It pairs naturally with the way real software building actually looks: a bounded session of coding, a break to think or walk, a bounded session of reviewing what you built, another break, a bounded session of testing and writing it up. The builders I know who ship the most already work this way. Claude's usage window just happens to reward the same cadence.
Two tactical notes:
- Don't leave the session "running" across the break. Kill it. The next block is cleaner with a new conversation anyway.
- Use the breaks for non-Claude work. Read the PR on a teammate's branch. Write the sprint note. Triage email. Anything that advances the work without spending your quota.
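The arithmetic behind the split is easy to see in a toy model. Every number here is an assumption for illustration (window length, per-hour burn); the real mechanics are Anthropic's, not mine:

```python
WINDOW_HOURS = 5  # assumed rolling window length, illustrative only

def peak_usage(events):
    """events: list of (hour, units). Return the max units that ever
    fall inside a single rolling window."""
    peak = 0
    for t, _ in events:
        in_window = sum(u for h, u in events if t - WINDOW_HOURS < h <= t)
        peak = max(peak, in_window)
    return peak

# One 9-to-5 marathon: 15 units of usage every hour.
marathon = [(h, 15) for h in range(9, 17)]

# Three bounded blocks (9-11, 1-3, 5-7) doing the same total work.
blocks = [(h, 20) for h in (9, 10, 13, 14, 17, 18)]

print(peak_usage(marathon))  # 75 -- five straight hours stack in one window
print(peak_usage(blocks))    # 60 -- earlier blocks partly age out first
```

Same 120 units of total work in both schedules; the marathon concentrates them, the blocks spread them. The wider you space the blocks, the lower the peak gets.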
Kill the Conversation Before It Rots
The final discipline is the one I had to relearn most often before it stuck: end conversations early.
Every turn pays for all the prior context. Turn fifteen on a long thread can cost five times what turn one cost. Turn thirty can cost fifteen times as much. And the irony is that turn thirty is also less accurate than turn three, because attention degrades as the window fills. You are paying more for a worse answer.
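A toy model of why, with made-up numbers: a fixed harness cost (CLAUDE.md, system prompt) plus a constant slab of new context per turn. Real pricing and prompt caching complicate this, but the shape holds:

```python
def turn_cost(n, base=4_000, per_turn=1_200):
    """Input tokens paid on turn n: the fixed harness plus all the
    context accumulated by the previous n-1 turns. Numbers are invented."""
    return base + (n - 1) * per_turn

def thread_cost(turns):
    """Total input tokens across a whole thread."""
    return sum(turn_cost(n) for n in range(1, turns + 1))

print(turn_cost(15) / turn_cost(1))   # 5.2 -- turn 15 vs turn 1
one_thread = thread_cost(30)
two_threads = 2 * thread_cost(15)     # restart at turn 15 with a summary
print(one_thread, two_threads)        # 642000 372000
```

The restart at turn fifteen cuts the total input bill by roughly 40% in this model, before counting the summary handoff, which is a few hundred tokens against savings in the hundreds of thousands.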
My rule is a soft ceiling of fifteen to twenty real turns. If I am still on the same thread at twenty, I stop, ask Claude to summarise what we have figured out so far, copy the summary into a new conversation, and keep going. It takes thirty seconds. It halves my token bill and usually sharpens the next turn.
Signals that a conversation should die sooner than twenty turns:
- You ran a big codebase exploration and the tool outputs are still hanging around.
- You changed topic halfway through.
- Claude is hedging more than it used to — classic "context is full" tell.
- You are about to ask the hardest question of the day. Give it a clean runway.
This is the same move great writers make without thinking about tokens — a fresh page is almost always better than the cluttered one. LLMs just make the economics of it explicit.
What to Do Monday
If your Claude quota is evaporating by Wednesday and you want it to last the week, pick one of these for the next five working days — not all five, one:
- Set Haiku as your default for the morning block. Only upgrade to Sonnet or Opus when the turn genuinely needs it. You will be surprised how rarely that is.
- Spawn a subagent for the next exploration task you would have run on the main thread. Set its model to Haiku. Notice how much cleaner the main context stays for the rest of the session.
- Install a `/clear` habit between topics. Every time you shift from "debug" to "build" to "review," clear first. Treat it as punctuation, not a last resort.
- Split your day into three blocks instead of one marathon. Walk between them. Check whether you hit the limit at all by Friday.
- Set a twenty-turn ceiling on conversations. When you hit it, summarise and start fresh.
The meta-point is the same one that shows up whenever I write about working with Claude: the model is not the bottleneck. The harness around the model is. People who treat that harness as part of the craft ship more, pay less, and almost never hit the limit. People who treat it as Anthropic's problem to solve hit it every week and blame the plan.
If you want the full picture of the harness — skills, subagents, and the context-engineering side — the companion piece is the 944-token mistake, and the broader business context is in AI automations for business. Together they are most of what I think builders need to know to run Claude Code as a daily driver instead of a toy.
Source: How to Never Hit Your Claude Session Limit Again, April 2026. Short, practical, worth the watch.
For teams that want this wired into a real workflow instead of a personal discipline, the AI automation service and the fractional CTO engagement both include harness setup — model routing, subagent patterns, and a repo-level .claude/ layout that a team can actually maintain.
FAQ
Common questions.
Why do I keep hitting Claude's session limit?
Usually for one of three reasons: you are running every turn on the biggest model regardless of what the turn actually needs, your conversations are running too long so each turn pays for all the prior context, or your CLAUDE.md and tool outputs are bloating the window. The fix is almost never to buy a bigger plan. The fix is to stop spending tokens the model never reads. Route routine turns to Haiku, spawn subagents for exploration, and kill conversations before they drift past twenty turns.
Which Claude model should I use in Claude Code?
Use Haiku for anything that is 'find this, summarise that' — file walking, grep, log reading, quick classifications. Use Sonnet as the default for real coding work and multi-step reasoning. Use Opus for the hard problems: architecture decisions, tricky refactors, long-form writing. Most developers default to Opus for everything and wonder why their quota evaporates by Wednesday. The honest distribution on a productive week is roughly 60% Haiku, 30% Sonnet, 10% Opus.
What does the /clear command actually do in Claude Code?
`/clear` wipes the current conversation history while keeping your CLAUDE.md, project files, and skills in place. It is the fastest way to reset the context window without losing your harness. The rule of thumb: if your next prompt would make perfect sense in a brand-new terminal, run `/clear` first. Every stale tool output and half-finished thread you drag into the next turn costs tokens you gain nothing from.
How do subagents help with Claude session limits?
A subagent runs in an isolated context window. When you spawn one to explore a codebase or research a topic, its tool outputs, file reads, and back-and-forth never touch your main conversation's budget. You get back a summary. The main context stays clean. Pair this with `model: haiku` on exploration-heavy subagents and you get two compounding wins: cheaper exploration and a main conversation that stays sharp far longer.
How long should a Claude Code conversation be before I restart it?
Somewhere between fifteen and twenty serious turns, with earlier resets if you have hit a lot of tool outputs or file reads. Every turn pays for all the prior context, so a thirtieth turn can cost fifteen times what the first turn cost. If you need context from the old chat, ask Claude to summarise the conversation, paste the summary into a new one, and carry on. Quality actually improves — attention degrades as the window fills, and a fresh start usually produces a smarter reply than turn thirty on the same thread.
Can I avoid hitting Claude's usage limit by splitting work across sessions?
Yes, and it is the single highest-leverage habit change for heavy users. Anthropic's usage window is rolling, not daily — earlier usage ages out over hours. If you run one eight-hour marathon session you hit the ceiling hard; if you run a morning, afternoon, and evening session instead, earlier usage has already aged out by the time you return. Same total work, none of the limit hits. The trick is treating the breaks as part of the workflow, not a concession.
Is it worth paying for the Claude Max plan to avoid session limits?
Only after you have done the harness work. If you are burning the free or Pro quota while running everything on Opus, never spawning subagents, and keeping every conversation alive for fifty turns, upgrading just raises the ceiling you crash into. The builders I know on Claude Max spent the first week fixing their harness — model routing, subagents, /clear discipline — and found that the upgrade was genuinely uncapped rather than a slightly taller wall. Fix the leak first. The plan is the faucet.