# Token Efficiency

Ultra Claude burns more tokens than normal development. That's by design: dedicated Reviewers, Testers, and a shared knowledge layer raise quality, but they cost context. The framework is built to make that trade-off worthwhile and to keep usage under control.

## Cost per plan

Before execution starts, you see a cost estimate:

| Component | Cost | Details |
| --- | --- | --- |
| Per-task pipeline | ~120K tokens | Executor ~80K + Reviewer ~30K + Tester ~10K |
| Knowledge Brief synthesis | ~20K tokens, amortized | Lead loops `/uc:research` over the plan's Tech Stack in Phase 1.8. Cache hits are free; misses spawn a short-lived researcher subagent. |
| Mid-execution knowledge QUERYs | ~1K (hit) / ~15K (miss) | Teammates send `QUERY:` to Lead; Lead runs `/uc:research`. Subsequent queries for the same topic hit the cache. |
| Project Manager | ~50K tokens | Observational monitoring, entire execution |

A 5-task plan costs roughly 670K tokens (5 × 120K + 20K + 50K, plus a few mid-execution QUERY misses). The estimate is shown before any agents spawn — you confirm before spending. Second and third plans in the same project benefit more from cache hits, driving the per-plan knowledge cost toward zero over time.
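The arithmetic behind that estimate can be sketched as a small function. This is an illustration of the numbers quoted above, not the framework's actual estimator; the function and constant names are hypothetical.

```python
# Per-component figures from the cost table above (all values in tokens).
PER_TASK = 120_000         # Executor ~80K + Reviewer ~30K + Tester ~10K
KNOWLEDGE_BRIEF = 20_000   # Phase 1.8 research, amortized over the plan
PROJECT_MANAGER = 50_000   # observational monitoring, entire execution

def estimate_plan_cost(num_tasks: int, query_misses: int = 0) -> int:
    """Tokens a plan is expected to burn; shown before any agents spawn."""
    return (num_tasks * PER_TASK
            + KNOWLEDGE_BRIEF
            + PROJECT_MANAGER
            + query_misses * 15_000)   # each mid-execution QUERY miss ~15K

print(estimate_plan_cost(5))  # 5-task plan, no misses → 670000
```

On a warm cache the brief synthesis and QUERY misses approach zero, which is why later plans in the same project come in under this estimate.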

## Why it's worth it

Normal Claude Code development is one agent doing everything: writing code, hoping it follows patterns, and trusting that it works. Ultra Claude surrounds the Executor with dedicated roles: a Reviewer that checks the code against your standards, a Tester that verifies behavior, and a Project Manager that monitors execution.

The extra tokens buy you code that follows your standards, passes real tests, and uses current library APIs. Without them, you spend the same tokens later fixing the bugs that review and testing would have caught.

## How the framework saves tokens

### Model assignment

Only the Executor runs on Opus (the most capable, most expensive model). Every other role — Reviewer, Tester, Project Manager — runs on Sonnet. The researcher subagent (spawned on cache miss) also runs on Sonnet. This concentrates cost where capability matters most: writing code.

### Lazy Tester spawning

The Tester doesn't spawn when the task starts. It's created only after the Executor signals "code complete" — the moment all source code is written, before the Executor writes the implementation report. During the planning and implementation phases (often the longest part), only two agents are active instead of three, avoiding ~10K tokens of idle context per task. Firing the signal before the impl-report write also gives the Tester its cold-read window "for free", so it's ready to test almost as soon as the Executor is done.
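The ordering described above can be made concrete with a small event trace. This is a sketch under assumed names (`Executor`, `spawn_teammate`), not the framework's real API; the point is only the sequence: implement, then spawn the Tester, then write the report while the Tester cold-reads.

```python
events = []  # records the order in which things happen

class Executor:
    def implement(self):
        events.append("implement")        # all source code written
    def write_implementation_report(self):
        events.append("impl-report")      # Tester cold-reads in parallel

def spawn_teammate(role):
    events.append(f"spawn:{role}")        # stand-in for TeamCreate

def run_task(executor):
    executor.implement()
    spawn_teammate("tester")              # "code complete" signal fires here
    executor.write_implementation_report()

run_task(Executor())
print(events)  # → ['implement', 'spawn:tester', 'impl-report']
```

Because the spawn lands between the last line of code and the report write, the Tester's idle window shrinks to roughly the time it takes the Executor to write one document.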

### Cache-first research (Lead as Sage)

Research findings are committed as first-class project documentation under `documentation/technology/research/`, not an invisible cache. The `/uc:research` skill checks a small JSON index via `jq` before spawning anything — fresh entries (governed by a per-file `expires` field: 10-day TTL for library docs, 90 days for patterns, 30 for market research, frozen for historical topics) return immediately with zero agent overhead. Cache misses spawn the stateless researcher subagent one-shot via the Task tool; it writes the target file, updates the index, and exits. No persistent knowledge teammate, no per-task duplicate lookups, and the knowledge base grows across plans.

### Typed spawn distinction

Teammates (TeamCreate) are persistent, stateful, and tracked on the PM dashboard. Subagents (Task tool) are stateless, one-shot, and invisible to the team graph. The researcher is a subagent because it fits the pattern "do this one thing, return the answer, die" — making it a persistent teammate would waste tokens on a lifecycle it doesn't need.

### Ref.tools integration

External library documentation is fetched via Ref.tools MCP at ~500–5K tokens per query. The fallback (raw web search) costs 50K+ tokens for the same information. Combined with the cache, the per-lookup cost on a repeated topic drops to zero.

### Checkpoint recovery

When a session dies mid-execution, checkpoints let you resume without re-running completed tasks. A 10-task plan interrupted after task 5 resumes from task 6 — the 600K tokens already spent on tasks 1–5 are not re-spent.
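A resume of this kind only needs the per-task completion status from the plan state. The sketch below assumes a structure loosely modeled on the `plan.json` mentioned later; the `tasks`/`status` field names are assumptions, not the real schema.

```python
def tasks_to_run(plan: dict) -> list:
    """Skip tasks already checkpointed as done; resume from the rest."""
    return [t["id"] for t in plan["tasks"] if t.get("status") != "done"]

# A 10-task plan whose session died after task 5.
plan = {"tasks": [
    {"id": f"task-{i}", "status": "done" if i <= 5 else "pending"}
    for i in range(1, 11)
]}
print(tasks_to_run(plan))  # → ['task-6', 'task-7', 'task-8', 'task-9', 'task-10']
```

The ~600K tokens recorded against tasks 1–5 stay spent exactly once; only the pending tail re-enters the pipeline.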

### Concurrency limits

Parallel execution is capped at 1–4 task-teams based on plan size. This prevents token burn from too many agents running simultaneously while still allowing parallel progress.

| Plan size | Max concurrent teams |
| --- | --- |
| 1–3 tasks | 1–2 |
| 4–8 tasks | 2–3 |
| 9+ tasks | 3–4 |
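As a function, the table reads as a simple size-to-cap lookup. The boundaries come from the table; returning a `(min, max)` tuple is an assumption about how the range is represented.

```python
def team_cap(num_tasks: int) -> tuple:
    """Allowed range of concurrent task-teams for a plan of this size."""
    if num_tasks <= 3:
        return (1, 2)
    if num_tasks <= 8:
        return (2, 3)
    return (3, 4)

print(team_cap(5))  # a 5-task plan runs 2–3 teams → (2, 3)
```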

## Usage management

Before execution, you're asked whether to enable extra usage:

### Extra usage enabled

Development runs as fast as possible with no pauses. Use this when you need results quickly and your Anthropic account has extra usage capacity.

### Extra usage disabled

A three-agent chain handles usage management with ~6x less token cost than the old PM polling approach:

  1. Haiku watchdog ticks every minute (~$0.03/5h) checking two independent rate-limit windows (5h: 80% soft / 90% hard, 7d: 90% soft / 95% hard) and executor staleness. Silent when healthy.
  2. PM validates watchdog alerts, adds operational context (team count, avg task cost), forwards to Lead with window-qualified recommendations. Event-driven, no cron.
  3. Lead acts uniformly across windows — on soft, in-flight work finishes and spawning stops until reset; on hard, all teams stop immediately. Multiple blocks (5h + 7d) track independently; Lead resumes only when all windows clear.
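The watchdog's two-window check reduces to comparing each window's usage percentage against its (soft, hard) pair. The thresholds are the ones listed above; how usage percentages are obtained is assumed, and the status labels are illustrative.

```python
THRESHOLDS = {"5h": (80, 90), "7d": (90, 95)}  # (soft %, hard %) per window

def check_windows(usage: dict) -> dict:
    """Classify each rate-limit window independently: 'ok', 'soft', or 'hard'."""
    status = {}
    for window, (soft, hard) in THRESHOLDS.items():
        pct = usage[window]
        if pct >= hard:
            status[window] = "hard"   # Lead stops all teams immediately
        elif pct >= soft:
            status[window] = "soft"   # finish in-flight work, stop spawning
        else:
            status[window] = "ok"     # watchdog stays silent
    return status

print(check_windows({"5h": 84, "7d": 50}))  # → {'5h': 'soft', '7d': 'ok'}
```

Because each window is classified on its own, a 5h block and a 7d block can overlap, and execution resumes only once every window is back to "ok".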

Per-task budget data (usage % at the start and end of each task) is tracked in `plan.json`. Multiple pause/resume cycles are supported across rate-limit windows.

**Plan during the day, execute overnight.** Planning modes (feature, debug, verification) are interactive and use modest tokens. Execution is autonomous and token-heavy. Start a plan during the day when you can discuss scope, then kick off execution before bed — most plans complete within a single rate-limit window.

## Token budget by activity

| Activity | Token range | Interactive? |
| --- | --- | --- |
| Feature planning | ~20–50K | Yes — conversation with user |
| Debug investigation | ~30–60K | Yes — hypothesis discussion |
| Verification scan | ~20–40K | Yes — discrepancy review |
| Tech research query | ~0.5–5K | No — automated retrieval |
| Plan execution (per task) | ~120K | No — autonomous |
| Plan execution (overhead) | ~70K | No — PM (~50K) + Knowledge Brief (~20K) |

Interactive activities are cheap. Execution is expensive but autonomous. This is why the "plan by day, execute by night" workflow works well.