# Token Efficiency
Ultra Claude burns more tokens than normal development. That's by design — dedicated Reviewers, Testers, and a shared Knowledge agent increase quality, but they cost context. The framework is built to make that trade-off worthwhile and keep usage under control.
## Cost per plan
Before execution starts, you see a cost estimate:
| Component | Cost | Details |
|---|---|---|
| Per-task pipeline | ~120K tokens | Executor ~80K + Reviewer ~30K + Tester ~10K |
| Knowledge Brief synthesis | ~20K tokens amortized | Lead loops /uc:research over the plan's Tech Stack in Phase 1.8. Cache hits are free; misses spawn a short-lived researcher subagent. |
| Mid-execution knowledge QUERYs | ~1K (hit) / ~15K (miss) | Teammates send QUERY: to Lead; Lead runs /uc:research. Subsequent queries for the same topic hit the cache. |
| Project Manager | ~50K tokens | Observational monitoring, entire execution |
A 5-task plan costs roughly 670K tokens (5 × 120K + 20K + 50K, plus a few mid-execution QUERY misses). The estimate is shown before any agents spawn — you confirm before spending. Second and third plans in the same project benefit more from cache hits, driving the per-plan knowledge cost toward zero over time.
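The arithmetic above can be sketched as a small estimator. This is an illustrative sketch, not the framework's actual code; the constants mirror the table, and the function name and signature are assumptions.

```python
# Hypothetical sketch of the pre-execution cost estimate described above.
# Constants come from the cost table; names are illustrative.

PER_TASK = 120_000        # Executor ~80K + Reviewer ~30K + Tester ~10K
KNOWLEDGE_BRIEF = 20_000  # amortized Phase 1.8 synthesis
PROJECT_MANAGER = 50_000  # observational monitoring, entire execution
QUERY_MISS = 15_000       # mid-execution knowledge QUERY on a cache miss

def estimate_plan_cost(tasks: int, query_misses: int = 0) -> int:
    """Rough token estimate shown before any agents spawn."""
    return tasks * PER_TASK + KNOWLEDGE_BRIEF + PROJECT_MANAGER + query_misses * QUERY_MISS

print(estimate_plan_cost(5))  # 670000, matching the 5-task example
```

Cache hits cost effectively nothing, which is why later plans in the same project trend toward the bare `tasks * PER_TASK` floor.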
## Why it's worth it
Normal Claude Code development is one agent doing everything — writing code, hoping it follows patterns, and trusting it works. Ultra Claude adds dedicated roles:
- Reviewer catches pattern violations and standards drift before code is merged
- Tester validates against product docs, not just "does it compile"
- Cache-first research serves current API docs so agents work with real signatures, not stale training data — and the cache persists across plans, so a second plan touching the same libraries pays almost nothing for knowledge
The extra tokens buy you code that follows your standards, passes real tests, and uses current library APIs. Without them, you spend the same tokens later fixing the bugs that review and testing would have caught.
## How the framework saves tokens
### Model assignment
Only the Executor runs on Opus (the most capable, most expensive model). Every other role — Reviewer, Tester, Project Manager — runs on Sonnet. The researcher subagent (spawned on cache miss) also runs on Sonnet. This concentrates cost where capability matters most: writing code.
### Lazy Tester spawning
The Tester doesn't spawn when the task starts. It's created only after the Executor signals "code complete": the moment all source code is written, before the Executor writes the implementation report. During planning and implementation (often the longest phases), only two agents are active instead of three, saving ~10K tokens of idle context per task. Firing the signal before the implementation-report write also gives the Tester its cold-read window "for free", so it's ready to test almost as soon as the Executor finishes.
### Cache-first research (Lead as Sage)
Research findings are committed as first-class project documentation under documentation/technology/research/, not invisible cache. The /uc:research skill checks a small JSON index via jq before spawning anything — fresh entries (per-file expires field, library 10d / patterns 90d / market 30d / historical frozen) return immediately with zero agent overhead. Cache misses spawn the stateless researcher subagent one-shot via the Task tool; it writes the target file, updates the index, and exits. No persistent knowledge teammate, no per-task duplicate lookups, and the knowledge base grows across plans.
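The freshness check can be sketched as follows. This is a minimal sketch assuming an index shaped like `{"topic": {"path": ..., "expires": ...}}` with ISO-8601 expiry dates and a literal `"frozen"` sentinel for historical entries; the real skill uses `jq`, and these field names and the function signature are assumptions.

```python
# Illustrative cache-first lookup against a JSON research index.
import json
from datetime import datetime, timezone

def cache_lookup(index_path: str, topic: str):
    """Return the cached research file path if the entry is still fresh, else None."""
    with open(index_path) as f:
        index = json.load(f)
    entry = index.get(topic)
    if entry is None:
        return None  # miss: spawn a one-shot researcher subagent
    expires = entry.get("expires")
    if expires == "frozen":
        return entry["path"]  # historical research never expires
    if datetime.fromisoformat(expires) > datetime.now(timezone.utc):
        return entry["path"]  # hit: zero agent overhead
    return None  # stale: re-research and overwrite the file
```

A miss or stale entry is the only path that spawns an agent; everything else resolves from the committed files.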
### Typed spawn distinction
Teammates (TeamCreate) are persistent, stateful, and tracked on the PM dashboard. Subagents (Task tool) are stateless, one-shot, and invisible to the team graph. The researcher is a subagent because it fits the pattern "do this one thing, return the answer, die" — making it a persistent teammate would waste tokens on a lifecycle it doesn't need.
### Ref.tools integration
External library documentation is fetched via Ref.tools MCP at ~500–5K tokens per query. The fallback (raw web search) costs 50K+ tokens for the same information. Combined with the cache, the per-lookup cost on a repeated topic drops to zero.
### Checkpoint recovery
When a session dies mid-execution, checkpoints let you resume without re-running completed tasks. A 10-task plan interrupted after task 5 resumes from task 6 — the 600K tokens already spent on tasks 1–5 are not re-spent.
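The resume logic amounts to persisting completed task IDs and filtering them out on restart. A minimal sketch, assuming checkpoints are a JSON list of completed task IDs; the file format and function names are assumptions, not the framework's actual checkpoint schema.

```python
# Illustrative checkpoint-resume: completed tasks are skipped on restart.
import json
import os

def load_completed(checkpoint_path: str) -> set:
    """Read the set of completed task IDs; an absent file means a fresh run."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path) as f:
        return set(json.load(f))

def tasks_to_run(all_tasks: list, checkpoint_path: str) -> list:
    """Filter out tasks whose tokens have already been spent."""
    done = load_completed(checkpoint_path)
    return [t for t in all_tasks if t not in done]
```

With tasks 1–5 checkpointed, a 10-task plan resumes with only tasks 6–10 in the queue.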
### Concurrency limits
Parallel execution is capped at 1–4 task-teams based on plan size. This prevents token burn from too many agents running simultaneously while still allowing parallel progress.
| Plan size | Max concurrent teams |
|---|---|
| 1–3 tasks | 1–2 |
| 4–8 tasks | 2–3 |
| 9+ tasks | 3–4 |
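The table's mapping can be sketched as a simple step function. Since the table gives ranges, this sketch assumes the upper bound of each range is the hard cap; the function name is illustrative.

```python
# Illustrative plan-size -> concurrency cap, taking each range's upper bound.

def max_concurrent_teams(task_count: int) -> int:
    """Cap on simultaneously active task-teams, per the plan-size table."""
    if task_count <= 3:
        return 2
    if task_count <= 8:
        return 3
    return 4
```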
## Usage management
Before execution, you're asked whether to enable extra usage:
**Extra usage enabled**
Development runs as fast as possible with no pauses. Use this when you need results quickly and your Anthropic account has extra usage capacity.
**Extra usage disabled**
A three-agent chain handles usage management at roughly one-sixth the token cost of the old PM polling approach:
- Haiku watchdog ticks every minute (~$0.03/5h) checking two independent rate-limit windows (5h: 80% soft / 90% hard, 7d: 90% soft / 95% hard) and executor staleness. Silent when healthy.
- PM validates watchdog alerts, adds operational context (team count, avg task cost), forwards to Lead with window-qualified recommendations. Event-driven, no cron.
- Lead acts uniformly across windows — on soft, in-flight work finishes and spawning stops until reset; on hard, all teams stop immediately. Multiple blocks (5h + 7d) track independently; Lead resumes only when all windows clear.
Per-task budget data (usage % at start/end of each task) is tracked in plan.json. Multiple pause/resume cycles are supported across rate-limit windows.
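The two-window threshold logic can be sketched as follows. The thresholds come from the text (5h: 80% soft / 90% hard; 7d: 90% soft / 95% hard); the data shapes and function names are assumptions for illustration.

```python
# Illustrative watchdog threshold check across independent rate-limit windows.

THRESHOLDS = {"5h": (0.80, 0.90), "7d": (0.90, 0.95)}  # (soft, hard) fractions

def classify_usage(window: str, used_fraction: float) -> str:
    """'ok', 'soft' (finish in-flight work, stop spawning), or 'hard' (stop all teams)."""
    soft, hard = THRESHOLDS[window]
    if used_fraction >= hard:
        return "hard"
    if used_fraction >= soft:
        return "soft"
    return "ok"

def plan_state(usage: dict) -> str:
    """Windows track independently; the most severe state wins, and execution
    resumes only when every window reports 'ok'."""
    states = [classify_usage(w, frac) for w, frac in usage.items()]
    if "hard" in states:
        return "hard"
    if "soft" in states:
        return "soft"
    return "ok"
```

Because both windows are checked every tick, a 7d soft block can persist long after the 5h window has reset, which is why the Lead waits for all windows to clear.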
Plan during the day, execute overnight. Planning modes (feature, debug, verification) are interactive and use modest tokens. Execution is autonomous and token-heavy. Start a plan during the day when you can discuss scope, then kick off execution before bed — most plans complete within a single rate limit window.
## Token budget by activity
| Activity | Token range | Interactive? |
|---|---|---|
| Feature planning | ~20–50K | Yes — conversation with user |
| Debug investigation | ~30–60K | Yes — hypothesis discussion |
| Verification scan | ~20–40K | Yes — discrepancy review |
| Tech research query | ~0.5–5K | No — automated retrieval |
| Plan execution (per task) | ~120K | No — autonomous |
| Plan execution (overhead) | ~150K | No — PM + Knowledge |
Interactive activities are cheap. Execution is expensive but autonomous. This is why the "plan by day, execute by night" workflow works well.