Plan Execution

Execute approved plans with coordinated agent teams. Each task gets a dedicated Executor, Reviewer, and Tester that work together through a structured pipeline — planning, implementation, code review, and testing happen automatically.

Running a plan

/uc:plan-execution {plan-name}

Before spawning any agents, execution presents a cost estimate and asks for confirmation. You'll see the task count, concurrency level, and estimated token usage.

The task pipeline

Each task in your plan gets a dedicated mini-team that self-coordinates:

| Role | Model | What it does |
| --- | --- | --- |
| Executor | Opus | Explores the codebase, writes the implementation plan, implements code, drives the review/test cycle |
| Reviewer | Sonnet | Reads standards and architecture before code is written, gives plan feedback, performs the formal code review |
| Tester | Sonnet | Tests against product docs and success criteria, writes missing test coverage, validates UI in the browser |

One plan-wide teammate supports all tasks:

Knowledge lives per task: each tasks/task-N/task.md has a **Research:** section — a list of pointers to durable files under documentation/technology/research/, populated at planning Stage 2 via /uc:research. Just before spawning each team, Lead reviews that task's research coverage and fills any gaps.

Mid-execution, teammates send QUERY: {question} to Lead for external docs; Lead runs /uc:research, replies with an ANSWER:, and appends the new pointer to the task's task.md so it stays durable for re-spawns and future teammates. The typed distinction between Teammates (TeamCreate, stateful, tracked) and Subagents (Task tool, stateless, invisible to the team graph) keeps the team focused on pipeline work while still allowing cheap, isolated research lookups.
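As an illustration, a mid-execution research exchange might look like this (the library name, research file, and task number are invented for the example; only the QUERY/ANSWER message shapes come from the description above):

```
Executor → Lead:   QUERY: how does retry backoff work in libfoo v2?
Lead:              runs /uc:research, writes documentation/technology/research/libfoo-backoff.md
Lead → Executor:   ANSWER: backoff is exponential with jitter; see the new research pointer
Lead:              appends the pointer to tasks/task-3/task.md (durable for re-spawns)
```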

How a task flows

  1. Planning modes have already written tasks/task-N/task.md for every task (authoritative content: description, files, patterns, Research pointers, success criteria, dependencies). The plan README holds a flat task heading index only.
  2. Lead runs its pre-spawn checklist — reviews task.md's Research pointers, runs /uc:research for any gap or staleness, then spawns Executor + Reviewer in parallel (minimal pointer prompts; agents read the task directory on startup).
  3. Reviewer speaks first. After its startup read, Reviewer synthesizes a REVIEWER TAKE from task.md + standards + architecture + patterns and sends it to Executor before Executor plans.
  4. Executor reads task.md and explores the codebase while the Reviewer is still building its take. It blocks on REVIEWER TAKE before writing tasks/task-N/plan.md — waiting time is productive exploration and mental drafting, not idle. Once the take arrives, plan.md is written once, with the take baked in, as a thin execution delta (approach per file, criterion-to-approach mapping by ID, reviewer-take incorporation, risks). It does not restate task.md content.
  5. Executor runs a deviation self-check against task.md. If clean (all files in scope, all criteria mapped, no reviewer-take contradictions) it proceeds directly to implementation — no Lead gate. If the plan deviates, Executor sends ADVICE REQUEST task-N [deviation] and waits for APPROVED (Lead amends task.md and broadcasts) or AMEND.
  6. Executor implements. Reviewer reads files as they're written via progress updates.
  7. The moment code is done — before writing impl.md — Executor signals "code complete" to Lead. Lead lazy-spawns the Tester (which runs its own startup read in parallel) and may pre-spawn the next dependent task's team into a pipeline-wait gate.
  8. Executor writes tasks/task-N/impl.md (delta only — created/modified files, exports, INTEGRATION, GOTCHA), broadcasts FILE-UPDATED, then signals "ready for review" and "ready for test" simultaneously.
  9. Reviewer and Tester run in parallel. Both must PASS. If either fails, Executor fixes, updates impl.md with fix notes, re-broadcasts, and re-submits.
  10. Both pass → task done. Lead sends Implementation approved to any parked pipeline successor, shuts the team down, and fills the slot with the next unblocked task.

An optional ADVICE channel is open throughout for Executor to ask Lead for judgment on complicated problems, deep-reasoning design calls, or knowledge about orchestration context (other tasks in flight, plan history, user intent). ADVICE is non-blocking except for the mandatory [deviation] case.
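The steps above can be sketched as a signal flow (arrows and layout are illustrative; the message names come from the steps themselves):

```
Lead     ──spawn──────────────────────▶ Executor + Reviewer  (parallel, pointer prompts)
Reviewer ──REVIEWER TAKE──────────────▶ Executor             (before plan.md exists)
Executor    writes plan.md once, runs the deviation self-check
Executor ──ADVICE REQUEST [deviation]─▶ Lead                 (only if the check fails)
Executor    implements; Reviewer reads files via progress updates
Executor ──"code complete"────────────▶ Lead                 (Lead lazy-spawns Tester)
Executor    writes impl.md, broadcasts FILE-UPDATED
Executor ──"ready for review"─────────▶ Reviewer  ┐
Executor ──"ready for test"───────────▶ Tester    ┘ parallel; both must PASS
Lead     ──Implementation approved────▶ pipeline successor; team shuts down
```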

Concurrency

Multiple tasks run in parallel based on plan size:

| Plan size | Concurrent teams |
| --- | --- |
| 1–3 tasks | 1–2 |
| 4–8 tasks | 2–3 |
| 9+ tasks | 3–4 |
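The table above maps directly to a lookup like this (a minimal sketch; the function name and (min, max) tuple shape are assumptions, not the real implementation):

```python
def concurrency_band(task_count: int) -> tuple[int, int]:
    """Return the (min, max) number of concurrent task teams for a plan size.

    Bands come from the concurrency table: 1-3 tasks -> 1-2 teams,
    4-8 tasks -> 2-3 teams, 9+ tasks -> 3-4 teams.
    """
    if task_count <= 3:
        return (1, 2)
    if task_count <= 8:
        return (2, 3)
    return (3, 4)
```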

Tasks normally spawn when their dependencies are completed and a slot is available. As a pipeline optimization, when an Executor signals "code complete", the Lead may pre-spawn the next dependent task into a planning stage if a concurrency slot is free — that successor researches and plans during the predecessor's review/test window, then begins coding only after the predecessor fully passes.

What you see during execution

Between the initial confirmation and the final operational report, the system runs silently. The Lead doesn't narrate what agents are doing.

Checkpoints and session recovery

Checkpoints save automatically every 3 completed tasks, and on any usage pause. If your session dies mid-execution:

  1. Run the same /uc:plan-execution {plan-name} command again
  2. The system detects the checkpoint and shows you the progress so far
  3. Confirm resume — completed tasks are skipped, incomplete tasks are re-spawned with their context

All progress is preserved in the plan directory: task artifacts, shared notes, and checkpoint files survive across sessions.
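The recovery behavior above can be sketched as follows (the checkpoint dict shape and function names are assumptions for illustration; only the every-3-tasks cadence and skip-completed rule come from the docs):

```python
def should_checkpoint(completed_count: int) -> bool:
    """Checkpoints save automatically every 3 completed tasks."""
    return completed_count > 0 and completed_count % 3 == 0

def resume_plan(checkpoint: dict, all_tasks: list) -> list:
    """On resume, skip completed tasks; the rest are re-spawned with context."""
    completed = set(checkpoint.get("completed_tasks", []))
    return [t for t in all_tasks if t not in completed]
```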

Task directory as the team's single source of truth

Every task gets its own directory at documentation/plans/{plan}/tasks/task-N/ containing three files, each written at a different point in the pipeline:

  • task.md — written by the planning mode at Stage 4 (Lead may amend mid-execution). Authoritative task brief: description, files, patterns, **Research:** pointers, success criteria, dependencies. Every team member reads this on startup.
  • plan.md — written by the Executor during its Phase 3. A thin execution delta that extends task.md — concrete function/class/signature choices per file, criterion-to-approach mapping by ID, reviewer-take incorporation, risks. Does not restate task.md content.
  • impl.md — written by the Executor during Phase 4.5 after code complete. An implementation delta — created/modified files with line ranges, exports, INTEGRATION notes for other tasks, GOTCHA notes for library quirks. Does not restate task.md or plan.md content.
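
As an illustration, an impl.md delta might look like this (file names, line ranges, and exports are invented for the example; only the created/modified, exports, INTEGRATION, and GOTCHA elements come from the description above):

```markdown
Created:  src/auth/session.ts (L1–L84)
Modified: src/routes/login.ts (L12–L40)

Exports: createSession(userId), destroySession(token)

INTEGRATION: task-4 should call createSession() rather than setting cookies directly.
GOTCHA: cookie values are URL-encoded by the library; decode before comparing tokens.
```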

Spawn prompts carry no inline task content — they're minimal pointers. All team members (including lazy-spawned Tester, pipeline successors, and crash re-spawns) follow the shared task-team-startup protocol to read the task directory on their first action. File writes broadcast FILE-UPDATED task-N/{file}: reason to active teammates so state stays in sync without verbose messages.

Why Reviewer speaks first

Reviewer's unique value is standards / architecture / patterns knowledge — the stuff it reads from documentation/technology/standards/ and documentation/technology/architecture/ that Executor doesn't touch. In the old model, that knowledge only surfaced after Executor had already written a plan, making Reviewer's input an advisory round-trip on work that was already done.

In the new model, Reviewer front-loads that knowledge as a REVIEWER TAKE message sent to Executor before plan.md is written. Executor incorporates it directly into plan.md as a primary input. Three wins:

  • Parallelizes work. Reviewer reads standards while Executor explores the codebase — no serial bottleneck.
  • Plan.md gets written once. Executor blocks on the take before writing, so plan.md is authored a single time with the take baked in — no draft/feedback/rewrite cycle.
  • Lean plan.md. plan.md is a thin extension of task.md plus the reviewer take, not a multi-section document retrofitted to accommodate teammate concerns.

Reviewer's later role (formal code review on "ready for review") is unchanged. Its upfront voice is new; its review-stage voice is the same.

Lead plan review is gone — ADVICE and deviation self-check

The old blocking "Lead plan review" gate is removed. Rationale: task.md already encodes the authoritative scope (user-approved at Stage 4), REVIEWER TAKE covers standards/architecture fit, and Executor's plan.md is structurally a thin delta that can't silently expand scope without visible deviation.

Before implementing, Executor runs a deviation self-check:

  • Every file plan.md proposes to touch appears in task.md's Files list
  • Every success criterion in task.md has a mapping entry in plan.md
  • plan.md does not contradict any constraint from the REVIEWER TAKE

If all three pass, Executor proceeds to implementation with zero Lead round-trips. If any fails, Executor sends ADVICE REQUEST task-N [deviation]: {reason} and waits for APPROVED (Lead amends task.md and broadcasts) or AMEND: {instructions}.
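A minimal sketch of the three-condition self-check (the dict shapes and field names are assumptions for illustration; the three conditions themselves come from the list above):

```python
def deviation_self_check(task: dict, plan: dict, reviewer_take: dict) -> list:
    """Return a list of deviations; an empty list means Executor may
    proceed to implementation with zero Lead round-trips."""
    problems = []
    # 1. Every file plan.md proposes to touch must appear in task.md's Files list.
    for f in plan["files"]:
        if f not in task["files"]:
            problems.append(f"file outside task scope: {f}")
    # 2. Every success criterion in task.md needs a mapping entry in plan.md.
    for cid in task["criteria"]:
        if cid not in plan["criterion_map"]:
            problems.append(f"unmapped criterion: {cid}")
    # 3. Every REVIEWER TAKE constraint must be addressed, not contradicted.
    for c in reviewer_take["constraints"]:
        if c not in plan["take_incorporated"]:
            problems.append(f"reviewer-take constraint not addressed: {c}")
    return problems
```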

Executor can also pull Lead's judgment at any time (planning or implementation) via ADVICE REQUEST task-N [complicated|deep-reasoning|knowledge]: {context + question}. Those cases are non-blocking — Executor decides whether to wait. ADVICE is distinct from QUERY: (external library docs via /uc:research): ADVICE is for Lead's judgment and orchestration context, QUERY is for external knowledge.

Failure handling

Failures are handled internally within each task team:

  • Reviewer or Tester sends FAIL → Executor fixes → re-submits for review/test
  • Up to 10 fix cycles before escalating to the user
  • If a team member crashes, it's re-spawned with full context from task artifacts
  • If the Lead discovers a plan-invalidating issue, the pipeline pauses until you decide how to proceed
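The fix-cycle budget can be sketched as a simple loop driver (the function and outcome strings are assumptions for illustration; only the 10-cycle cap before user escalation comes from the docs):

```python
MAX_FIX_CYCLES = 10  # escalate to the user after 10 failed fix cycles

def review_test_loop(run_cycle, max_cycles: int = MAX_FIX_CYCLES) -> str:
    """Drive the fix cycle: run_cycle() returns True once both
    Reviewer and Tester PASS in the same cycle."""
    for cycle in range(1, max_cycles + 1):
        if run_cycle():
            return f"done after {cycle} cycle(s)"
    return "escalate to user"
```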

Usage management

Before execution starts, you choose whether to enable extra usage:

  • Extra usage enabled: Runs at full speed. The Haiku watchdog still monitors for stalls but Lead does not perform budget assessments.
  • Extra usage disabled: A Haiku watchdog agent (costing ~$0.03/5h) ticks every minute checking two independent rate-limit windows (5h: 80% soft / 90% hard, 7d: 90% soft / 95% hard) and executor staleness. On alert, it signals the PM, who validates and forwards to Lead with window-qualified context. On soft, Lead lets in-flight work finish and stops spawning until reset. On hard, Lead stops all teams immediately and checkpoints. Lead can also decide not to start task 1 if either window is already elevated.
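The two-window threshold logic can be sketched like this (the function name and fraction inputs are assumptions; the 80/90 and 90/95 thresholds and the soft/hard behaviors come from the description above):

```python
def watchdog_signal(win_5h: float, win_7d: float):
    """Classify rate-limit usage (fractions 0-1) across two independent
    windows. 5h window: 80% soft / 90% hard. 7d window: 90% soft / 95% hard.

    'hard' -> Lead stops all teams immediately and checkpoints.
    'soft' -> Lead lets in-flight work finish, stops spawning until reset.
    None   -> no action.
    """
    if win_5h >= 0.90 or win_7d >= 0.95:
        return "hard"
    if win_5h >= 0.80 or win_7d >= 0.90:
        return "soft"
    return None
```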

The three-agent chain (watchdog → PM → Lead) separates detection (cheap Haiku), coordination (PM validates and adds context), and decisions (Lead knows the work). Per-task budget data (usage % at start/end) is tracked in plan.json for the operational report.