calibrate

Run an evidence-seeking calibration roundtable to realign the plan with the North Star. Use when pausing between phases, when agents disagree, when reviewing work, when the user mentions "calibrate" or "realign", or when making decisions that affect the plan.

$ Installer

git clone https://github.com/Mburdo/knowledge_and_vibes /tmp/knowledge_and_vibes && cp -r /tmp/knowledge_and_vibes/.claude/skills/calibrate ~/.claude/skills/knowledge_and_vibes


Calibrate — Orchestrator

Hard stop. Evidence-based calibration. Realign to North Star.

Pattern: This skill uses the orchestrator-subagent pattern. Each phase runs in a fresh context for optimal performance. See docs/guides/ORCHESTRATOR_SUBAGENT_PATTERN.md.

When This Applies

| Signal | Action |
| --- | --- |
| Phase completion | Run scheduled calibration |
| User says "calibrate" or "realign" | Run full protocol |
| Agents disagree on approach | Run challenge/synthesis |
| Drift detected | Ad-hoc calibration |
| User says "/calibrate" | Run full protocol |

Tool Reference

File Operations

| Tool | Purpose |
| --- | --- |
| Read(north_star_path) | Read North Star Card |
| Read(requirements_path) | Read REQ-/AC- specs |
| Write(file_path, content) | Write phase reports |

Beads/BV

| Command | Purpose |
| --- | --- |
| bd list --json | Get all beads with status |
| bd view <id> | View specific bead |
| bv --robot-summary | Dependency overview |
| bv --robot-alerts | Check for issues |
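
An orchestrator that consumes the JSON-emitting commands above needs a thin wrapper around the CLI. The following is a minimal sketch, not part of the skill itself: run_json and list_beads are hypothetical helper names, and the shape of the JSON that bd returns is not specified here, so the parsed value is passed through untouched. The runner parameter exists only so the wrapper can be exercised without bd installed.

```python
import json
import subprocess
from typing import Any, Callable, Sequence


def run_json(cmd: Sequence[str], runner: Callable[..., Any] = subprocess.run) -> Any:
    """Run a command and parse its stdout as JSON.

    check=True makes a non-zero exit raise CalledProcessError, so a failed
    command is never mistaken for an empty result.
    """
    proc = runner(list(cmd), capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


def list_beads(runner: Callable[..., Any] = subprocess.run) -> Any:
    """Return the parsed output of `bd list --json` (shape is an assumption-free passthrough)."""
    return run_json(["bd", "list", "--json"], runner)
```

Calling list_beads() with no arguments shells out to bd; passing a stub runner keeps the call hermetic in tests.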

Testing

| Command | Purpose |
| --- | --- |
| pytest | Run test suite |
| pytest --cov | Coverage check |
| ubs --staged | Security scan |

Historical Context

| Command | Purpose |
| --- | --- |
| cass search "calibration" --robot --limit 5 | Find past calibration decisions |
| cass search "drift" --robot --days 30 | Find recent drift incidents |
| cm context "calibration for <phase>" --json | Get learned patterns |

Evidence Types

| Type | Example |
| --- | --- |
| Code | src/auth/validator.ts:42 |
| Test | npm test auth → PASS |
| Doc | URL + excerpt |
| Measurement | "Response: 150ms" |
| Discriminating test | Fails A, passes B |

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    CALIBRATE ORCHESTRATOR                        │
│  - Creates session: sessions/calibrate-{timestamp}/              │
│  - Manages TodoWrite state                                       │
│  - Spawns subagents with minimal context                         │
│  - Passes report_path + summary between phases                   │
└─────────────────────────────────────────────────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         │                    │                    │
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│  Coverage Agent │  │   Drift Agent   │  │ Challenge Agent │
│  agents/coverage│  │  agents/drift   │  │ agents/challenge│
│  Fresh context  │  │  Fresh context  │  │  Fresh context  │
└────────┬────────┘  └────────┬────────┘  └────────┬────────┘
         │                    │                    │
         ▼                    ▼                    ▼
    01_coverage.md      02_drift.md        03_challenge.md
         │                    │                    │
         └────────────────────┼────────────────────┘
                              │
         ┌────────────────────┤
         ▼                    ▼
┌─────────────────┐  ┌─────────────────┐
│ Synthesize Agent│  │  Report Agent   │ → Final output to user
│agents/synthesize│  │  agents/report  │
│  Fresh context  │  │  Fresh context  │
└────────┬────────┘  └────────┬────────┘
         │                    │
    04_synthesis.md     05_user_report.md

Subagents

| Phase | Agent | Input | Output |
| --- | --- | --- | --- |
| 1 | agents/coverage.md | requirements, beads | coverage gaps |
| 2 | agents/drift.md | North Star, coverage report | drift items |
| 3 | agents/challenge.md | coverage + drift reports | test results |
| 4 | agents/synthesize.md | all reports | decisions + dissent |
| 5 | agents/report.md | synthesis | user-facing report |
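
The phase dependencies in this table can be written down as data, so that each subagent's input is assembled from earlier report paths rather than from report contents. This is a sketch under assumptions: PIPELINE and build_input are hypothetical names, and fields such as phase_name, north_star_path, requirements_path, and beads_status are left out for brevity.

```python
from typing import Dict, List, Tuple

# (phase key, agent definition, phases whose report paths it consumes)
PIPELINE: List[Tuple[str, str, List[str]]] = [
    ("coverage", "agents/coverage.md", []),
    ("drift", "agents/drift.md", ["coverage"]),
    ("challenge", "agents/challenge.md", ["coverage", "drift"]),
    ("synthesis", "agents/synthesize.md", ["coverage", "drift", "challenge"]),
    ("report", "agents/report.md", ["synthesis"]),
]


def build_input(phase: str, session_dir: str, outputs: Dict[str, dict]) -> Dict[str, str]:
    """Assemble a subagent input that carries report paths only, never report bodies."""
    deps = next(d for name, _agent, d in PIPELINE if name == phase)
    payload = {"session_dir": session_dir}
    for dep in deps:
        # A KeyError here means an upstream phase has not produced its report yet.
        payload[f"{dep}_report_path"] = outputs[dep]["report_path"]
    return payload
```

Keeping the dependency list in one place makes it harder for a later phase to receive more context than it needs.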

Philosophy

Tests adjudicate, not rhetoric. Pursue verifiable truth, not persuasive agreement.

Key insight (DebateCoder, 2025): "Tests are the medium of disagreement, not rhetoric." Rhetorical debate degrades outcomes—voting alone beats extended debate (research/003-debate-or-vote.md).

| Principle | Meaning |
| --- | --- |
| Tests over rhetoric | Disagreements resolved by test results, not persuasion |
| Write discriminating tests | Tests that PASS for one approach, FAIL for another |
| No compromise | Evidence decides winner; don't average opinions |
| Preserve dissent | If tests don't discriminate, present both positions to user |
| User decides when value-dependent | If the "right" answer depends on user preferences, stop and ask |
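
To make "discriminating test" concrete, here is a small illustrative sketch. The two approaches, the email-normalization requirement, and the adjudicate helper are all hypothetical; the point is only that a single test yields different verdicts for the two candidates, which is what lets evidence pick a winner.

```python
from typing import Callable, Dict


def approach_a(email: str) -> str:
    """Candidate A: lowercase only."""
    return email.lower()


def approach_b(email: str) -> str:
    """Candidate B: trim surrounding whitespace, then lowercase."""
    return email.strip().lower()


def discriminating_test(normalize: Callable[[str], str]) -> bool:
    """Hypothetical requirement: surrounding whitespace must not change identity."""
    return normalize("  User@Example.com ") == "user@example.com"


def adjudicate(candidates: Dict[str, Callable[[str], str]],
               test: Callable[[Callable[[str], str]], bool]) -> Dict[str, str]:
    """Run one test against every candidate and record PASS or FAIL."""
    return {name: ("PASS" if test(fn) else "FAIL") for name, fn in candidates.items()}


results = adjudicate({"A": approach_a, "B": approach_b}, discriminating_test)
# The test discriminates only if the candidates received different verdicts.
discriminates = len(set(results.values())) > 1
```

If discriminates were False, the test would say nothing about which approach is better, and under the principles above both positions would be preserved for the user.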

Execution Flow

1. Setup (Orchestrator)

1. Create session directory:
   mkdir -p sessions/calibrate-{timestamp}

2. Initialize TodoWrite with phases:
   - [ ] Phase 1: Coverage Analysis
   - [ ] Phase 2: Drift Detection
   - [ ] Phase 3: Test-Based Challenge
   - [ ] Phase 4: Synthesis
   - [ ] Phase 5: User Report

3. Gather inputs:
   - phase_name: The phase being calibrated
   - north_star_path: Path to North Star Card
   - requirements_path: Path to REQ-*/AC-* file
   - beads_status: bd list --json
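
The setup steps above could be scripted roughly as follows. This is a sketch, not the skill's implementation: the timestamp format is an assumption (the skill only says {timestamp}), and the dict shape returned by initial_todos is a hypothetical stand-in for whatever TodoWrite expects.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Optional

PHASES = [
    "Phase 1: Coverage Analysis",
    "Phase 2: Drift Detection",
    "Phase 3: Test-Based Challenge",
    "Phase 4: Synthesis",
    "Phase 5: User Report",
]


def create_session(root: Path, now: Optional[datetime] = None) -> Path:
    """Create sessions/calibrate-{timestamp}/ under root and return its path."""
    now = now or datetime.now(timezone.utc)
    # Assumed format: compact UTC timestamp, which sorts lexicographically.
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    session_dir = root / "sessions" / f"calibrate-{stamp}"
    session_dir.mkdir(parents=True, exist_ok=True)
    return session_dir


def initial_todos() -> List[Dict[str, str]]:
    """One pending entry per phase, in execution order."""
    return [{"content": phase, "status": "pending"} for phase in PHASES]
```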

2. Phase 1: Coverage Analysis

Spawn: agents/coverage.md

Input:

{
  "phase_name": "<phase>",
  "session_dir": "sessions/calibrate-{timestamp}",
  "requirements_path": "PLAN/01_requirements.md",
  "beads_status": "<bd list --json output>"
}

Expected output:

{
  "report_path": "sessions/.../01_coverage_report.md",
  "p0_coverage": "4/5 (80%)",
  "gaps_summary": "1 P0 missing bead, 1 P0 missing tests"
}
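
The p0_coverage value in this example is a human-readable string. If the orchestrator needs to branch on it, a small parser along these lines may help. It assumes the "covered/total (percent%)" shape shown above, which the skill does not guarantee; parse_coverage and has_p0_gap are hypothetical names.

```python
import re
from typing import Tuple


def parse_coverage(value: str) -> Tuple[int, int]:
    """Parse a string like "4/5 (80%)" into (covered, total)."""
    match = re.fullmatch(r"\s*(\d+)/(\d+)(?:\s*\(\d+%\))?\s*", value)
    if match is None:
        raise ValueError(f"unrecognized coverage string: {value!r}")
    return int(match.group(1)), int(match.group(2))


def has_p0_gap(value: str) -> bool:
    """True when at least one P0 requirement is not covered."""
    covered, total = parse_coverage(value)
    return covered < total
```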

3. Phase 2: Drift Detection

Spawn: agents/drift.md

Input:

{
  "phase_name": "<phase>",
  "session_dir": "sessions/calibrate-{timestamp}",
  "north_star_path": "PLAN/00_north_star.md",
  "coverage_report_path": "<from Phase 1>"
}

Expected output:

{
  "report_path": "sessions/.../02_drift_report.md",
  "alignment_summary": "5/7 ALIGNED, 1 DRIFTING, 1 OFF-TRACK",
  "drift_items": ["NS-1: Auth method", "NS-3: Mobile support"]
}
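
The alignment_summary string uses the alignment status labels from this skill (ALIGNED, DRIFTING, OFF-TRACK). A sketch of turning it into counts is shown here; it assumes the comma-separated "N LABEL" layout of the example, and parse_alignment and drift_detected are hypothetical helpers.

```python
import re
from typing import Dict


def parse_alignment(summary: str) -> Dict[str, int]:
    """Parse "5/7 ALIGNED, 1 DRIFTING, 1 OFF-TRACK" into per-label counts."""
    counts = {"ALIGNED": 0, "DRIFTING": 0, "OFF-TRACK": 0}
    # The optional "/7" after the first number is the total and is skipped.
    pattern = r"(\d+)(?:/\d+)?\s+(ALIGNED|DRIFTING|OFF-TRACK)"
    for number, label in re.findall(pattern, summary):
        counts[label] = int(number)
    return counts


def drift_detected(counts: Dict[str, int]) -> bool:
    """True when any North Star item is drifting or off-track."""
    return counts["DRIFTING"] + counts["OFF-TRACK"] > 0
```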

4. Phase 3: Test-Based Challenge

Spawn: agents/challenge.md

Input:

{
  "session_dir": "sessions/calibrate-{timestamp}",
  "coverage_report_path": "<from Phase 1>",
  "drift_report_path": "<from Phase 2>"
}

Expected output:

{
  "report_path": "sessions/.../03_challenge_report.md",
  "verified_claims": ["NS-1 drift", "NS-3 mobile gap"],
  "unresolved": ["API rate limit assumption"]
}

5. Phase 4: Synthesis

Spawn: agents/synthesize.md

Input:

{
  "session_dir": "sessions/calibrate-{timestamp}",
  "coverage_report_path": "<from Phase 1>",
  "drift_report_path": "<from Phase 2>",
  "challenge_report_path": "<from Phase 3>"
}

Expected output:

{
  "report_path": "sessions/.../04_synthesis_report.md",
  "decisions": [{"action": "Implement SSO", "priority": "P0"}],
  "user_questions": ["Load test timing?"],
  "preserved_dissent": ["API rate limit adequacy"]
}
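
Because dissent is preserved and value-dependent questions go to the user, the orchestrator has to notice when a synthesis result still needs human input. A minimal sketch, using only the field names from the example output above; needs_user_decision and p0_actions are hypothetical helpers.

```python
from typing import Any, List, Mapping


def needs_user_decision(synthesis: Mapping[str, Any]) -> bool:
    """True when open questions or preserved dissent remain for the user."""
    return bool(synthesis.get("user_questions") or synthesis.get("preserved_dissent"))


def p0_actions(synthesis: Mapping[str, Any]) -> List[str]:
    """Actions of all decisions marked P0, in their original order."""
    return [d["action"] for d in synthesis.get("decisions", []) if d.get("priority") == "P0"]
```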

6. Phase 5: User Report

Spawn: agents/report.md

Input:

{
  "session_dir": "sessions/calibrate-{timestamp}",
  "synthesis_report_path": "<from Phase 4>",
  "north_star_path": "PLAN/00_north_star.md"
}

Expected output:

{
  "report_path": "sessions/.../05_user_report.md",
  "summary": {"alignment": "5/7", "blocking": 2},
  "user_questions": ["Load test timing?", "bd-130 scope creep?"]
}

7. Finalize (Orchestrator)

  1. Update TodoWrite (all phases complete)
  2. Read 05_user_report.md
  3. Present to user
  4. Log changes to .beads/change-log.md if decisions made
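
Step 4 could be implemented as an append-only write that is skipped when no decisions were made. The entry layout below is an assumption for illustration; the real layout presumably comes from the change-log-entry.md template, whose contents are not shown in this document. append_change_log is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Mapping


def append_change_log(log_path: Path, session: str,
                      decisions: List[Mapping[str, str]]) -> bool:
    """Append one entry per session; return False (and write nothing) if no decisions."""
    if not decisions:
        return False
    log_path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"## {session}"]
    lines += [f"- [{d['priority']}] {d['action']}" for d in decisions]
    # Append mode keeps earlier entries intact.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n\n")
    return True
```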

Context Optimization

Why subagents beat monolithic calibration:

| Monolithic | Subagent Pattern |
| --- | --- |
| All context in one window | Each phase gets a fresh 200k-token context |
| "Lost in middle" risk | No degradation |
| One failure corrupts all | Phases are isolated |
| ~3000 token prompt | ~500 tokens per phase |

Research backing:

  • research/056-multi-agent-orchestrator.md: +90.2% over single-agent
  • research/004-context-length-hurts.md: Context degradation is real

Evidence Standard (All Subagents)

For any non-trivial claim, include at least one:

| Evidence Type | Example |
| --- | --- |
| Code evidence | src/auth/validator.ts:42 |
| Test evidence | npm test auth → PASS |
| Doc evidence | URL + relevant excerpt |
| Measurement | "Response time: 150ms" |
| Discriminating test | Test that fails one option, passes another |
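
One way to enforce this standard mechanically is to flag any claim that carries no recognized evidence. The sketch below is illustrative only: the Claim structure, the lowercase type keys, and unsupported_claims are hypothetical and not something the subagents are known to emit.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Lowercase keys mirroring the five evidence types listed above (naming is an assumption).
EVIDENCE_TYPES = {"code", "test", "doc", "measurement", "discriminating_test"}


@dataclass
class Claim:
    text: str
    # Each item is (evidence type, reference), e.g. ("code", "src/auth/validator.ts:42").
    evidence: List[Tuple[str, str]] = field(default_factory=list)


def unsupported_claims(claims: List[Claim]) -> List[str]:
    """Return the text of every claim lacking at least one valid, non-empty evidence item."""
    flagged = []
    for claim in claims:
        supported = any(kind in EVIDENCE_TYPES and ref.strip()
                        for kind, ref in claim.evidence)
        if not supported:
            flagged.append(claim.text)
    return flagged
```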

Status Labels

| Category | Values |
| --- | --- |
| Beads | SOUND / FLAWED / UNCERTAIN |
| Alignment | ALIGNED / DRIFTING / OFF-TRACK |
| Assumptions | VERIFIED / UNVERIFIED / RISKY |
| Challenges | ACCEPTED / REJECTED |
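
Since these labels form a closed vocabulary, reports can be checked against them before being passed on. A minimal sketch; STATUS_LABELS and validate_label are hypothetical names, and rejecting lowercase variants is a design choice made here, not a rule stated by the skill.

```python
from typing import Dict, Tuple

STATUS_LABELS: Dict[str, Tuple[str, ...]] = {
    "Beads": ("SOUND", "FLAWED", "UNCERTAIN"),
    "Alignment": ("ALIGNED", "DRIFTING", "OFF-TRACK"),
    "Assumptions": ("VERIFIED", "UNVERIFIED", "RISKY"),
    "Challenges": ("ACCEPTED", "REJECTED"),
}


def validate_label(category: str, value: str) -> str:
    """Return value unchanged if it is allowed for category, else raise ValueError."""
    allowed = STATUS_LABELS.get(category)
    if allowed is None:
        raise ValueError(f"unknown category: {category!r}")
    if value not in allowed:
        raise ValueError(f"{value!r} is not valid for {category}; expected one of {allowed}")
    return value
```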

Anti-Patterns

| Don't | Why |
| --- | --- |
| Compromise for harmony | Truth > harmony |
| Soften criticism | Clarity > comfort |
| Skip pre-work | Unprepared = unproductive |
| Force agreement | Preserve dissent |
| Argue by rhetoric | Evidence only |
| Pass full content between phases | Pass paths + summaries |

Templates

Located in .claude/templates/calibration/:

  • user-report.md — Final output to user
  • broadcast.md — Agent analysis broadcast
  • response.md — Challenge responses
  • decision.md — Falsifiable decisions
  • summary.md — Agent-to-agent summary
  • change-log-entry.md — Plan change records

See Also

  • agents/ — Subagent definitions
  • docs/guides/ORCHESTRATOR_SUBAGENT_PATTERN.md — Pattern documentation
  • docs/workflow/IDEATION_TO_PRODUCTION.md — Complete pipeline
  • docs/workflow/PROTOCOLS.md — Protocol cards