Unnamed Skill
Use when designing/modifying system architecture or evaluating technology choices. Enforces 7-section TodoWrite with 22+ items. Triggers: "design architecture", "system design", "architectural decision", "should we use [tech]", "compare [A] vs [B]", "add new service", "microservices", "database choice", "API design", "scale to [X] users", "infrastructure decision". If thinking ANY of these, USE THIS SKILL: "quick recommendation is fine", "obvious choice", "we already know the answer", "just need to pick one", "simple architecture question".
$ Installer
git clone https://github.com/pvillega/claude-templates /tmp/claude-templates && cp -r /tmp/claude-templates/.claude/skills/architecture-discipline ~/.claude/skills/claude-templates// tip: Run this command in your terminal to install the skill
name: architecture-discipline description: Use when designing/modifying system architecture or evaluating technology choices. Enforces 7-section TodoWrite with 22+ items. Triggers: "design architecture", "system design", "architectural decision", "should we use [tech]", "compare [A] vs [B]", "add new service", "microservices", "database choice", "API design", "scale to [X] users", "infrastructure decision". If thinking ANY of these, USE THIS SKILL: "quick recommendation is fine", "obvious choice", "we already know the answer", "just need to pick one", "simple architecture question".
Architecture Discipline
When to Use
Use when decisions affect:
- Data models or schema changes
- Service boundaries or new services
- Deployment topology or infrastructure
- Scale characteristics (10x growth implications)
- Technology stack choices
- API contracts or external integrations
Skip for:
- Parameter tweaks or configuration changes
- UI/styling changes
- Localization or copy changes
- Bug fixes within existing architecture
- Adding fields to existing models (unless schema migration)
Threshold: If the decision could cause a 3+ month re-architecture project if wrong, use this skill.
CRITICAL: This Is Reasoning Discipline, Not a Checklist
The 7 sections are a reasoning sequence, not boxes to check:
- Alternatives BEFORE choosing
- Scale requirements BEFORE design
- Failure modes built in, not bolted on
🚨 If you wrote architecture without starting with all 7 sections: DELETE and restart. Retrofitting analysis is rationalization, not evaluation.
MANDATORY FIRST STEP
CREATE TodoWrite with these 7 sections (22+ items total):
| Section | Minimum Items |
|---|---|
| Scale Analysis | 4+ |
| Architectural Options | 3+ |
| Ripple Effect Analysis | 5+ |
| Failure Modes | 3+ |
| Observability | 3+ |
| Documentation | 2+ |
| Migration/Compatibility | 2+ |
Do not design, propose solutions, or implement until TodoWrite is verified.
Verification Checkpoint
After creating TodoWrite, verify 3 random items pass this test:
Each item must have ALL THREE:
- ✓ Concrete numbers/thresholds ("100K users", "$500/mo", "P95 < 500ms")
- ✓ Specific tools/technologies ("PostgreSQL", "Redis", "CloudWatch")
- ✓ Measurable outcome ("handles 1M req/sec", "costs $X at 10x")
| ❌ FAILS | ✅ PASSES |
|---|---|
| "Add monitoring" | "CloudWatch: websocket.connections.active, alert if >5% error rate via PagerDuty" |
| "Evaluate caching" | "Compare: Redis (1ms, $300/mo) vs In-memory LRU (0.1ms, $0) vs No cache (100ms)" |
| "Analyze scale" | "Current: 100K DAU, 50 req/sec. 10x: 1M users, 500 req/sec. Bottleneck: PostgreSQL connection pool" |
DO NOT PROCEED until 22+ items AND quality check passes.
Section Requirements
1. Scale Analysis (4+ items)
NEVER design for current scale only. Before proposing any solution:
- Current scale: Users (DAU/MAU), requests/sec, data volume, read/write ratio
- 10x scale: What numbers at 10x? When expected?
- Bottlenecks: What breaks at 10x? (DB connections, API limits, memory)
- Mitigation: Specific solution for each bottleneck
2. Architectural Options (3+ items)
NEVER present single solution. Minimum 3 distinct options, each with:
- Performance: Latency (P50/P95/P99), throughput, scale limit
- Complexity: LOC estimate, services involved, operational burden
- Cost: Infrastructure ($X/mo current, $Y/mo at 10x), development (engineer-weeks)
- Trade-offs: Specific advantages (✅) and disadvantages (❌)
If stakeholder suggests solution: Add as Option A, evaluate with SAME rigor as alternatives.
3. Ripple Effect Analysis (5+ items)
Changes propagate across layers. Analyze ALL:
- Data layer: Schema changes, migrations, indexes, query performance
- Services: Which need updates? API contracts changed?
- API: Breaking changes? Version bump? Backward compatibility?
- Clients: Mobile updates? Web UI changes?
- Operations: Deployment changes? New monitoring? Cost changes?
4. Failure Modes (3+ items)
For each mode:
- Scenario: [Component] fails because [reason]
- Detection: How we know (metrics drop, error rate spike)
- Impact: What breaks (user features, data integrity)
- Mitigation: Circuit breaker, fallback, redundancy
5. Observability (3+ items)
- Metrics: Specific (latency P95, error rate %, throughput)
- Alerts: Conditions (error rate > 5%, latency P95 > 500ms)
- Dashboards: Key visualizations
6. Documentation (2+ items)
- ADR: Chosen option, rejected alternatives, trade-offs, constraints
- Diagram: New components, data flows, failure paths
7. Migration/Compatibility (2+ items)
- Backward compatibility: Old clients work? API versioning?
- Migration path: Phased rollout, feature flags, rollback procedure
Red Flags - STOP When You Think:
| Thought | Reality |
|---|---|
| "Analysis paralysis" | This IS the analysis that prevents expensive mistakes |
| "We'll add scale/alternatives/failure modes later" | Retrofitting costs 5-10x more |
| "CTO already decided" | Still needs independent evaluation |
| "Being pragmatic not dogmatic" | These requirements ARE pragmatic |
| "Just a simple feature" | Simple becomes complex at scale |
| "We already know the solution" | Compare 3 alternatives first |
| "Keep it simple" | Simple for current scale = complex re-architecture at 10x |
| "I can add missing sections to existing work" | DELETE and restart |
Override Requirements
To skip ANY requirement, you MUST provide ALL 4:
- Specific retrofit date (not "later")
- Budget allocated (engineer-weeks)
- Risk acceptance signed by decision maker
- Interim mitigation plan
| Skipped | Risk | Cost |
|---|---|---|
| Scale Analysis | Re-architecture in 6-12 months | 3-6 month project, 5-10x cost |
| Alternatives | Optimize wrong dimension | 2-4 month migration |
| Failure Modes | Production incidents | $5-50K per incident |
| Ripple Effects | Broken clients, data issues | Deployment failures |
Verification Before Complete
| Category | Requirements |
|---|---|
| Scale | ✓ Current + 10x projected ✓ Bottlenecks ✓ Mitigations |
| Trade-offs | ✓ 3+ options ✓ Performance/complexity/cost ✓ Rationale |
| Impact | ✓ All layers analyzed ✓ Breaking changes identified |
| Failure | ✓ Specific modes ✓ Detection ✓ Mitigation ✓ Rollback |
| Documentation | ✓ ADR ✓ Diagram updated |
If any item missing, do not proceed to implementation.
Repository
