kaizen

Continuous improvement methodology for SignalRoom. Use after incidents, when reviewing processes, or when looking for ways to prevent repeat problems. Implements structured retrospectives and improvement cycles.

$ Install

git clone https://github.com/majiayu000/claude-skill-registry /tmp/claude-skill-registry && cp -r /tmp/claude-skill-registry/skills/data/kaizen ~/.claude/skills/claude-skill-registry

// tip: Run this command in your terminal to install the skill

SKILL.md

name: kaizen
description: Continuous improvement methodology for SignalRoom. Use after incidents, when reviewing processes, or when looking for ways to prevent repeat problems. Implements structured retrospectives and improvement cycles.

Kaizen — Continuous Improvement

Philosophy

Kaizen (改善) means "change for better." Small, continuous improvements compound over time.

After every incident or friction point:

Understand what happened
Identify the root cause
Implement a prevention measure
Document the learning

The 5 Whys

Drill down to the root cause by asking "Why?" five times:

Problem: Deployment flooded Slack with errors

Why? → Worker kept failing with auth errors
Why? → Database password was wrong
Why? → Credentials weren't loading from .env
Why? → env_file setting was removed from config
Why? → Engineer conflated unrelated errors and made unnecessary changes

Root Cause: Panic-driven debugging without understanding the problem
Prevention: Pre-deployment checklist, local testing requirement
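
The chain above can also be kept as structured data, so that the problem, the five answers, the root cause, and the prevention measures stay together. The following Python is a minimal sketch and not part of the skill itself: the `FiveWhys` class and its method names are hypothetical, and the five-answer threshold simply mirrors the heuristic described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FiveWhys:
    """Hypothetical record of one 5 Whys analysis (names are illustrative)."""

    problem: str
    whys: list[str] = field(default_factory=list)
    root_cause: str = ""
    prevention: list[str] = field(default_factory=list)

    def ask(self, answer: str) -> FiveWhys:
        """Record the answer to the next "Why?" and return self for chaining."""
        self.whys.append(answer)
        return self

    def is_complete(self) -> bool:
        """True once five answers, a root cause, and a prevention exist."""
        return len(self.whys) >= 5 and bool(self.root_cause) and bool(self.prevention)

    def render(self) -> str:
        """Format the analysis in the same layout as the example above."""
        lines = [f"Problem: {self.problem}", ""]
        lines += [f"Why? → {answer}" for answer in self.whys]
        lines += ["", f"Root Cause: {self.root_cause}"]
        lines.append("Prevention: " + ", ".join(self.prevention))
        return "\n".join(lines)


# Values copied from the worked example above.
analysis = FiveWhys("Deployment flooded Slack with errors")
analysis.ask("Worker kept failing with auth errors")
analysis.ask("Database password was wrong")
analysis.ask("Credentials weren't loading from .env")
analysis.ask("env_file setting was removed from config")
analysis.ask("Engineer conflated unrelated errors and made unnecessary changes")
analysis.root_cause = "Panic-driven debugging without understanding the problem"
analysis.prevention = ["Pre-deployment checklist", "local testing requirement"]
print(analysis.render())
```

Checking `is_complete()` before filing the analysis is one way to catch a chain that stopped at a symptom rather than a cause.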

Post-Incident Template

After any incident, create a record:

# Incident: [Brief Title]
**Date:** YYYY-MM-DD
**Duration:** X minutes
**Impact:** What was affected

## Timeline
- HH:MM - What happened
- HH:MM - What was tried
- HH:MM - What fixed it

## Root Cause
Single sentence explaining the actual cause.

## What Should Have Happened
The correct sequence of actions.

## Prevention Measures
- [ ] Specific action item 1
- [ ] Specific action item 2

## Learnings
What we now know that we didn't before.
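
If incident records are created often, the template above could be filled in from structured fields rather than by hand. The sketch below is one possible approach under stated assumptions: the `Incident` class, its field names, and `render_incident` are hypothetical helpers, not something the skill ships, and the values in the usage lines are placeholders rather than a real incident.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Incident:
    """Hypothetical container for the template fields (names are illustrative)."""

    title: str
    date: str  # YYYY-MM-DD
    duration_minutes: int
    impact: str
    timeline: list[tuple[str, str]] = field(default_factory=list)  # (HH:MM, event)
    root_cause: str = ""
    should_have_happened: str = ""
    prevention: list[str] = field(default_factory=list)
    learnings: str = ""


def render_incident(incident: Incident) -> str:
    """Fill in the post-incident template and return it as Markdown text."""
    lines = [
        f"# Incident: {incident.title}",
        f"**Date:** {incident.date}",
        f"**Duration:** {incident.duration_minutes} minutes",
        f"**Impact:** {incident.impact}",
        "",
        "## Timeline",
    ]
    lines += [f"- {time} - {event}" for time, event in incident.timeline]
    lines += ["", "## Root Cause", incident.root_cause]
    lines += ["", "## What Should Have Happened", incident.should_have_happened]
    lines += ["", "## Prevention Measures"]
    lines += [f"- [ ] {item}" for item in incident.prevention]
    lines += ["", "## Learnings", incident.learnings]
    return "\n".join(lines)


# Placeholder values only; replace them with the details of a real incident.
example = Incident(
    title="Brief Title",
    date="YYYY-MM-DD",
    duration_minutes=0,
    impact="What was affected",
    timeline=[("HH:MM", "What happened"), ("HH:MM", "What fixed it")],
    root_cause="Single sentence explaining the actual cause.",
    should_have_happened="The correct sequence of actions.",
    prevention=["Specific action item 1", "Specific action item 2"],
    learnings="What we now know that we didn't before.",
)
print(render_incident(example))
```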

PDCA Cycle

Plan → Do → Check → Act

| Phase | Action | SignalRoom Example |
|-------|--------|--------------------|
| Plan | Identify improvement | "Add pre-deploy checklist" |
| Do | Implement small change | Add checklist to CLAUDE.md |
| Check | Verify it works | Next deploy follows checklist |
| Act | Standardize or adjust | Update checklist based on feedback |
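
The four phases in the table above can be modelled as a small state machine. This is only an illustrative sketch: `Phase` and `next_phase` are hypothetical names, and wrapping from Act back to Plan is an assumption based on the usual reading of PDCA as a repeating cycle.

```python
from enum import Enum


class Phase(Enum):
    """The four PDCA phases, in the order they are worked through."""

    PLAN = "Plan"
    DO = "Do"
    CHECK = "Check"
    ACT = "Act"


def next_phase(phase: Phase) -> Phase:
    """Return the phase that follows `phase`, wrapping from Act back to Plan."""
    order = list(Phase)
    return order[(order.index(phase) + 1) % len(order)]


# Walk through one full cycle, starting at Plan.
phase = Phase.PLAN
for _ in range(4):
    print(phase.value)
    phase = next_phase(phase)
```

After the loop, `phase` is back at `Phase.PLAN`, which is the point of the cycle: each Act step feeds the next Plan.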

Improvement Categories

Process Improvements

Pre-deployment checklists
Local testing requirements
Code review standards
Documentation updates

Tooling Improvements

Better error messages
Automated checks
Monitoring and alerts
Skills and shortcuts

Knowledge Improvements

Document tribal knowledge
Create runbooks
Update CLAUDE.md
Add skills for common patterns

SignalRoom Improvements Log

Track improvements over time:

| Date | Trigger | Improvement | Location |
|------|---------|-------------|----------|
| 2025-12-19 | Fly.io incident | Deployment discipline section | CLAUDE.md |
| 2025-12-19 | Fly.io incident | Deploy skill with checklist | .claude/skills/deploy |
| 2025-12-19 | Fly.io incident | Root-cause-tracing skill | .claude/skills/root-cause-tracing |
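
Keeping the log as data makes it easy to see which triggers produced which improvements. The following is a minimal sketch, assuming the log is small enough to live in a Python list; the `Improvement` class, the `LOG` list, and `group_by_trigger` are hypothetical names, while the rows themselves are copied from the table above.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Improvement:
    """One row of the improvements log (fields mirror the table columns)."""

    day: date
    trigger: str
    improvement: str
    location: str


# Rows copied from the log above.
LOG = [
    Improvement(date(2025, 12, 19), "Fly.io incident",
                "Deployment discipline section", "CLAUDE.md"),
    Improvement(date(2025, 12, 19), "Fly.io incident",
                "Deploy skill with checklist", ".claude/skills/deploy"),
    Improvement(date(2025, 12, 19), "Fly.io incident",
                "Root-cause-tracing skill", ".claude/skills/root-cause-tracing"),
]


def group_by_trigger(log: list[Improvement]) -> dict[str, list[str]]:
    """Map each trigger to the locations of the improvements it produced."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entry in log:
        grouped[entry.trigger].append(entry.location)
    return dict(grouped)


for trigger, locations in group_by_trigger(LOG).items():
    print(f"{trigger}: {len(locations)} improvements")
```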

Questions to Ask

After any friction:

What took longer than expected?
What caused confusion?
What would have helped to know beforehand?
What will we do differently next time?
How do we prevent this class of problem?

Anti-Patterns to Avoid

Blame Culture

Focus on process failures, not people. Say "The process allowed this to happen," not "X made a mistake."

Improvement Theater

Avoid creating documentation nobody reads. Make improvements actionable and discoverable.

Over-Engineering

Don't build a framework when a checklist will do. Start simple, iterate.

Forgetting

Improvements only work if they're discoverable. Put them where they'll be seen (CLAUDE.md, skills, pre-commit hooks).

Integrating with Skills

Skills are a kaizen mechanism:

After learning something, encode it in a skill
Next time the situation arises, Claude uses the skill automatically
Knowledge persists across sessions

Example: The 2025-12-19 Fly.io deployment incident → deploy skill → future deployments follow the checklist automatically.

The Compound Effect

Day 1: Add pre-deploy checklist
Day 7: Add root-cause-tracing skill
Day 14: Add automated local test before deploy
Day 30: Deployments that used to take 45 minutes now take 5

Small improvements. Consistent application. Compounding results.