Deployment Automation Enforcer

Use when designing deployment pipelines, CI/CD, terraform, or infrastructure automation. Enforces rollback checkpoint then TodoWrite with 19+ items. Triggers: "deploy", "CI/CD", "kubernetes", "terraform". If thinking "rollback later" - use this first.

$ Installer

git clone https://github.com/pvillega/claude-templates /tmp/claude-templates && cp -r /tmp/claude-templates/.claude/skills/deployment-automation-enforcer ~/.claude/skills/claude-templates

Tip: run this command in your terminal to install the skill.

SKILL.md

name: deployment-automation-enforcer
description: Use when designing deployment pipelines, CI/CD, terraform, or infrastructure automation. Enforces rollback checkpoint then TodoWrite with 19+ items. Triggers: "deploy", "CI/CD", "kubernetes", "terraform". If thinking "rollback later" - use this first.

Deployment Automation Enforcer

ROLLBACK CHECKPOINT (COMPLETE FIRST)

MANDATORY before creating TodoWrite:

Rollback script exists? [Path: _______ | or "NEW - will create first"]
Tested in staging? [Date: _______ | or "must test before production"]
Duration measured? [_____ minutes]
Triggers defined? [List: _______]

Why first: 27% skip rollback when checkpoint appears later.
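
The four checkpoint questions can be captured as a small record that reports which answers are still open. This is a minimal sketch, not part of the skill itself: the class name, field names, and the wording of the gap messages are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RollbackCheckpoint:
    """Answers to the four checkpoint questions (field names are hypothetical)."""
    script_path: Optional[str] = None        # None means "NEW - will create first"
    staging_test_date: Optional[str] = None  # None means "must test before production"
    duration_minutes: Optional[float] = None
    triggers: List[str] = field(default_factory=list)

    def gaps(self) -> List[str]:
        """Return the checkpoint questions that still lack a concrete answer."""
        missing = []
        if not self.script_path:
            missing.append("rollback script: NEW - will create first")
        if not self.staging_test_date:
            missing.append("staging test: must test before production")
        if self.duration_minutes is None:
            missing.append("duration: not measured")
        if not self.triggers:
            missing.append("triggers: none defined")
        return missing


# Example: script and triggers are known, staging test and duration are not.
checkpoint = RollbackCheckpoint(
    script_path=".github/workflows/rollback.yml",
    triggers=["error rate > 5% for 3min", "manual"],
)
for gap in checkpoint.gaps():
    print("OPEN:", gap)
```

An empty list from `gaps()` would mean every checkpoint field has a concrete answer and the TodoWrite can be created.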

TodoWrite Requirements

CREATE TodoWrite with 4 sections (19+ items total):

Section	Min Items	Order
Automation	5+	1st
Observability	5+	2nd (BEFORE Failure Recovery)
Failure Recovery	5+	3rd (requires Observability)
Verification	4+	4th

Section order matters: You cannot define failure recovery without observability to detect failures.
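
The section minimums and ordering above could be checked mechanically. The sketch below assumes the TodoWrite is available as an ordered list of `(section name, items)` pairs; that representation and the function name `validate_todo_sections` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Section names, minimum item counts, and order as given in the table above.
REQUIRED_SECTIONS = [
    ("Automation", 5),
    ("Observability", 5),
    ("Failure Recovery", 5),
    ("Verification", 4),
]
MIN_TOTAL_ITEMS = 19


def validate_todo_sections(sections: Sequence[Tuple[str, List[str]]]) -> List[str]:
    """Return a list of problems; an empty list means the structure is valid."""
    problems = []
    names = [name for name, _ in sections]
    expected = [name for name, _ in REQUIRED_SECTIONS]
    if names != expected:
        problems.append(f"section order must be {expected}, got {names}")

    counts = {name: len(items) for name, items in sections}
    for name, minimum in REQUIRED_SECTIONS:
        if counts.get(name, 0) < minimum:
            problems.append(f"{name} needs {minimum}+ items, has {counts.get(name, 0)}")

    total = sum(counts.values())
    if total < MIN_TOTAL_ITEMS:
        problems.append(f"need {MIN_TOTAL_ITEMS}+ items in total, has {total}")
    return problems


# Example: Failure Recovery listed before Observability is reported as an order problem.
plan = [
    ("Automation", ["item"] * 5),
    ("Failure Recovery", ["item"] * 5),
    ("Observability", ["item"] * 5),
    ("Verification", ["item"] * 4),
]
for problem in validate_todo_sections(plan):
    print("FIX:", problem)
```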

Verification Checkpoint

After creating TodoWrite, verify 3 random items:

Each item must have ALL THREE:

✓ Concrete numbers/thresholds ("error rate > 5%", "15 min timeout")
✓ Specific tools ("GitHub Actions", "CloudWatch", "PagerDuty")
✓ Measurable outcome ("rollback tested on [date]", "alert fires within 5min")
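
The three criteria can be approximated with a rough text heuristic. The sketch below is only a first-pass filter under stated assumptions: the tool list is drawn from names mentioned in this skill, and the "measurable outcome" pattern is a guess at typical phrasing. It does not replace reading the item.

```python
import re
from typing import Dict

# Tool names that appear in this skill; extend for your own stack (assumption).
KNOWN_TOOLS = (
    "GitHub Actions", "GitLab CI", "CloudWatch", "Datadog",
    "Grafana", "PagerDuty", "Docker",
)
# Rough signals of a measurable outcome: a comparison, a deadline, or a recorded test.
OUTCOME_HINTS = re.compile(r"[<>]|\bwithin\b|\btested on\b|\bfires\b", re.IGNORECASE)


def specificity(item: str) -> Dict[str, bool]:
    """Report which of the three criteria an item appears to satisfy."""
    lowered = item.lower()
    return {
        "number": bool(re.search(r"\d", item)),
        "tool": any(tool.lower() in lowered for tool in KNOWN_TOOLS),
        "outcome": bool(OUTCOME_HINTS.search(item)),
    }


def passes(item: str) -> bool:
    """An item passes only when all three criteria are met."""
    return all(specificity(item).values())


vague = "Add monitoring"
specific = "CloudWatch: deployment.duration_seconds, PagerDuty alert if error rate > 5% for 3min"
print(vague, "->", specificity(vague))
print(specific, "->", specificity(specific))
```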

❌ FAILS	✅ PASSES
"Add monitoring"	"CloudWatch: `deployment.duration_seconds`, Grafana dashboard at `/dashboards/deployments`, PagerDuty alert if error rate > 5% for 3min"
"Implement rollback"	"Rollback `.github/workflows/rollback.yml` reverts to previous Docker tag from S3 `deployment-history/latest-stable.txt`. Triggers: manual OR error rate > 5% for 3min. Target: < 5 minutes. Test staging on [date]"

Section Requirements

Automation (5+ items)

Identify manual steps in current deployment
Replace with automated scripts/workflows (GitHub Actions, GitLab CI)
Idempotency checks for safe re-runs
Rollback automation for this change
Document exceptions for remaining manual steps
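
As an illustration of the idempotency and rollback items, the sketch below keeps a per-service release history in memory. It is a toy model under stated assumptions: a real pipeline would store this history somewhere durable, and the function names `deploy` and `rollback` are hypothetical.

```python
from typing import Dict, List


def deploy(history: Dict[str, List[str]], service: str, version: str) -> str:
    """Record `version` as live for `service`; re-running with the same version is a no-op."""
    releases = history.setdefault(service, [])
    if releases and releases[-1] == version:
        return "skipped"  # already live, so a re-run changes nothing
    releases.append(version)
    return "deployed"


def rollback(history: Dict[str, List[str]], service: str) -> str:
    """Drop the current release and return the version that is live afterwards."""
    releases = history.get(service, [])
    if len(releases) < 2:
        raise RuntimeError(f"no previous release recorded for {service}")
    releases.pop()
    return releases[-1]


history: Dict[str, List[str]] = {}
print(deploy(history, "api", "v1"))
print(deploy(history, "api", "v2"))
print(deploy(history, "api", "v2"))  # safe re-run
print(rollback(history, "api"))
```

Keeping the previous version on record at deploy time is what makes the rollback step a lookup rather than a search.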

Observability (5+ items) - BEFORE Failure Recovery

Deployment logging (structured: deployment-id, timestamps, steps)
Failure alerts (PagerDuty/SNS on failure, error rate spike)
Metrics (duration, success rate in CloudWatch/Datadog)
Health endpoint (/health returns 200 + dependency status)
Log/metric locations documented
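
The structured logging and health endpoint items might look like the sketch below. The field names, the `degraded` label, and the choice of HTTP 503 when a dependency is down are assumptions; the skill itself only specifies deployment-id, timestamps, steps, and a 200 response with dependency status.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Tuple


def log_step(deployment_id: str, step: str, status: str, **fields) -> str:
    """Emit one JSON log line per deployment step and return it."""
    record = {
        "deployment_id": deployment_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step,
        "status": status,
        **fields,
    }
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


def health_response(dependencies: Dict[str, bool]) -> Tuple[int, str]:
    """Build the status code and JSON body a /health handler could return."""
    healthy = all(dependencies.values())
    body = json.dumps(
        {"status": "ok" if healthy else "degraded", "dependencies": dependencies},
        sort_keys=True,
    )
    # 503 for a failed dependency is an assumed convention, not stated in the skill.
    return (200 if healthy else 503), body


log_step("deploy-001", "migrate", "ok", duration_seconds=12)
print(health_response({"database": True, "cache": False}))
```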

Failure Recovery (5+ items) - AFTER Observability

Failure scenarios defined (won't start, migration fails, health check fails)
Automated rollback triggers (error rate > X%, failed health checks Y minutes)
Health checks post-deployment
Rollback tested in staging (date, duration, success)
Manual recovery documentation as last resort
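
An automated trigger of the form "error rate > X% for Y minutes" can be evaluated over a series of metric samples. The sketch below assumes samples arrive as `(unix_time, error_rate_pct)` pairs, oldest first, and uses the 5% for 3 minutes figures from the earlier example as defaults; the function name and sample format are hypothetical.

```python
from typing import List, Optional, Tuple


def should_rollback(
    samples: List[Tuple[float, float]],
    threshold_pct: float = 5.0,
    sustained_seconds: float = 180.0,
) -> bool:
    """True when the error rate has stayed above the threshold for the whole window."""
    if not samples:
        return False
    breach_start: Optional[float] = None
    for timestamp, rate in samples:
        if rate > threshold_pct:
            if breach_start is None:
                breach_start = timestamp  # first sample of the current breach
        else:
            breach_start = None  # any healthy sample resets the clock
    latest = samples[-1][0]
    return breach_start is not None and latest - breach_start >= sustained_seconds


sustained = [(0, 6.0), (60, 7.5), (120, 8.0), (180, 9.0)]
blip = [(0, 6.0), (60, 2.0), (120, 8.0), (180, 9.0)]
print(should_rollback(sustained))
print(should_rollback(blip))
```

Requiring the breach to be continuous keeps a single noisy sample from triggering a rollback.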

Verification (4+ items)

Pre-deployment tests automated (unit, integration, lint)
Smoke tests post-deployment (critical flows, key endpoints)
Monitoring/alerts verified working (trigger test alert)
Rollback procedure accessible (script in repo, documented)
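
A post-deployment smoke test can be as small as a loop over key endpoints. In the sketch below the HTTP call is injected as a `fetch` function so the example runs without a network; the endpoint paths and the fake responses are placeholders, not values from the skill.

```python
from typing import Callable, Dict, Iterable


def run_smoke_tests(endpoints: Iterable[str], fetch: Callable[[str], int]) -> Dict[str, bool]:
    """Map each endpoint to True when `fetch` reports HTTP 200 for it."""
    return {path: fetch(path) == 200 for path in endpoints}


def fake_fetch(path: str) -> int:
    """Stand-in for a real HTTP client; returns a canned status code per path."""
    canned = {"/health": 200, "/login": 200, "/checkout": 500}
    return canned.get(path, 404)


results = run_smoke_tests(["/health", "/login", "/checkout"], fake_fetch)
failed = [path for path, ok in results.items() if not ok]
print("smoke test failures:", failed)
```

In a pipeline, a non-empty `failed` list would be the signal to stop the rollout or start the rollback.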

Red Flags - STOP When You Think:

Thought	Reality	Data
"Manual deploy is broken, need automation fast"	Automating without rollback creates WORSE problems	27% skip rollback
"We'll add monitoring/rollback after"	Can't detect/recover from failures without them	80% of "add later" items never get added
"Rollback is overkill"	Manual recovery takes 10x+ longer	30+ min manual vs 2 min automated
"We can manually revert"	Detect (no monitoring) + find version (no automation) + apply (error-prone)	30+ min

Response Templates

"We'll Add Rollback Later"

❌ BLOCKED: Cannot deploy without rollback capability.

27% skip rollback when not required upfront
80% of "add later" items never get added
Manual recovery takes 30+ minutes vs 2 minutes automated
Production incidents without rollback = extended downtime + customer impact

Required to override:

Specific retrofit date (not "later")
Budget allocated (engineer-weeks + incident risk cost)
Risk acceptance signed by decision maker
Interim mitigation plan (24/7 on-call? manual monitoring?)

Override Requirements

To skip ANY requirement, provide ALL 4:

Specific retrofit date (not "later")
Budget allocated (engineer-weeks + incident risk)
Risk acceptance signed by decision maker
Interim mitigation plan (24/7 on-call? manual monitoring?)
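
The all-four rule can be expressed as a record that refuses approval while any field is empty. This is a sketch under assumptions: the class and field names are hypothetical, and the skill does not prescribe how an override is stored.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class OverrideRequest:
    """The four items required to skip a requirement (field names are hypothetical)."""
    retrofit_date: Optional[date] = None           # a specific date, not "later"
    budget_engineer_weeks: Optional[float] = None  # allocated budget
    risk_signed_by: Optional[str] = None           # decision maker who accepted the risk
    interim_mitigation: Optional[str] = None       # e.g. "24/7 on-call"

    def missing(self) -> List[str]:
        """Names of the required items that have not been provided."""
        checks = {
            "retrofit date": self.retrofit_date is not None,
            "budget": self.budget_engineer_weeks is not None,
            "risk acceptance": bool(self.risk_signed_by),
            "interim mitigation": bool(self.interim_mitigation),
        }
        return [name for name, provided in checks.items() if not provided]

    def approved(self) -> bool:
        """An override is valid only when all four items are present."""
        return not self.missing()


request = OverrideRequest(retrofit_date=date(2030, 1, 15), risk_signed_by="decision maker")
print("approved:", request.approved(), "| missing:", request.missing())
```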

Self-Grading Before Complete

[ ] 19+ items across 4 sections
[ ] 80%+ items have concrete numbers
[ ] 80%+ items name specific tools
[ ] 100% items have measurable outcomes
[ ] 3 random items pass specificity test
[ ] Rollback checkpoint completed
[ ] Observability BEFORE Failure Recovery (correct order)

Grade 7/7: Ready to proceed
Grade <7: Revise TodoWrite
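
The checklist can be tallied mechanically once each item has been reviewed. The sketch below assumes every TodoWrite item is annotated with three booleans (`has_number`, `names_tool`, `measurable`); those keys and the function name are illustrative, and the per-section count check is simplified to the total item count.

```python
from typing import Dict, List


def self_grade(
    items: List[Dict[str, bool]],
    sample_passed: bool,
    checkpoint_done: bool,
    order_ok: bool,
) -> int:
    """Return how many of the seven checklist lines are satisfied."""
    total = len(items)

    def share(key: str) -> float:
        """Fraction of items for which `key` is true (0.0 for an empty list)."""
        return sum(1 for item in items if item[key]) / total if total else 0.0

    checks = [
        total >= 19,                    # 19+ items
        share("has_number") >= 0.8,     # 80%+ have concrete numbers
        share("names_tool") >= 0.8,     # 80%+ name specific tools
        share("measurable") == 1.0,     # 100% have measurable outcomes
        sample_passed,                  # 3 random items pass the specificity test
        checkpoint_done,                # rollback checkpoint completed
        order_ok,                       # Observability before Failure Recovery
    ]
    return sum(checks)


items = [{"has_number": True, "names_tool": True, "measurable": True} for _ in range(19)]
items[0]["names_tool"] = False  # 18 of 19 still clears the 80% bar
grade = self_grade(items, sample_passed=True, checkpoint_done=True, order_ok=True)
print(f"grade: {grade} of 7 checks passed")
```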

Evidence Collection

Before marking complete:

Automation code link (workflow file URL)
Staging deployment log (screenshot/excerpt)
Monitoring dashboard screenshot
Rollback test evidence (log with timestamp, duration)
Alert test confirmation