
Use when designing deployment pipelines, CI/CD, terraform, or infrastructure automation. Enforces rollback checkpoint then TodoWrite with 19+ items. Triggers: "deploy", "CI/CD", "kubernetes", "terraform". If thinking "rollback later" - use this first.

$ Install

git clone https://github.com/pvillega/claude-templates /tmp/claude-templates && cp -r /tmp/claude-templates/.claude/skills/deployment-automation-enforcer ~/.claude/skills/claude-templates

// tip: Run this command in your terminal to install the skill


name: deployment-automation-enforcer
description: Use when designing deployment pipelines, CI/CD, terraform, or infrastructure automation. Enforces rollback checkpoint then TodoWrite with 19+ items. Triggers: "deploy", "CI/CD", "kubernetes", "terraform". If thinking "rollback later" - use this first.

Deployment Automation Enforcer

ROLLBACK CHECKPOINT (COMPLETE FIRST)

MANDATORY before creating TodoWrite:

  • Rollback script exists? [Path: _______ | or "NEW - will create first"]
  • Tested in staging? [Date: _______ | or "must test before production"]
  • Duration measured? [_____ minutes]
  • Triggers defined? [List: _______]

Why first: 27% skip rollback when checkpoint appears later.
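
One way to make the checkpoint hard to skip is to record the four answers in a small structure and refuse to continue while any is blank. The following is a minimal sketch, not part of the skill itself; the class name `RollbackCheckpoint` and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RollbackCheckpoint:
    """Hypothetical record of the four checkpoint answers."""
    script_path: Optional[str] = None         # path, or "NEW - will create first"
    staging_test_date: Optional[str] = None   # date, or "must test before production"
    duration_minutes: Optional[float] = None  # measured rollback duration
    triggers: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Return the names of checkpoint fields that are still blank."""
        gaps = []
        if not self.script_path:
            gaps.append("script_path")
        if not self.staging_test_date:
            gaps.append("staging_test_date")
        if self.duration_minutes is None:
            gaps.append("duration_minutes")
        if not self.triggers:
            gaps.append("triggers")
        return gaps


checkpoint = RollbackCheckpoint(
    script_path="NEW - will create first",
    staging_test_date="must test before production",
)
print(checkpoint.missing())  # ['duration_minutes', 'triggers']
```

An empty list from `missing()` would mean the checkpoint is complete and the TodoWrite can be created.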


TodoWrite Requirements

CREATE TodoWrite with 4 sections (19+ items total):

Section | Min Items | Order
Automation | 5+ | 1st
Observability | 5+ | 2nd (BEFORE Failure Recovery)
Failure Recovery | 5+ | 3rd (requires Observability)
Verification | 4+ | 4th

Section order matters: You cannot define failure recovery without observability to detect failures.
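
The section minimums and ordering can be checked mechanically. The sketch below assumes the TodoWrite is available as a list of (section name, items) pairs in the order written; the function name `validate_todo` and that input shape are assumptions for illustration.

```python
# Required sections in order, with their minimum item counts (5 + 5 + 5 + 4 = 19).
REQUIRED_SECTIONS = [
    ("Automation", 5),
    ("Observability", 5),
    ("Failure Recovery", 5),
    ("Verification", 4),
]


def validate_todo(sections):
    """Check section order and minimum counts.

    `sections` is a list of (name, items) pairs in the order they appear.
    Returns a list of problem descriptions; an empty list means it passes.
    """
    problems = []
    names = [name for name, _ in sections]
    expected = [name for name, _ in REQUIRED_SECTIONS]
    if names != expected:
        problems.append(f"expected section order {expected}, got {names}")
    counts = {name: len(items) for name, items in sections}
    for name, minimum in REQUIRED_SECTIONS:
        found = counts.get(name, 0)
        if found < minimum:
            problems.append(f"{name}: {found} items, need {minimum}+")
    return problems
```

A TodoWrite that lists Failure Recovery before Observability would be reported as an ordering problem even if every section has enough items.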


Verification Checkpoint

After creating TodoWrite, verify 3 random items:

Each item must have ALL THREE:

  • ✓ Concrete numbers/thresholds ("error rate > 5%", "15 min timeout")
  • ✓ Specific tools ("GitHub Actions", "CloudWatch", "PagerDuty")
  • ✓ Measurable outcome ("rollback tested on [date]", "alert fires within 5min")
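
The first two criteria can be roughly pre-screened with a heuristic before a human review. This is only a sketch: the tool list `KNOWN_TOOLS` is a hand-picked assumption, substring matching will produce false positives, and the third criterion (a measurable outcome) is left to the reviewer because it is not reliably detectable by pattern matching.

```python
import re

# Assumption: a small, hand-maintained list of tool names; extend for your stack.
KNOWN_TOOLS = ("GitHub Actions", "GitLab CI", "CloudWatch", "Datadog",
               "Grafana", "PagerDuty", "SNS", "Docker", "S3")


def specificity_gaps(item: str):
    """Return which of the first two criteria an item appears to be missing."""
    gaps = []
    # Any digit counts as a candidate number or threshold.
    if not re.search(r"\d", item):
        gaps.append("concrete number or threshold")
    lowered = item.lower()
    if not any(tool.lower() in lowered for tool in KNOWN_TOOLS):
        gaps.append("specific tool")
    return gaps


print(specificity_gaps("Add monitoring"))
# ['concrete number or threshold', 'specific tool']
```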
❌ FAILS | ✅ PASSES
"Add monitoring" | "CloudWatch: deployment.duration_seconds, Grafana dashboard at /dashboards/deployments, PagerDuty alert if error rate > 5% for 3min"
"Implement rollback" | "Rollback .github/workflows/rollback.yml reverts to previous Docker tag from S3 deployment-history/latest-stable.txt. Triggers: manual OR error rate > 5% for 3min. Target: < 5 minutes. Test staging on [date]"

Section Requirements

Automation (5+ items)

  • Identify manual steps in current deployment
  • Replace with automated scripts/workflows (GitHub Actions, GitLab CI)
  • Idempotency checks for safe re-runs
  • Rollback automation for this change
  • Document exceptions for remaining manual steps
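
As an illustration of the idempotency item, a deployment step can compare the live version with the target and do nothing when they already match, so a re-run is safe. This is a minimal sketch; `deploy_if_needed` and the `apply` callback are hypothetical names, and how the current version is discovered is left to the pipeline.

```python
def deploy_if_needed(current_version: str, target_version: str, apply) -> bool:
    """Apply the deployment only when the target is not already live.

    `apply` is a caller-supplied function that performs the actual rollout.
    Returns True if `apply` was called, False if the step was a no-op.
    """
    if current_version == target_version:
        return False
    apply(target_version)
    return True
```

Running the step twice with the same target then triggers at most one rollout.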

Observability (5+ items) - BEFORE Failure Recovery

  • Deployment logging (structured: deployment-id, timestamps, steps)
  • Failure alerts (PagerDuty/SNS on failure, error rate spike)
  • Metrics (duration, success rate in CloudWatch/Datadog)
  • Health endpoint (/health returns 200 + dependency status)
  • Log/metric locations documented
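
The structured logging item could look like the following: one JSON object per deployment step carrying the deployment ID, a timestamp, and the step name. The function name `deployment_log_line` and the `status` field are assumptions added for illustration.

```python
import json
from datetime import datetime, timezone


def deployment_log_line(deployment_id: str, step: str, status: str) -> str:
    """Build one structured log line for a deployment step as a JSON string."""
    record = {
        "deployment_id": deployment_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),  # UTC, ISO 8601
        "step": step,
        "status": status,  # hypothetical field, e.g. "started", "ok", "failed"
    }
    return json.dumps(record)


print(deployment_log_line("deploy-42", "run-migrations", "ok"))
```

Emitting every step in the same shape makes it straightforward to filter a whole deployment by its ID in whichever log store is used.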

Failure Recovery (5+ items) - AFTER Observability

  • Failure scenarios defined (won't start, migration fails, health check fails)
  • Automated rollback triggers (error rate > X%, failed health checks Y minutes)
  • Health checks post-deployment
  • Rollback tested in staging (date, duration, success)
  • Manual recovery documentation as last resort
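
An automated trigger of the form "error rate > X% for Y minutes" can be sketched as a check over recent per-minute samples. The defaults of 5% and 3 minutes mirror the example thresholds used elsewhere in this skill; the function name and the one-sample-per-minute input are assumptions.

```python
def should_roll_back(error_rates, threshold_pct=5.0, window_minutes=3):
    """Decide whether the error rate has stayed above the threshold.

    `error_rates` holds one error-rate percentage per minute, oldest first.
    Returns True only when the last `window_minutes` samples all exceed
    `threshold_pct`; too few samples never trigger a rollback.
    """
    if len(error_rates) < window_minutes:
        return False
    recent = error_rates[-window_minutes:]
    return all(rate > threshold_pct for rate in recent)


print(should_roll_back([1.0, 6.2, 7.5, 9.1]))  # True
```

Requiring every sample in the window to exceed the threshold avoids rolling back on a single one-minute spike.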

Verification (4+ items)

  • Pre-deployment tests automated (unit, integration, lint)
  • Smoke tests post-deployment (critical flows, key endpoints)
  • Monitoring/alerts verified working (trigger test alert)
  • Rollback procedure accessible (script in repo, documented)
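
Post-deployment smoke tests can be as small as requesting a handful of key endpoints and collecting any that do not answer with HTTP 200. In this sketch the HTTP call is injected as `fetch_status` so the logic stays testable without a network; that parameter, the function name, and the example paths are all hypothetical.

```python
def run_smoke_tests(base_url, paths, fetch_status):
    """Return (path, status) for every endpoint that did not return 200.

    `fetch_status` is a caller-supplied function that takes a full URL and
    returns the HTTP status code as an int.
    """
    failures = []
    for path in paths:
        status = fetch_status(base_url + path)
        if status != 200:
            failures.append((path, status))
    return failures
```

An empty result means every checked endpoint responded with 200; anything else is a candidate rollback trigger.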

Red Flags - STOP When You Think:

Thought | Reality | Data
"Manual deploy is broken, need automation fast" | Automating without rollback creates WORSE problems | 27% skip rollback
"We'll add monitoring/rollback after" | Can't detect/recover from failures without them | 80% never add "later"
"Rollback is overkill" | Manual recovery ALWAYS takes 10x longer | 30+ min manual vs 2 min automated
"We can manually revert" | Detect (no monitoring) + find version (no automation) + apply (error-prone) | 30+ min

Response Templates

"We'll Add Rollback Later"

BLOCKED: Cannot deploy without rollback capability.

  • 27% skip rollback when not required upfront
  • 80% of "add later" items never get added
  • Manual recovery takes 30+ minutes vs 2 minutes automated
  • Production incidents without rollback = extended downtime + customer impact

Required to override:

  1. Specific retrofit date (not "later")
  2. Budget allocated (engineer-weeks + incident risk cost)
  3. Risk acceptance signed by decision maker
  4. Interim mitigation plan (24/7 on-call? manual monitoring?)

Override Requirements

To skip ANY requirement, provide ALL 4:

  1. Specific retrofit date (not "later")
  2. Budget allocated (engineer-weeks + incident risk)
  3. Risk acceptance signed by decision maker
  4. Interim mitigation plan (24/7 on-call? manual monitoring?)

Self-Grading Before Complete

[ ] 19+ items across 4 sections
[ ] 80%+ items have concrete numbers
[ ] 80%+ items name specific tools
[ ] 100% items have measurable outcomes
[ ] 3 random items pass specificity test
[ ] Rollback checkpoint completed
[ ] Observability BEFORE Failure Recovery (correct order)

Grade 7/7: Ready to proceed
Grade <7: Revise TodoWrite
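
The self-grading arithmetic can be sketched as a function that counts how many of the seven checklist lines pass. All parameter names are hypothetical, and the counts are assumed to come from a manual review of the TodoWrite.

```python
def self_grade(total_items, section_count, with_numbers, with_tools,
               with_outcomes, random_items_passed, checkpoint_done,
               observability_first):
    """Return how many of the seven checklist lines pass."""
    checks = [
        total_items >= 19 and section_count == 4,  # 19+ items across 4 sections
        with_numbers >= 0.8 * total_items,         # 80%+ have concrete numbers
        with_tools >= 0.8 * total_items,           # 80%+ name specific tools
        with_outcomes == total_items,              # 100% have measurable outcomes
        random_items_passed >= 3,                  # 3 random items pass
        checkpoint_done,                           # rollback checkpoint completed
        observability_first,                       # Observability before Failure Recovery
    ]
    return sum(bool(check) for check in checks)


print(self_grade(19, 4, 16, 16, 19, 3, True, True))  # 7
```

With 19 items, the 80% lines need at least 16 qualifying items, since 0.8 × 19 = 15.2.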

Evidence Collection

Before marking complete:

  • Automation code link (workflow file URL)
  • Staging deployment log (screenshot/excerpt)
  • Monitoring dashboard screenshot
  • Rollback test evidence (log with timestamp, duration)
  • Alert test confirmation