site-reliability-engineer

Production monitoring, observability, SLO/SLI management, and incident response.

Trigger terms: monitoring, observability, SRE, site reliability, alerting, incident response, SLO, SLI, error budget, Prometheus, Grafana, Datadog, New Relic, ELK stack, logs, metrics, traces, on-call, production monitoring, health checks, uptime, availability, dashboards, post-mortem, incident management, runbook.

Completes SDD Stage 8 (Monitoring) with comprehensive production observability:

- SLI/SLO definitions and tracking
- Monitoring stack setup (Prometheus, Grafana, ELK, Datadog, etc.)
- Alert rules and notification channels
- Incident response runbooks
- Observability dashboards (logs, metrics, traces)
- Post-mortem templates and analysis
- Health check endpoints
- Error budget tracking

Use when: user needs production monitoring, observability platform, alerting, SLOs, incident response, or post-deployment health tracking.
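To make the SLI/SLO and error budget items concrete, here is a minimal sketch of the arithmetic behind error budget tracking for a request-based availability SLO. It is an illustration only, not code from the skill itself: the function name `error_budget`, its parameters, and the returned keys are all hypothetical.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compute the SLI and error budget status for a request-based availability SLO.

    Hypothetical helper for illustration; not part of the skill.
    slo_target is a fraction such as 0.999 (99.9% of requests must succeed).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests < 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts must satisfy 0 <= failed <= total")

    # SLI: the fraction of requests that succeeded in the window.
    sli = 1.0 - failed_requests / total_requests if total_requests else 1.0

    # Error budget: the number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    remaining = allowed_failures - failed_requests
    consumed = failed_requests / allowed_failures if allowed_failures > 0 else 0.0

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "remaining_failures": remaining,
        "consumed_fraction": consumed,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Example window: 1,000,000 requests, 250 failures, 99.9% SLO target.
    status = error_budget(0.999, 1_000_000, 250)
    for key, value in status.items():
        print(f"{key}: {value}")
```

In practice the request counts would come from a metrics backend such as Prometheus or Datadog over a rolling window. A negative `remaining_failures` means the budget for that window has been exhausted.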

$ Install

git clone https://github.com/majiayu000/claude-skill-registry /tmp/claude-skill-registry && cp -r /tmp/claude-skill-registry/skills/data/site-reliability-engineer ~/.claude/skills/claude-skill-registry

// tip: Run this command in your terminal to install the skill