Monitoring

153 skills in DevOps > Monitoring

server-management

Server management principles and decision-making. Process management, monitoring strategy, and scaling decisions. Teaches thinking, not commands.

xenitV1/claude-code-maestro
62
15
Updated 3d ago

monitoring-observability

Implement comprehensive monitoring, logging, metrics, tracing, and alerting for production applications to ensure reliability and quick incident response. Use when setting up application monitoring, implementing structured logging, creating metrics and dashboards, setting up alerts, implementing distributed tracing, monitoring performance, tracking errors, or building observability into applications.

korallis/Droidz
49
6
Updated 3d ago

sentry-performance-monitoring

Marketplace

Use when setting up performance monitoring, distributed tracing, or profiling with Sentry. Covers transactions, spans, and performance insights.

TheBushidoCollective/han
47
5
Updated 3d ago

sre-monitoring-and-observability

Marketplace

Use when building comprehensive monitoring and observability systems.

TheBushidoCollective/han
47
5
Updated 3d ago

aws-cost-operations

Marketplace

This skill provides AWS cost optimization, monitoring, and operational best practices with integrated MCP servers for billing analysis, cost estimation, observability, and security assessment.

zxkane/aws-skills
40
7
Updated 3d ago

prometheus-monitoring

Set up Prometheus monitoring for applications with custom metrics, scraping configurations, and service discovery. Use when implementing time-series metrics collection, monitoring applications, or building observability infrastructure.

aj-geddes/useful-ai-prompts
25
1
Updated 3d ago

correlation-tracing

Implement distributed tracing with correlation IDs, trace propagation, and span tracking across microservices. Use when debugging distributed systems, monitoring request flows, or implementing observability.

aj-geddes/useful-ai-prompts
25
1
Updated 3d ago

log-aggregation

Implement centralized logging with ELK Stack, Loki, or Splunk for log collection, parsing, storage, and analysis across infrastructure.

aj-geddes/useful-ai-prompts
25
1
Updated 3d ago

dev-sre

Marketplace

Gate 2 of the development cycle. VALIDATES that observability was correctly implemented by developers. Does not implement observability code; it only validates it.

LerianStudio/ring
25
1
Updated 3d ago

infrastructure-monitoring

Set up comprehensive infrastructure monitoring with Prometheus, Grafana, and alerting systems for metrics, health checks, and performance tracking.

aj-geddes/useful-ai-prompts
25
1
Updated 3d ago

qa-observability

Production observability and performance engineering with OpenTelemetry, distributed tracing, metrics, logging, SLO/SLI design, capacity planning, performance profiling, APM integration, and observability maturity progression for modern cloud-native systems.

vasilyu1983/AI-Agents-public
21
6
Updated 2d ago

performance-monitor

Expert performance monitor specializing in system-wide metrics collection, analysis, and optimization. Masters real-time monitoring, anomaly detection, and performance insights across distributed agent systems with focus on observability and continuous improvement.

zenobi-us/dotfiles
21
4
Updated 2d ago

site-reliability-engineer

Production monitoring, observability, SLO/SLI management, and incident response. Trigger terms: monitoring, observability, SRE, site reliability, alerting, incident response, SLO, SLI, error budget, Prometheus, Grafana, Datadog, New Relic, ELK stack, logs, metrics, traces, on-call, production monitoring, health checks, uptime, availability, dashboards, post-mortem, incident management, runbook. Completes SDD Stage 8 (Monitoring) with comprehensive production observability:
- SLI/SLO definitions and tracking
- Monitoring stack setup (Prometheus, Grafana, ELK, Datadog, etc.)
- Alert rules and notification channels
- Incident response runbooks
- Observability dashboards (logs, metrics, traces)
- Post-mortem templates and analysis
- Health check endpoints
- Error budget tracking
Use when: user needs production monitoring, observability platform, alerting, SLOs, incident response, or post-deployment health tracking.

nahisaho/MUSUBI
19
2
Updated 2d ago

grey-haven-observability-engineering

Marketplace

Production-ready monitoring, logging, and tracing using Prometheus, Grafana, OpenTelemetry, Datadog, and Sentry. Use when setting up production monitoring, implementing SLOs, distributed tracing, or performance tracking.

greyhaven-ai/claude-code-config
15
2
Updated 2d ago

Observability Instrumentation

Marketplace

Comprehensive observability methodology implementing the three pillars (logs, metrics, traces) with structured logging using Go slog, Prometheus-style metrics, and distributed tracing patterns. Use when adding observability from scratch, when logs are unstructured or inadequate, when no metrics are collected, when production issues are difficult to debug, or when performance monitoring is needed. Provides structured logging patterns (contextual logging, log levels DEBUG/INFO/WARN/ERROR, request ID propagation), metrics instrumentation (counter/gauge/histogram patterns, Prometheus exposition), tracing setup (span creation, context propagation, sampling strategies), and Go slog best practices (JSON formatting, attribute management, handler configuration). Validated in meta-cc with a 23-46x speedup vs. ad-hoc logging and 90-95% transferability across languages (slog is specific to Go, but the patterns are universal).

yaleh/meta-cc
15
1
Updated 2d ago

ln-367-observability-auditor

Marketplace

Observability audit worker (L3). Checks structured logging, health check endpoints, metrics collection, request tracing, log levels. Returns findings with severity, location, effort, recommendations.

levnikolaevich/claude-code-skills
13
1
Updated 2d ago

Unnamed Skill

Marketplace

Use when setting up monitoring systems, logging, metrics, tracing, or alerting. Invoke for dashboards, Prometheus/Grafana, load testing, profiling, capacity planning. Keywords: monitoring, observability, logging, metrics, tracing, alerting, Prometheus, Grafana.

Jeffallan/claude-skills
12
1
Updated 2d ago

monitoring

Monitoring standards for DevOps environments. Covers best…

williamzujkowski/standards
11
0
Updated 2d ago

agent-performance-monitor

Expert performance monitor specializing in system-wide metrics collection, analysis, and optimization. Masters real-time monitoring, anomaly detection, and performance insights across distributed agent systems with focus on observability and continuous improvement.

Tony363/SuperClaude
10
0
Updated 2d ago