server-management

Server management principles and decision-making. Process management, monitoring strategy, and scaling decisions. Teaches thinking, not commands.

$ Install

git clone https://github.com/xenitV1/claude-code-maestro /tmp/claude-code-maestro && cp -r /tmp/claude-code-maestro/skills/server-management ~/.claude/skills/claude-code-maestro

// tip: Run this command in your terminal to install the skill


name: server-management
description: Server management principles and decision-making. Process management, monitoring strategy, and scaling decisions. Teaches thinking, not commands.

Server Management

Server management principles for production operations. Learn to THINK, not memorize commands.


1. Process Management Principles

Tool Selection

| Scenario | Tool |
|---|---|
| Node.js app | PM2 (clustering, reload) |
| Any app | systemd (Linux native) |
| Containers | Docker/Podman |
| Orchestration | Kubernetes, Docker Swarm |

Process Management Goals

| Goal | What It Means |
|---|---|
| Restart on crash | Auto-recovery |
| Zero-downtime reload | No service interruption |
| Clustering | Use all CPU cores |
| Persistence | Survive server reboot |
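To make the "restart on crash" goal concrete, here is a minimal sketch of a supervising loop. It is not how PM2 or systemd work internally; the function name `supervise`, the restart limit, and the backoff delay are assumptions chosen for illustration.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, base_delay=0.0):
    """Run cmd, restarting it after a non-zero exit, up to max_restarts times.

    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
        if result.returncode == 0:
            break  # clean exit: nothing to recover from
        if attempt < max_restarts:
            # Back off exponentially so a crash loop does not hammer the host.
            time.sleep(base_delay * (2 ** attempt))
    return exit_codes


if __name__ == "__main__":
    # Hypothetical always-crashing child process, used only for illustration.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing, max_restarts=2))
```

In production this job belongs to the process manager chosen above; the sketch only shows why a restart limit and a backoff matter.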

2. Monitoring Principles

What to Monitor

| Category | Key Metrics |
|---|---|
| Availability | Uptime, health checks |
| Performance | Response time, throughput |
| Errors | Error rate, types |
| Resources | CPU, memory, disk |

Alert Severity Strategy

| Level | Response |
|---|---|
| Critical | Immediate action |
| Warning | Investigate soon |
| Info | Review daily |
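The severity levels above can be expressed as a small classification function. This is a sketch only: the choice of error rate as the metric and both threshold values are assumptions, not recommendations from this skill.

```python
from enum import Enum


class Severity(Enum):
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"


# Assumed thresholds for illustration only; tune them per service.
WARNING_ERROR_RATE = 0.01   # 1% of requests failing
CRITICAL_ERROR_RATE = 0.05  # 5% of requests failing

# Expected response for each level, taken from the severity table.
RESPONSE = {
    Severity.CRITICAL: "Immediate action",
    Severity.WARNING: "Investigate soon",
    Severity.INFO: "Review daily",
}


def classify(error_rate: float) -> Severity:
    """Map an error rate (0.0 to 1.0) to an alert severity."""
    if error_rate >= CRITICAL_ERROR_RATE:
        return Severity.CRITICAL
    if error_rate >= WARNING_ERROR_RATE:
        return Severity.WARNING
    return Severity.INFO
```

For example, `RESPONSE[classify(0.02)]` yields the response for a warning-level error rate.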

Monitoring Tool Selection

| Need | Options |
|---|---|
| Simple/Free | PM2 metrics, htop |
| Full observability | Grafana, Datadog |
| Error tracking | Sentry |
| Uptime | UptimeRobot, Pingdom |

3. Log Management Principles

Log Strategy

| Log Type | Purpose |
|---|---|
| Application logs | Debug, audit |
| Access logs | Traffic analysis |
| Error logs | Issue detection |

Log Principles

  1. Rotate logs to prevent disk fill
  2. Structured logging (JSON) for parsing
  3. Appropriate levels (error/warn/info/debug)
  4. No sensitive data in logs
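The four principles above can be combined in one small setup using Python's standard `logging` module. Treat this as a sketch under stated assumptions: the logger name `app`, the `context` attribute, the set of sensitive keys, and the rotation sizes are all hypothetical choices.

```python
import json
import logging
from logging.handlers import RotatingFileHandler

# Hypothetical field names treated as sensitive in this sketch.
SENSITIVE_KEYS = {"password", "token", "api_key"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        # Optional structured fields passed via extra={"context": {...}}.
        context = getattr(record, "context", {})
        redacted = {
            key: "[REDACTED]" if key in SENSITIVE_KEYS else value
            for key, value in context.items()
        }
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "context": redacted,
        }
        return json.dumps(payload)


def build_logger(path: str) -> logging.Logger:
    """Create a logger that writes rotated JSON lines to path."""
    logger = logging.getLogger("app")
    logger.setLevel(logging.INFO)  # appropriate level: drop debug noise
    # Rotate at roughly 1 MB and keep 5 old files so the disk cannot fill.
    handler = RotatingFileHandler(path, maxBytes=1_000_000, backupCount=5)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger
```

A call such as `logger.warning("login failed", extra={"context": {"user": "alice", "password": "..."}})` would then write the user name but replace the password value with `[REDACTED]`.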

4. Scaling Decisions

When to Scale

| Symptom | Solution |
|---|---|
| High CPU | Add instances (horizontal) |
| High memory | Increase RAM or fix leak |
| Slow response | Profile first, then scale |
| Traffic spikes | Auto-scaling |
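The symptom-to-solution mapping above can be read as a decision function. The sketch below is illustrative: the `Snapshot` fields and all three limits are assumptions, and traffic spikes are left out because detecting them needs a time series rather than a single reading.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One point-in-time reading of a server (hypothetical fields)."""
    cpu_percent: float
    memory_percent: float
    p95_latency_ms: float


# Assumed limits for illustration only.
CPU_LIMIT = 80.0
MEMORY_LIMIT = 85.0
LATENCY_LIMIT_MS = 500.0


def recommend(snapshot: Snapshot) -> str:
    """Return the table's suggested action for the first symptom found."""
    if snapshot.cpu_percent >= CPU_LIMIT:
        return "Add instances (horizontal)"
    if snapshot.memory_percent >= MEMORY_LIMIT:
        return "Increase RAM or fix leak"
    if snapshot.p95_latency_ms >= LATENCY_LIMIT_MS:
        # Latency without resource pressure is a code or query problem first.
        return "Profile first, then scale"
    return "No scaling needed"
```

Note that slow responses without CPU or memory pressure lead to profiling, not to more hardware.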

Scaling Strategy

| Type | When to Use |
|---|---|
| Vertical | Quick fix, single instance |
| Horizontal | Sustainable, distributed |
| Auto | Variable traffic |

5. Health Check Principles

What Constitutes Healthy

| Check | Meaning |
|---|---|
| HTTP 200 | Service responding |
| Database connected | Data accessible |
| Dependencies OK | External services reachable |
| Resources OK | CPU/memory not exhausted |

Health Check Implementation

  • Simple: Just return 200
  • Deep: Check all dependencies
  • Choose based on load balancer needs
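The difference between a simple and a deep health check can be sketched without any web framework. The function names, the response shape, and the choice of 503 for a failing dependency are assumptions made for this example.

```python
from typing import Callable, Dict, Tuple


def simple_health() -> Tuple[int, dict]:
    """Simple check: the process is up and can answer."""
    return 200, {"status": "ok"}


def deep_health(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Deep check: run every dependency probe and report each result."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A probe that raises counts as an unhealthy dependency.
            results[name] = False
    healthy = all(results.values())
    status_code = 200 if healthy else 503
    body = {"status": "ok" if healthy else "degraded", "checks": results}
    return status_code, body
```

A caller might pass probes such as `{"database": ping_db, "cache": ping_cache}`, where both probe functions are hypothetical. A deep check is slower and can fail for reasons outside the instance, so it suits readiness decisions better than frequent load balancer polling.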

6. Security Principles

| Area | Principle |
|---|---|
| Access | SSH keys only, no passwords |
| Firewall | Only needed ports open |
| Updates | Regular security patches |
| Secrets | Environment variables, not files |
| Audit | Log access and changes |
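As an illustration of the secrets principle, the sketch below reads a secret from the environment and fails fast when it is missing. The helper `require_secret`, the exception name, and the variable name in the usage note are hypothetical.

```python
import os


class MissingSecret(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name: str) -> str:
    """Return the value of environment variable name, or fail loudly."""
    value = os.environ.get(name)
    if not value:
        # Report the variable name only; never echo secret values.
        raise MissingSecret(f"environment variable {name} is not set")
    return value
```

Calling `require_secret("DB_PASSWORD")` at startup surfaces a misconfiguration before the first request, instead of at the first database call.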

7. Troubleshooting Priority

When something's wrong:

  1. Check if running (process status)
  2. Check logs (error messages)
  3. Check resources (disk, memory, CPU)
  4. Check network (ports, DNS)
  5. Check dependencies (database, APIs)
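The ordered checklist above can be sketched as a runner that stops at the first failing step, so the cheapest and most likely causes are ruled out first. The helper names and the 10% free-disk threshold are assumptions for illustration.

```python
import shutil
import socket
from typing import Callable, List, Optional, Tuple


def disk_has_space(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """Resource check: is at least min_free_fraction of the disk free?"""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Network check: can a TCP connection be opened to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failure(steps: List[Tuple[str, Callable[[], bool]]]) -> Optional[str]:
    """Run steps in priority order and return the name of the first failure."""
    for name, check in steps:
        if not check():
            return name
    return None  # every step passed
```

A step list might look like `[("resources", disk_has_space), ("network", lambda: port_is_open("127.0.0.1", 5432))]`, where the host and port are placeholders for a real dependency.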

8. Anti-Patterns

| ❌ Don't | ✅ Do |
|---|---|
| Run as root | Use non-root user |
| Ignore logs | Set up log rotation |
| Skip monitoring | Monitor from day one |
| Manual restarts | Auto-restart config |
| No backups | Regular backup schedule |

Remember: A well-managed server is boring. That's the goal.