Marketplace

kubernetes-specialist

Design, operate, and troubleshoot Kubernetes clusters with guardrails

allowed_tools: Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite

model: sonnet

$ Instalar

git clone https://github.com/DNYoussef/context-cascade /tmp/context-cascade && cp -r /tmp/context-cascade/skills/operations/cloud-platforms/kubernetes-specialist ~/.claude/skills/context-cascade

// tip: Run this command in your terminal to install the skill

SKILL.md

View on GitHub →

name: kubernetes-specialist description: Design, operate, and troubleshoot Kubernetes clusters with guardrails allowed-tools: Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite model: sonnet x-version: 3.2.0 x-category: operations x-vcl-compliance: v3.1.1 x-cognitive-frames:

HON
MOR
COM
CLS
EVD
ASP
SPC

STANDARD OPERATING PROCEDURE

Purpose

Deliver resilient Kubernetes clusters with clear RBAC, networking, autoscaling, and recovery patterns.

Trigger Conditions

Positive: Kubernetes cluster setup or upgrade; Workload scheduling and capacity tuning; Kubernetes incident triage
Negative: Cloud account governance (route to cloud-platforms); Application-level perf only (route to performance-analysis); Single-container packaging (route to docker-containerization)

Guardrails

Structure-first: keep SKILL.md aligned with examples/, tests/, and any resources/references so downstream agents always have scaffolding.
Adversarial validation is mandatory: cover boundary cases, failure paths, and rollback drills before declaring the SOP complete.
Prompt hygiene: separate hard vs. soft vs. inferred constraints and confirm inferred constraints before acting.
Explicit confidence ceilings: format as 'Confidence: X.XX (ceiling: TYPE Y.YY)' and never exceed the ceiling for the claim type.
MCP traceability: tag sessions WHO=operations-{name}-{session_id}, WHY=skill-execution, and capture evidence links in outputs.
Avoid anti-patterns: undocumented changes, missing rollback paths, skipped tests, or unbounded automation without approvals.

Required Artifacts

SKILL.md (this SOP)
metadata.json for registry details

Execution Phases

Assess cluster health
- Capture versions, control plane status, and add-ons
- Review policies: RBAC, OPA, network policies, quotas
- Map workloads, capacity, and SLOs
Plan topology and workloads
- Design namespace model, ingress/egress, storage classes
- Define deployment strategy (Helm/Kustomize) with approvals
- Set autoscaling policies and resource limits/requests
Execute changes
- Apply manifests or upgrades in staged environments
- Validate admission controls and runtime security
- Tune nodes, CNI, and observability hooks
Validate resilience
- Run health checks, conformance tests, and chaos/DR drills
- Verify backup/restore for critical data
- Document runbooks and escalation paths

Output Format

Cluster profile with risks and dependencies
Deployment plan with manifests/Helm references
Operational controls (RBAC, quotas, policies) documented
Validation results (conformance, performance, DR) with evidence
Runbook updates and contact paths

Validation Checklist

API/server health verified pre/post changes
RBAC, network, and quota policies defined and applied
Autoscaling and capacity checks completed
Backup/restore or snapshot path identified
Confidence ceiling stated for cluster readiness

Confidence: 0.70 (ceiling: inference 0.70) - Kubernetes SOP aligns to guardrails and staged validation