Marketplace

kubernetes-specialist

Design, operate, and troubleshoot Kubernetes clusters with guardrails

allowed_tools: Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite
model: sonnet

$ Instalar

git clone https://github.com/DNYoussef/context-cascade /tmp/context-cascade && cp -r /tmp/context-cascade/skills/operations/cloud-platforms/kubernetes-specialist ~/.claude/skills/context-cascade

// tip: Run this command in your terminal to install the skill


name: kubernetes-specialist description: Design, operate, and troubleshoot Kubernetes clusters with guardrails allowed-tools: Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite model: sonnet x-version: 3.2.0 x-category: operations x-vcl-compliance: v3.1.1 x-cognitive-frames:

  • HON
  • MOR
  • COM
  • CLS
  • EVD
  • ASP
  • SPC

STANDARD OPERATING PROCEDURE

Purpose

Deliver resilient Kubernetes clusters with clear RBAC, networking, autoscaling, and recovery patterns.

Trigger Conditions

  • Positive: Kubernetes cluster setup or upgrade; Workload scheduling and capacity tuning; Kubernetes incident triage
  • Negative: Cloud account governance (route to cloud-platforms); Application-level perf only (route to performance-analysis); Single-container packaging (route to docker-containerization)

Guardrails

  • Structure-first: keep SKILL.md aligned with examples/, tests/, and any resources/references so downstream agents always have scaffolding.
  • Adversarial validation is mandatory: cover boundary cases, failure paths, and rollback drills before declaring the SOP complete.
  • Prompt hygiene: separate hard vs. soft vs. inferred constraints and confirm inferred constraints before acting.
  • Explicit confidence ceilings: format as 'Confidence: X.XX (ceiling: TYPE Y.YY)' and never exceed the ceiling for the claim type.
  • MCP traceability: tag sessions WHO=operations-{name}-{session_id}, WHY=skill-execution, and capture evidence links in outputs.
  • Avoid anti-patterns: undocumented changes, missing rollback paths, skipped tests, or unbounded automation without approvals.

Required Artifacts

  • SKILL.md (this SOP)
  • metadata.json for registry details

Execution Phases

  1. Assess cluster health

    • Capture versions, control plane status, and add-ons
    • Review policies: RBAC, OPA, network policies, quotas
    • Map workloads, capacity, and SLOs
  2. Plan topology and workloads

    • Design namespace model, ingress/egress, storage classes
    • Define deployment strategy (Helm/Kustomize) with approvals
    • Set autoscaling policies and resource limits/requests
  3. Execute changes

    • Apply manifests or upgrades in staged environments
    • Validate admission controls and runtime security
    • Tune nodes, CNI, and observability hooks
  4. Validate resilience

    • Run health checks, conformance tests, and chaos/DR drills
    • Verify backup/restore for critical data
    • Document runbooks and escalation paths

Output Format

  • Cluster profile with risks and dependencies
  • Deployment plan with manifests/Helm references
  • Operational controls (RBAC, quotas, policies) documented
  • Validation results (conformance, performance, DR) with evidence
  • Runbook updates and contact paths

Validation Checklist

  • API/server health verified pre/post changes
  • RBAC, network, and quota policies defined and applied
  • Autoscaling and capacity checks completed
  • Backup/restore or snapshot path identified
  • Confidence ceiling stated for cluster readiness

Confidence: 0.70 (ceiling: inference 0.70) - Kubernetes SOP aligns to guardrails and staged validation

Repository

DNYoussef
DNYoussef
Author
DNYoussef/context-cascade/skills/operations/cloud-platforms/kubernetes-specialist
8
Stars
2
Forks
Updated5d ago
Added1w ago