
ops-platform-onboarding

Structured workflow for onboarding new services to the internal platform including infrastructure provisioning, observability setup, and documentation.

Install

Run this command in your terminal to install the skill:

```bash
git clone https://github.com/LerianStudio/ring /tmp/ring && cp -r /tmp/ring/ops-team/skills/ops-platform-onboarding ~/.claude/skills/ring
```


```yaml
name: ops-platform-onboarding
description: |
  Structured workflow for onboarding new services to the internal platform including infrastructure provisioning, observability setup, and documentation.
trigger: |
  - New service deployment to platform
  - Service migration to platform
  - Platform capability adoption
  - Team onboarding to platform
skip_when: |
  - Application development -> use ring-dev-team specialists
  - Existing service configuration changes -> standard change management
  - Non-platform infrastructure -> use ops-infrastructure-architect
related:
  similar: [dev-cycle]
  uses: [platform-engineer]
```

Platform Onboarding Workflow

This skill defines the structured process for onboarding services to the internal developer platform. Use it to ensure consistent, compliant service deployments.


Onboarding Phases

| Phase | Focus | Output |
|-------|-------|--------|
| 1. Requirements | Gather service requirements | Requirements doc |
| 2. Golden Path Selection | Choose deployment pattern | Selected template |
| 3. Infrastructure Provisioning | Create service resources | Infrastructure ready |
| 4. Observability Setup | Configure monitoring | Dashboards/alerts |
| 5. Security Configuration | Apply security controls | Security validated |
| 6. Documentation | Complete service docs | Runbook ready |
| 7. Handoff | Transfer to service team | Ownership confirmed |
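The phases run in strict sequence because each output feeds the next phase. The sketch below assumes progress is recorded by hand. The hypothetical `OnboardingTracker` only lets phases complete in table order and reports the deliverable still owed:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# (phase, expected output), in the order given by the phase table.
PHASES = [
    ("Requirements", "Requirements doc"),
    ("Golden Path Selection", "Selected template"),
    ("Infrastructure Provisioning", "Infrastructure ready"),
    ("Observability Setup", "Dashboards/alerts"),
    ("Security Configuration", "Security validated"),
    ("Documentation", "Runbook ready"),
    ("Handoff", "Ownership confirmed"),
]


@dataclass
class OnboardingTracker:
    """Hypothetical tracker that only lets phases complete in table order."""

    completed: List[str] = field(default_factory=list)

    def current_phase(self) -> Optional[str]:
        """Name of the next phase to work on, or None when onboarding is done."""
        if len(self.completed) == len(PHASES):
            return None
        return PHASES[len(self.completed)][0]

    def expected_output(self) -> Optional[str]:
        """Deliverable the current phase must produce, or None when done."""
        if self.current_phase() is None:
            return None
        return PHASES[len(self.completed)][1]

    def complete(self, phase: str) -> None:
        """Mark `phase` done; raises ValueError if it is not the current phase."""
        current = self.current_phase()
        if phase != current:
            raise ValueError(f"cannot complete {phase!r}; current phase is {current!r}")
        self.completed.append(phase)
```

For example, after `tracker.complete("Requirements")`, `tracker.current_phase()` returns `"Golden Path Selection"` and `tracker.expected_output()` returns `"Selected template"`.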

Phase 1: Requirements Gathering

Service Requirements Checklist

```markdown
## Service Onboarding Request

**Service Name:** [name]
**Team:** [owning team]
**Requested By:** [name]
**Target Date:** YYYY-MM-DD

### Service Information

| Attribute | Value |
|-----------|-------|
| Service type | [API / Worker / Batch / Frontend] |
| Language/runtime | [Go / Node.js / Python / etc.] |
| Criticality | [Tier 1/2/3/4] |
| External traffic | [Yes / No] |
| Data sensitivity | [PII / Financial / Public] |

### Resource Requirements

| Resource | Requirement | Notes |
|----------|-------------|-------|
| CPU | [cores] | [peak/average] |
| Memory | [GB] | [peak/average] |
| Storage | [GB] | [type: SSD/HDD] |
| Database | [type] | [shared/dedicated] |
| Cache | [type] | [shared/dedicated] |

### Dependencies

| Dependency | Type | SLA Required |
|------------|------|--------------|
| [service] | Internal | [Yes/No] |
| [external] | External | [Yes/No] |

### Compliance Requirements

- [ ] SOC2
- [ ] PCI-DSS
- [ ] GDPR
- [ ] HIPAA
- [ ] Other: ____________
```
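Bouncing an incomplete request at intake costs less than finding the gap during provisioning. This sketch assumes the request has already been parsed into a dict. Its snake_case keys (`service_name`, `target_date`, `service_type`, and so on) are hypothetical names for the template fields, and each bracketed field is assumed to hold a single value. `validate_request` is illustrative and not part of the skill.

```python
from datetime import date
from typing import Dict, List

# Allowed values copied from the bracketed options in the template.
ALLOWED = {
    "service_type": {"API", "Worker", "Batch", "Frontend"},
    "criticality": {"Tier 1", "Tier 2", "Tier 3", "Tier 4"},
    "external_traffic": {"Yes", "No"},
    "data_sensitivity": {"PII", "Financial", "Public"},
}
REQUIRED = ["service_name", "team", "requested_by", "target_date", "language", *ALLOWED]


def validate_request(req: Dict[str, str], today: date) -> List[str]:
    """Return problems found; an empty list means the request can proceed."""
    problems = [f"missing field: {key}" for key in REQUIRED if not req.get(key)]
    for key, options in ALLOWED.items():
        value = req.get(key)
        if value and value not in options:
            problems.append(f"{key}={value!r} is not one of {sorted(options)}")
    target = req.get("target_date")
    if target:
        try:
            if date.fromisoformat(target) < today:
                problems.append("target_date is in the past")
        except ValueError:
            problems.append("target_date must be YYYY-MM-DD")
    return problems
```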

Phase 2: Golden Path Selection

Available Golden Paths

| Golden Path | Use Case | Includes |
|-------------|----------|----------|
| api-service | REST/GraphQL APIs | ALB, EKS, RDS, ElastiCache |
| worker-service | Background processing | SQS, EKS, auto-scaling |
| batch-job | Scheduled jobs | EventBridge, Lambda/Fargate |
| frontend-app | Static sites, SPAs | CloudFront, S3, API Gateway |
| data-pipeline | ETL, streaming | Kinesis, Glue, S3 |

Golden Path Selection Matrix

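| Requirement | api-service | worker-service | batch-job |
|-------------|-------------|----------------|-----------|
| HTTP traffic | Yes | No | No |
| Queue processing | Optional | Yes | Optional |
| Scheduled runs | No | No | Yes |
| Real-time | Yes | Near-real-time | No |
| Auto-scaling | Yes | Yes | N/A |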
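The matrix works as a compatibility filter. This sketch relies on an interpretation the table does not state. `Yes` means the path is built around that requirement, `No` means the path does not support it, and `Optional` works either way. Only the three boolean rows are encoded; real-time and auto-scaling are left to judgment.

```python
from typing import List

# Cells from the selection matrix (three boolean rows only).
MATRIX = {
    "api-service":    {"http": "Yes", "queue": "Optional", "scheduled": "No"},
    "worker-service": {"http": "No",  "queue": "Yes",      "scheduled": "No"},
    "batch-job":      {"http": "No",  "queue": "Optional", "scheduled": "Yes"},
}


def cell_allows(cell: str, needed: bool) -> bool:
    """Assumed reading: Optional fits anything; Yes/No must match the need."""
    if cell == "Optional":
        return True
    return (cell == "Yes") == needed


def candidate_paths(http: bool, queue: bool, scheduled: bool) -> List[str]:
    """Golden paths whose matrix column is compatible with every requirement."""
    needs = {"http": http, "queue": queue, "scheduled": scheduled}
    return [
        path for path, cells in MATRIX.items()
        if all(cell_allows(cells[req], needs[req]) for req in needs)
    ]
```

For example, `candidate_paths(http=True, queue=True, scheduled=False)` returns `["api-service"]`. An empty result, such as HTTP traffic combined with scheduled runs, means no standard path fits. That is the cue to fill in the Customizations Required table in the selection template.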

Selection Template

```markdown
## Golden Path Selection

**Service:** [name]
**Selected Path:** [api-service / worker-service / etc.]

### Rationale

1. Service type [X] matches [golden path] pattern
2. Traffic requirements of [X] supported by [features]
3. Compliance requirements met by built-in [controls]

### Customizations Required

| Standard Component | Customization | Reason |
|--------------------|---------------|--------|
| [component] | [change] | [why] |

### Approval

- [ ] Platform team reviewed
- [ ] Security team reviewed (if customizations)
- [ ] Architecture team reviewed (if non-standard)
```
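The approval rules in the template reduce to a small function. This sketch assumes "non-standard" means "not one of the five listed golden paths". It also enforces the Anti-Rationalization rule that every deviation carries a reason. `required_reviews` is a hypothetical helper.

```python
from typing import Dict, List

# The five golden paths from the Available Golden Paths table.
GOLDEN_PATHS = {"api-service", "worker-service", "batch-job", "frontend-app", "data-pipeline"}


def required_reviews(selected_path: str, customizations: List[Dict[str, str]]) -> List[str]:
    """Reviewers needed before the golden path selection is approved."""
    unjustified = [c.get("component", "?") for c in customizations if not c.get("reason")]
    if unjustified:
        raise ValueError(f"customizations without a reason: {unjustified}")
    reviews = ["Platform team"]
    if customizations:
        reviews.append("Security team")
    if selected_path not in GOLDEN_PATHS:
        # Assumption: "non-standard" means no listed golden path was used.
        reviews.append("Architecture team")
    return reviews
```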

Phase 3: Infrastructure Provisioning

Provisioning Checklist

  • Namespace/project created
  • Compute resources provisioned
  • Database provisioned (if required)
  • Cache provisioned (if required)
  • Load balancer configured
  • DNS entries created
  • SSL certificates provisioned
  • Secrets stored in vault
  • IAM roles/service accounts created

Terraform/IaC Template

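```hcl
# Example service provisioning
variable "service_name" { type = string }
variable "team" { type = string }
variable "environment" { type = string }
variable "cost_center" { type = string }

module "service" {
  # Placeholder for the platform's service module; replace with its
  # registry address (namespace/name/provider) or a local path.
  source = "platform/service-template"

  service_name    = var.service_name
  team            = var.team
  environment     = var.environment
  golden_path     = "api-service"

  # Compute (Kubernetes resource units)
  cpu_request     = "500m"
  memory_request  = "512Mi"
  replicas_min    = 2
  replicas_max    = 10

  # Database
  database_enabled = true
  database_class   = "db.t3.medium"

  # Tags
  tags = {
    Team        = var.team
    Environment = var.environment
    CostCenter  = var.cost_center
  }
}
```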

Provisioning Verification

```bash
# Verify namespace
kubectl get namespace [service-name]

# Verify compute
kubectl get deployment -n [service-name]

# Verify database
aws rds describe-db-instances --db-instance-identifier [service-db]

# Verify DNS
dig [service-name].internal.example.com
```
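The manual checks above can be scripted so that a failed step stands out. This sketch assumes the namespace equals the service name, as in the commands above, and that `kubectl`, `aws`, and `dig` are on `PATH`. The `runner` parameter exists only so the checks can be exercised without a cluster. `dig` exits 0 even when a name does not resolve, so the DNS check uses `+short` and treats empty output as a failure.

```python
import subprocess
from typing import Callable, Dict, List


def verification_commands(service: str, db_id: str, domain: str) -> Dict[str, List[str]]:
    """Argument lists mirroring the manual checks; `service` is also the namespace."""
    return {
        "namespace": ["kubectl", "get", "namespace", service],
        "compute": ["kubectl", "get", "deployment", "-n", service],
        "database": ["aws", "rds", "describe-db-instances",
                     "--db-instance-identifier", db_id],
        # +short prints only the answer, so empty output means no record.
        "dns": ["dig", "+short", f"{service}.{domain}"],
    }


def run_checks(commands: Dict[str, List[str]],
               runner: Callable = subprocess.run) -> Dict[str, bool]:
    """Run each command; a check passes on exit code 0 with non-empty output."""
    results = {}
    for name, cmd in commands.items():
        try:
            proc = runner(cmd, capture_output=True, text=True, timeout=30)
            results[name] = proc.returncode == 0 and bool(proc.stdout.strip())
        except (FileNotFoundError, subprocess.TimeoutExpired):
            # A missing CLI or a hung call counts as a failed check.
            results[name] = False
    return results
```

Usage: `run_checks(verification_commands("billing-api", "billing-api-db", "internal.example.com"))` returns one boolean per check. The service and DB names here are placeholders.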

Phase 4: Observability Setup

Observability Checklist

  • Structured logging configured
  • Tracing instrumentation added
  • Metrics endpoints exposed
  • Service dashboard created
  • SLI/SLO defined
  • Alerts configured
  • On-call integration set up
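A common baseline for "Structured logging configured" is one JSON object per log line with the service name attached. That lets logs be filtered and joined to traces. Below is a minimal standard-library sketch. The field names (`ts`, `level`, `service`, `msg`, `trace_id`) are assumptions, not a platform-mandated schema.

```python
import json
import logging
import sys
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "service": self.service,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # A trace ID passed via `extra=` lets logs be joined to traces.
        trace_id = getattr(record, "trace_id", None)
        if trace_id:
            payload["trace_id"] = trace_id
        return json.dumps(payload)


def make_logger(service: str, stream: TextIO = sys.stdout) -> logging.Logger:
    """Logger that writes JSON lines to `stream` and does not propagate."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Call it as `make_logger("billing-api").info("charge created", extra={"trace_id": tid})`, where `tid` comes from the tracing library in use.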

Dashboard Template

Standard service dashboard includes:

| Panel | Metrics |
|-------|---------|
| Request rate | requests/sec, by status code |
| Error rate | 5xx rate, 4xx rate |
| Latency | p50, p95, p99 |
| Saturation | CPU, memory utilization |
| Dependencies | Upstream/downstream health |
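This sketch makes the panel definitions concrete by computing them from an in-memory window of requests. In production the metrics backend derives these from exported counters and histograms. The `Request` type and the nearest-rank percentile are illustrative choices. Saturation and dependency panels need other inputs and are omitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    status: int          # HTTP status code
    duration_ms: float   # request latency


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def dashboard_panels(requests: List[Request], window_s: float) -> Dict[str, float]:
    """Request-rate, error-rate, and latency panels for one time window."""
    total = len(requests)
    if total == 0:
        return {"rps": 0.0}
    count_5xx = sum(1 for r in requests if 500 <= r.status <= 599)
    count_4xx = sum(1 for r in requests if 400 <= r.status <= 499)
    durations = [r.duration_ms for r in requests]
    return {
        "rps": total / window_s,
        "error_5xx_rate": count_5xx / total,
        "error_4xx_rate": count_4xx / total,
        "p50_ms": percentile(durations, 50),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
    }
```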

Alert Configuration

| Alert | Condition | Severity | Response |
|-------|-----------|----------|----------|
| High error rate | 5xx > 1% for 5m | Critical | Page on-call |
| High latency | p99 > 1s for 5m | Warning | Alert team |
| Low availability | uptime < 99.9% | Critical | Page on-call |
| Resource saturation | CPU > 85% for 10m | Warning | Alert team |
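The `for 5m` and `for 10m` clauses mean a condition must hold continuously, not just once. This sketch assumes one metrics sample per minute, with hypothetical keys `rate_5xx` and `cpu` (as fractions) and `p99_s` (in seconds). "Low availability" is left out because the table gives it no evaluation window.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Sample = Dict[str, float]  # one per-minute metrics sample


@dataclass
class AlertRule:
    name: str
    condition: Callable[[Sample], bool]
    for_minutes: int
    severity: str
    response: str


RULES = [
    AlertRule("High error rate", lambda s: s["rate_5xx"] > 0.01, 5, "Critical", "Page on-call"),
    AlertRule("High latency", lambda s: s["p99_s"] > 1.0, 5, "Warning", "Alert team"),
    AlertRule("Resource saturation", lambda s: s["cpu"] > 0.85, 10, "Warning", "Alert team"),
]


def firing(rules: List[AlertRule], samples: List[Sample]) -> List[AlertRule]:
    """Rules whose condition held in each of the last `for_minutes` samples."""
    result = []
    for rule in rules:
        window = samples[-rule.for_minutes:]
        if len(window) == rule.for_minutes and all(rule.condition(s) for s in window):
            result.append(rule)
    return result
```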

SLI/SLO Definition

```markdown
## Service Level Objectives

**Service:** [name]
**SLO Version:** 1.0

| SLI | Target | Measurement |
|-----|--------|-------------|
| Availability | 99.9% | Successful requests / total requests |
| Latency | p99 < 500ms | Request duration percentile |
| Error rate | < 0.1% | 5xx responses / total responses |

### Error Budget

- Monthly budget: 43.2 minutes downtime (0.1% of a 30-day month)
- Current consumption: [X]%
- Actions if budget exceeded: [escalation process]
```
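The 43.2-minute figure is the 0.1% of a 30-day month that a 99.9% SLO leaves as budget: 0.001 × 30 × 24 × 60 = 43.2. The availability SLI is defined per request, so consumption can also be tracked as failed requests. A small sketch of both views:

```python
def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Downtime allowed by an availability SLO over a window of `days` days."""
    return (1 - slo) * days * 24 * 60


def time_budget_consumed_pct(downtime_min: float, slo: float, days: int = 30) -> float:
    """Share of the time-based budget already used, in percent."""
    return 100 * downtime_min / error_budget_minutes(slo, days)


def request_budget_consumed_pct(failed: int, total: int, slo: float) -> float:
    """Request-based variant matching the SLI 'successful requests / total requests'."""
    allowed_failures = total * (1 - slo)
    return 100 * failed / allowed_failures
```

The same SLO allows about 44.6 minutes in a 31-day month, so state the window whenever you quote the budget.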

Phase 5: Security Configuration

Security Checklist

  • Network policies applied
  • Service mesh mTLS configured
  • Secrets management configured
  • IAM permissions follow least privilege
  • Security scanning in CI/CD
  • Dependency scanning enabled
  • WAF rules applied (if external)

Network Policy Template

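```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: service-policy
  namespace: "[service-name]"
spec:
  podSelector:
    matchLabels:
      app: "[service-name]"
  policyTypes:
    - Ingress
    - Egress
  # Selectors use the kubernetes.io/metadata.name label that Kubernetes sets
  # on every namespace (assumes the namespaces are literally named as below);
  # a plain `name` label exists only if someone added it by hand.
  ingress:
    # Pods in istio-system (e.g. the ingress gateway) may reach the service port.
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: istio-system
      ports:
        - port: 8080
  egress:
    # Database access.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: database
      ports:
        - port: 5432
    # Cluster DNS (assumed to run in kube-system); once Egress is listed,
    # name resolution fails without this rule.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # Istio sidecar to istiod (xDS config and certificates), needed for mTLS.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: istio-system
      ports:
        - port: 15012
```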
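The policy lists both `Ingress` and `Egress` under `policyTypes`, so the selected pods are denied any traffic it does not explicitly allow. The sketch below models only the core rules (gateway in on 8080, database out on 5432) to make that default-deny behavior concrete. It ignores pod selectors and the fact that multiple policies combine additively. The last check shows why a restricted-egress policy usually also needs a DNS rule.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    direction: str          # "ingress" or "egress"
    namespace: str          # peer namespace chosen by namespaceSelector
    port: int
    protocol: str = "TCP"   # NetworkPolicy ports default to TCP


# Core allow rules: gateway in on 8080, database out on 5432.
CORE_RULES: List[Rule] = [
    Rule("ingress", "istio-system", 8080),
    Rule("egress", "database", 5432),
]


def allowed(direction: str, namespace: str, port: int,
            protocol: str = "TCP", rules: List[Rule] = CORE_RULES) -> bool:
    """Default deny: traffic passes only if some rule matches exactly."""
    return Rule(direction, namespace, port, protocol) in rules
```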

Security Review

```markdown
## Security Configuration Review

**Service:** [name]
**Reviewer:** @security-team

| Control | Status | Notes |
|---------|--------|-------|
| mTLS enabled | PASS | Istio strict mode |
| Network policies | PASS | Ingress/egress restricted |
| Secrets management | PASS | Using Vault |
| Least privilege IAM | PASS | Scoped to required resources |
| Vulnerability scanning | PASS | Trivy in CI/CD |
```
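Sign-off can be gated on the review table. Every control must read `PASS`, and the checklist's WAF item applies only to externally reachable services. In this sketch the `"WAF rules"` control name is an assumption, since the review table does not list it.

```python
from typing import Dict, List

# Control names from the review table.
BASE_CONTROLS = [
    "mTLS enabled",
    "Network policies",
    "Secrets management",
    "Least privilege IAM",
    "Vulnerability scanning",
]


def security_gate(results: Dict[str, str], external_traffic: bool) -> List[str]:
    """Controls that block sign-off: anything not explicitly PASS."""
    # Assumed name for the checklist item "WAF rules applied (if external)".
    required = BASE_CONTROLS + (["WAF rules"] if external_traffic else [])
    return [control for control in required if results.get(control) != "PASS"]
```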

Phase 6: Documentation

Required Documentation

| Document | Purpose | Template |
|----------|---------|----------|
| Service Overview | What the service does | README.md |
| Runbook | Operational procedures | runbook.md |
| Architecture | Design decisions | architecture.md |
| API Docs | Interface documentation | OpenAPI spec |
| On-call Guide | Incident handling | oncall.md |
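A pre-handoff check for the required documents might look like the sketch below. It assumes the files live at the repository root and that the OpenAPI spec is named `openapi.yaml`; adjust both placeholders per repository. Empty files count as missing.

```python
from pathlib import Path
from typing import List

# Required documents and file names from the table; "openapi.yaml" is assumed.
REQUIRED_DOCS = {
    "Service Overview": "README.md",
    "Runbook": "runbook.md",
    "Architecture": "architecture.md",
    "API Docs": "openapi.yaml",
    "On-call Guide": "oncall.md",
}


def missing_docs(repo: Path) -> List[str]:
    """Names of required documents that are absent or empty in `repo`."""
    missing = []
    for doc, filename in REQUIRED_DOCS.items():
        path = repo / filename
        if not path.is_file() or not path.read_text(encoding="utf-8").strip():
            missing.append(doc)
    return missing
```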

Runbook Template

````markdown
## [Service Name] Runbook

### Service Overview

[Brief description of what the service does]

### Quick Reference

| Item | Value |
|------|-------|
| Repository | [link] |
| Dashboard | [link] |
| Logs | [query link] |
| On-call | [PagerDuty service] |

### Common Operations

#### Restart Service

```bash
kubectl rollout restart deployment/[service] -n [namespace]
```

#### Scale Service

```bash
# If an HPA manages this deployment, change its min/max replicas instead;
# the HPA overrides a manual scale.
kubectl scale deployment/[service] -n [namespace] --replicas=X
```

#### Check Logs

```bash
kubectl logs -l app=[service] -n [namespace] --tail=100
```

### Troubleshooting

| Symptom | Possible Cause | Resolution |
|---------|----------------|------------|
| High latency | DB connection pool | Scale DB or optimize queries |
| 5xx errors | Dependency down | Check upstream services |
| OOM kills | Memory leak | Investigate heap, restart |

### Escalation

| Level | Contact | When |
|-------|---------|------|
| L1 | [team Slack channel] | First response |
| L2 | [on-call engineer] | Cannot resolve in 15m |
| L3 | [service owner] | Critical/extended outage |
````
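The escalation table's "When" column maps to a simple rule. This sketch reads "Cannot resolve in 15m" as 15 or more minutes unresolved. Whether an outage counts as "extended" is left to the caller. The contacts are the table's placeholders.

```python
from typing import Tuple

CONTACTS = {
    "L1": "[team Slack channel]",
    "L2": "[on-call engineer]",
    "L3": "[service owner]",
}


def escalation_level(minutes_unresolved: float, critical: bool,
                     extended_outage: bool = False) -> Tuple[str, str]:
    """Pick the escalation level and contact from the runbook's 'When' column."""
    if critical or extended_outage:
        level = "L3"
    elif minutes_unresolved >= 15:
        level = "L2"
    else:
        level = "L1"
    return level, CONTACTS[level]
```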

Phase 7: Handoff

Handoff Checklist

- [ ] Service owner identified and trained
- [ ] On-call rotation set up
- [ ] Access provisioned to team
- [ ] Documentation reviewed by team
- [ ] Shadowing session completed
- [ ] Ownership officially transferred
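Handoff readiness can be checked the same way. Every checklist item must be done, and every item carried into the Outstanding Items table needs an owner and a due date. This sketch uses hypothetical dict keys (`item`, `owner`, `due_date`):

```python
from typing import Dict, Iterable, List

# Items from the handoff checklist above.
HANDOFF_ITEMS = [
    "Service owner identified and trained",
    "On-call rotation set up",
    "Access provisioned to team",
    "Documentation reviewed by team",
    "Shadowing session completed",
    "Ownership officially transferred",
]


def handoff_blockers(done: Iterable[str], outstanding: List[Dict[str, str]]) -> List[str]:
    """What still blocks handoff: unchecked items and unowned follow-ups."""
    done = set(done)
    blockers = [f"not done: {item}" for item in HANDOFF_ITEMS if item not in done]
    for entry in outstanding:
        # Outstanding items are allowed, but each needs an owner and a due date.
        if not entry.get("owner") or not entry.get("due_date"):
            blockers.append(f"no owner/due date: {entry.get('item', '?')}")
    return blockers
```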

Handoff Template

```markdown
## Service Handoff Confirmation

**Service:** [name]
**Date:** YYYY-MM-DD
**Platform Team:** @[name]
**Service Owner:** @[name]

### Completed Items

- [x] Infrastructure provisioned and documented
- [x] Observability configured
- [x] Security controls applied
- [x] Runbook created and reviewed
- [x] On-call rotation configured
- [x] Training session completed

### Outstanding Items

| Item | Owner | Due Date |
|------|-------|----------|
| [item] | [owner] | YYYY-MM-DD |

### Acknowledgment

By signing below, the service owner confirms:
1. Receipt of all documentation
2. Understanding of operational procedures
3. Acceptance of on-call responsibilities

**Service Owner:** _________________ Date: _______
**Platform Team:** _________________ Date: _______
```

Anti-Rationalization Table

| Rationalization | Why It's WRONG | Required Action |
|-----------------|----------------|-----------------|
| "Skip documentation, code is self-explanatory" | On-call != developers | Complete runbook |
| "We'll add observability later" | Blind deployments = incidents | Observability on day 1 |
| "Golden path doesn't fit exactly" | Customizations add complexity | Justify every deviation |
| "Security can come later" | Later = never for security | Security from start |
| "Team can figure it out" | Assumptions cause outages | Complete handoff process |

Dispatch Specialist

For platform onboarding tasks, dispatch:

```yaml
Task tool:
  subagent_type: "platform-engineer"
  model: "opus"
  prompt: |
    SERVICE ONBOARDING REQUEST
    Service: [name]
    Team: [team]
    Type: [API/Worker/Batch]
    Requirements: [summary]
    Golden Path: [if known]
```
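When the dispatch is generated from a parsed onboarding request, a helper can fill in the prompt fields. This sketch only assembles the values shown above; how they reach the Task tool depends on the agent runtime. `dispatch_request` is a hypothetical name.

```python
from typing import Dict, Optional


def dispatch_request(service: str, team: str, service_type: str,
                     requirements: str, golden_path: Optional[str] = None) -> Dict[str, str]:
    """Fill the dispatch fields; the Golden Path line is omitted when unknown."""
    lines = [
        "SERVICE ONBOARDING REQUEST",
        f"Service: {service}",
        f"Team: {team}",
        f"Type: {service_type}",
        f"Requirements: {requirements}",
    ]
    if golden_path:
        lines.append(f"Golden Path: {golden_path}")
    return {
        "subagent_type": "platform-engineer",
        "model": "opus",
        "prompt": "\n".join(lines),
    }
```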