self-service-infrastructure
Use when designing infrastructure self-service portals, IaC templates, or automated provisioning systems. Covers Terraform modules, Pulumi, environment provisioning, and infrastructure guardrails.
allowed_tools: Read, Glob, Grep
$ Install
git clone https://github.com/melodic-software/claude-code-plugins /tmp/claude-code-plugins && cp -r /tmp/claude-code-plugins/plugins/systems-design/skills/self-service-infrastructure ~/.claude/skills/claude-code-plugins/
Tip: Run this command in your terminal to install the skill.
SKILL.md
name: self-service-infrastructure
description: Use when designing infrastructure self-service portals, IaC templates, or automated provisioning systems. Covers Terraform modules, Pulumi, environment provisioning, and infrastructure guardrails.
allowed-tools: Read, Glob, Grep
Self-Service Infrastructure
Patterns for enabling developers to provision infrastructure without tickets, while maintaining governance and control.
When to Use This Skill
- Designing infrastructure self-service capabilities
- Creating reusable Terraform/Pulumi modules
- Building environment provisioning systems
- Implementing infrastructure guardrails
- Reducing infrastructure request bottlenecks
- Balancing developer autonomy with governance
Self-Service Fundamentals
What is Self-Service Infrastructure?
Self-Service Infrastructure:
Enabling developers to provision and manage infrastructure
directly, without filing tickets or waiting for ops teams.
Traditional Model:
┌─────────────────────────────────────────────────────────────┐
│ Developer → Ticket → Ops Review → Manual Provision → Done │
│ │
│ Timeline: Days to weeks │
│ Bottleneck: Ops team capacity │
│ Result: Shadow IT, workarounds, frustration │
└─────────────────────────────────────────────────────────────┘
Self-Service Model:
┌─────────────────────────────────────────────────────────────┐
│ Developer → Portal/API → Automatic Provision → Done │
│ │
│ Timeline: Minutes to hours │
│ Bottleneck: None (automated) │
│ Result: Speed, consistency, compliance │
└─────────────────────────────────────────────────────────────┘
Self-Service Spectrum:
├── Fully Managed: Click a button, get a database
├── Template-Based: Customize from approved templates
├── Policy-Constrained: Write IaC within guardrails
└── Full Freedom: Any infrastructure (risky)
Sweet Spot: Template-Based with Policy Guardrails
Key Benefits
Self-Service Benefits:
For Developers:
├── Speed: Minutes instead of days
├── Autonomy: Provision when needed
├── Consistency: Same infrastructure every time
├── Learning: Understand infrastructure better
└── Ownership: More responsibility, more control
For Operations:
├── Scale: Handle more requests without more people
├── Consistency: Enforce standards automatically
├── Focus: Work on platform, not tickets
├── Audit: Clear trail of who provisioned what
└── Compliance: Built-in policy enforcement
For Organization:
├── Velocity: Faster time to market
├── Cost: Reduced ops overhead
├── Governance: Better compliance posture
├── Security: Consistent security controls
└── Efficiency: Resources provisioned when needed
Self-Service Architecture
Component Architecture
Self-Service Infrastructure Architecture:
┌─────────────────────────────────────────────────────────────┐
│ USER INTERFACE │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Portal │ │ CLI │ │ API │ │
│ │ (Web UI) │ │ (Terraform) │ │ (REST/gRPC)│ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ └────────────────┼────────────────┘ │
│ │ │
├──────────────────────────┼───────────────────────────────────┤
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ ORCHESTRATION LAYER │ │
│ │ ├── Request validation │ │
│ │ ├── Policy evaluation (OPA/Sentinel) │ │
│ │ ├── Cost estimation │ │
│ │ ├── Approval workflow (if needed) │ │
│ │ └── Execution orchestration │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
├──────────────────────────┼───────────────────────────────────┤
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ TEMPLATE LIBRARY │ │
│ │ ├── Database modules (RDS, Cloud SQL) │ │
│ │ ├── Compute modules (EKS, GKE, VMs) │ │
│ │ ├── Storage modules (S3, GCS) │ │
│ │ ├── Network modules (VPC, subnets) │ │
│ │ └── Composite modules (full environments) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
├──────────────────────────┼───────────────────────────────────┤
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ EXECUTION ENGINE │ │
│ │ ├── Terraform Cloud/Enterprise │ │
│ │ ├── Pulumi Cloud (formerly Pulumi Service) │ │
│ │ ├── Crossplane │ │
│ │ └── Cloud-native (CDK, ARM, Deployment Manager) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
├──────────────────────────┼───────────────────────────────────┤
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ CLOUD PROVIDERS │ │
│ │ AWS │ GCP │ Azure │ Kubernetes │ Others │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Request Flow
Self-Service Request Flow:
┌─────────────────────────────────────────────────────────────┐
│ 1. REQUEST │
│ Developer: "I need a PostgreSQL database for staging" │
│ └── Via portal, CLI, or API │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│  2. VALIDATION                                              │
│  ├── User has permission?      ✓ Team member                │
│  ├── Request well-formed?      ✓ Valid config               │
│  ├── Within quotas?            ✓ Under team limit           │
│  └── Meets policy?             ✓ Allowed instance type      │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│  3. ENRICHMENT                                              │
│  ├── Apply defaults            db.t3.medium                 │
│  ├── Generate names            myapp-staging-db             │
│  ├── Assign network            staging-vpc                  │
│  ├── Configure monitoring      Datadog integration          │
│  └── Estimate cost             ~$50/month                   │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 4. APPROVAL (if required) │
│ ├── Auto-approve: staging, dev ✓ Auto-approved │
│ ├── Manual approve: production (Would need approval) │
│ └── Cost threshold: >$500/month (Would need approval) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│  5. EXECUTION                                               │
│  ├── Generate Terraform        Based on template            │
│  ├── Plan                      Preview changes              │
│  ├── Apply                     Create resources             │
│  └── Verify                    Health checks                │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 6. DELIVERY │
│ ├── Connection string → Vault │
│ ├── Notification → Slack/email │
│ ├── Documentation → Auto-generated │
│ └── Registration → Service catalog │
└─────────────────────────────────────────────────────────────┘
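The validation, enrichment, and approval steps above can be sketched in a few functions. This is a minimal illustration, not a reference implementation: the `Request` class, the constant names, and the quota argument are hypothetical, and the policy values are simply copied from the flow (approved instance families, auto-approval for dev and staging, a $500/month approval threshold).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical policy values, copied from the request flow above.
ALLOWED_INSTANCE_PREFIXES = ("db.t3.", "db.r5.", "db.m5.")
AUTO_APPROVE_ENVIRONMENTS = {"dev", "staging"}
APPROVAL_COST_THRESHOLD = 500.0  # USD per month


@dataclass
class Request:
    team: str
    app: str
    environment: str
    instance_class: Optional[str] = None  # None means "use the default"
    estimated_monthly_cost: float = 0.0
    errors: List[str] = field(default_factory=list)


def validate(req: Request, team_quota_left: int) -> bool:
    """Step 2: collect every failure instead of stopping at the first one."""
    if req.environment not in {"dev", "staging", "prod"}:
        req.errors.append(f"unknown environment: {req.environment}")
    if req.instance_class and not req.instance_class.startswith(ALLOWED_INSTANCE_PREFIXES):
        req.errors.append(f"instance class not allowed: {req.instance_class}")
    if team_quota_left <= 0:
        req.errors.append("team quota exhausted")
    return not req.errors


def enrich(req: Request) -> dict:
    """Step 3: fill in defaults and derive names from the request."""
    return {
        "name": f"{req.app}-{req.environment}-db",
        "instance_class": req.instance_class or "db.t3.medium",
        "network": f"{req.environment}-vpc",
    }


def needs_approval(req: Request) -> bool:
    """Step 4: manual approval for production or for expensive requests."""
    return (
        req.environment not in AUTO_APPROVE_ENVIRONMENTS
        or req.estimated_monthly_cost > APPROVAL_COST_THRESHOLD
    )
```

Collecting all validation errors in one pass gives the developer a single, complete error message, which fits the "clear error messages" and "fast feedback loops" goals of a self-service flow.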
IaC Module Design
Terraform Module Patterns
Terraform Module Structure:
Organization-Wide Module Library:
terraform-modules/
├── databases/
│ ├── rds-postgres/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ ├── versions.tf
│ │ ├── README.md
│ │ └── examples/
│ │ ├── simple/
│ │ └── production/
│ └── elasticache-redis/
├── compute/
│ ├── eks-cluster/
│ └── ecs-service/
├── storage/
│ └── s3-bucket/
└── network/
└── vpc/
Module Design Principles:
1. Opinionated Defaults
# variables.tf
variable "instance_class" {
  type        = string
  default     = "db.t3.medium" # Sensible default
  description = "RDS instance type"

  validation {
    # The trailing "\\." keeps look-alike families such as db.m5d from matching
    condition     = can(regex("^db\\.(t3|r5|m5)\\.", var.instance_class))
    error_message = "Only approved instance families allowed."
  }
}
2. Minimal Required Inputs
# Only require what can't be defaulted
variable "name" {
  type        = string
  description = "Database identifier"
}

variable "environment" {
  type        = string
  description = "Environment (dev, staging, prod)"
}
3. Complete Outputs
# outputs.tf
output "endpoint" {
  description = "Database connection endpoint"
  value       = aws_db_instance.main.endpoint
}

output "connection_secret_arn" {
  description = "ARN of secret with credentials"
  value       = aws_secretsmanager_secret.db_credentials.arn
}
4. Built-in Best Practices
# Security hardened by default (fragment; engine, sizing, and credentials omitted)
resource "aws_db_instance" "main" {
  # Encryption always on
  storage_encrypted = true

  # No public access
  publicly_accessible = false

  # Automated backups
  backup_retention_period = var.environment == "prod" ? 30 : 7

  # Enhanced monitoring (an interval above 0 also requires monitoring_role_arn)
  monitoring_interval = 60
}
Module Versioning
Module Versioning Strategy:
Semantic Versioning:
├── MAJOR: Breaking changes (new required inputs, removed outputs)
├── MINOR: New features (new optional inputs, new outputs)
└── PATCH: Bug fixes (no interface changes)
Version Constraints:
# Allow patch updates automatically
# Private registry sources take the form <HOSTNAME>/<NAMESPACE>/<NAME>/<PROVIDER>
module "database" {
  source  = "terraform.company.com/modules/rds-postgres/aws"
  version = "~> 2.1.0" # >=2.1.0, <2.2.0
}

# Pin to exact version (production)
module "database" {
  source  = "terraform.company.com/modules/rds-postgres/aws"
  version = "= 2.1.3"
}
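To make the `~>` (pessimistic) constraint concrete, the sketch below reimplements its range check in Python. It is an illustration under stated assumptions, not Terraform's own resolver: the function names are hypothetical, pre-release suffixes are not handled, and only constraints with at least two components are supported.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn "2.1.3" into (2, 1, 3). Pre-release suffixes are not supported."""
    return tuple(int(part) for part in version.split("."))


def pad(parts: Tuple[int, ...], width: int) -> Tuple[int, ...]:
    """Right-pad with zeros so "2.1" compares like "2.1.0"."""
    return parts + (0,) * (width - len(parts))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Return True if `version` satisfies `~> constraint`.

    Only the rightmost component of the constraint may grow:
    ~> 2.1.0 means >= 2.1.0 and < 2.2.0; ~> 2.1 means >= 2.1 and < 3.0.
    """
    base = parse(constraint)
    if len(base) < 2:
        raise ValueError("this sketch only handles constraints with two or more components")
    candidate = parse(version)
    # Upper bound: drop the last component and bump the one before it.
    upper = base[:-2] + (base[-2] + 1,)
    width = max(len(base), len(candidate))
    lower_ok = pad(base, width) <= pad(candidate, width)
    upper_ok = pad(candidate, width)[: len(upper)] < upper
    return lower_ok and upper_ok
```

For example, `satisfies_pessimistic("2.1.3", "2.1.0")` is true while `satisfies_pessimistic("2.2.0", "2.1.0")` is false, which is why `~> 2.1.0` picks up patch releases but never a new minor version.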
Deprecation Policy:
┌─────────────────────────────────────────────────────────────┐
│ Module Version Lifecycle │
├─────────────────────────────────────────────────────────────┤
│ Current (v2.x): Supported, new features │
│ Previous (v1.x): Supported, security fixes only │
│ Deprecated (v0.x): Warning on use, no support │
│ Removed: Will not work │
│ │
│ Notification: │
│ ├── Slack announcement when version deprecated │
│ ├── Warning in terraform plan output │
│ ├── Dashboard showing deprecated module usage │
│ └── Migration guide provided │
└─────────────────────────────────────────────────────────────┘
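A dashboard of deprecated module usage, as listed in the box above, could be driven by a small classifier like the one below. This is a hedged sketch: `CURRENT_MAJOR`, the function names, and the shape of the `usages` mapping (consumer name to pinned module version) are assumptions made for illustration.

```python
from typing import Dict, List

CURRENT_MAJOR = 2  # assumption: matches "Current (v2.x)" in the lifecycle box
STATUSES = ("current", "previous", "deprecated")


def lifecycle_status(version: str) -> str:
    """Map a module version to its lifecycle stage based on the major version."""
    major = int(version.split(".")[0])
    if major > CURRENT_MAJOR:
        raise ValueError(f"version {version} is newer than the current major")
    age = CURRENT_MAJOR - major
    # Anything older than the deprecated major is treated as removed.
    return STATUSES[age] if age < len(STATUSES) else "removed"


def deprecated_usage(usages: Dict[str, str]) -> List[str]:
    """Return the consumers that pin a deprecated or removed module version."""
    return sorted(
        consumer
        for consumer, version in usages.items()
        if lifecycle_status(version) in {"deprecated", "removed"}
    )
```

With `CURRENT_MAJOR` set to 2, a consumer pinned to `0.9.2` is reported while consumers on `1.x` or `2.x` are not.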
Policy and Guardrails
Policy as Code
Policy as Code Options:
1. HashiCorp Sentinel (Terraform Cloud/Enterprise)
# Require encryption for all storage
# Note: with AWS provider v4+, bucket encryption is configured through the
# separate aws_s3_bucket_server_side_encryption_configuration resource, so
# this attribute check applies to configurations that set it on the bucket.
import "tfplan/v2" as tfplan

s3_buckets = filter tfplan.resource_changes as _, rc {
  rc.type is "aws_s3_bucket" and
  rc.mode is "managed" and
  (rc.change.actions contains "create" or
    rc.change.actions contains "update")
}

encryption_enabled = rule {
  all s3_buckets as _, bucket {
    bucket.change.after.server_side_encryption_configuration is not null
  }
}

main = rule { encryption_enabled }
2. Open Policy Agent (OPA)
# Rego policy for Kubernetes
# (Rego v0 syntax; OPA 1.0+ writes the rule head as `deny contains msg if {`)
package kubernetes.admission

deny[msg] {
  input.request.kind.kind == "Pod"
  container := input.request.object.spec.containers[_]
  not container.securityContext.runAsNonRoot
  msg := "Containers must run as non-root"
}
3. Cloud-Native Policies
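# AWS Service Control Policy
# (s3:x-amz-server-side-encryption is a condition key for s3:PutObject, not
# s3:CreateBucket, so the deny targets unencrypted uploads)
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "RequireEncryption",
    "Effect": "Deny",
    "Action": ["s3:PutObject"],
    "Resource": "*",
    "Condition": {
      "StringNotEquals": {
        "s3:x-amz-server-side-encryption": "AES256"
      }
    }
  }]
}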
Guardrail Categories
Infrastructure Guardrails:
1. Security Guardrails
├── Encryption required (at-rest, in-transit)
├── No public access by default
├── Required security groups
├── IAM role requirements
└── Vulnerability scanning
2. Cost Guardrails
├── Instance type restrictions
├── Storage size limits
├── Required cost tags
├── Budget thresholds
└── Approval for large resources
3. Compliance Guardrails
├── Allowed regions (data residency)
├── Required logging
├── Backup requirements
├── Retention policies
└── Audit trail requirements
4. Operational Guardrails
├── Naming conventions
├── Required tags (owner, cost-center)
├── Resource quotas per team
├── Monitoring requirements
└── Deletion protection
Guardrail Implementation:
┌─────────────────────────────────────────────────────────────┐
│ Guardrail Timing │
├─────────────────────────────────────────────────────────────┤
│ │
│ Pre-Plan (fastest feedback): │
│ ├── Validate terraform files │
│ ├── Static analysis (tfsec, checkov) │
│ └── Module version checks │
│ │
│ Post-Plan (resource-aware): │
│ ├── OPA/Sentinel policy evaluation │
│ ├── Cost estimation │
│ └── Blast radius assessment │
│ │
│ Post-Apply (verification): │
│ ├── Configuration validation │
│ ├── Security scanning │
│ └── Compliance audit │
│ │
└─────────────────────────────────────────────────────────────┘
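As one example of a post-plan guardrail, the sketch below checks the required tags from the operational guardrails (owner, cost-center) against the JSON form of a plan, as produced by `terraform show -json`. It is a minimal illustration rather than a replacement for OPA or Sentinel; the function name and the message format are hypothetical, and values that are unknown until apply are not considered.

```python
from typing import List

REQUIRED_TAGS = {"owner", "cost-center"}  # from the operational guardrails above


def missing_tag_violations(plan: dict) -> List[str]:
    """Report created or updated resources that lack a required tag.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    """
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        actions = set(change.get("actions", []))
        if not actions & {"create", "update"}:
            continue  # deletes and no-ops need no tag check
        after = change.get("after") or {}
        if "tags" not in after:
            continue  # this resource type has no tags attribute in the plan
        tags = after.get("tags") or {}
        missing = REQUIRED_TAGS - set(tags)
        if missing:
            violations.append(f"{rc['address']}: missing tags {sorted(missing)}")
    return violations
```

A pipeline would typically fail the run when the returned list is non-empty and print each entry, so the developer sees which resource to fix before anything is applied.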
Environment Provisioning
Environment Templates
Environment Provisioning:
Environment Types:
┌─────────────────────────────────────────────────────────────┐
│ Development Environment │
│ ├── Purpose: Individual developer testing │
│ ├── Lifetime: Hours to days │
│ ├── Resources: Minimal (smallest instances) │
│ ├── Data: Synthetic or anonymized │
│ └── Approval: None (within quota) │
├─────────────────────────────────────────────────────────────┤
│ Staging Environment │
│ ├── Purpose: Integration testing, QA │
│ ├── Lifetime: Persistent per service │
│ ├── Resources: Production-like (scaled down) │
│ ├── Data: Sanitized production subset │
│ └── Approval: None (within quota) │
├─────────────────────────────────────────────────────────────┤
│ Production Environment │
│ ├── Purpose: Live customer traffic │
│ ├── Lifetime: Permanent │
│ ├── Resources: Full capacity │
│ ├── Data: Real customer data │
│ └── Approval: Required (security review) │
└─────────────────────────────────────────────────────────────┘
Environment Template:
# environment/main.tf
module "network" {
  source      = "../modules/vpc"
  environment = var.environment
  cidr_block  = var.network_cidr
}

module "kubernetes" {
  source      = "../modules/eks"
  environment = var.environment
  vpc_id      = module.network.vpc_id
  node_count  = var.environment == "prod" ? 5 : 2
}

module "database" {
  source         = "../modules/rds"
  environment    = var.environment
  vpc_id         = module.network.vpc_id
  instance_class = var.environment == "prod" ? "db.r5.xlarge" : "db.t3.medium"
  multi_az       = var.environment == "prod"
}

module "cache" {
  source      = "../modules/elasticache"
  environment = var.environment
  vpc_id      = module.network.vpc_id
  node_type   = var.environment == "prod" ? "cache.r5.large" : "cache.t3.micro"
}
Ephemeral Environments
Ephemeral/Preview Environments:
Use Cases:
├── PR preview environments
├── Feature branch testing
├── Demo environments
├── Load testing environments
└── Incident reproduction
Lifecycle:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  PR Created ──► Environment Created ──► Tests Run           │
│       │                  │                  │               │
│       │                  ▼                  ▼               │
│       │             Preview URL        PR Updated           │
│       │            Posted to PR             │               │
│       │                                     │               │
│       ▼                                     ▼               │
│  PR Merged ───────────────────────► Environment Destroyed   │
│                                                             │
│  Timeout: Auto-destroy after 7 days of inactivity           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Implementation:
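# .github/workflows/preview.yml
name: Preview Environment
on:
  pull_request:
    types: [opened, synchronize]
permissions:
  contents: read
  pull-requests: write # needed to comment on the PR
jobs:
  deploy-preview:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # Cloud credentials and backend configuration are omitted here
      - name: Create/Update Environment
        run: |
          terraform init -input=false
          terraform workspace select pr-${{ github.event.pull_request.number }} || \
            terraform workspace new pr-${{ github.event.pull_request.number }}
          terraform apply -auto-approve -input=false
      - name: Comment Preview URL
        uses: actions/github-script@v6
        with:
          script: |
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: '🚀 Preview: https://pr-${{ github.event.pull_request.number }}.preview.company.com'
            })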
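The 7-day inactivity timeout from the lifecycle above needs a scheduled cleanup job. A minimal sketch of its selection logic follows; the `pr-` workspace prefix mirrors the workflow, while the function name and the `last_activity` mapping (workspace name to last push or deploy time) are assumptions, since how activity is tracked depends on the platform.

```python
from datetime import datetime, timedelta
from typing import Dict, List

INACTIVITY_LIMIT = timedelta(days=7)  # from the lifecycle diagram above


def expired_workspaces(last_activity: Dict[str, datetime], now: datetime) -> List[str]:
    """Return preview workspaces that have been idle longer than the limit.

    Only workspaces named like the PR previews ("pr-<number>") are considered,
    so long-lived workspaces are never selected.
    """
    return sorted(
        name
        for name, seen in last_activity.items()
        if name.startswith("pr-") and now - seen > INACTIVITY_LIMIT
    )
```

Each returned workspace would then be torn down, for example with `terraform workspace select`, `terraform destroy`, and finally `terraform workspace delete`. Passing `now` in explicitly keeps the function easy to test.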
Technology Options
Self-Service Platforms
Platform Comparison:
1. Terraform Cloud/Enterprise
├── Native Terraform experience
├── Policy as Code (Sentinel)
├── Private module registry
├── Cost estimation
└── Enterprise features (SSO, audit)
2. Pulumi
├── Real programming languages
├── Strong typing and IDE support
├── Policy as Code (CrossGuard)
└── Automation API
3. Crossplane
├── Kubernetes-native
├── GitOps workflow
├── Composition for modules
└── Multi-cloud abstraction
4. Backstage + Terraform
├── Unified developer portal
├── Software templates
├── Plugin ecosystem
└── Service catalog integration
5. Port/Cortex/OpsLevel
├── Commercial developer portals
├── Quick to implement
├── Built-in integrations
└── Self-service workflows
Selection Criteria:
┌──────────────────────┬─────────────────────────────────────┐
│ Factor               │ Best Fit                            │
├──────────────────────┼─────────────────────────────────────┤
│ Existing Terraform   │ Terraform Cloud/Enterprise          │
│ Kubernetes-first     │ Crossplane                          │
│ Developer portal     │ Backstage or commercial             │
│ Programming language │ Pulumi                              │
│ Quick start          │ Commercial (Port, OpsLevel)         │
│ Maximum control      │ Build custom                        │
└──────────────────────┴─────────────────────────────────────┘
Cost Management
Cost Controls
Cost Management in Self-Service:
1. Cost Visibility
├── Estimated cost shown before provisioning
├── Cost tags automatically applied
├── Per-team/project dashboards
└── Anomaly detection and alerts
2. Cost Guardrails
├── Instance type restrictions
├── Budget thresholds by team
├── Approval required above threshold
└── Auto-shutdown of unused resources
3. Cost Optimization
├── Right-sizing recommendations
├── Reserved instance suggestions
├── Spot instance for non-production
└── Scheduled scaling
Cost Estimation Flow:
┌─────────────────────────────────────────────────────────────┐
│ Request: PostgreSQL database for staging │
├─────────────────────────────────────────────────────────────┤
│ │
│ Cost Estimate: │
│ ├── Compute (db.t3.medium): $30/month │
│ ├── Storage (100GB gp3): $10/month │
│ ├── Backup storage: ~$5/month │
│ └── Data transfer: ~$5/month │
│ ───────── │
│ Estimated Total: ~$50/month │
│ │
│ ✓ Within team budget ($500/month quota) │
│ ✓ No approval required │
│ │
│ [Proceed] [Modify] [Cancel] │
└─────────────────────────────────────────────────────────────┘
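The decision shown in the estimate box can be expressed as a small function. This is a sketch under assumptions: the `CostDecision` type, the parameter names, and the idea of comparing against the team's current spend are illustrative, and the $500/month approval threshold is taken from the request flow earlier in this document.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CostDecision:
    total: float
    within_budget: bool
    needs_approval: bool


def evaluate_cost(
    line_items: Dict[str, float],
    team_spend: float,
    team_budget: float,
    approval_threshold: float = 500.0,
) -> CostDecision:
    """Sum the monthly line items and compare against budget and threshold.

    All amounts are USD per month. `team_spend` is what the team already
    spends, so the new resource must fit into the remaining budget.
    """
    total = sum(line_items.values())
    return CostDecision(
        total=total,
        within_budget=team_spend + total <= team_budget,
        needs_approval=total > approval_threshold,
    )
```

Feeding in the four line items from the box (30, 10, 5, and 5) gives a total of 50, which stays within a $500/month budget and below the approval threshold as long as the team's existing spend leaves room for it.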
Best Practices
Self-Service Infrastructure Best Practices:
1. Start Small, Expand Gradually
├── Begin with 2-3 common resources
├── Add based on demand
├── Iterate on feedback
└── Don't try to cover everything day 1
2. Balance Autonomy and Governance
├── Guardrails not gates
├── Automate approvals where safe
├── Clear escalation paths
└── Trust but verify
3. Optimize for Developer Experience
├── Minimal required inputs
├── Sensible defaults
├── Clear error messages
└── Fast feedback loops
4. Maintain Module Quality
├── Automated testing
├── Documentation requirements
├── Versioning strategy
└── Deprecation process
5. Monitor and Improve
├── Track provisioning success rate
├── Measure time to provision
├── Gather user feedback
└── Identify automation opportunities
6. Handle Edge Cases
├── What if provisioning fails?
├── How to handle orphaned resources?
├── What about existing resources?
└── How to migrate between versions?
Anti-Patterns
Self-Service Anti-Patterns:
1. "Self-Service Everything"
❌ Every possible configuration option
✓ Curated set of approved patterns
2. "Security Theater"
❌ Manual approvals that don't add value
✓ Automated policy enforcement
3. "Configuration Explosion"
❌ 50 parameters per resource
✓ Sensible defaults with few overrides
4. "Ignore Cost"
❌ No visibility into provisioned cost
✓ Cost estimation and budgets
5. "Build vs Buy Wrong"
❌ Building everything from scratch
✓ Use existing tools where appropriate
6. "No Escape Hatch"
❌ Blocking legitimate exceptions
✓ Process for justified deviations
Related Skills
- internal-developer-platform - Platform engineering overview
- golden-paths - Standardized workflows
- container-orchestration - Kubernetes infrastructure
- serverless-patterns - Serverless infrastructure
Repository: melodic-software/claude-code-plugins/plugins/systems-design/skills/self-service-infrastructure
Author: melodic-software