cloud-infrastructure

Cloud infrastructure design and deployment patterns for AWS, Azure, and GCP. Use when designing cloud architectures, implementing IaC with Terraform, optimizing costs, or setting up multi-region deployments.

$ Install

git clone https://github.com/89jobrien/steve /tmp/steve && cp -r /tmp/steve/steve/skills/cloud-infrastructure ~/.claude/skills/steve

// tip: Run this command in your terminal to install the skill


name: cloud-infrastructure
description: Cloud infrastructure design and deployment patterns for AWS, Azure, and GCP. Use when designing cloud architectures, implementing IaC with Terraform, optimizing costs, or setting up multi-region deployments.
author: Joseph OBrien
status: unpublished
updated: '2025-12-23'
version: 1.0.1
tag: skill
type: skill

Cloud Infrastructure

Comprehensive cloud infrastructure skill covering multi-cloud architecture, Infrastructure as Code, cost optimization, and production deployment patterns.

When to Use This Skill

  • Designing cloud architecture for new applications
  • Implementing Infrastructure as Code (Terraform, CloudFormation, Pulumi)
  • Cost optimization and resource right-sizing
  • Multi-region and high-availability deployments
  • Cloud migration planning
  • Security and compliance implementation
  • Auto-scaling and performance optimization

Cloud Architecture Patterns

Compute Patterns

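| Pattern    | AWS     | Azure            | GCP             | Use Case                     |
|------------|---------|------------------|-----------------|------------------------------|
| Serverless | Lambda  | Functions        | Cloud Functions | Event-driven, variable load  |
| Containers | ECS/EKS | AKS              | GKE             | Microservices, consistent env |
| VMs        | EC2     | Virtual Machines | Compute Engine  | Legacy apps, full control    |
| Batch      | Batch   | Batch            | Batch           | Large-scale processing       |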

Storage Patterns

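| Type    | AWS     | Azure         | GCP             | Use Case              |
|---------|---------|---------------|-----------------|-----------------------|
| Object  | S3      | Blob Storage  | Cloud Storage   | Static files, backups |
| Block   | EBS     | Managed Disks | Persistent Disk | Database storage      |
| File    | EFS     | Azure Files   | Filestore       | Shared file systems   |
| Archive | Glacier | Archive       | Coldline        | Long-term retention   |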

Database Patterns

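| Type           | AWS         | Azure           | GCP         | Use Case          |
|----------------|-------------|-----------------|-------------|-------------------|
| Relational     | RDS, Aurora | SQL Database    | Cloud SQL   | ACID transactions |
| NoSQL          | DynamoDB    | Cosmos DB       | Firestore   | Flexible schema   |
| Cache          | ElastiCache | Cache for Redis | Memorystore | Session, caching  |
| Data Warehouse | Redshift    | Synapse         | BigQuery    | Analytics         |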

Infrastructure as Code

Terraform Best Practices

Project Structure:

infrastructure/
├── modules/
│   ├── networking/
│   ├── compute/
│   └── database/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── prod/
├── main.tf
├── variables.tf
├── outputs.tf
└── versions.tf

State Management:

  • Use remote state (S3, Azure Blob, GCS)
  • Enable state locking (DynamoDB for the S3 backend, blob leases for Azure; the GCS backend locks natively)
  • Separate state per environment
  • Never commit state files
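
The state-management points above can be sketched as a small helper that renders one partial backend configuration per environment, so that each environment gets its own state key while sharing a lock table. This is a minimal sketch, assuming the S3 backend with DynamoDB locking; the function name and the bucket, region, and table values are hypothetical placeholders, not part of this skill.

```python
def render_backend_config(environment: str, bucket: str, region: str, lock_table: str) -> str:
    """Render a partial S3 backend configuration for one environment.

    The attribute names (bucket, key, region, dynamodb_table, encrypt) are
    S3 backend settings; every value passed in is a placeholder assumption.
    """
    settings = {
        "bucket": bucket,
        # One state object per environment keeps dev/staging/prod isolated.
        "key": f"{environment}/terraform.tfstate",
        "region": region,
        # DynamoDB table used for state locking.
        "dynamodb_table": lock_table,
        "encrypt": True,
    }
    lines = []
    for name, value in settings.items():
        # HCL booleans are lowercase and unquoted; strings are double-quoted.
        rendered = str(value).lower() if isinstance(value, bool) else f'"{value}"'
        lines.append(f"{name} = {rendered}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for env in ("dev", "staging", "prod"):
        print(f"# environments/{env}/backend.hcl")
        print(render_backend_config(env, "example-tf-state", "us-east-1", "example-tf-locks"))
```

Each rendered file could then be passed to `terraform init -backend-config=backend.hcl` from the matching environment directory.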

Module Design:

  • Single responsibility per module
  • Expose minimal required variables
  • Document inputs/outputs
  • Version modules with git tags
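
Versioning modules with git tags usually means pinning each module source to a tag through the `?ref=` query argument. The helper below is an illustrative sketch that builds such a source string and rejects unpinned refs; the function name, the repository URL, and the `vMAJOR.MINOR.PATCH` tag convention are assumptions.

```python
import re

# Assumed tag convention: v<major>.<minor>.<patch>, e.g. v1.2.0
SEMVER_TAG = re.compile(r"^v\d+\.\d+\.\d+$")


def pinned_module_source(repo_url: str, module_path: str, tag: str) -> str:
    """Build a Terraform git module source pinned to a release tag."""
    if not SEMVER_TAG.match(tag):
        # Branch names such as "main" move over time, so refuse them.
        raise ValueError(f"expected a tag like v1.2.0, got {tag!r}")
    # "//" separates the repository from the subdirectory inside it.
    return f"git::{repo_url}//{module_path}?ref={tag}"


if __name__ == "__main__":
    print(pinned_module_source(
        "https://example.com/org/infrastructure.git",  # hypothetical repository
        "modules/networking",
        "v1.2.0",
    ))
```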

Cost Optimization

Compute Savings:

  • Reserved Instances (1-3 year commitment): typically 30-60% savings over on-demand pricing
  • Spot/Preemptible instances: typically 60-90% savings over on-demand pricing for interruptible workloads
  • Right-sizing: Match instance size to actual usage
  • Auto-scaling: Scale down during low usage
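
To make the discount ranges above concrete, the following sketch turns an on-demand hourly rate into a monthly bill and a savings range. The hourly rate, fleet size, and the 730 hours-per-month convention are illustrative assumptions, not real prices.

```python
HOURS_PER_MONTH = 730  # common billing approximation (365 * 24 / 12)


def monthly_cost(hourly_rate: float, instance_count: int, discount: float = 0.0) -> float:
    """Monthly cost of a fleet after applying a fractional discount (0.0 to <1.0)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in the range [0.0, 1.0)")
    return hourly_rate * HOURS_PER_MONTH * instance_count * (1.0 - discount)


def savings_range(hourly_rate: float, instance_count: int, low: float, high: float) -> tuple[float, float]:
    """Monthly savings at the low and high ends of a discount range."""
    on_demand = monthly_cost(hourly_rate, instance_count)
    return on_demand * low, on_demand * high


if __name__ == "__main__":
    rate, count = 0.10, 10  # hypothetical: 10 instances at $0.10/hour
    print(f"on-demand: ${monthly_cost(rate, count):.2f}/month")
    for label, low, high in (("reserved", 0.30, 0.60), ("spot", 0.60, 0.90)):
        lo_save, hi_save = savings_range(rate, count, low, high)
        print(f"{label}: saves ${lo_save:.2f} to ${hi_save:.2f}/month")
```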

Storage Savings:

  • Lifecycle policies: Auto-transition to cheaper tiers
  • Compression: Reduce storage footprint
  • Deduplication: Eliminate redundant data
  • Delete unused resources: Orphaned volumes, snapshots
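
Two of the storage points above, lifecycle transitions and orphaned-volume cleanup, reduce to simple rules. The sketch below assumes hypothetical age thresholds (30 and 90 days), generic tier names, and a plain list of volume records rather than any provider's API.

```python
# Hypothetical lifecycle rules: (minimum age in days, tier name), sorted by age.
LIFECYCLE_RULES = [(0, "standard"), (30, "infrequent-access"), (90, "archive")]


def tier_for_age(age_days: int) -> str:
    """Return the cheapest tier whose minimum age the object has reached."""
    tier = LIFECYCLE_RULES[0][1]
    for min_age, name in LIFECYCLE_RULES:
        if age_days >= min_age:
            tier = name
    return tier


def orphaned_volumes(volumes: list[dict]) -> list[str]:
    """Return IDs of volumes that are not attached to any instance."""
    return [v["id"] for v in volumes if v.get("attached_to") is None]


if __name__ == "__main__":
    for age in (5, 45, 400):
        print(age, "days ->", tier_for_age(age))
    inventory = [  # hypothetical inventory records
        {"id": "vol-a", "attached_to": "i-123"},
        {"id": "vol-b", "attached_to": None},
    ]
    print("orphaned:", orphaned_volumes(inventory))
```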

Network Savings:

  • Use CDN for static content
  • Optimize data transfer paths
  • Use private endpoints
  • Compress API responses

High Availability Patterns

Multi-AZ Deployment

  • Deploy across 2-3 availability zones
  • Use load balancers for distribution
  • Database replication across AZs
  • Automatic failover configuration
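
One way to reason about a multi-AZ layout is to spread instances round-robin across zones and then check whether enough capacity remains if the busiest zone fails. This is a rough sketch; the zone names and capacity numbers are hypothetical.

```python
def spread_across_zones(instance_count: int, zones: list[str]) -> dict[str, int]:
    """Distribute instances round-robin across availability zones."""
    if not zones:
        raise ValueError("at least one zone is required")
    placement = {zone: 0 for zone in zones}
    for index in range(instance_count):
        placement[zones[index % len(zones)]] += 1
    return placement


def survives_zone_loss(placement: dict[str, int], required: int) -> bool:
    """True if losing the most heavily loaded zone still leaves `required` instances."""
    remaining = sum(placement.values()) - max(placement.values())
    return remaining >= required


if __name__ == "__main__":
    layout = spread_across_zones(7, ["zone-a", "zone-b", "zone-c"])
    print(layout)
    print("survives one zone failure:", survives_zone_loss(layout, required=4))
```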

Multi-Region Deployment

  • Active-active or active-passive
  • DNS-based routing (Route 53, Traffic Manager)
  • Data replication strategy
  • Disaster recovery procedures
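
In an active-passive setup, routing comes down to sending traffic to the highest-priority region that passes its health check. The sketch below models only that decision; the region names are placeholders, and a managed DNS service would perform the health checks itself.

```python
from typing import Optional


def pick_region(priority: list[str], healthy: dict[str, bool]) -> Optional[str]:
    """Return the first healthy region in priority order, or None if all are down."""
    for region in priority:
        # Regions with no recorded health status are treated as unhealthy.
        if healthy.get(region, False):
            return region
    return None


if __name__ == "__main__":
    order = ["primary-region", "secondary-region"]  # hypothetical names
    print(pick_region(order, {"primary-region": True, "secondary-region": True}))
    print(pick_region(order, {"primary-region": False, "secondary-region": True}))
```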

Resilience Patterns

  • Circuit breakers for external dependencies
  • Retry with exponential backoff
  • Bulkhead isolation
  • Graceful degradation
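
Two of the resilience patterns above, retry with exponential backoff and a circuit breaker, can be sketched with the standard library alone. The thresholds, delays, and class and function names are illustrative assumptions; production code would more likely use an established resilience library.

```python
import random
import time


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Exponential delays: base, 2*base, 4*base, ... each capped at `cap` seconds."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def retry(operation, attempts: int = 5, base: float = 0.5, cap: float = 30.0, sleep=time.sleep):
    """Call `operation` up to `attempts` times, sleeping with jitter between failures."""
    delays = backoff_delays(attempts - 1, base, cap)
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: a random wait up to the exponential delay.
            sleep(random.uniform(0, delays[attempt]))


class CircuitBreaker:
    """Reject calls after repeated failures, then allow a trial call after a cooldown."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # time the circuit opened, or None when closed

    def call(self, operation):
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            raise RuntimeError("circuit open: call rejected")
        try:
            result = operation()  # closed, or half-open trial call after the cooldown
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Passing `sleep` and `clock` as parameters keeps both helpers testable without real waiting.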

Security Best Practices

Identity & Access

  • Principle of least privilege
  • Use IAM roles, not long-term credentials
  • Enable MFA for privileged accounts
  • Regular access reviews
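
A least-privilege review often starts by flagging policy statements that allow wildcard actions or resources. The sketch below assumes an AWS-style JSON policy document already loaded as a dictionary; the function name and the sample policy are hypothetical, and a real review would also weigh conditions and resource scoping.

```python
def overly_broad_statements(policy: dict) -> list[dict]:
    """Return Allow statements that use a wildcard action or a wildcard resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]  # a single statement may appear without a list
    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        # Both fields may be a single string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if wildcard_action or "*" in resources:
            flagged.append(statement)
    return flagged


if __name__ == "__main__":
    sample = {  # hypothetical policy for illustration
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
            {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        ],
    }
    for statement in overly_broad_statements(sample):
        print("too broad:", statement)
```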

Network Security

  • VPC/VNet isolation
  • Security groups as firewalls
  • Private subnets for backend services
  • VPN or Direct Connect for hybrid connectivity
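
Splitting a VPC or VNet address range into public and private subnets per zone can be planned with Python's `ipaddress` module before writing any IaC. This is a sketch under assumed values: the `10.0.0.0/16` range, the `/24` subnet size, and the zone names are placeholders.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, zones: list[str], new_prefix: int = 24) -> dict[str, str]:
    """Assign one public and one private subnet per zone from a VPC CIDR block."""
    network = ipaddress.ip_network(vpc_cidr)
    blocks = network.subnets(new_prefix=new_prefix)  # generator of equal-size subnets
    plan = {}
    for tier in ("public", "private"):
        for zone in zones:
            # Backend services would go in the private-* subnets.
            plan[f"{tier}-{zone}"] = str(next(blocks))
    return plan


if __name__ == "__main__":
    for name, cidr in plan_subnets("10.0.0.0/16", ["zone-a", "zone-b"]).items():
        print(f"{name}: {cidr}")
```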

Data Protection

  • Encryption at rest (KMS)
  • Encryption in transit (TLS)
  • Key rotation policies
  • Backup and recovery testing

Monitoring & Observability

Key Metrics

  • CPU, memory, and disk utilization
  • Network throughput and latency
  • Error rates and types
  • Cost per service/team

Alerting Strategy

  • Set thresholds based on baselines
  • Alert on symptoms, not causes
  • Runbooks for each alert
  • Escalation paths defined
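
Setting thresholds from baselines can be as simple as alerting when a symptom metric, such as request latency, exceeds its recent mean by a few standard deviations. The sketch below is one possible approach; the three-sigma default and the sample latencies are assumptions, and skewed metrics may be better served by percentiles.

```python
import statistics


def baseline_threshold(samples: list[float], sigmas: float = 3.0) -> float:
    """Alert threshold: baseline mean plus `sigmas` sample standard deviations."""
    if len(samples) < 2:
        raise ValueError("need at least two baseline samples")
    return statistics.mean(samples) + sigmas * statistics.stdev(samples)


def should_alert(value: float, samples: list[float], sigmas: float = 3.0) -> bool:
    """True when the current value exceeds the baseline-derived threshold."""
    return value > baseline_threshold(samples, sigmas)


if __name__ == "__main__":
    latency_ms = [100, 110, 90, 105, 95]  # hypothetical baseline latencies
    print(f"threshold: {baseline_threshold(latency_ms):.1f} ms")
    print("alert at 150 ms:", should_alert(150, latency_ms))
    print("alert at 110 ms:", should_alert(110, latency_ms))
```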

Reference Files

  • references/terraform_patterns.md - IaC patterns and examples
  • references/cost_optimization.md - Detailed cost reduction strategies

Integration with Other Skills

  • security-engineering - For security architecture
  • network-engineering - For network design
  • performance - For optimization strategies
  • devops-runbooks - For operational procedures