infra-engineer


$ Installer

git clone https://github.com/samhvw8/dotfiles /tmp/dotfiles && cp -r /tmp/dotfiles/dot_claude/skills/infra-engineer ~/.claude/skills/dotfiles



name: infra-engineer
description: "Comprehensive infrastructure engineering covering DevOps, cloud platforms, FinOps, and DevSecOps. Platforms: AWS (EC2, Lambda, S3, ECS, EKS, RDS, CloudFormation), Azure basics, Cloudflare (Workers, R2, D1, Pages), GCP (GKE, Cloud Run, Cloud Storage), Docker, Kubernetes. Capabilities: CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins), GitOps, infrastructure as code (Terraform, CloudFormation), container orchestration, cost optimization, security scanning, vulnerability management, secrets management, compliance (SOC2, HIPAA). Actions: deploy, configure, manage, scale, monitor, secure, optimize cloud infrastructure. Keywords: AWS, EC2, Lambda, S3, ECS, EKS, RDS, CloudFormation, Azure, Kubernetes, k8s, Docker, Terraform, CI/CD, GitHub Actions, GitLab CI, Jenkins, ArgoCD, Flux, cost optimization, FinOps, reserved instances, spot instances, security scanning, SAST, DAST, vulnerability management, secrets management, Vault, compliance, monitoring, observability. Use when: deploying to AWS/Azure/GCP/Cloudflare, setting up CI/CD pipelines, implementing GitOps workflows, managing Kubernetes clusters, optimizing cloud costs, implementing security best practices, managing infrastructure as code, container orchestration, compliance requirements, cost analysis and optimization."
license: MIT
version: 2.0.0

Infrastructure Engineering Skill

Comprehensive guide for modern infrastructure engineering covering DevOps practices, multi-cloud platforms (AWS, Azure, GCP, Cloudflare), FinOps cost optimization, and DevSecOps security practices.

When to Use This Skill

Use this skill when:

  • DevOps: Setting up CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins), implementing GitOps workflows (ArgoCD, Flux)
  • AWS: Deploying to EC2, Lambda, ECS, EKS, managing S3, RDS, using CloudFormation/CDK
  • Azure: Working with Azure VMs, App Service, AKS, Azure Functions, Storage Accounts
  • GCP: Managing Compute Engine, GKE, Cloud Run, Cloud Storage, App Engine
  • Cloudflare: Deploying Workers, R2 storage, D1 databases, Pages applications
  • Kubernetes: Managing clusters, deployments, services, ingress, Helm charts, operators
  • Docker: Containerizing applications, multi-stage builds, Docker Compose, registries
  • FinOps: Analyzing cloud costs, optimizing spend, reserved instances, spot instances, rightsizing
  • DevSecOps: Security scanning (SAST/DAST), vulnerability management, secrets management, compliance
  • IaC: Terraform, CloudFormation, Pulumi, configuration management
  • Monitoring: Setting up observability, logging, metrics, alerting, distributed tracing

Platform Selection Guide

When to Use AWS

Best For:

  • General-purpose cloud computing at scale
  • Mature ecosystem with 200+ services
  • Enterprise workloads with compliance requirements
  • Hybrid cloud with AWS Outposts
  • Extensive third-party integrations
  • Advanced networking and security controls

Key Services:

  • EC2 (virtual machines, flexible compute)
  • Lambda (serverless functions, event-driven)
  • ECS/EKS (container orchestration)
  • S3 (object storage, industry standard)
  • RDS (managed relational databases)
  • DynamoDB (NoSQL, global tables)
  • CloudFormation/CDK (infrastructure as code)
  • IAM (identity and access management)
  • VPC (virtual private cloud networking)

Cost Profile: Pay-as-you-go, reserved instances (up to 72% discount), savings plans, spot instances (up to 90% discount)

When to Use Azure

Best For:

  • Microsoft-centric organizations (.NET, Active Directory)
  • Hybrid cloud scenarios (Azure Arc, Azure Stack)
  • Enterprise agreements with Microsoft
  • Windows Server and SQL Server workloads
  • Integration with Microsoft 365 and Dynamics
  • Broad compliance coverage (90+ certifications)

Key Services:

  • Virtual Machines (Windows/Linux compute)
  • App Service (PaaS for web apps)
  • AKS (managed Kubernetes)
  • Azure Functions (serverless compute)
  • Storage Accounts (Blob, File, Queue, Table)
  • SQL Database (managed SQL Server)
  • Microsoft Entra ID, formerly Azure Active Directory (identity management)
  • ARM Templates/Bicep (infrastructure as code)

Cost Profile: Pay-as-you-go, reserved instances, Azure Hybrid Benefit for Windows/SQL Server licenses

When to Use Cloudflare

Best For:

  • Edge-first applications with global distribution
  • Ultra-low latency requirements (<50ms)
  • Static sites with serverless functions
  • Zero egress cost scenarios (R2 storage)
  • WebSocket/real-time applications (Durable Objects)
  • AI/ML at the edge (Workers AI)

Key Products:

  • Workers (serverless functions)
  • R2 (object storage, S3-compatible)
  • D1 (SQLite database with global replication)
  • KV (key-value store)
  • Pages (static hosting + functions)
  • Durable Objects (stateful compute)
  • Browser Rendering (headless browser automation)

Cost Profile: Pay-per-request, generous free tier, zero egress fees

When to Use Kubernetes

Best For:

  • Container orchestration at scale
  • Microservices architectures with 10+ services
  • Multi-cloud and hybrid deployments
  • Self-healing and auto-scaling workloads
  • Complex deployment strategies (blue/green, canary)
  • Service mesh architectures (Istio, Linkerd)
  • Stateful applications with operators

Key Features:

  • Declarative configuration (YAML manifests)
  • Automated rollouts and rollbacks
  • Service discovery and load balancing
  • Self-healing (restarts failed containers)
  • Horizontal pod autoscaling
  • Secret and configuration management
  • Storage orchestration
  • Batch job execution
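Horizontal pod autoscaling follows a documented ratio rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below reproduces that core calculation plus the controller's tolerance band (0.1 by default) and the min/max bounds; it deliberately leaves out stabilization windows and multi-metric handling, and the function name is hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Sketch of the HPA scaling rule for one metric (e.g. average CPU utilization %)."""
    ratio = current_metric / target_metric
    # Close enough to the target: the controller leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the HPA's configured minReplicas/maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 3 pods averaging 90% CPU against a 60% target: ceil(3 * 1.5) = 5 pods
    print(desired_replicas(3, 90, 60))  # 5
```

The tolerance check is why small fluctuations around the target do not cause constant scaling churn.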

Managed Options: EKS (AWS), AKS (Azure), GKE (GCP), managed k8s providers

Cost Profile: Cluster management fees + node costs (optimize with spot instances, cluster autoscaling)

When to Use Docker

Best For:

  • Local development consistency
  • Microservices architectures
  • Multi-language stack applications
  • Traditional VPS/VM deployments
  • Foundation for Kubernetes workloads
  • CI/CD build environments
  • Database containerization (dev/test)

Key Capabilities:

  • Application isolation and portability
  • Multi-stage builds for optimization
  • Docker Compose for multi-container apps
  • Volume management for data persistence
  • Network configuration and service discovery
  • Cross-platform compatibility (amd64, arm64)
  • BuildKit for improved build performance

Cost Profile: Infrastructure cost only (compute + storage), no orchestration overhead

When to Use Google Cloud

Best For:

  • Enterprise-scale applications
  • Data analytics and ML pipelines (BigQuery, Vertex AI)
  • Hybrid/multi-cloud deployments
  • Kubernetes at scale (GKE)
  • Managed databases (Cloud SQL, Firestore, Spanner)
  • Complex IAM and compliance requirements

Key Services:

  • Compute Engine (VMs)
  • GKE (managed Kubernetes)
  • Cloud Run (containerized serverless)
  • App Engine (PaaS)
  • Cloud Storage (object storage)
  • Cloud SQL (managed databases)

Cost Profile: Varied pricing, sustained use discounts, committed use discounts

Quick Start

AWS Lambda Function

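# Install AWS CLI v2 (Linux x86_64 installer)
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip && sudo ./aws/install

# Install SAM CLI (a separate tool from the AWS CLI; AWS also provides native installers)
pip install aws-sam-cli

# Configure credentials (or `aws configure sso` for IAM Identity Center)
aws configure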

# Create Lambda function with SAM
sam init --runtime python3.11
sam build && sam deploy --guided

See: references/aws-lambda.md
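With the python3.11 runtime, the function code is a module-level handler that Lambda calls with an event and a context object. A minimal sketch, assuming an API Gateway proxy-style event where query parameters arrive under `queryStringParameters` (None when absent); the handler name must match the `Handler` property in your SAM template (e.g. `app.lambda_handler` if saved as app.py), and the response body is illustrative.

```python
import json


def lambda_handler(event, context):
    """Echo a query parameter back as a JSON HTTP response (API Gateway proxy format)."""
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }


if __name__ == "__main__":
    # Local smoke test without AWS; the context object is unused, so None is fine here.
    print(lambda_handler({"queryStringParameters": {"name": "sam"}}, None))
```

Because the handler is a plain function, it can be unit-tested directly before `sam local invoke` exercises it inside a Lambda-like container.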

AWS EKS Kubernetes Cluster

# Install eksctl
brew install eksctl  # or curl download

# Create cluster
eksctl create cluster \
  --name my-cluster \
  --region us-west-2 \
  --nodegroup-name standard-workers \
  --node-type t3.medium \
  --nodes 3 \
  --nodes-min 1 \
  --nodes-max 4

See: references/kubernetes-basics.md

Azure Deployment

# Install Azure CLI
curl -L https://aka.ms/InstallAzureCli | bash

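# Login and create resources
az login
az group create --name myResourceGroup --location eastus
# A web app needs an App Service plan (B1 Linux here)
az appservice plan create --resource-group myResourceGroup \
  --name myPlan --sku B1 --is-linux
# The app name must be globally unique
az webapp create --resource-group myResourceGroup --plan myPlan \
  --name myapp --runtime "NODE:20-lts"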

See: references/azure-basics.md

Cloudflare Workers

# Install Wrangler CLI
npm install -g wrangler

# Create and deploy Worker
wrangler init my-worker
cd my-worker
wrangler deploy

See: references/cloudflare-workers-basics.md

Kubernetes Deployment

# Create deployment
kubectl create deployment nginx --image=nginx:latest
kubectl expose deployment nginx --port=80 --type=LoadBalancer

# Apply from manifest
kubectl apply -f deployment.yaml

# Check status
kubectl get pods,services,deployments

See: references/kubernetes-basics.md
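The `deployment.yaml` applied above is not shown. As a sketch, a manifest equivalent to the imperative `kubectl create deployment nginx` can be generated programmatically; kubectl accepts JSON as well as YAML, so Python's standard `json` module is enough. The replica count, the `app: nginx` label, and the pinned `nginx:1.27` tag are assumptions.

```python
import json


def nginx_deployment(name: str = "nginx", image: str = "nginx:1.27", replicas: int = 2) -> dict:
    """Build a minimal apps/v1 Deployment manifest as a Python dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels or the API server rejects it.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": 80}]}
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Usage: python gen_deployment.py > deployment.json && kubectl apply -f deployment.json
    print(json.dumps(nginx_deployment(), indent=2))
```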

Docker Container

# Create Dockerfile (quoted 'EOF' prevents shell expansion inside the heredoc)
cat > Dockerfile <<'EOF'
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
EOF

# Build and run
docker build -t myapp .
docker run -p 3000:3000 myapp

See: references/docker-basics.md

Reference Navigation

AWS (Amazon Web Services)

  • aws-overview.md - AWS fundamentals, account setup, IAM basics
  • aws-ec2.md - EC2 instances, AMIs, security groups, auto-scaling
  • aws-lambda.md - Serverless functions, SAM, event sources, layers
  • aws-ecs-eks.md - Container orchestration, ECS vs EKS, Fargate
  • aws-s3-rds.md - S3 storage, RDS databases, backup strategies
  • aws-cloudformation.md - Infrastructure as code, CDK, best practices
  • aws-networking.md - VPC, subnets, security groups, load balancers

Azure (Microsoft Azure)

  • azure-basics.md - Azure fundamentals, subscriptions, resource groups
  • azure-compute.md - VMs, App Service, AKS, Azure Functions
  • azure-storage.md - Storage Accounts, Blob, Files, managed disks

Cloudflare Platform

  • cloudflare-platform.md - Edge computing overview, key components
  • cloudflare-workers-basics.md - Getting started, handler types, basic patterns
  • cloudflare-workers-advanced.md - Advanced patterns, performance, optimization
  • cloudflare-workers-apis.md - Runtime APIs, bindings, integrations
  • cloudflare-r2-storage.md - R2 object storage, S3 compatibility, best practices
  • cloudflare-d1-kv.md - D1 SQLite database, KV store, use cases
  • browser-rendering.md - Puppeteer/Playwright automation on Cloudflare

Kubernetes & Container Orchestration

  • kubernetes-basics.md - Core concepts, pods, deployments, services
  • kubernetes-advanced.md - StatefulSets, operators, custom resources
  • kubernetes-networking.md - Ingress, service mesh, network policies
  • helm-charts.md - Package management, charts, repositories

Docker Containerization

  • docker-basics.md - Core concepts, Dockerfile, images, containers
  • docker-compose.md - Multi-container apps, networking, volumes
  • docker-security.md - Image scanning, secrets, best practices

Google Cloud Platform

  • gcloud-platform.md - GCP overview, gcloud CLI, authentication
  • gcloud-services.md - Compute Engine, GKE, Cloud Run, App Engine

CI/CD & GitOps

  • cicd-github-actions.md - GitHub Actions workflows, runners, secrets
  • cicd-gitlab.md - GitLab CI/CD pipelines, artifacts, caching
  • gitops-argocd.md - ArgoCD setup, app of apps pattern, sync policies
  • gitops-flux.md - Flux controllers, GitOps toolkit, multi-tenancy

FinOps (Cost Optimization)

  • finops-basics.md - Cost optimization principles, FinOps lifecycle
  • finops-aws.md - AWS cost optimization, RI, savings plans, spot
  • finops-azure.md - Azure cost management, reservations, hybrid benefit
  • finops-gcp.md - GCP cost optimization, committed use, sustained use
  • finops-tools.md - Cost analysis tools, Kubecost, CloudHealth, Infracost

DevSecOps (Security)

  • devsecops-basics.md - Security best practices, shift-left security
  • devsecops-scanning.md - SAST, DAST, SCA, container scanning
  • secrets-management.md - Vault, AWS Secrets Manager, sealed secrets
  • compliance.md - SOC2, HIPAA, PCI-DSS, audit logging

Infrastructure as Code

  • terraform-basics.md - Terraform fundamentals, providers, state
  • terraform-advanced.md - Modules, workspaces, remote state
  • cloudformation-basics.md - CloudFormation templates, stacks, change sets

Utilities & Scripts

  • scripts/cloudflare-deploy.py - Automate Cloudflare Worker deployments
  • scripts/docker-optimize.py - Analyze and optimize Dockerfiles
  • scripts/cost-analyzer.py - Cloud cost analysis and reporting
  • scripts/security-scanner.py - Automated security scanning

Common Workflows

Multi-Cloud Architecture

# Edge Layer: Cloudflare Workers (global routing, caching)
# Compute Layer: AWS ECS/Lambda or Azure App Service (application logic)
# Data Layer: AWS RDS or Azure SQL (persistent storage)
# CDN/Storage: Cloudflare R2 or AWS S3 (static assets)

Benefits:
- Best-of-breed services per layer
- Geographic redundancy
- Cost optimization across providers

AWS ECS Deployment with CI/CD

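# GitHub Actions workflow (.github/workflows/deploy.yml); secret, region, and resource names are placeholders
name: Deploy to ECS
on: push
permissions: { id-token: write, contents: read }   # OIDC federation with AWS
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with: { role-to-assume: "${{ secrets.AWS_ROLE_ARN }}", aws-region: us-west-2 }
      - { id: ecr, uses: aws-actions/amazon-ecr-login@v2 }   # log in to ECR
      - id: img   # build Docker image and push to ECR
        run: docker build -t "$IMAGE" . && docker push "$IMAGE" && echo "image=$IMAGE" >> "$GITHUB_OUTPUT"
        env: { IMAGE: "${{ steps.ecr.outputs.registry }}/myapp:${{ github.sha }}" }
      - id: td    # update ECS task definition with the new image
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with: { task-definition: task-definition.json, container-name: myapp, image: "${{ steps.img.outputs.image }}" }
      - uses: aws-actions/amazon-ecs-deploy-task-definition@v2   # deploy to ECS service, wait for stabilization
        with: { task-definition: "${{ steps.td.outputs.task-definition }}", cluster: my-cluster, service: my-service, wait-for-service-stability: true }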

Kubernetes GitOps with ArgoCD

# Git repository structure
/apps
  /production
    - deployment.yaml
    - service.yaml
    - ingress.yaml
  /staging
    - deployment.yaml

# ArgoCD syncs cluster state from Git
# Changes: Git commit → ArgoCD detects drift → sync to cluster (automatic only with an automated sync policy; manual otherwise)
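The detect-and-sync loop can be illustrated with a toy reconciler that compares the desired state read from Git with the live cluster state and derives the actions a sync would take. This is a conceptual sketch, not ArgoCD's implementation; resources are modeled as plain dicts keyed by hypothetical `kind/name` strings, and the `prune` flag mirrors the idea that deleting resources missing from Git is opt-in.

```python
from typing import Dict, List, Tuple

Manifest = Dict[str, object]


def plan_sync(desired: Dict[str, Manifest], live: Dict[str, Manifest],
              prune: bool = False) -> List[Tuple[str, str]]:
    """Return (action, resource_key) pairs that would bring live state to desired state."""
    actions: List[Tuple[str, str]] = []
    for key, manifest in sorted(desired.items()):
        if key not in live:
            actions.append(("create", key))
        elif live[key] != manifest:
            # Drift: the cluster was changed by hand, or Git moved ahead.
            actions.append(("update", key))
    if prune:
        # Resources present in the cluster but no longer in Git are deleted only when pruning.
        for key in sorted(set(live) - set(desired)):
            actions.append(("delete", key))
    return actions


if __name__ == "__main__":
    desired = {"Deployment/web": {"replicas": 3}, "Service/web": {"port": 80}}
    live = {"Deployment/web": {"replicas": 1}, "ConfigMap/old": {}}
    print(plan_sync(desired, live, prune=True))
    # [('update', 'Deployment/web'), ('create', 'Service/web'), ('delete', 'ConfigMap/old')]
```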

Multi-Stage Docker Build

# Build stage
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

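# Production stage: install only runtime dependencies
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
# Run as the unprivileged "node" user that ships with the official image
USER node
CMD ["node", "dist/server.js"]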
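Some of the practices this build illustrates (a separate build stage, a pinned base image, a non-root USER in the final stage) can be checked mechanically. The sketch below is a hypothetical line-based linter, not the contents of scripts/docker-optimize.py; it ignores line continuations and parser directives, so treat it as a first filter.

```python
from typing import List


def lint_dockerfile(text: str) -> List[str]:
    """Return warnings for Dockerfile text using simple line-based checks."""
    # Keep non-empty, non-comment lines; continuations and parser directives are not handled.
    lines = [line.strip() for line in text.splitlines()]
    instructions = [line for line in lines if line and not line.startswith("#")]
    warnings: List[str] = []
    stage_names = set()
    from_count = 0
    users: List[str] = []  # USER values seen in the current (eventually final) stage
    for line in instructions:
        tokens = line.split()
        keyword = tokens[0].upper()
        if keyword == "USER" and len(tokens) > 1:
            users.append(tokens[1])
        if keyword != "FROM":
            continue
        from_count += 1
        users = []  # a new stage starts with no USER of its own
        # Skip flags such as --platform=linux/amd64 to reach the image reference.
        args = [t for t in tokens[1:] if not t.startswith("--")]
        image = args[0]
        is_stage = image in stage_names
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2])
        if is_stage or image == "scratch":
            continue  # an earlier build stage, or the empty base image
        # Drop the registry host (which may contain a port) before looking for a tag.
        name = image.rsplit("/", 1)[-1]
        if name.endswith(":latest") or (":" not in name and "@" not in name):
            warnings.append(f"unpinned base image: {image}")
    if from_count < 2:
        warnings.append("single-stage build: consider a separate build stage")
    if not users or users[-1].split(":")[0] in ("root", "0"):
        warnings.append("runs as root: add a non-root USER")
    return warnings
```

Run against the multi-stage Dockerfile above, it should report no warnings; a single-stage `FROM ubuntu:latest` file triggers all three checks.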

FinOps Cost Optimization Workflow

# 1. Discovery: Identify untagged resources
# 2. Analysis: Right-size instances (CPU/memory utilization)
# 3. Optimization:
#    - Convert to reserved instances (predictable workloads)
#    - Use spot instances (fault-tolerant workloads)
#    - Schedule start/stop (dev environments)
# 4. Monitoring: Set budget alerts, track savings
# 5. Governance: Enforce tagging policies
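Steps 1 and 5 (finding untagged resources, enforcing a tagging policy) reduce to comparing each resource's tags with a required set. A provider-neutral sketch, assuming resources have already been exported (for example from an inventory or cost report) into dicts with `id` and `tags` fields; the required tag keys and the record shape are hypothetical.

```python
from typing import Dict, Iterable, List, Set

REQUIRED_TAGS: Set[str] = {"owner", "cost-center", "environment"}  # hypothetical policy


def missing_tags(resources: Iterable[Dict], required: Set[str] = REQUIRED_TAGS) -> Dict[str, List[str]]:
    """Map resource id -> sorted required tag keys that are missing or empty."""
    report: Dict[str, List[str]] = {}
    for res in resources:
        tags = res.get("tags") or {}
        # An empty tag value is as useless for cost allocation as a missing tag.
        absent = sorted(k for k in required if not str(tags.get(k, "")).strip())
        if absent:
            report[res["id"]] = absent
    return report


if __name__ == "__main__":
    inventory = [
        {"id": "i-0abc", "tags": {"owner": "web-team", "cost-center": "cc-42", "environment": "prod"}},
        {"id": "vol-0def", "tags": {"owner": ""}},
        {"id": "bucket-logs"},
    ]
    print(missing_tags(inventory))
    # {'vol-0def': ['cost-center', 'environment', 'owner'], 'bucket-logs': ['cost-center', 'environment', 'owner']}
```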

DevSecOps Security Pipeline

# 1. Code Commit
# 2. SAST Scan: SonarQube, Semgrep (static code analysis)
# 3. Dependency Check: Snyk, Trivy (vulnerability scanning)
# 4. Build: Docker image
# 5. Container Scan: Trivy, Grype (image vulnerabilities)
# 6. DAST Scan: OWASP ZAP (runtime security testing)
# 7. Deploy: Only if all scans pass
# 8. Runtime Protection: Falco, AWS GuardDuty
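Step 7 (deploy only if all scans pass) is usually enforced by a gate that parses scanner output and fails the CI job. A sketch for a Trivy JSON report (`trivy image --format json`), assuming the report's `Results[].Vulnerabilities[]` layout with `Severity`, `VulnerabilityID`, and `PkgName` fields; the HIGH/CRITICAL threshold is an assumed policy. Trivy can also fail a step on its own with `--exit-code 1 --severity HIGH,CRITICAL`.

```python
import json
import sys
from typing import List, Set

BLOCKING: Set[str] = {"CRITICAL", "HIGH"}  # assumed policy threshold


def blocking_findings(report: dict, blocking: Set[str] = BLOCKING) -> List[str]:
    """Return 'ID (package)' strings for vulnerabilities at a blocking severity."""
    found = []
    for result in report.get("Results") or []:
        # Targets without findings may omit "Vulnerabilities" or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocking:
                found.append(f'{vuln.get("VulnerabilityID")} ({vuln.get("PkgName")})')
    return found


def main(path: str) -> int:
    with open(path) as f:
        findings = blocking_findings(json.load(f))
    for item in findings:
        print("BLOCKING:", item)
    # A non-zero exit code fails the CI step, which stops the deploy stage.
    return 1 if findings else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```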

Terraform Infrastructure Deployment

# 1. Write: Define infrastructure in .tf files
# 2. Init: terraform init (download providers)
# 3. Plan: terraform plan (preview changes)
# 4. Apply: terraform apply (create/update resources)
# 5. State: Store state remotely (e.g. S3) with locking (DynamoDB table, or S3-native lockfile in Terraform 1.10+)
# 6. Modules: Reuse common patterns across environments
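In a pipeline, the Plan and Apply steps are often split so that apply runs only when the plan contains changes. `terraform plan -detailed-exitcode` returns 0 for an empty diff, 1 for an error, and 2 for a non-empty diff. A sketch of a wrapper built on that contract; the plan file name `tfplan` is an assumption, and `terraform` must be on PATH when the wrapper is actually run.

```python
import subprocess
import sys


def interpret_plan_exit(code: int) -> str:
    """Map `terraform plan -detailed-exitcode` return codes to an outcome."""
    return {0: "no-changes", 1: "error", 2: "changes"}.get(code, "error")


def plan_and_maybe_apply(workdir: str = ".", auto_apply: bool = False) -> str:
    """Run plan; apply the saved plan only when it has changes and auto_apply is set."""
    plan = subprocess.run(
        ["terraform", "plan", "-input=false", "-detailed-exitcode", "-out=tfplan"],
        cwd=workdir,
    )
    outcome = interpret_plan_exit(plan.returncode)
    if outcome == "error":
        raise RuntimeError("terraform plan failed")
    if outcome == "changes" and auto_apply:
        # Applying the saved plan file applies exactly the changes that were planned.
        subprocess.run(["terraform", "apply", "-input=false", "tfplan"], cwd=workdir, check=True)
    return outcome


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tf_gate.py path/to/config [--apply]
    print(plan_and_maybe_apply(sys.argv[1], auto_apply="--apply" in sys.argv))
```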

Best Practices

DevOps

  • CI/CD: Automate testing and deployment, use feature flags for progressive rollouts
  • GitOps: Declarative infrastructure, Git as single source of truth, automated sync
  • Monitoring: Implement observability (logs, metrics, traces), set up alerting
  • Incident Management: Runbooks, postmortems, blameless culture
  • Automation: Infrastructure as code, configuration management, self-service platforms

Security (DevSecOps)

  • Shift Left: Security scanning early in pipeline (SAST, dependency checks)
  • Secrets Management: Use Vault, AWS Secrets Manager, or sealed secrets (never in code/Git)
  • Container Security: Run as non-root, minimal base images, regular scanning
  • Network Security: Zero-trust architecture, service mesh, network policies
  • Access Control: Least privilege IAM, MFA, temporary credentials
  • Compliance: Audit logging, encryption at rest/transit, regular security reviews
  • Runtime Protection: Security monitoring, intrusion detection, automated response
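Least privilege can be partly checked in CI by flagging IAM policy statements that allow wildcard actions or resources. A sketch over the standard IAM JSON policy format (`Statement`, `Effect`, `Action`, `Resource`); it does not evaluate `NotAction`, conditions, or partial wildcards such as `s3:Get*`, so it is a first filter rather than a policy analyzer.

```python
from typing import Dict, List, Union


def _as_list(value: Union[str, List[str], None]) -> List[str]:
    # IAM accepts either a single string or a list for Action and Resource.
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def overly_broad(policy: Dict) -> List[str]:
    """Return findings for Allow statements with wildcard actions or resources."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be an object, not a list
        statements = [statements]
    findings = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        sid = stmt.get("Sid", f"#{i}")
        for action in _as_list(stmt.get("Action")):
            if action == "*" or action.endswith(":*"):
                findings.append(f"{sid}: wildcard action {action}")
        if "*" in _as_list(stmt.get("Resource")):
            findings.append(f"{sid}: wildcard resource")
    return findings


if __name__ == "__main__":
    policy = {"Version": "2012-10-17", "Statement": [
        {"Sid": "ReadBucket", "Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::my-bucket/*"},
        {"Sid": "Admin", "Effect": "Allow", "Action": "*", "Resource": "*"},
    ]}
    print(overly_broad(policy))  # ['Admin: wildcard action *', 'Admin: wildcard resource']
```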

Cost Optimization (FinOps)

  • Tagging: Enforce resource tagging for cost allocation and tracking
  • Rightsizing: Analyze utilization, downsize over-provisioned resources
  • Reserved Capacity: Purchase RI/savings plans for predictable workloads (up to 72% discount)
  • Spot/Preemptible: Use for fault-tolerant workloads (up to 90% discount)
  • Scheduling: Auto-stop dev/test environments during off-hours
  • Storage Optimization: Lifecycle policies, archive to cheaper tiers, delete orphaned resources
  • Monitoring: Budget alerts, cost anomaly detection, chargeback/showback
  • Governance: Approval workflows for expensive resources, quota management
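Whether reserved capacity pays off depends on utilization: a commitment is billed for every hour of the term, while on-demand is billed only while running. The arithmetic below uses a hypothetical on-demand rate and discount (the 72% figure above is a provider maximum, not a typical rate) and models a full, no-flexibility commitment for one instance-month.

```python
def monthly_costs(on_demand_rate: float, discount: float, utilization: float,
                  hours: float = 730.0) -> dict:
    """Compare on-demand vs. reserved cost for one instance-month.

    on_demand_rate: hypothetical $/hour; discount: 0.40 means 40% off on-demand;
    utilization: fraction of the month the workload actually runs.
    """
    on_demand = on_demand_rate * hours * utilization
    reserved = on_demand_rate * (1 - discount) * hours  # paid whether used or not
    return {
        "on_demand": round(on_demand, 2),
        "reserved": round(reserved, 2),
        # The reservation is cheaper once utilization exceeds (1 - discount).
        "break_even_utilization": round(1 - discount, 4),
    }


if __name__ == "__main__":
    # Hypothetical $0.10/h instance with a 40% reservation discount
    print(monthly_costs(0.10, 0.40, utilization=1.0))  # always-on production service
    print(monthly_costs(0.10, 0.40, utilization=0.3))  # dev box stopped most of the time
```

At a 40% discount the break-even point is 60% utilization, which is why always-on workloads suit reservations while scheduled dev environments usually do not.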

Kubernetes

  • Resource Management: Set requests/limits, use horizontal pod autoscaling
  • High Availability: Multi-zone clusters, pod disruption budgets, anti-affinity rules
  • Security: RBAC, Pod Security Admission with Pod Security Standards (PodSecurityPolicy was removed in Kubernetes 1.25), network policies, admission controllers
  • Observability: Prometheus metrics, distributed tracing, centralized logging
  • GitOps: ArgoCD/Flux for declarative deployments, automatic drift correction
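The requests/limits rule is easy to enforce before manifests reach the cluster (admission controllers can enforce it server-side). A client-side sketch over a Deployment manifest already parsed into a dict, for example with a YAML library; the assumed policy requires CPU and memory requests plus a memory limit while leaving CPU limits optional, which is one common choice rather than a rule.

```python
from typing import Dict, List, Tuple

REQUIRED: Dict[str, Tuple[str, ...]] = {  # assumed policy
    "requests": ("cpu", "memory"),
    "limits": ("memory",),
}


def resource_violations(deployment: Dict, required: Dict[str, Tuple[str, ...]] = REQUIRED) -> List[str]:
    """List containers in a Deployment manifest that lack required requests/limits."""
    pod_spec = deployment["spec"]["template"]["spec"]
    problems = []
    # Init containers are checked too; they also reserve node resources while running.
    for c in pod_spec.get("initContainers", []) + pod_spec.get("containers", []):
        resources = c.get("resources") or {}
        for section, keys in required.items():
            present = resources.get(section) or {}
            for key in keys:
                if key not in present:
                    problems.append(f"{c['name']}: missing resources.{section}.{key}")
    return problems
```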

Performance

  • Compute: Auto-scaling, load balancing, multi-region for low latency
  • Caching: CDN, in-memory caching (Redis/Memcached), edge computing
  • Storage: Choose appropriate tier (SSD vs HDD), enable caching, CDN for static assets
  • Containers: Multi-stage builds, minimal images, layer caching
  • Databases: Connection pooling, read replicas, query optimization, indexing
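In-memory caching with Redis or Memcached usually follows the cache-aside pattern: read from the cache, fall back to the source on a miss, then store the result with a TTL. A single-process sketch with a plain dict standing in for the cache server; the loader function and the 60-second TTL are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Cache-aside helper: values expire `ttl` seconds after they are stored."""

    def __init__(self, ttl: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Any, Tuple[float, Any]] = {}

    def get_or_load(self, key: Any, loader: Callable[[Any], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh cache hit
        value = loader(key)  # miss or expired: read from the source (e.g. the database)
        self._store[key] = (now + self.ttl, value)
        return value
```

With Redis, the same flow maps to a GET, and on a miss a SET with an expiry (the EX option). The TTL bounds how stale a cached value can get, which is the main trade-off to tune per data type.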

Development

  • Local Development: Docker Compose for consistent environments, dev containers
  • Testing: Unit, integration, end-to-end tests in CI/CD pipeline
  • Infrastructure as Code: Terraform/CloudFormation for repeatability
  • Documentation: Architecture diagrams, runbooks, API documentation
  • Version Control: Git for code and infrastructure, semantic versioning

Decision Matrix

Need → Choose

Compute

  • Sub-50ms latency globally → Cloudflare Workers
  • Serverless functions (AWS ecosystem) → AWS Lambda
  • Serverless functions (Azure ecosystem) → Azure Functions
  • Containerized workloads (managed, no Kubernetes) → AWS ECS/Fargate, Azure Container Apps, GCP Cloud Run
  • Kubernetes at scale → AWS EKS, Azure AKS, GCP GKE
  • VMs with full control → AWS EC2, Azure VMs, GCP Compute Engine

Storage

  • Object storage → AWS S3, Cloudflare R2 (S3-compatible, zero egress), Azure Blob Storage
  • Block storage for VMs → AWS EBS, Azure Managed Disks, GCP Persistent Disk
  • File storage (NFS/SMB) → AWS EFS, Azure Files, GCP Filestore

Database

  • Managed SQL (AWS) → AWS RDS (PostgreSQL, MySQL, SQL Server)
  • Managed SQL (Azure) → Azure SQL Database
  • Managed SQL (GCP) → Cloud SQL
  • NoSQL key-value → AWS DynamoDB, Azure Cosmos DB, Cloudflare KV
  • Global SQL (edge reads) → Cloudflare D1, AWS Aurora Global Database

CI/CD & GitOps

  • GitHub-integrated CI/CD → GitHub Actions
  • Self-hosted CI/CD → GitLab CI/CD, Jenkins
  • Kubernetes GitOps → ArgoCD, Flux

Cost Optimization

  • Predictable workloads → Reserved Instances, Savings Plans
  • Fault-tolerant workloads → Spot Instances (AWS), Spot VMs (Azure), Spot/Preemptible VMs (GCP)
  • Dev/test environments → Auto-scheduling, budget alerts

Security

  • Secrets management → HashiCorp Vault, AWS Secrets Manager, Azure Key Vault
  • Container scanning → Trivy, Snyk, AWS ECR scanning
  • SAST/DAST → SonarQube, Semgrep (SAST); OWASP ZAP (DAST)

Special Use Cases

  • Static site + edge functions → Cloudflare Pages, AWS Amplify
  • WebSocket/real-time → Cloudflare Durable Objects, AWS API Gateway WebSocket APIs
  • ML/AI pipelines → AWS SageMaker, GCP Vertex AI, Azure ML
  • Browser automation → Cloudflare Browser Rendering, AWS Lambda + Puppeteer


Implementation Checklist

AWS Lambda Deployment

  • Install AWS CLI and SAM CLI
  • Configure AWS credentials (prefer IAM Identity Center/SSO or other temporary credentials over long-lived access keys)
  • Create Lambda function with SAM template
  • Configure IAM role and policies
  • Test locally with sam local invoke
  • Deploy with sam deploy
  • Set up CloudWatch monitoring and alarms

AWS EKS Kubernetes Cluster

  • Install kubectl, eksctl, aws-cli
  • Configure AWS credentials
  • Create EKS cluster with eksctl
  • Configure kubectl context
  • Install cluster autoscaler
  • Set up Helm for package management
  • Deploy applications with kubectl/Helm
  • Configure ingress controller (ALB/NGINX)

Azure Deployment

  • Install Azure CLI
  • Login with az login
  • Create resource group
  • Deploy App Service or AKS
  • Configure continuous deployment
  • Set up monitoring with Application Insights

Kubernetes on Any Cloud

  • Install kubectl and helm
  • Connect to cluster (update kubeconfig)
  • Create namespaces for environments
  • Apply RBAC policies
  • Deploy applications (deployments, services)
  • Configure ingress for external access
  • Set up monitoring (Prometheus, Grafana)
  • Implement GitOps with ArgoCD/Flux

CI/CD Pipeline (GitHub Actions)

  • Create .github/workflows/deploy.yml
  • Configure secrets (cloud credentials, API keys)
  • Add build and test jobs
  • Add container build and push to registry
  • Add deployment job to cloud platform
  • Set up branch protection rules
  • Enable status checks and notifications

FinOps Cost Optimization

  • Implement resource tagging strategy
  • Enable cost allocation tags
  • Set up budget alerts
  • Analyze resource utilization (CloudWatch, Azure Monitor)
  • Identify rightsizing opportunities
  • Purchase reserved instances for predictable workloads
  • Configure auto-scaling and scheduling
  • Regular cost reviews and optimization

DevSecOps Security

  • Add SAST scanning to CI/CD (SonarQube, Semgrep)
  • Add dependency scanning (Snyk, Trivy)
  • Implement container image scanning
  • Set up secrets management (Vault, cloud provider)
  • Configure security groups and network policies
  • Enable audit logging
  • Implement security monitoring and alerting
  • Regular vulnerability assessments

Cloudflare Workers

  • Install Wrangler CLI
  • Create Worker project
  • Configure wrangler.toml (bindings, routes)
  • Test locally with wrangler dev
  • Deploy with wrangler deploy

Docker

  • Write Dockerfile with multi-stage builds
  • Create .dockerignore file
  • Test build locally
  • Push to registry (ECR, ACR, GCP Artifact Registry, Docker Hub)
  • Deploy to target platform