
ops-cost-optimization

Structured workflow for cloud cost analysis and optimization including rightsizing, reserved capacity planning, and FinOps practices.

Install

git clone https://github.com/LerianStudio/ring /tmp/ring && cp -r /tmp/ring/ops-team/skills/ops-cost-optimization ~/.claude/skills/ring



name: ops-cost-optimization
description: |
  Structured workflow for cloud cost analysis and optimization including rightsizing, reserved capacity planning, and FinOps practices.

trigger: |

  • Monthly/quarterly cost reviews
  • Cost anomaly investigation
  • Budget overrun alerts
  • Reserved instance planning

skip_when: |

  • Capacity planning focus -> use ops-capacity-planning
  • One-time cost question -> direct query
  • Application performance -> use ring-dev-team specialists

related:
  similar: [ops-capacity-planning]
  uses: [cloud-cost-optimizer]

Cost Optimization Workflow

This skill defines the structured process for cloud cost optimization. Use it for systematic cost analysis and data-driven optimization.


Cost Optimization Phases

| Phase | Focus | Output |
|-------|-------|--------|
| 1. Cost Visibility | Understand current spend | Cost breakdown |
| 2. Anomaly Detection | Identify unusual spend | Anomaly report |
| 3. Optimization Analysis | Find savings opportunities | Opportunities list |
| 4. Risk Assessment | Evaluate optimization risks | Risk matrix |
| 5. Implementation | Execute optimizations | Cost reduction |
| 6. Monitoring | Track savings | Savings report |

Phase 1: Cost Visibility

Cost Breakdown Dimensions

Analyze costs across multiple dimensions:

| Dimension | Purpose | Tool |
|-----------|---------|------|
| Service | Which AWS services cost most | Cost Explorer |
| Account | Which accounts spend most | Cost Explorer |
| Tag | Cost by team/project/environment | Cost Allocation Tags |
| Resource | Individual resource costs | Cost Explorer |
| Time | Cost trends over time | Cost Explorer |
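As a minimal sketch of pulling these dimensions programmatically, assuming boto3 and Cost Explorer's `get_cost_and_usage` API: the request itself is shown only as a comment (it needs AWS credentials), and the hypothetical helper `costs_by_group` parses the response shape, so it can be checked offline with a hand-built dictionary. Swapping the `GroupBy` entry switches between the Service, Account, and Tag views.

```python
from collections import defaultdict

# Example request (needs AWS credentials, so it is not executed here):
#   import boto3
#   ce = boto3.client("ce")
#   response = ce.get_cost_and_usage(
#       TimePeriod={"Start": "2024-03-01", "End": "2024-04-01"},  # End is exclusive
#       Granularity="MONTHLY",
#       Metrics=["UnblendedCost"],
#       GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
#   )
# Other views: {"Type": "DIMENSION", "Key": "LINKED_ACCOUNT"} for accounts,
# {"Type": "TAG", "Key": "Team"} for a cost allocation tag.


def costs_by_group(response: dict, metric: str = "UnblendedCost") -> dict:
    """Sum a grouped Cost Explorer response into {group key: total cost}."""
    totals = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            key = group["Keys"][0]  # tag groups are keyed like "Team$platform"
            # Cost Explorer returns amounts as strings, e.g. "1234.56".
            totals[key] += float(group["Metrics"][metric]["Amount"])
    return dict(totals)
```

Summing across `ResultsByTime` lets the same helper collapse a multi-month request into one total per group.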

Cost Visibility Template

## Cost Visibility Report

**Period:** [Month YYYY]
**Total Spend:** $XX,XXX
**Budget:** $XX,XXX
**Variance:** [+/-X%]

### Cost by Service

| Service | Cost | % of Total | MoM Change |
|---------|------|------------|------------|
| EC2 | $X,XXX | XX% | +X% |
| RDS | $X,XXX | XX% | +X% |
| S3 | $X,XXX | XX% | +X% |
| Data Transfer | $X,XXX | XX% | +X% |
| Other | $X,XXX | XX% | +X% |

### Cost by Environment

| Environment | Cost | % of Total |
|-------------|------|------------|
| Production | $X,XXX | XX% |
| Staging | $X,XXX | XX% |
| Development | $X,XXX | XX% |

### Cost by Team

| Team | Cost | % of Total |
|------|------|------------|
| Platform | $X,XXX | XX% |
| API | $X,XXX | XX% |
| Data | $X,XXX | XX% |
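The % of Total, MoM Change, and Variance fields in this template are simple ratios. A sketch, assuming per-item costs for the current and previous month are already available as dictionaries (for example, exported from Cost Explorer); `cost_table` and `budget_variance_pct` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, float, float, Optional[float]]


def cost_table(current: Dict[str, float], previous: Dict[str, float]) -> List[Row]:
    """Return (name, cost, % of total, MoM change %) rows, largest cost first.

    MoM change is None when the item had no spend in the previous month,
    since a percentage change from zero is undefined.
    """
    total = sum(current.values())
    rows: List[Row] = []
    for name, cost in sorted(current.items(), key=lambda kv: kv[1], reverse=True):
        share = 100.0 * cost / total if total else 0.0
        prev = previous.get(name, 0.0)
        mom = round(100.0 * (cost - prev) / prev, 1) if prev else None
        rows.append((name, round(cost, 2), round(share, 1), mom))
    return rows


def budget_variance_pct(total_spend: float, budget: float) -> float:
    """Spend vs budget in percent; positive means over budget."""
    return round(100.0 * (total_spend - budget) / budget, 1)
```

The same `cost_table` call fills the environment and team tables when given environment- or team-keyed dictionaries.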

Tagging Requirements

Minimum required tags for cost allocation:

| Tag | Purpose | Example |
|-----|---------|---------|
| Environment | Environment separation | prod, staging, dev |
| Team | Cost ownership | platform, api, data |
| Service | Service identification | api-gateway, auth |
| CostCenter | Financial allocation | CC-1234 |
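A minimal compliance check for these four tags, assuming resources have already been collected into hypothetical `{"id": ..., "tags": {...}}` records (for instance, from the Resource Groups Tagging API). Coverage here is counted per resource; weighting by cost is an equally valid choice. AWS tag keys are case-sensitive, so `environment` does not satisfy `Environment`.

```python
from typing import Dict, List

# The four minimum tags from the table above.
REQUIRED_TAGS = ("Environment", "Team", "Service", "CostCenter")


def missing_tags(tags: Dict[str, str]) -> List[str]:
    """Required tag keys that are absent or empty (keys are case-sensitive)."""
    return [key for key in REQUIRED_TAGS if not tags.get(key)]


def tag_coverage_pct(resources: List[dict]) -> float:
    """Percent of resources carrying every required tag."""
    if not resources:
        return 100.0
    compliant = sum(1 for r in resources if not missing_tags(r.get("tags", {})))
    return 100.0 * compliant / len(resources)


def untagged_report(resources: List[dict]) -> Dict[str, List[str]]:
    """Map each non-compliant resource id to the tags it is missing."""
    report = {}
    for r in resources:
        gaps = missing_tags(r.get("tags", {}))
        if gaps:
            report[r["id"]] = gaps
    return report
```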

Phase 2: Anomaly Detection

Anomaly Detection Rules

| Rule | Threshold | Alert |
|------|-----------|-------|
| Daily spend spike | >20% vs 7-day avg | Warning |
| Service cost jump | >50% vs last month | Critical |
| New service appears | Any new service >$100/day | Info |
| Tag coverage drop | <95% coverage | Warning |
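The four rules translate directly into threshold checks. A sketch, assuming daily totals and per-service costs are already aggregated; `detect_anomalies` and its inputs are hypothetical. It treats a service as new when it had no spend last month, and it assumes the two monthly figures are comparable (both full months, or month-to-date projected to a full month).

```python
from statistics import mean
from typing import Dict, List, Tuple


def detect_anomalies(
    daily_spend: List[float],
    service_this_month: Dict[str, float],
    service_last_month: Dict[str, float],
    service_today: Dict[str, float],
    tag_coverage_pct: float,
) -> List[Tuple[str, str]]:
    """Return (alert, severity) pairs for the four detection rules."""
    alerts: List[Tuple[str, str]] = []

    # Daily spend spike: latest day >20% above the average of the 7 days before it.
    if len(daily_spend) >= 8:
        baseline = mean(daily_spend[-8:-1])
        if baseline > 0 and daily_spend[-1] > 1.2 * baseline:
            alerts.append(("Daily spend spike", "Warning"))

    # Service cost jump: >50% above last month's cost for the same service.
    for svc, cost in service_this_month.items():
        prev = service_last_month.get(svc, 0.0)
        if prev > 0 and cost > 1.5 * prev:
            alerts.append((f"Service cost jump: {svc}", "Critical"))

    # New service: no spend last month, now more than $100/day.
    for svc, cost in service_today.items():
        if service_last_month.get(svc, 0.0) == 0 and cost > 100:
            alerts.append((f"New service: {svc}", "Info"))

    # Tag coverage drop: below 95%.
    if tag_coverage_pct < 95.0:
        alerts.append(("Tag coverage drop", "Warning"))

    return alerts
```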

Anomaly Investigation

When an anomaly is detected:

  1. Identify the spike:

    • Which service/resource?
    • When did it start?
    • What changed?
  2. Check common causes:

    • New deployment
    • Traffic increase
    • Data growth
    • Misconfiguration
    • Forgotten resources
  3. Validate intentionality:

    • Expected growth?
    • Approved change?
    • One-time vs recurring?
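Two of the investigation questions, "When did it start?" and "One-time vs recurring?", can be answered from the daily spend series. A heuristic sketch reusing the 20% / 7-day threshold from the detection rules; `spike_start`, `is_recurring`, and the three-day persistence window are assumptions, not part of the workflow.

```python
from statistics import mean
from typing import List, Optional


def spike_start(daily_spend: List[float], window: int = 7, threshold: float = 0.20) -> Optional[int]:
    """Index of the first day exceeding its trailing `window`-day average by more than threshold."""
    for i in range(window, len(daily_spend)):
        baseline = mean(daily_spend[i - window:i])
        if baseline > 0 and daily_spend[i] > baseline * (1 + threshold):
            return i
    return None


def is_recurring(daily_spend: List[float], start: int, window: int = 7,
                 threshold: float = 0.20, min_days: int = 3) -> bool:
    """True if spend stays above the pre-spike baseline for min_days consecutive days."""
    if start < window:
        return False  # not enough history before the spike to form a baseline
    baseline = mean(daily_spend[start - window:start])
    following = daily_spend[start:start + min_days]
    return len(following) == min_days and all(d > baseline * (1 + threshold) for d in following)
```

A one-day jump that falls back to baseline points toward a one-time event (for example, a batch job), while a sustained step suggests a recurring cost that needs remediation or a budget update.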

Anomaly Report Template

## Cost Anomaly Report

**Detected:** YYYY-MM-DD HH:MM
**Severity:** [Critical/Warning/Info]

### Anomaly Details

| Metric | Expected | Actual | Delta |
|--------|----------|--------|-------|
| Daily spend | $X,XXX | $X,XXX | +XX% |

### Investigation

**Root Cause:** [description]

**Contributing Factors:**
1. [Factor 1]
2. [Factor 2]

**Intentional:** [Yes/No]

### Action Required

- [ ] [Action if remediation needed]
- [ ] [Update budget if expected]

Phase 3: Optimization Analysis

Optimization Categories

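| Category | Typical Savings | Effort | Risk |
|----------|-----------------|--------|------|
| Rightsizing | 20-40% | Low | Low |
| Reserved Capacity | 30-70% | Medium | Low-Medium |
| Spot Instances | 60-90% | Medium | Medium |
| Storage Tiering | 20-50% | Low | Low |
| Idle Resources | 100% of the idle resource's cost | Low | None-Low |
| Data Transfer | 10-30% | Medium | Low |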

Rightsizing Analysis

## Rightsizing Opportunities

### Underutilized Instances

| Instance | Type | Avg CPU | Avg Mem | Recommendation | Savings |
|----------|------|---------|---------|----------------|---------|
| api-prod-1 | m5.xlarge | 15% | 25% | m5.large | $70/mo |
| worker-2 | c5.2xlarge | 30% | 20% | c5.xlarge | $140/mo |

### Criteria Used

- CPU avg <40% over 14 days -> downsize candidate
- Memory avg <50% over 14 days -> downsize candidate
- Excluded: ASG instances (handled by ASG sizing)
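These criteria can be applied mechanically. A sketch, assuming 14-day averages have already been pulled from CloudWatch (EC2 does not publish memory utilization by default; it requires the CloudWatch agent) and that hourly prices come from your own price list. The record fields, `SIZE_LADDER`, and the 730-hour month are assumptions. Following the criteria as written, an instance is flagged when either threshold holds; requiring both is the more conservative variant.

```python
from typing import Dict, List, Optional, Tuple

# Simplified size ladder (assumption); real families differ, e.g. some start at
# nano/medium and some add 12xlarge, which this sketch does not handle.
SIZE_LADDER = ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]
HOURS_PER_MONTH = 730  # common approximation for monthly on-demand cost


def one_size_down(instance_type: str) -> Optional[str]:
    """'m5.xlarge' -> 'm5.large'; None if the size is off the ladder or already smallest."""
    family, _, size = instance_type.partition(".")
    if size not in SIZE_LADDER:
        return None
    idx = SIZE_LADDER.index(size)
    return f"{family}.{SIZE_LADDER[idx - 1]}" if idx > 0 else None


def rightsizing_candidates(instances: List[dict],
                           hourly_price: Dict[str, float]) -> List[Tuple[str, str, str, int]]:
    """Return (name, current type, recommended type, est. monthly savings in $)."""
    recs = []
    for inst in instances:
        if inst.get("in_asg"):
            continue  # excluded: ASG instances are handled by ASG sizing
        low_cpu = inst["avg_cpu_14d"] < 40
        low_mem = inst["avg_mem_14d"] < 50
        if not (low_cpu or low_mem):
            continue
        target = one_size_down(inst["type"])
        if target is None or target not in hourly_price or inst["type"] not in hourly_price:
            continue  # no smaller size known, or no price to estimate savings
        delta = hourly_price[inst["type"]] - hourly_price[target]
        recs.append((inst["name"], inst["type"], target, round(delta * HOURS_PER_MONTH)))
    return recs
```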

Reserved Instance Analysis

## Reserved Instance Coverage

### Current Coverage

| Service | On-Demand | Reserved | Coverage |
|---------|-----------|----------|----------|
| EC2 | $5,000 | $3,000 | 38% |
| RDS | $2,000 | $0 | 0% |
| ElastiCache | $500 | $500 | 50% |

### RI Recommendations

| Resource Type | Term | Payment | Monthly Savings | Break-even |
|---------------|------|---------|-----------------|------------|
| 10x m5.large | 1 year | No upfront | $350 | 0 months |
| db.r5.xlarge | 1 year | Partial | $180 | 4 months |

### RI Purchase Criteria

- Stable workload for >80% of term
- Usage predictable for commitment period
- Consider convertible RIs for flexibility
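The Coverage and Break-even columns follow from two small formulas. A sketch, assuming coverage is measured by cost as in the coverage table (Cost Explorer's own reservation coverage report is hour-based, so its percentages can differ) and that break-even means the number of months until savings on the recurring charge repay the upfront payment. The function names and sample numbers are illustrative.

```python
import math


def ri_coverage_pct(on_demand_cost: float, reserved_cost: float) -> float:
    """Percent of spend covered by reservations (cost-based, as in the coverage table)."""
    total = on_demand_cost + reserved_cost
    return 100.0 * reserved_cost / total if total else 0.0


def break_even_months(upfront: float, on_demand_monthly: float,
                      ri_recurring_monthly: float) -> float:
    """Months until recurring-charge savings repay the upfront payment.

    If the RI's recurring charge is not below on-demand, it never pays back;
    otherwise a no-upfront RI breaks even immediately (0 months).
    """
    monthly_saving = on_demand_monthly - ri_recurring_monthly
    if monthly_saving <= 0:
        return math.inf
    if upfront == 0:
        return 0.0
    return upfront / monthly_saving
```

With the sample EC2 row, `ri_coverage_pct(5000, 3000)` is 37.5, which the table rounds to 38%.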

Idle Resource Detection

## Idle Resources

### Unattached EBS Volumes

| Volume ID | Size | Cost/Month | Last Attached |
|-----------|------|------------|---------------|
| vol-xxx | 100GB | $10 | 90 days ago |
| vol-yyy | 500GB | $50 | Never |

### Unused Elastic IPs

| IP | Allocation ID | Associated | Cost/Month |
|----|---------------|------------|------------|
| x.x.x.x | eipalloc-xxx | No | $3.60 |

### Idle Load Balancers

| LB Name | Target Groups | Requests/Day | Cost/Month |
|---------|---------------|--------------|------------|
| old-api | 0 | 0 | $16.50 |
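For the EBS and Elastic IP tables, a sketch assuming the response shapes of boto3's EC2 `describe_volumes` and `describe_addresses` calls; the calls appear only as comments, and the parsing helpers (hypothetical names) can be tested offline. The per-GB price is an assumption to adjust for region and volume type. Per the risk matrix, snapshot volumes before deleting them.

```python
from typing import List, Tuple

# Data source (needs AWS credentials, so not executed here):
#   import boto3
#   ec2 = boto3.client("ec2")
#   volumes = ec2.describe_volumes(Filters=[{"Name": "status", "Values": ["available"]}])
#   addresses = ec2.describe_addresses()
# describe_volumes is paginated; use ec2.get_paginator("describe_volumes") for large accounts.

# Assumption: flat gp2-style rate; check current pricing for your region and volume type.
DEFAULT_PRICE_PER_GB_MONTH = 0.10


def unattached_volumes(response: dict,
                       price_per_gb: float = DEFAULT_PRICE_PER_GB_MONTH) -> List[Tuple[str, int, float]]:
    """(volume id, size in GiB, est. monthly cost) for volumes in the 'available' state."""
    out = []
    for vol in response.get("Volumes", []):
        if vol.get("State") == "available":  # 'available' = not attached to any instance
            out.append((vol["VolumeId"], vol["Size"], round(vol["Size"] * price_per_gb, 2)))
    return out


def unassociated_eips(response: dict) -> List[Tuple[str, str]]:
    """(public IP, allocation id) for Elastic IPs with no association."""
    return [
        (addr["PublicIp"], addr.get("AllocationId", ""))
        for addr in response.get("Addresses", [])
        if "AssociationId" not in addr
    ]
```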

Phase 4: Risk Assessment

Optimization Risk Matrix

| Optimization | Risk Level | Potential Impact | Mitigation |
|--------------|------------|------------------|------------|
| Downsize instance | Low | Performance degradation | Monitor, quick rollback |
| Purchase RI | Low-Medium | Unused commitment | Convertible RIs |
| Spot instances | Medium | Instance interruption | Diversify, checkpointing |
| Delete idle | None-Low | Lost data (if EBS) | Snapshot first |
| Storage tiering | Low | Retrieval latency | Test access patterns |

Risk Assessment Checklist

  • Rollback plan documented
  • Performance baseline captured
  • Monitoring in place
  • Stakeholders informed
  • Timeline appropriate (not during peak)

Phase 5: Implementation

Implementation Priority

| Priority | Criteria | Examples |
|----------|----------|----------|
| Quick Wins | Low effort, no risk, immediate savings | Delete idle resources |
| High Impact | Significant savings, manageable risk | RI purchases |
| Medium Impact | Moderate savings, requires planning | Rightsizing |
| Long-term | Architectural changes | Spot migration |
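One way to turn this table into an ordered backlog is a small classifier plus a sort. A sketch; the `Opportunity` fields, the effort/risk labels, and the $1,000/month "significant savings" threshold are all assumptions to tune per organization.

```python
from dataclasses import dataclass
from typing import List

TIER_ORDER = ["Quick Wins", "High Impact", "Medium Impact", "Long-term"]
HIGH_IMPACT_THRESHOLD = 1000.0  # assumption: $/month that counts as "significant"


@dataclass
class Opportunity:
    name: str
    monthly_savings: float
    effort: str  # "low" | "medium" | "high"
    risk: str  # "none" | "low" | "medium" | "high"
    architectural: bool = False


def tier(op: Opportunity) -> str:
    """Map an opportunity onto the priority table's tiers."""
    if op.architectural:
        return "Long-term"
    if op.effort == "low" and op.risk == "none":
        return "Quick Wins"
    if op.monthly_savings >= HIGH_IMPACT_THRESHOLD and op.risk != "high":
        return "High Impact"
    return "Medium Impact"


def prioritize(ops: List[Opportunity]) -> List[Opportunity]:
    """Order by tier, then by savings (largest first) within a tier."""
    return sorted(ops, key=lambda o: (TIER_ORDER.index(tier(o)), -o.monthly_savings))
```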

Implementation Checklist

  • Change request approved
  • Scheduled during low-traffic period
  • Rollback plan ready
  • Monitoring dashboards open
  • Communication sent to stakeholders

Phase 6: Monitoring

Savings Tracking

## Savings Report

**Period:** [Month YYYY]
**Target Savings:** $X,XXX
**Actual Savings:** $X,XXX
**Achievement:** XX%

### Savings by Category

| Category | Target | Actual | Status |
|----------|--------|--------|--------|
| Rightsizing | $500 | $450 | 90% |
| Reserved Instances | $2,000 | $2,100 | 105% |
| Idle Resources | $200 | $200 | 100% |

### Monthly Trend

| Month | Spend | Savings | Cumulative |
|-------|-------|---------|------------|
| Jan | $50,000 | $0 | $0 |
| Feb | $48,000 | $2,000 | $2,000 |
| Mar | $47,500 | $2,500 | $4,500 |
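The Monthly Trend table measures each month's savings against the first month's spend and accumulates them, and the category Status column is actual divided by target. A sketch reproducing that arithmetic; `savings_trend` and `achievement_pct` are hypothetical names. A fixed baseline attributes every spend change to optimization, so organic growth or decline should be accounted for separately.

```python
from typing import List, Optional, Tuple


def savings_trend(monthly_spend: List[Tuple[str, float]],
                  baseline: Optional[float] = None) -> List[Tuple[str, float, float, float]]:
    """(month, spend, savings vs baseline, cumulative savings) rows.

    The baseline defaults to the first month's spend, as in the Monthly Trend table.
    """
    if not monthly_spend:
        return []
    base = monthly_spend[0][1] if baseline is None else baseline
    rows, cumulative = [], 0.0
    for month, spend in monthly_spend:
        saving = base - spend
        cumulative += saving
        rows.append((month, spend, saving, cumulative))
    return rows


def achievement_pct(actual: float, target: float) -> int:
    """Actual vs target savings, as a whole percentage."""
    return round(100.0 * actual / target) if target else 0
```

With the sample data (Jan $50,000, Feb $48,000, Mar $47,500), this yields savings of $0, $2,000, and $2,500 and a cumulative $4,500, matching the table.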

Anti-Rationalization Table

| Rationalization | Why It's WRONG | Required Action |
|-----------------|----------------|-----------------|
| "Small savings not worth it" | Small savings compound | Evaluate ALL opportunities |
| "RIs are too risky" | RI risk is manageable | Analyze stable workloads |
| "Dev doesn't need optimization" | Dev is often 30%+ of cost | Optimize ALL environments |
| "Can't predict future usage" | Historical data helps | Use data-driven forecasting |
| "Optimization takes too much time" | ROI on optimization is high | Invest in systematic process |

Pressure Resistance

| User Says | Your Response |
|-----------|---------------|
| "Just cut costs by 30%" | "Cannot proceed without analysis. Blind cuts cause outages. Will provide data-driven recommendations." |
| "Skip the analysis, buy RIs" | "RI purchases require usage analysis. Wrong RIs waste money. Analysis required first." |
| "Dev environment is fine as-is" | "Dev costs are significant. Optimization applies to all environments." |

Dispatch Specialist

For cost optimization tasks, dispatch:

Task tool:
  subagent_type: "cloud-cost-optimizer"
  model: "opus"
  prompt: |
    COST ANALYSIS REQUEST
    Scope: [accounts/services to analyze]
    Period: [time range]
    Focus: [rightsizing/RI/general optimization]
    Constraints: [budget targets, risk tolerance]