
scalability-advisor

Guidance for scaling systems from startup to enterprise scale. Use when planning for growth, diagnosing bottlenecks, or designing systems that need to handle 10x-1000x current load.

$ Install

git clone https://github.com/alirezarezvani/claude-cto-team /tmp/claude-cto-team && cp -r /tmp/claude-cto-team/skills/scalability-advisor ~/.claude/skills/claude-cto-team

// tip: Run this command in your terminal to install the skill


name: scalability-advisor
description: Guidance for scaling systems from startup to enterprise scale. Use when planning for growth, diagnosing bottlenecks, or designing systems that need to handle 10x-1000x current load.

Scalability Advisor

Provides systematic guidance for scaling systems at different growth stages, identifying bottlenecks, and designing for horizontal scalability.

When to Use

  • Planning for 10x, 100x, or 1000x growth
  • Diagnosing current performance bottlenecks
  • Designing new systems for scale
  • Evaluating scaling strategies (vertical vs. horizontal)
  • Capacity planning and infrastructure sizing

Scaling Stages Framework

Stage Overview

┌─────────────────────────────────────────────────────────────────────┐
│                    SCALING JOURNEY                                  │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Stage 1        Stage 2         Stage 3         Stage 4             │
│  Startup        Growth          Scale           Enterprise          │
│  0-10K users    10K-100K        100K-1M         1M+ users           │
│                                                                     │
│  Single         Add caching,    Horizontal      Global,             │
│  server         read replicas   scaling         multi-region        │
│                                                                     │
│  $100/mo        $1K/mo          $10K/mo         $100K+/mo           │
└─────────────────────────────────────────────────────────────────────┘

Stage 1: Startup (0-10K Users)

Architecture

┌────────────────────────────────────────┐
│           Single Server                │
│  ┌──────────────────────────────────┐  │
│  │  App Server (Node/Python/etc)    │  │
│  │  + Database (PostgreSQL)         │  │
│  │  + File Storage (local/S3)       │  │
│  └──────────────────────────────────┘  │
└────────────────────────────────────────┘

Key Metrics

| Metric | Target | Warning |
|---|---|---|
| Response time (P95) | < 500ms | > 1s |
| Database queries/request | < 10 | > 20 |
| Server CPU | < 70% | > 85% |
| Database connections | < 50% of pool | > 80% of pool |

What to Focus On

DO:

  • Write clean, maintainable code
  • Use database indexes on frequently queried columns
  • Implement basic monitoring (uptime, errors)
  • Keep architecture simple (monolith is fine)

DON'T:

  • Over-engineer for scale you don't have
  • Add caching before you need it
  • Split into microservices prematurely
  • Worry about multi-region yet

When to Move to Stage 2

  • Database CPU consistently > 70%
  • Response times degrading
  • Single queries taking > 100ms
  • Server resources maxed

Stage 2: Growth (10K-100K Users)

Architecture

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│    ┌─────────┐      ┌─────────────────────────────────┐     │
│    │   CDN   │      │      Load Balancer              │     │
│    └────┬────┘      └──────────────┬──────────────────┘     │
│         │                          │                        │
│         │           ┌──────────────┼──────────────┐         │
│         │           │              │              │         │
│         ▼           ▼              ▼              ▼         │
│    ┌─────────┐ ┌─────────┐   ┌─────────┐   ┌─────────┐      │
│    │ Static  │ │ App 1   │   │ App 2   │   │ App 3   │      │
│    │ Assets  │ └────┬────┘   └────┬────┘   └────┬────┘      │
│    └─────────┘      │             │             │           │
│                     └──────────────┼────────────┘           │
│                                    │                        │
│                     ┌──────────────┼──────────────┐         │
│                     │              │              │         │
│                     ▼              ▼              ▼         │
│               ┌─────────┐   ┌─────────┐   ┌─────────┐       │
│               │ Primary │   │  Read   │   │  Redis  │       │
│               │   DB    │───│ Replica │   │  Cache  │       │
│               └─────────┘   └─────────┘   └─────────┘       │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Key Additions

| Component | Purpose | When to Add |
|---|---|---|
| CDN | Static asset caching | Images, JS, CSS taking > 20% of bandwidth |
| Load Balancer | Distribute traffic | Single server CPU > 70% |
| Read Replicas | Offload reads | > 80% of database ops are reads |
| Redis Cache | Application caching | Same queries repeated frequently |
| Job Queue | Async processing | Background tasks blocking requests |

Caching Strategy

Request Flow with Caching:

1. Check CDN (static assets)         ─► HIT: Return cached
                                           │
2. Check Application Cache (Redis)   ─► HIT: Return cached
                                           │
3. Check Database                    ─► Return + Cache result

What to Cache:

  • Session data (TTL: session duration)
  • User profile data (TTL: 5-15 minutes)
  • API responses (TTL: varies by freshness needs)
  • Database query results (TTL: 1-5 minutes)
  • Computed values (TTL: based on computation cost)
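
As a concrete illustration, here is a minimal cache-aside sketch in Python using the redis-py client. The key scheme, TTL, and fetch_user_from_db stub are illustrative assumptions, not part of this skill.

# Cache-aside pattern: check Redis first, fall back to the database,
# then populate the cache with a TTL matching the guidance above.
import json
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

USER_TTL_SECONDS = 600  # profile data: 5-15 minutes

def fetch_user_from_db(user_id: int) -> dict:
    # Placeholder for your real database query
    return {"id": user_id, "name": "example"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit

    user = fetch_user_from_db(user_id)     # cache miss: hit the database
    cache.set(key, json.dumps(user), ex=USER_TTL_SECONDS)
    return user

def invalidate_user(user_id: int) -> None:
    # Delete on write so the next read repopulates fresh data
    cache.delete(f"user:{user_id}")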

Database Optimization

-- Find slow queries (PostgreSQL 13+ column names; older versions use mean_time/total_time)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Find tables scanned sequentially instead of via an index (likely missing indexes)
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 1000 AND idx_scan = 0
ORDER BY seq_tup_read DESC;

When to Move to Stage 3

  • Write traffic overwhelming single primary
  • Cache hit rate plateauing despite optimization
  • Replication lag growing because read replicas can't keep up
  • Need independent scaling of components

Stage 3: Scale (100K-1M Users)

Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                           CDN / Edge                                 │
└──────────────────────────────────────────────────────────────────────┘
                                    │
┌──────────────────────────────────────────────────────────────────────┐
│                        API Gateway                                   │
│              (Rate limiting, Auth, Routing)                          │
└──────────────────────────────────────────────────────────────────────┘
                                    │
        ┌───────────────────────────┼───────────────────────────┐
        │                           │                           │
        ▼                           ▼                           ▼
┌───────────────┐          ┌───────────────┐          ┌───────────────┐
│   Service A   │          │   Service B   │          │   Service C   │
│   (Users)     │          │   (Orders)    │          │   (Search)    │
│   Auto-scale  │          │   Auto-scale  │          │   Auto-scale  │
└───────┬───────┘          └───────┬───────┘          └───────┬───────┘
        │                          │                          │
        ▼                          ▼                          ▼
┌───────────────┐          ┌───────────────┐          ┌───────────────┐
│   User DB     │          │   Order DB    │          │ Elasticsearch │
│   (Sharded)   │          │   (Sharded)   │          │   (Cluster)   │
└───────────────┘          └───────────────┘          └───────────────┘
                                    │
                                    ▼
                    ┌───────────────────────────┐
                    │     Message Queue         │
                    │     (Kafka / SQS)         │
                    └───────────────────────────┘

Key Patterns

Database Sharding

Sharding Strategies:

1. Hash-based (user_id % num_shards)
   PRO: Even distribution
   CON: Hard to add shards

2. Range-based (user_id 1-1M → shard 1)
   PRO: Easy to add shards
   CON: Hotspots possible

3. Directory-based (lookup table)
   PRO: Flexible
   CON: Lookup overhead
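
A minimal Python sketch of strategy 1 (hash-based routing). The shard connection strings are placeholders; a stable hash is used so routing doesn't change between processes.

# Hash-based sharding: route each user_id to one of N shard connection strings.
# Uses a stable hash (not Python's randomized hash()) so routing survives restarts.
import hashlib

SHARDS = [
    "postgres://db-shard-0.internal/app",   # placeholder DSNs
    "postgres://db-shard-1.internal/app",
    "postgres://db-shard-2.internal/app",
    "postgres://db-shard-3.internal/app",
]

def shard_for(user_id: int) -> str:
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    index = int(digest, 16) % len(SHARDS)   # even distribution across shards
    return SHARDS[index]

# Adding a shard changes len(SHARDS) and remaps most keys -- the "hard to add
# shards" drawback above. Consistent hashing or a directory table avoids that.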

Event-Driven Architecture

Synchronous → Asynchronous

Before:
  API → Service A → Service B → Service C → Response (slow)

After:
  API → Service A → Queue → Response (fast)
                      ↓
              Service B, C process async
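
A rough sketch of this pattern in Python, using a Redis list as a stand-in for Kafka or SQS; the handler, worker, and stub names are illustrative.

# Enqueue work and return immediately; a separate worker drains the queue.
import json
import redis

queue = redis.Redis(decode_responses=True)
QUEUE_KEY = "orders:events"

def save_order(order: dict) -> int:
    return 42                                # stub: persist and return the new id

def handle_create_order(order: dict) -> dict:
    order_id = save_order(order)             # synchronous: persist the minimum needed
    event = {"type": "order_created", "id": order_id}
    queue.rpush(QUEUE_KEY, json.dumps(event))
    return {"id": order_id, "status": "accepted"}   # fast response to the caller

def worker_loop() -> None:
    # Services B and C consume events asynchronously
    while True:
        _, raw = queue.blpop(QUEUE_KEY)      # blocks until an event is available
        event = json.loads(raw)
        print("processing", event)           # stub: email, search indexing, etc.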

Scaling Checklist

  • Stateless application servers (no local state)
  • Database read/write separation
  • Asynchronous processing for non-critical paths
  • Circuit breakers between services (see the sketch after this checklist)
  • Distributed tracing implemented
  • Auto-scaling configured with proper metrics
  • Database connection pooling (PgBouncer, ProxySQL)
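
For the circuit-breaker item above, a dependency-free sketch of the idea; thresholds and the usage example are illustrative, not a production implementation.

# Minimal circuit breaker: after N consecutive failures, stop calling the
# downstream service for a cooldown period and fail fast instead.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.failures >= self.max_failures:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")   # skip the call
            self.failures = 0                                      # half-open: try again
        try:
            result = fn(*args, **kwargs)
            self.failures = 0                                      # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise

search_breaker = CircuitBreaker()
# Usage (call_search_service is a hypothetical client function):
# results = search_breaker.call(call_search_service, query="laptops")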

When to Move to Stage 4

  • Need geographic distribution for latency
  • Regulatory requirements (data residency)
  • Single region can't handle failover
  • Global user base with latency requirements

Stage 4: Enterprise (1M+ Users)

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                          Global Load Balancer                           │
│                       (GeoDNS, Anycast, Route53)                        │
└─────────────────────────────────────────────────────────────────────────┘
                    │                                │
           ┌────────┴────────┐              ┌───────┴────────┐
           │                 │              │                │
           ▼                 ▼              ▼                ▼
    ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
    │  US-East     │  │  US-West     │  │  EU-West     │  │  AP-South    │
    │  Region      │  │  Region      │  │  Region      │  │  Region      │
    │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │
    │  │Services│  │  │  │Services│  │  │  │Services│  │  │  │Services│  │
    │  └────────┘  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │
    │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │  │  ┌────────┐  │
    │  │Database│  │  │  │Database│  │  │  │Database│  │  │  │Database│  │
    │  │(Primary)│ │  │  │(Replica)│ │  │  │(Primary)│ │  │  │(Replica)│ │
    │  └────────┘  │  │  └────────┘  │  │  └────────┘  │  │  └────────┘  │
    └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘
                              │                   │
                              └─────────┬─────────┘
                                        │
                              Cross-Region Replication

Multi-Region Patterns

| Pattern | Consistency | Latency | Complexity |
|---|---|---|---|
| Active-Passive | Strong | High (during failover) | Low |
| Active-Active | Eventual | Low | High |
| Follow-the-Sun | Strong per region | Medium | Medium |

Data Consistency Strategies

CAP Theorem Trade-offs:

Strong Consistency (CP):
- All regions see same data
- Higher latency for writes
- Use for: Financial transactions, inventory

Eventual Consistency (AP):
- Regions may have stale data briefly
- Low latency always
- Use for: Social feeds, analytics, non-critical

Causal Consistency:
- Related operations ordered correctly
- Balance of latency and correctness
- Use for: Messaging, collaboration

Enterprise Checklist

  • Multi-region deployment
  • Cross-region data replication
  • Global CDN with edge caching
  • Disaster recovery tested
  • Compliance (SOC 2, GDPR, data residency)
  • 99.99% SLA architecture
  • Zero-downtime deployments
  • Chaos engineering practice

Bottleneck Diagnosis Guide

Finding the Bottleneck

Systematic Diagnosis:

1. Where is time spent?
   └─► Distributed tracing (Jaeger, Datadog)

2. Is it the database?
   └─► Check slow query logs, connection pool

3. Is it the application?
   └─► CPU profiling, memory analysis

4. Is it the network?
   └─► Latency between services, DNS resolution

5. Is it external services?
   └─► Third-party API latency, rate limits
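
If distributed tracing isn't wired up yet, even a plain timing helper narrows down where a request spends its time. A minimal, dependency-free sketch (the section names and sleeps stand in for real work):

# Rough span timing for a single service, useful before full distributed
# tracing (Jaeger, Datadog) is in place.
import time
from contextlib import contextmanager

@contextmanager
def timed(section: str, timings: dict):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[section] = timings.get(section, 0.0) + (time.perf_counter() - start)

def handle_request():
    timings: dict = {}
    with timed("db", timings):
        time.sleep(0.05)            # stand-in for database work
    with timed("external_api", timings):
        time.sleep(0.12)            # stand-in for a third-party call
    with timed("render", timings):
        time.sleep(0.01)            # stand-in for serialization/templating
    # Log the breakdown; the slowest bucket is where to dig deeper
    print(sorted(timings.items(), key=lambda kv: kv[1], reverse=True))

handle_request()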

Common Bottlenecks by Layer

| Layer | Symptoms | Solutions |
|---|---|---|
| Database | Slow queries, high CPU | Indexing, read replicas, caching |
| Application | High CPU, memory | Optimize code, scale horizontally |
| Network | High latency, timeouts | CDN, edge caching, connection pooling |
| Storage | Slow I/O, high wait | SSD, object storage, caching |
| External APIs | Timeouts, rate limits | Circuit breakers, caching, fallbacks |

Database Bottleneck Checklist

Quick Database Health Check

1. Connection Pool
   - Current connections vs max?
   - Connection wait time?
   - Pool exhaustion events?

2. Query Performance
   - Slowest queries (pg_stat_statements)?
   - Missing indexes (seq scans > 10K)?
   - Lock contention?

3. Replication
   - Replica lag?
   - Write throughput?
   - Read distribution?

4. Storage
   - Disk I/O wait?
   - Table/index bloat?
   - WAL write latency?
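
A rough way to script the first three checks against PostgreSQL with psycopg2. The DSN is a placeholder, pg_stat_statements must be enabled (column names assume PostgreSQL 13+), and the replica-lag query only returns a value on a replica.

# Quick health-check queries for the checklist above.
import psycopg2

DSN = "postgresql://monitor@db.internal/app"   # placeholder connection string

CHECKS = {
    "connections_in_use": "SELECT count(*) FROM pg_stat_activity",
    "max_connections": "SHOW max_connections",
    "slowest_queries": """
        SELECT query, calls, mean_exec_time
        FROM pg_stat_statements
        ORDER BY total_exec_time DESC
        LIMIT 5
    """,
    "replica_lag_seconds": """
        SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))
    """,
}

def run_health_check() -> None:
    with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
        for name, sql in CHECKS.items():
            cur.execute(sql)
            print(name, cur.fetchall())

run_health_check()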

Scaling Calculations

Capacity Planning Formula

Required Capacity = Peak Traffic × Growth Factor × Safety Margin

Example:
- Current peak: 1,000 req/sec
- Expected growth: 3x in 12 months
- Safety margin: 1.5x

Required: 1,000 × 3 × 1.5 = 4,500 req/sec capacity

Database Sizing

Connection Pool Size:
  connections = (num_cores × 2) + effective_spindle_count

  Example: 8 cores, SSD
  connections = (8 × 2) + 1 = 17 connections per instance

Read Replica Sizing:
  replicas = ceiling(read_traffic / single_replica_capacity)

  Example: 10,000 reads/sec, 3,000/replica capacity
  replicas = ceiling(10,000 / 3,000) = 4 replicas

Cache Sizing

Cache Size:
  memory = working_set_size × overhead_factor

  Working set = frequently accessed data (usually 10-20% of total)
  Overhead factor = ~1.5x for Redis data structures

  Example: 10GB working set
  Redis memory = 10GB × 1.5 = 15GB
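
These formulas translate directly into a few lines of Python; the calls below reproduce the worked examples above.

# Direct translation of the capacity, connection-pool, replica, and cache-sizing formulas.
import math

def required_capacity(peak_rps: float, growth: float, safety: float = 1.5) -> float:
    return peak_rps * growth * safety

def connection_pool_size(num_cores: int, spindles: int = 1) -> int:
    return num_cores * 2 + spindles          # spindles ~ 1 for SSD

def read_replicas(read_rps: float, per_replica_rps: float) -> int:
    return math.ceil(read_rps / per_replica_rps)

def cache_memory_gb(working_set_gb: float, overhead: float = 1.5) -> float:
    return working_set_gb * overhead

print(required_capacity(1_000, 3))        # 4500.0 req/sec
print(connection_pool_size(8))            # 17 connections per instance
print(read_replicas(10_000, 3_000))       # 4 replicas
print(cache_memory_gb(10))                # 15.0 GB of Redis memory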

Quick Reference

Scaling Decision Matrix

| Symptom | First Try | Then Try | Finally |
|---|---|---|---|
| Slow page loads | Add caching | CDN | Edge compute |
| Database slow | Add indexes | Read replicas | Sharding |
| API timeouts | Async processing | Circuit breakers | Event-driven |
| High server CPU | Vertical scale | Horizontal scale | Optimize code |
| High memory | Increase RAM | Fix memory leaks | Redesign data structures |

Infrastructure Cost at Scale

| Users | Architecture | Monthly Cost |
|---|---|---|
| 10K | Single server | $100-300 |
| 100K | Load balanced + cache | $1,000-3,000 |
| 1M | Microservices + sharding | $10,000-30,000 |
| 10M | Multi-region | $100,000+ |
