latency-optimization
Use when optimizing end-to-end latency, reducing response times, or improving performance for latency-sensitive applications. Covers latency budgets, geographic routing, protocol optimization, and latency measurement techniques.
allowed_tools: Read, Glob, Grep
$ Install
git clone https://github.com/melodic-software/claude-code-plugins /tmp/claude-code-plugins && cp -r /tmp/claude-code-plugins/plugins/systems-design/skills/latency-optimization ~/.claude/skills/
Tip: Run this command in your terminal to install the skill.
SKILL.md
name: latency-optimization
description: Use when optimizing end-to-end latency, reducing response times, or improving performance for latency-sensitive applications. Covers latency budgets, geographic routing, protocol optimization, and latency measurement techniques.
allowed-tools: Read, Glob, Grep
Latency Optimization
Comprehensive guide to reducing end-to-end latency in distributed systems - from network to application to database layers.
When to Use This Skill
- Optimizing response times for user-facing applications
- Creating latency budgets for distributed systems
- Implementing geographic routing strategies
- Reducing database query latency
- Optimizing API response times
- Understanding and measuring latency components
Latency Fundamentals
Understanding Latency
Latency Components:
Total Latency = Network + Processing + Queue + Serialization
Request Journey:

Client ──► DNS ──► TCP ──► TLS ──► Server ──► DB ──► Back

Components:
├── DNS Resolution: 0-100ms (cached: 0ms)
├── TCP Handshake: 1 RTT (~10-200ms)
├── TLS Handshake: 1-2 RTT (~20-400ms)
├── Request Transfer: depends on size
├── Server Processing: application-specific
├── Database Query: 1-1000ms typical
└── Response Transfer: depends on size
Key Metrics:
- P50: Median latency (50th percentile)
- P95: 95th percentile (tail latency starts)
- P99: 99th percentile (important for SLOs)
- P99.9: Three nines (critical systems)
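Percentiles can be computed directly from raw latency samples; the nearest-rank sketch below is a minimal illustration (production systems typically use histograms or sketches like t-digest rather than sorting every sample):

```python
def percentile(samples, p):
    """Return the p-th percentile (0-100) of latency samples
    using nearest-rank selection."""
    ordered = sorted(samples)
    # Nearest-rank: index of the sample at or above the p-th percentile.
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

latencies_ms = [12, 15, 11, 250, 14, 13, 16, 12, 900, 14]
print(percentile(latencies_ms, 50))  # median: the typical request
print(percentile(latencies_ms, 99))  # tail latency: the worst requests
```

Note how a handful of slow requests dominate P99 while barely moving the median, which is why averages hide tail latency.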
Latency Numbers Every Developer Should Know
Latency Reference (2024 estimates):
Operation Time
─────────────────────────────────────────────────────
L1 cache reference 1 ns
L2 cache reference 4 ns
Branch mispredict 5 ns
L3 cache reference 10 ns
Mutex lock/unlock 25 ns
Main memory reference 100 ns
Compress 1KB with Snappy 2,000 ns (2 µs)
SSD random read 16,000 ns (16 µs)
Read 1 MB from memory 50,000 ns (50 µs)
Read 1 MB from SSD 200,000 ns (200 µs)
Round trip same datacenter 500,000 ns (500 µs)
Read 1 MB from network (1Gbps) 10,000,000 ns (10 ms)
HDD random read 10,000,000 ns (10 ms)
Round trip US East to US West 40,000,000 ns (40 ms)
Round trip US to Europe 80,000,000 ns (80 ms)
Round trip US to Asia 150,000,000 ns (150 ms)
Key Insights:
- A main memory reference is ~100x faster than an SSD random read
- A same-datacenter round trip is ~80x faster than cross-continent
- Caching at any level provides huge wins
Latency Budget
Latency Budget Example (200ms target):
200ms Total Budget

┌──────────┬──────────┬──────────┬──────────┬──────────┐
│ Network  │ Auth     │ Service  │ DB       │ Response │
│ 50ms     │ 20ms     │ 50ms     │ 60ms     │ 20ms     │
└──────────┴──────────┴──────────┴──────────┴──────────┘

Breakdown:
├── Network (client → edge → origin): 50ms
├── Authentication/Authorization: 20ms
├── Service Processing: 50ms
├── Database Queries: 60ms
└── Response Serialization + Transfer: 20ms
Budget Rules:
1. Allocate budgets based on criticality
2. Leave 10-20% headroom for variance
3. Monitor P99 against budget
4. Alert when consistently over budget
5. Renegotiate budgets as system evolves
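The budget rules above can be enforced programmatically. This is a sketch only: the component names and limits follow the 200ms example, and the 15% headroom is an assumed value in the middle of the suggested 10-20% range.

```python
TOTAL_BUDGET_MS = 200
HEADROOM = 0.15  # assumed: keep 10-20% slack for variance (rule 2)

budget_ms = {
    "network": 50,
    "auth": 20,
    "service": 50,
    "database": 60,
    "response": 20,
}

def check_budget(measured_p99_ms):
    """Compare measured per-component P99 latencies against the budget
    (rule 3: monitor P99, not averages)."""
    overruns = {
        name: measured_p99_ms[name] - limit
        for name, limit in budget_ms.items()
        if measured_p99_ms.get(name, 0) > limit
    }
    total = sum(measured_p99_ms.values())
    within_headroom = total <= TOTAL_BUDGET_MS * (1 - HEADROOM)
    return overruns, total, within_headroom
```

An alerting job could call `check_budget` on each evaluation window and page when `overruns` is non-empty for several consecutive windows (rule 4).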
Network Latency Optimization
Geographic Routing
Geographic Routing Strategies:
1. GeoDNS Routing
User IP ──► DNS Resolver ──► Nearest Server IP
Pros: Simple, works everywhere
Cons: DNS caching, IP geolocation inaccuracy
2. Anycast Routing
Same IP advertised from multiple locations
BGP routes to nearest (network topology)
Pros: Instant failover, no DNS delay
Cons: Requires BGP expertise, stateful sessions tricky
3. Load Balancer Geo-routing
Global LB ──► Regional LB ──► Servers
Pros: Fine-grained control, health checking
Cons: Adds latency hop, more complex
Selection Guide:
┌───────────────────┬──────────────────────────────────┐
│ Use Case          │ Recommended Approach             │
├───────────────────┼──────────────────────────────────┤
│ Static content    │ Anycast CDN                      │
│ API services      │ GeoDNS + Regional deployments    │
│ Real-time apps    │ Anycast + Connection persistence │
│ Stateful apps     │ GeoDNS with session affinity     │
└───────────────────┴──────────────────────────────────┘
Protocol Optimization
Protocol-Level Optimizations:
1. HTTP/2 Benefits
├── Multiplexing (no head-of-line blocking)
├── Header compression (HPACK)
├── Server push (preemptive responses)
└── Single connection (reduced handshakes)
Latency Impact: 20-50% improvement typical
2. HTTP/3 (QUIC) Benefits
├── 0-RTT connection resumption
├── No TCP head-of-line blocking
├── Built-in encryption
└── Connection migration (IP changes)
Latency Impact: 10-30% over HTTP/2
3. TLS Optimization
├── TLS 1.3 (1-RTT handshake)
├── Session resumption (0-RTT)
├── OCSP stapling (no CA roundtrip)
└── Certificate chain optimization
Latency Impact: 50-200ms saved per connection
4. TCP Optimization
├── TCP Fast Open (TFO)
├── Increased initial congestion window
├── BBR congestion control
└── Keep-alive for connection reuse
Connection Optimization
Connection Strategies:
1. Connection Pooling
Connection Pool:
┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐
│Conn1│  │Conn2│  │Conn3│  │Conn4│
└──┬──┘  └──┬──┘  └──┬──┘  └──┬──┘
   ▼        ▼        ▼        ▼
Reuse connections, avoid handshake cost
2. Preconnect/Prefetch
<link rel="preconnect" href="https://api.example.com">
<link rel="dns-prefetch" href="https://cdn.example.com">
Triggers early connection establishment
3. Connection Coalescing (HTTP/2)
Multiple domains โ single connection
(When sharing same IP and certificate)
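Strategy 1 above (connection pooling) can be sketched with a thread-safe queue. This is an illustrative minimal pool, not production code; `make_conn` is a placeholder for whatever connection factory you use, and real services normally rely on a library pool (urllib3, a database driver's built-in pool, etc.):

```python
import queue
from contextlib import contextmanager

class ConnectionPool:
    """Hand out reusable connections instead of opening one per request,
    avoiding the TCP/TLS handshake cost on every call."""

    def __init__(self, make_conn, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(make_conn())

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._pool.get(timeout=timeout)  # blocks if all are in use
        try:
            yield conn
        finally:
            self._pool.put(conn)  # return to the pool instead of closing

# Usage with a dummy factory (stands in for a real connection object):
pool = ConnectionPool(make_conn=lambda: object(), size=2)
with pool.connection() as conn:
    pass  # issue requests over `conn`
```

The key property is in the `finally` clause: connections go back to the pool rather than being closed, so the handshake cost is paid once per connection, not once per request.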
Application Latency Optimization
Caching Strategies
Caching Layers:
Caching Hierarchy:

Browser ──► CDN Edge ──► App Cache ──► DB Cache ──► DB
  1ms          10ms          20ms         50ms      100ms

Each layer should catch most requests before the next layer.
Cache Type Selection:
┌───────────────────┬──────────────────┬──────────────────────────┐
│ Data Type         │ Cache Location   │ TTL Strategy             │
├───────────────────┼──────────────────┼──────────────────────────┤
│ Static assets     │ CDN + Browser    │ Long (1 year), hashed    │
│ API responses     │ CDN + App        │ Short (seconds-minutes)  │
│ Session data      │ App (Redis)      │ Session duration         │
│ DB query results  │ App (local/dist) │ Varies by query          │
│ Computed results  │ App              │ Based on input staleness │
└───────────────────┴──────────────────┴──────────────────────────┘
Async Processing
Async Patterns for Latency:
1. Background Processing
Request ──► Validate ──► Queue ──► Response (fast)
                           │
                           └──► Worker (async processing)
User sees fast response, heavy work happens later
2. Parallel Requests
Sequential:
A(100ms) โ B(100ms) โ C(100ms) = 300ms
Parallel:
A(100ms) ──┐
B(100ms) ──┼──► 100ms total
C(100ms) ──┘
3. Speculative Execution
Start likely-needed work before confirmed
Cancel if not needed
Risk: Wasted resources if prediction wrong
4. Read-Your-Writes with Async
Write ──► Queue ──► Response + Local Cache Update
            ▼
User sees their write immediately
Backend processes asynchronously
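The sequential-vs-parallel arithmetic from pattern 2 maps directly onto `asyncio.gather`; this sketch fakes the three 100ms downstream calls with `asyncio.sleep`:

```python
import asyncio
import time

async def call(name, delay=0.1):
    await asyncio.sleep(delay)  # stand-in for a 100ms downstream call
    return name

async def main():
    start = time.monotonic()
    # A, B, and C run concurrently: total is about max(delays), not the sum.
    results = await asyncio.gather(call("A"), call("B"), call("C"))
    elapsed = time.monotonic() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")  # ≈0.1s rather than 0.3s
```

`gather` preserves argument order in its result list, so downstream code does not need to re-associate responses with requests.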
Serialization Optimization
Serialization Format Comparison:
┌──────────────┬──────────┬──────────┬────────────┬──────────┐
│ Format       │ Encode   │ Decode   │ Size       │ Human    │
│              │ Speed    │ Speed    │ (relative) │ Readable │
├──────────────┼──────────┼──────────┼────────────┼──────────┤
│ JSON         │ Fast     │ Fast     │ Large      │ Yes      │
│ MessagePack  │ V.Fast   │ V.Fast   │ Small      │ No       │
│ Protocol Buf │ Fast     │ V.Fast   │ V.Small    │ No       │
│ FlatBuffers  │ Zero-copy│ V.Fast   │ Small      │ No       │
│ Avro         │ Fast     │ Fast     │ Small      │ Schema   │
└──────────────┴──────────┴──────────┴────────────┴──────────┘
Recommendations:
- Internal services: Protocol Buffers or MessagePack
- Public APIs: JSON (compatibility) or gRPC (performance)
- High-throughput: FlatBuffers (zero-copy)
- Schema evolution: Avro or Protocol Buffers
Optimization Tips:
1. Avoid serializing unnecessary fields
2. Use streaming for large payloads
3. Compress large responses (gzip/brotli)
4. Consider binary formats for internal traffic
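Tip 3 is easy to demonstrate with the standard library. Repetitive JSON payloads, typical of list-of-objects API responses, usually compress very well, though exact ratios depend on the data:

```python
import gzip
import json

# A repetitive payload, typical of list-of-objects API responses.
payload = json.dumps(
    [{"id": i, "status": "active", "region": "us-east-1"}
     for i in range(1000)]
).encode()

compressed = gzip.compress(payload)
print(len(payload), "->", len(compressed),
      f"({len(compressed) / len(payload):.0%} of original)")
```

Remember that compression trades CPU for transfer time: it wins on slow links and large payloads but can add latency for tiny responses, so most servers only compress above a size threshold.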
Database Latency Optimization
Query Optimization
Database Latency Patterns:
1. Index Optimization
✗ Full table scan: O(n) - slow
✓ Index lookup: O(log n) - fast
✓ Covering index: No table lookup needed
Monitor: Slow query logs, EXPLAIN plans
2. Query Patterns
✗ N+1 queries: 1 + N roundtrips
✓ Batch queries: 1 roundtrip
✓ JOINs (when appropriate): 1 roundtrip
Example:
✗ for user in users: get_orders(user.id) # N queries
✓ get_orders_for_users(user_ids) # 1 query
3. Connection Management
├── Connection pooling (avoid connection overhead)
├── Prepared statements (avoid parsing overhead)
└── Connection proximity (same region as app)
4. Read Replicas
Writes ──► Primary
Reads  ──► Read Replica (lower latency)
Database Proximity
Database Placement Strategies:
1. Co-located Database
App and DB in same availability zone
Latency: <1ms
Best for: Primary workloads
2. Same-Region Replica
Read replica in same region
Latency: 1-5ms
Best for: Read scaling
3. Cross-Region Replica
Replica in user's region
Latency: Local (~5ms) vs cross-region (~100ms)
Best for: Global read-heavy apps
4. Globally Distributed
Database spans regions (CockroachDB, Spanner)
Write latency: Higher (consensus)
Read latency: Local
Best for: Global consistency requirements
Measurement and Monitoring
Latency Measurement
Measurement Points:
1. Client-Side (Real User Monitoring)
├── Measures actual user experience
├── Includes network variability
└── Tools: Browser timing API, RUM services
2. Edge/CDN Metrics
├── Time to first byte (TTFB)
├── Cache hit ratio
└── Origin fetch time
3. Server-Side (APM)
├── Request processing time
├── Downstream service calls
├── Database query time
└── Tools: OpenTelemetry, APM vendors
4. Synthetic Monitoring
├── Consistent measurement conditions
├── Multiple geographic locations
└── Baseline for comparison
Distributed Tracing:
Request ──► Gateway ──► Service A ──► Service B ──► DB
               │            │             │          │
               └────────────┴─────────────┴──────────┘
Trace ID links all spans together
Each span has start time + duration
Latency SLOs
Setting Latency SLOs:
1. Define Meaningful Metrics
- P50: Typical experience
- P95: Most users' worst case
- P99: Tail latency for critical paths
2. Set Realistic Targets
P50: 50ms (snappy feel)
P95: 200ms (acceptable)
P99: 500ms (degraded but functional)
3. Error Budget Approach
If target is P99 < 500ms with 99.9% SLO:
- Budget: 0.1% of requests can exceed 500ms
- ~43 minutes per month of violations allowed
4. Alert Thresholds
├── Warning: P99 > 400ms (80% of budget)
├── Critical: P99 > 500ms (at budget)
└── Page: P99 > 600ms for 5 minutes (over budget)
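The ~43-minute figure in step 3 follows directly from the SLO arithmetic; a short calculation makes the error budget concrete (a 30-day month is assumed):

```python
# 99.9% of requests must meet the latency target -> 0.1% may exceed it.
slo = 0.999
minutes_per_month = 30 * 24 * 60  # 43,200 minutes in a 30-day month

budget_fraction = 1 - slo
budget_minutes = minutes_per_month * budget_fraction
print(f"{budget_minutes:.0f} minutes/month of violations allowed")
```

The same formula gives the budget for any SLO: tightening to 99.99% shrinks the allowance tenfold, to about 4.3 minutes per month.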
Common Anti-Patterns
Latency Anti-Patterns:
1. "Chattiness"
✗ Many small requests instead of batched
✓ Batch requests, use GraphQL, aggregate APIs
2. "Synchronous Chains"
✗ A → B → C → D (sequential)
✓ Parallelize independent calls, use async
3. "Unbounded Queries"
✗ SELECT * without limits or pagination
✓ Always paginate, limit result sets
4. "Cache Miss Storms"
✗ Cache expires, all requests hit origin
✓ Staggered TTLs, request coalescing, warm cache
5. "Logging in Hot Path"
✗ Synchronous logging on every request
✓ Async logging, sampling for high volume
6. "Premature Serialization"
✗ Serialize before knowing if needed
✓ Lazy serialization, stream when possible
7. "Ignoring Tail Latency"
✗ Only monitoring averages
✓ Track P95, P99, P99.9 for user experience
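For anti-pattern 4, staggered TTLs are the cheapest fix: randomizing each entry's expiry spreads refreshes out so an entire cache generation never expires at once. A sketch, with the base TTL and jitter range chosen arbitrarily for illustration:

```python
import random

BASE_TTL = 300  # seconds; illustrative value
JITTER = 0.2    # spread expiries across +/-20% of the base TTL

def jittered_ttl(base=BASE_TTL, jitter=JITTER):
    """Return a TTL randomized around the base value so entries
    cached at the same moment expire at different times."""
    return base * random.uniform(1 - jitter, 1 + jitter)

# Each cached entry now expires at a slightly different time:
ttls = [jittered_ttl() for _ in range(5)]
print([f"{t:.0f}s" for t in ttls])
```

Jitter complements, rather than replaces, request coalescing: jitter prevents synchronized expiry, while coalescing ensures that even a single expiry triggers only one origin fetch.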
Best Practices
Latency Optimization Best Practices:
1. Measure First
☐ Establish baseline measurements
☐ Identify bottlenecks before optimizing
☐ Use distributed tracing
☐ Monitor percentiles, not just averages
2. Optimize Strategically
☐ Start with biggest bottlenecks
☐ Apply latency budgets
☐ Consider cost vs benefit
☐ Test optimizations under load
3. Network Layer
☐ Deploy close to users (CDN, edge)
☐ Use modern protocols (HTTP/2, HTTP/3)
☐ Optimize TLS (1.3, session resumption)
☐ Connection pooling and keep-alive
4. Application Layer
☐ Cache aggressively and appropriately
☐ Parallelize independent operations
☐ Use async processing for non-critical work
☐ Optimize serialization formats
5. Data Layer
☐ Index frequently queried columns
☐ Use read replicas for read-heavy loads
☐ Connection pooling
☐ Query optimization (avoid N+1)
6. Continuous Improvement
☐ Regular latency reviews
☐ Load testing with latency assertions
☐ Automated regression detection
☐ User experience correlation
Related Skills
- caching-strategies - Application-level caching patterns
- multi-region-deployment - Geographic distribution
- cdn-architecture - Edge caching and delivery
- distributed-tracing - End-to-end latency visibility
Repository
melodic-software/claude-code-plugins/plugins/systems-design/skills/latency-optimization
Author: melodic-software