service-mesh

A service mesh is an infrastructure layer that provides transparent service-to-service communication with built-in observability, traffic management, and security features without requiring application code changes.

$ Installer

git clone https://github.com/williamzujkowski/standards /tmp/standards && cp -r /tmp/standards/skills/cloud-native/service-mesh ~/.claude/skills/standards

// tip: Run this command in your terminal to install the skill


name: service-mesh category: cloud-native difficulty: advanced tags:

  • istio
  • linkerd
  • envoy
  • mtls
  • traffic-management
  • observability prerequisites:
  • kubernetes
  • networking
  • security estimated_time: 8-12 hours description: A service mesh is an infrastructure layer that provides transparent service-to-service communication with built-in observability, traffic management, and security features without requiring application code changes.

Service Mesh

Level 1: Quick Reference

What is a Service Mesh?

A service mesh is an infrastructure layer that provides transparent service-to-service communication with built-in observability, traffic management, and security features without requiring application code changes.

Core Components:

  • Control Plane: Configuration and policy management (Istiod)
  • Data Plane: Sidecar proxies handling traffic (Envoy)
  • Service Identity: Certificate-based authentication (mTLS)

Key Benefits

1. Observability

  • Automatic metrics collection (latency, throughput, errors)
  • Distributed tracing (request flow visualization)
  • Traffic topology and service dependencies
  • Real-time dashboards (Kiali, Grafana)

2. Traffic Management

  • Intelligent routing (canary, blue-green, A/B)
  • Load balancing (round-robin, least-request, consistent hash)
  • Traffic splitting and mirroring
  • Request retries and timeouts

3. Security

  • Automatic mutual TLS (mTLS) encryption
  • Service-to-service authentication
  • Fine-grained authorization policies
  • Certificate rotation and management

4. Resilience

  • Circuit breaking and outlier detection
  • Rate limiting and quota management
  • Fault injection for chaos testing
  • Connection pooling

Istio vs Linkerd Comparison

FeatureIstioLinkerd
ProxyEnvoy (C++)Linkerd2-proxy (Rust)
Resource UsageHigher (100-200MB/pod)Lower (10-20MB/pod)
FeaturesComprehensive (100+ CRDs)Focused (essential features)
ComplexityMore complex setupSimpler, faster setup
Multi-clusterFull supportBasic support
Traffic ManagementAdvanced (mirroring, A/B)Basic (canary, split)
ObservabilityFull stack (Kiali/Jaeger)Built-in (Linkerd Viz)
MaturityProduction-ready (CNCF)Production-ready (CNCF)

Choose Istio when:

  • Need advanced traffic management (mirroring, A/B testing)
  • Multi-cluster or multi-cloud deployments
  • Complex authorization requirements
  • Established operations team

Choose Linkerd when:

  • Resource efficiency is critical
  • Prefer simplicity over features
  • Kubernetes-only environment
  • Faster onboarding required

Essential Service Mesh Checklist

Pre-Installation:

 Kubernetes cluster v1.22+ running
 CNI plugin supporting network policies
 Prometheus and Grafana available
 Resource capacity planned (2-4 CPU, 4-8GB RAM)
 DNS resolution working correctly

Post-Installation:

 mTLS enabled cluster-wide (STRICT mode)
 Sidecar injection configured (automatic/manual)
 Monitoring dashboards accessible
 Health checks and readiness probes adjusted
 Certificate rotation configured
 Resource limits set on proxies

Traffic Management:

 VirtualService defines routing rules
 DestinationRule configures load balancing
 Circuit breaking thresholds set
 Retry policies configured
 Timeouts appropriate for services

Security:

 Authorization policies applied (default-deny)
 PeerAuthentication enforces mTLS
 Service accounts properly scoped
 Ingress/egress gateways secured
 Certificate validity monitored

Quick Start Commands

# Install Istio (demo profile)
istioctl install --set profile=demo -y

# Enable sidecar injection
kubectl label namespace default istio-injection=enabled

# Deploy sample app
kubectl apply -f samples/bookinfo/platform/kube/bookinfo.yaml

# Create ingress gateway
kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml

# Check mesh status
istioctl proxy-status

# View metrics
kubectl -n istio-system port-forward svc/kiali 20001:20001

Common Patterns

Canary Deployment:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-canary
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10

Circuit Breaking:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews-circuit-breaker
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10
        maxRequestsPerConnection: 2
    outlierDetection:
      consecutiveErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50

Zero Trust Security:

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-all
  namespace: default
spec:
  {} # Empty spec denies all traffic
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/frontend"]
    to:
    - operation:
        methods: ["GET", "POST"]

Level 2:

📚 Full Examples: See REFERENCE.md for complete code samples, detailed configurations, and production-ready implementations.

Implementation Guide

Service Mesh Architecture

Control Plane (Istiod)

The control plane manages and configures the data plane proxies:

Core Functions:

  1. Service Discovery: Abstracts platform-specific discovery mechanisms
  2. Configuration Distribution: Pushes routing rules to proxies
  3. Certificate Authority: Issues and rotates mTLS certificates
  4. Policy Enforcement: Applies security and traffic policies

Istiod Components:

See REFERENCE.md for complete implementation.

Data Plane (Envoy Proxies)

Sidecar Pattern:

See REFERENCE.md for complete implementation.

Envoy Capabilities:

  • Layer 7 (HTTP/gRPC) and Layer 4 (TCP) proxying
  • Dynamic configuration via xDS APIs
  • Advanced load balancing algorithms
  • Health checking and circuit breaking
  • Rich metrics and distributed tracing

Traffic Flow

See REFERENCE.md for complete implementation.

Istio Installation and Configuration

Installation Options

1. IstioOperator (Recommended):

See REFERENCE.md for complete implementation.

2. Helm Installation:

See REFERENCE.md for complete implementation.

3. Production Profile:

See REFERENCE.md for complete implementation.

Sidecar Injection Strategies

Automatic Injection (Namespace-level):

# Label namespace for injection
kubectl label namespace production istio-injection=enabled

# Verify
kubectl get namespace -L istio-injection

Manual Injection (Pod-level):

# Inject sidecar into existing deployment
kubectl get deployment myapp -o yaml | \
  istioctl kube-inject -f - | \
  kubectl apply -f -

Selective Injection (Pod annotation):

apiVersion: v1
kind: Pod
metadata:
  name: myapp
  annotations:
    sidecar.istio.io/inject: "true"  # Explicit injection
spec:
  containers:
  - name: app
    image: myapp:v1

Configuration Validation

See REFERENCE.md for complete implementation.

Advanced Traffic Management

1. Canary Deployments

Gradual Rollout Strategy:

See REFERENCE.md for complete implementation.

Progressive Rollout Script:

See REFERENCE.md for complete implementation.

2. Blue-Green Deployments

Zero-Downtime Cutover:

See REFERENCE.md for complete implementation.

Cutover Process:

See REFERENCE.md for complete implementation.

3. A/B Testing

User Cohort Routing:

See REFERENCE.md for complete implementation.

4. Traffic Mirroring

Shadow Testing:

See REFERENCE.md for complete implementation.

Use Cases:

  • Test new version without impacting users
  • Load testing with real traffic patterns
  • Validate refactored services
  • Debug production issues safely

Observability Stack

Kiali (Service Mesh Dashboard)

Installation:

kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/kiali.yaml

# Access dashboard
istioctl dashboard kiali

Key Features:

  • Service topology graph
  • Traffic flow visualization
  • Configuration validation
  • Distributed tracing integration

Kiali Configuration:

See REFERENCE.md for complete implementation.

Jaeger (Distributed Tracing)

Installation:

kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/jaeger.yaml

# Access UI
istioctl dashboard jaeger

Enable Tracing in Mesh:

See REFERENCE.md for complete implementation.

Application Instrumentation:

See REFERENCE.md for complete implementation.

Prometheus & Grafana

Prometheus Installation:

kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/prometheus.yaml

Key Metrics:

See REFERENCE.md for complete implementation.

Grafana Dashboards:

kubectl apply -f https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/grafana.yaml

istioctl dashboard grafana

Custom Dashboard (JSON):

See REFERENCE.md for complete implementation.

Security Configuration

Mutual TLS (mTLS)

Enable Strict mTLS (Cluster-wide):

See REFERENCE.md for complete implementation.

Per-Namespace mTLS:

See REFERENCE.md for complete implementation.

Per-Port mTLS:

See REFERENCE.md for complete implementation.

Authorization Policies

Default Deny:

See REFERENCE.md for complete implementation.

Allow Specific Services:

See REFERENCE.md for complete implementation.

JWT Authentication:

See REFERENCE.md for complete implementation.

RBAC with Custom Claims:

See REFERENCE.md for complete implementation.

Resilience Patterns

Circuit Breaking

Configuration:

See REFERENCE.md for complete implementation.

Testing Circuit Breaker:

See REFERENCE.md for complete implementation.

Retries and Timeouts

Retry Configuration:

See REFERENCE.md for complete implementation.

Timeout Configuration:

See REFERENCE.md for complete implementation.

Fault Injection (Chaos Testing):

See REFERENCE.md for complete implementation.

Rate Limiting

Local Rate Limiting (Envoy):

See REFERENCE.md for complete implementation.

Multi-Cluster Service Mesh

Cluster Federation

Primary-Remote Model:

See REFERENCE.md for complete implementation.

See REFERENCE.md for complete implementation.

Cross-Cluster Service Discovery:

See REFERENCE.md for complete implementation.

Performance Tuning

Resource Optimization

Sidecar Resource Limits:

See REFERENCE.md for complete implementation.

Sidecar Scoping:

See REFERENCE.md for complete implementation.

Telemetry Optimization

Reduce Metrics Cardinality:

See REFERENCE.md for complete implementation.

Sampling Configuration:

See REFERENCE.md for complete implementation.

Examples

Basic Usage

See REFERENCE.md for complete implementation.

Advanced Usage

// TODO: Add advanced example for service-mesh
// This example shows production-ready patterns

Integration Example

// TODO: Add integration example showing how service-mesh
// works with other systems and services

See examples/service-mesh/ for complete working examples.

Integration Points

This skill integrates with:

Upstream Dependencies

  • Tools: Common development tools and frameworks
  • Prerequisites: Basic understanding of general concepts

Downstream Consumers

  • Applications: Production systems requiring service-mesh functionality
  • CI/CD Pipelines: Automated testing and deployment workflows
  • Monitoring Systems: Observability and logging platforms

Related Skills

  • See other skills in this category

Common Integration Patterns

  1. Development Workflow: How this skill fits into daily development
  2. Production Deployment: Integration with production systems
  3. Monitoring & Alerting: Observability integration points

Common Pitfalls

Pitfall 1: Insufficient Testing

Problem: Not testing edge cases and error conditions leads to production bugs

Solution: Implement comprehensive test coverage including:

  • Happy path scenarios
  • Error handling and edge cases
  • Integration points with external systems

Prevention: Enforce minimum code coverage (80%+) in CI/CD pipeline

Pitfall 2: Hardcoded Configuration

Problem: Hardcoding values makes applications inflexible and environment-dependent

Solution: Use environment variables and configuration management:

  • Separate config from code
  • Use environment-specific configuration files
  • Never commit secrets to version control

Prevention: Use tools like dotenv, config validators, and secret scanners

Pitfall 3: Ignoring Security Best Practices

Problem: Security vulnerabilities from not following established security patterns

Solution: Follow security guidelines:

  • Input validation and sanitization
  • Proper authentication and authorization
  • Encrypted data transmission (TLS/SSL)
  • Regular security audits and updates

Prevention: Use security linters, SAST tools, and regular dependency updates

Best Practices:

  • Follow established patterns and conventions for service-mesh
  • Keep dependencies up to date and scan for vulnerabilities
  • Write comprehensive documentation and inline comments
  • Use linting and formatting tools consistently
  • Implement proper error handling and logging
  • Regular code reviews and pair programming
  • Monitor production metrics and set up alerts

Level 3: Deep Dive Resources

Official Documentation

Books

  • "Istio in Action" by Christian Posta and Rinor Maloku - Comprehensive guide
  • "Service Mesh Patterns" by Alex Soto Bueno - Design patterns
  • "Mastering Service Mesh" by Anjul Sahu - Advanced concepts

Online Courses

Hands-On Labs

Community Resources

Tools & Extensions

  • Kiali - Service mesh observability
  • Flagger - Progressive delivery operator
  • Meshery - Multi-mesh management
  • SMI (Service Mesh Interface) - Standard specification

Best Practices