Marketplace

ml-system-design

End-to-end ML system design for production. Use when designing ML pipelines, feature stores, model training infrastructure, or serving systems. Covers the complete lifecycle from data ingestion to model deployment and monitoring.

allowed_tools: Read, Glob, Grep

$ 安裝

git clone https://github.com/melodic-software/claude-code-plugins /tmp/claude-code-plugins && cp -r /tmp/claude-code-plugins/plugins/systems-design/skills/ml-system-design ~/.claude/skills/claude-code-plugins

// tip: Run this command in your terminal to install the skill


name: ml-system-design description: End-to-end ML system design for production. Use when designing ML pipelines, feature stores, model training infrastructure, or serving systems. Covers the complete lifecycle from data ingestion to model deployment and monitoring. allowed-tools: Read, Glob, Grep

ML System Design

This skill provides frameworks for designing production machine learning systems, from data pipelines to model serving.

When to Use This Skill

Keywords: ML pipeline, machine learning system, feature store, model training, model serving, ML infrastructure, MLOps, A/B testing ML, feature engineering, model deployment

Use this skill when:

  • Designing end-to-end ML systems for production
  • Planning feature store architecture
  • Designing model training pipelines
  • Planning model serving infrastructure
  • Preparing for ML system design interviews
  • Evaluating ML platform tools and frameworks

ML System Architecture Overview

The ML System Lifecycle

┌─────────────────────────────────────────────────────────────────────────┐
│                        ML SYSTEM LIFECYCLE                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌────────┐ │
│  │  Data    │──▶│ Feature  │──▶│  Model   │──▶│  Model   │──▶│ Monitor│ │
│  │ Ingestion│   │ Pipeline │   │ Training │   │ Serving  │   │ & Eval │ │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘   └────────┘ │
│       │              │              │              │              │      │
│       ▼              ▼              ▼              ▼              ▼      │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌────────┐ │
│  │  Data    │   │ Feature  │   │  Model   │   │ Inference│   │ Metrics│ │
│  │  Lake    │   │  Store   │   │ Registry │   │  Cache   │   │  Store │ │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘   └────────┘ │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Key Components

ComponentPurposeExamples
Data IngestionCollect raw data from sourcesKafka, Kinesis, Pub/Sub
Feature PipelineTransform raw data to featuresSpark, Flink, dbt
Feature StoreStore and serve featuresFeast, Tecton, Vertex AI
Model TrainingTrain and validate modelsSageMaker, Vertex AI, Kubeflow
Model RegistryVersion and track modelsMLflow, Weights & Biases
Model ServingServe predictionsTensorFlow Serving, Triton, vLLM
MonitoringTrack model performanceEvidently, WhyLabs, Arize

Feature Store Architecture

Why Feature Stores?

Problems without a feature store:

  • Training-serving skew (features computed differently)
  • Duplicate feature computation across teams
  • No feature versioning or lineage
  • Slow feature experimentation

Feature Store Components

┌─────────────────────────────────────────────────────────────────┐
│                      FEATURE STORE                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────┐       ┌─────────────────────┐          │
│  │   OFFLINE STORE     │       │   ONLINE STORE      │          │
│  │                     │       │                     │          │
│  │  - Historical data  │       │  - Low-latency      │          │
│  │  - Training queries │ ────▶ │  - Point lookups    │          │
│  │  - Batch features   │ sync  │  - Real-time serving│          │
│  │                     │       │                     │          │
│  │  (Data Warehouse)   │       │  (Redis, DynamoDB)  │          │
│  └─────────────────────┘       └─────────────────────┘          │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                   FEATURE REGISTRY                          ││
│  │  - Feature definitions    - Version control                 ││
│  │  - Data lineage          - Access control                   ││
│  └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘

Feature Types

TypeComputationStorageExample
BatchScheduled (hourly/daily)Offline → OnlineUser purchase count (30 days)
StreamingReal-time event processingDirect to onlineItems in cart (current)
On-demandRequest-time computationNot storedDistance to nearest store

Training-Serving Consistency

TRAINING (Historical):
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Historical   │───▶│ Point-in-Time│───▶│  Training    │
│ Events       │    │ Join         │    │  Dataset     │
└──────────────┘    └──────────────┘    └──────────────┘
                          │
                    Uses feature
                    definitions
                          │
SERVING (Real-time):      ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│ Online       │───▶│ Same Feature │───▶│  Prediction  │
│ Store        │    │ Definitions  │    │  Request     │
└──────────────┘    └──────────────┘    └──────────────┘

Model Training Infrastructure

Training Pipeline Components

┌───────────────────────────────────────────────────────────────────────┐
│                     TRAINING PIPELINE                                  │
├───────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  ┌────────────┐   ┌────────────┐   ┌────────────┐   ┌────────────┐   │
│  │   Data     │──▶│   Feature  │──▶│   Model    │──▶│  Model     │   │
│  │   Loader   │   │   Transform│   │   Train    │   │  Validate  │   │
│  └────────────┘   └────────────┘   └────────────┘   └────────────┘   │
│        │               │                │                 │           │
│        ▼               ▼                ▼                 ▼           │
│  ┌────────────┐   ┌────────────┐   ┌────────────┐   ┌────────────┐   │
│  │ Experiment │   │ Hyperparameter│ │  Checkpoint │  │   Model    │   │
│  │  Tracking  │   │    Tuning     │ │   Storage  │   │  Registry  │   │
│  └────────────┘   └────────────┘   └────────────┘   └────────────┘   │
│                                                                        │
└───────────────────────────────────────────────────────────────────────┘

Training Infrastructure Patterns

PatternUse CaseTools
Single-nodeSmall datasets, quick experimentsJupyter, local GPU
Distributed data-parallelLarge datasets, same modelHorovod, PyTorch DDP
Model-parallelLarge models that don't fit in memoryDeepSpeed, FSDP, Megatron
Hyperparameter tuningAutomated model optimizationOptuna, Ray Tune

Experiment Tracking

Track for reproducibility:

What to TrackWhy
HyperparametersReproduce training runs
MetricsCompare model performance
ArtifactsModel files, datasets
Code versionGit commit hash
EnvironmentDocker image, dependencies
Data versionDataset hash or snapshot

Model Serving Architecture

Serving Patterns

PatternLatencyThroughputUse Case
Online (REST/gRPC)Low (<100ms)MediumReal-time predictions
BatchHigh (hours)Very highBulk scoring
StreamingMediumHighEvent-driven predictions
EmbeddedVery lowVariesEdge/mobile inference

Online Serving Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                     MODEL SERVING SYSTEM                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   ┌──────────────┐                                                  │
│   │   Clients    │                                                  │
│   └──────┬───────┘                                                  │
│          │                                                          │
│          ▼                                                          │
│   ┌──────────────┐                                                  │
│   │ Load Balancer│                                                  │
│   └──────┬───────┘                                                  │
│          │                                                          │
│          ▼                                                          │
│   ┌──────────────────────────────────────────────────────────────┐  │
│   │                    API Gateway                                │  │
│   │  - Authentication   - Rate limiting   - Request validation   │  │
│   └──────────────────────────────┬───────────────────────────────┘  │
│                                  │                                  │
│          ┌───────────────────────┼───────────────────────┐         │
│          ▼                       ▼                       ▼         │
│   ┌────────────┐          ┌────────────┐          ┌────────────┐  │
│   │  Model A   │          │  Model B   │          │  Model C   │  │
│   │  (v1.2)    │          │  (v2.0)    │          │  (v1.0)    │  │
│   └────────────┘          └────────────┘          └────────────┘  │
│          │                       │                       │         │
│          └───────────────────────┼───────────────────────┘         │
│                                  ▼                                  │
│                         ┌────────────────┐                         │
│                         │ Feature Store  │                         │
│                         │ (Online)       │                         │
│                         └────────────────┘                         │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Latency Optimization

TechniqueLatency ImpactTrade-off
BatchingReduces per-requestIncreases latency for first request
Caching10-100x fasterMay serve stale predictions
Quantization2-4x fasterSlight accuracy loss
DistillationVariableTraining overhead
GPU inference10-100x fasterCost increase

A/B Testing ML Models

Experiment Design

┌─────────────────────────────────────────────────────────────────────┐
│                      A/B TESTING ARCHITECTURE                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   ┌──────────────┐                                                  │
│   │   Traffic    │                                                  │
│   └──────┬───────┘                                                  │
│          │                                                          │
│          ▼                                                          │
│   ┌──────────────────────┐                                          │
│   │ Experiment Assignment │ ◀─────── Experiment Config              │
│   │ - User bucketing      │          - Allocation %                 │
│   │ - Feature flags       │          - Target segments              │
│   └──────────┬───────────┘          - Guardrails                   │
│              │                                                       │
│     ┌────────┴────────┐                                             │
│     ▼                 ▼                                             │
│ ┌────────┐       ┌────────┐                                         │
│ │Control │       │Treatment│                                        │
│ │Model A │       │Model B  │                                        │
│ └────┬───┘       └────┬───┘                                         │
│      │                │                                              │
│      └────────┬───────┘                                             │
│               ▼                                                      │
│      ┌────────────────┐                                             │
│      │ Metrics Logger │                                             │
│      └────────┬───────┘                                             │
│               ▼                                                      │
│      ┌────────────────┐                                             │
│      │ Statistical    │ ─────▶ Decision: Ship / Iterate / Kill     │
│      │ Analysis       │                                             │
│      └────────────────┘                                             │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Metrics to Track

Metric TypeExamplesPurpose
Model metricsAUC, RMSE, precision/recallModel quality
Business metricsCTR, conversion, revenueBusiness impact
Guardrail metricsLatency, error rate, engagementPrevent regressions
Segment metricsMetrics by user segmentDetect heterogeneous effects

Statistical Considerations

  • Sample size: Calculate power before experiment
  • Duration: Account for novelty effects and time patterns
  • Multiple testing: Adjust for multiple metrics (Bonferroni, FDR)
  • Early stopping: Use sequential testing methods

Model Monitoring

What to Monitor

CategoryMetricsAlert Threshold
Data qualityMissing values, schema drift>1% change
Feature driftDistribution shift (PSI, KL)PSI >0.2
Prediction driftOutput distribution shiftDepends on use case
Model performanceAccuracy, AUC (when labels available)>5% degradation
OperationalLatency, throughput, errorsSLO violations

Drift Detection

┌─────────────────────────────────────────────────────────────────────┐
│                      DRIFT DETECTION PIPELINE                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Training Data                 Production Data                       │
│  ┌──────────────┐              ┌──────────────┐                     │
│  │ Reference    │              │   Current    │                     │
│  │ Distribution │              │ Distribution │                     │
│  └──────┬───────┘              └──────┬───────┘                     │
│         │                             │                              │
│         └──────────────┬──────────────┘                              │
│                        ▼                                             │
│              ┌──────────────────┐                                   │
│              │ Statistical Test │                                   │
│              │ - PSI (Population Stability Index)                   │
│              │ - KS Test                                            │
│              │ - Chi-squared                                        │
│              └────────┬─────────┘                                   │
│                       ▼                                              │
│              ┌──────────────────┐                                   │
│              │  Drift Score     │                                   │
│              └────────┬─────────┘                                   │
│                       │                                              │
│           ┌───────────┼───────────┐                                 │
│           ▼           ▼           ▼                                 │
│      No Drift    Warning     Critical                               │
│      (< 0.1)    (0.1-0.2)    (> 0.2)                               │
│         │           │           │                                   │
│         ▼           ▼           ▼                                   │
│      Continue    Investigate   Retrain                              │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Common ML System Design Patterns

Pattern 1: Recommendation System

Components needed:
- Candidate Generation (retrieve 100s-1000s)
- Ranking Model (score and sort)
- Feature Store (user features, item features)
- Real-time personalization (recent behavior)
- A/B testing infrastructure

Pattern 2: Fraud Detection

Components needed:
- Real-time feature computation
- Low-latency model serving (<50ms)
- High recall focus (can't miss fraud)
- Explainability for compliance
- Human-in-the-loop review
- Feedback loop for labels

Pattern 3: Search Ranking

Components needed:
- Two-stage ranking (retrieval + ranking)
- Feature store for query/document features
- Low latency (<200ms end-to-end)
- Learning to rank models
- Click-through rate prediction
- A/B testing with interleaving

Estimation for ML Systems

Training Infrastructure

Training time estimation:
- Dataset size: 100M examples
- Model: Transformer (100M params)
- GPU: A100 (80GB, 312 TFLOPS)
- Batch size: 32
- Training steps: Dataset / batch = 3.1M steps
- Time per step: ~100ms
- Total time: ~86 hours single GPU
- With 8 GPUs (data parallel): ~11 hours

Serving Infrastructure

Inference estimation:
- QPS: 10,000
- Model latency: 20ms
- Batch size: 1 (real-time)
- GPU utilization: 50% (latency constraint)
- Requests per GPU/sec: 25
- GPUs needed: 10,000 / 25 = 400 GPUs
- With batching (batch 8): 100 GPUs (4x reduction)

Related Skills

  • llm-serving-patterns - LLM-specific serving and optimization
  • rag-architecture - Retrieval-Augmented Generation patterns
  • vector-databases - Vector search and embeddings
  • ml-inference-optimization - Latency and cost optimization
  • estimation-techniques - Back-of-envelope calculations
  • quality-attributes-taxonomy - NFR definitions

Related Commands

  • /sd:ml-pipeline <problem> - Design ML system interactively
  • /sd:estimate <scenario> - Capacity calculations

Related Agents

  • ml-systems-designer - Design ML architectures
  • ml-interviewer - Mock ML system design interviews

Version History

  • v1.0.0 (2025-12-26): Initial release

Last Updated

Date: 2025-12-26 Model: claude-opus-4-5-20251101

Repository

melodic-software
melodic-software
Author
melodic-software/claude-code-plugins/plugins/systems-design/skills/ml-system-design
3
Stars
0
Forks
Updated2d ago
Added6d ago