
stream-processing

Use when designing real-time data processing systems, choosing stream processing frameworks, or implementing event-driven architectures. Covers Kafka, Flink, and streaming patterns.

allowed_tools: Read, Glob, Grep

$ Install

git clone https://github.com/melodic-software/claude-code-plugins /tmp/claude-code-plugins && cp -r /tmp/claude-code-plugins/plugins/systems-design/skills/stream-processing ~/.claude/skills/claude-code-plugins

// tip: Run this command in your terminal to install the skill


name: stream-processing
description: Use when designing real-time data processing systems, choosing stream processing frameworks, or implementing event-driven architectures. Covers Kafka, Flink, and streaming patterns.
allowed-tools: Read, Glob, Grep

Stream Processing

Patterns and technologies for real-time data processing, event streaming, and stream analytics.

When to Use This Skill

  • Designing real-time data pipelines
  • Choosing stream processing frameworks
  • Implementing event-driven architectures
  • Building real-time analytics
  • Understanding streaming vs batch trade-offs

Batch vs Streaming

Comparison

| Aspect     | Batch                | Streaming               |
|------------|----------------------|-------------------------|
| Latency    | Minutes to hours     | Milliseconds to seconds |
| Data       | Bounded (finite)     | Unbounded (infinite)    |
| Processing | Process all at once  | Process as it arrives   |
| State      | Recompute each run   | Maintain continuously   |
| Complexity | Lower                | Higher                  |
| Cost       | Often lower          | Often higher            |

When to Use Streaming

Use streaming when:
- Real-time responses required (<1 minute)
- Events need immediate action (fraud, alerts)
- Data arrives continuously
- Users expect live updates
- Time-sensitive business decisions

Use batch when:
- Daily/hourly reports sufficient
- Complex transformations needed
- Cost optimization priority
- Historical analysis
- One-time processing

Stream Processing Concepts

Event Time vs Processing Time

Event Time: When event actually occurred
Processing Time: When event is processed

Example:
┌─────────────────────────────────────────────────────────┐
│ Event: Purchase at 10:00:00 (event time)                │
│ Network delay: 5 seconds                                │
│ Processing: 10:00:05 (processing time)                  │
└─────────────────────────────────────────────────────────┘

Why it matters:
- Late events need handling
- Ordering not guaranteed
- Watermarks track progress

Watermarks

Watermark = "All events before this time have arrived"

Event stream:
──[10:01]──[10:02]──[10:00]──[10:03]──[Watermark: 10:00]──

Allows system to:
- Know when window is complete
- Handle late events
- Balance latency vs completeness

Windows

Tumbling Window (fixed, non-overlapping):
|─────|─────|─────|
0     5    10    15 (seconds)

Sliding Window (fixed, overlapping):
|─────|
  |─────|
    |─────|
Size: 5s, Slide: 2s

Session Window (activity-based):
|──────|     |───────────|    |───|
User activity with gaps defines windows

Count Window:
Process every N events
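The window types above differ mainly in how an event's timestamp maps to window boundaries. A minimal framework-free sketch of tumbling-window assignment in plain Java (class and method names are illustrative):

```java
public class TumblingWindows {
    /** Start of the tumbling window containing the given event time. */
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - (eventTimeMs % windowSizeMs);
    }

    /** End of that window (exclusive). */
    static long windowEnd(long eventTimeMs, long windowSizeMs) {
        return windowStart(eventTimeMs, windowSizeMs) + windowSizeMs;
    }
}
```

For example, with 5-second windows an event at t=12.3s lands in the window [10s, 15s). Sliding windows generalize this by assigning each event to every window whose range covers its timestamp.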

State Management

Stateful operations require maintained state:
- Aggregations (sum, count, avg)
- Joins between streams
- Pattern detection
- Deduplication

State backends:
- In-memory (fast, limited)
- RocksDB (larger, persistent)
- External (Redis, database)

Stream Processing Frameworks

Apache Kafka Streams

Characteristics:
- Library (not a cluster)
- Exactly-once semantics
- Kafka-native
- Java/Scala

Best for:
- Kafka-centric architectures
- Simpler transformations
- Microservices

Example topology:
source → filter → map → aggregate → sink
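The source → filter → map → aggregate shape can be sketched with the plain Java streams API over an in-memory list. This shows only the dataflow shape, not the actual Kafka Streams API:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopologyShape {
    /** source -> filter -> map -> aggregate, over an in-memory "stream". */
    static Map<String, Long> run(List<String> source) {
        return source.stream()
                .filter(s -> !s.isEmpty())               // filter: drop empty records
                .map(String::toLowerCase)                // map: normalize
                .collect(Collectors.groupingBy(          // aggregate: count per key
                        s -> s, Collectors.counting()));
    }
}
```

In Kafka Streams the same shape is expressed against topics with `KStream` operators, and the aggregation state is backed by a changelog topic rather than an in-memory map.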

Apache Flink

Characteristics:
- Distributed cluster
- True streaming (not micro-batch)
- Advanced state management
- SQL support

Best for:
- Complex event processing
- Large-scale streaming
- Low-latency requirements

Example:
DataStream<Event> events = env.addSource(kafkaSource);
events
    .keyBy(e -> e.getUserId())                             // partition stream by user
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))  // 5-minute event-time windows
    .aggregate(new CountAggregator())                      // count per user per window
    .addSink(sink);                                        // emit results downstream

Apache Spark Streaming

Characteristics:
- Micro-batch processing
- Unified batch + streaming API
- Wide ecosystem
- Python, Scala, Java, R

Best for:
- Teams with Spark experience
- Batch + streaming unified
- Machine learning integration

Latency: Seconds (micro-batch)

Kafka Streams vs Flink vs Spark

| Factor         | Kafka Streams | Flink       | Spark Streaming |
|----------------|---------------|-------------|-----------------|
| Deployment     | Library       | Cluster     | Cluster         |
| Latency        | Low           | Lowest      | Medium          |
| State          | Good          | Excellent   | Good            |
| Exactly-once   | Yes           | Yes         | Yes             |
| Complexity     | Low           | High        | Medium          |
| Scaling        | With Kafka    | Independent | Independent     |
| SQL            | Limited       | Yes         | Yes             |
| ML integration | Limited       | Limited     | Excellent       |

Stream Processing Patterns

Filtering

Input: All events
Output: Events matching criteria

Example: Only process events where amount > 1000

Mapping/Transformation

Input: Event type A
Output: Event type B

Example: Enrich order events with customer data

Aggregation

Input: Multiple events
Output: Single aggregated result

Examples:
- Count events per window
- Sum amounts per user
- Average latency per endpoint
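A minimal framework-free sketch of the "count events per window" case, keyed by user (names are illustrative; a real framework would keep this state in a fault-tolerant backend):

```java
import java.util.HashMap;
import java.util.Map;

public class WindowedCounter {
    private final long windowSizeMs;
    // state: "key@windowStart" -> running count
    private final Map<String, Long> counts = new HashMap<>();

    WindowedCounter(long windowSizeMs) { this.windowSizeMs = windowSizeMs; }

    /** Add one event; returns the updated count for that key's window. */
    long add(String key, long eventTimeMs) {
        long windowStart = eventTimeMs - (eventTimeMs % windowSizeMs);
        return counts.merge(key + "@" + windowStart, 1L, Long::sum);
    }
}
```

Sum and average work the same way; only the merged value changes (a running sum, or a sum plus a count).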

Join Patterns

Stream-Stream Join:
┌─────────────┐     ┌─────────────┐
│   Orders    │ ──► │    Join     │
└─────────────┘     │ (by order_id│
┌─────────────┐     │  in window) │
│  Shipments  │ ──► │             │
└─────────────┘     └─────────────┘

Stream-Table Join (Enrichment):
┌─────────────┐     ┌─────────────┐
│   Events    │ ──► │    Join     │
└─────────────┘     │ (lookup by  │
┌─────────────┐     │  customer)  │
│  Customer   │ ──► │             │
│   Table     │     └─────────────┘
└─────────────┘

Deduplication

Problem: Duplicate events from at-least-once delivery

Solution:
1. Track seen IDs in state (with TTL)
2. If seen, drop
3. If new, process and store ID

State: {event_id: timestamp}
TTL: Based on expected duplicate window
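The three steps above as a minimal in-memory sketch (illustrative names; a production version would keep the seen-ID state in a fault-tolerant backend):

```java
import java.util.HashMap;
import java.util.Map;

public class Deduplicator {
    private final long ttlMs;
    // state: event_id -> timestamp of first sighting
    private final Map<String, Long> seen = new HashMap<>();

    Deduplicator(long ttlMs) { this.ttlMs = ttlMs; }

    /** Returns true if the event is new (process it), false if duplicate (drop it). */
    boolean accept(String eventId, long nowMs) {
        // TTL eviction: forget IDs older than the expected duplicate window.
        seen.values().removeIf(ts -> nowMs - ts > ttlMs);
        if (seen.containsKey(eventId)) return false; // step 2: seen -> drop
        seen.put(eventId, nowMs);                    // step 3: new -> store ID, process
        return true;
    }
}
```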

Event Delivery Guarantees

At-Most-Once

May lose events, never duplicates
Commit → Process → (if processing fails after commit, event lost)

Use when: Loss acceptable, simplicity preferred

At-Least-Once

Never loses, may have duplicates
Process → Commit → (if failure before commit, reprocess → duplicates)

Use when: No loss acceptable, handle duplicates downstream

Exactly-Once

Never loses, never duplicates
Requires:
- Idempotent operations, OR
- Transactional processing

How it works:
1. Read from source transactionally
2. Process and update state
3. Write output and commit together

Flink: Checkpointing + two-phase commit
Kafka Streams: Transactional producer + EOS

Late Event Handling

Strategies

1. Drop late events
   Simple, may lose data

2. Allow late events (allowed lateness)
   Process if within lateness threshold

3. Side output late events
   Main stream processes on-time
   Side stream handles late separately

4. Reprocess historical
   Batch job fixes late data impact

Watermark Strategies

Bounded Out-of-Orderness:
watermark = max_event_time - max_lateness

Example:
max_event_time = 10:00:00
max_lateness = 5 seconds
watermark = 09:59:55

Windows ending at or before 09:59:55 are considered complete; no earlier-timestamped events are expected
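The formula as a small sketch: `onEvent` is called per event and emits the current watermark. This is the same idea as Flink's bounded out-of-orderness watermark strategy, but with illustrative names:

```java
public class BoundedWatermark {
    private final long maxLatenessMs;
    private long maxEventTimeMs = Long.MIN_VALUE;

    BoundedWatermark(long maxLatenessMs) { this.maxLatenessMs = maxLatenessMs; }

    /** Observe an event timestamp; return the resulting watermark. */
    long onEvent(long eventTimeMs) {
        maxEventTimeMs = Math.max(maxEventTimeMs, eventTimeMs);
        return maxEventTimeMs - maxLatenessMs; // watermark = max_event_time - max_lateness
    }
}
```

Note that an out-of-order event never moves the watermark backwards, because the watermark tracks the maximum event time seen so far.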

Scalability Patterns

Partitioning

Partition by key for parallel processing:

┌─────────────────────────────────────────────────────┐
│ Kafka Topic (3 partitions)                          │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐│
│ │ Partition 0 │ │ Partition 1 │ │ Partition 2     ││
│ │ user_a, b   │ │ user_c, d   │ │ user_e, f       ││
│ └─────────────┘ └─────────────┘ └─────────────────┘│
└─────────────────────────────────────────────────────┘
         │               │               │
         ▼               ▼               ▼
    ┌─────────┐    ┌─────────┐    ┌─────────┐
    │Worker 0 │    │Worker 1 │    │Worker 2 │
    └─────────┘    └─────────┘    └─────────┘
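Key-based routing comes down to a deterministic key → partition mapping, so the same user always lands in the same partition and therefore on the same worker. A simplified sketch (Kafka's default partitioner uses murmur2 hashing rather than `hashCode`):

```java
public class Partitioner {
    /** Map a key to one of numPartitions partitions, deterministically. */
    static int partitionFor(String key, int numPartitions) {
        // floorMod avoids negative results for negative hash codes
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

Determinism is what makes stateful per-key processing possible: all events for a key are guaranteed to reach the worker holding that key's state.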

Backpressure

When downstream can't keep up:

1. Buffer (risk: OOM)
2. Drop (risk: data loss)
3. Backpressure (slow down source)

Flink: Backpressure propagates automatically
Kafka: Consumer lag indicates backpressure

Monitoring Streaming Applications

Key Metrics

Throughput:
- Events per second
- Bytes per second

Latency:
- Processing latency
- End-to-end latency

Health:
- Consumer lag
- Checkpoint duration
- Backpressure rate
- Error rate

Consumer Lag

Lag = Latest offset - Consumer offset

High lag indicates:
- Processing too slow
- Need more parallelism
- Downstream bottleneck

Monitor: Set alerting thresholds
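Lag is computed per partition and typically summed (or maxed) across partitions for alerting. A minimal sketch:

```java
public class ConsumerLag {
    /** Total lag across partitions: sum of (latest offset - committed offset). */
    static long totalLag(long[] latestOffsets, long[] committedOffsets) {
        long lag = 0;
        for (int i = 0; i < latestOffsets.length; i++) {
            lag += latestOffsets[i] - committedOffsets[i];
        }
        return lag;
    }
}
```

A steadily growing total is the usual alert signal: it means the consumer is falling behind rather than just experiencing a momentary spike.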

Best Practices

1. Design for exactly-once when needed
2. Handle late events explicitly
3. Use event time, not processing time
4. Monitor consumer lag closely
5. Plan for state recovery
6. Test with realistic data volumes
7. Implement backpressure handling
8. Keep processing idempotent when possible

Related Skills

  • message-queues - Messaging patterns
  • data-architecture - Data platform design
  • etl-elt-patterns - Data pipeline patterns

Repository

Author: melodic-software
melodic-software/claude-code-plugins/plugins/systems-design/skills/stream-processing
Stars: 3 · Forks: 0