data-systems-architecture

Use when designing databases for data-heavy applications, making schema decisions for performance, choosing between normalization and denormalization, selecting storage/indexing strategies, planning for scale, or evaluating OLTP vs OLAP trade-offs. Also use when encountering N+1 queries, ORM issues, or concurrency problems.

$ インストール

git clone https://github.com/ratacat/claude-skills /tmp/claude-skills && cp -r /tmp/claude-skills/skills/data-systems-architecture ~/.claude/skills/claude-skills

// tip: Run this command in your terminal to install the skill


name: data-systems-architecture description: Use when designing databases for data-heavy applications, making schema decisions for performance, choosing between normalization and denormalization, selecting storage/indexing strategies, planning for scale, or evaluating OLTP vs OLAP trade-offs. Also use when encountering N+1 queries, ORM issues, or concurrency problems.

Data Systems Architecture

Overview

Core principle: Good data system architecture balances reliability (correct operation under faults), scalability (handling growth gracefully), and maintainability (enabling productive change over time). Every architectural decision involves trade-offs between these concerns.

This skill synthesizes knowledge from three foundational texts:

  • Designing Data-Intensive Applications (Kleppmann) - distributed systems, storage engines, scaling
  • The Art of PostgreSQL (Fontaine) - PostgreSQL-specific patterns, SQL as programming
  • PostgreSQL Query Optimization (Dombrovskaya et al.) - execution plans, performance tuning

When to Use

SymptomStart With
Designing a new database/schema01-foundational-principles.md
Normalization vs denormalization decisions02-data-modeling.md
Need to understand OLTP vs OLAP03-storage-engines.md
Slow queries, index selection04-indexing.md
Planning for growth, read replicas05-scaling-patterns.md
Race conditions, deadlocks, isolation issues06-transactions-concurrency.md
N+1 queries, ORM problems, application integration07-application-integration.md

Navigation

Reference Files (Load as needed)

01-foundational-principles.md    - Reliability/Scalability/Maintainability, load parameters
02-data-modeling.md              - Normalization, denormalization, schema design patterns
03-storage-engines.md            - B-trees, LSM-trees, OLTP vs OLAP, PostgreSQL internals
04-indexing.md                   - Index types, compound indexes, covering indexes, maintenance
05-scaling-patterns.md           - Replication, partitioning, sharding strategies
06-transactions-concurrency.md   - ACID, isolation levels, MVCC, locking patterns
07-application-integration.md    - ORM pitfalls, N+1, business logic placement, batch processing

Quick Decision Framework

New system design?
├─ Yes → Read 01, then 02 for data model
└─ No → What's the problem?
         ├─ "Queries are slow" → Read 04 (indexing) + 03 (storage patterns)
         ├─ "Data is inconsistent" → Read 02 (modeling) + 06 (transactions)
         ├─ "Can't handle the load" → Read 05 (scaling) + 03 (OLTP vs OLAP)
         ├─ "App makes too many queries" → Read 07 (N+1, ORM patterns)
         └─ "Race conditions/deadlocks" → Read 06 (concurrency)

Core Concepts (Quick Reference)

The Three Pillars

ConcernDefinitionKey Question
ReliabilitySystem works correctly under faultsWhat happens when things fail?
ScalabilityHandles growth gracefullyWhat's 10x load look like?
MaintainabilityEasy to operate and evolveCan new engineers understand this?

Data Model Selection

ModelBest ForAvoid When
RelationalMany-to-many relationships, joins, consistencyHighly hierarchical data, constant schema changes
DocumentSelf-contained docs, tree structuresNeed for joins, many-to-many
GraphHighly connected data, recursive queriesSimple CRUD, no relationship traversal

OLTP vs OLAP

AspectOLTPOLAP
Query patternPoint lookups, few rowsAggregates, many rows
OptimizationIndex everything used in WHEREFewer indexes, full scans OK
StorageRow-orientedConsider column-oriented

Index Type Quick Reference

TypeUse CasePostgreSQL
B-treeEquality, range, sortingDefault, most queries
HashEquality onlyFaster for exact match
GINArrays, JSONB, full-text@>, @@ operators
GiSTGeometric, range typesPostGIS, nearest-neighbor
BRINLarge, naturally ordered tablesTime-series data

Isolation Levels

LevelPreventsPostgreSQL Default?
Read CommittedDirty readsYes
Repeatable Read+ Non-repeatable readsNo
SerializableAll anomaliesNo (uses SSI)

Design Checklist

Before finalizing a data architecture:

  • Identified load parameters (read/write ratio, data volume, latency requirements)
  • Chose appropriate data model (relational/document/graph hybrid?)
  • Normalized to 3NF first, denormalized only with measured justification
  • Designed indexes for actual query patterns (not hypothetical)
  • Considered 10x growth scenario
  • Established isolation level requirements
  • Defined where business logic lives (app vs DB vs both)
  • Planned for operations (backups, monitoring, migrations)

References

  • Kleppmann, M. Designing Data-Intensive Applications (O'Reilly, 2017)
  • Fontaine, D. The Art of PostgreSQL (2nd ed., 2020)
  • Dombrovskaya, H., Novikov, B., Bailliekova, A. PostgreSQL Query Optimization (Apress, 2021)