name: schema-design description: Design database-specific schemas for relational, document, graph, and time-series databases. allowed-tools: Read, Write, Glob, Grep, Task

Schema Design

When to Use This Skill

Use this skill when:

Schema Design tasks - Working on design database-specific schemas for relational, document, graph, and time-series databases
Planning or design - Need guidance on Schema Design approaches
Best practices - Want to follow established patterns and standards

Overview

Schema design translates logical data models into physical database structures optimized for specific platforms. Different database types require different design approaches.

Database Type Selection

Selection Matrix

Requirement	Relational	Document	Graph	Time-Series	Wide-Column
ACID transactions	✓✓✓	✓	✓	✓	✓
Complex queries	✓✓✓	✓	✓✓	✓	✓
Schema flexibility	✓	✓✓✓	✓✓	✓	✓✓
Relationship traversal	✓	✓	✓✓✓	-	-
Time-based analytics	✓	✓	-	✓✓✓	✓✓
Horizontal scaling	✓	✓✓✓	✓✓	✓✓	✓✓✓

Relational Database Design (SQL Server/PostgreSQL)

Table Design Template

-- SQL Server table with best practices (PascalCase naming - Microsoft convention)
CREATE TABLE dbo.Orders (
    -- Primary Key (clustered) - GUID V7 for sortable IDs
    OrderId UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID(),

    -- Foreign Keys
    CustomerId UNIQUEIDENTIFIER NOT NULL,

    -- Business Columns
    OrderNumber VARCHAR(20) NOT NULL,
    OrderDate DATE NOT NULL DEFAULT CAST(GETUTCDATE() AS DATE),
    Status VARCHAR(20) NOT NULL DEFAULT 'Pending',
    Subtotal DECIMAL(18,2) NOT NULL,
    TaxAmount DECIMAL(18,2) NOT NULL DEFAULT 0,
    TotalAmount AS (Subtotal + TaxAmount) PERSISTED,

    -- Audit Columns
    CreatedAt DATETIME2(3) NOT NULL DEFAULT SYSUTCDATETIME(),
    CreatedBy VARCHAR(100) NOT NULL,
    UpdatedAt DATETIME2(3),
    UpdatedBy VARCHAR(100),
    RowVersion ROWVERSION NOT NULL,

    -- Constraints
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderId),
    CONSTRAINT FK_Orders_Customer
        FOREIGN KEY (CustomerId)
        REFERENCES dbo.Customers(CustomerId),
    CONSTRAINT UQ_Orders_OrderNumber UNIQUE (OrderNumber),
    CONSTRAINT CK_Orders_Status
        CHECK (Status IN ('Pending', 'Confirmed', 'Shipped', 'Delivered', 'Cancelled'))
);

-- Non-clustered indexes
CREATE NONCLUSTERED INDEX IX_Orders_CustomerDate
ON dbo.Orders (CustomerId, OrderDate DESC)
INCLUDE (Status, TotalAmount);

CREATE NONCLUSTERED INDEX IX_Orders_Status
ON dbo.Orders (Status)
WHERE Status NOT IN ('Delivered', 'Cancelled');  -- Filtered index

EF Core Configuration

// EF Core with SQL Server uses PascalCase by default - no column mapping needed!
public class OrderConfiguration : IEntityTypeConfiguration<Order>
{
    public void Configure(EntityTypeBuilder<Order> builder)
    {
        builder.ToTable("Orders", "dbo");

        // Primary Key - EF Core maps Id → OrderId automatically with conventions
        builder.HasKey(o => o.Id);
        builder.Property(o => o.Id)
            .HasColumnName("OrderId")  // Explicit for clarity
            .HasDefaultValueSql("NEWSEQUENTIALID()");

        // Properties - PascalCase C# → PascalCase SQL Server (natural mapping)
        builder.Property(o => o.OrderNumber)
            .HasMaxLength(20)
            .IsRequired();

        builder.Property(o => o.Subtotal)
            .HasPrecision(18, 2);

        builder.Property(o => o.TotalAmount)
            .HasComputedColumnSql("[Subtotal] + [TaxAmount]", stored: true);

        // Concurrency
        builder.Property(o => o.RowVersion)
            .IsRowVersion();

        // Relationships
        builder.HasOne(o => o.Customer)
            .WithMany(c => c.Orders)
            .HasForeignKey(o => o.CustomerId)
            .OnDelete(DeleteBehavior.Restrict);

        builder.HasMany(o => o.LineItems)
            .WithOne()
            .HasForeignKey(li => li.OrderId)
            .OnDelete(DeleteBehavior.Cascade);

        // Indexes
        builder.HasIndex(o => o.OrderNumber)
            .IsUnique();

        builder.HasIndex(o => new { o.CustomerId, o.OrderDate })
            .HasDatabaseName("IX_Orders_CustomerDate");
    }
}

Document Database Design (MongoDB/Cosmos DB)

Document Model Strategies

Pattern	Description	Use When
Embedded	Nest related data	1:few, always accessed together
Referenced	Store IDs, join at app level	1:many, independent access
Hybrid	Denormalize frequently used	Read optimization
Bucket	Group by time/category	Time-series, IoT

MongoDB Schema Design

// Customer document with embedded addresses
{
  "_id": ObjectId("..."),
  "customer_id": "CUST-12345",
  "profile": {
    "name": "John Smith",
    "email": "john@example.com",
    "phone": "+1-555-0123"
  },
  "addresses": [
    {
      "type": "billing",
      "street": "123 Main St",
      "city": "Seattle",
      "state": "WA",
      "postal_code": "98101",
      "country": "US",
      "is_default": true
    },
    {
      "type": "shipping",
      "street": "456 Oak Ave",
      "city": "Seattle",
      "state": "WA",
      "postal_code": "98102",
      "country": "US",
      "is_default": false
    }
  ],
  "preferences": {
    "marketing_opt_in": true,
    "preferred_channel": "email"
  },
  "metadata": {
    "created_at": ISODate("2024-01-15T10:30:00Z"),
    "updated_at": ISODate("2024-12-15T14:20:00Z"),
    "version": 3
  }
}

// Order document with referenced customer
{
  "_id": ObjectId("..."),
  "order_number": "ORD-2024-00123",
  "customer_id": "CUST-12345",  // Reference
  "customer_snapshot": {  // Denormalized for read
    "name": "John Smith",
    "email": "john@example.com"
  },
  "status": "confirmed",
  "line_items": [  // Embedded
    {
      "product_id": "PROD-001",
      "sku": "SKU-123",
      "name": "Widget Pro",
      "quantity": 2,
      "unit_price": 29.99,
      "extended_price": 59.98
    }
  ],
  "totals": {
    "subtotal": 59.98,
    "tax": 5.40,
    "shipping": 10.00,
    "total": 75.38
  },
  "dates": {
    "ordered_at": ISODate("2024-12-20T09:00:00Z"),
    "shipped_at": null,
    "delivered_at": null
  }
}

Cosmos DB Partition Strategy

// Partition key selection for multi-tenant orders
public class OrderDocument
{
    [JsonProperty("id")]
    public string Id { get; set; } = Guid.NewGuid().ToString();

    [JsonProperty("pk")]  // Partition key
    public string PartitionKey => $"{TenantId}|{CustomerId}";

    [JsonProperty("tenantId")]
    public string TenantId { get; set; }

    [JsonProperty("customerId")]
    public string CustomerId { get; set; }

    [JsonProperty("orderNumber")]
    public string OrderNumber { get; set; }

    [JsonProperty("type")]  // For heterogeneous containers
    public string DocumentType => "Order";

    [JsonProperty("lineItems")]
    public List<LineItemDocument> LineItems { get; set; } = [];

    [JsonProperty("_ts")]  // Auto-populated
    public long Timestamp { get; set; }
}

Graph Database Design (Neo4j)

Node and Relationship Design

// Node types
(:Customer {
  customerId: 'CUST-12345',
  name: 'John Smith',
  email: 'john@example.com',
  createdAt: datetime()
})

(:Product {
  productId: 'PROD-001',
  name: 'Widget Pro',
  category: 'Electronics',
  price: 29.99
})

(:Order {
  orderId: 'ORD-2024-00123',
  orderDate: date(),
  status: 'confirmed',
  totalAmount: 75.38
})

// Relationships
(:Customer)-[:PLACED {orderedAt: datetime()}]->(:Order)
(:Order)-[:CONTAINS {quantity: 2, unitPrice: 29.99}]->(:Product)
(:Customer)-[:VIEWED {viewedAt: datetime()}]->(:Product)
(:Product)-[:SIMILAR_TO {score: 0.85}]->(:Product)

Graph Query Patterns

// Recommendation query: Customers who bought X also bought
MATCH (c:Customer)-[:PLACED]->(:Order)-[:CONTAINS]->(p:Product {productId: 'PROD-001'})
MATCH (c)-[:PLACED]->(:Order)-[:CONTAINS]->(other:Product)
WHERE other.productId <> 'PROD-001'
RETURN other.name, COUNT(*) as frequency
ORDER BY frequency DESC
LIMIT 5;

// Path finding: Supply chain
MATCH path = (supplier:Supplier)-[:SUPPLIES*1..5]->(product:Product)
WHERE product.productId = 'PROD-001'
RETURN path;

Time-Series Database Design (InfluxDB/TimescaleDB)

TimescaleDB Schema

-- Create hypertable for time-series data
CREATE TABLE sensor_readings (
    time TIMESTAMPTZ NOT NULL,
    device_id VARCHAR(50) NOT NULL,
    location VARCHAR(100),
    temperature DOUBLE PRECISION,
    humidity DOUBLE PRECISION,
    pressure DOUBLE PRECISION
);

-- Convert to hypertable (partitioned by time)
SELECT create_hypertable('sensor_readings', 'time');

-- Add compression policy
ALTER TABLE sensor_readings
SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);

SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');

-- Create continuous aggregate for hourly stats
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    device_id,
    AVG(temperature) AS avg_temp,
    MAX(temperature) AS max_temp,
    MIN(temperature) AS min_temp,
    AVG(humidity) AS avg_humidity
FROM sensor_readings
GROUP BY time_bucket('1 hour', time), device_id;

Indexing Strategies

Index Types by Database

Database	Index Types
SQL Server	Clustered, Non-clustered, Columnstore, Filtered, Covering
PostgreSQL	B-tree, Hash, GiST, GIN, BRIN
MongoDB	Single field, Compound, Multi-key, Text, Geospatial
Cosmos DB	Range, Spatial, Composite (included by default)

Index Selection Guide

# Index Planning Checklist

## Query Analysis
- [ ] Identify most frequent queries
- [ ] Analyze query patterns (point lookup, range, full scan)
- [ ] Review execution plans for table scans
- [ ] Identify JOIN columns

## Index Candidates
| Column(s) | Query Pattern | Index Type | Include Columns |
|-----------|---------------|------------|-----------------|
| CustomerId | Equality lookup | Non-clustered | Status, CreatedAt |
| OrderDate | Range scan | Non-clustered | TotalAmount |
| Status | Equality (active only) | Filtered | - |

## Index Maintenance
- [ ] Define rebuild/reorganize schedule
- [ ] Monitor fragmentation
- [ ] Track index usage statistics
- [ ] Remove unused indexes

Data Type Selection

Type Mapping

Logical Type	SQL Server	PostgreSQL	MongoDB	Cosmos DB
Identifier	UNIQUEIDENTIFIER	UUID	ObjectId/String	String
Money	DECIMAL(18,2)	NUMERIC(18,2)	Decimal128	Number
Date only	DATE	DATE	Date (midnight)	String (ISO)
Timestamp	DATETIME2(3)	TIMESTAMPTZ	Date	String (ISO)
Boolean	BIT	BOOLEAN	Boolean	Boolean
JSON	NVARCHAR(MAX)	JSONB	Object (native)	Object (native)

Performance Patterns

Denormalization Decisions

Scenario	Normalize	Denormalize
High write frequency	✓
High read frequency		✓
Data consistency critical	✓
Query latency critical		✓
Storage constraints	✓
Join complexity high		✓

Validation Checklist

Database type selected based on requirements
Primary keys defined for all tables
Foreign keys with appropriate delete behavior
Indexes planned for query patterns
Data types optimized for storage/performance
Partitioning strategy defined (if needed)
Audit columns included (CreatedAt, UpdatedAt for SQL Server; created_at, updated_at for PostgreSQL)
Concurrency control mechanism (RowVersion for SQL Server, _etag for Cosmos DB)
Naming conventions consistent (PascalCase for SQL Server, snake_case for PostgreSQL)

Integration Points

Inputs from:

er-modeling skill → Logical model
dimensional-modeling skill → Star/snowflake schema
data-vault-modeling skill → Hub/link/satellite

Outputs to:

migration-planning skill → DDL scripts
EF Core → Entity configurations
Database deployment → Schema scripts

schema-design

$ Instalar