name: database-design type: knowledge description: Database schema design, migrations, query optimization, and ORM patterns. Use when designing database schemas, writing migrations, optimizing queries, or working with ORMs like SQLAlchemy or Django ORM. keywords: database, schema, migration, query, sql, orm, sqlalchemy, django orm, postgres, mysql, index, transaction auto_activate: true

Database Design Skill

Database schema design, migration strategies, query optimization, and ORM best practices.

When This Skill Activates

Designing database schemas
Writing database migrations
Optimizing slow queries
Working with ORMs (SQLAlchemy, Django ORM)
Setting up database indexes
Handling transactions
Keywords: "database", "schema", "migration", "query", "sql", "orm"

Schema Design Best Practices

Normalization vs Denormalization

Normalization (Eliminate redundancy):

✅ Use for: Transactional systems (OLTP)
✅ Benefits: Data integrity, no update anomalies
❌ Drawback: More JOINs, slower reads

Denormalization (Add redundancy):

✅ Use for: Analytical systems (OLAP), read-heavy apps
✅ Benefits: Faster reads, fewer JOINs
❌ Drawback: Harder to maintain consistency

Normal Forms Quick Reference:

Normal Form	Rule	Example
1NF	Atomic values, no repeating groups	No CSV in columns
2NF	1NF + no partial dependencies	All non-key columns depend on full primary key
3NF	2NF + no transitive dependencies	No non-key depends on another non-key

Practical Approach:

-- ✅ GOOD: 3NF design (normalized)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    total_amount DECIMAL(10, 2),
    created_at TIMESTAMP DEFAULT NOW()
);

-- ❌ BAD: Denormalized without reason
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER,
    user_email VARCHAR(255),  -- Redundant! Violates 3NF
    total_amount DECIMAL(10, 2)
);

When to Denormalize:

-- ✅ GOOD: Denormalize for performance (read-heavy)
CREATE TABLE order_summary (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    user_email VARCHAR(255),      -- Denormalized for fast reads
    total_amount DECIMAL(10, 2),
    order_count INTEGER,          -- Denormalized aggregate
    last_order_date TIMESTAMP
);
-- Use for dashboards, reporting, analytics

Data Types

Choose Appropriate Types:

-- ✅ GOOD: Specific types
email VARCHAR(255)           -- Fixed max length
price DECIMAL(10, 2)         -- Exact precision for money
created_at TIMESTAMP         -- Date + time
is_active BOOLEAN            -- True/false
metadata JSONB               -- Structured data (PostgreSQL)

-- ❌ BAD: Vague types
email TEXT                   -- Unbounded
price FLOAT                  -- Precision errors with money!
created_at VARCHAR(50)       -- String instead of timestamp
is_active VARCHAR(5)         -- "true" vs true

Money Handling:

-- ✅ CORRECT: DECIMAL for money
price DECIMAL(10, 2)  -- Up to 99,999,999.99

-- ❌ WRONG: FLOAT for money
price FLOAT           -- Precision errors: 0.1 + 0.2 ≠ 0.3

Primary Keys

Auto-Incrementing Integer (Most Common):

CREATE TABLE users (
    id SERIAL PRIMARY KEY,  -- PostgreSQL
    -- id INT AUTO_INCREMENT PRIMARY KEY  -- MySQL
    email VARCHAR(255)
);

UUID (Distributed Systems):

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255)
);

Comparison:

Approach	Pros	Cons	Use When
SERIAL/INT	Simple, small, ordered	Not globally unique	Single database
UUID	Globally unique, no conflicts	Larger, unordered	Distributed systems, merging DBs

Foreign Keys & Relationships

One-to-Many:

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255)
);

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    title VARCHAR(255),
    content TEXT
);

Many-to-Many (Junction Table):

CREATE TABLE students (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255)
);

CREATE TABLE courses (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255)
);

CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(id) ON DELETE CASCADE,
    course_id INTEGER REFERENCES courses(id) ON DELETE CASCADE,
    enrolled_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (student_id, course_id)
);

ON DELETE Options:

CASCADE - Delete related records (use carefully!)
SET NULL - Set foreign key to NULL
RESTRICT - Prevent deletion (default)

Best Practice:

-- ✅ GOOD: Explicit CASCADE when you want it
user_id INTEGER REFERENCES users(id) ON DELETE CASCADE

-- ✅ GOOD: RESTRICT when you want protection
user_id INTEGER REFERENCES users(id) ON DELETE RESTRICT

-- ❌ BAD: No constraint (orphaned records)
user_id INTEGER  -- No REFERENCES!

Indexing

When to Add Indexes

Always Index:

Primary keys (automatic)
Foreign keys (manual in most DBs)
Frequently queried columns (WHERE, ORDER BY, GROUP BY)
Unique constraints (UNIQUE)

Example:

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,                    -- Indexed automatically
    user_id INTEGER REFERENCES users(id),     -- Should index!
    title VARCHAR(255),
    created_at TIMESTAMP,
    status VARCHAR(20)
);

-- Add indexes for common queries
CREATE INDEX idx_posts_user_id ON posts(user_id);
CREATE INDEX idx_posts_status ON posts(status);
CREATE INDEX idx_posts_created_at ON posts(created_at);

Index Types

B-Tree (Default):

-- For: =, <, >, <=, >=, BETWEEN, ORDER BY
CREATE INDEX idx_created_at ON posts(created_at);

Hash (PostgreSQL):

-- For: = only (faster than B-tree for equality)
CREATE INDEX idx_email ON users USING HASH (email);

GIN (Generalized Inverted Index - PostgreSQL):

-- For: Full-text search, JSONB, arrays
CREATE INDEX idx_tags ON posts USING GIN (tags);  -- Array column
CREATE INDEX idx_metadata ON posts USING GIN (metadata);  -- JSONB

Partial Index:

-- Index only active users (saves space)
CREATE INDEX idx_active_users ON users(email) WHERE is_active = true;

Composite Index:

-- For queries filtering by BOTH columns
CREATE INDEX idx_user_status ON posts(user_id, status);

-- ✅ Uses index
SELECT * FROM posts WHERE user_id = 1 AND status = 'published';

-- ✅ Uses index (leftmost prefix)
SELECT * FROM posts WHERE user_id = 1;

-- ❌ Does NOT use index (missing leftmost column)
SELECT * FROM posts WHERE status = 'published';

Index Tradeoffs

Pros:

⚡ Faster reads (queries)
⚡ Faster JOINs

Cons:

🐢 Slower writes (INSERT, UPDATE, DELETE)
💾 More disk space
🔧 More maintenance

Rule of Thumb: Index columns in WHERE, JOIN, ORDER BY - but don't over-index.

Query Optimization

EXPLAIN ANALYZE

Always profile slow queries:

EXPLAIN ANALYZE
SELECT posts.*, users.email
FROM posts
JOIN users ON posts.user_id = users.id
WHERE posts.status = 'published'
ORDER BY posts.created_at DESC
LIMIT 10;

What to Look For:

❌ Seq Scan (table scan) - Add index!
✅ Index Scan - Using index
❌ High cost - Optimize query
❌ Nested Loop on large tables - Check JOIN

N+1 Query Problem

❌ BAD: N+1 queries (slow):

# SQLAlchemy
users = session.query(User).all()
for user in users:
    print(user.posts)  # Triggers N queries!
# Total: 1 (users) + N (posts per user) queries

✅ GOOD: Eager loading (fast):

# SQLAlchemy
users = session.query(User).options(joinedload(User.posts)).all()
for user in users:
    print(user.posts)  # No extra query!
# Total: 1 query with JOIN

Django ORM:

# ❌ BAD: N+1
users = User.objects.all()
for user in users:
    print(user.posts.all())  # N queries

# ✅ GOOD: prefetch_related
users = User.objects.prefetch_related('posts').all()
for user in users:
    print(user.posts.all())  # 2 queries total

Common Optimizations

Use LIMIT:

-- ❌ BAD: Load everything
SELECT * FROM posts ORDER BY created_at DESC;

-- ✅ GOOD: Load only what you need
SELECT * FROM posts ORDER BY created_at DESC LIMIT 10;

Avoid SELECT *:

-- ❌ BAD: Load all columns
SELECT * FROM users WHERE id = 1;

-- ✅ GOOD: Load only needed columns
SELECT id, email FROM users WHERE id = 1;

Use EXISTS instead of COUNT:

-- ❌ SLOW: Counts all matching rows
SELECT COUNT(*) FROM posts WHERE user_id = 1;
IF count > 0 THEN ...

-- ✅ FAST: Stops at first match
SELECT EXISTS(SELECT 1 FROM posts WHERE user_id = 1 LIMIT 1);

Migrations

Migration Best Practices

1. Make Migrations Reversible:

# ✅ GOOD: Can rollback
def upgrade():
    op.add_column('users', sa.Column('phone', sa.String(20)))

def downgrade():
    op.drop_column('users', 'phone')

2. Avoid Locking (Large Tables):

-- ❌ BAD: Locks table (blocks reads/writes)
ALTER TABLE users ADD COLUMN phone VARCHAR(20) NOT NULL DEFAULT '';

-- ✅ GOOD: Add nullable first, backfill, then add constraint
ALTER TABLE users ADD COLUMN phone VARCHAR(20);  -- Step 1: No lock
-- Step 2: Backfill in batches (application code)
UPDATE users SET phone = '' WHERE phone IS NULL;
-- Step 3: Add constraint
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;

3. Test Migrations:

# Apply migration
alembic upgrade head

# Rollback
alembic downgrade -1

# Re-apply
alembic upgrade head

4. Never Edit Merged Migrations:

Always create a new migration to fix issues
Old migrations are historical record

Migration Tools

Python:

Alembic (SQLAlchemy)
Django Migrations (Django ORM)

Node.js:

Knex.js
TypeORM

Ruby:

ActiveRecord Migrations (Rails)

ORM Patterns

SQLAlchemy (Python)

Define Models:

from sqlalchemy import Column, Integer, String, ForeignKey, DECIMAL
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = Column(Integer, primary_key=True)
    email = Column(String(255), unique=True, nullable=False)
    posts = relationship('Post', back_populates='user')

class Post(Base):
    __tablename__ = 'posts'

    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('users.id'), nullable=False)
    title = Column(String(255))
    user = relationship('User', back_populates='posts')

Query:

# Create
user = User(email="test@example.com")
session.add(user)
session.commit()

# Read
user = session.query(User).filter_by(email="test@example.com").first()
users = session.query(User).filter(User.email.like('%@example.com')).all()

# Update
user.email = "new@example.com"
session.commit()

# Delete
session.delete(user)
session.commit()

Eager Loading (Avoid N+1):

from sqlalchemy.orm import joinedload

# ✅ GOOD: Load posts in one query
users = session.query(User).options(joinedload(User.posts)).all()

Django ORM

Define Models:

from django.db import models

class User(models.Model):
    email = models.EmailField(unique=True)
    created_at = models.DateTimeField(auto_now_add=True)

class Post(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='posts')
    title = models.CharField(max_length=255)
    content = models.TextField()

Query:

# Create
user = User.objects.create(email="test@example.com")

# Read
user = User.objects.get(email="test@example.com")
users = User.objects.filter(email__endswith="@example.com")

# Update
user.email = "new@example.com"
user.save()

# Delete
user.delete()

Eager Loading:

# ✅ GOOD: Avoid N+1
users = User.objects.prefetch_related('posts').all()

Transactions

ACID Properties

Atomicity: All or nothing
Consistency: Valid state always
Isolation: Concurrent transactions don't interfere
Durability: Committed data persists

Using Transactions

SQLAlchemy:

from sqlalchemy.orm import Session

with Session(engine) as session:
    try:
        user = User(email="test@example.com")
        session.add(user)

        post = Post(user_id=user.id, title="Test")
        session.add(post)

        session.commit()  # ✅ Both saved
    except Exception as e:
        session.rollback()  # ❌ Neither saved
        raise

Django:

from django.db import transaction

with transaction.atomic():
    user = User.objects.create(email="test@example.com")
    Post.objects.create(user=user, title="Test")
    # ✅ Both saved or ❌ neither saved

Raw SQL:

BEGIN;
    INSERT INTO users (email) VALUES ('test@example.com');
    INSERT INTO posts (user_id, title) VALUES (1, 'Test');
COMMIT;
-- Or ROLLBACK; to cancel

Connection Pooling

Why Pool?

Creating connections is expensive
Reuse existing connections
Limit concurrent connections

SQLAlchemy Pooling

from sqlalchemy import create_engine

engine = create_engine(
    'postgresql://user:pass@localhost/db',
    pool_size=10,          # Max 10 connections
    max_overflow=20,       # Allow 20 temporary connections
    pool_timeout=30,       # Wait 30s for connection
    pool_recycle=3600      # Recycle after 1 hour
)

Django Pooling

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
        'CONN_MAX_AGE': 600,  # Persist connections for 10 min
    }
}

Common Patterns

Soft Deletes

Instead of deleting, mark as deleted:

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    deleted_at TIMESTAMP NULL  -- NULL = not deleted
);

-- ❌ Hard delete
DELETE FROM posts WHERE id = 1;

-- ✅ Soft delete
UPDATE posts SET deleted_at = NOW() WHERE id = 1;

-- Query non-deleted
SELECT * FROM posts WHERE deleted_at IS NULL;

ORM (SQLAlchemy):

class Post(Base):
    __tablename__ = 'posts'

    id = Column(Integer, primary_key=True)
    deleted_at = Column(DateTime, nullable=True)

# Soft delete
post.deleted_at = datetime.now()
session.commit()

# Query non-deleted
active_posts = session.query(Post).filter(Post.deleted_at == None).all()

Timestamps (created_at, updated_at)

Always track when records are created/modified:

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

-- Trigger to auto-update updated_at (PostgreSQL)
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ language 'plpgsql';

CREATE TRIGGER update_posts_updated_at BEFORE UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();