sql-optimization
SQL query optimization patterns including EXPLAIN plan analysis, index strategies, query rewriting, and N+1 query prevention. Use when optimizing slow database queries, analyzing query performance, designing indexes, or debugging database bottlenecks. Works with PostgreSQL, MySQL, SQLite, and other SQL databases. Typical improvements: 10x-1000x query speedup.
Install
Run this command in your terminal to install the skill:
git clone https://github.com/applied-artificial-intelligence/claude-code-toolkit /tmp/claude-code-toolkit && cp -r /tmp/claude-code-toolkit/skills/general-dev/sql-optimization ~/.claude/skills/claude-code-toolkit/
SQL Query Optimization Patterns
Comprehensive guide to optimizing SQL queries for performance, including EXPLAIN plan analysis, index design strategies, query rewriting patterns, and N+1 query detection. Works across PostgreSQL, MySQL, and SQLite.
Quick Reference
When to use this skill:
- Slow database queries (>100ms for simple queries, >1s for complex)
- High database CPU usage
- Analyzing EXPLAIN plans
- Designing database indexes
- Debugging N+1 query problems
- Optimizing JOIN operations
- Reducing table scans
Common triggers:
- "This query is too slow"
- "How do I optimize this SQL"
- "What indexes should I add"
- "Explain this EXPLAIN plan"
- "Fix N+1 queries"
- "Database CPU at 100%"
Typical improvements:
- 3 seconds → 50ms (60x faster)
- Full table scan → Index scan
- 1000 queries → 2 queries (N+1 elimination)
Part 1: Understanding EXPLAIN Plans
Reading PostgreSQL EXPLAIN
EXPLAIN ANALYZE
SELECT u.name, p.title
FROM users u
JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
ORDER BY p.created_at DESC
LIMIT 10;
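Note that EXPLAIN ANALYZE actually executes the statement. For anything that writes, a common precaution (a sketch added here, not part of the original skill; table and values are illustrative) is to wrap it in a transaction and roll back:
-- EXPLAIN ANALYZE runs the statement for real, so discard any writes
BEGIN;
EXPLAIN ANALYZE UPDATE users SET status = 'active' WHERE id = 123;
ROLLBACK; -- the plan output is kept, the update is not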
Key Metrics to Watch:
- Seq Scan (usually bad on large tables): full table scan, reads every row
- Index Scan (good): Uses index, reads only needed rows
- Cost: Estimated computational cost (lower is better)
- Actual time: Real execution time in milliseconds
- Rows: Number of rows processed at each step
Bad EXPLAIN Example
Seq Scan on users u (cost=0.00..1234.00 rows=1000 width=50)
(actual time=0.123..45.678 rows=950 loops=1)
Filter: (created_at > '2024-01-01'::date)
Rows Removed by Filter: 50000
Planning Time: 0.234 ms
Execution Time: 3456.789 ms
Problems:
- Seq Scan: Reading entire table (50,950 rows)
- Rows Removed by Filter: Filtering after reading (wasteful)
- Execution Time: 3.5 seconds (way too slow)
Good EXPLAIN Example (After Index)
Index Scan using users_created_at_idx on users u
(cost=0.29..123.45 rows=950 width=50)
(actual time=0.012..3.456 rows=950 loops=1)
Index Cond: (created_at > '2024-01-01'::date)
Planning Time: 0.123 ms
Execution Time: 4.567 ms
Improvements:
- Index Scan: Using index (only reads needed rows)
- Index Cond: Filtering during index scan (efficient)
- Execution Time: 4.5ms (750x faster!)
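For reference, the index behind the improved plan (the name matches users_created_at_idx in the output above; a one-line sketch, since the original doesn't show it):
CREATE INDEX users_created_at_idx ON users(created_at);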
Part 2: Index Design Strategies
When to Add an Index
✅ Add index when:
- Column frequently in WHERE clauses
- Column frequently in JOIN conditions
- Column frequently in ORDER BY
- Query does Seq Scan on large table (>10K rows)
- Query execution time >100ms
❌ Don't add index when:
- Table has <1000 rows (Seq Scan is fast enough)
- Column rarely queried
- Column has very low cardinality (e.g., boolean with 50/50 distribution)
- Table has heavy writes (indexes slow down INSERTs/UPDATEs)
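Before adding an index, check what already exists. A quick sketch using PostgreSQL's pg_indexes catalog view (the table name is illustrative):
-- List existing indexes and their definitions for a table
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';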
Single-Column Indexes
-- Create index on frequently queried column
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_posts_user_id ON posts(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at);
Query that benefits:
-- Before index: Seq Scan (3000ms)
-- After index: Index Scan (5ms)
SELECT * FROM users WHERE email = 'user@example.com';
Composite Indexes (Multi-Column)
Column order matters! Put equality-filter columns first (typically the most selective), with range and ORDER BY columns last.
-- Good: Specific to general
CREATE INDEX idx_orders_user_status ON orders(user_id, status, created_at);
-- Query that uses all columns
SELECT * FROM orders
WHERE user_id = 123
AND status = 'pending'
ORDER BY created_at DESC;
Index usage rules:
-- ✅ Uses index (matches from left)
WHERE user_id = 123
WHERE user_id = 123 AND status = 'pending'
WHERE user_id = 123 AND status = 'pending' AND created_at > '2024-01-01'
-- ❌ Doesn't use index (skips first column)
WHERE status = 'pending'
WHERE created_at > '2024-01-01'
Partial Indexes (Filtered)
Index only subset of rows (smaller, faster):
-- Only index active users
CREATE INDEX idx_users_active_email
ON users(email)
WHERE status = 'active';
-- Query that uses partial index
SELECT * FROM users
WHERE email = 'user@example.com'
AND status = 'active'; -- Must include filter condition!
Benefits:
- Smaller index (faster scans, less storage)
- Only indexes rows you actually query
- Great for status fields with skewed distribution
Covering Indexes (INCLUDE)
Include columns in index to avoid table access:
-- Postgres: INCLUDE clause
CREATE INDEX idx_users_email_covering
ON users(email)
INCLUDE (name, created_at);
-- Query doesn't need to access table (all data in index)
SELECT name, created_at FROM users WHERE email = 'user@example.com';
Benefit: Index-Only Scan (even faster than Index Scan)
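MySQL (as of 8.0) has no INCLUDE clause; a common workaround, sketched here, is to widen the key itself so the same query stays index-only:
-- MySQL: emulate a covering index by appending the extra columns to the key
CREATE INDEX idx_users_email_covering ON users(email, name, created_at);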
Part 3: Query Rewriting Patterns
Problem 1: SELECT * (Fetching Unnecessary Data)
❌ Bad (fetches all columns):
SELECT * FROM users WHERE id = 123;
✅ Good (fetches only needed columns):
SELECT id, name, email FROM users WHERE id = 123;
Impact:
- Smaller result set
- Less memory usage
- Faster network transfer
- Can use covering indexes
Problem 2: N+1 Queries
❌ Bad (1 query for posts + N queries for users):
# 1 query
posts = db.execute("SELECT * FROM posts LIMIT 10")
# N queries (10 separate queries!)
for post in posts:
    user = db.execute("SELECT * FROM users WHERE id = ?", (post.user_id,))
    print(f"{user.name}: {post.title}")
✅ Good (2 queries total):
# 1 query for posts
posts = db.execute("SELECT * FROM posts LIMIT 10")
# 1 query for all users (build one placeholder per id -- a single "?" won't expand a list)
user_ids = [p.user_id for p in posts]
placeholders = ", ".join("?" * len(user_ids))
users = db.execute(f"SELECT * FROM users WHERE id IN ({placeholders})", user_ids)
users_by_id = {u.id: u for u in users}
# No additional queries
for post in posts:
    user = users_by_id[post.user_id]
    print(f"{user.name}: {post.title}")
Even Better (1 query with JOIN):
SELECT u.name, p.title
FROM posts p
JOIN users u ON p.user_id = u.id
LIMIT 10;
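One caveat with the JOIN rewrite: an inner join silently drops posts whose user row is missing, which the two-query version above would surface as a KeyError. If orphaned posts should still appear, use a LEFT JOIN:
-- LEFT JOIN keeps posts even when no matching user row exists (name comes back NULL)
SELECT u.name, p.title
FROM posts p
LEFT JOIN users u ON p.user_id = u.id
LIMIT 10;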
Problem 3: Implicit Type Conversion
❌ Bad (type mismatch defeats the index):
-- phone_number is VARCHAR, but 5551234 is a number: MySQL casts the column on
-- every row (Seq Scan!), and PostgreSQL rejects the comparison outright
SELECT * FROM users WHERE phone_number = 5551234;
✅ Good (literal matches the column type, index used):
SELECT * FROM users WHERE phone_number = '5551234'; -- Index Scan
(The reverse case, an integer column compared to '123', is usually coerced once by the planner and still uses the index, but matching types is good hygiene regardless.)
Problem 4: Function Calls in WHERE Clause
❌ Bad (can't use index):
-- Function prevents index usage
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
✅ Good (use functional index):
-- Create index on function result
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Now query can use index
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
✅ Even Better (normalize to lowercase at write time, regular index):
-- Emails are stored lowercased (enforce in the app or with a CHECK constraint)
CREATE INDEX idx_users_email ON users(email);
-- Query with a lowercased value (LOWER on a constant costs nothing)
SELECT * FROM users WHERE email = LOWER('user@example.com');
Problem 5: OR Conditions
❌ Bad (often does Seq Scan):
SELECT * FROM users
WHERE email = 'user@example.com'
OR username = 'johndoe';
✅ Good (UNION lets each branch use its own index; it also de-duplicates rows that match both conditions):
SELECT * FROM users WHERE email = 'user@example.com'
UNION
SELECT * FROM users WHERE username = 'johndoe';
Problem 6: NOT IN with Subquery
❌ Bad (slow with large subqueries, and returns no rows at all if the subquery yields a single NULL):
SELECT * FROM users
WHERE id NOT IN (SELECT user_id FROM blocked_users);
✅ Good (use LEFT JOIN with NULL check):
SELECT u.*
FROM users u
LEFT JOIN blocked_users b ON u.id = b.user_id
WHERE b.user_id IS NULL;
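An equally good rewrite is NOT EXISTS, which shares the LEFT JOIN's NULL-safe semantics and usually plans as the same anti-join:
SELECT *
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM blocked_users b WHERE b.user_id = u.id);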
Part 4: JOIN Optimization
JOIN Order Matters
Modern planners reorder joins themselves, so the written order rarely changes the plan. What helps is a selective filter on the small side and an index on the large table's join key:
Small, filtered table driving the large one:
-- Good: Join small table (categories: 10 rows) to large table (products: 10M rows)
SELECT p.*
FROM categories c
JOIN products p ON c.id = p.category_id
WHERE c.name = 'Electronics';
Avoid Cartesian Products
❌ Bad (produces rows1 × rows2 × rows3):
SELECT *
FROM users, posts, comments; -- 100 × 10000 × 50000 = 50 billion rows!
✅ Good (proper JOIN conditions):
SELECT *
FROM users u
JOIN posts p ON u.id = p.user_id
JOIN comments c ON p.id = c.post_id;
EXISTS vs IN for Subqueries
Scenario: Find users who have at least one post
❌ Often slower (older planners, notably old MySQL versions, materialize all post IDs first):
SELECT * FROM users
WHERE id IN (SELECT user_id FROM posts);
✅ Safe default (EXISTS stops at the first match; modern planners often produce the same semi-join plan for both):
SELECT * FROM users u
WHERE EXISTS (SELECT 1 FROM posts p WHERE p.user_id = u.id);
JOIN vs Subquery
Use JOIN when you need data from both tables:
SELECT u.name, p.title
FROM users u
JOIN posts p ON u.id = p.user_id;
Use Subquery when you only need filtering:
SELECT * FROM users
WHERE id IN (SELECT DISTINCT user_id FROM posts);
Part 5: Aggregation Optimization
COUNT(*) Optimization
Problem: COUNT(*) on large tables is slow
❌ Slow (counts all rows):
SELECT COUNT(*) FROM orders; -- Full table scan on 10M rows
✅ Faster (approximate count from statistics):
-- PostgreSQL: Use statistics (fast but approximate)
SELECT reltuples::bigint AS estimate
FROM pg_class
WHERE relname = 'orders';
✅ Also Good (exact count with an index-only scan):
-- No extra index needed: the primary-key index already covers COUNT(*);
-- keep the visibility map fresh with VACUUM so the heap can be skipped (PostgreSQL)
SELECT COUNT(*) FROM orders; -- Index-Only Scan
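MySQL has a similar metadata shortcut (an addition to the original; for InnoDB the estimate can be off by a large margin):
-- MySQL: approximate row count from table metadata
SELECT table_rows
FROM information_schema.tables
WHERE table_schema = DATABASE() AND table_name = 'orders';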
GROUP BY Optimization
❌ Slow without an index (scans and hash-aggregates every row):
SELECT user_id, COUNT(*)
FROM posts
GROUP BY user_id;
✅ Faster with an index (pre-sorted input lets grouping happen during the scan):
-- Create index on grouped column
CREATE INDEX idx_posts_user_id ON posts(user_id);
SELECT user_id, COUNT(*)
FROM posts
GROUP BY user_id; -- Can use index for grouping
HAVING vs WHERE
❌ Bad (filters after grouping):
SELECT user_id, COUNT(*) as post_count
FROM posts
GROUP BY user_id
HAVING user_id > 1000; -- Filters after grouping all users
✅ Good (filters before grouping):
SELECT user_id, COUNT(*) as post_count
FROM posts
WHERE user_id > 1000 -- Filters before grouping (can use index)
GROUP BY user_id;
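HAVING still has a job WHERE cannot do: filtering on the aggregate itself. Combining both, as a sketch:
-- WHERE filters rows before grouping; HAVING filters groups after aggregation
SELECT user_id, COUNT(*) AS post_count
FROM posts
WHERE user_id > 1000
GROUP BY user_id
HAVING COUNT(*) >= 10; -- only users with at least 10 posts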
Part 6: Pagination Optimization
OFFSET is Slow for Large Offsets
❌ Bad (OFFSET scans all skipped rows):
-- Page 1,001 at 50 per page (skips 50,000 rows, then returns 50)
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 50 OFFSET 50000; -- Scans 50,050 rows!
✅ Good (keyset pagination with WHERE):
-- Save last seen ID from previous page
SELECT * FROM posts
WHERE created_at < '2024-01-15 10:30:00' -- Last seen timestamp
ORDER BY created_at DESC
LIMIT 50; -- Only fetches 50 rows
Implementation:
# Page 1
results = db.execute("""
    SELECT id, created_at, title
    FROM posts
    ORDER BY created_at DESC, id DESC
    LIMIT 50
""")
last_created_at = results[-1].created_at
last_id = results[-1].id
# Page 2 (parameters passed as a tuple; the row-value comparison below works
# in PostgreSQL, MySQL, and SQLite 3.15+)
results = db.execute("""
    SELECT id, created_at, title
    FROM posts
    WHERE (created_at, id) < (%s, %s)
    ORDER BY created_at DESC, id DESC
    LIMIT 50
""", (last_created_at, last_id))
Benefits:
- Consistent performance for any page number
- No OFFSET overhead
- Works with millions of rows
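Keyset pagination is only fast when an index matches the sort. The supporting index (implied above but never shown) would be:
-- One index range scan per page, matching ORDER BY created_at DESC, id DESC
CREATE INDEX idx_posts_created_at_id ON posts(created_at DESC, id DESC);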
Part 7: Database-Specific Patterns
PostgreSQL-Specific
1. EXPLAIN ANALYZE (shows actual execution):
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users WHERE email = 'user@example.com';
2. pg_stat_statements (find slow queries):
-- Enable extension
CREATE EXTENSION pg_stat_statements;
-- Find slowest queries
SELECT
calls,
mean_exec_time,
total_exec_time,
query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
3. VACUUM and ANALYZE:
-- Update statistics after bulk changes
VACUUM ANALYZE table_name;
MySQL-Specific
1. EXPLAIN FORMAT=JSON (detailed output):
EXPLAIN FORMAT=JSON
SELECT * FROM users WHERE email = 'user@example.com';
2. SHOW PROFILE (detailed timing; deprecated since MySQL 5.6, prefer the Performance Schema):
SET profiling = 1;
SELECT * FROM users WHERE email = 'user@example.com';
SHOW PROFILE FOR QUERY 1;
3. Query Cache (removed in MySQL 8.0):
-- Old MySQL versions only
SHOW VARIABLES LIKE 'query_cache%';
SQLite-Specific
1. EXPLAIN QUERY PLAN:
EXPLAIN QUERY PLAN
SELECT * FROM users WHERE email = 'user@example.com';
2. ANALYZE (update statistics):
ANALYZE table_name;
3. Indexes are critical (SQLite's planner is simpler than a server database's):
-- SQLite leans heavily on whatever indexes you give it
CREATE INDEX idx_users_email ON users(email);
Part 8: Monitoring and Diagnosis
Find Missing Indexes (PostgreSQL)
-- Queries doing sequential scans on large tables
SELECT
schemaname,
relname,
seq_scan,
seq_tup_read,
idx_scan,
seq_tup_read / seq_scan AS avg_seq_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 25;
Find Unused Indexes (PostgreSQL)
-- Indexes that are never used
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey' -- Exclude primary keys
ORDER BY pg_relation_size(indexrelid) DESC;
Check Index Bloat
-- Postgres: Check index sizes (a rough proxy; exact bloat needs pgstattuple)
SELECT
schemaname,
relname,
indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- Rebuild bloated indexes
REINDEX INDEX index_name;
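On PostgreSQL 12+, prefer the non-blocking variant on busy tables:
-- Rebuild without taking a write lock (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY index_name;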
Part 9: Common Patterns by Use Case
Pattern 1: Recent Records with Pagination
-- Create composite index (equality column first, then the sort column)
CREATE INDEX idx_posts_user_created ON posts(user_id, created_at DESC);
-- Query uses index efficiently
SELECT * FROM posts
WHERE user_id = 123
ORDER BY created_at DESC
LIMIT 20;
Pattern 2: Search Across Multiple Columns
-- Create GIN index for full-text search (Postgres)
CREATE INDEX idx_posts_search ON posts
USING gin(to_tsvector('english', title || ' ' || content));
-- Fast full-text search
SELECT * FROM posts
WHERE to_tsvector('english', title || ' ' || content) @@
to_tsquery('english', 'database & optimization');
Pattern 3: Range Queries
-- Create index on range column
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- Query uses index
SELECT * FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'
ORDER BY created_at;
Pattern 4: Distinct Values with Filter
-- Create composite index (filter column first)
CREATE INDEX idx_posts_user_category ON posts(user_id, category_id);
-- Query uses index
SELECT DISTINCT category_id
FROM posts
WHERE user_id = 123;
Part 10: Quick Optimization Checklist
Diagnosis:
- Run EXPLAIN ANALYZE on slow queries
- Check for Seq Scan on tables >10K rows
- Look for missing indexes
- Identify N+1 query patterns
- Check query execution time (aim for <100ms)
Index Strategy:
- Index columns in WHERE clauses
- Index columns in JOIN conditions
- Index columns in ORDER BY
- Use composite indexes for multi-column queries
- Consider partial indexes for filtered queries
- Remove unused indexes
Query Rewriting:
- Select only needed columns (not SELECT *)
- Use JOIN instead of N+1 queries
- Use proper types (avoid implicit conversion)
- Avoid functions in WHERE (use functional indexes)
- Use UNION instead of OR for multiple conditions
- Use EXISTS instead of IN for large subqueries
Pagination:
- Use keyset pagination (not OFFSET) for large datasets
- Include stable sort column (id) in pagination
- Index pagination columns
Monitoring:
- Enable query logging for slow queries (see the sketch after this list)
- Monitor index usage
- Check for index bloat
- Regularly VACUUM ANALYZE (Postgres)
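A minimal sketch of slow-query logging on PostgreSQL (the 100ms threshold is an assumption; tune it to your latency budget):
-- Log every statement slower than 100 ms (requires superuser)
ALTER SYSTEM SET log_min_duration_statement = '100ms';
SELECT pg_reload_conf(); -- apply without restarting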
Resources
PostgreSQL:
- EXPLAIN docs: https://www.postgresql.org/docs/current/sql-explain.html
- Index types: https://www.postgresql.org/docs/current/indexes-types.html
- Performance tips: https://wiki.postgresql.org/wiki/Performance_Optimization
MySQL:
- EXPLAIN output: https://dev.mysql.com/doc/refman/8.0/en/explain-output.html
- Optimization: https://dev.mysql.com/doc/refman/8.0/en/optimization.html
Tools:
- pgAdmin (Postgres GUI)
- MySQL Workbench
- DBeaver (multi-database)
- explain.depesz.com (Postgres EXPLAIN visualizer)