sql-optimization

SQL query optimization patterns including EXPLAIN plan analysis, index strategies, query rewriting, and N+1 query prevention. Use when optimizing slow database queries, analyzing query performance, designing indexes, or debugging database bottlenecks. Works with PostgreSQL, MySQL, SQLite, and other SQL databases. Typical improvements: 10x-1000x query speedup.

Install

git clone https://github.com/applied-artificial-intelligence/claude-code-toolkit /tmp/claude-code-toolkit && cp -r /tmp/claude-code-toolkit/skills/general-dev/sql-optimization ~/.claude/skills/claude-code-toolkit

// tip: Run this command in your terminal to install the skill


name: sql-optimization
description: SQL query optimization patterns including EXPLAIN plan analysis, index strategies, query rewriting, and N+1 query prevention. Use when optimizing slow database queries, analyzing query performance, designing indexes, or debugging database bottlenecks. Works with PostgreSQL, MySQL, SQLite, and other SQL databases. Typical improvements: 10x-1000x query speedup.

SQL Query Optimization Patterns

Comprehensive guide to optimizing SQL queries for performance, including EXPLAIN plan analysis, index design strategies, query rewriting patterns, and N+1 query detection. Works across PostgreSQL, MySQL, and SQLite.


Quick Reference

When to use this skill:

  • Slow database queries (>100ms for simple queries, >1s for complex)
  • High database CPU usage
  • Analyzing EXPLAIN plans
  • Designing database indexes
  • Debugging N+1 query problems
  • Optimizing JOIN operations
  • Reducing table scans

Common triggers:

  • "This query is too slow"
  • "How do I optimize this SQL"
  • "What indexes should I add"
  • "Explain this EXPLAIN plan"
  • "Fix N+1 queries"
  • "Database CPU at 100%"

Typical improvements:

  • 3 seconds → 50ms (60x faster)
  • Full table scan → Index scan
  • 1000 queries → 2 queries (N+1 elimination)

Part 1: Understanding EXPLAIN Plans

Reading PostgreSQL EXPLAIN

EXPLAIN ANALYZE
SELECT u.name, p.title
FROM users u
JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
ORDER BY p.created_at DESC
LIMIT 10;

Key Metrics to Watch:

  • Seq Scan (bad): Full table scan, reads every row
  • Index Scan (good): Uses index, reads only needed rows
  • Cost: Estimated computational cost (lower is better)
  • Actual time: Real execution time in milliseconds
  • Rows: Number of rows processed at each step

Bad EXPLAIN Example

Seq Scan on users u  (cost=0.00..1234.00 rows=1000 width=50)
                     (actual time=0.123..45.678 rows=950 loops=1)
  Filter: (created_at > '2024-01-01'::date)
  Rows Removed by Filter: 50000
Planning Time: 0.234 ms
Execution Time: 3456.789 ms

Problems:

  • Seq Scan: Reading entire table (50,950 rows)
  • Rows Removed by Filter: Filtering after reading (wasteful)
  • Execution Time: 3.5 seconds (way too slow)

Good EXPLAIN Example (After Index)

Index Scan using users_created_at_idx on users u
  (cost=0.29..123.45 rows=950 width=50)
  (actual time=0.012..3.456 rows=950 loops=1)
  Index Cond: (created_at > '2024-01-01'::date)
Planning Time: 0.123 ms
Execution Time: 4.567 ms

Improvements:

  • Index Scan: Using index (only reads needed rows)
  • Index Cond: Filtering during index scan (efficient)
  • Execution Time: 4.5ms (750x faster!)

Part 2: Index Design Strategies

When to Add an Index

✅ Add index when:

  • Column frequently in WHERE clauses
  • Column frequently in JOIN conditions
  • Column frequently in ORDER BY
  • Query does Seq Scan on large table (>10K rows)
  • Query execution time >100ms

❌ Don't add index when:

  • Table has <1000 rows (Seq Scan is fast enough)
  • Column rarely queried
  • Column has very low cardinality (e.g., boolean with 50/50 distribution)
  • Table has heavy writes (indexes slow down INSERTs/UPDATEs)

Single-Column Indexes

-- Create index on frequently queried column
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_posts_user_id ON posts(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at);

Query that benefits:

-- Before index: Seq Scan (3000ms)
-- After index: Index Scan (5ms)
SELECT * FROM users WHERE email = 'user@example.com';

Composite Indexes (Multi-Column)

Order matters! Put columns tested with equality first (most selective first), and range or sort columns last.

-- Good: Specific to general
CREATE INDEX idx_orders_user_status ON orders(user_id, status, created_at);

-- Query that uses all columns
SELECT * FROM orders
WHERE user_id = 123
  AND status = 'pending'
ORDER BY created_at DESC;

Index usage rules:

-- ✅ Uses index (matches from left)
WHERE user_id = 123
WHERE user_id = 123 AND status = 'pending'
WHERE user_id = 123 AND status = 'pending' AND created_at > '2024-01-01'

-- ❌ Doesn't use index (skips first column)
WHERE status = 'pending'
WHERE created_at > '2024-01-01'

Partial Indexes (Filtered)

Index only subset of rows (smaller, faster):

-- Only index active users
CREATE INDEX idx_users_active_email
ON users(email)
WHERE status = 'active';

-- Query that uses partial index
SELECT * FROM users
WHERE email = 'user@example.com'
  AND status = 'active';  -- Must include filter condition!

Benefits:

  • Smaller index (faster scans, less storage)
  • Only indexes rows you actually query
  • Great for status fields with skewed distribution
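To confirm the size difference, compare the two indexes directly; a quick check, assuming both the full idx_users_email and the partial idx_users_active_email from above exist (Postgres):

-- Compare full vs partial index size
SELECT relname AS index_name,
       pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('idx_users_email', 'idx_users_active_email');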

Covering Indexes (INCLUDE)

Include columns in index to avoid table access:

-- Postgres: INCLUDE clause
CREATE INDEX idx_users_email_covering
ON users(email)
INCLUDE (name, created_at);

-- Query doesn't need to access table (all data in index)
SELECT name, created_at FROM users WHERE email = 'user@example.com';

Benefit: Index-Only Scan (even faster than Index Scan)
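To verify, check the plan; if table statistics and the visibility map are current, you should see Index Only Scan rather than Index Scan:

EXPLAIN ANALYZE
SELECT name, created_at FROM users WHERE email = 'user@example.com';
-- Expect: Index Only Scan using idx_users_email_covering on users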


Part 3: Query Rewriting Patterns

Problem 1: SELECT * (Fetching Unnecessary Data)

❌ Bad (fetches all columns):

SELECT * FROM users WHERE id = 123;

✅ Good (fetches only needed columns):

SELECT id, name, email FROM users WHERE id = 123;

Impact:

  • Smaller result set
  • Less memory usage
  • Faster network transfer
  • Can use covering indexes

Problem 2: N+1 Queries

❌ Bad (1 query for posts + N queries for users):

# 1 query
posts = db.execute("SELECT * FROM posts LIMIT 10")

# N queries (10 separate queries!)
for post in posts:
    user = db.execute("SELECT * FROM users WHERE id = ?", post.user_id)
    print(f"{user.name}: {post.title}")

✅ Good (2 queries total):

# 1 query for posts
posts = db.execute("SELECT * FROM posts LIMIT 10")

# 1 query for all users (expand one placeholder per id)
user_ids = [p.user_id for p in posts]
placeholders = ", ".join("?" * len(user_ids))
users = db.execute(f"SELECT * FROM users WHERE id IN ({placeholders})", user_ids)
users_by_id = {u.id: u for u in users}

# No additional queries
for post in posts:
    user = users_by_id[post.user_id]
    print(f"{user.name}: {post.title}")

Even Better (1 query with JOIN):

SELECT u.name, p.title
FROM posts p
JOIN users u ON p.user_id = u.id
LIMIT 10;

Problem 3: Implicit Type Conversion

❌ Bad (index not used due to implicit conversion):

-- phone is VARCHAR, but 5551234567 is a number; MySQL casts every row's
-- phone to compare it (defeating the index), and Postgres rejects the comparison
SELECT * FROM users WHERE phone = 5551234567;  -- Seq Scan (MySQL)

✅ Good (matching type, index used):

SELECT * FROM users WHERE phone = '5551234567';  -- Index Scan

Problem 4: Function Calls in WHERE Clause

❌ Bad (can't use index):

-- Function prevents index usage
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

✅ Good (use functional index):

-- Create index on function result
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- Now query can use index
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

✅ Even Better (store lowercase, regular index):

-- Normalize email to lowercase at write time (app code or a trigger),
-- then a plain index suffices
CREATE INDEX idx_users_email ON users(email);

-- Query with lowercase value
SELECT * FROM users WHERE email = LOWER('user@example.com');
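A Postgres-only alternative is the citext extension, which makes the column itself case-insensitive, so neither normalization nor a functional index is needed. A minimal sketch (the type change rewrites the table, so schedule it carefully on large tables):

-- Postgres: case-insensitive text type
CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE users ALTER COLUMN email TYPE citext;

-- A plain index now serves case-insensitive lookups
CREATE INDEX idx_users_email_ci ON users(email);
SELECT * FROM users WHERE email = 'User@Example.com';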

Problem 5: OR Conditions

❌ Bad (often does Seq Scan):

SELECT * FROM users
WHERE email = 'user@example.com'
   OR username = 'johndoe';

✅ Good (use UNION, allows index usage):

SELECT * FROM users WHERE email = 'user@example.com'
UNION
SELECT * FROM users WHERE username = 'johndoe';
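Note that UNION also deduplicates the combined rows, which costs a sort or hash step. If the two conditions can't match the same row, or duplicates are acceptable, UNION ALL skips that step:

SELECT * FROM users WHERE email = 'user@example.com'
UNION ALL
SELECT * FROM users WHERE username = 'johndoe';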

Problem 6: NOT IN with Subquery

❌ Bad (very slow, especially with large subquery):

SELECT * FROM users
WHERE id NOT IN (SELECT user_id FROM blocked_users);

✅ Good (use LEFT JOIN with NULL check):

SELECT u.*
FROM users u
LEFT JOIN blocked_users b ON u.id = b.user_id
WHERE b.user_id IS NULL;
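NOT EXISTS is an equivalent rewrite with a bonus: NOT IN silently returns zero rows if the subquery yields any NULL user_id, while NOT EXISTS behaves as expected:

SELECT * FROM users u
WHERE NOT EXISTS (
    SELECT 1 FROM blocked_users b WHERE b.user_id = u.id
);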

Part 4: JOIN Optimization

JOIN Order Matters

Database optimizes join order, but you can help:

Small table first, large table second:

-- Good: Join small table (categories: 10 rows) to large table (products: 10M rows)
SELECT p.*
FROM categories c
JOIN products p ON c.id = p.category_id
WHERE c.name = 'Electronics';

Avoid Cartesian Products

❌ Bad (produces rows1 × rows2 × rows3):

SELECT *
FROM users, posts, comments;  -- 100 × 10000 × 50000 = 50 billion rows!

✅ Good (proper JOIN conditions):

SELECT *
FROM users u
JOIN posts p ON u.id = p.user_id
JOIN comments c ON p.id = c.post_id;

EXISTS vs IN for Subqueries

Scenario: Find users who have at least one post

❌ Often slower (IN may materialize all post IDs first):

SELECT * FROM users
WHERE id IN (SELECT user_id FROM posts);

✅ Often faster (EXISTS can stop at the first match; many modern planners treat both forms as a semi-join):

SELECT * FROM users u
WHERE EXISTS (SELECT 1 FROM posts p WHERE p.user_id = u.id);

JOIN vs Subquery

Use JOIN when you need data from both tables:

SELECT u.name, p.title
FROM users u
JOIN posts p ON u.id = p.user_id;

Use Subquery when you only need filtering:

SELECT * FROM users
WHERE id IN (SELECT DISTINCT user_id FROM posts);

Part 5: Aggregation Optimization

COUNT(*) Optimization

Problem: COUNT(*) on large tables is slow

❌ Slow (counts all rows):

SELECT COUNT(*) FROM orders;  -- Full table scan on 10M rows

✅ Faster (approximate count from statistics):

-- PostgreSQL: Use statistics (fast but approximate)
SELECT reltuples::bigint AS estimate
FROM pg_class
WHERE relname = 'orders';

✅ Also Good (exact count with index):

-- The primary key index usually covers this already; COUNT(*) can use
-- an index-only scan when the visibility map is fresh (run VACUUM)
SELECT COUNT(*) FROM orders;  -- Index-Only Scan

GROUP BY Optimization

❌ Bad (groups after fetching all data):

SELECT user_id, COUNT(*)
FROM posts
GROUP BY user_id;

✅ Good (index helps grouping):

-- Create index on grouped column
CREATE INDEX idx_posts_user_id ON posts(user_id);

SELECT user_id, COUNT(*)
FROM posts
GROUP BY user_id;  -- Can use index for grouping

HAVING vs WHERE

❌ Bad (filters after grouping):

SELECT user_id, COUNT(*) as post_count
FROM posts
GROUP BY user_id
HAVING user_id > 1000;  -- Filters after grouping all users

✅ Good (filters before grouping):

SELECT user_id, COUNT(*) as post_count
FROM posts
WHERE user_id > 1000  -- Filters before grouping (can use index)
GROUP BY user_id;
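HAVING still has a legitimate job: filtering on the aggregate itself, which WHERE cannot do. The two clauses combine, with WHERE pruning rows first:

SELECT user_id, COUNT(*) AS post_count
FROM posts
WHERE user_id > 1000       -- row filter, runs before grouping
GROUP BY user_id
HAVING COUNT(*) >= 5;      -- aggregate filter, runs after grouping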

Part 6: Pagination Optimization

OFFSET is Slow for Large Offsets

❌ Bad (OFFSET scans all skipped rows):

-- Page 1000 (skips 50,000 rows, then returns 50)
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 50 OFFSET 50000;  -- Scans 50,050 rows!

✅ Good (keyset pagination with WHERE):

-- Save last seen ID from previous page
SELECT * FROM posts
WHERE created_at < '2024-01-15 10:30:00'  -- Last seen timestamp
ORDER BY created_at DESC
LIMIT 50;  -- Only fetches 50 rows

Implementation:

# Page 1
results = db.execute("""
    SELECT id, created_at, title
    FROM posts
    ORDER BY created_at DESC, id DESC
    LIMIT 50
""")
last_created_at = results[-1].created_at
last_id = results[-1].id

# Page 2 (using last seen values)
results = db.execute("""
    SELECT id, created_at, title
    FROM posts
    WHERE (created_at, id) < (%s, %s)
    ORDER BY created_at DESC, id DESC
    LIMIT 50
""", last_created_at, last_id)

Benefits:

  • Consistent performance for any page number
  • No OFFSET overhead
  • Works with millions of rows
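Keyset pagination stays fast only if an index matches the ORDER BY; a supporting index for the queries above:

-- Matches ORDER BY created_at DESC, id DESC
CREATE INDEX idx_posts_created_id ON posts(created_at DESC, id DESC);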

Part 7: Database-Specific Patterns

PostgreSQL-Specific

1. EXPLAIN ANALYZE (shows actual execution):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users WHERE email = 'user@example.com';

2. pg_stat_statements (find slow queries):

-- Enable extension (requires pg_stat_statements in shared_preload_libraries,
-- set in postgresql.conf, followed by a server restart)
CREATE EXTENSION pg_stat_statements;

-- Find slowest queries
SELECT
    calls,
    mean_exec_time,
    total_exec_time,
    query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
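After deploying a fix, reset the counters so the next measurement starts clean:

-- Clear accumulated statistics
SELECT pg_stat_statements_reset();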

3. VACUUM and ANALYZE:

-- Update statistics after bulk changes
VACUUM ANALYZE table_name;

MySQL-Specific

1. EXPLAIN FORMAT=JSON (detailed output):

EXPLAIN FORMAT=JSON
SELECT * FROM users WHERE email = 'user@example.com';

2. SHOW PROFILE (detailed timing; deprecated since MySQL 5.6.7):

SET profiling = 1;
SELECT * FROM users WHERE email = 'user@example.com';
SHOW PROFILE FOR QUERY 1;
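On modern MySQL the Performance Schema is the supported replacement for profiling. A sketch for finding the slowest statement patterns (timer columns are in picoseconds, hence the division):

SELECT
    DIGEST_TEXT,
    COUNT_STAR AS calls,
    AVG_TIMER_WAIT / 1e9 AS avg_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY AVG_TIMER_WAIT DESC
LIMIT 10;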

3. Query Cache (removed in MySQL 8.0):

-- Old MySQL versions only
SHOW VARIABLES LIKE 'query_cache%';

SQLite-Specific

1. EXPLAIN QUERY PLAN:

EXPLAIN QUERY PLAN
SELECT * FROM users WHERE email = 'user@example.com';

2. ANALYZE (update statistics):

ANALYZE table_name;

3. Indexes are critical (SQLite's query planner is simpler than those of client-server databases):

-- SQLite relies heavily on indexes
CREATE INDEX idx_users_email ON users(email);

Part 8: Monitoring and Diagnosis

Find Missing Indexes (PostgreSQL)

-- Queries doing sequential scans on large tables
SELECT
    schemaname,
    relname,
    seq_scan,
    seq_tup_read,
    idx_scan,
    seq_tup_read / seq_scan AS avg_seq_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 25;

Find Unused Indexes (PostgreSQL)

-- Indexes that are never used
SELECT
    schemaname,
    relname,
    indexrelname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'  -- Exclude primary keys
ORDER BY pg_relation_size(indexrelid) DESC;

Check Index Bloat

-- Postgres: Check for bloated indexes
SELECT
    schemaname,
    relname,
    indexrelname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

-- Rebuild bloated indexes
REINDEX INDEX index_name;
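Plain REINDEX takes a lock that blocks writes to the table while it runs. On Postgres 12+ you can rebuild without blocking:

-- Postgres 12+: rebuild while allowing concurrent writes
REINDEX INDEX CONCURRENTLY index_name;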

Part 9: Common Patterns by Use Case

Pattern 1: Recent Records with Pagination

-- Create composite index (equality filter column first, then the sort column)
CREATE INDEX idx_posts_user_created ON posts(user_id, created_at DESC);

-- Query uses index efficiently
SELECT * FROM posts
WHERE user_id = 123
ORDER BY created_at DESC
LIMIT 20;

Pattern 2: Search Across Multiple Columns

-- Create GIN index for full-text search (Postgres)
CREATE INDEX idx_posts_search ON posts
USING gin(to_tsvector('english', title || ' ' || content));

-- Fast full-text search
SELECT * FROM posts
WHERE to_tsvector('english', title || ' ' || content) @@
      to_tsquery('english', 'database & optimization');

Pattern 3: Range Queries

-- Create index on range column
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Query uses index
SELECT * FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'
ORDER BY created_at;

Pattern 4: Distinct Values with Filter

-- Create composite index (filter column first)
CREATE INDEX idx_posts_user_category ON posts(user_id, category_id);

-- Query uses index
SELECT DISTINCT category_id
FROM posts
WHERE user_id = 123;

Part 10: Quick Optimization Checklist

Diagnosis:

  • Run EXPLAIN ANALYZE on slow queries
  • Check for Seq Scan on tables >10K rows
  • Look for missing indexes
  • Identify N+1 query patterns
  • Check query execution time (aim for <100ms)

Index Strategy:

  • Index columns in WHERE clauses
  • Index columns in JOIN conditions
  • Index columns in ORDER BY
  • Use composite indexes for multi-column queries
  • Consider partial indexes for filtered queries
  • Remove unused indexes

Query Rewriting:

  • Select only needed columns (not SELECT *)
  • Use JOIN instead of N+1 queries
  • Use proper types (avoid implicit conversion)
  • Avoid functions in WHERE (use functional indexes)
  • Use UNION instead of OR for multiple conditions
  • Use EXISTS instead of IN for large subqueries

Pagination:

  • Use keyset pagination (not OFFSET) for large datasets
  • Include stable sort column (id) in pagination
  • Index pagination columns

Monitoring:

  • Enable query logging for slow queries (see the sketch after this list)
  • Monitor index usage
  • Check for index bloat
  • Regularly VACUUM ANALYZE (Postgres)
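For the first item, Postgres can log every statement slower than a threshold; a minimal sketch using the 100ms target from this guide:

-- Postgres: log statements slower than 100 ms
ALTER SYSTEM SET log_min_duration_statement = '100ms';
SELECT pg_reload_conf();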

Resources

Tools:

  • pgAdmin (Postgres GUI)
  • MySQL Workbench
  • DBeaver (multi-database)
  • explain.depesz.com (Postgres EXPLAIN visualizer)