python-performance-optimization
Python performance optimization patterns using profiling, algorithmic improvements, and acceleration techniques. Use when optimizing slow Python code, reducing memory usage, or improving application throughput and latency.
Install:

```bash
git clone https://github.com/NickCrew/claude-cortex /tmp/claude-cortex && cp -r /tmp/claude-cortex/skills/python-performance-optimization ~/.claude/skills/claude-cortex/
```

Tip: Run this command in your terminal to install the skill.
# Python Performance Optimization
Expert guidance for profiling, optimizing, and accelerating Python applications through systematic analysis, algorithmic improvements, efficient data structures, and acceleration techniques.
## When to Use This Skill
- Code runs too slowly for production requirements
- High CPU usage or memory consumption issues
- Need to reduce API response times or batch processing duration
- Application fails to scale under load
- Optimizing data processing pipelines or scientific computing
- Reducing cloud infrastructure costs through efficiency gains
- Profile-guided optimization after measuring performance bottlenecks
## Core Concepts

The Golden Rule: Never optimize without profiling first. Roughly 80% of execution time is spent in 20% of the code.

Optimization Hierarchy (in priority order):

1. Algorithm complexity - O(n²) → O(n log n) yields gains that grow with input size
2. Data structure choice - List → Set for lookups (can be 10,000x faster on large collections)
3. Language features - Comprehensions, built-ins, generators
4. Caching - Memoization for repeated calculations
5. Compiled extensions - NumPy, Numba, Cython for hot paths
6. Parallelism - Multiprocessing for CPU-bound work

Key Principle: Algorithmic improvements beat micro-optimizations every time, as the sketch below illustrates.
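A minimal sketch of that principle, with hypothetical `has_duplicates` helpers: no amount of tuning the quadratic loop matches switching to the linear algorithm.

```python
# O(n^2): compares every pair; micro-optimizing this loop still leaves it quadratic
def has_duplicates_quadratic(items):
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

# O(n): a single pass with a set; the algorithmic change dwarfs any micro-tweak
def has_duplicates_linear(items):
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```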
## Quick Reference
Load detailed guides for specific optimization areas:
| Task | Load reference |
|---|---|
| Profile code and find bottlenecks | skills/python-performance-optimization/references/profiling.md |
| Algorithm and data structure optimization | skills/python-performance-optimization/references/algorithms.md |
| Memory optimization and generators | skills/python-performance-optimization/references/memory.md |
| String concatenation and file I/O | skills/python-performance-optimization/references/string-io.md |
| NumPy, Numba, Cython, multiprocessing | skills/python-performance-optimization/references/acceleration.md |
## Optimization Workflow

### Phase 1: Measure
- Profile with cProfile - Identify slow functions (see the sketch after this list)
- Line profile hot paths - Find exact slow lines
- Memory profile - Check for memory bottlenecks
- Benchmark baseline - Record current performance
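A minimal cProfile sketch for the first step, assuming a hypothetical `main()` entry point:

```python
import cProfile
import pstats

def main():
    # Stand-in workload; replace with your application's entry point
    return sum(i * i for i in range(1_000_000))

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Print the ten functions with the highest cumulative time
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```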
### Phase 2: Analyze
- Check algorithm complexity - Is it O(n²) or worse?
- Evaluate data structures - Are you using lists for lookups?
- Identify repeated work - Can results be cached?
- Find I/O bottlenecks - Database queries, file operations
### Phase 3: Optimize

- Improve algorithms first - Biggest impact
- Use appropriate data structures - Set/dict for O(1) lookups
- Apply caching - `@lru_cache` for expensive functions
- Use generators - For large datasets
- Leverage NumPy/Numba - For numerical code
- Parallelize - Multiprocessing for CPU-bound tasks (see the sketch after this list)
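A minimal multiprocessing sketch for the last step, using a hypothetical `cpu_heavy` function as the CPU-bound workload:

```python
from multiprocessing import Pool

def cpu_heavy(n):
    # Stand-in for CPU-bound work; replace with your real computation
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [2_000_000] * 8
    # Pool() starts one worker per CPU core by default,
    # sidestepping the GIL for CPU-bound functions
    with Pool() as pool:
        results = pool.map(cpu_heavy, inputs)
    print(sum(results))
```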
### Phase 4: Validate

- Re-profile - Verify improvements
- Benchmark - Measure the speedup quantitatively (see the timeit sketch after this list)
- Test correctness - Ensure optimizations didn't break functionality
- Document - Explain why the optimization was needed
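A minimal timeit sketch for the benchmarking step, comparing hypothetical `old_impl` and `new_impl` versions of the same function:

```python
import timeit

def old_impl(data):
    # Pre-optimization version: membership tests against a list (O(n) each)
    allowed = list(range(1000))
    return [x for x in data if x in allowed]

def new_impl(data):
    # Optimized version: membership tests against a set (O(1) average)
    allowed = set(range(1000))
    return [x for x in data if x in allowed]

data = list(range(5000))
t_old = timeit.timeit(lambda: old_impl(data), number=100)
t_new = timeit.timeit(lambda: new_impl(data), number=100)
print(f"old: {t_old:.3f}s  new: {t_new:.3f}s  speedup: {t_old / t_new:.1f}x")
```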
## Common Optimization Patterns

### Pattern 1: Replace List with Set for Lookups

```python
# Slow: O(n) lookup - scans the list element by element
if item in large_list:
    ...

# Fast: O(1) average-case lookup - hash-based membership test
if item in large_set:
    ...
```
### Pattern 2: Use Comprehensions

```python
n = 100_000

# Slower: repeated attribute lookups and append calls inside the loop
result = []
for i in range(n):
    result.append(i * 2)

# Faster (roughly 35% speedup): the loop runs as optimized bytecode
result = [i * 2 for i in range(n)]
```
### Pattern 3: Cache Expensive Calculations

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_function(n):
    # Result is cached automatically; repeat calls with the same n return instantly
    return complex_calculation(n)
```
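As a follow-up, `functools.lru_cache` exposes hit/miss counters that help verify the cache is actually being hit (continuing the snippet above):

```python
expensive_function(10)
expensive_function(10)  # second call is served from the cache
print(expensive_function.cache_info())
# e.g. CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
```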
### Pattern 4: Use Generators for Large Data

```python
# Memory inefficient: loads the entire file into a list at once
def read_file(path):
    with open(path) as f:
        return [line.strip() for line in f]

# Memory efficient: streams one line at a time
def read_file(path):
    with open(path) as f:
        for line in f:
            yield line.strip()
```
### Pattern 5: Vectorize with NumPy

```python
import numpy as np

# Pure Python: ~500ms - an interpreted loop over a generator
result = sum(i**2 for i in range(1_000_000))

# NumPy: ~5ms (about 100x faster) - the loop runs in compiled C
# (dtype pinned to int64 so the squares don't overflow on platforms
# where the default integer dtype is 32-bit)
result = np.sum(np.arange(1_000_000, dtype=np.int64) ** 2)
```
## Common Mistakes to Avoid

- Optimizing before profiling - You'll optimize the wrong code
- Using lists for membership tests - Use sets/dicts instead
- String concatenation in loops - Use `"".join()` or `io.StringIO` (see the sketch after this list)
- Loading entire files into memory - Use generators
- N+1 database queries - Use JOINs or batch queries
- Ignoring built-in functions - They're C-optimized and fast
- Premature optimization - Focus on algorithmic improvements first
- Not benchmarking - Always measure improvements quantitatively
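A minimal sketch of the string-concatenation fix, using a hypothetical `parts` list:

```python
parts = ["alpha", "beta", "gamma"] * 1000

# Quadratic in the worst case: each += can copy the whole accumulated string
out = ""
for part in parts:
    out += part

# Linear: join sizes the result once and copies each part once
out = "".join(parts)
```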
## Decision Tree

Start here: Profile with cProfile to find bottlenecks.

Hot path is algorithm?
- Yes → Check complexity, improve the algorithm, use better data structures
- No → Continue

Hot path is computation?
- Numerical loops → NumPy or Numba
- CPU-bound → Multiprocessing
- Already fast enough → Done

Hot path is memory?
- Large data → Generators, streaming
- Many objects → `__slots__`, object pooling (see the sketch below)
- Caching needed → `@lru_cache` or a custom cache
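A minimal `__slots__` sketch for the many-objects branch, with hypothetical `PointDict`/`PointSlots` classes:

```python
import sys

class PointDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    __slots__ = ("x", "y")  # fixed attribute slots; no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

p, q = PointDict(1, 2), PointSlots(1, 2)
print(sys.getsizeof(p.__dict__))  # per-instance dict overhead the slotted class avoids
print(hasattr(q, "__dict__"))     # False - attributes live in fixed slots
```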
Hot path is I/O?
- Database → Batch queries, indexes, connection pooling
- Files → Buffering, streaming
- Network → Async I/O, request batching
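And a minimal stdlib-only sketch of the network branch's async batching, with `asyncio.sleep` standing in for a real HTTP request:

```python
import asyncio

async def fetch(url):
    # Stand-in for a real network call; swap in an async HTTP client request
    await asyncio.sleep(0.1)
    return f"response from {url}"

async def main():
    urls = [f"https://example.com/item/{i}" for i in range(10)]
    # Issue all requests concurrently: total wall time is ~0.1s
    # instead of ~1.0s when awaiting them one at a time
    results = await asyncio.gather(*(fetch(u) for u in urls))
    print(len(results), "responses")

asyncio.run(main())
```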
## Best Practices

- Profile before optimizing - Measure to find real bottlenecks
- Optimize algorithms first - O(n²) → O(n) beats micro-optimizations
- Use appropriate data structures - Set/dict for lookups, not lists
- Leverage built-ins - C-implemented built-ins are faster than pure Python
- Avoid premature optimization - Optimize hot paths identified by profiling
- Use generators for large data - Reduce memory usage with lazy evaluation
- Batch operations - Minimize overhead from syscalls and network requests
- Cache expensive computations - Use `@lru_cache` or custom caching
- Consider NumPy/Numba - Vectorization and JIT for numerical code
- Parallelize CPU-bound work - Use multiprocessing to utilize all cores
## Resources
- Python Performance: https://wiki.python.org/moin/PythonSpeed
- cProfile: https://docs.python.org/3/library/profile.html
- NumPy: https://numpy.org/doc/stable/user/absolute_beginners.html
- Numba: https://numba.pydata.org/
- Cython: https://cython.readthedocs.io/
- High Performance Python (Book by Gorelick & Ozsvald)
## Repository

https://github.com/NickCrew/claude-cortex