vector-databases
Vector database selection, embedding storage, approximate nearest neighbor (ANN) algorithms, and vector search optimization. Use when choosing vector stores, designing semantic search, or optimizing similarity search performance.
allowed_tools: Read, Glob, Grep
Install
Run this command in your terminal to install the skill:
git clone https://github.com/melodic-software/claude-code-plugins /tmp/claude-code-plugins && cp -r /tmp/claude-code-plugins/plugins/systems-design/skills/vector-databases ~/.claude/skills/claude-code-plugins/
SKILL.md
name: vector-databases
description: Vector database selection, embedding storage, approximate nearest neighbor (ANN) algorithms, and vector search optimization. Use when choosing vector stores, designing semantic search, or optimizing similarity search performance.
allowed-tools: Read, Glob, Grep
Vector Databases
When to Use This Skill
Use this skill when:
- Choosing between vector database options
- Designing semantic/similarity search systems
- Optimizing vector search performance
- Understanding ANN algorithm trade-offs
- Scaling vector search infrastructure
- Implementing hybrid search (vectors + filters)
Keywords: vector database, embeddings, vector search, similarity search, ANN, approximate nearest neighbor, HNSW, IVF, FAISS, Pinecone, Weaviate, Milvus, Qdrant, Chroma, pgvector, cosine similarity, semantic search
Vector Database Comparison
Managed Services
| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Pinecone | Fully managed, easy scaling, enterprise | Vendor lock-in, cost at scale | Enterprise production |
| Weaviate Cloud | GraphQL, hybrid search, modules | Complexity | Knowledge graphs |
| Zilliz Cloud | Milvus-based, high performance | Learning curve | High-scale production |
| MongoDB Atlas Vector | Existing MongoDB users | Newer feature | MongoDB shops |
| Elastic Vector | Existing Elastic stack | Resource heavy | Search platforms |
Self-Hosted Options
| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Milvus | Feature-rich, scalable, GPU support | Operational complexity | Large-scale production |
| Qdrant | Rust performance, filtering, easy | Smaller ecosystem | Performance-focused |
| Weaviate | Modules, semantic, hybrid | Memory usage | Knowledge applications |
| Chroma | Simple, Python-native | Limited scale | Development, prototyping |
| pgvector | PostgreSQL extension | Performance limits | Postgres shops |
| FAISS | Library rather than a database, very fast | No managed persistence, no metadata filtering | Research, embedded |
Selection Decision Tree
Need managed, don't want operations?
├── Yes → Pinecone (simplest) or Weaviate Cloud
└── No (self-hosted)
    └── Already using PostgreSQL?
        ├── Yes, <1M vectors → pgvector
        └── No
            └── Need maximum performance at scale?
                ├── Yes → Milvus or Qdrant
                └── No
                    └── Prototyping/development?
                        ├── Yes → Chroma
                        └── No → Qdrant (balanced choice)
ANN Algorithms
Algorithm Overview
Exact KNN:
• Search ALL vectors
• O(n) distance computations per query
• Perfect accuracy
• Impractical at scale
Approximate NN (ANN):
• Search a SUBSET of vectors
• Sublinear query time (roughly O(log n) for graph indexes such as HNSW)
• High, tunable recall (see Algorithm Comparison)
• Practical at large scale
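As a baseline for the comparison above, exact KNN can be sketched in a few lines of plain Python. The function names and the sample data are illustrative assumptions, not part of any vector database API.

```python
import heapq
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_knn(query, vectors, k):
    """Score every stored vector (O(n) distance computations) and keep the k closest."""
    scored = ((l2_distance(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)

# Illustrative data, not taken from a real embedding model.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
nearest = exact_knn([0.9, 0.1], vectors, k=2)
nearest_ids = [i for _, i in nearest]
```

Every query touches every vector, which is why this approach stops being practical as the collection grows and ANN indexes take over.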
HNSW (Hierarchical Navigable Small World)
Layer 3: sparse, long connections
Layer 2: medium density
Layer 1: denser
Layer 0: all vectors
Search: Start at top layer, greedily descend
• Fast: O(log n) search time
• High recall: >95% typically
• Memory: Extra graph storage
HNSW Parameters:
| Parameter | Description | Trade-off |
|---|---|---|
| M | Connections per node | Memory vs. recall |
| ef_construction | Build-time search width | Build time vs. recall |
| ef_search | Query-time search width | Latency vs. recall |
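To make the role of the query-time search width more concrete, here is a minimal sketch of a best-first search on a single graph layer. It is a simplification: a real HNSW index repeats a search like this from the top layer down to layer 0, and the graph below is hand-built rather than constructed with M and ef_construction. The names `search_layer` and `ef` are assumptions chosen for this sketch.

```python
import heapq

def sq_dist(a, b):
    """Squared Euclidean distance (monotonic in L2, cheaper to compute)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_layer(graph, vectors, query, entry, ef):
    """Best-first search on one layer, keeping the ef closest nodes seen so far."""
    d0 = sq_dist(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexplored node first
    best = [(-d0, entry)]       # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break  # every remaining candidate is farther than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(vectors[nb], query)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    return sorted((-d, i) for d, i in best)

# Six points on a line, each linked to its immediate neighbors.
vectors = [[float(i)] for i in range(6)]
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
hits = search_layer(graph, vectors, query=[4.2], entry=0, ef=2)
hit_ids = [i for _, i in hits]
```

A larger `ef` keeps more candidates alive, which explores more of the graph and tends to raise recall at the cost of latency.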
IVF (Inverted File Index)
Clustering Phase:
Cluster vectors into K centroids (diagram: four centroids, each with its member vectors, labeled Cluster 1 to Cluster 4)
Search Phase:
1. Find nprobe nearest centroids
2. Search only those clusters
3. Much faster than exhaustive
IVF Parameters:
| Parameter | Description | Trade-off |
|---|---|---|
| nlist | Number of clusters | Build time vs. search quality |
| nprobe | Clusters to search | Latency vs. recall |
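The clustering and search phases can be sketched as a toy inverted-file index in plain Python. This is an illustration under simplifying assumptions (a few k-means iterations, random 2-D data, hypothetical function names), not how FAISS or any database implements IVF.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(vectors, centroids):
    """Put each vector index into the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(idx)
    return lists

def build_ivf(vectors, nlist, iters=10, seed=0):
    """Run a few k-means iterations, then return (centroids, inverted lists)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    dim = len(vectors[0])
    for _ in range(iters):
        for c, members in enumerate(assign(vectors, centroids)):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(vectors[i][d] for i in members) / len(members)
                                for d in range(dim)]
    return centroids, assign(vectors, centroids)

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]

rng = random.Random(42)
vectors = [[rng.random(), rng.random()] for _ in range(200)]
centroids, lists = build_ivf(vectors, nlist=8)
query = [0.5, 0.5]
approx = ivf_search(query, vectors, centroids, lists, k=5, nprobe=2)
full = ivf_search(query, vectors, centroids, lists, k=5, nprobe=8)
```

With `nprobe` equal to `nlist` the search degenerates to an exhaustive scan, so `full` matches brute force. Smaller `nprobe` values scan fewer vectors and may miss neighbors that sit just across a cluster boundary.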
IVF-PQ (Product Quantization)
Original Vector (128 dim):
[0.1, 0.2, ..., 0.9] (128 × 4 bytes = 512 bytes)
PQ Compressed (8 subvectors, 8-bit codes):
[23, 45, 12, 89, 56, 34, 78, 90] (8 bytes)
Memory reduction: 64x
Accuracy trade-off: roughly a 5% recall drop (varies with dataset and code size)
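The compression step can be illustrated with a toy product quantizer. One loudly stated simplification: the codebooks here are sampled from training vectors rather than learned with k-means, which real implementations use. All function names are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(vec, m):
    """Split a vector into m equal-length subvectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def train_codebooks(training, m, ksub, seed=0):
    """One codebook of ksub entries per subspace (sampled, not k-means)."""
    sample = random.Random(seed).sample(training, ksub)
    return [[split(v, m)[j] for v in sample] for j in range(m)]

def encode(vec, codebooks):
    """Replace each subvector with the index of its nearest codebook entry."""
    return bytes(
        min(range(len(book)), key=lambda c: sq_dist(sub, book[c]))
        for sub, book in zip(split(vec, len(codebooks)), codebooks)
    )

def decode(code, codebooks):
    """Rebuild an approximate vector from the stored codes."""
    return [x for c, book in zip(code, codebooks) for x in book[c]]

rng = random.Random(0)
training = [[rng.random() for _ in range(128)] for _ in range(300)]
codebooks = train_codebooks(training, m=8, ksub=256)  # 256 entries fit an 8-bit code
vector = [rng.random() for _ in range(128)]
code = encode(vector, codebooks)
approx = decode(code, codebooks)
ratio = (128 * 4) / len(code)  # float32 bytes divided by code bytes
```

The stored code is 8 bytes instead of 512, matching the 64x figure above. The codebooks themselves add a fixed cost that is shared by all vectors.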
Algorithm Comparison
| Algorithm | Search Speed | Memory | Build Time | Recall |
|---|---|---|---|---|
| Flat/Brute | Slow (O(n)) | Low | None | 100% |
| IVF | Fast | Low | Medium | 90-95% |
| IVF-PQ | Very fast | Very low | Medium | 85-92% |
| HNSW | Very fast | High | Slow | 95-99% |
| HNSW+PQ | Very fast | Medium | Slow | 90-95% |
When to Use Which
< 100K vectors:
└── Flat index (exact search is fast enough)
100K - 1M vectors:
└── HNSW (best recall/speed trade-off)
1M - 100M vectors:
├── Memory available → HNSW
└── Memory constrained → IVF-PQ or HNSW+PQ
> 100M vectors:
└── Sharded IVF-PQ or distributed HNSW
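These thresholds can be encoded as a small helper, for example to pick a default in a provisioning script. The function name and the handling of the exact boundary values are assumptions; the thresholds themselves are the ones listed above.

```python
def choose_index(num_vectors, memory_constrained=False):
    """Map collection size and memory budget to the index types suggested above."""
    if num_vectors < 100_000:
        return "Flat"
    if num_vectors < 1_000_000:
        return "HNSW"
    if num_vectors <= 100_000_000:
        return "IVF-PQ or HNSW+PQ" if memory_constrained else "HNSW"
    return "Sharded IVF-PQ or distributed HNSW"

small = choose_index(50_000)
mid_constrained = choose_index(20_000_000, memory_constrained=True)
```

Treat the output as a starting point: recall targets, filter selectivity, and update rate can all shift the choice.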
Distance Metrics
Common Metrics
| Metric | Formula | Range | Best For |
|---|---|---|---|
| Cosine Similarity | A·B / (‖A‖ ‖B‖) | [-1, 1] | Normalized embeddings |
| Dot Product | A·B | (-∞, ∞) | When magnitude matters |
| Euclidean (L2) | √(Σ(Aᵢ-Bᵢ)²) | [0, ∞) | Absolute distances |
| Manhattan (L1) | Σ\|Aᵢ-Bᵢ\| | [0, ∞) | High-dimensional sparse |
Metric Selection
Embeddings pre-normalized (unit vectors)?
├── Yes → Cosine = Dot Product (use Dot, faster)
└── No
    └── Magnitude meaningful?
        ├── Yes → Dot Product
        └── No → Cosine Similarity
Note: Many embedding models output unit-normalized vectors; check the model documentation
→ When they do, dot product is usually the best choice
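The equivalence between cosine similarity and dot product on unit vectors is easy to check numerically. The vectors below are arbitrary illustrative values.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    n = norm(a)
    return [x / n for x in a]

a, b = [3.0, 4.0], [4.0, 3.0]
ua, ub = normalize(a), normalize(b)

raw_dot = dot(a, b)     # depends on vector magnitudes
cos_ab = cosine(a, b)   # depends on direction only
unit_dot = dot(ua, ub)  # equals cos_ab up to floating-point rounding
```

Normalizing once at ingestion time means each query pays only for a dot product, with no per-comparison norm computation.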
Filtering and Hybrid Search
Pre-filtering vs Post-filtering
Pre-filtering (Filter → Search):
1. Apply metadata filter
   (category = "electronics")
   Result: 10K of 1M vectors
2. Vector search on 10K vectors
   Much faster, guaranteed filter match
Post-filtering (Search → Filter):
1. Vector search on 1M vectors
   Return top-1000
2. Apply metadata filter
   May return < K results!
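The difference between the two orders shows up even on a six-item toy collection. The data, field names, and the `fetch` parameter (how many hits the vector search returns before filtering) are assumptions made for this sketch.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(query, candidates, k):
    """Brute-force nearest neighbors, standing in for a real vector index."""
    return sorted(candidates, key=lambda it: sq_dist(it["vec"], query))[:k]

def pre_filter_search(query, items, k, category):
    """Filter first, then search only the matching vectors."""
    matching = [it for it in items if it["category"] == category]
    return top_k(query, matching, k)

def post_filter_search(query, items, k, category, fetch):
    """Search everything, then drop hits that fail the filter."""
    hits = top_k(query, items, fetch)
    return [it for it in hits if it["category"] == category][:k]

items = [
    {"id": 0, "vec": [0.0], "category": "books"},
    {"id": 1, "vec": [0.1], "category": "books"},
    {"id": 2, "vec": [0.2], "category": "electronics"},
    {"id": 3, "vec": [0.3], "category": "books"},
    {"id": 4, "vec": [0.9], "category": "electronics"},
    {"id": 5, "vec": [1.0], "category": "electronics"},
]

pre_ids = [it["id"] for it in pre_filter_search([0.0], items, 2, "electronics")]
post_ids = [it["id"] for it in post_filter_search([0.0], items, 2, "electronics", fetch=3)]
```

Here pre-filtering returns two matches, while post-filtering returns only one because most of the fetched hits belong to another category. Raising `fetch` narrows the gap at the cost of extra work.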
Hybrid Search Architecture
Query: "wireless headphones under $100"
├── Vector Search: "wireless headphones"
└── Filter Build: price < 100
Both branches → Combine Results
Metadata Index Design
| Metadata Type | Index Strategy | Query Example |
|---|---|---|
| Categorical | Bitmap/hash index | category = "books" |
| Numeric range | B-tree | price BETWEEN 10 AND 50 |
| Keyword search | Inverted index | tags CONTAINS "sale" |
| Geospatial | R-tree/geohash | location NEAR (lat, lng) |
Scaling Strategies
Sharding Approaches
Naive Sharding (by ID):
Shard 1: IDs 0-N
Shard 2: IDs N-2N
Shard 3: IDs 2N-3N
Query → Search ALL shards → Merge results
Semantic Sharding (by cluster):
Shard 1: Tech docs
Shard 2: Health docs
Shard 3: Finance docs
Query → Route to relevant shard(s) → Faster!
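The "search all shards, then merge" path of naive sharding is a scatter-gather over per-shard top-k lists. A minimal sketch, assuming each shard is an in-memory dict of id to vector and that shards are searched sequentially (a real system would fan out in parallel over the network):

```python
import heapq

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_shard(shard, query, k):
    """Return one shard's local top-k as (distance, id) pairs."""
    scored = ((sq_dist(vec, query), vid) for vid, vec in shard.items())
    return heapq.nsmallest(k, scored)

def scatter_gather(shards, query, k):
    """Query every shard, then merge the partial results into a global top-k."""
    partials = [search_shard(shard, query, k) for shard in shards]
    return heapq.nsmallest(k, (hit for part in partials for hit in part))

shards = [
    {"a": [0.0, 0.0], "b": [5.0, 5.0]},
    {"c": [1.0, 1.0], "d": [9.0, 9.0]},
    {"e": [0.5, 0.5], "f": [7.0, 7.0]},
]
top = scatter_gather(shards, [0.0, 0.0], k=3)
top_ids = [vid for _, vid in top]
```

Each shard must return its own top-k, not top-k divided by the shard count, because all global winners may live on a single shard.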
Replication
Load Balancer
├── Replica 1 (Read)
├── Replica 2 (Read)
└── Replica 3 (Read)
Primary (Write): linked to every read replica
Scaling Decision Matrix
| Scale (vectors) | Architecture | Replication |
|---|---|---|
| < 1M | Single node | Optional |
| 1-10M | Single node, more RAM | For HA |
| 10-100M | Sharded, few nodes | Required |
| 100M-1B | Sharded, many nodes | Required |
| > 1B | Sharded + tiered | Required |
Performance Optimization
Index Build Optimization
| Optimization | Description | Typical impact (workload-dependent) |
|---|---|---|
| Batch insertion | Insert in batches of 1K-10K | 10x faster |
| Parallel build | Multi-threaded index construction | 2-4x faster |
| Incremental index | Add to existing index | Avoids rebuild |
| GPU acceleration | Use GPU for training (IVF) | 10-100x faster |
Query Optimization
| Optimization | Description | Typical impact (workload-dependent) |
|---|---|---|
| Warm cache | Keep index in memory | 10x latency reduction |
| Query batching | Batch similar queries | Higher throughput |
| Reduce dimensions | PCA, random projection | 2-4x faster |
| Early termination | Stop when "good enough" | Lower latency |
Memory Optimization
Memory per vector:
1536 dims × 4 bytes = 6KB per vector
1M vectors:
  Raw: 6GB
  + HNSW graph: roughly +0.1-0.5GB for M = 16-64 (about 8-10 bytes × M of links per vector)
  = about 6-7GB total, before engine and metadata overhead
With PQ (64 subquantizers, 8-bit codes):
  1M vectors: ~64MB
  = ~96x reduction (roughly 100x)
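The raw and PQ figures above follow from simple arithmetic, sketched here so the numbers can be re-run for other dimensions. The helper names are hypothetical, decimal units (1 GB = 10^9 bytes) are assumed, and graph or engine overhead is deliberately left out.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Uncompressed float32 storage for the vectors alone."""
    return num_vectors * dims * bytes_per_value

def pq_code_bytes(num_vectors, subquantizers, bits_per_code=8):
    """PQ code storage; the shared codebooks are not counted."""
    return num_vectors * subquantizers * bits_per_code // 8

raw = raw_vector_bytes(1_000_000, 1536)             # bytes for 1M raw vectors
compressed = pq_code_bytes(1_000_000, 64)           # bytes for 1M PQ codes
reduction = raw / compressed
```

For 1M vectors of 1536 dimensions this gives 6,144,000,000 bytes raw against 64,000,000 bytes of PQ codes.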
Operational Considerations
Backup and Recovery
| Strategy | Description | RPO/RTO |
|---|---|---|
| Snapshots | Periodic full backup | Hours |
| WAL replication | Write-ahead log streaming | Minutes |
| Real-time sync | Synchronous replication | Seconds |
Monitoring Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| Query latency p99 | 99th percentile latency | > 100ms |
| Recall | Search accuracy | < 90% |
| QPS | Queries per second | Capacity dependent |
| Memory usage | Index memory | > 80% |
| Index freshness | Time since last update | Domain dependent |
Index Maintenance
Index maintenance tasks:
• Compaction: Merge small segments
• Reindex: Rebuild degraded index
• Vacuum: Remove deleted vectors
• Optimize: Tune parameters
Schedule during low-traffic periods
Common Patterns
Multi-Tenant Vector Search
Option 1: Namespace/Collection per tenant
  tenant_1_collection
  tenant_2_collection
  tenant_3_collection
Pro: Complete isolation
Con: Many indexes, operational overhead
Option 2: Single collection + tenant filter
  shared_collection
  metadata: { tenant_id: "..." }
  Pre-filter by tenant_id
Pro: Simpler operations
Con: Requires efficient filtering
Real-Time Updates
Write Path:
Write Request → Buffer (Memory) → Merge to Index
Strategy:
1. Buffer writes in memory
2. Periodically merge to main index
3. Search: main index + buffer
4. Compact periodically
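The strategy above can be sketched as a toy class. Two assumptions to note: the merge is triggered by a buffer-size threshold rather than a timer, and both the main index and the buffer are plain dicts searched by brute force. The class and method names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class BufferedIndex:
    """Writes land in a memory buffer; searches cover main index plus buffer."""

    def __init__(self, merge_threshold=3):
        self.main = {}
        self.buffer = {}
        self.merge_threshold = merge_threshold

    def upsert(self, vid, vec):
        self.buffer[vid] = vec
        if len(self.buffer) >= self.merge_threshold:
            self.merge()

    def merge(self):
        """Fold buffered writes into the main index and empty the buffer."""
        self.main.update(self.buffer)
        self.buffer.clear()

    def search(self, query, k):
        combined = {**self.main, **self.buffer}  # buffered version wins on conflicts
        scored = sorted((sq_dist(vec, query), vid) for vid, vec in combined.items())
        return [vid for _, vid in scored[:k]]

index = BufferedIndex(merge_threshold=3)
index.upsert("a", [0.0])
index.upsert("b", [1.0])
fresh_hit = index.search([0.9], k=1)  # served from the buffer, before any merge
buffered_before = len(index.buffer)
index.upsert("c", [2.0])              # third write reaches the threshold
```

Searching the buffer alongside the main index is what makes new writes visible immediately, at the price of a brute-force scan over whatever has not been merged yet.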
Embedding Versioning
Version 1 embeddings + Version 2 embeddings → Parallel indexes during migration
→ Gradual reindexing
→ Blue-green switch
Cost Estimation
Storage Costs
Cost = (vectors × dimensions × bytes per value × replication) / (10^9 bytes per GB) × $/GB/month
Example:
10M vectors × 1536 dims × 4 bytes × 3 replicas ≈ 184 GB
At $0.10/GB/month ≈ $18.40/month storage
Note: Memory (for serving) costs more than storage
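The storage formula can be wrapped in a small function for capacity planning. The function name is hypothetical, and the defaults (float32 values, 3 replicas, $0.10/GB/month, decimal gigabytes) simply mirror the example above rather than any provider's pricing.

```python
def monthly_storage_cost(vectors, dims, bytes_per_value=4, replicas=3,
                         usd_per_gb_month=0.10):
    """Return (gigabytes stored, monthly cost in USD) for raw vector storage."""
    gigabytes = vectors * dims * bytes_per_value * replicas / 1e9
    return gigabytes, gigabytes * usd_per_gb_month

gb, cost = monthly_storage_cost(10_000_000, 1536)
```

For the example inputs this yields about 184.3 GB and about $18.43 per month, in line with the rounded figures above.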
Compute Costs
Factors:
• QPS (queries per second)
• Latency requirements
• Index type (HNSW needs more RAM)
• Filtering complexity
Rule of thumb (figures depend on dimensionality; these are consistent with 1536-dim float32 vectors):
• 1M vectors, HNSW, <50ms latency: 16GB RAM
• 10M vectors, HNSW, <50ms latency: 64-128GB RAM
• 100M vectors: Distributed system required
Related Skills
- rag-architecture - Using vector databases in RAG systems
- llm-serving-patterns - LLM inference with vector retrieval
- ml-system-design - End-to-end ML pipeline design
- estimation-techniques - Capacity planning for vector systems
Version History
- v1.0.0 (2025-12-26): Initial release - Vector database patterns for systems design
Last Updated
Date: 2025-12-26
Repository
melodic-software/claude-code-plugins/plugins/systems-design/skills/vector-databases
Author
melodic-software