
vector-databases

Vector database selection, embedding storage, approximate nearest neighbor (ANN) algorithms, and vector search optimization. Use when choosing vector stores, designing semantic search, or optimizing similarity search performance.

allowed_tools: Read, Glob, Grep

$ Install

git clone https://github.com/melodic-software/claude-code-plugins /tmp/claude-code-plugins && cp -r /tmp/claude-code-plugins/plugins/systems-design/skills/vector-databases ~/.claude/skills/claude-code-plugins

// tip: Run this command in your terminal to install the skill


name: vector-databases
description: Vector database selection, embedding storage, approximate nearest neighbor (ANN) algorithms, and vector search optimization. Use when choosing vector stores, designing semantic search, or optimizing similarity search performance.
allowed-tools: Read, Glob, Grep

Vector Databases

When to Use This Skill

Use this skill when:

  • Choosing between vector database options
  • Designing semantic/similarity search systems
  • Optimizing vector search performance
  • Understanding ANN algorithm trade-offs
  • Scaling vector search infrastructure
  • Implementing hybrid search (vectors + filters)

Keywords: vector database, embeddings, vector search, similarity search, ANN, approximate nearest neighbor, HNSW, IVF, FAISS, Pinecone, Weaviate, Milvus, Qdrant, Chroma, pgvector, cosine similarity, semantic search

Vector Database Comparison

Managed Services

| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Pinecone | Fully managed, easy scaling, enterprise | Vendor lock-in, cost at scale | Enterprise production |
| Weaviate Cloud | GraphQL, hybrid search, modules | Complexity | Knowledge graphs |
| Zilliz Cloud | Milvus-based, high performance | Learning curve | High-scale production |
| MongoDB Atlas Vector | Existing MongoDB users | Newer feature | MongoDB shops |
| Elastic Vector | Existing Elastic stack | Resource heavy | Search platforms |

Self-Hosted Options

| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Milvus | Feature-rich, scalable, GPU support | Operational complexity | Large-scale production |
| Qdrant | Rust performance, filtering, easy | Smaller ecosystem | Performance-focused |
| Weaviate | Modules, semantic, hybrid | Memory usage | Knowledge applications |
| Chroma | Simple, Python-native | Limited scale | Development, prototyping |
| pgvector | PostgreSQL extension | Performance limits | Postgres shops |
| FAISS | Library, not DB, fastest | No persistence, no filtering | Research, embedded |

Selection Decision Tree

Need managed, don't want operations?
├── Yes → Pinecone (simplest) or Weaviate Cloud
└── No (self-hosted)
    └── Already using PostgreSQL?
        ├── Yes, <1M vectors → pgvector
        └── No
            └── Need maximum performance at scale?
                ├── Yes → Milvus or Qdrant
                └── No
                    └── Prototyping/development?
                        ├── Yes → Chroma
                        └── No → Qdrant (balanced choice)

ANN Algorithms

Algorithm Overview

Exact KNN:
• Search ALL vectors
• O(n) time complexity
• Perfect accuracy
• Impractical at scale

Approximate NN (ANN):
• Search a SUBSET of vectors
• Sublinear time, typically O(log n)
• Near-perfect accuracy
• Practical at any scale
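At small scale, exact search is easy to sketch. A minimal numpy brute-force KNN (all names here are illustrative):

```python
import numpy as np

def exact_knn(vectors: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Brute-force exact KNN: score every stored vector (O(n)), keep top-k."""
    dists = ((vectors - query) ** 2).sum(axis=1)  # squared L2 to all vectors
    return np.argsort(dists)[:k]

rng = np.random.default_rng(42)
db = rng.standard_normal((10_000, 64)).astype(np.float32)
q = rng.standard_normal(64).astype(np.float32)
top5 = exact_knn(db, q, k=5)
```

Scanning 10K vectors this way is instant; the O(n) scan is what becomes impractical at hundreds of millions of vectors.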

HNSW (Hierarchical Navigable Small World)

Layer 3: ○───────────────────────○  (sparse, long connections)
          │                       │
Layer 2: ○───○───────○───────○───○  (medium density)
          │   │       │       │   │
Layer 1: ○─○─○─○─○─○─○─○─○─○─○─○─○  (denser)
          │││││││││││││││││││││││
Layer 0: ○○○○○○○○○○○○○○○○○○○○○○○○○  (all vectors)

Search: Start at top layer, greedily descend
• Fast: O(log n) search time
• High recall: >95% typically
• Memory: Extra graph storage

HNSW Parameters:

| Parameter | Description | Trade-off |
|---|---|---|
| M | Connections per node | Memory vs. recall |
| ef_construction | Build-time search width | Build time vs. recall |
| ef_search | Query-time search width | Latency vs. recall |
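The greedy descent HNSW relies on can be sketched on a single layer. This toy (not a real HNSW implementation; names are illustrative) links each node to its M nearest neighbors, then hops toward the query until no linked neighbor is closer:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 32)).astype(np.float32)

# Crude proximity graph: link each node to its M nearest neighbors
# (HNSW's M parameter bounds exactly this out-degree).
M = 8
pair_d = ((data[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
neighbors = np.argsort(pair_d, axis=1)[:, 1:M + 1]  # skip self at column 0

def greedy_search(query: np.ndarray, entry: int = 0) -> int:
    """Greedy descent: hop to the closest linked neighbor until stuck."""
    cur = entry
    cur_d = ((data[cur] - query) ** 2).sum()
    while True:
        cand = neighbors[cur]
        cand_d = ((data[cand] - query) ** 2).sum(axis=1)
        best = int(cand_d.argmin())
        if cand_d[best] >= cur_d:
            return cur  # local minimum: no neighbor is closer
        cur, cur_d = int(cand[best]), cand_d[best]

query = rng.standard_normal(32).astype(np.float32)
found = greedy_search(query)
```

Real HNSW runs this routine on each layer (entering each lower layer from the previous result) and tracks `ef_search` candidates rather than one, which is what pushes recall above 95%.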

IVF (Inverted File Index)

Clustering Phase:
┌─────────────────────────────────────────┐
│     Cluster vectors into K centroids    │
│                                         │
│    ●         ●         ●         ●      │
│   /│\       /│\       /│\       /│\     │
│  ○○○○○     ○○○○○     ○○○○○     ○○○○○    │
│ Cluster 1  Cluster 2 Cluster 3 Cluster 4│
└─────────────────────────────────────────┘

Search Phase:
1. Find nprobe nearest centroids
2. Search only those clusters
3. Much faster than exhaustive

IVF Parameters:

| Parameter | Description | Trade-off |
|---|---|---|
| nlist | Number of clusters | Build time vs. search quality |
| nprobe | Clusters to search | Latency vs. recall |
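A toy IVF flow, with sampled data points standing in for trained k-means centroids (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.standard_normal((5_000, 32)).astype(np.float32)

# "Training": sample nlist vectors as centroids (real IVF runs k-means).
nlist = 50
centroids = vectors[rng.choice(len(vectors), nlist, replace=False)]

# Build the inverted lists: assign each vector to its nearest centroid.
assign = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
lists = {c: np.where(assign == c)[0] for c in range(nlist)}

def ivf_search(query: np.ndarray, k: int = 5, nprobe: int = 5) -> np.ndarray:
    """Scan only the nprobe closest clusters instead of all vectors."""
    cent_d = ((centroids - query) ** 2).sum(axis=1)
    probe = np.argsort(cent_d)[:nprobe]
    cand = np.concatenate([lists[c] for c in probe])
    cand_d = ((vectors[cand] - query) ** 2).sum(axis=1)
    return cand[np.argsort(cand_d)[:k]]

query = rng.standard_normal(32).astype(np.float32)
hits = ivf_search(query)
```

With `nprobe=5` of 50 clusters, roughly a tenth of the vectors are scanned; raising `nprobe` trades latency for recall exactly as the table describes.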

IVF-PQ (Product Quantization)

Original Vector (128 dim):
[0.1, 0.2, ..., 0.9]  (128 × 4 bytes = 512 bytes)

PQ Compressed (8 subvectors, 8-bit codes):
[23, 45, 12, 89, 56, 34, 78, 90]  (8 bytes)

Memory reduction: 64x
Accuracy trade-off: ~5% recall drop
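The compression above can be demonstrated with a toy product quantizer. Real PQ trains each sub-codebook with k-means; this sketch just samples codebook entries, which is enough to show the encode/decode shape and the 64x memory arithmetic:

```python
import numpy as np

rng = np.random.default_rng(2)
vectors = rng.standard_normal((2_000, 128)).astype(np.float32)

m, ks = 8, 256                       # 8 subvectors x 8-bit codes = 8 bytes
sub = vectors.reshape(len(vectors), m, 128 // m)        # (n, 8, 16)

# Toy codebooks: sample ks entries per subspace (real PQ trains k-means).
books = np.stack([s[rng.choice(len(s), ks, replace=False)]
                  for s in sub.transpose(1, 0, 2)])     # (8, 256, 16)

# Encode: nearest codebook entry per subvector, stored as one uint8 each.
codes = np.empty((len(vectors), m), dtype=np.uint8)
for j in range(m):
    d = ((sub[:, j, None, :] - books[j][None, :, :]) ** 2).sum(-1)
    codes[:, j] = d.argmin(axis=1)

# Decode (lossy): look the codes back up to get approximate vectors.
recon = np.concatenate([books[j][codes[:, j]] for j in range(m)], axis=1)
```

Each vector shrinks from 512 bytes of float32 to 8 uint8 codes; the reconstruction error is what causes the ~5% recall drop.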

Algorithm Comparison

| Algorithm | Search Speed | Memory | Build Time | Recall |
|---|---|---|---|---|
| Flat/Brute | Slow (O(n)) | Low | None | 100% |
| IVF | Fast | Low | Medium | 90-95% |
| IVF-PQ | Very fast | Very low | Medium | 85-92% |
| HNSW | Very fast | High | Slow | 95-99% |
| HNSW+PQ | Very fast | Medium | Slow | 90-95% |

When to Use Which

< 100K vectors:
└── Flat index (exact search is fast enough)

100K - 1M vectors:
└── HNSW (best recall/speed trade-off)

1M - 100M vectors:
├── Memory available → HNSW
└── Memory constrained → IVF-PQ or HNSW+PQ

> 100M vectors:
└── Sharded IVF-PQ or distributed HNSW

Distance Metrics

Common Metrics

| Metric | Formula | Range | Best For |
|---|---|---|---|
| Cosine Similarity | A·B / (‖A‖ ‖B‖) | [-1, 1] | Normalized embeddings |
| Dot Product | A·B | (-∞, ∞) | When magnitude matters |
| Euclidean (L2) | √Σ(Aᵢ-Bᵢ)² | [0, ∞) | Absolute distances |
| Manhattan (L1) | Σ\|Aᵢ-Bᵢ\| | [0, ∞) | High-dimensional sparse |

Metric Selection

Embeddings pre-normalized (unit vectors)?
├── Yes → Cosine = Dot Product (use Dot, faster)
└── No
    └── Magnitude meaningful?
        ├── Yes → Dot Product
        └── No → Cosine Similarity

Note: Most embedding models output normalized vectors
      → Dot product is usually the best choice
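A quick numpy check of the note above: for unit vectors, cosine and dot product coincide, and squared L2 is a monotone function of both, so all three metrics rank neighbors identically:

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = rng.standard_normal((2, 256))
a /= np.linalg.norm(a)   # normalize to unit length, as most
b /= np.linalg.norm(b)   # embedding models already do

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
dot = a @ b
l2_sq = ((a - b) ** 2).sum()   # for unit vectors: 2 - 2*(a.b)
```

Since ‖a−b‖² = ‖a‖² + ‖b‖² − 2a·b = 2 − 2a·b for unit vectors, any index built on one metric returns the same ordering under the others.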

Filtering and Hybrid Search

Pre-filtering vs Post-filtering

Pre-filtering (Filter → Search):
┌─────────────────────────────────────────┐
│ 1. Apply metadata filter                │
│    (category = "electronics")           │
│    Result: 10K of 1M vectors            │
│                                         │
│ 2. Vector search on 10K vectors         │
│    Much faster, guaranteed filter match │
└─────────────────────────────────────────┘

Post-filtering (Search → Filter):
┌─────────────────────────────────────────┐
│ 1. Vector search on 1M vectors          │
│    Return top-1000                      │
│                                         │
│ 2. Apply metadata filter                │
│    May return < K results!              │
└─────────────────────────────────────────┘
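The two strategies can be contrasted in a few lines of numpy (the metadata, category values, and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
vectors = rng.standard_normal((10_000, 32)).astype(np.float32)
category = rng.choice(["electronics", "books", "toys"], size=10_000)
query = rng.standard_normal(32).astype(np.float32)
k = 10

def dists(idx):
    return ((vectors[idx] - query) ** 2).sum(axis=1)

# Pre-filtering: restrict first, then search -> always k matching results.
match = np.where(category == "electronics")[0]
pre = match[np.argsort(dists(match))[:k]]

# Post-filtering: search everything, then drop non-matches -> may fall short.
top = np.argsort(dists(np.arange(len(vectors))))[:k]
post = top[category[top] == "electronics"]
```

With roughly a third of the vectors matching, post-filtering the top-10 typically keeps only 3-4 results; production systems compensate by over-fetching (retrieve top-1000, filter, truncate).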

Hybrid Search Architecture

Query: "wireless headphones under $100"
           │
     ┌─────┴─────┐
     ▼           ▼
 ┌───────┐  ┌───────┐
 │Vector │  │Filter │
 │Search │  │ Build │
 │"wire- │  │price  │
 │less   │  │< 100  │
 │head-  │  │       │
 │phones"│  │       │
 └───────┘  └───────┘
     │           │
     └─────┬─────┘
           ▼
    ┌───────────┐
    │  Combine  │
    │  Results  │
    └───────────┘

Metadata Index Design

| Metadata Type | Index Strategy | Query Example |
|---|---|---|
| Categorical | Bitmap/hash index | category = "books" |
| Numeric range | B-tree | price BETWEEN 10 AND 50 |
| Keyword search | Inverted index | tags CONTAINS "sale" |
| Geospatial | R-tree/geohash | location NEAR (lat, lng) |

Scaling Strategies

Sharding Approaches

Naive Sharding (by ID):
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│ IDs 0-N │ │IDs N-2N │ │IDs 2N-3N│
└─────────┘ └─────────┘ └─────────┘
Query → Search ALL shards → Merge results

Semantic Sharding (by cluster):
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│ Tech    │ │ Health  │ │ Finance │
│ docs    │ │ docs    │ │ docs    │
└─────────┘ └─────────┘ └─────────┘
Query → Route to relevant shard(s) → Faster!

Replication

┌─────────────────────────────────────────┐
│              Load Balancer              │
└─────────────────────────────────────────┘
         │           │           │
         ▼           ▼           ▼
    ┌─────────┐ ┌─────────┐ ┌─────────┐
    │Replica 1│ │Replica 2│ │Replica 3│
    │  (Read) │ │  (Read) │ │  (Read) │
    └─────────┘ └─────────┘ └─────────┘
         │           │           │
         └───────────┼───────────┘
                     │
                ┌─────────┐
                │ Primary │
                │ (Write) │
                └─────────┘

Scaling Decision Matrix

| Scale (vectors) | Architecture | Replication |
|---|---|---|
| < 1M | Single node | Optional |
| 1-10M | Single node, more RAM | For HA |
| 10-100M | Sharded, few nodes | Required |
| 100M-1B | Sharded, many nodes | Required |
| > 1B | Sharded + tiered | Required |

Performance Optimization

Index Build Optimization

| Optimization | Description | Impact |
|---|---|---|
| Batch insertion | Insert in batches of 1K-10K | 10x faster |
| Parallel build | Multi-threaded index construction | 2-4x faster |
| Incremental index | Add to existing index | Avoids rebuild |
| GPU acceleration | Use GPU for training (IVF) | 10-100x faster |

Query Optimization

| Optimization | Description | Impact |
|---|---|---|
| Warm cache | Keep index in memory | 10x latency reduction |
| Query batching | Batch similar queries | Higher throughput |
| Reduce dimensions | PCA, random projection | 2-4x faster |
| Early termination | Stop when "good enough" | Lower latency |
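As one concrete example of dimension reduction, a Gaussian random projection (a sketch, not a tuned pipeline; PCA would use the data's actual covariance instead):

```python
import numpy as np

rng = np.random.default_rng(5)
vectors = rng.standard_normal((1_000, 1536)).astype(np.float32)

# Gaussian random projection 1536 -> 256 dims; the 1/sqrt(d_out) scale
# roughly preserves pairwise distances (Johnson-Lindenstrauss style).
d_out = 256
proj = rng.standard_normal((1536, d_out)).astype(np.float32) / np.sqrt(d_out)
reduced = vectors @ proj
```

A 6x dimension cut means 6x less distance arithmetic per comparison, at the cost of some distortion in the preserved distances.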

Memory Optimization

Memory per vector:
┌────────────────────────────────────────┐
│ 1536 dims × 4 bytes = 6KB per vector   │
│                                        │
│ 1M vectors:                            │
│   Raw: 6GB                             │
│   + HNSW graph: +2-4GB (M-dependent)   │
│   = 8-10GB total                       │
│                                        │
│ With PQ (64 subquantizers):            │
│   1M vectors: ~64MB                    │
│   = 100x reduction                     │
└────────────────────────────────────────┘
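The arithmetic above can be wrapped in a back-of-envelope sizing helper (the graph-overhead term is a rough assumption of ~2·M int32 links per vector; real engines add per-layer links and metadata, so observed overhead is several times this floor):

```python
def index_memory_bytes(n_vectors: int, dims: int,
                       bytes_per_dim: int = 4, hnsw_m: int = 16) -> dict:
    """Back-of-envelope HNSW index sizing, not vendor-exact."""
    raw = n_vectors * dims * bytes_per_dim
    # Rough floor: ~2*M int32 neighbor links per vector on layer 0.
    graph = n_vectors * 2 * hnsw_m * 4
    return {"raw": raw, "hnsw_graph": graph, "total": raw + graph}

est = index_memory_bytes(1_000_000, 1536)
```

For 1M 1536-dim float32 vectors this reproduces the ~6GB raw figure in the box; use it for capacity planning only as a lower bound.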

Operational Considerations

Backup and Recovery

| Strategy | Description | RPO/RTO |
|---|---|---|
| Snapshots | Periodic full backup | Hours |
| WAL replication | Write-ahead log streaming | Minutes |
| Real-time sync | Synchronous replication | Seconds |

Monitoring Metrics

| Metric | Description | Alert Threshold |
|---|---|---|
| Query latency p99 | 99th percentile latency | > 100ms |
| Recall | Search accuracy | < 90% |
| QPS | Queries per second | Capacity dependent |
| Memory usage | Index memory | > 80% |
| Index freshness | Time since last update | Domain dependent |

Index Maintenance

┌─────────────────────────────────────────┐
│        Index Maintenance Tasks          │
├─────────────────────────────────────────┤
│ • Compaction: Merge small segments      │
│ • Reindex: Rebuild degraded index       │
│ • Vacuum: Remove deleted vectors        │
│ • Optimize: Tune parameters             │
│                                         │
│ Schedule during low-traffic periods     │
└─────────────────────────────────────────┘

Common Patterns

Multi-Tenant Vector Search

Option 1: Namespace/Collection per tenant
┌─────────────────────────────────────────┐
│ tenant_1_collection                     │
│ tenant_2_collection                     │
│ tenant_3_collection                     │
└─────────────────────────────────────────┘
Pro: Complete isolation
Con: Many indexes, operational overhead

Option 2: Single collection + tenant filter
┌─────────────────────────────────────────┐
│ shared_collection                       │
│   metadata: { tenant_id: "..." }        │
│   Pre-filter by tenant_id               │
└─────────────────────────────────────────┘
Pro: Simpler operations
Con: Requires efficient filtering

Real-Time Updates

Write Path:
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Write     │    │   Buffer    │    │   Merge     │
│   Request   │───▶│   (Memory)  │───▶│   to Index  │
└─────────────┘    └─────────────┘    └─────────────┘

Strategy:
1. Buffer writes in memory
2. Periodically merge to main index
3. Search: main index + buffer
4. Compact periodically
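The buffer-then-merge strategy above can be sketched as a toy class, with brute-force search standing in for the real ANN index (all names illustrative):

```python
import numpy as np

class BufferedIndex:
    """Toy write path: buffer inserts in memory, merge into the main store,
    and search both so fresh writes stay visible."""

    def __init__(self, dim, merge_threshold=1000):
        self.main = np.empty((0, dim), dtype=np.float32)  # "built" index
        self.buffer = []                                  # recent writes
        self.merge_threshold = merge_threshold

    def add(self, vec):
        self.buffer.append(np.asarray(vec, dtype=np.float32))
        if len(self.buffer) >= self.merge_threshold:
            self.merge()

    def merge(self):
        # A real system would rebuild or extend an ANN index segment here.
        if self.buffer:
            self.main = np.vstack([self.main, np.stack(self.buffer)])
            self.buffer.clear()

    def search(self, query, k=5):
        # Search = main index + unmerged buffer, so new writes are visible.
        pool = self.main if not self.buffer else np.vstack(
            [self.main, np.stack(self.buffer)])
        d = ((pool - query) ** 2).sum(axis=1)
        return np.argsort(d)[:k]

idx = BufferedIndex(dim=8)
rng = np.random.default_rng(6)
for _ in range(10):
    idx.add(rng.standard_normal(8))
idx.add(np.zeros(8))                  # newest write, still only buffered
hit = idx.search(np.zeros(8), k=1)    # visible before any merge
```

The design choice to search both structures is what makes writes immediately queryable; compaction then keeps the buffer small so the brute-force portion stays cheap.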

Embedding Versioning

Version 1 embeddings ──┐
                       │
Version 2 embeddings ──┼──▶ Parallel indexes during migration
                       │
                       │    ┌─────────────────────┐
                       └───▶│ Gradual reindexing  │
                            │ Blue-green switch   │
                            └─────────────────────┘

Cost Estimation

Storage Costs

Cost = (vectors × dimensions × bytes per dim × replicas) / bytes per GB × $/GB/month

Example:
10M vectors × 1536 dims × 4 bytes × 3 replicas ≈ 184 GB
At $0.10/GB/month ≈ $18.40/month storage

Note: Memory (for serving) costs more than storage
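The formula works out as follows (helper name is illustrative; decimal GB):

```python
def storage_cost_per_month(n_vectors, dims, bytes_per_dim=4,
                           replicas=3, usd_per_gb_month=0.10):
    """Worked form of the storage cost formula above."""
    gb = n_vectors * dims * bytes_per_dim * replicas / 1e9
    return gb, gb * usd_per_gb_month

gb, usd = storage_cost_per_month(10_000_000, 1536)
# 10M x 1536 dims x 4 bytes x 3 replicas ~= 184 GB -> ~$18.4/month
```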

Compute Costs

Factors:
• QPS (queries per second)
• Latency requirements
• Index type (HNSW needs more RAM)
• Filtering complexity

Rule of thumb:
• 1M vectors, HNSW, <50ms latency: 16GB RAM
• 10M vectors, HNSW, <50ms latency: 64-128GB RAM
• 100M vectors: Distributed system required

Related Skills

  • rag-architecture - Using vector databases in RAG systems
  • llm-serving-patterns - LLM inference with vector retrieval
  • ml-system-design - End-to-end ML pipeline design
  • estimation-techniques - Capacity planning for vector systems

Version History

  • v1.0.0 (2025-12-26): Initial release - Vector database patterns for systems design

Last Updated

Date: 2025-12-26

Repository

melodic-software/claude-code-plugins/plugins/systems-design/skills/vector-databases

Author

melodic-software