embeddings

Text embeddings for semantic search and similarity. Use when converting text to vectors, choosing embedding models, implementing chunking strategies, or building document similarity features.

$ Install

git clone https://github.com/yonatangross/skillforge-claude-plugin /tmp/skillforge-claude-plugin && cp -r /tmp/skillforge-claude-plugin/.claude/skills/embeddings ~/.claude/skills/embeddings

// tip: Run this command in your terminal to install the skill


name: embeddings
description: Text embeddings for semantic search and similarity. Use when converting text to vectors, choosing embedding models, implementing chunking strategies, or building document similarity features.
context: fork
agent: data-pipeline-engineer

Embeddings

Convert text to dense vector representations for semantic search and similarity.

When to Use

  • Building semantic search systems
  • Document similarity comparison
  • RAG retrieval (see: rag-retrieval skill)
  • Clustering related content
  • Duplicate detection

Quick Reference

from openai import OpenAI

client = OpenAI()

# Single text embedding
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Your text here"
)
vector = response.data[0].embedding  # 1536 dimensions

# Batch embedding (efficient)
texts = ["text1", "text2", "text3"]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)
vectors = [item.embedding for item in response.data]
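
For larger collections, split the input into fixed-size batches client-side before calling the API. A minimal sketch (the `batches` helper is illustrative, not part of any SDK):

```python
from typing import Iterator

def batches(items: list[str], size: int = 100) -> Iterator[list[str]]:
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 250 texts split into groups of at most 100
texts = [f"doc {i}" for i in range(250)]
groups = list(batches(texts, size=100))
# group sizes: 100, 100, 50
```

Each group can then be passed as the `input` of one embeddings call, keeping request counts low without exceeding per-request limits.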

Model Selection

| Model                     | Dims | Cost (per 1M tokens) | Use Case        |
|---------------------------|------|----------------------|-----------------|
| text-embedding-3-small    | 1536 | $0.02                | General purpose |
| text-embedding-3-large    | 3072 | $0.13                | High accuracy   |
| nomic-embed-text (Ollama) | 768  | Free                 | Local/CI        |

Chunking Strategy

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks for embedding."""
    words = text.split()
    chunks = []

    for i in range(0, len(words), chunk_size - overlap):
        chunk = " ".join(words[i:i + chunk_size])
        if chunk:
            chunks.append(chunk)

    return chunks

Guidelines:

  • Chunk size: 256-1024 tokens (512 typical)
  • Overlap: 10-20% for context continuity
  • Include metadata (title, source) with chunks
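
Following the last guideline, each chunk should stay traceable to its source. A sketch that attaches metadata (`chunk_text` repeated from above; the record shape is illustrative):

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks for embedding."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size - overlap)]

# A 1200-word document: title and source travel with every chunk.
doc = {"title": "Embeddings Guide", "source": "guide.md",
       "text": " ".join(f"w{i}" for i in range(1200))}
records = [
    {"title": doc["title"], "source": doc["source"], "chunk_id": i, "text": c}
    for i, c in enumerate(chunk_text(doc["text"]))
]
# 3 chunks; the last 50 words of chunk 0 repeat as the first 50 of chunk 1
```

Storing the metadata alongside each vector lets search results cite their source document rather than an anonymous chunk.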

Similarity Calculation

import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Calculate cosine similarity between two vectors."""
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Usage
similarity = cosine_similarity(vector1, vector2)
# 1.0 = identical, 0.0 = orthogonal, -1.0 = opposite
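
The same math scales to ranking many documents at once: normalize the rows, then a single matrix-vector product gives all cosine scores. A sketch with toy 3-dim vectors standing in for real embeddings:

```python
import numpy as np

def top_k(query: list[float], docs: list[list[float]], k: int = 2) -> list[int]:
    """Return indices of the k most cosine-similar document vectors."""
    q = np.array(query)
    m = np.array(docs)
    # Normalize so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    scores = m @ q
    return np.argsort(scores)[::-1][:k].tolist()

docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(top_k([1.0, 0.05, 0.0], docs))  # → [0, 1]
```

In production the `docs` matrix would come from a vector store; the ranking logic is the same.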

Key Decisions

  • Dimension reduction: text-embedding-3-large can be truncated to 1536 (or fewer) dims via the API's dimensions parameter
  • Normalization: OpenAI embedding models return unit-length vectors, so dot product equals cosine similarity
  • Batch size: 100-500 texts per API call for efficiency
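
Truncation can also be done client-side on vectors you already have; re-normalize afterward so cosine math still holds. A sketch (the random vector stands in for a real 3072-dim embedding):

```python
import numpy as np

def truncate_embedding(vec: list[float], dims: int) -> np.ndarray:
    """Keep the first `dims` components, then re-normalize to unit length."""
    v = np.array(vec[:dims])
    return v / np.linalg.norm(v)

full = np.random.default_rng(0).normal(size=3072)   # stand-in embedding
small = truncate_embedding(full.tolist(), 1536)
# small has 1536 dims and unit norm, so it drops into existing cosine code
```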

Common Mistakes

  • Embedding queries differently than documents
  • Not chunking long documents (context gets lost)
  • Using wrong similarity metric (cosine vs euclidean)
  • Re-embedding unchanged content (cache embeddings)
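
To avoid the last mistake, key a cache on a hash of the model name and text. A minimal in-memory sketch (`fake_embed` is a stand-in for a real API call; a Redis-backed store works the same way):

```python
import hashlib

class EmbeddingCache:
    """Cache embeddings keyed by a hash of (model, text)."""

    def __init__(self, embed_fn):
        self._embed = embed_fn
        self._store: dict[str, list[float]] = {}
        self.misses = 0

    def get(self, model: str, text: str) -> list[float]:
        key = hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed(text)
        return self._store[key]

fake_embed = lambda text: [float(len(text))]  # placeholder, not a real model
cache = EmbeddingCache(fake_embed)
cache.get("m", "hello")
cache.get("m", "hello")  # served from cache: still one miss
```

Hashing the model name into the key matters: the same text embedded by different models produces incompatible vectors.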

Advanced Patterns

See references/advanced-patterns.md for:

  • Late Chunking: Embed full document, extract chunk vectors from contextualized tokens
  • Batch API: Production batching with rate limiting and retry
  • Embedding Cache: Redis-based caching to avoid re-embedding
  • Matryoshka Embeddings: Dimension reduction with text-embedding-3

Related Skills

  • rag-retrieval - Using embeddings for RAG pipelines
  • hyde-retrieval - Hypothetical document embeddings for vocabulary mismatch
  • contextual-retrieval - Anthropic's context-prepending technique
  • reranking-patterns - Cross-encoder reranking for precision
  • ollama-local - Local embeddings with nomic-embed-text

Capability Details

text-to-vector

Keywords: embedding, text to vector, vectorize, embed text

Solves:

  • Convert text to vector embeddings
  • Choose appropriate embedding models
  • Handle embedding API integration

semantic-search

Keywords: semantic search, vector search, similarity search, find similar

Solves:

  • Implement semantic search over documents
  • Configure similarity thresholds
  • Rank results by relevance

chunking-strategies

Keywords: chunk, chunking, split, text splitting, overlap

Solves:

  • Split documents into optimal chunks
  • Configure chunk size and overlap
  • Preserve semantic boundaries

batch-embedding

Keywords: batch, bulk embed, parallel embedding, batch processing

Solves:

  • Embed large document collections efficiently
  • Handle rate limits and retries
  • Optimize embedding costs

local-embeddings

Keywords: local, ollama, self-hosted, on-premise, offline

Solves:

  • Run embeddings locally with Ollama
  • Deploy self-hosted embedding models
  • Reduce API costs with local models