ai-rag
Complete RAG and search engineering skill. Covers chunking strategies, hybrid retrieval (BM25 + vector), cross-encoder reranking, query rewriting, ranking pipelines, nDCG/MRR evaluation, and production search systems. Modern patterns for retrieval-augmented generation and semantic search.
$ Install
git clone https://github.com/vasilyu1983/AI-Agents-public /tmp/AI-Agents-public && cp -r /tmp/AI-Agents-public/frameworks/claude-code-kit/framework/skills/ai-rag ~/.claude/skills/
Tip: run this command in your terminal to install the skill.
RAG & Search Engineering – Complete Reference
Build production-grade retrieval systems with modern hybrid search patterns.
This skill covers:
- RAG: Chunking, contextual retrieval, grounding, adaptive/self-correcting systems
- Search: BM25, vector search, hybrid fusion, ranking pipelines
- Evaluation: Recall@K, nDCG, MRR, groundedness metrics
- Chunking strategies: Page-level chunking (0.648 accuracy, highest in NVIDIA benchmarks)
- Contextual Retrieval: Anthropic's 2024 technique (67% reduction in retrieval failures, with prompt caching)
- Hybrid retrieval: Lexical (BM25) + vector + cross-encoder reranking
- Reranking: Cross-encoder (ms-marco-TinyBERT-L-2-v2, 4.3M params, outperforms larger models)
- RAG evaluation: Recall@K, Precision@K, nDCG, groundedness, verbosity, instruction following
- Modern paradigm shift: Adaptive, multimodal, self-correcting systems (static RAG is over)
Key Insights:
- Page-level chunking achieved highest accuracy (0.648) with lowest variance
- Contextual Retrieval reduces retrieval failures by 67% when combined with reranking
- Semantic chunking improves recall by up to 9% over simpler methods
- Hybrid retrieval + reranking drastically improves accuracy
- The era of static RAG is over: adaptive, self-correcting retrieval is now mainstream
It focuses on doing, not explaining theory.
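In that spirit, the lexical leg of hybrid retrieval mentioned above is BM25. This is a self-contained sketch of Okapi BM25 scoring over toy tokenized documents (k1=1.5 and b=0.75 are the conventional defaults; production systems should use a tuned implementation such as Elasticsearch or the rank_bm25 library rather than this):

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]], k1=1.5, b=0.75) -> list[float]:
    """Score each tokenized doc against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N                 # average doc length
    df = Counter(t for d in docs for t in set(d))         # document frequency per term
    tfs = [Counter(d) for d in docs]                      # term frequency per doc
    scores = []
    for tf, doc in zip(tfs, docs):
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            s += idf * norm
        scores.append(s)
    return scores

docs = [
    "hybrid search fuses bm25 and vectors".split(),
    "vectors embed semantic meaning".split(),
    "bm25 is a lexical ranking function".split(),
]
scores = bm25_scores("bm25 ranking".split(), docs)
assert scores.index(max(scores)) == 2   # doc 2 matches both query terms
```

Rare terms get higher IDF, and the `b` term damps scores for longer-than-average documents.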
Scope note: this skill covers both retrieval algorithm tuning (BM25/HNSW/hybrid, query rewriting) and RAG-specific packaging, context injection, and grounded generation.
Quick Reference
| Task | Tool/Framework | Command/Pattern | When to Use |
|---|---|---|---|
| Chunking | Page-level, Semantic | RecursiveCharacterTextSplitter (400-512) | 0.648 accuracy, 85-90% recall |
| Contextual Retrieval | Anthropic Claude | Generate chunk context + prompt caching | 67% failure reduction, $1.02/M tokens |
| Hybrid Retrieval | BM25 + Vector | LlamaIndex, LangChain, Haystack | Significant relevance benefits (modern standard) |
| Reranking | Cross-encoder | ms-marco-TinyBERT-L-2-v2 (4.3M params) | Drastically improves accuracy, <100ms |
| Vector Index | HNSW, IVF | FAISS, Pinecone, Qdrant, Weaviate | <10M: HNSW, >10M: IVF/ScaNN |
| Evaluation | RAGAS, TruLens | Recall@K, nDCG, groundedness metrics | Quality validation, A/B testing |
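The RecursiveCharacterTextSplitter pattern in the table splits on coarse separators first and falls back to finer ones when a piece is still too large. A minimal dependency-free sketch of that idea (not the LangChain implementation; sizes here are characters rather than the 400-512 tokens above, and the sample document is invented):

```python
def recursive_split(text: str, chunk_size: int = 512, seps=("\n\n", "\n", " ")) -> list[str]:
    """Split on the coarsest separator, recursing to finer separators
    (and finally hard character cuts) for pieces over chunk_size."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not seps:  # no separators left: hard character cut
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = seps[0], seps[1:]
    chunks, buf = [], ""
    for piece in text.split(sep):
        candidate = buf + sep + piece if buf else piece
        if len(candidate) <= chunk_size:
            buf = candidate                 # keep packing pieces into the current chunk
        else:
            if buf:
                chunks.append(buf)
            if len(piece) > chunk_size:     # piece alone is too big: recurse finer
                chunks.extend(recursive_split(piece, chunk_size, rest))
                buf = ""
            else:
                buf = piece
    if buf:
        chunks.append(buf)
    return chunks

doc = "Paragraph one about retrieval.\n\n" + "Paragraph two about ranking. " * 20
chunks = recursive_split(doc, chunk_size=120)
assert all(len(c) <= 120 for c in chunks)
```

Chunk overlap, which the real splitter also supports, is omitted to keep the sketch short.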
Decision Tree: RAG Architecture Selection
Building RAG system: [Architecture Path]
├─ Document type?
│  ├─ Page-structured? → Page-level chunking (0.648 accuracy, lowest variance)
│  ├─ Technical docs? → Semantic chunking (9% recall improvement)
│  └─ Simple content? → RecursiveCharacterTextSplitter (400-512, 85-90% recall)
│
├─ Retrieval accuracy low?
│  ├─ Multi-entity docs? → Contextual Retrieval (67% failure reduction)
│  ├─ Noisy results? → Cross-encoder reranking (TinyBERT, <100ms)
│  └─ Mixed queries? → Hybrid retrieval (BM25 + vector + reranking)
│
├─ Dataset size?
│  ├─ <100k chunks? → Flat index (exact search)
│  ├─ 100k-10M? → HNSW (low latency)
│  └─ >10M? → IVF/ScaNN/DiskANN (scalable)
│
└─ Production quality?
   └─ Full pipeline: Page-level + Contextual + Hybrid + Reranking → Optimal accuracy
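The hybrid branch of the tree needs a way to fuse the BM25 and vector result lists. Reciprocal Rank Fusion (RRF) is a common score-free choice; a minimal sketch (k=60 is the constant from the original RRF paper, and the ranked lists here are illustrative):

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion:
    score(d) = sum over lists of 1 / (k + rank_of_d_in_that_list)."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["d3", "d1", "d7"]   # lexical ranking
vector_hits = ["d1", "d5", "d3"]   # dense ranking
fused = rrf_fuse([bm25_hits, vector_hits])
assert fused[:2] == ["d1", "d3"]   # docs in both lists outrank single-list hits
```

Because RRF uses only ranks, it sidesteps the problem of BM25 and cosine scores living on incompatible scales.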
When to Use This Skill
Claude should invoke this skill when the user asks:
- "Help me design a RAG pipeline."
- "How should I chunk this document?"
- "Optimize retrieval for my use case."
- "My RAG system is hallucinating â fix it."
- "Choose the right vector database / index type."
- "Create a RAG evaluation framework."
- "Debug why retrieval gives irrelevant results."
Related Skills
For adjacent topics, reference these skills:
- ai-llm - Prompting, fine-tuning, instruction datasets; agentic workflows, multi-agent systems, LLM orchestration
- ai-llm-inference - Serving performance, quantization, batching
- ai-mlops - Security, privacy, PII handling; deployment, monitoring, data pipelines
- ai-prompt-engineering - Prompt patterns for RAG generation phase
Detailed Guides
Core RAG Architecture
- Pipeline Architecture - End-to-end RAG pipeline structure, ingestion, freshness, index hygiene, embedding selection
- Chunking Strategies - Modern benchmarks (page-level 0.648 accuracy, semantic, RecursiveCharacterTextSplitter 400-512)
- Index Selection Guide - Vector database configuration, HNSW/IVF/Flat selection, parameter tuning
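For the <100k-chunk case in the index guide, a flat (exact) index is just brute-force similarity over every vector, and makes a useful correctness baseline before tuning HNSW. A dependency-free sketch with toy 3-d embeddings (real embeddings are hundreds of dimensions, and a library like FAISS would do this vectorized):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def flat_search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Exact nearest-neighbour search: score every vector, return top_k IDs."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

index = {
    "chunk-a": [0.9, 0.1, 0.0],
    "chunk-b": [0.0, 1.0, 0.0],
    "chunk-c": [0.8, 0.2, 0.1],
}
assert flat_search([1.0, 0.0, 0.0], index)[0] == "chunk-a"
```

HNSW and IVF trade this exactness for sublinear query time; the flat baseline tells you how much recall that trade costs.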
Advanced Retrieval Techniques
- Retrieval Patterns - Dense retrieval, hybrid search, query preprocessing, reranking workflow, metadata filtering
- Contextual Retrieval Guide - Anthropic's 2024 technique (67% failure reduction), prompt caching, implementation
- Grounding Checklists - Context compression, hallucination control, citation patterns, answerability validation
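The retrieve-then-rerank workflow referenced above scores each (query, candidate) pair and re-sorts the first-stage results. In production the scorer would be a cross-encoder such as ms-marco-TinyBERT-L-2-v2; here a token-overlap function stands in for the model so the sketch stays dependency-free, and the candidate docs are invented:

```python
def overlap_score(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder: fraction of query tokens present in the doc.
    Swap in a real model's pairwise score for production use."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def rerank(query: str, candidates: list[str], top_k: int = 2) -> list[str]:
    """Re-sort first-stage candidates by pairwise (query, doc) relevance."""
    return sorted(candidates, key=lambda d: overlap_score(query, d), reverse=True)[:top_k]

candidates = [
    "pricing page for the enterprise plan",
    "how to rotate api keys safely",
    "api keys overview and quota limits",
]
top = rerank("rotate api keys", candidates)
assert top[0] == "how to rotate api keys safely"
```

The shape is what matters: cheap first-stage retrieval over millions of chunks, then an expensive pairwise scorer over only the top few dozen candidates.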
Production & Evaluation
- RAG Evaluation Guide - Recall@K, nDCG, groundedness, RAGAS/TruLens, A/B testing, sliced evaluation
- Advanced RAG Patterns - Graph/multimodal RAG, online evaluation, telemetry, shadow/canary testing, adaptive retrieval
- RAG Troubleshooting - Failure mode triage, debugging irrelevant results, hallucination fixes
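The evaluation guide's Recall@K and nDCG can each be computed in a few lines. A sketch with a toy judged ranking (binary relevance for recall, graded gains for nDCG; note this normalizes over the returned gains only, whereas a full harness normalizes over all judged documents):

```python
import math

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant docs that appear in the top-k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def ndcg_at_k(gains: list[float], k: int) -> float:
    """nDCG@k for graded relevance gains listed in rank order."""
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = sum(g / math.log2(i + 2) for i, g in enumerate(sorted(gains, reverse=True)[:k]))
    return dcg / ideal if ideal > 0 else 0.0

retrieved = ["d2", "d9", "d1", "d4"]
relevant = {"d1", "d2", "d3"}
assert recall_at_k(retrieved, relevant, 3) == 2 / 3   # d2 and d1 found, d3 missed

assert ndcg_at_k([3, 2, 1, 0], k=4) == 1.0            # ideal ordering
assert ndcg_at_k([3, 0, 2, 1], k=4) < 1.0             # imperfect ordering is penalized
```

Recall@K measures whether the retriever found the evidence at all; nDCG additionally penalizes putting the best evidence low in the ranking.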
Existing Detailed Patterns
- Chunking Patterns - Technical implementation details for all chunking approaches
- Retrieval Patterns - Low-level retrieval implementation patterns
Templates
Chunking & Ingestion
Embedding & Indexing
Retrieval & Reranking
Context Packaging & Grounding
Evaluation
Navigation
Resources
- resources/rag-evaluation-guide.md
- resources/rag-troubleshooting.md
- resources/contextual-retrieval-guide.md
- resources/pipeline-architecture.md
- resources/advanced-rag-patterns.md
- resources/chunking-strategies.md
- resources/grounding-checklists.md
- resources/index-selection-guide.md
- resources/retrieval-patterns.md
- resources/chunking-patterns.md
Templates
- templates/context/template-context-packing.md
- templates/context/template-grounding.md
- templates/chunking/template-basic-chunking.md
- templates/chunking/template-code-chunking.md
- templates/chunking/template-long-doc-chunking.md
- templates/retrieval/template-retrieval-pipeline.md
- templates/retrieval/template-hybrid-search.md
- templates/retrieval/template-reranking.md
- templates/eval/template-rag-eval.md
- templates/eval/template-rag-testset.jsonl
- templates/indexing/template-index-config.md
- templates/indexing/template-metadata-schema.md
Data
- data/sources.json – Curated external references
External Resources
See data/sources.json for:
- Embedding models (OpenAI, Cohere, Sentence Transformers, Voyage AI, Jina)
- Vector DBs (FAISS, Pinecone, Qdrant, Weaviate, Milvus, Chroma, pgvector, LanceDB)
- Hybrid search libraries (Elasticsearch, OpenSearch, Typesense, Meilisearch)
- Reranking models (Cohere Rerank, Jina Reranker, RankGPT, Flashrank)
- Evaluation frameworks (RAGAS, TruLens, DeepEval, BEIR)
- RAG frameworks (LlamaIndex, LangChain, Haystack, txtai)
- Advanced techniques (RAG Fusion, CRAG, Self-RAG, Contextual Retrieval)
- Production platforms (Vectara, AWS Kendra)
Use this skill whenever the user needs retrieval-augmented system design or debugging, not prompt work or deployment.
Repository
