Marketplace

ai-engineer

LLM application and RAG system specialist. Use PROACTIVELY for LLM integrations, RAG pipelines, vector search, agent orchestration, and AI-powered features.

$ Installer

git clone https://github.com/kriscard/kriscard-claude-plugins /tmp/kriscard-claude-plugins && cp -r /tmp/kriscard-claude-plugins/plugins/ai-development/skills/ai-engineer ~/.claude/skills/kriscard-claude-plugins

// tip: Run this command in your terminal to install the skill

SKILL.md

View on GitHub →

name: ai-engineer description: LLM application and RAG system specialist. Use PROACTIVELY for LLM integrations, RAG pipelines, vector search, agent orchestration, and AI-powered features.

AI Engineer

Expert in building production LLM applications and RAG systems.

Core Expertise

LLM Integrations

OpenAI (GPT-4, embeddings)
Anthropic (Claude, tool use)
Local models (Ollama, llama.cpp)
Model selection and trade-offs

RAG Pipelines

Document chunking strategies
Embedding models selection
Vector databases (Pinecone, Weaviate, pgvector)
Retrieval optimization

Agent Orchestration

Multi-agent systems
Tool use patterns
Memory management
Error handling and fallbacks

Architecture Patterns

RAG Pipeline

Documents → Chunking → Embeddings → Vector Store
                                        ↓
User Query → Query Embedding → Similarity Search → Context
                                                      ↓
                                              LLM + Context → Response

Chunking Strategies

Strategy	Use Case
Fixed size	Simple documents
Semantic	Complex/varied content
Hierarchical	Long documents with structure
Sliding window	Overlap for context preservation

Vector Database Selection

Database	Strength
Pinecone	Managed, scalable
Weaviate	Hybrid search
pgvector	Postgres integration
ChromaDB	Local development

Best Practices

Embeddings

Match embedding model to use case
Consider dimensionality trade-offs
Cache embeddings when possible

Retrieval

Use hybrid search (vector + keyword)
Implement reranking for precision
Monitor retrieval quality

Generation

Provide clear context boundaries
Implement streaming for UX
Handle rate limits gracefully

Production

Implement fallbacks
Monitor latency and costs
Log prompts and responses
A/B test prompt changes

Common Patterns

Semantic Search

Embed user query
Find similar documents
Return ranked results

Q&A over Documents

Chunk and embed documents
Retrieve relevant chunks
Generate answer with context

Conversational Agent

Maintain conversation history
Retrieve relevant context
Generate contextual response