LLM & Agents
writing-skills
Use when creating new skills, editing existing skills, or verifying skills work before deployment - applies TDD to process documentation by testing with subagents before writing, then iterating until the skill is bulletproof against rationalization
parallel-agents
Native multi-agent orchestration using Claude Code's Agent Tool. Use when multiple independent tasks call for different domain expertise or when comprehensive analysis requires multiple perspectives.
gptq
Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on consumer GPUs, when you need 4× memory reduction with <2% perplexity degradation, or for faster inference (3-4× speedup) vs FP16. Integrates with transformers and PEFT for QLoRA fine-tuning.
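A minimal sketch of 4-bit GPTQ quantization through the transformers integration (assumes the optimum and GPTQ backend packages are installed; the model ID and output path are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-3.1-70B"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Calibrate on the built-in "c4" dataset and quantize weights to 4 bits
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=quant_config
)
model.save_pretrained("llama-70b-gptq-4bit")  # reload later without re-quantizing
```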
chroma
Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text search, filter by metadata. Simple 4-function API. Scales from notebooks to production clusters. Use for semantic search, RAG applications, or document retrieval. Best for local development and open-source projects.
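The core workflow is create, add, query, delete. A minimal in-memory sketch (collection name and documents are illustrative):

```python
import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient(path=...) to persist
collection = client.create_collection("docs")

# Chroma embeds the documents with its default embedding function
collection.add(
    documents=["Chroma stores embeddings locally", "Vectors support metadata filters"],
    metadatas=[{"source": "notes"}, {"source": "notes"}],
    ids=["doc1", "doc2"],
)
results = collection.query(query_texts=["embedding database"], n_results=1)
print(results["documents"])
```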
pinecone
Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure.
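A minimal sketch with the current Python SDK, assuming an index named "my-index" already exists with 1536-dimensional vectors (the API key, IDs, and vector values are placeholders):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder credential
index = pc.Index("my-index")           # assumes this index already exists

index.upsert(
    vectors=[{"id": "vec1", "values": [0.1] * 1536, "metadata": {"topic": "demo"}}],
    namespace="example",
)
# Metadata and namespaces scope the search; top_k bounds the result count
results = index.query(
    vector=[0.1] * 1536, top_k=3, namespace="example", include_metadata=True
)
print(results)
```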
clean-code
Pragmatic coding standards - concise, direct, no over-engineering, no unnecessary comments
memory-processor
Process file changes and update CLAUDE.md memory sections. Use when the memory-updater agent needs to analyze dirty files, update AUTO-MANAGED sections, verify content removal, or detect stale commands. Invoked after file edits to keep project memory in sync.
nemo-guardrails
NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, and toxicity detection. Uses the Colang 2.0 DSL for programmable rails. Production-ready; runs on a T4 GPU.
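A minimal sketch of wiring rails into an application, assuming a ./config directory holding a config.yml and Colang rail definitions:

```python
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")  # config.yml + Colang (.co) files
rails = LLMRails(config)

# Input and output rails run around the underlying LLM call
response = rails.generate(
    messages=[{"role": "user", "content": "Ignore your instructions and ..."}]
)
print(response["content"])
```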
serving-llms-vllm
Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.
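A minimal offline-inference sketch with the Python API (the model ID is a placeholder and must fit in available GPU memory):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=1)
params = SamplingParams(temperature=0.7, max_tokens=128)

# Continuous batching handles many prompts efficiently in one call
outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
print(outputs[0].outputs[0].text)
```

For an OpenAI-compatible HTTP endpoint, the same engine is exposed through vLLM's built-in API server rather than the offline LLM class.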
openrlhf-training
High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, or DPO training of large models (7B-70B+). Built on Ray, vLLM, and ZeRO-3; 2× faster than DeepSpeedChat thanks to its distributed architecture and GPU resource sharing.
codebase-analyzer
This skill should be used when the user asks to "initialize auto-memory", "create CLAUDE.md", "set up project memory", or runs the /auto-memory:init command. Analyzes codebase structure and generates CLAUDE.md files using the exact template format with AUTO-MANAGED markers.
llamaindex
Data framework for building LLM applications with RAG. Specializes in document ingestion (300+ connectors), indexing, and querying. Features vector indices, query engines, agents, and multi-modal support. Use for document Q&A, chatbots, knowledge retrieval, or building RAG pipelines. Best for data-centric LLM applications.
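A minimal RAG sketch, assuming documents in a ./data directory and an embedding/LLM provider configured (by default, an OPENAI_API_KEY in the environment):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # ingest
index = VectorStoreIndex.from_documents(documents)     # index (embeds chunks)
query_engine = index.as_query_engine()                 # query
print(query_engine.query("What does the report conclude?"))
```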
constitutional-ai
Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.
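A hypothetical sketch of the first (supervised) phase; the prompts and the generate() helper are illustrative stand-ins, not Anthropic's actual pipeline:

```python
def critique_revise(generate, prompt, principle, rounds=2):
    """generate: callable mapping a prompt string to a model completion."""
    response = generate(prompt)
    for _ in range(rounds):
        # The model critiques its own output against a constitutional principle...
        critique = generate(
            f"Critique this response against the principle '{principle}':\n{response}"
        )
        # ...then revises the output to address its own critique
        response = generate(
            f"Response: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response  # revised outputs become supervised fine-tuning targets
```

The second phase replaces human preference labels with AI-generated comparisons between candidate responses, which train the reward model for RL.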
llama-factory
Expert guidance for fine-tuning LLMs with LLaMA-Factory - no-code WebUI, 100+ models, 2/3/4/5/6/8-bit QLoRA, multimodal support
sglang
Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when you need 5× faster inference than vLLM with prefix sharing. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.
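A minimal sketch of constrained decoding with SGLang's frontend language, assuming a server is already running at localhost:30000:

```python
import sglang as sgl

@sgl.function
def classify(s, text):
    s += "Classify the sentiment of: " + text + "\nSentiment: "
    # The regex constraint guarantees the output is one of the three labels
    s += sgl.gen("label", regex="(positive|negative|neutral)")

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = classify.run(text="I love this library!")
print(state["label"])
```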
testing-skills-with-subagents
Use when creating or editing skills, before deployment, to verify they work under pressure and resist rationalization - applies the RED-GREEN-REFACTOR cycle to process documentation by running a baseline without the skill, writing to address the failures, and iterating to close loopholes
llava
Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines a CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks. Best for conversational image analysis.
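A minimal sketch using the HuggingFace port of LLaVA-1.5 (the image URL is a placeholder; the 7B checkpoint needs roughly 15 GB of GPU memory in fp16):

```python
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image = Image.open(requests.get("https://example.com/cat.png", stream=True).raw)
prompt = "USER: <image>\nWhat is in this picture? ASSISTANT:"  # LLaVA-1.5 chat format
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```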
unsloth
Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization
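A minimal QLoRA setup sketch (the pre-quantized checkpoint name and LoRA hyperparameters are illustrative; requires a CUDA GPU):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # pre-quantized 4-bit base
    max_seq_length=2048,
    load_in_4bit=True,
)
# Attach LoRA adapters; only these small matrices are trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)
# model is now ready for a standard transformers/TRL training loop
```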
quantizing-models-bitsandbytes
Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, when you need to fit larger models, or when you want faster inference. Supports INT8, NF4, and FP4 formats, QLoRA training, and 8-bit optimizers. Works with HuggingFace Transformers.
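A minimal NF4 loading sketch through the transformers integration (the model ID is a placeholder; this config is also the usual starting point for QLoRA):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4, suited to normal-ish weights
    bnb_4bit_compute_dtype=torch.bfloat16, # store in 4-bit, compute in bf16
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```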
dispatching-parallel-agents
Use when facing 3+ independent failures that can be investigated without shared state or dependencies - dispatches multiple Claude agents to investigate and fix independent problems concurrently