LLM & Agents

6763 skills in Data & AI > LLM & Agents

gptq

Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on consumer GPUs, when you need 4× memory reduction with <2% perplexity degradation, or for faster inference (3-4× speedup) vs FP16. Integrates with transformers and PEFT for QLoRA fine-tuning.

zechenzhangAGI/AI-research-SKILLs

Updated 6d ago

chroma

Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text search, filter by metadata. Simple 4-function API. Scales from notebooks to production clusters. Use for semantic search, RAG applications, or document retrieval. Best for local development and open-source projects.

zechenzhangAGI/AI-research-SKILLs

Updated 6d ago

pinecone

Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure.

zechenzhangAGI/AI-research-SKILLs

Updated 6d ago

clean-code

Pragmatic coding standards: concise, direct, no over-engineering, no unnecessary comments.

xenitV1/claude-code-maestro

Updated 6d ago

memory-processor

Process file changes and update CLAUDE.md memory sections. Use when the memory-updater agent needs to analyze dirty files, update AUTO-MANAGED sections, verify content removal, or detect stale commands. Invoked after file edits to keep project memory in sync.

severity1/claude-code-auto-memory

Updated 6d ago

nemo-guardrails

NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, and toxicity detection. Uses the Colang 2.0 DSL for programmable rails. Production-ready; runs on a T4 GPU.

zechenzhangAGI/AI-research-SKILLs

Updated 6d ago

serving-llms-vllm

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.

zechenzhangAGI/AI-research-SKILLs

Updated 6d ago

constitutional-ai

Anthropic's method for training harmless AI through self-improvement. Two-phase approach: supervised learning with self-critique and revision, then RLAIF (RL from AI Feedback). Use for safety alignment or for reducing harmful outputs without human labels. Powers Claude's safety system.

zechenzhangAGI/AI-research-SKILLs

Updated 6d ago

llama-factory

Expert guidance for fine-tuning LLMs with LLaMA-Factory: no-code WebUI, 100+ models, 2/3/4/5/6/8-bit QLoRA, multimodal support.

zechenzhangAGI/AI-research-SKILLs

Updated 6d ago

sglang

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when you need 5× faster inference than vLLM with prefix sharing. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.

zechenzhangAGI/AI-research-SKILLs

Updated 6d ago

testing-skills-with-subagents

Use when creating or editing skills, before deployment, to verify that they work under pressure and resist rationalization. Applies the RED-GREEN-REFACTOR cycle to process documentation: run a baseline without the skill, write the skill to address the failures, then iterate to close loopholes.

mneves75/dnschat

Updated 6d ago

llava

Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks. Best for conversational image analysis.

zechenzhangAGI/AI-research-SKILLs

Updated 6d ago

quantizing-models-bitsandbytes

Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, when you need to fit larger models, or when you want faster inference. Supports INT8, NF4, and FP4 formats, QLoRA training, and 8-bit optimizers. Works with HuggingFace Transformers.

zechenzhangAGI/AI-research-SKILLs

Updated 6d ago

dispatching-parallel-agents

Use when facing 3+ independent failures that can be investigated without shared state or dependencies. Dispatches multiple Claude agents to investigate and fix the independent problems concurrently.

mneves75/dnschat

Updated 6d ago

app-builder

Main application building orchestrator. Creates full-stack applications from natural language requests. Determines project type, selects tech stack, coordinates agents. Use for creating new applications, scaffolding projects, or building features from scratch.

xenitV1/claude-code-maestro

Updated 6d ago

unsloth

Expert guidance for fast fine-tuning with Unsloth: 2-5× faster training, 50-80% less memory, LoRA/QLoRA optimization.

zechenzhangAGI/AI-research-SKILLs

Updated 6d ago

geo-fundamentals

Generative Engine Optimization for AI search engines (ChatGPT, Claude, Perplexity).

xenitV1/claude-code-maestro

Updated 6d ago

tdd-workflow

Test-Driven Development workflow principles. RED-GREEN-REFACTOR cycle.

xenitV1/claude-code-maestro

Updated 6d ago

subagent-driven-development

Use when executing implementation plans with independent tasks in the current session. Dispatches a fresh subagent for each task, with code review between tasks, enabling fast iteration with quality gates.

mneves75/dnschat

Updated 6d ago

clip

OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.

zechenzhangAGI/AI-research-SKILLs

Updated 6d ago