LLM & Agents
6763 skills in Data & AI > LLM & Agents
llm-evaluation
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
langchain-patterns
LangChain implementation patterns with templates, scripts, and examples for RAG pipelines
skill-manager
Manage your installed Claude Code skills - install, update, rename, uninstall, and list skills from GitHub URLs. Use when the user wants to install a skill, update a skill, list installed skills, rename a skill, remove/delete/uninstall a skill, or provides a GitHub URL to a skills directory.
rl-environments
Gym/Gymnasium API - custom environments, spaces, wrappers, vectorization, and debugging
ui-design-agent
Creates UI/UX design patterns, components, and user experience guidelines
ds-star
Multi-agent data science framework using DS-STAR (Data Science - Structured Thought and Action) architecture. Automates data analysis through collaborative AI agents with multi-model support (Haiku, Sonnet, Opus). Use for exploratory data analysis, automated insights, and iterative data science workflows.
tauri-ipc-developer
Specialized agent for implementing type-safe IPC between the React frontend and Rust backend in Tauri v2 applications. Use when adding new Tauri commands, implementing bidirectional events, debugging IPC serialization issues, or optimizing command performance.
goap-agent
Invoke for complex multi-step tasks requiring intelligent planning and multi-agent coordination. Use when tasks need decomposition, dependency mapping, parallel/sequential/swarm/iterative execution strategies, or coordination of multiple specialized agents with quality gates and dynamic optimization.
nexus-prompt-engineer
4-D prompt engineering assistant that transforms vague requirements into high-precision prompts through guided interaction. Trigger when users need to: (1) craft high-quality system prompts, (2) optimize existing prompts, (3) use '/fast' for quick generation or '/audit' for prompt review. Applicable to any scenario requiring carefully designed prompts.
create-new-skills
Creates new Agent Skills for Claude Code following best practices and documentation. Use when the user wants to create a new skill, extend Claude's capabilities, or package domain expertise into a reusable skill.
reviewing-with-claude
Performs a quick review within the current Claude session. Immediately evaluates code quality, security, and performance while preserving context.
context-store
Context Store - Document management system for storing, querying, and retrieving documents across Claude Code sessions. Use this to maintain knowledge bases and share documents between agents. Whenever you encounter a <document id=*> in a session, use this skill to retrieve its content.
langgraph
Expert guidance for building stateful, multi-actor AI agents with LangGraph - graphs, nodes, edges, state management, and agent architectures.
pont-de-londres
Integration pattern for linking a domain graph (structured, derived from a CSV) to a lexical graph (extracted automatically from unstructured documents via an LLM). Use this skill when Claude needs to build a hybrid Knowledge Graph combining structured data and automatic extraction, particularly with neo4j-graphrag and SimpleKGPipeline. Use cases: GraphRAG, ingesting PDFs with metadata, building Knowledge Graphs from heterogeneous sources.
observability
Real-time monitoring dashboard for PAI multi-agent activity. USE WHEN user says 'start observability', 'stop dashboard', 'restart observability', 'monitor agents', 'show agent activity', or needs to debug multi-agent workflows.
revision-agent
Specialized agent for systematic prose revision using the 3-column method and house-rulebook enforcement. Reviews structure, style, and mechanics top-down. Use when the user asks to "revise", "edit", or "improve prose", or explicitly invokes the revision agent.
ai-evaluation-suite
Comprehensive AI/LLM evaluation toolkit for production AI systems. Covers LLM output quality, prompt engineering, RAG evaluation, agent performance, hallucination detection, bias assessment, cost/token optimization, latency metrics, model comparison, and fine-tuning evaluation. Includes BLEU/ROUGE metrics, perplexity, F1 scores, LLM-as-judge patterns, and benchmarks like MMLU and HumanEval.
claude-skill-bash
Apply comprehensive bash scripting standards including main function pattern, usage documentation, argument parsing, dependency checking, and error handling. Triggers when creating/editing .sh files, bash scripts, or discussing shell scripting, deployment scripts, automation tasks, or bash conventions.
claude-mcp-expert
Expert on Model Context Protocol (MCP) integration, MCP servers, installation, configuration, and authentication. Triggers when user mentions MCP, MCP servers, installing MCP, connecting tools, MCP resources, MCP prompts, or remote/local MCP servers.
unify
Validate spec-implementation-test alignment and convergence. Checks spec completeness, implementation conformance, test coverage, and contract consistency. Use after implementation and tests are complete.