自然語言處理
1693 skills in 數據與 AI > 自然語言處理
writing-plans
Use when design is complete and you need detailed implementation tasks for engineers with zero codebase context - creates comprehensive implementation plans with exact file paths, complete code examples, and verification steps assuming engineer has minimal domain knowledge
nemo-curator
GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora.
llamaguard
Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, Sagemaker. Integrates with NeMo Guardrails.
sentencepiece
Language-independent tokenizer treating text as raw Unicode. Supports BPE and Unigram algorithms. Fast (50k sentences/sec), lightweight (6MB memory), deterministic vocabulary. Used by T5, ALBERT, XLNet, mBART. Train on raw text without pre-tokenization. Use when you need multilingual support, CJK languages, or reproducible tokenization.
mcp-builder
MCP (Model Context Protocol) server building principles. Tool design, resource patterns, best practices.
fine-tuning-with-trl
Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when need RLHF, align model with preferences, or train from human feedback. Works with HuggingFace Transformers.
sentence-transformers
Framework for state-of-the-art sentence, text, and image embeddings. Provides 5000+ pre-trained models for semantic similarity, clustering, and retrieval. Supports multilingual, domain-specific, and multimodal models. Use for generating embeddings for RAG, semantic search, or similarity tasks. Best for production embedding generation.
whisper
OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.
ppage
Captures session learnings, decisions, and context to a markdown file for future agent ramp-up. Use when user says "ppage", "page context", "save context", "capture learnings", or before ending a substantial work session.
oracle-codex
This skill should be used when the user asks to "use Codex", "ask Codex", "consult Codex", "Codex review", "use GPT for planning", "ask GPT to review", "get GPT's opinion", "what does GPT think", "second opinion on code", "consult the oracle", "ask the oracle", or mentions using an AI oracle for planning or code review. NOT for implementation tasks.
mcpc
Use mcpc CLI to interact with MCP servers - call tools, read resources, get prompts. Use this when working with Model Context Protocol servers, calling MCP tools, or accessing MCP resources programmatically.
add-route-context
为Flutter页面添加路由上下文记录功能,支持日期等参数的AI上下文识别。当需要让AI助手通过"询问当前上下文"功能获取页面状态(如日期、ID等参数)时使用。适用场景:(1) 日期驱动的页面(日记、活动、日历等),(2) ID驱动的页面(用户详情、订单详情等),(3) 任何需要AI理解当前页面参数的场景
ecto-thinking
This skill should be used when the user asks to "add a database table", "create a new context", "query the database", "add a field to a schema", "validate form input", "fix N+1 queries", "preload this association", "separate these concerns", or mentions Repo, changesets, migrations, Ecto.Multi, has_many, belongs_to, transactions, query composition, or how contexts should talk to each other.
Fact Check
This skill should be used when the user asks to "verify claims", "fact check", "validate documentation", "check sources", or needs verification of external source references. Provides patterns for systematic fact verification using Context7 and WebSearch.
ai-pattern-detection
Detects AI-generated writing patterns and suggests authentic alternatives. Auto-applies when reviewing content, editing documents, generating text, or when user mentions writing quality, AI detection, authenticity, or natural voice. Use when relevant to the task.
ai-pattern-detection
Detects AI-generated writing patterns and suggests authentic alternatives. Auto-applies when reviewing content, editing documents, generating text, or when user mentions writing quality, AI detection, authenticity, or natural voice.
mcp-builder
Build Model Context Protocol (MCP) servers and tools that extend Claude's capabilities with custom functions, data sources, and integrations. Use when creating custom MCP servers, implementing tools for Claude, building integrations with external services, creating data source connectors, implementing custom functions, or extending Claude's capabilities with domain-specific tools.
ui-first-builder
Creates production-ready UI immediately from any description. Generates complete pages, components, and realistic mock data in FIRST response. Uses Next.js 14 + Tailwind + shadcn/ui. Never asks questions - infers everything from context. Triggers: UI creation, page building, component generation, build interface, screen design, layout requests.
smart-routing
Intelligent request routing for /toh command. Analyzes user intent, assesses confidence, detects IDE environment, and routes to the appropriate agent(s). Memory-first approach ensures context awareness. Triggers: /toh command, natural language requests, ambiguous inputs.
design-mastery
World-class design system with extensible business type registry. Automatically selects appropriate design patterns based on business context. Anti-AI detection, trend-aware, production-ready design decisions. CRITICAL: Must be read before any UI creation task.