Deep Learning
671 skills in Data & AI > Deep Learning
performing-systematic-debugging-for-stubborn-problems
Applies a modified Fagan Inspection methodology to systematically resolve persistent bugs and complex issues. Use when multiple previous fix attempts have failed, when dealing with intricate system interactions, or when methodical root cause analysis is needed. Do not use for simple troubleshooting; this skill triggers after repeated failed debugging attempts on the same complex issue.
optimizing-attention-flash
Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training or running transformers with long sequences (>512 tokens), when hitting GPU memory limits in attention, or when you need faster inference. Supports PyTorch native SDPA, the flash-attn library, H100 FP8, and sliding window attention.
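A minimal sketch of the PyTorch-native SDPA path mentioned above, assuming a CUDA device and fp16 tensors (all sizes are illustrative):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: batch 2, 8 heads, 1024 tokens, head dim 64.
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)

# SDPA dispatches to a Flash Attention kernel when shapes/dtypes allow.
# is_causal=True applies the causal mask without materializing the full
# (1024 x 1024) attention matrix, which is where the memory savings come from.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```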
deepspeed
Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention
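A hedged sketch of what a ZeRO stage 2 setup typically looks like; the model and config values below are stand-ins, not the skill's own defaults:

```python
import torch
import deepspeed

model = torch.nn.Linear(512, 512)  # stand-in model

ds_config = {
    "train_batch_size": 8,
    "bf16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "zero_optimization": {"stage": 2},  # stages 1/2/3 shard progressively more state
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(8, 512).to(model_engine.device)
loss = model_engine(x).pow(2).mean()
model_engine.backward(loss)  # DeepSpeed-managed backward
model_engine.step()          # optimizer step plus ZeRO bookkeeping
```

Launch with `deepspeed script.py` so DeepSpeed can spawn one process per GPU.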
pytorch-fsdp
Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP - parameter sharding, mixed precision, CPU offloading, FSDP2
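A minimal FSDP wrapping sketch, assuming a `torchrun` launch (the model is a stand-in):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

# Assumes `torchrun --nproc_per_node=N script.py`, which sets the env vars
# that init_process_group needs.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(512, 512).cuda()  # stand-in model
model = FSDP(
    model,
    # Compute runs in bf16 while the sharded master weights stay fp32.
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16),
)

loss = model(torch.randn(8, 512, device="cuda")).pow(2).mean()
loss.backward()
```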
rwkv-architecture
RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Trains like GPT (parallel), infers like an RNN (sequential). Linux Foundation AI project, used in production in Windows, Office, and NeMo. RWKV-7 released March 2025; models up to 14B parameters.
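To make the O(n) claim concrete, here is a deliberately simplified toy of the recurrent-state idea; these are not the actual RWKV-7 formulas, just the shape of linear-time inference:

```python
import torch

def toy_linear_attention_step(state, q, k, v, decay=0.95):
    # state: (d, d) running summary of all past key/value outer products.
    # One token in, one token out, O(d^2) per step regardless of history length.
    state = decay * state + torch.outer(k, v)
    out = q @ state
    return out, state

d = 64
state = torch.zeros(d, d)
for _ in range(5):  # sequential, RNN-style generation
    q, k, v = torch.randn(d), torch.randn(d), torch.randn(d)
    out, state = toy_linear_attention_step(state, q, k, v)
```

Because the state has a fixed size, per-token cost stays constant and nothing analogous to a KV cache accumulates.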
huggingface-accelerate
Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.
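The four lines are the `Accelerator` construction, the `prepare` call, and swapping `loss.backward()` for `accelerator.backward(loss)`; everything else is an ordinary PyTorch loop (the model and data below are placeholders):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()                        # 1
model = torch.nn.Linear(32, 32)
optimizer = torch.optim.AdamW(model.parameters())
loader = torch.utils.data.DataLoader(torch.randn(64, 32), batch_size=8)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)  # 2

for batch in loader:
    loss = model(batch).pow(2).mean()
    accelerator.backward(loss)                     # 3 (replaces loss.backward())
    optimizer.step()
    optimizer.zero_grad()
```

Run `accelerate config` once interactively, then launch with `accelerate launch script.py`.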
ray-train
Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.
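A sketch of the Ray Train entry point using its `TorchTrainer`; the worker count is illustrative:

```python
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_func():
    # Per-worker training loop goes here; Ray wires up torch.distributed for you.
    pass

trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
result = trainer.fit()
```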
pytorch-lightning
High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices.
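A minimal Lightning sketch; the module and data are toy stand-ins:

```python
import torch
import lightning as L

class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.net(x), y)  # Lightning runs backward/step

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

dataset = torch.utils.data.TensorDataset(torch.randn(64, 32), torch.randn(64, 1))
loader = torch.utils.data.DataLoader(dataset, batch_size=8)

trainer = L.Trainer(max_epochs=1)  # strategy="ddp" / "fsdp" / "deepspeed" picks the backend
trainer.fit(LitModel(), loader)
```

The same module code scales up by changing only the `Trainer` arguments.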
ray-data
Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.
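A sketch of a streaming preprocessing pipeline with Ray Data; the paths, column name, and transform are placeholders:

```python
import ray

ds = ray.data.read_parquet("s3://bucket/data/")  # placeholder location

def double_x(batch):
    # Batches arrive as dicts of NumPy arrays by default.
    batch["x"] = batch["x"] * 2.0
    return batch

ds = ds.map_batches(double_x)
ds.write_parquet("/tmp/out")  # executes as a stream; nothing is fully materialized
```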
llama-cpp
Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.
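A minimal sketch using the llama-cpp-python bindings; the GGUF path is a placeholder:

```python
from llama_cpp import Llama

# Any quantized GGUF file works here; q4_k_m is a common 4-bit choice.
llm = Llama(model_path="./model-q4_k_m.gguf", n_ctx=2048)
out = llm("Q: Name the two Voyager probes. A:", max_tokens=32)
print(out["choices"][0]["text"])
```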
tensorrt-llm
Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.
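A hedged sketch, assuming the high-level `LLM` API that recent tensorrt-llm releases expose (the model name and sampling values are illustrative):

```python
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # builds or loads a TRT engine
params = SamplingParams(max_tokens=64, temperature=0.8)

for out in llm.generate(["The fastest way to serve this model is"], params):
    print(out.outputs[0].text)
```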
doc-scraper
Scrape documentation websites into organized reference files. Use when converting docs sites to searchable references or building Claude skills.
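An illustrative sketch of the core scraping step only, not the skill's actual implementation; the URL is a placeholder and the `<main>` heuristic is an assumption:

```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com/docs/intro"  # placeholder URL
soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

main = soup.find("main") or soup.body   # docs sites usually wrap content in <main>
text = main.get_text("\n", strip=True)

with open("intro.md", "w", encoding="utf-8") as f:
    f.write(f"# {soup.title.string}\n\n{text}\n")
```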
invoice-organizer
Organize, categorize, track, and manage invoices systematically with automated extraction of invoice data, payment tracking, and financial organization. Use when processing invoice uploads, extracting invoice details (date, amount, vendor), categorizing expenses, tracking payment status, organizing receipts, generating financial reports, or building accounting and bookkeeping systems.
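An illustrative sketch of the extraction step only; the regexes, labels, and fields are assumptions rather than the skill's actual parser:

```python
import re

def extract_fields(text: str) -> dict:
    # Hypothetical patterns for common invoice layouts.
    amount = re.search(r"(?:Total|Amount Due)[:\s]*\$?([\d,]+\.\d{2})", text)
    date = re.search(r"(?:Date|Invoice Date)[:\s]*(\d{4}-\d{2}-\d{2})", text)
    vendor = re.search(r"^From[:\s]*(.+)$", text, re.MULTILINE)
    return {
        "amount": float(amount.group(1).replace(",", "")) if amount else None,
        "date": date.group(1) if date else None,
        "vendor": vendor.group(1).strip() if vendor else None,
    }

print(extract_fields("From: Acme Corp\nInvoice Date: 2024-03-01\nTotal: $1,250.00"))
# -> {'amount': 1250.0, 'date': '2024-03-01', 'vendor': 'Acme Corp'}
```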
refactoring
Improve code structure, readability, and maintainability without changing external behavior through systematic refactoring techniques like extracting functions, removing duplication, simplifying conditionals, and applying design patterns. Use when reducing technical debt, extracting functions or classes, removing code duplication, simplifying complex conditionals, renaming for clarity, applying design patterns, improving code organization, reducing coupling, increasing cohesion, or maintaining test coverage during structural improvements.
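A small before/after example of one such technique, extract function, on hypothetical code; external behavior is identical:

```python
# Before: the tax calculation is duplicated in two places.
def order_report(orders):
    lines = [f"{o['id']}: {o['net'] * 1.2:.2f}" for o in orders]
    footer = f"sum: {sum(o['net'] * 1.2 for o in orders):.2f}"
    return "\n".join(lines + [footer])

# After: one named helper, one place to change the tax rule.
def gross(order, tax_rate=0.2):
    return order["net"] * (1 + tax_rate)

def order_report_refactored(orders):
    lines = [f"{o['id']}: {gross(o):.2f}" for o in orders]
    footer = f"sum: {sum(gross(o) for o in orders):.2f}"
    return "\n".join(lines + [footer])
```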
file-organizer
Organize, categorize, rename, and manage files systematically using automated rules, naming conventions, and folder structures for efficient file management. Use when organizing uploaded files, implementing file naming conventions, categorizing files by type or metadata, creating folder structures, cleaning up messy directories, automating file movements, implementing media libraries, or building file management systems.
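An illustrative sketch of one such rule, sorting by extension; the folder path and mapping are assumptions:

```python
from pathlib import Path

RULES = {".pdf": "documents", ".jpg": "images", ".png": "images", ".csv": "data"}

def organize(folder: str) -> None:
    root = Path(folder)
    for f in root.iterdir():
        if f.is_file() and f.suffix.lower() in RULES:
            dest = root / RULES[f.suffix.lower()]
            dest.mkdir(exist_ok=True)   # create the category folder on first use
            f.rename(dest / f.name)     # move the file into it

organize("/tmp/inbox")  # placeholder path
```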
content-evaluation-framework
This skill should be used when evaluating the quality of book chapters, lessons, or educational content. It provides a systematic 6-category rubric with weighted scoring (Technical Accuracy 30%, Pedagogical Effectiveness 25%, Writing Quality 20%, Structure & Organization 15%, AI-First Teaching 10%, Constitution Compliance Pass/Fail) and multi-tier assessment (Excellent/Good/Needs Work/Insufficient). Use this during iterative drafting, after content completion, on-demand review requests, or before validation phases.
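A sketch of the weighted scoring arithmetic; the tier thresholds below are assumptions, since the description names the tiers but not their cutoffs:

```python
WEIGHTS = {
    "technical_accuracy": 0.30,
    "pedagogical_effectiveness": 0.25,
    "writing_quality": 0.20,
    "structure_organization": 0.15,
    "ai_first_teaching": 0.10,
}

def evaluate(scores: dict, constitution_pass: bool) -> str:
    # Constitution compliance is a pass/fail gate, not a weighted category.
    if not constitution_pass:
        return "Insufficient (constitution fail)"
    total = sum(scores[k] * w for k, w in WEIGHTS.items())
    tier = ("Excellent" if total >= 90 else "Good" if total >= 75
            else "Needs Work" if total >= 60 else "Insufficient")
    return f"{total:.1f} -> {tier}"

print(evaluate({k: 85 for k in WEIGHTS}, constitution_pass=True))  # 85.0 -> Good
```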
gitlab-ci-best-practices
Use when optimizing GitLab CI/CD pipelines for performance, reliability, or maintainability. Covers pipeline optimization and organizational patterns.
tensorflow-neural-networks
Build and train neural networks with TensorFlow
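A minimal Keras sketch of the build/compile/fit cycle on random data (shapes and hyperparameters are illustrative):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # 3-class classifier
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

x = np.random.rand(256, 20).astype("float32")
y = np.random.randint(0, 3, size=(256,))
model.fit(x, y, epochs=2, batch_size=32)
```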
PHP Composer and Autoloading
Use when working with Composer package management and PSR-4 autoloading, including dependency management, autoload strategies, package creation, version constraints, and patterns for modern PHP project organization and distribution.