Image Processing
912 skills in Content & Media > Image Processing
nemo-curator
GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16Ă— faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora.
Unnamed Skill
Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.
clip
OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.
llava
Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks. Best for conversational image analysis.
sentence-transformers
Framework for state-of-the-art sentence, text, and image embeddings. Provides 5000+ pre-trained models for semantic similarity, clustering, and retrieval. Supports multilingual, domain-specific, and multimodal models. Use for generating embeddings for RAG, semantic search, or similarity tasks. Best for production embedding generation.
web-asset-generator
Generate web assets including favicons, app icons (PWA), and social media meta images (Open Graph) for Facebook, Twitter, WhatsApp, and LinkedIn. Use when users need icons, favicons, social sharing images, or Open Graph images from logos or text slogans. Handles image resizing, text-to-image generation, and provides proper HTML meta tags.
zai-cli
Z.AI CLI providing: - Vision: image/video analysis, OCR, UI-to-code, error diagnosis (GLM-4.6V) - Search: real-time web search with domain/recency filtering - Reader: web page to markdown extraction - Repo: GitHub code search and reading via ZRead - Tools: MCP tool discovery and raw calls - Code: TypeScript tool chaining Use for visual content analysis, web search, page reading, or GitHub exploration. Requires Z_AI_API_KEY.
pdf-extractor
Extract text, tables, and images from PDF files. Use when converting PDF documentation, manuals, or reports to searchable text.
pdf-extractor
Extract text, tables, and images from PDF files. Use when converting PDF documentation, manuals, or reports to searchable text.
performance-optimization
Optimize application performance through code splitting, lazy loading, caching strategies, bundle size reduction, render optimization, and profiling. Use when improving page load times, reducing bundle sizes, optimizing React rendering, implementing code splitting, configuring caching strategies, lazy loading components and routes, optimizing images and assets, profiling performance bottlenecks, implementing virtual scrolling for large lists, or improving Core Web Vitals and Lighthouse scores.
nano-banana-prompting
This skill should be used when crafting prompts for Nano Banana Pro (Gemini image generation). Use when users want help writing image generation prompts, need guidance on prompt structure, or want to optimize their prompts for better results.
csharp-async-patterns
Use when C# async/await patterns including Task, ValueTask, async streams, and cancellation. Use when writing asynchronous C# code.
csharp-async-patterns
Use when C# asynchronous programming with async/await, Task, ValueTask, ConfigureAwait, and async streams for responsive applications.
nano-banana
This skill should be used for Python scripting and Gemini image generation. Use when users ask to generate images, create AI art, edit images with AI, or run Python scripts with uv. Trigger phrases include "generate an image", "create a picture", "draw", "make an image of", "nano banana", or any image generation request.
csharp-nullable-types
Use when C# nullable reference types, null safety patterns, and migration strategies. Use when ensuring null safety in C# code.
dotnet-run-file
Run script-like CSharp programs using dotnet run file.cs. Use this skill when users want to execute CSharp code directly, write one-liner scripts via stdin, or learn about run file directives.
csharp-linq
Use when lINQ (Language Integrated Query) with query and method syntax, deferred execution, expression trees, and performance optimization.
act-docker-setup
Use when configuring Docker environments for act, selecting runner images, managing container resources, or troubleshooting Docker-related issues with local GitHub Actions testing.
csharp-linq
Use when lINQ query and method syntax, deferred execution, and performance optimization. Use when querying collections in C#.
markdown-syntax-fundamentals
Use when writing or editing markdown files. Covers headings, text formatting, lists, links, images, code blocks, and blockquotes.