Image Processing
912 skills in Content & Media > Image Processing
reachy-mini
Complete SDK for controlling Reachy Mini robot - head movement, antennas, camera, audio, motion recording/playback. Covers architecture (daemon/client), deployment modes (USB, wireless, simulation, on-Pi), and app distribution. Also includes advanced application patterns: MovementManager, layered motion, audio-reactive movement, face tracking, LLM tool systems, and OpenAI realtime integration. Use when: (1) Writing code to control Reachy Mini, (2) Moving the robot head or antennas, (3) Accessing camera/video, (4) Playing/recording audio, (5) Recording or playing back motions, (6) Looking at points in image or world space, (7) Understanding robot capabilities, (8) Connecting to real or simulated robot, (9) Building conversational AI apps, (10) Integrating with LLMs/OpenAI, (11) Deploying apps to robot, (12) Any robotics task with Reachy Mini.
summary
Create leader-ready status updates (one-pagers) with clear highlights, risks, next steps, and optional images.
uploading-to-imgur
Upload images to Imgur via API and get shareable links. Supports anonymous upload (Client ID) and authenticated upload (Access Token). Returns detailed JSON with image URLs, delete links, dimensions, and metadata. Use when the user needs to upload images to Imgur, share images publicly, or get image hosting URLs.
image-processing
Implement image processing for PhotoVault using Sharp and streaming patterns. Use when working with photo uploads, thumbnail generation, EXIF handling, ZIP extraction, or optimizing images for web. Includes memory management for serverless and PhotoVault storage structure.
supabase-storage
Expert guide for Supabase Storage including bucket management, file operations, URL generation, and RLS policies. Use when working with file uploads/downloads, creating public or private buckets, generating signed URLs, implementing storage RLS policies, handling resumable uploads, image transformations, or any Supabase Storage-related tasks.
sharp
Processes images with Sharp, the high-performance Node.js library for resizing, converting, and optimizing images. Use when building image pipelines, generating thumbnails, or optimizing uploads server-side.
ai-sdk-v5
Build AI-native applications with AI SDK v5. Use when integrating LLMs (OpenAI, Anthropic, Gemini) for text generation, streaming, structured outputs, embeddings, image generation, or audio processing. Covers setup, core workflows, provider selection, and advanced features like reasoning, caching, and tool usage.
nano-banana-pro
Generate and edit images using Google's Nano Banana Pro (Gemini 3 Pro Image) API. Use when the user asks to generate, create, draw, design, make, edit, modify, change, alter, or update images. Also use for "make me an image", "create a picture", or "draw me a...". Use when user references an existing image file and asks to modify it in any way (e.g., "modify this image", "change the background", "replace X with Y"). Supports both text-to-image generation and image-to-image editing with configurable resolution (1K default, 2K, or 4K for high resolution). DO NOT read the image file first - use this skill directly with the --input-image parameter.
implement-text-response-question
Create D3 questions focused on written explanations and reflections of static content (text, images, diagrams). For video-based questions, use implement-video-question instead.
confluence
Manage Confluence documentation with downloads, uploads, conversions, and diagrams. Use when asked to "download Confluence pages", "upload to Confluence", "convert Wiki Markup", "sync markdown to Confluence", "create Confluence page", or "handle Confluence images".
family-tree-researcher
Research genealogical connections to Aboriginal peoples in specified regions using web search, historical documents, and database analysis. Process images from folders, discover relationship paths, and track research findings.
gemini-vision
Guide for implementing Google Gemini API image understanding - analyze images with captioning, classification, visual QA, object detection, segmentation, and multi-image comparison. Use when analyzing images, answering visual questions, detecting objects, or processing documents with vision.
canva-resize-for-all-social-media
Resize a Canva design into multiple social media formats (Facebook post, Facebook story, Instagram post, Instagram story, LinkedIn post) and export all versions as PNGs. Use this skill when users want to resize Canva designs specifically for multiple social media platforms in one operation, rather than resizing to a single format manually.
ci-images
Work with this repo’s GitHub Actions CI and GHCR Docker image publishing workflow. Use when changing generation checks, tests, formatting, or when preparing a release and validating image tags.
generating-images
Generate images using AI models via OpenRouter API. Supports text-to-image and image-based generation with customizable aspect ratios. Use when the user asks to generate, create, or synthesize images based on text descriptions or reference images.
epub-creator
Create production-quality EPUB 3 ebooks from markdown and images with automated QA, formatting fixes, and validation. Use when creating ebooks, converting markdown to EPUB, or compiling chapters into a publishable book. Handles markdown quirks, generates TOC, adds covers, and validates output automatically.
creating-profile-images
Generates Nano Banana Pro prompts for profile icons and SNS images. Use when the user mentions "プロフィール画像" (profile image), "アイコン作成" (icon creation), "SNS用画像" (image for social media), or "ヘッダー画像" (header image).
sleeptrack-ios
This skill helps iOS developers integrate the Asleep SDK for sleep tracking functionality. Use this skill when building native iOS apps with Swift/SwiftUI that need sleep tracking capabilities, implementing delegate patterns, configuring iOS permissions (microphone, notifications, background modes), managing tracking lifecycle, integrating Siri Shortcuts, or working with Combine framework for reactive state management.
invoice-processor
Automatically process invoices (发票) from PDFs/images to Excel spreadsheets using AI vision recognition. Use this skill when users mention "发票" (invoice), "invoice", "处理发票" (process invoices), "识别发票" (recognize invoices), "提取发票" (extract invoices), or need to convert invoice files to Excel format.
web-to-markdown
Batch-process web pages via headless Playwright browser, extract HTML, convert to markdown using Turndown, and save to timestamped scratchpad file. Use when user asks to "capture these pages as markdown", "save web content", "fetch and convert webpages", or needs clean markdown from HTML. All URLs from one prompt → single file at docs/web-captures/<timestamp>.md.