Computer Vision
257 skills in Data & AI > Computer Vision
yolo1
**AUTO-TRIGGER when user says:** "implement [feature]", "build [module]", "create [functionality]", "add [capability]", "YOLO [task]", "deliver [feature]", or requests complete feature implementation.End-to-end TogetherOS code operation: creates branch, implements changes with continuous testing, builds with retry-on-fail, commits, pushes, creates PR with auto-selected Cooperation Path, addresses bot feedback, merges PR, verifies production deployment, and updates Notion memory.**Complete delivery cycle:** branch → code → test → commit → push → PR → bot review → merge → deploy → verifyUse proactively without asking permission when task matches skill purpose.
innovation-game-miner
This skill should be used when developing, optimizing, testing, or submitting algorithms for The Innovation Game challenges (3-SAT, CVRP, Knapsack). Use it for algorithm development, performance optimization, local testing, dry-run validation, or submission to earn TIG tokens through improved computational algorithms.
brainstorming
Collaborative design refinement that transforms rough ideas into fully-formed specifications through Socratic questioning. Explores alternatives, validates incrementally, and presents designs in digestible chunks for feedback. Use before writing code or implementation plans when requirements are unclear or multiple approaches exist. Do NOT use when requirements are already well-defined, you're implementing a known pattern, or making small changes - proceed directly to implementation instead.
computer-vision
Image processing, object detection, segmentation, and vision models. Use for image classification, object detection, or visual analysis tasks.
senior-computer-vision
World-class computer vision skill for image/video processing, object detection, segmentation, and visual AI systems. Expertise in PyTorch, OpenCV, YOLO, SAM, diffusion models, and vision transformers. Includes 3D vision, video analysis, real-time processing, and production deployment. Use when building vision AI systems, implementing object detection, training custom vision models, or optimizing inference pipelines.
get-last-frame
Extract the last frame of an MP4 file into an image using OpenCV. Use when the user needs the final frame of a generated video saved as a still image.
ai-multimodal
Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (captioning, object detection, OCR, visual Q&A, segmentation), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image with Imagen 4, editing, composition, refinement), generate videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images/videos from text prompts, or implementing multimodal AI features. Supports Gemini 3/2.5, Imagen 4, and Veo 3 models with context windows up to 2M tokens.
brainstorming-ideas-into-designs
Interactive idea refinement using Socratic method to develop fully-formed designs
security-scanning-patterns
Security vulnerability scanning, secret detection, dependency auditing, and OWASP best practices. Use when performing security audits, scanning for vulnerabilities, detecting exposed secrets, checking dependencies, validating security headers, implementing OWASP patterns, or when user mentions security, vulnerabilities, secrets, CVE, OWASP, npm audit, security headers, or penetration testing.
brainstorming
Use when creating or developing anything, before writing code or implementation plans - refines rough ideas into fully-formed designs through structured Socratic questioning, alternative exploration, and incremental validation
vulnerability-triage-prioritization
Assess vulnerability severity using CVSS scoring, classify vulnerability types (CVE vs compliance), detect false positives, and prioritize remediation workflows. Use when analyzing vulnerability data, calculating risk scores, or determining remediation priority.
container-security
Comprehensive container security guidance including vulnerability scanning with Trivy, image hardening, secrets management, and CIS benchmark compliance. Activates when working with "container security", "image scanning", "CVE", "vulnerability", "docker security", "hardening", or "CIS benchmark".
pc-app-security-analyzer
PC application security analyzer for Windows/macOS/Linux executables. Static analysis (SAST) for desktop applications including PE/ELF/Mach-O analysis, dependency scanning, SBOM generation. Integrates with cve-checker and cra-code-reviewer for CRA compliance. Triggers on: PC app analysis, Windows exe, macOS app, Linux binary, PE analysis, ELF analysis, desktop security, SBOM generation, dependency scan.
live-coding-interviewer
Run a senior-level Java/Spring backend live-coding interview practice, guiding the user with Socratic prompts, strict code review (readability, edge cases, data structures), and staged hints without giving direct answers.
scvelo-complete
scVelo RNA速度分析工具包 - 100%覆盖文档(78个文件:完整API+教程+动态建模+可视化)
android-firmware-analyzer
Android端末ファームウェアのセキュリティ分析。Security Patch Level (SPL) チェック、Android/Qualcomm/Samsung Security Bulletin照合、カーネルCVE確認、ファームウェアイメージ解析。Triggers on: Android firmware analysis, SPL check, security patch level, Android security bulletin, Qualcomm bulletin, Samsung security, kernel CVE, firmware security, device security assessment.
component-layer
This skill should be used when the user asks to 'create a component', 'add a button', 'build a card', 'add UI element', or 'create a feature component'. Provides guidance for React components using shadcn/ui, Radix primitives, and CVA patterns in components/**/*.tsx.
containerizing-applications
Containerizes applications with Docker, docker-compose, and Helm charts.Use when creating Dockerfiles, docker-compose configurations, or Helm charts for Kubernetes.Includes Docker Hardened Images (95% fewer CVEs), multi-stage builds, and 15+ battle-tested gotchas.
security-scanning-suite
Comprehensive security analysis including SAST, DAST, dependency scanning, secret detection, and vulnerability assessment. Use for security audits, CVE tracking, compliance checks, and preventing vulnerabilities from reaching production. Supports multiple languages and frameworks.
ai-multimodal
Process and generate multimedia content using Google Gemini API for better vision capabilities. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handle multiple images), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image with Imagen 4, editing, composition, refinement), generate videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of default vision capabilities of Claude, only fallback to Claude's vision capabilities if needed), processing PDF documents, extracting structured data from media, creating images/videos from text pr