Computer Vision
257 skills in Data & AI > Computer Vision
security-audit
Agent Skill: Security audit patterns for PHP/OWASP. Use when conducting security assessments, identifying vulnerabilities (XXE, SQL injection, XSS), or CVSS scoring. By Netresearch.
json-validation
Centralized JSON validation for AGENT_SUCCESS_CRITERIA with defensive parsing and injection attack prevention (CVSS 8.2)
vulnerability-scanning
Automated security scanning for dependencies, code, and containers with Trivy, Snyk, and npm audit. Use for CI/CD security gates, pre-deployment audits, and compliance requirements, or when encountering CVE detections, outdated packages, license compliance issues, or SBOM generation errors.
mentor
Guide through problems with questions, not answers, using a Socratic teaching style. Use when asked to teach, explain concepts through discovery, help learn, or guide understanding without giving direct solutions. Triggers on: "use mentor mode", "teach me", "help me understand", "guide me", "mentor", "I want to learn", "explain by asking", "Socratic", "don't give me the answer". Read-only mode: explores and guides but doesn't write code.
security-dependency-scanning
Guide for conducting comprehensive web dependency security scans to identify outdated libraries, CVEs, and security misconfigurations. Use when analyzing deployed websites for dependency vulnerabilities.
sora-python-sdk
Python SDK for WebRTC SFU Sora - real-time video/audio streaming and communication. Use when working with Sora connections, WebRTC streams, video/audio sources/sinks, hardware encoding/decoding (Intel VPL, NVIDIA, Apple VideoToolbox, AMD AMF, Raspberry Pi), data channels, simulcast, encoded transforms, VAD, or building real-time communication applications. Supports sendonly, recvonly, and sendrecv roles.
zai-cli
Z.AI CLI providing:
- Vision: image/video analysis, OCR, UI-to-code, error diagnosis (GLM-4.6V)
- Search: real-time web search with domain/recency filtering
- Reader: web page to markdown extraction
- Repo: GitHub code search and reading via ZRead
- Tools: MCP tool discovery and raw calls
- Code: TypeScript tool chaining
Use for visual content analysis, web search, page reading, or GitHub exploration. Requires Z_AI_API_KEY.
career-document-architect
Use when writing or reviewing career documents including research statements, teaching statements, diversity statements, CVs, or biosketches. Invoke when user mentions research statement, teaching philosophy, diversity statement, biosketch, academic CV, faculty application, or needs help with career narrative, positioning, or professional documents for academic advancement.
socratic-teaching-scaffolds
Use when teaching complex concepts (technical, scientific, philosophical), helping learners discover insights through guided questioning rather than direct explanation, correcting misconceptions by revealing contradictions, onboarding new team members through scaffolded learning, mentoring through problem-solving question frameworks, designing self-paced learning materials, or when user mentions "teach me", "help me understand", "explain like I'm", "learning path", "guided discovery", or "Socratic method".
remote-run-ssh
Run CVlization examples on the `ssh l1` GPU host by copying only the needed example directory plus the shared `cvlization/` package into `/tmp`, then launching the example’s Docker scripts.
ai-multimodal
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.
styling-with-tailwind
Creates UIs using Tailwind CSS utility classes and shadcn/ui patterns. Covers CSS variables with OKLCH colors, component variants with CVA, responsive design, dark mode, and Tailwind v4 features. Use when building interfaces with Tailwind, styling shadcn/ui components, implementing themes, or working with utility-first CSS.
brainstorming
Use when creating or developing anything, before writing code or implementation plans - refines rough ideas into fully-formed designs through structured Socratic questioning, alternative exploration, and incremental validation
verify-training-pipeline
Verify a CVlization training pipeline example is properly structured, can build, trains successfully, and logs appropriate metrics. Use when validating example implementations or debugging training issues.
verify-inference-example
Verify a CVlization inference example is properly structured, builds successfully, and runs inference correctly. Use when validating inference example implementations or debugging inference issues.
senior-computer-vision
World-class computer vision skill for image/video processing, object detection, segmentation, and visual AI systems. Expertise in PyTorch, OpenCV, YOLO, SAM, diffusion models, and vision transformers. Includes 3D vision, video analysis, real-time processing, and production deployment. Use when building vision AI systems, implementing object detection, training custom vision models, or optimizing inference pipelines.
shadcn-ui
shadcn/ui component patterns with Radix primitives and Tailwind styling. Use when building UI components, using CVA variants, implementing compound components, or styling with data-slot attributes. Triggers on shadcn, cva, cn(), data-slot, Radix, Button, Card, Dialog, VariantProps.
shadcn-code-review
Reviews shadcn/ui components for CVA patterns, composition with asChild, accessibility states, and data-slot usage. Use when reviewing React components using shadcn/ui, Radix primitives, or Tailwind styling.