Audio Processing
357 skills in Content & Media > Audio Processing
skill-elevenlabs-tts-tool
ElevenLabs text-to-speech CLI tool guide
podcast-production-guide
Podcast production expert. Use for creating podcasts, audio editing, and distribution.
audio-transcription-cleanup
Transform messy voice transcription text into well-formatted, human-readable documents while preserving original meaning
openai
OpenAI API via curl. Use this skill for GPT chat completions, DALL-E image generation, Whisper audio transcription, embeddings, and text-to-speech.
coles-invoice-processor
Processes Coles grocery invoices to extract structured data and predict future orders. Use when user uploads/pastes invoice content, asks to analyze grocery purchases, or wants shopping predictions.
court-record-transcriber
Development skill for CaseMark's Court Recording Transcriber - an AI-powered application for transcribing court recordings with speaker identification, synchronized playback, search, and legal document exports. Built with Next.js 16, PostgreSQL, Drizzle ORM, wavesurfer.js, and Case.dev APIs. Use this skill when: (1) Working on or extending the court-record-transcriber codebase, (2) Integrating with Case.dev transcription APIs, (3) Working with audio playback/waveforms, (4) Building transcript export features, or (5) Adding speaker identification logic.
macos-say
Use macOS text-to-speech via the `say` command for voice feedback, audio narration, and spoken output.
curriculum-review-pedagogy
Verify constructive alignment between objectives, activities, and assessments; validate instructional design quality and learning science principles. Use when reviewing curriculum quality, checking alignment, or validating pedagogical soundness. Activates on "review alignment", "check pedagogy", "validate curriculum", or "quality review".
rust-candle-core
Build native Rust ML models with Candle framework. Use when implementing vision transformers, LLMs, or audio models with GPU acceleration.
openai-api
OpenAI REST API integration guide. Use when: making direct HTTP calls to OpenAI API, understanding API structure without SDK, debugging API requests, learning request/response formats, handling errors and rate limits. Covers: authentication, Chat Completions, Embeddings, Images (DALL-E), Audio (Whisper/TTS), error handling, streaming.
brand-voice-generator
Develop consistent brand voice and messaging guidelines for companies and personal brands. Creates tone, style, and communication frameworks that align with brand values and target audience preferences.
audio-recorder
Expert in managing audio recordings using sox. **Use this skill whenever the user mentions "record", "recording", "start recording", "stop recording", "list records", or asks to capture audio from meetings or conversations.**
notebooklm-lesson-planner
Create structured Google NotebookLM lesson plans with curated sources and tailored audio overview prompts. Use when the user wants to create a NotebookLM notebook, generate learning content with audio overviews, or build a curriculum with multiple focused segments. Handles both single lesson creation and batch processing of multiple topics.
ai-multimodal
Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis, up to 9.5 hours), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), and generating images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens. | Use when: AI, LLM, vision, embedding, image analysis, Gemini API.
mydetailarea-testing
Comprehensive E2E testing suite for MyDetailArea dealership workflows. Implements Playwright test scenarios for critical user journeys including order creation, invoice generation, payment processing, VIN scanning, and team collaboration. Includes role-based testing, performance benchmarks, visual regression, and CI/CD integration. Use when implementing automated testing for dealership operations.
pod-design-review
Generate five brand-aligned design concepts for validated niches using Claude creativity constrained by deterministic structure and brand voice memories.
story-explanation
Create compelling story-format summaries using UltraThink to find the best narrative framing. Support multiple formats - 3-part narrative, n-length with inline links, abridged 5-line, or comprehensive via Foundry MCP. USE WHEN user says 'create story explanation', 'narrative summary', 'explain as a story', or wants content in Daniel's conversational first-person voice.
content-creator
Create SEO-optimized marketing content with consistent brand voice. Includes brand voice analyzer, SEO optimizer, content frameworks, and social media templates. Use when writing blog posts, creating social media content, analyzing brand voice, optimizing SEO, planning content calendars, or when user mentions content creation, brand voice, SEO optimization, social media marketing, or content strategy.
seo-content
Create high-quality, SEO-optimized content that ranks AND reads like a human wrote it. Use when turning keyword research into actual content pieces. Takes a target keyword/cluster and produces a complete article optimized for search while avoiding AI-sounding output. Triggers on: write SEO content for X, create article for keyword, write blog post about X, SEO article, content for keyword cluster. Outputs publication-ready content with proper structure, optimization, and human voice.
content-filter
Filter and classify AI research content for relevance. Use when processing raw content from Twitter, Substacks, blogs, or podcasts to determine if it's worth extracting claims from. Assigns relevance scores, topics, and author categories.