Audio Processing
357 skills in Content & Media > Audio Processing
swarm
Multi-perspective reasoning through Upanishadic Antahkarana voices. Use for complex problems requiring diverse viewpoints and synthesis.
marketing-writer
Write authentic, conversion-focused marketing content for product features and launches. Use when Maurice ships a feature, needs landing page copy, tweet threads, launch emails, or any marketing content. Automatically analyzes codebase to understand features and value props. Brand voice is casual, direct, no corporate buzzwords - focuses on real benefits in simple language.
brand-voice
Define or extract a consistent brand voice that other skills can use. Two modes - Extract (analyze existing content you're proud of) or Build (strategically construct a voice from scratch). Use when starting a project, when copy sounds generic, or when output needs to sound like a specific person/brand. Triggers on: what's my voice, analyze my brand, help me define my voice, make this sound like me, voice guide, brand personality. Outputs a voice profile that can be fed into direct-response-copy and other content skills.
typescript-taste
Apply rigorous TypeScript type design with strong inference, minimal constraints, and sound fallbacks.
audio-quality-checker
Analyze the WaveCap-SDR audio stream to assess tuning quality, detect silence, noise, proper audio, or distortion. Use when checking if SDR channels are properly configured or debugging audio issues.
cover-letter-voice
Develop an authentic cover letter narrative using philosophy, patterns, and the job's cultural requirements.
ui-token-first
Enforce UI token usage for Espresso Engineered frontend work. Use when editing Svelte/SvelteKit UI, styling typography, voice lines, headers, cards, surfaces, or layout so styles come from frontend/src/lib/ui tokens instead of app.css or ad-hoc CSS.
dheplab-newsletter
DHEPLab Newsletter content pipeline for LinkedIn and Substack. Creates, optimizes, and manages thought leadership content establishing DHEPLab as the premier voice in digital health economics. Use for LinkedIn posts, Substack newsletters, content calendar management, and engagement tracking.
recipe-builder
Create and manage WaveCap-SDR recipe templates for common capture scenarios. Use when setting up new band plans, creating presets for trunking systems, or building reusable multi-channel configurations for marine/aviation/broadcast monitoring.
nnt-compiler
Work with the NNT (Nakul Notation Tool) compiler - parse music notation shorthand, query musical structures, and export to MusicXML, ABC, and other formats for PhD research and educational content
audio-transcribe
Convert audio/video to text using Whisper, with word-level timestamps. Use when user wants to transcribe audio, convert speech, audio, or video to text, generate subtitles, or recognize speech.
ai-multimodal
Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis, up to 9.5 hours), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), and generating images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.
gemini-audio
Guide for implementing Google Gemini API audio capabilities - analyze audio with transcription, summarization, and understanding (up to 9.5 hours), plus generate speech with controllable TTS. Use when processing audio files, creating transcripts, analyzing speech/music/sounds, or generating natural speech from text.
audio-effect
Create standard SuperCollider audio effects for Bice-Box (delays, reverbs, filters, distortions). Provides templates, ControlSpecs, common patterns, and MCP workflow for safely creating/updating effects.
widget-tester
Expert assistant for testing the embeddable Bible widget functionality in the KR92 Bible Voice project. Use when creating widget tests, validating embed API responses, testing reference formats, checking audio integration, or creating regression test cases.
content-research-writer
Creates high-quality content (blog posts, tweets, newsletters, documentation) that matches the user's writing style and voice. Performs web research to find citations and supporting evidence. Use when user requests blog posts, marketing content, newsletters, tweets, or any written content that should sound authentic and be well-researched.
wavecap-audio
Analyze recorded audio files from WaveCap. Use when the user wants to inspect audio recordings, check audio quality, list available recordings, or get audio file metadata.
livekit-stt-selfhosted
Build self-hosted speech-to-text APIs using Hugging Face models (Whisper, Wav2Vec2) and create LiveKit voice agent plugins. Use when building STT infrastructure, creating custom LiveKit plugins, deploying self-hosted transcription services, or integrating Whisper/HF models with LiveKit agents. Includes FastAPI server templates, LiveKit plugin implementation, model selection guides, and production deployment patterns.
elevenlabs-agents
Work with ElevenLabs Conversational AI agents - initiate calls, retrieve transcripts, manage phone numbers, and analyze agent conversations. Use when building or testing voice AI applications.