Audio Processing
357 skills in Content & Media > Audio Processing
critical-bug-detector
Automatically performs critical pre-release checks when building installers, creating releases, or packaging VoiceLite. Prevents missing files, version mismatches, and configuration errors that have caused release failures.
notebooklm
Guide for managing Google NotebookLM from the command line using nlm CLI. Use when the user wants to create notebooks, manage sources, generate audio overviews, or mentions NotebookLM, nlm, notebook management, or research organization.
voice-and-tone
Writing style guide for jonmagic / Jonathan Hoyt with authentic voice patterns and tone guidelines. Use when generating any prose content on jonmagic's behalf—blog posts, documentation, reflections, feedback, snippets, or any written communication. Ensures first-person narratives with introspective framing, concrete examples, and thoughtful principal-engineer perspective.
voice-transcription
Record and transcribe voice input when user wants to speak instead of type, describe complex issues verbally, provide audio input, or dictate text. Use this when user says "record my voice", "let me speak", "voice input", "transcribe audio", or when verbal description would be clearer than typing.
osay
AI-powered text-to-speech CLI tool. Use for pronunciation queries, reading text aloud, generating audio files, or language practice. Triggers on "how to pronounce", "say this", "read aloud", "TTS", "text to speech", "speak", or audio generation requests.
voice-interface-builder
Expert in building voice interfaces, speech recognition, and text-to-speech systems
audio-producer
Expert in web audio, audio processing, and interactive sound design
skyrim-audio
Handle Skyrim audio files including FUZ, XWM, and WAV formats. Use when the user wants to inspect audio, extract voice files, create FUZ files, or convert audio formats.
integration-test
Full SDK integration test that runs actual queries through the Claude SDK sandbox. Use after making changes to SDK client code, session management, skill loading, network proxy, voice/TTS, or image generation. Runs real prompts through the SDK to verify the complete path works.
podcast-downloader
Search and download podcast episodes from Apple Podcasts. Use when user wants to find podcasts, download podcast episodes, get podcast information, or mentions Apple Podcasts, iTunes, podcast search, or audio downloads.
text-to-speech
Converts text to speech audio using OpenAI TTS API. Use when users request audio versions of text or want responses read aloud.
project-setup-wizard
This skill should be used when analyzing an existing project to automatically generate personalized skills, agents, commands, and documentation based on detected patterns and needs. AUTO-ACTIVATES for: project setup, analyze project, setup claude code, personalize claude, auto-generate tools, detect project needs, bootstrap project. PROVIDES: Deep project analysis (code patterns, architecture, domain detection), automatic skill generation (personalized for THIS project's patterns), automatic agent generation (for recurring tasks), automatic command generation (for workflows), custom CLAUDE.md (with project-specific context and best practices). ANALYZES: Code patterns (repetitive endpoints, components, queries), project structure (architecture, layers, modules), dependencies (frameworks, libraries), recurring tasks (what developers do repeatedly), domain detection (invoicing, e-commerce, analytics, etc.). GENERATES: Personalized skills (e.g., "invoice-endpoint-builder" for invoicing project), task-specific
elevenlabs-agents
Build conversational AI voice agents with ElevenLabs Platform using React, JavaScript, React Native, or Swift SDKs. Configure agents, tools (client/server/MCP), RAG knowledge bases, multi-voice, and Scribe real-time STT. Use when: building voice chat interfaces, implementing AI phone agents with Twilio, configuring agent workflows or tools, adding RAG knowledge bases, testing with CLI "agents as code", or troubleshooting deprecated @11labs packages, Android audio cutoff, CSP violations, dynamic variables, or WebRTC config. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication
audio-normalizer
Use when asked to normalize audio volume, match loudness, or apply peak/RMS normalization to audio files.
mediatts-canary-runner
Run TTS canary tests, measure audio quality/latency, and rollback on threshold breaches. Use before rolling out new voices or pipelines.
mediamulti-speaker-orchestrator
Orchestrate multi-voice TTS production: assign speakers, chunk dialogue, dispatch to voices, sync timing, and mix into a final track. Use after dialogue-dramatizer produces script turns.
customer-acquisition-focus
Use when working on Total Audio Platform (Audio Intel) during customer acquisition phase - enforces focus on customer-facing features over technical perfection, validates work contributes to £500/month goal, prevents perfectionism and over-optimization that delays customer acquisition
brand-building
Brand strategy, identity, positioning, and voice development. Use when developing brand guidelines, creating positioning statements, defining brand voice, or building brand architecture.
music-downloader
This skill should be used when users need to download audio or music from online platforms like YouTube, SoundCloud, Spotify, or other streaming services. It provides yt-dlp and spotdl command templates for high-quality audio extraction, playlist downloads, metadata embedding, and multi-platform support.
multi-modal
Multi-modal prompting with vision, audio, and document understanding