Audio Processing
357 skills in Content & Media > Audio Processing
stream-validator
Validate WebSocket and HTTP stream health for WaveCap-SDR channels. Use when debugging streaming issues, measuring latency or throughput, detecting packet loss, or verifying audio/spectrum delivery.
play-sound
Cross-platform audio feedback system for task completion and user prompts. Provides non-intrusive sound notifications to improve workflow awareness.
audio-transcribe
Convert audio or video to text using Whisper, with word-level timestamps. Use when the user wants to convert speech, audio, or video to text, transcribe audio, generate subtitles, or recognize speech.
finish-es-latam-audiobook
Finalize an ru→es_latam audiobook. Checks that ALL translations are ready, generates TTS, combines the audio, finds a background image, and creates the video. Use ONLY after claude-translation-pipeline-es_latam has finished.
polish-transcripts
Polish raw interview transcripts into searchable, well-structured markdown with metadata, dynamic headers, and full fidelity to technical content. Designed for Ray Peat interview podcasts and similar shows where technical accuracy is paramount.
mindwork-transcribe
Transcribe therapy session recordings to formatted text. Converts audio to clean, speaker-labeled transcripts (Me/Therapist format) with grammar correction and English translation. Use when processing therapy recordings, session audio, or any two-person conversation recording.
capture-health-check
End-to-end health check for WaveCap-SDR captures. Use when captures are stuck in the "starting" state, the spectrum analyzer is not updating, audio is not playing, or to verify that the system is working correctly.
gray-swan-mitm-general
Complete guide for Gray Swan MITM challenges across all waves. Includes AI agent profiling, defense bypass strategies, domain-specific handling, and platform troubleshooting. Use for ANY Gray Swan MITM challenge.
voice-processing
Voice cloning workflows, voice library management, audio format conversion, and voice settings. Use when cloning voices, managing voice libraries, processing audio for voice creation, configuring voice settings, or when user mentions voice cloning, instant cloning, professional cloning, voice library, audio processing, voice settings, or ElevenLabs voices.
unsloth-stt
Fine-tuning Speech-to-Text models like Whisper using Unsloth's optimized LoRA pipeline. Triggers: stt, whisper, transcription, audio fine-tuning, speech-to-text, audio normalization.
bjw-voice-modeling
Capture Bennett Waxse's writing voice for technical communication, LinkedIn posts, research discussions, and professional correspondence. Use when drafting content that should sound like Bennett, including: (1) Research paper discussions or LinkedIn posts about academic work, (2) Technical emails to collaborators, (3) Professional communications that need his specific tone and style, (4) Any writing where maintaining his authentic voice matters.
marketing-writer
Writes marketing content for product features by reading the codebase to understand the app's purpose, features, and value proposition. Uses casual, direct brand voice without buzzwords. Use when shipping features, creating landing page copy, writing launch emails, creating tweet threads, or any marketing content needs. Triggers include "write marketing copy", "create landing page section", "write a tweet thread", "launch email for X feature", or requests for product descriptions.
blog-writing
Write blog posts in Matt Palmer's formal voice. Use for tech blogs, company posts, tutorials, and professional content about AI-assisted development.
elevenlabs-agents
Use this skill when building AI voice agents with the ElevenLabs Agents Platform. This skill covers the complete platform including agent configuration (system prompts, turn-taking, workflows), voice & language features (multi-voice, pronunciation, speed control), knowledge base (RAG), tools (client/server/MCP/system), SDKs (React, JavaScript, React Native, Swift, Widget), Scribe (real-time STT), WebRTC/WebSocket connections, testing & evaluation, analytics, privacy/compliance (GDPR/HIPAA/SOC 2), cost optimization, CLI workflows ("agents as code"), and DevOps integration. Prevents 17+ common errors including package deprecation, Android audio cutoff, CSP violations, missing dynamic variables, case-sensitive tool names, webhook authentication failures, and WebRTC configuration issues. Provides production-tested templates for React, Next.js, React Native, Swift, and Cloudflare Workers. Token savings: ~73% (22k → 6k tokens). Production tested. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agent
audio-integration
Set up and manage ElevenLabs Text-to-Speech (TTS) audio for blog posts, projects, and pages. Use when enabling audio narration, generating voice samples, or configuring voice settings.
ai-multimodal
Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understanding images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, handling multiple images), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), generating images (text-to-image with Imagen 4, editing, composition, refinement), and generating videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (instead of Claude's default vision capabilities; fall back to Claude's vision only if needed), processing PDF documents, extracting structured data from media, creating images/videos from text pr
catskill-writer
Write Catskill Crew newsletter content in Michael's voice. Use when writing HAPPENINGS, REPORT, BULLETIN sections or assembling a complete newsletter edition.
blog-post-editor
Write new blog posts or edit existing ones to match the established writing voice and style guidelines.
audio-dsp-reviewer
Expert in digital signal processing for audio applications. Validates biquad filter implementations, frequency response calculations, and audio algorithms. Use when modifying audio-math.ts, implementing new filter types, or adding spectral analysis features.
ab-testing-statistician
Expert in statistical analysis for blind A/B and ABX audio testing. Validates randomization, calculates statistical significance, and ensures proper experimental design. Use when implementing A/B test features or analyzing test results.
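As a rough illustration of the kind of significance check the ab-testing-statistician entry describes, here is a minimal sketch of a one-sided exact binomial test for an ABX session. It is not taken from that skill: the function name `abx_p_value` is hypothetical, and the sketch assumes independent trials in which a listener who cannot hear a difference is right half the time.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value for an ABX test.

    Returns the probability of getting at least `correct` answers right
    out of `trials` by pure guessing (assumed success chance 0.5).
    The name and interface are illustrative, not part of any listed skill.
    """
    if trials < 0 or not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    # Count the outcomes with `correct` or more hits, then divide by
    # the 2**trials equally likely guess sequences.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials


if __name__ == "__main__":
    print(f"{abx_p_value(12, 16):.4f}")
```

Under these assumptions, 12 correct answers out of 16 trials gives a p-value of 2517/65536, about 0.038, which would pass a conventional 0.05 threshold.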