Audio Processing
transformers
This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.
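A minimal sketch of loading a pre-trained model through the pipeline API, assuming openai/whisper-small as an example checkpoint (any ASR checkpoint on the Hub works the same way):

```python
from transformers import pipeline

# Speech recognition via the high-level pipeline API.
# "openai/whisper-small" is an example checkpoint, not a requirement.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

result = asr("interview.wav", return_timestamps=True)
print(result["text"])    # full transcript
print(result["chunks"])  # per-segment timestamps
```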
content-creator
Create SEO-optimized marketing content with consistent brand voice. Includes brand voice analyzer, SEO optimizer, content frameworks, and social media templates. Use when writing blog posts, creating social media content, analyzing brand voice, optimizing SEO, planning content calendars, or when user mentions content creation, brand voice, SEO optimization, social media marketing, or content strategy.
digital-brain
This skill should be used when the user asks to "write a post", "check my voice", "look up contact", "prepare for meeting", "weekly review", "track goals", or mentions personal brand, content creation, network management, or voice consistency.
ai-multimodal
Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), and generating images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.
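A minimal sketch of the audio-analysis path, assuming the google-genai Python SDK and an interview.mp3 file on disk:

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Upload the file once, then reference it in the prompt.
audio = client.files.upload(file="interview.mp3")
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=["Transcribe this audio and include timestamps.", audio],
)
print(response.text)
```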
story-explanation
Create compelling story-format summaries using UltraThink to find the best narrative framing. Supports multiple formats - 3-part narrative, n-length with inline links, abridged 5-line, or comprehensive via Foundry MCP. USE WHEN user says 'create story explanation', 'narrative summary', 'explain as a story', or wants content in Daniel's conversational first-person voice.
brand-voice-consistency
Ensure all communication matches brand voice and tone guidelines. Use when creating marketing copy, customer communications, public-facing content, or when users mention brand voice, tone, or writing style.
Writing Like User
Emulate the user's personal writing voice and style patterns. Use when the user asks to write content in their voice, draft documents, compose messages, or requests "write this like me" or "in my style."
audiocraft-audio-generation
PyTorch library for audio generation including text-to-music (MusicGen) and text-to-sound (AudioGen). Use when you need to generate music from text descriptions, create sound effects, or perform melody-conditioned music generation.
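A minimal text-to-music sketch following the MusicGen API, assuming the facebook/musicgen-small checkpoint:

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # seconds of audio to generate

# One prompt in, one [channels, samples] tensor out.
wav = model.generate(["lo-fi hip hop with warm piano"])
audio_write("clip", wav[0].cpu(), model.sample_rate, strategy="loudness")
```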
openai-whisper-api
Transcribe audio via the OpenAI Audio Transcriptions API (Whisper).
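A minimal sketch with the official Python SDK, assuming a meeting.mp3 file and OPENAI_API_KEY in the environment:

```python
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

with open("meeting.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)
print(transcript.text)
```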
sag
ElevenLabs text-to-speech with a macOS-style say-command UX.
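A rough sketch of the underlying ElevenLabs call (the skill itself wraps this in a say-like CLI); the voice ID below is a placeholder to swap for one from your account:

```python
from elevenlabs.client import ElevenLabs
from elevenlabs import play  # playback helper; requires ffmpeg

client = ElevenLabs()  # reads ELEVENLABS_API_KEY from the environment
audio = client.text_to_speech.convert(
    voice_id="YOUR_VOICE_ID",  # placeholder: pick a voice from your account
    model_id="eleven_multilingual_v2",
    text="Build complete.",
)
play(audio)
```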
transcript-fixer
Corrects speech-to-text transcription errors in meeting notes, lectures, and interviews using dictionary rules and AI. Learns patterns to build personalized correction databases. Use when working with transcripts containing ASR/STT errors, homophones, or Chinese/English mixed content requiring cleanup.
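A hypothetical sketch of the dictionary-rule half of this approach; the actual skill learns its correction table over time, and these pairs are made up for illustration:

```python
import re

# Hypothetical correction table; the real skill builds this from learned patterns.
CORRECTIONS = {
    r"\bweather or not\b": "whether or not",  # homophone fix
    r"\bkubernetes\b": "Kubernetes",          # casing fix
}

def fix_transcript(text: str) -> str:
    for pattern, replacement in CORRECTIONS.items():
        # Case-insensitive matching; replacements use the canonical form.
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text

print(fix_transcript("I don't know weather or not kubernetes fits here."))
```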
speech
Use when implementing speech-to-text, live transcription, or audio transcription. Covers SpeechAnalyzer (iOS 26+), SpeechTranscriber, volatile/finalized results, AssetInventory model management, audio format handling.
avfoundation-ref
Reference for AVFoundation audio APIs: AVAudioSession categories/modes, AVAudioEngine pipelines, bit-perfect DAC output, iOS 26+ spatial audio capture, ASAF/APAC, and Audio Mix with the Cinematic framework.
openai-api
Build with OpenAI's stateless APIs - Chat Completions (GPT-5.2, GPT-5.1, o3, o4-mini), Realtime API (voice), Batch API (50% savings), Embeddings, Images (DALL-E 3), Audio (Whisper + TTS), and Moderation. Node.js SDK and fetch for Cloudflare Workers. Use when: implementing chat with GPT-5.2/5.1/o3/o4-mini, Realtime voice API (WebSocket), Batch API for cost savings, xhigh reasoning (GPT-5.2), streaming responses, function calling/tools, structured outputs, embeddings for RAG, DALL-E 3 images, Whisper transcription, TTS (13 voices), or troubleshooting rate limits (429), API keys (401), streaming errors.
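The skill documents the Node.js SDK, but the endpoints are the same from any client; here is a TTS sketch with the official Python SDK, for consistency with the other examples in this section:

```python
from openai import OpenAI

client = OpenAI()

# Text-to-speech, streamed straight to disk. "alloy" is one of the built-in voices.
with client.audio.speech.with_streaming_response.create(
    model="tts-1", voice="alloy", input="Your batch job finished."
) as response:
    response.stream_to_file("notify.mp3")
```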
elevenlabs-agents
Build conversational AI voice agents with ElevenLabs Platform using React, JavaScript, React Native, or Swift SDKs. Configure agents, tools (client/server/MCP), RAG knowledge bases, multi-voice, and Scribe real-time STT. Use when: building voice chat interfaces, implementing AI phone agents with Twilio, configuring agent workflows or tools, adding RAG knowledge bases, testing with CLI "agents as code", or troubleshooting deprecated @11labs packages, Android audio cutoff, CSP violations, dynamic variables, or WebRTC config. Keywords: ElevenLabs Agents, ElevenLabs voice agents, AI voice agents, conversational AI, @elevenlabs/react, @elevenlabs/client, @elevenlabs/react-native, @elevenlabs/elevenlabs-js, @elevenlabs/agents-cli, elevenlabs SDK, voice AI, TTS, text-to-speech, ASR, speech recognition, turn-taking model, WebRTC voice, WebSocket voice, ElevenLabs conversation, agent system prompt, agent tools, agent knowledge base, RAG voice agents, multi-voice agents, pronunciation dictionary, voice speed control, elevenlabs scribe, @11labs deprecated, Android audio cutoff, CSP violation elevenlabs, dynamic variables elevenlabs, case-sensitive tool names, webhook authentication, post-call webhook, webhook payload schema, ElevenLabs-Signature header, transcript null message, call_successful string, webhook cost credits USD, charging llm_price, user context extraction, llm_usage tokens, data_collection_results, evaluation_criteria_results, feedback thumb_rating, interrupted turn, source_medium, rag_retrieval_info, has_audio has_user_audio has_response_audio
tabz-guide
Progressive disclosure guide to TabzChrome capabilities. This skill should be used when users ask about profiles, terminal management, browser automation, MCP tools, audio/TTS notifications, integration, debugging, API, or setup. Provides on-demand help organized by topic with references to detailed documentation.
gemini-audio
Guide for implementing Google Gemini API audio capabilities - analyze audio with transcription, summarization, and understanding (up to 9.5 hours), plus generate speech with controllable TTS. Use when processing audio files, creating transcripts, analyzing speech/music/sounds, or generating natural speech from text.
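A sketch of the speech-generation side, assuming the google-genai SDK and the preview TTS model ID current at time of writing (check the docs, since preview model names change):

```python
import wave
from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",  # assumption: preview model ID may change
    contents="Say cheerfully: the deploy succeeded!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
    ),
)

# The API returns raw 24 kHz 16-bit mono PCM; wrap it in a WAV container.
pcm = response.candidates[0].content.parts[0].inline_data.data
with wave.open("speech.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(24000)
    f.writeframes(pcm)
```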
transcribe-audio
Transcribes video audio using WhisperX, preserving original timestamps. Creates JSON transcript with word-level timing. Use when you need to generate audio transcripts for videos.
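A minimal sketch of the WhisperX two-pass flow this skill relies on (transcribe, then align for word-level timing):

```python
import whisperx

device = "cuda"  # or "cpu"
model = whisperx.load_model("large-v2", device, compute_type="float16")

audio = whisperx.load_audio("lecture.mp4")  # ffmpeg extracts the audio track
result = model.transcribe(audio, batch_size=16)

# Second pass: forced alignment for word-level timestamps.
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=device
)
result = whisperx.align(result["segments"], align_model, metadata, audio, device)
print(result["segments"][0]["words"][:5])
```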
human-writing
Write content that sounds natural, conversational, and authentically human - avoiding AI-generated patterns, corporate speak, and generic phrasing.