Audio Processing

357 skills in Content and Media > Audio Processing

gencast

Marketplace

Auto-invoke gencast to generate podcasts from documents when user mentions podcast, audio, or dialogue generation

cadrianmae/claude-marketplace

Updated 6d ago

openai-whisper-api

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

majiayu000/claude-skill-registry

Updated 6d ago

unsloth-tts

Fine-tuning Text-to-Speech (TTS) models with Unsloth for voice cloning and synthetic speech (triggers: TTS, text-to-speech, voice cloning, Orpheus-TTS, audio fine-tuning, speech synthesis).

majiayu000/claude-skill-registry

Updated 6d ago

brand-guidelines

Create and maintain brand guidelines including visual identity, voice and tone, and usage rules. Use for establishing brand standards, style guides, and ensuring brand consistency across materials.

vamseeachanta/workspace-hub

Updated 6d ago

yarb-branding

Apply yarb brand voice and tone when writing UI copy, documentation, error messages, or any user-facing text. Use this skill when asked to "write copy", "update messaging", "make it sound like yarb", "apply branding", or when creating new UI components.

AhamSammich/yarb

Updated 6d ago

whisper-lolo-audio-ingest

Build or modify the browser-side recording and upload pipeline for whisper-lolo. Use when implementing MediaRecorder + IndexedDB chunking, assembling audio blobs, or configuring Vercel Blob client uploads with progress and callbacks.

Lofp34/whisper-lolo

Updated 6d ago

replicate

Marketplace

Runs open-source ML models via Replicate API for image generation, LLMs, and audio. Use when calling Stable Diffusion, Llama, Whisper, or other models without infrastructure management.

mgd34msu/goodvibes-plugin

Updated 6d ago

glsl-shader

Create audio-reactive GLSL visualizers for Bice-Box. Provides templates, audio uniforms (iRMSOutput, iRMSInput, iAudioTexture), coordinate patterns, and common shader functions.

majiayu000/claude-skill-registry

Updated 6d ago

voiceover

Generates audio narration from a text file using Chatterbox TTS. Use when the user wants to generate voiceover/audio from ANY text file.

kedbin/chatterbox-skills

Updated 6d ago

parakeet

Convert audio files to text using parakeet-mlx, NVIDIA's Parakeet automatic speech recognition model optimized for Apple's MLX framework. Run via uvx for on-device speech-to-text processing with high-quality timestamped transcriptions. Ideal for podcasts, interviews, meetings, and other audio content. This skill is triggered when the user says things like "transcribe this audio", "convert audio to text", "transcribe this podcast", "get text from this recording", "speech to text", or "transcribe this wav/mp3/m4a file".

majiayu000/claude-skill-registry

Updated 6d ago

tts

Implement text-to-speech (TTS) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to convert text into natural-sounding speech, create audio content, build voice-enabled applications, or generate spoken audio files. Supports multiple voices, adjustable speed, and various audio formats.

albertfast/radar_tinder

Updated 6d ago

audio-injection-testing

Test Bob The Skull with virtual audio injection instead of speaking. Use when testing wake word detection, STT accuracy, full conversation pipeline, or automated testing. Covers setup, configuration, injection methods, and troubleshooting.

majiayu000/claude-skill-registry

Updated 6d ago

wavecap-export

Export WaveCap transcriptions and pager data. Use when the user wants to export transcriptions as JSON, download reviewed transcriptions with audio, or export pager feed data.

majiayu000/claude-skill-registry

Updated 6d ago

gemini-audio

Guide for implementing Google Gemini API audio capabilities - analyze audio with transcription, summarization, and understanding (up to 9.5 hours), plus generate speech with controllable TTS. Use when processing audio files, creating transcripts, analyzing speech/music/sounds, or generating natural speech from text.

majiayu000/claude-skill-registry

Updated 6d ago

whisper-transcribe

Transcribes audio and video files to text using OpenAI's Whisper CLI with contextual grounding. Converts audio/video to text, transcribes recordings, and creates transcripts from media files. Use when asked to "whisper transcribe", "transcribe audio", "convert recording to text", or "speech to text". Uses markdown files in the same directory as context to improve transcription accuracy for technical terms, proper nouns, and domain-specific vocabulary.

majiayu000/claude-skill-registry

Updated 6d ago

rtsafetyauditor

Analyze C++ code for real-time safety violations including heap allocations, locks, blocking calls, and unbounded operations in audio threads.

chrislyons/orpheus-sdk

Updated 6d ago

ai-multimodal

Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis, up to 9.5 hours), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), generating images (text-to-image with Imagen 4, editing, composition, refinement), and generating videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images/videos from text prompts, or implementing multimodal AI features. Supports Gemini 3/2.5, Imagen 4, and Veo 3 models with context windows up to 2M tokens.

nb150301/voice-training-app

Updated 6d ago

piper

Convert text to speech using Piper, a fast, local, neural text-to-speech system with natural sounding voices. This skill is triggered when the user says things like "convert text to speech", "text to audio", "read this aloud", "create audio from text", "generate speech from text", "make an audio file from this text", or "use piper TTS".

majiayu000/claude-skill-registry

Updated 6d ago

realitykit-ar-companion

Comprehensive RealityKit skill optimized for building AR companion experiences on iOS and visionOS, with character animation, body/hand tracking, AI integration patterns, spatial audio, and entity lifecycle management.

domocarroll/wayfinding-companion

Updated 6d ago

wavecap-evaluate

Evaluate WaveCap audio analysis and transcription accuracy. Use when the user wants to run regression tests, compare transcriptions against ground truth, calculate WER/CER metrics, or assess overall system quality.

TobiasWooldridge/WaveCap

Updated 6d ago