Audio Processing
357 skills in Content & Media > Audio Processing
writing-enhancer
Rephrase or completely rewrite content to match the user's preferred tone, voice, and style.
applying-brand-guidelines
Apply brand voice, tone, and style guidelines to written content across platforms. Use this skill when writing or editing content that needs to reflect a specific brand identity, adapting content for LinkedIn, Substack, X (Twitter), or other platforms while maintaining brand consistency. Triggers include requests to write on-brand content, apply brand voice, adapt content for platforms, or review content for brand alignment.
openai
OpenAI API via curl. Use this skill for GPT chat completions, DALL-E image generation, Whisper audio transcription, embeddings, and text-to-speech.
register-twilio-test-audio
Use when adding new test audio files for Twilio voice calls, uploading audio to S3, or updating the twilio_place_call.py script with new audio options.
research-to-essay
Research-driven essay and post creation with thematic synthesis, citation management, and voice calibration. Use when creating Substack/LinkedIn posts, long-form essays synthesizing multiple sources, or publication-grade writing requiring web search, narrative arc, and proper attribution. Triggers include "research and write about [topic]" or "dig into this idea and write."
writing-linkedin-posts
Create engaging, authentic LinkedIn posts like a Top Voice. Use this skill when asked to write LinkedIn content, social media posts for LinkedIn, professional thought leadership content, or help with LinkedIn engagement strategy. Triggers include requests for LinkedIn posts, professional social content, thought leadership pieces, or viral/engaging LinkedIn content.
prose-polish
Evaluate and elevate writing effectiveness through multi-dimensional quality assessment. Analyzes craft, coherence, authority, purpose, and voice with genre-calibrated thresholds. Use for refining drafts, diagnosing quality issues, generating quality content, or teaching writing principles.
discovery-interviews-surveys
Use when validating product assumptions before building, discovering unmet user needs, understanding customer problems and workflows, testing concepts or positioning, researching target markets, identifying jobs-to-be-done and hiring triggers, uncovering pain points and workarounds, or when users mention user research, customer interviews, surveys, discovery interviews, validation studies, or voice of customer.
brand-guidelines
Create a BRAND_GUIDELINES.md that defines how to communicate with your customer. Requires CUSTOMER.md to exist first. Covers voice, tone, language rules, messaging framework, and copy patterns.
ai-multimodal
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.
payment-integration
Payment gateway integration. Providers: SePay (Vietnamese: VietQR, bank transfer, cards), Polar (global SaaS: subscriptions, usage-based billing). SDKs: Node.js, PHP, Python, Go, Laravel, Next.js. Capabilities: checkout flows, subscription management, webhooks, QR code generation, benefit automation, tax compliance. Actions: integrate, implement, configure, handle payments/subscriptions/webhooks. Keywords: payment gateway, SePay, Polar, VietQR, bank transfer, subscription, usage-based billing, checkout, webhook, QR code, API key, OAuth2, product management, customer portal, tax compliance, MoR, recurring payment, invoice. Use when: integrating payment processing, implementing checkout, managing subscriptions, handling payment webhooks, generating payment QR codes, building billing systems.
agent-orchestrator
Spawn, monitor, and manage Claude Code agents in parallel tmux sessions. Supports simple ad-hoc agents and complex DAG-based multi-agent orchestration with wave execution.
story-explanation
Create compelling story-format summaries using UltraThink to find the best narrative framing. Supports multiple formats: 3-part narrative, n-length with inline links, abridged 5-line, or comprehensive via Foundry MCP. Use when the user says 'create story explanation', 'narrative summary', 'explain as a story', or wants content in Daniel's conversational first-person voice.
ui-audio-theme
Generate cohesive UI audio themes with subtle, minimal sound effects for applications. This skill should be used when users want to create a set of coordinated interface sounds for wallet apps, dashboards, or web applications - generating sounds mapped to UI interaction constants like button clicks, notifications, and navigation transitions using ElevenLabs API.
brand-voice
Defines and maintains consistent brand communication across all marketing materials. This skill should be used when creating new marketing content, auditing existing materials for voice consistency, onboarding team members to brand guidelines, or when content sounds generic or "off-brand."
content-creator
Create SEO-optimized marketing content with consistent brand voice. Includes brand voice analyzer, SEO optimizer, content frameworks, and social media templates. Use when writing blog posts, creating social media content, analyzing brand voice, optimizing SEO, planning content calendars, or when user mentions content creation, brand voice, SEO optimization, social media marketing, or content strategy.
Vram-GPU-OOM
GPU VRAM management patterns for sharing memory across services (Ollama, Whisper, ComfyUI). OOM retry logic, auto-unload on idle, and service signaling protocol.
ai-multimodal
Process and generate multimedia content using the Google Gemini API for better vision capabilities. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis, up to 9.5 hours), understanding images (better image analysis than Claude models, captioning, reasoning, object detection, design extraction, OCR, visual Q&A, segmentation, multiple-image handling), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting from documents (PDF tables, forms, charts, diagrams, multi-page), generating images (text-to-image with Imagen 4, editing, composition, refinement), and generating videos (text-to-video with Veo 3, 8-second clips with native audio). Use when working with audio/video files, analyzing images or screenshots (in place of Claude's default vision capabilities, falling back to them only if needed), processing PDF documents, extracting structured data from media, creating images/videos from text prompts, or implementing multimodal AI features. Supports Gemini 3/2.5, Imagen 4, and Veo 3 models with context windows up to 2M tokens.
Whisper-Transcription
Audio transcription using a local whisper.cpp server with CUDA acceleration. Exposes an HTTP API for speech-to-text conversion.
content-atomizer
Repurposes single content pieces into multiple formats for maximum distribution while maintaining brand voice. This skill should be used when maximizing ROI from pillar content, filling content calendars efficiently, reaching audiences across multiple platforms, or when creating original content for every channel feels unsustainable.