Unnamed Skill
Fine-tuning Text-to-Speech (TTS) models with Unsloth for voice cloning and synthetic speech (triggers: TTS, text-to-speech, voice cloning, Orpheus-TTS, audio fine-tuning, speech synthesis).
$ Installer
git clone https://github.com/majiayu000/claude-skill-registry /tmp/claude-skill-registry && cp -r /tmp/claude-skill-registry/skills/development/unsloth-tts ~/.claude/skills/claude-skill-registry// tip: Run this command in your terminal to install the skill
SKILL.md
name: unsloth-tts description: Fine-tuning Text-to-Speech (TTS) models with Unsloth for voice cloning and synthetic speech (triggers: TTS, text-to-speech, voice cloning, Orpheus-TTS, audio fine-tuning, speech synthesis).
Overview
Unsloth-tts brings the library's performance optimizations to speech synthesis models. Specifically optimized for the Llama-based Orpheus-TTS architecture, it enables high-quality voice cloning with 50% less memory than standard implementations.
When to Use
- When creating custom voice clones that require realistic phrasing and emotional quirks.
- When fine-tuning Orpheus-TTS for specialized synthesis (e.g., laughter, sighs).
- When deploying TTS models via llama.cpp or GGUF.
Decision Tree
- Is high audio quality the priority over VRAM?
- Yes: Load with
load_in_4bit = False.
- Yes: Load with
- Does the target voice have specific quirks (e.g., laughter)?
- Yes: Annotate transcripts with emotion tags like
<laughs>.
- Yes: Annotate transcripts with emotion tags like
- Is this a quick test or full replication?
- Quick test: Set
max_steps = 60. - Full replication: Train for 1 full epoch with 1-3 hours of data.
- Quick test: Set
Workflows
- Voice Cloning with LoRA: Load
orpheus-3b-0.1-pretrained, prepare 1-3 hours of annotated audio/text data, and apply LoRA adapters. - Audio Data Preprocessing: Cast audio columns to 24,000Hz (for Orpheus) and tokenize text while preserving special tags like
<sigh>. - Training and Evaluation: Train with low batch size (1) and high learning rate (2e-4), saving only the LoRA adapters for portability.
Non-Obvious Insights
- Zero-shot voice cloning often fails to capture subtle phrasing; LoRA fine-tuning is necessary for true personality replication.
- Orpheus-TTS is highly beginner-friendly because its Llama-based architecture is compatible with existing LLM tools like GGUF.
- Unlike some TTS systems, models like Orpheus can decode audio output tokens directly into waveforms without needing a separate vocoder.
Evidence
- "Unsloth supports any transformers compatible TTS model... 1.5x faster with 50% less memory than other implementations." Source
- "Orpheus supports tags like , , ... These tags are enclosed in angle brackets and will be treated as special tokens." Source
Scripts
scripts/unsloth-tts_tool.py: Audio preprocessing and Orpheus loading.scripts/unsloth-tts_tool.js: Utility for managing transcript tagging.
Dependencies
unslothlibrosa/soundfiledatasets
References
Repository

majiayu000
Author
majiayu000/claude-skill-registry/skills/development/unsloth-tts
0
Stars
0
Forks
Updated6h ago
Added1w ago