# consolidate-transcripts
Consolidate transcripts from a channel into a single file, sorted by date (newest first), up to 800K tokens. Use when preparing transcripts for LLM context or bulk analysis.
## Install

```bash
git clone https://github.com/dparedesi/YTScribe /tmp/YTScribe && cp -r /tmp/YTScribe/.agent/skills/consolidate-transcripts ~/.claude/skills/YTScribe/
```

Run this command in your terminal to install the skill.
## SKILL.md

```yaml
name: consolidate-transcripts
description: Consolidate transcripts from a channel into a single file, sorted by date (newest first), up to 800K tokens. Use when preparing transcripts for LLM context or bulk analysis.
```
# Consolidate Transcripts
Why? LLMs have context limits. This skill merges multiple transcripts into a single file with accurate token counting, so you can feed an entire channel's content to Claude or GPT without exceeding limits.
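The core loop is simple to picture. Here is a minimal sketch of the idea, assuming date-prefixed markdown filenames so a reverse filename sort yields newest-first order; the real implementation lives in `scripts/consolidate_transcripts.py` and may differ in its details:

```python
from pathlib import Path

import tiktoken


def consolidate(channel_dir: str, limit: int = 800_000) -> str:
    """Concatenate transcripts, newest first, until the token budget is hit."""
    enc = tiktoken.get_encoding("cl100k_base")  # encoding noted under Reference
    parts, used = [], 0
    # Assumption: date-prefixed filenames sort chronologically, so a
    # reverse sort approximates newest-first order.
    for path in sorted(Path(channel_dir).glob("*.md"), reverse=True):
        if path.name.endswith("-consolidated.md"):
            continue  # skip the output of a previous run
        text = path.read_text(encoding="utf-8")
        n = len(enc.encode(text))
        if used + n > limit:
            break  # stop before the budget is exceeded
        parts.append(text)
        used += n
    return "\n\n---\n\n".join(parts)
```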
## Quick Start

```bash
python scripts/consolidate_transcripts.py <channel_name>
```

Output: `data/<channel_name>/<channel_name>-consolidated.md`
## Workflow

### 1. Identify the Channel

List available channels:

```bash
ls data/
```
### 2. Choose Token Limit

| Use Case | Recommended Limit | Flag |
|---|---|---|
| Claude (200K context) | 150,000 | `--limit 150000` |
| GPT-4 Turbo (128K) | 100,000 | `--limit 100000` |
| Full archive (Claude Pro) | 800,000 | (default) |
| Quick sample | 50,000 | `--limit 50000` |
> [!TIP]
> The default 800K limit leaves ~200K tokens for prompts and responses when using Claude's 1M context.
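The arithmetic behind that tip, as a quick sanity check (context sizes are the ones assumed in the table above):

```python
# Context sizes as assumed in the limit table and tip above.
context_window = 1_000_000      # Claude's 1M-token context
consolidated_limit = 800_000    # the default --limit
headroom = context_window - consolidated_limit
print(f"Left for prompts and responses: {headroom:,} tokens")  # 200,000
```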
### 3. Run Consolidation

```bash
python scripts/consolidate_transcripts.py <channel_name> [--limit TOKENS] [--verbose]
```

Examples:

```bash
# Default (800K tokens)
python scripts/consolidate_transcripts.py library-of-minds

# Custom limit for GPT-4
python scripts/consolidate_transcripts.py aws-reinvent-2025 --limit 100000

# Verbose output showing all included files
python scripts/consolidate_transcripts.py dwarkesh-patel --verbose
```
### 4. Verify Output

Check that the consolidated file was created:

```bash
ls -la data/<channel_name>/*-consolidated.md
```
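Beyond confirming the file exists, you can spot-check its token count against your target context window. A hypothetical check using the `library-of-minds` example channel from step 3 (substitute your own channel name):

```python
from pathlib import Path

import tiktoken

# Path follows the output pattern from Quick Start; adjust as needed.
path = Path("data/library-of-minds/library-of-minds-consolidated.md")
enc = tiktoken.get_encoding("cl100k_base")
tokens = len(enc.encode(path.read_text(encoding="utf-8")))
print(f"{path.name}: {tokens:,} tokens")
```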
## Parameters

| Option | Description | Default |
|---|---|---|
| `channel_name` | Folder name in `data/` | Required |
| `--limit`, `-l` | Maximum tokens to include | 800000 |
| `--verbose`, `-v` | Show detailed file list | False |
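For reference, a minimal `argparse` setup that matches the documented interface; the actual wiring inside `scripts/consolidate_transcripts.py` may differ:

```python
import argparse

parser = argparse.ArgumentParser(
    description="Consolidate a channel's transcripts into one file")
parser.add_argument("channel_name", help="Folder name in data/")
parser.add_argument("--limit", "-l", type=int, default=800_000,
                    help="Maximum tokens to include")
parser.add_argument("--verbose", "-v", action="store_true",
                    help="Show detailed file list")
args = parser.parse_args()
```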
## Output Format

The consolidated file includes:

- **Header** — Generation metadata, total transcripts, token/word counts (see the sketch below)
- **Table of Contents** — Dates, titles, tokens, words per transcript
- **Transcripts** — Full text with title, date, author, source URL
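For a concrete picture, here is one way the header and table of contents might be assembled; the field names and layout are assumptions for illustration, not the script's exact output format:

```python
from datetime import date


def build_header(entries: list[dict], total_tokens: int, total_words: int) -> str:
    """Assemble the header and table of contents as markdown."""
    lines = [
        f"# Consolidated Transcripts ({date.today().isoformat()})",
        f"Transcripts: {len(entries)} | Tokens: {total_tokens:,} | Words: {total_words:,}",
        "",
        "## Table of Contents",
    ]
    for e in entries:
        lines.append(
            f"- {e['date']} | {e['title']} ({e['tokens']:,} tokens, {e['words']:,} words)"
        )
    return "\n".join(lines)
```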
## Troubleshooting

| Problem | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: tiktoken` | tiktoken not installed | `pip install tiktoken` |
| No transcripts found | Empty transcripts folder | Run `transcript-download` first |
| `FileNotFoundError` | Channel doesn't exist | Check `ls data/` for valid names |
| Output file is small | Few transcripts available | Use `--verbose` to see what was included |
| Token count seems wrong | Old tiktoken version | `pip install --upgrade tiktoken` |
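A quick diagnostic for the two tiktoken rows, confirming the package imports and reporting the installed version:

```python
from importlib.metadata import version

import tiktoken  # raises ModuleNotFoundError if the package is missing

print("tiktoken", version("tiktoken"))
```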
## Common Mistakes

- **Wrong channel name** — Use the folder name exactly as shown in `ls data/`, not the YouTube channel name.
- **Forgetting to download transcripts first** — Consolidation requires transcripts to exist. Run `/download-transcripts` first.
- **Using too high a limit** — If you exceed your LLM's context, you'll get truncation errors. Use the limit guide above.
- **Expecting real-time updates** — Re-run consolidation after downloading new transcripts.
## Reference

- Transcripts are sorted newest first (descending by date)
- Files without dates in the filename are placed last (see the sketch below)
- Token counting uses `cl100k_base` encoding (GPT-4/Claude compatible)
- Consolidated files are gitignored (not committed)
- Re-running overwrites the previous consolidated file
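The first two bullets describe an ordering rule that is easy to picture in code. A sketch under the assumption that dated files carry a `YYYY-MM-DD` stamp somewhere in their filenames:

```python
import re
from pathlib import Path

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def order_transcripts(folder: str) -> list[Path]:
    """Dated files newest first, then files without a date in the name."""
    files = list(Path(folder).glob("*.md"))
    dated = [p for p in files if DATE_RE.search(p.name)]
    undated = [p for p in files if not DATE_RE.search(p.name)]
    dated.sort(key=lambda p: DATE_RE.search(p.name).group(0), reverse=True)
    return dated + undated  # dateless files placed last
```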
## Repository

- Source: `dparedesi/YTScribe/.agent/skills/consolidate-transcripts`
- Author: dparedesi