
doc-splitter

Split large documentation (10K+ pages) into focused sub-skills with intelligent routing. Use for massive doc sites like Godot, AWS, or MSDN. Use when relevant to the task.

Install

git clone https://github.com/jmagly/ai-writing-guide /tmp/ai-writing-guide && cp -r /tmp/ai-writing-guide/.factory/skills/doc-splitter ~/.claude/skills/ai-writing-guide




Documentation Splitter Skill

Purpose

Single responsibility: Split large documentation sites into multiple focused sub-skills with an optional router skill for intelligent navigation. (BP-4)

Grounding Checkpoint (Archetype 1 Mitigation)

Before executing, VERIFY:

  • Total page count is known (run estimation first)
  • Documentation categories are identifiable
  • Target skill size is determined (default: 5,000 pages per skill)
  • Router strategy is selected (category, size, or hybrid)

DO NOT split without understanding documentation structure.

Uncertainty Escalation (Archetype 2 Mitigation)

ASK USER instead of guessing when:

  • Category boundaries are unclear
  • The optimal skill size for the target use case is uncertain
  • Cross-references between sections complicate splitting
  • A router vs. flat structure decision is needed

NEVER split arbitrarily; seek user guidance on boundaries.

Context Scope (Archetype 3 Mitigation)

| Context Type | Included | Excluded |
|--------------|----------|----------|
| RELEVANT | Doc structure, categories, page counts | Actual page content |
| PERIPHERAL | Similar large doc examples | Other documentation |
| DISTRACTOR | Content quality concerns | Individual page issues |

Size Guidelines

| Documentation Size | Recommendation | Strategy |
|--------------------|----------------|----------|
| < 5,000 pages | One skill | No splitting |
| 5,000 - 10,000 pages | Consider splitting | Category-based |
| 10,000 - 30,000 pages | Recommended | Router + Categories |
| 30,000+ pages | Strongly recommended | Router + Categories |
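Read as code, the table is a threshold check on the page count. A minimal sketch, assuming each band's lower bound is inclusive (the table lists 10,000 and 30,000 in two rows each, so this is a guess at the intent); `recommend_split` is a hypothetical helper, not a skill-seekers command:

```shell
# Hypothetical helper: map a page count onto the Size Guidelines bands.
# Assumption: each band's lower bound is inclusive.
recommend_split() {
  pages=$1
  if [ "$pages" -lt 5000 ]; then
    echo "One skill: no splitting"
  elif [ "$pages" -lt 10000 ]; then
    echo "Consider splitting: category-based"
  elif [ "$pages" -lt 30000 ]; then
    echo "Recommended: router + categories"
  else
    echo "Strongly recommended: router + categories"
  fi
}

recommend_split 32000   # prints: Strongly recommended: router + categories
```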

Workflow Steps

Step 1: Estimate Documentation Size (Grounding)

# Quick estimation with skill-seekers
skill-seekers estimate configs/large-docs.json

# Output:
# 📊 ESTIMATION RESULTS
# ✅ Pages Discovered: 28,450
# 📈 Estimated Total: 32,000
# ⏱️  Time Elapsed: 2.1 minutes
# 💡 Recommended: Split into 6-7 sub-skills
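The "6-7" recommendation is consistent with dividing each page count by the default 5,000-page target and rounding up: 28,450 discovered pages give 6 sub-skills and the 32,000 estimate gives 7. That derivation is an inference, not documented skill-seekers behaviour. A minimal sketch with a hypothetical `subskill_count` helper:

```shell
# Hypothetical helper: ceil(pages / target) using integer arithmetic.
# The target defaults to the 5,000-pages-per-skill default from the checkpoint list.
subskill_count() {
  pages=$1
  target=${2:-5000}
  # Adding (target - 1) before dividing rounds the quotient up.
  echo $(( (pages + target - 1) / target ))
}

subskill_count 28450   # 28,450 / 5,000 = 5.69, rounded up: prints 6
subskill_count 32000   # 32,000 / 5,000 = 6.4, rounded up: prints 7
```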

Step 2: Analyze Category Structure

# Identify natural category boundaries
skill-seekers analyze --config configs/large-docs.json --categories

# Output:
# Categories detected:
# - scripting: 8,200 pages
# - 2d: 5,400 pages
# - 3d: 9,100 pages
# - physics: 4,300 pages
# - networking: 2,800 pages
# - editor: 2,200 pages
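Before choosing a strategy, it helps to confirm that the detected categories account for the whole estimate, because any shortfall shows up later as orphan pages. In the sample output the six categories sum to exactly 32,000, matching the Step 1 estimate. A minimal sketch, assuming `name count` lines copied by hand from the analysis output; `check_coverage` is hypothetical:

```shell
# Hypothetical helper: sum "name pages" lines from stdin and compare with the estimate.
check_coverage() {
  estimate=$1
  total=0
  while read -r name pages; do
    [ -z "$name" ] && continue   # ignore blank lines
    total=$((total + pages))
  done
  echo "categorized=$total estimate=$estimate uncategorized=$((estimate - total))"
}

# Category counts from the sample output above; 32000 is the Step 1 estimate.
check_coverage 32000 <<'EOF'
scripting 8200
2d 5400
3d 9100
physics 4300
networking 2800
editor 2200
EOF
# prints: categorized=32000 estimate=32000 uncategorized=0
```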

Step 3: Choose Split Strategy

| Strategy | Best For | Description |
|----------|----------|-------------|
| category | Clear topic divisions | Split by documentation sections |
| size | Uniform distribution | Split every N pages |
| router | User navigation | Hub skill + specialized sub-skills |
| hybrid | Complex docs | Categories + size limits per category |
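One way to picture the hybrid strategy: keep the category boundaries, then cut any category larger than the per-skill target into roughly equal numbered parts. A minimal sketch; the `<prefix>-<category>-<n>` naming and the `hybrid_plan` helper are hypothetical, and skill-seekers may size parts differently:

```shell
# Hypothetical helper: split each "name pages" category into
# ceil(pages / target) numbered parts of roughly equal size.
hybrid_plan() {
  prefix=$1
  target=$2
  while read -r name pages; do
    [ -z "$name" ] && continue
    parts=$(( (pages + target - 1) / target ))
    if [ "$parts" -le 1 ]; then
      echo "$prefix-$name ($pages pages)"
    else
      per_part=$(( (pages + parts - 1) / parts ))   # rounded up
      i=1
      while [ "$i" -le "$parts" ]; do
        echo "$prefix-$name-$i (~$per_part pages)"
        i=$((i + 1))
      done
    fi
  done
}

printf '%s\n' 'scripting 8200' 'physics 4300' | hybrid_plan godot 5000
# prints:
# godot-scripting-1 (~4100 pages)
# godot-scripting-2 (~4100 pages)
# godot-physics (4300 pages)
```

Fed all six categories from Step 2 with a 5,000-page target, this produces nine sub-skills (scripting, 2d, and 3d each split in two), more than the 6-7 suggested in Step 1; raising the target or merging small categories such as networking and editor brings the count down.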

Step 4: Execute Split

Option A: With skill-seekers

# Category-based split
skill-seekers split --config configs/godot.json --strategy category

# Router-based split (recommended for large docs)
skill-seekers split --config configs/godot.json --strategy router

# Size-based split
skill-seekers split --config configs/godot.json --strategy size --pages-per-skill 5000

Option B: Manual split configuration

{
  "name": "godot",
  "max_pages": 40000,
  "split_strategy": "router",
  "split_config": {
    "target_pages_per_skill": 5000,
    "create_router": true,
    "categories": {
      "scripting": {
        "patterns": ["/scripting/", "/gdscript/", "/c_sharp/"],
        "max_pages": 9000
      },
      "2d": {
        "patterns": ["/2d/", "/sprite/", "/tilemap/"],
        "max_pages": 6000
      },
      "3d": {
        "patterns": ["/3d/", "/mesh/", "/spatial/"],
        "max_pages": 10000
      },
      "physics": {
        "patterns": ["/physics/", "/collision/", "/rigidbody/"],
        "max_pages": 5000
      }
    }
  }
}
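Each `patterns` list is a set of URL substrings that assigns a page to a category. A minimal sketch of that assignment, assuming first-match-wins in the order the categories are listed and an `other` catch-all for pages no pattern covers; neither rule is confirmed skill-seekers behaviour, and `categorize_url` is hypothetical:

```shell
# Hypothetical helper: assign a URL path to the first category whose pattern it contains.
# Patterns mirror the config above; order and the "other" catch-all are assumptions.
categorize_url() {
  case "$1" in
    */scripting/*|*/gdscript/*|*/c_sharp/*) echo scripting ;;
    */2d/*|*/sprite/*|*/tilemap/*) echo 2d ;;
    */3d/*|*/mesh/*|*/spatial/*) echo 3d ;;
    */physics/*|*/collision/*|*/rigidbody/*) echo physics ;;
    *) echo other ;;   # catch-all for orphan pages
  esac
}

categorize_url /tutorials/2d/sprite_animation.html                # prints: 2d
categorize_url /tutorials/networking/high_level_multiplayer.html  # prints: other
```

Because this sample config defines only four categories, networking pages fall through to `other`; the networking and editor categories from Step 2 would need their own entries or a catch-all, as the Troubleshooting table suggests.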

Step 5: Scrape Sub-Skills

# Scrape all sub-skills in parallel
for config in configs/godot-*.json; do
  skill-seekers scrape --config "$config" &
done
wait

# Or sequentially with progress
for config in configs/godot-*.json; do
  echo "Processing: $config"
  skill-seekers scrape --config "$config"
done
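The parallel loop ends with a bare `wait`, which returns 0 even when a job fails, yet the Recovery Protocol starts by noting which sub-skill failed. A minimal sketch that records failures, assuming `mktemp` is available; `scrape_one` and `scrape_all` are hypothetical wrappers around the `skill-seekers scrape --config` call shown above:

```shell
# Hypothetical wrapper: run the scrape command for one sub-skill config.
scrape_one() {
  skill-seekers scrape --config "$1"
}

# Hypothetical helper: scrape all configs in parallel, list failures, return 1 if any.
scrape_all() {
  failures=$(mktemp)
  for config in "$@"; do
    # Each background job appends its config path if its scrape exits non-zero.
    ( scrape_one "$config" || echo "$config" >> "$failures" ) &
  done
  wait   # a bare wait reports success regardless, hence the failures file
  if [ -s "$failures" ]; then
    echo "Failed sub-skills:"
    cat "$failures"
    rm -f "$failures"
    return 1
  fi
  rm -f "$failures"
  echo "All sub-skills scraped."
}

# Usage: scrape_all configs/godot-*.json
```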

Step 6: Generate Router Skill

# Auto-generate router from sub-skills
skill-seekers generate-router configs/godot-*.json

# Creates godot-router skill that intelligently routes queries
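To illustrate the output format only, the `## Sub-Skills` table shown under Router Skill Structure can be rendered from `topic|skill|pages` lines. This is a hypothetical sketch (`render_subskills_table` is not a skill-seekers command) and says nothing about how `generate-router` actually derives topics:

```shell
# Hypothetical helper: render a router's Sub-Skills table from "topic|skill|pages" lines.
render_subskills_table() {
  echo "## Sub-Skills"
  echo
  echo "| Topic | Skill | Coverage |"
  echo "|-------|-------|----------|"
  while IFS='|' read -r topic skill pages; do
    [ -z "$skill" ] && continue   # skip blank or malformed lines
    echo "| $topic | $skill | $pages pages |"
  done
}

render_subskills_table <<'EOF'
GDScript, C#, scripting patterns|godot-scripting|8,200
Physics, collisions, rigid bodies|godot-physics|4,300
EOF
```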

Step 7: Validate Split Results

# Check sub-skill sizes
for dir in output/godot-*/; do
  echo "$dir: $(find "$dir" -name "*.md" | wc -l) files"
done

# Verify router coverage
grep -A 50 "## Sub-Skills" output/godot-router/SKILL.md
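The size check above prints counts but flags nothing. A minimal sketch that also fails on a missing SKILL.md or an oversized sub-skill, assuming one Markdown file is roughly one page (an approximation, since reference files may bundle several pages); `validate_subskills` is hypothetical:

```shell
# Hypothetical helper: validate_subskills <limit> <dir>...
# Returns 1 if any directory lacks SKILL.md or holds more .md files than <limit>.
validate_subskills() {
  limit=$1
  shift
  rc=0
  for dir in "$@"; do
    # Count Markdown files; tr strips the padding some wc implementations add.
    count=$(find "$dir" -name '*.md' | wc -l | tr -d ' ')
    if [ ! -f "$dir/SKILL.md" ]; then
      echo "MISSING SKILL.md: $dir"
      rc=1
    elif [ "$count" -gt "$limit" ]; then
      echo "OVER LIMIT: $dir ($count files)"
      rc=1
    else
      echo "ok: $dir ($count files)"
    fi
  done
  return "$rc"
}

# Usage: validate_subskills 5000 output/godot-*/
```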

Recovery Protocol (Archetype 4 Mitigation)

On error:

  1. PAUSE - Note which sub-skill failed
  2. DIAGNOSE - Check error type:
    • Category overlap → Refine URL patterns
    • Uneven split → Adjust page limits
    • Orphan pages → Add catch-all category
    • Router incomplete → Regenerate after all sub-skills are done
  3. ADAPT - Modify split configuration
  4. RETRY - Re-split affected category (max 3 attempts)
  5. ESCALATE - Present split preview, ask user for boundary adjustments
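Steps 4 and 5 put a hard ceiling on retries before handing control back to the user. A minimal sketch of that bound, with `retry_split` as a hypothetical wrapper; it reruns the same command, so the ADAPT step (editing the split configuration) has to happen inside the wrapped command or between runs:

```shell
# Hypothetical wrapper: run a command up to 3 times, then escalate.
retry_split() {
  max=3
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      echo "succeeded on attempt $attempt"
      return 0
    fi
    echo "attempt $attempt failed" >&2
    attempt=$((attempt + 1))
  done
  echo "ESCALATE: '$*' failed $max times; ask the user to adjust category boundaries" >&2
  return 1
}

# Usage: retry_split skill-seekers split --config configs/godot.json --strategy category
```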

Checkpoint Support

State saved to: .aiwg/working/checkpoints/doc-splitter/

checkpoints/doc-splitter/
├── estimation.json         # Page count results
├── category_analysis.json  # Category breakdown
├── split_plan.json         # Planned split configuration
├── progress/
│   ├── godot-scripting.json
│   ├── godot-2d.json
│   └── ...
└── router_draft.md         # Router skill draft
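The `progress/` directory is what makes a long scrape resumable: a sub-skill already recorded as finished can be skipped on the next run. A minimal sketch, assuming a one-line JSON status record per sub-skill (the real checkpoint files likely hold more); `mark_done` and `is_done` are hypothetical:

```shell
# Checkpoint root from the layout above; overridable via the environment.
CHECKPOINT_DIR=${CHECKPOINT_DIR:-.aiwg/working/checkpoints/doc-splitter}

# Hypothetical helper: record a sub-skill as finished (assumed JSON shape).
mark_done() {
  mkdir -p "$CHECKPOINT_DIR/progress"
  printf '{"skill": "%s", "status": "done"}\n' "$1" > "$CHECKPOINT_DIR/progress/$1.json"
}

# Hypothetical helper: succeed only if a finished record exists.
is_done() {
  grep -q '"status": "done"' "$CHECKPOINT_DIR/progress/$1.json" 2>/dev/null
}

# Resumable scrape loop (requires skill-seekers):
# for config in configs/godot-*.json; do
#   name=$(basename "$config" .json)
#   is_done "$name" && continue
#   skill-seekers scrape --config "$config" && mark_done "$name"
# done
```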

Output Structure

After splitting large documentation:

configs/
├── godot.json              # Original config
├── godot-scripting.json    # Generated sub-config
├── godot-2d.json
├── godot-3d.json
├── godot-physics.json
└── godot-router.json       # Router config

output/
├── godot-scripting/        # Sub-skill
│   ├── SKILL.md
│   └── references/
├── godot-2d/               # Sub-skill
├── godot-3d/               # Sub-skill
├── godot-physics/          # Sub-skill
└── godot-router/           # Router skill
    ├── SKILL.md            # Routing logic
    └── references/
        └── routing-table.md

Router Skill Structure

The generated router skill:

# Godot Documentation Router

## Purpose
Route queries to the appropriate specialized Godot sub-skill.

## Sub-Skills

| Topic | Skill | Coverage |
|-------|-------|----------|
| GDScript, C#, scripting patterns | godot-scripting | 8,200 pages |
| 2D graphics, sprites, tilemaps | godot-2d | 5,400 pages |
| 3D graphics, meshes, materials | godot-3d | 9,100 pages |
| Physics, collisions, rigid bodies | godot-physics | 4,300 pages |

## Routing Rules

1. **Scripting questions** → godot-scripting
   - Keywords: script, gdscript, c#, function, variable, class

2. **2D graphics questions** → godot-2d
   - Keywords: sprite, 2d, tilemap, animation2d, canvas

3. **3D graphics questions** → godot-3d
   - Keywords: mesh, 3d, spatial, material, shader, camera3d

4. **Physics questions** → godot-physics
   - Keywords: physics, collision, rigidbody, area, raycast

## Usage

Ask your question naturally. This router will direct you to the appropriate specialized skill.

Examples:
- "How do I create a player movement script?" → godot-scripting
- "How do I set up tilemap collisions?" → godot-2d
- "How do I apply materials to a mesh?" → godot-3d
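The Routing Rules are written as keyword lists, which suggests a simple scoring scheme. A minimal sketch, assuming a query goes to the sub-skill with the most keyword hits (substring matches on the lowercased query) and that ties go to the rule listed first; the generated router may work differently, and `route_query` is hypothetical:

```shell
# Hypothetical keyword router over the Routing Rules above.
route_query() {
  # Lowercase the query so keyword matching is case-insensitive.
  query=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  best=unrouted
  best_score=0
  # Each line: "<skill>|<space-separated keywords>", copied from the Routing Rules.
  while IFS='|' read -r skill keywords; do
    score=0
    for kw in $keywords; do
      case "$query" in
        *"$kw"*) score=$((score + 1)) ;;
      esac
    done
    # Strictly greater, so on a tie the earlier rule keeps the query.
    if [ "$score" -gt "$best_score" ]; then
      best=$skill
      best_score=$score
    fi
  done <<'EOF'
godot-scripting|script gdscript c# function variable class
godot-2d|sprite 2d tilemap animation2d canvas
godot-3d|mesh 3d spatial material shader camera3d
godot-physics|physics collision rigidbody area raycast
EOF
  echo "$best"
}

route_query "How do I create a player movement script?"   # prints: godot-scripting
route_query "How do I set up tilemap collisions?"           # prints: godot-2d
route_query "How do I apply materials to a mesh?"           # prints: godot-3d
```

Under this scheme the tilemap question scores one hit for godot-2d (tilemap) and one for godot-physics (collision), and reaches godot-2d only through the tie-break; that is the kind of overlap the "Router confusion" troubleshooting entry describes.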

Troubleshooting

| Issue | Diagnosis | Solution |
|-------|-----------|----------|
| Uneven splits | Category size varies | Use hybrid strategy with max_pages |
| Orphan pages | URL patterns incomplete | Add catch-all or refine patterns |
| Router confusion | Overlapping keywords | Make routing rules more specific |
| Too many skills | Over-segmented | Merge related categories |
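When a split comes out uneven or confusing, one possible cause is a page whose URL matches patterns from several categories (the "Category overlap" case in the Recovery Protocol). Unlike first-match assignment, the sketch below reports every category a path matches, so overlaps become visible. The helpers are hypothetical and the patterns are copied from the Option B sample config:

```shell
# Hypothetical helper: list every category whose patterns a URL path contains.
categories_for() {
  cats=""
  case "$1" in */scripting/*|*/gdscript/*|*/c_sharp/*) cats="$cats scripting" ;; esac
  case "$1" in */2d/*|*/sprite/*|*/tilemap/*) cats="$cats 2d" ;; esac
  case "$1" in */3d/*|*/mesh/*|*/spatial/*) cats="$cats 3d" ;; esac
  case "$1" in */physics/*|*/collision/*|*/rigidbody/*) cats="$cats physics" ;; esac
  echo "${cats# }"   # drop the leading space
}

# Hypothetical helper: read URL paths from stdin, print those claimed by 2+ categories.
find_overlaps() {
  while read -r url; do
    found=$(categories_for "$url")
    case "$found" in
      *" "*) echo "$url -> $found" ;;
    esac
  done
}

printf '%s\n' /tutorials/2d/physics/intro.html /tutorials/3d/lights.html | find_overlaps
# prints: /tutorials/2d/physics/intro.html -> 2d physics
```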

References