doc-scraper
Generic web scraper for extracting and organizing Snowflake documentation with intelligent caching and configurable spider depth. Scrapes any section of docs.snowflake.com controlled by --base-path.
Install
git clone https://github.com/sfc-gh-dflippo/snowflake-dbt-demo /tmp/snowflake-dbt-demo && cp -r /tmp/snowflake-dbt-demo/.claude/skills/doc-scraper ~/.claude/skills/snowflake-dbt-demo
Tip: Run this command in your terminal to install the skill.
SKILL.md
name: doc-scraper
description: Generic web scraper for extracting and organizing Snowflake documentation with intelligent caching and configurable spider depth. Scrapes any section of docs.snowflake.com controlled by --base-path.
Snowflake Documentation Scraper
Scrapes docs.snowflake.com sections to Markdown with SQLite caching (7-day expiration).
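The caching behaviour described above can be illustrated with a minimal sketch of a SQLite-backed page cache with a 7-day expiry. The table name `pages`, its columns, and the helper names `open_cache`, `put`, and `get` are hypothetical and chosen for illustration; the scraper's actual schema is not shown in this document.

```python
import sqlite3
import time

# 7 days in seconds, matching the expiration stated above.
EXPIRATION_SECONDS = 7 * 24 * 60 * 60


def open_cache(path=":memory:"):
    """Open (or create) a cache database with one hypothetical table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, body TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    return conn


def put(conn, url, body, now=None):
    """Store or refresh a page, recording when it was fetched."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, body, fetched_at) VALUES (?, ?, ?)",
        (url, body, now),
    )
    conn.commit()


def get(conn, url, now=None):
    """Return the cached body, or None if the entry is missing or expired."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT body, fetched_at FROM pages WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        return None
    body, fetched_at = row
    if now - fetched_at > EXPIRATION_SECONDS:
        return None  # stale entry: the caller should re-fetch the page
    return body
```

Passing `now` explicitly keeps the expiry check easy to test; a caller would normally omit it and let the current time be used.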
Usage
First-time setup (auto-installs uv and doc-scraper):
python3 .claude/skills/doc-scraper/scripts/doc_scraper.py
Subsequent runs:
doc-scraper --output-dir=./snowflake-docs
doc-scraper --output-dir=./snowflake-docs --base-path="/en/sql-reference/"
doc-scraper --output-dir=./snowflake-docs --spider-depth=2
Command Options
| Option | Default | Description |
|---|---|---|
| --output-dir | Required | Output directory for scraped docs |
| --base-path | /en/migrations/ | URL section to scrape |
| --spider-depth | 1 | Link depth: 0 = seed pages only, 1 = seeds plus the pages they link to, 2 = plus second-level links |
| --limit | None | Cap the number of URLs (for testing) |
| --dry-run | - | Preview without writing |
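To make the --spider-depth and --limit semantics concrete, here is a rough sketch of a depth-limited breadth-first crawl. It is an illustration under assumptions, not the scraper's actual code: the function `crawl` and the `get_links` callback are hypothetical, and links come from an in-memory dictionary rather than from fetched pages.

```python
from collections import deque


def crawl(seeds, get_links, spider_depth, limit=None):
    """Visit seeds breadth-first, following links up to spider_depth hops."""
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append((seed, 0))  # seeds sit at depth 0

    visited = []
    while queue:
        if limit is not None and len(visited) >= limit:
            break  # cap on the number of URLs, as --limit is described
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= spider_depth:
            continue  # do not follow links beyond the configured depth
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical link graph standing in for real pages.
LINKS = {"/a": ["/b", "/c"], "/b": ["/d"], "/c": [], "/d": ["/e"]}

print(crawl(["/a"], lambda u: LINKS.get(u, []), spider_depth=1))
```

With this graph, depth 0 yields only the seed, depth 1 adds `/b` and `/c`, and depth 2 also reaches `/d`.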
Output
output-dir/
├── SKILL.md # Auto-generated index
├── scraper_config.yaml # Editable config (auto-created)
├── .cache/ # SQLite cache (auto-managed)
└── en/migrations/*.md # Scraped pages with frontmatter
Configuration
Auto-created at {output-dir}/scraper_config.yaml:
rate_limiting:
  max_concurrent_threads: 4
spider:
  max_pages: 1000
  allowed_paths: ["/en/"]
scraped_pages:
  expiration_days: 7
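As a rough illustration of how the spider settings might gate a crawl, the sketch below filters candidate URLs by `allowed_paths` and truncates the result to `max_pages`. It assumes that `allowed_paths` entries are matched as URL path prefixes and that only docs.snowflake.com is in scope; both are assumptions, and `is_allowed` and `select_urls` are hypothetical helper names. The config is written as a plain dictionary so the example needs no YAML library.

```python
from urllib.parse import urlparse

# Mirrors the spider section of the config shown above.
CONFIG = {
    "spider": {"max_pages": 1000, "allowed_paths": ["/en/"]},
}


def is_allowed(url, config=CONFIG, host="docs.snowflake.com"):
    """True if the URL is on the expected host and under an allowed path prefix."""
    parsed = urlparse(url)
    if parsed.netloc != host:
        return False
    prefixes = config["spider"]["allowed_paths"]
    return any(parsed.path.startswith(prefix) for prefix in prefixes)


def select_urls(urls, config=CONFIG):
    """Keep allowed URLs in order, truncated to max_pages."""
    allowed = [u for u in urls if is_allowed(u, config)]
    return allowed[: config["spider"]["max_pages"]]
```

Under these assumptions, lowering `max_pages` or narrowing `allowed_paths` reduces how many pages a run can collect.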
Troubleshooting
| Issue | Solution |
|---|---|
| Too many pages | Lower --spider-depth or edit config |
| Missing pages | Increase --spider-depth |
| Cache corruption | Delete {output-dir}/.cache/ (rare) |
Repository: sfc-gh-dflippo/snowflake-dbt-demo/.claude/skills/doc-scraper
Author: sfc-gh-dflippo