# bright-data

Bright Data Web Scraper API via curl. Use this skill for scraping social media (Twitter/X, Reddit, YouTube, Instagram, TikTok), account management, and usage monitoring.

## Install

Run this command in your terminal to install the skill:

```bash
git clone https://github.com/vm0-ai/vm0-skills /tmp/vm0-skills && cp -r /tmp/vm0-skills/bright-data ~/.claude/skills/vm0-skills/
```

Skill metadata:

```yaml
name: bright-data
description: Bright Data Web Scraper API via curl. Use this skill for scraping social media (Twitter/X, Reddit, YouTube, Instagram, TikTok), account management, and usage monitoring.
vm0_secrets:
  - BRIGHTDATA_API_KEY
```
# Bright Data Web Scraper API
Use the Bright Data API via direct curl calls for social media scraping, web data extraction, and account management.
Official docs: https://docs.brightdata.com/
## When to Use
Use this skill when you need to:
- **Scrape social media** - Twitter/X, Reddit, YouTube, Instagram, TikTok, LinkedIn
- **Extract web data** - Posts, profiles, comments, engagement metrics
- **Monitor usage** - Track bandwidth and request usage
- **Manage account** - Check status and zones
## Prerequisites
- Sign up at Bright Data
- Get your API key from Settings > Users
- Create a Web Scraper dataset in the Control Panel to get your `dataset_id`

Set your API key as an environment variable:

```bash
export BRIGHTDATA_API_KEY="your-api-key"
```
## Base URL

```
https://api.brightdata.com
```

**Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.

```bash
bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
```
## Social Media Scraping
Bright Data supports scraping these social media platforms:
| Platform | Profiles | Posts | Comments | Reels/Videos |
|---|---|---|---|---|
| Twitter/X | ✅ | ✅ | - | - |
| Reddit | - | ✅ | ✅ | - |
| YouTube | ✅ | ✅ | ✅ | - |
| Instagram | ✅ | ✅ | ✅ | ✅ |
| TikTok | ✅ | ✅ | ✅ | - |
| LinkedIn | ✅ | ✅ | - | - |
## How to Use
### 1. Trigger Scraping (Asynchronous)

Trigger a data collection job and get a `snapshot_id` for later retrieval.

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://twitter.com/username"},
  {"url": "https://twitter.com/username2"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak"
}
```
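To chain this with the later steps, you can capture the `snapshot_id` with `jq` (a minimal sketch; assumes `jq` is installed, and keeps the pipe outside `bash -c` per the note above):

```bash
# Trigger the job and keep the snapshot ID for the progress/download calls
SNAPSHOT_ID=$(bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json' | jq -r '.snapshot_id')
echo "Snapshot: $SNAPSHOT_ID"
```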
### 2. Trigger Scraping (Synchronous)

Get results immediately in the response (for small requests).

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```
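Because `/scrape` returns the records directly, the response can be piped straight into `jq`; note the pipe sits outside the `bash -c` wrapper, per the note above (sketch, assuming the response is a JSON array):

```bash
# Count the records returned by a synchronous scrape
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json' | jq 'length'
```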
### 3. Monitor Progress

Check the status of a scraping job (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/progress/<snapshot-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

Response:

```json
{
  "snapshot_id": "s_m4x7enmven8djfqak",
  "dataset_id": "gd_xxxxx",
  "status": "running"
}
```

Status values: `running`, `ready`, `failed`
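A minimal polling sketch using the `SNAPSHOT_ID` captured earlier (the 10-second interval is an arbitrary choice, not an API requirement):

```bash
# Poll /progress until the job leaves the "running" state.
# $SNAPSHOT_ID is spliced in by the outer shell before bash -c runs.
while true; do
  STATUS=$(bash -c 'curl -s "https://api.brightdata.com/datasets/v3/progress/'"$SNAPSHOT_ID"'" \
    -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq -r '.status')
  echo "status: $STATUS"
  [ "$STATUS" != "running" ] && break   # ready or failed
  sleep 10
done
```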
### 4. Download Results

Once status is `ready`, download the collected data (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshot/<snapshot-id>?format=json" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```
### 5. List Snapshots

Get all your snapshots:

```bash
bash -c 'curl -s "https://api.brightdata.com/datasets/v3/snapshots" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq '.[] | {snapshot_id, dataset_id, status}'
```
### 6. Cancel Snapshot

Cancel a running job (replace `<snapshot-id>` with your actual snapshot ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/cancel?snapshot_id=<snapshot-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```
## Platform-Specific Examples
### Twitter/X - Scrape Profile

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://twitter.com/elonmusk"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `x_id`, `profile_name`, `biography`, `is_verified`, `followers`, `following`, `profile_image_link`
### Twitter/X - Scrape Posts

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://twitter.com/username/status/123456789"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `post_id`, `text`, `replies`, `likes`, `retweets`, `views`, `hashtags`, `media`
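For a quick engagement summary, project just the documented fields with `jq` (sketch; works on the sync response or on a saved snapshot file such as `/tmp/brightdata_results.json` from the download step above):

```bash
# Keep only text and engagement metrics from each post record
jq '.[] | {text, likes, retweets, views}' /tmp/brightdata_results.json
```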
### Reddit - Scrape Subreddit Posts

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.reddit.com/r/technology", "sort_by": "hot"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Parameters: `url`, `sort_by` (`new`/`top`/`hot`)

Returns: `post_id`, `title`, `description`, `num_comments`, `upvotes`, `date_posted`, `community`
### Reddit - Scrape Comments

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.reddit.com/r/technology/comments/xxxxx/post_title"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_id`, `user_posted`, `comment_text`, `upvotes`, `replies`
### YouTube - Scrape Video Info

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `title`, `views`, `likes`, `num_comments`, `video_length`, `transcript`, `channel_name`
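Since the video record includes a `transcript` field, you can pull it out for text analysis (sketch; assumes the field is returned as plain text and that results were saved to `/tmp/brightdata_results.json` as in the download step):

```bash
# Extract the transcript of the first video record to a text file
jq -r '.[0].transcript' /tmp/brightdata_results.json > /tmp/video_transcript.txt
```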
### YouTube - Search by Keyword

Write to `/tmp/brightdata_request.json`:

```json
[
  {"keyword": "artificial intelligence", "num_of_posts": 50}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```
### YouTube - Scrape Comments

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.youtube.com/watch?v=xxxxx", "load_replies": 3}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `comment_text`, `likes`, `replies`, `username`, `date`
### Instagram - Scrape Profile

Write to `/tmp/brightdata_request.json`:

```json
[
  {"url": "https://www.instagram.com/username"}
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/scrape?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```

Returns: `followers`, `post_count`, `profile_name`, `is_verified`, `biography`
### Instagram - Scrape Posts

Write to `/tmp/brightdata_request.json`:

```json
[
  {
    "url": "https://www.instagram.com/username",
    "num_of_posts": 20,
    "start_date": "01-01-2024",
    "end_date": "12-31-2024"
  }
]
```

Then run (replace `<dataset-id>` with your actual dataset ID):

```bash
bash -c 'curl -s -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @/tmp/brightdata_request.json'
```
## Account Management

### Check Account Status

```bash
bash -c 'curl -s "https://api.brightdata.com/status" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```

Response:

```json
{
  "status": "active",
  "customer": "hl_xxxxxxxx",
  "can_make_requests": true,
  "ip": "x.x.x.x"
}
```
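This endpoint works as a cheap preflight check before launching a batch (sketch, using the documented `can_make_requests` field):

```bash
# Abort early if the account cannot make requests
CAN=$(bash -c 'curl -s "https://api.brightdata.com/status" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq -r '.can_make_requests')
[ "$CAN" = "true" ] || { echo "Account cannot make requests" >&2; exit 1; }
```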
### Get Active Zones

```bash
bash -c 'curl -s "https://api.brightdata.com/zone/get_active_zones" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' | jq '.[] | {name, type}'
```
### Get Bandwidth Usage

```bash
bash -c 'curl -s "https://api.brightdata.com/customer/bw" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"'
```
## Getting Dataset IDs

To use the scraping features, you need a `dataset_id`:

- Go to the Bright Data Control Panel
- Create a new Web Scraper dataset or select an existing one
- Choose the platform (Twitter, Reddit, YouTube, etc.)
- Copy the `dataset_id` from the dataset settings

Dataset IDs can also be found in the bandwidth usage API response under the `data` field keys (e.g., `v__ds_api_gd_xxxxx`, where `gd_xxxxx` is your dataset ID).
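A sketch for listing those keys from the bandwidth endpoint (the exact nesting of the response is an assumption here, so the filter scans all objects recursively; adjust if your account's response differs):

```bash
# List keys that look like dataset usage counters, e.g. v__ds_api_gd_xxxxx
bash -c 'curl -s "https://api.brightdata.com/customer/bw" \
  -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}"' \
  | jq -r '.. | objects | keys[]? | select(startswith("v__ds_api_"))' | sort -u
```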
## Common Parameters

| Parameter | Description | Example |
|---|---|---|
| `url` | Target URL to scrape | `https://twitter.com/user` |
| `keyword` | Search keyword | `"artificial intelligence"` |
| `num_of_posts` | Limit number of results | `50` |
| `start_date` | Filter by date (MM-DD-YYYY) | `"01-01-2024"` |
| `end_date` | Filter by date (MM-DD-YYYY) | `"12-31-2024"` |
| `sort_by` | Sort order (Reddit) | `new`, `top`, `hot` |
| `format` | Response format | `json`, `csv` |
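Several of these can be combined in one request object. A hypothetical example body (whether a given dataset accepts a parameter depends on the scraper you created in the Control Panel):

```bash
# Write a request body combining keyword, result limit, and date filters
cat > /tmp/brightdata_request.json <<'EOF'
[
  {
    "keyword": "artificial intelligence",
    "num_of_posts": 50,
    "start_date": "01-01-2024",
    "end_date": "12-31-2024"
  }
]
EOF
```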
## Rate Limits

- Batch mode: up to 100 concurrent requests
- Maximum input size: 1GB per batch
- Exceeding limits returns a `429` error
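A simple guard against `429` responses (sketch; inspects the HTTP status code and backs off before retrying, with arbitrary retry counts and delays):

```bash
# Retry the trigger call up to 5 times, backing off on HTTP 429
for attempt in 1 2 3 4 5; do
  CODE=$(bash -c 'curl -s -o /tmp/brightdata_response.json -w "%{http_code}" \
    -X POST "https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset-id>" \
    -H "Authorization: Bearer ${BRIGHTDATA_API_KEY}" \
    -H "Content-Type: application/json" \
    -d @/tmp/brightdata_request.json')
  [ "$CODE" != "429" ] && break
  echo "Rate limited (attempt $attempt), backing off..." >&2
  sleep $((attempt * 10))
done
```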
## Guidelines

- **Create datasets first**: Use the Control Panel to create scraper datasets
- **Use async for large jobs**: Use `/trigger` for discovery and batch operations
- **Use sync for small jobs**: Use `/scrape` for single URL quick lookups
- **Check status before download**: Poll `/progress` until status is `ready`
- **Respect rate limits**: Don't exceed 100 concurrent requests
- **Date format**: Use MM-DD-YYYY for date parameters
