
exa-get-contents

Retrieve and extract content from URLs with AI-powered summarization and structured data extraction. Use for scraping web pages, extracting specific information, summarizing articles, or crawling websites with subpages.

$ Install

git clone https://github.com/benjaminjackson/exa-skills /tmp/exa-skills && cp -r /tmp/exa-skills/exa-core/skills/get-contents ~/.claude/skills/exa-skills

// tip: Run this command in your terminal to install the skill
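
To confirm the files landed where this skill expects them (a hypothetical check, reusing the destination path from the command above):

ls ~/.claude/skills/exa-skills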


name: exa-get-contents
description: Retrieve and extract content from URLs with AI-powered summarization and structured data extraction. Use for scraping web pages, extracting specific information, summarizing articles, or crawling websites with subpages.

Exa Get Contents

Token-efficient strategies for retrieving and extracting content from URLs using exa-ai.

Use --help to see available commands and verify usage before running:

exa-ai <command> --help
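
For this skill, that means:

exa-ai get-contents --help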

Critical Requirements

MUST follow these rules when using exa-ai get-contents:

Shared Requirements

This skill inherits requirements from the Shared Requirements section below:

  • Schema design patterns → All schema operations
  • Output format selection → All output operations

MUST Rules

  1. Always use livecrawl: Include --livecrawl-timeout 10000 for fresh, up-to-date content instead of cached results

SHOULD Rules

  1. Prefer --summary over --text: Use summaries (with schemas for structured extraction) instead of full page text for better token efficiency (see the example after this list)
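
A minimal command that satisfies both the MUST and SHOULD rules above (the URL is a placeholder):

# Fresh content via livecrawl, summarized instead of fetching full text
exa-ai get-contents "https://example.com" --summary --livecrawl-timeout 10000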

Cost Optimization

Pricing

  • Per piece of content: $0.001

Each URL counts as one piece of content. Multiple URLs increase cost linearly.

Cost strategy:

  • Only fetch URLs you need
  • Use --summary instead of --text to reduce processing (and token costs)
  • Combine with search results to target specific URLs rather than crawling broadly
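
As a worked example, a single call that fetches three URLs (placeholders here) is billed as three pieces of content:

# 3 URLs × $0.001 per piece of content = $0.003 for this call
exa-ai get-contents "https://example.com,https://example.org,https://example.net" --summary --livecrawl-timeout 10000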

Token Optimization

Apply these strategies:

  • Use toon format: --output-format toon for 40% fewer tokens than JSON (use when reading output directly)
  • Use JSON + jq: Extract only needed fields with jq (use when piping/processing output)
  • Use --summary: Get AI-generated summaries instead of full page text
  • Use schemas: Extract structured data with --summary-schema (always pipe to jq)
  • Limit extraction: Use --text-max-characters, --links, and --image-links to control output size (see the sketch after this list)
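
A sketch of the last strategy, assuming --text-max-characters accepts a character count as described above (the URL and limit are placeholders):

# Cap extracted text at roughly 2,000 characters instead of returning the full page
exa-ai get-contents "https://example.com" --text --text-max-characters 2000 --livecrawl-timeout 10000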

IMPORTANT: Choose one approach; don't mix them:

  • Approach 1: toon only - Compact YAML-like output for direct reading
  • Approach 2: JSON + jq - Extract specific fields programmatically
  • Approach 3: Schemas + jq - Get structured data, always use JSON output (default) and pipe to jq

Examples:

# ❌ High token usage - full text
exa-ai get-contents "https://example.com" --text --livecrawl-timeout 10000

# ✅ Approach 1: toon format with summary (70% reduction)
exa-ai get-contents "https://example.com" --summary --livecrawl-timeout 10000 --output-format toon

# ✅ Approach 2: JSON + jq for summary extraction (80% reduction)
exa-ai get-contents "https://example.com" --summary --livecrawl-timeout 10000 | jq '.results[].summary'

# ✅ Approach 3: Schema + jq for structured extraction (85% reduction)
exa-ai get-contents "https://example.com" \
  --summary \
  --livecrawl-timeout 10000 \
  --summary-schema '{"type":"object","properties":{"key_info":{"type":"string"}}}' | \
  jq -r '.results[].summary | fromjson | .key_info'

# ❌ Don't mix toon with jq (toon is YAML-like, not JSON)
exa-ai get-contents "https://example.com" --output-format toon | jq -r '.results'

Quick Start

Basic Content with Summary

exa-ai get-contents "https://anthropic.com" --summary --livecrawl-timeout 10000 --output-format toon

Custom Summary Query

exa-ai get-contents "https://techcrunch.com" \
  --summary \
  --livecrawl-timeout 10000 \
  --summary-query "What are the main tech news stories on this page?" | jq '.results[].summary'

Structured Data Extraction

exa-ai get-contents "https://www.stripe.com" \
  --summary \
  --livecrawl-timeout 10000 \
  --summary-schema '{"type":"object","properties":{"company_name":{"type":"string"},"main_product":{"type":"string"},"target_market":{"type":"string"}}}' | jq -r '.results[].summary | fromjson'

Multiple URLs

exa-ai get-contents "https://anthropic.com,https://openai.com,https://cohere.com" \
  --summary \
  --livecrawl-timeout 10000 \
  --output-format toon
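
To process the same results programmatically instead of reading them, switch to the default JSON output and pipe to jq (a sketch, assuming each result includes a url field alongside its summary):

exa-ai get-contents "https://anthropic.com,https://openai.com,https://cohere.com" \
  --summary \
  --livecrawl-timeout 10000 | jq -r '.results[] | "\(.url): \(.summary)"'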

Detailed Reference

For complete options, examples, and advanced usage, consult REFERENCE.md.

Shared Requirements

Schema Design

MUST: Use object wrapper for schemas

Applies to: answer, search, find-similar, get-contents

When using schema parameters (--output-schema or --summary-schema), always wrap properties in an object:

{"type":"object","properties":{"field_name":{"type":"string"}}}

DO NOT use bare properties without the object wrapper:

{"properties":{"field_name":{"type":"string"}}}  // ❌ Missing "type":"object"

Why: The Exa API requires a valid JSON Schema with an object type at the root level. Omitting this causes validation errors.

Examples:

# ✅ CORRECT - object wrapper included
exa-ai search "AI news" \
  --summary-schema '{"type":"object","properties":{"headline":{"type":"string"}}}'

# ❌ WRONG - missing object wrapper
exa-ai search "AI news" \
  --summary-schema '{"properties":{"headline":{"type":"string"}}}'

Output Format Selection

MUST NOT: Mix toon format with jq

Applies to: answer, context, search, find-similar, get-contents

toon format produces YAML-like output, not JSON. DO NOT pipe toon output to jq for parsing:

# ❌ WRONG - toon is not JSON
exa-ai search "query" --output-format toon | jq -r '.results'

# ✅ CORRECT - use JSON (default) with jq
exa-ai search "query" | jq -r '.results[].title'

# ✅ CORRECT - use toon for direct reading only
exa-ai search "query" --output-format toon

Why: jq expects valid JSON input. toon format is designed for human readability and produces YAML-like output that jq cannot parse.

SHOULD: Choose one output approach

Applies to: answer, context, search, find-similar, get-contents

Pick one strategy and stick with it throughout your workflow:

  1. Approach 1: toon only - Compact YAML-like output for direct reading

    • Use when: Reading output directly, no further processing needed
    • Token savings: ~40% reduction vs JSON
    • Example: exa-ai search "query" --output-format toon
  2. Approach 2: JSON + jq - Extract specific fields programmatically

    • Use when: Need to extract specific fields or pipe to other commands
    • Token savings: ~80-90% reduction (extracts only needed fields)
    • Example: exa-ai search "query" | jq -r '.results[].title'
  3. Approach 3: Schemas + jq - Structured data extraction with validation

    • Use when: Need consistent structured output across multiple queries
    • Token savings: ~85% reduction + consistent schema
    • Example: exa-ai search "query" --summary-schema '{...}' | jq -r '.results[].summary | fromjson'

Why: Mixing approaches increases complexity and token usage. Choosing one approach optimizes for your use case.


Shell Command Best Practices

MUST: Run commands directly, parse separately

Applies to: monitor, search (websets), research, and all skills using complex commands

When using the Bash tool with complex shell syntax, run commands directly and parse output in separate steps:

# ❌ WRONG - nested command substitution
webset_id=$(exa-ai webset-create --search '{"query":"..."}' | jq -r '.webset_id')

# ✅ CORRECT - run directly and save the output, then parse
exa-ai webset-create --search '{"query":"..."}' > output.json
# Then in a follow-up command:
webset_id=$(jq -r '.webset_id' < output.json)

Why: Complex nested $(...) command substitutions can fail unpredictably in shell environments. Running commands directly and parsing separately improves reliability and makes debugging easier.

MUST NOT: Use nested command substitutions

Applies to: All skills when using complex multi-step operations

Avoid nesting multiple levels of command substitution:

# ❌ WRONG - deeply nested
result=$(exa-ai search "$(cat query.txt | tr '\n' ' ')" --num-results $(cat config.json | jq -r '.count'))

# ✅ CORRECT - sequential steps
query=$(cat query.txt | tr '\n' ' ')
count=$(cat config.json | jq -r '.count')
exa-ai search "$query" --num-results $count

Why: Nested command substitutions are fragile and hard to debug when they fail. Sequential steps make each operation explicit and easier to troubleshoot.

SHOULD: Break complex commands into sequential steps

Applies to: All skills when working with multi-step workflows

For readability and reliability, break complex operations into clear sequential steps:

# ❌ Less maintainable - everything in one line
exa-ai webset-create --search '{"query":"startups","count":1}' | jq -r '.webset_id' | xargs -I {} exa-ai webset-search-create {} --query "AI" --behavior override

# ✅ More maintainable - clear steps
exa-ai webset-create --search '{"query":"startups","count":1}' > output.json
webset_id=$(jq -r '.webset_id' < output.json)
exa-ai webset-search-create "$webset_id" --query "AI" --behavior override

Why: Sequential steps are easier to understand, debug, and modify. Each step can be verified independently.