name: enact/firecrawl version: 1.2.0 description: Scrape, crawl, search, and extract structured data from websites using Firecrawl API - converts web pages to LLM-ready markdown enact: "2.0"

from: python:3.12-slim

build:

pip install requests

env: FIRECRAWL_API_KEY: description: Your Firecrawl API key from firecrawl.dev secret: true

command: python /workspace/firecrawl.py ${action} ${url} ${formats} ${limit} ${only_main_content} ${prompt} ${schema}

timeout: 300s

license: MIT

tags:

web-scraping
crawling
markdown
llm
ai
data-extraction
search
structured-data

annotations: readOnlyHint: true openWorldHint: true

inputSchema: type: object properties: action: type: string description: | The action to perform: - scrape: Extract content from a single URL - crawl: Discover and scrape all subpages of a website - map: Get all URLs from a website (fast discovery) - search: Search the web and get scraped results - extract: Extract structured data using AI enum: - scrape - crawl - map - search - extract default: scrape url: type: string description: The URL to process (for scrape, crawl, map, extract) or search query (for search action) formats: type: string description: Comma-separated output formats (markdown, html, links, screenshot). Used by scrape and crawl actions. default: markdown limit: type: integer description: Maximum number of pages to crawl (crawl action) or search results to return (search action) default: 10 only_main_content: type: boolean description: Extract only the main content, excluding headers, navs, footers (scrape action) default: true prompt: type: string description: | Multi-purpose field: - For map: Search query to filter URLs - For extract: Natural language instruction for what to extract default: "" schema: type: string description: JSON schema string for structured extraction (extract action only). Define the shape of data you want to extract. default: "" required: - url

outputSchema: type: object properties: success: type: boolean description: Whether the operation succeeded action: type: string description: The action that was performed url: type: string description: The URL or query that was processed data: type: object description: The scraped/crawled/extracted data including markdown, metadata, and structured content error: type: string description: Error message if the operation failed

examples:

input: url: "https://example.com" action: "scrape" description: Scrape a single page and get markdown
input: url: "https://docs.example.com" action: "crawl" limit: 5 description: Crawl a documentation site (up to 5 pages)
input: url: "https://example.com" action: "map" description: Get all URLs from a website
input: url: "latest AI news" action: "search" limit: 5 description: Search the web and get scraped results
input: url: "https://news.ycombinator.com" action: "extract" prompt: "Extract the top 5 news headlines with their URLs and point counts" description: Extract structured data from a page using AI

Firecrawl Web Scraping Tool

A powerful web scraping tool that uses the Firecrawl API to convert websites into clean, LLM-ready markdown and extract structured data.

Features

Scrape: Extract content from a single URL as markdown, HTML, or with screenshots
Crawl: Automatically discover and scrape all accessible subpages of a website
Map: Get a list of all URLs from a website without scraping content (extremely fast)
Search: Search the web and get full scraped content from results
Extract: Use AI to extract structured data from pages with natural language prompts

Setup

Get an API key from firecrawl.dev

Set your API key as a secret:

enact env set FIRECRAWL_API_KEY <your-api-key> --secret --namespace enact

This stores your API key securely in your OS keyring (macOS Keychain, Windows Credential Manager, or Linux Secret Service).

Usage Examples

Scrape a single page

enact run enact/firecrawl --url "https://example.com" --action scrape

Crawl an entire documentation site

enact run enact/firecrawl --url "https://docs.example.com" --action crawl --limit 20

Map all URLs on a website

enact run enact/firecrawl --url "https://example.com" --action map

Search the web

enact run enact/firecrawl --url "latest AI developments 2024" --action search --limit 5

Extract structured data with AI

enact run enact/firecrawl --url "https://news.ycombinator.com" --action extract --prompt "Extract the top 10 news headlines with their URLs"

Extract with a JSON schema

enact run enact/firecrawl \
  --url "https://example.com/pricing" \
  --action extract \
  --prompt "Extract pricing information" \
  --schema '{"type":"object","properties":{"plans":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"string"}}}}}}'

Output

The tool returns JSON with:

markdown: Clean, LLM-ready content
metadata: Title, description, language, source URL
extract: Structured data (for extract action)
links: Discovered URLs (for map action)

API Features

Firecrawl handles the hard parts of web scraping:

Anti-bot mechanisms
Dynamic JavaScript content
Proxies and rate limiting
PDF and document parsing
Screenshot capture

enact/firecrawl

$ 安裝