Fetch web pages as clean Markdown for AI-agent workflows.
中文 | English
- Markdown-first output pipeline -- readability extraction + HTML-to-Markdown conversion, so agents receive clean text instead of noisy HTML/JS/CSS
- Headless browser fallback -- renders JavaScript-heavy pages (SPAs, dynamic dashboards) when static extraction falls short
- Custom request headers -- send
Authorization,Cookie, or any header to access authenticated endpoints - Multi-URL batch fetching -- fetch multiple pages concurrently with structured, per-URL output
# Install
go install github.com/firede/agent-fetch/cmd/agent-fetch@latest
# Fetch a page
agent-fetch https://example.comOr download a prebuilt binary from Releases.
In the default auto mode, agent-fetch runs a three-stage fallback pipeline:
Request with Accept: text/markdown
|
v
Markdown response? --yes--> Return as-is
| no
v
Static HTML extraction
+ Markdown conversion --quality OK?--> Return
| no
v
Headless browser render
+ extraction + conversion --> Return
This means most pages are handled without a browser, keeping things fast, while JS-heavy pages still get rendered correctly.
| Mode | Behavior | Browser needed |
|---|---|---|
auto (default) |
Three-stage fallback: native Markdown -> static extraction -> browser render | Only when static quality is low |
static |
Static HTML extraction only, no browser | No |
browser |
Always use headless Chrome/Chromium | Yes |
raw |
Send Accept: text/markdown, return HTTP body verbatim |
No |
- Download the archive for your platform from GitHub Releases.
- Extract and make the binary executable:
chmod +x ./agent-fetch- Move the binary to a directory on your
PATH, or run it directly:
./agent-fetch https://example.comRelease binaries are not yet notarized by Apple, so Gatekeeper may block execution. Remove the quarantine attribute to proceed:
xattr -dr com.apple.quarantine ./agent-fetchgo install github.com/firede/agent-fetch/cmd/agent-fetch@latestInstall a specific version:
go install github.com/firede/agent-fetch/cmd/agent-fetch@v0.3.0Ensure $(go env GOPATH)/bin (usually ~/go/bin) is in your PATH.
agent-fetch [options] <url> [url ...]| Flag | Default | Description |
|---|---|---|
--mode |
auto |
Fetch mode: auto | static | browser | raw |
--meta |
true |
Prepend title/description front matter (use --meta=false to disable) |
--timeout |
20s |
HTTP request timeout (applies to static/auto modes) |
--browser-timeout |
30s |
Page-load timeout (applies to browser/auto modes) |
--network-idle |
1200ms |
Wait time after last network activity before capturing content |
--wait-selector |
CSS selector to wait for before capturing, e.g. article |
|
--header |
Custom request header, repeatable. e.g. --header 'Authorization: Bearer token' |
|
--user-agent |
agent-fetch/0.1 |
User-Agent header |
--max-body-bytes |
8388608 |
Max response bytes to read |
--concurrency |
4 |
Max concurrent fetches for multi-URL requests |
--doctor |
false |
Run environment checks (runtime + headless browser readiness) and print remediation guidance |
--browser-path |
Browser executable path/name override for browser and auto modes |
# Default auto mode
agent-fetch https://example.com
# Force browser rendering for a JS-heavy page
agent-fetch --mode browser --wait-selector 'article' https://example.com
# Force a specific browser binary (useful in containers/custom installs)
agent-fetch --mode browser --browser-path /usr/bin/chromium https://example.com
# Static extraction without front matter
agent-fetch --mode static --meta=false https://example.com
# Get raw HTTP response body
agent-fetch --mode raw https://example.com
# Authenticated request
agent-fetch --header "Authorization: Bearer $TOKEN" https://example.com
# Batch fetch with concurrency control
agent-fetch --concurrency 4 https://example.com https://example.org
# Check environment readiness
agent-fetch --doctor
# Check environment readiness with explicit browser path
agent-fetch --doctor --browser-path /usr/bin/chromiumWhen multiple URLs are provided, requests run concurrently (controlled by --concurrency) and output is emitted in input order using task markers:
<!-- count: 3, succeeded: 2, failed: 1 -->
<!-- task[1]: https://example.com/hello -->
...markdown...
<!-- /task[1] -->
<!-- task[2](failed): https://abc.com -->
<!-- error[2]: ... -->
Exit codes: 0 all succeeded, 1 any task failed, 2 argument/usage error.
This project ships a SKILL.md that can be used with coding agents that support skill files. Point your skill directory to skills/agent-fetch and the agent will be able to invoke agent-fetch when its built-in fetch capability is insufficient.
agent-fetch reads from the command line and writes Markdown to stdout, making it easy to integrate into any agent pipeline or shell-based tool call:
result=$(agent-fetch --mode static https://example.com)The table below compares agent-fetch with the built-in web-fetch capabilities found in some coding agents. Actual built-in capabilities vary by product and version.
| Scenario | Built-in web fetch | agent-fetch |
|---|---|---|
| Basic page fetch with HTML simplification | Yes | Yes |
| JavaScript-rendered pages (SPAs) | Varies | Yes (headless browser) |
| Custom headers (auth, cookies) | Varies | Yes (--header) |
| No AI summarization (outputs extracted body as-is) | Varies | Yes (subject to --max-body-bytes) |
| Batch fetch multiple URLs concurrently | Varies | Yes (--concurrency) |
| CSS selector-based wait/extraction | Varies | Yes (--wait-selector) |
| Works outside coding agents (CLI, CI/CD) | N/A | Yes (standalone CLI) |
How built-in web fetch typically works: Tools like Claude Code's WebFetch and Codex's built-in fetch retrieve a page over HTTP, convert the HTML to Markdown, and then pass the content through an AI model that may summarize or truncate it to fit the context window. This pipeline is fast and sufficient for most pages, but it usually does not execute JavaScript (so SPA or JS-rendered pages may return incomplete content), does not support custom request headers, and processes one URL at a time.
- No built-in web fetch available (other agent frameworks, CLI pipelines, CI/CD) -- use agent-fetch as your primary fetch tool.
- Built-in web fetch available -- use agent-fetch as a complement for JS-heavy pages, authenticated endpoints, batch fetching, or when you need the extracted content without summarization.
browser and auto modes may require Chrome or Chromium on the host.
Use --mode static or --mode raw to avoid the browser dependency entirely.
- Run
agent-fetch --doctorto validate runtime/browser readiness and get guided fixes. - Use
--browser-pathwhen the browser is installed in a non-default location (common in container images).
go build -o agent-fetch ./cmd/agent-fetchThis project is open-sourced under the MIT License.