Browser Automation
Browser Automation
Section titled “Browser Automation”Hermes Agent includes a full browser automation toolset with multiple backend options:
- Browserbase cloud mode via Browserbase for managed cloud browsers and anti-bot tooling
- Browser Use cloud mode via Browser Use as an alternative cloud browser provider
- Firecrawl cloud mode via Firecrawl for cloud browsers with built-in scraping
- Camofox local mode via Camofox for local anti-detection browsing (Firefox-based fingerprint spoofing)
- Local Chrome via CDP — connect browser tools to your own Chrome instance using
/browser connect - Local browser mode via the
agent-browserCLI and a local Chromium installation
In all modes, the agent can navigate websites, interact with page elements, fill forms, and extract information.
Overview
Section titled “Overview”Pages are represented as accessibility trees (text-based snapshots), making them ideal for LLM agents. Interactive elements get ref IDs (like @e1, @e2) that the agent uses for clicking and typing.
Key capabilities:
- Multi-provider cloud execution — Browserbase, Browser Use, or Firecrawl — no local browser needed
- Local Chrome integration — attach to your running Chrome via CDP for hands-on browsing
- Built-in stealth — random fingerprints, CAPTCHA solving, residential proxies (Browserbase)
- Session isolation — each task gets its own browser session
- Automatic cleanup — inactive sessions are closed after a timeout
- Vision analysis — screenshot + AI analysis for visual understanding
:::tip Nous Subscribers
If you have a paid Nous Portal subscription, you can use browser automation through the Tool Gateway without any separate API keys. Run hermes model or hermes tools to enable it.
:::
Browserbase cloud mode
Section titled “Browserbase cloud mode”To use Browserbase-managed cloud browsers, add:
# Add to ~/.hermes/.envBROWSERBASE_API_KEY=***BROWSERBASE_PROJECT_ID=your-project-id-hereGet your credentials at browserbase.com.
Browser Use cloud mode
Section titled “Browser Use cloud mode”To use Browser Use as your cloud browser provider, add:
# Add to ~/.hermes/.envBROWSER_USE_API_KEY=***Get your API key at browser-use.com. Browser Use provides a cloud browser via its REST API. If both Browserbase and Browser Use credentials are set, Browserbase takes priority.
Firecrawl cloud mode
Section titled “Firecrawl cloud mode”To use Firecrawl as your cloud browser provider, add:
# Add to ~/.hermes/.envFIRECRAWL_API_KEY=fc-***Get your API key at firecrawl.dev. Then select Firecrawl as your browser provider:
hermes setup tools# → Browser Automation → FirecrawlOptional settings:
# Self-hosted Firecrawl instance (default: https://api.firecrawl.dev)FIRECRAWL_API_URL=http://localhost:3002
# Session TTL in seconds (default: 300)FIRECRAWL_BROWSER_TTL=600Hybrid routing: cloud for public URLs, local for LAN/localhost
Section titled “Hybrid routing: cloud for public URLs, local for LAN/localhost”When a cloud provider is configured, Hermes auto-spawns a local Chromium sidecar
for URLs that resolve to a private/loopback/LAN address (localhost, 127.0.0.1,
192.168.x.x, 10.x.x.x, 172.16-31.x.x, *.local, *.lan, *.internal,
IPv6 loopback ::1, link-local 169.254.x.x). Public URLs continue to use the
cloud provider in the same conversation.
This solves the common “I’m developing locally but using Browserbase” workflow —
the agent can screenshot your dashboard at http://localhost:3000 AND scrape
https://github.com without you switching providers or disabling the SSRF guard.
The cloud provider never sees the private URL.
The feature is on by default. To disable it (all URLs go to the configured cloud provider, as before):
browser: cloud_provider: browserbase auto_local_for_private_urls: falseWith auto-routing disabled, private URLs are rejected with
"Blocked: URL targets a private or internal address" unless you also set
browser.allow_private_urls: true (which lets the cloud provider attempt them —
usually won’t work since Browserbase etc. can’t reach your LAN).
Requirements: the local sidecar uses the same agent-browser CLI as pure local
mode, so you need it installed (hermes setup tools → Browser Automation
auto-installs it). Post-navigation redirects from a public URL onto a private
address are still blocked (you can’t use a redirect-to-internal trick to reach
your LAN through the public path).
Camofox local mode
Section titled “Camofox local mode”Camofox is a self-hosted Node.js server wrapping Camoufox (a Firefox fork with C++ fingerprint spoofing). It provides local anti-detection browsing without cloud dependencies.
# Clone the Camofox browser server firstgit clone https://github.com/jo-inc/camofox-browsercd camofox-browser
# Build and start with Docker using the default container settings# (auto-detects arch: aarch64 on M1/M2, x86_64 on Intel)make up
# Stop and remove the default containermake down
# Force a clean rebuild (for example, after upgrading VERSION/RELEASE)make reset
# Just download binaries without buildingmake fetch
# Override arch or version explicitlymake up ARCH=x86_64make up VERSION=135.0.1 RELEASE=beta.24make up starts the default container immediately. If you want custom runtime settings such as a larger Node heap, VNC, or a persistent profile directory, build the image first and then run it yourself:
# Build the image without starting the default containermake build
# Start with persistence, VNC live view, and a larger Node heapmkdir -p ~/.camofox-dockerdocker run -d \ --name camofox-browser \ --restart unless-stopped \ -p 9377:9377 \ -p 6080:6080 \ -p 5901:5900 \ -e CAMOFOX_PORT=9377 \ -e ENABLE_VNC=1 \ -e VNC_BIND=0.0.0.0 \ -e VNC_RESOLUTION=1920x1080 \ -e MAX_OLD_SPACE_SIZE=2048 \ -v ~/.camofox-docker:/root/.camofox \ camofox-browser:135.0.1-aarch64With VNC enabled, the browser runs in headed mode and can be watched live in your browser at http://localhost:6080 (noVNC). You can also connect a native VNC client to localhost:5901.
If you already ran make up, stop and remove that default container before starting the custom one:
make down# then run the custom docker run command aboveThen set in ~/.hermes/.env:
CAMOFOX_URL=http://localhost:9377Or configure via hermes tools → Browser Automation → Camofox.
When CAMOFOX_URL is set, all browser tools automatically route through Camofox instead of Browserbase or agent-browser.
Persistent browser sessions
Section titled “Persistent browser sessions”By default, each Camofox session gets a random identity — cookies and logins don’t survive across agent restarts. To enable persistent browser sessions, add the following to ~/.hermes/config.yaml:
browser: camofox: managed_persistence: trueThen fully restart Hermes so the new config is picked up.
:::warning Nested path matters
Hermes reads browser.camofox.managed_persistence, not a top-level managed_persistence. A common mistake is writing:
# ❌ Wrong — Hermes ignores thismanaged_persistence: trueIf the flag is placed at the wrong path, Hermes silently falls back to a random ephemeral userId and your login state will be lost on every session.
:::
What Hermes does
Section titled “What Hermes does”- Sends a deterministic profile-scoped
userIdto Camofox so the server can reuse the same Firefox profile across sessions. - Skips server-side context destruction on cleanup, so cookies and logins survive between agent tasks.
- Scopes the
userIdto the active Hermes profile, so different Hermes profiles get different browser profiles (profile isolation).
What Hermes does not do
Section titled “What Hermes does not do”- It does not force persistence on the Camofox server. Hermes only sends a stable
userId; the server must honor it by mapping thatuserIdto a persistent Firefox profile directory. - If your Camofox server build treats every request as ephemeral (e.g. always calls
browser.newContext()without loading a stored profile), Hermes cannot make those sessions persist. Make sure you are running a Camofox build that implements userId-based profile persistence.
Verify it’s working
Section titled “Verify it’s working”- Start Hermes and your Camofox server.
- Open Google (or any login site) in a browser task and sign in manually.
- End the browser task normally.
- Start a new browser task.
- Open the same site again — you should still be signed in.
If step 5 logs you out, the Camofox server isn’t honoring the stable userId. Double-check your config path, confirm you fully restarted Hermes after editing config.yaml, and verify your Camofox server version supports persistent per-user profiles.
Where state lives
Section titled “Where state lives”Hermes derives the stable userId from the profile-scoped directory ~/.hermes/browser_auth/camofox/ (or the equivalent under $HERMES_HOME for non-default profiles). The actual browser profile data lives on the Camofox server side, keyed by that userId. To fully reset a persistent profile, clear it on the Camofox server and remove the corresponding Hermes profile’s state directory.
VNC live view
Section titled “VNC live view”When Camofox runs in headed mode (with a visible browser window), it exposes a VNC port in its health check response. Hermes automatically discovers this and includes the VNC URL in navigation responses, so the agent can share a link for you to watch the browser live.
Local Chrome via CDP (/browser connect)
Section titled “Local Chrome via CDP (/browser connect)”Instead of a cloud provider, you can attach Hermes browser tools to your own running Chrome instance via the Chrome DevTools Protocol (CDP). This is useful when you want to see what the agent is doing in real-time, interact with pages that require your own cookies/sessions, or avoid cloud browser costs.
In the CLI, use:
/browser connect # Connect to Chrome at ws://localhost:9222/browser connect ws://host:port # Connect to a specific CDP endpoint/browser status # Check current connection/browser disconnect # Detach and return to cloud/local modeIf Chrome isn’t already running with remote debugging, Hermes will attempt to auto-launch it with --remote-debugging-port=9222.
When connected via CDP, all browser tools (browser_navigate, browser_click, etc.) operate on your live Chrome instance instead of spinning up a cloud session.
WSL2 + Windows Chrome: prefer MCP over /browser connect
Section titled “WSL2 + Windows Chrome: prefer MCP over /browser connect”If Hermes runs inside WSL2 but the Chrome window you want to control runs on the Windows host, /browser connect is often not the best path.
Why:
/browser connectexpects Hermes itself to reach a usable CDP endpoint- modern Chrome live-debugging sessions often expose a host-local endpoint that is not directly reachable from WSL the same way a classic
9222port is - even when Windows Chrome is debuggable, the cleanest integration is often to let a Windows-side browser MCP server attach to Chrome and let Hermes talk to that MCP server
For that setup, prefer chrome-devtools-mcp through Hermes MCP support.
See the MCP guide for the practical setup:
Local browser mode
Section titled “Local browser mode”If you do not set any cloud credentials and don’t use /browser connect, Hermes can still use the browser tools through a local Chromium install driven by agent-browser.
Optional Environment Variables
Section titled “Optional Environment Variables”# Residential proxies for better CAPTCHA solving (default: "true")BROWSERBASE_PROXIES=true
# Advanced stealth with custom Chromium — requires Scale Plan (default: "false")BROWSERBASE_ADVANCED_STEALTH=false
# Session reconnection after disconnects — requires paid plan (default: "true")BROWSERBASE_KEEP_ALIVE=true
# Custom session timeout in milliseconds (default: project default)# Examples: 600000 (10min), 1800000 (30min)BROWSERBASE_SESSION_TIMEOUT=600000
# Inactivity timeout before auto-cleanup in seconds (default: 120)BROWSER_INACTIVITY_TIMEOUT=120Install agent-browser CLI
Section titled “Install agent-browser CLI”npm install -g agent-browser# Or install locally in the repo:npm installThe browser toolset must be included in your config’s toolsets list or enabled via hermes config set toolsets '["hermes-cli", "browser"]'.
Available Tools
Section titled “Available Tools”browser_navigate
Section titled “browser_navigate”Navigate to a URL. Must be called before any other browser tool. Initializes the Browserbase session.
Navigate to https://github.com/NousResearchbrowser_snapshot
Section titled “browser_snapshot”Get a text-based snapshot of the current page’s accessibility tree. Returns interactive elements with ref IDs like @e1, @e2 for use with browser_click and browser_type.
full=false(default): Compact view showing only interactive elementsfull=true: Complete page content
Snapshots over 8000 characters are automatically summarized by an LLM.
browser_click
Section titled “browser_click”Click an element identified by its ref ID from the snapshot.
Click @e5 to press the "Sign In" buttonbrowser_type
Section titled “browser_type”Type text into an input field. Clears the field first, then types the new text.
Type "hermes agent" into the search field @e3browser_scroll
Section titled “browser_scroll”Scroll the page up or down to reveal more content.
Scroll down to see more resultsbrowser_press
Section titled “browser_press”Press a keyboard key. Useful for submitting forms or navigation.
Press Enter to submit the formSupported keys: Enter, Tab, Escape, ArrowDown, ArrowUp, and more.
browser_back
Section titled “browser_back”Navigate back to the previous page in browser history.
browser_get_images
Section titled “browser_get_images”List all images on the current page with their URLs and alt text. Useful for finding images to analyze.
browser_vision
Section titled “browser_vision”Take a screenshot and analyze it with vision AI. Use this when text snapshots don’t capture important visual information — especially useful for CAPTCHAs, complex layouts, or visual verification challenges.
The screenshot is saved persistently and the file path is returned alongside the AI analysis. On messaging platforms (Telegram, Discord, Slack, WhatsApp), you can ask the agent to share the screenshot — it will be sent as a native photo attachment via the MEDIA: mechanism.
What does the chart on this page show?Screenshots are stored in ~/.hermes/cache/screenshots/ and automatically cleaned up after 24 hours.
browser_console
Section titled “browser_console”Get browser console output (log/warn/error messages) and uncaught JavaScript exceptions from the current page. Essential for detecting silent JS errors that don’t appear in the accessibility tree.
Check the browser console for any JavaScript errorsUse clear=True to clear the console after reading, so subsequent calls only show new messages.
browser_cdp
Section titled “browser_cdp”Raw Chrome DevTools Protocol passthrough — the escape hatch for browser operations not covered by the other tools. Use for native dialog handling, iframe-scoped evaluation, cookie/network control, or any CDP verb the agent needs.
Only available when a CDP endpoint is reachable at session start — meaning /browser connect has attached to a running Chrome, or browser.cdp_url is set in config.yaml. The default local agent-browser mode, Camofox, and cloud providers (Browserbase, Browser Use, Firecrawl) do not currently expose CDP to this tool — cloud providers have per-session CDP URLs but live-session routing is a follow-up.
CDP method reference: https://chromedevtools.github.io/devtools-protocol/ — the agent can web_extract a specific method’s page to look up parameters and return shape.
Common patterns:
# List tabs (browser-level, no target_id)browser_cdp(method="Target.getTargets")
# Handle a native JS dialog on a tabbrowser_cdp(method="Page.handleJavaScriptDialog", params={"accept": true, "promptText": ""}, target_id="<tabId>")
# Evaluate JS in a specific tabbrowser_cdp(method="Runtime.evaluate", params={"expression": "document.title", "returnByValue": true}, target_id="<tabId>")
# Get all cookiesbrowser_cdp(method="Network.getAllCookies")Browser-level methods (Target.*, Browser.*, Storage.*) omit target_id. Page-level methods (Page.*, Runtime.*, DOM.*, Emulation.*) require a target_id from Target.getTargets. Each stateless call is independent — sessions do not persist between calls.
Cross-origin iframes: pass frame_id (from browser_snapshot.frame_tree.children[] where is_oopif=true) to route the CDP call through the supervisor’s live session for that iframe. This is how Runtime.evaluate inside a cross-origin iframe works on Browserbase, where stateless CDP connections would hit signed-URL expiry. Example:
browser_cdp( method="Runtime.evaluate", params={"expression": "document.title", "returnByValue": True}, frame_id="<frame_id from browser_snapshot>",)Same-origin iframes don’t need frame_id — use document.querySelector('iframe').contentDocument from a top-level Runtime.evaluate instead.
browser_dialog
Section titled “browser_dialog”Responds to a native JS dialog (alert / confirm / prompt / beforeunload). Before this tool existed, dialogs would silently block the page’s JavaScript thread and subsequent browser_* calls would hang or throw; now the agent sees pending dialogs in browser_snapshot output and responds explicitly.
Workflow:
- Call
browser_snapshot. If a dialog is blocking the page, it shows up aspending_dialogs: [{"id": "d-1", "type": "alert", "message": "..."}]. - Call
browser_dialog(action="accept")orbrowser_dialog(action="dismiss"). Forprompt()dialogs, passprompt_text="..."to supply the response. - Re-snapshot —
pending_dialogsis empty; the page’s JS thread has resumed.
Detection happens automatically via a persistent CDP supervisor — one WebSocket per task that subscribes to Page/Runtime/Target events. The supervisor also populates a frame_tree field in the snapshot so the agent can see the iframe structure of the current page, including cross-origin (OOPIF) iframes.
Availability matrix:
| Backend | Detection via pending_dialogs | Response (browser_dialog tool) |
|---|---|---|
Local Chrome via /browser connect or browser.cdp_url | ✓ | ✓ full workflow |
| Browserbase | ✓ | ✓ full workflow (via injected XHR bridge) |
| Camofox / default local agent-browser | ✗ | ✗ (no CDP endpoint) |
How it works on Browserbase. Browserbase’s CDP proxy auto-dismisses real native dialogs server-side within ~10ms, so we can’t use Page.handleJavaScriptDialog. The supervisor injects a small script via Page.addScriptToEvaluateOnNewDocument that overrides window.alert/confirm/prompt with a synchronous XHR. We intercept those XHRs via Fetch.enable — the page’s JS thread stays blocked on the XHR until we call Fetch.fulfillRequest with the agent’s response. prompt() return values round-trip back into page JS unchanged.
Dialog policy is configured in config.yaml under browser.dialog_policy:
| Policy | Behavior |
|---|---|
must_respond (default) | Capture, surface in snapshot, wait for explicit browser_dialog() call. Safety auto-dismiss after browser.dialog_timeout_s (default 300s) so a buggy agent can’t stall forever. |
auto_dismiss | Capture, dismiss immediately. Agent still sees the dialog in browser_state history but doesn’t have to act. |
auto_accept | Capture, accept immediately. Useful when navigating pages with aggressive beforeunload prompts. |
Frame tree inside browser_snapshot.frame_tree is capped to 30 frames and OOPIF depth 2 to keep payloads bounded on ad-heavy pages. A truncated: true flag surfaces when limits were hit; agents needing the full tree can use browser_cdp with Page.getFrameTree.
Practical Examples
Section titled “Practical Examples”Filling Out a Web Form
Section titled “Filling Out a Web Form”User: Sign up for an account on example.com with my email john@example.com
Agent workflow:1. browser_navigate("https://example.com/signup")2. browser_snapshot() → sees form fields with refs3. browser_type(ref="@e3", text="john@example.com")4. browser_type(ref="@e5", text="SecurePass123")5. browser_click(ref="@e8") → clicks "Create Account"6. browser_snapshot() → confirms successResearching Dynamic Content
Section titled “Researching Dynamic Content”User: What are the top trending repos on GitHub right now?
Agent workflow:1. browser_navigate("https://github.com/trending")2. browser_snapshot(full=true) → reads trending repo list3. Returns formatted resultsSession Recording
Section titled “Session Recording”Automatically record browser sessions as WebM video files:
browser: record_sessions: true # default: falseWhen enabled, recording starts automatically on the first browser_navigate and saves to ~/.hermes/browser_recordings/ when the session closes. Works in both local and cloud (Browserbase) modes. Recordings older than 72 hours are automatically cleaned up.
Stealth Features
Section titled “Stealth Features”Browserbase provides automatic stealth capabilities:
| Feature | Default | Notes |
|---|---|---|
| Basic Stealth | Always on | Random fingerprints, viewport randomization, CAPTCHA solving |
| Residential Proxies | On | Routes through residential IPs for better access |
| Advanced Stealth | Off | Custom Chromium build, requires Scale Plan |
| Keep Alive | On | Session reconnection after network hiccups |
Session Management
Section titled “Session Management”- Each task gets an isolated browser session via Browserbase
- Sessions are automatically cleaned up after inactivity (default: 2 minutes)
- A background thread checks every 30 seconds for stale sessions
- Emergency cleanup runs on process exit to prevent orphaned sessions
- Sessions are released via the Browserbase API (
REQUEST_RELEASEstatus)
Limitations
Section titled “Limitations”- Text-based interaction — relies on accessibility tree, not pixel coordinates
- Snapshot size — large pages may be truncated or LLM-summarized at 8000 characters
- Session timeout — cloud sessions expire based on your provider’s plan settings
- Cost — cloud sessions consume provider credits; sessions are automatically cleaned up when the conversation ends or after inactivity. Use
/browser connectfor free local browsing. - No file downloads — cannot download files from the browser