6 changes: 6 additions & 0 deletions skills/autobrowse/.env.example
@@ -1,3 +1,9 @@
ANTHROPIC_API_KEY=sk-ant-...
BROWSERBASE_API_KEY=bb_live_...
BROWSERBASE_PROJECT_ID=your-project-id

# Throwaway inbox provisioning (only needed for signup/login/MFA tasks).
# Points scripts/inbox.mjs at the browse.sh inbox endpoint, which owns the
# AgentMail key — you never handle an AgentMail credential directly.
BROWSE_SH_URL=https://browse.sh
BROWSE_SH_WEBHOOK_SECRET=
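A helper like `scripts/inbox.mjs` might read these two variables roughly as follows. This is a minimal sketch, not the script's actual code. It assumes `BROWSE_SH_URL` falls back to `https://browse.sh` when unset and that the helper refuses to run without the secret. The function name `loadInboxConfig` is hypothetical.

```javascript
// Hypothetical config loader for an inbox helper; not the actual inbox.mjs code.
// Assumes BROWSE_SH_URL is optional (defaults to https://browse.sh) and
// BROWSE_SH_WEBHOOK_SECRET is required.
function loadInboxConfig(env = process.env) {
  // Strip trailing slashes so callers can safely append paths.
  const baseUrl = (env.BROWSE_SH_URL || "https://browse.sh").replace(/\/+$/, "");
  const secret = env.BROWSE_SH_WEBHOOK_SECRET;
  if (!secret) {
    // Fail fast: without the secret the browse.sh inbox endpoint cannot be called.
    throw new Error("BROWSE_SH_WEBHOOK_SECRET is not set (see .env.example)");
  }
  return { baseUrl, secret };
}

// Usage: an empty value, as in the example file, counts as missing.
try {
  loadInboxConfig({ BROWSE_SH_URL: "https://browse.sh", BROWSE_SH_WEBHOOK_SECRET: "" });
} catch (err) {
  console.error(err.message);
}
```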
1 change: 1 addition & 0 deletions skills/autobrowse/.gitignore
@@ -4,3 +4,4 @@ tasks/
traces/
*.log
.DS_Store
.inbox.json
33 changes: 33 additions & 0 deletions skills/autobrowse/SKILL.md
@@ -72,12 +72,31 @@ List available tasks:
ls ./autobrowse/tasks/
```

### Step 2.5 — (Only if the task needs email) Provision a throwaway inbox

If the workflow requires **registering an account, logging in, or email/MFA verification**, give the inner agent its own disposable inbox. You never touch AgentMail or supply a human email: `scripts/inbox.mjs` mints a Browserbase-owned throwaway inbox via browse.sh, and the address is injected into the run.

Requires `BROWSE_SH_WEBHOOK_SECRET` in the environment (see `.env.example`). Then, once per task, before the loop:

```bash
node ${CLAUDE_SKILL_DIR}/scripts/inbox.mjs create --workspace ./autobrowse --task <task>
# prints the inbox address, e.g. ab-3f9k2@agentmail.to
```

Capture it and pass `--inbox-email` to **every** `evaluate.mjs` run for this task (see "Run the inner agent"). The address is also available to task.md authors as `{{inbox_email}}`.

The inbox is **loop-only** — it exists just so exploration can complete signup/MFA. Always release it when the loop ends (see "Clean up the inbox"). Graduated skills do not depend on it; end users supply their own email/credentials at run time.
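The lifecycle in this step (create once, pass the same address to every run, always release) maps naturally onto a `try`/`finally`. Below is a minimal sketch. It assumes the CLI flags shown in this section and that `inbox.mjs create` prints only the address on stdout. `runLoopWithInbox` is a hypothetical helper. The `exec` function is injected so the sketch can be exercised without a network.

```javascript
// Sketch of the inbox lifecycle around the autobrowse loop (hypothetical helper).
// `exec(cmd, args)` must run a command, return its stdout as a string, and throw
// on failure, e.g.:
//   (cmd, args) => require("child_process").execFileSync(cmd, args, { encoding: "utf8" })
function runLoopWithInbox({ exec, skillDir, workspace, task, iterations }) {
  const inboxScript = `${skillDir}/scripts/inbox.mjs`;
  const evaluateScript = `${skillDir}/scripts/evaluate.mjs`;
  const scope = ["--workspace", workspace, "--task", task];

  // 1. Provision the throwaway inbox once, before the loop.
  //    Assumes stdout is just the address (plus a trailing newline).
  const inboxEmail = exec("node", [inboxScript, "create", ...scope]).trim();
  try {
    // 2. Pass the same address to every evaluate.mjs run for this task.
    for (let i = 0; i < iterations; i++) {
      exec("node", [evaluateScript, "--task", task, "--workspace", workspace, "--inbox-email", inboxEmail]);
      // ...read the trace and update strategy.md here...
    }
  } finally {
    // 3. Always release, whether the loop graduated, failed, or threw.
    exec("node", [inboxScript, "release", ...scope]);
  }
  return inboxEmail;
}
```

If `create` itself throws, nothing was provisioned, so skipping `release` in that case is intentional.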

> **Concurrency limit:** the Browserbase AgentMail org caps at 3 inboxes. Sequential loops self-heal (a stale inbox is swept on the next `create`), but **do not run more than 3 email-needing tasks in parallel** (`--all` / `--tasks`) — the 4th `create` will fail. If you hit this, run them in smaller batches.
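Staying under the cap is a chunking problem: run email-needing tasks in batches of at most 3 and start the next batch only after the previous one has settled. This is a minimal sketch. It assumes each task's loop is exposed as an async function that releases its inbox before it settles. `runInBatches` is hypothetical, and the cap of 3 comes from the note above.

```javascript
// Hypothetical batch runner: at most `cap` jobs (and therefore inboxes) run at once.
async function runInBatches(jobs, cap = 3) {
  const results = [];
  for (let i = 0; i < jobs.length; i += cap) {
    const batch = jobs.slice(i, i + cap);
    // Jobs in a batch run in parallel; allSettled keeps one failing task from
    // aborting its siblings. The next batch starts only after all of these settle.
    results.push(...(await Promise.allSettled(batch.map((job) => job()))));
  }
  return results;
}
```

A rolling pool, which starts a new task as soon as any slot frees up, would finish sooner. Fixed batches are easier to reason about when each slot maps to one inbox.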

### Step 3 — Multi-task: spawn parallel sub-agents

If running multiple tasks, use the Agent tool to spawn one sub-agent per task simultaneously. Each sub-agent receives a self-contained prompt to run the full autobrowse loop for its task:

> "You are running the autobrowse skill for task `<name>`. Workspace: `<absolute-path-to-workspace>` (e.g. `/path/to/project/autobrowse`). Run `<N>` iterations of: evaluate → read trace → improve strategy.md → repeat. Use `--env <env>`. Pass `--workspace <workspace>` to every evaluate.mjs invocation. Follow the autobrowse loop instructions exactly.
>
> If this task needs signup/login/MFA, run `inbox.mjs create` once before the loop, pass `--inbox-email <addr>` to every evaluate.mjs run, and run `inbox.mjs release` when the loop ends (even on failure).
>
> When graduating, install the skill to `~/.claude/skills/<task-name>/SKILL.md` with proper agentskills frontmatter (name + description). Do not just copy strategy.md — write a self-contained skill.
>
> At the end, output a structured summary with: task name, pass/fail on final run, total cumulative cost, iterations completed, per-iteration table (iter number, turns, cost, status, hypothesis tested), and 2-3 bullet key learnings."
@@ -104,6 +123,8 @@ Check that `./autobrowse/tasks/<task>/task.md` exists (scaffold it from the temp
node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse
# or for bot-protected sites:
node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse --env remote
# if you provisioned an inbox in Step 2.5, pass it on every run:
node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <task-name> --workspace ./autobrowse --inbox-email <addr>
```

This runs the browser session and writes a full trace to `./autobrowse/traces/<task>/latest/`.
@@ -221,6 +242,18 @@ ls ~/.claude/skills/<task-name>/SKILL.md

The skill is now available as `/<task-name>` in Claude Code.

> **Email/MFA tasks — graduation note:** the throwaway inbox is loop-only. The graduated SKILL.md must **not** reference `inbox.mjs` or the autobrowse inbox. Instead, document that the end user supplies their own email/credentials at run time (or reuses an authenticated session via `/cookie-sync`), and note in "Site-Specific Gotchas" that the flow requires email/MFA verification.

### Clean up the inbox

If you provisioned an inbox in Step 2.5, release it once the loop ends — **whether it graduated, failed, or hit max iterations**:

```bash
node ${CLAUDE_SKILL_DIR}/scripts/inbox.mjs release --workspace ./autobrowse --task <task>
```

This deletes the throwaway inbox and removes its local `.inbox.json`. It's best-effort and safe to run even if no inbox exists. (Abandoned inboxes are also swept automatically on the next `create`, but release promptly to stay under the 3-inbox cap.)
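Making release safe to run when nothing was provisioned mostly comes down to treating a missing local state file as success. The sketch below covers only that local half. It assumes the state lives in a file named `.inbox.json` inside a per-task directory. The remote deletion call is omitted because its endpoint is not shown here. `releaseLocalInbox` is hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical: remove the local inbox state for a task, tolerating its absence.
// Returns the parsed state (so a caller could delete the remote inbox) or null.
function releaseLocalInbox(stateDir) {
  const stateFile = path.join(stateDir, ".inbox.json");
  if (!fs.existsSync(stateFile)) return null; // nothing provisioned: no-op

  let state = null;
  try {
    state = JSON.parse(fs.readFileSync(stateFile, "utf8"));
  } catch {
    // Unreadable or corrupt state: still remove the file below.
  }
  // (Best-effort remote deletion via browse.sh would happen here.)
  fs.rmSync(stateFile, { force: true });
  return state;
}
```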

---

## Final report (multi-task mode)
6 changes: 6 additions & 0 deletions skills/autobrowse/references/example-task.md
@@ -13,6 +13,12 @@ List the data the agent needs (credentials, form values, etc.):
- Field 1: value
- Field 2: value

If the task requires registering an account, logging in, or email/MFA
verification, provision a throwaway inbox before the loop (see SKILL.md); the
address is then available as `{{inbox_email}}`. Use it for any email field:

- Email: {{inbox_email}}

## Steps

1. Navigate to the URL
53 changes: 48 additions & 5 deletions skills/autobrowse/scripts/evaluate.mjs
@@ -48,6 +48,7 @@ const TOOLS = [
" browse get url/title/text — Get page info\n" +
" browse mouse drag <x1> <y1> <x2> <y2> — Drag (for sliders)\n" +
" browse back/reload/stop — Navigation/session control\n\n" +
"If a throwaway inbox was provisioned for this task (see the Agent Inbox section, when present), you may also run `node <path>/scripts/inbox.mjs wait-otp|wait-link|latest ...` through this tool to read verification emails.\n\n" +
"Critical: Always `browse snapshot` after every action — refs invalidate on DOM changes.",
input_schema: {
type: "object",
@@ -81,6 +82,8 @@ Options:
--env local|remote Browser environment (default: local)
--model <model> Claude model for the inner agent (default: ${DEFAULT_MODEL})
--run-number N Force a specific run number (default: auto-increment)
--inbox-email <addr> Throwaway inbox address for signup/login/MFA tasks
(provision it first via scripts/inbox.mjs create)
--help Show this help message

Environment variables:
@@ -159,6 +162,17 @@ function getNextRunNumber(tracesDir) {
}

const ALLOWED_COMMAND = "browse";
// Absolute path to the throwaway-inbox helper. The agent may shell out to it
// (e.g. `node <abs>/scripts/inbox.mjs wait-otp ...`) when a task involves
// signup/login/MFA and an inbox was provisioned for the run.
const INBOX_SCRIPT = path.join(SKILL_DIR, "scripts", "inbox.mjs");

function isAllowedCommand(executable, args) {
if (executable === ALLOWED_COMMAND) return true;
// node <abs-path-to-inbox.mjs> ...
if (executable === "node" && args[0] && path.resolve(args[0]) === INBOX_SCRIPT) return true;
return false;
}

function parseCommand(command) {
const args = [];
@@ -250,8 +264,8 @@ function executeCommand(command) {
}

const [executable, ...args] = parsed.args;
if (executable !== ALLOWED_COMMAND) {
return { output: `BLOCKED: only browse commands are allowed. Got: ${command.slice(0, 50)}`, error: true, duration_ms: 0 };
if (!isAllowedCommand(executable, args)) {
return { output: `BLOCKED: only browse and inbox.mjs commands are allowed. Got: ${command.slice(0, 50)}`, error: true, duration_ms: 0 };
}

const start = Date.now();
@@ -271,7 +285,34 @@ function executeCommand(command) {
}
}

function buildSystemPrompt(strategy, traceDir, browseEnv) {
Inbox wait killed by timeout

High Severity

Long-running inbox.mjs wait-otp and wait-link calls run through executeCommand with a fixed 30s execFileSync timeout, while the system prompt tells the agent to use --within 60. Polling is cut off early, so a wait for verification mail that arrives after 30 seconds fails even though the inbox helper would still be polling.

Reviewed by Cursor Bugbot for commit 2087aca.
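One way to address this is to derive the `execFileSync` timeout from the command itself, so inbox waits get at least their `--within` window. This is a minimal sketch. It assumes `executeCommand` currently passes a fixed 30 000 ms timeout (as the comment states), that `--within` is given in seconds, and that the subcommand is `args[1]` after the script path. `timeoutFor` and the 10-second grace margin are hypothetical.

```javascript
const DEFAULT_TIMEOUT_MS = 30_000; // the fixed timeout the review refers to
const GRACE_MS = 10_000; // hypothetical margin for process startup and teardown

// Pick an execFileSync timeout for a parsed command (executable + args).
function timeoutFor(executable, args) {
  // `node <abs>/inbox.mjs wait-otp ...`: args[0] is the script, args[1] the subcommand.
  const isInboxWait = executable === "node" && ["wait-otp", "wait-link"].includes(args[1]);
  if (!isInboxWait) return DEFAULT_TIMEOUT_MS;

  const i = args.indexOf("--within");
  const withinSec = i >= 0 ? Number(args[i + 1]) : NaN;
  // Let the helper's own polling window expire first, then add a margin.
  return Number.isFinite(withinSec) && withinSec > 0
    ? Math.max(DEFAULT_TIMEOUT_MS, withinSec * 1000 + GRACE_MS)
    : DEFAULT_TIMEOUT_MS;
}
```

It would be used as `execFileSync(executable, args, { timeout: timeoutFor(executable, args) })`. Other `execFileSync` options stay as they are.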

function buildInboxSection(inboxEmail, workspace, taskName) {
if (!inboxEmail) return "";
return `
# Agent Inbox

You have been provisioned a throwaway email inbox for this task:

${inboxEmail}

Use this address for any signup, login, or MFA / email-verification step — type it into email fields exactly as shown. To read mail that arrives (verification links, one-time codes), shell out via the execute tool:

- Wait for an OTP / verification code:
\`node ${INBOX_SCRIPT} wait-otp --workspace ${workspace} --task ${taskName} --from <sender-domain> --within 60\`
Prints just the extracted code on stdout (or fails after the timeout). Use the sending domain you expect, e.g. \`--from stripe.com\`. Default matches a 4–8 digit code; pass \`--regex "<pattern>"\` for alphanumeric codes.

- Wait for a verification / magic link, then open it:
\`node ${INBOX_SCRIPT} wait-link --workspace ${workspace} --task ${taskName} --from <sender-domain> --within 60\`
Prints just the first URL found (optionally filter with \`--match <substr>\`, e.g. \`--match verify\`). Then \`browse open <that-url>\` to complete verification.

- Read the most recent message raw (fallback if the helpers above miss):
\`node ${INBOX_SCRIPT} latest --workspace ${workspace} --task ${taskName}\`
Prints the newest message as JSON (from, subject, text, html).

Do not call AgentMail or any other email API directly — only the commands above.
`;
}

function buildSystemPrompt(strategy, traceDir, browseEnv, inboxSection) {
const openFlag = browseEnv === "remote" ? "--remote" : "--local";
const envDesc = browseEnv === "remote"
? `Use **remote mode** (Browserbase) — Browserbase Identity, Verified browsers, CAPTCHA solving, residential proxies:
@@ -352,7 +393,7 @@ ${envDesc}
- **Page seems empty**: Try \`browse wait timeout 1000\` then \`browse snapshot\`; if you know the target element, use \`browse wait selector "<selector>"\`
- **Dropdown didn't open**: Wait briefly, then snapshot to check
- **Slider won't move with click**: Use \`browse press ArrowRight\` / \`browse press ArrowLeft\` after clicking the slider thumb

${inboxSection}
# Current Navigation Strategy

The following strategy has been learned from previous iterations. Follow these guidelines:
@@ -401,7 +442,9 @@ async function main() {

const strategy = fs.readFileSync(strategyFile, "utf-8");
const task = fs.readFileSync(taskFile, "utf-8");
const systemPrompt = buildSystemPrompt(strategy, traceDir, browseEnv);
const inboxEmail = getArg("inbox-email");
const inboxSection = buildInboxSection(inboxEmail, workspace, taskName);
const systemPrompt = buildSystemPrompt(strategy, traceDir, browseEnv, inboxSection);
Task placeholder never substituted

Medium Severity

SKILL.md and example-task.md say task authors can use {{inbox_email}}, but evaluate.mjs sends task.md to the model unchanged when --inbox-email is set. The user message can still contain the literal placeholder instead of the provisioned address.

Reviewed by Cursor Bugbot for commit 2087aca.
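A minimal fix is to substitute the placeholder before `task` is sent to the model. This sketch assumes the placeholder is spelled exactly `{{inbox_email}}`, as in SKILL.md and example-task.md. `renderTask` is hypothetical. Throwing when the placeholder is present but no address was passed keeps the literal string from reaching the agent.

```javascript
const INBOX_PLACEHOLDER = "{{inbox_email}}";

// Hypothetical: replace every occurrence of the placeholder with the provisioned address.
function renderTask(task, inboxEmail) {
  if (!task.includes(INBOX_PLACEHOLDER)) return task;
  if (!inboxEmail) {
    throw new Error(
      "task.md uses {{inbox_email}} but no --inbox-email was passed; run inbox.mjs create first"
    );
  }
  // split/join avoids String.prototype.replace's special `$` patterns in the replacement.
  return task.split(INBOX_PLACEHOLDER).join(inboxEmail);
}
```

In `main()` this would mean reading `inboxEmail` before building the user message. For example: `const task = renderTask(fs.readFileSync(taskFile, "utf-8"), inboxEmail);`.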


console.error(`\n${"=".repeat(60)}`);
console.error(` AUTOBROWSE — ${taskName} — Run ${runNumber}`);
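The Agent Inbox prompt says `wait-otp` matches a 4 to 8 digit code by default and accepts `--regex` for alphanumeric codes. The sketch below covers that extraction step alone. It assumes the code is the first standalone digit run in the message text. `extractOtp` is hypothetical and is not the helper's actual implementation.

```javascript
// Default: a standalone run of 4 to 8 digits. Word boundaries keep it from
// matching a slice of a longer number such as an order ID.
const DEFAULT_OTP_PATTERN = "\\b\\d{4,8}\\b";

// Hypothetical: return the first code found in `text`, or null.
function extractOtp(text, pattern = DEFAULT_OTP_PATTERN) {
  const match = new RegExp(pattern).exec(text);
  if (!match) return null;
  // If a custom pattern has a capture group, prefer it; otherwise use the whole match.
  return match[1] ?? match[0];
}
```

A custom pattern such as `"Code: ([A-Z0-9]{6})"` would cover alphanumeric codes.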