browserbase · aq17 · May 14, 2026 · May 14, 2026
diff --git a/skills/autobrowse/SKILL.md b/skills/autobrowse/SKILL.md
@@ -272,6 +272,99 @@ Write the file `./autobrowse/reports/YYYY-MM-DD-HH-MM-<tasks>.md` with:
 
 ---
 
+## Export to deterministic Playwright
+
+Once a task has graduated, you can collapse the LLM-driven replay loop into a single deterministic TypeScript script via the `export` subcommand. The export mines the most recent passing run's `trace.json`, resolves session-scoped ARIA refs against the snapshots they came from, and emits a Playwright script that connects to a fresh Browserbase session (optionally bound to a persistent context).
+
+```bash
+# Default — generate and verify against the latest passing run
+node ${CLAUDE_SKILL_DIR}/scripts/export.mjs --task <task-name>
+
+# Custom workspace / specific run / skip verification
+node ${CLAUDE_SKILL_DIR}/scripts/export.mjs --task <task-name> --workspace ./autobrowse
+node ${CLAUDE_SKILL_DIR}/scripts/export.mjs --task <task-name> --run run-022
+node ${CLAUDE_SKILL_DIR}/scripts/export.mjs --task <task-name> --no-verify
+```
+
+The export writes to `<workspace>/tasks/<task>/playwright/`:
+
+- `<task>.ts` — runnable Playwright script. Connects to Browserbase via `chromium.connectOverCDP` when `BROWSERBASE_CONTEXT_ID` is set; falls back to local Chromium otherwise.
+- `selectors.cache.json` — resolved locators + ranked fallbacks per action. Used by future self-healing tooling.
+- `package.json`, `tsconfig.json` — minimal scaffold with `playwright`, `zod`, `tsx`, `dotenv`.
+
+How refs are resolved: every `[X-Y]` ref in the trace is looked up against the most recent prior `browse snapshot` containing it. The matched node's role + accessible name are turned into a ranked list of Playwright locator candidates — `getByRole({ name })` first, then `getByLabel` / `getByPlaceholder` for form inputs, then `getByText`, then bare `getByRole`. The best candidate is emitted inline; lower-ranked candidates are saved to `selectors.cache.json` for self-healing.
+
+The final extract step is generated with one Claude Haiku call at export time (requires `ANTHROPIC_API_KEY`). The LLM is given the final snapshot, the Zod schema parsed from `task.md`'s `## Output` block, and the agent's final reasoning. If the API key is missing the export still produces a script — the extract block is a TODO placeholder.
+
+For a Stagehand-targeted export (LLM-driven replay via `stagehand.act`/`observe`), use the standalone `/stagehand-export` skill.
+
+## Iterative Playwright loop (recommended for tasks that need a deterministic artifact)
+
+When the end goal is a runnable Playwright script (cron, Browserbase Functions, etc.), prefer `loop.mjs` over manually orchestrating evaluate + export. The loop converges on a workflow that **both** the LLM explorer **and** the deterministic Playwright replay can complete — which is a strictly stronger guarantee than "the LLM agent's trace ends with success: true."
+
+```bash
+node ${CLAUDE_SKILL_DIR}/scripts/loop.mjs --task <task-name> --env remote \
+  --max-iterations 8 --max-turns-per-iter 60
+```
+
+What it does per iteration:
+
+1. Runs `evaluate.mjs` (one LLM-driven exploration round).
+2. If the trace passed (`success: true` in the final JSON), runs `export.mjs --target playwright --no-verify` to emit a fresh script.
+3. Runs the emitted script (`npx tsx <task>.ts`) against a new BB session — the actual deterministic replay.
+4. If the Playwright replay passed → records a pass. If it failed → distills the failure (Claude Haiku, ~$0.01) into a new entry under `strategy.md`'s "Recent Playwright Failures" section.
+5. Next iteration's evaluate reads the updated strategy.md and adapts.
+
+**Convergence**: graduates when the emitted script passes in 2 of the last 3 iterations.
+
+### Strategy.md sections
+
+The loop expects (and the distiller maintains) this structure:
+
+```markdown
+# <task> Navigation Strategy
+
+## Navigation Heuristics
+(prose for the LLM explorer — fast-path URLs, timing notes, step sequences)
+
+## Codegen Hints
+(per-task overrides for the Playwright emitter — e.g., "use force:true for all radios on this site")
+
+## Recent Playwright Failures
+### Iteration 3 — <one-line>
+- **What failed**: ...
+- **Likely cause**: ...
+- **Fix to try next iteration**: ...
+```
+
+The emitter (`codegen-playwright.mjs`) bakes in baseline defaults for the most common state-portal patterns: `forceCheck` for checkbox `fill_sel` ops, `forceClickRadio` for radio click ops, `selectWithFallback` (JS-enable + native setter) for every `select_dropdown`, and a `reactFill` helper for inputs that need to bypass keystroke-by-keystroke event handling.
+
+### When to use `loop.mjs` vs `evaluate.mjs` directly
+
+- **Use `loop.mjs`** when you want a Playwright script as the deliverable. Costs more per iteration (each adds a script export + replay) but converges on something that actually replays in prod.
+- **Use `evaluate.mjs`** when you want a `/<task>` skill that future Claude sessions invoke (the original autobrowse flow). Cheaper, doesn't generate a Playwright script.
+
+---
+
+### Pre-authed sessions via persistent context
+
+For tasks that need authentication, create a Browserbase context once, log in interactively, and point autobrowse at it via the env var:
+
+```bash
+# One-time: create a context, log into the target site via live-view
+bb contexts create --project-id $BROWSERBASE_PROJECT_ID --json
+
+# Then, every autobrowse run for this task reuses the cached cookies/storage
+export BROWSERBASE_CONTEXT_ID=<id-from-above>
+node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <name> --env remote
+```
+
+When `BROWSERBASE_CONTEXT_ID` is set with `--env remote`, evaluate.mjs creates one BB session bound to that context before the agent loop, transparently injects `--connect <session-id>` into every browse command the agent issues, and releases the session at exit. The agent's `browse env` / `browse stop` / `browse status` calls become no-ops in this mode. Iterations skip the per-run login dance.
+
+The same env var, when set at runtime for the exported script, makes it attach to the same persisted context.
+
+---
+
 ## Rules
 
 - **Only edit `strategy.md`** — never touch `task.md` (unless creating it from the template) or `evaluate.mjs`

diff --git a/skills/autobrowse/scripts/evaluate.mjs b/skills/autobrowse/scripts/evaluate.mjs
@@ -23,7 +23,7 @@ const SKILL_DIR = path.resolve(__dirname, "..");
 // ── Config ─────────────────────────────────────────────────────────
 
 const DEFAULT_MODEL = "claude-sonnet-4-6";
-const MAX_TURNS = 30;
+const DEFAULT_MAX_TURNS = 30;
 const MAX_TOKENS = 4096;
 const EXEC_TIMEOUT_MS = 30_000;
 
@@ -161,6 +161,50 @@ function getNextRunNumber(tracesDir) {
 
 const ALLOWED_COMMAND = "browse";
 
+// ── Managed Browserbase session (BROWSERBASE_CONTEXT_ID passthrough) ──
+//
+// When BROWSERBASE_CONTEXT_ID is set and --env remote, autobrowse creates
+// one persistent BB session before the agent loop and rewrites every
+// browse command from the agent to attach to it. This lets the agent
+// inherit a pre-authed state (cookies/storage) without having to log in
+// during every training iteration. Set via env var (not a CLI flag) so
+// callers can run autobrowse for the same task with or without the
+// context attached.
+let MANAGED_SESSION_ID = null;
+
+function preCreateBrowserbaseSession(ctxId) {
+  console.error(`[autobrowse] BROWSERBASE_CONTEXT_ID detected — creating BB session bound to context ${ctxId.slice(0, 8)}…`);
+  try {
+    const stdout = execFileSync(
+      "bb",
+      ["sessions", "create", "--context-id", ctxId, "--persist", "--advanced-stealth", "--solve-captchas"],
+      { encoding: "utf-8" },
+    );
+    const session = JSON.parse(stdout);
+    if (!session.id) throw new Error(`bb output missing id: ${stdout.slice(0, 200)}`);
+    console.error(`[autobrowse] Created managed session ${session.id} (context ${ctxId.slice(0, 8)}…)`);
+    return session.id;
+  } catch (err) {
+    console.error(`[autobrowse] FATAL: could not create BB session — ${err.message || err}`);
+    process.exit(1);
+  }
+}
+
+function releaseManagedSession() {
+  if (!MANAGED_SESSION_ID) return;
+  const sid = MANAGED_SESSION_ID;
+  MANAGED_SESSION_ID = null; // guard against re-entry from signal handlers
+  try {
+    execFileSync("bb", ["sessions", "update", sid, "--status", "REQUEST_RELEASE"], { stdio: "ignore" });
+    console.error(`[autobrowse] Released managed session ${sid}`);
+  } catch (err) {
+    console.error(`[autobrowse] Warning: failed to release session ${sid}: ${err.message || err}`);
+  }
+}
+process.on("exit", releaseManagedSession);
+process.on("SIGINT", () => { releaseManagedSession(); process.exit(130); });
+process.on("SIGTERM", () => { releaseManagedSession(); process.exit(143); });
+
 function parseCommand(command) {
   const args = [];
   let current = "";
@@ -243,6 +287,23 @@ function parseCommand(command) {
 }
 
 function executeCommand(command) {
+  // If a managed BB session is active, rewrite browse commands to attach to
+  // it. Session-lifecycle commands become no-ops so the agent's prompt-baked
+  // `browse env remote` / `browse stop` muscle memory doesn't fight us.
+  if (MANAGED_SESSION_ID) {
+    const trimmed = command.trim();
+    if (/^browse\s+(env|stop|status|start)(\s|$)/.test(trimmed)) {
+      return {
+        output: `[managed] no-op — session ${MANAGED_SESSION_ID.slice(0, 8)}… is pre-attached`,
+        error: false,
+        duration_ms: 0,
+      };
+    }
+    if (/^browse\s+/.test(trimmed) && !trimmed.includes("--connect")) {
+      command = trimmed.replace(/^browse\s+/, `browse --connect ${MANAGED_SESSION_ID} `);
+    }
+  }
+
   // Security: only allow the browse CLI and execute it without a shell so
   // metacharacters are treated as literal arguments instead of extra commands.
   const parsed = parseCommand(command);
@@ -272,15 +333,17 @@ function executeCommand(command) {
   }
 }
 
-function buildSystemPrompt(strategy, traceDir, browseEnv) {
-  const envDesc = browseEnv === "remote"
-    ? `Use **remote mode** (Browserbase) — anti-bot stealth, CAPTCHA solving, residential proxies:
+function buildSystemPrompt(strategy, traceDir, browseEnv, managedSessionId) {
+  const envDesc = managedSessionId
+    ? `The browser is **pre-attached** to a managed Browserbase session (id starting ${managedSessionId.slice(0, 8)}…) with a persistent context — cookies/storage from prior sessions are loaded. **Skip session lifecycle commands** (\`browse env\`, \`browse stop\`, \`browse status\`) — they are silently no-ops. Begin with \`browse open <url>\`.`
+    : browseEnv === "remote"
+      ? `Use **remote mode** (Browserbase) — anti-bot stealth, CAPTCHA solving, residential proxies:
 \`\`\`
 browse stop
 browse env remote
 \`\`\`
 Always run \`browse stop\` first to kill any existing local session before switching to remote.`
-    : `Use **local mode** — runs on local Chrome:
+      : `Use **local mode** — runs on local Chrome:
 \`\`\`
 browse env local
 \`\`\``;
@@ -391,6 +454,19 @@ async function main() {
   }
 
   const browseEnv = getArg("env", "local");
+  const maxTurnsArg = getArg("max-turns");
+  const MAX_TURNS = maxTurnsArg ? parseInt(maxTurnsArg, 10) : DEFAULT_MAX_TURNS;
+  if (!Number.isFinite(MAX_TURNS) || MAX_TURNS < 1) {
+    console.error(`ERROR: --max-turns must be a positive integer; got "${maxTurnsArg}".`);
+    process.exit(1);
+  }
+
+  // Pre-create a managed Browserbase session if BROWSERBASE_CONTEXT_ID is set.
+  // Falls back to the agent driving session setup itself when unset.
+  if (process.env.BROWSERBASE_CONTEXT_ID && browseEnv === "remote") {
+    MANAGED_SESSION_ID = preCreateBrowserbaseSession(process.env.BROWSERBASE_CONTEXT_ID);
+  }
+
   const client = new Anthropic();
   const runNumber = getNextRunNumber(tracesDir);
   const runId = `run-${String(runNumber).padStart(3, "0")}`;
@@ -400,7 +476,7 @@ async function main() {
 
   const strategy = fs.readFileSync(strategyFile, "utf-8");
   const task = fs.readFileSync(taskFile, "utf-8");
-  const systemPrompt = buildSystemPrompt(strategy, traceDir, browseEnv);
+  const systemPrompt = buildSystemPrompt(strategy, traceDir, browseEnv, MANAGED_SESSION_ID);
 
   console.error(`\n${"=".repeat(60)}`);
   console.error(`  AUTOBROWSE — ${taskName} — Run ${runNumber}`);