browserbase · aq17 · May 14, 2026 · May 14, 2026 · Jun 3, 2026 · Jun 3, 2026
diff --git a/skills/autobrowse/SKILL.md b/skills/autobrowse/SKILL.md
@@ -349,6 +349,99 @@ Write the file `./autobrowse/reports/YYYY-MM-DD-HH-MM-<tasks>.md` with:
 
 ---
 
+## Export to deterministic Playwright
+
+Once a task has graduated, you can collapse the LLM-driven replay loop into a single deterministic TypeScript script via the `export` subcommand. The export mines the most recent passing run's `trace.json`, resolves session-scoped ARIA refs against the snapshots they came from, and emits a Playwright script that connects to a fresh Browserbase session (optionally bound to a persistent context).
+
+```bash
+# Default — generate and verify against the latest passing run
+node ${CLAUDE_SKILL_DIR}/scripts/export.mjs --task <task-name>
+
+# Custom workspace / specific run / skip verification
+node ${CLAUDE_SKILL_DIR}/scripts/export.mjs --task <task-name> --workspace ./autobrowse
+node ${CLAUDE_SKILL_DIR}/scripts/export.mjs --task <task-name> --run run-022
+node ${CLAUDE_SKILL_DIR}/scripts/export.mjs --task <task-name> --no-verify
+```
+
+The export writes to `<workspace>/tasks/<task>/playwright/`:
+
+- `<task>.ts` — runnable Playwright script. Connects to Browserbase via `chromium.connectOverCDP` when `BROWSERBASE_CONTEXT_ID` is set; falls back to local Chromium otherwise.
+- `selectors.cache.json` — resolved locators + ranked fallbacks per action. Used by future self-healing tooling.
+- `package.json`, `tsconfig.json` — minimal scaffold with `playwright`, `zod`, `tsx`, `dotenv`.
+
+How refs are resolved: every `[X-Y]` ref in the trace is looked up against the most recent prior `browse snapshot` containing it. The matched node's role + accessible name are turned into a ranked list of Playwright locator candidates — `getByRole({ name })` first, then `getByLabel` / `getByPlaceholder` for form inputs, then `getByText`, then bare `getByRole`. The best candidate is emitted inline; lower-ranked candidates are saved to `selectors.cache.json` for self-healing.
+
+The final extract step is generated with one Claude Haiku call at export time (requires `ANTHROPIC_API_KEY`). The LLM is given the final snapshot, the Zod schema parsed from `task.md`'s `## Output` block, and the agent's final reasoning. If the API key is missing the export still produces a script — the extract block is a TODO placeholder.
+
+For a Stagehand-targeted export (LLM-driven replay via `stagehand.act`/`observe`), use the standalone `/stagehand-export` skill.
+
+## Iterative Playwright loop (recommended for tasks that need a deterministic artifact)
+
+When the end goal is a runnable Playwright script (cron, Browserbase Functions, etc.), prefer `loop.mjs` over manually orchestrating evaluate + export. The loop converges on a workflow that **both** the LLM explorer **and** the deterministic Playwright replay can complete — which is a strictly stronger guarantee than "the LLM agent's trace ends with success: true."
+
+```bash
+node ${CLAUDE_SKILL_DIR}/scripts/loop.mjs --task <task-name> --env remote \
+  --max-iterations 8 --max-turns-per-iter 60
+```
+
+What it does per iteration:
+
+1. Runs `evaluate.mjs` (one LLM-driven exploration round).
+2. If the trace passed (`success: true` in the final JSON), runs `export.mjs --target playwright --no-verify` to emit a fresh script.
+3. Runs the emitted script (`npx tsx <task>.ts`) against a new BB session — the actual deterministic replay.
+4. If the Playwright replay passed → records a pass. If it failed → distills the failure (Claude Haiku, ~$0.01) into a new entry under `strategy.md`'s "Recent Playwright Failures" section.
+5. Next iteration's evaluate reads the updated strategy.md and adapts.
+
+**Convergence**: graduates when the emitted script passes in 2 of the last 3 iterations.
+
+### Strategy.md sections
+
+The loop expects (and the distiller maintains) this structure:
+
+```markdown
+# <task> Navigation Strategy
+
+## Navigation Heuristics
+(prose for the LLM explorer — fast-path URLs, timing notes, step sequences)
+
+## Codegen Hints
+(per-task overrides for the Playwright emitter — e.g., "use force:true for all radios on this site")
+
+## Recent Playwright Failures
+### Iteration 3 — <one-line>
+- **What failed**: ...
+- **Likely cause**: ...
+- **Fix to try next iteration**: ...
+```
+
+The emitter (`codegen-playwright.mjs`) bakes in baseline defaults for the most common state-portal patterns: `forceCheck` for checkbox `fill_sel` ops, `forceClickRadio` for radio click ops, `selectWithFallback` (JS-enable + native setter) for every `select_dropdown`, and a `reactFill` helper for inputs that need to bypass keystroke-by-keystroke event handling.
+
+### When to use `loop.mjs` vs `evaluate.mjs` directly
+
+- **Use `loop.mjs`** when you want a Playwright script as the deliverable. Costs more per iteration (each adds a script export + replay) but converges on something that actually replays in prod.
+- **Use `evaluate.mjs`** when you want a `/<task>` skill that future Claude sessions invoke (the original autobrowse flow). Cheaper, doesn't generate a Playwright script.
+
+---
+
+### Pre-authed sessions via persistent context
+
+For tasks that need authentication, create a Browserbase context once, log in interactively, and point autobrowse at it via the env var:
+
+```bash
+# One-time: create a context, log into the target site via live-view
+bb contexts create --project-id $BROWSERBASE_PROJECT_ID --json
+
+# Then, every autobrowse run for this task reuses the cached cookies/storage
+export BROWSERBASE_CONTEXT_ID=<id-from-above>
+node ${CLAUDE_SKILL_DIR}/scripts/evaluate.mjs --task <name> --env remote
+```
+
+When `BROWSERBASE_CONTEXT_ID` is set with `--env remote`, evaluate.mjs creates one BB session bound to that context before the agent loop, transparently injects `--connect <session-id>` into every browse command the agent issues, and releases the session at exit. The agent's `browse env` / `browse stop` / `browse status` calls become no-ops in this mode. Iterations skip the per-run login dance.
+
+The same env var, when set at runtime for the exported script, makes it attach to the same persisted context.
+
+---
+
 ## Rules
 
 - **Only edit `strategy.md`** — never touch `task.md` (unless creating it from the template) or `evaluate.mjs`

diff --git a/skills/autobrowse/scripts/export.mjs b/skills/autobrowse/scripts/export.mjs
@@ -0,0 +1,203 @@
+#!/usr/bin/env node
+
+/**
+ * export.mjs — Translate a graduated autobrowse task into a deterministic
+ * runnable script.
+ *
+ * Currently supports --target playwright. The Stagehand variant lives in
+ * the standalone /stagehand-export skill; once Playwright is shipped and
+ * proven we can fold both targets behind this CLI.
+ *
+ * Usage:
+ *   node scripts/export.mjs --task <name> --target playwright \\
+ *        [--workspace ./autobrowse] [--run run-NNN] \\
+ *        [--output <dir>] [--no-verify]
+ */
+
+import "dotenv/config";
+import * as fs from "node:fs";
+import * as path from "node:path";
+
+import { pickRun, listRuns } from "./lib/pick-run.mjs";
+import { taskToSchema, parseStrategySections } from "./lib/parse-task.mjs";
+import { walkTrace } from "./lib/command-mapping.mjs";
+import {
+  generatePlaywrightScript,
+  playwrightPackageJson,
+  playwrightTsconfig,
+} from "./lib/codegen-playwright.mjs";
+import { verifyGenerated } from "./lib/verify.mjs";
+
+// ── CLI args ───────────────────────────────────────────────────────
+
+function getArg(name, fallback) {
+  const i = process.argv.indexOf(`--${name}`);
+  return i !== -1 && process.argv[i + 1] ? process.argv[i + 1] : fallback;
+}
+const hasFlag = (n) => process.argv.includes(`--${n}`);
+
+if (hasFlag("help") || hasFlag("h")) {
+  console.log(`autobrowse export — generate deterministic replay scripts from autobrowse traces
+
+Usage: node scripts/export.mjs --task <name> [options]
+
+Options:
+  --task <name>          Task name — matches tasks/<name>/ (required)
+  --target <kind>        playwright (default; stagehand lives in /stagehand-export)
+  --workspace <dir>      Workspace root holding tasks/ and traces/ (default: ./autobrowse)
+  --run <id>             Force a specific run (default: newest passing)
+  --output <dir>         Output directory for generated files (default: <workspace>/tasks/<name>/<target>)
+  --no-verify            Skip the npm install + tsx run verification step
+
+Env:
+  ANTHROPIC_API_KEY      Used for LLM-generated extract block. If unset, a TODO placeholder is emitted.
+  BROWSERBASE_*          Pass through to the generated script at runtime.
+
+Exit codes: 0 generated+verified, 2 generated but verify failed (or --no-verify), 1 generator error.`);
+  process.exit(0);
+}
+
+const TASK = getArg("task");
+const TARGET = getArg("target", "playwright");
+const WORKSPACE = path.resolve(getArg("workspace", "autobrowse"));
+const FORCED_RUN = getArg("run");
+const VERIFY = !hasFlag("no-verify");
+const OUTPUT = getArg("output");
+
+if (!TASK) {
+  console.error("ERROR: --task <name> is required");
+  console.error("Run with --help for usage.");
+  process.exit(1);
+}
+if (TARGET !== "playwright") {
+  console.error(`ERROR: --target=${TARGET} not yet supported here. Use the /stagehand-export skill for Stagehand output.`);
+  process.exit(1);
+}
+
+// ── Locate sources ────────────────────────────────────────────────
+
+const taskDir = path.join(WORKSPACE, "tasks", TASK);
+const tracesDir = path.join(WORKSPACE, "traces", TASK);
+const outDir = OUTPUT ? path.resolve(OUTPUT) : path.join(taskDir, TARGET);
+
+const taskFile = path.join(taskDir, "task.md");
+const strategyFile = path.join(taskDir, "strategy.md");
+
+for (const [label, file] of [["task.md", taskFile], ["strategy.md", strategyFile]]) {
+  if (!fs.existsSync(file)) {
+    console.error(`ERROR: ${label} not found at ${file} — run autobrowse first.`);
+    process.exit(1);
+  }
+}
+if (!fs.existsSync(tracesDir)) {
+  console.error(`ERROR: no traces at ${tracesDir} — run autobrowse first.`);
+  process.exit(1);
+}
+
+const runId = pickRun(tracesDir, FORCED_RUN);
+if (!runId) {
+  console.error(`ERROR: no passing runs found in ${tracesDir}.`);
+  console.error("Graduate the task with autobrowse first, or pass --run <id> to force.");
+  console.error("Available runs:", listRuns(tracesDir).join(", ") || "(none)");
+  process.exit(1);
+}
+
+const runDir = path.join(tracesDir, runId);
+const tracePath = path.join(runDir, "trace.json");
+if (!fs.existsSync(tracePath)) {
+  console.error(`ERROR: trace.json missing at ${tracePath}`);
+  process.exit(1);
+}
+
+console.error(`[export] task=${TASK} target=${TARGET} run=${runId} workspace=${WORKSPACE}`);
+
+const trace = JSON.parse(fs.readFileSync(tracePath, "utf-8"));
+const taskMd = fs.readFileSync(taskFile, "utf-8");
+const strategyMd = fs.readFileSync(strategyFile, "utf-8");
+
+// ── Schema + sections ──────────────────────────────────────────────
+
+const { outputShape, zodSchema, schemaFieldCount } = taskToSchema(taskMd);
+const sections = parseStrategySections(strategyMd);
+const ops = walkTrace(trace, sections);
+
+// Find the agent's final natural-language summary (for LLM extract grounding).
+let finalReasoning = "";
+for (let i = trace.length - 1; i >= 0; i--) {
+  if (trace[i].role === "assistant" && trace[i].reasoning) {
+    finalReasoning = trace[i].reasoning;
+    break;
+  }
+}
+
+// ── Generate Playwright script ─────────────────────────────────────
+
+const { scriptCode, cachedActions, stats, extract } = await generatePlaywrightScript({
+  task: TASK,
+  runId,
+  workspace: WORKSPACE,
+  trace,
+  ops,
+  zodSchema,
+  outputShape,
+  taskMd,
+  finalReasoning,
+});
+
+// ── Write outputs ──────────────────────────────────────────────────
+
+fs.mkdirSync(outDir, { recursive: true });
+const scriptPath = path.join(outDir, `${TASK}.ts`);
+const cachePath = path.join(outDir, "selectors.cache.json");
+const pkgPath = path.join(outDir, "package.json");
+const tsconfigPath = path.join(outDir, "tsconfig.json");
+
+fs.writeFileSync(scriptPath, scriptCode);
+fs.writeFileSync(
+  cachePath,
+  JSON.stringify(
+    {
+      task: TASK,
+      target: TARGET,
+      generated_from: { workspace: WORKSPACE, run: runId },
+      stats,
+      extract,
+      actions: cachedActions,
+    },
+    null,
+    2,
+  ),
+);
+if (!fs.existsSync(pkgPath)) {
+  fs.writeFileSync(pkgPath, JSON.stringify(playwrightPackageJson(TASK), null, 2));
+}
+if (!fs.existsSync(tsconfigPath)) {
+  fs.writeFileSync(tsconfigPath, JSON.stringify(playwrightTsconfig(), null, 2));
+}
+
+console.error(`[export] wrote ${path.relative(process.cwd(), scriptPath)}`);
+console.error(`[export] ops: ${ops.length} | cached: ${stats.cached} | ref_resolved: ${stats.ref_resolved} | ref_failed: ${stats.ref_failed} | dropped: ${stats.dropped}`);
+console.error(`[export] schema fields: ${schemaFieldCount} | extract: ${extract.generated ? "LLM-generated" : `fallback (${extract.reason})`}`);
+
+// ── Verify ─────────────────────────────────────────────────────────
+
+const baseReport = {
+  task: TASK,
+  target: TARGET,
+  run: runId,
+  script: scriptPath,
+  cache: cachePath,
+  stats,
+  schema_fields: schemaFieldCount,
+  extract: { generated: extract.generated, reason: extract.reason },
+};
+
+if (!VERIFY) {
+  console.log(JSON.stringify({ ...baseReport, verified: false }, null, 2));
+  process.exit(0);
+}
+
+const v = verifyGenerated(outDir, `${TASK}.ts`);
+const report = { ...baseReport, verified: true, passed: v.passed, exit_code: v.exit_code, run_log: v.run_log, output: v.output };
+console.log(JSON.stringify(report, null, 2));
+process.exit(v.passed ? 0 : 2);