diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index bba1a258..e778d0fc 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -24,6 +24,21 @@ "./skills/browser" ] }, + { + "name": "browser-swarm", + "source": "./", + "description": "Coordinate multiple browser agents in one real Chrome profile through a Chrome extension bridge, colored tab group, and target-bound browse CLI endpoints.", + "version": "0.0.1", + "author": { + "name": "Browserbase" + }, + "category": "automation", + "keywords": ["browser", "swarm", "chrome-extension", "tab-groups", "stagehand", "understudy", "browse-cli"], + "strict": false, + "skills": [ + "./skills/browser-swarm" + ] + }, { "name": "functions", "source": "./", diff --git a/README.md b/README.md index 723da7ca..7bddebd9 100644 --- a/README.md +++ b/README.md @@ -9,6 +9,7 @@ This plugin includes the following skills (see `skills/` for details): | Skill | Description | |-------|-------------| | [browser](skills/browser/SKILL.md) | Automate web browser interactions via CLI commands — supports remote Browserbase sessions with anti-bot stealth, CAPTCHA solving, and residential proxies | +| [browser-swarm](skills/browser-swarm/SKILL.md) | Coordinate multiple browser agents in one real Chrome profile through a Chrome extension bridge, colored tab group, and target-bound browse CLI endpoints | | [browserbase-cli](skills/browserbase-cli/SKILL.md) | Use the official `bb` CLI for Browserbase Functions and platform API workflows including sessions, projects, contexts, extensions, fetch, and dashboard | | [functions](skills/functions/SKILL.md) | Deploy serverless browser automation to Browserbase cloud using the `bb` CLI | | [site-debugger](skills/site-debugger/SKILL.md) | Diagnose and fix failing browser automations — analyzes bot detection, selectors, timing, auth, and captchas, then generates a tested site playbook | diff --git a/skills/browser-swarm/RUNNING_TEST_NOTES.md b/skills/browser-swarm/RUNNING_TEST_NOTES.md new file mode 100644 index 00000000..fb34a898 --- /dev/null +++ b/skills/browser-swarm/RUNNING_TEST_NOTES.md @@ -0,0 +1,179 @@ +# Browser Swarm Running Test Notes + +This file tracks issues found while stress-testing browser-swarm and the evidence used to close them. + +## Fixed Issues + +### Arc tab grouping crashes Arc + +- Repro: Arc 1.146.0 crashed after the grouped path called `chrome.tabGroups.query` / `chrome.tabs.group`. +- Fix: Added `--no-group` mode and documented Arc as no-group only. +- Evidence: Arc no-group smoke and multi-tab real-world research both passed without extension warnings or an Arc crash. + +### Disposable Chrome relay port conflicts with installed real-browser extension + +- Repro: Arc's installed browser-swarm extension can connect to the default relay port while disposable Chrome tests are running. +- Fix: Added `--relay-port` / `BROWSER_SWARM_PORT` support and temporary extension patching in `launch-chrome.mjs`. +- Evidence: `BROWSER_SWARM_PORT=19990 BROWSER_SWARM_BROWSE_BIN=/Users/shrey/Developer/cli/bin/run.js npm run e2e` launched disposable Chrome with `/tmp/browser-swarm-extension-19990` and passed. + +### E2E write assertions treated `browse get` results as strings + +- Repro: `browse get title/text/value` returns JSON objects like `{ "title": "..." }`, `{ "text": "..." }`, and `{ "value": "..." }`; the new write test initially compared the whole object. +- Fix: Added `parseGetField()` to assert the specific returned field. +- Evidence: Parser checks passed and the e2e progressed to real browser behavior assertions. + +### Parallel same-page input submission was flaky + +- Repro: Three target-bound worker sessions on the same URL/title filled distinct values in parallel, but parallel submit via `fill --press-enter`, explicit `press Enter`, or `click #submit` left the third tab unsubmitted before the transport fix. +- Root cause: Chrome's visible input surface needs tab activation/serialization for `Input.*` CDP commands when multiple background tabs receive input concurrently through the extension debugger transport. +- Fix: The extension now activates the owning tab and serializes forwarded `Input.*` commands before sending them through `chrome.debugger.sendCommand`. +- Evidence: The Chrome e2e now passes with three same-page worker tabs, parallel `fill`, parallel `click #submit`, distinct title/text/value results, and one visible target per worker. + +### Worker prompts allowed invalid browse command probes + +- Repro: In one live Codex worker stress, beta/gamma successfully mutated their assigned tabs but tried invalid commands while collecting evidence (`screenshot ` and `pages`) before retrying with valid commands. +- Fix: Tightened the worker contract with explicit command-shape guidance: do not probe commands during the run, use `tab list`, and use `screenshot --path `. +- Evidence: Strict rerun on Chrome relay port `19993` spawned three real Codex workers in parallel. All returned `status: success`, used one target each, reported no errors, and the main harness verified distinct DOM state for all three same-URL tabs. + +### Real-browser setup accepted stale extension workers + +- Repro: Arc connected to the relay and `/health` returned `extensionConnected: true`, but the active service worker reported version `0.1.0` while the unpacked manifest was `0.1.1`. +- Fix: `setup-real-browser.mjs` now reads the unpacked manifest, prints the expected extension version/worker, and exits `3` with `versionMatches: false` when the connected worker version is stale. +- Evidence: `node scripts/setup-real-browser.mjs --browser arc --no-open --no-start-relay --timeout 2 --json` against the stale Arc worker exited `3` and reported `versionMatches: false`; the same helper against disposable Chrome on relay port `19995` exited `0` with `versionMatches: true`. + +### Arc MV3 service worker script URL stayed cache-stale + +- Repro: Arc continued to reconnect with active worker version `0.1.0` even after the unpacked manifest and installed skill reported `0.1.1`. +- Fix: Versioned the background service-worker filename to `service-worker-v0-1-1.js` in `manifest.json` while leaving `service-worker.js` as a tiny compatibility wrapper. +- Evidence: `BROWSER_SWARM_PORT=20013 BROWSER_SWARM_BROWSE_BIN= npm run e2e` passed in disposable Chrome with extension version `0.1.1`, proving the launcher now patches the manifest-declared worker file for non-default relay ports. Opening a temporary extension page that called `chrome.runtime.reload()` still made Arc reconnect with active worker `0.1.0`, so Arc still requires a browser-level service-worker refresh or restart. + +### Review-reported relay/setup hardening issues + +- Repro: targeted review found several real edge cases: worker endpoints could forward `Target.createTarget` / `Target.closeTarget` if a `sessionId` was present, unknown `sessionId` messages fell back to the first visible worker target, extension reconnects could leave stale relay targets, the relay `screenshot --path` CLI ignored `--path`, synthetic event failures could send an error after a successful response, `setup-real-browser` leaked the parent log file descriptor and accepted unknown browsers after spreading `undefined`, and `launch-chrome` accepted missing bare executable names before reaching the "Chrome not found" error. +- Fix: blocked worker lifecycle commands before the session forwarding path, made unknown sessions fail closed, clear targets on extension `hello`, write relay CLI screenshots to `--path`, warn instead of failing on synthetic event emission, close the parent relay log fd after spawn, validate browser names before spreading config, and only accept bare Chrome candidates when they exist on `PATH`. +- Evidence: `BROWSER_SWARM_PORT=19997 BROWSER_SWARM_BROWSE_BIN= npm run e2e` passed with extra probes for `Target.createTarget` / `Target.closeTarget` with an attached session, unknown session fallback, relay CLI screenshot writing (`1509844` bytes), one-target visibility, public read tasks, and three same-page parallel `fill` + `click #submit` write tasks. `node scripts/setup-real-browser.mjs --browser not-a-browser --no-open --no-start-relay --no-wait` now exits `1` with the supported browser list. + +### Session-scoped Target command and root closeTarget regressions + +- Repro: follow-up review found that a worker could send `Target.getTargets`, `Target.getTargetInfo`, or `Target.attachToTarget` with its own synthetic `sessionId` and skip the no-session isolation switch, forwarding the command to Chrome raw. It also found that root `Target.closeTarget` relied on an extension detach event that could be skipped after `chrome.tabs.remove`, leaving stale relay targets. +- Fix: relay now scopes `Target.getTargets`, `Target.getTargetInfo`, and `Target.attachToTarget` before the session forwarding path and validates any provided `sessionId`. Root `Target.closeTarget` removes the closed target from the relay map and broadcasts a detach after a successful close response. +- Evidence: `BROWSER_SWARM_PORT=20000 BROWSER_SWARM_BROWSE_BIN= npm run e2e` passed. The run verified session-scoped `Target.getTargets` saw exactly one worker target, session-scoped sibling attach/info errored, root create increased target count `3 -> 4`, root close reduced it `4 -> 3`, relay screenshot wrote `1510778` bytes, and the three same-page parallel write tasks still passed. + +### Worker shell selector quoting + +- Repro: during a latest-head mixed Codex + Claude Code Chrome smoke, the Codex worker's first `fill #box ...` shell invocation treated `#box` as a shell comment and failed before retrying with a quoted selector. +- Fix: tightened the worker contract to explicitly quote CSS selectors that contain shell-special characters, such as `"#box"` and `"#submit"`, whenever invoking `browse` through a shell. +- Evidence: the same worker retried with a quoted selector and completed successfully. The main harness verified the Codex worker tab title/text/value as `latest-head-codex-worker`, the Claude Code worker tab title/text/value as `latest-head-claude-worker`, and exactly one visible tab per target-bound endpoint. + +### Claude worker reported an unscoped tab count + +- Repro: in the current-head mixed Codex + Claude Code Chrome smoke on relay port `20006`, the Claude worker successfully filled and submitted its target-bound tab but reported `tabCount: 2` after using non-contract raw/relay-level inspection instead of `browse tab list`. +- Fix: tightened the worker contract to forbid raw WebSocket scripts, `curl` relay endpoints, and `/swarm/*` admin endpoint probes from worker agents. Worker `tabCount` must come from `browse tab list` on the assigned target-bound endpoint and should be `1`. +- Evidence: the top-level harness independently connected to both target-bound endpoints and verified each returned exactly one visible target. It also verified distinct DOM state for Codex (`current-head-codex-worker-87267b6`) and Claude Code (`current-head-claude-worker-87267b6`) on identical same-page tabs. + +### Long browse session names can block worker startup + +- Repro: after tightening the Claude Code worker prompt to use only browse CLI commands, a first-command `browse fill` with session names like `browser-swarm-current-claude-strict-90398373` hung until the CLI returned `Driver daemon socket was not ready after 30000ms`. +- Fix: shortened the recommended session naming pattern to `bs-