browserbase · shubh24 · May 30, 2026 · Jun 1, 2026 · Jun 1, 2026 · Jun 1, 2026
diff --git a/README.md b/README.md
@@ -19,6 +19,7 @@ This plugin includes the following skills (see `skills/` for details):
 | [fetch](skills/fetch/SKILL.md) | Fetch HTML or JSON from static pages without a browser session — inspect status codes, headers, follow redirects |
 | [search](skills/search/SKILL.md) | Search the web and return structured results (titles, URLs, metadata) without a browser session |
 | [ui-test](skills/ui-test/SKILL.md) | AI-powered adversarial UI testing — analyzes git diffs to test changes, or explores the full app to find bugs |
+| [browsability](skills/browsability/SKILL.md) | Score how usable a website is by an AI browser agent — Access Resistance (how much stealth/proxy/captcha help is needed), Drivability (do controls survive the accessibility-tree prune, iframe/shadow-DOM traps), and Agent tax (steps over the human baseline); emits a graded report with concrete fixes |
 
 ## Installation
 

diff --git a/skills/browsability/.gitignore b/skills/browsability/.gitignore
@@ -0,0 +1 @@
+browsability-out/
diff --git a/skills/browsability/LICENSE.txt b/skills/browsability/LICENSE.txt
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Browserbase, Inc.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/skills/browsability/SKILL.md b/skills/browsability/SKILL.md
@@ -0,0 +1,103 @@
+---
+name: browsability
+description: "Score how usable a website is BY AN AI BROWSER AGENT — its Browsability Index. Measures how little infrastructure assistance an agent needs to operate the site (Access Resistance), whether the agent can perceive and drive the live DOM (Drivability — does each control survive the accessibility-tree prune, are there iframe/shadow-DOM/deep-DOM traps), and how many more steps the agent needs than a human (Agent tax). Grounded in what the open-source Stagehand framework treats as hard. Use when the user asks how browsable / agent-friendly / agent-ready a website or a specific web flow (signup, checkout, search) is for a BROWSER agent, to compare sites on browser-agent usability, or to produce a browsability report card with concrete fixes. Triggers: 'how browsable is <site>', 'is this site agent-friendly for a browser agent', 'grade this checkout/signup flow for agents', 'browser-agent friendliness', 'DOM friction', 'browsability of <url>'. NOT for SEO/AEO or content discoverability (a different layer), and NOT for docs/SDK onboarding DX (use the agent-experience skill for that)."
+license: MIT
+metadata:
+  author: browserbase
+  version: "0.1.0"
+allowed-tools: Bash Read Write Edit Glob Grep Agent
+compatibility: "Requires `bun` and the browse CLI (`npm install -g @browserbasehq/browse-cli`). Remote mode needs BROWSERBASE_API_KEY. The full agent-ladder pass additionally needs a model-driven reference agent (use the `browser` skill as the driver)."
+---
+
+# Browsability — how usable is a site for a browser agent?
+
+Score how well an AI **browser** agent can *operate* a website. The opinion: *browsability is how
+little help an agent needs to succeed, and how much harder the site is for an agent than for a human.*
+This is the operability layer — not discoverability, so ignore `llms.txt`, sitemaps, SEO/AEO.
+
+**Before scoring, read `references/rubric.md`** — the full code-grounded rubric (axes, signals, the
+assistance ladder, the agent-vs-human delta, and remediation knowledge). The summary below is only the
+operating procedure.
+
+## The score (0–100)
+
+| Axis | Pts | Source |
+|---|---|---|
+| **A · Access Resistance** | 30 | lowest assistance rung that completes the task (agent ladder) |
+| **B1 · Reachability** | 25 | % of controls that survive the accessibility-tree prune (deterministic probe) |
+| **B3 · Structural traps** | 15 | cross-origin iframes, shadow DOM, DOM depth/size (deterministic probe) |
+| **C · Agent tax** | 20 | agent steps OVER the human baseline (the delta — not absolute click count) |
+| **D · Recoverability** | 10 | self-heal / site errors / blocking overlays / step ceiling (agent run) |
+
+The score counts only tasks that a verifier confirms actually completed. **Agent-native affordance** (an
+API / deep-link / structured action path) is a *ceiling badge*, not a scored component — flag it, do
+not add it to the number; this rubric measures operability of the UI.
+
+## Workflow
+
+### Step 1 — Drivability probe (always; deterministic, no model)
+
+Run the probe on the target URL (a page, or the entry point of a flow):
+
+```bash
+cd skills/browsability
+bun scripts/friction.ts <url> --out browsability-out
+```
+
+This loads the page through the browse CLI and reports **B1 reachability** + **B3 structural traps**
+straight from the live DOM (40 of 100 points). It needs no model and finishes in seconds. Use remote
+mode (`browse env remote`, needs `BROWSERBASE_API_KEY`) for bot-protected sites; local is fine
+otherwise. This alone is a useful friction profile and is the right answer for a quick assessment.
+
+### Step 2 — Agent ladder + tasks (for the full score)
+
+Derive a small set of **canonical tasks** for the site (informational / navigational / transactional —
+e.g. "find the price of the paid plan", "create an account", "submit the contact form"). For each
+task, run a reference browser agent across the **Access Resistance ladder** and record results:
+
+- **rung 0** vanilla headless — captcha-solving **off** (`solveCaptchas:false`), no proxy, no fingerprint
+- **rung 1** default assist — captcha-solving on
+- **rung 2** proxy + realistic fingerprint
+- **rung 3** advanced stealth + persisted context
+- **rung 4** maximum assistance
+
+Stop climbing once a task succeeds; the lowest passing rung is its Access Resistance. Drive the agent
+with the `browser` skill (the browse CLI) or Stagehand, and judge each run's `success` with a verifier
+— do not trust the agent's self-report. Capture **real step counts** and a **`humanBaselineSteps`**
+estimate per task so Agent tax is computed as the delta. Record into `tasks.json`:
+
+```json
+{ "url": "https://example.com",
+  "tasks": [
+    { "name": "Create an account", "type": "transactional", "humanBaselineSteps": 4,
+      "runs": [ {"rung":0,"success":false,"steps":10,"model":"<model>","note":"signup CTA unlabeled"},
+                {"rung":2,"success":true,"steps":7,"model":"<model>","note":""} ] } ] }
+```
+
+If no model-driven agent is available, act as the reference agent using the `browser` skill: execute
+each task's browse steps, count the steps, and record the runs in `tasks.json` honestly, marking them
+as single-model. This produces a real, if single-model, result.
+
+### Step 3 — Composite score + report
+
+```bash
+bun scripts/score.ts --friction browsability-out/friction.json --tasks tasks.json --out browsability-out
+```
+
+Writes `browsability-out/browsability.json` with the 0–100 score, grade, and per-axis breakdown. When
+`tasks.json` is absent it reports a **Drivability-only** score (B1 + B3, 40 max) and marks A/C/D
+pending — still honest, just partial.
+
+### Step 4 — Report to the user
+
+Present a **profile, not just a number**: the grade, the per-axis breakdown, the lowest passing rung,
+and — most usefully — a **ranked remediation list** drawn from the rubric's remediation table (e.g.
+"signup CTA has no accessible name → add `aria-label`; estimated lift +X"). Cite the concrete signal
+each finding came from.
+
+## Notes & gotchas
+
+- `solveCaptchas` defaults to **on** in Browserbase — an honest rung-0 must explicitly disable it, or rungs 0 and 1 collapse and captcha-walled sites get over-credited.
+- The deterministic probe cannot see closed shadow roots directly; it infers them when a page has custom elements but zero open shadow hosts. Treat this as a hint and confirm during the agent run.
+- Keep the human baseline honest — Agent tax is the *delta*, so a genuinely long workflow (10 steps for humans too) must not be penalized as un-browsable.
+- The scripts call `browse stop` on exit; if a daemon hangs, `pkill -f "browse.*daemon"`.
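To illustrate Step 2 of `skills/browsability/SKILL.md` above, here is a minimal sketch of how the lowest passing rung and the Agent tax delta could be read out of `tasks.json`. The field names follow the JSON example in that step. The helper names (`lowestPassingRun`, `agentTax`), the choice to take step counts from the lowest passing run, and the clamp at zero are assumptions made for illustration; this is not the logic of `scripts/score.ts`, which this diff does not show.

```typescript
// Shapes mirror the tasks.json example in SKILL.md; helper names are hypothetical.
interface Run {
  rung: number;      // 0..4 on the Access Resistance ladder
  success: boolean;  // verifier verdict, not the agent's self-report
  steps: number;     // real step count for this run
  model: string;
  note: string;
}

interface Task {
  name: string;
  type: "informational" | "navigational" | "transactional";
  humanBaselineSteps: number;
  runs: Run[];
}

// Lowest-rung run with a verifier-confirmed success, or null if no run passed.
function lowestPassingRun(task: Task): Run | null {
  const passing = task.runs.filter((r) => r.success);
  if (passing.length === 0) return null;
  return passing.reduce((best, r) => (r.rung < best.rung ? r : best));
}

// Agent tax = agent steps minus the human baseline, taken from the lowest passing run.
// Assumption: clamp at zero, so beating the baseline earns no bonus in this sketch.
function agentTax(task: Task): number | null {
  const run = lowestPassingRun(task);
  if (run === null) return null; // unverified tasks are not scored
  return Math.max(0, run.steps - task.humanBaselineSteps);
}

const signup: Task = {
  name: "Create an account",
  type: "transactional",
  humanBaselineSteps: 4,
  runs: [
    { rung: 0, success: false, steps: 10, model: "<model>", note: "signup CTA unlabeled" },
    { rung: 2, success: true, steps: 7, model: "<model>", note: "" },
  ],
};

console.log(lowestPassingRun(signup)?.rung, agentTax(signup)); // 2 3
```

For the example task, the rung-0 failure is ignored, rung 2 is the Access Resistance, and the tax is 7 − 4 = 3 steps. Returning `null` for a task with no passing run keeps unverified tasks out of the score, in line with the verifier gate described above.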
diff --git a/skills/browsability/references/rubric.md b/skills/browsability/references/rubric.md
@@ -0,0 +1,153 @@
+# The Browsability Rubric
+
+A code-grounded, operational definition of how usable a website is **by an AI browser agent** —
+and how to score it. Grounded in what the open-source [Stagehand](https://github.com/browserbase/stagehand)
+browser-automation framework actually treats as hard, plus the public Browserbase session settings.
+
+## The opinion, in one line
+
+**Browsability is how little help an agent needs to succeed** — and, more precisely, **how much
+harder the site is for an agent than for a motivated human.**
+
+It is *not* discoverability. Forget `llms.txt`, sitemaps, token efficiency, and SEO/AEO — those
+measure whether content can be *found and cited*. Browsability measures whether an agent can
+*operate* the live site: perceive the controls, drive the DOM, and complete a real task.
+
+It is measured **operationally** — by running real agent tasks and reading harness + session
+telemetry (which controls survived the accessibility tree, how many steps a flow took, which errors
+fired, how much stealth/proxy assistance was needed) — not by linting static HTML.
+
+> **Scope note:** this rubric covers *UI operability* — driving a website in a browser. It is the
+> sibling of, not a substitute for, auditing docs/SDK onboarding experience.
+
+## The key reframe: score the agent-vs-human delta, not absolute effort
+
+A 10-click checkout that also takes a human 10 clicks is *perfectly browsable* — that's just the
+workflow. A 3-click task that takes the agent 10 because controls are unlabeled is *not browsable* —
+those extra 7 clicks are the **agent tax**.
+
+Scoring the **delta over the human baseline** subtracts out UX/design length (which
+costs humans and agents equally) and isolates exactly the agent-specific penalty. This resolves the
+"is click-count a UX problem or a browsability problem?" question: only the *excess* over the human
+path counts.
+
+Stagehand surfaces a piece of this directly — a native `<select>` is a **one-step** action; a custom
+dropdown must be clicked open, re-snapshotted, then selected — a **two-step** action. That second
+step *is* agent tax: incidental inflation, not essential workflow.
+
+## The scored axes (+ one ceiling badge)
+
+| Axis | What it measures | Weight | In score? |
+|---|---|---|---|
+| **A · Access Resistance** | How much infrastructure assistance the agent needs to operate at all (the ladder) | 30 | ✅ |
+| **B1 · Reachability** | Can the agent perceive the controls (survive the accessibility-tree prune) | 25 | ✅ |
+| **B3 · Structural traps** | iframes, shadow DOM, DOM depth/size | 15 | ✅ |
+| **C · Agent tax** | Steps *above the human baseline* (incidental inflation only) | 20 | ✅ |
+| **D · Recoverability** | What happens when something breaks (self-heal, site errors, blocking overlays, step ceiling) | 10 | ✅ |
+| — Essential path length | Inherent workflow steps (humans pay too) | — | ❌ separate "Agent UX" lens |
+| — Agent-native affordance | An API / deep-link / structured action path exists | — | ⭐ ceiling badge, not scored |
+
+Agent-native affordance (offering a non-UI path so an agent need not drive the browser at all) is
+noted as the *ceiling*, not a scored component — this rubric deliberately measures **operability of
+the UI**, the realistic last mile for the large share of the web that is UI-only.
+
+Gate everything on a **success verdict** per task (a verifier, not the agent's self-report): friction
+and tax scores only count for tasks confirmed to have actually completed.
+
+---
+
+## Axis A — Access Resistance (the assistance ladder)
+
+Browserbase exposes public session settings, each mitigating a specific site-side obstacle. Re-run
+the *same task* climbing the ladder; the **lowest rung at which it succeeds** is the site's Access
+Resistance. Lower = more browsable.
+
+| Public setting | Mitigates |
+|---|---|
+| `solveCaptchas` | CAPTCHA challenges |
+| `proxies` | IP blocks, rate limits, geo-gating (residential / geo-targeted) |
+| `fingerprint` | headless-browser fingerprint detection |
+| `advancedStealth` | advanced anti-bot detection |
+| `context` (persist) | re-auth / re-consent walls; session continuity |
+
+The ladder to re-run a task across:
+
+- **L0 Vanilla headless** — captcha-solving **off**, no proxy, no fingerprint, fresh context. The agent looks like raw headless Chrome. *Passing here = maximally browsable.*
+- **L1 Default assist** — captcha-solving on, still no proxy/fingerprint.
+- **L2 Proxied + realistic fingerprint** — geo proxy + a realistic desktop fingerprint.
+- **L3 Advanced stealth + persisted context** — advanced anti-bot mitigation on; cookies persisted.
+- **L4 Maximum assistance** — top-tier anti-bot mitigation. *Needing this rung = barely browsable.*
+
+> **Gotcha:** `solveCaptchas` defaults to **on** in Browserbase, so an honest rung-0 baseline must
+> explicitly turn it off — otherwise L0 and L1 collapse and captcha-walled sites get over-credited.
+
+**Score:** `A = 30 * (1 - minPassingRung / 4)`.
+
+---
+
+## Axis B — Drivability (per-step technical difficulty)
+
+### B1 · Element reachability — can the agent even *see* the control?
+
+Stagehand builds an accessibility tree and **prunes any node that lacks all three of**: an accessible
+name, named children, and a non-structural role. An unlabeled `<div role="generic">` button is removed
+*before the model ever sees it.* The survival rule, from the open-source accessibility snapshot:
+
+```js
+// keep a node iff:
+const keep = !!(name && name.trim())        // it has an accessible name, OR
+          || !!(childIds && childIds.length) // it has named children, OR
+          || !isStructural(role);            // it has a real role (not generic/none/inlinetextbox)
+```
+
+- **Signal:** reachable-control ratio = interactive controls that survive the prune ÷ all interactive controls.
+- **Penalize:** icon-only buttons with no `aria-label`; `<div onclick>` controls; inputs with no associated `<label>`; closed-shadow custom components.
+- **Reward:** native semantic elements (`button`, `a[href]`, `input`, `select`) with text/labels — they always survive.
+
+### B3 · Structural traps — the hard walls
+
+| Trap | Why it hurts an agent |
+|---|---|
+| Closed shadow DOM | roots closed before instrumentation are effectively invisible |
+| Cross-origin iframes | short-lived, separately-managed frames that can drop out mid-operation |
+| Deep DOM (>256 levels) | serialization stack limits force shallower, slower retries |
+| Never-settling network | streaming / sub-second polling never reaches "network idle" → timeout every step |
+| Virtualized lists | no automatic "scroll until found"; an observe→scroll→observe loop is required |
+| Very large DOM | the serialized tree is truncated; elements past the cap become invisible |
+
+---
+
+## Axis C — Agent tax (steps over the human baseline)
+
+For each verifier-confirmed task: `agentTax = agentSteps - humanBaselineSteps`. Where a human baseline
+is unavailable, approximate the incidental inflation from the **two-step ratio** (custom controls the
+framework must expand-then-act on) plus needless modal steps. Only the *excess* counts; essential
+workflow length is reported separately as "Agent UX," not scored as browsability.
+
+---
+
+## Axis D — Recoverability — what happens when something breaks
+
+Stagehand's error taxonomy cleanly separates *site-caused* friction from agent-caused, and its
+self-heal path is the tell: on a stale selector (the DOM mutated under the agent) it re-snapshots and
+re-asks the model once. Frequent self-heal = an unstable, hostile DOM.
+
+- **Site-caused errors (penalize):** element-not-visible, selector-resolution failures, element-not-found, captcha timeouts, navigation timeouts.
+- **Blocking overlays (penalize):** cookie/consent walls, login walls, paywalls — not auto-dismissed; they eat steps or wall the flow entirely.
+- **Max-steps blowout:** agent loops have a default step budget; tasks that exhaust it score as failures.
+- **Signal:** self-heal count, site-caused-error count, overlay-encountered flag, whether the run hit the step ceiling.
+
+---
+
+## Remediation knowledge (turn findings into fixes)
+
+| Finding | Fix |
+|---|---|
+| Low reachable-ratio | add `aria-label` to icon-only controls; use semantic `<button>` / `<a>` |
+| Many custom dropdowns | use native `<select>` where possible |
+| Cross-origin iframes in the flow | same-origin embed, or a direct route |
+| Closed shadow DOM | open shadow roots, or expose semantic fallbacks |
+| Deep / very large DOM | flatten nesting, paginate, reduce node count |
+| High Access Resistance | reduce hostile bot-walls on agent-relevant flows |
+| High agent tax | collapse the funnel; remove needless modal steps |
+| (ceiling) UI-only | offer an API / deep-link / structured action path for agents |
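As a worked example of two rules stated in `references/rubric.md` above, the sketch below applies the Axis A formula `A = 30 * (1 - minPassingRung / 4)` and the B1 survival rule to compute a reachable-control ratio. The `AxNode` shape, its `interactive` flag, the lowercase role comparison, and the function names are hypothetical simplifications, not Stagehand's actual types. Scoring 0 when no rung passes is also an assumption.

```typescript
// Axis A, as stated in the rubric: A = 30 * (1 - minPassingRung / 4).
// Assumption: a task that passes on no rung scores 0 for this axis.
function accessResistanceScore(minPassingRung: number | null): number {
  if (minPassingRung === null) return 0;
  if (!Number.isInteger(minPassingRung) || minPassingRung < 0 || minPassingRung > 4) {
    throw new RangeError(`rung must be an integer in 0..4, got ${minPassingRung}`);
  }
  return 30 * (1 - minPassingRung / 4);
}

// Hypothetical, simplified accessibility node used only for this illustration.
interface AxNode {
  role: string;
  name?: string;
  childIds?: string[];
  interactive: boolean; // assumed flag marking the node as a control
}

// Structural roles named in the rubric's survival-rule comment.
const STRUCTURAL_ROLES = new Set(["generic", "none", "inlinetextbox"]);

function isStructural(role: string): boolean {
  return STRUCTURAL_ROLES.has(role.toLowerCase());
}

// Same three-way test as the rubric's snippet: name, children, or a real role.
function survivesPrune(node: AxNode): boolean {
  const hasName = !!(node.name && node.name.trim());
  const hasChildren = !!(node.childIds && node.childIds.length);
  return hasName || hasChildren || !isStructural(node.role);
}

// B1 signal: interactive controls that survive the prune / all interactive controls.
function reachableRatio(nodes: AxNode[]): number {
  const controls = nodes.filter((n) => n.interactive);
  if (controls.length === 0) return 1; // assumption: no controls, nothing unreachable
  return controls.filter(survivesPrune).length / controls.length;
}

const sampleNodes: AxNode[] = [
  { role: "button", name: "Sign up", interactive: true },    // named: survives
  { role: "generic", name: "  ", interactive: true },        // blank name, no children: pruned
  { role: "generic", childIds: ["42"], interactive: true },  // has children: survives
  { role: "generic", interactive: false },                   // not a control: ignored
];

console.log(accessResistanceScore(2)); // 15
console.log(reachableRatio(sampleNodes)); // 2 of 3 controls survive
```

The ratio is left as a raw fraction because the rubric does not state how it maps onto the 25 B1 points.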