From 9c30e8053a91fcc61244aee681736b2d36cf75f8 Mon Sep 17 00:00:00 2001 From: shubh24 Date: Fri, 29 May 2026 18:54:43 -0700 Subject: [PATCH 1/5] feat(browsability): score how usable a site is for a browser agent MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds the `browsability` skill — an operational rubric for how well an AI *browser* agent can drive a website's UI (the sibling of agent-experience, which covers docs/SDK onboarding DX). Scores 0–100 across: A Access Resistance — lowest assistance rung (stealth/proxy/captcha ladder) a task needs to complete B1 Reachability — % of controls that survive the accessibility-tree prune B3 Structural traps — cross-origin iframes, shadow DOM, DOM depth/size C Agent tax — agent steps OVER the human baseline (delta, not absolute) D Recoverability — self-heal / site errors / blocking overlays / step ceiling The Drivability slice (B1+B3) runs deterministically from one page load via scripts/friction.ts (no model). The full score adds an agent run across the assistance ladder via scripts/score.ts. Rubric grounded in what the open-source Stagehand framework treats as hard; uses only public Browserbase session settings. Co-Authored-By: Claude Opus 4.8 (1M context) --- README.md | 1 + skills/browsability/.gitignore | 1 + skills/browsability/LICENSE.txt | 21 ++++ skills/browsability/SKILL.md | 103 +++++++++++++++ skills/browsability/references/rubric.md | 153 +++++++++++++++++++++++ skills/browsability/scripts/friction.ts | 93 ++++++++++++++ skills/browsability/scripts/score.ts | 97 ++++++++++++++ 7 files changed, 469 insertions(+) create mode 100644 skills/browsability/.gitignore create mode 100644 skills/browsability/LICENSE.txt create mode 100644 skills/browsability/SKILL.md create mode 100644 skills/browsability/references/rubric.md create mode 100644 skills/browsability/scripts/friction.ts create mode 100644 skills/browsability/scripts/score.ts diff --git a/README.md b/README.md index 03d44f26..c9631b33 100644 --- a/README.md +++ b/README.md @@ -19,6 +19,7 @@ This plugin includes the following skills (see `skills/` for details): | [fetch](skills/fetch/SKILL.md) | Fetch HTML or JSON from static pages without a browser session — inspect status codes, headers, follow redirects | | [search](skills/search/SKILL.md) | Search the web and return structured results (titles, URLs, metadata) without a browser session | | [ui-test](skills/ui-test/SKILL.md) | AI-powered adversarial UI testing — analyzes git diffs to test changes, or explores the full app to find bugs | +| [browsability](skills/browsability/SKILL.md) | Score how usable a website is by an AI browser agent — Access Resistance (how much stealth/proxy/captcha help is needed), Drivability (do controls survive the accessibility-tree prune, iframe/shadow-DOM traps), and Agent tax (steps over the human baseline); emits a graded report with concrete fixes | ## Installation diff --git a/skills/browsability/.gitignore b/skills/browsability/.gitignore new file mode 100644 index 00000000..4d5a7f03 --- /dev/null +++ b/skills/browsability/.gitignore @@ -0,0 +1 @@ +browsability-out/ diff --git a/skills/browsability/LICENSE.txt b/skills/browsability/LICENSE.txt new file mode 100644 index 00000000..f2f43974 --- /dev/null +++ b/skills/browsability/LICENSE.txt @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2026 Browserbase, Inc. + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/skills/browsability/SKILL.md b/skills/browsability/SKILL.md new file mode 100644 index 00000000..ece2a2a0 --- /dev/null +++ b/skills/browsability/SKILL.md @@ -0,0 +1,103 @@ +--- +name: browsability +description: "Score how usable a website is BY AN AI BROWSER AGENT — its Browsability Index. Measures how little infrastructure assistance an agent needs to operate the site (Access Resistance), whether the agent can perceive and drive the live DOM (Drivability — does each control survive the accessibility-tree prune, are there iframe/shadow-DOM/deep-DOM traps), and how many more steps the agent needs than a human (Agent tax). Grounded in what the open-source Stagehand framework treats as hard. Use when the user asks how browsable / agent-friendly / agent-ready a website or a specific web flow (signup, checkout, search) is for a BROWSER agent, to compare sites on browser-agent usability, or to produce a browsability report card with concrete fixes. Triggers: 'how browsable is ', 'is this site agent-friendly for a browser agent', 'grade this checkout/signup flow for agents', 'browser-agent friendliness', 'DOM friction', 'browsability of '. NOT for SEO/AEO or content discoverability (a different layer), and NOT for docs/SDK onboarding DX (use the agent-experience skill for that)." +license: MIT +metadata: + author: browserbase + version: "0.1.0" +allowed-tools: Bash Read Write Edit Glob Grep Agent +compatibility: "Requires `bun` and the browse CLI (`npm install -g @browserbasehq/browse-cli`). Remote mode needs BROWSERBASE_API_KEY. The full agent-ladder pass additionally needs a model-driven reference agent (use the `browser` skill as the driver)." +--- + +# Browsability — how usable is a site for a browser agent? + +Score how well an AI **browser** agent can *operate* a website. The opinion: *browsability is how +little help an agent needs to succeed, and how much harder the site is for an agent than for a human.* +This is the operability layer — not discoverability, so ignore `llms.txt`, sitemaps, SEO/AEO. + +**Before scoring, read `references/rubric.md`** — the full code-grounded rubric (axes, signals, the +assistance ladder, the agent-vs-human delta, and remediation knowledge). The summary below is only the +operating procedure. + +## The score (0–100) + +| Axis | Pts | Source | +|---|---|---| +| **A · Access Resistance** | 30 | lowest assistance rung that completes the task (agent ladder) | +| **B1 · Reachability** | 25 | % of controls that survive the accessibility-tree prune (deterministic probe) | +| **B3 · Structural traps** | 15 | cross-origin iframes, shadow DOM, DOM depth/size (deterministic probe) | +| **C · Agent tax** | 20 | agent steps OVER the human baseline (the delta — not absolute click count) | +| **D · Recoverability** | 10 | self-heal / site errors / blocking overlays / step ceiling (agent run) | + +Score only counts for tasks a verifier confirms actually completed. **Agent-native affordance** (an +API / deep-link / structured action path) is a *ceiling badge*, not a scored component — flag it, do +not add it to the number; this rubric measures operability of the UI. + +## Workflow + +### Step 1 — Drivability probe (always; deterministic, no model) + +Run the probe on the target URL (a page, or the entry point of a flow): + +```bash +cd skills/browsability +bun scripts/friction.ts --out browsability-out +``` + +This loads the page through the browse CLI and reports **B1 reachability** + **B3 structural traps** +straight from the live DOM (40 of 100 points). It needs no model and finishes in seconds. Use remote +mode (`browse env remote`, needs `BROWSERBASE_API_KEY`) for bot-protected sites; local is fine +otherwise. This alone is a useful friction profile and is the right answer for a quick assessment. + +### Step 2 — Agent ladder + tasks (for the full score) + +Derive a small set of **canonical tasks** for the site (informational / navigational / transactional — +e.g. "find the price of the paid plan", "create an account", "submit the contact form"). For each +task, run a reference browser agent across the **Access Resistance ladder** and record results: + +- **rung 0** vanilla headless — captcha-solving **off** (`solveCaptchas:false`), no proxy, no fingerprint +- **rung 1** default assist — captcha-solving on +- **rung 2** proxy + realistic fingerprint +- **rung 3** advanced stealth + persisted context +- **rung 4** maximum assistance + +Stop climbing once a task succeeds; the lowest passing rung is its Access Resistance. Drive the agent +with the `browser` skill (the browse CLI) or Stagehand, and judge each run's `success` with a verifier +— do not trust the agent's self-report. Capture **real step counts** and a **`humanBaselineSteps`** +estimate per task so Agent tax is computed as the delta. Record into `tasks.json`: + +```json +{ "url": "https://example.com", + "tasks": [ + { "name": "Create an account", "type": "transactional", "humanBaselineSteps": 4, + "runs": [ {"rung":0,"success":false,"steps":10,"model":"","note":"signup CTA unlabeled"}, + {"rung":2,"success":true,"steps":7,"model":"","note":""} ] } ] } +``` + +If no model-driven agent is available, act as the reference agent using the `browser` skill: execute +each task's browse steps, count the steps, and write the runs into `tasks.json` honestly (mark +single-model). This produces a real, if single-model, result. + +### Step 3 — Composite score + report + +```bash +bun scripts/score.ts --friction browsability-out/friction.json --tasks tasks.json --out browsability-out +``` + +Writes `browsability-out/browsability.json` with the 0–100 score, grade, and per-axis breakdown. When +`tasks.json` is absent it reports a **Drivability-only** score (B1 + B3, 40 max) and marks A/C/D +pending — still honest, just partial. + +### Step 4 — Report to the user + +Present a **profile, not just a number**: the grade, the per-axis breakdown, the lowest passing rung, +and — most usefully — a **ranked remediation list** drawn from the rubric's remediation table (e.g. +"signup CTA has no accessible name → add `aria-label`; estimated lift +X"). Cite the concrete signal +each finding came from. + +## Notes & gotchas + +- `solveCaptchas` defaults to **on** in Browserbase — an honest rung-0 must explicitly disable it, or rungs 0 and 1 collapse and captcha-walled sites get over-credited. +- The deterministic probe approximates "closed shadow DOM" via custom-element count with zero open shadow hosts; treat it as a hint and confirm during the agent run. +- Keep the human baseline honest — Agent tax is the *delta*, so a genuinely long workflow (10 steps for humans too) must not be penalized as un-browsable. +- The scripts call `browse stop` on exit; if a daemon hangs, `pkill -f "browse.*daemon"`. diff --git a/skills/browsability/references/rubric.md b/skills/browsability/references/rubric.md new file mode 100644 index 00000000..1fb76d85 --- /dev/null +++ b/skills/browsability/references/rubric.md @@ -0,0 +1,153 @@ +# The Browsability Rubric + +A code-grounded, operational definition of how usable a website is **by an AI browser agent** — +and how to score it. Grounded in what the open-source [Stagehand](https://github.com/browserbase/stagehand) +browser-automation framework actually treats as hard, plus the public Browserbase session settings. + +## The opinion, in one line + +**Browsability is how little help an agent needs to succeed** — and, more precisely, **how much +harder the site is for an agent than for a motivated human.** + +It is *not* discoverability. Forget `llms.txt`, sitemaps, token efficiency, and SEO/AEO — those +measure whether content can be *found and cited*. Browsability measures whether an agent can +*operate* the live site: perceive the controls, drive the DOM, and complete a real task. + +It is measured **operationally** — by running real agent tasks and reading harness + session +telemetry (which controls survived the accessibility tree, how many steps a flow took, which errors +fired, how much stealth/proxy assistance was needed) — not by linting static HTML. + +> **Scope note:** this rubric covers *UI operability* — driving a website in a browser. It is the +> sibling of, not a substitute for, auditing docs/SDK onboarding experience. + +## The key reframe: score the agent-vs-human delta, not absolute effort + +A 10-click checkout that also takes a human 10 clicks is *perfectly browsable* — that's just the +workflow. A 3-click task that takes the agent 10 because controls are unlabeled is *not browsable* — +those extra 7 clicks are the **agent tax**. + +Scoring the **delta over the human baseline** mathematically subtracts out UX/design length (which +costs humans and agents equally) and isolates exactly the agent-specific penalty. This resolves the +"is click-count a UX problem or a browsability problem?" question: only the *excess* over the human +path counts. + +Stagehand surfaces a piece of this directly — a native `` where possible | +| Cross-origin iframes in the flow | same-origin embed, or a direct route | +| Closed shadow DOM | open shadow roots, or expose semantic fallbacks | +| Deep / very large DOM | flatten nesting, paginate, reduce node count | +| High Access Resistance | reduce hostile bot-walls on agent-relevant flows | +| High agent tax | collapse the funnel; remove needless modal steps | +| (ceiling) UI-only | offer an API / deep-link / structured action path for agents | diff --git a/skills/browsability/scripts/friction.ts b/skills/browsability/scripts/friction.ts new file mode 100644 index 00000000..982e7251 --- /dev/null +++ b/skills/browsability/scripts/friction.ts @@ -0,0 +1,93 @@ +#!/usr/bin/env bun +/** + * Browsability — Drivability probe (deterministic, no model, no task run). + * + * bun scripts/friction.ts [--out ] + * + * Measures the parts of the rubric readable from a single page load, grounded in + * what browser-agent frameworks (e.g. open-source Stagehand) treat as hard + * (see references/rubric.md): + * B1 reachability — fraction of interactive controls that SURVIVE the accessibility-tree + * prune (kept iff: accessible name OR named children OR a non-structural role). + * C* tax proxy — native vs custom controls; a custom dropdown costs two steps (expand, + * then select) where a native `); +console.log(` B3 structural ${String(b3).padStart(2)}/15 depth ${p.maxDepth}, ${p.nodes} nodes, ${p.iframes} iframe(s) [${p.crossOriginIframes} x-origin], ${p.openShadowHosts} shadow host(s), ${p.customElements} custom el`); +console.log(` ${"─".repeat(52)}`); +console.log(` Drivability partial ${partial}/60 (Axis A ladder + real agent tax + recoverability need the agent run — scripts/score.ts)`); +if (traps.length) { console.log(`\n Traps detected:`); for (const t of traps) console.log(` • ${t}`); } +console.log(`\n → ${outDir}/friction.json\n`); diff --git a/skills/browsability/scripts/score.ts b/skills/browsability/scripts/score.ts new file mode 100644 index 00000000..a0c93172 --- /dev/null +++ b/skills/browsability/scripts/score.ts @@ -0,0 +1,97 @@ +#!/usr/bin/env bun +/** + * Browsability — composite scorer. Combines the deterministic Drivability probe + * (friction.json) with agent-run results (tasks.json) into the full rubric score. + * + * bun scripts/score.ts --friction /friction.json [--tasks tasks.json] [--out ] + * + * Rubric (see references/rubric.md), 0-100: + * A Access Resistance 30 — inverse of the lowest assistance rung that passes + * B1 Reachability 25 — from friction probe + * B3 Structural traps 15 — from friction probe + * C Agent tax 20 — agent steps OVER the human baseline (delta, not absolute) + * D Recoverability 10 — self-heal / site-errors / overlays / step-ceiling + * + * tasks.json shape: + * { "url": "...", + * "tasks": [ { "name": "...", "type": "...", "humanBaselineSteps": 3, + * "runs": [ {"rung":0,"success":false,"steps":12,"model":"...","note":"hCaptcha"}, + * {"rung":2,"success":true,"steps":5,"model":"...","note":""} ] } ] } + * rung index: 0=vanilla 1=default-assist 2=proxy+fingerprint 3=advanced-stealth 4=verified + */ + +import { readFileSync, writeFileSync, mkdirSync, existsSync } from "node:fs"; + +const args = process.argv.slice(2); +const flag = (n: string, d?: string) => { const i = args.indexOf(n); return i >= 0 && args[i + 1] ? args[i + 1] : d; }; +const frictionPath = flag("--friction", "browsability-out/friction.json")!; +const tasksPath = flag("--tasks"); +const outDir = flag("--out", "browsability-out")!; +const MAX_RUNG = 4; + +if (!existsSync(frictionPath)) { console.error(`no friction file at ${frictionPath} — run scripts/friction.ts first`); process.exit(1); } +const friction = JSON.parse(readFileSync(frictionPath, "utf8")); +const url = friction.url; +const b1 = friction.scores.b1; // /25 +const b3 = friction.scores.b3; // /15 +const taxProxy = friction.scores.taxProxy; // /20 fallback + +type Run = { rung: number; success: boolean; steps?: number; model?: string; note?: string }; +type Task = { name: string; type?: string; humanBaselineSteps?: number; runs?: Run[] }; + +let a = 0, c = taxProxy, d = 10, tasks: Task[] = [], haveRuns = false; +const med = (xs: number[]) => { const s = [...xs].sort((x, y) => x - y); const m = s.length >> 1; return s.length ? (s.length % 2 ? s[m] : Math.round((s[m - 1] + s[m]) / 2)) : 0; }; + +if (tasksPath && existsSync(tasksPath)) { + tasks = (JSON.parse(readFileSync(tasksPath, "utf8")).tasks ?? []) as Task[]; + const withRuns = tasks.filter((t) => t.runs && t.runs.length); + haveRuns = withRuns.length > 0; + if (haveRuns) { + // A — worst-case minimum passing rung across tasks + const rungs = withRuns.map((t) => { + const ok = (t.runs || []).filter((r) => r.success).map((r) => r.rung); + return ok.length ? Math.min(...ok) : MAX_RUNG + 1; // unsolved even at top + }); + const siteRung = Math.max(...rungs); + a = Math.round(30 * (1 - Math.min(siteRung, MAX_RUNG) / MAX_RUNG)); + + // C — agent tax = steps over human baseline on the cheapest passing run + const taxes: number[] = []; + for (const t of withRuns) { + const passing = (t.runs || []).filter((r) => r.success && typeof r.steps === "number"); + if (passing.length && typeof t.humanBaselineSteps === "number") { + const best = passing.reduce((a, b) => (a.rung <= b.rung ? a : b)); + taxes.push(Math.max(0, (best.steps as number) - t.humanBaselineSteps)); + } + } + if (taxes.length) c = Math.max(0, Math.round(20 * (1 - Math.min(med(taxes), 8) / 8))); + + // D — recoverability from run notes + const allRuns = withRuns.flatMap((t) => t.runs || []); + const deadEnds = allRuns.filter((r) => /captcha|modal|shadow|iframe|overlay|consent|cookie|timeout|stuck|self-?heal|blocked/i.test(r.note || "")).length; + d = Math.max(0, 10 - deadEnds * 2); + } +} + +const total = Math.round(a + b1 + b3 + c + d); +const grade = (t: number) => t >= 90 ? "A" : t >= 80 ? "B+" : t >= 70 ? "B" : t >= 60 ? "C+" : t >= 50 ? "C" : t >= 35 ? "D" : "F"; +const rungName = ["L0 vanilla", "L1 default-assist", "L2 proxy+fingerprint", "L3 advanced-stealth", "L4 verified"]; + +mkdirSync(outDir, { recursive: true }); +const report = { + url, scoredAt: new Date().toISOString(), total, grade: grade(total), + axes: { accessResistance: a, reachability_B1: b1, structural_B3: b3, agentTax_C: c, recoverability_D: d }, + driveabilityOnly: !haveRuns, tasks, +}; +writeFileSync(`${outDir}/browsability.json`, JSON.stringify(report, null, 2)); + +console.log(`\n Browsability — ${url}`); +console.log(` ${"─".repeat(50)}`); +console.log(` SCORE ${total}/100 GRADE ${grade(total)}${haveRuns ? "" : " (Drivability only — agent ladder not run)"}`); +console.log(` A Access Resistance ${String(a).padStart(2)}/30${haveRuns ? "" : " PENDING"}`); +console.log(` B1 Reachability ${String(b1).padStart(2)}/25`); +console.log(` B3 Structural traps ${String(b3).padStart(2)}/15`); +console.log(` C Agent tax ${String(c).padStart(2)}/20${haveRuns ? "" : " (proxy from probe)"}`); +console.log(` D Recoverability ${String(d).padStart(2)}/10${haveRuns ? "" : " PENDING"}`); +console.log(` ${"─".repeat(50)}`); +console.log(` → ${outDir}/browsability.json\n`); From 527382fe22aa03dac92f39c01c1348c9705ebb6b Mon Sep 17 00:00:00 2001 From: shubh24 Date: Mon, 1 Jun 2026 12:40:00 -0700 Subject: [PATCH 2/5] =?UTF-8?q?refactor(browsability):=20drop=20scoring/sc?= =?UTF-8?q?ripts=20=E2=80=94=20guidance=20+=20helps-hurts=20table?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Simplify per review: remove the numeric 0–100 score, weighted axes, and the friction.ts/score.ts probes. The skill is now pure guidance — the agent looks at the site with the `browser` skill, uses the rubric as a flexible checklist, and reports what helps vs what hurts (with fixes), letting the agent judge what matters for each site rather than applying hard-coded formulas. Co-Authored-By: Claude Opus 4.8 (1M context) --- README.md | 2 +- skills/browsability/.gitignore | 1 - skills/browsability/SKILL.md | 122 +++++----------- skills/browsability/references/rubric.md | 177 +++++++---------------- skills/browsability/scripts/friction.ts | 93 ------------ skills/browsability/scripts/score.ts | 97 ------------- 6 files changed, 95 insertions(+), 397 deletions(-) delete mode 100644 skills/browsability/.gitignore delete mode 100644 skills/browsability/scripts/friction.ts delete mode 100644 skills/browsability/scripts/score.ts diff --git a/README.md b/README.md index c9631b33..e3673fdd 100644 --- a/README.md +++ b/README.md @@ -19,7 +19,7 @@ This plugin includes the following skills (see `skills/` for details): | [fetch](skills/fetch/SKILL.md) | Fetch HTML or JSON from static pages without a browser session — inspect status codes, headers, follow redirects | | [search](skills/search/SKILL.md) | Search the web and return structured results (titles, URLs, metadata) without a browser session | | [ui-test](skills/ui-test/SKILL.md) | AI-powered adversarial UI testing — analyzes git diffs to test changes, or explores the full app to find bugs | -| [browsability](skills/browsability/SKILL.md) | Score how usable a website is by an AI browser agent — Access Resistance (how much stealth/proxy/captcha help is needed), Drivability (do controls survive the accessibility-tree prune, iframe/shadow-DOM traps), and Agent tax (steps over the human baseline); emits a graded report with concrete fixes | +| [browsability](skills/browsability/SKILL.md) | Assess how usable a website is by an AI browser agent — how much stealth/proxy/captcha help it needs to get in, whether controls are labeled/reachable, iframe/shadow-DOM traps, and extra steps vs a human; reports what helps and what hurts, with concrete fixes (no numeric score) | ## Installation diff --git a/skills/browsability/.gitignore b/skills/browsability/.gitignore deleted file mode 100644 index 4d5a7f03..00000000 --- a/skills/browsability/.gitignore +++ /dev/null @@ -1 +0,0 @@ -browsability-out/ diff --git a/skills/browsability/SKILL.md b/skills/browsability/SKILL.md index ece2a2a0..7db2e2a3 100644 --- a/skills/browsability/SKILL.md +++ b/skills/browsability/SKILL.md @@ -1,103 +1,61 @@ --- name: browsability -description: "Score how usable a website is BY AN AI BROWSER AGENT — its Browsability Index. Measures how little infrastructure assistance an agent needs to operate the site (Access Resistance), whether the agent can perceive and drive the live DOM (Drivability — does each control survive the accessibility-tree prune, are there iframe/shadow-DOM/deep-DOM traps), and how many more steps the agent needs than a human (Agent tax). Grounded in what the open-source Stagehand framework treats as hard. Use when the user asks how browsable / agent-friendly / agent-ready a website or a specific web flow (signup, checkout, search) is for a BROWSER agent, to compare sites on browser-agent usability, or to produce a browsability report card with concrete fixes. Triggers: 'how browsable is ', 'is this site agent-friendly for a browser agent', 'grade this checkout/signup flow for agents', 'browser-agent friendliness', 'DOM friction', 'browsability of '. NOT for SEO/AEO or content discoverability (a different layer), and NOT for docs/SDK onboarding DX (use the agent-experience skill for that)." +description: "Assess how usable a website is BY AN AI BROWSER AGENT — its browsability. Look at how little infrastructure help the agent needs to get in (stealth/proxy/captcha), whether it can perceive and drive the live DOM (are controls labeled and reachable, are there iframe/shadow-DOM/deep-DOM traps), and how many extra steps it takes versus a human. Report what helps and what hurts, with concrete fixes — no numeric score. Use when the user asks how browsable / agent-friendly / agent-ready a website or a specific web flow (signup, checkout, search) is for a BROWSER agent, to compare sites on browser-agent usability, or to get a browsability report with fixes. Triggers: 'how browsable is ', 'is this site agent-friendly for a browser agent', 'check this checkout/signup flow for agents', 'browser-agent friendliness', 'DOM friction', 'browsability of '. NOT for SEO/AEO or content discoverability (a different layer), and NOT for docs/SDK onboarding DX (use the agent-experience skill for that)." license: MIT metadata: author: browserbase - version: "0.1.0" -allowed-tools: Bash Read Write Edit Glob Grep Agent -compatibility: "Requires `bun` and the browse CLI (`npm install -g @browserbasehq/browse-cli`). Remote mode needs BROWSERBASE_API_KEY. The full agent-ladder pass additionally needs a model-driven reference agent (use the `browser` skill as the driver)." + version: "0.2.0" +allowed-tools: Read Bash Glob Grep Agent +compatibility: "Uses the browse CLI (`npm install -g @browserbasehq/browse-cli`) via the `browser` skill to look at and drive the site. Remote mode needs BROWSERBASE_API_KEY." --- # Browsability — how usable is a site for a browser agent? -Score how well an AI **browser** agent can *operate* a website. The opinion: *browsability is how -little help an agent needs to succeed, and how much harder the site is for an agent than for a human.* -This is the operability layer — not discoverability, so ignore `llms.txt`, sitemaps, SEO/AEO. +Judge how well an AI **browser** agent can *operate* a website. The idea is simple: -**Before scoring, read `references/rubric.md`** — the full code-grounded rubric (axes, signals, the -assistance ladder, the agent-vs-human delta, and remediation knowledge). The summary below is only the -operating procedure. +> **Browsability is how little help an agent needs to succeed — and how much harder the site is for +> an agent than for a person.** A 10-click checkout that takes a human 10 clicks too is fine; a +> 3-click task that takes the agent 10 because the buttons are unlabeled is not — those extra clicks +> are the agent's problem, not the workflow's. -## The score (0–100) +This is the *operability* layer — driving the live UI. It is **not** discoverability, so ignore +`llms.txt`, sitemaps, and SEO/AEO. It is also distinct from docs/SDK onboarding (that's the +`agent-experience` skill). -| Axis | Pts | Source | -|---|---|---| -| **A · Access Resistance** | 30 | lowest assistance rung that completes the task (agent ladder) | -| **B1 · Reachability** | 25 | % of controls that survive the accessibility-tree prune (deterministic probe) | -| **B3 · Structural traps** | 15 | cross-origin iframes, shadow DOM, DOM depth/size (deterministic probe) | -| **C · Agent tax** | 20 | agent steps OVER the human baseline (the delta — not absolute click count) | -| **D · Recoverability** | 10 | self-heal / site errors / blocking overlays / step ceiling (agent run) | +There is **no scoring formula here.** Look at the site with your own eyes (and the agent's), use the +checklist in `references/rubric.md` as a guide for what tends to matter, and decide what actually +matters for *this* site. Then report what helps and what hurts. -Score only counts for tasks a verifier confirms actually completed. **Agent-native affordance** (an -API / deep-link / structured action path) is a *ceiling badge*, not a scored component — flag it, do -not add it to the number; this rubric measures operability of the UI. +## How to assess -## Workflow +1. **Actually try to use the site** with the `browser` skill. Open it, take a `browse snapshot` + (the accessibility tree — this is what an agent "sees"), and attempt a real task the site is for: + find the pricing, create an account, add to cart, submit the contact form. Notice where it's easy + and where you get stuck. -### Step 1 — Drivability probe (always; deterministic, no model) +2. **Notice how much help it took to get in.** If a vanilla session sails through, great — that's + maximally browsable. If you needed stealth, a proxy, or captcha-solving just to load or act, that + counts against the site. (`references/rubric.md` describes this assistance ladder.) Remember + `solveCaptchas` is **on by default** — if you want to know whether a site is hostile at the front + door, try it with captcha-solving off first. -Run the probe on the target URL (a page, or the entry point of a flow): +3. **Watch for the things that trip up browser agents** as you go — read `references/rubric.md` for + the full checklist, but in short: unlabeled / `
`-as-button controls, custom dropdowns, + iframes (especially cross-origin), shadow DOM, very deep or huge DOMs, blocking cookie/consent + walls, and flows that take the agent more steps than a person. -```bash -cd skills/browsability -bun scripts/friction.ts --out browsability-out -``` +Use judgment over completeness — surface the few things that genuinely make or break this site for an +agent, not an exhaustive audit. -This loads the page through the browse CLI and reports **B1 reachability** + **B3 structural traps** -straight from the live DOM (40 of 100 points). It needs no model and finishes in seconds. Use remote -mode (`browse env remote`, needs `BROWSERBASE_API_KEY`) for bot-protected sites; local is fine -otherwise. This alone is a useful friction profile and is the right answer for a quick assessment. +## How to report -### Step 2 — Agent ladder + tasks (for the full score) +A simple **Helps / Hurts** table. Each "Hurts" row names the concrete fix. Cite what you observed. -Derive a small set of **canonical tasks** for the site (informational / navigational / transactional — -e.g. "find the price of the paid plan", "create an account", "submit the contact form"). For each -task, run a reference browser agent across the **Access Resistance ladder** and record results: +| ✅ Helps browsability | ⚠️ Hurts browsability | +|---|---| +| Native `