From 9c30e8053a91fcc61244aee681736b2d36cf75f8 Mon Sep 17 00:00:00 2001
From: shubh24 <shubhankar24@gmail.com>
Date: Fri, 29 May 2026 18:54:43 -0700
Subject: [PATCH 1/5] feat(browsability): score how usable a site is for a
 browser agent
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds the `browsability` skill — an operational rubric for how well an AI
*browser* agent can drive a website's UI (the sibling of agent-experience,
which covers docs/SDK onboarding DX).

Scores 0–100 across:
  A  Access Resistance — lowest assistance rung (stealth/proxy/captcha ladder)
                          a task needs to complete
  B1 Reachability      — % of controls that survive the accessibility-tree prune
  B3 Structural traps  — cross-origin iframes, shadow DOM, DOM depth/size
  C  Agent tax         — agent steps OVER the human baseline (delta, not absolute)
  D  Recoverability    — self-heal / site errors / blocking overlays / step ceiling

The Drivability slice (B1+B3) runs deterministically from one page load via
scripts/friction.ts (no model). The full score adds an agent run across the
assistance ladder via scripts/score.ts. Rubric grounded in what the open-source
Stagehand framework treats as hard; uses only public Browserbase session settings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 README.md                                |   1 +
 skills/browsability/.gitignore           |   1 +
 skills/browsability/LICENSE.txt          |  21 ++++
 skills/browsability/SKILL.md             | 103 +++++++++++++++
 skills/browsability/references/rubric.md | 153 +++++++++++++++++++++++
 skills/browsability/scripts/friction.ts  |  93 ++++++++++++++
 skills/browsability/scripts/score.ts     |  97 ++++++++++++++
 7 files changed, 469 insertions(+)
 create mode 100644 skills/browsability/.gitignore
 create mode 100644 skills/browsability/LICENSE.txt
 create mode 100644 skills/browsability/SKILL.md
 create mode 100644 skills/browsability/references/rubric.md
 create mode 100644 skills/browsability/scripts/friction.ts
 create mode 100644 skills/browsability/scripts/score.ts

diff --git a/README.md b/README.md
index 03d44f26..c9631b33 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,7 @@ This plugin includes the following skills (see `skills/` for details):
 | [fetch](skills/fetch/SKILL.md) | Fetch HTML or JSON from static pages without a browser session — inspect status codes, headers, follow redirects |
 | [search](skills/search/SKILL.md) | Search the web and return structured results (titles, URLs, metadata) without a browser session |
 | [ui-test](skills/ui-test/SKILL.md) | AI-powered adversarial UI testing — analyzes git diffs to test changes, or explores the full app to find bugs |
+| [browsability](skills/browsability/SKILL.md) | Score how usable a website is by an AI browser agent — Access Resistance (how much stealth/proxy/captcha help is needed), Drivability (do controls survive the accessibility-tree prune, iframe/shadow-DOM traps), and Agent tax (steps over the human baseline); emits a graded report with concrete fixes |
 
 ## Installation
 
diff --git a/skills/browsability/.gitignore b/skills/browsability/.gitignore
new file mode 100644
index 00000000..4d5a7f03
--- /dev/null
+++ b/skills/browsability/.gitignore
@@ -0,0 +1 @@
+browsability-out/
diff --git a/skills/browsability/LICENSE.txt b/skills/browsability/LICENSE.txt
new file mode 100644
index 00000000..f2f43974
--- /dev/null
+++ b/skills/browsability/LICENSE.txt
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Browserbase, Inc.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/skills/browsability/SKILL.md b/skills/browsability/SKILL.md
new file mode 100644
index 00000000..ece2a2a0
--- /dev/null
+++ b/skills/browsability/SKILL.md
@@ -0,0 +1,103 @@
+---
+name: browsability
+description: "Score how usable a website is BY AN AI BROWSER AGENT — its Browsability Index. Measures how little infrastructure assistance an agent needs to operate the site (Access Resistance), whether the agent can perceive and drive the live DOM (Drivability — does each control survive the accessibility-tree prune, are there iframe/shadow-DOM/deep-DOM traps), and how many more steps the agent needs than a human (Agent tax). Grounded in what the open-source Stagehand framework treats as hard. Use when the user asks how browsable / agent-friendly / agent-ready a website or a specific web flow (signup, checkout, search) is for a BROWSER agent, to compare sites on browser-agent usability, or to produce a browsability report card with concrete fixes. Triggers: 'how browsable is <site>', 'is this site agent-friendly for a browser agent', 'grade this checkout/signup flow for agents', 'browser-agent friendliness', 'DOM friction', 'browsability of <url>'. NOT for SEO/AEO or content discoverability (a different layer), and NOT for docs/SDK onboarding DX (use the agent-experience skill for that)."
+license: MIT
+metadata:
+  author: browserbase
+  version: "0.1.0"
+allowed-tools: Bash Read Write Edit Glob Grep Agent
+compatibility: "Requires `bun` and the browse CLI (`npm install -g @browserbasehq/browse-cli`). Remote mode needs BROWSERBASE_API_KEY. The full agent-ladder pass additionally needs a model-driven reference agent (use the `browser` skill as the driver)."
+---
+
+# Browsability — how usable is a site for a browser agent?
+
+Score how well an AI **browser** agent can *operate* a website. The opinion: *browsability is how
+little help an agent needs to succeed, and how much harder the site is for an agent than for a human.*
+This is the operability layer — not discoverability, so ignore `llms.txt`, sitemaps, SEO/AEO.
+
+**Before scoring, read `references/rubric.md`** — the full code-grounded rubric (axes, signals, the
+assistance ladder, the agent-vs-human delta, and remediation knowledge). The summary below is only the
+operating procedure.
+
+## The score (0–100)
+
+| Axis | Pts | Source |
+|---|---|---|
+| **A · Access Resistance** | 30 | lowest assistance rung that completes the task (agent ladder) |
+| **B1 · Reachability** | 25 | % of controls that survive the accessibility-tree prune (deterministic probe) |
+| **B3 · Structural traps** | 15 | cross-origin iframes, shadow DOM, DOM depth/size (deterministic probe) |
+| **C · Agent tax** | 20 | agent steps OVER the human baseline (the delta — not absolute click count) |
+| **D · Recoverability** | 10 | self-heal / site errors / blocking overlays / step ceiling (agent run) |
+
+Score only counts for tasks a verifier confirms actually completed. **Agent-native affordance** (an
+API / deep-link / structured action path) is a *ceiling badge*, not a scored component — flag it, do
+not add it to the number; this rubric measures operability of the UI.
+
+## Workflow
+
+### Step 1 — Drivability probe (always; deterministic, no model)
+
+Run the probe on the target URL (a page, or the entry point of a flow):
+
+```bash
+cd skills/browsability
+bun scripts/friction.ts <url> --out browsability-out
+```
+
+This loads the page through the browse CLI and reports **B1 reachability** + **B3 structural traps**
+straight from the live DOM (40 of 100 points). It needs no model and finishes in seconds. Use remote
+mode (`browse env remote`, needs `BROWSERBASE_API_KEY`) for bot-protected sites; local is fine
+otherwise. This alone is a useful friction profile and is the right answer for a quick assessment.
+
+### Step 2 — Agent ladder + tasks (for the full score)
+
+Derive a small set of **canonical tasks** for the site (informational / navigational / transactional —
+e.g. "find the price of the paid plan", "create an account", "submit the contact form"). For each
+task, run a reference browser agent across the **Access Resistance ladder** and record results:
+
+- **rung 0** vanilla headless — captcha-solving **off** (`solveCaptchas:false`), no proxy, no fingerprint
+- **rung 1** default assist — captcha-solving on
+- **rung 2** proxy + realistic fingerprint
+- **rung 3** advanced stealth + persisted context
+- **rung 4** maximum assistance
+
+Stop climbing once a task succeeds; the lowest passing rung is its Access Resistance. Drive the agent
+with the `browser` skill (the browse CLI) or Stagehand, and judge each run's `success` with a verifier
+— do not trust the agent's self-report. Capture **real step counts** and a **`humanBaselineSteps`**
+estimate per task so Agent tax is computed as the delta. Record into `tasks.json`:
+
+```json
+{ "url": "https://example.com",
+  "tasks": [
+    { "name": "Create an account", "type": "transactional", "humanBaselineSteps": 4,
+      "runs": [ {"rung":0,"success":false,"steps":10,"model":"<model>","note":"signup CTA unlabeled"},
+                {"rung":2,"success":true,"steps":7,"model":"<model>","note":""} ] } ] }
+```
+
+If no model-driven agent is available, act as the reference agent using the `browser` skill: execute
+each task's browse steps, count the steps, and write the runs into `tasks.json` honestly (mark
+single-model). This produces a real, if single-model, result.
+
+### Step 3 — Composite score + report
+
+```bash
+bun scripts/score.ts --friction browsability-out/friction.json --tasks tasks.json --out browsability-out
+```
+
+Writes `browsability-out/browsability.json` with the 0–100 score, grade, and per-axis breakdown. When
+`tasks.json` is absent it reports a **Drivability-only** score (B1 + B3, 40 max) and marks A/C/D
+pending — still honest, just partial.
+
+### Step 4 — Report to the user
+
+Present a **profile, not just a number**: the grade, the per-axis breakdown, the lowest passing rung,
+and — most usefully — a **ranked remediation list** drawn from the rubric's remediation table (e.g.
+"signup CTA has no accessible name → add `aria-label`; estimated lift +X"). Cite the concrete signal
+each finding came from.
+
+## Notes & gotchas
+
+- `solveCaptchas` defaults to **on** in Browserbase — an honest rung-0 must explicitly disable it, or rungs 0 and 1 collapse and captcha-walled sites get over-credited.
+- The deterministic probe approximates "closed shadow DOM" via custom-element count with zero open shadow hosts; treat it as a hint and confirm during the agent run.
+- Keep the human baseline honest — Agent tax is the *delta*, so a genuinely long workflow (10 steps for humans too) must not be penalized as un-browsable.
+- The scripts call `browse stop` on exit; if a daemon hangs, `pkill -f "browse.*daemon"`.
diff --git a/skills/browsability/references/rubric.md b/skills/browsability/references/rubric.md
new file mode 100644
index 00000000..1fb76d85
--- /dev/null
+++ b/skills/browsability/references/rubric.md
@@ -0,0 +1,153 @@
+# The Browsability Rubric
+
+A code-grounded, operational definition of how usable a website is **by an AI browser agent** —
+and how to score it. Grounded in what the open-source [Stagehand](https://github.com/browserbase/stagehand)
+browser-automation framework actually treats as hard, plus the public Browserbase session settings.
+
+## The opinion, in one line
+
+**Browsability is how little help an agent needs to succeed** — and, more precisely, **how much
+harder the site is for an agent than for a motivated human.**
+
+It is *not* discoverability. Forget `llms.txt`, sitemaps, token efficiency, and SEO/AEO — those
+measure whether content can be *found and cited*. Browsability measures whether an agent can
+*operate* the live site: perceive the controls, drive the DOM, and complete a real task.
+
+It is measured **operationally** — by running real agent tasks and reading harness + session
+telemetry (which controls survived the accessibility tree, how many steps a flow took, which errors
+fired, how much stealth/proxy assistance was needed) — not by linting static HTML.
+
+> **Scope note:** this rubric covers *UI operability* — driving a website in a browser. It is the
+> sibling of, not a substitute for, auditing docs/SDK onboarding experience.
+
+## The key reframe: score the agent-vs-human delta, not absolute effort
+
+A 10-click checkout that also takes a human 10 clicks is *perfectly browsable* — that's just the
+workflow. A 3-click task that takes the agent 10 because controls are unlabeled is *not browsable* —
+those extra 7 clicks are the **agent tax**.
+
+Scoring the **delta over the human baseline** mathematically subtracts out UX/design length (which
+costs humans and agents equally) and isolates exactly the agent-specific penalty. This resolves the
+"is click-count a UX problem or a browsability problem?" question: only the *excess* over the human
+path counts.
+
+Stagehand surfaces a piece of this directly — a native `<select>` is a **one-step** action; a custom
+dropdown must be clicked open, re-snapshotted, then selected — a **two-step** action. That second
+step *is* agent tax: incidental inflation, not essential workflow.
+
+## The scored axes (+ one ceiling badge)
+
+| Axis | What it measures | Weight | In score? |
+|---|---|---|---|
+| **A · Access Resistance** | How much infrastructure assistance the agent needs to operate at all (the ladder) | 30 | ✅ |
+| **B1 · Reachability** | Can the agent perceive the controls (survive the accessibility-tree prune) | 25 | ✅ |
+| **B3 · Structural traps** | iframes, shadow DOM, DOM depth/size | 15 | ✅ |
+| **C · Agent tax** | Steps *above the human baseline* (incidental inflation only) | 20 | ✅ |
+| **D · Recoverability** | What happens when something breaks (self-heal, site errors, blocking overlays, step ceiling) | 10 | ✅ |
+| — Essential path length | Inherent workflow steps (humans pay too) | — | ❌ separate "Agent UX" lens |
+| — Agent-native affordance | An API / deep-link / structured action path exists | — | ⭐ ceiling badge, not scored |
+
+Agent-native affordance (offering a non-UI path so an agent need not drive the browser at all) is
+noted as the *ceiling*, not a scored component — this rubric deliberately measures **operability of
+the UI**, the realistic last mile for the large share of the web that is UI-only.
+
+Gate everything on a **success verdict** per task (a verifier, not the agent's self-report): friction
+and tax scores only count for tasks confirmed to have actually completed.
+
+---
+
+## Axis A — Access Resistance (the assistance ladder)
+
+Browserbase exposes public session settings, each mitigating a specific site-side obstacle. Re-run
+the *same task* climbing the ladder; the **lowest rung at which it succeeds** is the site's Access
+Resistance. Lower = more browsable.
+
+| Public setting | Mitigates |
+|---|---|
+| `solveCaptchas` | CAPTCHA challenges |
+| `proxies` | IP blocks, rate limits, geo-gating (residential / geo-targeted) |
+| `fingerprint` | headless-browser fingerprint detection |
+| `advancedStealth` | advanced anti-bot detection |
+| `context` (persist) | re-auth / re-consent walls; session continuity |
+
+The ladder to re-run a task across:
+
+- **L0 Vanilla headless** — captcha-solving **off**, no proxy, no fingerprint, fresh context. The agent looks like raw headless Chrome. *Passing here = maximally browsable.*
+- **L1 Default assist** — captcha-solving on, still no proxy/fingerprint.
+- **L2 Proxied + realistic fingerprint** — geo proxy + a realistic desktop fingerprint.
+- **L3 Advanced stealth + persisted context** — advanced anti-bot mitigation on; cookies persisted.
+- **L4 Maximum assistance** — top-tier anti-bot mitigation. *Needing this rung = barely browsable.*
+
+> **Gotcha:** `solveCaptchas` defaults to **on** in Browserbase, so an honest rung-0 baseline must
+> explicitly turn it off — otherwise L0 and L1 collapse and captcha-walled sites get over-credited.
+
+**Score:** `A = 30 * (1 - minPassingRung / 4)`.
+
+---
+
+## Axis B — Drivability (per-step technical difficulty)
+
+### B1 · Element reachability — can the agent even *see* the control?
+
+Stagehand builds an accessibility tree and **prunes any node that lacks all three of**: an accessible
+name, named children, or a non-structural role. An unlabeled `<div role="generic">` button is removed
+*before the model ever sees it.* The survival rule, from the open-source accessibility snapshot:
+
+```js
+// keep a node iff:
+const keep = !!(name && name.trim())        // it has an accessible name, OR
+          || !!(childIds && childIds.length) // it has named children, OR
+          || !isStructural(role);            // it has a real role (not generic/none/inlinetextbox)
+```
+
+- **Signal:** reachable-control ratio = interactive controls that survive the prune ÷ all interactive controls.
+- **Penalize:** icon-only buttons with no `aria-label`; `<div onclick>` controls; inputs with no associated `<label>`; closed-shadow custom components.
+- **Reward:** native semantic elements (`button`, `a[href]`, `input`, `select`) with text/labels — they always survive.
+
+### B3 · Structural traps — the hard walls
+
+| Trap | Why it hurts an agent |
+|---|---|
+| Closed shadow DOM | roots closed before instrumentation are effectively invisible |
+| Cross-origin iframes | short-lived, separately-managed frames that can drop out mid-operation |
+| Deep DOM (>256 levels) | serialization stack limits force shallower, slower retries |
+| Never-settling network | streaming / sub-second polling never reaches "network idle" → timeout every step |
+| Virtualized lists | no automatic "scroll until found"; an observe→scroll→observe loop is required |
+| Very large DOM | the serialized tree is truncated; elements past the cap become invisible |
+
+---
+
+## Axis C — Agent tax (steps over the human baseline)
+
+For each verifier-confirmed task: `agentTax = agentSteps - humanBaselineSteps`. Where a human baseline
+is unavailable, approximate the incidental inflation from the **two-step ratio** (custom controls the
+framework must expand-then-act on) plus needless modal steps. Only the *excess* counts; essential
+workflow length is reported separately as "Agent UX," not scored as browsability.
+
+---
+
+## Axis D — Recoverability — what happens when something breaks
+
+Stagehand's error taxonomy cleanly separates *site-caused* friction from agent-caused, and its
+self-heal path is the tell: on a stale selector (the DOM mutated under the agent) it re-snapshots and
+re-asks the model once. Frequent self-heal = an unstable, hostile DOM.
+
+- **Site-caused errors (penalize):** element-not-visible, selector-resolution failures, element-not-found, captcha timeouts, navigation timeouts.
+- **Blocking overlays (penalize):** cookie/consent walls, login walls, paywalls — not auto-dismissed; they eat steps or wall the flow entirely.
+- **Max-steps blowout:** agent loops have a default step budget; tasks that exhaust it score as failures.
+- **Signal:** self-heal count, site-caused-error count, overlay-encountered flag, whether the run hit the step ceiling.
+
+---
+
+## Remediation knowledge (turn findings into fixes)
+
+| Finding | Fix |
+|---|---|
+| Low reachable-ratio | add `aria-label` to icon-only controls; use semantic `<button>` / `<a>` |
+| Many custom dropdowns | use native `<select>` where possible |
+| Cross-origin iframes in the flow | same-origin embed, or a direct route |
+| Closed shadow DOM | open shadow roots, or expose semantic fallbacks |
+| Deep / very large DOM | flatten nesting, paginate, reduce node count |
+| High Access Resistance | reduce hostile bot-walls on agent-relevant flows |
+| High agent tax | collapse the funnel; remove needless modal steps |
+| (ceiling) UI-only | offer an API / deep-link / structured action path for agents |
diff --git a/skills/browsability/scripts/friction.ts b/skills/browsability/scripts/friction.ts
new file mode 100644
index 00000000..982e7251
--- /dev/null
+++ b/skills/browsability/scripts/friction.ts
@@ -0,0 +1,93 @@
+#!/usr/bin/env bun
+/**
+ * Browsability — Drivability probe (deterministic, no model, no task run).
+ *
+ *   bun scripts/friction.ts <url> [--out <dir>]
+ *
+ * Measures the parts of the rubric readable from a single page load, grounded in
+ * what browser-agent frameworks (e.g. open-source Stagehand) treat as hard
+ * (see references/rubric.md):
+ *   B1 reachability  — fraction of interactive controls that SURVIVE the accessibility-tree
+ *                      prune (kept iff: accessible name OR named children OR a non-structural role).
+ *   C* tax proxy     — native vs custom controls; a custom dropdown costs two steps (expand,
+ *                      then select) where a native <select> costs one.
+ *   B3 traps         — cross-origin iframes, shadow hosts, DOM depth (>256), DOM size.
+ *
+ * Full score also needs the agent run (Axis A ladder + real C + D) — see scripts/score.ts.
+ * Requires the `browse` CLI (npm i -g @browserbasehq/browse-cli). Remote mode needs BROWSERBASE_API_KEY.
+ */
+
+import { execFileSync } from "node:child_process";
+import { writeFileSync, mkdirSync } from "node:fs";
+
+const args = process.argv.slice(2);
+const url = args.find((a) => !a.startsWith("--"));
+if (!url) { console.error("usage: bun scripts/friction.ts <url> [--out <dir>]"); process.exit(1); }
+const outDir = (() => { const i = args.indexOf("--out"); return i >= 0 && args[i + 1] ? args[i + 1] : "browsability-out"; })();
+
+const PROBE = `(()=>{
+  const SEL='a[href],button,input,select,textarea,[role=button],[role=link],[role=checkbox],[role=menuitem],[role=tab],[role=combobox],[onclick],[contenteditable=""],[contenteditable=true],[tabindex]';
+  const ix=[...document.querySelectorAll(SEL)];
+  const nativeTag=e=>{const t=e.tagName;return t==='BUTTON'||t==='SELECT'||t==='TEXTAREA'||t==='INPUT'||(t==='A'&&!!e.getAttribute('href'));};
+  const hasName=e=>{
+    const a=((e.getAttribute&&(e.getAttribute('aria-label')||e.getAttribute('title')||e.getAttribute('alt')||e.getAttribute('placeholder')))||'').trim();
+    if(a)return true;
+    if(e.getAttribute&&e.getAttribute('aria-labelledby'))return true;
+    if(e.labels&&e.labels.length)return true;
+    if((e.textContent||'').trim())return true;
+    return false;
+  };
+  const reachable=ix.filter(e=>hasName(e)||nativeTag(e));
+  const custom=ix.filter(e=>{const t=e.tagName;const r=(e.getAttribute&&e.getAttribute('role'))||'';const click=e.hasAttribute&&e.hasAttribute('onclick');return (r==='button'||r==='link'||click)&&!(t==='BUTTON'||t==='A');});
+  const customDropdowns=[...document.querySelectorAll('[role=combobox]:not(select),[role=listbox],[aria-haspopup]')].length;
+  const nativeSelects=document.querySelectorAll('select').length;
+  const frames=[...document.querySelectorAll('iframe')];
+  let xorigin=0; frames.forEach(f=>{try{const u=new URL(f.src,location.href);if(u.origin!==location.origin)xorigin++;}catch(e){}});
+  let openShadow=0; document.querySelectorAll('*').forEach(e=>{if(e.shadowRoot)openShadow++;});
+  const customElements=[...document.querySelectorAll('*')].filter(e=>e.tagName.includes('-')).length;
+  let maxDepth=0,nodes=0;
+  (function walk(n,d){nodes++;if(d>maxDepth)maxDepth=d;let c=n.firstElementChild;while(c){walk(c,d+1);c=c.nextElementSibling;}})(document.documentElement,1);
+  return JSON.stringify({interactive:ix.length,reachable:reachable.length,custom:custom.length,customDropdowns,nativeSelects,iframes:frames.length,crossOriginIframes:xorigin,openShadowHosts:openShadow,customElements,maxDepth,nodes,textLen:(document.body.innerText||'').length});
+})()`;
+
+function probe(target: string): any {
+  try {
+    execFileSync("browse", ["open", target], { stdio: "ignore", timeout: 60000 });
+    const out = execFileSync("browse", ["eval", PROBE], { encoding: "utf8", timeout: 30000 });
+    const env = out.match(/\{\s*"result"\s*:\s*"((?:\\.|[^"\\])*)"\s*\}/);
+    if (!env) throw new Error("browse eval returned no result");
+    return JSON.parse(JSON.parse(`"${env[1]}"`));
+  } finally {
+    try { execFileSync("browse", ["stop"], { stdio: "ignore", timeout: 15000 }); } catch {}
+  }
+}
+
+const p = probe(url);
+const reachRatio = p.interactive ? p.reachable / p.interactive : 1;
+const customRatio = p.interactive ? p.custom / p.interactive : 0;
+
+const b1 = Math.round(reachRatio * 25);
+const taxProxy = Math.max(0, Math.round(20 * (1 - Math.min(customRatio * 1.5, 1)) - Math.min(p.customDropdowns, 5)));
+const traps: string[] = [];
+let b3 = 15;
+if (p.crossOriginIframes > 0) { b3 -= 6; traps.push(`${p.crossOriginIframes} cross-origin iframe(s) — out-of-process frames are fragile for agents`); }
+if (p.maxDepth > 256) { b3 -= 5; traps.push(`DOM depth ${p.maxDepth} > 256 — deep trees force slower, shallower snapshots`); }
+else if (p.maxDepth > 128) { b3 -= 2; traps.push(`DOM depth ${p.maxDepth} — approaching the deep-tree cliff`); }
+if (p.nodes > 8000) { b3 -= 3; traps.push(`${p.nodes} DOM nodes — large tree risks accessibility-snapshot truncation`); }
+if (p.customElements > 5 && p.openShadowHosts === 0) { b3 -= 2; traps.push(`${p.customElements} custom elements, 0 open shadow hosts — possible closed shadow DOM (opaque to agents)`); }
+b3 = Math.max(0, b3);
+const partial = b1 + taxProxy + b3;
+
+mkdirSync(outDir, { recursive: true });
+writeFileSync(`${outDir}/friction.json`, JSON.stringify({ url, scannedAt: new Date().toISOString(), raw: p, scores: { b1, taxProxy, b3, partial, of: 60 }, traps }, null, 2));
+
+const pct = (n: number) => `${(n * 100) | 0}%`;
+console.log(`\n  Drivability probe — ${url}`);
+console.log(`  ${"─".repeat(52)}`);
+console.log(`  B1 reachability    ${String(b1).padStart(2)}/25   ${p.reachable}/${p.interactive} controls survive the a11y prune (${pct(reachRatio)})`);
+console.log(`  C  tax proxy       ${String(taxProxy).padStart(2)}/20   ${p.custom} custom controls (${pct(customRatio)}), ${p.customDropdowns} custom dropdown(s), ${p.nativeSelects} native <select>`);
+console.log(`  B3 structural      ${String(b3).padStart(2)}/15   depth ${p.maxDepth}, ${p.nodes} nodes, ${p.iframes} iframe(s) [${p.crossOriginIframes} x-origin], ${p.openShadowHosts} shadow host(s), ${p.customElements} custom el`);
+console.log(`  ${"─".repeat(52)}`);
+console.log(`  Drivability partial ${partial}/60   (Axis A ladder + real agent tax + recoverability need the agent run — scripts/score.ts)`);
+if (traps.length) { console.log(`\n  Traps detected:`); for (const t of traps) console.log(`   • ${t}`); }
+console.log(`\n  → ${outDir}/friction.json\n`);
diff --git a/skills/browsability/scripts/score.ts b/skills/browsability/scripts/score.ts
new file mode 100644
index 00000000..a0c93172
--- /dev/null
+++ b/skills/browsability/scripts/score.ts
@@ -0,0 +1,97 @@
+#!/usr/bin/env bun
+/**
+ * Browsability — composite scorer. Combines the deterministic Drivability probe
+ * (friction.json) with agent-run results (tasks.json) into the full rubric score.
+ *
+ *   bun scripts/score.ts --friction <dir>/friction.json [--tasks tasks.json] [--out <dir>]
+ *
+ * Rubric (see references/rubric.md), 0-100:
+ *   A  Access Resistance  30  — inverse of the lowest assistance rung that passes
+ *   B1 Reachability       25  — from friction probe
+ *   B3 Structural traps   15  — from friction probe
+ *   C  Agent tax          20  — agent steps OVER the human baseline (delta, not absolute)
+ *   D  Recoverability      10  — self-heal / site-errors / overlays / step-ceiling
+ *
+ * tasks.json shape:
+ *   { "url": "...",
+ *     "tasks": [ { "name": "...", "type": "...", "humanBaselineSteps": 3,
+ *                  "runs": [ {"rung":0,"success":false,"steps":12,"model":"...","note":"hCaptcha"},
+ *                            {"rung":2,"success":true,"steps":5,"model":"...","note":""} ] } ] }
+ * rung index: 0=vanilla 1=default-assist 2=proxy+fingerprint 3=advanced-stealth 4=verified
+ */
+
+import { readFileSync, writeFileSync, mkdirSync, existsSync } from "node:fs";
+
+const args = process.argv.slice(2);
+const flag = (n: string, d?: string) => { const i = args.indexOf(n); return i >= 0 && args[i + 1] ? args[i + 1] : d; };
+const frictionPath = flag("--friction", "browsability-out/friction.json")!;
+const tasksPath = flag("--tasks");
+const outDir = flag("--out", "browsability-out")!;
+const MAX_RUNG = 4;
+
+if (!existsSync(frictionPath)) { console.error(`no friction file at ${frictionPath} — run scripts/friction.ts first`); process.exit(1); }
+const friction = JSON.parse(readFileSync(frictionPath, "utf8"));
+const url = friction.url;
+const b1 = friction.scores.b1;          // /25
+const b3 = friction.scores.b3;          // /15
+const taxProxy = friction.scores.taxProxy; // /20 fallback
+
+type Run = { rung: number; success: boolean; steps?: number; model?: string; note?: string };
+type Task = { name: string; type?: string; humanBaselineSteps?: number; runs?: Run[] };
+
+let a = 0, c = taxProxy, d = 10, tasks: Task[] = [], haveRuns = false;
+const med = (xs: number[]) => { const s = [...xs].sort((x, y) => x - y); const m = s.length >> 1; return s.length ? (s.length % 2 ? s[m] : Math.round((s[m - 1] + s[m]) / 2)) : 0; };
+
+if (tasksPath && existsSync(tasksPath)) {
+  tasks = (JSON.parse(readFileSync(tasksPath, "utf8")).tasks ?? []) as Task[];
+  const withRuns = tasks.filter((t) => t.runs && t.runs.length);
+  haveRuns = withRuns.length > 0;
+  if (haveRuns) {
+    // A — worst-case minimum passing rung across tasks
+    const rungs = withRuns.map((t) => {
+      const ok = (t.runs || []).filter((r) => r.success).map((r) => r.rung);
+      return ok.length ? Math.min(...ok) : MAX_RUNG + 1; // unsolved even at top
+    });
+    const siteRung = Math.max(...rungs);
+    a = Math.round(30 * (1 - Math.min(siteRung, MAX_RUNG) / MAX_RUNG));
+
+    // C — agent tax = steps over human baseline on the cheapest passing run
+    const taxes: number[] = [];
+    for (const t of withRuns) {
+      const passing = (t.runs || []).filter((r) => r.success && typeof r.steps === "number");
+      if (passing.length && typeof t.humanBaselineSteps === "number") {
+        const best = passing.reduce((a, b) => (a.rung <= b.rung ? a : b));
+        taxes.push(Math.max(0, (best.steps as number) - t.humanBaselineSteps));
+      }
+    }
+    if (taxes.length) c = Math.max(0, Math.round(20 * (1 - Math.min(med(taxes), 8) / 8)));
+
+    // D — recoverability from run notes
+    const allRuns = withRuns.flatMap((t) => t.runs || []);
+    const deadEnds = allRuns.filter((r) => /captcha|modal|shadow|iframe|overlay|consent|cookie|timeout|stuck|self-?heal|blocked/i.test(r.note || "")).length;
+    d = Math.max(0, 10 - deadEnds * 2);
+  }
+}
+
+const total = Math.round(a + b1 + b3 + c + d);
+const grade = (t: number) => t >= 90 ? "A" : t >= 80 ? "B+" : t >= 70 ? "B" : t >= 60 ? "C+" : t >= 50 ? "C" : t >= 35 ? "D" : "F";
+const rungName = ["L0 vanilla", "L1 default-assist", "L2 proxy+fingerprint", "L3 advanced-stealth", "L4 verified"];
+
+mkdirSync(outDir, { recursive: true });
+const report = {
+  url, scoredAt: new Date().toISOString(), total, grade: grade(total),
+  axes: { accessResistance: a, reachability_B1: b1, structural_B3: b3, agentTax_C: c, recoverability_D: d },
+  driveabilityOnly: !haveRuns, tasks,
+};
+writeFileSync(`${outDir}/browsability.json`, JSON.stringify(report, null, 2));
+
+console.log(`\n  Browsability — ${url}`);
+console.log(`  ${"─".repeat(50)}`);
+console.log(`  SCORE  ${total}/100   GRADE ${grade(total)}${haveRuns ? "" : "   (Drivability only — agent ladder not run)"}`);
+console.log(`  A  Access Resistance  ${String(a).padStart(2)}/30${haveRuns ? "" : "   PENDING"}`);
+console.log(`  B1 Reachability       ${String(b1).padStart(2)}/25`);
+console.log(`  B3 Structural traps   ${String(b3).padStart(2)}/15`);
+console.log(`  C  Agent tax          ${String(c).padStart(2)}/20${haveRuns ? "" : "   (proxy from probe)"}`);
+console.log(`  D  Recoverability     ${String(d).padStart(2)}/10${haveRuns ? "" : "   PENDING"}`);
+console.log(`  ${"─".repeat(50)}`);
+console.log(`  → ${outDir}/browsability.json\n`);

From 527382fe22aa03dac92f39c01c1348c9705ebb6b Mon Sep 17 00:00:00 2001
From: shubh24 <shubhankar24@gmail.com>
Date: Mon, 1 Jun 2026 12:40:00 -0700
Subject: [PATCH 2/5] =?UTF-8?q?refactor(browsability):=20drop=20scoring/sc?=
 =?UTF-8?q?ripts=20=E2=80=94=20guidance=20+=20helps-hurts=20table?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Simplify per review: remove the numeric 0–100 score, weighted axes, and the
friction.ts/score.ts probes. The skill is now pure guidance — the agent looks
at the site with the `browser` skill, uses the rubric as a flexible checklist,
and reports what helps vs what hurts (with fixes), letting the agent judge what
matters for each site rather than applying hard-coded formulas.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 README.md                                |   2 +-
 skills/browsability/.gitignore           |   1 -
 skills/browsability/SKILL.md             | 122 +++++-----------
 skills/browsability/references/rubric.md | 177 +++++++----------------
 skills/browsability/scripts/friction.ts  |  93 ------------
 skills/browsability/scripts/score.ts     |  97 -------------
 6 files changed, 95 insertions(+), 397 deletions(-)
 delete mode 100644 skills/browsability/.gitignore
 delete mode 100644 skills/browsability/scripts/friction.ts
 delete mode 100644 skills/browsability/scripts/score.ts

diff --git a/README.md b/README.md
index c9631b33..e3673fdd 100644
--- a/README.md
+++ b/README.md
@@ -19,7 +19,7 @@ This plugin includes the following skills (see `skills/` for details):
 | [fetch](skills/fetch/SKILL.md) | Fetch HTML or JSON from static pages without a browser session — inspect status codes, headers, follow redirects |
 | [search](skills/search/SKILL.md) | Search the web and return structured results (titles, URLs, metadata) without a browser session |
 | [ui-test](skills/ui-test/SKILL.md) | AI-powered adversarial UI testing — analyzes git diffs to test changes, or explores the full app to find bugs |
-| [browsability](skills/browsability/SKILL.md) | Score how usable a website is by an AI browser agent — Access Resistance (how much stealth/proxy/captcha help is needed), Drivability (do controls survive the accessibility-tree prune, iframe/shadow-DOM traps), and Agent tax (steps over the human baseline); emits a graded report with concrete fixes |
+| [browsability](skills/browsability/SKILL.md) | Assess how usable a website is by an AI browser agent — how much stealth/proxy/captcha help it needs to get in, whether controls are labeled/reachable, iframe/shadow-DOM traps, and extra steps vs a human; reports what helps and what hurts, with concrete fixes (no numeric score) |
 
 ## Installation
 
diff --git a/skills/browsability/.gitignore b/skills/browsability/.gitignore
deleted file mode 100644
index 4d5a7f03..00000000
--- a/skills/browsability/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-browsability-out/
diff --git a/skills/browsability/SKILL.md b/skills/browsability/SKILL.md
index ece2a2a0..7db2e2a3 100644
--- a/skills/browsability/SKILL.md
+++ b/skills/browsability/SKILL.md
@@ -1,103 +1,61 @@
 ---
 name: browsability
-description: "Score how usable a website is BY AN AI BROWSER AGENT — its Browsability Index. Measures how little infrastructure assistance an agent needs to operate the site (Access Resistance), whether the agent can perceive and drive the live DOM (Drivability — does each control survive the accessibility-tree prune, are there iframe/shadow-DOM/deep-DOM traps), and how many more steps the agent needs than a human (Agent tax). Grounded in what the open-source Stagehand framework treats as hard. Use when the user asks how browsable / agent-friendly / agent-ready a website or a specific web flow (signup, checkout, search) is for a BROWSER agent, to compare sites on browser-agent usability, or to produce a browsability report card with concrete fixes. Triggers: 'how browsable is <site>', 'is this site agent-friendly for a browser agent', 'grade this checkout/signup flow for agents', 'browser-agent friendliness', 'DOM friction', 'browsability of <url>'. NOT for SEO/AEO or content discoverability (a different layer), and NOT for docs/SDK onboarding DX (use the agent-experience skill for that)."
+description: "Assess how usable a website is BY AN AI BROWSER AGENT — its browsability. Look at how little infrastructure help the agent needs to get in (stealth/proxy/captcha), whether it can perceive and drive the live DOM (are controls labeled and reachable, are there iframe/shadow-DOM/deep-DOM traps), and how many extra steps it takes versus a human. Report what helps and what hurts, with concrete fixes — no numeric score. Use when the user asks how browsable / agent-friendly / agent-ready a website or a specific web flow (signup, checkout, search) is for a BROWSER agent, to compare sites on browser-agent usability, or to get a browsability report with fixes. Triggers: 'how browsable is <site>', 'is this site agent-friendly for a browser agent', 'check this checkout/signup flow for agents', 'browser-agent friendliness', 'DOM friction', 'browsability of <url>'. NOT for SEO/AEO or content discoverability (a different layer), and NOT for docs/SDK onboarding DX (use the agent-experience skill for that)."
 license: MIT
 metadata:
   author: browserbase
-  version: "0.1.0"
-allowed-tools: Bash Read Write Edit Glob Grep Agent
-compatibility: "Requires `bun` and the browse CLI (`npm install -g @browserbasehq/browse-cli`). Remote mode needs BROWSERBASE_API_KEY. The full agent-ladder pass additionally needs a model-driven reference agent (use the `browser` skill as the driver)."
+  version: "0.2.0"
+allowed-tools: Read Bash Glob Grep Agent
+compatibility: "Uses the browse CLI (`npm install -g @browserbasehq/browse-cli`) via the `browser` skill to look at and drive the site. Remote mode needs BROWSERBASE_API_KEY."
 ---
 
 # Browsability — how usable is a site for a browser agent?
 
-Score how well an AI **browser** agent can *operate* a website. The opinion: *browsability is how
-little help an agent needs to succeed, and how much harder the site is for an agent than for a human.*
-This is the operability layer — not discoverability, so ignore `llms.txt`, sitemaps, SEO/AEO.
+Judge how well an AI **browser** agent can *operate* a website. The idea is simple:
 
-**Before scoring, read `references/rubric.md`** — the full code-grounded rubric (axes, signals, the
-assistance ladder, the agent-vs-human delta, and remediation knowledge). The summary below is only the
-operating procedure.
+> **Browsability is how little help an agent needs to succeed — and how much harder the site is for
+> an agent than for a person.** A 10-click checkout that takes a human 10 clicks too is fine; a
+> 3-click task that takes the agent 10 because the buttons are unlabeled is not — those extra clicks
+> are the agent's problem, not the workflow's.
 
-## The score (0–100)
+This is the *operability* layer — driving the live UI. It is **not** discoverability, so ignore
+`llms.txt`, sitemaps, and SEO/AEO. It is also distinct from docs/SDK onboarding (that's the
+`agent-experience` skill).
 
-| Axis | Pts | Source |
-|---|---|---|
-| **A · Access Resistance** | 30 | lowest assistance rung that completes the task (agent ladder) |
-| **B1 · Reachability** | 25 | % of controls that survive the accessibility-tree prune (deterministic probe) |
-| **B3 · Structural traps** | 15 | cross-origin iframes, shadow DOM, DOM depth/size (deterministic probe) |
-| **C · Agent tax** | 20 | agent steps OVER the human baseline (the delta — not absolute click count) |
-| **D · Recoverability** | 10 | self-heal / site errors / blocking overlays / step ceiling (agent run) |
+There is **no scoring formula here.** Look at the site with your own eyes (and the agent's), use the
+checklist in `references/rubric.md` as a guide for what tends to matter, and decide what actually
+matters for *this* site. Then report what helps and what hurts.
 
-Score only counts for tasks a verifier confirms actually completed. **Agent-native affordance** (an
-API / deep-link / structured action path) is a *ceiling badge*, not a scored component — flag it, do
-not add it to the number; this rubric measures operability of the UI.
+## How to assess
 
-## Workflow
+1. **Actually try to use the site** with the `browser` skill. Open it, take a `browse snapshot`
+   (the accessibility tree — this is what an agent "sees"), and attempt a real task the site is for:
+   find the pricing, create an account, add to cart, submit the contact form. Notice where it's easy
+   and where you get stuck.
 
-### Step 1 — Drivability probe (always; deterministic, no model)
+2. **Notice how much help it took to get in.** If a vanilla session sails through, great — that's
+   maximally browsable. If you needed stealth, a proxy, or captcha-solving just to load or act, that
+   counts against the site. (`references/rubric.md` describes this assistance ladder.) Remember
+   `solveCaptchas` is **on by default** — if you want to know whether a site is hostile at the front
+   door, try it with captcha-solving off first.
 
-Run the probe on the target URL (a page, or the entry point of a flow):
+3. **Watch for the things that trip up browser agents** as you go — read `references/rubric.md` for
+   the full checklist, but in short: unlabeled / `<div>`-as-button controls, custom dropdowns,
+   iframes (especially cross-origin), shadow DOM, very deep or huge DOMs, blocking cookie/consent
+   walls, and flows that take the agent more steps than a person.
 
-```bash
-cd skills/browsability
-bun scripts/friction.ts <url> --out browsability-out
-```
+Use judgment over completeness — surface the few things that genuinely make or break this site for an
+agent, not an exhaustive audit.
 
-This loads the page through the browse CLI and reports **B1 reachability** + **B3 structural traps**
-straight from the live DOM (40 of 100 points). It needs no model and finishes in seconds. Use remote
-mode (`browse env remote`, needs `BROWSERBASE_API_KEY`) for bot-protected sites; local is fine
-otherwise. This alone is a useful friction profile and is the right answer for a quick assessment.
+## How to report
 
-### Step 2 — Agent ladder + tasks (for the full score)
+A simple **Helps / Hurts** table. Each "Hurts" row names the concrete fix. Cite what you observed.
 
-Derive a small set of **canonical tasks** for the site (informational / navigational / transactional —
-e.g. "find the price of the paid plan", "create an account", "submit the contact form"). For each
-task, run a reference browser agent across the **Access Resistance ladder** and record results:
+| ✅ Helps browsability | ⚠️ Hurts browsability |
+|---|---|
+| Native `<button>` / `<select>` with clear labels | Signup CTA is an unlabeled `<div>` → agent can't see it. *Fix: make it a `<button>` or add `aria-label`* |
+| Loads & acts fine in a vanilla session | Needs proxy + captcha-solving just to load. *Fix: ease bot-walls on agent-relevant flows* |
+| Main flow is same-origin | Checkout is a cross-origin iframe → fragile. *Fix: same-origin embed or a direct route* |
 
-- **rung 0** vanilla headless — captcha-solving **off** (`solveCaptchas:false`), no proxy, no fingerprint
-- **rung 1** default assist — captcha-solving on
-- **rung 2** proxy + realistic fingerprint
-- **rung 3** advanced stealth + persisted context
-- **rung 4** maximum assistance
-
-Stop climbing once a task succeeds; the lowest passing rung is its Access Resistance. Drive the agent
-with the `browser` skill (the browse CLI) or Stagehand, and judge each run's `success` with a verifier
-— do not trust the agent's self-report. Capture **real step counts** and a **`humanBaselineSteps`**
-estimate per task so Agent tax is computed as the delta. Record into `tasks.json`:
-
-```json
-{ "url": "https://example.com",
-  "tasks": [
-    { "name": "Create an account", "type": "transactional", "humanBaselineSteps": 4,
-      "runs": [ {"rung":0,"success":false,"steps":10,"model":"<model>","note":"signup CTA unlabeled"},
-                {"rung":2,"success":true,"steps":7,"model":"<model>","note":""} ] } ] }
-```
-
-If no model-driven agent is available, act as the reference agent using the `browser` skill: execute
-each task's browse steps, count the steps, and write the runs into `tasks.json` honestly (mark
-single-model). This produces a real, if single-model, result.
-
-### Step 3 — Composite score + report
-
-```bash
-bun scripts/score.ts --friction browsability-out/friction.json --tasks tasks.json --out browsability-out
-```
-
-Writes `browsability-out/browsability.json` with the 0–100 score, grade, and per-axis breakdown. When
-`tasks.json` is absent it reports a **Drivability-only** score (B1 + B3, 40 max) and marks A/C/D
-pending — still honest, just partial.
-
-### Step 4 — Report to the user
-
-Present a **profile, not just a number**: the grade, the per-axis breakdown, the lowest passing rung,
-and — most usefully — a **ranked remediation list** drawn from the rubric's remediation table (e.g.
-"signup CTA has no accessible name → add `aria-label`; estimated lift +X"). Cite the concrete signal
-each finding came from.
-
-## Notes & gotchas
-
-- `solveCaptchas` defaults to **on** in Browserbase — an honest rung-0 must explicitly disable it, or rungs 0 and 1 collapse and captcha-walled sites get over-credited.
-- The deterministic probe approximates "closed shadow DOM" via custom-element count with zero open shadow hosts; treat it as a hint and confirm during the agent run.
-- Keep the human baseline honest — Agent tax is the *delta*, so a genuinely long workflow (10 steps for humans too) must not be penalized as un-browsable.
-- The scripts call `browse stop` on exit; if a daemon hangs, `pkill -f "browse.*daemon"`.
+Optionally close with one plain-language line ("easy / moderate / hard for a browser agent, because…").
+Do not invent a number. In Slack contexts use mrkdwn (`*bold*`, `•` bullets), not tables.
diff --git a/skills/browsability/references/rubric.md b/skills/browsability/references/rubric.md
index 1fb76d85..42b1ed67 100644
--- a/skills/browsability/references/rubric.md
+++ b/skills/browsability/references/rubric.md
@@ -1,153 +1,84 @@
-# The Browsability Rubric
+# What makes a site browsable for a browser agent
 
-A code-grounded, operational definition of how usable a website is **by an AI browser agent** —
-and how to score it. Grounded in what the open-source [Stagehand](https://github.com/browserbase/stagehand)
-browser-automation framework actually treats as hard, plus the public Browserbase session settings.
+A checklist of what tends to help or hurt an AI **browser** agent trying to operate a website.
+Grounded in what the open-source [Stagehand](https://github.com/browserbase/stagehand) framework
+treats as hard, plus the public Browserbase session settings.
 
-## The opinion, in one line
+**Use this as a guide, not a rule book.** There is no scoring formula. Look at the site, try the
+task, and decide what actually matters for *this* site — then report what helps and what hurts.
 
-**Browsability is how little help an agent needs to succeed** — and, more precisely, **how much
-harder the site is for an agent than for a motivated human.**
+## The idea
 
-It is *not* discoverability. Forget `llms.txt`, sitemaps, token efficiency, and SEO/AEO — those
-measure whether content can be *found and cited*. Browsability measures whether an agent can
-*operate* the live site: perceive the controls, drive the DOM, and complete a real task.
+**Browsability is how little help an agent needs to succeed, and how much harder the site is for an
+agent than for a person.** Only the *agent-specific* friction counts: a long workflow that's long for
+humans too isn't a browsability problem; a simple task made hard by unlabeled controls is. This is
+*operability* (driving the UI), not *discoverability* (being found/cited — that's SEO/AEO, out of
+scope).
 
-It is measured **operationally** — by running real agent tasks and reading harness + session
-telemetry (which controls survived the accessibility tree, how many steps a flow took, which errors
-fired, how much stealth/proxy assistance was needed) — not by linting static HTML.
-
-> **Scope note:** this rubric covers *UI operability* — driving a website in a browser. It is the
-> sibling of, not a substitute for, auditing docs/SDK onboarding experience.
-
-## The key reframe: score the agent-vs-human delta, not absolute effort
-
-A 10-click checkout that also takes a human 10 clicks is *perfectly browsable* — that's just the
-workflow. A 3-click task that takes the agent 10 because controls are unlabeled is *not browsable* —
-those extra 7 clicks are the **agent tax**.
-
-Scoring the **delta over the human baseline** mathematically subtracts out UX/design length (which
-costs humans and agents equally) and isolates exactly the agent-specific penalty. This resolves the
-"is click-count a UX problem or a browsability problem?" question: only the *excess* over the human
-path counts.
-
-Stagehand surfaces a piece of this directly — a native `<select>` is a **one-step** action; a custom
-dropdown must be clicked open, re-snapshotted, then selected — a **two-step** action. That second
-step *is* agent tax: incidental inflation, not essential workflow.
-
-## The scored axes (+ one ceiling badge)
-
-| Axis | What it measures | Weight | In score? |
-|---|---|---|---|
-| **A · Access Resistance** | How much infrastructure assistance the agent needs to operate at all (the ladder) | 30 | ✅ |
-| **B1 · Reachability** | Can the agent perceive the controls (survive the accessibility-tree prune) | 25 | ✅ |
-| **B3 · Structural traps** | iframes, shadow DOM, DOM depth/size | 15 | ✅ |
-| **C · Agent tax** | Steps *above the human baseline* (incidental inflation only) | 20 | ✅ |
-| **D · Recoverability** | What happens when something breaks (self-heal, site errors, blocking overlays, step ceiling) | 10 | ✅ |
-| — Essential path length | Inherent workflow steps (humans pay too) | — | ❌ separate "Agent UX" lens |
-| — Agent-native affordance | An API / deep-link / structured action path exists | — | ⭐ ceiling badge, not scored |
-
-Agent-native affordance (offering a non-UI path so an agent need not drive the browser at all) is
-noted as the *ceiling*, not a scored component — this rubric deliberately measures **operability of
-the UI**, the realistic last mile for the large share of the web that is UI-only.
-
-Gate everything on a **success verdict** per task (a verifier, not the agent's self-report): friction
-and tax scores only count for tasks confirmed to have actually completed.
-
----
-
-## Axis A — Access Resistance (the assistance ladder)
-
-Browserbase exposes public session settings, each mitigating a specific site-side obstacle. Re-run
-the *same task* climbing the ladder; the **lowest rung at which it succeeds** is the site's Access
-Resistance. Lower = more browsable.
-
-| Public setting | Mitigates |
-|---|---|
-| `solveCaptchas` | CAPTCHA challenges |
-| `proxies` | IP blocks, rate limits, geo-gating (residential / geo-targeted) |
-| `fingerprint` | headless-browser fingerprint detection |
-| `advancedStealth` | advanced anti-bot detection |
-| `context` (persist) | re-auth / re-consent walls; session continuity |
-
-The ladder to re-run a task across:
-
-- **L0 Vanilla headless** — captcha-solving **off**, no proxy, no fingerprint, fresh context. The agent looks like raw headless Chrome. *Passing here = maximally browsable.*
-- **L1 Default assist** — captcha-solving on, still no proxy/fingerprint.
-- **L2 Proxied + realistic fingerprint** — geo proxy + a realistic desktop fingerprint.
-- **L3 Advanced stealth + persisted context** — advanced anti-bot mitigation on; cookies persisted.
-- **L4 Maximum assistance** — top-tier anti-bot mitigation. *Needing this rung = barely browsable.*
-
-> **Gotcha:** `solveCaptchas` defaults to **on** in Browserbase, so an honest rung-0 baseline must
-> explicitly turn it off — otherwise L0 and L1 collapse and captcha-walled sites get over-credited.
-
-**Score:** `A = 30 * (1 - minPassingRung / 4)`.
+When you see extra steps, ask: *would a human also need this step?* If yes, it's the workflow (don't
+count it). If no — e.g. the agent had to click open a custom dropdown that a person reads at a glance
+— that's the agent tax, and it hurts browsability.
 
 ---
 
-## Axis B — Drivability (per-step technical difficulty)
+## 1. Getting in — how much help did the agent need?
 
-### B1 · Element reachability — can the agent even *see* the control?
+Re-frame "how protected is this site" as a ladder of assistance. The less help needed, the more
+browsable. Browserbase exposes these public session settings; each one you have to switch on to make
+the task work is a mark against the site:
 
-Stagehand builds an accessibility tree and **prunes any node that lacks all three of**: an accessible
-name, named children, or a non-structural role. An unlabeled `<div role="generic">` button is removed
-*before the model ever sees it.* The survival rule, from the open-source accessibility snapshot:
+- `solveCaptchas` — CAPTCHA challenges (**on by default**, so test with it off to see front-door hostility)
+- `proxies` — IP blocks, rate limits, geo-gating
+- `fingerprint` — headless-browser fingerprint detection
+- `advancedStealth` — advanced anti-bot detection
+- `context` (persist) — re-auth / re-consent walls
 
-```js
-// keep a node iff:
-const keep = !!(name && name.trim())        // it has an accessible name, OR
-          || !!(childIds && childIds.length) // it has named children, OR
-          || !isStructural(role);            // it has a real role (not generic/none/inlinetextbox)
-```
+**Helps:** a plain vanilla headless session can load and act. **Hurts:** the task only works once you
+add stealth, a proxy, or captcha-solving — and the more of those it needs, the worse.
 
-- **Signal:** reachable-control ratio = interactive controls that survive the prune ÷ all interactive controls.
-- **Penalize:** icon-only buttons with no `aria-label`; `<div onclick>` controls; inputs with no associated `<label>`; closed-shadow custom components.
-- **Reward:** native semantic elements (`button`, `a[href]`, `input`, `select`) with text/labels — they always survive.
+## 2. Seeing the controls — can the agent perceive what to click?
 
-### B3 · Structural traps — the hard walls
+Browser agents work off an **accessibility tree**, and a control is only visible to the agent if it
+has an accessible name, named children, or a real semantic role. An unlabeled `<div role="generic">`
+button is dropped before the model ever sees it — effectively invisible.
 
-| Trap | Why it hurts an agent |
-|---|---|
-| Closed shadow DOM | roots closed before instrumentation are effectively invisible |
-| Cross-origin iframes | short-lived, separately-managed frames that can drop out mid-operation |
-| Deep DOM (>256 levels) | serialization stack limits force shallower, slower retries |
-| Never-settling network | streaming / sub-second polling never reaches "network idle" → timeout every step |
-| Virtualized lists | no automatic "scroll until found"; an observe→scroll→observe loop is required |
-| Very large DOM | the serialized tree is truncated; elements past the cap become invisible |
+- **Helps:** native `<button>`, `<a href>`, `<input>`, `<select>` with real text or labels; inputs tied to a `<label>`.
+- **Hurts:** icon-only buttons with no `aria-label`; `<div onclick>` "buttons"; inputs with no label; controls hidden inside closed shadow DOM.
 
----
+## 3. Structural traps
 
-## Axis C — Agent tax (steps over the human baseline)
+Hard walls that browser agents struggle with regardless of labeling:
 
-For each verifier-confirmed task: `agentTax = agentSteps - humanBaselineSteps`. Where a human baseline
-is unavailable, approximate the incidental inflation from the **two-step ratio** (custom controls the
-framework must expand-then-act on) plus needless modal steps. Only the *excess* counts; essential
-workflow length is reported separately as "Agent UX," not scored as browsability.
+- **Cross-origin iframes** — separately-managed frames that can drop out mid-action; fragile.
+- **Shadow DOM** — closed roots are opaque to the agent.
+- **Very deep DOM (hundreds of levels)** — forces slower, shallower page reads.
+- **Very large DOM** — the accessibility snapshot can get truncated; elements past the cap vanish.
+- **Never-settling pages** — constant streaming/polling means the page never looks "done loading," so the agent waits out a timeout on every step.
+- **Virtualized / infinite lists** — no "scroll until found"; the agent has to scroll-and-look in a loop.
 
----
+## 4. Extra steps the agent pays (but a human doesn't)
 
-## Axis D — Recoverability — what happens when something breaks
+- **Custom dropdowns vs native `<select>`** — a native select is one action; a custom dropdown makes the agent click to open, re-read the page, then pick — two+ actions. Multiply across a form and it adds up.
+- **Needless modals / multi-step wizards** that a human clicks through without thinking but the agent must navigate explicitly.
+- Count only the steps *beyond* what a person would need.
 
-Stagehand's error taxonomy cleanly separates *site-caused* friction from agent-caused, and its
-self-heal path is the tell: on a stale selector (the DOM mutated under the agent) it re-snapshots and
-re-asks the model once. Frequent self-heal = an unstable, hostile DOM.
+## 5. When things break — can the agent recover?
 
-- **Site-caused errors (penalize):** element-not-visible, selector-resolution failures, element-not-found, captcha timeouts, navigation timeouts.
-- **Blocking overlays (penalize):** cookie/consent walls, login walls, paywalls — not auto-dismissed; they eat steps or wall the flow entirely.
-- **Max-steps blowout:** agent loops have a default step budget; tasks that exhaust it score as failures.
-- **Signal:** self-heal count, site-caused-error count, overlay-encountered flag, whether the run hit the step ceiling.
+- **Blocking overlays** — cookie/consent walls, login walls, paywalls that aren't dismissed automatically and sit on top of the flow.
+- **Unstable DOM** — elements that move or re-render between looking and clicking, forcing the agent to re-find them (a sign of a hostile, racey page).
+- **Slow / hanging navigation** — pages that exceed load timeouts.
 
 ---
 
-## Remediation knowledge (turn findings into fixes)
+## Turning findings into fixes
 
 | Finding | Fix |
 |---|---|
-| Low reachable-ratio | add `aria-label` to icon-only controls; use semantic `<button>` / `<a>` |
+| Unlabeled / `<div>`-as-button controls | use semantic `<button>` / `<a>`, or add `aria-label` |
 | Many custom dropdowns | use native `<select>` where possible |
-| Cross-origin iframes in the flow | same-origin embed, or a direct route |
+| Cross-origin iframe in the flow | same-origin embed, or a direct route |
 | Closed shadow DOM | open shadow roots, or expose semantic fallbacks |
 | Deep / very large DOM | flatten nesting, paginate, reduce node count |
-| High Access Resistance | reduce hostile bot-walls on agent-relevant flows |
-| High agent tax | collapse the funnel; remove needless modal steps |
-| (ceiling) UI-only | offer an API / deep-link / structured action path for agents |
+| Needs heavy stealth/proxy/captcha to work | reduce hostile bot-walls on agent-relevant flows |
+| More steps than a human needs | collapse the funnel; remove needless modal steps |
+| UI-only with no agent path | (ceiling) offer an API / deep-link for agents so they needn't drive the UI at all |
diff --git a/skills/browsability/scripts/friction.ts b/skills/browsability/scripts/friction.ts
deleted file mode 100644
index 982e7251..00000000
--- a/skills/browsability/scripts/friction.ts
+++ /dev/null
@@ -1,93 +0,0 @@
-#!/usr/bin/env bun
-/**
- * Browsability — Drivability probe (deterministic, no model, no task run).
- *
- *   bun scripts/friction.ts <url> [--out <dir>]
- *
- * Measures the parts of the rubric readable from a single page load, grounded in
- * what browser-agent frameworks (e.g. open-source Stagehand) treat as hard
- * (see references/rubric.md):
- *   B1 reachability  — fraction of interactive controls that SURVIVE the accessibility-tree
- *                      prune (kept iff: accessible name OR named children OR a non-structural role).
- *   C* tax proxy     — native vs custom controls; a custom dropdown costs two steps (expand,
- *                      then select) where a native <select> costs one.
- *   B3 traps         — cross-origin iframes, shadow hosts, DOM depth (>256), DOM size.
- *
- * Full score also needs the agent run (Axis A ladder + real C + D) — see scripts/score.ts.
- * Requires the `browse` CLI (npm i -g @browserbasehq/browse-cli). Remote mode needs BROWSERBASE_API_KEY.
- */
-
-import { execFileSync } from "node:child_process";
-import { writeFileSync, mkdirSync } from "node:fs";
-
-const args = process.argv.slice(2);
-const url = args.find((a) => !a.startsWith("--"));
-if (!url) { console.error("usage: bun scripts/friction.ts <url> [--out <dir>]"); process.exit(1); }
-const outDir = (() => { const i = args.indexOf("--out"); return i >= 0 && args[i + 1] ? args[i + 1] : "browsability-out"; })();
-
-const PROBE = `(()=>{
-  const SEL='a[href],button,input,select,textarea,[role=button],[role=link],[role=checkbox],[role=menuitem],[role=tab],[role=combobox],[onclick],[contenteditable=""],[contenteditable=true],[tabindex]';
-  const ix=[...document.querySelectorAll(SEL)];
-  const nativeTag=e=>{const t=e.tagName;return t==='BUTTON'||t==='SELECT'||t==='TEXTAREA'||t==='INPUT'||(t==='A'&&!!e.getAttribute('href'));};
-  const hasName=e=>{
-    const a=((e.getAttribute&&(e.getAttribute('aria-label')||e.getAttribute('title')||e.getAttribute('alt')||e.getAttribute('placeholder')))||'').trim();
-    if(a)return true;
-    if(e.getAttribute&&e.getAttribute('aria-labelledby'))return true;
-    if(e.labels&&e.labels.length)return true;
-    if((e.textContent||'').trim())return true;
-    return false;
-  };
-  const reachable=ix.filter(e=>hasName(e)||nativeTag(e));
-  const custom=ix.filter(e=>{const t=e.tagName;const r=(e.getAttribute&&e.getAttribute('role'))||'';const click=e.hasAttribute&&e.hasAttribute('onclick');return (r==='button'||r==='link'||click)&&!(t==='BUTTON'||t==='A');});
-  const customDropdowns=[...document.querySelectorAll('[role=combobox]:not(select),[role=listbox],[aria-haspopup]')].length;
-  const nativeSelects=document.querySelectorAll('select').length;
-  const frames=[...document.querySelectorAll('iframe')];
-  let xorigin=0; frames.forEach(f=>{try{const u=new URL(f.src,location.href);if(u.origin!==location.origin)xorigin++;}catch(e){}});
-  let openShadow=0; document.querySelectorAll('*').forEach(e=>{if(e.shadowRoot)openShadow++;});
-  const customElements=[...document.querySelectorAll('*')].filter(e=>e.tagName.includes('-')).length;
-  let maxDepth=0,nodes=0;
-  (function walk(n,d){nodes++;if(d>maxDepth)maxDepth=d;let c=n.firstElementChild;while(c){walk(c,d+1);c=c.nextElementSibling;}})(document.documentElement,1);
-  return JSON.stringify({interactive:ix.length,reachable:reachable.length,custom:custom.length,customDropdowns,nativeSelects,iframes:frames.length,crossOriginIframes:xorigin,openShadowHosts:openShadow,customElements,maxDepth,nodes,textLen:(document.body.innerText||'').length});
-})()`;
-
-function probe(target: string): any {
-  try {
-    execFileSync("browse", ["open", target], { stdio: "ignore", timeout: 60000 });
-    const out = execFileSync("browse", ["eval", PROBE], { encoding: "utf8", timeout: 30000 });
-    const env = out.match(/\{\s*"result"\s*:\s*"((?:\\.|[^"\\])*)"\s*\}/);
-    if (!env) throw new Error("browse eval returned no result");
-    return JSON.parse(JSON.parse(`"${env[1]}"`));
-  } finally {
-    try { execFileSync("browse", ["stop"], { stdio: "ignore", timeout: 15000 }); } catch {}
-  }
-}
-
-const p = probe(url);
-const reachRatio = p.interactive ? p.reachable / p.interactive : 1;
-const customRatio = p.interactive ? p.custom / p.interactive : 0;
-
-const b1 = Math.round(reachRatio * 25);
-const taxProxy = Math.max(0, Math.round(20 * (1 - Math.min(customRatio * 1.5, 1)) - Math.min(p.customDropdowns, 5)));
-const traps: string[] = [];
-let b3 = 15;
-if (p.crossOriginIframes > 0) { b3 -= 6; traps.push(`${p.crossOriginIframes} cross-origin iframe(s) — out-of-process frames are fragile for agents`); }
-if (p.maxDepth > 256) { b3 -= 5; traps.push(`DOM depth ${p.maxDepth} > 256 — deep trees force slower, shallower snapshots`); }
-else if (p.maxDepth > 128) { b3 -= 2; traps.push(`DOM depth ${p.maxDepth} — approaching the deep-tree cliff`); }
-if (p.nodes > 8000) { b3 -= 3; traps.push(`${p.nodes} DOM nodes — large tree risks accessibility-snapshot truncation`); }
-if (p.customElements > 5 && p.openShadowHosts === 0) { b3 -= 2; traps.push(`${p.customElements} custom elements, 0 open shadow hosts — possible closed shadow DOM (opaque to agents)`); }
-b3 = Math.max(0, b3);
-const partial = b1 + taxProxy + b3;
-
-mkdirSync(outDir, { recursive: true });
-writeFileSync(`${outDir}/friction.json`, JSON.stringify({ url, scannedAt: new Date().toISOString(), raw: p, scores: { b1, taxProxy, b3, partial, of: 60 }, traps }, null, 2));
-
-const pct = (n: number) => `${(n * 100) | 0}%`;
-console.log(`\n  Drivability probe — ${url}`);
-console.log(`  ${"─".repeat(52)}`);
-console.log(`  B1 reachability    ${String(b1).padStart(2)}/25   ${p.reachable}/${p.interactive} controls survive the a11y prune (${pct(reachRatio)})`);
-console.log(`  C  tax proxy       ${String(taxProxy).padStart(2)}/20   ${p.custom} custom controls (${pct(customRatio)}), ${p.customDropdowns} custom dropdown(s), ${p.nativeSelects} native <select>`);
-console.log(`  B3 structural      ${String(b3).padStart(2)}/15   depth ${p.maxDepth}, ${p.nodes} nodes, ${p.iframes} iframe(s) [${p.crossOriginIframes} x-origin], ${p.openShadowHosts} shadow host(s), ${p.customElements} custom el`);
-console.log(`  ${"─".repeat(52)}`);
-console.log(`  Drivability partial ${partial}/60   (Axis A ladder + real agent tax + recoverability need the agent run — scripts/score.ts)`);
-if (traps.length) { console.log(`\n  Traps detected:`); for (const t of traps) console.log(`   • ${t}`); }
-console.log(`\n  → ${outDir}/friction.json\n`);
diff --git a/skills/browsability/scripts/score.ts b/skills/browsability/scripts/score.ts
deleted file mode 100644
index a0c93172..00000000
--- a/skills/browsability/scripts/score.ts
+++ /dev/null
@@ -1,97 +0,0 @@
-#!/usr/bin/env bun
-/**
- * Browsability — composite scorer. Combines the deterministic Drivability probe
- * (friction.json) with agent-run results (tasks.json) into the full rubric score.
- *
- *   bun scripts/score.ts --friction <dir>/friction.json [--tasks tasks.json] [--out <dir>]
- *
- * Rubric (see references/rubric.md), 0-100:
- *   A  Access Resistance  30  — inverse of the lowest assistance rung that passes
- *   B1 Reachability       25  — from friction probe
- *   B3 Structural traps   15  — from friction probe
- *   C  Agent tax          20  — agent steps OVER the human baseline (delta, not absolute)
- *   D  Recoverability      10  — self-heal / site-errors / overlays / step-ceiling
- *
- * tasks.json shape:
- *   { "url": "...",
- *     "tasks": [ { "name": "...", "type": "...", "humanBaselineSteps": 3,
- *                  "runs": [ {"rung":0,"success":false,"steps":12,"model":"...","note":"hCaptcha"},
- *                            {"rung":2,"success":true,"steps":5,"model":"...","note":""} ] } ] }
- * rung index: 0=vanilla 1=default-assist 2=proxy+fingerprint 3=advanced-stealth 4=verified
- */
-
-import { readFileSync, writeFileSync, mkdirSync, existsSync } from "node:fs";
-
-const args = process.argv.slice(2);
-const flag = (n: string, d?: string) => { const i = args.indexOf(n); return i >= 0 && args[i + 1] ? args[i + 1] : d; };
-const frictionPath = flag("--friction", "browsability-out/friction.json")!;
-const tasksPath = flag("--tasks");
-const outDir = flag("--out", "browsability-out")!;
-const MAX_RUNG = 4;
-
-if (!existsSync(frictionPath)) { console.error(`no friction file at ${frictionPath} — run scripts/friction.ts first`); process.exit(1); }
-const friction = JSON.parse(readFileSync(frictionPath, "utf8"));
-const url = friction.url;
-const b1 = friction.scores.b1;          // /25
-const b3 = friction.scores.b3;          // /15
-const taxProxy = friction.scores.taxProxy; // /20 fallback
-
-type Run = { rung: number; success: boolean; steps?: number; model?: string; note?: string };
-type Task = { name: string; type?: string; humanBaselineSteps?: number; runs?: Run[] };
-
-let a = 0, c = taxProxy, d = 10, tasks: Task[] = [], haveRuns = false;
-const med = (xs: number[]) => { const s = [...xs].sort((x, y) => x - y); const m = s.length >> 1; return s.length ? (s.length % 2 ? s[m] : Math.round((s[m - 1] + s[m]) / 2)) : 0; };
-
-if (tasksPath && existsSync(tasksPath)) {
-  tasks = (JSON.parse(readFileSync(tasksPath, "utf8")).tasks ?? []) as Task[];
-  const withRuns = tasks.filter((t) => t.runs && t.runs.length);
-  haveRuns = withRuns.length > 0;
-  if (haveRuns) {
-    // A — worst-case minimum passing rung across tasks
-    const rungs = withRuns.map((t) => {
-      const ok = (t.runs || []).filter((r) => r.success).map((r) => r.rung);
-      return ok.length ? Math.min(...ok) : MAX_RUNG + 1; // unsolved even at top
-    });
-    const siteRung = Math.max(...rungs);
-    a = Math.round(30 * (1 - Math.min(siteRung, MAX_RUNG) / MAX_RUNG));
-
-    // C — agent tax = steps over human baseline on the cheapest passing run
-    const taxes: number[] = [];
-    for (const t of withRuns) {
-      const passing = (t.runs || []).filter((r) => r.success && typeof r.steps === "number");
-      if (passing.length && typeof t.humanBaselineSteps === "number") {
-        const best = passing.reduce((a, b) => (a.rung <= b.rung ? a : b));
-        taxes.push(Math.max(0, (best.steps as number) - t.humanBaselineSteps));
-      }
-    }
-    if (taxes.length) c = Math.max(0, Math.round(20 * (1 - Math.min(med(taxes), 8) / 8)));
-
-    // D — recoverability from run notes
-    const allRuns = withRuns.flatMap((t) => t.runs || []);
-    const deadEnds = allRuns.filter((r) => /captcha|modal|shadow|iframe|overlay|consent|cookie|timeout|stuck|self-?heal|blocked/i.test(r.note || "")).length;
-    d = Math.max(0, 10 - deadEnds * 2);
-  }
-}
-
-const total = Math.round(a + b1 + b3 + c + d);
-const grade = (t: number) => t >= 90 ? "A" : t >= 80 ? "B+" : t >= 70 ? "B" : t >= 60 ? "C+" : t >= 50 ? "C" : t >= 35 ? "D" : "F";
-const rungName = ["L0 vanilla", "L1 default-assist", "L2 proxy+fingerprint", "L3 advanced-stealth", "L4 verified"];
-
-mkdirSync(outDir, { recursive: true });
-const report = {
-  url, scoredAt: new Date().toISOString(), total, grade: grade(total),
-  axes: { accessResistance: a, reachability_B1: b1, structural_B3: b3, agentTax_C: c, recoverability_D: d },
-  driveabilityOnly: !haveRuns, tasks,
-};
-writeFileSync(`${outDir}/browsability.json`, JSON.stringify(report, null, 2));
-
-console.log(`\n  Browsability — ${url}`);
-console.log(`  ${"─".repeat(50)}`);
-console.log(`  SCORE  ${total}/100   GRADE ${grade(total)}${haveRuns ? "" : "   (Drivability only — agent ladder not run)"}`);
-console.log(`  A  Access Resistance  ${String(a).padStart(2)}/30${haveRuns ? "" : "   PENDING"}`);
-console.log(`  B1 Reachability       ${String(b1).padStart(2)}/25`);
-console.log(`  B3 Structural traps   ${String(b3).padStart(2)}/15`);
-console.log(`  C  Agent tax          ${String(c).padStart(2)}/20${haveRuns ? "" : "   (proxy from probe)"}`);
-console.log(`  D  Recoverability     ${String(d).padStart(2)}/10${haveRuns ? "" : "   PENDING"}`);
-console.log(`  ${"─".repeat(50)}`);
-console.log(`  → ${outDir}/browsability.json\n`);

From 7440dd021a7b90a4ae1fbf62cbf67e8f62bc350f Mon Sep 17 00:00:00 2001
From: shubh24 <shubhankar24@gmail.com>
Date: Mon, 1 Jun 2026 12:42:02 -0700
Subject: [PATCH 3/5] =?UTF-8?q?docs(browsability):=20fix=20reference=20?=
 =?UTF-8?q?=E2=80=94=20the=20browse=20skill,=20not=20browser?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 skills/browsability/SKILL.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/skills/browsability/SKILL.md b/skills/browsability/SKILL.md
index 7db2e2a3..c9deaa28 100644
--- a/skills/browsability/SKILL.md
+++ b/skills/browsability/SKILL.md
@@ -6,7 +6,7 @@ metadata:
   author: browserbase
   version: "0.2.0"
 allowed-tools: Read Bash Glob Grep Agent
-compatibility: "Uses the browse CLI (`npm install -g @browserbasehq/browse-cli`) via the `browser` skill to look at and drive the site. Remote mode needs BROWSERBASE_API_KEY."
+compatibility: "Uses the browse CLI (`npm install -g @browserbasehq/browse-cli`) via the `browse` skill to look at and drive the site. Remote mode needs BROWSERBASE_API_KEY."
 ---
 
 # Browsability — how usable is a site for a browser agent?
@@ -28,7 +28,7 @@ matters for *this* site. Then report what helps and what hurts.
 
 ## How to assess
 
-1. **Actually try to use the site** with the `browser` skill. Open it, take a `browse snapshot`
+1. **Actually try to use the site** with the `browse` skill. Open it, take a `browse snapshot`
    (the accessibility tree — this is what an agent "sees"), and attempt a real task the site is for:
    find the pricing, create an account, add to cart, submit the contact form. Notice where it's easy
    and where you get stuck.

From 23fed20d381e4dc2f9d91c10a6ba67fb531fb0ef Mon Sep 17 00:00:00 2001
From: shubh24 <shubhankar24@gmail.com>
Date: Mon, 1 Jun 2026 12:55:31 -0700
Subject: [PATCH 4/5] docs(browsability): report helps and hurts as two
 separate tables

A combined two-column table implied that a Helps row and the Hurts row beside
it were related. Split into two standalone tables (Hurts keeps a Fix column).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 skills/browsability/SKILL.md | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/skills/browsability/SKILL.md b/skills/browsability/SKILL.md
index c9deaa28..4173d9fa 100644
--- a/skills/browsability/SKILL.md
+++ b/skills/browsability/SKILL.md
@@ -49,13 +49,26 @@ agent, not an exhaustive audit.
 
 ## How to report
 
-A simple **Helps / Hurts** table. Each "Hurts" row names the concrete fix. Cite what you observed.
+Use **two separate tables** — one for what helps, one for what hurts. Do **not** put them in one
+two-column table: a Helps row and a Hurts row are unrelated, so placing them side by side falsely
+implies they're connected.
 
-| ✅ Helps browsability | ⚠️ Hurts browsability |
+First, what helps:
+
+| ✅ Helps browsability |
+|---|
+| Native `<button>` / `<select>` with clear labels |
+| Loads & acts fine in a vanilla session |
+| Main flow is same-origin |
+
+Then, what hurts — each with its concrete fix:
+
+| ⚠️ Hurts browsability | Fix |
 |---|---|
-| Native `<button>` / `<select>` with clear labels | Signup CTA is an unlabeled `<div>` → agent can't see it. *Fix: make it a `<button>` or add `aria-label`* |
-| Loads & acts fine in a vanilla session | Needs proxy + captcha-solving just to load. *Fix: ease bot-walls on agent-relevant flows* |
-| Main flow is same-origin | Checkout is a cross-origin iframe → fragile. *Fix: same-origin embed or a direct route* |
+| Signup CTA is an unlabeled `<div>` → agent can't see it | make it a `<button>` or add `aria-label` |
+| Needs proxy + captcha-solving just to load | ease bot-walls on agent-relevant flows |
+| Checkout is a cross-origin iframe → fragile | same-origin embed, or a direct route |
 
-Optionally close with one plain-language line ("easy / moderate / hard for a browser agent, because…").
-Do not invent a number. In Slack contexts use mrkdwn (`*bold*`, `•` bullets), not tables.
+Cite what you observed. Optionally close with one plain-language line ("easy / moderate / hard for a
+browser agent, because…"). Do not invent a number. In Slack contexts use mrkdwn (`*bold*`, `•`
+bullets), not tables.

From 4469d3ad82b8b85bf6fd2e4a2fbf9aa4059fef1c Mon Sep 17 00:00:00 2001
From: shubh24 <shubhankar24@gmail.com>
Date: Mon, 1 Jun 2026 13:10:32 -0700
Subject: [PATCH 5/5] feat(browsability): flag remote-browser blocking as a
 browsability failure
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Agents run on remote/cloud browsers, so that's the environment that counts.
If a task works on a local browser but is blocked/errors on a remote one, the
site is gating automated browsers — a major browsability failure, not a
testing caveat. Adds the remote-vs-local test and the chrome-error/bare-domain
diagnostic (an empty remote page is usually a blocked navigation, not a render).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 skills/browsability/SKILL.md             | 22 +++++++++++++++++-----
 skills/browsability/references/rubric.md |  8 ++++++++
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/skills/browsability/SKILL.md b/skills/browsability/SKILL.md
index 4173d9fa..607a5c66 100644
--- a/skills/browsability/SKILL.md
+++ b/skills/browsability/SKILL.md
@@ -22,6 +22,11 @@ This is the *operability* layer — driving the live UI. It is **not** discovera
 `llms.txt`, sitemaps, and SEO/AEO. It is also distinct from docs/SDK onboarding (that's the
 `agent-experience` skill).
 
+> **Agents run on remote/cloud browsers.** So the target environment is a *remote* browser, not your
+> local one. A site that works in a local/residential browser but **blocks or errors on a remote
+> browser is, by definition, not browsable** — an agent literally can't use it. Treat that gap as a
+> top finding, not a footnote.
+
 There is **no scoring formula here.** Look at the site with your own eyes (and the agent's), use the
 checklist in `references/rubric.md` as a guide for what tends to matter, and decide what actually
 matters for *this* site. Then report what helps and what hurts.
@@ -33,11 +38,18 @@ matters for *this* site. Then report what helps and what hurts.
    find the pricing, create an account, add to cart, submit the contact form. Notice where it's easy
    and where you get stuck.
 
-2. **Notice how much help it took to get in.** If a vanilla session sails through, great — that's
-   maximally browsable. If you needed stealth, a proxy, or captcha-solving just to load or act, that
-   counts against the site. (`references/rubric.md` describes this assistance ladder.) Remember
-   `solveCaptchas` is **on by default** — if you want to know whether a site is hostile at the front
-   door, try it with captcha-solving off first.
+2. **Test on a remote browser, and notice how much help it took to get in.** That's the environment
+   agents actually run in. If a vanilla remote session sails through, great — that's maximally
+   browsable. If you needed stealth, a proxy, or captcha-solving just to load or act, that counts
+   against the site. (`references/rubric.md` describes this assistance ladder.) Remember
+   `solveCaptchas` is **on by default** — to see if a site is hostile at the front door, try it with
+   captcha-solving off first.
+
+   **If a task fails on the remote browser, confirm with a local one.** If it works locally but is
+   blocked or errors out remotely, the site is gating cloud/automated browsers — **flag that as a
+   major browsability failure** (it's the whole point: an agent can't use the site). When something
+   comes back empty, always check the final URL — a `chrome-error://…` (or a title that's just the
+   bare domain) means the navigation *failed/was blocked*, not that the page rendered empty.
 
 3. **Watch for the things that trip up browser agents** as you go — read `references/rubric.md` for
    the full checklist, but in short: unlabeled / `<div>`-as-button controls, custom dropdowns,
diff --git a/skills/browsability/references/rubric.md b/skills/browsability/references/rubric.md
index 42b1ed67..f24633e3 100644
--- a/skills/browsability/references/rubric.md
+++ b/skills/browsability/references/rubric.md
@@ -36,6 +36,14 @@ the task work is a mark against the site:
 **Helps:** a plain vanilla headless session can load and act. **Hurts:** the task only works once you
 add stealth, a proxy, or captcha-solving — and the more of those it needs, the worse.
 
+**The remote-vs-local test (the strongest signal here).** Agents run on remote/cloud browsers, so
+that's the environment that counts. If a task **works on a local/residential browser but is blocked
+or errors on a remote one**, the site is gating cloud/automated browsers — that is a *major*
+browsability failure, because a real agent simply cannot use it. Flag it loudly; do not excuse it as
+"we just need a proxy." (Diagnostic tip: when a remote page comes back empty, check the final URL —
+`chrome-error://…` or a title that's only the bare domain means the navigation was *blocked/failed*,
+not that the page rendered empty. Confirm by loading the same URL locally.)
+
 ## 2. Seeing the controls — can the agent perceive what to click?
 
 Browser agents work off an **accessibility tree**, and a control is only visible to the agent if it