README.md (1 addition, 0 deletions)
@@ -19,6 +19,7 @@ This plugin includes the following skills (see `skills/` for details):
| [fetch](skills/fetch/SKILL.md) | Fetch HTML or JSON from static pages without a browser session — inspect status codes, headers, follow redirects |
| [search](skills/search/SKILL.md) | Search the web and return structured results (titles, URLs, metadata) without a browser session |
| [ui-test](skills/ui-test/SKILL.md) | AI-powered adversarial UI testing — analyzes git diffs to test changes, or explores the full app to find bugs |
| [browsability](skills/browsability/SKILL.md) | Assess how usable a website is for an AI browser agent: how much stealth/proxy/captcha help it needs to get in, whether controls are labeled and reachable, whether iframes or shadow DOM trap it, and how many extra steps it takes versus a human. Reports what helps and what hurts, with concrete fixes (no numeric score) |

## Installation

skills/browsability/LICENSE.txt (new file)

MIT License

Copyright (c) 2026 Browserbase, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

skills/browsability/SKILL.md (new file)

---
name: browsability
description: "Assess how usable a website is BY AN AI BROWSER AGENT — its browsability. Look at how little infrastructure help the agent needs to get in (stealth/proxy/captcha), whether it can perceive and drive the live DOM (are controls labeled and reachable, are there iframe/shadow-DOM/deep-DOM traps), and how many extra steps it takes versus a human. Report what helps and what hurts, with concrete fixes — no numeric score. Use when the user asks how browsable / agent-friendly / agent-ready a website or a specific web flow (signup, checkout, search) is for a BROWSER agent, to compare sites on browser-agent usability, or to get a browsability report with fixes. Triggers: 'how browsable is <site>', 'is this site agent-friendly for a browser agent', 'check this checkout/signup flow for agents', 'browser-agent friendliness', 'DOM friction', 'browsability of <url>'. NOT for SEO/AEO or content discoverability (a different layer), and NOT for docs/SDK onboarding DX (use the agent-experience skill for that)."
license: MIT
metadata:
author: browserbase
version: "0.2.0"
allowed-tools: Read Bash Glob Grep Agent
compatibility: "Uses the browse CLI (`npm install -g @browserbasehq/browse-cli`) via the `browse` skill to look at and drive the site. Remote mode needs BROWSERBASE_API_KEY."
---

# Browsability — how usable is a site for a browser agent?

Judge how well an AI **browser** agent can *operate* a website. The idea is simple:

> **Browsability is how little help an agent needs to succeed, and how much harder the site is for
> an agent than for a person.** A checkout that takes a human 10 clicks and the agent 10 clicks is
> fine; a 3-click task that takes the agent 10 because the buttons are unlabeled is not. Those 7
> extra clicks are the agent's problem, not the workflow's.

This is the *operability* layer — driving the live UI. It is **not** discoverability, so ignore
`llms.txt`, sitemaps, and SEO/AEO. It is also distinct from docs/SDK onboarding (that's the
`agent-experience` skill).

> **Agents run on remote/cloud browsers.** So the target environment is a *remote* browser, not your
> local one. A site that works in a local/residential browser but **blocks or errors on a remote
> browser is, by definition, not browsable** — an agent literally can't use it. Treat that gap as a
> top finding, not a footnote.

There is **no scoring formula here.** Look at the site with your own eyes (and the agent's), use the
checklist in `references/rubric.md` as a guide for what tends to matter, and decide what actually
matters for *this* site. Then report what helps and what hurts.

## How to assess

1. **Actually try to use the site** with the `browse` skill. Open it, take a `browse snapshot`
(the accessibility tree — this is what an agent "sees"), and attempt a real task the site is for:
find the pricing, create an account, add to cart, submit the contact form. Notice where it's easy
and where you get stuck.

2. **Test on a remote browser, and notice how much help it took to get in.** That's the environment
agents actually run in. If a vanilla remote session sails through, great — that's maximally
browsable. If you needed stealth, a proxy, or captcha-solving just to load or act, that counts
against the site. (`references/rubric.md` describes this assistance ladder.) Remember
`solveCaptchas` is **on by default** — to see if a site is hostile at the front door, try it with
captcha-solving off first.

**If a task fails on the remote browser, confirm with a local one.** If it works locally but is
blocked or errors out remotely, the site is gating cloud/automated browsers — **flag that as a
major browsability failure** (it's the whole point: an agent can't use the site). When something
comes back empty, always check the final URL — a `chrome-error://…` (or a title that's just the
bare domain) means the navigation *failed/was blocked*, not that the page rendered empty.

3. **Watch for the things that trip up browser agents** as you go — read `references/rubric.md` for
the full checklist, but in short: unlabeled / `<div>`-as-button controls, custom dropdowns,
iframes (especially cross-origin), shadow DOM, very deep or huge DOMs, blocking cookie/consent
walls, and flows that take the agent more steps than a person.

Use judgment over completeness — surface the few things that genuinely make or break this site for an
agent, not an exhaustive audit.

## How to report

Use **two separate tables** — one for what helps, one for what hurts. Do **not** put them in one
two-column table: a Helps row and a Hurts row are unrelated, so placing them side by side falsely
implies they're connected.

First, what helps:

| ✅ Helps browsability |
|---|
| Native `<button>` / `<select>` with clear labels |
| Loads & acts fine in a vanilla session |
| Main flow is same-origin |

Then, what hurts — each with its concrete fix:

| ⚠️ Hurts browsability | Fix |
|---|---|
| Signup CTA is an unlabeled `<div>` → agent can't see it | make it a `<button>` or add `aria-label` |
| Needs proxy + captcha-solving just to load | ease bot-walls on agent-relevant flows |
| Checkout is a cross-origin iframe → fragile | same-origin embed, or a direct route |

Cite what you observed. Optionally close with one plain-language line ("easy / moderate / hard for a
browser agent, because…"). Do not invent a number. In Slack contexts use mrkdwn (`*bold*`, `•`
bullets), not tables.
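
The two-table layout is mechanical enough to sketch. A minimal TypeScript example, assuming findings have already been collected as plain strings; the `Hurt` shape and the `renderMarkdown` / `renderSlack` names are hypothetical, not part of the `browse` CLI. Pipes inside a finding are escaped so a cell cannot split its row, and the Slack variant uses mrkdwn bold and `•` bullets instead of tables.

```typescript
// Hypothetical shape for a collected "hurts" finding; not part of any Browserbase API.
interface Hurt {
  finding: string;
  fix: string;
}

// Escape pipe characters so a cell cannot split a Markdown table row.
function cell(text: string): string {
  return text.replace(/\|/g, "\\|");
}

// Render two separate tables: a Helps row and a Hurts row are unrelated,
// so they are never placed side by side in one table.
function renderMarkdown(helps: string[], hurts: Hurt[]): string {
  const helpsTable = [
    "| ✅ Helps browsability |",
    "|---|",
    ...helps.map((h) => `| ${cell(h)} |`),
  ].join("\n");
  const hurtsTable = [
    "| ⚠️ Hurts browsability | Fix |",
    "|---|---|",
    ...hurts.map((h) => `| ${cell(h.finding)} | ${cell(h.fix)} |`),
  ].join("\n");
  return `${helpsTable}\n\n${hurtsTable}`;
}

// Slack variant: mrkdwn bold headings and bullets instead of tables.
function renderSlack(helps: string[], hurts: Hurt[]): string {
  const lines = ["*Helps browsability*", ...helps.map((h) => `• ${h}`)];
  lines.push("", "*Hurts browsability*");
  lines.push(...hurts.map((h) => `• ${h.finding} (fix: ${h.fix})`));
  return lines.join("\n");
}

const report = renderMarkdown(
  ["Native `<button>` / `<select>` with clear labels"],
  [{ finding: "Checkout is a cross-origin iframe", fix: "same-origin embed, or a direct route" }],
);
console.log(report);
```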

skills/browsability/references/rubric.md (new file)

# What makes a site browsable for a browser agent

A checklist of what tends to help or hurt an AI **browser** agent trying to operate a website.
Grounded in what the open-source [Stagehand](https://github.com/browserbase/stagehand) framework
treats as hard, plus the public Browserbase session settings.

**Use this as a guide, not a rule book.** There is no scoring formula. Look at the site, try the
task, and decide what actually matters for *this* site — then report what helps and what hurts.

## The idea

**Browsability is how little help an agent needs to succeed, and how much harder the site is for an
agent than for a person.** Only the *agent-specific* friction counts: a long workflow that's long for
humans too isn't a browsability problem; a simple task made hard by unlabeled controls is. This is
*operability* (driving the UI), not *discoverability* (being found/cited — that's SEO/AEO, out of
scope).

When you see extra steps, ask: *would a human also need this step?* If yes, it's the workflow (don't
count it). If no — e.g. the agent had to click open a custom dropdown that a person reads at a glance
— that's the agent tax, and it hurts browsability.

---

## 1. Getting in — how much help did the agent need?

Re-frame "how protected is this site" as a ladder of assistance. The less help needed, the more
browsable. Browserbase exposes these public session settings; each one you have to switch on to make
the task work is a mark against the site:

- `solveCaptchas` — CAPTCHA challenges (**on by default**, so test with it off to see front-door hostility)
- `proxies` — IP blocks, rate limits, geo-gating
- `fingerprint` — headless-browser fingerprint detection
- `advancedStealth` — advanced anti-bot detection
- `context` (persist) — re-auth / re-consent walls

**Helps:** a plain vanilla headless session can load and act. **Hurts:** the task only works once you
add stealth, a proxy, or captcha-solving — and the more of those it needs, the worse.
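
The ladder can be sketched as an escalation loop: start from a vanilla session, switch on one setting at a time until the task works, then drop any earlier setting that turns out to be unnecessary. A minimal TypeScript sketch under stated assumptions: `tryTask` is a hypothetical callback standing in for "create a remote session with these settings and attempt the task", the flat set of flags is illustrative rather than the real Browserbase request shape, the rung order is a guess, and tasks are assumed to pass or fail deterministically.

```typescript
// Illustrative names for the Browserbase settings listed above. This flat set of
// flags is an assumption, not the real session-create request body.
type Assist = "solveCaptchas" | "proxies" | "fingerprint" | "advancedStealth" | "context";

// Assumed escalation order, one rung at a time.
const LADDER: Assist[] = ["solveCaptchas", "proxies", "fingerprint", "advancedStealth", "context"];

// Hypothetical: create a remote session with exactly these aids on and attempt the task.
type TryTask = (enabled: Set<Assist>) => Promise<boolean>;

// Returns the aids the task could not do without ([] means a vanilla session worked),
// or null if it failed even with every rung switched on.
async function minimalAssistance(tryTask: TryTask): Promise<Assist[] | null> {
  // Vanilla run: every aid off, including captcha-solving (which is on by default).
  const enabled = new Set<Assist>();
  if (await tryTask(new Set(enabled))) return [];

  for (const aid of LADDER) {
    enabled.add(aid);
    if (!(await tryTask(new Set(enabled)))) continue;

    // It worked: drop any earlier rung the task turns out not to need.
    for (const earlier of Array.from(enabled)) {
      if (earlier === aid) continue; // without `aid` the task already failed
      enabled.delete(earlier);
      if (!(await tryTask(new Set(enabled)))) enabled.add(earlier);
    }
    return LADDER.filter((a) => enabled.has(a));
  }
  return null;
}
```

Every aid in a non-empty result is a mark against the site; `null` means even full assistance was not enough.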

**The remote-vs-local test (the strongest signal here).** Agents run on remote/cloud browsers, so
that's the environment that counts. If a task **works on a local/residential browser but is blocked
or errors on a remote one**, the site is gating cloud/automated browsers — that is a *major*
browsability failure, because a real agent simply cannot use it. Flag it loudly; do not excuse it as
"we just need a proxy." (Diagnostic tip: when a remote page comes back empty, check the final URL —
`chrome-error://…` or a title that's only the bare domain means the navigation was *blocked/failed*,
not that the page rendered empty. Confirm by loading the same URL locally.)

## 2. Seeing the controls — can the agent perceive what to click?

Browser agents work off an **accessibility tree**, and a control is only visible to the agent if it
has an accessible name, named children, or a real semantic role. An unlabeled `<div role="generic">`
button is dropped before the model ever sees it — effectively invisible.

- **Helps:** native `<button>`, `<a href>`, `<input>`, `<select>` with real text or labels; inputs tied to a `<label>`.
- **Hurts:** icon-only buttons with no `aria-label`; `<div onclick>` "buttons"; inputs with no label; controls hidden inside closed shadow DOM.
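
The visibility rule can be expressed as a filter over a simplified accessibility tree. A minimal sketch: the `AXNode` shape is hypothetical and far smaller than a real browser accessibility node, the set of non-semantic roles is an assumption, and hoisting the children of dropped nodes is one plausible pruning strategy rather than Stagehand's exact algorithm. It shows why an unlabeled `<div>` with no named children never reaches the model, while a named button does.

```typescript
// Simplified, hypothetical accessibility node.
interface AXNode {
  role: string; // e.g. "button", "link", "generic"
  name?: string; // accessible name: text content, aria-label, <label>, ...
  children?: AXNode[];
}

// Roles that carry no semantics of their own (assumed list).
const NON_SEMANTIC = new Set(["generic", "none", "presentation"]);

function hasName(node: AXNode): boolean {
  return (node.name ?? "").trim().length > 0;
}

// Keep a node if it has an accessible name, a named child, or a real semantic role.
function isKept(node: AXNode): boolean {
  if (hasName(node)) return true;
  if ((node.children ?? []).some(hasName)) return true;
  return !NON_SEMANTIC.has(node.role);
}

// Return a pruned copy of the tree; kept descendants of a dropped node are hoisted up.
function prune(node: AXNode): AXNode[] {
  const kids = (node.children ?? []).flatMap(prune);
  if (!isKept(node)) return kids;
  return [{ ...node, children: kids }];
}
```

Note that an icon-only `<button>` survives this filter (it has a semantic role) but arrives with no name, so the model still cannot tell what it does.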

## 3. Structural traps

Hard walls that browser agents struggle with regardless of labeling:

- **Cross-origin iframes** — separately managed frames that can detach mid-action, which makes any flow running through them fragile.
- **Shadow DOM** — closed roots are opaque to the agent.
- **Very deep DOM (hundreds of levels)** — forces slower, shallower page reads.
- **Very large DOM** — the accessibility snapshot can get truncated; elements past the cap vanish.
- **Never-settling pages** — constant streaming/polling means the page never looks "done loading," so the agent waits out a timeout on every step.
- **Virtualized / infinite lists** — items outside the rendered window are not in the DOM yet, so there is no "scroll until found"; the agent has to scroll and re-read in a loop.
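
Why elements "vanish" from a very large DOM can be shown with a toy serializer that walks the tree depth-first and stops at a fixed node budget. The `PageNode` shape and the cap of 200 are hypothetical; real snapshot limits vary by tool. In the example, 500 product cards exhaust the budget before the checkout button is reached, so it is simply absent from what the model sees.

```typescript
// Hypothetical DOM-like node for illustration.
interface PageNode {
  label: string;
  children: PageNode[];
}

// Serialize depth-first, keeping at most `maxNodes` nodes (assumed budget).
// Returns the labels that made it into the snapshot, in document order.
function snapshot(root: PageNode, maxNodes: number): string[] {
  const out: string[] = [];
  const stack: PageNode[] = [root];
  while (stack.length > 0 && out.length < maxNodes) {
    const node = stack.pop()!;
    out.push(node.label);
    // Push children in reverse so they are visited in document order.
    for (let i = node.children.length - 1; i >= 0; i--) stack.push(node.children[i]);
  }
  return out;
}

// A page with 500 product cards followed by the checkout button.
const cards: PageNode[] = Array.from({ length: 500 }, (_, i) => ({ label: `card ${i}`, children: [] }));
const shopPage: PageNode = { label: "body", children: [...cards, { label: "Checkout", children: [] }] };
console.log(snapshot(shopPage, 200).includes("Checkout")); // → false
```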

## 4. Extra steps the agent pays (but a human doesn't)

- **Custom dropdowns vs native `<select>`** — a native select is one action; a custom dropdown makes the agent click to open, re-read the page, then pick — two+ actions. Multiply across a form and it adds up.
- **Needless modals / multi-step wizards** that a human clicks through without thinking but the agent must navigate explicitly.
- Count only the steps *beyond* what a person would need.
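
The counting rule can be made explicit: log each action the agent took, mark whether a person would also have needed it, and report only the remainder. A minimal sketch with a hypothetical `Step` record; it follows the cost model above (a native select is one action, a custom dropdown is open plus pick, and the re-read in between is not counted as an action), and treating the open click as agent-only follows the earlier "a person reads it at a glance" framing.

```typescript
// One action the agent performed, and whether a human would also need it.
interface Step {
  action: string;
  humanNeedsIt: boolean;
}

// Expand a form field into agent actions. A native <select> is one action;
// a custom dropdown is open + pick (the re-read between them is not an action).
function fieldSteps(field: string, kind: "native" | "custom"): Step[] {
  if (kind === "native") return [{ action: `select ${field}`, humanNeedsIt: true }];
  return [
    { action: `open ${field} dropdown`, humanNeedsIt: false }, // a person reads it at a glance
    { action: `pick ${field} option`, humanNeedsIt: true },
  ];
}

// The agent tax: steps beyond what a person would need.
function agentTax(steps: Step[]): number {
  return steps.filter((s) => !s.humanNeedsIt).length;
}

// A shipping form with three custom dropdowns and a submit button.
const form: Step[] = [
  ...fieldSteps("country", "custom"),
  ...fieldSteps("state", "custom"),
  ...fieldSteps("shipping speed", "custom"),
  { action: "click submit", humanNeedsIt: true },
];
console.log(`${form.length} agent actions, ${agentTax(form)} of them agent tax`);
// → "7 agent actions, 3 of them agent tax"
```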

## 5. When things break — can the agent recover?

- **Blocking overlays** — cookie/consent walls, login walls, paywalls that aren't dismissed automatically and sit on top of the flow.
- **Unstable DOM** — elements that move or re-render between looking and clicking, forcing the agent to re-find them (a sign of a hostile, racy page).
- **Slow / hanging navigation** — pages that exceed load timeouts.
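
The recovery pattern for an unstable DOM is "look, act, and if the target went stale, look again", up to a fixed budget. A minimal sketch in which everything is hypothetical: `locate` stands in for taking a fresh snapshot and finding the element, `act` for the click, and `StaleElementError` for whatever your driver raises when an element re-renders between the two. The returned attempt count makes the cost visible: every retry beyond the first is an extra step the agent pays and a person does not.

```typescript
// Hypothetical error a driver might raise when a located element re-rendered.
class StaleElementError extends Error {
  constructor(message = "element is stale") {
    super(message);
    this.name = "StaleElementError";
    Object.setPrototypeOf(this, StaleElementError.prototype); // keep instanceof working on ES5 targets
  }
}

// Locate a fresh handle, act on it, and re-locate if it went stale in between.
async function actWithRefind<H>(
  locate: () => Promise<H>,
  act: (handle: H) => Promise<void>,
  maxAttempts = 3,
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const handle = await locate(); // re-read the page on every attempt
    try {
      await act(handle);
      return attempt; // attempts used; anything above 1 is friction
    } catch (err) {
      if (!(err instanceof StaleElementError)) throw err; // only retry stale targets
    }
  }
  throw new Error(`element kept re-rendering after ${maxAttempts} attempts`);
}
```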

---

## Turning findings into fixes

| Finding | Fix |
|---|---|
| Unlabeled / `<div>`-as-button controls | use semantic `<button>` / `<a>`, or add `aria-label` |
| Many custom dropdowns | use native `<select>` where possible |
| Cross-origin iframe in the flow | same-origin embed, or a direct route |
| Closed shadow DOM | open shadow roots, or expose semantic fallbacks |
| Deep / very large DOM | flatten nesting, paginate, reduce node count |
| Needs heavy stealth/proxy/captcha to work | reduce hostile bot-walls on agent-relevant flows |
| More steps than a human needs | collapse the funnel; remove needless modal steps |
| UI-only with no agent path | (the ultimate fix) offer an API or deep links for agents so they needn't drive the UI at all |