Skip to content

Harden & modernize the brainstorming visual companion (auth, lifecycle, reconnect, just-in-time offer)#1720

Open
obra wants to merge 17 commits into
devfrom
brainstorming-companion
Open

Harden & modernize the brainstorming visual companion (auth, lifecycle, reconnect, just-in-time offer)#1720
obra wants to merge 17 commits into
devfrom
brainstorming-companion

Conversation

@obra

@obra obra commented Jun 10, 2026

Copy link
Copy Markdown
Owner

Who is submitting this PR? (required)

Field Value
Your model + version Claude Opus 4.8 (claude-opus-4-8[1m])
Harness + version Claude Code 2.1.170
All plugins installed superpowers@claude-plugins-official (the plugin modified here); plus agent-sdk-dev, elements-of-style, episodic-memory, frontend-design, plugin-dev, superpowers-developing-for-claude-code, superpowers-lab, code-simplifier, claude-code-setup, primeradiant-ops, superpowers-chrome, claude-session-driver, github-triage, summarize-meetings, release-radar, linear, context7, mcp-server-dev, worldview-synthesis, {gopls,rust-analyzer,swift}-lsp
Human partner who reviewed this diff Jesse Vincent (@obra) — maintainer; directed the work and reviewed it throughout, including two adversarial-review rounds
Follow-up model + version Codex (GPT-5)
Follow-up harness + version Codex desktop/API coding-agent environment (harness version not exposed)
Follow-up plugins used GitHub, Codex Security, Browser, Linear, Prime Radiant Ops

What problem are you trying to solve?

The brainstorming visual companion (skills/brainstorming/scripts/) had a cluster of real, separately-reported problems. Grounded in a working session (an apparent reload failure that root-caused to a wedged fseventsd on the dev machine — not a companion bug) plus a triage of open issues/PRs:

What does this PR change?

Hardens and modernizes the visual-companion server and skill: per-session secret-key auth on every endpoint (+ cookie bootstrap, friendly 403, null-payload guard); resource-fork dotfile filtering; ownership-checked stop-server.sh; a 4h configurable idle timeout that closes WebSockets cleanly on shutdown; exponential-backoff reconnect with a live status indicator and a "paused" overlay; same-port + same-key restart so an already-open tab reconnects on its own; opt-in --open browser launch on the first screen; and offering the companion just-in-time (the first time a visual question arises) instead of upfront.

Is this change appropriate for the core library?

Yes. The visual companion is core brainstorming infrastructure shipped with superpowers — every user who accepts the companion benefits, regardless of project type. No new dependencies (the server stays zero-dep), no third-party services. The security fix protects every companion session.

What alternatives did you consider?

  • Security — Host/Origin allowlist (the brainstorm-server: validate websocket origin before upgrade #1110 / fix(brainstorm-server): validate Host header to defeat DNS rebinding #1553 approach). Rejected: it only defends the loopback browser-confused-deputy; a direct client on a --host 0.0.0.0 bind just sends the expected Host, so the allowlist is theater for remote use. A per-session secret authenticates the real client uniformly across loopback, SSH tunnel, and remote binds, and defeats DNS rebinding.
  • Reload reliability — replace fs.watch with polling. Built and tested, then dropped after root-causing the symptom to a wedged fseventsd daemon on the dev machine (a 100%-CPU spin starving FSEvents). fs.watch was fine; the OS daemon needed a kick. No reason to carry polling.
  • Vendoring Alpine.js for interactive mockups (feat(brainstorming): Alpine-driven interactive visual companion (SUP-215) #1639): out of scope; not pursued (keeps the runtime zero-dep).
  • Consent timing — keep the upfront offer. Rejected via eval (see Evaluation) — upfront pitches the companion before the problem is understood.

Does this PR contain multiple unrelated changes?

They are all changes to one subsystem — the brainstorming visual companion (server.cjs, helper.js, frame-template.html, start-server.sh, stop-server.sh, and the companion skill docs) — developed and consolidated as one coherent overhaul. The branch is structured as focused commits, each with its own tests, so it can be split into per-issue PRs if you'd prefer that over a single subsystem PR.

Existing PRs

Environment tested

Harness Harness version Model Model version/ID
Claude Code 2.1.170 Claude Opus claude-opus-4-8 (1M context)
Codex desktop/API version not exposed Codex GPT-5

Original Claude pass: macOS (Darwin 25.2.0), Node v26.0.0.

Follow-up Codex pass: macOS (Darwin 25.5.0 arm64), Node v24.14.0; Windows on ballmer via Git Bash / MINGW64_NT-10.0-26200, Node v26.2.0.

New harness support (required if this PR adds a new harness)

N/A — this PR does not add a new harness.

Evaluation

The one behavior-shaping change (offering the companion just-in-time, in SKILL.md) was eval'd with the evals/ drill harness. A new scenario — brainstorming-companion-just-in-time (pushed to superpowers-evals as add-companion-eval-scenario) — asks the agent to design a dashboard and judges when the companion is offered:

  • RED (dev, upfront-offer text): 0/4 pass — the agent offers the companion as its "very next action," before any clarifying questions.
  • GREEN (this branch, just-in-time text): 4/4 pass — the agent explores and clarifies first; never offers prematurely.
  • The brainstorming skill triggered correctly in all runs, both arms — the change moves the offer without regressing triggering.

Server/client behavior: 114 unit + integration tests on macOS (ws-protocol 32, helper 15, auth 20, server 31, lifecycle 10, start-server 2, stop-server 4), all green. Follow-up Windows Git Bash pass on ballmer: 113 runnable tests green in tests/brainstorm-server (same suite; the lsof-only impostor check is skipped because Git Bash lacks lsof), plus windows-lifecycle.test.sh 12/12 over two 75-second lifecycle windows. Static checks also passed: git diff --check, node --check for server/helper, and shell lint. The branch also survived two rounds of adversarial review (two independent reviewers per round). Both rounds surfaced real bugs that were then fixed with teeth-verified regression tests — including a same-port-restart token mismatch and an uncaught /files/ EISDIR crash that the first test pass had missed.

Rigor

  • Skills change: used eval evidence (drill RED/GREEN above) for the one behavior-shaping change.
  • Tested adversarially, not just the happy path (two adversarial review rounds; each finding fixed with a regression test verified to fail on the pre-fix code).
  • Did not modify carefully-tuned content (Red Flags tables, rationalization lists, "human partner" language). The SKILL.md change only moves when the companion offer happens — it does not touch the Red Flags / rationalization content.

Human review

  • A human has reviewed the COMPLETE proposed diff before submission — Jesse Vincent (@obra), maintainer.

/cc @arittr — requesting your review.

obra added 13 commits June 9, 2026 15:02
On macOS (and ExFAT/SMB volumes) the OS writes ._<name>.html sidecar files
holding binary resource-fork metadata. These end with .html, so they passed the
content filter and could be picked as the newest screen — serving binary garbage
to the browser instead of the mockup — or fetched via /files/.

Skip dotfiles (leading '.') at all four sites that list or serve content:
getNewestScreen, the /files/ endpoint, the known-files seed, and the fs.watch
handler. Tests cover serving (/ and /files/) and the watch path (a ._ file must
not trigger a reload).

Refs #950
stop-server.sh read server.pid and SIGKILL'd that PID with no checks. After a
reboot or PID wraparound the pid file can point at an unrelated, live process —
which we would then kill.

Verify the PID is actually our server (a running 'node ... server.cjs') before
signalling it. If ownership can't be proven, fail closed: remove the stale pid
file and report {status: stale_pid} without killing anything. Real servers still
stop ({status: stopped}); a missing pid file still reports not_running.

Adds stop-server.test.sh covering: an unrelated reused PID is left alone, a real
server is stopped, and a missing pid file.

Refs #1703
…utdown

The companion shut down after only 30 minutes idle — too short for real
brainstorming, where a single question can sit far longer. And shutdown() never
closed upgraded WebSocket sockets, so an open browser connection could keep the
Node process alive after it was supposed to exit.

- Default idle timeout raised to 4 hours, configurable via BRAINSTORM_IDLE_TIMEOUT_MS
  and start-server.sh --idle-timeout-minutes (validated positive integer).
- Reported as idle_timeout_ms in the server-started JSON / server-info.
- shutdown() now destroys all client sockets so the process exits even with an
  open WebSocket.
- Watchdog check interval is configurable (BRAINSTORM_LIFECYCLE_CHECK_MS, default
  60s) so the lifecycle can be tested without minute-long waits.

Adds lifecycle.test.js (configured timeout reported; idle shutdown exits despite
an open WS — teeth-verified; the start-server flag). Wires ws-protocol,
lifecycle, and stop-server suites into npm test.

Closes #1237
Refs #1689
…overlay

The injected client reconnected on a fixed 1s timer with no feedback: if the
laptop slept or the server restarted, the page showed 'Connected' over a dead
socket and silently queued events. And when the server stopped, the user got a
bare connection-refused with no explanation.

helper.js now:
- reconnects with exponential backoff (500ms, doubling, capped at 30s; reset on
  open), with an onerror->close handler, nulls the socket on close, and clears a
  pending timer before scheduling another;
- drives the frame status pill Connected/Reconnecting/Disconnected via a
  --status-color custom property (frame-template.html);
- after ~15s disconnected, shows a self-styled 'Companion paused' overlay
  (tombstone) explaining the companion stopped and will reconnect automatically;
- on recovery from a tombstoned outage (e.g. server restarted on the same port)
  reloads to pick up the restarted server's current screen.

The reconnect-backoff is an exported pure function; helper.test.js unit-tests it
(doubling + cap progression) and asserts the status/tombstone/reconnect wiring.
DOM behaviour is verified live.

Refs #856, #1237
When the companion idle-shuts-down and the agent restarts it, a fresh random
port meant the user's open browser tab pointed at a dead URL. Persist the bound
port per project and prefer it on the next start, so the restarted server comes
up on the same port and the open tab's reconnect just works.

- start-server.sh exports BRAINSTORM_PORT_FILE=<project>/.superpowers/brainstorm/
  .last-port for project sessions (not /tmp).
- server.cjs prefers an explicit BRAINSTORM_PORT, else the recorded port, else
  random; writes the actually-bound port back; and on EADDRINUSE (preferred port
  still in use) falls back to a random port once instead of crashing.

lifecycle.test.js: restart reuses the recorded port; a taken preferred port
falls back to a random one without crashing.

Refs #1237
… screen

When the user approves the visual companion, open their browser automatically the
first time a screen is actually ready to show — rather than at startup (just the
waiting page) or making them open the URL by hand.

Opt-in and gated on approval: off unless BRAINSTORM_OPEN is set (start-server.sh
--open, which the agent passes only after the user agrees to use the companion).
Even then it fires once, and is skipped if a browser is already connected, on a
non-loopback/remote bind, or when headless. Launcher is the platform default
(open / xdg-open / WSL cmd.exe) or BRAINSTORM_OPEN_CMD; best-effort, never fatal.

lifecycle.test.js: opens once on the first screen when approved; does NOT open
without approval.

Closes #755
Refs #759
…lifecycle guidance

Move the companion consent from an upfront, anticipatory offer to the first
moment a question would genuinely be clearer shown than told. If no visual
question ever arises, it's never offered. On approval the agent starts the
server with --open, so the user's browser opens to the first screen — the pop is
tied to that approval, never unsolicited.

Also hardens visual-companion.md: confirming the server is alive (server-info
present, server-stopped absent) before referring to the URL is now a required
step; restart with the same --project-dir reuses the port so the open tab
reconnects on its own (paused overlay while down); idle default corrected to 4h.

NOTE: SKILL.md is behavior-shaping content — this flow change should be
eval-tested (writing-skills adversarial pressure test) before merge.

Refs #1237, #1037
From a two-reviewer adversarial pass:

- [High] EADDRINUSE fallback clobbered the shared .last-port: onListen wrote the
  bound port unconditionally, so a fallback to a random port overwrote the
  preferred port another live session still owns — stranding that session's open
  tab forever. Now persist only when we bound the preferred port (not on
  fallback). The fallback test now asserts .last-port integrity (teeth-verified).

- [Medium] maybeOpenBrowser ran the URL through a shell (exec + JSON.stringify),
  which does NOT neutralize $(...) in a url-host. Platform launchers now use
  execFile with the URL as an argv element (no shell). The operator-set
  BRAINSTORM_OPEN_CMD path stays shell-based (trusted input).

- [Medium] --open was a silent no-op on native Windows (no win32 branch). Added.

- [Medium] helper.js reconnect/status/tombstone had only substring-grep tests.
  Added behavioral tests driving the state machine against a mocked browser:
  Reconnecting+backoff (500->1000->2000), tombstone after the grace period, and
  reload-on-recovery.

- [Low] status pill showed a false 'Connected' before the socket opened; now
  starts 'Connecting…' until onopen.

Not changed (flagged): stop-server.sh's PID-ownership check still matches any
'node ... server.cjs' (narrow residual — a recycled PID onto an unrelated node
server.cjs); robust fix needs fragile cross-platform process introspection.
The node+server.cjs command match (from the adversarial review) still matched any
unrelated node process running a file named server.cjs. When we recorded the
bound port (state/server-info) and lsof is available, additionally require the
PID to be the process actually LISTENING on this session's port — which rules out
a different project's server.cjs / editor task runner that recycled the stale
PID. Falls back to the command match when the port or lsof isn't available.

Test: a 'node server.cjs' process not listening on the recorded port is spared.

Refs #1703
… for security

Records the triage of open issues/PRs touching the brainstorm companion server
and the decision to protect it with a per-session secret key (supersedes the
Host/Origin allowlist approach) so remote-connected users are covered, not just
loopback.
The companion server is reachable by any local browser tab (default loopback
bind) and by any host that can route to it (remote --host bind). It served
screens, files, and accepted event-injecting WebSocket connections with no
authentication, so a malicious browser tab or a direct remote client could read
brainstorm content or inject events that the agent reads as the user's input
(prompt injection into a live session).

Generate a per-session secret token, carry it in the served URL as ?key=, and
mirror it into an HttpOnly SameSite=Strict per-port cookie on first load so
same-origin subresources and the WebSocket handshake authenticate automatically.
Every HTTP request and WebSocket upgrade now requires a valid key (query or
cookie, constant-time compared); unauthenticated requests get a friendly 403
explaining they need the full URL. A secret authenticates the client uniformly
across loopback, tunnel, and remote binds and defeats DNS rebinding, which a
Host/Origin allowlist cannot.

Also guard handleMessage against a null JSON payload that crashed the process.

Tests: new auth.test.js (13 cases) covering the key on /, /files/*, and WS plus
cookie bootstrap and the null-payload guard; server.test.js threads the key;
ws-protocol.test.js + auth.test.js wired into npm test.

Closes #1014
Refs #1110, #1553, #1504
…merge

Integrating the per-session-key auth onto the same branch as the dotfile and
lifecycle work: two tests added after the auth commit opened WebSockets without a
key (server.test.js dotfile-reload, lifecycle.test.js idle-shutdown), which the
auth gate now resets. Pass ?key=/BRAINSTORM_TOKEN in both. Full suite green:
ws-protocol 32, helper 12, auth 13, server 28, lifecycle 7, stop-server 4.
…view

A second adversarial review of the merged branch found that combining the
session-key auth with the feature work created real bugs the (vacuous) tests
missed:

- [Critical] GET /files/ (empty name) resolved to CONTENT_DIR and crashed the
  process with uncaught EISDIR — newly reachable because the query-stripping
  refactor turns /files/?key=... into /files/. Reject non-regular-file names.
- [High] --open opened a KEYLESS url, which the auth gate 403s — the headline
  feature landed on the error page. Open the keyed url.
- [High] Same-port restart regenerated the token (port persisted, token not), so
  the open tab's old cookie 403'd and never reconnected — contradicting the
  documented promise. Persist the token (BRAINSTORM_TOKEN_FILE / .last-token)
  alongside the port.
- [Medium] Token sat in world-readable server-info/server.log (0644 in /tmp).
  umask 077 in start-server.sh + mode 0600 on server-info/.last-token.
- [Medium] touchActivity() ran before the auth check, so unauthenticated requests
  defeated the idle timeout. Count activity only after authorization.
- [Low] COOKIE_NAME embedded the pre-fallback port; derive it from the actual
  bound port (also prevents a cross-server cookie-jar collision on fallback).

Tests added/strengthened (previously passed vacuously): /files/ no-crash; the
auto-open url carries the key and is reachable (200); restart reuses the same key
not just the port; unauthenticated requests don't reset the idle clock.
Full suite green (ws-protocol 32, helper 12, auth 13, server 29, lifecycle 8,
stop-server 4); restart smoke confirms same port+key and old URL -> 200.
Bump the evals submodule (ff3ee83 -> f1ac859) to include the brainstorming
visual-companion just-in-time eval scenario that validates the SKILL.md
consent-move in this PR (GREEN pass / RED fail on Quorum and drill).

Scope: dev's recorded pointer predates the drill->Quorum migration, so this
bump also carries that migration.
@obra

obra commented Jun 10, 2026

Copy link
Copy Markdown
Owner Author

Added the evals submodule bump (ff3ee83f1ac859, commit ff50f01) to this PR per maintainer request — it includes the brainstorming-companion just-in-time eval scenario (validated GREEN pass / RED fail on Quorum and drill). ⚠️ Note: dev's pointer predated the drill→Quorum migration, so the bump also carries that migration — worth a look when reviewing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants