
WIP - burst-100k benchmark mvp#118

Open: kisernl wants to merge 28 commits into master from one-hundred-k-mvp

@kisernl kisernl commented May 13, 2026

This pull request implements the infrastructure and code to support the "100k burst" benchmark, enabling large-scale concurrent benchmarking of compute providers. It introduces a new database schema, coordinator logic, provider configuration, launch automation, and supporting scripts and dependencies. The changes are grouped below by theme.

Database and Schema Setup:

  • Added db/burst-100k.sql, defining the runs and sandbox_results tables and associated indexes for tracking benchmark runs and their results. The schema is idempotent and optimized for status and stuck-run queries.
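
As a rough illustration of the stuck-run query this schema is meant to support, the check can be written as a predicate over a run row. This is a sketch rather than the repository's code: the `RunRow` shape, the `isStuck` name, and the five-minute threshold are assumptions, and only `status` and `last_heartbeat` are named in this PR. It treats a missing heartbeat as stale, which is the behaviour the review thread on `db/burst-100k.sql` asks for.

```typescript
// Hypothetical row shape; only `status` and `last_heartbeat` are named in this PR.
interface RunRow {
  status: string;
  last_heartbeat: Date | null;
}

// Assumed threshold for illustration; the real cut-off is not stated in the PR.
const STALE_AFTER_MS = 5 * 60 * 1000;

// A run counts as stuck when it is still 'running' and its heartbeat is
// either missing or older than the threshold.
function isStuck(run: RunRow, now: Date, staleAfterMs: number = STALE_AFTER_MS): boolean {
  if (run.status !== "running") return false;
  if (run.last_heartbeat === null) return true;
  return now.getTime() - run.last_heartbeat.getTime() > staleAfterMs;
}
```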

Coordinator and Benchmark Logic:

  • Added src/burst-100k/coordinator.ts, the main coordinator that orchestrates the benchmark run: validates environment, manages heartbeats, coordinates results writing to Postgres and R2, handles shutdown, and computes final statistics.
  • Added src/burst-100k/providers.ts, defining the opt-in provider(s) (initially "e2b") and their configuration, including environment requirements and concurrency targets.
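
The concurrency target can be overridden through `CONCURRENCY_TARGET`, and the review on `coordinator.ts` points out that an unvalidated `parseInt` can yield `NaN`. A minimal sketch of a validated override follows; the function name `resolveConcurrencyTarget` and the choice to throw on bad input (rather than fall back) are assumptions, not the PR's implementation.

```typescript
// Resolve an optional CONCURRENCY_TARGET override.
// Returns `fallback` when the variable is unset or blank, and throws on
// anything that is not a plain positive integer (e.g. "abc", "10x", "0").
function resolveConcurrencyTarget(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const trimmed = raw.trim();
  const parsed = Number.parseInt(trimmed, 10);
  // parseInt accepts trailing garbage ("10x" -> 10), so compare the round trip.
  if (!Number.isInteger(parsed) || parsed <= 0 || String(parsed) !== trimmed) {
    throw new Error(`CONCURRENCY_TARGET must be a positive integer, got "${raw}"`);
  }
  return parsed;
}
```

A caller would pass `process.env.CONCURRENCY_TARGET` as `raw` and the provider's configured `concurrencyTarget` as `fallback`.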

Automation and Tooling:

  • Added scripts/burst-100k-launch.sh, a shell script that provisions a Namespace VM, uploads the coordinator bundle, applies the schema, and launches the benchmark with the required environment variables and error handling.
  • Added a comprehensive implementation checklist in one-hundred-k-mvp-checklist.md to track progress and document lessons learned.

Dependency and Build Updates:

  • Updated package.json to add dependencies required for the benchmark (pg, p-limit, esbuild, @types/pg) and new scripts for bundling and running the coordinator locally.

Database and Schema:

  • Introduced db/burst-100k.sql with runs and sandbox_results tables, supporting indexes, and idempotent schema evolution for tracking large-scale benchmark runs.

Coordinator and Providers:

  • Implemented src/burst-100k/coordinator.ts to manage the lifecycle of a benchmark run, including environment validation, progress tracking, error handling, and finalization.
  • Defined provider opt-in and configuration in src/burst-100k/providers.ts, starting with the "e2b" provider and supporting extensibility.

Automation and Documentation:

  • Added scripts/burst-100k-launch.sh to automate VM provisioning, schema setup, bundle upload, and detached benchmark execution on Namespace.
  • Created one-hundred-k-mvp-checklist.md to document implementation steps, operational notes, and onboarding for additional providers.

Build and Dependency Management:

  • Updated package.json with new dependencies and scripts for building and running the burst-100k coordinator.

Summary by CodeRabbit

  • New Features

    • Added a 100k concurrent sandbox burst benchmark system for performance testing across providers.
  • Documentation

    • Added planning guide, implementation checklist, and data inventory documentation for the benchmark.
  • Chores

    • Added npm build scripts and dependencies (p-limit, pg, esbuild).
    • Added database schema and launch script for benchmark orchestration.
    • Updated .gitignore to exclude build artifacts.


Note

Medium Risk
Introduces new long-running benchmark orchestration that provisions remote VMs, writes to Postgres, and streams artifacts to S3/Tigris; errors could impact benchmark reliability and data integrity but changes are largely additive and isolated from existing benchmarks.

Overview
Adds a new opt-in 100k “burst” benchmark path under src/burst-100k/, including a coordinator, provider registry, structured logging, and a ramped concurrency runner that records per-sandbox results (including first_command_ms and provider metadata).

Introduces durable persistence and analytics outputs: an idempotent Postgres schema in db/burst-100k.sql (runs + sandbox_results, heartbeat + stuck-run index) plus a Tigris sink that streams raw.jsonl via multipart upload and periodically overwrites heartbeat.json, metrics.jsonl, coordinator.log, and a rich meta.json summary.

Adds scripts/burst-100k-launch.sh to bundle and launch the coordinator on a fresh Namespace --bare VM (installs node, uploads a self-deleting startup script to safely pass env/secrets, inserts the runs row before handoff, and verifies the process started). Also updates tooling (esbuild, pg, p-limit) and ignores dist/.

Reviewed by Cursor Bugbot for commit b261e82. Bugbot is set up for automated code reviews on this repo.

kisernl added 2 commits May 13, 2026 09:59
Adds opt-in 100k-sandbox burst benchmark module alongside the daily
~100-burst path. Includes the design plan + implementation checklist,
idempotent Postgres schema, and a coordinator (types, e2b provider,
pg/R2 sinks, p-limit ramp runner) bundled via esbuild. Local smoke
validated against e2b/Neon/R2; launch script + workflow are next.

scripts/burst-100k-launch.sh provisions a Namespace VM, applies the
schema, uploads the bundled coordinator, records the run, and starts
the coordinator detached. Uses --bare/--cidfile for nsc, installs
nodejs via apk, passes env via an uploaded chmod-600 startup script
(printf %q-quoted) that self-destructs after detaching node, and
pgrep-verifies the hand-off succeeded.

Validated end-to-end at N=10 against e2b/Neon/R2.

github-actions Bot commented May 13, 2026

Browser Benchmark Results

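| # | Provider | Score | Create | Connect | Navigate | Release | Total | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Browserbase | 93.8 | 0.21s | 0.11s | 0.18s | 0.13s | 0.63s | 10/10 |
| 2 | Kernel | 93.4 | 0.05s | 0.39s | 0.14s | 0.06s | 0.72s | 10/10 |
| 3 | Hyperbrowser | 89.5 | 0.25s | 0.47s | 0.22s | 0.10s | 1.12s | 10/10 |
| 4 | Steel | 79.0 | 0.17s | 0.63s | 0.12s | 0.12s | 1.17s | 10/10 |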

View full run · SVG available as build artifact


github-actions Bot commented May 13, 2026

Sandbox Benchmark Results

Sequential

| # | Provider | Score | Median TTI | P95 | P99 | Status |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | tensorlake | 97.5 | 0.21s | 0.32s | 0.32s | 10/10 |
| 2 | daytona | 96.0 | 0.21s | 0.68s | 0.68s | 10/10 |
| 3 | upstash | 95.3 | 0.38s | 0.60s | 0.60s | 10/10 |
| 4 | blaxel | 94.5 | 0.50s | 0.61s | 0.61s | 10/10 |
| 5 | archil | 93.9 | 0.54s | 0.71s | 0.71s | 10/10 |
| 6 | e2b | 92.8 | 0.52s | 1.03s | 1.03s | 10/10 |
| 7 | vercel | 84.5 | 0.75s | 2.75s | 2.75s | 10/10 |
| 8 | runloop | 84.0 | 1.36s | 1.95s | 1.95s | 10/10 |
| 9 | modal | 82.8 | 1.57s | 1.96s | 1.96s | 10/10 |
| 10 | cloudflare | 80.8 | 1.82s | 2.08s | 2.08s | 10/10 |
| 11 | hopx | 80.4 | 1.67s | 2.39s | 2.39s | 10/10 |
| 12 | namespace | 74.4 | 1.71s | 3.84s | 3.84s | 10/10 |
| 13 | codesandbox | 0.0 | 0.00s | 0.00s | 0.00s | 0/10 |
| 14 | declaw | 0.0 | 0.00s | 0.00s | 0.00s | 0/10 |

Staggered

| # | Provider | Score | Median TTI | P95 | P99 | Status |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | tensorlake | 98.0 | 0.19s | 0.21s | 0.21s | 10/10 |
| 2 | daytona | 96.4 | 0.23s | 0.56s | 0.56s | 10/10 |
| 3 | upstash | 96.1 | 0.37s | 0.41s | 0.41s | 10/10 |
| 4 | archil | 95.8 | 0.35s | 0.53s | 0.53s | 10/10 |
| 5 | blaxel | 95.0 | 0.50s | 0.52s | 0.52s | 10/10 |
| 6 | e2b | 93.9 | 0.52s | 0.74s | 0.74s | 10/10 |
| 7 | vercel | 92.0 | 0.64s | 1.03s | 1.03s | 10/10 |
| 8 | hopx | 83.8 | 1.51s | 1.79s | 1.79s | 10/10 |
| 9 | modal | 83.1 | 1.55s | 1.91s | 1.91s | 10/10 |
| 10 | runloop | 82.5 | 1.58s | 2.00s | 2.00s | 10/10 |
| 11 | namespace | 82.1 | 1.69s | 1.94s | 1.94s | 10/10 |
| 12 | cloudflare | 80.9 | 1.66s | 2.27s | 2.27s | 10/10 |
| 13 | codesandbox | 0.0 | 0.00s | 0.00s | 0.00s | 0/10 |
| 14 | declaw | 0.0 | 0.00s | 0.00s | 0.00s | 0/10 |

Burst

| # | Provider | Score | Median TTI | P95 | P99 | Status |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | tensorlake | 97.3 | 0.25s | 0.29s | 0.29s | 10/10 |
| 2 | daytona | 96.8 | 0.21s | 0.48s | 0.48s | 10/10 |
| 3 | upstash | 95.4 | 0.43s | 0.49s | 0.49s | 10/10 |
| 4 | archil | 95.2 | 0.37s | 0.64s | 0.64s | 10/10 |
| 5 | blaxel | 94.7 | 0.52s | 0.54s | 0.54s | 10/10 |
| 6 | e2b | 93.2 | 0.57s | 0.85s | 0.85s | 10/10 |
| 7 | vercel | 92.9 | 0.67s | 0.77s | 0.77s | 10/10 |
| 8 | modal | 82.8 | 1.55s | 1.98s | 1.98s | 10/10 |
| 9 | hopx | 79.6 | 2.01s | 2.08s | 2.08s | 10/10 |
| 10 | cloudflare | 78.8 | 1.81s | 2.59s | 2.59s | 10/10 |
| 11 | namespace | 78.5 | 1.91s | 2.51s | 2.51s | 10/10 |
| 12 | runloop | 65.7 | 3.18s | 3.82s | 3.82s | 10/10 |
| 13 | codesandbox | 0.0 | 0.00s | 0.00s | 0.00s | 0/10 |
| 14 | declaw | 0.0 | 0.00s | 0.00s | 0.00s | 0/10 |

View full run · SVGs available as build artifacts


github-actions Bot commented May 13, 2026

Storage Benchmark Results

1MB Files

| # | Provider | Score | Download | Throughput | Upload | Status |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Tigris | 95.3 | 0.06s | 141.8 Mbps | 0.11s | 1000/1000 |
| 2 | Cloudflare R2 | 94.8 | 0.12s | 70.2 Mbps | 0.16s | 1000/1000 |

4MB Files

| # | Provider | Score | Download | Throughput | Upload | Status |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Cloudflare R2 | 95.3 | 0.15s | 228.0 Mbps | 0.26s | 1000/1000 |
| 2 | Tigris | 94.6 | 0.23s | 147.2 Mbps | 0.35s | 1000/1000 |

10MB Files

| # | Provider | Score | Download | Throughput | Upload | Status |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Cloudflare R2 | 95.1 | 0.26s | 325.7 Mbps | 0.59s | 1000/1000 |
| 2 | Tigris | 93.8 | 0.56s | 150.3 Mbps | 0.51s | 1000/1000 |

16MB Files

| # | Provider | Score | Download | Throughput | Upload | Status |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Cloudflare R2 | 95.0 | 0.37s | 362.7 Mbps | 0.63s | 1000/1000 |
| 2 | Tigris | 93.1 | 0.84s | 158.9 Mbps | 0.54s | 1000/1000 |

View full run · SVGs available as build artifacts

kisernl added 6 commits May 13, 2026 14:12
Same S3-compatible API, different provider. Renames sinks/r2.ts →
sinks/tigris.ts (R2Sink → TigrisSink), env vars R2_* → TIGRIS_STORAGE_*,
and the runs.r2_prefix column → tigris_prefix. Also fixes launch.sh's
pgrep false-negative (now retries up to 10s and matches against
coordinator.cjs) and updates the plan doc to reflect Tigris and the
current --bare/--cidfile nsc flags.

Validated end-to-end at N=10: 10/10 sandboxes ok, all three Tigris
objects (raw.jsonl, heartbeat.json, meta.json) written.

Coordinator now reads $COORDINATOR_LOG_PATH (set to /root/run.log by
launch.sh) and pushes its own stdout/stderr to Tigris on every
heartbeat and at shutdown. Closes the "logs die with the VM" gap.
Local runs skip silently when the env var is unset.

coderabbitai Bot commented May 13, 2026

Walkthrough

This pull request implements a complete "100k burst" benchmark system that orchestrates 100,000 concurrent sandbox creation requests, persists results to Postgres and Tigris, and decouples long-running work from GitHub Actions via SSH to a Namespace VM.

Changes

100k Burst Benchmark Implementation

| Layer / File(s) | Summary |
| --- | --- |
| **Planning documentation and database schema**<br>`one-hundred-k-mvp-plan.md`, `one-hundred-k-mvp-checklist.md`, `one-hundred-k-mvp-data-inventory.md`, `db/burst-100k.sql`, `.gitignore` | Documents the burst-100k design, goals, architecture, and operational model. Defines Postgres runs table (with heartbeat/stuck-run indexes) and sandbox_results table (composite PK, status enum, HTTP/error fields, foreign key to runs) for durable result capture. Updates repo configuration to ignore the build output directory. |
| **Type definitions and provider registry**<br>`src/burst-100k/types.ts`, `src/burst-100k/providers.ts` | Defines BurstProviderConfig (concurrency/ramp/timeout), SandboxResult (timing/latency/status/error), ProgressStats, and FinalStats types. Exports opt-in provider registry for e2b, modal, runloop with getProvider(name) lookup and required env var gates. |
| **Concurrency-ramped task runner**<br>`src/burst-100k/runner.ts` | Implements runBurst that linearly ramps sandbox-creation requests over a concurrency-limited pool, measures per-request latency, classifies failures (timeout/http_error/network_error), and triggers non-blocking cleanup via sandbox.destroy(). |
| **Postgres and Tigris persistence**<br>`src/burst-100k/sinks/postgres.ts`, `src/burst-100k/sinks/tigris.ts` | Implements batched Postgres sink for runs/sandbox_results with heartbeat and lifecycle updates. Implements S3-compatible Tigris sink for streaming JSONL results, periodic heartbeat snapshots, final metadata, and coordinator logs via multipart upload. |
| **Coordinator main loop and orchestration**<br>`src/burst-100k/coordinator.ts` | Wires provider/Postgres/Tigris from environment, validates required env vars, runs burst orchestrator with streaming writes and periodic heartbeats, computes final latency distributions (p50/p99/histogram), and manages SIGTERM/SIGINT shutdown with log uploads and failure recording. |
| **Build configuration and launch orchestration**<br>`package.json`, `scripts/burst-100k-launch.sh` | Adds npm scripts for bundling coordinator to CJS via esbuild. Implements launch script that validates env, bundles coordinator, applies Postgres schema idempotently, provisions Namespace VM, uploads bundle, generates startup script with quoted env forwarding, records run in Postgres, verifies process startup, and surfaces operational next-steps. |
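
The coordinator summary above mentions final latency distributions (p50/p99). The PR text does not say which percentile method is used, so the following is only a sketch of one common choice, the nearest-rank method; `percentile` and `summarizeLatencies` are hypothetical names.

```typescript
// Nearest-rank percentile over an ascending-sorted array of latencies (ms).
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  // Rank is 1-based: the smallest rank whose share of samples is >= p percent.
  const rank = Math.ceil((p * sortedMs.length) / 100);
  return sortedMs[rank - 1];
}

// Sort a copy of the raw samples, then read off the two headline percentiles.
function summarizeLatencies(latenciesMs: number[]): { p50: number; p99: number } {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return { p50: percentile(sorted, 50), p99: percentile(sorted, 99) };
}
```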

Sequence Diagram(s)

sequenceDiagram
  participant Main as Coordinator Main
  participant Postgres as PostgresSink
  participant Tigris as TigrisSink
  participant Runner as runBurst
  participant Compute as Provider Compute
  
  Main->>Postgres: bootstrap(provider, commit_sha, instance_id, tigris_prefix)
  Main->>Tigris: initialize with config
  Main->>+Runner: runBurst with callbacks
  
  loop Ramp & Concurrency
    Runner->>Compute: createSandbox (ramped over rampSeconds)
    Compute-->>Runner: SandboxResult (latency, status, timing)
    Runner->>Tigris: writeResult (streaming JSONL)
    Runner->>Postgres: write (buffered)
    Runner->>Main: onProgress callback
  end
  
  Runner-->>-Main: completion
  Main->>Postgres: flush pending results
  Main->>Postgres: complete(final stats with p50/p99)
  Main->>Tigris: writeMeta (full distribution, final stats)
  Main->>Tigris: writeLog (coordinator log)
  Main->>Tigris: close (await multipart upload)
  Main->>Postgres: close connection
  
  Note over Main: SIGTERM/SIGINT gracefully flushes and exits
sequenceDiagram
  participant Script as burst-100k-launch.sh
  participant Build as esbuild
  participant DB as Postgres
  participant NSC as nsc CLI
  participant VM as Namespace VM
  participant Coordinator as coordinator process
  
  Script->>Script: validate required env vars
  Script->>Build: npm run bundle:burst-100k
  Build-->>Script: dist/burst-100k.cjs
  Script->>DB: psql (apply burst-100k.sql)
  Script->>NSC: nsc create (provision VM)
  NSC-->>Script: instance_id
  Script->>DB: INSERT runs row (ON CONFLICT DO NOTHING)
  Script->>VM: upload dist/burst-100k.cjs
  Script->>VM: generate /root/start.sh (with quoted env)
  Script->>VM: chmod +x /root/start.sh
  Script->>VM: execute /root/start.sh detached
  Script->>Script: wait & retry pgrep coordinator.cjs
  Coordinator->>VM: run (heartbeat, write results)
  Script-->>Script: print OK + operational commands

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A burst of a hundred thousand dreams,
Racing through concurrency streams,
Postgres and Tigris hold the tale,
Heartbeats and latencies never fail—
From humble laptop to Namespace VM,
The benchmark hops with grace and vim! 🚀

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'WIP - burst-100k benchmark mvp' is vague and uses non-descriptive terms ('WIP', 'mvp') without conveying the primary technical change or scope of the extensive changeset. | Replace with a more specific title summarizing the main change, e.g., 'Add burst-100k benchmark infrastructure with coordinator, schema, and launch automation' or similar. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

🧹 Nitpick comments (1)
src/burst-100k/runner.ts (1)

54-77: ⚡ Quick win

Consider logging errors from onResult callback.

Line 68 swallows errors from callbacks.onResult(result) to prevent a single write failure from aborting the entire burst. However, silent failures make debugging difficult if results are not being written.

Consider logging caught errors:

try {
  await callbacks.onResult(result);
} catch (e: any) {
  console.error('[runner] onResult failed:', e?.message ?? e);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/burst-100k/runner.ts` around lines 54 - 77, In the finally block where
the code awaits callbacks.onResult(result) inside runner.ts, change the empty
catch that swallows errors to log the caught error details (message and stack if
present) so write failures are visible; specifically wrap the await
callbacks.onResult(result) so its catch logs something like a clear prefix (e.g.
"[runner] onResult failed:") plus e?.message and e?.stack (or the error object)
and then continue to swallow to avoid aborting the burst.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@db/burst-100k.sql`:
- Around line 15-17: Stuck-run detection currently only checks last_heartbeat
age and therefore misses rows with status='running' where last_heartbeat IS
NULL; update the detection logic to treat NULL heartbeats as stale by including
rows WHERE status = 'running' AND (last_heartbeat IS NULL OR last_heartbeat <
now() - interval '...') so that runs that never emitted a heartbeat are flagged;
apply the same change to the other occurrence that references last_heartbeat and
status (the second block mentioned around lines 29-32).

In `@one-hundred-k-mvp-checklist.md`:
- Around line 12-14: Update the checklist and any related docs to remove or
replace legacy R2/static-token steps with the new Tigris/OIDC flow: search for
and update references to "R2", "sinks/r2.ts", "NSC_TOKEN", and any checklist
items that suggest running aws s3 round-trips or static token setup, and replace
them with instructions that validate Tigris connectivity and OIDC-based auth
(e.g., OIDC trust configuration and runtime token exchange), making the guidance
consistent across lines noted (around items previously at 12–14, 35, 45, 50–52,
81, 104).

In `@one-hundred-k-mvp-plan.md`:
- Line 60: Several fenced code blocks are unlabeled and trigger markdownlint
MD040; update each unlabeled triple-backtick fence that surrounds the snippets
with an appropriate language identifier (e.g., use ```text) so the blocks become
labeled. Specifically, change the fences around the blocks containing the
strings "src/burst-100k/", "GitHub Action (workflow_dispatch / schedule)",
"s3://<bucket>/<run_id>/", and "GitHub Secrets" to use a language tag (e.g.,
text), and apply the same fix to the other unlabeled fences that contain the
same or similar snippets referenced in the comment (the other occurrences noted
in the review). Ensure every opening ``` has a language identifier and the
corresponding closing ``` remains.
- Around line 135-136: Update the example options array so it accurately
reflects current opt-in workflow providers: replace or augment the existing
entry "options: [e2b, modal, daytona, codesandbox]" to include "runloop" (i.e.,
ensure the list contains runloop alongside e2b, modal, daytona, codesandbox) and
keep the accompanying comment about only providers opted in; make the change
where the options line appears in the document.

In `@scripts/burst-100k-launch.sh`:
- Around line 71-76: The INSERT into the runs table uses unescaped shell
variable interpolation (RUN_ID, PROVIDER, GITHUB_SHA, INSTANCE_ID,
TIGRIS_STORAGE_BUCKET) which can cause SQL injection or syntax errors; change
the psql invocation to use parameterized psql variables (e.g. \set or --set) or
psql's parameter binding (psql ... -c "INSERT ... VALUES (:v1,:v2,...)") and
pass the shell values via --set / -v to safely substitute them, or alternatively
shell-escape each variable before embedding; update the block that builds the
SQL string where RUN_ID, PROVIDER, GITHUB_SHA, INSTANCE_ID and
TIGRIS_STORAGE_BUCKET are referenced so values are passed as psql parameters
instead of raw interpolation.

In `@src/burst-100k/coordinator.ts`:
- Around line 30-35: The code sets provider.concurrencyTarget =
parseInt(override, 10) without validating the result; update the override
handling so after calling parseInt(override, 10) you check for NaN (e.g.,
Number.isNaN or !Number.isFinite) and handle invalid input by either falling
back to the original provider.concurrencyTarget or exiting with a clear error
log; update the console.log to reflect the validated integer (or the fallback)
and reference the exact symbols process.env.CONCURRENCY_TARGET,
parseInt(override, 10) and provider.concurrencyTarget when making the change.

In `@src/burst-100k/sinks/postgres.ts`:
- Around line 38-47: The timed flush handler in write (references: write,
flushTimer, flush, buffer, BATCH_SIZE, BATCH_TIMEOUT_MS) swallows errors via
.catch and only logs them, risking silent data loss; modify the handler to track
consecutive failures (e.g., a failureCounter on the PostgresSink instance),
increment it on each flush rejection and emit a warning or call a provided error
callback / trigger run-failure once the counter exceeds a threshold N, and reset
the counter on a successful flush; alternatively allow the error to propagate by
rethrowing from the timer callback or invoking the sink's error handler so the
coordinator can react.
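
One way to picture the consecutive-failure tracking suggested for the Postgres sink is a small counter object that the timed flush reports into. This is a sketch under assumptions: `FlushFailureTracker`, its threshold, and the `onFatal` hook are hypothetical and not part of the PR's `PostgresSink`.

```typescript
// Tracks consecutive flush failures and escalates once a threshold is reached.
class FlushFailureTracker {
  private consecutive = 0;
  private readonly maxConsecutive: number;            // assumed threshold N
  private readonly onFatal: (err: Error) => void;     // hypothetical escalation hook

  constructor(maxConsecutive: number, onFatal: (err: Error) => void) {
    this.maxConsecutive = maxConsecutive;
    this.onFatal = onFatal;
  }

  // A successful flush clears the streak.
  recordSuccess(): void {
    this.consecutive = 0;
  }

  // Each failure extends the streak; at or beyond the threshold, escalate.
  recordFailure(err: Error): void {
    this.consecutive += 1;
    if (this.consecutive >= this.maxConsecutive) this.onFatal(err);
  }
}
```

In a timer callback this would be wired as `flush().then(() => tracker.recordSuccess()).catch((e) => tracker.recordFailure(e))`, so isolated failures are still tolerated but a sustained outage reaches the coordinator.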

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 5678f203-d2c0-455d-b181-aab0a46e1b9e

📥 Commits

Reviewing files that changed from the base of the PR and between 8205b11 and 63c19b3.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (13)
  • .gitignore
  • db/burst-100k.sql
  • one-hundred-k-mvp-checklist.md
  • one-hundred-k-mvp-data-inventory.md
  • one-hundred-k-mvp-plan.md
  • package.json
  • scripts/burst-100k-launch.sh
  • src/burst-100k/coordinator.ts
  • src/burst-100k/providers.ts
  • src/burst-100k/runner.ts
  • src/burst-100k/sinks/postgres.ts
  • src/burst-100k/sinks/tigris.ts
  • src/burst-100k/types.ts

kisernl added 5 commits May 14, 2026 14:06
Coordinator tallies per-status counts during the burst and writes them
to new columns on runs (timeouts, http_errors, network_errors) plus
an error_histogram object in Tigris meta.json. Schema migration is
idempotent (ALTER TABLE ADD COLUMN IF NOT EXISTS), so re-running the
launch script catches up existing DBs.

Coordinator now tracks every sandbox's start/end timestamps and builds
an interval-overlap sweep at run-end. Writes concurrency_summary
(peak_concurrent, peak_t_ms, mean_concurrent, total_run_ms) and a
1Hz concurrency_timeline to Tigris meta.json. Lets us tell whether
the ramp actually behaved and where the burst saturates.
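
The interval-overlap sweep described in this commit can be sketched as follows. The output keys mirror the `concurrency_summary` fields named above; the `Interval` shape, the function name, and the convention that `peak_t_ms` is an offset from the earliest start are assumptions for illustration.

```typescript
interface Interval { startMs: number; endMs: number; }

interface ConcurrencySummary {
  peak_concurrent: number;
  peak_t_ms: number;       // assumed: offset from the earliest start where the peak is first reached
  mean_concurrent: number; // total sandbox-alive time divided by wall-clock run time
  total_run_ms: number;
}

function summarizeConcurrency(intervals: Interval[]): ConcurrencySummary {
  if (intervals.length === 0) {
    return { peak_concurrent: 0, peak_t_ms: 0, mean_concurrent: 0, total_run_ms: 0 };
  }
  // Emit +1 at each start and -1 at each end.
  const events: Array<{ t: number; delta: number }> = [];
  let busyMs = 0;
  for (const { startMs, endMs } of intervals) {
    events.push({ t: startMs, delta: 1 }, { t: endMs, delta: -1 });
    busyMs += endMs - startMs;
  }
  // Sort by time; at equal timestamps ends come first (half-open intervals).
  events.sort((a, b) => a.t - b.t || a.delta - b.delta);

  const t0 = events[0].t;
  const tEnd = events[events.length - 1].t;
  let current = 0;
  let peak = 0;
  let peakT = t0;
  for (const e of events) {
    current += e.delta;
    if (current > peak) { peak = current; peakT = e.t; }
  }
  const total = tEnd - t0;
  return {
    peak_concurrent: peak,
    peak_t_ms: peakT - t0,
    mean_concurrent: total > 0 ? busyMs / total : 0,
    total_run_ms: total,
  };
}
```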
Runner reflects every primitive prop off the adapter's returned
sandbox object (skipping credential-shaped keys) and stores the
result as a JSONB column on sandbox_results and as a field in
Tigris raw.jsonl. Verified on e2b and runloop — both expose
{ provider, sandboxId }, which lets us cross-reference against
provider dashboards. Schema migration is idempotent.
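
A minimal sketch of the reflection step described in this commit: copy primitive properties off the returned sandbox object and skip keys that look like credentials. The regular expression and the function name are assumptions; the PR does not list the actual key filter.

```typescript
// Assumed pattern for "credential-shaped" keys; the real filter is not shown in the PR.
const CREDENTIAL_KEY = /token|secret|password|passwd|credential|api[-_]?key|auth/i;

type Primitive = string | number | boolean | null;

// Keep only own enumerable primitive properties whose key does not look like a credential.
function extractProviderMetadata(sandbox: object): Record<string, Primitive> {
  const out: Record<string, Primitive> = {};
  for (const [key, value] of Object.entries(sandbox)) {
    if (CREDENTIAL_KEY.test(key)) continue;
    const t = typeof value;
    if (value === null || t === "string" || t === "number" || t === "boolean") {
      out[key] = value as Primitive;
    }
  }
  return out;
}
```

The result is plain JSON, so it can be stored in a JSONB column or written as a field in a JSONL line without further conversion.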
Coordinator samples every 5s (process CPU, memory, event-loop lag
percentiles, load averages, /proc/self/fd count, /proc/net/sockstat)
into <run_id>/metrics.jsonl. Uploaded on every 30s heartbeat for
partial-result durability plus a final flush at shutdown. Headline
peaks land in meta.json.metrics_summary for at-a-glance review.

Adds a small logger module (ISO-timestamped, level-tagged lines with
phase markers) and replaces ad-hoc console.* calls throughout the
coordinator and runner. Per-sandbox events are sampled at high N
(pickSamplingPeriod) so coordinator.log stays bounded — every sandbox
at N<=1000, ~100 sampled + every error at higher N. Adds milestone
progress lines with rate/ETA every ~10% of work done.
kisernl added 3 commits May 15, 2026 16:52
Removes pickSamplingPeriod() so every sandbox gets a [ok]/[error] line
in coordinator.log regardless of N. At full 100k this produces a
~14 MB log file (still cheap to upload + store via the existing
heartbeat-cadence Tigris flush). Trade-off documented in the data
inventory.

Runner now runs `node -v` after each successful sandbox.create() and
records the two phases separately. SandboxResult gains
first_command_ms; Postgres sandbox_results gets a matching nullable
column (idempotent ALTER). meta.json adds first_command_distribution
and tti_distribution alongside the existing (allocate-only)
latency_distribution. Mirrors the daily benchmark's readiness check
so numbers are directly comparable.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default mode and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit b261e82.

kisernl added 3 commits May 20, 2026 18:27
Remove the 60s linear ramp from the 100k burst — all sandbox-create
requests now go out as fast as the event loop dispatches them. The
ramp was hiding the very provider-side overload behaviour we want to
measure. Drops `rampSeconds` from BurstProviderConfig, renames the
meta.json `ramp_segments` bucket to `submission_segments` (idx now
reflects event-loop submission order, not ramp position), and removes
the now-stale `ramp_seconds_configured` field from concurrency_summary.
…ncurrency

The old runner destroyed each sandbox as soon as its create+readiness check
returned, so peak concurrency was bounded by per-sandbox lifetime, not by
the provider's actual capacity to hold N sandboxes simultaneously. This
made the headline number a measure of churn, not concurrency.

Reshape the runner into two phases:
  1. parallel create + `node -v` readiness; survivors stay alive
  2. after all phase-1 tasks settle, run a final `node -v` liveness probe
     against every survivor, then destroy

Replace the 'ok | timeout | http_error | network_error' status with a
four-state lifecycle taxonomy:
  - success          created, readiness passed, alive at end-of-test
  - partial          created, readiness passed, died before end-of-test
  - readiness_failed created, but first `node -v` never returned
  - failed           sandbox.create() errored

Move the timeout/http_error/network_error sub-classification into a new
`failure_class` column so it works across any non-success status. Bump
sandboxOptions.timeoutMs to 30 min on providers that support it so they
don't auto-destroy mid-burst. Schema updated idempotently: CHECK
constraints swapped to the new values (NOT VALID for back-compat),
`failure_class` added to sandbox_results, `partials` + `readiness_failures`
added to runs.

Verified end-to-end with a 100-sandbox modal smoke run: 100/100 success,
peak_concurrent=100 (vs. the old model where peak depended on destroy
timing), Postgres + Tigris meta.json both write the new shape.
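
The four-state taxonomy above maps onto three observations per sandbox. A small sketch, assuming a hypothetical `SandboxOutcome` record (the runner's real internal shape is not shown in this PR):

```typescript
type LifecycleStatus = "success" | "partial" | "readiness_failed" | "failed";

// Hypothetical per-sandbox observations gathered across the two phases.
interface SandboxOutcome {
  created: boolean;         // sandbox.create() returned without error
  readinessPassed: boolean; // first `node -v` returned
  aliveAtEnd: boolean;      // final `node -v` liveness probe returned
}

// Checks are ordered by lifecycle stage, so the earliest failure decides the status.
function classifyLifecycle(o: SandboxOutcome): LifecycleStatus {
  if (!o.created) return "failed";
  if (!o.readinessPassed) return "readiness_failed";
  return o.aliveAtEnd ? "success" : "partial";
}
```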
…d watch scripts

Add scripts/burst-100k-launch-multi.ts (provisions one Namespace VM
per provider via launch.sh, defaults to all 5 × 1000) and
scripts/burst-100k-watch.ts (polls Postgres for one-or-more RUN_IDs
until all reach a terminal state). Both wired up as npm scripts.
kisernl and others added 7 commits May 22, 2026 14:25
Adds an alternate way to run the burst when a single VM can't hold the
target concurrency (file descriptors, NIC queue depth, event-loop lag).
The sharded launcher spawns N namespace VMs in parallel, each firing
total/N sandboxes at t=0, all tagged with a shared group_id. An
aggregator collapses the per-shard rows back into the same metrics
shape an unsharded burst produces.

  npm run bench:burst-100k:sharded -- --provider e2b --total 100000 --vms 20
  npm run bench:burst-100k:aggregate -- --recent

Persistence

* runs gains group_id / shard_index / shard_count columns so shards
  in a group are queryable as a unit.
* New run_groups table holds one row per group with the aggregate
  scalars + full meta.json (JSONB). Mirrors runs' columns so dashboards
  can union per-VM and per-group views.
* Aggregator also uploads the full meta.json to
  s3://<bucket>/groups/<group_id>/meta.json. Both writes are on by
  default; opt out with --no-pg / --no-tigris.
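
As a rough sketch of how per-shard counters could be collapsed into one group row: the counter names below follow columns mentioned elsewhere in this PR, but the `ShardCounts` and `GroupCounts` shapes and the function itself are hypothetical, and the real aggregator is not shown here. Latency percentiles are deliberately left out, since they cannot be recovered by summing or averaging per-shard percentiles and need the merged raw samples.

```typescript
// Hypothetical per-shard row; counter names follow columns mentioned in this PR.
interface ShardCounts {
  shard_index: number;
  partials: number;
  readiness_failures: number;
  timeouts: number;
  http_errors: number;
  network_errors: number;
}

type GroupCounts = Omit<ShardCounts, "shard_index"> & { shard_count: number };

// Sum every additive counter across the shards of one group.
function aggregateShardCounts(shards: ShardCounts[]): GroupCounts {
  const total: GroupCounts = {
    shard_count: shards.length,
    partials: 0,
    readiness_failures: 0,
    timeouts: 0,
    http_errors: 0,
    network_errors: 0,
  };
  for (const s of shards) {
    total.partials += s.partials;
    total.readiness_failures += s.readiness_failures;
    total.timeouts += s.timeouts;
    total.http_errors += s.http_errors;
    total.network_errors += s.network_errors;
  }
  return total;
}
```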

Sharding mechanics

* burst-100k-launch-sharded.ts validates total % vms == 0, generates a
  group_id, and spawn()s N burst-100k-launch.sh children in parallel
  with per-child stdout prefixed [sNN]. Schema is applied once up-front
  and children get SKIP_SCHEMA=1 — CREATE TABLE/INDEX IF NOT EXISTS
  isn't race-safe under parallel applies, and Neon's -pooler endpoint
  breaks session-level advisory locks.
* burst-100k-launch.sh forwards GROUP_ID / SHARD_INDEX / SHARD_COUNT to
  the VM startup script and the pre-handoff INSERT.
* Coordinator reads the shard env, threads it through pg.bootstrap, and
  tags its own Tigris meta.json with the group fields.

Single-VM runs (bench:burst-100k:local, :multi) are unchanged — every
new field is optional.
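
The divisibility check and per-shard split from the sharding mechanics can be sketched like this. `planShards` and `ShardPlan` are hypothetical names, the group id is passed in rather than generated, and zero-based `shard_index` is an assumption.

```typescript
interface ShardPlan {
  group_id: string;
  shard_index: number; // assumed zero-based
  shard_count: number;
  sandboxes: number;   // sandboxes this shard fires at t=0
}

// Split `total` sandboxes evenly across `vms` shards, rejecting uneven splits.
function planShards(total: number, vms: number, groupId: string): ShardPlan[] {
  if (!Number.isInteger(total) || !Number.isInteger(vms) || total <= 0 || vms <= 0) {
    throw new Error("total and vms must be positive integers");
  }
  if (total % vms !== 0) {
    throw new Error(`total (${total}) must be divisible by vms (${vms})`);
  }
  const perShard = total / vms;
  return Array.from({ length: vms }, (_, i) => ({
    group_id: groupId,
    shard_index: i,
    shard_count: vms,
    sandboxes: perShard,
  }));
}
```

With the values from the example command above, `planShards(100000, 20, groupId)` yields 20 shards of 5000 sandboxes each.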