
feat(webapp,run-engine,redis-worker): mollifier — burst-buffer to absorb trigger storms#3709

Closed
d-cs wants to merge 160 commits into main from mollifier-phase-3

@d-cs d-cs commented May 22, 2026

Summary

The mollifier sits in front of engine.trigger and diverts trigger storms into a Redis buffer when the per-env trigger rate exceeds a configurable threshold. A background drainer materialises buffered entries back into Postgres at a controlled rate. The customer's trigger() call still returns a valid runId synchronously; the run materialises in PG within a sub-second window in healthy operation.

The gate is a Lua sliding-window rate check (per-env). When tripped, it holds for TRIGGER_MOLLIFIER_HOLD_MS and every trigger during the hold goes to the buffer. The drainer is a fair round-robin over orgs → envs with configurable concurrency. Every read and mutate API that touches a run (retrieve, cancel, replay, reschedule, addTags, updateMetadata, realtime stream, etc.) gains a buffered-source branch so the buffered window is invisible to the customer — same response shape, same Zod schema, same dashboard rendering.

Default: TRIGGER_MOLLIFIER_ENABLED=0. Off completely unless explicitly opted in.

Design

The gate (apps/webapp/app/v3/mollifier/mollifierGate.server.ts): per-env sliding window, Lua-atomic INCR + tripped-flag SETEX. Configured by TRIGGER_MOLLIFIER_TRIP_WINDOW_MS, TRIGGER_MOLLIFIER_TRIP_THRESHOLD, TRIGGER_MOLLIFIER_HOLD_MS. Globally short-circuited by TRIGGER_MOLLIFIER_ENABLED before the evaluateGate call so a non-mollifier deployment pays nothing per trigger.
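
The trip-and-hold behaviour can be sketched in memory. This is a rough illustration only: it approximates the window with a simple fixed-window counter, and `InMemoryGate`, `GateOptions`, and the option names are hypothetical stand-ins for the Redis/Lua implementation and the `TRIGGER_MOLLIFIER_*` settings.

```typescript
// Hypothetical in-memory stand-in for the Redis/Lua gate; names are illustrative.
type GateOptions = { windowMs: number; threshold: number; holdMs: number };

type EnvState = { windowStart: number; count: number; trippedUntil: number };

class InMemoryGate {
  private readonly state = new Map<string, EnvState>();

  constructor(private readonly opts: GateOptions) {}

  // Returns true when the trigger should be diverted to the buffer.
  evaluate(envId: string, nowMs: number): boolean {
    const s = this.state.get(envId) ?? { windowStart: nowMs, count: 0, trippedUntil: 0 };
    this.state.set(envId, s);

    // Start a fresh window once the previous one has elapsed.
    if (nowMs - s.windowStart >= this.opts.windowMs) {
      s.windowStart = nowMs;
      s.count = 0;
    }
    s.count += 1;

    // Exceeding the threshold arms (or re-arms) the hold.
    if (s.count > this.opts.threshold) {
      s.trippedUntil = nowMs + this.opts.holdMs;
    }
    return nowMs < s.trippedUntil;
  }
}
```

Once tripped, every call for that env returns `true` until the hold expires, even if the counter has rolled over to a new window in the meantime.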

The buffer (packages/redis-worker/src/mollifier/buffer.ts): Redis ZSET per env (mollifier:queue:<envId>) keyed by createdAtMicros, plus a per-run hash (mollifier:entries:<runId>) carrying the snapshot. Idempotency-lookup keys provide trigger-time dedup symmetric with PG's unique constraint. Entries persist until the drainer ACKs (with a 30s post-materialise grace) or FAILs them — no accept-time TTL, since silent eviction would lose runs without a customer-visible signal.

The drainer (apps/webapp/app/v3/mollifier/mollifierDrainer.server.ts + @trigger.dev/redis-worker's MollifierDrainer): polls the buffer, replays each entry through engine.trigger. Handles the cancel-before-PG bifurcation via engine.createCancelledRun (no engine.trigger needed when a buffered cancel landed first). On non-retryable engine errors, writes a SYSTEM_FAILURE PG row via engine.createFailedTaskRun so customers see the failure in their dashboard — and createFailedTaskRun now emits runFailed so the alert pipeline picks the row up.

Read-side fallbacks: every public API route that retrieves a run gains a buffered-source branch. The realtime stream specifically holds the Electric SQL subscription open across the buffered window so useRealtimeRun doesn't see a 404 and bail — when the drainer materialises the PG row, Electric streams the INSERT to the client.
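
A buffered-source branch on a retrieve route might look roughly like the sketch below. The `RunSources` interface, the response fields, and the `"QUEUED"` placeholder status are assumptions made for illustration; the real routes read from Prisma and the Redis-backed buffer.

```typescript
// Hypothetical shapes; the real routes use Prisma and the Redis-backed buffer.
type RunResponse = { id: string; status: string; taskIdentifier: string };
type BufferedSnapshot = { runId: string; taskIdentifier: string };

interface RunSources {
  findInPostgres(runId: string): RunResponse | undefined;
  findInBuffer(runId: string): BufferedSnapshot | undefined;
}

function retrieveRun(sources: RunSources, runId: string): RunResponse | undefined {
  // Postgres stays the primary source of truth.
  const pgRun = sources.findInPostgres(runId);
  if (pgRun) return pgRun;

  // Fall back to the buffer only when the row has not materialised yet.
  const buffered = sources.findInBuffer(runId);
  if (!buffered) return undefined; // caller maps this to a 404

  // Synthesise the same response shape; "QUEUED" is an assumed placeholder status.
  return { id: buffered.runId, status: "QUEUED", taskIdentifier: buffered.taskIdentifier };
}
```

Because both branches return the same type, downstream schema validation does not need to know which source answered.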

The run span lands in the event store at trigger time, before the gate divert decision. Buffered runs are visible in the trace view immediately rather than only after drain — important for useRealtimeRun and for trigger-and-wait flows where a parent's trace expects the child's span to exist.

Observability: mollifier.decisions{outcome} counter for gate decisions. mollifier.stale_entries.current{envId} gauge for entries sitting longer than TRIGGER_MOLLIFIER_STALE_SWEEP_THRESHOLD_MS (default 5min) — the alertable signal for an offline / falling-behind drainer. mollifier.realtime_subscriptions.buffered{envId} counter for visibility into customers hitting the buffered window with the realtime hook. All emit via the standard OTel meter pipeline.

Customer-visible surface: the API response from a mollified trigger carries a mollifier.queued notice. Otherwise the buffered window is invisible — same response shapes, same dashboard, same SDK behaviour. The list APIs (and dashboard runs list) are eventually consistent and do not surface buffered rows; that visibility is parked for a future global status-bar UX piece.

Test plan

  • Per-feature unit + container tests across every commit (apps/webapp/test/mollifier*.test.ts, internal-packages/run-engine/src/engine/tests/createFailedTaskRun.test.ts, packages/redis-worker/src/mollifier/buffer.test.ts).
  • SDK response shape audit against a buffered run — exercises nine apiClient methods so zodfetch schemas validate against the buffered responses end-to-end. No drift surfaced.
  • Empirical probes across the realtime, cancel-before-PG, and drainer-cancel-bifurcation flows during development. The probes are not committed; the regressions they caught are pinned by the unit/container tests above.
  • Reviewer: run the local dev stack with TRIGGER_MOLLIFIER_ENABLED=1, trigger a burst that exceeds the configured threshold, confirm the dashboard list + run-detail flow matches the non-buffered experience.

d-cs and others added 30 commits May 14, 2026 16:43
Redis-backed burst-smoothing layer behind MOLLIFIER_ENABLED=0 (default).
With the kill switch off, the gate short-circuits on its first env check
and production behaviour is identical to main.

@trigger.dev/redis-worker:
- MollifierBuffer: atomic Lua-backed FIFO with accept / pop / ack /
  requeue / fail + TTL. Per-env queues with HSET entry storage,
  atomic RPOP + status transition, FIFO retry ordering.
- MollifierDrainer: generic round-robin worker with concurrency cap,
  retry semantics, and a stop deadline to avoid livelock on a hung
  handler. Phase 3 will wire the handler to engine.trigger().
- Full testcontainers-backed test suite (21 tests).

apps/webapp:
- evaluateGate cascade-check (kill switch -> org feature flag ->
  shadow mode -> trip evaluator -> mollify / shadow_log / pass_through).
  Dependencies injected for testability; the trip evaluator stub
  returns { divert: false } in phase 1.
- Inserted into RunEngineTriggerTaskService.call() before
  traceEventConcern.traceRun. The mollify branch throws (unreachable
  in phase 1).
- Lazy MollifierBuffer + MollifierDrainer singletons; no Redis
  connection unless MOLLIFIER_ENABLED=1.
- 12 MOLLIFIER_* env vars (all safe defaults) and a mollifierEnabled
  feature flag in the global catalog.
- Drainer booted from worker.server.ts on first import.
- Read-fallback stub for phase 3.
- Gate cascade tests + .env loader so env.server validates in vitest
  workers.

Phase 2 will land the real trip evaluator; phase 3 will activate the
buffer-write + drain path.
…dual-write monitoring + drainer ack loop)

Phase 1 of the trigger-burst smoothing initiative. Adds the A-side trip
evaluator (atomic Lua sliding-window per env) and wires it into the trigger
hot path. When the per-org mollifierEnabled feature flag is on AND the
evaluator says divert, the canonical replay payload is buffered to Redis
(via buffer.accept) AND the trigger continues through engine.trigger —
i.e. dual-write. The drainer pops + acks (no-op handler) to prove the
dequeue mechanism works end-to-end. Operators audit by joining
mollifier.buffered (write) and mollifier.drained (consume) logs by runId.

Buffer primitives hardened:
- accept is idempotent on duplicate runId (Lua EXISTS guard)
- pop skips orphan queue references (entry HASH TTL'd while runId queued)
- fail no-ops on missing entry (no partial FAILED hash leak)
- mollifier:envs set pruned on draining pop, restored on requeue
- 16-row truth-table test enumerates the gate cascade
- BufferedTriggerPayload defines the canonical replay shape Phase 2 will
  use to invoke engine.trigger
- payload hash for audit-equivalence computed off the hot path (in the
  drainer) to avoid CPU during a spike
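
The hardened primitives can be modelled in memory to make the contract concrete. This is a sketch under assumptions: `InMemoryBuffer`, the status names, and the boolean return values are illustrative, and the real buffer enforces the same rules atomically in Lua against Redis.

```typescript
// Hypothetical in-memory model of the buffer contract; the real one is Redis + Lua.
type EntryStatus = "QUEUED" | "DRAINING" | "FAILED";
type Entry = { runId: string; envId: string; payload: string; status: EntryStatus };

class InMemoryBuffer {
  private readonly queues = new Map<string, string[]>(); // envId -> runIds, oldest first
  private readonly entries = new Map<string, Entry>();

  accept(runId: string, envId: string, payload: string): boolean {
    if (this.entries.has(runId)) return false; // duplicate runId: no-op
    this.entries.set(runId, { runId, envId, payload, status: "QUEUED" });
    const queue = this.queues.get(envId) ?? [];
    queue.push(runId);
    this.queues.set(envId, queue);
    return true;
  }

  pop(envId: string): Entry | undefined {
    const queue = this.queues.get(envId) ?? [];
    while (queue.length > 0) {
      const runId = queue.shift() as string;
      const entry = this.entries.get(runId);
      if (!entry) continue; // orphan reference: the entry is already gone, skip it
      entry.status = "DRAINING";
      if (queue.length === 0) this.queues.delete(envId); // prune the env once drained
      return entry;
    }
    this.queues.delete(envId);
    return undefined;
  }

  fail(runId: string): boolean {
    const entry = this.entries.get(runId);
    if (!entry) return false; // missing entry: never create a partial FAILED record
    entry.status = "FAILED";
    return true;
  }

  // Removes the entry; also used in tests to simulate an expired entry hash.
  ack(runId: string): void {
    this.entries.delete(runId);
  }

  listEnvs(): string[] {
    return [...this.queues.keys()];
  }
}
```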

Regression tests in apps/webapp/test/engine/triggerTask.test.ts pin the
mollifier integration:
- validation throws BEFORE the gate runs (no orphan buffer write on
  rejected triggers)
- mollify dual-write happy path (Postgres + Redis both reflect the run)
- pass_through path does NOT call buffer.accept
- engine.trigger throwing AFTER buffer.accept leaves an orphan
  (documented behaviour — drainer auto-cleans; audit-trail surfaces it)
- idempotency-key match short-circuits BEFORE the gate is consulted
- debounce match produces an orphan (documented behaviour — Phase 2
  must lift handleDebounce upfront before buffer.accept)

Behaviour with MOLLIFIER_ENABLED=0 (default) is byte-identical to main.
With MOLLIFIER_ENABLED=1 and the flag off, only mollifier.would_mollify
logs fire (no buffer state). With the flag on, dual-write activates.

Includes two opt-in *.fuzz.test.ts suites (gated on FUZZ=1) that
randomise operation sequences against evaluateTrip and the drainer to
find timing edges. They are clearly marked TEMPORARY in their headers.
- changeset: drop "deferred" wording — phase-1 actively dual-writes + runs
  the drainer ack loop.
- worker.server.ts: wrap mollifier drainer init in try/catch + register
  SIGTERM/SIGINT handlers so the polling loop stops cleanly on shutdown.
- bufferedTriggerPayload: only serialise idempotencyKeyExpiresAt when an
  idempotencyKey is present (avoid impossible orphan-expiry payloads).
- mollifierTelemetry: narrow recordDecision reason to DecisionReason union
  to keep OTEL attribute cardinality bounded.
- mollifierGate: rename resolveOrgFlag → resolveFlag. The underlying
  FeatureFlag table is global by key, so the "org" prefix was misleading;
  per-org gating is out of scope for phase-1.
- tests: drop vi.fn mocks. mollifierGate now uses plain closure spies;
  mollifierTripEvaluator runs against a real MollifierBuffer backed by a
  redisTest container (closed client exercises the fail-open path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…stacking

Worker.init() is called per request from entry.server.tsx, so the
process.once SIGTERM/SIGINT pair added in 98c1520 would stack a fresh
listener every request under dev hot-reload (process.once only removes
after firing). Gate registration on a process-global flag, matching the
existing __worker__ pattern.
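
A minimal sketch of the once-per-process guard, assuming a hypothetical flag name `__mollifierShutdownRegistered__` and a small `SignalTarget` interface so the example does not depend on Node typings. In the webapp the target would be `process`.

```typescript
// Minimal emitter shape so the sketch does not depend on Node typings;
// in the webapp the target would be `process`.
interface SignalTarget {
  once(event: string, listener: () => void): unknown;
}

// Hypothetical flag name; the commit only says a process-global flag is used.
const globalFlags = globalThis as typeof globalThis & {
  __mollifierShutdownRegistered__?: boolean;
};

// Returns true only on the call that actually registered the listeners.
function registerShutdownHandlersOnce(target: SignalTarget, onShutdown: () => void): boolean {
  if (globalFlags.__mollifierShutdownRegistered__) return false;
  globalFlags.__mollifierShutdownRegistered__ = true;
  target.once("SIGTERM", onShutdown);
  target.once("SIGINT", onShutdown);
  return true;
}
```

The flag lives on `globalThis` rather than in module scope because dev hot-reload re-evaluates modules, which would reset a module-level variable.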

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion.featureFlags

The mollifier gate's resolveOrgFlag was a global feature-flag lookup
named as if org-scoped. Phase-1 plan and design doc both intended
per-org gating; the implementation regressed because the global
flag() helper has no orgId parameter.

Adopt the existing per-org feature-flag pattern (used by canAccessAi,
canAccessPrivateConnections, compute beta gating): pass
`Organization.featureFlags` through as `flag()` overrides. Per-org
opt-in is now admin-toggleable via the existing
Organization.featureFlags JSON column — no schema migration needed.

- mollifierGate: revert resolveFlag/flagEnabled back to
  resolveOrgFlag/orgFlagEnabled (the name now matches reality).
  GateInputs gains `orgFeatureFlags`; the default resolver passes
  them as overrides to `flag()`.
- triggerTask.server.ts: thread `environment.organization.featureFlags`
  into the gate call.
- tests: three new postgresTest cases exercise the real DB-backed
  resolveOrgFlag end-to-end, proving (a) per-org opt-in isolation,
  (b) unrelated beta flags don't bleed across, (c) per-org overrides
  take precedence over the global FeatureFlag row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nect

The unit cascade tests in mollifierGate.test.ts import the gate module,
which transitively pulls in ~/db.server. That module constructs the
prisma singleton at import time and eagerly calls $connect(), which
fails against localhost:5432 in the unit-test shard and surfaces as an
unhandled rejection that fails the whole vitest run. Mocking the module
keeps the cascade tests pure and leaves the postgresTest cases on the
testcontainer-fixture prisma untouched.
- Gate drainer init on WORKER_ENABLED so only worker replicas run the polling loop.
- Update the enqueueSystem TTL comment now that delayed/pending-version are first enqueues.
- Correct the mollifier gate docstring to describe the fixed-window counter and tripped-key rearm.
- Swap findUnique for findFirst in the trigger task test to match the webapp Prisma rule.
…eFlags

The gate's `GateInputs` now requires `orgFeatureFlags`, but the surface type used by the trigger service was still the pre-org-scope shape, so the default evaluator wasn't assignable and the call site couldn't pass the flag overrides.
…est startup

The per-org isolation suite uses `postgresTest`, which spins up a fresh Postgres testcontainer per case. On CI the 5s vitest default regularly times out on container start before the test body runs. Match the 30s `vi.setConfig` used by other postgresTest suites in this app.
…rrors

resolveOrgFlag now checks the per-org Organization.featureFlags override
in-memory before falling back to the global flag() helper, so the common
per-org enablement path resolves without a Prisma round-trip on every
trigger call. evaluateGate also wraps the flag resolution in try/catch
and fails open to false on error, mirroring the trip evaluator.
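
The resolution order might be sketched as follows. The signature is hypothetical and synchronous for brevity: `globalLookup` stands in for the global `flag()` helper, which in the real code involves a Prisma round-trip.

```typescript
// Hypothetical signature; the real helper reads Organization.featureFlags and flag().
type OrgFeatureFlags = Record<string, unknown> | null | undefined;

function resolveOrgFlag(orgFeatureFlags: OrgFeatureFlags, globalLookup: () => boolean): boolean {
  try {
    const override = orgFeatureFlags?.["mollifierEnabled"];
    // A boolean per-org override wins without touching the database.
    if (typeof override === "boolean") return override;
    return globalLookup();
  } catch {
    // Fail open: a flag-resolution error must never block the trigger path.
    return false;
  }
}
```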
…exit

Pass a configurable timeout to drainer.stop() so SIGTERM/SIGINT can't hang
forever if an in-flight handler is wedged. Matches the precedent set by
BATCH_TRIGGER_WORKER_SHUTDOWN_TIMEOUT_MS (default 30s).
processOneFromEnv now catches buffer.pop() failures so one env's hiccup
doesn't reject Promise.all and bubble up to the loop's outer catch. The
polling loop itself wraps each runOnce in try/catch and backs off with
capped exponential delay (up to 5s) instead of exiting permanently on the
first listEnvs/pop error. Stop semantics are unchanged: only the stopping
flag breaks the loop.
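
The capped exponential delay can be expressed as a small pure function. Only the 5s cap comes from the commit message; the 100ms base and the doubling factor are assumptions for illustration.

```typescript
// Assumed base delay and doubling factor; only the 5s cap comes from the commit message.
function backoffMs(consecutiveErrors: number, baseMs = 100, capMs = 5000): number {
  // No delay while the loop is healthy.
  if (consecutiveErrors <= 0) return 0;
  // Double per consecutive failure, never exceeding the cap.
  return Math.min(capMs, baseMs * 2 ** (consecutiveErrors - 1));
}
```

The loop would reset its error counter to zero after the first successful tick, so the delay drops back to nothing as soon as the buffer is reachable again.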

Adds two regression tests using a stub buffer (no Redis container) so
fault injection is deterministic.
The phase-1 scaffolding referenced MollifierBuffer, getMollifierBuffer,
and deserialiseMollifierSnapshot without importing them — CI typecheck
fails with TS2304. The runtime path is gated behind MOLLIFIER_ENABLED=0
so this never produced a runtime symptom, but the types must resolve.
…luator fail-open

The TripDecision header comment claimed each webapp instance maintained
its own rate counter, which is wrong. evaluateTrip writes to mollifier:rate:${envId}
with no per-instance prefix, so all replicas pointing at the same Redis
share the key. The threshold is the fleet-wide ceiling.

Also wrap d.evaluator() in evaluateGate in try/catch so a throwing
evaluator falls back to no-divert. The default createRealTripEvaluator
catches its own errors, but the contract should be symmetric with the
already-wrapped resolveOrgFlag call so a future evaluator can't break
the trigger hot path's fail-open contract.
The two notes describe the same PR's behaviour from two angles; merging
them into one entry gives a cleaner changelog line and matches how the
PR is presented to reviewers.
drainer.fuzz.test.ts and evaluateTrip.fuzz.test.ts are valuable as
ongoing property checks but aren't load-bearing for the phase-1 review.
Moving them to a follow-up keeps this PR smaller without losing coverage
of the production paths (buffer.test.ts and drainer.test.ts together
cover the contract surface).
The enqueueSystem.ts comment touch-up was an unrelated drive-by during
phase-1 review and doesn't belong in this PR. Will land separately.
External changelog readers don't have context on internal phase numbering;
describe the feature itself (opt-in burst protection, default-off env vars,
shadow mode, dual-write activation) instead of "phase 1".
…iversion

The previous wording implied the buffer/drainer was active protection;
in this release they're audit-only. Spell out that no trigger calls are
diverted or rate-limited yet, and that active smoothing follows later.
MollifierEvaluateGate and MollifierGetBuffer were defined in the
consumer (triggerTask.server.ts) but described the surface of the gate
and the buffer accessor respectively. Move each to the module that
owns the underlying implementation so the type lives with the producer,
not the caller. No behavioural change.
…-redis fallback

Two operational guards for misconfigured rollouts:

1. Drop the MOLLIFIER_REDIS_* fallback to the main REDIS_* cluster.
   The mollifier writes to a dedicated Redis to keep burst traffic off
   the engine's primary queue — silently colocating with the main Redis
   when MOLLIFIER_REDIS_HOST is unset defeats the design.

2. Degrade gracefully instead of crashing the pod. If MOLLIFIER_ENABLED
   was flipped on without setting MOLLIFIER_REDIS_HOST, the buffer
   returns null (with a one-shot warn log per process) and the drainer
   no-ops. No crash loops, no failed deploys, no traffic impact —
   operators see the warn line and fix the misconfig in a follow-up
   deploy.

The drainer's previously-unreachable "env vars inconsistent" throw
becomes reachable in this degraded mode; replace it with a null return
so worker.server.ts's existing null check short-circuits cleanly.
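
The degraded mode could look roughly like this. `createBufferAccessor`, `BufferConfig`, and `BufferHandle` are hypothetical names; only the "return null and warn once per process" behaviour is taken from the commit message.

```typescript
// Hypothetical factory; only the "null plus one-shot warning" behaviour is from the commit.
type BufferConfig = { enabled: boolean; redisHost?: string };
type BufferHandle = { host: string };

function createBufferAccessor(
  config: BufferConfig,
  warn: (message: string) => void
): () => BufferHandle | null {
  let handle: BufferHandle | null = null;
  let warned = false;

  return () => {
    if (!config.enabled) return null;
    if (!config.redisHost) {
      // Misconfigured rollout: degrade to "no buffer" instead of throwing.
      if (!warned) {
        warned = true;
        warn("mollifier enabled but no dedicated Redis host configured; buffer disabled");
      }
      return null;
    }
    // Lazily create the handle on first successful access.
    handle ??= { host: config.redisHost };
    return handle;
  };
}
```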
mollifier:envs is a Redis SET that grows with the count of envs that
currently have buffered entries. Under normal operation that's small,
but an extended drainer outage can leave entries piled up across
thousands of envs — at which point runOnce would queue one
processOneFromEnv per env through pLimit, ballooning per-tick latency
and event-loop queue depth.

Cap per-tick fan-out at MOLLIFIER_DRAIN_MAX_ENVS_PER_TICK (default 500).
When the set fits within the cap, behaviour is unchanged (take all,
rotate cursor by 1 for fairness). When the set exceeds the cap, take a
rotating slice and advance the cursor by the slice size so successive
ticks sweep through the full set.

Tests use a stub buffer to drive listEnvs() deterministically with
thousands of envs without provisioning a real Redis.
…tchQueue

The MollifierDrainer's stop() was polling `isRunning` every 20ms until
the loop exited, which differs from the codebase's convention for
similar polling loops (FairQueue, BatchQueue both hold the loop promise
as a field and await it directly on stop).

Switch to the same pattern: store the loop promise on start(), then in
stop() race it against the timeout via Promise.race. With no timeout we
just await the loop directly. With a timeout the warn-and-return
behaviour is unchanged. No polling, no separate `isRunning` poll loop.
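
A minimal sketch of the stored-promise pattern, assuming a hypothetical `LoopRunner` class. The boolean return value is an illustrative addition for testability; the commit describes a warn-and-return on timeout.

```typescript
// Hypothetical minimal loop owner; only the stop() shape mirrors the commit message.
class LoopRunner {
  private loopPromise: Promise<void> | undefined;
  private stopping = false;

  constructor(private readonly tick: () => Promise<void>) {}

  start(): void {
    if (this.loopPromise) return; // a second start() is a no-op
    this.stopping = false;
    this.loopPromise = this.run();
  }

  private async run(): Promise<void> {
    while (!this.stopping) {
      await this.tick();
    }
  }

  // Resolves to true if the loop exited, false if the timeout won the race.
  async stop(timeoutMs?: number): Promise<boolean> {
    const loop = this.loopPromise;
    if (!loop) return true; // never started, or already stopped
    this.stopping = true;

    if (timeoutMs === undefined) {
      await loop;
      this.loopPromise = undefined;
      return true;
    }

    let timer: ReturnType<typeof setTimeout> | undefined;
    const timedOut = new Promise<false>((resolve) => {
      timer = setTimeout(() => resolve(false), timeoutMs);
    });
    const exited = await Promise.race([loop.then(() => true as const), timedOut]);
    clearTimeout(timer);
    if (exited) this.loopPromise = undefined;
    return exited;
  }
}
```

Awaiting the loop promise directly means `stop()` resolves the moment the loop exits, rather than up to one poll interval later.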

Behaviour is identical to the previous implementation, including the
hung-handler timeout path (covered by the existing
"stop returns after timeoutMs even if a handler is hung" test).
The previous chunking advanced the cursor by sliceSize each tick,
producing fixed disjoint slices like [0..3], [4..7], [0..3], ... With
that pattern env_0 was always at slice position 0 (first into pLimit)
and env_3 always at position 3 (last) — reinstating the head-of-line
bias that rotation was meant to prevent.

Advance the cursor by 1 instead. Slices now overlap across consecutive
ticks (e.g. [0..3], [1..4], [2..5], ...) so every env reaches every
slot position 0..sliceSize-1 across one envs.length-tick cycle.

Drainage rate per env is unchanged: each env still appears in exactly
sliceSize of every envs.length ticks. New regression test pins the
fairness property by asserting each env touches every slot at least
once per cycle.
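
The overlapping-slice rotation can be sketched as a pure helper. `takeRotatingSlice` is a hypothetical name, and the sketch assumes the env list is stable across ticks.

```typescript
// Hypothetical helper; returns the items to visit this tick and the next cursor.
function takeRotatingSlice<T>(
  items: readonly T[],
  cursor: number,
  sliceSize: number
): { slice: T[]; nextCursor: number } {
  const n = items.length;
  if (n === 0) return { slice: [], nextCursor: 0 };

  const size = Math.min(sliceSize, n);
  const start = cursor % n;
  const slice: T[] = [];
  for (let i = 0; i < size; i++) {
    // Wrap around the end of the list so the slice is always full.
    slice.push(items[(start + i) % n] as T);
  }
  // Advance by 1 (not by the slice size) so slot positions rotate across ticks.
  return { slice, nextCursor: (start + 1) % n };
}
```

With eight envs and a slice size of four, consecutive ticks visit indices [0..3], [1..4], [2..5], and so on, which is the overlap pattern described above.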
…y envs

Adds a regression test that proves a light env (single buffered entry)
is drained within (envs.length - sliceSize + 1) ticks regardless of how
many entries the heavy envs have queued. The test uses a stub buffer
whose listEnvs/pop pair mirrors the production atomic-Lua semantic: an
env disappears from listEnvs the moment its queue empties, so the light
env exits the rotation as soon as it's popped — while the heavy envs
stay in the rotation until their thousands of entries are drained.

Together with the head-of-line fairness test this pins both fairness
properties: (1) every env touches every slice slot per cycle (no
within-slice bias), and (2) no env's drainage latency depends on the
queue depth of other envs (no across-slice starvation).
…eckedIndexedAccess

The fairness test compared popsPerTick[0][0] vs popsPerTick[1][0]
directly. Under the redis-worker package's strict tsconfig
(noUncheckedIndexedAccess implied), array index access returns T |
undefined, which trips TS2532. Destructure into named locals and use
optional chaining: same assertion, no `!` non-null soup.
1. start() resets envCursor to 0 — new behaviour. A stop+start cycle now
   begins rotation cleanly from envs[0] rather than inheriting between-
   restart cursor drift.
2. Malformed payload → non-retryable handler error path. Pins that the
   deserialise failure goes terminal without invoking the handler.
3. Ack failure after handler success — documents the current behavioural
   gap. ack() lives inside processEntry's try, so a Redis blip on ack
   routes a successfully-handled entry through the retry/terminal path.
   Phase 2's engine-replay handler will need idempotency to absorb the
   re-execution, OR ack should be lifted out of the try block.
4. start() idempotency — second call is a no-op (no doubled loop).
5. stop() idempotency — safe to call when never started or twice.
6. Loop-level backoff actually grows on consecutive runOnce failures
   and resets on first success. Distinct from per-entry retry attempts
   already covered elsewhere; this is the consecutiveErrors counter
   that drives backoffMs between ticks.

Also adds org-level fairness analogue of the existing env starvation
test: a light org (1 env, 1 entry) is not starved behind a heavy org
with many envs and many entries. The buffer doesn't track orgs as a
separate axis, so org fairness is an emergent property of env rotation
— the test pins that property explicitly.
…el fairness

Previously the drainer rotated per-env: an org with N busy envs got N
scheduling slots per tick. A noisy tenant with many envs would drain
proportionally faster than a quiet tenant with one env. Switch to
hierarchical rotation: pick orgs round-robin (capped by maxOrgsPerTick),
then pick one env per picked org (also rotating).

Implementation is drainer-side only — no buffer or Lua changes. The
drainer caches envId→orgId from popped entries; envs not yet cached are
treated as their own pseudo-org for one tick, so cold start matches the
old per-env behaviour and converges to hierarchical once cache is hot
(usually within one tick). Cache and cursors reset on start() alongside
the existing cursor reset.

API change: maxEnvsPerTick → maxOrgsPerTick on MollifierDrainerOptions,
MOLLIFIER_DRAIN_MAX_ENVS_PER_TICK → MOLLIFIER_DRAIN_MAX_ORGS_PER_TICK on
the webapp env. Same default (500). Operators tune for "typical orgs
with pending entries" rather than env count.

Trade-off: total per-tick pops drop from O(envs) to O(orgs). For an org
with N envs, each env's individual drainage rate is 1/N of what it was,
but the tenant overall is bounded the same way as a single-env tenant —
which is the fairness contract.

Tests:
- Renamed maxEnvsPerTick references throughout existing tests; old
  behaviour still holds at cold cache (each env = pseudo-org).
- New "heavy org with many envs does not dominate vs light org" pins
  the post-warm-up ~1:1 drainage ratio between a 6-env org and a 1-env
  org over a sustained 20-tick run.
- New "within an org, envs are rotated round-robin across ticks" pins
  the inner env cursor's behaviour for a single multi-env org.
- Cursor-reset test renamed and now asserts cache+cursors all reset.

Also removed an outdated test-count comment in
apps/webapp/test/engine/triggerTask.test.ts that listed "four tests"
when reality has moved on.
…uage)

The changeset accreted across the PR's evolution and ended up reading as
three deltas ("now survives", "is now two-level", "no longer scales").
On merge this is the introduction of the feature — there's no prior
state to contrast against. Rewrite as one cohesive description of what
ships.
…rness

Previously the drainer cached envId→orgId from popped entries and used a
sentinel pseudo-org for envs it hadn't seen yet. The sentinel polluted
the bucket map with fake org IDs and was a foreseeable source of bugs.

This commit moves org membership into the buffer's atomic Lua scripts.
New Redis keys, both maintained transactionally alongside per-env queues:
- mollifier:orgs — orgs with at least one queued env
- mollifier:org-envs:${orgId} — envs of that org with queued entries

acceptMollifierEntry SADDs into all three sets (envs + orgs + org-envs).
popAndMarkDraining cleans up envs+orgs+org-envs together when the queue
empties in the success branch (we know orgId from the popped entry). The
no-runId branch can't read orgId so it only cleans envs — stale org-envs
entries are bounded by env count and recovered on the next accept.
requeueMollifierEntry re-SADDs all three since the env may have just been
pruned.

The drainer now walks listOrgs() → listEnvsForOrg(org) → pop(env) with
two cursors: orgCursor across all active orgs and a per-org envCursor
for round-robin within each org. No client-side cache, no sentinel,
deterministic from the first tick.
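
The two-cursor walk might be sketched like this. `TwoLevelScheduler` and `OrgEnvSource` are hypothetical names, the source is synchronous for brevity, and advancing the org cursor by one per tick is an assumption rather than something the commit states.

```typescript
// Hypothetical scheduler; listOrgs/listEnvsForOrg mirror the buffer methods named above.
interface OrgEnvSource {
  listOrgs(): string[];
  listEnvsForOrg(orgId: string): string[];
}

class TwoLevelScheduler {
  private orgCursor = 0;
  private readonly perOrgEnvCursors = new Map<string, number>();

  constructor(
    private readonly source: OrgEnvSource,
    private readonly maxOrgsPerTick: number
  ) {}

  // Picks at most one env per org for this tick.
  pickEnvs(): string[] {
    const orgs = this.source.listOrgs();
    if (orgs.length === 0) return [];

    const count = Math.min(this.maxOrgsPerTick, orgs.length);
    const start = this.orgCursor % orgs.length;
    const picked: string[] = [];

    for (let i = 0; i < count; i++) {
      const orgId = orgs[(start + i) % orgs.length] as string;
      const envs = this.source.listEnvsForOrg(orgId);
      if (envs.length === 0) continue;
      // Round-robin within the org, independent of other orgs.
      const envCursor = (this.perOrgEnvCursors.get(orgId) ?? 0) % envs.length;
      picked.push(envs[envCursor] as string);
      this.perOrgEnvCursors.set(orgId, envCursor + 1);
    }

    // Assumed: rotate the starting org by one each tick.
    this.orgCursor = (start + 1) % orgs.length;
    return picked;
  }
}
```

Each org contributes one pop per tick no matter how many envs it owns, which is the fairness contract the hierarchical rotation is meant to provide.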

Tests updated:
- multi-org-round-robin (was multi-env-round-robin): two orgs with one
  and two envs respectively, asserts org_B drains its only env each
  tick while org_A rotates through its two.
- concurrency-cap test spreads 12 envs across 12 orgs (otherwise one
  org → one pop per tick).
- "heavy org doesn't dominate vs light org" gets explicit listOrgs /
  listEnvsForOrg from the test's env→org map; assertion tightened to
  0.7–1.5 ratio over 20 ticks.
- "within an org envs rotated round-robin" gets explicit listEnvsForOrg.
- "envCursor resets" → "rotation cursors reset"; cache is gone, only
  orgCursor and perOrgEnvCursors reset on start().
- makeStubBuffer auto-derives listOrgs/listEnvsForOrg from listEnvs
  (each env as its own org) so tests that don't care about org grouping
  don't need to provide them explicitly.

24/24 drainer tests pass, 35/35 buffer tests pass (some redis-container
flakes under full-suite load; all green in isolation). Webapp typecheck
clean.

d-cs commented May 26, 2026

Closing as I've created split PRs for easier review

@d-cs d-cs closed this May 26, 2026
d-cs added a commit that referenced this pull request May 26, 2026
…ity envId from realtime counter

Three CodeRabbit findings from #3709, re-raised on #3757:

- resources.taskruns.$runParam.debug.ts: buffered fallback returned the
  run's queue / concurrencyKey / queueTimestamp from the snapshot
  without verifying org membership. Any authenticated user who knew a
  friendlyId could read those fields across orgs. Now joins through
  orgMember the same way the PG path does and 404s on miss.
- resources.runs.$runParam.logs.download.ts: same shape — the buffered
  placeholder leaked runId existence to non-members on direct URL
  access. Same orgMember check now gates the buffered branch.
- mollifierTelemetry.server.ts: recordRealtimeBufferedSubscription was
  attaching envId (a UUID) as an OTEL counter dimension, violating the
  project's "no high-cardinality IDs in metric attributes" guideline.
  Dropped the parameter; the call site's logger.info still emits envId.
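
The membership gate on the buffered branch can be sketched as below. The types and the `readBufferedDebug` name are hypothetical, and the set of member org IDs stands in for the `orgMember` join used by the real routes.

```typescript
// Hypothetical shapes; the real routes join through orgMember in Prisma.
type BufferedSnapshot = { runId: string; orgId: string; queue: string };
type DebugResult = { status: 200; queue: string } | { status: 404 };

function readBufferedDebug(
  snapshot: BufferedSnapshot | undefined,
  memberOrgIds: ReadonlySet<string>
): DebugResult {
  // A missing snapshot and a non-member both yield 404, so existence is not leaked.
  if (!snapshot || !memberOrgIds.has(snapshot.orgId)) return { status: 404 };
  return { status: 200, queue: snapshot.queue };
}
```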

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d-cs added a commit that referenced this pull request May 26, 2026
…ity envId from realtime counter

Three CodeRabbit findings from #3709, re-raised on #3757:

- resources.taskruns.$runParam.debug.ts: buffered fallback returned the
  run's queue / concurrencyKey / queueTimestamp from the snapshot
  without verifying org membership. Any authenticated user who knew a
  friendlyId could read those fields across orgs. Now joins through
  orgMember the same way the PG path does and 404s on miss.
- resources.runs.$runParam.logs.download.ts: same shape — the buffered
  placeholder leaked runId existence to non-members on direct URL
  access. Same orgMember check now gates the buffered branch.
- mollifierTelemetry.server.ts: recordRealtimeBufferedSubscription was
  attaching envId (a UUID) as an OTEL counter dimension, violating the
  project's "no high-cardinality IDs in metric attributes" guideline.
  Dropped the parameter; the call site's logger.info still emits envId.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d-cs added a commit that referenced this pull request May 26, 2026
…ity envId from realtime counter

Three CodeRabbit findings from #3709, re-raised on #3757:

- resources.taskruns.$runParam.debug.ts: buffered fallback returned the
  run's queue / concurrencyKey / queueTimestamp from the snapshot
  without verifying org membership. Any authenticated user who knew a
  friendlyId could read those fields across orgs. Now joins through
  orgMember the same way the PG path does and 404s on miss.
- resources.runs.$runParam.logs.download.ts: same shape — the buffered
  placeholder leaked runId existence to non-members on direct URL
  access. Same orgMember check now gates the buffered branch.
- mollifierTelemetry.server.ts: recordRealtimeBufferedSubscription was
  attaching envId (a UUID) as an OTEL counter dimension, violating the
  project's "no high-cardinality IDs in metric attributes" guideline.
  Dropped the parameter; the call site's logger.info still emits envId.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d-cs added a commit that referenced this pull request May 26, 2026
…ity envId from realtime counter

Three CodeRabbit findings from #3709, re-raised on #3757:

- resources.taskruns.$runParam.debug.ts: buffered fallback returned the
  run's queue / concurrencyKey / queueTimestamp from the snapshot
  without verifying org membership. Any authenticated user who knew a
  friendlyId could read those fields across orgs. Now joins through
  orgMember the same way the PG path does and 404s on miss.
- resources.runs.$runParam.logs.download.ts: same shape — the buffered
  placeholder leaked runId existence to non-members on direct URL
  access. Same orgMember check now gates the buffered branch.
- mollifierTelemetry.server.ts: recordRealtimeBufferedSubscription was
  attaching envId (a UUID) as an OTEL counter dimension, violating the
  project's "no high-cardinality IDs in metric attributes" guideline.
  Dropped the parameter; the call site's logger.info still emits envId.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d-cs added a commit that referenced this pull request May 26, 2026
…ity envId from realtime counter

Three CodeRabbit findings from #3709, re-raised on #3757:

- resources.taskruns.$runParam.debug.ts: buffered fallback returned the
  run's queue / concurrencyKey / queueTimestamp from the snapshot
  without verifying org membership. Any authenticated user who knew a
  friendlyId could read those fields across orgs. Now joins through
  orgMember the same way the PG path does and 404s on miss.
- resources.runs.$runParam.logs.download.ts: same shape — the buffered
  placeholder leaked runId existence to non-members on direct URL
  access. Same orgMember check now gates the buffered branch.
- mollifierTelemetry.server.ts: recordRealtimeBufferedSubscription was
  attaching envId (a UUID) as an OTEL counter dimension, violating the
  project's "no high-cardinality IDs in metric attributes" guideline.
  Dropped the parameter; the call site's logger.info still emits envId.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d-cs added a commit that referenced this pull request May 27, 2026
…ity envId from realtime counter

Three CodeRabbit findings from #3709, re-raised on #3757:

- resources.taskruns.$runParam.debug.ts: buffered fallback returned the
  run's queue / concurrencyKey / queueTimestamp from the snapshot
  without verifying org membership. Any authenticated user who knew a
  friendlyId could read those fields across orgs. Now joins through
  orgMember the same way the PG path does and 404s on miss.
- resources.runs.$runParam.logs.download.ts: same shape — the buffered
  placeholder leaked runId existence to non-members on direct URL
  access. Same orgMember check now gates the buffered branch.
- mollifierTelemetry.server.ts: recordRealtimeBufferedSubscription was
  attaching envId (a UUID) as an OTEL counter dimension, violating the
  project's "no high-cardinality IDs in metric attributes" guideline.
  Dropped the parameter; the call site's logger.info still emits envId.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>