Skip to content

feat(ipc): async server handlers + server-side per-fork ordering#24210

Closed
charlielye wants to merge 13 commits into
cl/ipc-wsdb-concurrent-serverfrom
cl/ipc-wsdb-server-ordering
Closed

feat(ipc): async server handlers + server-side per-fork ordering#24210
charlielye wants to merge 13 commits into
cl/ipc-wsdb-concurrent-serverfrom
cl/ipc-wsdb-server-ordering

Conversation

@charlielye

@charlielye charlielye commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

What

Moves request ordering from the client into the database. Previously each TS client serialized its own reads/writes per fork (WorldStateOpsQueue); a second async client could corrupt state by skipping that contract. This makes the wsdb server own per-fork ordering, built on an async server-handler codegen pattern adopted symmetrically across all four languages.

Stacked on #24198 (concurrent reactor).

Codegen — async server handlers (C++, Rust, Zig; TS already async)

The generated server dispatch hands each handler the decoded command plus a respond callback, instead of returning a value. A handler may run inline or defer to a thread pool and call respond when ready. The wire protocol is unchanged, so the cross-language interop matrix and golden fixtures are untouched.

  • C++: void handle_x(Ctx&, Cmd&&, Responder<Resp>); make_*_handler returns void(span, RawRespond); serve() drives it via run_reactor.
  • Rust: Handler trait methods take Responder<Resp>; handle_request collects synchronously so the runtime/echo glue is unchanged.
  • Zig: dispatch hands handlers a Responder (ok/err); handleRequest unchanged.
  • TS: already Promise-returning — no change.

Each language's echo server updated to the new shape (~6 handlers each).

wsdb — server-side per-fork ordering

Each of the 40 handlers declares its own ordering by calling schedule_read (reads run concurrently; committed reads bypass ordering) or schedule_write (exclusive on its fork), extracting its fork from the typed command. WsdbScheduler implements the read-batch / write-barrier model per fork, with an inline fast path when idle and a dispatch pool distinct from WorldState's intra-op pool. The hand-rolled request classifier is gone — read/write + fork come from the typed handler.

Testing

  • Cross-language echo matrix (build + interop + goldens 22/22) green across C++/Rust/Zig/TS.
  • aztec-wsdb builds; ipc-runtime reactor/echo gtests green.
  • native_world_state.test.ts (50 tests incl. Concurrent requests, which asserts read-your-writes ordering) green against the async server.
  • parallel_read.bench SHM: ~3.7x read scaling (8.1k → 30.4k reads/s, c=1 → c=16).
  • Full root bootstrap: wsdb + yarn-project + bb build clean.

Client queue removed

With the server owning per-fork ordering, the TS client's per-fork WorldStateOpsQueue is now redundant and has been deleted: IpcWorldState sends every op immediately and the server serializes writes / parallelizes reads. With the client queue gone, native_world_state.test.ts's Concurrent requests test now genuinely exercises server-side read/write ordering (and passes), and the TS-consumer no longer pays the queue's per-op overhead.

🤖 Generated with Claude Code

https://claude.ai/code/session_01XfznXp9wzGWLU6LC8EHtpi

A sequential-insert response can run to ~1.5 MB, but the default SHM rings
were 1 MB, so the response could never be sent and the client hung forever
(no error surfaced: the server crashed to a discarded stderr and the SHM
client has no peer-disconnect signal). UDS streamed it fine, which is why this
only bit the SHM path, and bb.js never hit it (tiny responses, SPSC only).

- Size the spawned aztec-wsdb SHM rings to 32 MiB request+response (an SHM
  frame is capped at half the ring, so this gives ~16 MiB headroom). Keep the
  WSDB_TRANSPORT env gate (default uds) for opting into SHM.
- SpscShm::create: gate the pre-fault memset to small rings so large rings
  stay demand-paged instead of forcing the whole mapping resident.
- IpcServer::run(): catch handler/send failures and shut down cleanly with a
  logged reason instead of letting an uncaught exception reach std::terminate.
- Generated spawned-service backend: capture the child's stdout/stderr to a
  temp logfile (an fd, not a pipe, to preserve clean process exit) and, on
  unexpected child exit, reject in-flight calls surfacing the log path — so a
  server crash is a clear error rather than a silent hang over SHM.
- Add single-client MPSC pipelined-flood and burst tests; the SPSC grind never
  exercised the MPSC path.

Verified: world-state native suite 50/50 over both SHM and UDS; avm_bulk
(TS world state client 0 + C++ AVM client 1 on one aztec-wsdb) passes over
SHM; no perf regression vs UDS.
The IPC-backed wsdb served all requests through a single-threaded run()
loop, so parallel reads were capped at the single-thread dispatch rate
(~15-16k reads/s regardless of concurrency) where the in-process world
state scaled with cores. AVM proving runs N simulator processes against
the wsdb, so this serialization is a throughput regression.

Add IpcServer::run_reactor(handler, executor): a reactor that owns all
ring/socket I/O and never blocks on a handler. It copies each request
out, releases the slot, assigns a per-connection sequence, and hands the
handler to a caller-supplied executor (a thread pool for wsdb, inline for
serial callers). Workers post completed responses to a per-connection
reorder stash and notify() the reactor, which is the sole sender, so the
response ring stays single-producer/lock-free and per-connection FIFO is
preserved with no wire/request-id change. ipc-runtime owns no pool and
spawns no thread; concurrency is purely the caller's executor choice, so
the serial run() path (e.g. bb) is untouched.

notify() wakes the reactor without a timeout poll: sockets use a
self-pipe in the epoll/kqueue set; MPSC-SHM bumps the doorbell seq then
futex_wakes (mirroring publish, so the wake is not lost against the
consumer's armed futex), and the consumer's spin/wait checks the
completion predicate inside the same seq-latched window. The wsdb wires a
dispatch ThreadPool distinct from WorldState's intra-op pool (mutating
handlers wait on the latter; sharing could deadlock) and raises
max_shm_clients to cover the AVM pool.

Parallel-read throughput now scales ~5x (UDS to ~29k, SHM to ~32k at
concurrency 16) instead of staying flat. Single in-flight reads pay a
thread-handoff latency (crossover ~concurrency 3).
run_reactor's dispatch model pays a thread-handoff + wakeup per request,
which is pure latency when there is no concurrency to exploit: a single
in-flight / sequential reader measured ~25-35% slower than the serial
run() and the in-process baseline (crossover around concurrency 3).

Add an inline fast path: when nothing is in flight AND no further request
is already pending, run the handler on the reactor thread instead of
dispatching it. The `inflight == 0` test short-circuits the pending check,
so a burst (which keeps inflight > 0 once started) never pays for it and
stays fully on the dispatch path — preserving the high-concurrency
scaling. The pending check is has_pending_request(): sockets poll via
wait_for_data(0) (epoll is stateless), while MPSC-SHM uses a new
side-effect-free MpscConsumer::has_data() so the frequent peek doesn't
disturb the round-robin cursor or adaptive-spin state (a wait_for_data(0)
peek there measurably depressed high-concurrency throughput).

This restores ~in-process parity at concurrency 1-2 (~7.5-8.4k reads/s,
up from ~5.5k) while keeping concurrent throughput unchanged (SHM ~30-33k,
UDS ~28-29k at concurrency 16).
parallel_read.bench pipelines N reads down one connection (the TS
world-state client's pattern). The AVM proving path is different: one
aztec-wsdb with N independent sim connections, each issuing sequential
reads. Add a bench that models that — N AsyncApi clients, one in-flight
read each — to measure what the simulator pool actually sees.

The N-connection topology scales better than one pipelined connection,
since load spreads across N request/response rings instead of contending
on one: SHM reaches ~40-47k reads/s at 7 connections (at/above the
in-process baseline), UDS ~30k. IPC-only; skips on the in-process build.
…-process comparison

The raw AsyncApi path skips the WorldStateOpsQueue + facade (~40% overhead)
that the in-process numbers go through, so these figures measure server
capability, not an in-process win. Document the 7-connection cap as the real
SHM ceiling (8 slots minus the world-state client's).
…/Rust)

Move the server dispatch to an asynchronous handler pattern so the database,
not the client, owns request ordering. A handler receives the decoded command
plus a `respond` callback and may run inline or defer to a thread pool, calling
respond when the result is ready. The wire protocol is unchanged.

Codegen (ipc-codegen):
- C++: dispatch is now async — `void handle_x(Ctx&, Cmd&&, Responder<Resp>)`;
  make_*_handler returns `void(span, RawRespond)`; serve() drives it via
  run_reactor. Responder encodes ok()/error() frames.
- Rust: Handler trait methods take `Responder<Resp>`; handle_request collects a
  synchronous response (runtime/echo glue unchanged) while leaving the handler
  free to defer with an async transport.
- TS already returns Promises (no change). Zig still to do.

wsdb: each of the 40 handlers declares its own ordering — schedule_read (reads
run concurrently; committed reads bypass ordering) or schedule_write (exclusive
per fork) — extracting its fork from the typed command. WsdbScheduler implements
the read-batch / write-barrier model per fork with an inline fast path when
idle; the hand-rolled request classifier is gone (read/write + fork come from
the typed handler). Distinct dispatch pool from WorldState's intra-op pool.

Validated: ipc-runtime reactor/echo gtests green; native_world_state.test.ts
(50 tests incl. Concurrent requests) green against the async server.
Zig server dispatch now hands each handler a typed Responder (ok/err) instead
of returning !Resp, matching the async pattern adopted for C++ and Rust. The
handleRequest entry still collects the response synchronously, so the Zig
runtime glue and echo main() are unchanged. Wire format is unchanged; the
cross-language echo matrix (goldens + interop) passes.
multi_connection_read.bench.test.ts imports @aztec/ipc-runtime to open extra
connections; declare it so eslint import-x/no-extraneous-dependencies passes.
The dispatch pool was sized from std::thread::hardware_concurrency(), which
ignores cgroup CPU limits and returns the host core count (192 in a 2-CPU CI
container). Each aztec-wsdb spawned ~192 dispatch threads instead of the
intended ~16, exhausting the per-UID thread limit (pthread_create EAGAIN ->
libc++abi abort) under the AVM simulator e2e. Size it from the caller-provided
`threads` budget (= getWsdbThreadCount, capped at 16), matching the WorldState
pool.
Read/write ordering is now enforced server-side by the wsdb per-fork
scheduler, so the IPC client no longer needs its own per-fork ordering
queue: every op is sent immediately and the server serializes writes per
fork while running reads concurrently. IpcWorldState.execute() now sends
directly; the per-fork queue map, its lifecycle (stop on close/deleteFork),
and the WorldStateOpsQueue class are deleted. The WorldStateOperationName
label type moves to world_state_operation.ts (still used for
instrumentation). The A-1055 delayed-close-fork regression test drops its
now-obsolete per-fork-queue-cleanup assertion (the silent-dispose check
remains).

native_world_state.test.ts (50 tests) passes against the async server,
including "Concurrent requests" — which, with the client queue gone, now
genuinely exercises server-side per-fork read/write ordering.
@charlielye

Copy link
Copy Markdown
Contributor Author

Superseded by the restack. The reactor (#24198) and ordering work have been combined into a single machinery commit on #24198, which now targets next and sits below the wsdb cutover (#23036). The WorldStateOpsQueue removal moved into the cutover PR. Closing as the commits here are now redistributed across #24198 and #23036.

@charlielye charlielye closed this Jun 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant