feat(ipc): async server handlers + server-side per-fork ordering by charlielye · Pull Request #24210 · AztecProtocol/aztec-packages

charlielye · 2026-06-20T13:14:12Z

What

Moves request ordering from the client into the database. Previously each TS client serialized its own reads/writes per fork (WorldStateOpsQueue); a second async client could corrupt state by skipping that contract. This makes the wsdb server own per-fork ordering, built on an async server-handler codegen pattern adopted symmetrically across all four languages.

Stacked on #24198 (concurrent reactor).

Codegen — async server handlers (C++, Rust, Zig; TS already async)

The generated server dispatch hands each handler the decoded command plus a respond callback, instead of returning a value. A handler may run inline or defer to a thread pool and call respond when ready. The wire protocol is unchanged, so the cross-language interop matrix and golden fixtures are untouched.

C++: void handle_x(Ctx&, Cmd&&, Responder<Resp>); make_*_handler returns void(span, RawRespond); serve() drives it via run_reactor.
Rust: Handler trait methods take Responder<Resp>; handle_request collects synchronously so the runtime/echo glue is unchanged.
Zig: dispatch hands handlers a Responder (ok/err); handleRequest unchanged.
TS: already Promise-returning — no change.

Each language's echo server updated to the new shape (~6 handlers each).

wsdb — server-side per-fork ordering

Each of the 40 handlers declares its own ordering by calling schedule_read (reads run concurrently; committed reads bypass ordering) or schedule_write (exclusive on its fork), extracting its fork from the typed command. WsdbScheduler implements the read-batch / write-barrier model per fork, with an inline fast path when idle and a dispatch pool distinct from WorldState's intra-op pool. The hand-rolled request classifier is gone — read/write + fork come from the typed handler.

Testing

Cross-language echo matrix (build + interop + goldens 22/22) green across C++/Rust/Zig/TS.
aztec-wsdb builds; ipc-runtime reactor/echo gtests green.
native_world_state.test.ts (50 tests incl. Concurrent requests, which asserts read-your-writes ordering) green against the async server.
parallel_read.bench SHM: ~3.7x read scaling (8.1k → 30.4k reads/s, c=1 → c=16).
Full root bootstrap: wsdb + yarn-project + bb build clean.

Client queue removed

With the server owning per-fork ordering, the TS client's per-fork WorldStateOpsQueue is now redundant and has been deleted: IpcWorldState sends every op immediately and the server serializes writes / parallelizes reads. With the client queue gone, native_world_state.test.ts's Concurrent requests test now genuinely exercises server-side read/write ordering (and passes), and the TS-consumer no longer pays the queue's per-op overhead.

🤖 Generated with Claude Code

https://claude.ai/code/session_01XfznXp9wzGWLU6LC8EHtpi

A sequential-insert response can run to ~1.5 MB, but the default SHM rings were 1 MB, so the response could never be sent and the client hung forever (no error surfaced: the server crashed to a discarded stderr and the SHM client has no peer-disconnect signal). UDS streamed it fine, which is why this only bit the SHM path, and bb.js never hit it (tiny responses, SPSC only). - Size the spawned aztec-wsdb SHM rings to 32 MiB request+response (an SHM frame is capped at half the ring, so this gives ~16 MiB headroom). Keep the WSDB_TRANSPORT env gate (default uds) for opting into SHM. - SpscShm::create: gate the pre-fault memset to small rings so large rings stay demand-paged instead of forcing the whole mapping resident. - IpcServer::run(): catch handler/send failures and shut down cleanly with a logged reason instead of letting an uncaught exception reach std::terminate. - Generated spawned-service backend: capture the child's stdout/stderr to a temp logfile (an fd, not a pipe, to preserve clean process exit) and, on unexpected child exit, reject in-flight calls surfacing the log path — so a server crash is a clear error rather than a silent hang over SHM. - Add single-client MPSC pipelined-flood and burst tests; the SPSC grind never exercised the MPSC path. Verified: world-state native suite 50/50 over both SHM and UDS; avm_bulk (TS world state client 0 + C++ AVM client 1 on one aztec-wsdb) passes over SHM; no perf regression vs UDS.

The IPC-backed wsdb served all requests through a single-threaded run() loop, so parallel reads were capped at the single-thread dispatch rate (~15-16k reads/s regardless of concurrency) where the in-process world state scaled with cores. AVM proving runs N simulator processes against the wsdb, so this serialization is a throughput regression. Add IpcServer::run_reactor(handler, executor): a reactor that owns all ring/socket I/O and never blocks on a handler. It copies each request out, releases the slot, assigns a per-connection sequence, and hands the handler to a caller-supplied executor (a thread pool for wsdb, inline for serial callers). Workers post completed responses to a per-connection reorder stash and notify() the reactor, which is the sole sender, so the response ring stays single-producer/lock-free and per-connection FIFO is preserved with no wire/request-id change. ipc-runtime owns no pool and spawns no thread; concurrency is purely the caller's executor choice, so the serial run() path (e.g. bb) is untouched. notify() wakes the reactor without a timeout poll: sockets use a self-pipe in the epoll/kqueue set; MPSC-SHM bumps the doorbell seq then futex_wakes (mirroring publish, so the wake is not lost against the consumer's armed futex), and the consumer's spin/wait checks the completion predicate inside the same seq-latched window. The wsdb wires a dispatch ThreadPool distinct from WorldState's intra-op pool (mutating handlers wait on the latter; sharing could deadlock) and raises max_shm_clients to cover the AVM pool. Parallel-read throughput now scales ~5x (UDS to ~29k, SHM to ~32k at concurrency 16) instead of staying flat. Single in-flight reads pay a thread-handoff latency (crossover ~concurrency 3).

run_reactor's dispatch model pays a thread-handoff + wakeup per request, which is pure latency when there is no concurrency to exploit: a single in-flight / sequential reader measured ~25-35% slower than the serial run() and the in-process baseline (crossover around concurrency 3). Add an inline fast path: when nothing is in flight AND no further request is already pending, run the handler on the reactor thread instead of dispatching it. The `inflight == 0` test short-circuits the pending check, so a burst (which keeps inflight > 0 once started) never pays for it and stays fully on the dispatch path — preserving the high-concurrency scaling. The pending check is has_pending_request(): sockets poll via wait_for_data(0) (epoll is stateless), while MPSC-SHM uses a new side-effect-free MpscConsumer::has_data() so the frequent peek doesn't disturb the round-robin cursor or adaptive-spin state (a wait_for_data(0) peek there measurably depressed high-concurrency throughput). This restores ~in-process parity at concurrency 1-2 (~7.5-8.4k reads/s, up from ~5.5k) while keeping concurrent throughput unchanged (SHM ~30-33k, UDS ~28-29k at concurrency 16).

parallel_read.bench pipelines N reads down one connection (the TS world-state client's pattern). The AVM proving path is different: one aztec-wsdb with N independent sim connections, each issuing sequential reads. Add a bench that models that — N AsyncApi clients, one in-flight read each — to measure what the simulator pool actually sees. The N-connection topology scales better than one pipelined connection, since load spreads across N request/response rings instead of contending on one: SHM reaches ~40-47k reads/s at 7 connections (at/above the in-process baseline), UDS ~30k. IPC-only; skips on the in-process build.

…-process comparison The raw AsyncApi path skips the WorldStateOpsQueue + facade (~40% overhead) that the in-process numbers go through, so these figures measure server capability, not an in-process win. Document the 7-connection cap as the real SHM ceiling (8 slots minus the world-state client's).

…/Rust) Move the server dispatch to an asynchronous handler pattern so the database, not the client, owns request ordering. A handler receives the decoded command plus a `respond` callback and may run inline or defer to a thread pool, calling respond when the result is ready. The wire protocol is unchanged. Codegen (ipc-codegen): - C++: dispatch is now async — `void handle_x(Ctx&, Cmd&&, Responder<Resp>)`; make_*_handler returns `void(span, RawRespond)`; serve() drives it via run_reactor. Responder encodes ok()/error() frames. - Rust: Handler trait methods take `Responder<Resp>`; handle_request collects a synchronous response (runtime/echo glue unchanged) while leaving the handler free to defer with an async transport. - TS already returns Promises (no change). Zig still to do. wsdb: each of the 40 handlers declares its own ordering — schedule_read (reads run concurrently; committed reads bypass ordering) or schedule_write (exclusive per fork) — extracting its fork from the typed command. WsdbScheduler implements the read-batch / write-barrier model per fork with an inline fast path when idle; the hand-rolled request classifier is gone (read/write + fork come from the typed handler). Distinct dispatch pool from WorldState's intra-op pool. Validated: ipc-runtime reactor/echo gtests green; native_world_state.test.ts (50 tests incl. Concurrent requests) green against the async server.

Zig server dispatch now hands each handler a typed Responder (ok/err) instead of returning !Resp, matching the async pattern adopted for C++ and Rust. The handleRequest entry still collects the response synchronously, so the Zig runtime glue and echo main() are unchanged. Wire format is unchanged; the cross-language echo matrix (goldens + interop) passes.

multi_connection_read.bench.test.ts imports @aztec/ipc-runtime to open extra connections; declare it so eslint import-x/no-extraneous-dependencies passes.

The dispatch pool was sized from std::thread::hardware_concurrency(), which ignores cgroup CPU limits and returns the host core count (192 in a 2-CPU CI container). Each aztec-wsdb spawned ~192 dispatch threads instead of the intended ~16, exhausting the per-UID thread limit (pthread_create EAGAIN -> libc++abi abort) under the AVM simulator e2e. Size it from the caller-provided `threads` budget (= getWsdbThreadCount, capped at 16), matching the WorldState pool.

Read/write ordering is now enforced server-side by the wsdb per-fork scheduler, so the IPC client no longer needs its own per-fork ordering queue: every op is sent immediately and the server serializes writes per fork while running reads concurrently. IpcWorldState.execute() now sends directly; the per-fork queue map, its lifecycle (stop on close/deleteFork), and the WorldStateOpsQueue class are deleted. The WorldStateOperationName label type moves to world_state_operation.ts (still used for instrumentation). The A-1055 delayed-close-fork regression test drops its now-obsolete per-fork-queue-cleanup assertion (the silent-dispose check remains). native_world_state.test.ts (50 tests) passes against the async server, including "Concurrent requests" — which, with the client queue gone, now genuinely exercises server-side per-fork read/write ordering.

charlielye · 2026-06-20T21:02:24Z

Superseded by the restack. The reactor (#24198) and ordering work have been combined into a single machinery commit on #24198, which now targets next and sits below the wsdb cutover (#23036). The WorldStateOpsQueue removal moved into the cutover PR. Closing as the commits here are now redistributed across #24198 and #23036.

charlielye added 13 commits June 17, 2026 12:17

refactor(world-state): cut over to generated wsdb package

017a351

refactor(world-state): use ipc path terminology

bacad66

refactor(world-state): remove legacy wsdb message dispatch

04ba4e6

chore(world-state): declare @aztec/ipc-runtime dep for bench test

54ed439

multi_connection_read.bench.test.ts imports @aztec/ipc-runtime to open extra connections; declare it so eslint import-x/no-extraneous-dependencies passes.

charlielye added the ci-full-no-test-cache label Jun 20, 2026

charlielye force-pushed the cl/ipc-wsdb-concurrent-server branch from f407002 to 317a72f Compare June 20, 2026 21:02

charlielye requested review from IlyasRidhuan, MirandaWood and jeanmon as code owners June 20, 2026 21:02

charlielye closed this Jun 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ipc): async server handlers + server-side per-fork ordering#24210

feat(ipc): async server handlers + server-side per-fork ordering#24210
charlielye wants to merge 13 commits into
cl/ipc-wsdb-concurrent-serverfrom
cl/ipc-wsdb-server-ordering

charlielye commented Jun 20, 2026 •

edited

Loading

Uh oh!

charlielye commented Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

charlielye commented Jun 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Codegen — async server handlers (C++, Rust, Zig; TS already async)

wsdb — server-side per-fork ordering

Testing

Client queue removed

Uh oh!

charlielye commented Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

charlielye commented Jun 20, 2026 •

edited

Loading