feat(ipc): concurrent service machinery — async reactor + codegen#24198
Open
charlielye wants to merge 1 commit into
Open
feat(ipc): concurrent service machinery — async reactor + codegen#24198charlielye wants to merge 1 commit into
charlielye wants to merge 1 commit into
Conversation
f407002 to
317a72f
Compare
Asynchronous server-handler codegen across C++/Rust/Zig (TS already async): the generated dispatch hands each handler a respond callback, so a handler may run inline or defer to a thread pool and respond when ready. The wire protocol is unchanged. ipc-runtime gains run_reactor (a non-blocking reactor that owns all ring I/O, is the sole sender, and reorders responses per connection) plus notify()/wait_for_data_or_ready for completion wakeups, and the run() serial loop is unchanged. The wsdb C++ server adopts this: each of the 40 handlers declares its own ordering via schedule_read / schedule_write (reads concurrent, committed reads unordered, writes exclusive per fork), and WsdbScheduler implements the read-batch / write-barrier model with an inline fast path when idle and a dispatch pool distinct from WorldState's intra-op pool. This is the IPC-machinery layer that the wsdb cutover builds on; it is independent of the world-state IPC consumer. Includes the single-connection parallel-read benchmark (transport-agnostic).
317a72f to
317eaf3
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
General-purpose machinery in
ipc-runtimeandipc-codegenfor building IPC services that handle requests concurrently, plus its first consumer (the wsdb world-state service).Until now an
ipc-runtimeserver processed requests through a single-threadedrun()loop (accept → wait_for_data → receive → handler → send → release) — one in-flight request at a time. A service backed by this could only dispatch as fast as one thread, regardless of how concurrent its clients were. This adds an asynchronous, non-blocking reactor that any generated service can opt into, so a service can service its connections concurrently whileipc-runtimeitself stays thread-pool-agnostic.Nothing here is wsdb-specific except the per-fork scheduler called out under Consumers. The reactor, the async codegen, the wake mechanism, and the response reordering are service-agnostic:
aztec-wsdb,aztec-vm-sim, andcdball generate against the same async server interface. wsdb is simply the first service to exercise it under parallel load, so the benchmarks below are wsdb's.The machinery (ipc-runtime + ipc-codegen)
run_reactor(AsyncHandler): the reactor owns all ring/socket I/O and never blocks on a handler. It copies each request out,release()s the slot immediately, assigns a per-connection sequence, and invokes anAsyncHandler—void(int client_id, std::span<const uint8_t> request, Respond respond), whereRespond = std::function<void(std::vector<uint8_t>)>. The handler may respond inline (on the reactor thread) or hand the work to a thread pool and respond later from a worker. ipc-runtime owns no pool and spawns no thread — concurrency is purely the handler's choice. Serial callers (bb) keep the untouchedrun().Responder<Resp>(ok()/error()); RustResponder<R>over aBox<dyn FnOnce(Vec<u8>) + Send>; ZigResponder(RespType)(ok/err); TS handlers are plainasyncfunctions returning aPromise. The wire format is unchanged — the cross-language echo matrix and frozen golden fixtures still pass — so this is purely a server-handler shape change for any service that regenerates.send(), so each response ring stays single-producer/lock-free. A small per-connection reorder stash (keyed by sequence) preserves FIFO on the wire even when handlers complete out of order — no request-id envelope, no wire/client change. The only lock is over the in-process stash, never a ring.notify()wake (no timeout poll): lets a worker that finished off the reactor thread wake the reactor immediately. Sockets use a self-pipe registered in the epoll/kqueue set; MPSC-SHM bumps the doorbell seq thenfutex_wakes (mirroringpublish, so the wake isn't lost against the consumer's armed futex), and the consumer evaluates the completion predicate inside the same seq-latched window. The SHM spin also breaks on a doorbell-seq change so low-concurrency completions aren't spun through.inflight == 0short-circuits the pending check, so a burst never pays for it and stays on the dispatch path. The check ishas_pending_request(): sockets poll viawait_for_data(0); MPSC-SHM uses a side-effect-freeMpscConsumer::has_data()so the peek doesn't disturb its round-robin / spin state.Consumers
aztec-wsdb) — the first consumer, and the only part of this PR that is service-specific. A per-forkWsdbSchedulersits on top of the reactor: each handler routes its work throughschedule_read/schedule_write, and per fork, committed reads run concurrently (they read the committed snapshot, independent of in-flight writes), uncommitted reads wait behind an in-flight write on that fork, and writes are exclusive per fork. This moves read/write ordering out of the client (the oldWorldStateOpsQueue, removed in refactor: cut TS world state and NAPI AVM over to WSDB IPC; delete NAPI WSDB #23036) and into the database, so multiple clients stay consistent. Wiring: a dispatchbb::ThreadPooldistinct from WorldState's intra-op pool (mutating handlerswait()on the latter; sharing could deadlock), sized from the caller'sthreadsbudget (nothardware_concurrency(), which ignores cgroup limits and would exhaust the per-UID thread limit in containers), andmax_shm_clientsraised to 8 to cover the AVM pool.aztec-vm-simandcdbgenerate against the same async server interface. The avm server runs its handler inline on the reactor thread (one in-flight simulation per connection, so no dispatch pool), which is exactly the inline fast path above.Results
These are the wsdb service under parallel reads — the consumer that drives this work. reads/s, same machine, single-worker jest (meaningful run-to-run variance on a shared host; medians shown).
Single connection, pipelined (the TS world-state client pattern)
parallel_read.bench.test.ts— N in-flight reads down one connection, vs the in-processnextbaseline:next)With the old serial
run()the IPC path was flat at ~15k reads/s regardless of concurrency. The reactor now scales; the inline fast path holds c=1–2 at ~in-process parity (no low-concurrency regression). One pipelined connection plateaus around ~30k because all N in-flight reads contend on a single request/response ring and the single reactor funnel.N independent connections, sequential (the AVM simulator-pool topology)
multi_connection_read.bench.test.ts— one wsdb, N client connections, each a synchronous read loop (one in-flight). This is what theaztec-vm-simpool actually does. Caveat: not a like-for-like in-process comparison — this bench callsAsyncApidirectly, skipping the facade that the in-process numbers (andparallel_read.bench) go through. That facade costs ~40% (the 1-connection number here, ~10.6k SHM, vsparallel_readSHM c=1, ~7.4k), so these figures measure server capability with a lean client, not an in-process win:Two takeaways: (1) spreading load across N rings scales better than pipelining one connection (N=7 SHM raw ~46k vs one pipelined connection ~30k), so the server itself is not the bottleneck; (2) deflated by the ~1.4x facade factor, SHM lands at ~32k facade-equivalent — ~80-86% of in-process, never above — consistent with the like-for-like single-connection table. The IPC path does not beat in-process; the residual gap is the facade + cross-process transport. (7 connections is the production SHM ceiling:
max_shm_clients=8 minus the world-state client's slot. UDS is uncapped.)Testing
SocketTest.ReactorPipelinedConcurrencyAndOrderandShmTest.MpscReactorPipelinedConcurrencyAndOrder— pipelined requests with reversed completion latencies must come back FIFO and complete concurrently (a lost wake shows as a stall). Full ipc-runtime suite green.native_world_state.test.ts(50 tests) green over the reactor, including Concurrent requests — mutating and non-mutating requests are correctly queued (validates wsdb's per-fork read/write ordering and that mutating handlers run on the dispatch pool without deadlocking on WorldState's intra-op pool).parallel_read.bench.test.tsruns on bothnext(in-process) and this branch (identical file);multi_connection_read.bench.test.tsis IPC-only (skips in-process).