fix: improve atomic ordering in ThreadPool and NAPI init (#310)
Conversation
- ThreadPool: relax `err_flag.load` from `.acquire` to `.monotonic` in worker hot loops (L154, L260). The flag is a pure early-exit signal with no data dependency on the setter's other writes. Matches the pattern already used in `pubkey_cache.zig`.
- NAPI root: guard shared state initialization with `init_mutex` to prevent a race where concurrent `register` calls (e.g. from Node.js Worker threads) could observe partially-initialized state. Add `errdefer` to roll back `env_refcount` on init failure so the next caller retries.
Summary of Changes (Gemini Code Assist)
This pull request addresses potential concurrency issues and performance bottlenecks related to atomic operations. By refining memory ordering in hot loops and introducing proper synchronization for shared state initialization, the changes ensure both improved execution efficiency and robust thread safety across the codebase.
Pull request overview
This PR refines atomic usage in two concurrency-sensitive areas: the BLS verification thread pool hot loops and the N-API module’s shared-state initialization, aiming to reduce unnecessary barriers in hot paths while preventing partially-initialized shared state under concurrent register() calls.
Changes:
- Downgrade `err_flag.load(.acquire)` to `.monotonic` in BLS ThreadPool worker hot loops.
- Add an `init_mutex` to serialize N-API `register()` initialization and add rollback on init failure.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/bls/ThreadPool.zig` | Reduces ordering strength in worker-loop early-exit checks to avoid unnecessary acquire loads in hot paths. |
| `bindings/napi/root.zig` | Serializes shared state initialization across concurrent `register()` calls and adds partial rollback on init failure. |
Comments suppressed due to low confidence (1)
bindings/napi/root.zig:55
- If `env.addEnvCleanupHook(...)` fails, `env_refcount` has already been incremented and there will be no cleanup hook to decrement it later, leaking the refcount (and potentially preventing future init/deinit from behaving correctly). Add rollback logic for failures after the increment until the cleanup hook is successfully registered (and if rolling back to 0, ensure shared state is torn down consistently).
```zig
try env.addEnvCleanupHook(EnvCleanup, &env_cleanup, EnvCleanup.hook);
try pool.register(env, exports);
```
Code Review
This pull request introduces a mutex to synchronize shared state initialization in the N-API bindings, ensuring thread safety for concurrent registrations, and adds a rollback mechanism for the environment reference count upon initialization failure. It also optimizes the thread pool by relaxing memory ordering for error-flag checks. The review feedback recommends strengthening the atomic ordering of the reference-count increment for long-term robustness and adding an assertion in the error-recovery path to validate the reference-count state, consistent with the repository's emphasis on safety and invariant checking.
…riant
- Take `init_mutex` in `EnvCleanup.hook` to prevent an init/deinit race when a Worker thread tears down while another is registering.
- Assert `env_refcount == 1` in the `errdefer` to enforce the invariant that only our increment is present when init fails.
@codex review

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
Pull request overview
This PR adjusts concurrency semantics in two areas: (1) lowers atomic load ordering in ThreadPool hot loops where the flag is used only as an early-exit signal, and (2) serializes N-API module shared-state initialization to prevent concurrent register() calls from observing partial initialization.
Changes:
- Switch `err_flag.load(.acquire)` to `.monotonic` in ThreadPool worker loops to reduce hot-path overhead.
- Add a global `init_mutex` to serialize N-API `register()` initialization and add `errdefer` to roll back `env_refcount` on init failure.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `src/bls/ThreadPool.zig` | Relaxes atomic load ordering on an early-exit flag inside worker loops. |
| `bindings/napi/root.zig` | Adds a mutex to serialize shared state init/teardown across concurrent N-API environments and refcount rollback on init failure. |
Comments suppressed due to low confidence (2)
bindings/napi/root.zig:59
- `register()` increments `env_refcount` before calling `env.addEnvCleanupHook(...)`. If `addEnvCleanupHook` fails, the refcount is never rolled back, leaving the shared state permanently referenced (and potentially never deinitialized) and future `register()` calls treating the module as already initialized. Consider adding an `errdefer` to decrement the refcount on failures until the cleanup hook is successfully registered (and/or factor the decrement-and-deinit path into a shared helper used by both the hook and error handling).
```zig
fn register(env: napi.Env, exports: napi.Value) !void {
    {
        init_mutex.lock();
        defer init_mutex.unlock();
        if (env_refcount.fetchAdd(1, .monotonic) == 0) {
            // First environment — initialize shared state.
            // On failure, roll back the refcount so the next caller retries.
            errdefer {
                const old = env_refcount.fetchSub(1, .monotonic);
                std.debug.assert(old == 1);
            }
            try pool.state.init();
            try pubkeys.state.init();
            config.state.init();
        }
    }
    try env.addEnvCleanupHook(EnvCleanup, &env_cleanup, EnvCleanup.hook);
```
bindings/napi/root.zig:35
- The PR description says `EnvCleanup.hook` runs outside the new `init_mutex`, but the implementation now locks `init_mutex` in the cleanup hook as well. Please update the PR description (or adjust the code) so the documented concurrency behavior matches what was shipped.
```zig
const EnvCleanup = struct {
    fn hook(_: *EnvCleanup) void {
        init_mutex.lock();
        defer init_mutex.unlock();
        if (env_refcount.fetchSub(1, .acq_rel) == 1) {
            // Last environment — tear down shared state.
            config.state.deinit();
            pubkeys.state.deinit();
            pool.state.deinit();
            metrics.deinit();
        }
```
Motivation

- ThreadPool worker hot loops use `.acquire` ordering on `err_flag.load()` where `.monotonic` suffices — the flag is a pure early-exit signal with no data dependency on the setter's other writes. BLS verification is CPU-intensive, so this matters in the hot path.
- NAPI `register()` uses a bare `fetchAdd(.monotonic)` to guard shared state initialization. If two Node.js Worker threads call `register` concurrently, Thread B can skip init and use partially-initialized state because `.monotonic` provides no ordering guarantees for the init stores. While unlikely in practice (Worker thread creation is main-thread-driven), the code is incorrect.

Description

- `src/bls/ThreadPool.zig` (L154, L260): `err_flag.load(.acquire)` → `.monotonic` in `execVerifyMulti` and `execAggVerify` worker loops
  - Matches `pubkey_cache.zig:50`, which already correctly uses `.monotonic`
  - The `err_flag.load(.acquire)` after `dispatch()` (L233, L332) is unchanged — though technically redundant (the `work_done` event provides happens-before), it serves as defense-in-depth
- `bindings/napi/root.zig`:
  - Add `init_mutex` to serialize `register()` calls, ensuring Thread B blocks until Thread A's init completes
  - Add `errdefer` to roll back `env_refcount` on init failure, so the next caller retries initialization
  - `fetchAdd(.monotonic)` is correct inside the mutex (the mutex lock/unlock provides acquire/release barriers)
  - `fetchSub(.acq_rel)` in `EnvCleanup.hook` unchanged — it runs outside the mutex and needs `.acq_rel` to ensure the last thread sees all writes before `deinit`