Skip to content

fix: improve atomic ordering in ThreadPool and NAPI init#310

Open
GrapeBaBa wants to merge 2 commits intomainfrom
worktree-fix+atomic-ordering
Open

fix: improve atomic ordering in ThreadPool and NAPI init#310
GrapeBaBa wants to merge 2 commits intomainfrom
worktree-fix+atomic-ordering

Conversation

@GrapeBaBa
Copy link
Copy Markdown
Contributor

@GrapeBaBa GrapeBaBa commented Apr 10, 2026

Motivation

  1. ThreadPool worker hot loops use .acquire ordering on err_flag.load() where .monotonic suffices — the flag is a pure early-exit signal with no data dependency on the setter's other writes. BLS verification is CPU-intensive, so this matters in the hot path.

  2. NAPI register() uses a bare fetchAdd(.monotonic) to guard shared state initialization. If two Node.js Worker threads call register concurrently, Thread B can skip init and use partially-initialized state because .monotonic provides no ordering guarantees for the init stores. While unlikely in practice (Worker thread creation is main-thread-driven), the code is incorrect.

Description

src/bls/ThreadPool.zig (L154, L260):

  • err_flag.load(.acquire).monotonic in execVerifyMulti and execAggVerify worker loops
  • Aligns with the identical pattern in pubkey_cache.zig:50 which already correctly uses .monotonic
  • The final err_flag.load(.acquire) after dispatch() (L233, L332) is unchanged — though technically redundant (the work_done event provides happens-before), it serves as defense-in-depth

bindings/napi/root.zig:

  • Add init_mutex to serialize register() calls, ensuring Thread B blocks until Thread A's init completes
  • Add errdefer to roll back env_refcount on init failure, so the next caller retries initialization
  • fetchAdd(.monotonic) is correct inside the mutex (the mutex lock/unlock provides acquire/release barriers)
  • fetchSub(.acq_rel) in EnvCleanup.hook unchanged — it runs outside the mutex and needs acq_rel to ensure the last thread sees all writes before deinit

- ThreadPool: relax err_flag.load from .acquire to .monotonic in worker
  hot loops (L154, L260). The flag is a pure early-exit signal with no
  data dependency on the setter's other writes. Matches the pattern
  already used in pubkey_cache.zig.

- NAPI root: guard shared state initialization with init_mutex to
  prevent a race where concurrent register calls (e.g. from Node.js
  Worker threads) could observe partially-initialized state. Add
  errdefer to roll back env_refcount on init failure so the next
  caller retries.
@GrapeBaBa GrapeBaBa requested a review from a team as a code owner April 10, 2026 09:47
Copilot AI review requested due to automatic review settings April 10, 2026 09:47
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses potential concurrency issues and performance bottlenecks related to atomic operations. By refining memory ordering in hot loops and introducing proper synchronization for shared state initialization, the changes ensure both improved execution efficiency and robust thread safety across the codebase.

Highlights

  • Atomic Ordering Optimization: Updated ThreadPool worker loops to use .monotonic atomic ordering instead of .acquire for error flag checks, improving performance on ARM architectures.
  • NAPI Initialization Safety: Introduced a mutex in the NAPI registration process to prevent race conditions during shared state initialization, ensuring thread safety for concurrent Node.js Worker threads.
  • Error Handling: Added an errdefer block to the NAPI registration to properly roll back reference counts if initialization fails, allowing for successful retries.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refines atomic usage in two concurrency-sensitive areas: the BLS verification thread pool hot loops and the N-API module’s shared-state initialization, aiming to reduce unnecessary barriers in hot paths while preventing partially-initialized shared state under concurrent register() calls.

Changes:

  • Downgrade err_flag.load(.acquire) to .monotonic in BLS ThreadPool worker hot loops.
  • Add an init_mutex to serialize N-API register() initialization and add rollback on init failure.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/bls/ThreadPool.zig Reduces ordering strength in worker-loop early-exit checks to avoid unnecessary acquire loads in hot paths.
bindings/napi/root.zig Serializes shared state initialization across concurrent register() calls and adds partial rollback on init failure.
Comments suppressed due to low confidence (1)

bindings/napi/root.zig:55

  • If env.addEnvCleanupHook(...) fails, env_refcount has already been incremented and there will be no cleanup hook to decrement it later, leaking the refcount (and potentially preventing future init/deinit from behaving correctly). Add rollback logic for failures after the increment until the cleanup hook is successfully registered (and if rolling back to 0, ensure shared state is torn down consistently).

    try env.addEnvCleanupHook(EnvCleanup, &env_cleanup, EnvCleanup.hook);

    try pool.register(env, exports);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread bindings/napi/root.zig
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mutex to synchronize shared state initialization in the N-API bindings, ensuring thread safety for concurrent registrations, and adds a rollback mechanism for the environment reference count upon initialization failure. It also optimizes the thread pool by relaxing memory ordering for error flag checks. Feedback recommends strengthening the atomic ordering of the reference count increment for better long-term robustness and suggests adding an assertion in the error recovery path to validate the reference count state, consistent with the repository's emphasis on safety and invariant checking.

Comment thread bindings/napi/root.zig
Comment thread bindings/napi/root.zig Outdated
…riant

- Take init_mutex in EnvCleanup.hook to prevent init/deinit race when
  a Worker thread tears down while another is registering.
- Assert env_refcount == 1 in errdefer to enforce the invariant that
  only our increment is present when init fails.
@GrapeBaBa GrapeBaBa requested a review from Copilot April 10, 2026 10:45
@GrapeBaBa
Copy link
Copy Markdown
Contributor Author

@codex review

@GrapeBaBa
Copy link
Copy Markdown
Contributor Author

gemini review

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts concurrency semantics in two areas: (1) lowers atomic load ordering in ThreadPool hot loops where the flag is used only as an early-exit signal, and (2) serializes N-API module shared-state initialization to prevent concurrent register() calls from observing partial initialization.

Changes:

  • Switch err_flag.load(.acquire) to .monotonic in ThreadPool worker loops to reduce hot-path overhead.
  • Add a global init_mutex to serialize N-API register() initialization and add errdefer to roll back env_refcount on init failure.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/bls/ThreadPool.zig Relaxes atomic load ordering on an early-exit flag inside worker loops.
bindings/napi/root.zig Adds a mutex to serialize shared state init/teardown across concurrent N-API environments and refcount rollback on init failure.
Comments suppressed due to low confidence (2)

bindings/napi/root.zig:59

  • register() increments env_refcount before calling env.addEnvCleanupHook(...). If addEnvCleanupHook fails, the refcount is never rolled back, leaving the shared state permanently referenced (and potentially never deinitialized) and future register() calls treating the module as already initialized. Consider adding an errdefer to decrement the refcount on failures until the cleanup hook is successfully registered (and/or factor the decrement+deinit path into a shared helper used by both the hook and error handling).
fn register(env: napi.Env, exports: napi.Value) !void {
    {
        init_mutex.lock();
        defer init_mutex.unlock();
        if (env_refcount.fetchAdd(1, .monotonic) == 0) {
            // First environment — initialize shared state.
            // On failure, roll back the refcount so the next caller retries.
            errdefer {
                const old = env_refcount.fetchSub(1, .monotonic);
                std.debug.assert(old == 1);
            }
            try pool.state.init();
            try pubkeys.state.init();
            config.state.init();
        }
    }

    try env.addEnvCleanupHook(EnvCleanup, &env_cleanup, EnvCleanup.hook);

bindings/napi/root.zig:35

  • The PR description says EnvCleanup.hook runs outside the new init_mutex, but the implementation now locks init_mutex in the cleanup hook as well. Please update the PR description (or adjust the code) so the documented concurrency behavior matches what was shipped.
const EnvCleanup = struct {
    fn hook(_: *EnvCleanup) void {
        init_mutex.lock();
        defer init_mutex.unlock();
        if (env_refcount.fetchSub(1, .acq_rel) == 1) {
            // Last environment — tear down shared state.
            config.state.deinit();
            pubkeys.state.deinit();
            pool.state.deinit();
            metrics.deinit();
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@GrapeBaBa
Copy link
Copy Markdown
Contributor Author

gemini review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants