Skip to content

refactor(malachite): restructure service initialization and externalities#5580

Open
grishasobol wants to merge 22 commits into
masterfrom
gsobol/ethexe/tune-malachite-errors
Open

refactor(malachite): restructure service initialization and externalities#5580
grishasobol wants to merge 22 commits into
masterfrom
gsobol/ethexe/tune-malachite-errors

Conversation

@grishasobol

Copy link
Copy Markdown
Member

Summary

This PR refactors the Malachite consensus service initialization and externalities layer to improve separation of concerns and enable better testability. Key changes:

  • New MalachiteServiceStarter: Extracted service initialization logic into a separate builder type that prepares all components before starting the core engine. This decouples configuration from runtime state.

  • Restructured EthexeExternalities: Simplified the externalities struct by:

    • Moving configuration into a dedicated ExternalitiesConfig struct
    • Replacing Option<SimpleBlockData> chain head with a new ChainHead type that tracks both latest observed and synced Ethereum blocks
    • Changing Mutex<VecDeque> pending events to RwLock<VecDeque> for async-safe access
    • Making mempool optional (Option<Arc<dyn Mempool>>) to support non-validator nodes
  • Updated Externalities trait:

    • validate_block_above now returns Acceptance<(), String> instead of bool for richer rejection reasons
    • process_mb_proposal and process_mb_finalized are now async (.await on event emission)
    • validate_block_above takes &BlockPayload instead of owned value
  • New ChainHead type: Encapsulates chain head state with separate fields for latest observed and latest synced Ethereum blocks, plus a notification mechanism for async coordination.

  • Renamed core types: MalachiteConfigMalachiteServiceConfig, MalachiteServiceMalachiteCore (in ethexe-malachite-core), improving clarity about which layer each type belongs to.

  • Added Acceptance<A, R> enum: New generic type in ethexe-common for representing accepted/rejected validation results with optional rejection reasons.

These changes improve code organization, enable better async/await patterns, and make the service more testable by separating initialization concerns from runtime behavior.

How to test

  • Existing unit tests in ethexe/malachite/service/tests/restart_resilience.rs have been updated and should pass
  • Core consensus tests in ethexe/malachite/core/tests/multi_validators.rs updated for new Acceptance return type
  • Integration tests in ethexe/service/src/tests/ updated for new config and starter types
  • Run cargo nextest run -p "ethexe-*" to verify all ethexe crates pass

Notes

  • This is a refactoring with no functional changes to consensus behavior
  • The MalachiteServiceStarter pattern enables future improvements like lazy initialization and better error handling during startup
  • Async changes to externalities methods require callers to .await event emission, improving integration with async runtime
  • Non-validator nodes now properly handle the absence of a mempool via Option<Arc<dyn Mempool>>

https://claude.ai/code/session_014FBHbneaDtHjPcJDWgV4j4

grishasobol and others added 12 commits June 10, 2026 20:28
Gate `Signer` re-export behind `std`, await the now-async
`try_emit_or_queue`/`receive_*` calls, switch the service event loop
and test env to the new `MalachiteServiceStarter` API, drop unused
imports.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rnals

Drop both `disable-tests` cfg gates; port externalities unit tests to
the `ChainHead` register / `Option<Arc<dyn Mempool>>` fields and async
helpers, port restart_resilience to the `MalachiteServiceStarter` API.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ynced

`ChainHead::latest_synced` was only seeded once in the starter, so the
producer's quarantine probe never saw new EBs and proposals timed out.
Update it in `receive_eb_synced` before waking the producer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ased validation

`Externalities::validate_block_above` now takes `&BlockPayload` and
returns `Acceptance<(), String>`; rejection reasons travel in the
variant (logged by the core) instead of local `warn!` + `Ok(false)`.
Port the multi_validators TestExt impl and drop imports left unused
by the core change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… rework

Drop all `disable-tests` cfg gates. Port mempool tests to the async
Mempool API (proptests drive it via block_on), quarantine tests to the
SimpleBlockData/Acceptance signatures (drop `verify_passed` proptest —
the depth check now lives in validate_block_above), and flip the
unresolved-ref_block test to assert the now-implemented purge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`MalachiteConfig` → `MalachiteCoreConfig` / `MalachiteServiceConfig`,
`MalachiteService` → `MalachiteCore` (in core), `MalachiteService::new`
→ `MalachiteServiceStarter::new`. CI docs build runs with
`-D warnings`, so the stale links failed `build / docs`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Apply the short, informative doc style across ethexe-malachite and
ethexe-malachite-core: one-line field docs, compressed method/struct
docs without internal narration, trimmed inline comment blocks.
Crate-level docs untouched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@grishasobol grishasobol marked this pull request as ready for review June 11, 2026 15:12
@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the Malachite consensus service initialization and externalities layer. The primary goal is to improve the separation of concerns and testability by introducing a builder-pattern starter, simplifying the externalities structure, and adopting more robust async patterns. These changes do not alter consensus behavior but significantly improve the codebase's maintainability and integration with async runtimes.

Highlights

  • Service Initialization Refactor: Introduced MalachiteServiceStarter to decouple service configuration from runtime state, enabling better testability and future lazy initialization.
  • Externalities Layer Restructuring: Simplified EthexeExternalities by introducing ExternalitiesConfig, replacing mutable state with RwLock, and making the mempool optional to support non-validator nodes.
  • Async API Improvements: Updated Externalities trait methods to be async and introduced the Acceptance<A, R> enum for richer validation feedback.
  • Type Renaming and Cleanup: Renamed MalachiteConfig to MalachiteServiceConfig and MalachiteService to MalachiteCore (in the core crate) to improve layer clarity.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@grishasobol

Copy link
Copy Markdown
Member Author

@claude opus review

@claude

claude Bot commented Jun 11, 2026

Copy link
Copy Markdown

Claude finished @grishasobol's task in 3m 20s —— View job


Opus review — refactor(malachite): restructure service initialization and externalities

  • Gather context (diff, key files)
  • Review service/starter/externalities restructuring
  • Review ChainHead, Acceptance, async changes
  • Post review findings

Summary. Clean, well-motivated refactor. Extracting MalachiteServiceStarter to separate "prepare components" from "launch core" is a good split, the ChainHead register (separate latest / latest_synced with a Notify) is a clear improvement over the old Option<SimpleBlockData>, and threading Acceptance<(), String> through validate_block_above so rejection reasons get logged (app.rs:353, app.rs:457) is a real win for debuggability. The lost-wakeup handling in wait_for_proposable_content (enable() before the select!) and the FIFO-preserving try_emit_or_queue/drain_pending_events pair look correct. Mostly cleanup items below — no correctness blockers found.


severity: medium — leftover scratch comment merged into the event loop

File: ethexe/service/src/lib.rs line 988

// +_+_+ ??? whether we really need this?
// Non-proposer nodes (validators that didn't propose ...

The +_+_+ ??? marker reads like an unresolved note to self. Either resolve the question (does a non-proposer really need to compute_mb on BlockFinalized, or is it redundant with the BlockProposal path?) and keep the explanatory comment, or drop the marker. Shipping ??? into the orchestrator's hot path is the kind of thing that outlives everyone's memory of why it was there. Fix this →


severity: medium — stale crate docs reference removed API

File: ethexe/malachite/service/src/lib.rs lines 14-15, 29-30, 37-39

The module doc still describes the pre-refactor API:

  • "it calls receive_new_chain_head on each ObserverEvent::BlockSynced" — that method is gone; the service now calls receive_new_eb (on ObserverEvent::Block) and receive_eb_synced (on BlockSynced).
  • The "Driver methods" list names receive_new_chain_head.
  • "Construct with MalachiteService::new(config, db, signer, validator_pub_key, mempool)" — construction is now via MalachiteServiceStarter::new(...).start(); there is no MalachiteService::new.

Since the rename is the headline of this PR, the doc should move with it. Fix this →


severity: low — validator set assigned twice; one clone() is wasted

File: ethexe/service/src/lib.rs lines 511-538

malachite_service_config already gets the set via .with_validators(malachite_validator_set.clone()) at line 516, then line 537-538 re-applies it:

let malachite_config = malachite_service_config.with_validators(malachite_validator_set);

The second with_validators is redundant and the .clone() at 516 exists only to feed it. Drop the clone at 516 and the re-assignment at 537-538 (pass malachite_service_config straight into the starter), or remove the 516 call and keep the move at 537. Looks like a refactor artifact. Fix this →


severity: low — confusing log message wording

File: ethexe/service/src/lib.rs line 535

log::info!("Malachite not local role: {role}",);

"Malachite not local role: validator" is hard to parse and has a trailing comma. Likely intended something like "Malachite node role: {role}".


severity: low — error message and a TODO comment still use the old type name

  • ethexe/malachite/service/src/starter.rs:50anyhow!("MalachiteConfig::validators is empty") should read MalachiteServiceConfig::validators.
  • ethexe/malachite/service/src/config.rs:28 (and the from_home_dir area) — fine, but worth a grep for any other MalachiteConfig / MalachiteService::new string references left behind by the rename.

severity: lowMalachiteServiceStarter::new is async but never awaits

File: ethexe/malachite/service/src/starter.rs:40-128

new contains no .await (all the awaited work is in start). It compiles fine and the async keeps API symmetry with start, but clippy::unused_async would flag it under pedantic. Optional — keep if the symmetry is deliberate.


Minor observations (no action required)

  • validate_block_above uses limit = touched.len().max(MAX_TOUCHED_PROGRAMS_PER_MB) while build_block_above drops all injected txs once initial touched exceeds the cap. The asymmetry is safe (the validator side still rejects any MB that grows touched beyond the cap), but the two paths diverging is exactly what TODO ethexe-malachite: extract shared helper for producer/validator EB-advance check #5477 calls out — worth keeping that ticket alive.
  • Full nodes get a fresh PrivateKey::random() libp2p identity on every restart (starter.rs:78). Intended per the doc comment, just noting the peer-id churn.
  • ChainHead uses two independent RwLocks (latest, latest_synced); all writers are the single service event loop, so there's no cross-field race, but the invariant "only the run loop writes these" is worth a one-line comment on the struct since the locks technically permit concurrent writers.
    · branch gsobol/ethexe/tune-malachite-errors

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the 'ethexe-malachite' consensus service, introducing 'MalachiteServiceStarter' to separate initialization from execution, updating 'EthexeExternalities' and 'quarantine' to use the 'Acceptance' enum, and converting 'InjectedTxMempool' to use async methods with 'RwLock'. Feedback on these changes includes removing a temporary debug comment in 'ethexe/service/src/lib.rs', replacing a panic with an error in 'externalities.rs', using more idiomatic 'ok_or_else' error conversions in 'quarantine.rs', and reverting or documenting an unintended timeout change in 'starter.rs'.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread ethexe/service/src/lib.rs Outdated
Comment thread ethexe/malachite/service/src/externalities.rs
Comment thread ethexe/malachite/service/src/quarantine.rs Outdated
Comment thread ethexe/malachite/service/src/quarantine.rs Outdated
Comment thread ethexe/malachite/service/src/starter.rs Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f154c85e86

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ethexe/malachite/service/src/service.rs Outdated
grishasobol and others added 3 commits June 11, 2026 17:25
- receive_eb_synced: advance latest_synced by height instead of requiring
  the synced hash to equal the latest observed head — under lagging sync
  every BlockSynced was dropped and the producer stalled (codex P1)
- wait_for_proposable_content: bail instead of panic on missing mempool
- quarantine: Option -> Result via ok_or_else instead of with_context
- drop redundant double with_validators + clone in service setup
- fix "Malachite not local role" log wording
- refresh stale `MalachiteConfig` / `MalachiteService::new` /
  `receive_new_chain_head` mentions in docs and messages
- document the single-writer / no-guard-across-await invariant on ChainHead

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- app loop: propagate fatal errors again — the catch-all made
  `FinalizationError::Fatal` dead code and silenced the error stream
- receive_eb_synced: wake the producer on stale synced blocks too (a
  lower-height sync may land headers a failed descendant walk needs)
- restore the `candidate == parent_advance` short-circuit in
  find_eb_candidate_for_advancing (no more "X does not descend from X"
  warn spam every idle round)
- drop the redundant compute_mb on BlockFinalized: BlockProposal is
  always emitted first on every node, so compute is already triggered
- make propose_timeout a MalachiteServiceConfig field (default stays
  2 * SLOT_DURATION — Ethereum block time can stretch past one slot);
  test envs use a short timeout so idle rounds don't burn 24s each
- MalachiteServiceStarter::new: drop unused async
- nextest: reserve 4 threads for ethexe-service tests in the default
  profile like CI does — heavy multi-validator tests starved under
  full-suite parallelism and blew the 120s cap (full local suite is
  527/527 with this)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@grishasobol grishasobol self-assigned this Jun 12, 2026
@grishasobol grishasobol added scope: vara.eth Vara Ethereum application layer (L2) ai-generated Created entirely by an AI agent without direct human authorship type: fix Bug fix correcting incorrect behavior labels Jun 12, 2026
…malachite-errors

# Conflicts:
#	ethexe/compute/src/compute.rs
#	ethexe/malachite/service/src/externalities.rs
…malachite-errors

# Conflicts:
#	ethexe/malachite/service/src/externalities.rs
grishasobol added a commit that referenced this pull request Jun 23, 2026
Re-implements the parent-advance (AdvanceTillEthereumBlock) rule on top of
#5580's MalachiteServiceStarter/ChainHead structure:

- db init seeds the genesis era's on-chain validator set (a synced block must
  always have its era's validators in the db).
- observer backfills db.validators(era) for every era spanned by a freshly
  synced chain, so a node joining mid-chain holds historical eras' sets.
- new Externalities::validators_for_child_of resolves the era a parent MB
  advanced into; core app validator_set_for_height verifies each height's
  certificate against that era's set (falling back to the live shared set).
- the on-chain address -> pub key pool is threaded into EthexeExternalities.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0147tTHt6e7mhirSAPwNxRc3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-generated Created entirely by an AI agent without direct human authorship scope: vara.eth Vara Ethereum application layer (L2) type: fix Bug fix correcting incorrect behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant