refactor: remove build() and async shuffling calculation#8688
refactor: remove build() and async shuffling calculation#8688matthewkeil merged 32 commits intoChainSafe:unstablefrom
Conversation
Summary of ChangesHello @guha-rahul, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the shuffling calculation mechanism by removing the asynchronous Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request effectively refactors the shuffling calculation by removing the asynchronous build() method and the asyncShufflingCalculation option. This change simplifies the codebase by enforcing synchronous shuffling calculation, which is a necessary step for upcoming features like post-Fulu proposer lookahead. The changes are consistent across all modified files, including interfaces, implementations, and tests. The IShufflingCache interface is updated to use a set() method, and the implementation in ShufflingCache correctly makes this method public. The epoch processing logic in EpochCache and EpochTransitionCache is also cleaned up by removing the now-obsolete async path. Overall, this is a solid refactoring. I have a couple of minor suggestions to improve code clarity and documentation.
| @@ -380,7 +370,6 @@ export function beforeProcessEpoch( | |||
| }); | |||
|
|
|||
| // Trigger async build of shuffling for epoch after next (nextShuffling post epoch transition) | |||
| /** | ||
| * Queue asynchronous build for an EpochShuffling | ||
| * Add an EpochShuffling to the ShufflingCache. | ||
| */ |
There was a problem hiding this comment.
The comment for the set method in the IShufflingCache interface could be more descriptive. The implementation in ShufflingCache also handles resolving promises if they exist. Reflecting this behavior in the interface documentation would improve clarity for developers using this interface.
| /** | |
| * Queue asynchronous build for an EpochShuffling | |
| * Add an EpochShuffling to the ShufflingCache. | |
| */ | |
| /** | |
| * Add an EpochShuffling to the ShufflingCache. If a promise for the shuffling is present it will | |
| * resolve the promise with the built shuffling. | |
| */ |
twoeths
left a comment
There was a problem hiding this comment.
this PR misses the context of #6938:
- state-transition should not have ShufflingCache dependency, we will not have it in lodestar-z also
- beacon-node should populate the ShufflingCache instead, see https://github.com/ChainSafe/lodestar/pull/6938/changes#diff-c21a87c055e2b34c02de2fa28ecc52c53837159f567eef5fb22e80ed55827044L341 . That's also the direction for integrating lodestar-z state-transition, need to think how to pull shuffling there
- a lot of metrics in ShuffllingCache introduced by #6938 becomes useless
- if we regen state for attestation verification, we should also populate ShufflingCache https://github.com/ChainSafe/lodestar/pull/6938/changes#diff-29e6daed96db2b885b1c4f516eb55bf7e6916e298d50f78bd2f8c57811481d2fL902
- add more comments on why we have to compute shuffling for epoch
n + 2postfulu (with proposer lookahead spec change)
in the end it'd be more about reverting #6938, and we should deploy on a feature group to make sure there is no regression
also would like @matthewkeil to have a close look to make sure we don't miss anything
@twoeths It looks like you covered changes that will get reverted from #6938. The gist Rahul will be that there should be no shuffling cache on the epochCtx anymore. I think if you remove that from the |
| const parentEpoch = computeEpochAtSlot(parentBlockSlot); | ||
| if (parentEpoch < blockEpoch && postState.epochCtx.nextShuffling !== null) { | ||
| // current epoch and previous epoch are likely cached in previous states | ||
| this.shufflingCache.set(postState.epochCtx.nextShuffling, postState.epochCtx.nextDecisionRoot); | ||
| this.logger.verbose("Processed shuffling for next epoch", {parentEpoch, blockEpoch, slot: blockSlot}); | ||
| } |
There was a problem hiding this comment.
Originally this happened right before the epoch work. I think it should be ok doing it early here but for consistency sake can we move this down to just before the line if (blockSlot % SLOTS_PER_EPOCH === 0) { in case there are error cases that will cause us to exit the import early. We only cache a limited number of shufflings and will be better to not cache ones from imports that fail.
There was a problem hiding this comment.
agree, should stay where it was originally
| }) ?? | ||
| // Only for testing. shufflingCache should always be available in prod | ||
| computeEpochShuffling(state, cache.nextShufflingActiveIndices, epoch); | ||
| const shuffling = computeEpochShuffling(state, cache.nextShufflingActiveIndices, epoch); |
There was a problem hiding this comment.
Do we cache this result when the state comes back out of the state transition in all cases? I think this will be the case but please check for places where processSlot or stateTransition is in the call stack and verify we are doing that.
There was a problem hiding this comment.
Perhaps its better to do all calculations via the cache and that way there is no accidental duplication. Seems like an easier pattern to always use the build method instead of having free floating calls to computeEpochShuffling. Will keep everything centralized and coordinated. What do you think @twoeths ??
|
@guha-rahul this e2e test consistently failed also lint CI is failing |
| buckets: [0.5, 1, 1.5, 2], | ||
| }), | ||
| shufflingCalculationTime: register.histogram<{source: "build" | "getSync"}>({ | ||
| shufflingCalculationTime: register.histogram({ |
There was a problem hiding this comment.
I notice that we don't have this metric anymore so need to remove it
- pre-fulu this metric was useful to know how much time we save
- post-fulu we have to compute shuffling for epoch
n+2so we don't have that saving time. I suppose that takes up theprocessProposerLookaheadtime
…hul/lodestar into refactor_shiffling_cache
This reverts commit f75cd15.
matthewkeil
left a comment
There was a problem hiding this comment.
LGTM!! 🚀
Metrics are all looking good
…hul/lodestar into refactor_shiffling_cache
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #8688 +/- ##
============================================
+ Coverage 52.02% 52.06% +0.04%
============================================
Files 848 848
Lines 64650 64480 -170
Branches 4767 4757 -10
============================================
- Hits 33632 33572 -60
+ Misses 30949 30839 -110
Partials 69 69 🚀 New features to boost your workflow:
|
|
🎉 This PR is included in v1.40.0 🎉 |




Motivation
lodestar-zhappens,BeaconStateAllForkswill be a blocker and Thebuild()method inShufflingCachedepends on it.BeaconState, requiring shufflings synchronously during epoch transitions—making the asyncbuild()pattern no longer viable.Description
build()method fromIShufflingCacheinterface andShufflingCacheclassset()toIShufflingCacheinterface to add shufflingsasyncShufflingCalculationCloses #8653
AI Assistance Disclosure
use claude to understand how ShufflingCache avoids recomputation