[wip]feat(fork-choice): implement Fast Confirmation Rule (FCR)#336
[wip]feat(fork-choice): implement Fast Confirmation Rule (FCR)#336GrapeBaBa wants to merge 12 commits into
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces the Fast Confirmation Rule (FCR) to the fork choice mechanism, enabling faster finality assessments as specified in the Ethereum consensus specs. The implementation is currently in a draft state, focusing on the core algorithm and the necessary infrastructure to validate it against official spec tests. Changes include updates to the fork choice logic to support specialized test modes and the addition of comprehensive test runners to ensure parity with the consensus specifications. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces the Fast Confirmation (FCR) logic and its associated spec test runner, updating the consensus spec version to v1.7.0-alpha.5. It adds the CONFIRMATION_BYZANTINE_THRESHOLD configuration and extends ForkChoice with a compute_unrealized_without_pull_up mode to support FCR testing. Feedback identifies several critical issues: the use of test-only dependencies (std.testing.io) in production code, incorrect usage of std.ArrayList instead of std.ArrayListUnmanaged in the test runner, and violations of the "limit everything" safety rule by using .unlimited for file allocations. Additionally, it is recommended to implement a proper balances getter in the test runner to avoid workarounds in the production ForkChoice implementation.
| const unrealized = try state_transition.computeUnrealizedCheckpoints(state, allocator); | ||
| // Note: io is sourced from std.testing in this code path; non-test callers | ||
| // should propagate io through onBlock once Phase F+ wires production callers. | ||
| const unrealized = try state_transition.computeUnrealizedCheckpoints(allocator, std.testing.io, state); |
There was a problem hiding this comment.
The use of std.testing.io in a production source file violates the "Safety" and "Technical Debt" goals of the style guide (Repository Style Guide lines 19 and 32). Production code should not depend on std.testing for core logic. The onBlock function should be updated to accept an io parameter (e.g., std.Io) to be propagated to computeUnrealizedCheckpoints, rather than hardcoding a test-only dependency.
References
- Use only very simple, explicit control flow for clarity. Do not use recursion to ensure that all executions that should be bounded are bounded. (link)
- TigerBeetle has a “zero technical debt” policy. We do it right the first time.
| /// Parse the steps.yaml content into an owned slice of Step values. | ||
| /// All name slices are owned by `allocator` and must be freed via `freeSteps`. | ||
| fn parseSteps(allocator: Allocator, content: []const u8) !std.ArrayList(Step) { | ||
| var steps: std.ArrayList(Step) = .empty; |
There was a problem hiding this comment.
The variable steps is declared as std.ArrayList(Step), but it is initialized with .empty and its methods are called with an explicit allocator (e.g., at line 191). This is the API for std.ArrayListUnmanaged. std.ArrayList is a managed type that requires init(allocator) and does not take an allocator in its append methods. This will cause a compilation error. The same issue applies to current at line 150 and the function signature at line 175.
var steps: std.ArrayListUnmanaged(Step) = .empty;
| // the current logical step buffer and flush whenever we see a `- ` at | ||
| // column 0 (indicating the next step) or end-of-file. | ||
| var lines = std.mem.splitScalar(u8, content, '\n'); | ||
| var current: std.ArrayList(u8) = .empty; |
| fcr.setSpecTestMode(true); | ||
|
|
||
| // ---------- Read steps.yaml ---------- | ||
| const steps_content = try dir.readFileAlloc(std.testing.io, "steps.yaml", allocator, .unlimited); |
There was a problem hiding this comment.
Using .unlimited in readFileAlloc violates the "Put a limit on everything" rule (Repository Style Guide line 38). Spec test files can be large, and a fixed upper bound should be provided to prevent excessive memory allocation in case of malformed or unexpectedly large test files.
References
- Put a limit on everything because, in reality, this is what we expect—everything has a limit. (link)
| fn nullBalancesGetter(_: ?*anyopaque, _: Checkpoint, _: *CachedBeaconState) JustifiedBalances { | ||
| return .empty; | ||
| } |
There was a problem hiding this comment.
nullBalancesGetter returning .empty is a significant limitation that has already leaked into the production ForkChoice implementation (see the workaround in src/fork_choice/fork_choice.zig line 1466). To maintain a clean separation between test logic and production code, the runner should provide a proper getter that extracts effective balances from the CachedBeaconState using state.epoch_cache.getEffectiveBalanceIncrements().
fn balancesGetter(_: ?*anyopaque, _: Checkpoint, state: *CachedBeaconState) JustifiedBalances {
return state.epoch_cache.getEffectiveBalanceIncrements();
}
References
- TigerBeetle has a “zero technical debt” policy. We do it right the first time.
Add empty FastConfirmation module skeleton with struct fields, init/deinit, and CONFIRMATION_BYZANTINE_THRESHOLD config constant. Spec helpers and algorithm follow in subsequent commits. Spec: ethereum/consensus-specs#4747 Reference: Lighthouse PR sigp/lighthouse#8951
- Drop unused Error.InvalidByzantineThreshold (YAGNI; init silently clamps per Lighthouse precedent, no caller fails on out-of-range yet). - Widen byzantine_threshold and proposer_score_boost from u8 to u64 to match ChainConfig field types and Lighthouse reference, eliminating @intcast at every future call site. - Drop unused Epoch and ValidatorIndex imports (re-introduce in phases that use them).
Implement spec misc helpers (isStartSlotAtEpoch, getBlockSlot, getBlockEpoch, getCheckpointForBlock, isAncestor, getAncestorRoots, getCurrentTarget) and state helpers (BalanceSourceData.rebuild, SlotAssignments.rebuild, getSlotCommittee). Spec: specs/phase0/fast-confirmation.md (Misc + State helpers sections)
- Widen FCR Error to include ForkChoiceError variants. getCheckpointForBlock and isAncestor now match only "block not found" variants (MissingProtoArrayBlock, UnknownAncestor), propagating corruption signals (BeaconStateErr, InvalidParentIndex, etc.) instead of collapsing them. - Change isAncestor signature to Error!bool so non-not-found errors surface. - Tighten BalanceSourceData.rebuild and SlotAssignments.rebuild to explicit Error!void return type. State subsystem errors are mapped to StateMissing; OutOfMemory propagates separately. - Fix misleading TigerStyle comment on BalanceSourceData.rebuild's validators.len > 0 assert (it asserts, doesn't silently return). - Add upper-bound assert on SlotAssignments.rebuild slot range. - Drop stale "(rebuild added in Phase B)" parentheticals from section banners.
Implement 12 LMD-GHOST safety threshold helpers per spec: isFullValidatorSetCovered, adjustCommitteeWeightEstimateToEnsureSafety, estimateCommitteeWeightBetweenSlots, getEquivocationScore, computeAdversarialWeight, getAdversarialWeight, getBlockSupportBetweenSlots, computeEmptySlotSupportDiscount, getSupportDiscount, computeSafetyThreshold, isOneConfirmed, isConfirmedChainSafe. Spec: specs/phase0/fast-confirmation.md (LMD-GHOST helpers section)
- Document BalanceSourceData.rebuild's caller invariant (state's epoch must
match cp.epoch) since active/slashed predicates evaluate at cp.epoch.
- Rename two tests that didn't actually exercise the branch their names
promised:
- "computeAdversarialWeight basic saturation behavior" ->
"empty equivocating set returns positive max" (saturation-to-zero
requires committee-aware setup, out of scope for unit test).
- "computeSafetyThreshold underflow guard saturates to 0" ->
"degenerate zero inputs return zero breakdown" (underflow branch needs
empty-slot parent-support setup, out of scope for unit test).
Implement 4 FFG helpers per spec: getCurrentTargetScore, computeHonestFfgSupportForCurrentTarget, willNoConflictingCheckpointBeJustified, willCurrentTargetBeJustified. Spec: specs/phase0/fast-confirmation.md (FFG helpers section)
- Rename two tests whose names overstated coverage:
- "computeHonestFfgSupportForCurrentTarget bounded by total active
balance" -> "zero-vote upper bound" (only the trivial branch is
exercised; positive-vote case requires more involved fixture).
- "willNoConflictingCheckpointBeJustified insufficient honest yields
false" -> "zero-balance degenerate case yields false" (named branch
requires committee + ffg_weight setup out of scope for unit test).
- Add Phase E TODO note on the redundant getTotalActiveBalance scan in
willNoConflictingCheckpointBeJustified (called once at top, again
transitively via computeHonestFfgSupportForCurrentTarget).
Implement updateFastConfirmationVariables, findLatestConfirmedDescendant, getLatestConfirmed (private algorithm methods) plus onFastConfirmation and runConfirmation public entry points. Algorithm is a direct port of spec pseudocode; spec test mode lets the runner trigger confirmation explicitly. Spec: specs/phase0/fast-confirmation.md (algorithm + handlers sections)
Phase E confused two different unrealized-justified concepts: - store.unrealized_justified_checkpoint (global, used in update_fast_confirmation_variables per spec line 814-816) - store.unrealized_justifications[head] (per-head, used by FFG helpers per spec line 770, 1003) The runner had been passing the head's per-block unrealized_justified to update_fast_confirmation_variables, which at anchor blocks carries ZERO_HASH. This stomped fcr.previous_epoch_greatest_unrealized_checkpoint to zero at every epoch boundary, cascading into all FCR field mismatches. Fix: updateFastConfirmationVariables takes *const ForkChoice and reads the global value directly from fc.fc_store.unrealized_justified.checkpoint. The per-head head_unrealized_justified parameter on entry points (onFastConfirmation/runConfirmation) keeps its meaning and continues feeding the FFG helpers downstream — those usages were correct. Spec test pass rate jumps from 0/1014 to 4006/4798 (~83.5%) on minimal preset; remaining 792 FcrConfirmedRootMismatch failures will be diagnosed in follow-up commits. Surfaced by Phase F EF spec test runner.
Spec line 999 (`get_latest_confirmed`) gates the "restart from observed
justified" branch on the BLOCK's slot's epoch:
is_observed_justified_block_epoch_ok = (
compute_epoch_at_slot(observed_justified_block_slot) + 1 == current_epoch
)
Phase E had an outer guard `observed.epoch + 1 == current_epoch` using the
checkpoint's `epoch` field — these can differ when the checkpoint's root
is an empty-slot block from an earlier epoch (e.g., checkpoint at epoch 2
but its root block is at slot 15, epoch 1). The inner correct check
(observed_block_epoch + 1 == current_epoch) is preserved.
Also add diagnostic logging to test/spec/runner/fast_confirmation.zig so
future failure-debugging can trace expected/actual confirmed_root and
the FCR variables alongside.
Note: this fix is spec-aligned but does not change the spec test pass
count (4006/4798 unchanged). The cases that fail don't hit this gate.
The remaining 792 failures need per-case algorithmic trace, suspected
in `findLatestConfirmedDescendant`'s isOneConfirmed branch (balance
source semantics or safety-threshold computation).
eea5c7d to
b085072
Compare
236b604 to
59ea6f5
Compare
Remove "Phase A/B/C/D/E/F/H" labels from section headers, doc comments, and TODO notes. These referred to the original implementation roadmap, not to anything in the code itself. Replace with code-relevant phrasing where the original intent was a concrete future improvement, drop where it was just historical context. No functional change; 241/241 fork_choice tests still pass.
59ea6f5 to
e80d456
Compare
Summary
Implements the Fast Confirmation Rule (FCR) algorithm + module as defined in consensus-specs PR 4747.
This is a draft PR targeting
gr/feature/forkchoice-z. No spec test code in this PR — the FCR EF spec test runner lives separately ongr/feat-fcr-on-spec-tests.References: