lnwallet/test: fix flaky neutrino reorg sync timeout#10725
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves flakiness in neutrino reorg sync tests by refining the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
e99d65b to
2247e98
Compare
🔴 PR Severity: CRITICAL
🔴 Critical (1 file)
AnalysisThis PR modifies Note: The changed file is in a No severity bump triggers were met (1 file changed, 85 lines total). To override, add a |
|
overwriting this to medium since it only touches a test. |
There was a problem hiding this comment.
Code Review
This pull request refactors the waitForWalletSync function to enhance its robustness and diagnostic capabilities. Key changes include extending the timeout duration, preventing goroutine leaks by using time.NewTicker, and providing more detailed error messages that distinguish between header sync and full wallet sync failures. The function also receives extensive documentation explaining its two-layer synchronization process. A minor style guide violation was noted, requiring a blank line between case stanzas in the select statement for improved readability.
| bestHeight) | ||
| case <-ticker.C: |
There was a problem hiding this comment.
According to the style guide, there should be spacing between case stanzas in select statements for better readability. Please add a blank line before case <-ticker.C:.
bestHeight)
case <-ticker.C:References
- The style guide requires spacing between
caseandselectstanzas to improve code readability. The provided example forswitchstatements demonstrates this with blank lines between cases. (link)
|
@ziggie1984, remember to re-request review from reviewers when ready |
yyforyongyu
left a comment
There was a problem hiding this comment.
Pending linter fix, otherwise LGTM
waitForWalletSync used time.Tick inside the poll loop, leaking a new goroutine on every iteration. Over 5 reorg cycles with ~300 polls each this accumulated up to 1500 leaked goroutines, adding measurable system load that made the 30s timeout too tight, especially when running against a postgres backend where block-processing writes carry more overhead. Fix the leak by using a single time.NewTicker (deferred Stop), and raise the timeout to 2 minutes to give the neutrino P2P layer and the address-manager transaction walk enough headroom under load. Also improve the timeout error messages to identify which of the two sync layers was stuck: - Layer 1 (header/P2P): ChainIO.GetBestBlock height has not yet caught up to the miner tip — neutrino is still fetching headers. - Layer 2 (transaction walk): heights matched but IsSynced() never returned true — the chain-sync notification or the address-manager DB write (undo+redo on reorg) did not complete in time. Add a detailed doc comment to waitForWalletSync explaining the three pipeline stages (header sync, compact-filter/block fetch, transaction walk) and why each stage is relevant, so a future timeout can be diagnosed from the error message alone.
2247e98 to
3cf60bd
Compare
🟡 PR Severity: MEDIUM
🟡 Override (severity-override-medium present)
AnalysisThis PR carries a No file-by-file classification was performed, as the override is authoritative. To change the override, replace the |
yyforyongyu
left a comment
There was a problem hiding this comment.
LGTM 🔨 - Gonna merge given that this only touches test files and has low severity.
Summary
waitForWalletSync:time.Tickinside thepoll loop leaked a new goroutine on every iteration (~1500 over 5
reorg cycles), adding system load that caused the 30s timeout to fire
spuriously, particularly on postgres backends where block-processing
writes carry more overhead.
layer and the address-manager transaction walk enough headroom under
load.
ChainIO.GetBestBlockhas not caughtup to the miner tip — neutrino is still fetching headers.
IsSynced()never returned true — the chain-sync notification or the
address-manager DB write (undo+redo on reorg) did not complete in
time.
waitForWalletSyncexplaining thethree-stage sync pipeline (header sync → compact-filter/block fetch →
transaction walk + DB write) and why each stage matters, so a future
timeout can be diagnosed from the error message alone.
Test plan
make unit pkg=github.com/lightningnetwork/lnd/lnwallet/test/neutrino case=TestLightningWalletpasses consistentlyNewTickerwith deferredStop)