Skip to content

lnwallet/test: fix flaky neutrino reorg sync timeout#10725

Merged
yyforyongyu merged 1 commit into
lightningnetwork:masterfrom
ziggie1984:lnwallet/fix-neutrino-reorg-wallet-sync-timeout
Apr 27, 2026
Merged

lnwallet/test: fix flaky neutrino reorg sync timeout#10725
yyforyongyu merged 1 commit into
lightningnetwork:masterfrom
ziggie1984:lnwallet/fix-neutrino-reorg-wallet-sync-timeout

Conversation

@ziggie1984
Copy link
Copy Markdown
Collaborator

@ziggie1984 ziggie1984 commented Apr 9, 2026

Summary

  • Fix a goroutine leak in waitForWalletSync: time.Tick inside the
    poll loop leaked a new goroutine on every iteration (~1500 over 5
    reorg cycles), adding system load that caused the 30s timeout to fire
    spuriously, particularly on postgres backends where block-processing
    writes carry more overhead.
  • Raise the timeout from 30s to 2 minutes to give the neutrino P2P
    layer and the address-manager transaction walk enough headroom under
    load.
  • Improve timeout error messages to identify which sync layer was stuck:
    • Layer 1 (header/P2P lag): ChainIO.GetBestBlock has not caught
      up to the miner tip — neutrino is still fetching headers.
    • Layer 2 (transaction-walk lag): heights matched but IsSynced()
      never returned true — the chain-sync notification or the
      address-manager DB write (undo+redo on reorg) did not complete in
      time.
  • Add a detailed doc comment to waitForWalletSync explaining the
    three-stage sync pipeline (header sync → compact-filter/block fetch →
    transaction walk + DB write) and why each stage matters, so a future
    timeout can be diagnosed from the error message alone.

Test plan

  • make unit pkg=github.com/lightningnetwork/lnd/lnwallet/test/neutrino case=TestLightningWallet passes consistently
  • No new goroutine leaks (single NewTicker with deferred Stop)

@ziggie1984 ziggie1984 self-assigned this Apr 9, 2026
@ziggie1984 ziggie1984 added flake fix neutrino Lightweight neutrino backend-type no-changelog labels Apr 9, 2026
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves flakiness in neutrino reorg sync tests by refining the waitForWalletSync function. It eliminates a goroutine leak, extends the synchronization timeout to prevent spurious failures under load, and provides more granular error messages to pinpoint the exact stage of a sync timeout. These changes aim to improve the robustness and diagnosability of wallet synchronization in test environments.

Highlights

  • Goroutine Leak Fix: Addressed a goroutine leak in waitForWalletSync by replacing time.Tick with time.NewTicker and ensuring proper cleanup with defer ticker.Stop().
  • Timeout Extension: Increased the waitForWalletSync timeout from 30 seconds to 2 minutes to accommodate slower sync operations, especially under load or with postgres backends.
  • Enhanced Error Reporting: Improved timeout error messages to clearly indicate whether the sync failure occurred during header synchronization (Layer 1) or full wallet processing (Layer 2), aiding in quicker diagnosis.
  • Detailed Documentation: Added a comprehensive doc comment to waitForWalletSync explaining the multi-stage sync pipeline (header sync, compact-filter/block fetch, transaction walk + DB write) for easier future diagnosis.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@ziggie1984 ziggie1984 force-pushed the lnwallet/fix-neutrino-reorg-wallet-sync-timeout branch from e99d65b to 2247e98 Compare April 9, 2026 09:49
@github-actions github-actions Bot added the severity-critical Requires expert review - security/consensus critical label Apr 9, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 9, 2026

🔴 PR Severity: CRITICAL

Classified by file path | 1 file | 85 lines changed

🔴 Critical (1 file)
  • lnwallet/test/test_interface.go - Falls under lnwallet/*: wallet operations test infrastructure

Analysis

This PR modifies lnwallet/test/test_interface.go, which resides in the lnwallet/ package tree. Per classification rules, all files under lnwallet/* are CRITICAL as this package covers wallet operations, channel funding, signing, and commitment transactions.

Note: The changed file is in a test/ subdirectory of lnwallet and carries the flake fix label, indicating it is test infrastructure rather than production wallet logic. The file does not match the explicit test-exclusion patterns (*_test.go, itest/*, lntest/*), so it inherits the CRITICAL rating from its parent package path. If this change is purely test scaffolding with no production impact, consider adding severity-override-low or severity-override-medium to reflect the actual risk.

No severity bump triggers were met (1 file changed, 85 lines total).


To override, add a severity-override-{critical,high,medium,low} label.

@ziggie1984 ziggie1984 added the severity-override-medium Manual override to medium label Apr 9, 2026
@ziggie1984
Copy link
Copy Markdown
Collaborator Author

overwriting this to medium since it only touches a test.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the waitForWalletSync function to enhance its robustness and diagnostic capabilities. Key changes include extending the timeout duration, preventing goroutine leaks by using time.NewTicker, and providing more detailed error messages that distinguish between header sync and full wallet sync failures. The function also receives extensive documentation explaining its two-layer synchronization process. A minor style guide violation was noted, requiring a blank line between case stanzas in the select statement for improved readability.

Comment on lines +3084 to +3085
bestHeight)
case <-ticker.C:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

According to the style guide, there should be spacing between case stanzas in select statements for better readability. Please add a blank line before case <-ticker.C:.

				bestHeight)

		case <-ticker.C:
References
  1. The style guide requires spacing between case and select stanzas to improve code readability. The provided example for switch statements demonstrates this with blank lines between cases. (link)

@lightninglabs-deploy
Copy link
Copy Markdown
Collaborator

@ziggie1984, remember to re-request review from reviewers when ready

Copy link
Copy Markdown
Member

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending linter fix, otherwise LGTM

waitForWalletSync used time.Tick inside the poll loop, leaking a new
goroutine on every iteration. Over 5 reorg cycles with ~300 polls each
this accumulated up to 1500 leaked goroutines, adding measurable system
load that made the 30s timeout too tight, especially when running against
a postgres backend where block-processing writes carry more overhead.

Fix the leak by using a single time.NewTicker (deferred Stop), and raise
the timeout to 2 minutes to give the neutrino P2P layer and the
address-manager transaction walk enough headroom under load.

Also improve the timeout error messages to identify which of the two
sync layers was stuck:
- Layer 1 (header/P2P): ChainIO.GetBestBlock height has not yet caught
  up to the miner tip — neutrino is still fetching headers.
- Layer 2 (transaction walk): heights matched but IsSynced() never
  returned true — the chain-sync notification or the address-manager
  DB write (undo+redo on reorg) did not complete in time.

Add a detailed doc comment to waitForWalletSync explaining the three
pipeline stages (header sync, compact-filter/block fetch, transaction
walk) and why each stage is relevant, so a future timeout can be
diagnosed from the error message alone.
@ziggie1984 ziggie1984 force-pushed the lnwallet/fix-neutrino-reorg-wallet-sync-timeout branch from 2247e98 to 3cf60bd Compare April 27, 2026 01:32
@github-actions github-actions Bot added severity-medium Focused review required and removed severity-critical Requires expert review - security/consensus critical labels Apr 27, 2026
@github-actions
Copy link
Copy Markdown

⚠️ Severity changed: severity-criticalseverity-medium (files changed since last classification)

🟡 PR Severity: MEDIUM

Override applied | 1 file | 87 lines changed

🟡 Override (severity-override-medium present)
  • A severity-override-medium label was found on this PR — automated classification is skipped and the override level is used directly.

Analysis

This PR carries a severity-override-medium label, which takes precedence over any automated classification. The previously applied label (severity-critical) has been replaced with severity-medium to match the override.

No file-by-file classification was performed, as the override is authoritative.


To change the override, replace the severity-override-medium label with severity-override-{critical,high,medium,low}.
<!-- pr-severity-bot -->

Copy link
Copy Markdown
Member

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🔨 - Gonna merge given that this only touches test files and has low severity.

@yyforyongyu yyforyongyu merged commit 5b85551 into lightningnetwork:master Apr 27, 2026
42 of 45 checks passed
@ziggie1984 ziggie1984 deleted the lnwallet/fix-neutrino-reorg-wallet-sync-timeout branch April 27, 2026 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flake fix neutrino Lightweight neutrino backend-type no-changelog severity-medium Focused review required severity-override-medium Manual override to medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants