Adding diagnostics mode for proof failures #2060

Open
sergey3bv wants to merge 2 commits into lightninglabs:main from sergey3bv:feat/diagnostic-mode

@sergey3bv
Contributor

Should close #1867

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a diagnostics mode designed to capture and persist artifacts related to proof validation failures. By asynchronously writing these failures to a specified directory, it provides better visibility into issues encountered during the asset transfer process, aiding in debugging and troubleshooting.

Highlights

  • New Diagnostics Package: Introduced a new diagnostics package to handle asynchronous persistence of proof validation failures, including a service for writing failure artifacts to disk.
  • Integration with ChainPorter: Updated ChainPorter to capture and report proof validation failures during both pre-broadcast and post-broadcast stages.
  • Configuration Updates: Added a new --diagnostics-dir configuration flag to enable and configure the diagnostics service.
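
The asynchronous persistence described in the highlights can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Failure` and `Recorder` types, the `NewRecorder`, `Report`, and `Stop` names, and the drop-when-full policy are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Failure is a hypothetical stand-in for a proof validation failure report.
type Failure struct {
	Stage string
	Err   string
}

// Recorder is a hypothetical asynchronous recorder: Report never blocks the
// caller, and a single background goroutine drains the queue.
type Recorder struct {
	queue chan Failure
	wg    sync.WaitGroup
	sink  func(Failure)
}

// NewRecorder starts the background writer. sink is called once per queued
// failure; a real service would write artifacts to the diagnostics directory.
func NewRecorder(size int, sink func(Failure)) *Recorder {
	r := &Recorder{queue: make(chan Failure, size), sink: sink}
	r.wg.Add(1)
	go r.writer()
	return r
}

// writer drains the queue until it is closed.
func (r *Recorder) writer() {
	defer r.wg.Done()
	for f := range r.queue {
		r.sink(f)
	}
}

// Report enqueues a failure without blocking. It returns false when the
// queue is full and the failure was dropped.
func (r *Recorder) Report(f Failure) bool {
	select {
	case r.queue <- f:
		return true
	default:
		return false
	}
}

// Stop closes the queue and waits for the writer to flush what was accepted.
// Report must not be called after Stop.
func (r *Recorder) Stop() {
	close(r.queue)
	r.wg.Wait()
}

func main() {
	var got []Failure
	r := NewRecorder(8, func(f Failure) { got = append(got, f) })
	r.Report(Failure{Stage: "pre-broadcast", Err: "invalid proof"})
	r.Stop()
	fmt.Println("persisted reports:", len(got))
}
```

Dropping on a full queue keeps the transfer path from stalling on disk I/O, at the cost of possibly losing reports under load; whether the PR makes the same trade-off is not stated here.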

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new diagnostics service for Taproot Assets, enabling asynchronous persistence of proof validation failures to disk. The service includes a background writer, a queue for non-blocking reporting, and logic to sanitize and store failure metadata alongside binary proof artifacts. Feedback includes a critical fix for a potential race condition in the cloneFailure function where pointer fields were not being deep-copied, a request to add missing documentation for the writeFailureReport function per the style guide, and a suggestion to refactor pre-broadcast failure reporting in the ChainPorter to use existing helper methods for better consistency.
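
To illustrate the deep-copy concern raised for cloneFailure, here is a minimal sketch. The `Failure` type below is hypothetical; only the `TransferOutputIndex` and `OutputProofFiles` field names are borrowed from the metadata struct visible in this PR, and the real function may differ.

```go
package main

import "fmt"

// Failure is a hypothetical report type used only for this example.
type Failure struct {
	TransferOutputIndex *int
	OutputProofFiles    []string
}

// cloneFailure returns a copy that shares no memory with f, so the caller
// can keep mutating f while a background writer reads the clone.
func cloneFailure(f Failure) Failure {
	c := f

	// A plain struct copy would duplicate the pointer, not the int it
	// points to, so copy the pointed-to value explicitly.
	if f.TransferOutputIndex != nil {
		idx := *f.TransferOutputIndex
		c.TransferOutputIndex = &idx
	}

	// Likewise, copy the slice's backing array rather than its header.
	c.OutputProofFiles = append([]string(nil), f.OutputProofFiles...)

	return c
}

func main() {
	idx := 3
	orig := Failure{
		TransferOutputIndex: &idx,
		OutputProofFiles:    []string{"output_0.proof"},
	}
	clone := cloneFailure(orig)

	// Mutating the original after cloning must not affect the clone.
	idx = 9
	orig.OutputProofFiles[0] = "changed"
	fmt.Println(*clone.TransferOutputIndex, clone.OutputProofFiles[0])
}
```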

Comment thread diagnostics/service.go
Comment thread diagnostics/service.go
Comment thread tapfreighter/chain_porter.go
@sergey3bv sergey3bv force-pushed the feat/diagnostic-mode branch from 55087ee to e39586e Compare April 10, 2026 13:57
@lightninglabs-deploy

@sergey3bv, remember to re-request review from reviewers when ready

@sergey3bv
Contributor Author

Hey, @jtobin, make itest-parallel runs completely fine locally yet fails in CI. Could you please take a look?

@kaldun-tech
Contributor

Hey @sergey3bv, it looks like you have a bad merge or rebase on your branch that caused the integration test issues. I found the root cause using Claude Opus.

  1. limit_constraints_test.go - The entire file was deleted, including the test for RFQ limit-order constraints (critical coverage).
  2. decode_invoice_test.go - The test coverage for multi-tranche group key decoding was removed.
  3. liquidity_test.go - Synchronization code was removed, which caused this CI failure.

Here are Claude's recommendations. Once you have these fixed, we can proceed with the rest of the review:

  1. Immediate: Revert all changes to test files:
    git checkout main -- itest/custom_channels/decode_invoice_test.go
    git checkout main -- itest/custom_channels/limit_constraints_test.go
    git checkout main -- itest/custom_channels/liquidity_test.go
    git checkout main -- itest/custom_channels/passive_assets_test.go
  2. Then: Re-apply only the necessary net.Miner → net.Miner.Client changes if needed for API compatibility
  3. Review: The PR should be rebased cleanly on main with only the diagnostics-related changes.

@sergey3bv sergey3bv force-pushed the feat/diagnostic-mode branch from e39586e to d4af801 Compare May 12, 2026 07:42
@sergey3bv
Contributor Author

Hey, @kaldun-tech, I updated the PR according to your comment, could you please take a look.

cc @jtobin

@sergey3bv sergey3bv force-pushed the feat/diagnostic-mode branch 2 times, most recently from 0eda341 to eb74718 Compare May 12, 2026 10:41
@sergey3bv sergey3bv force-pushed the feat/diagnostic-mode branch from eb74718 to 45ea017 Compare May 12, 2026 12:10

return nil
}, ccTransferTimeout)
}, ccTransferConfirmTimeout)
Contributor

These changes to itest look appropriate to improve CI stability via diagnostics testing

"artifacts for diagnostics "+
"(output_idx=%d): %v", idx,
inputArtifactsErr)
}
Contributor

Minor: Here the error is logged as a warning. This seems unusual but acceptable, since diagnostics capture is best-effort and partial collection is not supported.

Comment thread diagnostics/service.go
TransferOutputIndex *int `json:"transfer_output_idx,omitempty"`
OutputProofFiles []string `json:"output_proof_files,omitempty"`
InputProofFiles []string `json:"input_proof_files,omitempty"`
}
Contributor

@kaldun-tech kaldun-tech May 13, 2026

Minor: The metadata.json schema is not documented. Consider adding a tapd version field in a future PR.

Something like: TapdVersion string `json:"tapd_version,omitempty"`

This would help support teams identify which tapd version produced a diagnostics dump.
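
As a rough sketch of how that suggestion would serialize, assuming a `metadata` struct that contains only the three fields shown in this thread plus the proposed one (the struct name, the `encodeMetadata` helper, and the version string are placeholders for illustration):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metadata is a hypothetical, trimmed-down version of the metadata.json
// schema. TapdVersion is the proposed field; the other three are copied
// from the struct shown in the review thread.
type metadata struct {
	TapdVersion         string   `json:"tapd_version,omitempty"`
	TransferOutputIndex *int     `json:"transfer_output_idx,omitempty"`
	OutputProofFiles    []string `json:"output_proof_files,omitempty"`
	InputProofFiles     []string `json:"input_proof_files,omitempty"`
}

// encodeMetadata marshals m to a JSON string. Fields tagged omitempty are
// left out when they hold their zero value.
func encodeMetadata(m metadata) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	idx := 1
	out, err := encodeMetadata(metadata{
		TapdVersion:         "v0.0.0-example", // placeholder version
		TransferOutputIndex: &idx,
		OutputProofFiles:    []string{"output_0.proof"},
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

Because of `omitempty`, dumps written by a build that does not set the field would simply lack the `tapd_version` key, so existing readers of metadata.json would not need to change.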

Comment thread diagnostics/service.go
}
}

func (s *Service) writer() {
Contributor

Is there a reason to skip a comment block on these new functions?

Comment thread tapconfig/config.go

DiagnosticsService *diagnostics.Service

Diagnostics diagnostics.Recorder
Contributor

Minor: Having both a Diagnostics and a DiagnosticsService field is confusing, as the names don't convey the distinction. You could rename Diagnostics to DiagnosticsRecorder instead.

Comment thread diagnostics/service.go
}

return clones
}
Contributor

File-level concern: You don't have a mechanism for disk-space management. For example:

  • Limit total directory size
  • Limit number of failure reports
  • Delete old reports (retention policy)
  • Rotate/archive old runs

Every proof validation failure writes to disk, with no upper bound. If a node has persistent proof validation issues, the diagnostics directory could grow without limit.

This seems acceptable for version 1, as the feature is explicitly for debugging. A future enhancement could add flags such as --diagnostics-max-size or --diagnostics-retention-days.

Contributor

@kaldun-tech kaldun-tech left a comment

Overall this looks solid. The only change I would strongly recommend is adding the missing function comments that Gemini flagged. The rest are nits that we can take care of in a future PR.

There's clean separation of concerns here and correct async patterns. Good test coverage too.

Labels

None yet

Projects

Status: 🆕 New

Development

Successfully merging this pull request may close these issues.

[feature]: Add diagnostics mode to capture proof failures and support artefacts

3 participants