Adding diagnostics mode for proof failures #2060
Summary of Changes (Gemini Code Assist): This pull request implements a diagnostics mode that captures and persists artifacts related to proof validation failures. By writing these failures asynchronously to a specified directory, it gives better visibility into issues encountered during the asset transfer process, which aids debugging and troubleshooting.
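The asynchronous design summarized above usually relies on a bounded queue that the validation path can push to without blocking. Below is a minimal Go sketch of that idea, assuming a buffered channel with a drop-when-full policy; the `failure` and `recorder` types and the `Report` method are hypothetical illustrations, not code from this PR.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// failure is a hypothetical stand-in for a proof validation failure record.
type failure struct {
	Reason string
}

// recorder owns a bounded queue of pending failures. A background writer
// (not shown) would drain queue and persist each entry.
type recorder struct {
	queue   chan failure
	dropped atomic.Uint64
}

// newRecorder creates a recorder whose queue holds at most size failures.
func newRecorder(size int) *recorder {
	return &recorder{queue: make(chan failure, size)}
}

// Report enqueues f without ever blocking the caller. If the queue is
// full, the failure is dropped and counted instead. It returns true if
// the failure was queued.
func (r *recorder) Report(f failure) bool {
	select {
	case r.queue <- f:
		return true
	default:
		r.dropped.Add(1)
		return false
	}
}

func main() {
	r := newRecorder(2)
	for i := 0; i < 3; i++ {
		r.Report(failure{Reason: fmt.Sprintf("bad proof %d", i)})
	}
	fmt.Println(len(r.queue), r.dropped.Load())
}
```

Dropping on overflow keeps proof validation independent of disk latency, at the cost of losing some diagnostics under sustained load. Whether the PR drops, blocks, or grows its queue when full is not stated in this thread.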
Code Review
This pull request introduces a new diagnostics service for Taproot Assets, enabling asynchronous persistence of proof validation failures to disk. The service includes a background writer, a queue for non-blocking reporting, and logic to sanitize and store failure metadata alongside binary proof artifacts. Feedback includes a critical fix for a potential race condition in the cloneFailure function where pointer fields were not being deep-copied, a request to add missing documentation for the writeFailureReport function per the style guide, and a suggestion to refactor pre-broadcast failure reporting in the ChainPorter to use existing helper methods for better consistency.
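Gemini's race-condition finding comes down to shallow copies: assigning a struct copies its pointer and slice headers, not the data they reference, so the reporting goroutine and the background writer can end up sharing memory. A minimal deep-copy sketch follows, assuming a hypothetical `Failure` type; `TransferOutputIndex`, `OutputProofFiles`, and `InputProofFiles` reuse names from the PR's metadata struct, while `Reason` and `ProofBytes` are invented for illustration.

```go
package main

import "fmt"

// Failure is a hypothetical failure record. Three field names mirror the
// PR's metadata struct; Reason and ProofBytes are illustrative extras.
type Failure struct {
	Reason              string
	TransferOutputIndex *int
	OutputProofFiles    []string
	InputProofFiles     []string
	ProofBytes          []byte
}

// cloneStrings returns an independent copy of s, preserving nil.
func cloneStrings(s []string) []string {
	if s == nil {
		return nil
	}
	out := make([]string, len(s))
	copy(out, s)
	return out
}

// cloneFailure returns a deep copy of f so the background writer never
// shares mutable memory with the goroutine that reported the failure.
func cloneFailure(f Failure) Failure {
	c := f // copies value fields such as Reason (strings are immutable)

	if f.TransferOutputIndex != nil {
		idx := *f.TransferOutputIndex
		c.TransferOutputIndex = &idx
	}
	c.OutputProofFiles = cloneStrings(f.OutputProofFiles)
	c.InputProofFiles = cloneStrings(f.InputProofFiles)
	if f.ProofBytes != nil {
		c.ProofBytes = make([]byte, len(f.ProofBytes))
		copy(c.ProofBytes, f.ProofBytes)
	}
	return c
}

func main() {
	idx := 1
	orig := Failure{TransferOutputIndex: &idx, ProofBytes: []byte{0x01}}
	c := cloneFailure(orig)

	// Mutating the clone must not affect the original.
	*c.TransferOutputIndex = 7
	c.ProofBytes[0] = 0xff
	fmt.Println(*orig.TransferOutputIndex, orig.ProofBytes[0])
}
```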
Force-pushed from 55087ee to e39586e
@sergey3bv, remember to re-request review from reviewers when ready.

Hey, @jtobin,

Hey @sergey3bv, it looks like a bad merge or rebase on your branch caused the integration test failures. I found the root cause using Claude Opus.
Here are Claude's recommendations. Once these are fixed, we can proceed with the rest of the review:
Force-pushed from e39586e to d4af801
Hey @kaldun-tech, I've updated the PR according to your comment. Could you please take a look? cc @jtobin
Force-pushed from 0eda341 to eb74718
Force-pushed from eb74718 to 45ea017
	return nil
}, ccTransferTimeout)
}, ccTransferConfirmTimeout)
These changes to the itest look appropriate for improving CI stability in the diagnostics tests.
		"artifacts for diagnostics "+
		"(output_idx=%d): %v", idx,
		inputArtifactsErr)
}
Minor: Here the error is only logged as a warning. This seems unusual but acceptable for diagnostics, where capture is best-effort, given that partial collection is not supported.
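The best-effort pattern the reviewer describes can be sketched as follows. It assumes a hypothetical `collectInputArtifacts` helper and per-output granularity: if the input artifacts for an output cannot be gathered, that output's input artifacts are skipped entirely (no partial collection), a warning is logged, and the remaining outputs are still processed. The log message is illustrative, not the PR's exact text.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// collectInputArtifacts is a hypothetical stand-in for whatever gathers
// the input proof artifacts for one transfer output. Output 1 fails here
// to exercise the warning path.
func collectInputArtifacts(idx int) ([]string, error) {
	if idx == 1 {
		return nil, errors.New("input proof not found")
	}
	return []string{fmt.Sprintf("input-%d.proof", idx)}, nil
}

// gatherBestEffort collects artifacts for every output. A failure for one
// output is logged as a warning and that output is skipped, so a single
// missing artifact never aborts the whole diagnostics report.
func gatherBestEffort(numOutputs int) map[int][]string {
	artifacts := make(map[int][]string)
	for idx := 0; idx < numOutputs; idx++ {
		files, err := collectInputArtifacts(idx)
		if err != nil {
			log.Printf("WARN: unable to collect input artifacts "+
				"for diagnostics (output_idx=%d): %v", idx, err)
			continue
		}
		artifacts[idx] = files
	}
	return artifacts
}

func main() {
	got := gatherBestEffort(3)
	fmt.Println(len(got))
}
```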
	TransferOutputIndex *int     `json:"transfer_output_idx,omitempty"`
	OutputProofFiles    []string `json:"output_proof_files,omitempty"`
	InputProofFiles     []string `json:"input_proof_files,omitempty"`
}
Minor: The metadata.json schema is not documented. Consider adding a tapd version field in a future PR.
Something like: TapdVersion string `json:"tapd_version,omitempty"`
This would help support teams identify which tapd version produced a given diagnostics dump.
	}
}

func (s *Service) writer() {
Is there a reason to skip a comment block on these new functions?
	DiagnosticsService *diagnostics.Service

	Diagnostics diagnostics.Recorder
Minor: Having both Diagnostics and DiagnosticsService is confusing, since the names don't convey the distinction between them. You could rename Diagnostics to DiagnosticsRecorder instead.
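One way to make the distinction explicit is to name each field for its role: the concrete service at the top level, and a narrow reporting interface where only reporting is needed. The sketch assumes `Recorder` is an interface implemented by the service; `RecordFailure`, `ServerConfig`, and `PorterConfig` are hypothetical names, not taken from the PR.

```go
package main

import "fmt"

// Recorder is a hypothetical narrow interface: anything that can accept a
// failure report. Components that only report failures depend on this.
type Recorder interface {
	RecordFailure(reason string)
}

// Service is a hypothetical concrete diagnostics service. In a real setup
// it would also own the queue and writer lifecycle.
type Service struct {
	recorded []string
}

// RecordFailure implements Recorder.
func (s *Service) RecordFailure(reason string) {
	s.recorded = append(s.recorded, reason)
}

// Compile-time check that *Service satisfies Recorder.
var _ Recorder = (*Service)(nil)

// ServerConfig holds the concrete service for lifecycle management.
type ServerConfig struct {
	DiagnosticsService *Service
}

// PorterConfig only needs the reporting capability, so the field is named
// for that role.
type PorterConfig struct {
	DiagnosticsRecorder Recorder
}

func main() {
	svc := &Service{}
	srv := ServerConfig{DiagnosticsService: svc}
	porter := PorterConfig{DiagnosticsRecorder: srv.DiagnosticsService}
	porter.DiagnosticsRecorder.RecordFailure("proof validation failed")
	fmt.Println(len(svc.recorded))
}
```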
}

	return clones
}
File-level concern: there's no mechanism for disk-space management, for example:
- Limit total directory size
- Limit number of failure reports
- Delete old reports (retention policy)
- Rotate/archive old runs
Every proof validation failure can write to disk with no limit, so if a node has persistent proof validation issues, the diagnostics directory could grow without bound.
This seems acceptable for version 1, as the feature is explicitly for debugging. In a future enhancement you could add flags such as --diagnostics-max-size or --diagnostics-retention-days.
kaldun-tech left a comment
Overall this looks solid. The only change I'd strongly recommend is adding the missing function comments that Gemini flagged. The rest are nits we can take care of in a future PR.
There's clean separation of concerns here and correct async patterns. Good test coverage too.
Should close #1867