
fix: plug five silent asset-safety bugs across selector, storage, and htlc#1522

Merged
adecaro merged 2 commits into
hyperledger-labs:mainfrom
Aman-Cool:fix/multi-store-safety
Apr 11, 2026



@Aman-Cool Aman-Cool commented Apr 10, 2026

Been going through some edge cases in the token pipeline and found a few things worth fixing.

Selector lock leak: selectInternal was returning SelectorInsufficientFunds on the genuine no-funds path without calling UnlockAll first. Any tokens locked during that selection attempt stay locked, because nothing else releases them.
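To illustrate the lock-leak fix, here is a minimal sketch, not the SDK's actual selector. The names `memLocker`, `token`, `selectTokens`, and `ErrInsufficientFunds` are hypothetical stand-ins; only the idea of calling an unlock-all step before returning the insufficient-funds error comes from the description above.

```go
package main

import "errors"

// ErrInsufficientFunds is a hypothetical stand-in for SelectorInsufficientFunds.
var ErrInsufficientFunds = errors.New("insufficient funds")

// token is a hypothetical unspent token with an ID and a quantity.
type token struct {
	ID  string
	Qty uint64
}

// memLocker is a toy in-memory lock table standing in for the real locker.
type memLocker struct {
	locked map[string]bool
}

func newMemLocker() *memLocker {
	return &memLocker{locked: map[string]bool{}}
}

// TryLock locks a token ID and reports whether the lock was acquired.
func (l *memLocker) TryLock(id string) bool {
	if l.locked[id] {
		return false
	}
	l.locked[id] = true
	return true
}

// UnlockAll releases the given token IDs.
func (l *memLocker) UnlockAll(ids []string) {
	for _, id := range ids {
		delete(l.locked, id)
	}
}

// selectTokens locks tokens until their total reaches amount.
// On the insufficient-funds path it releases every lock taken by this
// attempt before returning, so nothing stays locked by accident.
func selectTokens(l *memLocker, unspent []token, amount uint64) ([]string, error) {
	var picked []string
	var sum uint64
	for _, t := range unspent {
		if !l.TryLock(t.ID) {
			continue // already held by another selection
		}
		picked = append(picked, t.ID)
		sum += t.Qty
		if sum >= amount {
			return picked, nil
		}
	}
	// Not enough funds: release everything locked during this attempt.
	l.UnlockAll(picked)

	return nil, ErrInsufficientFunds
}
```

Calling `selectTokens` from a `main` function or a test with a wallet holding less than the requested amount should leave the lock table empty afterwards, which is the property the fix restores.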

HTLC wall-clock non-determinism: Verifier.Verify was calling time.Now() directly to evaluate the deadline. Endorsing peers with any clock skew can disagree on whether the deadline has passed, causing endorsement failures near the boundary. Made the clock injectable via a ClockFunc field so callers with access to the block timestamp can pass it in, defaulting to time.Now() when not set.
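The clock injection can be sketched roughly as follows. This is a simplified illustration, not the real `Verifier`: the `Deadline` field, `VerifyClaim`, `newVerifier`, `blockClock`, and `ErrDeadlinePassed` are assumptions made for the example, and the exact type of `ClockFunc` in the SDK may differ. Only the idea of a `ClockFunc` field that falls back to `time.Now()` comes from the description above.

```go
package main

import (
	"errors"
	"time"
)

// ErrDeadlinePassed is a hypothetical error for an expired deadline.
var ErrDeadlinePassed = errors.New("deadline has passed")

// Verifier is a simplified, hypothetical stand-in for the HTLC verifier.
type Verifier struct {
	Deadline time.Time
	// ClockFunc returns the time used for deadline checks.
	// When nil, the local wall clock is used.
	ClockFunc func() time.Time
}

// newVerifier builds a Verifier from a deadline in Unix seconds.
func newVerifier(deadlineUnix int64, clock func() time.Time) *Verifier {
	return &Verifier{Deadline: time.Unix(deadlineUnix, 0), ClockFunc: clock}
}

// now returns the injected time if a ClockFunc is set, else time.Now().
func (v *Verifier) now() time.Time {
	if v.ClockFunc != nil {
		return v.ClockFunc()
	}
	return time.Now()
}

// VerifyClaim checks only the deadline part of a claim: it fails once
// the reference time is strictly after the deadline.
func (v *Verifier) VerifyClaim() error {
	if v.now().After(v.Deadline) {
		return ErrDeadlinePassed
	}
	return nil
}

// blockClock returns a ClockFunc that always reports the given timestamp,
// for example a block timestamp expressed in Unix seconds.
func blockClock(unixSeconds int64) func() time.Time {
	return func() time.Time { return time.Unix(unixSeconds, 0) }
}
```

With a fixed clock such as `blockClock(ts)`, every endorser that evaluates the same transaction uses the same reference time, so the outcome no longer depends on local clock skew.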

IdentityDB namespace collision: StoreIdentityData, GetAuditInfo, GetTokenInfo, StoreSignerInfo, GetExistingSignerInfo, and GetSignerInfo were all building KVS composite keys without including the TMSID. In any deployment with multiple TMS instances sharing a single KVS backend, identity and signer data silently collides across TMS instances. Added the TMSID segment to every key.
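A rough sketch of the key-scoping idea, assuming a plain map as the shared KVS. The `TMSID` fields, `compositeKey`, the `"identity.audit"` prefix, and the `identityDB` type are illustrative assumptions, not the SDK's actual key layout.

```go
package main

import "strings"

// TMSID identifies a TMS instance. The three fields are an assumption
// made for this sketch.
type TMSID struct {
	Network, Channel, Namespace string
}

// String renders the TMSID as a single key segment.
func (t TMSID) String() string {
	return strings.Join([]string{t.Network, t.Channel, t.Namespace}, ",")
}

// compositeKey joins a prefix and its parts with a NUL separator.
func compositeKey(prefix string, parts ...string) string {
	return prefix + "\x00" + strings.Join(parts, "\x00")
}

// identityDB is a toy identity store on top of a shared key-value map.
type identityDB struct {
	kvs   map[string][]byte
	tmsID TMSID
}

func newIdentityDB(kvs map[string][]byte, tmsID TMSID) *identityDB {
	return &identityDB{kvs: kvs, tmsID: tmsID}
}

// auditKey includes the TMSID segment so that two TMS instances sharing
// one backend never produce the same key for the same identity.
func (db *identityDB) auditKey(identity string) string {
	return compositeKey("identity.audit", db.tmsID.String(), identity)
}

// StoreAuditInfo writes audit info under the TMS-scoped key.
func (db *identityDB) StoreAuditInfo(identity string, info []byte) {
	db.kvs[db.auditKey(identity)] = info
}

// GetAuditInfo reads audit info from the TMS-scoped key.
func (db *identityDB) GetAuditInfo(identity string) ([]byte, bool) {
	v, ok := db.kvs[db.auditKey(identity)]
	return v, ok
}
```

Two `identityDB` values built over the same map but with different `TMSID`s can each store data for the same identity without overwriting one another. Without the TMSID segment, the second write would silently replace the first.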

All three are reachable through normal SDK usage, not just in edge-case theory.

@Aman-Cool Aman-Cool changed the title fix: plug five asset-safety bugs across selector, storage, htlc, and … fix: plug five silent asset-safety bugs across selector, storage, htlc, and auditor Apr 10, 2026

Aman-Cool commented Apr 10, 2026

Hey @adecaro, found these while tracing the commit pipeline for the first PR. Figured I'd fix them in one shot since they're all in the same area.

The auditor RestoreTMS one is probably the most urgent; it's the same class of crash-recovery bug as the previous PR but on the audit side, just nobody noticed because the ttx side gets more attention. Silent permanent Pending record, no alert, no recovery.

The ON CONFLICT DO NOTHING is embarrassingly simple but without it any transient error that triggers a view retry just permanently kills the operation. Surprised it hasn't surfaced more in testing.
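The retry problem can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the SDK's storage code: `store`, `insertStrict`, `insertIdempotent`, and `runWithRetry` are hypothetical names. `insertStrict` mimics a plain `INSERT` against a unique key, and `insertIdempotent` mimics `INSERT ... ON CONFLICT DO NOTHING`.

```go
package main

import "errors"

// errDuplicate mimics a unique-constraint violation.
var errDuplicate = errors.New("duplicate key")

// store is a toy table keyed by transaction ID.
type store struct {
	rows map[string]string
}

func newStore() *store {
	return &store{rows: map[string]string{}}
}

// insertStrict behaves like a plain INSERT: a second insert of the
// same key fails.
func (s *store) insertStrict(id, status string) error {
	if _, ok := s.rows[id]; ok {
		return errDuplicate
	}
	s.rows[id] = status
	return nil
}

// insertIdempotent behaves like INSERT ... ON CONFLICT DO NOTHING:
// a second insert of the same key is a no-op and not an error.
func (s *store) insertIdempotent(id, status string) error {
	if _, ok := s.rows[id]; !ok {
		s.rows[id] = status
	}
	return nil
}

// runWithRetry inserts a row, simulates one transient failure after the
// first insert, and then retries the whole operation once.
func runWithRetry(insert func(id, status string) error, id string) error {
	failOnce := true
	for attempt := 0; attempt < 2; attempt++ {
		if err := insert(id, "Pending"); err != nil {
			return err
		}
		if failOnce {
			failOnce = false
			continue // transient error after the insert: retry from the top
		}
		return nil
	}
	return errors.New("retries exhausted")
}
```

In this model, `runWithRetry(s.insertStrict, "tx1")` fails on the retry with the duplicate-key error, while `runWithRetry(s.insertIdempotent, "tx1")` completes and leaves a single `Pending` row.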

Rest are smaller but real; the lock leak compounds over time, the HTLC clock thing is a latent endorsement bomb near deadlines, and the identity key scoping only bites in multi-TMS setups which are increasingly common in production.

Would appreciate a second pair of eyes especially on the auditor recovery path and the SQL idempotency change.


adecaro commented Apr 10, 2026

Hi @Aman-Cool, great effort indeed.

A few comments: I would remove the changes to manager.go and transactions.go from this PR.
I think that issue needs to be addressed more globally. For instance, RestoreTMS is only invoked at bootstrap, but we might have to recover transactions while the system is running due to a replica crash. Regarding the split-brain issue, I think we must make that transactional to be more consistent.

The rest is totally fine. If it is okay for you, I'll merge the rest once you remove the changes I pointed to.
What do you think?

@Aman-Cool Aman-Cool force-pushed the fix/multi-store-safety branch from d778723 to be0fed4 Compare April 10, 2026 13:58
Aman-Cool commented

@adecaro, Thanks a lot for the thorough review, really appreciate it; you're absolutely right on both points and I should've caught that myself before opening this.

Pulled out the auditor/manager.go and transactions.go changes. The three remaining fixes (selector lock leak, HTLC clock injection, identitydb namespace scoping) are clean and standalone so this should be ready to go now.

The auditor recovery and the split-brain atomicity fix are tracked separately in #1507; would love your eyes on that when you get a chance, especially since you clearly have good context on the commit pipeline.

@adecaro adecaro force-pushed the fix/multi-store-safety branch from 5b901eb to 1766f65 Compare April 10, 2026 15:06

@adecaro adecaro left a comment


LGTM. Great effort. Thanks.

@Aman-Cool Aman-Cool force-pushed the fix/multi-store-safety branch from 1766f65 to b38b311 Compare April 10, 2026 16:28
- selector/sherdlock: call UnlockAll before returning SelectorInsufficientFunds
  so advisory locks don't leak when a wallet has genuinely no funds
- htlc/signer: replace hard-coded time.Now() in Verifier.Verify with injectable
  ClockFunc so callers can pass the Fabric block timestamp for deterministic
  deadline enforcement across endorsers with clock skew
- kvs/identitydb: add tmsID to StoreIdentityData, StoreSignerInfo, and all
  corresponding read paths to prevent cross-TMS identity data collisions when
  multiple TMS instances share one KVS backend

Signed-off-by: Aman-Cool <aman017102007@gmail.com>
- Add blank line before return in selector.go for nlreturn compliance
- Replace context.Background() with t.Context() in test files
- Log unlock errors instead of embedding in error message

Signed-off-by: Aman-Cool <aman017102007@gmail.com>
@adecaro adecaro force-pushed the fix/multi-store-safety branch from b38b311 to a2cd0b0 Compare April 10, 2026 18:00
@adecaro adecaro self-assigned this Apr 11, 2026
@adecaro adecaro added the bug (Something isn't working) label Apr 11, 2026
@adecaro adecaro modified the milestone: Q2/26 Apr 11, 2026

adecaro commented Apr 11, 2026

Hi @Aman-Cool, please update the description of the PR to reflect the actual changes. Thanks a lot 🙏

@adecaro adecaro changed the title fix: plug five silent asset-safety bugs across selector, storage, htlc, and auditor fix: plug five silent asset-safety bugs across selector, storage, and htlc Apr 11, 2026

adecaro commented Apr 11, 2026

great, thanks a lot @Aman-Cool 🙏

@adecaro adecaro merged commit 7be0a36 into hyperledger-labs:main Apr 11, 2026
94 checks passed
sid200727 pushed a commit to sid200727/fabric-token-sdk that referenced this pull request Apr 24, 2026
… htlc (hyperledger-labs#1522)

Signed-off-by: Aman-Cool <aman017102007@gmail.com>

Labels

bug Something isn't working


2 participants