fix: cachedFetcher.update() no longer blocks token reads during DB refresh #1535

Open
NETIZEN-11 wants to merge 11 commits into hyperledger-labs:main from NETIZEN-11:fix/sherdlock-update-blocked-readers

Conversation

@NETIZEN-11 NETIZEN-11 commented Apr 12, 2026

What was fixed

The cachedFetcher.update() function in token/services/selector/sherdlock/fetcher.go was holding a write lock for the entire duration of the database query. This meant whenever the cache needed to refresh (e.g., on a slow DB), all token read operations would block waiting for the lock.

How it was fixed

  • Released the lock before executing the DB query
  • Added a staleness re-check after the DB call completes, to avoid overwriting a cache that another goroutine may have already refreshed
  • The cache update still happens atomically once the lock is re-acquired

Fix: #1547

Tests added

  • TestCachedFetcher_UpdateDoesNotBlockReaders: verifies that concurrent readers can still acquire the lock while update() is waiting on the database
  • TestCachedFetcher_UpdateReacquiresLockAfterDB: verifies the cache is correctly updated after the lock is re-acquired
go test -v -run "TestCachedFetcher_UpdateDoesNotBlockReaders|TestCachedFetcher_UpdateReacquiresLockAfterDB" ./token/services/selector/sherdlock/...

@NETIZEN-11 NETIZEN-11 force-pushed the fix/sherdlock-update-blocked-readers branch from c05df30 to 3b791af Compare April 12, 2026 19:22
@NETIZEN-11
Author

@adecaro Would love a review on this. Thanks!

@adecaro
Contributor

adecaro commented Apr 13, 2026

Hi @NETIZEN-11 , thanks for submitting this 🙏
I'll review ASAP, but please open a GitHub issue for this.

Thanks much 🙏

@adecaro
Contributor

adecaro commented Apr 13, 2026

@NETIZEN-11 , please, run make lint-auto-fix. It should solve the current lint errors 🙏

@adecaro adecaro self-assigned this Apr 13, 2026
@adecaro adecaro self-requested a review April 13, 2026 05:41
@adecaro adecaro added this to the Q2/26 milestone Apr 13, 2026
@AkramBitar AkramBitar added the enhancement New feature or request label Apr 13, 2026
@NETIZEN-11 NETIZEN-11 force-pushed the fix/sherdlock-update-blocked-readers branch 3 times, most recently from 429f175 to 727e53d Compare April 13, 2026 16:54
@NETIZEN-11
Author

@adecaro implemented the fix. Would love a review!

Contributor

@adecaro adecaro left a comment

LGTM. Thanks a lot @NETIZEN-11

@adecaro
Contributor

adecaro commented Apr 13, 2026

@NETIZEN-11 , the linter is still failing. Please double-check. You can run make lint and make checks locally.

@adecaro
Contributor

adecaro commented Apr 14, 2026

Hi @NETIZEN-11 , unfortunately, the checks are still reporting issues 😞
Please, make sure all the checks are passing 🙏

@NETIZEN-11 NETIZEN-11 force-pushed the fix/sherdlock-update-blocked-readers branch 2 times, most recently from 25ab751 to d65a96e Compare April 14, 2026 15:31
@NETIZEN-11
Author

Hi @adecaro,

Thanks for your feedback 🙏
I’ve fixed the linting and formatting issues and verified locally using make lint and make checks.

All checks should be passing now. Please take another look.

Thanks!

@adecaro adecaro force-pushed the fix/sherdlock-update-blocked-readers branch from d65a96e to d09034e Compare April 14, 2026 15:50
@AkramBitar
Contributor

Hi @NETIZEN-11 , thanks a lot for this PR.

Could you please check the failed test?

@adecaro
Contributor

adecaro commented Apr 15, 2026

The unit tests pass on my machine, so the test is probably not tuned for a slower machine like the one CI uses. @NETIZEN-11 , please fix this ASAP 🙏

@NETIZEN-11 NETIZEN-11 requested a review from adecaro April 15, 2026 18:43
@NETIZEN-11 NETIZEN-11 force-pushed the fix/sherdlock-update-blocked-readers branch from ca8232d to f407b2f Compare April 16, 2026 09:18
Comment thread .golangci.yml
Comment thread token/services/selector/sherdlock/manager_test.go Outdated
@NETIZEN-11
Author

@adecaro Thanks for the review!

  • Restored version: "2" in .golangci.yml — it was removed unintentionally while fixing local lint issues.
  • Removed the t.Skipf from manager_test.go and replaced it with proper error handling (require.NoError). Tests will now fail if PostgreSQL doesn't start, as expected.

Let me know if any further changes are needed.

@adecaro
Contributor

adecaro commented Apr 16, 2026

Hi @NETIZEN-11 , I'm doing a second pass. I still don't understand what we are gaining. If a goroutine is updating, other readers might try to update as well because the cache is empty or not fresh enough. These other goroutines will then block waiting for the signal that the update has completed.

Do you see my point? I'm still wrapping my mind around this 😅

@adecaro
Contributor

adecaro commented Apr 16, 2026

Yes, if an update is in progress, other readers will satisfy the condition that triggers an update. Now, after the first goroutine that acquired the lock completes, these lines (in the original code)

	f.lastFetched = time.Now()
	atomic.StoreUint32(&f.queriesResponded, 0)

guarantee that the other goroutines waiting for mu will not need a new DB query.
No?

@adecaro adecaro self-requested a review April 16, 2026 17:09
@NETIZEN-11
Author

@adecaro Thanks for explaining! The sync.Cond implementation ensures that once the first goroutine completes its DB refresh and updates lastFetched and queriesResponded, all waiting goroutines wake up and skip their DB queries since the cache is now fresh. No new DB query is needed by the waiting goroutines. Please let me know if you need anything else.

@adecaro
Contributor

adecaro commented Apr 23, 2026

Hi @NETIZEN-11 , so, I think your patch helps when a refresh is triggered while the cache is not yet exhausted. Fair enough.

We need to clean up the code to remove duplication and make it more robust. Please have another pass on the code. Many thanks for the effort 🙏

@adecaro adecaro force-pushed the fix/sherdlock-update-blocked-readers branch 2 times, most recently from 743dc63 to f4b5c51 Compare April 26, 2026 08:12
Nitesh added 8 commits April 28, 2026 07:50
Signed-off-by: Nitesh <nitesh@example.com>
Signed-off-by: Nitesh <nitesh@example.com>
Signed-off-by: Nitesh <nitesh@example.com>
Signed-off-by: Nitesh <nitesh@example.com>
…cachedFetcher

This commit introduces a sync.Cond mechanism to ensure only one database refresh runs at a time in cachedFetcher, preventing redundant work and timeouts on CI. It also adds proper error handling in groupTokensByKey to prevent updating the cache with incomplete data, and resolves several linting and configuration issues.

Signed-off-by: Nitesh <nitesh@example.com>
Signed-off-by: Nitesh <nitesh@example.com>
@adecaro adecaro force-pushed the fix/sherdlock-update-blocked-readers branch from f4b5c51 to c90cc2f Compare April 28, 2026 05:50
@adecaro
Contributor

adecaro commented Apr 28, 2026

Hi @NETIZEN-11 , I'll review your latest changes ASAP. Sorry for the delay 🙏

@adecaro
Contributor

adecaro commented Apr 29, 2026

Hi @NETIZEN-11 , please, resolve the conflicts 🙏

Comment thread token/services/selector/sherdlock/fetcher.go Outdated
Comment thread token/services/selector/sherdlock/fetcher.go
Comment thread token/services/selector/sherdlock/fetcher.go Outdated
Comment thread token/services/selector/sherdlock/fetcher.go
@adecaro adecaro self-requested a review April 29, 2026 12:25
@adecaro
Contributor

adecaro commented Apr 29, 2026

Hi @NETIZEN-11 , I left a few more comments. Sorry for being picky. This code is on the critical path of a token transaction's lifecycle. We need to make it as clean as possible. Thanks for the understanding 🙏

NETIZEN-11 and others added 2 commits April 29, 2026 18:04
Signed-off-by: Nitesh Kumar <niteshkumar121411@gmail.com>
Signed-off-by: Nitesh <nitesh@example.com>
@NETIZEN-11
Author

Hi @adecaro, I’ve addressed all the review comments and pushed the updates. Could you please take another look when you get a chance? Thank you!

@adecaro
Contributor

adecaro commented Apr 30, 2026

Hi @NETIZEN-11 , thanks for the changes. We still have linter issues. Please, run make checks and make lint-auto-fix before pushing again. Thanks much 🙏

@adecaro
Contributor

adecaro commented May 4, 2026

Hi @NETIZEN-11 , please, don't forget to fix the lint issues. Thanks 🙏

@adecaro
Contributor

adecaro commented May 7, 2026

Hi @NETIZEN-11 , please, let me know if you can continue on this PR. Many thanks 🙏

@NETIZEN-11
Author

Hi @adecaro, I’ve been dealing with some health issues recently, so it’s taking me a little more time. I’m working on fixing this as soon as possible. Thank you for your patience 🙏

@adecaro
Contributor

adecaro commented May 8, 2026

> Hi @adecaro, I’ve been dealing with some health issues recently, so it’s taking me a little more time. I’m working on fixing this as soon as possible. Thank you for your patience 🙏

Hi @NETIZEN-11 , your personal health first. Have a speedy recovery 🙏

Labels

enhancement New feature or request token-selector

Development

Successfully merging this pull request may close these issues.

[BUG]: cachedFetcher.update() blocks token reads during cache refresh

3 participants