feat: look up cache with range calculation#8123
Conversation
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 226a02e6de
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Code Review
This pull request replaces the previous page-level cache with a more granular byte-fragment cache (PageRangeCache) to support partial cache hits and overlapping range lookups in Parquet row groups. Key changes include the introduction of PageFragmentKey and PageRangeLookup, as well as logic in the SstParquetRangeFetcher to assemble requested ranges from a mix of cached fragments and newly fetched data. A critical issue was identified in the PageRangeCache eviction listener, where holding a write lock while calling remove_if on the index map could lead to deadlocks or race conditions.
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8029580d64
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
|
@codex review |
|
Codex Review: Didn't find any major issues. Hooray! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
| fn remove_stale_index_entries(&self, keys: Vec<PageFragmentKey>) { | ||
| for key in keys { | ||
| if self.cache.contains_key(&key) { | ||
| continue; | ||
| } | ||
|
|
||
| if self.remove_index_entry(key) && self.cache.contains_key(&key) { | ||
| self.insert_index_entry(key); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
remove_stale_index_entries() is invoked from the eviction listener and calls self.cache.contains_key(&key). Is there a potential deadlock issue?
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 54c19174eb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let mut index = self.index.write().unwrap(); | ||
| if self.cache.contains_key(&key) { | ||
| return; |
There was a problem hiding this comment.
Avoid cache lookup while holding index write lock
remove_uncached_index_entry acquires self.index.write() and then calls self.cache.contains_key(&key), but the cache is configured with an eviction listener that calls remove_index_entry (which also takes self.index.write()). In Moka 0.12, maintenance/removal notifications are delivered during foreground cache operations, so a contains_key that drains pending notifications can re-enter the listener and block on the same lock, causing lookup threads to deadlock under eviction pressure.
Useful? React with 👍 / 👎.
| self.cache.insert(key, bytes); | ||
| self.insert_index_entry(key); | ||
| let mut index = self.index.write().unwrap(); | ||
| index | ||
| .entry(key.group_key()) | ||
| .or_default() | ||
| .insert((key.start, key.end), key); |
There was a problem hiding this comment.
Should we insert the index before the cache? So we can ensure the key is in the index when the cache evicts items.
Signed-off-by: Ruihang Xia waynestxia@gmail.comI hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
Use cached content with range calculation instead of precise key matching to improve cache efficiency. Test scenario:
Changes:
PR Checklist
Please convert it to a draft if some of the following conditions are not met.