
feat: look up cache with range calculation #8123

Open
waynexia wants to merge 7 commits into main from range-cache


Conversation

@waynexia (Member)

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Serve lookups from cached content by calculating range overlap instead of requiring an exact key match, which improves cache efficiency. Test scenario:

```sql
SELECT pod, container, avg(greptime_value) AS avg_v
FROM container_cpu_usage_seconds_total
WHERE greptime_timestamp >= '2025-10-20 00:00:00'
  AND greptime_timestamp <  '2025-10-21 00:00:00'
  AND namespace = 'kube-system'
GROUP BY pod, container
ORDER BY avg_v DESC
LIMIT 20;
```

Cache metrics for this query, before and after the change:

| Version | Page hits | Page misses | Cache bytes added |
|---|---|---|---|
| before | 0 | +63 | +132,136,272 |
| after | +32 | +63 | +24,335,439 |
  • The new version gets 32 page-cache hits where the old one gets 0, so cached fragments are being reused within the same SQL workload.
  • The cache footprint is much smaller: about 24 MB added vs. 132 MB for the same query.
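For reference, the figures in the table work out as follows (plain arithmetic on the reported numbers; page misses are +63 in both runs):

$$
\text{hit ratio}_{\text{after}} = \frac{32}{32 + 63} \approx 33.7\%, \qquad
\frac{132{,}136{,}272}{24{,}335{,}439} \approx 5.43
$$

That is, roughly one in three page lookups is now served from cache (versus 0/63 before), and about 81.6% fewer bytes are added to the cache.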

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

waynexia added 2 commits May 16, 2026 11:14
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
waynexia requested review from evenyag and v0y4g3r as code owners May 16, 2026 03:18
github-actions (bot) added the size/M and docs-not-required labels May 16, 2026

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 226a02e6de

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/mito2/src/cache.rs Outdated
Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request replaces the previous page-level cache with a more granular byte-fragment cache (PageRangeCache) to support partial cache hits and overlapping range lookups in Parquet row groups. Key changes include the introduction of PageFragmentKey and PageRangeLookup, as well as logic in the SstParquetRangeFetcher to assemble requested ranges from a mix of cached fragments and newly fetched data. A critical issue was identified in the PageRangeCache eviction listener, where holding a write lock while calling remove_if on the index map could lead to deadlocks or race conditions.
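A rough sketch of the range-lookup idea this review describes: split a requested byte range into pieces already covered by cached fragments and gaps that still need fetching. `FragmentIndex`, `RangePlan`, and `plan` are hypothetical names; fragments are modelled as half-open `(start, end)` pairs in a `BTreeSet`, loosely mirroring the `(key.start, key.end)` keys the PR's index appears to use. This is an illustration under those assumptions, not the PR's implementation.

```rust
use std::collections::BTreeSet;
use std::ops::Range;

/// Hypothetical index of cached byte fragments for one group
/// (for example one column chunk), stored as half-open (start, end) pairs.
#[derive(Default)]
pub struct FragmentIndex {
    fragments: BTreeSet<(u64, u64)>,
}

/// Result of a lookup: sub-ranges served from cache and gaps to fetch.
#[derive(Debug, PartialEq)]
pub struct RangePlan {
    pub cached: Vec<Range<u64>>,
    pub missing: Vec<Range<u64>>,
}

impl FragmentIndex {
    /// Records that the bytes in `range` are now cached.
    pub fn insert(&mut self, range: Range<u64>) {
        if range.start < range.end {
            self.fragments.insert((range.start, range.end));
        }
    }

    /// Splits `want` into pieces covered by cached fragments and gaps.
    /// Fragments may overlap; the linear scan favours clarity over speed.
    pub fn plan(&self, want: Range<u64>) -> RangePlan {
        let mut plan = RangePlan { cached: Vec::new(), missing: Vec::new() };
        let mut cursor = want.start; // first byte not yet accounted for
        // Visit every fragment that starts before the end of the request.
        for &(start, end) in self.fragments.range(..(want.end, 0)) {
            if cursor >= want.end {
                break;
            }
            if end <= cursor {
                continue; // ends before the bytes we still need
            }
            if start > cursor {
                plan.missing.push(cursor..start); // gap before this fragment
            }
            let hit_end = end.min(want.end);
            plan.cached.push(start.max(cursor)..hit_end);
            cursor = hit_end;
        }
        if cursor < want.end {
            plan.missing.push(cursor..want.end); // tail not covered by any fragment
        }
        plan
    }
}
```

For example, with fragments `0..100` and `150..300` cached, a request for `50..200` is served as cached `50..100` and `150..200`, and only `100..150` has to be fetched. A production index would likely bound which fragments it visits rather than scanning every fragment that starts before the request ends.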

Comment thread src/mito2/src/cache.rs Outdated
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
@waynexia (Member, Author)

@codex review


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8029580d64


Comment thread src/mito2/src/cache.rs Outdated
waynexia changed the title from "feat: loop up cache with range calculation" to "feat: look up cache with range calculation" May 18, 2026
waynexia added 3 commits May 18, 2026 17:39
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
@waynexia (Member, Author)

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Hooray!


Comment thread src/mito2/src/cache.rs Outdated
Comment on lines +1411 to +1421
```rust
fn remove_stale_index_entries(&self, keys: Vec<PageFragmentKey>) {
    for key in keys {
        if self.cache.contains_key(&key) {
            continue;
        }

        if self.remove_index_entry(key) && self.cache.contains_key(&key) {
            self.insert_index_entry(key);
        }
    }
}
```
Contributor


remove_stale_index_entries() is invoked from the eviction listener and calls self.cache.contains_key(&key). Is there a potential deadlock issue?

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 54c19174eb


Comment thread src/mito2/src/cache.rs
Comment on lines +1394 to +1396
```rust
let mut index = self.index.write().unwrap();
if self.cache.contains_key(&key) {
    return;
```

P1: Avoid cache lookup while holding index write lock

remove_uncached_index_entry acquires self.index.write() and then calls self.cache.contains_key(&key), but the cache is configured with an eviction listener that calls remove_index_entry (which also takes self.index.write()). In Moka 0.12, maintenance/removal notifications are delivered during foreground cache operations, so a contains_key that drains pending notifications can re-enter the listener and block on the same lock, causing lookup threads to deadlock under eviction pressure.
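A single-threaded toy model of the re-entrancy problem described above. `ToyRangeCache` and its methods are hypothetical; its `contains_key` drains pending eviction notifications on the calling thread to stand in for the foreground maintenance the comment describes, and it does not use or reproduce Moka. The listener uses `RwLock::try_write` so the sketch reports the would-deadlock case instead of hanging; a plain `write()` on a `std` lock the current thread already holds is not re-entrant and may block forever or panic.

```rust
use std::collections::{BTreeSet, HashSet};
use std::sync::{Mutex, RwLock};

/// Hypothetical toy model, not Moka: `contains_key` first delivers pending
/// eviction notifications on the calling thread.
pub struct ToyRangeCache {
    entries: Mutex<HashSet<u64>>,
    pending_evictions: Mutex<Vec<u64>>,
    index: RwLock<BTreeSet<u64>>,
}

impl ToyRangeCache {
    pub fn new(keys: &[u64]) -> Self {
        Self {
            entries: Mutex::new(keys.iter().copied().collect()),
            pending_evictions: Mutex::new(Vec::new()),
            index: RwLock::new(keys.iter().copied().collect()),
        }
    }

    /// Evicts `key`; its listener runs during the next lookup.
    pub fn evict(&self, key: u64) {
        self.entries.lock().unwrap().remove(&key);
        self.pending_evictions.lock().unwrap().push(key);
    }

    /// Eviction listener: drop the key from the index. `try_write` reports
    /// the re-entrant case as an error instead of blocking forever.
    fn on_evict(&self, key: u64) -> Result<(), String> {
        let mut index = self
            .index
            .try_write()
            .map_err(|_| format!("index lock already held while evicting {key}"))?;
        index.remove(&key);
        Ok(())
    }

    /// Lookup that drains pending notifications on this thread first.
    fn contains_key(&self, key: u64) -> Result<bool, String> {
        let pending = std::mem::take(&mut *self.pending_evictions.lock().unwrap());
        for evicted in pending {
            self.on_evict(evicted)?;
        }
        Ok(self.entries.lock().unwrap().contains(&key))
    }

    /// Problematic order: hold the index write lock while querying the cache.
    pub fn remove_uncached_lock_first(&self, key: u64) -> Result<(), String> {
        let mut index = self.index.write().unwrap();
        if self.contains_key(key)? {
            return Ok(());
        }
        index.remove(&key);
        Ok(())
    }

    /// Alternative order: query the cache with no index lock held.
    pub fn remove_uncached_cache_first(&self, key: u64) -> Result<(), String> {
        if self.contains_key(key)? {
            return Ok(());
        }
        self.index.write().unwrap().remove(&key);
        Ok(())
    }

    pub fn is_indexed(&self, key: u64) -> bool {
        self.index.read().unwrap().contains(&key)
    }
}
```

Checking the cache before taking the lock removes the nested acquisition but opens a check-then-act window in which another thread can re-insert the key; the `remove_stale_index_entries` excerpt earlier in this thread appears to handle that by re-checking the cache after removing the index entry and re-inserting it if the key is back.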


Comment thread src/mito2/src/cache.rs
Comment on lines 1359 to +1364
```rust
self.cache.insert(key, bytes);
self.insert_index_entry(key);
let mut index = self.index.write().unwrap();
index
    .entry(key.group_key())
    .or_default()
    .insert((key.start, key.end), key);
```
Contributor


Should we insert the index entry before the cache entry? That way the key is guaranteed to be in the index when the cache evicts it.
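A toy illustration of the ordering question, assuming a cache that can evict synchronously inside `insert`, including the entry just inserted. Here that is a byte-budgeted FIFO whose new entry is larger than the whole budget. `FifoCache` and `Store` are hypothetical, and the listener simply removes evicted keys from a `BTreeSet` index.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical byte-budgeted FIFO cache that evicts synchronously inside
/// `insert` and reports each evicted key to `on_evict`. An entry larger than
/// the whole budget is evicted immediately, including the one just inserted.
struct FifoCache {
    capacity_bytes: usize,
    used_bytes: usize,
    entries: VecDeque<(u64, usize)>,
}

impl FifoCache {
    fn insert(&mut self, key: u64, size: usize, on_evict: &mut dyn FnMut(u64)) {
        self.entries.push_back((key, size));
        self.used_bytes += size;
        while self.used_bytes > self.capacity_bytes {
            let (evicted, evicted_size) =
                self.entries.pop_front().expect("over budget implies non-empty");
            self.used_bytes -= evicted_size;
            on_evict(evicted);
        }
    }

    fn contains(&self, key: u64) -> bool {
        self.entries.iter().any(|&(k, _)| k == key)
    }
}

/// Cache plus an index whose entries the eviction listener removes.
pub struct Store {
    cache: FifoCache,
    index: BTreeSet<u64>,
}

impl Store {
    pub fn new(capacity_bytes: usize) -> Self {
        Store {
            cache: FifoCache { capacity_bytes, used_bytes: 0, entries: VecDeque::new() },
            index: BTreeSet::new(),
        }
    }

    /// Cache first, then index: an immediate eviction of `key` runs before
    /// the index entry exists, so the entry added afterwards is stale.
    pub fn insert_cache_first(&mut self, key: u64, size: usize) {
        let index = &mut self.index;
        self.cache.insert(key, size, &mut |k: u64| {
            index.remove(&k);
        });
        self.index.insert(key);
    }

    /// Index first, then cache: the listener always finds the entry to remove.
    pub fn insert_index_first(&mut self, key: u64, size: usize) {
        self.index.insert(key);
        let index = &mut self.index;
        self.cache.insert(key, size, &mut |k: u64| {
            index.remove(&k);
        });
    }

    /// Index entries whose bytes are no longer in the cache.
    pub fn stale_index_entries(&self) -> Vec<u64> {
        self.index.iter().copied().filter(|&k| !self.cache.contains(k)).collect()
    }
}
```

In this toy, index-first leaves no stale entry, while cache-first leaves key 2 in the index with no bytes behind it. The cost of index-first is that a concurrent reader could briefly find an index entry whose bytes are not cached yet, so a lookup would presumably need to treat that case as a miss.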


Labels

docs-not-required This change does not impact docs. size/M


2 participants