
feat: initial implementation for range cache with time filters #8130

Open
evenyag wants to merge 3 commits into GreptimeTeam:main from evenyag:feat/range-cache-time-range

Contributor

@evenyag evenyag commented May 18, 2026

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Revives the range-cache optimization that strips time-only predicates from the cache key when the query covers a partition's file-time range, so that queries whose time bounds differ but all cover that range can share a cache entry. The previous version was disabled because its cover check (extract_time_range_from_expr plus a TimestampRange containment test) was not a sound implication check and could let the cache serve rows that should have been filtered out.
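As a rough illustration of the key-sharing idea, here is a minimal sketch with hypothetical, simplified types. `CacheKey`, `ImpliedRange`, and `build_key` are stand-ins invented for this example and are not the mito2 definitions; filters are plain strings and the range is inclusive in the column's unit.

```rust
/// Hypothetical, simplified cache key; not the mito2 type.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct CacheKey {
    partition: u32,
    filters: Vec<String>,
}

/// Inclusive range, in the column's time unit, inside which every
/// time-only predicate of the scan is known to hold.
#[derive(Debug, Clone, Copy)]
struct ImpliedRange {
    start: i64,
    end: i64,
}

/// Builds a key for one partition. Time-only filters are dropped from the
/// key only when the implied range contains both `file_min` and `file_max`.
fn build_key(
    partition: u32,
    file_min: i64,
    file_max: i64,
    time_filters: &[&str],
    other_filters: &[&str],
    implied: Option<ImpliedRange>,
) -> CacheKey {
    // `None` means no sound implied range could be derived, so nothing is stripped.
    let covered = implied.map_or(false, |r| r.start <= file_min && file_max <= r.end);
    let mut filters: Vec<String> = other_filters.iter().map(|f| f.to_string()).collect();
    if !covered {
        filters.extend(time_filters.iter().map(|f| f.to_string()));
    }
    CacheKey { partition, filters }
}
```

Under these assumptions, two scans over a partition spanning `[100, 200]` with time bounds `[0, 999]` and `[50, i64::MAX]` produce equal keys, while a scan starting at `150` keeps its time filter in the key.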

  • Adds implied_time_range_from_exprs in read::range_cache, a per-expression walker over time-only Expr nodes that returns a TimestampRange in which every predicate is provably true. It handles =, <, <=, >, >=, BETWEEN, and AND-composition; lower bounds round up and upper bounds round down on unit conversion so containment of the partition's FileTimeRange is a sound implication. It bails (returns None) on unsupported shapes — OR, NOT, NOT BETWEEN, IN, non-literal RHS, column-name mismatch, string-timestamp literals, = literals not exactly representable in the column unit, or overflow.
  • build_range_cache_key now calls without_time_filters() only when the scan's implied range contains both the partition's file_min and file_max; otherwise it keeps the full fingerprint. Asserts that the file timestamps and the implied range share the time index column's TimeUnit.
  • build_scan_fingerprint returns a new ScanFingerprintBundle ({ fingerprint, implied_time_range }). StreamContext carries scan_implied_time_range alongside scan_fingerprint. Time-only exprs that the legacy extractor recognizes are routed into the implication walker; unrecognized ones stay in filters and are never stripped.
  • table::predicate exports is_string_timestamp_literal, which now covers Utf8, LargeUtf8, and Utf8View. The existing return_none_if_utf8! macro and the new walker both use it to reject string-timestamp literals consistently.
  • Replaces the prior "preserve filters" tests with parameterized cases for stripping vs. preserving time filters, two distinct queries sharing the same cache key when both cover a partition, and OR disabling the optimization. Adds unit tests for implied_time_range_from_exprs covering supported shapes, unsupported shapes, and cross-unit literal conversion.
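The rounding rules in the first bullet can be sketched on a toy predicate type. This is an assumption-laden illustration, not the PR's walker: `Pred`, `Implied`, `implied`, and `covers` are hypothetical names, literals are fixed to milliseconds, the column unit is fixed to seconds, and the range uses inclusive bounds with `None` meaning unbounded.

```rust
const MS_PER_SEC: i64 = 1000;

/// Inclusive range in seconds; `None` means unbounded on that side.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Implied {
    start: Option<i64>,
    end: Option<i64>,
}

/// Toy time-only predicate on a seconds column; literals are in milliseconds.
enum Pred {
    Eq(i64),
    Lt(i64),
    LtEq(i64),
    Gt(i64),
    GtEq(i64),
    Between(i64, i64),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
}

fn floor_sec(ms: i64) -> i64 {
    ms.div_euclid(MS_PER_SEC)
}

fn ceil_sec(ms: i64) -> i64 {
    // div_euclid floors; add one when there is a remainder.
    ms.div_euclid(MS_PER_SEC) + i64::from(ms.rem_euclid(MS_PER_SEC) != 0)
}

fn range(start: Option<i64>, end: Option<i64>) -> Option<Implied> {
    Some(Implied { start, end })
}

/// Picks the smaller upper bound, treating `None` as +infinity.
fn tighter_end(a: Option<i64>, b: Option<i64>) -> Option<i64> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x.min(y)),
        (x, None) | (None, x) => x,
    }
}

/// Returns a range in which the predicate holds for every value,
/// or `None` for shapes this sketch does not handle.
fn implied(p: &Pred) -> Option<Implied> {
    match p {
        // Equality is only accepted when the literal is exact in seconds.
        Pred::Eq(v) if v.rem_euclid(MS_PER_SEC) == 0 => {
            range(Some(floor_sec(*v)), Some(floor_sec(*v)))
        }
        Pred::Eq(_) => None,
        // Lower bounds round up, upper bounds round down.
        Pred::GtEq(v) => range(Some(ceil_sec(*v)), None),
        Pred::Gt(v) => range(Some(floor_sec(*v) + 1), None),
        Pred::LtEq(v) => range(None, Some(floor_sec(*v))),
        Pred::Lt(v) => range(None, Some(ceil_sec(*v) - 1)),
        Pred::Between(lo, hi) => range(Some(ceil_sec(*lo)), Some(floor_sec(*hi))),
        Pred::And(a, b) => {
            let (a, b) = (implied(a)?, implied(b)?);
            // `None < Some(_)` for Option, so `max` keeps the tighter start.
            range(a.start.max(b.start), tighter_end(a.end, b.end))
        }
        // OR is not a conjunction of bounds: bail out.
        Pred::Or(..) => None,
    }
}

/// True when the implied range contains both file timestamps.
fn covers(r: Implied, file_min: i64, file_max: i64) -> bool {
    r.start.map_or(true, |s| s <= file_min) && r.end.map_or(true, |e| file_max <= e)
}
```

Because each bound only ever shrinks the range during conversion, containment of `[file_min, file_max]` in the result implies that every row in the partition satisfies the original predicates, which is the property the cover check relies on.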

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

Signed-off-by: evenyag <realevenyag@gmail.com>
@github-actions github-actions bot added the size/M and docs-not-required (This change does not impact docs.) labels May 18, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request improves range cache efficiency by introducing a mechanism to derive implied timestamp ranges from query predicates. This allows time-only filters to be safely removed from cache keys when they fully cover a partition's range, facilitating cache sharing between different queries. The review feedback suggests refining the logic for the Less Than operator to provide tighter bounds when literals are not aligned with the column's time unit, which would maximize cache hit opportunities.
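To make the Less Than feedback concrete, the sketch below compares two ways of converting an unaligned literal into an exclusive upper bound. It assumes a half-open range with an exclusive end, millisecond literals, and a seconds column; the function names are hypothetical and the code does not reproduce what the PR does.

```rust
const MS_PER_SEC: i64 = 1000;

/// Exclusive end obtained by always rounding the literal down.
/// Sound, but it can exclude one second that still satisfies `ts < literal`.
fn lt_end_floor(ms: i64) -> i64 {
    ms.div_euclid(MS_PER_SEC)
}

/// Tightest sound exclusive end: round the literal up.
fn lt_end_ceil(ms: i64) -> i64 {
    ms.div_euclid(MS_PER_SEC) + i64::from(ms.rem_euclid(MS_PER_SEC) != 0)
}

/// Spot check: the five seconds just below `end` all satisfy `ts * 1000 < ms`.
fn is_sound(end: i64, ms: i64) -> bool {
    (end - 5..end).all(|ts| ts * MS_PER_SEC < ms)
}
```

For `ts < 1500` (milliseconds), rounding down gives an exclusive end of 1, which leaves out `ts = 1` even though 1000 < 1500 holds; rounding up gives an exclusive end of 2, which is still sound and covers one more second. For an aligned literal such as 2000 the two agree.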

Comment thread src/mito2/src/read/range_cache.rs
evenyag added 2 commits May 19, 2026 17:22
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
@evenyag evenyag marked this pull request as ready for review May 19, 2026 13:52
@evenyag evenyag requested review from a team, v0y4g3r and waynexia as code owners May 19, 2026 13:52

Labels

docs-not-required This change does not impact docs. size/M
