feat(tesseract): Support separate pre-aggregations for different multi-stage subqueries #10684

Open
waralexrom wants to merge 18 commits into master from tesseract-multi-stage-pre-aggregations-full-support

Conversation

@waralexrom
Member

Check List

  • Tests have been run in packages where changes have been made if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

@waralexrom waralexrom requested review from a team as code owners April 14, 2026 12:56
@github-actions github-actions bot added the rust and javascript labels Apr 14, 2026
@claude

claude bot commented Apr 14, 2026

Claude finished @waralexrom's task in 5m 17s


Code Review: PR #10684 — Support separate pre-aggregations for different multi-stage subqueries

  • Gather context and read PR diff
  • Review Rust changes (Tesseract planner)
  • Review TypeScript changes (schema-compiler, query-orchestrator)
  • Review test coverage
  • Post comprehensive review

Summary

This PR enables Tesseract's multi-stage query engine to use different pre-aggregations for different subqueries within the same top-level query. Previously, all multi-stage leaf measures had to match a single pre-aggregation, or the entire optimization was skipped. Now each leaf measure can independently match its own pre-aggregation, and the system generates unique table placeholders (__usage_N) that get resolved to the appropriate physical tables at query execution time.

The change spans three layers:

  1. Rust planner — optimizer tracks per-usage pre-aggregation metadata, generates __usage_N suffixed placeholders, extracts per-usage date ranges (including time-shift awareness)
  2. Schema compiler (TS) — handles the new array-based multi-usage response format from Rust, maps each usage to pre-aggregation descriptions with merged date ranges
  3. Query orchestrator (TS) — resolves usage-specific placeholders to physical partition table names
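
To make the placeholder step concrete, here is a minimal TypeScript sketch of how __usage_N placeholders could be swapped for physical table names. It is not the code from QueryCache.ts: the function resolvePlaceholders, the table names, and the single-pass regex approach are assumptions made for illustration. Only the __usage_N suffix comes from the PR.

```typescript
// Hypothetical mapping from placeholder to the physical table chosen for it.
// Only the "__usage_N" suffix comes from the PR; all names are illustrative.
type PlaceholderTargets = Record<string, string>;

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace every placeholder in one pass. Longer names are tried first, so
// "orders_rollup__usage_1" wins over its prefix "orders_rollup", and text that
// was already substituted is never scanned again.
function resolvePlaceholders(sql: string, targets: PlaceholderTargets): string {
  const names = Object.keys(targets).sort((a, b) => b.length - a.length);
  if (names.length === 0) {
    return sql;
  }
  const pattern = new RegExp(names.map(escapeRegExp).join('|'), 'g');
  return sql.replace(pattern, (match) => {
    const target = targets[match];
    return target === undefined ? match : target;
  });
}

const resolved = resolvePlaceholders(
  'SELECT a.total, b.total FROM orders_rollup__usage_0 a JOIN orders_rollup__usage_1 b ON a.day = b.day',
  {
    orders_rollup: 'pre_aggs.orders_rollup_all',
    orders_rollup__usage_0: 'pre_aggs.orders_rollup_2026',
    orders_rollup__usage_1: 'pre_aggs.orders_rollup_2025',
  },
);
console.log(resolved);
```

Because the base name is a prefix of every suffixed name, the sketch tries longer names first and does the substitution in one pass, so a physical table name that happens to contain the base placeholder is left alone. The PR itself is described as ordering the replacements (usage-specific before base), which addresses the same prefix problem.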

Code Quality

Strengths:

  • The rollback mechanism in optimizer.rs (saving/restoring usages.len() and usage_counter) is well-implemented for atomicity — either all subqueries match pre-aggregations or none do
  • Backward compatibility is well-preserved: single-usage queries strip the __usage_N suffix and return the old-style pre-aggregation object format
  • Clean separation of concerns between the Rust planner (generating placeholders) and TS orchestrator (resolving them)
  • The replacement ordering in QueryCache.ts (usage-specific before base) correctly prevents substring collision
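
The all-or-nothing rollback from the first bullet can be illustrated with a small model. This is a TypeScript rendering of the idea only: the real logic is Rust in optimizer.rs, and the class UsageTracker, the method tryMatchAll, and the matcher callback are hypothetical names.

```typescript
// Hypothetical model of the optimizer's usage bookkeeping. The real code is
// Rust (optimizer.rs); only the idea of saving and restoring usages.len() and
// usage_counter is taken from the review.
interface Usage {
  index: number;
  preAggregation: string;
}

class UsageTracker {
  private usages: Usage[] = [];
  private usageCounter = 0;

  // Registers one usage per subquery, or none at all if any subquery has no
  // matching pre-aggregation (the matcher returns null in that case).
  tryMatchAll(subqueries: string[], matcher: (subquery: string) => string | null): boolean {
    const savedLength = this.usages.length;
    const savedCounter = this.usageCounter;

    for (const subquery of subqueries) {
      const preAggregation = matcher(subquery);
      if (preAggregation === null) {
        // Undo everything registered during this attempt.
        this.usages.length = savedLength;
        this.usageCounter = savedCounter;
        return false;
      }
      this.usages.push({ index: this.usageCounter, preAggregation });
      this.usageCounter += 1;
    }
    return true;
  }

  // Returns a copy so callers cannot mutate the tracked state.
  list(): Usage[] {
    return this.usages.slice();
  }
}
```

Restoring the counter together with the list matters: if only the list were truncated, a later successful match would skip indices that were handed out during the failed attempt.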

Concerns:

  • See inline comments for specific issues around the Join variant in source_with_usage_index, the extract_date_range first-match behavior, and the redundant condition in base_query.rs

Security

No security concerns identified. The changes are internal to the query planning pipeline and don't introduce any new external inputs, SQL injection vectors, or authentication/authorization changes.


Performance

  • The String::replace in base_query.rs:97-98 for single-usage backward compat is a full scan of the SQL string, but this is negligible since it only runs for single-usage queries
  • The partition filtering loop in PreAggregationPartitionRangeLoader.ts for per-usage date ranges is O(partitions × usages), which is acceptable given typical cardinalities
  • group_usages in base_query.rs uses a HashMap and sorts the output — also fine for expected sizes
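
For reference, the O(partitions × usages) shape of the per-usage partition filter looks roughly like the sketch below. It is a simplified TypeScript model, not the code in PreAggregationPartitionRangeLoader.ts: the types, the function partitionsPerUsage, and the partition names are hypothetical, and treating a missing date range as "keep every partition" is an assumption rather than confirmed behavior.

```typescript
// Inclusive [from, to] pair of ISO-8601 timestamps in one fixed format, so
// plain string comparison orders them correctly (an assumption of this sketch).
type DateRange = [string, string];

interface Partition {
  tableName: string;
  range: DateRange;
}

interface UsageRange {
  usageIndex: number;
  dateRange: DateRange | null; // null: the usage carries no date range filter
}

function overlaps(a: DateRange, b: DateRange): boolean {
  return a[0] <= b[1] && b[0] <= a[1];
}

// For each usage, keep the partitions whose range intersects the usage's
// date range. One pass over the partitions per usage: O(partitions × usages).
function partitionsPerUsage(
  partitions: Partition[],
  usages: UsageRange[],
): Record<number, string[]> {
  const result: Record<number, string[]> = {};
  for (const usage of usages) {
    const range = usage.dateRange;
    result[usage.usageIndex] = partitions
      .filter((p) => range === null || overlaps(p.range, range))
      .map((p) => p.tableName);
  }
  return result;
}
```

With monthly partitions and a handful of usages per query, the nested loop stays small, which is consistent with the assessment above.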

No performance bottlenecks identified.


Testing

Coverage is good for the happy path:

  • 2 Rust integration tests: separate pre-aggs (same cube, different measures), and time-shift with different date ranges
  • 2 TypeScript integration tests: two multi-stage measures with different pre-aggs, and time-shift partition loading
  • Snapshot tests verify correct SQL output
  • Test schemas are well-constructed to exercise the feature

Potential gaps:

  • Rollback path: No explicit test where one subquery matches a pre-aggregation but another doesn't, verifying the optimizer correctly rolls back to no pre-aggregations at all
  • Join pre-aggregations in multi-stage: The source_with_usage_index has a Join branch that doesn't set usage_index — no test covers this case
  • Edge case — same pre-aggregation used by multiple subqueries: Both subqueries matching the same pre-aggregation would create two usages with different indices for the same physical table. This scenario works in the grouping logic (group_usages), but an explicit test would build confidence
  • Empty date range handling: When extract_date_range finds no date range filters, None is propagated — worth testing that partition loading handles this correctly for multi-usage scenarios
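
Two of these gaps, the same pre-aggregation matched by several subqueries and a missing date range, can be pinned down with a small model of the grouping step before writing the real tests. The sketch below is TypeScript and entirely hypothetical: groupUsages, its field names, sorting by pre-aggregation name, and the rule that a missing range makes the merged range unbounded are assumptions, not the behavior of group_usages in base_query.rs.

```typescript
type DateRange = [string, string]; // inclusive, same-format ISO-8601 strings

interface UsageInfo {
  usageIndex: number;
  preAggregationId: string;
  dateRange: DateRange | null; // null: no date range filter was found
}

interface GroupedUsage {
  preAggregationId: string;
  usageIndices: number[];
  dateRange: DateRange | null;
}

// Assumed merge rule: the envelope of both ranges, or unbounded (null) if
// either side has no range.
function mergeRanges(a: DateRange | null, b: DateRange | null): DateRange | null {
  if (a === null || b === null) {
    return null;
  }
  return [a[0] < b[0] ? a[0] : b[0], a[1] > b[1] ? a[1] : b[1]];
}

// Groups usages by pre-aggregation and returns the groups in a stable,
// sorted order so the output does not depend on map iteration order.
function groupUsages(usages: UsageInfo[]): GroupedUsage[] {
  const byId: Record<string, GroupedUsage> = {};
  const groups: GroupedUsage[] = [];
  for (const usage of usages) {
    const existing = byId[usage.preAggregationId];
    if (existing === undefined) {
      const created: GroupedUsage = {
        preAggregationId: usage.preAggregationId,
        usageIndices: [usage.usageIndex],
        dateRange: usage.dateRange,
      };
      byId[usage.preAggregationId] = created;
      groups.push(created);
    } else {
      existing.usageIndices.push(usage.usageIndex);
      existing.dateRange = mergeRanges(existing.dateRange, usage.dateRange);
    }
  }
  for (const group of groups) {
    group.usageIndices.sort((x, y) => x - y);
  }
  return groups.sort((a, b) =>
    a.preAggregationId < b.preAggregationId ? -1 : a.preAggregationId > b.preAggregationId ? 1 : 0,
  );
}
```

A test built on this shape would assert that two usages of the same pre-aggregation collapse into one group that keeps both indices, and would state explicitly what a missing range is expected to mean for partition loading.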

Documentation

The PR doesn't include documentation updates. Since this is an internal planner optimization that doesn't change any user-facing API or configuration, that's acceptable. The code comments explaining the rollback logic and backward-compat choices are helpful.


Overall Assessment

This is a well-structured feature that enables an important optimization for multi-stage queries. The architecture (usage indices, placeholder suffixes, grouped usage info) is clean and extensible. The backward-compatibility strategy is sound.

The main areas for improvement are: (1) adding tests for the rollback/failure path, (2) clarifying the Join variant behavior in source_with_usage_index, and (3) minor code cleanup noted in inline comments. See the 8 inline comments for specific details.

@codecov

codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 51.66667% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.79%. Comparing base (6f79820) to head (2b4fad1).
⚠️ Report is 20 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...orchestrator/PreAggregationPartitionRangeLoader.ts | 0.00% | 23 Missing ⚠️ |
| ...-query-orchestrator/src/orchestrator/QueryCache.ts | 0.00% | 4 Missing ⚠️ |
| ...ejs-schema-compiler/src/adapter/PreAggregations.ts | 90.90% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #10684       +/-   ##
===========================================
+ Coverage   57.84%   78.79%   +20.95%     
===========================================
  Files         215      465      +250     
  Lines       16609    91962    +75353     
  Branches     3336     3363       +27     
===========================================
+ Hits         9607    72461    +62854     
- Misses       6514    19010    +12496     
- Partials      488      491        +3     

| Flag | Coverage Δ |
|---|---|
| cube-backend | 57.99% <51.66%> (+0.14%) ⬆️ |
| cubesql | 83.41% <ø> (?) |
