-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[CALCITE-7551] Project/Filter/Join transpose and merge rules should not duplicate non-deterministic expressions (e.g. RAND()) #4970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
61088ed
88d3ad7
b851108
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -165,6 +165,13 @@ protected FilterProjectTransposeRule( | |
| // it can be pushed down. For now we don't support this. | ||
| return; | ||
| } | ||
| // Pushing the filter below the project would split a single | ||
| // non-deterministic evaluation (e.g. RAND()) into two: one consumed by | ||
| // the new filter condition, and the original still produced by the | ||
| // project above. Refuse to transpose in that case. | ||
| if (!project.getProjects().stream().allMatch(RexUtil::isDeterministic)) { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Blocks transpose (because RAND() exists in project)? eg: |
||
| return; | ||
| } | ||
| // convert the filter to one that references the child of the project | ||
| RexNode newCondition = | ||
| RelOptUtil.pushPastProjectUnlessBloat(filter.getCondition(), project, config.bloat()); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6133,4 +6133,16 @@ void checkUserDefinedOrderByOver(NullCollation nullCollation) { | |
| + "FROM emp JOIN dept using (deptno)"; | ||
| sql(sql).withConformance(SqlConformanceEnum.PRESTO).ok(); | ||
| } | ||
|
|
||
| /** Test case of | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why is SqlToRel connected to these rules?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because Concretely, |
||
| * <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551] | ||
| * Non-deterministic expressions (e.g. {@code RAND()}) should not be | ||
| * duplicated when projections are merged</a>. The two references to | ||
| * {@code a} in the outer query must resolve to the same {@code RAND()} | ||
| * evaluation produced by the inner sub-query, so the inner projection | ||
| * must not be flattened into the outer. */ | ||
| @Test void testRandNotDuplicatedInProjectionMerge() { | ||
| final String sql = "select a, a + 1 as b from (select rand() as a)"; | ||
| sql(sql).ok(); | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in Jira, this is a bit too conservative, since it will not distinguish CURRENT_TIMESTAMP from RAND. But fixing that may be in the scope of a separate PR - we need really two separate notions of nondeterminism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We should complete the fine-grained judgment of the deterministic function first before completing this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, CURRENT_TIMESTAMP is actually different than non-deterministic functions like RAND().
Non-deterministic function: it may return different values for every evaluations.
1.1 Returns false to
isDeterministic().1.2 Returns false to
isDynamicFunction().Dynamic Function: It will return same value at every call site within one statement; can differ across executions
2.1 Returns true to
isDeterministic().2.2 Returns true to
isDynamicFunction().And this path only blocks when we get
isDeterministic()method response as false. Dynamic functions likeCURRENT_TIMESTAMPcan be duplicated and actually it is safe to duplicate them.