15 changes: 15 additions & 0 deletions core/src/main/java/org/apache/calcite/plan/RelOptUtil.java
@@ -3346,6 +3346,21 @@ public static List<RexNode> pushPastProject(List<? extends RexNode> nodes,
// function? Possibly. But it's invalid SQL, so don't go there.
return null;
}
// [CALCITE-7551] Refuse to merge if it would duplicate a
// non-deterministic expression (e.g. RAND()).
final List<RexNode> bottom = project.getProjects();
final int[] refs = new int[bottom.size()];
new RexVisitorImpl<Void>(true) {
@Override public Void visitInputRef(RexInputRef ref) {
refs[ref.getIndex()]++;
return null;
}
}.visitEach(nodes);
for (int i = 0; i < refs.length; i++) {
Contributor

As discussed in Jira, this is a bit too conservative, since it does not distinguish CURRENT_TIMESTAMP from RAND. But fixing that may belong in a separate PR; we really need two separate notions of nondeterminism.

Member

Agreed. We should first settle the fine-grained classification of deterministic functions, and then complete this PR.

Contributor Author

As far as I understand, CURRENT_TIMESTAMP is actually different from non-deterministic functions like RAND().

  1. Non-deterministic function: it may return a different value on every evaluation.
    1.1 isDeterministic() returns false.
    1.2 isDynamicFunction() returns false.

  2. Dynamic function: it returns the same value at every call site within one statement, but the value can differ across executions.
    2.1 isDeterministic() returns true.
    2.2 isDynamicFunction() returns true.

This code path only blocks the merge when isDeterministic() returns false. Dynamic functions like CURRENT_TIMESTAMP can still be duplicated, and duplicating them is safe.
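
The distinction can be sketched in a few lines. This is a simplified model, not Calcite code: `DeterminismSketch`, `FnKind` and `safeToDuplicate` are hypothetical names, and the flag values only restate cases 1 and 2 as described in this comment.

```java
/** Simplified model of the two flags discussed above; this is not Calcite code. */
final class DeterminismSketch {
  /** Hypothetical function kinds; the flag values restate cases 1 and 2 above. */
  enum FnKind {
    NON_DETERMINISTIC(false, false), // case 1, e.g. RAND() as described above
    DYNAMIC(true, true);             // case 2, e.g. CURRENT_TIMESTAMP

    final boolean deterministic;
    final boolean dynamic;

    FnKind(boolean deterministic, boolean dynamic) {
      this.deterministic = deterministic;
      this.dynamic = dynamic;
    }
  }

  /** Mirrors the guard's condition: only the deterministic flag is consulted. */
  static boolean safeToDuplicate(FnKind kind) {
    return kind.deterministic;
  }

  public static void main(String[] args) {
    for (FnKind kind : FnKind.values()) {
      System.out.println(kind + " safe to duplicate: " + safeToDuplicate(kind));
    }
  }
}
```

Under this model the dynamic flag never enters the decision, which is why a dynamic function would pass the guard.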

if (refs[i] > 1 && !RexUtil.isDeterministic(bottom.get(i))) {
return null;
}
}
final List<RexNode> list = pushPastProject(nodes, project);
final int bottomCount = RexUtil.nodeCount(project.getProjects());
final int topCount = RexUtil.nodeCount(nodes);
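
For readers who want to try the guard in isolation, here is a minimal standalone sketch of the same idea: count how often each lower-project column is referenced, and refuse the merge if a non-deterministic column is referenced more than once. It assumes a much simpler model than Calcite's: `MergeGuardSketch`, `Expr` and `wouldDuplicateNonDeterministic` are hypothetical names, and references are plain integer indexes rather than `RexInputRef` nodes.

```java
import java.util.List;

/** Simplified model of the reference-counting guard; not the Calcite implementation. */
final class MergeGuardSketch {
  /** Hypothetical stand-in for one expression of the lower project. */
  static final class Expr {
    final String text;
    final boolean deterministic;

    Expr(String text, boolean deterministic) {
      this.text = text;
      this.deterministic = deterministic;
    }
  }

  /** Returns true if inlining would copy a non-deterministic expression more than once. */
  static boolean wouldDuplicateNonDeterministic(List<Expr> bottom, List<Integer> topRefs) {
    // Count references from the upper expressions to each lower-project column.
    int[] refs = new int[bottom.size()];
    for (int index : topRefs) {
      refs[index]++;
    }
    for (int i = 0; i < refs.length; i++) {
      if (refs[i] > 1 && !bottom.get(i).deterministic) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<Expr> bottom = List.of(new Expr("RAND()", false));
    // select a, a + 1 from (select rand() as a): column 0 is referenced twice.
    System.out.println(wouldDuplicateNonDeterministic(bottom, List.of(0, 0)));
    // select a from (select rand() as a): a single reference copies nothing.
    System.out.println(wouldDuplicateNonDeterministic(bottom, List.of(0)));
  }
}
```

A single reference is allowed through because inlining it moves the expression rather than copying it.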
@@ -165,6 +165,13 @@ protected FilterProjectTransposeRule(
// it can be pushed down. For now we don't support this.
return;
}
// Pushing the filter below the project would split a single
// non-deterministic evaluation (e.g. RAND()) into two: one consumed by
// the new filter condition, and the original still produced by the
// project above. Refuse to transpose in that case.
if (!project.getProjects().stream().allMatch(RexUtil::isDeterministic)) {
Member

Does this block the transpose just because RAND() exists in the project, even when the filter does not reference it? E.g.:

SELECT * FROM (
  SELECT RAND() as r, col1 as b FROM emp
)
WHERE b > 0
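
To make the question concrete, the sketch below contrasts the check as written in this diff with one possible narrower check that blocks only when the filter condition reads a non-deterministic column. It is a simplified model, not Calcite code: `FilterTransposeSketch` and both method names are hypothetical, and whether the narrower check is safe for every non-deterministic function is an assumption that would need to be verified.

```java
import java.util.List;
import java.util.Set;

/** Simplified comparison of two guard conditions; not the Calcite implementation. */
final class FilterTransposeSketch {
  /** The check as written in this diff: any non-deterministic projection blocks. */
  static boolean blocksAnyNonDeterministic(List<Boolean> deterministic) {
    return !deterministic.stream().allMatch(Boolean::booleanValue);
  }

  /** Hypothetical refinement: block only if the condition reads such a column. */
  static boolean blocksReferencedOnly(List<Boolean> deterministic,
      Set<Integer> conditionRefs) {
    return conditionRefs.stream().anyMatch(ref -> !deterministic.get(ref));
  }

  public static void main(String[] args) {
    // Project: RAND() as r (column 0), col1 as b (column 1).
    List<Boolean> deterministic = List.of(false, true);
    // Filter: b > 0, which reads only column 1.
    Set<Integer> conditionRefs = Set.of(1);
    System.out.println(blocksAnyNonDeterministic(deterministic));
    System.out.println(blocksReferencedOnly(deterministic, conditionRefs));
  }
}
```

For the query above, the first check blocks the transpose and the second would let it through.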

return;
}
// convert the filter to one that references the child of the project
RexNode newCondition =
RelOptUtil.pushPastProjectUnlessBloat(filter.getCondition(), project, config.bloat());
@@ -36,6 +36,7 @@
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexProgram;
import org.apache.calcite.rex.RexProgramBuilder;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.sql.validate.SqlValidatorUtil;
import org.apache.calcite.tools.RelBuilder;
import org.apache.calcite.tools.RelBuilderFactory;
@@ -151,6 +152,21 @@ public JoinProjectTransposeRule(RelOptRuleOperand operand,
rightJoinChild = join.getRight();
}

// Skip projects that contain non-deterministic expressions
// (e.g. RAND). The merge below inlines projected expressions
// into the join condition via expandLocalRef, which would
// duplicate every non-deterministic call referenced more than once.
if (leftProject != null
&& !leftProject.getProjects().stream().allMatch(RexUtil::isDeterministic)) {
leftProject = null;
leftJoinChild = join.getLeft();
}
if (rightProject != null
&& !rightProject.getProjects().stream().allMatch(RexUtil::isDeterministic)) {
rightProject = null;
rightJoinChild = join.getRight();
}

if ((leftProject == null) && (rightProject == null)) {
return;
}
@@ -31,6 +31,7 @@
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.rex.RexProgram;
import org.apache.calcite.rex.RexProgramBuilder;
import org.apache.calcite.rex.RexUtil;
import org.apache.calcite.sql.validate.SqlValidatorUtil;
import org.apache.calcite.tools.RelBuilder;
import org.apache.calcite.util.Pair;
@@ -72,6 +73,15 @@ protected SemiJoinProjectTransposeRule(Config config) {
final Join semiJoin = call.rel(0);
final Project project = call.rel(1);

// Skip when the project contains a non-deterministic expression
// (e.g. RAND). Pulling such a project above the semi-join inlines
// its expressions into the join condition via expandLocalRef and
// then re-emits the projection above, splitting one evaluation
// into many. See [CALCITE-7551].
if (!project.getProjects().stream().allMatch(RexUtil::isDeterministic)) {
return;
}

// Convert the LHS semi-join keys to reference the child projection
// expression; all projection expressions must be RexInputRefs,
// otherwise, we wouldn't have created this semi-join.
66 changes: 66 additions & 0 deletions core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java
@@ -1533,6 +1533,48 @@ private void checkSemiOrAntiJoinProjectTranspose(JoinRelType type) {
.check();
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551]
* Project/Filter/Join transpose and merge rules can duplicate
* non-deterministic expressions</a>. JoinProjectTransposeRule must
* not pull a project containing a non-deterministic expression above
* the join, because it inlines the expression into the new join
* condition via {@code mergedProgram.expandLocalRef}. */
@Test void testJoinProjectTransposeShouldIgnoreNonDeterministic() {
final Function<RelBuilder, RelNode> relFn = b -> b
.scan("EMP")
.project(b.field("EMPNO"),
b.alias(b.call(SqlStdOperatorTable.RAND), "r"))
.scan("DEPT")
.join(JoinRelType.INNER,
b.and(
b.greaterThan(b.field(2, 0, "r"), b.literal(0.0)),
b.lessThan(b.field(2, 0, "r"), b.literal(1.0))))
.build();
relFn(relFn).withRule(CoreRules.JOIN_PROJECT_LEFT_TRANSPOSE).checkUnchanged();
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551]
* Project/Filter/Join transpose and merge rules can duplicate
* non-deterministic expressions</a>. SemiJoinProjectTransposeRule
* uses the same {@code mergePrograms} + {@code expandLocalRef}
* pattern as JoinProjectTransposeRule, and must not pull a project
* containing a non-deterministic expression above the semi-join. */
@Test void testSemiJoinProjectTransposeShouldIgnoreNonDeterministic() {
final Function<RelBuilder, RelNode> relFn = b -> b
.scan("EMP")
.project(b.field("EMPNO"),
b.alias(b.call(SqlStdOperatorTable.RAND), "r"))
.scan("DEPT")
.join(JoinRelType.SEMI,
b.and(
b.greaterThan(b.field(2, 0, "r"), b.literal(0.0)),
b.lessThan(b.field(2, 0, "r"), b.literal(1.0))))
.build();
relFn(relFn).withRule(CoreRules.SEMI_JOIN_PROJECT_TRANSPOSE).checkUnchanged();
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-1338">[CALCITE-1338]
* JoinProjectTransposeRule should not pull a literal above the
@@ -3204,6 +3246,18 @@ private void checkProjectCorrelateTransposeRuleSemiOrAntiCorrelate(JoinRelType t
.check();
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551]
* Project/Filter/Join transpose and merge rules can duplicate
* non-deterministic expressions</a>. FilterProjectTransposeRule must
* not pull a filter that references a non-deterministic projected
* column below the project. */
@Test void testFilterProjectTransposeShouldIgnoreNonDeterministic() {
final String sql = "select * from (select rand() as a from emp)\n"
+ "where a > 0 and a < 1";
sql(sql).withRule(CoreRules.FILTER_PROJECT_TRANSPOSE).checkUnchanged();
}

private static final String NOT_STRONG_EXPR =
"case when e.sal < 11 then 11 else -1 * e.sal end";

@@ -6920,6 +6974,18 @@ private HepProgram getTransitiveProgram() {
sql(sql).withRule(CoreRules.PROJECT_MERGE).checkUnchanged();
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551]
* Project/Filter/Join transpose and merge rules can duplicate
* non-deterministic expressions</a>. ProjectMergeRule must not merge
* adjacent projects when doing so would duplicate a non-deterministic
* expression. */
@Test void testProjectMergeShouldIgnoreNonDeterministic() {
final String sql = "select a, a + 1 as b from (select rand() as a from emp)";
sql(sql).withRule(CoreRules.PROJECT_MERGE).checkUnchanged();
}


@Test void testAggregateProjectPullUpConstants() {
final String sql = "select job, empno, sal, sum(sal) as s\n"
+ "from emp where empno = 10\n"
@@ -6133,4 +6133,16 @@ void checkUserDefinedOrderByOver(NullCollation nullCollation) {
+ "FROM emp JOIN dept using (deptno)";
sql(sql).withConformance(SqlConformanceEnum.PRESTO).ok();
}

/** Test case for
Contributor

Why is SqlToRel connected to these rules?

Contributor Author

@darpan-e6 May 28, 2026

Because SqlToRelConverter builds its RelNode tree through RelBuilder, and RelBuilder eagerly merges adjacent projects at construction time using the same helper that the planner rules use.

Concretely, RelBuilder.project_ calls RelOptUtil.pushPastProjectUnlessBloat(nodeList, project, config.bloat()), which this PR fixes, so I added a test here as well.
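
To illustrate why the merge matters for a query like `select a, a + 1 as b from (select rand() as a)`, here is a small standalone sketch. It does not use Calcite at all: `RandDuplicationSketch`, `unmerged` and `merged` are hypothetical names, and a counting supplier stands in for RAND() so the behaviour is repeatable.

```java
import java.util.Arrays;
import java.util.function.DoubleSupplier;

/** Simplified illustration of evaluating the two plan shapes; not Calcite code. */
final class RandDuplicationSketch {
  /** Unmerged plan: the inner project evaluates rand once and the outer reuses it. */
  static double[] unmerged(DoubleSupplier rand) {
    double a = rand.getAsDouble();
    return new double[] {a, a + 1};
  }

  /** Merged plan: each reference to a has been replaced by its own call. */
  static double[] merged(DoubleSupplier rand) {
    return new double[] {rand.getAsDouble(), rand.getAsDouble() + 1};
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Stand-in for RAND(): returns a different value on every call.
    DoubleSupplier rand = () -> {
      calls[0]++;
      return calls[0] * 0.25;
    };
    System.out.println(Arrays.toString(unmerged(rand)) + " after " + calls[0] + " call(s)");
    System.out.println(Arrays.toString(merged(rand)) + " after " + calls[0] + " call(s)");
  }
}
```

In the unmerged shape the second column always equals the first plus one; in the merged shape the two columns come from different evaluations, so that relationship no longer holds.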

* <a href="https://issues.apache.org/jira/browse/CALCITE-7551">[CALCITE-7551]
* Non-deterministic expressions (e.g. {@code RAND()}) should not be
* duplicated when projections are merged</a>. The two references to
* {@code a} in the outer query must resolve to the same {@code RAND()}
* evaluation produced by the inner sub-query, so the inner projection
* must not be flattened into the outer. */
@Test void testRandNotDuplicatedInProjectionMerge() {
final String sql = "select a, a + 1 as b from (select rand() as a)";
sql(sql).ok();
}
}
@@ -5903,6 +5903,20 @@ LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
}))])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
]]>
</Resource>
</TestCase>
<TestCase name="testFilterProjectTransposeShouldIgnoreNonDeterministic">
<Resource name="sql">
<![CDATA[select * from (select rand() as a from emp)
where a > 0 and a < 1]]>
</Resource>
<Resource name="planBefore">
<![CDATA[
LogicalProject(A=[$0])
LogicalFilter(condition=[AND(>($0, CAST(0):DOUBLE NOT NULL), <($0, CAST(1):DOUBLE NOT NULL))])
LogicalProject(A=[RAND()])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
]]>
</Resource>
</TestCase>
@@ -8445,6 +8459,16 @@ LogicalProject(DEPTNO=[$0], NAME=[$1], NAME0=[$2], EXPR$1=[$3])
LogicalJoin(condition=[=($1, $3)], joinType=[left])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
]]>
</Resource>
</TestCase>
<TestCase name="testJoinProjectTransposeShouldIgnoreNonDeterministic">
<Resource name="planBefore">
<![CDATA[
LogicalJoin(condition=[SEARCH($1, Sarg[(0.0E0:DOUBLE..1.0E0:DOUBLE)]:DOUBLE)], joinType=[inner])
LogicalProject(EMPNO=[$0], r=[RAND()])
LogicalTableScan(table=[[scott, EMP]])
LogicalTableScan(table=[[scott, DEPT]])
]]>
</Resource>
</TestCase>
@@ -11947,6 +11971,18 @@ LogicalProject(EXPR$0=[+($0, 1)])
})])
LogicalProject(X=[ARRAY(1, 2, 3)])
LogicalValues(tuples=[[{ 0 }]])
]]>
</Resource>
</TestCase>
<TestCase name="testProjectMergeShouldIgnoreNonDeterministic">
<Resource name="sql">
<![CDATA[select a, a + 1 as b from (select rand() as a from emp)]]>
</Resource>
<Resource name="planBefore">
<![CDATA[
LogicalProject(A=[$0], B=[+($0, 1)])
LogicalProject(A=[RAND()])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
]]>
</Resource>
</TestCase>
@@ -17971,6 +18007,16 @@ LogicalProject(DNAME=[$1])
LogicalAggregate(group=[{0}])
LogicalProject($f0=[*(2, $0)])
LogicalTableScan(table=[[scott, DEPT]])
]]>
</Resource>
</TestCase>
<TestCase name="testSemiJoinProjectTransposeShouldIgnoreNonDeterministic">
<Resource name="planBefore">
<![CDATA[
LogicalJoin(condition=[SEARCH($1, Sarg[(0.0E0:DOUBLE..1.0E0:DOUBLE)]:DOUBLE)], joinType=[semi])
LogicalProject(EMPNO=[$0], r=[RAND()])
LogicalTableScan(table=[[scott, EMP]])
LogicalTableScan(table=[[scott, DEPT]])
]]>
</Resource>
</TestCase>
@@ -7243,6 +7243,18 @@ LogicalProject(EXPR$0=[= SOME(1970-01-01 01:23:45, ARRAY(1970-01-01 01:23:45, 19
<![CDATA[
LogicalProject(EXPR$0=[= SOME(1970-01-01 01:23:45, CAST(ARRAY('1970-01-01 01:23:45', '1970-01-01 01:23:46')):TIMESTAMP(0) NOT NULL ARRAY NOT NULL)])
LogicalValues(tuples=[[{ 0 }]])
]]>
</Resource>
</TestCase>
<TestCase name="testRandNotDuplicatedInProjectionMerge">
<Resource name="sql">
<![CDATA[select a, a + 1 as b from (select rand() as a)]]>
</Resource>
<Resource name="plan">
<![CDATA[
LogicalProject(A=[$0], B=[+($0, 1)])
LogicalProject(A=[RAND()])
LogicalValues(tuples=[[{ 0 }]])
]]>
</Resource>
</TestCase>