Skip to content

C++: Speed up cpp/leap-year/unchecked-after-arithmetic-year-modification#21768

Merged
MathiasVP merged 2 commits intomainfrom
speed-up-unchecked-leap-year-after-modification
Apr 30, 2026
Merged

C++: Speed up cpp/leap-year/unchecked-after-arithmetic-year-modification#21768
MathiasVP merged 2 commits intomainfrom
speed-up-unchecked-leap-year-after-modification

Conversation

@MathiasVP
Copy link
Copy Markdown
Contributor

This PR fixes a performance problem in cpp/leap-year/unchecked-after-arithmetic-year-modification which was caused by #21292.

The problem can be seen in this partial run:

[2026-04-28 05:55:08] Evaluated non-recursive predicate UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::S6Graph::localStep/9#f7e3c284@74ab34gm in 3319052ms (size: 310706007).
Evaluated relational algebra for predicate UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::S6Graph::localStep/9#f7e3c284@74ab34gm with tuple counts:
  260207923   ~6%    {7} r1 = SCAN num#UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::S6Graph::TPathNodeMid#586d6f71 OUTPUT In.0, In.3, In.4, In.5, In.1, In.2, In.6
                  
      582279   ~5%    {9} r2 = JOIN r1 WITH `UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6::fwdFlowRead/11#743a499a_01239105678#join_rhs` ON FIRST 6 OUTPUT Lhs.6, Rhs.6, Lhs.4, Lhs.5, Rhs.7, Rhs.8, Rhs.9, _, _
      582279   ~0%    {9}    | REWRITE WITH Out.7 := "", Out.8 := false
                  
    9407058   ~3%    {8} r3 = JOIN r1 WITH `UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6::fwdFlowStore/10#7594007a_0123894567#join_rhs` ON FIRST 6 OUTPUT Rhs.6, Lhs.2, Lhs.4, Lhs.5, Lhs.6, Rhs.7, Rhs.8, Rhs.9
    7512614   ~5%    {9}    | JOIN WITH `UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::AccessPath.isCons/2#dispred#1f6e207a_120#join_rhs` ON FIRST 2 OUTPUT Lhs.4, Lhs.7, Lhs.2, Lhs.3, Lhs.5, Rhs.2, Lhs.6, _, _
    7512614   ~3%    {9}    | REWRITE WITH Out.7 := "", Out.8 := true
                  
  260207923   ~4%    {7} r4 = SCAN num#UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::S6Graph::TPathNodeMid#586d6f71 OUTPUT In.1, In.0, In.2, In.3, In.4, In.5, In.6
  260207923   ~4%    {7}    | JOIN WITH `DataFlowImplCommon::CallContextSensitivity<UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6Param::CallContextSensitivityInput>::CallContext#1201065c` ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5, Lhs.6
                  
  260203381   ~4%    {7} r5 = r4 AND NOT `DataFlowImplCommon::CallContextSensitivity<UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6Param::CallContextSensitivityInput>::PrunedViableImpl<DefaultPrunedViableImplInput>::LocalCallContext::getUnreachable/1#4e807359_0#antijoin_rhs`(FIRST 1)
  260203381   ~0%    {8}    | JOIN WITH cached_DataFlowImplCommon::Cached::TAnyLocalCall#c01dcce5 CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5, Lhs.6
  260203381   ~0%    {8}    | JOIN WITH DataFlowImplCommon::LocalCallContext#6167e342 ON FIRST 1 OUTPUT Lhs.2, Lhs.1, Lhs.3, Lhs.4, Lhs.5, Lhs.6, Lhs.7, Lhs.0
                  
        4542   ~0%    {8} r6 = JOIN r4 WITH `DataFlowImplCommon::CallContextSensitivity<UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6Param::CallContextSensitivityInput>::PrunedViableImpl<DefaultPrunedViableImplInput>::LocalCallContext::getUnreachable/1#4e807359` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0, Lhs.2, Lhs.3, Lhs.4, Lhs.5, Lhs.6
        4542   ~0%    {8}    | JOIN WITH cached_DataFlowImplCommon::Cached::TSpecificLocalCall#05dd466c ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5, Lhs.6, Lhs.7
        4542   ~0%    {8}    | JOIN WITH DataFlowImplCommon::LocalCallContext#6167e342 ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5, Lhs.6, Lhs.7, Lhs.0
                  
  260207923   ~0%    {8} r7 = r5 UNION r6
  260207923   ~1%    {8}    | JOIN WITH num#UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::S6Graph::TPathNodeMid#586d6f71 ON FIRST 7 OUTPUT Lhs.0, Lhs.7, Lhs.1, Lhs.2, Lhs.4, Lhs.5, Lhs.6, Lhs.3
  302370000   ~0%    {9}    | JOIN WITH `project#UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage5::LocalFlowBigStep<Stage4::LocalFlowBigStep<Stage3Param::localFlowBigStep>::localFlowBigStepTc>::localFlowBigStepTc/6#58c31154#5_0213#join_rhs` ON FIRST 2 OUTPUT Lhs.6, Rhs.2, Lhs.2, Lhs.3, Lhs.7, Lhs.4, Lhs.5, Rhs.3, _
  302370000   ~0%    {9}    | REWRITE WITH Out.8 := false
                  
  260207923   ~3%    {6} r8 = SCAN num#UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::S6Graph::TPathNodeMid#586d6f71 OUTPUT In.1, In.0, In.2, In.4, In.5, In.6
  260207923   ~3%    {6}    | JOIN WITH `DataFlowImplCommon::CallContextSensitivity<UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6Param::CallContextSensitivityInput>::CallContext#1201065c` ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
                  
  260203381   ~3%    {6} r9 = r8 AND NOT `DataFlowImplCommon::CallContextSensitivity<UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6Param::CallContextSensitivityInput>::PrunedViableImpl<DefaultPrunedViableImplInput>::LocalCallContext::getUnreachable/1#4e807359_0#antijoin_rhs`(FIRST 1)
  260203381   ~2%    {7}    | JOIN WITH cached_DataFlowImplCommon::Cached::TAnyLocalCall#c01dcce5 CARTESIAN PRODUCT OUTPUT Rhs.0, Lhs.0, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
  260203381   ~0%    {7}    | JOIN WITH DataFlowImplCommon::LocalCallContext#6167e342 ON FIRST 1 OUTPUT Lhs.4, Lhs.2, Lhs.1, Lhs.3, Lhs.5, Lhs.6, Lhs.0
                  
        4542   ~1%    {7} r10 = JOIN r8 WITH `DataFlowImplCommon::CallContextSensitivity<UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage6Param::CallContextSensitivityInput>::PrunedViableImpl<DefaultPrunedViableImplInput>::LocalCallContext::getUnreachable/1#4e807359` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.0, Lhs.2, Lhs.3, Lhs.4, Lhs.5
        4542   ~0%    {7}    | JOIN WITH cached_DataFlowImplCommon::Cached::TSpecificLocalCall#05dd466c ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5, Lhs.6
        4542   ~0%    {7}    | JOIN WITH DataFlowImplCommon::LocalCallContext#6167e342 ON FIRST 1 OUTPUT Lhs.4, Lhs.1, Lhs.2, Lhs.3, Lhs.5, Lhs.6, Lhs.0
                  
  260207923   ~0%    {7} r11 = r9 UNION r10
    2115749   ~0%    {8}    | JOIN WITH num#UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::TAccessPathNil#dc6d0df5 ON FIRST 1 OUTPUT Lhs.1, _, Lhs.6, Lhs.2, Lhs.3, Lhs.0, Lhs.4, Lhs.5
    2115749   ~5%    {8}    | REWRITE WITH Out.1 := false
      251639   ~0%    {9}    | JOIN WITH `UncheckedLeapYearAfterYearModification::IgnorableOperationToOperationSourceCandidateFlow::Flow::Stage5::LocalFlowBigStep<Stage4::LocalFlowBigStep<Stage3Param::localFlowBigStep>::localFlowBigStepTc>::localFlowBigStepTc/6#58c31154_024135#join_rhs` ON FIRST 3 OUTPUT Lhs.7, Rhs.3, Lhs.3, Lhs.4, Rhs.4, Lhs.5, Lhs.6, Rhs.5, _
      251639   ~0%    {9}    | REWRITE WITH Out.8 := false
                  
  310716532   ~0%    {9} r12 = r2 UNION r3 UNION r7 UNION r11
                      return r12

The underlying problem is that the dataflow computation specified by IgnorableOperationToOperationSourceCandidateConfig has way too many sources and sinks. So even though it is restricted to flows where the source and the sink are in the same enclosing callable there is way too many PathNodes.

To fix this problem we modify the ordering of dataflows. On main we do (ignoring some of the other dataflow configurations used in this query):

  1. An initial dataflow computation specified by IgnorableOperationToOperationSourceCandidateConfig to identify operations to ignore
  2. Use the result from (1) to filter out certain OperationSource, which then serve as the sources for the "final" dataflow computation specified by OperationToYearAssignmentConfig.

After this PR we do:

  1. An initial dataflow computation OperationToYearAssignmentConfig0 (which is equal to the original OperationToYearAssignmentConfig, but without the filtering done by IgnorableOperationToOperationSourceCandidateConfig)
  2. Another dataflow computation specified by IgnorableOperationToOperationSourceCandidateConfig. This is the same computation as we do on main, but now restricted to only relevant sinks (which we get from (1))
  3. A "final" dataflow computation specified by OperationToYearAssignmentConfig (which is equal to OperationToYearAssignmentConfig0, but with the restricted set of sinks from (2))

On the particular database I was testing on this now completes "trivially" (since sentinals now detect that the query will have 0 results since UncheckedLeapYearAfterYearModification::OperationToYearAssignmentFlow::Flow::S6::flowPath/2#9b05e974/2 is empty).

…er an initial round of the query to reduce the number of sinks.
Copilot AI review requested due to automatic review settings April 28, 2026 21:04
@MathiasVP MathiasVP requested a review from a team as a code owner April 28, 2026 21:04
@github-actions github-actions Bot added the C++ label Apr 28, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a severe performance regression in the C++ query cpp/leap-year/unchecked-after-arithmetic-year-modification by restructuring the query’s dataflow computations to drastically reduce the number of sources/sinks (and resulting PathNodes) considered during evaluation.

Changes:

  • Introduces an initial “pre-filter” dataflow (OperationToYearAssignmentConfig0 / OperationToYearAssignmentFlow0) to identify only operation-source candidates that can reach a year assignment.
  • Restricts the sinks of IgnorableOperationToOperationSourceCandidateConfig to only those operation-source candidates that are relevant to year assignments.
  • Refactors the final OperationToYearAssignmentConfig to use the restricted set and apply the ignorable-operation filtering in a later stage.
Show a summary per file
File Description
cpp/ql/src/Likely Bugs/Leap Year/UncheckedLeapYearAfterYearModification.ql Reorders and splits dataflow configurations to constrain sink/source sets and improve evaluation performance.

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 1

Comment thread cpp/ql/src/Likely Bugs/Leap Year/UncheckedLeapYearAfterYearModification.ql Outdated
…ification.ql

Co-authored-by: Jeroen Ketema <93738568+jketema@users.noreply.github.com>
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Apr 29, 2026
@MathiasVP
Copy link
Copy Markdown
Contributor Author

severe performance regression

Not really sure I agree with Copilot's definition of "severe" here since this was 1 repository across all of Microsoft, but oh well

Copy link
Copy Markdown
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine and reasonable, performance is about the same in my local testing but I can see how this has a dramatic effect on certain cases.

I think I've said it before and I'll say it again: this file really needs a diagram of how all the data flows fit together. Its a bit late for this change, but next time I do a review on this query, I might hold you / the author to this.

Not really sure I agree with Copilot's definition of "severe" here since this was 1 repository across all of Microsoft, but oh well

😆

@MathiasVP
Copy link
Copy Markdown
Contributor Author

Its a bit late for this change, but next time I do a review on this query, I might hold you / the author to this.

Partly fair. I also have very little idea about how this query works so I'm glad you're not blocking the approval on that 😅 I was only concerned about fixing the performance

@MathiasVP MathiasVP merged commit 154d213 into main Apr 30, 2026
18 of 19 checks passed
@MathiasVP MathiasVP deleted the speed-up-unchecked-leap-year-after-modification branch April 30, 2026 15:06
@geoffw0
Copy link
Copy Markdown
Contributor

geoffw0 commented Apr 30, 2026

... we actually lose a (single) result in grpc/grpc due to this change. I think when I reviewed this my MRVA run was at 95% or something, so this didn't show up and I called it good enough. Anyway I'm not too worried, but worth noting that something in this PR probably wasn't strictly behaviour neutral.

@MathiasVP
Copy link
Copy Markdown
Contributor Author

... we actually lose a (single) result in grpc/grpc due to this change. I think when I reviewed this my MRVA run was at 95% or something, so this didn't show up and I called it good enough. Anyway I'm not too worried, but worth noting that something in this PR probably wasn't strictly behaviour neutral.

Oh, that's weird. There was not supposed to be any changes to the semantics of the query. I'll take a look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants