Skip to content

C++: Divide number of bounds between branches for phi nodes#21329

Merged
jketema merged 6 commits intogithub:mainfrom
paldepind:cpp/simple-range-analysis-phi-divide
Feb 20, 2026
Merged

C++: Divide number of bounds between branches for phi nodes#21329
jketema merged 6 commits intogithub:mainfrom
paldepind:cpp/simple-range-analysis-phi-divide

Conversation

@paldepind
Copy link
Contributor

@paldepind paldepind commented Feb 16, 2026

This PR changes the heuristic for how the number of bounds is calculated for a guard phi node.

This is an improvement in a few ways:

  • It avoids a special case where nodes that are both normal phi nodes and guard phi nodes needed special treatment.

  • I think this heuristic makes sense intuitively.

  • It fixes a problem where a series of if-else statements with guards on the same variable caused the number of bounds to exponentially increase while the actual range analysis did in fact not introduce any bounds.

    The first commit adds an example of this. In the last commit the change to the expected file around test.c:453 to test.c:463 show how we now handle this case better.

    This fixes a recent problem observed over in coding standards. See: Revert "C++: Accept test changes after github/codeql#21313." codeql-coding-standards#1041.

For reviewing the comment inside nrOfBoundsPhiGuard should help explain what is going on.

@github-actions github-actions bot added the C++ label Feb 16, 2026
@paldepind paldepind force-pushed the cpp/simple-range-analysis-phi-divide branch from f041b27 to e5c8f38 Compare February 16, 2026 13:23
@paldepind paldepind force-pushed the cpp/simple-range-analysis-phi-divide branch from e5c8f38 to d0681c6 Compare February 16, 2026 13:36
@paldepind paldepind marked this pull request as ready for review February 16, 2026 13:46
Copilot AI review requested due to automatic review settings February 16, 2026 13:46
@paldepind paldepind requested a review from a team as a code owner February 16, 2026 13:46
@paldepind paldepind requested a review from MathiasVP February 16, 2026 13:46
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the heuristic for calculating the number of bounds for guard phi nodes in C++ range analysis. The change addresses an exponential growth issue that occurred when analyzing series of if-else statements with guards on the same variable.

Changes:

  • Modified nrOfBoundsPhiGuard function to use a new heuristic: (varBounds + 1) / 2 instead of special-casing certain phi nodes
  • Added test case repeated_if_else_statements demonstrating the fixed scenario with 11 consecutive if-else statements
  • Added helper predicates countNrOfLowerBounds and countNrOfUpperBounds in SimpleRangeAnalysisInternal module
  • Updated test query nrOfBounds.ql to output actual bounds counts alongside estimates

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated no comments.

File Description
cpp/ql/lib/semmle/code/cpp/rangeanalysis/SimpleRangeAnalysis.qll Changed bounds estimation heuristic in nrOfBoundsPhiGuard from special-casing to dividing by 2; added helper predicates for counting actual bounds
cpp/ql/test/library-tests/rangeanalysis/SimpleRangeAnalysis/test.c Added repeated_if_else_statements function demonstrating the exponential growth scenario that is now fixed
cpp/ql/test/library-tests/rangeanalysis/SimpleRangeAnalysis/nrOfBounds.ql Enhanced query to also output actual lower and upper bounds counts for validation
cpp/ql/test/library-tests/rangeanalysis/SimpleRangeAnalysis/*.expected Updated expected test outputs to reflect the improved bounds estimates (line numbers shifted due to new test case)

@paldepind paldepind added the no-change-note-required This PR does not need a change note label Feb 16, 2026
Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes looks good to me! I think it would be a good idea to add a consistency check to check whether the estimated number of bounds is an upper bound on the actual number of bounds computed in the recursion. I realize this measure isn't genuinely an upper bound in all cases, but I still think it would be a good check to have. In nothing else, it makes it easy to validate changes to the measurements by running that consistency check on e.g., MRVA

if (rhs < 19) { rhs << 1; } else { rhs << 2; }
if (rhs < 20) { rhs << 1; } else { rhs << 2; }
return rhs; // rhs has 12 bounds
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I would have appreciated putting this function in the bottom of the file to avoid conflating the diff in the .expected file (since these tests still aren't using inline expectations). No biggie! Just thought I would mention it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it fits here as it's related to the repeated_if_statements function just above. You right that the diff gets bigger, but at least the QL changes are in a separate commit with a clean .expected diff :)

Co-authored-by: Mathias Vorreiter Pedersen <mathiasvp@github.com>
Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question and one typo.

Comment on lines +560 to +561
// by the condition. In this case all lower bounds flow to `{ e1 }` and only
// lower bounds that are smaller than `c` flow to `{ e2 }`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My knowledge of the simple range analysis library, so excuse my ignorance.

In this case all lower bounds flow to { e1 } and only
lower bounds that are smaller than c flow to { e2 }.

This I don't understand, and it's not clear how relate this with the division by 2 you do below. Why do all lower bounds flow to e1 and why is the "smaller than" condition there in the case of e2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Maybe it's easiest to explain by making things a bit more concrete.

Suppose the lower bounds for x are {2, 11, 22} and c is the constant 5.

In the true branch we know x < 5. This is an upper bound and thus doesn't affect the lower bounds, hence the "all lower bounds flow to { e1 }".

In the false branch we know x >= 5 which allow us to prevent the lower bounds 11 and 22 from flowing to x (done here). So x's bounds will be {2, 5} hence the "only lower bounds that are smaller than c flow to { e2 }"

At the phi node after the if statement the bounds from both branches are joined and we end up with 5 lower bounds: {2, 5, 11, 22}.

It's important to get that final estimate correct, as inaccuracies there can compound (as in the coding standards test with a gazillion if statements after each other) and dividing by 2 and adding a half gets us there.

In general we can't know how many bounds will exist inside each branch. But the number of bounds will be at most the number of bounds on x and at least 1, so guessing right in the middle reduces how wrong we can be in the worst case.

Does that make sense and should I try and expand on the comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I think it's fine to leave it as is. I clearly just don't have enough background on how the lower and upper bounds interact.

I'm happy to approve once you've fixed the typo I spotted.

paldepind and others added 2 commits February 20, 2026 16:24
Co-authored-by: Jeroen Ketema <93738568+jketema@users.noreply.github.com>
@jketema jketema merged commit 8947f7a into github:main Feb 20, 2026
20 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

Comments