Fix volatile scalar subquery deduplication#22736
Conversation
|
@neilconway would you be available to review this pr? |
| let mut all_subqueries = Self::collect_scalar_subqueries(logical_plan); | ||
| let freshened = if all_subqueries.iter().any(Subquery::is_volatile) { | ||
| Some(Self::freshen_volatile_subqueries(logical_plan)?) | ||
| } else { | ||
| Vec::new() | ||
| None | ||
| }; | ||
| let logical_plan = match &freshened { | ||
| Some(freshened) => { | ||
| all_subqueries = Self::collect_scalar_subqueries(freshened); | ||
| freshened | ||
| } | ||
| None => logical_plan, | ||
| }; |
There was a problem hiding this comment.
Calling collect_scalar_subqueries twice is unfortunate; as is the need to traverse the whole tree yet again to freshen the volatile SQs. I'm also not crazy about distinguishing between two volatile-containing subqueries based purely on the value of an Arc pointer.
Stepping back a bit, whether a SQ is volatile is not is a local property of the SQ, so we should be able to determine how a subquery should be handled as part of a single tree traversal. What if we did something like:
- Do a single pass over the plan
- When we encounter a subquery, determine if it is volatile or not
- If it's volatile, give it a fresh index. If non-volatile, look it up (structural equality) and assign/reuse an index as appropriate
- Write the index back into the logical plan node
That means that two textually equal SQs containing volatile expressions will compare non-equal via structural equality, so I think we'll get the semantics we want? It's a bit gross to write a field back into the logical plan but I can't think of something cleaner at the moment.
There was a problem hiding this comment.
Thank you @neilconway for great feedback as always. I have tried to address the concerns please check.
| pub fn is_volatile(&self) -> bool { | ||
| self.exists(|expr| Ok(expr.is_volatile_node())) | ||
| .expect("exists closure is infallible") | ||
| self.exists(|expr| { | ||
| let subquery_is_volatile = match expr { | ||
| Expr::ScalarSubquery(subquery) | ||
| | Expr::Exists(Exists { subquery, .. }) | ||
| | Expr::InSubquery(InSubquery { subquery, .. }) | ||
| | Expr::SetComparison(SetComparison { subquery, .. }) => { | ||
| subquery.is_volatile() | ||
| } | ||
| _ => false, | ||
| }; | ||
| Ok(expr.is_volatile_node() || subquery_is_volatile) | ||
| }) | ||
| .expect("exists closure is infallible") | ||
| } |
There was a problem hiding this comment.
$ ag is_volatile | wc -l
77
We should be careful about changing the semantics of a widely used API like this. It's not clear to me that the change is wrong (or right), but it has both behavioral and performance consequences. We should look at all of those existing call-sites and understand whether recursing into subqueries is the right behavior or not. And if we are going to change this, we should ensure it has unit test coverage.
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
| } | ||
|
|
||
| /// Returns `true` if any expression in `plan` is volatile. | ||
| fn plan_contains_volatile(plan: &LogicalPlan) -> bool { |
There was a problem hiding this comment.
small but is it possible to make this public?
Would be useful for detecting volatile functions when deciding to materialize subplans (#22676)
Which issue does this PR close?
Rationale for this change
Identical uncorrelated scalar subqueries are deduplicated, but this is not correct when the subquery contains volatile expressions such as
random().For example,
(SELECT random()) - (SELECT random())must evaluate both scalar subquery occurrences independently.What changes are included in this PR?
ExecutionPropsfield type.Are these changes tested?
Yes
Are there any user-facing changes?
Yes. Queries with repeated volatile scalar subqueries now evaluate each occurrence independently instead of incorrectly reusing one result.
There is also a public API type change for
ExecutionProps::subquery_indexes, documented in the upgrade notes.