feat: support '>', '<', '>=', '<=', '<>' in all operator#21416
feat: support '>', '<', '>=', '<=', '<>' in all operator#21416berkaysynnada merged 7 commits intoapache:mainfrom
Conversation
Jefffrey
left a comment
There was a problem hiding this comment.
This is a question actually I've checked behaviors of Postgresql and Duckdb about null semantics and continued with the Postgresql behavior. However, I'm not sure if we want this so also put Duckdb outputs. It would be great to have feedback on this
I agree with following postgres semantics; to me it seems more intuitive and inline with other null operation semantics, and off the top of my head I think DataFusion tries to follow postgres more closely though I may be remembering incorrect (or out of date).
I haven't fully reviewed this PR, since it seems to change both the null semantics for any whilst also introducing all support. It might be a good idea to split these into separate PRs to more them easier to review.
| left_expr: &Expr, | ||
| right_expr: &Expr, | ||
| compare_op: &BinaryOperator, | ||
| is_all: bool, |
There was a problem hiding this comment.
Probably more readable to have is_all be an enum instead
// Or some better name
enum QuantifyType {
All,
Any,
}There was a problem hiding this comment.
Thanks, I'll apply this after merging both operators in a follow-up PR. Currently this PR only adds ALL operator support
Thanks @Jefffrey for checking this PR. I've combined ANY operator logic to not cause duplication but I understand it makes sense to separate PRs. Thus, I've narrowed down PR scope to only include |
| # NULL LHS with empty array returns TRUE (vacuous truth) | ||
| query B | ||
| select NULL = ALL(arrow_cast(make_array(), 'List(Int64)')); | ||
| ---- | ||
| true |
There was a problem hiding this comment.
I find this a bit surprising since I would assume if the needle is null then we'd always return null regardless of the haystack 🤔
There was a problem hiding this comment.
I was surprised about the semantics and results as well btw. That's why I've checked every edge case with postgres and continued with their approach but you're right
| when(null_arr_check, null_bool.clone()) | ||
| .when(empty_check, lit(true)) | ||
| .when(null_lhs_check, null_bool.clone()) | ||
| .when(decisive_condition, lit(false)) | ||
| .when(has_nulls, null_bool) | ||
| .otherwise(lit(true)) |
There was a problem hiding this comment.
I do wonder if we're better off having a UDF implementation instead of this case approach, though maybe thats something that can be explored as a followup
There was a problem hiding this comment.
I was thinking of this as well especially regarding performance wise. As you've said I thought maybe I can have a followup UDF implementation and compare results
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
berkaysynnada
left a comment
There was a problem hiding this comment.
LGTM, thanks @buraksenn. Looking forward to the follow-ups :)
Which issue does this PR close?
Rationale for this change
Related with #20830 all operator does not support operators above.
What changes are included in this PR?
Adds support for other expressions and add tests
This is a question actually I've checked behaviors of Postgresql and Duckdb about null semantics and continued with the Postgresql behavior. However, I'm not sure if we want this so also put Duckdb outputs. It would be great to have feedback on this
5 = ALL(NULL::INT[])NULLNULLtrue5 <> ALL(NULL::INT[])NULLNULLtrue5 > ALL(NULL::INT[])NULLNULLtrue5 < ALL(NULL::INT[])NULLNULLtrue5 >= ALL(NULL::INT[])NULLNULLtrue5 <= ALL(NULL::INT[])NULLNULLtrueAre these changes tested?
Added slt tests for this and they all pass
Are there any user-facing changes?
Yes user's can now use all operator with this new expressions