test: improve array_distinct test coverage and incompatibility description #3887

Open
andygrove wants to merge 1 commit into apache:main from andygrove:improve-array-distinct-tests
Conversation

@andygrove
Member

Which issue does this PR close?

Closes #3174.

Rationale for this change

The existing array_distinct SQL file tests only covered array<int> with 2 queries. The Incompatible support level had no description, making it unclear to users why the expression is marked incompatible.

What changes are included in this PR?

  1. Expanded SQL file tests for array_distinct from 2 queries to comprehensive coverage across 7 scalar element types plus nested arrays:

    • INT: duplicates, empty array, NULL input, NULL elements, all-NULLs, boundary values (INT_MIN/MAX), negative values
    • BIGINT: duplicates, NULL elements, boundary values (Long.MIN/MAX)
    • STRING: duplicates, empty strings, NULL elements, empty-string-vs-NULL distinction
    • BOOLEAN: all combinations of true/false with NULLs
    • DOUBLE: duplicates, NaN deduplication, NaN+NULL, Infinity/-Infinity, negative zero
    • FLOAT: duplicates, NaN deduplication
    • DECIMAL(10,2): duplicates, NULL elements
    • Nested arrays: array<array<int>> with duplicates and NULLs
  2. Added descriptive reason to Incompatible support level: "Output elements are sorted rather than preserving insertion order".
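To make the incompatibility concrete, here is a minimal Python sketch of the two behaviors named in the description. It is an illustration only, not Comet, DataFusion, or Spark code: the function names are hypothetical, and it covers only non-NULL integers, so it says nothing about where either engine places NULLs or how NaN is handled.

```python
def distinct_preserving_order(values):
    """Drop duplicates, keeping the first occurrence of each element
    (the insertion-order behavior the PR attributes to Spark)."""
    seen = set()
    result = []
    for v in values:
        if v not in seen:
            seen.add(v)
            result.append(v)
    return result


def distinct_sorted(values):
    """Drop duplicates and return the survivors in sorted order
    (the sorted-output behavior the PR attributes to DataFusion)."""
    return sorted(set(values))


sample = [3, 1, 3, 2, 1]
print(distinct_preserving_order(sample))  # [3, 1, 2]
print(distinct_sorted(sample))            # [1, 2, 3]
```

Both results contain the same set of elements and differ only in order, which is the difference the new Incompatible reason describes.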

How are these changes tested?

All new tests are SQL file tests that run via CometSqlFileTestSuite, using the ConfigMatrix to cover both dictionary-encoded and plain Parquet. The tests use spark_answer_only mode because the expression is Incompatible (DataFusion sorts output elements, whereas Spark preserves insertion order). Verified passing locally with both dictionary configurations.

…ption

Expand SQL file tests for array_distinct from 2 queries on array<int>
to comprehensive coverage across int, bigint, string, boolean, double,
float, decimal, and nested array types. Add edge case coverage for NULL
handling, NaN/Infinity deduplication, boundary values, and negative zero.

Add descriptive reason to the Incompatible support level so users
understand that output elements are sorted rather than preserving
insertion order.
Development

Successfully merging this pull request may close these issues.

[Incompatibility] Document array_distinct behavior differences: element ordering