Skip to content

feat: Mark array_compact as Compatible and improve test coverage#3889

Open
andygrove wants to merge 2 commits intoapache:mainfrom
andygrove:array-compact-audit
Open

feat: Mark array_compact as Compatible and improve test coverage#3889
andygrove wants to merge 2 commits intoapache:mainfrom
andygrove:array-compact-audit

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

Improves test coverage and correctness of array_compact support level.

Rationale for this change

CometArrayCompact.getSupportLevel returned Incompatible(None), but this was incorrect:

  • On Spark 3.x, the existing tests pass without allowIncompatible, confirming the implementation matches Spark behavior.
  • On Spark 4.0, ArrayCompact is a RuntimeReplaceable that Spark rewrites to ArrayFilter(child, IsNotNull(lambda)) before it reaches Comet's serde, so CometArrayCompact is never invoked. The CometArrayFilter serde already handles this case as Compatible().

The test coverage was also limited to integer arrays only.

What changes are included in this PR?

  • Change CometArrayCompact.getSupportLevel from Incompatible(None) to Compatible()
  • Expand array_compact.sql test coverage:
    • Add string element type tests (via table with column references)
    • Add double element type test
    • Add nested array type test (removes null arrays from outer, preserves null elements in inner)
    • Add empty string preservation test (not confused with nulls)
    • Upgrade existing queries from spark_answer_only to query mode to assert native execution
  • Remove unnecessary ConfigMatrix for dictionary encoding

How are these changes tested?

SQL file tests in array_compact.sql run via CometSqlFileTestSuite. All tests pass.

The CometArrayCompact serde was incorrectly marked as Incompatible(None).
On Spark 3.x, tests pass without allowIncompatible, confirming it matches
Spark behavior. On Spark 4.0, ArrayCompact is a RuntimeReplaceable that
rewrites to ArrayFilter before reaching this serde.

Also expands SQL file test coverage to include string, double, and nested
array element types, and upgrades existing tests from spark_answer_only
to query mode to assert native execution.
@andygrove andygrove changed the title Mark array_compact as Compatible and improve test coverage feat: Mark array_compact as Compatible and improve test coverage Apr 2, 2026
@andygrove andygrove marked this pull request as draft April 2, 2026 22:59
Spark 4.0 wraps ArrayCompact's replacement in KnownNotContainsNull,
a TaggingExpression that marks containsNull=false in the schema but
has no runtime effect. Pass through to the child expression so that
array_compact runs natively on Spark 4.0 instead of falling back.
@andygrove andygrove marked this pull request as ready for review April 3, 2026 01:06
object CometArrayCompact extends CometExpressionSerde[Expression] {

override def getSupportLevel(expr: Expression): SupportLevel = Incompatible(None)
override def getSupportLevel(expr: Expression): SupportLevel = Compatible()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove getSupportLevel entirely

@kazuyukitanimura
Copy link
Copy Markdown
Contributor

I think we will need #3796 as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants