feat: support NullType in row-to-Arrow conversion and shuffle by mbutrovich · Pull Request #4460 · apache/datafusion-comet

mbutrovich · 2026-05-27T21:34:23Z

Which issue does this PR close?

Closes #4457.

Rationale for this change

NullType columns currently break Comet at the row-to-Arrow boundary. Utils.toArrowType throws UnsupportedOperationException for NullType, which surfaces in two places:

CometLocalTableScanExec when a LocalTableScan contains a NullType column. This is the case reported in #4457 ("Queries with NullType aggregate fails when native LocalTableScanExec is enabled"), e.g. SELECT max(col) FROM VALUES (NULL), (NULL) AS t(col).
CometShuffleExchangeExec when a Spark LocalTableScanExec with a NullType column feeds a Comet shuffle.

NullType is well-defined in Arrow (ArrowType.Null) and the Spark ArrowWriter already has a NullWriter case, so the right fix is to support it end-to-end rather than fall back. This PR is an alternative to #4458, which adds a LocalTableScanExec-only fallback and leaves the shuffle path broken.
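The failure mode and the fix can be sketched in isolation. The snippet below is a minimal illustration, not Comet's code: SparkType, ArrowType, and to_arrow_type are hypothetical stand-ins for Spark's DataType, Arrow's ArrowType, and Utils.toArrowType. It assumes the converter recurses into nested types, so a NullType at any depth lands on the same match arm.

```rust
// Hypothetical, simplified stand-ins for Spark's DataType and Arrow's ArrowType.
// These are NOT the Comet or Arrow definitions.
#[derive(Debug, Clone, PartialEq)]
enum SparkType {
    Null,
    Int,
    Str,
    Interval, // stands in for any type the converter does not support
    Array(Box<SparkType>),
    Struct(Vec<SparkType>),
}

#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Null,
    Int32,
    Utf8,
    List(Box<ArrowType>),
    Struct(Vec<ArrowType>),
}

// Maps a Spark-like type to an Arrow-like type, recursing into nested types.
// Without the Null arm, a NullType anywhere in the tree would fall through to
// the error arm, which mirrors the UnsupportedOperationException described above.
fn to_arrow_type(t: &SparkType) -> Result<ArrowType, String> {
    match t {
        SparkType::Null => Ok(ArrowType::Null),
        SparkType::Int => Ok(ArrowType::Int32),
        SparkType::Str => Ok(ArrowType::Utf8),
        SparkType::Array(elem) => Ok(ArrowType::List(Box::new(to_arrow_type(elem)?))),
        SparkType::Struct(fields) => {
            // Any unsupported field makes the whole struct unsupported.
            let mapped: Result<Vec<_>, _> = fields.iter().map(to_arrow_type).collect();
            Ok(ArrowType::Struct(mapped?))
        }
        other => Err(format!("unsupported type: {:?}", other)),
    }
}
```

Under these assumptions, to_arrow_type(&SparkType::Array(Box::new(SparkType::Null))) succeeds, while SparkType::Interval still returns an error.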

What changes are included in this PR?

Utils.toArrowType maps NullType to ArrowType.Null.
CometShuffleExchangeExec.supportedSerializableDataType accepts NullType for both native and columnar shuffle.
Native shuffle row reader (native/shuffle/src/spark_unsafe/row.rs) handles NullType.
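
As a rough sketch of what handling NullType in a row-to-column reader can look like: an all-null column carries no values, so it only needs a row count. Everything below (ColumnBuilder, Row, append_row, column_len) is hypothetical and deliberately simplified. It is not the code in native/shuffle/src/spark_unsafe/row.rs.

```rust
// Hypothetical column builders; not the types used by Comet's native shuffle.
#[derive(Debug)]
enum ColumnBuilder {
    // An all-null column has no values to store, so a row count is enough.
    Null { len: usize },
    Int32 { values: Vec<Option<i32>> },
}

// Hypothetical row representation: one optional i32 per field.
type Row = Vec<Option<i32>>;

// Appends one row to the per-column builders.
// For a Null column the field is never read; only the length grows.
fn append_row(builders: &mut [ColumnBuilder], row: &Row) {
    for (builder, field) in builders.iter_mut().zip(row.iter()) {
        match builder {
            ColumnBuilder::Null { len } => *len += 1,
            ColumnBuilder::Int32 { values } => values.push(*field),
        }
    }
}

// Number of rows accumulated in a column so far.
fn column_len(builder: &ColumnBuilder) -> usize {
    match builder {
        ColumnBuilder::Null { len } => *len,
        ColumnBuilder::Int32 { values } => values.len(),
    }
}
```

In this sketch the Null builder stays aligned with the other columns without allocating any value storage, which is the property a NullType passthrough column relies on.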

How are these changes tested?

New regression tests:

CometExecSuite "CometLocalTableScanExec handles NullType nested in struct/array/map" covers NullType nested under StructType, ArrayType, and MapType through CometLocalTableScanExec.
CometColumnarShuffleSuite "columnar shuffle with NullType passthrough column" covers JVM-input columnar shuffle with a NullType column. Replaces the older "Fallback to Spark for unsupported input besides ordering" test, which asserted the previous fallback behavior.
CometNativeShuffleSuite "native shuffle with NullType passthrough column" covers native shuffle with a Comet LocalTableScan source containing a NullType column. Gated on spark.comet.exec.localTableScan.enabled=true because native shuffle requires Comet input.

mbutrovich added 3 commits May 27, 2026 17:28

add NullType to toArrowType

3272105

add NullType to shuffles

cf8cd33

Remove changes unrelated to NullType and LocalTableScanExec.

0ed96ba

mbutrovich mentioned this pull request May 27, 2026

fix: fallback NullType LocalTableScanExec to Spark #4458

Closed

mbutrovich added 2 commits May 27, 2026 17:40

Tighten tests.

68381a0

I missed a commit.

cd84c7a

mbutrovich requested a review from andygrove May 28, 2026 00:52

mbutrovich wants to merge 5 commits into apache:main from mbutrovich:fix_4457
