Skip to content

fix: audit array_insert expression for correctness and test coverage#3890

Open
andygrove wants to merge 3 commits intoapache:mainfrom
andygrove:audit-array-insert
Open

fix: audit array_insert expression for correctness and test coverage#3890
andygrove wants to merge 3 commits intoapache:mainfrom
andygrove:audit-array-insert

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

N/A - proactive audit of existing expression.

Rationale for this change

Audited the array_insert expression against Spark 3.4.3, 3.5.8, and 4.0.1 source code to verify correctness and identify test coverage gaps. Found a metadata bug and significant missing test coverage.

What changes are included in this PR?

Bug fix:

  • Fixed nullable() in ArrayInsert (Rust) to account for pos_expr nullability, matching Spark's first.nullable | second.nullable. Previously only checked src_array_expr, so if the position column was nullable but the array was not, the metadata incorrectly reported "not nullable".

Test coverage (expanded from 1 query to 27+ queries):

  • Column and literal arguments for int, string, boolean, double, float, long, short, and byte array types
  • NULL handling: null array, null position, null value, arrays with existing nulls
  • Positive out-of-bounds insertion with null padding (pos=5, pos=10)
  • Negative indices: in-range (-1 through -4) and out-of-bounds (-5, -10) with null padding
  • Special double values: NaN, Infinity, -Infinity, negative zero
  • Multibyte UTF-8 characters and empty strings
  • Legacy negative index mode (spark.sql.legacy.negativeIndexInArrayInsert=true) in a separate test file
  • Parquet dictionary encoding via ConfigMatrix

Documentation:

  • Added docs/source/contributor-guide/expression-audit-log.md to track audited expressions
  • Updated audit skill with Step 8 to maintain the audit log

How are these changes tested?

All new SQL file tests pass against both Spark and Comet:

./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite array_insert" -Dtest=none

Rust unit tests and clippy pass. Full build succeeds.

Fix nullable() metadata bug in ArrayInsert where only src_array_expr
was checked but not pos_expr, causing incorrect nullability reporting
when the position column is nullable. Expand SQL test coverage from
1 query to 27+ queries across multiple types and edge cases.
@andygrove
Copy link
Copy Markdown
Member Author

@kazuyukitanimura @martin-g This PR was created using the audit skill that you reviewed

under the License.
-->

# Expression Audit Log
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is important to keep track of which expressions have been audited and against which Spark versions. Maintaining a markdown doc is one approach. Another approach could be to add comments and/or annotations directly in the code in the serde handlers.

@andygrove andygrove marked this pull request as ready for review April 3, 2026 01:06
@martin-g
Copy link
Copy Markdown
Member

martin-g commented Apr 3, 2026

@kazuyukitanimura @martin-g This PR was created using the audit skill that you reviewed

Really cool!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants