added support for MapFromEntries #21720

Open
athlcode wants to merge 4 commits into apache:main from athlcode:support/MapFromEntries

athlcode commented Apr 18, 2026

Which issue does this PR close?

Closes # (none: prerequisite for apache/datafusion-comet#2706; follow-up to #17779 and #19274).

Rationale for this change

The existing Spark map_from_entries / map_from_arrays UDFs silently keep the last occurrence of a duplicate key. This hardcodes the LAST_WIN policy of Spark's spark.sql.mapKeyDedupPolicy. Spark's default is EXCEPTION, and Spark 4 raises SparkRuntimeException with error class DUPLICATED_MAP_KEY. Until these UDFs match the default, downstream engines (e.g. datafusion-comet) have to fall back to Spark even for the common case.

What changes are included in this PR?

  1. In map_deduplicate_keys (utils.rs), raise [DUPLICATED_MAP_KEY] Duplicate map key {key} was found ... when a duplicate is encountered, matching Spark's default behavior and error class. The LAST_WIN branch is removed along with its TODO comment.
  2. Update the two affected sqllogictest files to assert the new DUPLICATED_MAP_KEY error instead of the previous last-wins output.

No config option or enum is introduced at this stage. map_from_entries.rs, map_from_arrays.rs, and config.rs are untouched.
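To illustrate the behavior described in item 1, here is a minimal standalone sketch of a fail-on-duplicate check. It is not the code in utils.rs: the function name `build_map_strict`, the `Vec<(K, V)>` representation of a row's entries, and the `String` error type are all assumptions made for illustration (the real map_deduplicate_keys works on Arrow arrays and returns a DataFusion error). The message text stops at "was found" because the remainder is elided in the description above.

```rust
use std::collections::HashSet;
use std::fmt::Display;
use std::hash::Hash;

/// Hypothetical helper: validates that one row's map entries have unique keys.
/// Returns the entries unchanged on success, or an error on the first
/// duplicate key, mirroring Spark's EXCEPTION dedup policy.
fn build_map_strict<K, V>(entries: Vec<(K, V)>) -> Result<Vec<(K, V)>, String>
where
    K: Eq + Hash + Clone + Display,
{
    let mut seen: HashSet<K> = HashSet::with_capacity(entries.len());
    for (key, _) in &entries {
        // `insert` returns false when the key was already present.
        if !seen.insert(key.clone()) {
            return Err(format!(
                "[DUPLICATED_MAP_KEY] Duplicate map key {key} was found"
            ));
        }
    }
    Ok(entries)
}
```

Under the previous last-wins behavior, the entries `[("a", 1), ("a", 2)]` would have produced a map holding only `a -> 2`; with a check like the one sketched here, the same input yields an error instead.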

Are these changes tested?

Yes. The duplicate-key assertions in sqllogictest/test_files/spark/map/map_from_entries.slt and sqllogictest/test_files/spark/map/map_from_arrays.slt are changed from expecting a result to expecting a query error.

Are there any user-facing changes?

Yes, a behavior change: duplicate keys now raise [DUPLICATED_MAP_KEY] under the default policy instead of silently collapsing to the last occurrence. This aligns with Spark's documented default. No new config keys, no API changes.

@github-actions github-actions bot added common Related to common crate spark labels Apr 18, 2026
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) and removed common Related to common crate labels Apr 18, 2026
Labels

spark sqllogictest SQL Logic Tests (.slt)
