Skip to content

feat: add GetTimestamp support via codegen dispatcher#4454

Open
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:feat/get-timestamp
Open

feat: add GetTimestamp support via codegen dispatcher#4454
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:feat/get-timestamp

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #.

Rationale for this change

Spark's GetTimestamp expression powers the format-based variants of to_timestamp(s, fmt), to_date(s, fmt), to_timestamp_ntz(s, fmt), and try_to_timestamp(s, fmt). Until now Comet had no serde for it, so any plan containing those calls fell back to Spark for the enclosing operator.

A native re-implementation would have to mirror Spark's full SimpleDateFormat / TimestampFormatter parsing semantics, including time parser policy (LEGACY/CORRECTED/EXCEPTION), ANSI failure modes, timezone handling, and locale awareness. The upstream datafusion-spark crate has no equivalent function, so a from-scratch native implementation would be a large project unlikely to be 100% Spark-compatible without significant follow-up work.

The closely-related ToUnixTimestamp already uses the codegen dispatcher (CometCodegenDispatch) to run Spark's own doGenCode inside the Comet pipeline. Wiring GetTimestamp the same way produces bit-exact results today and works uniformly across Spark 3.4, 3.5, and 4.0.

What changes are included in this PR?

  • Adds object CometGetTimestamp extends CometCodegenDispatch[GetTimestamp] in spark/src/main/scala/org/apache/comet/serde/datetime.scala.
  • Registers classOf[GetTimestamp] -> CometGetTimestamp in the temporalExpressions map in QueryPlanSerde.scala.
  • Adds two SQL file tests:
    • spark/src/test/resources/sql-tests/expressions/datetime/get_timestamp.sql covers to_timestamp(s, fmt), to_date(s, fmt), to_timestamp_ntz(s, fmt), try_to_timestamp(s, fmt), NULLs, parse errors, column references, literals, and column-as-format.
    • spark/src/test/resources/sql-tests/expressions/datetime/get_timestamp_ansi.sql covers ANSI mode error throwing with expect_error(CANNOT_PARSE_TIMESTAMP) and the non-throwing try_to_timestamp fallback.

The implement-comet-expression skill was used to scaffold the implementation.

How are these changes tested?

./mvnw test -Pspark-3.5 -Dtest=none -Dsuites="org.apache.comet.CometSqlFileTestSuite get_timestamp" -DwildcardSuites=none

Both new test files pass. The existing to_timestamp_*, try_to_timestamp_*, and related SQL file tests continue to pass under the same profile (12 tests total).

Routes Spark's GetTimestamp expression through the existing codegen
dispatcher so that to_timestamp(s, fmt), to_date(s, fmt),
to_timestamp_ntz(s, fmt), and try_to_timestamp(s, fmt) execute natively
in Comet instead of falling back to Spark.

GetTimestamp depends on Spark's full SimpleDateFormat / TimestampFormatter
parsing semantics, including time parser policies, ANSI mode, timezone
handling, and locale awareness. datafusion-spark has no equivalent
function. The codegen dispatcher runs Spark's own doGenCode inside the
Comet pipeline, which guarantees identical behavior across Spark 3.4,
3.5, and 4.0.
@andygrove andygrove marked this pull request as ready for review May 27, 2026 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant