feat: Add timestamp nanosecond primitive types #653
I chose `TypeId::kTimestampNs` over `TypeId::kTimestampNano` (Java uses `Nano`) to align with the spec. @evindj, please help review the timestamp parsing part when you have time; I changed the fractional-seconds handling a bit.
```cpp
template <>
int32_t HashLiteral<TypeId::kTimestampTzNs>(const Literal& literal) {
  return BucketUtils::HashLong(std::get<int64_t>(literal.value()));
}
```
According to the Iceberg V3 spec and the Java implementation (BucketTimestampNano.java), nanosecond timestamps must be converted to microseconds (divided by 1000) before hashing. This ensures that bucket partitioning is consistent between microsecond and nanosecond precision types for the same logical time.
Suggested change:

```cpp
return BucketUtils::HashLong(std::get<int64_t>(literal.value()) / 1000);
```

```cpp
std::string TransformUtil::HumanTimestampNs(int64_t timestamp_nanos) {
  auto tp = std::chrono::time_point<std::chrono::system_clock, std::chrono::seconds>{
      std::chrono::seconds(timestamp_nanos / kNanosPerSecond)};
  auto nanos = timestamp_nanos % kNanosPerSecond;
```
For negative timestamps (pre-1970), C++ integer division (`/`) truncates toward zero, so the modulo (`%`) result takes the sign of the dividend. This causes `ParseTimestampNs` and `HumanTimestampNs` to compute an incorrect base time point and a negative fractional part, breaking both string formatting and parsing.
For example, 1969-12-31T23:59:59.123456789 parses to -876543211 nanos. Passing this back here yields 0 for seconds and -876543211 for nanos, resulting in 1970-01-01T00:00:00.-876543211.
Consider using std::chrono::floor to handle the negative values correctly (note: the original microsecond HumanTimestamp and ParseTimestamp also suffer from this issue).
Good catch, fixed with some additional test cases.