NIFI-15568: Fix Iceberg timestamp handling and add S3 storage class support#10877
NirYanay2005 wants to merge 2 commits into apache:main
Conversation
exceptionfactory left a comment
Thanks for submitting the revised pull request @NirYanay2005.
Please review the pull request instructions and run a build with the contrib-check profile enabled to correct formatting issues such as missing license headers.
Please note that each commit must also be signed with a key associated with your GitHub profile for verification.
Force-pushed from 44af491 to d5fcca9
|
I added a key to my commits and fixed all contrib-check problems.
Thanks @NirYanay2005, the initial commit is now signed, but the subsequent commits are not. Can you squash all commits on this branch to ensure they are all signed?
Removed unnecessary code that already existed in main. Added licenses and formatting. Fixed all contrib-check issues.
Force-pushed from 09ed9a8 to 7bfa0af
Ok, I squashed all commits.
exceptionfactory left a comment
Thanks for making the initial adjustments @NirYanay2005.
The new Storage Class property looks good. If you are interested in getting that merged more quickly, it would be worth breaking that out to a separate Jira issue and pull request.
Regarding the Record field conversion, the problem makes sense, but the proposed solution does not appear to be the best way forward. Introducing a new IcebergRecordConverter in the Parquet module does not seem like the right location for shared Record formatting, although that is worth further consideration. More importantly, running each Record through a conversion process places additional overhead on the entire process.
Instead of this approach, constructing the Iceberg Record object as needed in the PutIcebergRecord Processor seems like the optimal place for changes. The initial implementation of the delegating Record depends heavily on NiFi Record field conversion. That would be the first potential place to make a change for handling type conversion in an optimal way.
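As a rough illustration of the field-level conversion discussed above (the class and method names here are hypothetical, not NiFi's actual API), converting a java.sql.Timestamp to the java.time.LocalDateTime that Iceberg expects for TIMESTAMP columns might look like:

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;

// Hypothetical helper sketch: convert a raw field value to the
// LocalDateTime representation Iceberg expects, only when the value
// is a java.sql.Timestamp; all other values pass through unchanged.
public class TimestampFieldConverter {

    public static Object convertFieldValue(final Object rawValue) {
        if (rawValue instanceof Timestamp) {
            // Timestamp.toLocalDateTime() preserves nanosecond precision
            return ((Timestamp) rawValue).toLocalDateTime();
        }
        return rawValue;
    }

    public static void main(final String[] args) {
        final Timestamp ts = Timestamp.valueOf("2024-06-01 10:15:30.123456");
        final Object converted = convertFieldValue(ts);
        System.out.println(converted.getClass().getSimpleName());
        System.out.println(converted);
    }
}
```

Performing the conversion lazily at field access time, rather than rewriting every Record up front, is what avoids the whole-record conversion overhead mentioned above.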
If you are willing to work through an alternative approach, I can review options. Alternatively, I may take a closer look at the problem and could propose an alternative solution.
Thanks again for working on these issues and feel free to follow up on how you would like to proceed.
About the
Thanks for the reply. I recommend tracing through
I changed the implementation entirely. I moved it to
exceptionfactory left a comment
Thanks for adjusting the direction @NirYanay2005. This looks to be on a good track. Regarding the InternalRecordWrapper, is that still needed? Are there any options to avoid the RecordWrapper in ParquetPartitionedWriter?
I tried without it and it failed, as the conversion to a long is needed after the conversion to LocalDateTime, but at a different phase.
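For context on why that second, long-valued phase exists: Iceberg represents timestamp values internally as microseconds since the epoch, so the LocalDateTime produced for the record field must later be reduced to a long when partition keys are computed. A minimal sketch of that second conversion, under the assumption of microsecond epoch encoding (the class and method names are illustrative, not Iceberg's internal API):

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Illustrative sketch of the second conversion phase: reduce a
// LocalDateTime to microseconds since 1970-01-01T00:00:00 UTC,
// the long form assumed here for partition key computation.
public class TimestampMicros {

    static final LocalDateTime EPOCH = LocalDateTime.ofEpochSecond(0, 0, ZoneOffset.UTC);

    static long toMicros(final LocalDateTime value) {
        // whole microseconds between the epoch and the given value
        return ChronoUnit.MICROS.between(EPOCH, value);
    }

    public static void main(final String[] args) {
        final LocalDateTime oneSecondAfterEpoch = LocalDateTime.of(1970, 1, 1, 0, 0, 1);
        System.out.println(toMicros(oneSecondAfterEpoch)); // 1000000
    }
}
```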
Summary
NIFI-15568
This change improves Apache Iceberg integration in NiFi by addressing two related issues:
- Adds a Storage Class property to the S3IcebergFileIOProvider, which is required for certain on-prem or S3-compatible object stores.
- Converts java.sql.Timestamp values to java.time.LocalDateTime, including correct handling for nested records, collections, and partitioned tables.

The timestamp fix ensures compatibility with Iceberg's internal expectations and allows writes to succeed when timestamp columns are used as partition keys.
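A simplified sketch of the recursive conversion described in the summary (illustrative only, using plain maps and lists in place of NiFi's Record and collection types):

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch, not the NiFi implementation: recursively replace
// java.sql.Timestamp values with java.time.LocalDateTime, descending into
// nested maps (records) and lists (collections).
public class DeepTimestampConversion {

    @SuppressWarnings("unchecked")
    static Object convert(final Object value) {
        if (value instanceof Timestamp) {
            return ((Timestamp) value).toLocalDateTime();
        }
        if (value instanceof Map) {
            return ((Map<String, Object>) value).entrySet().stream()
                    .collect(Collectors.toMap(Map.Entry::getKey, e -> convert(e.getValue())));
        }
        if (value instanceof List) {
            return ((List<Object>) value).stream()
                    .map(DeepTimestampConversion::convert)
                    .collect(Collectors.toList());
        }
        return value;
    }

    @SuppressWarnings("unchecked")
    public static void main(final String[] args) {
        final Map<String, Object> nested = new HashMap<>();
        nested.put("created", Timestamp.valueOf("2024-06-01 10:15:30"));
        final Map<String, Object> record = new HashMap<>();
        record.put("id", 1);
        record.put("nested", nested);

        final Map<String, Object> converted = (Map<String, Object>) convert(record);
        final Map<String, Object> convertedNested = (Map<String, Object>) converted.get("nested");
        System.out.println(convertedNested.get("created") instanceof LocalDateTime); // true
    }
}
```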
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
- Apache NiFi Jira issue created: NIFI-15568

Pull Request Tracking
- Pull Request title starts with NIFI-15568
- Commits have Verified status

Pull Request Formatting
- Pull Request based on current revision of the main branch

Verification
Please indicate the verification steps performed prior to pull request creation.

Build
- ./mvnw clean install -P contrib-check

Licensing
- LICENSE and NOTICE files reviewed

Documentation