[lake/hudi] Introduce Hudi LakeCatalog to create table #3395
Open
fhan688 wants to merge 11 commits into
…Schema() and add isCreatingFlussTable in HudiLakeCatalog.createTable()
…talogDatabaseImpl in HudiLakeCatalog & use copied Configuration in HudiCatalogUtils
All tests pass. Please help review, thanks! @XuQianJin-Stars
Purpose
Linked issue: #3275
This PR introduces the Hudi LakeCatalog implementation, enabling Fluss to create tables in Hudi data lake storage. This aligns with the existing Paimon and Iceberg lake catalog support, completing the trio of supported lake formats for table creation.
Brief change log
New modules & classes (fluss-lake/fluss-lake-hudi):
- HudiLakeCatalog: Implements the LakeCatalog interface, supporting both HMS (Hive Metastore) and DFS (filesystem) catalog modes. Handles table creation with a schema compatibility check for crash-recovery idempotency, automatic database creation, and system column (__bucket, __offset, __timestamp) appending.
- FlussDataTypeToHudiDataType: Implements DataTypeVisitor to convert Fluss data types to Flink types (Hudi's type system). Handles LocalZonedTimestampType specially: maps to BIGINT under HMS mode, TIMESTAMP_WITH_LOCAL_TIME_ZONE under DFS mode.
- HudiConversions: Core conversion utility. Converts Fluss TablePath → Hudi ObjectPath, TableDescriptor → ResolvedSchema / Hudi table properties. Validates HUDI_UNSETTABLE_OPTIONS (6 protected options that Fluss auto-manages), checks system column name conflicts, and handles property prefix rewriting (hudi.xxx → xxx, others → fluss.xxx).
- HudiCatalogUtils: Factory for creating Hudi Catalog instances (HoodieHiveCatalog for HMS, HoodieCatalog for DFS). Uses a copied Configuration to avoid mutating the original.
Modifications to existing modules (fluss-flink/fluss-flink-common):
- LakeFlinkCatalog: Adds a HUDI branch in getLakeCatalog() with a new HudiCatalogFactory inner class that uses reflection to instantiate the Hudi catalog (mirroring the Iceberg pattern to avoid a compile-time dependency on hudi-flink-bundle).
- LakeTableFactory: Adds a HUDI branch in getLakeTableFactory() with getHudiFactory(), which reflectively loads HoodieTableFactory.
- HudiLakeStorage: Replaces the UnsupportedOperationException in createLakeCatalog() with new HudiLakeCatalog(hudiConfig) to wire the SPI path.
Key design decisions:
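For reviewers, the property prefix rewriting described for HudiConversions (hudi.xxx → xxx, others → fluss.xxx) can be illustrated with a small standalone sketch. This is not the code in this PR: the class name PrefixRewriteSketch, the method rewrite, and the sample keys are hypothetical, and the sketch assumes the rule is a plain prefix strip or prefix add with no further validation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the prefix rule; not the PR's HudiConversions code. */
public final class PrefixRewriteSketch {
    private static final String HUDI_PREFIX = "hudi.";
    private static final String FLUSS_PREFIX = "fluss.";

    private PrefixRewriteSketch() {}

    /** Strips "hudi." from Hudi keys and prepends "fluss." to all other keys. */
    public static Map<String, String> rewrite(Map<String, String> props) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            String key = e.getKey();
            if (key.startsWith(HUDI_PREFIX)) {
                // hudi.xxx -> xxx: passed through to Hudi under its native name
                out.put(key.substring(HUDI_PREFIX.length()), e.getValue());
            } else {
                // anything else -> fluss.xxx: kept, but namespaced
                out.put(FLUSS_PREFIX + key, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> in = new LinkedHashMap<>();
        in.put("hudi.example.option", "a"); // hypothetical key
        in.put("example.other", "b");       // hypothetical key
        System.out.println(rewrite(in));
    }
}
```

A LinkedHashMap is used only so the output order follows the input order, which makes the result easier to compare in a test.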
Tests
HudiLakeCatalogTest (15 test cases):
- testPropertyPrefixRewriting: verifies hudi.xxx → xxx and non-hudi → fluss.xxx prefix rewriting
- testCreatePrimaryKeyTable: PK table (MOR) creation with system columns and primary key
- testCreateLogTable: Log table (COW) creation with record key from customProperties
- testIsHudiSchemaCompatibleWithSameSchema: compatible schemas return true
- testIsHudiSchemaCompatibleWithDifferentColumnCount: different column count returns false
- testIsHudiSchemaCompatibleWithDifferentColumnName: different column name returns false
- testIsHudiSchemaCompatibleWithDifferentColumnType: different column type returns false
- testCreateDuplicateTableWithCompatibleSchema: duplicate creation with compatible schema is idempotent
- testCreateDuplicateTableWithIncompatibleSchema: duplicate creation with incompatible schema throws TableAlreadyExistException
- testUnsettableOptionInPropertiesThrowsException: protected option in properties throws InvalidConfigException
- testUnsettableOptionInCustomPropertiesThrowsException: protected option in customProperties throws InvalidConfigException
- testNonProtectedHudiOptionPassesValidation: non-protected option passes validation
- testSystemColumnBucketConflictThrowsException: __bucket conflict throws InvalidTableException
- testSystemColumnOffsetConflictThrowsException: __offset conflict throws InvalidTableException
- testSystemColumnTimestampConflictThrowsException: __timestamp conflict throws InvalidTableException
API and Format
No API or storage format changes. This PR only adds new implementations for the existing LakeCatalog and LakeStorage SPI interfaces.
Documentation
This is a new feature: Hudi lake catalog support for table creation. The Hudi integration guide will need documentation updates.
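Appendix for reviewers: the crash-recovery idempotency behavior covered by the testIsHudiSchemaCompatible* and testCreateDuplicateTable* cases can be sketched as follows. This is a simplified illustration, not the PR's implementation. It assumes compatibility means the same column count with matching names and types at each position, models types as plain strings, uses an in-memory map in place of a real catalog, and throws IllegalStateException as a stand-in for TableAlreadyExistException. All class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an idempotent createTable; not the PR's HudiLakeCatalog. */
public final class IdempotentCreateSketch {

    /** Simplified column: a name plus a type rendered as a string. */
    public static final class Column {
        final String name;
        final String type;

        public Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Assumed rule: same count, and same name and type at every position. */
    public static boolean isCompatible(List<Column> existing, List<Column> expected) {
        if (existing.size() != expected.size()) {
            return false;
        }
        for (int i = 0; i < existing.size(); i++) {
            Column a = existing.get(i);
            Column b = expected.get(i);
            if (!a.name.equals(b.name) || !a.type.equals(b.type)) {
                return false;
            }
        }
        return true;
    }

    // In-memory stand-in for the lake catalog's table registry.
    private final Map<String, List<Column>> tables = new HashMap<>();

    public void createTable(String path, List<Column> schema) {
        List<Column> existing = tables.get(path);
        if (existing == null) {
            tables.put(path, new ArrayList<>(schema));
            return;
        }
        if (!isCompatible(existing, schema)) {
            // Stand-in for TableAlreadyExistException in this sketch.
            throw new IllegalStateException("Table exists with a different schema: " + path);
        }
        // Compatible duplicate: treated as a retried create, so nothing to do.
    }

    public static void main(String[] args) {
        List<Column> schema =
                Arrays.asList(new Column("id", "INT"), new Column("__bucket", "INT"));
        IdempotentCreateSketch catalog = new IdempotentCreateSketch();
        catalog.createTable("db.t", schema);
        catalog.createTable("db.t", schema); // retry after a crash: no error
        try {
            catalog.createTable("db.t", Arrays.asList(new Column("id", "BIGINT")));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the design is that a create request replayed after a crash succeeds silently, while a genuinely conflicting table still surfaces as an error.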