Docs: Add Hive Metastore schema validation warnings for schema evolution with Hive catalog #15814
…nd REORDER

When using a Hive catalog, ALTER TABLE DROP COLUMN (non-last column) and ALTER COLUMN REORDER fail because the Hive Metastore validates schema changes by comparing column types positionally. Dropping a middle column shifts subsequent columns, causing HMS to reject the change as an incompatible type change via MetaStoreUtils#throwExceptionIfIncompatibleColTypeChange.

Add warning admonitions to spark-ddl.md (DROP COLUMN and REORDER sections) and flink-ddl.md (Hive catalog section) documenting the limitation, workaround (hive.metastore.disallow.incompatible.col.type.changes=false), and trade-off (Hive engine can no longer read the table).
docs/docs/spark-ddl.md
Outdated
To work around this, disable the HMS schema compatibility check by setting
`hive.metastore.disallow.incompatible.col.type.changes=false` in `hive-site.xml`, or by passing
`--conf spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes=false` when starting Spark.
I cannot recall the details, but I remember that overriding this config on the client side does not work on some HMS versions.
Hm, I have tried it with Hive 2.3.9 and 3.1.3. Do you remember which HMS version failed when you tried it?
Did you test with an embedded HMS or a remote HMS? IIRC, Hive 2.3.9 with a remote HMS does not work. I will contact our Hive team and sync the result today.
I was testing with an embedded HMS. I just asked the colleague in charge of Hive: this property should be configured on the metastore side, unless the user runs an embedded HMS as I did.
I will update the doc, thanks.
Back here: I got a response that overriding hive.metastore.disallow.incompatible.col.type.changes on the client side requires HIVE-17832 and HIVE-17942.
I have checked a remote HMS at version 3.1.3, which contains HIVE-17832 and HIVE-17942, and the client-side override still does not work. I have therefore updated the documentation as follows.
- Remote HMS: Set this property in the HMS server's hive-site.xml.
- Embedded HMS: Add the equivalent property to the Hive catalog configuration.
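For the remote HMS case, one way to confirm the server-side setting is to inspect the HMS server's hive-site.xml directly. The following is a minimal sketch, not part of the PR: the helper name `check_disabled` and the sample file content are hypothetical, and it only looks at the first definition of the property in the file.

```python
import xml.etree.ElementTree as ET

PROPERTY = "hive.metastore.disallow.incompatible.col.type.changes"


def check_disabled(hive_site_xml: str) -> bool:
    """Return True if the given hive-site.xml content sets the check to false.

    Hypothetical helper for illustration; it inspects only the first
    definition of the property that it finds.
    """
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip().lower()
        if name == PROPERTY:
            return value == "false"
    # Property absent: the default (true) applies, so the check stays enabled.
    return False


# Sample content in the usual hive-site.xml <property> layout.
SAMPLE = """<configuration>
  <property>
    <name>hive.metastore.disallow.incompatible.col.type.changes</name>
    <value>false</value>
  </property>
</configuration>"""

print(check_disabled(SAMPLE))
print(check_disabled("<configuration></configuration>"))
```

Note that this only checks the file content; a running HMS has to be restarted (or otherwise reload its configuration) before a change in hive-site.xml applies.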
you should apply these patches to the HMS client ...
> you should apply these patches to the HMS client ...

Oh, got it. I will double-check with Hive client 3.x.
I have tested with Flink 1.20 and Hive client 3.1.3, and it does not work.
The HMS client also needs to actively call setMetaConf for the override to take effect. However, none of Spark, Flink, or Iceberg currently offers this kind of operation, so configuring it directly in the job will not work.
When using a Hive catalog, schema evolution operations that change column positions — such as ALTER TABLE ... DROP COLUMN (non-last column) and ALTER TABLE ... ALTER COLUMN ... FIRST/AFTER (reorder) — fail with InvalidOperationException from MetaStoreUtils#throwExceptionIfIncompatibleColTypeChange. This is because the Hive Metastore validates schema changes by comparing column types positionally (by index, not by name), controlled by hive.metastore.disallow.incompatible.col.type.changes (default true).
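The positional comparison can be illustrated with a small model. This is a simplified sketch, not the HMS implementation: the function name `incompatible_columns` and the sample schema are hypothetical, and types are treated as compatible only when they are identical, whereas the real check may accept some type conversions.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (column name, column type)


def incompatible_columns(old_cols: List[Column], new_cols: List[Column]) -> List[str]:
    """Return names of new columns whose type differs from the old column
    at the same index. Columns are matched by position, never by name.

    Simplifying assumption: two types are compatible only if they are equal.
    """
    return [
        new_name
        # zip stops at the shorter schema, so only shared positions are compared
        for (_, old_type), (new_name, new_type) in zip(old_cols, new_cols)
        if old_type != new_type
    ]


old = [("id", "bigint"), ("name", "string"), ("ts", "timestamp")]

# Dropping the middle column shifts "ts" into the slot that held a string.
drop_middle = [("id", "bigint"), ("ts", "timestamp")]
# Dropping the last column leaves every remaining position unchanged.
drop_last = [("id", "bigint"), ("name", "string")]
# Reordering swaps the types at positions 0 and 1.
reorder = [("name", "string"), ("id", "bigint"), ("ts", "timestamp")]

print(incompatible_columns(old, drop_middle))  # ['ts']
print(incompatible_columns(old, drop_last))    # []
print(incompatible_columns(old, reorder))      # ['name', 'id']
```

In this model a non-empty result corresponds to the case where HMS rejects the change. It also shows why dropping the last column is unaffected: no remaining column changes position.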
This limitation is not documented anywhere in the Iceberg docs, though the Iceberg test suite itself works around it by setting METASTORE_DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES=false (TestHiveMetastore.java:269), and a test comment (HiveTableTest.java:283-287) explicitly describes the issue.
This PR adds !!! warning admonition blocks to: