Docs: Add Hive Metastore schema validation warnings for schema evolution with Hive catalog #15814

Open
jackylee-ch wants to merge 2 commits into apache:main from jackylee-ch:docs-hive-catalog-schema-evolution-warning

@jackylee-ch
Contributor

When using a Hive catalog, schema evolution operations that change column positions — such as ALTER TABLE ... DROP COLUMN (non-last column) and ALTER TABLE ... ALTER COLUMN ... FIRST/AFTER (reorder) — fail with InvalidOperationException from MetaStoreUtils#throwExceptionIfIncompatibleColTypeChange. This is because the Hive Metastore validates schema changes by comparing column types positionally (by index, not by name), controlled by hive.metastore.disallow.incompatible.col.type.changes (default true).

This limitation is not documented anywhere in the Iceberg docs, though the Iceberg test suite itself works around it by setting METASTORE_DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES=false (TestHiveMetastore.java:269), and a test comment (HiveTableTest.java:283-287) explicitly describes the issue.

This PR adds `!!! warning` admonition blocks to:

  • spark-ddl.md — under the DROP COLUMN section (full explanation, workaround, and trade-off) and the ALTER COLUMN reorder section (with cross-reference)
  • flink-ddl.md — under the Hive catalog configuration section (engine-agnostic warning)

…nd REORDER

When using a Hive catalog, ALTER TABLE DROP COLUMN (non-last column) and
ALTER COLUMN REORDER fail because the Hive Metastore validates schema
changes by comparing column types positionally. Dropping a middle column
shifts subsequent columns, causing HMS to reject the change as an
incompatible type change via MetaStoreUtils#throwExceptionIfIncompatibleColTypeChange.

Add warning admonitions to spark-ddl.md (DROP COLUMN and REORDER sections)
and flink-ddl.md (Hive catalog section) documenting the limitation,
workaround (hive.metastore.disallow.incompatible.col.type.changes=false),
and trade-off (Hive engine can no longer read the table).
@github-actions github-actions bot added the docs label Mar 28, 2026

To work around this, disable the HMS schema compatibility check by setting
`hive.metastore.disallow.incompatible.col.type.changes=false` in `hive-site.xml`, or by passing
`--conf spark.hadoop.hive.metastore.disallow.incompatible.col.type.changes=false` when starting Spark.
Member

@pan3793 pan3793 Mar 29, 2026

I can't recall the details, but I remember that overriding this config on the client side does not work on some HMS versions.

Contributor Author

Hm, I have tried it with Hive 2.3.9 and 3.1.3. Do you remember which HMS version failed for you?

Member

Did you test with an embedded HMS or a remote HMS? IIRC, Hive 2.3.9 with a remote HMS does not work. I will contact our Hive team and share the result today.

Contributor Author

I was testing with an embedded HMS. I just asked the colleague in charge of Hive: this should be configured on the metastore side, unless the user runs an embedded HMS as I did.

I will update the doc, thanks.

Member

Back here: I got a response that overriding `hive.metastore.disallow.incompatible.col.type.changes` from the client side requires HIVE-17832 and HIVE-17942.

Contributor Author

I have checked a remote HMS at version 3.1.3, which contains HIVE-17832 and HIVE-17942, and client-side overriding still doesn't work. Thus, I updated the documentation as follows.
- Remote HMS: Set this property in the HMS server's hive-site.xml.
- Embedded HMS: Add the equivalent property to the Hive catalog configuration.
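
As a rough illustration of the two placements above, the following Python sketch renders the property as a `hive-site.xml` `<property>` element (for the remote HMS server) and as the Spark `--conf` flag quoted in the reviewed doc text (which, per this thread, only takes effect with an embedded HMS). The helper names `hive_site_property` and `spark_conf_flag` are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Property name taken from the PR description.
PROPERTY = "hive.metastore.disallow.incompatible.col.type.changes"


def hive_site_property(name, value):
    """Render one <property> element suitable for pasting into hive-site.xml."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


def spark_conf_flag(name, value):
    """Render the Spark CLI flag; the spark.hadoop. prefix follows the doc text under review."""
    return f"--conf spark.hadoop.{name}={value}"


print(hive_site_property(PROPERTY, "false"))
print(spark_conf_flag(PROPERTY, "false"))
```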

Member

you should apply these patches to the HMS client ...

Contributor Author

> you should apply these patches to the HMS client ...

Oh, got it. I will double-check with Hive client 3.x.

Contributor Author

I have tested with Flink 1.20 and Hive client 3.1.3, and it doesn't work.

The HMS client also needs to actively call setMetaConf for the override to take effect. However, none of Spark, Flink, or Iceberg currently offers this kind of operation, so configuring it directly in the job will not work.
