[SPARK-55631][SQL] ALTER TABLE must invalidate cache for DSv2 tables#54427

Open
aokolnychyi wants to merge 1 commit into apache:master from aokolnychyi:spark-55631

Conversation

@aokolnychyi
Contributor

What changes were proposed in this pull request?

This PR makes ALTER TABLE commands invalidate the table cache for DSv2 tables.

Why are the changes needed?

These changes are needed so that changes made to a table are reflected correctly within the session.

Does this PR introduce any user-facing change?

ALTER TABLE commands will now invalidate the table cache.

How was this patch tested?

This PR comes with a test that would previously fail.

Was this patch authored or co-authored using generative AI tooling?

No.

table.catalog,
table.identifier,
a.changes,
recacheTable(table, includeTimeTravel = false)) :: Nil
Contributor Author

@aokolnychyi Feb 23, 2026


We could debate whether only some table changes should trigger a refresh, but that would be fragile and dangerous. For instance, Iceberg still allows reverting table state by setting a predefined table property (not something I encourage), and that must invalidate the cache in Spark too. To sum up, always recaching is the safest call, but I can be convinced otherwise.
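The policy argued for here can be sketched as a tiny, self-contained model (hypothetical names, not Spark's actual CacheManager API): any ALTER, even a property-only change, unconditionally drops the cached entry rather than guessing which changes are benign.

```scala
import scala.collection.mutable

// Toy model (hypothetical names, NOT Spark's CacheManager API) of the policy
// above: every table change invalidates the session cache unconditionally.
object AlterInvalidatesCache {
  // Hypothetical table metadata: column names plus table properties.
  final case class TableMeta(columns: Seq[String], props: Map[String, String])

  private val catalog = mutable.Map.empty[String, TableMeta] // source of truth
  private val cache   = mutable.Map.empty[String, TableMeta] // session cache

  def createTable(name: String, meta: TableMeta): Unit = catalog(name) = meta

  def cacheTable(name: String): Unit =
    catalog.get(name).foreach(meta => cache(name) = meta)

  def cached(name: String): Option[TableMeta] = cache.get(name)

  // Unconditional invalidation: e.g. Iceberg can revert table state through a
  // predefined table property, so even a "metadata-only" change may alter data.
  def alterTable(name: String)(change: TableMeta => TableMeta): Unit =
    catalog.get(name).foreach { meta =>
      catalog(name) = change(meta)
      cache.remove(name)
    }
}
```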

Member


Shall we add a SQL conf and a migration guide entry for the new behavior? There can be users who want the old behavior.

Contributor Author

@aokolnychyi Feb 23, 2026


The problem is that the old behavior breaks correctness. What if someone modifies a comment or a property, runs SHOW TABLE, and doesn't see the changes because the cache wasn't invalidated? Or worse: the schema is not updated, so you evolve the schema but it is not reflected?
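The stale-schema failure mode described here can be illustrated with a self-contained sketch (hypothetical names, not Spark APIs): if ALTER TABLE does not invalidate the session cache, a cached relation keeps serving the pre-ALTER schema even though the catalog has evolved.

```scala
// Toy demonstration (hypothetical names, NOT Spark APIs) of the old behavior:
// the catalog schema evolves, but a cached relation keeps the old schema.
object StaleSchemaDemo {
  final case class Relation(columns: Seq[String])

  var catalogSchema: Seq[String] = Seq("id", "data") // source of truth
  var cachedRelation: Option[Relation] = None        // session cache

  // Reads go through the cache when an entry exists.
  def select(): Seq[String] =
    cachedRelation.map(_.columns).getOrElse(catalogSchema)

  def cache(): Unit = cachedRelation = Some(Relation(catalogSchema))

  // Old behavior: evolve the schema in the catalog, leave the cache untouched.
  def alterAddColumnWithoutInvalidation(col: String): Unit =
    catalogSchema = catalogSchema :+ col
}
```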

Member


For behavior changes, we always add a flag, spark.sql.legacy.ABC. See https://spark.apache.org/docs/latest/sql-migration-guide.html for details. Most of the legacy behaviors don't make sense.
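For reference, such a flag would typically be declared in Spark's internal SQLConf following the existing spark.sql.legacy.* pattern; the conf name, doc text, and version below are hypothetical, not part of this PR:

```scala
// Hypothetical legacy flag sketch, following SQLConf's buildConf pattern.
val LEGACY_KEEP_DSV2_CACHE_ON_ALTER =
  buildConf("spark.sql.legacy.keepDataSourceV2CacheOnAlterTable")
    .internal()
    .doc("When true, restores the previous behavior where ALTER TABLE on a " +
      "DSv2 table did not invalidate the table cache.")
    .version("4.1.0") // hypothetical version
    .booleanConf
    .createWithDefault(false)
```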

Member


Note: this is a general suggestion. Since the previous DSv2 cache behavior is not gated behind a config, I am also fine with merging without one.

Member

@szehon-ho left a comment


It makes sense / is safer to invalidate, in my view. I guess this is for V2 (the PR title could reflect that).

@aokolnychyi aokolnychyi changed the title [SPARK-55631][SQL] ALTER TABLE must invalidate cache [SPARK-55631][SQL] ALTER TABLE must invalidate cache for DSv2 tables Feb 23, 2026

val result = sql(s"SELECT * FROM $t ORDER BY id")
assertCached(result)
checkAnswer(result, Seq(Row(1, "a", null), Row(2, "b", null)))
Contributor


What was the error message before this PR?

