
[SPARK-52729][SQL] Add MetadataOnlyTable and CREATE/ALTER VIEW support for DS v2 catalogs#51419

Open
cloud-fan wants to merge 56 commits into apache:master from cloud-fan:v1-v2

Conversation

@cloud-fan
Contributor

@cloud-fan cloud-fan commented Jul 9, 2025

What changes were proposed in this pull request?

This PR exposes a DS v2 API for metadata-only tables (read side), CREATE VIEW, and ALTER VIEW ... AS (write side) so that third-party v2 catalogs can participate in Spark's resolution and creation flows without reimplementing read/write themselves.

The headline API choice is a new RelationCatalog interface that owns the cross-cutting rules for catalogs that expose both tables and views. TableCatalog and ViewCatalog remain strict single-kind APIs; mixing the two requires RelationCatalog. Details in section 4.

1. Read path — MetadataOnlyTable:

  • New Table implementation that carries a TableInfo and delegates everything to it. Catalogs return it from TableCatalog.loadTable for data-source tables (Spark interprets the provider via V1 paths), and from RelationCatalog.loadRelation wrapping a ViewInfo for views (Spark expands the view text). Downstream consumers distinguish via getTableInfo() instanceof ViewInfo.
  • Analyzer.lookupTableOrView and RelationResolution.tryResolvePersistent detect MetadataOnlyTable and route through a new V1Table.toCatalogTable adapter to the existing V1 data-source / view machinery.
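The wrapper shape described above can be sketched without any Spark types. This is a minimal, standalone sketch in which MetadataOnlyTable, TableInfo, and ViewInfo mirror the PR's names but are stand-ins, not the real API:

```java
import java.util.Map;

// Stand-ins for the PR's DTOs: TableInfo carries generic metadata,
// ViewInfo is the typed subclass for views.
class TableInfo {
    final Map<String, String> properties;
    TableInfo(Map<String, String> properties) { this.properties = properties; }
}

class ViewInfo extends TableInfo {
    final String queryText;
    ViewInfo(Map<String, String> properties, String queryText) {
        super(properties);
        this.queryText = queryText;
    }
}

// A metadata-only table delegates everything to its TableInfo; downstream
// consumers discriminate views via `getTableInfo() instanceof ViewInfo`.
class MetadataOnlyTable {
    private final TableInfo info;
    MetadataOnlyTable(TableInfo info) { this.info = info; }
    TableInfo getTableInfo() { return info; }
    boolean isView() { return info instanceof ViewInfo; }
}
```

The point of the shape is that the wrapper itself implements no read/write; the engine inspects the carried info and routes to its own machinery.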

2. Shared DTO — TableInfo:

  • TableInfo.Builder gains convenience setters that write reserved keys into properties: withProvider, withLocation, withComment, withCollation, withOwner, withTableType, plus withSchema(StructType). The read side (MetadataOnlyTable) and the write side (createTable(ident, TableInfo)) use the same struct. View-specific fields live on a typed subclass (see section 3) so they are not encoded as string properties.
  • withProperties takes a defensive copy so convenience setters don't mutate the caller's map.
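The defensive-copy interaction can be sketched as follows. TableInfoBuilder, the PROP_* keys, and buildProperties are illustrative stand-ins, not the PR's actual signatures:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: withProperties copies the caller's map so the convenience
// setters, which write reserved keys into the same map, cannot mutate it.
class TableInfoBuilder {
    public static final String PROP_PROVIDER = "provider";
    public static final String PROP_COMMENT = "comment";

    private Map<String, String> properties = new HashMap<>();

    public TableInfoBuilder withProperties(Map<String, String> props) {
        this.properties = new HashMap<>(props);  // defensive copy
        return this;
    }

    public TableInfoBuilder withProvider(String provider) {
        properties.put(PROP_PROVIDER, provider);  // reserved key
        return this;
    }

    public TableInfoBuilder withComment(String comment) {
        properties.put(PROP_COMMENT, comment);    // reserved key
        return this;
    }

    public Map<String, String> buildProperties() {
        return Map.copyOf(properties);            // immutable snapshot
    }
}
```

Without the copy, `withProvider` after `withProperties` would silently insert a reserved key into the caller's own map.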

3. Typed view DTO — ViewInfo:

  • ViewInfo extends TableInfo and carries the view-specific fields that cannot be represented as string table properties: queryText, currentCatalog, currentNamespace (multi-part, never null; empty when no namespace was captured), sqlConfigs (unprefixed SQL config keys), schemaMode (BINDING / COMPENSATION / TYPE EVOLUTION / EVOLUTION), and queryColumnNames (mapping query output to the view's declared columns; empty in EVOLUTION mode).
  • ViewInfo.Builder extends TableInfo.BaseBuilder<Builder> and adds typed setters: withQueryText, withCurrentCatalog, withCurrentNamespace, withSqlConfigs, withSchemaMode, withQueryColumnNames. The inherited TableInfo.BaseBuilder setters (schema, properties, owner, comment, collation, etc.) are available on the same builder so view and table writes share one fluent API.
  • The ViewInfo constructor stamps PROP_TABLE_TYPE = TableSummary.VIEW_TABLE_TYPE into properties() so catalogs and generic viewers reading PROP_TABLE_TYPE from the properties bag (e.g. TableCatalog.listTableSummaries default impl, DESCRIBE) classify the entry as VIEW without requiring authors to remember withTableType(VIEW).
  • ViewInfo is the typed payload returned by ViewCatalog.loadView and accepted by createView / replaceView. It still extends TableInfo so a RelationCatalog can opt into the single-RPC perf path by returning a MetadataOnlyTable(ViewInfo) from loadRelation (see section 4); pure view-only catalogs never see TableInfo directly because the typed builder covers everything they construct.
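The reason the base builder is generic in its own subtype is so that inherited setters return the concrete view builder and can be chained with the typed view setters. A standalone sketch of that pattern (class names are stand-ins, not the PR's classes):

```java
import java.util.HashMap;
import java.util.Map;

// Self-typed ("recursive generic") builder: inherited setters return B,
// the concrete builder, so one fluent chain mixes base and view setters.
abstract class BaseBuilder<B extends BaseBuilder<B>> {
    protected final Map<String, String> properties = new HashMap<>();

    @SuppressWarnings("unchecked")
    public B withComment(String comment) {
        properties.put("comment", comment);
        return (B) this;  // returns the concrete builder type
    }
}

class ViewInfoBuilder extends BaseBuilder<ViewInfoBuilder> {
    private String queryText;

    public ViewInfoBuilder withQueryText(String text) {
        this.queryText = text;
        return this;
    }

    public String describe() {
        return queryText + " /* comment: " + properties.get("comment") + " */";
    }
}
```

Without the type parameter, `withComment(...)` would return the base type and the chain would lose access to `withQueryText`.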

4. API design — RelationCatalog, ViewCatalog, the orthogonal split:

This PR's API surface is shaped by two principles:

  • Orthogonal interfaces. Every TableCatalog method behaves as if views did not exist; every ViewCatalog method behaves as if tables did not exist. A pure TableCatalog impl and a pure ViewCatalog impl never need to know about the other kind.
  • Single identifier namespace. When a catalog exposes both, tables and views share one keyspace within a namespace; the same identifier cannot resolve to both at the same time.

These two together require a third interface for the combined case — the orthogonality forbids cross-type clauses on TableCatalog / ViewCatalog methods, and the shared namespace forces cross-type rules to exist somewhere. That somewhere is RelationCatalog.

  • ViewCatalog: plugin-facing API for views. listViews(namespace), loadView(ident) returning ViewInfo, createView(ident, ViewInfo), replaceView(ident, ViewInfo), dropView(ident), default viewExists(ident), default invalidateView(ident), default createOrReplaceView(ident, ViewInfo). The default createOrReplaceView impl tries replaceView and falls back to createView on NoSuchViewException (two RPCs, non-atomic across the pair); catalogs that can do an atomic upsert in a single transactional call should override to collapse to one RPC and make the swap atomic. No staging variant — replaceView is a single atomic-swap call. Class Javadoc reads as a strict view-only API and points mixed implementers at RelationCatalog.
  • TableCatalog: unchanged surface, but the loadTable Javadoc reverts to strictly tables-only (the perf opt-in moved off this method, see below). Cross-type clauses scattered through method docs are removed; class doc points mixed implementers at RelationCatalog.
  • RelationCatalog extends TableCatalog, ViewCatalog: the home for the combined case. Class Javadoc owns the two principles and two per-method contract tables — active rejection (5 write-side methods that throw on cross-type collision: createTable, renameTable, createView, createOrReplaceView, replaceView) and passive filtering (10 read / non-collision-mutation methods that behave as if the wrong kind doesn't exist). Catalogs that implement TableCatalog and ViewCatalog directly without RelationCatalog are rejected at Catalogs.load with a clear pointer at the right interface.
  • Single-RPC perf entry points on RelationCatalog:
    • loadRelation(ident) — the resolver's per-identifier read path. Returns a regular Table for a table or a MetadataOnlyTable wrapping a ViewInfo for a view; saves the cold-cache loadTable + loadView two-step. Crucially, this lives only on RelationCatalog — TableCatalog.loadTable is strictly tables-only — which means tableExists's default impl (loadTable(ident) != null) is correct under any catalog with no override required. The earlier shape (perf opt-in via loadTable returning MetadataOnlyTable(ViewInfo)) silently broke tableExists's default impl and was a foot-gun on three independent framework call sites.
    • listRelationSummaries(namespace) — unified listing of tables and views as TableSummary[] with kind preserved. Default impl unions listTableSummaries + listViews (two round trips); override to fetch in one.
  • Implementer ergonomics: RelationCatalog provides default impls of loadTable, loadView, tableExists, and viewExists that derive from loadRelation (a MetadataOnlyTable + ViewInfo discriminator routes the result to the right kind). A RelationCatalog author writes the read-side lookup once in loadRelation; the four kind-specific accessors come for free. They can still be overridden when a kind-specific path is materially cheaper than the unified one, but they're no longer required boilerplate.
  • Resolver wiring: Analyzer.lookupTableOrView and RelationResolution.tryResolvePersistent check RelationCatalog first and call loadRelation. Otherwise they fall through to the existing two-step (loadTable if TableCatalog, then loadView if ViewCatalog). The MISSING_CATALOG_ABILITY.VIEWS gate in the resolver and CheckViewReferences for CREATE / ALTER VIEW continues to fire for catalogs that aren't ViewCatalog (and therefore aren't RelationCatalog).
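The loadRelation-derived defaults described above can be sketched with hypothetical stand-in types (not the real connector interfaces):

```java
// Hypothetical stand-ins for the PR's types.
interface Relation {}
record TableRel(String name) implements Relation {}
record ViewRel(String name) implements Relation {}

interface RelationCatalogSketch {
    // The one read-side lookup an implementer writes; null when nothing is there.
    Relation loadRelation(String ident);

    // Kind-specific accessors come for free via an instanceof discriminator.
    default TableRel loadTable(String ident) {
        return loadRelation(ident) instanceof TableRel t ? t : null;  // views filtered out
    }

    default ViewRel loadView(String ident) {
        return loadRelation(ident) instanceof ViewRel v ? v : null;   // tables filtered out
    }

    // Contract-clean: a view can never satisfy tableExists.
    default boolean tableExists(String ident) { return loadTable(ident) != null; }
    default boolean viewExists(String ident) { return loadView(ident) != null; }
}
```

A catalog backed by one metadata RPC implements only `loadRelation`; the four defaults inherit the passive-filtering behavior automatically.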

5. Write path — DS v2 CREATE VIEW:

  • DataSourceV2Strategy routes CreateView(ResolvedIdentifier(catalog, ident), …) to CreateV2ViewExec(catalog: ViewCatalog, …), which uses an act-then-decode pattern to keep the happy path at one RPC: plain CREATE calls createView; CREATE OR REPLACE calls createOrReplaceView (one RPC for catalogs that override the default; two for those that don't). CREATE VIEW IF NOT EXISTS short-circuits via an upfront viewExists probe so the view body isn't analyzed when the view already exists (matches v1 CreateViewCommand.run). On ViewAlreadyExistsException from the catalog (cross-type collision in a RelationCatalog, or a race for plain CREATE / OR REPLACE), the exec decodes with one tableExists probe: if a table is at the identifier, plain CREATE surfaces TABLE_OR_VIEW_ALREADY_EXISTS and CREATE OR REPLACE surfaces EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE, while IF NOT EXISTS is a no-op (v1-parity); if a view is at the identifier, plain CREATE re-raises as viewAlreadyExists and IF NOT EXISTS is a no-op (lost the race for the same view). Because tableExists is now contract-clean (no perf-opt-in leak), the decoder gives the right error in every catalog shape.
  • The exec builds the ViewInfo via a V2ViewPreparation trait reusing v1 ViewHelper helpers (aliasPlan, sqlConfigsToProps) to populate a ViewInfo.Builder with the current session's captured catalog/namespace and SQL configs. Cyclic-reference detection and auto-generated-alias rejection run once at analysis time in CheckViewReferences (see section 7).
  • CreateView logical plan extends AnalysisOnlyCommand (same shape as V2CreateTableAsSelectPlan) so HandleSpecialCommand.markAsAnalyzed captures referredTempFunctions from AnalysisContext. The v1 rewriting path (ResolveSessionCatalogCreateViewCommand) is unchanged.
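The act-then-decode shape for plain CREATE VIEW can be sketched as follows. The outcome labels and names are illustrative, not Spark's actual error classes:

```java
import java.util.function.Supplier;

// Illustrative stand-in for the catalog-side collision exception.
class ViewAlreadyExistsException extends RuntimeException {}

class CreateViewDecoder {
    // Try the single-RPC write first; only on a collision probe once
    // (tableExists) to decide which error applies. Returns an outcome label.
    static String plainCreate(Runnable createView, Supplier<Boolean> tableExists) {
        try {
            createView.run();  // happy path: one RPC
            return "CREATED";
        } catch (ViewAlreadyExistsException e) {
            // One probe decodes the collision kind.
            return tableExists.get()
                ? "TABLE_OR_VIEW_ALREADY_EXISTS"   // cross-type collision
                : "VIEW_ALREADY_EXISTS";           // lost a race to the same view
        }
    }
}
```

The decoder only works if `tableExists` never claims a view, which is exactly the contract cleanliness argued for in section 4.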

6. Write path — DS v2 ALTER VIEW ... AS:

  • AlterViewAs logical plan also extends AnalysisOnlyCommand so referredTempFunctions is captured for the non-session path.
  • DataSourceV2Strategy routes AlterViewAs(ResolvedPersistentView(catalog, ident, _), …) to AlterV2ViewExec(catalog: ViewCatalog, …), which calls replaceView (the single atomic-swap entry point — no separate staging variant, since view REPLACE writes only metadata).
  • A V2AlterViewPreparation trait (extends V2ViewPreparation) calls catalog.loadView(ident) once and uses the result to preserve user TBLPROPERTIES, comment, collation, owner, and schema-binding mode when constructing the replacement ViewInfo. Session-scoped fields (SQL configs, query column names) are re-emitted by buildViewInfo() from the active SparkSession, matching v1 AlterViewAsCommand.alterPermanentView. A racing DDL between analysis and exec (the view dropped, or replaced with a non-view table in a RelationCatalog) surfaces NoSuchViewException / EXPECT_VIEW_NOT_TABLE rather than a stale-resolution error.
  • ResolvedViewIdentifier.unapply (in ResolveSessionCatalog) replaces its assert(isSessionCatalog) with an if isSessionCatalog guard so non-session ResolvedPersistentView plans fall through to the v2 strategy instead of tripping the assertion.

7. Post-analysis check — CheckViewReferences:

  • New rule wired into BaseSessionStateBuilder.extendedCheckRules. Rejects permanent views that reference temporary objects and rejects view bodies with auto-generated aliases for both CreateView and AlterViewAs (v2 paths). v1 CreateViewCommand / AlterViewAsCommand keep their existing exec-time safety net — Dataset-built commands can be constructed with isAnalyzed=true directly and bypass the analyzer's re-capture path.

8. Listing — SHOW TABLES / SHOW VIEWS:

  • TableCatalog.listTables returns table identifiers only — views (in a RelationCatalog) are listed separately via ViewCatalog.listViews. listTableSummaries's default impl enumerates via listTables + loadTable and returns one summary per table. This is an intentional v2 divergence from v1 SHOW TABLES, which includes both tables and views; RelationCatalog.listRelationSummaries is the unified entry point if a RelationCatalog wants to expose both in one call. Routing the SHOW TABLES SQL command through listRelationSummaries for v1-parity output is left as a follow-up so this PR's API surface stays narrowly scoped.
  • SHOW VIEWS on a non-session ViewCatalog is routed through a new ShowViewsExec that enumerates via ViewCatalog.listViews(namespace). ResolveSessionCatalog.ShowViews skips (via guard) for ViewCatalog catalogs so they fall through to this strategy; non-session, non-ViewCatalog catalogs still hit the existing MISSING_CATALOG_ABILITY.VIEWS rejection. v2 catalogs have no temp views, so the isTemporary column is always false (mirroring v1, which only sets it true for local/global temp views).
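The default two-round-trip union behind listRelationSummaries amounts to the following standalone sketch, with TableSummary reduced to a (name, kind) record:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the PR's TableSummary, kind preserved per entry.
record Summary(String name, String kind) {}

class ListingDefaults {
    // Default unified listing: union of the two kind-specific listings
    // (two round trips); a catalog can override to fetch both in one call.
    static List<Summary> listRelationSummaries(List<String> tables, List<String> views) {
        List<Summary> out = new ArrayList<>();
        for (String t : tables) out.add(new Summary(t, "TABLE"));
        for (String v : views) out.add(new Summary(v, "VIEW"));
        return out;
    }
}
```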

Why are the changes needed?

A v2 Table is not always backed by a connector that implements read/write. Catalogs like HMS and Unity Catalog store only metadata and rely on Spark to interpret the table provider as a data source or to execute the view SQL. Previously the only way to achieve that was a hack around V1Table, which leaks private v1 types into v2 connectors (example: https://github.com/unitycatalog/unitycatalog/blob/main/connectors/spark/src/main/scala/io/unitycatalog/spark/UCSingleCatalog.scala).

Separately, v2 catalogs had no public way to handle CREATE VIEW or ALTER VIEW. ResolveSessionCatalog rejected CREATE VIEW on any non-session catalog with MISSING_CATALOG_ABILITY.VIEWS, so third-party catalogs could not own view lifecycle at all.

The new ViewCatalog interface gives catalogs a clean view-shaped API, and the new RelationCatalog interface (extending both TableCatalog and ViewCatalog) is the single home for the cross-cutting rules of the combined case — orthogonality, the shared identifier namespace, the per-method active-rejection / passive-filtering tables, and the single-RPC perf entry points (loadRelation, listRelationSummaries). This separation keeps TableCatalog and ViewCatalog Javadocs as strict single-kind APIs (a connector author writing only one of them never sees cross-type clauses), and gives RelationCatalog implementers a single document with the full contract.

Does this PR introduce any user-facing change?

Yes, for connector developers:

  • Third-party v2 TableCatalog implementations can now return a MetadataOnlyTable from loadTable to delegate reads to Spark.
  • Third-party v2 connectors that expose only views implement just ViewCatalog (listViews / loadView / createView / replaceView / dropView, plus the default helpers). Connectors that expose both tables and views implement RelationCatalog (which extends both TableCatalog and ViewCatalog); implementing the two interfaces directly is rejected at catalog initialization. View text, schema, captured current catalog+namespace, SQL configs, and temp-object-reference rejection are handled the same way as for session-catalog views.

No SQL-level or user-visible behavior change for existing deployments.

Remaining work (follow-up PRs)

This PR covers the core read path, CREATE VIEW (all shapes), ALTER VIEW ... AS, DROP VIEW, and SHOW VIEWS. The following view-scoped plans for DS v2 catalogs are not yet supported and are tracked for follow-ups. Until the follow-ups land, each currently surfaces a clean UNSUPPORTED_FEATURE.TABLE_OPERATION error (wired up in DataSourceV2Strategy and pinned by tests in DataSourceV2MetadataOnlyViewSuite), so users get a meaningful message rather than a generic planner failure:

  • ALTER VIEW ... SET/UNSET TBLPROPERTIES — separate logical plans (SetViewProperties, UnsetViewProperties); need their own DataSourceV2Strategy cases backed by new TableChange routing.
  • ALTER VIEW ... RENAME TO — RenameTable at the logical level; needs v2 view awareness and the catalog-side rename semantics.
  • ALTER VIEW ... WITH SCHEMA BINDING — AlterViewSchemaBinding logical plan; needs the same treatment as ALTER VIEW AS (AnalysisOnlyCommand shape + v2 exec).
  • DESCRIBE / SHOW CREATE TABLE / SHOW TBLPROPERTIES / SHOW COLUMNS on v2 views — currently route through ResolvedViewIdentifier which only matches session-catalog views; the v2 equivalents need dedicated handling.
  • SHOW TABLES v1-parity output (include views) on a v2 catalog — TableCatalog.listTables now intentionally returns tables only; routing the SQL SHOW TABLES through RelationCatalog.listRelationSummaries for v1-parity is a separate piece of work.

How was this patch tested?

Two new test suites: DataSourceV2MetadataOnlyTableSuite (4 tests, table-side) and DataSourceV2MetadataOnlyViewSuite (56 tests, view-side + mixed-catalog).

DataSourceV2MetadataOnlyTableSuite — table side

  • File-source reads: non-partitioned and partitioned MetadataOnlyTable round-trip through SELECT / INSERT / INSERT OVERWRITE.
  • DESCRIBE TABLE EXTENDED: Name row shows the real catalog-qualified identifier, not the wrapper class.
  • Fully-qualified column references: 3-part catalog.schema.table.col references resolve correctly through a MetadataOnlyTable. The mechanism: V1Table.toCatalogTable sets CatalogTable.multipartIdentifier = [catalog, namespace…, table], and the SessionCatalog change in this PR makes getRelation prefer that over the hardcoded spark_catalog qualifier when wrapping in a SubqueryAlias. (1-part and 2-part references already worked via last-part suffix matching; the test pins both cases plus the 3-part case to make the alias-source change visible in the diff.)

DataSourceV2MetadataOnlyViewSuite — view side + mixed catalogs

Catalog fixtures: TestingRelationCatalog (RelationCatalog), TestingViewOnlyCatalog (pure ViewCatalog), TestingTableOnlyCatalog (pure TableCatalog).

  • Read path — view-text expansion with captured SQL configs; unqualified references resolved via single-part and multi-part captured currentCatalog + currentNamespace; pure ViewCatalog read path (no TableCatalog mixin) confirms the resolver's loadView fallback fires when loadTable is skipped.
  • V1Table.toCatalogTable(ViewInfo) round-trip — multi-part captured namespace flows through structurally (including parts containing dots, which would break a string-encoded path); absent-currentCatalog branch yields an empty viewCatalogAndNamespace.
  • CREATE VIEW — end-to-end CREATE; CREATE VIEW IF NOT EXISTS no-op vs. failure; CREATE OR REPLACE VIEW replacement (including cyclic-reference detection); too-few / too-many user-specified columns; rejection of references to temp function / temp view / temp variable; DEFAULT COLLATION propagation into ViewInfo; cross-type collision over a non-view table entry surfaces TABLE_OR_VIEW_ALREADY_EXISTS (plain) / EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE (OR REPLACE) / no-op (IF NOT EXISTS) — v1-parity; PROP_OWNER stamped on stored TableInfo; rejection on a TableCatalog-only catalog with MISSING_CATALOG_ABILITY.VIEWS.
  • ALTER VIEW … AS — body replacement; missing-view fails at analysis; rejection of references to temp function / temp view / temp variable; preserves user-set TBLPROPERTIES, PROP_OWNER, and SCHEMA EVOLUTION binding mode (v1-parity); re-captures the current session's SQL configs; cyclic-reference detection; rejection on a TableCatalog-only catalog with MISSING_CATALOG_ABILITY.VIEWS; pure ViewCatalog flow; CREATE OR REPLACE VIEW whose new body references a nonexistent table fails at the right phase.
  • Multi-level namespace handling — cyclic-reference detection distinguishes views across multi-level namespaces (CREATE and ALTER); error messages render the full multi-level namespace; temp-object reference errors render the full multi-level namespace.
  • DROP VIEW — drops a view on ViewCatalog; DROP VIEW IF EXISTS is a no-op on missing; rejection on a non-view table entry (v1-parity, EXPECT_VIEW_NOT_TABLE); rejection on a non-ViewCatalog catalog.
  • SHOW TABLES / SHOW VIEWS — SHOW TABLES on a v2 catalog returns tables only; SHOW VIEWS returns views only (isTemporary=false throughout for v2); SHOW VIEWS … LIKE filters by name; rejection on a non-ViewCatalog catalog.
  • Unsupported v2 view DDL / inspection pinning — ALTER VIEW SET/UNSET TBLPROPERTIES, ALTER VIEW WITH SCHEMA, ALTER VIEW RENAME TO, SHOW CREATE TABLE, SHOW TBLPROPERTIES, SHOW COLUMNS, DESCRIBE TABLE, REFRESH TABLE, ANALYZE TABLE, ANALYZE TABLE … FOR COLUMNS against a v2 view all surface UNSUPPORTED_FEATURE.TABLE_OPERATION; DESCRIBE TABLE … COLUMN surfaces a clean AnalysisException. Pins the current failure mode so a future regression to a generic planner error is caught in the diff.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)

Contributor Author

I think the current view implementation which stores the original SQL text and a bunch of context is too convoluted to put into the public DS v2 API. It's better if the view text is context-independent. We are going to improve it in #51410

Before the improvement is done, we only allow reading DS v2 views that have context-independent SQL text.

Contributor

@aokolnychyi aokolnychyi Jul 15, 2025

Don't we store the current catalog and namespace in the view metadata? Do we expect the connectors to modify the view SQL text?

Contributor Author

My proposal is to let Spark modify the view text before saving it into the catalog, so that the catalog does not need to store the current catalog/namespace.

Contributor

So all identifiers in the view text will always include the catalog name as the first name part and the table name as the last name part? How hard will it be to modify the original SQL text? Will it cause any surprises to the users if the original and the persisted SQL text differ?

Contributor Author

Hive already did so, and its views have both "view_text" and "original_view_text" fields. All identifiers should be fully qualified (with catalog name and namespace) in the view text.

@cloud-fan
Contributor Author

cc @aokolnychyi @gengliangwang

Member

how about SPARK_TABLE_OR_VIEW

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/V1Table.scala Outdated
Member

shall we explicitly mention what operations will be affected?(read/write/DDL/...)

Contributor Author

The capability has since been reshaped: TableCapability is gone in favor of the concrete MetadataOnlyTable for the read side, and a TableCatalogCapability.SUPPORTS_VIEW gate for the write side. The SUPPORTS_VIEW javadoc now spells out the affected operations explicitly (CREATE VIEW / CREATE OR REPLACE VIEW / CREATE VIEW IF NOT EXISTS via createTable, ALTER VIEW ... AS via dropTable+createTable or stageReplace, and the read-path round-trip through MetadataOnlyTable). PTAL and let me know if anything is still unclear.

@aokolnychyi
Contributor

I'd love to take a look on Monday.

Contributor

I understand we may want to use the new API to expose views, but what about the case with Spark table? When would this be helpful?

Contributor Author

For example, UC does not want to rely on the Spark file source, but just set the table provider to the file source name and leave read/write to Spark. Today this is done by a hack with V1Table: https://github.com/unitycatalog/unitycatalog/blob/main/connectors/spark/src/main/scala/io/unitycatalog/spark/UCSingleCatalog.scala#L303

Contributor

@aokolnychyi aokolnychyi Jul 28, 2025

Ah, got it. The use case is catalogs that govern tables but don't necessarily implement read/write logic.

Have we considered offering a generic V2 table implementation for built-in formats that would be accessible to external connectors? Essentially, a public version of V1Table that doesn't need to expose CatalogTable. If we go with the table capability approach, then each connector will have to implement a custom V2 table for it to be simply replaced as a Parquet table or view. Each connector would have to be aware of how the translation will be done in Spark. For instance, it seems like we assume that serdeProps will be prefixed with option..

Just curious to know your thinking, I don't have a strong opinion here.

Contributor Author

@cloud-fan cloud-fan Jul 29, 2025

I see, this makes sense. A concrete class is easier to use than a table capability.

@cloud-fan cloud-fan force-pushed the v1-v2 branch 2 times, most recently from aaa56bc to b5c909e Compare July 29, 2025 16:10
@cloud-fan cloud-fan changed the title [SPARK-52729][SQL] Add GENERAL_TABLE v2 table capacity [SPARK-52729][SQL] Add DataSourceTableOrView in DS v2 API Jul 29, 2025
@cloud-fan cloud-fan force-pushed the v1-v2 branch 2 times, most recently from 5d5a508 to 5517dee Compare July 29, 2025 16:25
@cloud-fan cloud-fan changed the title [SPARK-52729][SQL] Add DataSourceTableOrView in DS v2 API [SPARK-52729][SQL] Add MetadataOnlyTable in DS v2 API Jul 31, 2025
}

// TODO: move the v2 data source table handling from V2SessionCatalog to the analyzer
ignore("v2 data source table") {
Member

Shall we add this test case later?

Contributor Author

This will be fixed shortly after the PR is merged.

* implementing read/write directly. It represents a general Spark data source table or
* a Spark view, and relies on Spark to interpret the table metadata, resolve the table
* provider into a data source, or read it as a view.
* This affects the table read/write operations but not DDL operations.
Member

This affects the table read/write operations but not DDL operations.

It seems a bit unclear. Before the change, DDL operations of DS v2 tables rely on Spark too.

Contributor Author

maybe we should just remove this line? DDL operations do not call read/write APIs of Table anyway.

Member

@szehon-ho szehon-ho left a comment

Makes sense, just wondering if some of it can be less bound to HMS concepts

return this;
}

public Builder withSerdeProps(Map<String, String> serdeProps) {
Member

@szehon-ho szehon-ho Jul 31, 2025

Just an opinion: serde is quite bound to HMS(?), and the newer DS v2 catalogs don't have that. Would it make sense to abstract this (maybe something like 'storageProperties')? Is this just for HMS table support?

Contributor Author

good point. I was trying to save the work of splitting serde properties from table properties, but API simplicity is more important.

database = Some(ident.namespace().lastOption.getOrElse("root")),
catalog = Some(catalog.name())),
tableType = tableType,
storage = CatalogStorageFormat.empty.copy(
Member

curious, do we not need to set inputFormat/outputFormat?

Contributor Author

V1Table itself does not access the input/output format when translating a v1 table to a v2 table. I think Hive tables were never supported by DS v2, and we will not handle them here.

Member

@szehon-ho szehon-ho left a comment

Not entirely familiar with the end use case, but the API looks good now, thanks

Member

@dongjoon-hyun dongjoon-hyun left a comment

Hi, @cloud-fan. Do you have a plan to move this forward?

@github-actions

github-actions Bot commented Feb 1, 2026

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions Bot added the Stale label Feb 1, 2026
@github-actions github-actions Bot closed this Feb 2, 2026
@cloud-fan cloud-fan reopened this Apr 22, 2026
Member

@gengliangwang gengliangwang left a comment

LGTM

# Conflicts:
#	sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
SPARK-39660 split v2 DESCRIBE TABLE PARTITION off into its own
DescribeTablePartition plan and dropped `partitionSpec` from
DescribeRelation. Our v2-view pin case had 4 wildcards; reduce to 3 to
match the new 3-field case class.

Co-authored-by: Isaac
Javadoc died mid-stream while generating
CatalogV2Implicits.IdentifierHelper.html (the failing PR's log stops
exactly there; the succeeding PRs continue past to MultipartIdentifierHelper
and CatalogV2Util). The only diff in IdentifierHelper on this branch was
the new asLegacyTableIdentifier method, whose scaladoc used `[[TableIdentifier]]`
/ `[[toQualifiedNameParts]]` / backtick-inlined code refs. Something in
that doc tripped javadoc into a hard exit rather than the usual
broken-link warning.

Fix: downgrade both new scaladoc blocks in the exposed-to-javadoc
connector/catalog package to plain `//` comments so genjavadoc doesn't
emit them into the Java stub at all:
- CatalogV2Implicits.IdentifierHelper.asLegacyTableIdentifier
- CatalogV2Util.supportsView (same risky pattern, hasn't been reached
  yet because javadoc died earlier, but would break next)

The method names are self-documenting; internal callers don't need the
scaladoc.

Co-authored-by: Isaac
Restore ViewCatalog as the plugin-facing API for view-only catalogs and
view DDL operations, instead of routing views through TableCatalog under
a SUPPORTS_VIEW capability flag. Catalog-implementer ergonomics:

  * Pure view-only: implement ViewCatalog. 5 methods (listViews,
    loadView/createView/replaceView/dropView), default viewExists. No
    instanceof, no capability declaration, no TableCatalog stubs.
  * Pure tables: implement TableCatalog. Same as today.
  * Mixed (Iceberg/UC): implement both interfaces independently. Single
    cross-cutting invariant -- one identifier namespace; createTable
    rejects view-collisions, createView rejects table-collisions.

ViewCatalog API:
  Identifier[] listViews(String[] namespace);
  ViewInfo loadView(Identifier);
  ViewInfo createView(Identifier, ViewInfo);
  ViewInfo replaceView(Identifier, ViewInfo);   // atomic per-call
  boolean dropView(Identifier);
  default boolean viewExists(Identifier);

No StagingViewCatalog -- view REPLACE writes only metadata, so a single
transactional metastore call (or equivalent) is sufficient. CREATE OR
REPLACE VIEW probes viewExists then dispatches createView/replaceView.

Spark-side dispatch:
  * Analyzer.lookupTableOrView: try TableCatalog.loadTable first; on
    NoSuchTableException, if catalog is ViewCatalog, fall back to
    loadView and synthesize ResolvedPersistentView.
  * Mixed-catalog perf opt-in: loadTable may return
    MetadataOnlyTable(ViewInfo) for view idents, short-circuiting the
    second RPC. Documented on TableCatalog#loadTable.
  * DataSourceV2Strategy: routes CREATE/ALTER/DROP/SHOW VIEWS to
    ViewCatalog only; staging branches removed.
  * ResolveSessionCatalog: SUPPORTS_VIEW guards replaced with
    instanceof ViewCatalog.

Internal: V1Table.toCatalogTable for ViewInfo is now public so the
analyzer can synthesize CatalogTable from a loadView result for the
session-catalog v1 view-resolution path.

Out of scope for this commit:
  * Test suite rewrite (DataSourceV2MetadataOnlyViewSuite still uses
    SUPPORTS_VIEW and TestingStagingCatalog) -- broken until the
    follow-up commit.
  * Lifting the session-catalog gate on DESCRIBE/SHOW CREATE TABLE/SHOW
    COLUMNS/SHOW TBLPROPERTIES for v2 views -- still pinned with
    UNSUPPORTED_FEATURE.TABLE_OPERATION; tracked as follow-up.

Co-authored-by: Isaac
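The five-method ViewCatalog surface listed above can be made concrete with a minimal in-memory sketch. Everything here is a hypothetical stand-in (String identifiers, a stripped-down ViewInfo, simplified exceptions), not Spark's real `Identifier`/`ViewInfo` types, and deriving the `viewExists` default from `loadView` is an assumption of this sketch:

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for Spark's ViewInfo and catalog exceptions.
class ViewInfo {
    final String sql;
    ViewInfo(String sql) { this.sql = sql; }
}
class NoSuchViewException extends RuntimeException {
    NoSuchViewException(String ident) { super("view not found: " + ident); }
}
class ViewAlreadyExistsException extends RuntimeException {
    ViewAlreadyExistsException(String ident) { super("view already exists: " + ident); }
}

// The five-method surface described in the commit, plus the default viewExists.
interface ViewCatalogSketch {
    List<String> listViews(String namespace);
    ViewInfo loadView(String ident);
    ViewInfo createView(String ident, ViewInfo info);
    ViewInfo replaceView(String ident, ViewInfo info);
    boolean dropView(String ident);

    // Default existence probe derived from loadView: implementers write one lookup.
    default boolean viewExists(String ident) {
        try { loadView(ident); return true; }
        catch (NoSuchViewException e) { return false; }
    }
}

// In-memory implementation; "namespace" is modeled as the dotted prefix.
class InMemoryViewCatalog implements ViewCatalogSketch {
    private final Map<String, ViewInfo> views = new ConcurrentHashMap<>();

    public List<String> listViews(String namespace) {
        List<String> out = new ArrayList<>();
        for (String ident : views.keySet()) {
            if (ident.startsWith(namespace + ".")) out.add(ident);
        }
        return out;
    }
    public ViewInfo loadView(String ident) {
        ViewInfo v = views.get(ident);
        if (v == null) throw new NoSuchViewException(ident);
        return v;
    }
    public ViewInfo createView(String ident, ViewInfo info) {
        if (views.putIfAbsent(ident, info) != null) throw new ViewAlreadyExistsException(ident);
        return info;
    }
    public ViewInfo replaceView(String ident, ViewInfo info) {
        if (views.replace(ident, info) == null) throw new NoSuchViewException(ident);
        return info;
    }
    public boolean dropView(String ident) { return views.remove(ident) != null; }
}
```

No instanceof checks, no capability declaration, no TableCatalog stubs, matching the "pure view-only" ergonomics described above.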
…ewCatalog

The structural rework removed TableCatalogCapability.SUPPORTS_VIEW and
introduced ViewCatalog as the plugin-facing API for views. The existing
test catalogs (TestingViewCatalog, TestingStagingCatalog) now implement
both TableCatalog and ViewCatalog, sharing one identifier-keyed map per
the mixed-catalog contract. The stored value's runtime type (ViewInfo vs
TableInfo) distinguishes views from tables on each lookup; tableExists /
listTables exclude view entries, viewExists / listViews include only
views, and createTable / createView each reject cross-type collisions.

Test-name renames replace "without SUPPORTS_VIEW" with "without
ViewCatalog" to track the new API. The rest of the test bodies are
unchanged.

Co-authored-by: Isaac
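The shared-map pattern the test catalogs use can be sketched as follows. The class and DTO names here are illustrative stand-ins, not the actual test-catalog code; the point is the single identifier-keyed map with runtime-type filtering and cross-type collision rejection:

```java
import java.util.*;

// Hypothetical stand-ins for Spark's TableInfo / ViewInfo DTOs.
class TableInfoStub { }
class ViewInfoStub { }

// One identifier-keyed map shared by the table side and the view side;
// the stored value's runtime type distinguishes the two kinds on lookup.
class MixedTestCatalog {
    private final Map<String, Object> entries = new LinkedHashMap<>();

    void createTable(String ident, TableInfoStub info) { putNew(ident, info); }
    void createView(String ident, ViewInfoStub info)   { putNew(ident, info); }

    // Shared keyspace: any collision (same or cross type) is rejected.
    private void putNew(String ident, Object info) {
        if (entries.putIfAbsent(ident, info) != null)
            throw new IllegalStateException("identifier already taken: " + ident);
    }

    // Each side ignores the other kind's entries.
    boolean tableExists(String ident) { return entries.get(ident) instanceof TableInfoStub; }
    boolean viewExists(String ident)  { return entries.get(ident) instanceof ViewInfoStub; }

    List<String> listTables() { return identsOf(TableInfoStub.class); }
    List<String> listViews()  { return identsOf(ViewInfoStub.class); }

    private List<String> identsOf(Class<?> kind) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Object> e : entries.entrySet())
            if (kind.isInstance(e.getValue())) out.add(e.getKey());
        return out;
    }
}
```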
…ec test names

- Analyzer.lookupTableOrView and RelationResolution.tryResolvePersistent skip
  CatalogV2Util.loadTable for pure ViewCatalogs (no TableCatalog mixin), so
  asTableCatalog no longer throws MISSING_CATALOG_ABILITY.TABLES and masks the
  legitimate loadView fallback. SELECT and ALTER VIEW now work end-to-end on a
  pure ViewCatalog.
- Add a TestingViewOnlyCatalog fixture (no TableCatalog mixin) plus read and
  ALTER VIEW tests that exercise the loadView fallback.
- DataSourceV2MetadataOnlyViewSuite: rename "uses the atomic exec" tests to
  reflect that view DDL routes through ViewCatalog.createView / replaceView
  (no separate staging variant); drop now-dead RecordingStagedTable; replace
  TestingStagingCatalog's stage* method bodies with explicit "must not be
  invoked by view DDL" throws so any future regression that misroutes through
  the staging API surfaces immediately.

Co-authored-by: Isaac
… over a non-view table; align SHOW TABLES test with new listTables contract; defensive null check on MetadataOnlyTable

- CreateV2ViewExec.run: probe both viewExists and tableExists up front.
  CREATE VIEW IF NOT EXISTS over a non-view table is now a no-op (v1 parity:
  see SQLViewSuite "existing a table with the duplicate name when CREATE VIEW
  IF NOT EXISTS"); the previous code called rejectIfTable() unconditionally
  before the allowExisting check and threw TABLE_OR_VIEW_ALREADY_EXISTS for
  what should be a no-op. Non-IF-NOT-EXISTS CREATE / OR REPLACE still surfaces
  the dedicated EXPECT_VIEW_NOT_TABLE / TABLE_OR_VIEW_ALREADY_EXISTS error.
  Drop the now-unused rejectIfTable / replaceArg trait helpers (and the
  AlterV2ViewExec override).
- DataSourceV2MetadataOnlyViewSuite: rename "SHOW TABLES on a v2 catalog
  includes views (v1 parity)" to "SHOW TABLES on a v2 catalog returns only
  tables" and flip the assertion. The new TableCatalog.listTables contract
  excludes views (per the file Javadoc); the previous test name + body asserted
  v1-parity which the implementation does not provide and ShowTablesExec is
  not changed by this PR. Documents the intentional v2 divergence.
- MetadataOnlyTable: Objects.requireNonNull on `info` and `name` so a
  connector that constructs the wrapper with nulls fails fast at
  construction time rather than producing cryptic NPEs in downstream
  consumers (DescribeTableExec's Name row, DataSourceV2Relation logging).

Co-authored-by: Isaac
- ViewInfo class doc: complete the dangling "construct." sentence with its
  direct object ("construct a ViewInfo") so the line reads as a complete
  thought.
- TableInfo Builder: replace the awkward use of "write" as a noun
  ("discards the convenience setter's write") with verb form ("discards
  the value the convenience setter wrote").

Co-authored-by: Isaac
…talog gating

Commit 66fa409 added `catalog.isInstanceOf[TableCatalog]` to
RelationResolution.tryResolvePersistent's gating but didn't add
TableCatalog to the explicit-list import block; CI failed at
catalyst compile with `not found: type TableCatalog`. Add the import.

Co-authored-by: Isaac
After the SUPPORTS_VIEW removal, views.scala and DataSourceV2Strategy.scala
no longer reference CatalogV2Util. Scala's fatal unused-imports warning
(Wconf cat=unused-imports) blocks the build. Drop the now-dead import in
both files.

Co-authored-by: Isaac
- ShowViewsExec: import order — CatalogV2Implicits.NamespaceHelper
  must precede ViewCatalog (alphabetic comparison after the shared
  `connector.catalog.` prefix). Caught by `sql/scalastyle`.
- DataSourceV2MetadataOnlyViewSuite (3 test failures from CI):
  * "ALTER VIEW preserves PROP_OWNER (v1-parity)": pre-seeded the view
    via `catalog.createTable(viewIdent, viewInfo)`. After the rework,
    TestingViewCatalog.createTable rejects ViewInfo (the new mixed-
    catalog contract: views go through ViewCatalog.createView). Use
    `createView` instead.
  * "DROP VIEW on a ViewCatalog drops the view" /
    "DROP VIEW on a StagingTableCatalog drops the view":
    asserted catalog.tableExists(viewIdent) before/after DROP, but
    tableExists now returns false for view-typed entries by design
    (the table side ignores views). Use `viewExists` instead.

Co-authored-by: Isaac
…_NOT_VIEW

Two changes, symmetric to the existing DropViewExec:

1. Issue dropTable / purgeTable directly and inspect its boolean return,
   instead of probing tableExists upfront and ignoring dropTable's value.
   Cuts the happy-path RPC count from 2 (tableExists + dropTable) to 1.
   IF EXISTS semantics are unchanged: false return + ifExists=true -> no-op,
   false return + ifExists=false -> noSuchTableError.

2. On false return, if the catalog also implements ViewCatalog and a view
   sits at the ident, throw EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE instead of
   the generic noSuchTable error. Restores v1 DropTableCommand(isView=false)
   parity for mixed catalogs (which in the new design return tableExists=false
   for view idents). The viewExists probe runs only on the slow path
   (would-be "not found"), so happy-path catalogs see no extra RPC.

invalidateCache now runs after the drop succeeds rather than before;
matches DropViewExec's ordering, and the cache invalidation only mattered
when the drop succeeded anyway.

Co-authored-by: Isaac
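The drop-and-decode flow in points 1-2 above can be sketched as a pure decision function. The enum and method names are illustrative, not Spark's; the ordering (IF EXISTS no-op checked before the view probe) follows the commit's statement that EXPECT_TABLE_NOT_VIEW replaces the would-be not-found error:

```java
// Illustrative outcomes of the v2 DROP TABLE flow described above.
enum DropOutcome { DROPPED, NO_OP, NOT_FOUND, EXPECT_TABLE_NOT_VIEW }

class DropTableFlow {
    // dropReturned: the boolean dropTable/purgeTable reported (happy path: 1 RPC).
    // viewAtIdent: the slow-path viewExists probe on a mixed catalog; it only
    // runs when the drop returned false and IF EXISTS was not specified.
    static DropOutcome decode(boolean dropReturned, boolean ifExists, boolean viewAtIdent) {
        if (dropReturned) return DropOutcome.DROPPED;
        if (ifExists) return DropOutcome.NO_OP;
        return viewAtIdent ? DropOutcome.EXPECT_TABLE_NOT_VIEW : DropOutcome.NOT_FOUND;
    }
}
```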
…C opt-in)

Add a default `createOrReplaceView(ident, info)` method to ViewCatalog. The
default implementation tries `replaceView`, falling back to `createView` on
NoSuchViewException -- non-atomic across the two calls, but keeps plugin
authors free of the new method (override only when single-RPC atomic upsert
is worth the work, e.g. catalogs backed by `INSERT ... ON CONFLICT DO UPDATE`
or equivalent).

CreateV2ViewExec rework:

* Drop the upfront viewExists + tableExists probes. Just call createView /
  createOrReplaceView and decode the catalog's response.
* Happy-path RPCs:
  - CREATE VIEW (no collision): 1 (was 3)
  - CREATE OR REPLACE VIEW (any state): 1 (was 3, plus the catch-and-fallback)
  - CREATE VIEW IF NOT EXISTS over an existing view: 1 (early viewExists
    short-circuit kept; matches v1 `CreateViewCommand.run` behavior of
    skipping aliasPlan / config capture in the no-op path).
* Cross-type collision (mixed catalog with a table at ident): the catalog's
  ViewAlreadyExistsException is caught, then a single tableExists check
  decodes the situation. Plain CREATE / OR REPLACE throws
  EXPECT_VIEW_NOT_TABLE; IF NOT EXISTS is a no-op (v1 parity).

Co-authored-by: Isaac
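The default `createOrReplaceView` described above (try replace, fall back to create) can be sketched with stand-in types. The simplified String identifiers and no-arg exceptions are assumptions of this sketch, not Spark's signatures:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for Spark's ViewInfo and catalog exceptions.
class ViewInfo {
    final String sql;
    ViewInfo(String sql) { this.sql = sql; }
}
class NoSuchViewException extends RuntimeException { }
class ViewAlreadyExistsException extends RuntimeException { }

interface ViewUpsert {
    ViewInfo createView(String ident, ViewInfo info);
    ViewInfo replaceView(String ident, ViewInfo info);

    // Default upsert: try replace, fall back to create on miss. Non-atomic
    // across the two calls; catalogs with a single-RPC atomic upsert
    // (e.g. INSERT ... ON CONFLICT DO UPDATE) override this.
    default ViewInfo createOrReplaceView(String ident, ViewInfo info) {
        try {
            return replaceView(ident, info);
        } catch (NoSuchViewException e) {
            return createView(ident, info);
        }
    }
}

class MapViewCatalog implements ViewUpsert {
    final Map<String, ViewInfo> views = new ConcurrentHashMap<>();

    public ViewInfo createView(String ident, ViewInfo info) {
        if (views.putIfAbsent(ident, info) != null) throw new ViewAlreadyExistsException();
        return info;
    }
    public ViewInfo replaceView(String ident, ViewInfo info) {
        if (views.replace(ident, info) == null) throw new NoSuchViewException();
        return info;
    }
}
```

A plugin author who never overrides the default still gets correct CREATE OR REPLACE semantics, just with a possible extra round trip.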
ViewInfo.java:64 -- "remember withTableType(VIEW)" parses as "remember [the
noun-form method call]"; add the missing infinitive ("remember to call
withTableType(VIEW)") so the sentence reads cleanly.

Co-authored-by: Isaac
…s list

V2AlterViewPreparation scaladoc listed "schema mode" among the transient
fields re-captured from the session, contradicting the same scaladoc two
lines earlier ("preserve ... schema binding mode") and the implementation
at lines 78-86 (viewSchemaMode reads existingView.schemaMode -- preserved,
not re-captured).

Co-authored-by: Isaac
…bles and views

Reshapes the table-and-view catalog story around an explicit
`RelationCatalog extends TableCatalog, ViewCatalog`. The new interface
owns all the cross-cutting rules; `TableCatalog` and `ViewCatalog`
revert to strict single-kind APIs.

Why: in the previous shape, `TableCatalog.loadTable` carried a perf
opt-in (return `MetadataOnlyTable(ViewInfo)` for views). That leaked
through `tableExists`'s default impl (which calls `loadTable`),
silently breaking three independent framework call sites for any
mixed catalog that used the opt-in without overriding `tableExists`.
The new shape makes the perf opt-in explicit on a dedicated method
and removes the leak by construction.

API:
- New `RelationCatalog` (`@Evolving`, since 4.2.0) with
  `loadRelation(ident)` and a default `listRelations(namespace)`. Class
  Javadoc owns the "two principles" (orthogonal interfaces + single
  identifier namespace) and the per-method cross-type contract tables
  (active rejection on writes, passive filtering on reads).
- `TableCatalog.loadTable` Javadoc tightened: tables only. The "may
  return MetadataOnlyTable wrapping ViewInfo" clause is gone.
  `tableExists`'s default impl is now correct under any catalog.
- `TableCatalog` and `ViewCatalog` class-level docs read as strict
  single-kind APIs and point mixed implementers at `RelationCatalog`.
  Per-method cross-type clauses are stripped from both interfaces.

Enforcement: `Catalogs.load` rejects any plugin that implements both
`TableCatalog` and `ViewCatalog` directly without `RelationCatalog`,
with a clear error message naming the right interface.

Resolver: `Analyzer.lookupTableOrView` and
`RelationResolution.tryResolvePersistent` check `RelationCatalog`
first and call `loadRelation` (one RPC); otherwise fall through to
the existing `loadTable` -> `loadView` two-step.

Test catalogs:
- `TestingViewCatalog` extends `RelationCatalog`; `loadTable` is
  tightened to tables-only, view fixtures move to `loadRelation`.
- `TestingStagingCatalog` extends `StagingTableCatalog with
  RelationCatalog`.
- `TestingTableOnlyCatalog`'s dead view fixture is removed.

Co-authored-by: Isaac
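The `Catalogs.load` enforcement rule above reduces to one instanceof check. The interface and method names below are stand-ins sketching the shape, not the actual `Catalogs.load` code:

```java
// Stand-in markers for TableCatalog / ViewCatalog / RelationCatalog.
interface TableCat { }
interface ViewCat { }
interface RelationCat extends TableCat, ViewCat { }

class CatalogGuard {
    // Reject plugins that mix the two kind-specific interfaces directly:
    // the cross-cutting rules live on RelationCatalog, so mixed catalogs
    // must opt into it explicitly.
    static void validate(Object plugin) {
        if (plugin instanceof TableCat && plugin instanceof ViewCat
                && !(plugin instanceof RelationCat)) {
            throw new IllegalArgumentException(
                "Catalogs exposing both tables and views must implement RelationCatalog");
        }
    }
}

// Demo plugins for the three cases.
class BothWithoutRelation implements TableCat, ViewCat { }
class WithRelation implements RelationCat { }
class TablesOnly implements TableCat { }
```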
…m loadRelation; rename listRelations -> listRelationSummaries

Two API refinements on top of the previous commit, both targeted at
the implementer's experience:

1. Default impls on `RelationCatalog` for `loadTable`, `loadView`,
   `tableExists`, `viewExists` -- each derives from `loadRelation` by
   matching `MetadataOnlyTable + ViewInfo` and routing the result to
   the right kind. A `RelationCatalog` author writes the read-side
   lookup once (in `loadRelation`); the four kind-specific accessors
   come for free. Implementers can still override any of them for a
   cheaper kind-specific path; otherwise the defaults are correct by
   construction.

2. Rename `RelationCatalog.listRelations` to
   `RelationCatalog.listRelationSummaries`, mirroring the existing
   `TableCatalog.listTableSummaries` / `ViewCatalog.listViews`
   convention -- the method returns `TableSummary[]`, so "summaries"
   is the accurate noun. The unified entry point reads as the
   summary-of-relations counterpart to the kind-specific summary
   methods.

Test catalogs trimmed: `TestingViewCatalog` and `TestingStagingCatalog`
now override only `loadRelation` on the read side; `loadTable`,
`loadView`, `tableExists`, and (for the latter) `viewExists` use the
new `RelationCatalog` defaults.

Co-authored-by: Isaac
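The derive-from-`loadRelation` defaults in point 1 can be sketched as below. The stand-in types (a plain table class, a MetadataOnlyTable-like wrapper whose payload may be a ViewInfo, simplified exceptions) are assumptions of this sketch; only the routing logic mirrors the commit's description:

```java
// Stand-ins: a relation lookup yields either a plain table or a
// MetadataOnlyTable-like wrapper whose payload may be a ViewInfo.
class ViewInfo { }
class TableStub { }
class MetadataOnlyTableStub extends TableStub {
    final Object info;              // TableInfo or ViewInfo in the real API
    MetadataOnlyTableStub(Object info) { this.info = info; }
}
class NoSuchRelationException extends RuntimeException { }
class NoSuchTableException extends RuntimeException { }
class NoSuchViewException extends RuntimeException { }

interface RelationCatalogSketch {
    // The one read-side lookup an implementer writes.
    TableStub loadRelation(String ident);

    default boolean isView(TableStub t) {
        return t instanceof MetadataOnlyTableStub
            && ((MetadataOnlyTableStub) t).info instanceof ViewInfo;
    }
    // The four kind-specific accessors come for free, routed by kind.
    default TableStub loadTable(String ident) {
        TableStub t = loadRelation(ident);
        if (isView(t)) throw new NoSuchTableException();
        return t;
    }
    default ViewInfo loadView(String ident) {
        TableStub t = loadRelation(ident);
        if (!isView(t)) throw new NoSuchViewException();
        return (ViewInfo) ((MetadataOnlyTableStub) t).info;
    }
    default boolean tableExists(String ident) {
        try { return !isView(loadRelation(ident)); }
        catch (NoSuchRelationException e) { return false; }
    }
    default boolean viewExists(String ident) {
        try { return isView(loadRelation(ident)); }
        catch (NoSuchRelationException e) { return false; }
    }
}
```

An implementer can still override any accessor for a cheaper kind-specific path; otherwise one `loadRelation` body drives all four.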
…rop TestingStagingCatalog

Three cleanups from a self-review pass:

1. Rename `TestingViewCatalog` to `TestingRelationCatalog`. The class
   extends `RelationCatalog` and stores both tables and views in the
   same backing map; the old name (a leftover from when the interface
   was called `ViewCatalog`) suggested view-only and was misleading.

2. Drop `TestingStagingCatalog` and its 4 associated tests. The
   fixture's only real assertion was that `stageCreate` /
   `stageReplace` / `stageCreateOrReplace` throw "must not be invoked
   by view DDL" -- a defense against a regression that the type
   system already prevents (view DDL execs call
   `createView` / `replaceView` / `dropView` directly on the
   `ViewCatalog` half of the interface; the staging methods take
   `TableInfo`, not `ViewInfo`, so they're unreachable from view DDL
   by construction). The four CREATE / ALTER / DROP VIEW tests on
   the staging catalog were duplicates of the corresponding tests on
   `TestingRelationCatalog` -- same SQL, same assertions, the
   `StagingTableCatalog` mixin contributes nothing observable.

3. Clarify three contract-claim inline comments that didn't accurately
   describe the code below them:
   - `TestingRelationCatalog.createTable` claimed the method rejected
     when the ident was already a view, but the `instanceof ViewInfo`
     check inspected the input parameter, not catalog state. The
     actual rejection comes from the shared-keyspace `putIfAbsent`
     collision below.
   - `RelationResolution.tryResolvePersistent` said "the fallback is
     gated on neither" -- ambiguous; clarified to "fires only when
     both are absent".
   - `DataSourceV2Strategy.ShowViews` said "this case only sees
     non-session ViewCatalog catalogs"; a session-catalog override
     that mixes in `ViewCatalog` would also reach this case.

View suite test count: 60 -> 56.

Co-authored-by: Isaac
…p; minor typo

Self-review findings:

- DropTableExec.run: removing the upfront tableExists() probe regressed
  `DROP TABLE IF EXISTS X PURGE` on missing tables when the catalog does not
  override TableCatalog.purgeTable (the default impl throws unconditionally).
  Pre-PR was a clean no-op; post-PR threw UNSUPPORTED_FEATURE.PURGE_TABLE.
  Restore the existence guard for the purge path; the perf gain on plain
  DROP TABLE is preserved. Pin the no-op with a v2 DropTableSuite test.

- DataSourceV2MetadataOnlyViewSuite: typo "name space" -> "namespace".

Co-authored-by: Isaac
HyukjinKwon pushed a commit that referenced this pull request Apr 26, 2026

…output

### What changes were proposed in this pull request?

Adds a diagnostic banner to the unidoc step in `docs/_plugins/build_api_docs.rb`. When `build/sbt unidoc` fails, the script now scans the captured sbt output and prints a framed summary naming the `<Class>.html` file javadoc was generating when it died, the inferred source class to audit, and a one-paragraph hint about the usual scaladoc triggers.

Implementation:
- `stream_and_capture` tees sbt output to both stdout and `target/unidoc-build.log` (Ruby-only, no shell `pipefail` reliance).
- `diagnose_unidoc_failure` finds the last `Generating .../<Class>.html...` line before `javadoc exited with exit code N` and prints a culprit-pointer banner. ANSI colour codes are stripped before regex matching.
- When the failure mode doesn't match the mid-HTML-crash pattern (e.g. scaladoc failure, sbt env issue), the banner says so and points back to the full log.
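The core of `diagnose_unidoc_failure` is a last-match scan. The actual script is Ruby; the following is a Java transliteration of the idea (method and class names are illustrative): keep the last `Generating .../<Class>.html` line seen, stripping ANSI colour codes first, and report it when the `javadoc exited with exit code N` marker appears.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the culprit-pointer scan: returns the .html path javadoc was
// generating when it died, or null if the log lacks the crash pattern.
class UnidocDiagnosis {
    private static final Pattern GEN  = Pattern.compile("Generating (\\S+\\.html)");
    private static final Pattern EXIT = Pattern.compile("javadoc exited with exit code \\d+");

    static String crashClass(List<String> logLines) {
        String last = null;
        for (String raw : logLines) {
            // ANSI colour codes are stripped before regex matching.
            String line = raw.replaceAll("\u001B\\[[0-9;]*m", "");
            if (EXIT.matcher(line).find()) return last;
            Matcher m = GEN.matcher(line);
            if (m.find()) last = m.group(1);
        }
        return null;   // no exit marker: failure mode doesn't match
    }
}
```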

### Why are the changes needed?

Today, when javadoc hard-exits during unidoc HTML generation -- typically because of a specific scaladoc construct (e.g. wiki-style `[[Class]]` links or backtick-inline code refs) in an exposed Scala source -- the failing PR's CI log shows ~100 `[error]` lines on `target/java/...` files. Those errors are benign: they're genjavadoc-emitted Java stubs (`static public abstract R apply(T1, T2, T3, T4)`) that every PR produces, and `javadoc` always complains about them but normally still finishes. They are not the cause of the failure.

The actual signal is the last `Generating .../<Class>.html...` line before `javadoc exited with exit code 1`, which a developer has to find by hand in a multi-thousand-line log. The error reporting does not differentiate the benign noise from the real crash, so the failure consistently looks like it's "in" `ErrorInfo.java` / `LexicalThreadLocal.java` / similar, when it's actually in a Scala source that none of those names point to.

A recent example: PR #51419 hit this exact misdirection -- the log was full of errors on `common/utils/target/java/...` stubs, but the real culprit was a doc comment in `CatalogV2Implicits.IdentifierHelper` that triggered a hard exit during HTML generation. The diagnostic in this PR would have named that class directly.

### Does this PR introduce _any_ user-facing change?

No. CI-only output change visible in the unidoc step of the doc-gen job.

### How was this patch tested?

- Dry-ran the parser logic against the captured failing log from PR #51419 -- it correctly extracts `org/apache/spark/sql/connector/catalog/CatalogV2Implicits.IdentifierHelper.html` as the crash class.
- The second commit on this branch (`DO NOT MERGE: break a docstring to validate the unidoc diagnostic`) intentionally reintroduces the same `[[...]]`+backtick-inline scaladoc pattern in `CatalogV2Implicits.IdentifierHelper.asTableIdentifierOpt` so that this PR's CI run actually exercises the new path. Once the banner fires and names that class on the failing CI run, that commit will be dropped from this PR.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)

Closes #55548 from cloud-fan/javadoc-error-reporting.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…le paths

CI failures revealed two regressions caused by removing the upfront
`tableExists` probe in DropTableExec:

1. `JDBCTableCatalog.dropTable` does not honor the v2 contract of returning
   false for missing tables -- e.g. Derby raises a SQLSyntaxErrorException,
   surfaced as `FAILED_JDBC.DROP_TABLE` instead of a clean no-op.
   (DerbyTableCatalogSuite cleanup after RENAME hits this.)
2. The hive `DropTableSuite.hive client calls` test counts the upfront
   `tableExists` call.

Restore the original guard structure -- `tableExists` -> drop/purge if
present, else fall through to the IF-EXISTS / view-fallback / not-found
branches. This subsumes the previous purge-only fix and matches pre-PR
behavior for `dropTable` while still adding the new `viewExists` fallback
for the `EXPECT_TABLE_NOT_VIEW` error on mixed catalogs.

Co-authored-by: Isaac
Two lines exceeded the 100-char limit:
- Analyzer.scala:1112 in the lookupTableOrView Scaladoc.
- Catalogs.scala:123 in the validateRelationCatalog error message.

These cascaded into 7 CI failures (Maven Java 17/25, linters, docs, sparkr,
Docker integration, TPC-DS) all rooted in
`(catalyst / scalaStyleOnCompile) Failing because of negative scalastyle result`.

Co-authored-by: Isaac
`@inheritDoc` is only valid as an inline tag (`{@inheritDoc}`); using it
as a block tag is undefined. RelationCatalog had it written as the block
form on the four `default` overrides (loadTable, loadView, tableExists,
viewExists), which is the only such occurrence in the entire Spark Java
tree -- every other file uses `{@inheritDoc}`.

Suspected trigger of the recent unidoc failures: javadoc 17 dies in
"Building tree" with no per-class diagnostic when this pattern combines
with default-method overrides on an interface that inherits from two
parents. Surface symptom: `javadoc exited with exit code 1` after
"Building tree for all the packages and classes...", before any
per-class HTML generation begins (so SPARK-56630's diagnostic helper
correctly reports "no class HTML generation was in progress").

Co-authored-by: Isaac
chenwang-databricks added a commit to chenwang-databricks/spark that referenced this pull request Apr 27, 2026
…2 catalogs

Adds support for CREATE VIEW ... WITH METRICS and corresponding DROP VIEW
routing on non-session DataSource V2 catalogs, on top of cloud-fan/spark:v1-v2
(PR apache#51419, "Add MetadataOnlyTable and CREATE/ALTER VIEW support for DS v2
catalogs").

What's new

* TableInfo gains a typed `viewDependencies` field (structured dependency list
  does not round-trip cleanly through flat string properties).
* CatalogTableType.METRIC_VIEW and TableSummary.METRIC_VIEW_TABLE_TYPE.
* CreateMetricViewCommand's V2 path gates on
  TableCatalogCapability.SUPPORTS_VIEW, emits PROP_VIEW_TEXT (YAML) +
  PROP_TABLE_TYPE=METRIC_VIEW + PROP_VIEW_CURRENT_CATALOG_AND_NAMESPACE +
  view.sqlConfig.* + metric_view.* via the Builder introduced in apache#51419.
* DROP VIEW on V2 metric views routes to a new DropMetricViewExec; DROP TABLE
  on metric views is rejected with WRONG_COMMAND_FOR_OBJECT_TYPE. V1
  DropTableCommand and V2SessionCatalog accept DROP VIEW on METRIC_VIEW.
* V1Table round-trips METRIC_VIEW table type via PROP_TABLE_TYPE, including
  expanding PROP_VIEW_CURRENT_CATALOG_AND_NAMESPACE into the numbered
  view.catalogAndNamespace.* keys the V1 resolution path expects.

Stacks on apache#51419; keep this PR in draft until it merges.
…th explicit prose

The block-form -> inline `{@inheritDoc}` fix unblocked the obvious
invalid-tag case, but doc gen still fails in the same "Building tree"
phase. Hypothesis: `{@inheritDoc}` on a default-method override of a
default method (the parent override is itself default in
TableCatalog/ViewCatalog) on a multi-parent interface still trips
javadoc 17's tree builder, even with the inline form.

Replace each of the four overrides' `{@inheritDoc}` with explicit @param /
@return / @throws so the doclet has no inheritance link to traverse.
The descriptive text below `{@inheritDoc}` is preserved verbatim.
Behaviorally equivalent to inheriting + augmenting; just no tag chain.

If this push turns the docs job green, the `{@inheritDoc}` chain on
multi-parent default-method overrides is the trigger and we should keep
this shape; if it stays red, the trigger is elsewhere and we'll bisect
further.

Co-authored-by: Isaac
…build crash

The unidoc failure on this PR has the same shape across every commit:
  Building tree for all the packages and classes...
  javadoc exited with exit code 1
No stack trace, no NPE keyword, no class name. Both the block-form and inline
`{@inheritDoc}` hypotheses turned out wrong (the block-fix and the explicit-prose
replacement both kept the failure). Master + this PR have identical sets of
source errors before "Building tree", so the difference must be a runtime
failure inside the doclet itself when traversing the new public types.

Add `-J-Xlog:exceptions=info` (java 11+) to the unidoc javadoc options so any
uncaught Throwable from the standard doclet's tree builder is dumped to stderr.
On the next CI run, this will show whether the crash is an NPE, an
IllegalStateException, or something else, and where it came from. Will revert
once we have a class/line to fix.

Co-authored-by: Isaac