[SPARK-52729][SQL] Add MetadataOnlyTable and CREATE/ALTER VIEW support for DS v2 catalogs#51419
cloud-fan wants to merge 56 commits into apache:master from
Conversation
I think the current view implementation which stores the original SQL text and a bunch of context is too convoluted to put into the public DS v2 API. It's better if the view text is context-independent. We are going to improve it in #51410
Before the improvement is done, we only allow reading DS v2 views that have context-independent SQL text.
Don't we store the current catalog and namespace in the view metadata? Do we expect the connectors to modify the view SQL text?
My proposal is to let Spark modify the view text before saving it into the catalog, so that the catalog does not need to store the current catalog/namespace.
So all identifiers in the view text will always include the catalog name as the first name part and the table name as the last name part? How hard will it be to modify the original SQL text? Will it cause any surprises to the users if the original and the persisted SQL text differ?
> So all identifiers in the view text will always include the catalog name as the first name part and the table name as the last name part? How hard will it be to modify the original SQL text? Will it cause any surprises to the users if the original and the persisted SQL text differ?
Hive already did so, and its views have both "view_text" and "original_view_text" fields. All identifiers should be fully qualified (with catalog name and namespace) in the view text.
how about SPARK_TABLE_OR_VIEW
shall we explicitly mention what operations will be affected? (read/write/DDL/...)
The capability has since been reshaped: TableCapability is gone in favor of the concrete MetadataOnlyTable for the read side, and a TableCatalogCapability.SUPPORTS_VIEW gate for the write side. The SUPPORTS_VIEW javadoc now spells out the affected operations explicitly (CREATE VIEW / CREATE OR REPLACE VIEW / CREATE VIEW IF NOT EXISTS via createTable, ALTER VIEW ... AS via dropTable+createTable or stageReplace, and the read-path round-trip through MetadataOnlyTable). PTAL and let me know if anything is still unclear.
I'd love to take a look on Monday.
I understand we may want to use the new API to expose views, but what about the case with a Spark table? When would this be helpful?
For example, UC does not want to rely on the Spark file source, but just set the table provider to the file source name and leave read/write to Spark. Today this is done by a hack with V1Table: https://github.com/unitycatalog/unitycatalog/blob/main/connectors/spark/src/main/scala/io/unitycatalog/spark/UCSingleCatalog.scala#L303
Ah, got it. The use case is catalogs that govern tables but do not necessarily implement read/write logic.
Have we considered offering a generic V2 table implementation for built-in formats that would be accessible to external connectors? Essentially, a public version of V1Table that doesn't need to expose CatalogTable. If we go with the table capability approach, then each connector will have to implement a custom V2 table for it to be simply treated as a Parquet table or view. Each connector would have to be aware of how the translation will be done in Spark. For instance, it seems like we assume that serdeProps will be prefixed with `option.`.
Just curious to know your thinking, I don't have a strong opinion here.
I see, this makes sense. A concrete class is easier to use than a table capability.
}

// TODO: move the v2 data source table handling from V2SessionCatalog to the analyzer
ignore("v2 data source table") {
Shall we add this test case later?
This will be fixed shortly after the PR is merged.
* implementing read/write directly. It represents a general Spark data source table or
* a Spark view, and relies on Spark to interpret the table metadata, resolve the table
* provider into a data source, or read it as a view.
* This affects the table read/write operations but not DDL operations.
> This affects the table read/write operations but not DDL operations.
It seems a bit unclear. Before the change, DDL operations on DSV2 tables relied on Spark too.
maybe we should just remove this line? DDL operations do not call read/write APIs of Table anyway.
szehon-ho
left a comment
Makes sense, just wondering if some of it can be less bound to HMS concepts
    return this;
  }

  public Builder withSerdeProps(Map<String, String> serdeProps) {
Just an opinion: serde is quite bound to HMS(?), and the newer DSV2 catalogs don't have that. Would it make sense to abstract this (maybe something like 'storageProperties')? Is this just for HMS table support?
good point. I was trying to save the work of splitting serde properties from table properties, but API simplicity is more important.
database = Some(ident.namespace().lastOption.getOrElse("root")),
catalog = Some(catalog.name())),
tableType = tableType,
storage = CatalogStorageFormat.empty.copy(
curious, do we not need to set inputFormat/outputFormat?
V1Table itself does not access the input/output format when translating a v1 table to a v2 table. I think Hive tables were never supported by DS v2, and we will not handle them here.
szehon-ho
left a comment
Not entirely familiar with the end use case, but the API looks good now, thanks
dongjoon-hyun
left a comment
Hi, @cloud-fan. Do you have a plan to move this forward?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
# Conflicts:
#	sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
SPARK-39660 split v2 DESCRIBE TABLE PARTITION off into its own DescribeTablePartition plan and dropped `partitionSpec` from DescribeRelation. Our v2-view pin case had 4 wildcards; reduce to 3 to match the new 3-field case class. Co-authored-by: Isaac
Javadoc died mid-stream while generating CatalogV2Implicits.IdentifierHelper.html (the failing PR's log stops exactly there; the succeeding PRs continue past it to MultipartIdentifierHelper and CatalogV2Util). The only diff in IdentifierHelper on this branch was the new asLegacyTableIdentifier method, whose scaladoc used `[[TableIdentifier]]` / `[[toQualifiedNameParts]]` / backtick-inlined code refs. Something in that doc tripped javadoc into a hard exit (not a warning) instead of a broken-link warning.

Fix: downgrade both new scaladoc blocks in the exposed-to-javadoc connector/catalog package to plain `//` comments so genjavadoc doesn't emit them into the Java stub at all:
- CatalogV2Implicits.IdentifierHelper.asLegacyTableIdentifier
- CatalogV2Util.supportsView (same risky pattern; hasn't been reached yet because javadoc died earlier, but would break next)

The method names are self-documenting; internal callers don't need the scaladoc.

Co-authored-by: Isaac
Restore ViewCatalog as the plugin-facing API for view-only catalogs and
view DDL operations, instead of routing views through TableCatalog under
a SUPPORTS_VIEW capability flag. Catalog-implementer ergonomics:
* Pure view-only: implement ViewCatalog. 5 methods (listViews,
loadView/createView/replaceView/dropView), default viewExists. No
instanceof, no capability declaration, no TableCatalog stubs.
* Pure tables: implement TableCatalog. Same as today.
* Mixed (Iceberg/UC): implement both interfaces independently. Single
cross-cutting invariant -- one identifier namespace; createTable
rejects view-collisions, createView rejects table-collisions.
ViewCatalog API:
Identifier[] listViews(String[] namespace);
ViewInfo loadView(Identifier);
ViewInfo createView(Identifier, ViewInfo);
ViewInfo replaceView(Identifier, ViewInfo); // atomic per-call
boolean dropView(Identifier);
default boolean viewExists(Identifier);
No StagingViewCatalog -- view REPLACE writes only metadata, so a single
transactional metastore call (or equivalent) is sufficient. CREATE OR
REPLACE VIEW probes viewExists then dispatches createView/replaceView.
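As a minimal, self-contained illustration of the probe-then-dispatch flow just described: the interfaces below are simplified stand-ins (string identifiers, view text only), not the real Spark API.

```java
import java.util.HashMap;
import java.util.Map;

public class CreateOrReplaceViewSketch {
    // Stand-in for the ViewCatalog surface, with the default viewExists.
    interface ViewCatalog {
        String loadView(String ident);               // view text, or null if absent
        void createView(String ident, String text);  // fails on collision
        void replaceView(String ident, String text); // fails if absent
        default boolean viewExists(String ident) { return loadView(ident) != null; }
    }

    static class InMemoryViewCatalog implements ViewCatalog {
        private final Map<String, String> views = new HashMap<>();
        public String loadView(String ident) { return views.get(ident); }
        public void createView(String ident, String text) {
            if (views.putIfAbsent(ident, text) != null) {
                throw new IllegalStateException("view already exists: " + ident);
            }
        }
        public void replaceView(String ident, String text) {
            if (views.replace(ident, text) == null) {
                throw new IllegalStateException("no such view: " + ident);
            }
        }
    }

    // CREATE OR REPLACE VIEW: probe viewExists, then dispatch. Note this is
    // non-atomic across the two calls.
    static void createOrReplace(ViewCatalog catalog, String ident, String text) {
        if (catalog.viewExists(ident)) {
            catalog.replaceView(ident, text);
        } else {
            catalog.createView(ident, text);
        }
    }

    public static void main(String[] args) {
        InMemoryViewCatalog c = new InMemoryViewCatalog();
        createOrReplace(c, "db.v", "SELECT 1"); // creates
        createOrReplace(c, "db.v", "SELECT 2"); // replaces
        System.out.println(c.loadView("db.v"));
    }
}
```

The non-atomicity of the probe-then-dispatch pair is exactly why view REPLACE can still be a single transactional metastore call on the catalog side.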
Spark-side dispatch:
* Analyzer.lookupTableOrView: try TableCatalog.loadTable first; on
NoSuchTableException, if catalog is ViewCatalog, fall back to
loadView and synthesize ResolvedPersistentView.
* Mixed-catalog perf opt-in: loadTable may return
MetadataOnlyTable(ViewInfo) for view idents, short-circuiting the
second RPC. Documented on TableCatalog#loadTable.
* DataSourceV2Strategy: routes CREATE/ALTER/DROP/SHOW VIEWS to
ViewCatalog only; staging branches removed.
* ResolveSessionCatalog: SUPPORTS_VIEW guards replaced with
instanceof ViewCatalog.
Internal: V1Table.toCatalogTable for ViewInfo is now public so the
analyzer can synthesize CatalogTable from a loadView result for the
session-catalog v1 view-resolution path.
Out of scope for this commit:
* Test suite rewrite (DataSourceV2MetadataOnlyViewSuite still uses
SUPPORTS_VIEW and TestingStagingCatalog) -- broken until the
follow-up commit.
* Lifting the session-catalog gate on DESCRIBE/SHOW CREATE TABLE/SHOW
COLUMNS/SHOW TBLPROPERTIES for v2 views -- still pinned with
UNSUPPORTED_FEATURE.TABLE_OPERATION; tracked as follow-up.
Co-authored-by: Isaac
…ewCatalog

The structural rework removed TableCatalogCapability.SUPPORTS_VIEW and introduced ViewCatalog as the plugin-facing API for views. The existing test catalogs (TestingViewCatalog, TestingStagingCatalog) now implement both TableCatalog and ViewCatalog, sharing one identifier-keyed map per the mixed-catalog contract. The storage value's runtime type (ViewInfo vs TableInfo) distinguishes views from tables on each lookup; tableExists / listTables exclude view entries, viewExists / listViews include only views, and createTable / createView each reject cross-type collisions.

Test-name renames replace "without SUPPORTS_VIEW" with "without ViewCatalog" to track the new API. The rest of the test bodies are unchanged.

Co-authored-by: Isaac
…ec test names

- Analyzer.lookupTableOrView and RelationResolution.tryResolvePersistent skip CatalogV2Util.loadTable for pure ViewCatalogs (no TableCatalog mixin), so asTableCatalog no longer throws MISSING_CATALOG_ABILITY.TABLES and masks the legitimate loadView fallback. SELECT and ALTER VIEW now work end-to-end on a pure ViewCatalog.
- Add a TestingViewOnlyCatalog fixture (no TableCatalog mixin) plus read and ALTER VIEW tests that exercise the loadView fallback.
- DataSourceV2MetadataOnlyViewSuite: rename "uses the atomic exec" tests to reflect that view DDL routes through ViewCatalog.createView / replaceView (no separate staging variant); drop now-dead RecordingStagedTable; replace TestingStagingCatalog's stage* method bodies with explicit "must not be invoked by view DDL" throws so any future regression that misroutes through the staging API surfaces immediately.

Co-authored-by: Isaac
… over a non-view table; align SHOW TABLES test with new listTables contract; defensive null check on MetadataOnlyTable

- CreateV2ViewExec.run: probe both viewExists and tableExists up front. CREATE VIEW IF NOT EXISTS over a non-view table is now a no-op (v1 parity: see SQLViewSuite "existing a table with the duplicate name when CREATE VIEW IF NOT EXISTS"); the previous code called rejectIfTable() unconditionally before the allowExisting check and threw TABLE_OR_VIEW_ALREADY_EXISTS for what should be a no-op. Non-IF-NOT-EXISTS CREATE / OR REPLACE still surfaces the dedicated EXPECT_VIEW_NOT_TABLE / TABLE_OR_VIEW_ALREADY_EXISTS error. Drop the now-unused rejectIfTable / replaceArg trait helpers (and the AlterV2ViewExec override).
- DataSourceV2MetadataOnlyViewSuite: rename "SHOW TABLES on a v2 catalog includes views (v1 parity)" to "SHOW TABLES on a v2 catalog returns only tables" and flip the assertion. The new TableCatalog.listTables contract excludes views (per the file Javadoc); the previous test name + body asserted v1 parity which the implementation does not provide, and ShowTablesExec is not changed by this PR. Documents the intentional v2 divergence.
- MetadataOnlyTable: Objects.requireNonNull on `info` and `name` so a connector that constructs the wrapper with nulls fails fast at construction time rather than producing cryptic NPEs in downstream consumers (DescribeTableExec's Name row, DataSourceV2Relation logging).

Co-authored-by: Isaac
- ViewInfo class doc: complete the dangling "construct." sentence with its
direct object ("construct a ViewInfo") so the line reads as a complete
thought.
- TableInfo Builder: replace the awkward use of "write" as a noun
("discards the convenience setter's write") with verb form ("discards
the value the convenience setter wrote").
Co-authored-by: Isaac
…talog gating Commit 66fa409 added `catalog.isInstanceOf[TableCatalog]` to RelationResolution.tryResolvePersistent's gating but didn't add TableCatalog to the explicit-list import block; CI failed at catalyst compile with `not found: type TableCatalog`. Add the import. Co-authored-by: Isaac
After the SUPPORTS_VIEW removal, views.scala and DataSourceV2Strategy.scala no longer reference CatalogV2Util. Scala's fatal unused-imports warning (Wconf cat=unused-imports) blocks the build. Drop the now-dead import in both files. Co-authored-by: Isaac
- ShowViewsExec: import order — CatalogV2Implicits.NamespaceHelper
must precede ViewCatalog (alphabetic comparison after the shared
`connector.catalog.` prefix). Caught by `sql/scalastyle`.
- DataSourceV2MetadataOnlyViewSuite (3 test failures from CI):
* "ALTER VIEW preserves PROP_OWNER (v1-parity)": pre-seeded the view
via `catalog.createTable(viewIdent, viewInfo)`. After the rework,
TestingViewCatalog.createTable rejects ViewInfo (the new mixed-
catalog contract: views go through ViewCatalog.createView). Use
`createView` instead.
* "DROP VIEW on a ViewCatalog drops the view" /
"DROP VIEW on a StagingTableCatalog drops the view":
asserted catalog.tableExists(viewIdent) before/after DROP, but
tableExists now returns false for view-typed entries by design
(the table side ignores views). Use `viewExists` instead.
Co-authored-by: Isaac
…_NOT_VIEW

Two changes, symmetric to the existing DropViewExec:

1. Issue dropTable / purgeTable directly and inspect its boolean return, instead of probing tableExists upfront and ignoring dropTable's value. Cuts the happy-path RPC count from 2 (tableExists + dropTable) to 1. IF EXISTS semantics are unchanged: false return + ifExists=true -> no-op, false return + ifExists=false -> noSuchTableError.
2. On false return, if the catalog also implements ViewCatalog and a view sits at the ident, throw EXPECT_TABLE_NOT_VIEW.NO_ALTERNATIVE instead of the generic noSuchTable error. Restores v1 DropTableCommand(isView=false) parity for mixed catalogs (which in the new design return tableExists=false for view idents). The viewExists probe runs only on the slow path (would-be "not found"), so happy-path catalogs see no extra RPC.

invalidateCache now runs after the drop succeeds rather than before; this matches DropViewExec's ordering, and the cache invalidation only mattered when the drop succeeded anyway.

Co-authored-by: Isaac
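The boolean-decode flow this commit describes can be sketched with stand-in types (string identifiers and outcome labels in place of the real DropTableExec, catalog interfaces, and error classes):

```java
import java.util.HashSet;
import java.util.Set;

public class DropTableDecodeSketch {
    static Set<String> tables = new HashSet<>();
    static Set<String> views = new HashSet<>();

    // Stand-ins for TableCatalog.dropTable and ViewCatalog.viewExists.
    static boolean dropTable(String ident) { return tables.remove(ident); }
    static boolean viewExists(String ident) { return views.contains(ident); }

    // Returns a label describing the outcome, standing in for the exec's behavior.
    static String dropTableExec(String ident, boolean ifExists) {
        if (dropTable(ident)) return "DROPPED";               // happy path: 1 RPC
        if (viewExists(ident)) return "EXPECT_TABLE_NOT_VIEW"; // slow path: mixed catalog,
                                                               // a view sits at the ident
        return ifExists ? "NO_OP" : "NO_SUCH_TABLE";
    }

    public static void main(String[] args) {
        tables.add("t");
        views.add("v");
        System.out.println(dropTableExec("t", false));
        System.out.println(dropTableExec("v", false));
        System.out.println(dropTableExec("x", true));
        System.out.println(dropTableExec("x", false));
    }
}
```

The viewExists probe only fires once dropTable has returned false, so a plain successful DROP TABLE costs a single catalog call.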
…C opt-in)
Add a default `createOrReplaceView(ident, info)` method to ViewCatalog. The
default implementation tries `replaceView`, falling back to `createView` on
NoSuchViewException -- non-atomic across the two calls, but keeps plugin
authors free of the new method (override only when single-RPC atomic upsert
is worth the work, e.g. catalogs backed by `INSERT ... ON CONFLICT DO UPDATE`
or equivalent).
CreateV2ViewExec rework:
* Drop the upfront viewExists + tableExists probes. Just call createView /
createOrReplaceView and decode the catalog's response.
* Happy-path RPCs:
- CREATE VIEW (no collision): 1 (was 3)
- CREATE OR REPLACE VIEW (any state): 1 (was 3, plus the catch-and-fallback)
- CREATE VIEW IF NOT EXISTS over an existing view: 1 (early viewExists
short-circuit kept; matches v1 `CreateViewCommand.run` behavior of
skipping aliasPlan / config capture in the no-op path).
* Cross-type collision (mixed catalog with a table at ident): the catalog's
ViewAlreadyExistsException is caught, then a single tableExists check
decodes the situation. Plain CREATE / OR REPLACE throws
EXPECT_VIEW_NOT_TABLE; IF NOT EXISTS is a no-op (v1 parity).
Co-authored-by: Isaac
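The default createOrReplaceView described above (try replaceView, fall back to createView on NoSuchViewException) can be sketched with simplified stand-in types; the real API takes Identifier / ViewInfo rather than strings:

```java
import java.util.HashMap;
import java.util.Map;

public class CreateOrReplaceDefaultSketch {
    static class NoSuchViewException extends RuntimeException {}

    interface ViewCatalog {
        void createView(String ident, String text);
        void replaceView(String ident, String text); // throws NoSuchViewException if absent

        // Default: try replace first, fall back to create. Non-atomic across the
        // two calls; catalogs with a single-RPC atomic upsert should override.
        default void createOrReplaceView(String ident, String text) {
            try {
                replaceView(ident, text);
            } catch (NoSuchViewException e) {
                createView(ident, text);
            }
        }
    }

    static class InMemoryViewCatalog implements ViewCatalog {
        final Map<String, String> views = new HashMap<>();
        public void createView(String ident, String text) { views.put(ident, text); }
        public void replaceView(String ident, String text) {
            if (views.replace(ident, text) == null) throw new NoSuchViewException();
        }
    }

    public static void main(String[] args) {
        InMemoryViewCatalog c = new InMemoryViewCatalog();
        c.createOrReplaceView("db.v", "SELECT 1"); // view absent: falls back to createView
        c.createOrReplaceView("db.v", "SELECT 2"); // view present: replaceView succeeds
        System.out.println(c.views.get("db.v"));
    }
}
```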
ViewInfo.java:64 -- "remember withTableType(VIEW)" parses as "remember [the
noun-form method call]"; add the missing infinitive ("remember to call
withTableType(VIEW)") so the sentence reads cleanly.
Co-authored-by: Isaac
…s list
V2AlterViewPreparation scaladoc listed "schema mode" among the transient
fields re-captured from the session, contradicting the same scaladoc two
lines earlier ("preserve ... schema binding mode") and the implementation
at lines 78-86 (viewSchemaMode reads existingView.schemaMode -- preserved,
not re-captured).
Co-authored-by: Isaac
…bles and views

Reshapes the table-and-view catalog story around an explicit `RelationCatalog extends TableCatalog, ViewCatalog`. The new interface owns all the cross-cutting rules; `TableCatalog` and `ViewCatalog` revert to strict single-kind APIs.

Why: in the previous shape, `TableCatalog.loadTable` carried a perf opt-in (return `MetadataOnlyTable(ViewInfo)` for views). That leaked through `tableExists`'s default impl (which calls `loadTable`), silently breaking three independent framework call sites for any mixed catalog that used the opt-in without overriding `tableExists`. The new shape makes the perf opt-in explicit on a dedicated method and removes the leak by construction.

API:
- New `RelationCatalog` (`@Evolving`, since 4.2.0) with `loadRelation(ident)` and a default `listRelations(namespace)`. Class Javadoc owns the "two principles" (orthogonal interfaces + single identifier namespace) and the per-method cross-type contract tables (active rejection on writes, passive filtering on reads).
- `TableCatalog.loadTable` Javadoc tightened: tables only. The "may return MetadataOnlyTable wrapping ViewInfo" clause is gone. `tableExists`'s default impl is now correct under any catalog.
- `TableCatalog` and `ViewCatalog` class-level docs read as strict single-kind APIs and point mixed implementers at `RelationCatalog`. Per-method cross-type clauses are stripped from both interfaces.

Enforcement: `Catalogs.load` rejects any plugin that implements both `TableCatalog` and `ViewCatalog` directly without `RelationCatalog`, with a clear error message naming the right interface.

Resolver: `Analyzer.lookupTableOrView` and `RelationResolution.tryResolvePersistent` check `RelationCatalog` first and call `loadRelation` (one RPC); otherwise fall through to the existing `loadTable` -> `loadView` two-step.

Test catalogs:
- `TestingViewCatalog` extends `RelationCatalog`; `loadTable` is tightened to tables-only, view fixtures move to `loadRelation`.
- `TestingStagingCatalog` extends `StagingTableCatalog with RelationCatalog`.
- `TestingTableOnlyCatalog`'s dead view fixture is removed.

Co-authored-by: Isaac
…m loadRelation; rename listRelations -> listRelationSummaries

Two API refinements on top of the previous commit, both targeted at the implementer's experience:

1. Default impls on `RelationCatalog` for `loadTable`, `loadView`, `tableExists`, `viewExists` -- each derives from `loadRelation` by matching `MetadataOnlyTable + ViewInfo` and routing the result to the right kind. A `RelationCatalog` author writes the read-side lookup once (in `loadRelation`); the four kind-specific accessors come for free. Implementers can still override any of them for a cheaper kind-specific path; otherwise the defaults are correct by construction.

2. Rename `RelationCatalog.listRelations` to `RelationCatalog.listRelationSummaries`, mirroring the existing `TableCatalog.listTableSummaries` / `ViewCatalog.listViews` convention -- the method returns `TableSummary[]`, so "summaries" is the accurate noun. The unified entry point reads as the summary-of-relations counterpart to the kind-specific summary methods.

Test catalogs trimmed: `TestingViewCatalog` and `TestingStagingCatalog` now override only `loadRelation` on the read side; `loadTable`, `loadView`, `tableExists`, and (for the latter) `viewExists` use the new `RelationCatalog` defaults.

Co-authored-by: Isaac
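The derive-from-loadRelation defaults in point 1 can be sketched with stand-in types. The real API matches on MetadataOnlyTable wrapping ViewInfo; here a boolean kind flag stands in, and string identifiers replace Identifier:

```java
import java.util.HashMap;
import java.util.Map;

public class RelationCatalogSketch {
    // Stand-in for a loaded relation; isView replaces the ViewInfo instanceof check.
    static class Relation {
        final String name;
        final boolean isView;
        Relation(String name, boolean isView) { this.name = name; this.isView = isView; }
    }

    // One abstract method (loadRelation); the four kind-specific accessors are
    // derived defaults that filter by kind, so they are correct by construction.
    interface RelationCatalog {
        Relation loadRelation(String ident); // null if absent; the one method to implement

        default Relation loadTable(String ident) {
            Relation r = loadRelation(ident);
            return (r != null && !r.isView) ? r : null; // views filtered on the table side
        }
        default Relation loadView(String ident) {
            Relation r = loadRelation(ident);
            return (r != null && r.isView) ? r : null;  // tables filtered on the view side
        }
        default boolean tableExists(String ident) { return loadTable(ident) != null; }
        default boolean viewExists(String ident) { return loadView(ident) != null; }
    }

    public static void main(String[] args) {
        Map<String, Relation> store = new HashMap<>();
        store.put("t", new Relation("t", false));
        store.put("v", new Relation("v", true));
        RelationCatalog c = store::get; // the single lookup the implementer writes

        System.out.println(c.tableExists("t"));
        System.out.println(c.tableExists("v")); // false: a view never looks like a table
        System.out.println(c.viewExists("v"));
    }
}
```

Because tableExists is derived from the kind-filtered loadTable, the leak described in the previous commit (tableExists returning true for view identifiers) cannot reoccur.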
…rop TestingStagingCatalog
Three cleanups from a self-review pass:
1. Rename `TestingViewCatalog` to `TestingRelationCatalog`. The class
extends `RelationCatalog` and stores both tables and views in the
same backing map; the old name (a leftover from when the interface
was called `ViewCatalog`) suggested view-only and was misleading.
2. Drop `TestingStagingCatalog` and its 4 associated tests. The
fixture's only real assertion was that `stageCreate` /
`stageReplace` / `stageCreateOrReplace` throw "must not be invoked
by view DDL" -- a defense against a regression that the type
system already prevents (view DDL execs call
`createView` / `replaceView` / `dropView` directly on the
`ViewCatalog` half of the interface; the staging methods take
`TableInfo`, not `ViewInfo`, so they're unreachable from view DDL
by construction). The four CREATE / ALTER / DROP VIEW tests on
the staging catalog were duplicates of the corresponding tests on
`TestingRelationCatalog` -- same SQL, same assertions, the
`StagingTableCatalog` mixin contributes nothing observable.
3. Clarify three contract-claim inline comments that didn't accurately
describe the code below them:
- `TestingRelationCatalog.createTable` claimed the method rejected
when the ident was already a view, but the `instanceof ViewInfo`
check inspected the input parameter, not catalog state. The
actual rejection comes from the shared-keyspace `putIfAbsent`
collision below.
- `RelationResolution.tryResolvePersistent` said "the fallback is
gated on neither" -- ambiguous; clarified to "fires only when
both are absent".
- `DataSourceV2Strategy.ShowViews` said "this case only sees
non-session ViewCatalog catalogs"; a session-catalog override
that mixes in `ViewCatalog` would also reach this case.
View suite test count: 60 -> 56.
Co-authored-by: Isaac
…p; minor typo

Self-review findings:
- DropTableExec.run: removing the upfront tableExists() probe regressed `DROP TABLE IF EXISTS X PURGE` on missing tables when the catalog does not override TableCatalog.purgeTable (the default impl throws unconditionally). Pre-PR was a clean no-op; post-PR threw UNSUPPORTED_FEATURE.PURGE_TABLE. Restore the existence guard for the purge path; the perf gain on plain DROP TABLE is preserved. Pin the no-op with a v2 DropTableSuite test.
- DataSourceV2MetadataOnlyViewSuite: typo "name space" -> "namespace".

Co-authored-by: Isaac
…output

### What changes were proposed in this pull request?

Adds a diagnostic banner to the unidoc step in `docs/_plugins/build_api_docs.rb`. When `build/sbt unidoc` fails, the script now scans the captured sbt output and prints a framed summary naming the `<Class>.html` javadoc was generating when it died, the inferred source class to audit, and a one-paragraph hint about the usual scaladoc triggers.

Implementation:
- `stream_and_capture` tees sbt output to both stdout and `target/unidoc-build.log` (Ruby-only, no shell `pipefail` reliance).
- `diagnose_unidoc_failure` finds the last `Generating .../<Class>.html...` line before `javadoc exited with exit code N` and prints a culprit-pointer banner. ANSI colour codes are stripped before regex matching.
- When the failure mode doesn't match the mid-HTML-crash pattern (e.g. scaladoc failure, sbt env issue), the banner says so and points back to the full log.

### Why are the changes needed?

Today, when javadoc hard-exits during unidoc HTML generation -- typically because of a specific scaladoc construct (e.g. wiki-style `[[Class]]` links or backtick-inline code refs) in an exposed Scala source -- the failing PR's CI log shows ~100 `[error]` lines on `target/java/...` files. Those errors are benign: they're genjavadoc-emitted Java stubs (`static public abstract R apply(T1, T2, T3, T4)`) that every PR produces, and `javadoc` always complains about them but normally still finishes. They are not the cause of the failure. The actual signal is the last `Generating .../<Class>.html...` line before `javadoc exited with exit code 1`, which a developer has to find by hand in a multi-thousand-line log. The error reporting does not differentiate the benign noise from the real crash, so the failure consistently looks like it's "in" `ErrorInfo.java` / `LexicalThreadLocal.java` / similar, when it's actually in a Scala source that none of those names point to.

A recent example: PR #51419 hit this exact misdirection -- the log was full of errors on `common/utils/target/java/...` stubs, but the real culprit was a doc comment in `CatalogV2Implicits.IdentifierHelper` that triggered a hard exit during HTML generation. The diagnostic in this PR would have named that class directly.

### Does this PR introduce _any_ user-facing change?

No. CI-only output change visible in the unidoc step of the doc-gen job.

### How was this patch tested?

- Dry-ran the parser logic against the captured failing log from PR #51419 -- it correctly extracts `org/apache/spark/sql/connector/catalog/CatalogV2Implicits.IdentifierHelper.html` as the crash class.
- The second commit on this branch (`DO NOT MERGE: break a docstring to validate the unidoc diagnostic`) intentionally reintroduces the same `[[...]]`+backtick-inline scaladoc pattern in `CatalogV2Implicits.IdentifierHelper.asTableIdentifierOpt` so that this PR's CI run actually exercises the new path. Once the banner fires and names that class on the failing CI run, that commit will be dropped from this PR.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)

Closes #55548 from cloud-fan/javadoc-error-reporting.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
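The core extraction that diagnosis relies on (the actual implementation is Ruby in `docs/_plugins/build_api_docs.rb`) can be sketched in a few lines: keep the last "Generating ….html" line seen before the "javadoc exited with exit code" marker. The miniature log below is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnidocDiagnosticSketch {
    // Returns the .html path javadoc was generating when it died, or null when
    // the log doesn't match the mid-HTML-crash pattern.
    static String culprit(String log) {
        String lastGenerated = null;
        Pattern gen = Pattern.compile("Generating (\\S+\\.html)");
        for (String line : log.split("\n")) {
            line = line.replaceAll("\u001B\\[[0-9;]*m", ""); // strip ANSI colour codes
            Matcher m = gen.matcher(line);
            if (m.find()) {
                lastGenerated = m.group(1);
                continue;
            }
            if (line.contains("javadoc exited with exit code")) {
                return lastGenerated;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
            "[error] benign noise on target/java/Stub.java", // genjavadoc stub noise
            "Generating api/Foo.html...",
            "Generating api/Bar.html...",                     // the real signal
            "javadoc exited with exit code 1");
        System.out.println(culprit(log));
    }
}
```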
…le paths

CI failures revealed two regressions caused by removing the upfront `tableExists` probe in DropTableExec:

1. `JDBCTableCatalog.dropTable` does not honor the v2 contract of returning false for missing tables -- e.g. Derby raises a SQLSyntaxErrorException, surfaced as `FAILED_JDBC.DROP_TABLE` instead of a clean no-op. (DerbyTableCatalogSuite cleanup after RENAME hits this.)
2. The hive `DropTableSuite.hive client calls` test counts the upfront `tableExists` call.

Restore the original guard structure -- `tableExists` -> drop/purge if present, else fall through to the IF-EXISTS / view-fallback / not-found branches. This subsumes the previous purge-only fix and matches pre-PR behavior for `dropTable` while still adding the new `viewExists` fallback for the `EXPECT_TABLE_NOT_VIEW` error on mixed catalogs.

Co-authored-by: Isaac
Two lines exceeded the 100-char limit:
- Analyzer.scala:1112 in the lookupTableOrView Scaladoc.
- Catalogs.scala:123 in the validateRelationCatalog error message.

These cascaded into 7 CI failures (Maven Java 17/25, linters, docs, sparkr, Docker integration, TPC-DS) all rooted in `(catalyst / scalaStyleOnCompile) Failing because of negative scalastyle result`.

Co-authored-by: Isaac
`@inheritDoc` is only valid as an inline tag (`{@inheritdoc}`); using it
as a block tag is undefined. RelationCatalog had it written as the block
form on the four `default` overrides (loadTable, loadView, tableExists,
viewExists), which is the only such occurrence in the entire Spark Java
tree -- every other file uses `{@inheritdoc}`.
Suspected trigger of the recent unidoc failures: javadoc 17 dies in
"Building tree" with no per-class diagnostic when this pattern combines
with default-method overrides on an interface that inherits from two
parents. Surface symptom: `javadoc exited with exit code 1` after
"Building tree for all the packages and classes...", before any
per-class HTML generation begins (so SPARK-56630's diagnostic helper
correctly reports "no class HTML generation was in progress").
Co-authored-by: Isaac
…2 catalogs

Adds support for CREATE VIEW ... WITH METRICS and corresponding DROP VIEW routing on non-session DataSource V2 catalogs, on top of cloud-fan/spark:v1-v2 (PR apache#51419, "Add MetadataOnlyTable and CREATE/ALTER VIEW support for DS v2 catalogs").

What's new
* TableInfo gains a typed `viewDependencies` field (a structured dependency list does not round-trip cleanly through flat string properties).
* CatalogTableType.METRIC_VIEW and TableSummary.METRIC_VIEW_TABLE_TYPE.
* CreateMetricViewCommand's V2 path gates on TableCatalogCapability.SUPPORTS_VIEW, emits PROP_VIEW_TEXT (YAML) + PROP_TABLE_TYPE=METRIC_VIEW + PROP_VIEW_CURRENT_CATALOG_AND_NAMESPACE + view.sqlConfig.* + metric_view.* via the Builder introduced in apache#51419.
* DROP VIEW on V2 metric views routes to a new DropMetricViewExec; DROP TABLE on metric views is rejected with WRONG_COMMAND_FOR_OBJECT_TYPE. V1 DropTableCommand and V2SessionCatalog accept DROP VIEW on METRIC_VIEW.
* V1Table round-trips the METRIC_VIEW table type via PROP_TABLE_TYPE, including expanding PROP_VIEW_CURRENT_CATALOG_AND_NAMESPACE into the numbered view.catalogAndNamespace.* keys the V1 resolution path expects.

Stacks on apache#51419; keep this PR in draft until it merges.
…th explicit prose
The {@inheritdoc} -> block-form fix unblocked the obvious invalid-tag case,
but doc gen still fails in the same "Building tree" phase. Hypothesis:
{@inheritdoc} on a default-method override of a default method (the parent
override is itself default in TableCatalog/ViewCatalog) on a multi-parent
interface still trips javadoc 17's tree builder, even with the inline form.
Replace each of the four overrides' {@inheritdoc} with explicit @param /
@return / @throws so the doclet has no inheritance link to traverse.
The descriptive text below {@inheritdoc} is preserved verbatim. Behavioral
equivalent to inheriting + augmenting; just no tag chain.
If this push turns the docs job green, the {@inheritdoc} chain on multi-
parent default-method overrides is the trigger and we should keep this
shape; if it stays red, the trigger is elsewhere and we'll bisect further.
Co-authored-by: Isaac
…build crash
The unidoc failure on this PR has the same shape across every commit:
Building tree for all the packages and classes...
javadoc exited with exit code 1
No stack trace, no NPE keyword, no class name. Both the block-form and inline
{@inheritdoc} hypotheses turned out wrong (the block-fix and the explicit-prose
replacement both kept the failure). Master + this PR have identical sets of
source errors before "Building tree", so the difference must be a runtime
failure inside the doclet itself when traversing the new public types.
Add `-J-Xlog:exceptions=info` (java 11+) to the unidoc javadoc options so any
uncaught Throwable from the standard doclet's tree builder is dumped to stderr.
On the next CI run, this will show whether the crash is an NPE, an
IllegalStateException, or something else, and where it came from. Will revert
once we have a class/line to fix.
Co-authored-by: Isaac
What changes were proposed in this pull request?
This PR exposes a DS v2 API for metadata-only tables (read side), CREATE VIEW, and ALTER VIEW ... AS (write side) so that third-party v2 catalogs can participate in Spark's resolution and creation flows without reimplementing read/write themselves.
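As a taste of the write-side contract detailed in section 5, the act-then-decode error mapping for CREATE VIEW can be sketched with local stand-ins. Nothing below is Spark's real API; the classes are stubs and the outcome strings mirror the error classes named later in this description:

```java
// Hedged sketch (local stubs) of the act-then-decode pattern: act first
// (one RPC on the happy path), and only on a ViewAlreadyExists failure
// decode the collision with a single tableExists probe.
class CreateViewDecodeSketch {
    static class ViewAlreadyExistsException extends Exception { }

    interface Catalog {
        void createView(String ident) throws ViewAlreadyExistsException;
        boolean tableExists(String ident);
    }

    // Returns the outcome label a plain CREATE VIEW would surface.
    static String plainCreate(Catalog cat, String ident) {
        try {
            cat.createView(ident);
            return "CREATED";
        } catch (ViewAlreadyExistsException e) {
            // One probe decides which error to surface.
            return cat.tableExists(ident)
                ? "TABLE_OR_VIEW_ALREADY_EXISTS"  // a table occupies the identifier
                : "VIEW_ALREADY_EXISTS";          // lost a race to another view
        }
    }
}
```

The point of the shape: the probe runs only on the failure path, so a catalog that never collides pays exactly one RPC.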
The headline API choice is a new `RelationCatalog` interface that owns the cross-cutting rules for catalogs that expose both tables and views. `TableCatalog` and `ViewCatalog` remain strict single-kind APIs; mixing the two requires `RelationCatalog`. Details in section 4.

1. Read path —
`MetadataOnlyTable`:

A `Table` implementation that carries a `TableInfo` and delegates everything to it. Catalogs return it from `TableCatalog.loadTable` for data-source tables (Spark interprets the provider via V1 paths), and from `RelationCatalog.loadRelation` wrapping a `ViewInfo` for views (Spark expands the view text). Downstream consumers distinguish via `getTableInfo() instanceof ViewInfo`. `Analyzer.lookupTableOrView` and `RelationResolution.tryResolvePersistent` detect `MetadataOnlyTable` and route through a new `V1Table.toCatalogTable` adapter to the existing V1 data-source / view machinery.

2. Shared DTO —
`TableInfo`:

`TableInfo.Builder` gains convenience setters that write reserved keys into `properties`: `withProvider`, `withLocation`, `withComment`, `withCollation`, `withOwner`, `withTableType`, plus `withSchema(StructType)`. The read side (`MetadataOnlyTable`) and the write side (`createTable(ident, TableInfo)`) use the same struct. View-specific fields live on a typed subclass (see section 3) so they are not encoded as string properties. `withProperties` takes a defensive copy so the convenience setters don't mutate the caller's map.

3. Typed view DTO —
`ViewInfo`:

`ViewInfo extends TableInfo` and carries the view-specific fields that cannot be represented as string table properties: `queryText`, `currentCatalog`, `currentNamespace` (multi-part, never `null`; empty when no namespace was captured), `sqlConfigs` (unprefixed SQL config keys), `schemaMode` (BINDING / COMPENSATION / TYPE EVOLUTION / EVOLUTION), and `queryColumnNames` (mapping query output to the view's declared columns; empty in `EVOLUTION` mode). `ViewInfo.Builder extends TableInfo.BaseBuilder<Builder>` and adds typed setters: `withQueryText`, `withCurrentCatalog`, `withCurrentNamespace`, `withSqlConfigs`, `withSchemaMode`, `withQueryColumnNames`. The inherited `TableInfo.BaseBuilder` setters (schema, properties, owner, comment, collation, etc.) are available on the same builder, so view and table writes share one fluent API. The `ViewInfo` constructor stamps `PROP_TABLE_TYPE = TableSummary.VIEW_TABLE_TYPE` into `properties()` so catalogs and generic viewers reading `PROP_TABLE_TYPE` from the properties bag (e.g. the `TableCatalog.listTableSummaries` default impl, `DESCRIBE`) classify the entry as `VIEW` without requiring authors to remember `withTableType(VIEW)`. `ViewInfo` is the typed payload returned by `ViewCatalog.loadView` and accepted by `createView` / `replaceView`. It still extends `TableInfo` so a `RelationCatalog` can opt into the single-RPC perf path by returning a `MetadataOnlyTable(ViewInfo)` from `loadRelation` (see section 4); pure view-only catalogs never see `TableInfo` directly because the typed builder covers everything they construct.

4. API design —
`RelationCatalog`, `ViewCatalog`, the orthogonal split:

This PR's API surface is shaped by two principles:
* Orthogonality: every `TableCatalog` method behaves as if views did not exist; every `ViewCatalog` method behaves as if tables did not exist. A pure `TableCatalog` impl and a pure `ViewCatalog` impl never need to know about the other kind.
* Shared identifier namespace: tables and views live in a single identifier namespace, so cross-kind collisions are possible and need a defined behavior.
TableCatalog/ViewCatalogmethods, and the shared namespace forces cross-type rules to exist somewhere. That somewhere isRelationCatalog.ViewCatalog: plugin-facing API for views.listViews(namespace),loadView(ident)returningViewInfo,createView(ident, ViewInfo),replaceView(ident, ViewInfo),dropView(ident), defaultviewExists(ident), defaultinvalidateView(ident), defaultcreateOrReplaceView(ident, ViewInfo). The defaultcreateOrReplaceViewimpl triesreplaceViewand falls back tocreateViewonNoSuchViewException(two RPCs, non-atomic across the pair); catalogs that can do an atomic upsert in a single transactional call should override to collapse to one RPC and make the swap atomic. No staging variant —replaceViewis a single atomic-swap call. Class Javadoc reads as a strict view-only API and points mixed implementers atRelationCatalog.TableCatalog: unchanged surface, but theloadTableJavadoc reverts to strictly tables-only (the perf opt-in moved off this method, see below). Cross-type clauses scattered through method docs are removed; class doc points mixed implementers atRelationCatalog.RelationCatalog extends TableCatalog, ViewCatalog: the home for the combined case. Class Javadoc owns the two principles and two per-method contract tables — active rejection (5 write-side methods that throw on cross-type collision:createTable,renameTable,createView,createOrReplaceView,replaceView) and passive filtering (10 read / non-collision-mutation methods that behave as if the wrong kind doesn't exist). Catalogs that implementTableCatalogandViewCatalogdirectly withoutRelationCatalogare rejected atCatalogs.loadwith a clear pointer at the right interface.RelationCatalog:loadRelation(ident)— the resolver's per-identifier read path. Returns a regularTablefor a table or aMetadataOnlyTablewrapping aViewInfofor a view; saves the cold-cacheloadTable→loadViewtwo-step. 
Crucially, this lives only on `RelationCatalog` — `TableCatalog.loadTable` is strictly tables-only, which means `tableExists`'s default impl (`loadTable(ident) != null`) is correct under any catalog with no override required. The earlier shape (perf opt-in via `loadTable` returning `MetadataOnlyTable(ViewInfo)`) silently broke `tableExists`'s default impl and was a foot-gun on three independent framework call sites.

* `listRelationSummaries(namespace)` — unified listing of tables and views as `TableSummary[]` with kind preserved. The default impl unions `listTableSummaries` + `listViews` (two round trips); override to fetch in one.

`RelationCatalog` provides default impls of `loadTable`, `loadView`, `tableExists`, and `viewExists` that derive from `loadRelation` (a `MetadataOnlyTable` + `ViewInfo` discriminator routes the result to the right kind). A `RelationCatalog` author writes the read-side lookup once in `loadRelation`; the four kind-specific accessors come for free. They can still be overridden when a kind-specific path is materially cheaper than the unified one, but they're no longer required boilerplate. `Analyzer.lookupTableOrView` and `RelationResolution.tryResolvePersistent` check `RelationCatalog` first and call `loadRelation`. Otherwise they fall through to the existing two-step (`loadTable` if `TableCatalog`, then `loadView` if `ViewCatalog`). The `MISSING_CATALOG_ABILITY.VIEWS` gate in the resolver and `CheckViewReferences` for CREATE / ALTER VIEW continue to fire for catalogs that aren't `ViewCatalog` (and therefore aren't `RelationCatalog`).

5. Write path — DS v2 CREATE VIEW:
`DataSourceV2Strategy` routes `CreateView(ResolvedIdentifier(catalog, ident), …)` to `CreateV2ViewExec(catalog: ViewCatalog, …)`, which uses an act-then-decode pattern to keep the happy path at one RPC: plain CREATE calls `createView`; CREATE OR REPLACE calls `createOrReplaceView` (one RPC for catalogs that override the default; two for those that don't). CREATE VIEW IF NOT EXISTS short-circuits via an upfront `viewExists` probe so the view body isn't analyzed when the view already exists (matches v1 `CreateViewCommand.run`). On `ViewAlreadyExistsException` from the catalog (cross-type collision in a `RelationCatalog`, or a race for plain CREATE / OR REPLACE), the exec decodes with one `tableExists` probe: if a table is at the identifier, plain CREATE surfaces `TABLE_OR_VIEW_ALREADY_EXISTS` and CREATE OR REPLACE surfaces `EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE`, while IF NOT EXISTS is a no-op (v1-parity); if a view is at the identifier, plain CREATE re-raises as `viewAlreadyExists` and IF NOT EXISTS is a no-op (lost the race for the same view). Because `tableExists` is now contract-clean (no perf-opt-in leak), the decoder gives the right error in every catalog shape. The `ViewInfo` is built via a `V2ViewPreparation` trait reusing v1 `ViewHelper` helpers (`aliasPlan`, `sqlConfigsToProps`) to populate a `ViewInfo.Builder` with the current session's captured catalog/namespace and SQL configs. Cyclic-reference detection and auto-generated-alias rejection run once at analysis time in `CheckViewReferences` (see section 7). The `CreateView` logical plan extends `AnalysisOnlyCommand` (same shape as `V2CreateTableAsSelectPlan`) so `HandleSpecialCommand.markAsAnalyzed` captures `referredTempFunctions` from `AnalysisContext`. The v1 rewriting path (`ResolveSessionCatalog` → `CreateViewCommand`) is unchanged.

6. Write path — DS v2 ALTER VIEW ... AS:
The `AlterViewAs` logical plan also extends `AnalysisOnlyCommand` so `referredTempFunctions` is captured for the non-session path. `DataSourceV2Strategy` routes `AlterViewAs(ResolvedPersistentView(catalog, ident, _), …)` to `AlterV2ViewExec(catalog: ViewCatalog, …)`, which calls `replaceView` (the single atomic-swap entry point — no separate staging variant, since view REPLACE writes only metadata). A `V2AlterViewPreparation` trait (extends `V2ViewPreparation`) calls `catalog.loadView(ident)` once and uses the result to preserve user TBLPROPERTIES, comment, collation, owner, and schema-binding mode when constructing the replacement `ViewInfo`. Session-scoped fields (SQL configs, query column names) are re-emitted by `buildViewInfo()` from the active `SparkSession`, matching v1 `AlterViewAsCommand.alterPermanentView`. A racing DDL between analysis and exec (the view dropped, or replaced with a non-view table in a `RelationCatalog`) surfaces `NoSuchViewException` / `EXPECT_VIEW_NOT_TABLE` rather than a stale-resolution error. `ResolvedViewIdentifier.unapply` (in `ResolveSessionCatalog`) replaces its `assert(isSessionCatalog)` with an `if isSessionCatalog` guard so non-session `ResolvedPersistentView` plans fall through to the v2 strategy instead of tripping the assertion.

7. Post-analysis check —
`CheckViewReferences`:

Registered in `BaseSessionStateBuilder.extendedCheckRules`. Rejects permanent views that reference temporary objects and rejects view bodies with auto-generated aliases, for both `CreateView` and `AlterViewAs` (v2 paths). v1 `CreateViewCommand` / `AlterViewAsCommand` keep their existing exec-time safety net — Dataset-built commands can be constructed with `isAnalyzed=true` directly and bypass the analyzer's re-capture path.

8. Listing —
`SHOW TABLES` / `SHOW VIEWS`:

`TableCatalog.listTables` returns table identifiers only — views (in a `RelationCatalog`) are listed separately via `ViewCatalog.listViews`. `listTableSummaries`'s default impl enumerates via `listTables` + `loadTable` and returns one summary per table. This is an intentional v2 divergence from v1 `SHOW TABLES`, which includes both tables and views; `RelationCatalog.listRelationSummaries` is the unified entry point if a `RelationCatalog` wants to expose both in one call. Routing the `SHOW TABLES` SQL command through `listRelationSummaries` for v1-parity output is left as a follow-up so this PR's API surface stays narrowly scoped. `SHOW VIEWS` on a non-session `ViewCatalog` is routed through a new `ShowViewsExec` that enumerates via `ViewCatalog.listViews(namespace)`. `ResolveSessionCatalog.ShowViews` skips (via guard) for `ViewCatalog` catalogs so they fall through to this strategy; non-session, non-`ViewCatalog` catalogs still hit the existing `MISSING_CATALOG_ABILITY.VIEWS` rejection. v2 catalogs have no temp views, so the `isTemporary` column is always false (mirroring v1, which only sets it true for local/global temp views).

Why are the changes needed?
A v2 `Table` is not always backed by a connector that implements read/write. Catalogs like HMS and Unity Catalog store only metadata and rely on Spark to interpret the table provider as a data source or to execute the view SQL. Previously the only way to achieve that was a hack around `V1Table`, which leaks private v1 types into v2 connectors (example: https://github.com/unitycatalog/unitycatalog/blob/main/connectors/spark/src/main/scala/io/unitycatalog/spark/UCSingleCatalog.scala).

Separately, v2 catalogs had no public way to handle CREATE VIEW or ALTER VIEW.
`ResolveSessionCatalog` rejected CREATE VIEW on any non-session catalog with `MISSING_CATALOG_ABILITY.VIEWS`, so third-party catalogs could not own the view lifecycle at all.

The new
`ViewCatalog` interface gives catalogs a clean view-shaped API, and the new `RelationCatalog` interface (extending both `TableCatalog` and `ViewCatalog`) is the single home for the cross-cutting rules of the combined case — orthogonality, the shared identifier namespace, the per-method active-rejection / passive-filtering tables, and the single-RPC perf entry points (`loadRelation`, `listRelationSummaries`). This separation keeps the `TableCatalog` and `ViewCatalog` Javadocs as strict single-kind APIs (a connector author writing only one of them never sees cross-type clauses), and gives `RelationCatalog` implementers a single document with the full contract.

Does this PR introduce any user-facing change?
Yes to connector developers:
* `TableCatalog` implementations can now return a `MetadataOnlyTable` from `loadTable` to delegate reads to Spark.
* A new public `ViewCatalog` interface (`listViews` / `loadView` / `createView` / `replaceView` / `dropView`, plus the default helpers). Connectors that expose both tables and views implement `RelationCatalog` (which extends both `TableCatalog` and `ViewCatalog`); implementing the two interfaces directly is rejected at catalog initialization. View text, schema, captured current catalog+namespace, SQL configs, and temp-object-reference rejection are handled the same way as for session-catalog views.

No SQL-level or user-visible behavior change for existing deployments.
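As an illustration of what a connector inherits from the default helpers, here is a map-backed stub of the replace-then-create fallback behind `createOrReplaceView`. The class and exception below are local stand-ins, not Spark's actual `ViewCatalog`:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the default createOrReplaceView helper: try replaceView,
// fall back to createView when the view is missing. Two calls, non-atomic
// across the pair; a catalog with an atomic upsert should override.
class ViewCatalogSketch {
    static class NoSuchViewException extends Exception { }

    final Map<String, String> views = new HashMap<>(); // ident -> view text

    void createView(String ident, String text) { views.put(ident, text); }

    void replaceView(String ident, String text) throws NoSuchViewException {
        if (!views.containsKey(ident)) throw new NoSuchViewException();
        views.put(ident, text); // single atomic swap in the real contract
    }

    // Default helper shape: replace first, create on miss.
    void createOrReplaceView(String ident, String text) {
        try {
            replaceView(ident, text);
        } catch (NoSuchViewException e) {
            createView(ident, text);
        }
    }
}
```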
Remaining work (follow-up PRs)
This PR covers the core read path, CREATE VIEW (all shapes), `ALTER VIEW ... AS`, `DROP VIEW`, and `SHOW VIEWS`. The following view-scoped plans for DS v2 catalogs are not yet supported and are tracked for follow-ups. Until the follow-ups land, each currently surfaces a clean `UNSUPPORTED_FEATURE.TABLE_OPERATION` error (wired up in `DataSourceV2Strategy` and pinned by tests in `DataSourceV2MetadataOnlyViewSuite`), so users get a meaningful message rather than a generic planner failure:

* `ALTER VIEW ... SET/UNSET TBLPROPERTIES` — separate logical plans (`SetViewProperties`, `UnsetViewProperties`); need their own `DataSourceV2Strategy` cases backed by new `TableChange` routing.
* `ALTER VIEW ... RENAME TO` — `RenameTable` at the logical level; needs v2 view awareness and the catalog-side rename semantics.
* `ALTER VIEW ... WITH SCHEMA BINDING` — `AlterViewSchemaBinding` logical plan; needs the same treatment as `ALTER VIEW AS` (`AnalysisOnlyCommand` shape + v2 exec).
* `DESCRIBE` / `SHOW CREATE TABLE` / `SHOW TBLPROPERTIES` / `SHOW COLUMNS` on v2 views — currently route through `ResolvedViewIdentifier`, which only matches session-catalog views; the v2 equivalents need dedicated handling.
* `SHOW TABLES` v1-parity output (include views) on a v2 catalog — `TableCatalog.listTables` now intentionally returns tables only; routing the SQL `SHOW TABLES` through `RelationCatalog.listRelationSummaries` for v1-parity is a separate piece of work.

How was this patch tested?
Two new test suites: `DataSourceV2MetadataOnlyTableSuite` (4 tests, table-side) and `DataSourceV2MetadataOnlyViewSuite` (56 tests, view-side + mixed-catalog).

`DataSourceV2MetadataOnlyTableSuite` — table side

* `MetadataOnlyTable` round-trip through SELECT / INSERT / INSERT OVERWRITE.
* `DESCRIBE TABLE EXTENDED`: the `Name` row shows the real catalog-qualified identifier, not the wrapper class.
* `catalog.schema.table.col` references resolve correctly through a `MetadataOnlyTable`. The mechanism: `V1Table.toCatalogTable` sets `CatalogTable.multipartIdentifier = [catalog, namespace…, table]`, and the `SessionCatalog` change in this PR makes `getRelation` prefer that over the hardcoded `spark_catalog` qualifier when wrapping in a `SubqueryAlias`. (1-part and 2-part references already worked via last-part suffix matching; the test pins both cases plus the 3-part case to make the alias-source change visible in the diff.)

`DataSourceV2MetadataOnlyViewSuite` — view side + mixed catalogs
TestingRelationCatalog(RelationCatalog),TestingViewOnlyCatalog(pureViewCatalog),TestingTableOnlyCatalog(pureTableCatalog).currentCatalog+currentNamespace; pureViewCatalogread path (noTableCatalogmixin) confirms the resolver'sloadViewfallback fires whenloadTableis skipped.V1Table.toCatalogTable(ViewInfo)round-trip — multi-part captured namespace flows through structurally (including parts containing dots, which would break a string-encoded path); absent-currentCatalogbranch yields an emptyviewCatalogAndNamespace.CREATE VIEW IF NOT EXISTSno-op vs. failure;CREATE OR REPLACE VIEWreplacement (including cyclic-reference detection); too-few / too-many user-specified columns; rejection of references to temp function / temp view / temp variable;DEFAULT COLLATIONpropagation intoViewInfo; cross-type collision over a non-view table entry surfacesTABLE_OR_VIEW_ALREADY_EXISTS(plain) /EXPECT_VIEW_NOT_TABLE.NO_ALTERNATIVE(OR REPLACE) / no-op (IF NOT EXISTS) — v1-parity;PROP_OWNERstamped on storedTableInfo; rejection on aTableCatalog-only catalog withMISSING_CATALOG_ABILITY.VIEWS.PROP_OWNER, andSCHEMA EVOLUTIONbinding mode (v1-parity); re-captures the current session's SQL configs; cyclic-reference detection; rejection on aTableCatalog-only catalog withMISSING_CATALOG_ABILITY.VIEWS; pureViewCatalogflow;CREATE OR REPLACE VIEWwhose new body references a nonexistent table fails at the right phase.ViewCatalog;DROP VIEW IF EXISTSis a no-op on missing; rejection on a non-view table entry (v1-parity,EXPECT_VIEW_NOT_TABLE); rejection on a non-ViewCatalogcatalog.SHOW TABLESon a v2 catalog returns tables only;SHOW VIEWSreturns views only (isTemporary=falsethroughout for v2);SHOW VIEWS … LIKEfilters by name; rejection on a non-ViewCatalogcatalog.ALTER VIEW SET/UNSET TBLPROPERTIES,ALTER VIEW WITH SCHEMA,ALTER VIEW RENAME TO,SHOW CREATE TABLE,SHOW TBLPROPERTIES,SHOW COLUMNS,DESCRIBE TABLE,REFRESH TABLE,ANALYZE TABLE,ANALYZE TABLE … FOR COLUMNSagainst a v2 view all 
surfaceUNSUPPORTED_FEATURE.TABLE_OPERATION;DESCRIBE TABLE … COLUMNsurfaces a cleanAnalysisException. Pins the current failure mode so a future regression to a generic planner error is caught in the diff.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude (Anthropic)