SQL: struct support #586
Open: kbatuigas wants to merge 9 commits into rp-sql from DOC-2019-document-feature-record-structure-type-support
+308
−9
9 commits
141e80f Draft struct support reference (kbatuigas)
7018420 How to query structs/nested fields (kbatuigas)
726a48d Review pass (kbatuigas)
b97b464 Apply suggestions from SME review (kbatuigas)
2099069 Apply suggestions from SME review (kbatuigas)
5463590 Apply suggestions from code review (kbatuigas)
f945fff Review pass (kbatuigas)
b0041ed Apply suggestions from code review (kbatuigas)
2730222 Apply suggestions (kbatuigas)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,136 @@ | ||
| = Query Topics with Nested Fields | ||
| :description: Map a topic's nested fields to typed SQL columns and query them by name. | ||
| :page-topic-type: how-to | ||
| :personas: app_developer, data_engineer | ||
| :learning-objective-1: Map a topic with a nested schema as a SQL table using struct_mapping_policy = 'COMPOUND' | ||
| :learning-objective-2: Query nested fields using ROW field-access syntax | ||
| :learning-objective-3: Resolve cyclic-reference errors | ||
|
|
||
| When a glossterm:topic[]'s schema includes nested Protobuf, Avro, or JSON message types, you can map those nested structures as user-defined types (UDTs) with named fields. You query UDT columns with SQL `ROW` field-access syntax instead of parsing opaque JSON at query time, so you can reference nested fields by name in projections and in `WHERE`, `GROUP BY`, and `ORDER BY` clauses. | ||
|
|
||
| Use this page to: | ||
|
|
||
| * [ ] {learning-objective-1} | ||
| * [ ] {learning-objective-2} | ||
| * [ ] {learning-objective-3} | ||
|
|
||
| == Prerequisites | ||
|
|
||
| Before you query a topic with nested fields: | ||
|
|
||
| * xref:sql:get-started/deploy-sql-cluster.adoc[Enable Redpanda SQL] on your Redpanda Bring Your Own Cloud (BYOC) cluster. | ||
| * xref:sql:connect-to-sql/index.adoc[Connect to Redpanda SQL] with `psql` or another PostgreSQL client. | ||
| * Register a schema for the topic in glossterm:schema-registry[Schema Registry], including one or more nested message types. | ||
| * Make sure the topic's data is reachable through a Redpanda catalog. The `default_redpanda_catalog` is created and linked for you when Redpanda SQL is enabled. | ||
|
|
||
| == Map the topic as a SQL table | ||
|
|
||
| Create the SQL table with `struct_mapping_policy = 'COMPOUND'` to surface each nested message as a user-defined type column: | ||
|
|
||
| [source,sql] | ||
| ---- | ||
| CREATE TABLE default_redpanda_catalog=>orders WITH ( | ||
| topic = 'orders', <1> | ||
| schema_subject = 'orders-value', <2> | ||
| struct_mapping_policy = 'COMPOUND' <3> | ||
| ); | ||
| ---- | ||
| <1> Required. The Redpanda topic to map. | ||
| <2> Optional. The Schema Registry subject. Defaults to `<topic>-value` when omitted. | ||
| <3> Optional. Defaults to `'COMPOUND'`, which surfaces nested structures as user-defined types. | ||
|
|
||
| Replace `orders` with your topic name. Your topic must have a schema registered in Schema Registry. For details on the `schema_subject` option, see xref:reference:sql/sql-statements/create-table.adoc[CREATE TABLE]. | ||
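If you create many of these tables from a script, the option defaults described in the callouts above can be captured in a small helper. The following is a minimal sketch, not part of Redpanda SQL: `create_table_sql` is a hypothetical function, and it assumes the table name matches the topic name, as in the `orders` example.

```python
import re

# Conservative identifier check (an assumption for this sketch, not a
# Redpanda SQL rule): letters, digits, and underscores only.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_table_sql(topic, schema_subject=None,
                     struct_mapping_policy="COMPOUND",
                     catalog="default_redpanda_catalog"):
    """Render a CREATE TABLE statement shaped like the example above.

    Hypothetical helper: it only builds the SQL text and does not
    connect to or validate anything against a cluster.
    """
    if struct_mapping_policy not in ("COMPOUND", "JSON"):
        raise ValueError("struct_mapping_policy must be 'COMPOUND' or 'JSON'")
    if not _IDENT.match(topic) or not _IDENT.match(catalog):
        raise ValueError("topic and catalog must be simple identifiers")
    # Mirror the documented default: <topic>-value when no subject is given.
    subject = schema_subject if schema_subject is not None else f"{topic}-value"
    if "'" in subject:
        raise ValueError("schema_subject must not contain a single quote")
    return (
        f"CREATE TABLE {catalog}=>{topic} WITH (\n"
        f"  topic = '{topic}',\n"
        f"  schema_subject = '{subject}',\n"
        f"  struct_mapping_policy = '{struct_mapping_policy}'\n"
        f");"
    )


if __name__ == "__main__":
    print(create_table_sql("orders"))
```

Spelling out `schema_subject` and `struct_mapping_policy` even when they match the defaults keeps the generated statement explicit about how the topic is mapped.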
|
|
||
| For a topic schema with this Protobuf definition: | ||
|
|
||
| [source,proto] | ||
| ---- | ||
| message Order { | ||
| string order_id = 1; | ||
| Customer customer = 2; | ||
| double amount = 3; | ||
| } | ||
|
|
||
| message Customer { | ||
| string customer_id = 1; | ||
| string name = 2; | ||
| string region = 3; | ||
| } | ||
| ---- | ||
|
|
||
| Redpanda SQL maps the topic to a table with three columns: `order_id` (text), `customer` (a user-defined type with fields `customer_id`, `name`, and `region`), and `amount` (double precision). | ||
|
|
||
| TIP: `COMPOUND` is the default `struct_mapping_policy`. To map nested structures as opaque JSON instead, use `struct_mapping_policy = 'JSON'`. Cyclic types require `struct_mapping_policy = 'JSON'`. See <<handle-recursive-cyclic-schemas, Handle recursive (cyclic) schemas>>. | ||
|
|
||
| == Query nested fields | ||
|
|
||
| Access a nested field by its declared name using the `(column).field` form, with the column name wrapped in parentheses: | ||
|
|
||
| [source,sql] | ||
| ---- | ||
| SELECT order_id, (customer).name, (customer).region, amount | ||
| FROM default_redpanda_catalog=>orders | ||
| WHERE (customer).region = 'EMEA'; | ||
| ---- | ||
|
|
||
| To project every field of a nested structure as separate result columns, use the wildcard `.*` form: | ||
|
|
||
| [source,sql] | ||
| ---- | ||
| SELECT order_id, (customer).* | ||
| FROM default_redpanda_catalog=>orders | ||
| LIMIT 10; | ||
| ---- | ||
|
|
||
| For schemas with multiple levels of nesting, chain the parenthesized field access. For example, if `Customer` itself contained a nested `address` message with a `zip_code` field, you would query the zip code as: | ||
|
|
||
| [source,sql] | ||
| ---- | ||
| SELECT ((customer).address).zip_code FROM default_redpanda_catalog=>orders; | ||
| ---- | ||
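The chaining rule is mechanical: each additional level wraps the expression built so far in parentheses and appends the next field name. If you generate queries programmatically, a helper along these lines can build the expression. This is an illustrative sketch only. `field_access` is a hypothetical name, and the identifier check is a conservative assumption rather than a Redpanda SQL rule.

```python
import re

# Assumed identifier shape for this sketch: letters, digits, underscores.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def field_access(column, *path):
    """Build a parenthesized field-access expression for a nested field.

    Each step wraps the expression so far in parentheses and appends
    the next field, matching the (column).field form shown above.
    """
    for name in (column, *path):
        if not _IDENT.match(name):
            raise ValueError(f"not a simple identifier: {name!r}")
    expr = column
    for field in path:
        expr = f"({expr}).{field}"
    return expr


if __name__ == "__main__":
    print(field_access("customer", "name"))
    print(field_access("customer", "address", "zip_code"))
```

For example, `field_access("customer", "address", "zip_code")` returns `((customer).address).zip_code`, the same expression used in the query above.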
|
|
||
| For the full `ROW` reference, including comparison operators, NULL handling, and `::text` casting, see xref:reference:sql/sql-data-types/row.adoc[ROW]. | ||
|
|
||
| [[handle-recursive-cyclic-schemas]] | ||
| == Handle recursive (cyclic) schemas | ||
|
|
||
| The `COMPOUND` policy does not support recursive (cyclic) schemas, such as a `Comment` message that references itself or two messages that reference each other. Trying to map such a schema with `COMPOUND` fails at table-creation time with the following error: | ||
|
|
||
| [source,text] | ||
| ---- | ||
| Cyclic reference at '<parent>.<field>' → '<type>'. Cyclic types are not supported in COMPOUND struct mapping policy; use struct_mapping_policy=JSON for recursive types. | ||
| ---- | ||
|
|
||
| Re-create the table with `struct_mapping_policy = 'JSON'`. In JSON mode, Redpanda SQL stores each nested structure as a JSON value: | ||
|
|
||
| [source,sql] | ||
| ---- | ||
| CREATE TABLE default_redpanda_catalog=>comments WITH ( | ||
| topic = 'comments', | ||
| struct_mapping_policy = 'JSON' | ||
| ); | ||
| ---- | ||
|
|
||
| Query JSON-mapped fields with standard JSON functions instead of `ROW` field access. See xref:reference:sql/sql-data-types/json.adoc[JSON]. | ||
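To see ahead of time whether a schema is likely to hit the cyclic-reference error, you can check the message-reference graph yourself. The sketch below is a hypothetical pre-check, not how Redpanda SQL implements its own validation. It assumes you supply a dictionary that maps each message name to the message types its fields reference, and the names `find_cycle` and `choose_policy` are illustrative.

```python
def find_cycle(references):
    """Return one reference cycle as a list of message names, or None.

    `references` maps a message name to the message types its fields
    reference, for example {"Order": ["Customer"], "Customer": []}.
    Uses a depth-first search that tracks the current path.
    """
    path = []      # messages on the current DFS path, in visit order
    done = set()   # messages fully explored with no cycle found

    def visit(name):
        if name in done:
            return None
        if name in path:
            # Reached a message already on the path: report the loop.
            return path[path.index(name):] + [name]
        path.append(name)
        for target in references.get(name, ()):
            cycle = visit(target)
            if cycle:
                return cycle
        path.pop()
        done.add(name)
        return None

    for name in references:
        cycle = visit(name)
        if cycle:
            return cycle
    return None


def choose_policy(references):
    """Suggest 'JSON' for recursive schemas and 'COMPOUND' otherwise."""
    return "JSON" if find_cycle(references) else "COMPOUND"


if __name__ == "__main__":
    print(choose_policy({"Order": ["Customer"], "Customer": []}))
    print(find_cycle({"Comment": ["Comment"]}))
```

A self-referencing `Comment` message and a pair of messages that reference each other are both reported as cycles, which corresponds to the two cases described in this section.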
|
|
||
| == Choose between COMPOUND and JSON | ||
|
|
||
| [cols="<20%,<40%,<40%",options="header"] | ||
| |=== | ||
| | Policy | Use when | Trade-offs | ||
|
|
||
| | `COMPOUND` (default) | ||
| | The topic schema has nested structures that are not recursive, and you want to query nested fields directly by name. | ||
| | Typed access; usable in `WHERE`, `GROUP BY`, `ORDER BY`. Required if you xref:sql:query-data/query-iceberg-topics.adoc[query an Iceberg-enabled topic via a linked Redpanda catalog], so that nested fields stay typed across both live records and the topic's Iceberg history. | ||
|
|
||
| | `JSON` | ||
| | The topic schema is recursive, or you prefer flexible access through JSON functions. | ||
| | Recursive types supported; fields are untyped until extracted with JSON functions. Queries that span the Redpanda topic and its linked Iceberg table do not align cleanly, because Iceberg always exposes nested structures as typed columns. | ||
| |=== | ||
|
|
||
| == Suggested reading | ||
|
|
||
| * xref:sql:query-data/query-streaming-topics.adoc[Query streaming topics]: Query a topic without Iceberg history. | ||
| * xref:sql:query-data/query-iceberg-topics.adoc[Query Iceberg-enabled topics]: Query a topic with both its live streaming data and Iceberg history. Use `struct_mapping_policy = 'COMPOUND'` so nested fields align between the Redpanda topic and the linked Iceberg table. | ||
| * xref:reference:sql/sql-data-types/row.adoc[ROW]: Full reference for the `ROW` data type, including comparisons, NULL semantics, and conversion to text. | ||
| * xref:reference:sql/sql-statements/create-table.adoc[CREATE TABLE]: Complete option list for mapping a Redpanda topic to a SQL table. | ||
@grzebiel this warning would imply something very important to alert the user to (that we don't really support querying Iceberg topics with recursive types). However, I don't think this is the correct message here (at least not always), because Iceberg topics have special handling that encodes recursive Protobuf Struct fields as a JSON string in the Iceberg table. So we do have a story for recursive fields, at least in the Protobuf case.
So, how should this be adjusted?