diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index 4d1521d6f..416cb430a 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -356,6 +356,7 @@ *** xref:sql:query-data/redpanda-catalogs.adoc[Redpanda Catalogs] *** xref:sql:query-data/query-streaming-topics.adoc[Query Streaming Topics] *** xref:sql:query-data/query-iceberg-topics.adoc[Query Iceberg-enabled Topics] +*** xref:sql:query-data/query-nested-fields.adoc[Query Topics with Nested Fields] ** xref:sql:manage/index.adoc[Manage Redpanda SQL] *** xref:sql:manage/manage-access.adoc[Manage access] ** xref:sql:troubleshoot/index.adoc[Troubleshoot] diff --git a/modules/reference/pages/sql/sql-data-types/row.adoc b/modules/reference/pages/sql/sql-data-types/row.adoc index bc55a2dac..7fa37cd76 100644 --- a/modules/reference/pages/sql/sql-data-types/row.adoc +++ b/modules/reference/pages/sql/sql-data-types/row.adoc @@ -2,7 +2,7 @@ :description: The ROW data type represents a composite value containing one or more fields of different types. :page-topic-type: reference -The `ROW` data type represents a composite value (also known as a struct or record) containing one or more fields of different types. +The `ROW` data type represents a composite value (also known as a struct or record) containing one or more fields of different types. `ROW` values support field access, lexicographic comparison, NULL checks, conversion to text, and use in `GROUP BY`, `ORDER BY`, and `JOIN` clauses. == Syntax @@ -75,3 +75,167 @@ SELECT ROW(); () (1 row) ---- + +== Access fields + +=== Access by position + +For anonymous ROW expressions, fields are accessed by the positional names `f1`, `f2`, and so on, in declaration order: + +[source,sql] +---- +SELECT (ROW(1, 'hello', 3.14)).f1, (ROW(1, 'hello', 3.14)).f2; +---- + +[source,sql] +---- + f1 | f2 +----+------- + 1 | hello +(1 row) +---- + +The parentheses around the ROW expression are required when accessing a field. 
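+
+As a further illustration of positional access, the third field of the same ROW is addressed as `f3`. This sketch relies only on the naming rule above, and the alias `third_field` is illustrative:
+
+[source,sql]
+----
+-- Positional names follow declaration order: f1, f2, f3, and so on.
+SELECT (ROW(1, 'hello', 3.14)).f3 AS third_field;
+----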
+ +=== Access by name + +For composite columns with declared field names, for example, columns mapped from a topic with `struct_mapping_policy = 'COMPOUND'` (see xref:reference:sql/sql-statements/create-table.adoc[CREATE TABLE]), access fields by their declared names: + +[source,sql] +---- +SELECT (record).customer_id, (record).order_total FROM orders; +---- + +=== Expand all fields with a wildcard + +To project every field of a ROW value as a separate result column, use the `.*` form: + +[source,sql] +---- +SELECT (ROW(1, 'hello', 3.14)).*; +---- + +[source,sql] +---- + f1 | f2 | f3 +----+-------+------ + 1 | hello | 3.14 +(1 row) +---- + +The wildcard form also works inside a `ROW(...)` constructor to copy fields from one composite into another. + +== Compare ROW values + +ROW values support the standard comparison operators `=`, `<>`, `<`, `<=`, `>`, and `>=`. Comparison is *lexicographic*: fields are compared in order, left to right, and the first differing field determines the result. + +[source,sql] +---- +SELECT ROW(1, 'a') < ROW(1, 'b'); +---- + +[source,sql] +---- + ?column? +---------- + t +(1 row) +---- + +Both ROW values must have the same number of fields, and corresponding fields must have comparable types. + +== Check for NULL + +ROW values support `IS NULL` and `IS NOT NULL`, but with semantics that differ from scalar columns: + +* `expression IS NULL` returns `true` when the expression itself is NULL, *or* when all of the row's fields are NULL. +* `expression IS NOT NULL` returns `true` when the expression itself is non-NULL *and* all of the row's fields are non-NULL. + +Because of this, `IS NULL` and `IS NOT NULL` are not always inverses for ROW values. Both can return `false` for the same input, such as a ROW with a mix of NULL and non-NULL fields. 
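+
+The all-NULL case from the first rule can be sketched as follows. This is an illustrative query, not captured output: by the rules above, `is_null` should be `true` and `is_not_null` should be `false`. Whether untyped `NULL` literals are accepted inside `ROW(...)` without a cast is an assumption here.
+
+[source,sql]
+----
+-- Every field is NULL, so IS NULL holds and IS NOT NULL does not.
+SELECT ROW(NULL, NULL) IS NULL     AS is_null,
+       ROW(NULL, NULL) IS NOT NULL AS is_not_null;
+----
+
+By contrast, when every field is non-NULL, the two checks are inverses again: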
+ +[source,sql] +---- +SELECT ROW(1, 'a') IS NULL AS is_null, + ROW(1, 'a') IS NOT NULL AS is_not_null; +---- + +[source,sql] +---- + is_null | is_not_null +---------+------------- + f | t +(1 row) +---- + +For a ROW with at least one NULL field and at least one non-NULL field, both checks return `false`: + +[source,sql] +---- +SELECT ROW(NULL, 'a') IS NULL AS is_null, + ROW(NULL, 'a') IS NOT NULL AS is_not_null; +---- + +[source,sql] +---- + is_null | is_not_null +---------+------------- + f | f +(1 row) +---- + +NOTE: These checks do not recurse into nested ROW values. A nested ROW with all-NULL fields counts as a value (not NULL) at the outer level, so the outer `IS NULL` returns `false`. To check a specific nested field directly, access the field and test that. + +== Convert to text + +Cast a ROW value to `text` to produce the standard PostgreSQL composite literal form: + +[source,sql] +---- +SELECT ROW(1, 'hello', 3.14)::text; +---- + +[source,sql] +---- + row +----------------- + (1,"hello",3.14) +(1 row) +---- + +== Use ROW in queries + +ROW values can be used in `GROUP BY`, `ORDER BY`, and `JOIN` clauses with lexicographic comparison semantics. + +=== Group by a ROW field + +[source,sql] +---- +SELECT (customer).region, COUNT(*) +FROM orders +GROUP BY (customer).region; +---- + +=== Order by a whole ROW + +[source,sql] +---- +SELECT * FROM orders ORDER BY customer; +---- + +The rows are sorted lexicographically by the fields of the `customer` composite column, in their declared order. + +=== Join on a multi-column key + +Compare implicit tuples to match multi-column keys without spelling out each field in a `WHERE` clause: + +[source,sql] +---- +SELECT * +FROM table_a a +JOIN table_b b +ON (a.col1, a.col2) = (b.col1, b.col2); +---- + +== Suggested reading + +* xref:reference:sql/sql-statements/create-table.adoc[CREATE TABLE]: Maps a Redpanda topic to a SQL table. 
Use `struct_mapping_policy = 'COMPOUND'` to surface nested topic fields as user-defined types accessible with ROW field-access syntax. diff --git a/modules/reference/pages/sql/sql-statements/create-table.adoc b/modules/reference/pages/sql/sql-statements/create-table.adoc index f92fc4d32..3ff8c7ecb 100644 --- a/modules/reference/pages/sql/sql-statements/create-table.adoc +++ b/modules/reference/pages/sql/sql-statements/create-table.adoc @@ -1,8 +1,8 @@ = CREATE TABLE -:description: The CREATE TABLE statement maps a Redpanda topic to a SQL table through a catalog, making topic data queryable with SQL. +:description: The CREATE TABLE statement maps a Redpanda topic to a SQL table through a catalog, making the topic queryable with SQL. :page-topic-type: reference -The `CREATE TABLE` statement maps a Redpanda topic to a SQL table through a catalog. After creating the table, you can query topic data using standard SQL. +The `CREATE TABLE` statement maps a Redpanda topic to a SQL table through a catalog. After creating the table, you can query the topic using standard SQL. NOTE: You must first xref:reference:sql/sql-statements/create-redpanda-catalog.adoc[create a Redpanda catalog connection] before creating tables. `CREATE TABLE` in Redpanda SQL maps Redpanda topics to SQL tables and does not create standalone tables with user-defined schemas. @@ -53,7 +53,7 @@ a|How to handle records that fail deserialization. |No a|How to map nested structures from the topic schema to SQL columns. -* `COMPOUND` (default): Maps each nested structure to a SQL xref:reference:sql/sql-data-types/row.adoc[ROW] value with named fields, queryable using `(column).field_name` syntax. Cyclic types are not supported in `COMPOUND` mode. Use `JSON` for recursive schemas. +* `COMPOUND` (default): Maps each nested structure to a user-defined type with named fields, queryable using `(column).field_name` syntax. Cyclic types are not supported in `COMPOUND` mode. Use `JSON` for recursive schemas. 
See xref:reference:sql/sql-data-types/row.adoc[ROW] for the field-access syntax. * `JSON`: Stores each nested structure as a JSON value. Required for recursive (cyclic) types. |`output_schema_message_full_name` diff --git a/modules/sql/pages/query-data/query-nested-fields.adoc b/modules/sql/pages/query-data/query-nested-fields.adoc new file mode 100644 index 000000000..b86bbdf66 --- /dev/null +++ b/modules/sql/pages/query-data/query-nested-fields.adoc @@ -0,0 +1,136 @@ += Query Topics with Nested Fields +:description: Map a topic's nested fields to typed SQL columns and query them by name. +:page-topic-type: how-to +:personas: app_developer, data_engineer +:learning-objective-1: Map a topic with a nested schema as a SQL table using struct_mapping_policy = 'COMPOUND' +:learning-objective-2: Query nested fields using ROW field-access syntax +:learning-objective-3: Resolve cyclic-reference errors + +When a glossterm:topic[]'s schema includes nested Protobuf, Avro, or JSON message types, you can map those nested structures as user-defined types (UDTs) with named fields. UDT columns are queryable using SQL `ROW` field-access syntax instead of opaque JSON, so nested fields are queryable by name, includable in projections, and usable in `WHERE`, `GROUP BY`, and `ORDER BY` clauses without parsing JSON at query time. + +Use this page to: + +* [ ] {learning-objective-1} +* [ ] {learning-objective-2} +* [ ] {learning-objective-3} + +== Prerequisites + +Before you query a topic with nested fields: + +* xref:sql:get-started/deploy-sql-cluster.adoc[Enable Redpanda SQL] on your Redpanda Bring Your Own Cloud (BYOC) cluster. +* xref:sql:connect-to-sql/index.adoc[Connect to Redpanda SQL] with `psql` or another PostgreSQL client. +* Register a schema for the topic in glossterm:schema-registry[Schema Registry], including one or more nested message types. +* The topic's data is reachable through a Redpanda catalog. 
The `default_redpanda_catalog` is created and linked for you when Redpanda SQL is enabled. + +== Map the topic as a SQL table + +Create the SQL table with `struct_mapping_policy = 'COMPOUND'` to surface each nested message as a user-defined type column: + +[source,sql] +---- +CREATE TABLE default_redpanda_catalog=>orders WITH ( + topic = 'orders', <1> + schema_subject = 'orders-value', <2> + struct_mapping_policy = 'COMPOUND' <3> +); +---- +<1> Required. The Redpanda topic to map. +<2> Optional. The Schema Registry subject. Defaults to the topic name with a `-value` suffix when omitted. +<3> Optional. Defaults to `'COMPOUND'`, which surfaces nested structures as user-defined types. + +Replace `orders` with your topic name. Your topic must have a schema registered in Schema Registry. For details on the `schema_subject` option, see xref:reference:sql/sql-statements/create-table.adoc[CREATE TABLE]. + +For a topic schema with this Protobuf definition: + +[source,proto] +---- +message Order { + string order_id = 1; + Customer customer = 2; + double amount = 3; +} + +message Customer { + string customer_id = 1; + string name = 2; + string region = 3; +} +---- + +Redpanda SQL maps the table with three columns: `order_id` (text), `customer` (a user-defined type with fields `customer_id`, `name`, and `region`), and `amount` (double precision). + +TIP: `COMPOUND` is the default `struct_mapping_policy`. To map nested structures as opaque JSON instead, use `struct_mapping_policy = 'JSON'`. Cyclic types require `struct_mapping_policy = 'JSON'`. See <<handle-recursive-cyclic-schemas>>. + +== Query nested fields + +Access a nested field by its declared name using the `(column).field` form. 
Wrap the column in parentheses: + +[source,sql] +---- +SELECT order_id, (customer).name, (customer).region, amount +FROM default_redpanda_catalog=>orders +WHERE (customer).region = 'EMEA'; +---- + +To project every field of a nested structure as separate result columns, use the wildcard `.*` form: + +[source,sql] +---- +SELECT order_id, (customer).* +FROM default_redpanda_catalog=>orders +LIMIT 10; +---- + +For schemas with multiple levels of nesting, chain the parenthesized field access. For example, if `Customer` itself contained a nested `address` message with a `zip_code` field, you would query the zip code as: + +[source,sql] +---- +SELECT ((customer).address).zip_code FROM default_redpanda_catalog=>orders; +---- + +For the full `ROW` reference, including comparison operators, NULL handling, and `::text` casting, see xref:reference:sql/sql-data-types/row.adoc[ROW]. + +[[handle-recursive-cyclic-schemas]] +== Handle recursive (cyclic) schemas + +The `COMPOUND` policy does not support recursive (cyclic) schemas, such as a `Comment` message that references itself or two messages that reference each other. Trying to map such a schema with `COMPOUND` fails at table-creation time with the following error: + +[source,text] +---- +Cyclic reference at '.' → ''. Cyclic types are not supported in COMPOUND struct mapping policy; use struct_mapping_policy=JSON for recursive types. +---- + +Re-create the table with `struct_mapping_policy = 'JSON'`. In JSON mode, Redpanda SQL stores each nested structure as a JSON value: + +[source,sql] +---- +CREATE TABLE default_redpanda_catalog=>comments WITH ( + topic = 'comments', + struct_mapping_policy = 'JSON' +); +---- + +Query JSON-mapped fields with standard JSON functions instead of ROW field access. See xref:reference:sql/sql-data-types/json.adoc[JSON]. 
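+
+As a rough sketch of what JSON-mode access can look like, the query below assumes the `Comment` schema has a nested `author` message with a `name` field, and that a PostgreSQL-style `json_extract_path_text` function is available. Both are assumptions for illustration only. Check the JSON reference for the functions Redpanda SQL actually supports.
+
+[source,sql]
+----
+-- Hypothetical column and key names; json_extract_path_text is assumed, not confirmed.
+SELECT json_extract_path_text(author, 'name') AS author_name
+FROM default_redpanda_catalog=>comments
+LIMIT 10;
+----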
+ +== Choose between COMPOUND and JSON + +[cols="<20%,<40%,<40%",options="header"] +|=== +| Policy | Use when | Trade-offs + +| `COMPOUND` (default) +| The topic schema has nested structures that are not recursive, and you want to query nested fields directly by name. +| Typed access; usable in `WHERE`, `GROUP BY`, `ORDER BY`. Required if you xref:sql:query-data/query-iceberg-topics.adoc[query an Iceberg-enabled topic via a linked Redpanda catalog], so that nested fields stay typed across both live records and the topic's Iceberg history. + +| `JSON` +| The topic schema is recursive, or you prefer flexible access through JSON functions. +| Recursive types supported; fields are untyped until extracted with JSON functions. Queries that span the Redpanda topic and its linked Iceberg table do not align cleanly, because Iceberg always exposes nested structures as typed columns. +|=== + +== Suggested reading + +* xref:sql:query-data/query-streaming-topics.adoc[Query streaming topics]: Query a topic without Iceberg history. +* xref:sql:query-data/query-iceberg-topics.adoc[Query Iceberg-enabled topics]: Query a topic with both its live streaming data and Iceberg history. Use `struct_mapping_policy = 'COMPOUND'` so nested fields align between the Redpanda topic and the linked Iceberg table. +* xref:reference:sql/sql-data-types/row.adoc[ROW]: Full reference for the `ROW` data type, including comparisons, NULL semantics, and conversion to text. +* xref:reference:sql/sql-statements/create-table.adoc[CREATE TABLE]: Complete option list for mapping a Redpanda topic to a SQL table.