[improve](Streamingjob) add macaddr8/xml/hstore and array element type for PostgreSQL #61433
[improve](Streamingjob) add macaddr8/xml/hstore and array element type for PostgreSQL #61433JNSimba wants to merge 3 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
Adds PostgreSQL macaddr8 and xml type support for CDC streaming jobs by mapping both types to Doris STRING, consistent with how other non-primitive PostgreSQL types are handled.
Changes:
- Map PostgreSQL
macaddr8andxmlto DorisSTRINGin both FE JDBC type conversion and CDC client schema/type mapping. - Extend CDC client unit tests to cover the new type mappings.
- Update the PostgreSQL “all types” streaming-job regression test (DDL, inserts, and expected output) to include the two new column types.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| regression-test/suites/job_p0/streaming_job/cdc/test_streaming_postgres_job_all_type.groovy | Extends the PG source table and inserts to include macaddr8 and xml columns for end-to-end CDC validation. |
| regression-test/data/job_p0/streaming_job/cdc/test_streaming_postgres_job_all_type.out | Updates expected DESC and SELECT outputs to include the new columns/values. |
| fs_brokers/cdc_client/src/test/java/org/apache/doris/cdcclient/utils/SchemaChangeHelperTest.java | Adds unit coverage asserting macaddr8/xml map to STRING. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/SchemaChangeHelper.java | Adds macaddr8/xml to PostgreSQL type-name mapping as STRING. |
| fe/fe-core/src/main/java/org/apache/doris/datasource/jdbc/client/JdbcPostgreSQLClient.java | Adds macaddr8/xml to JDBC type-to-Doris type mapping as STRING. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 26771 ms |
TPC-DS: Total hot run time: 168326 ms |
|
/review |
Code Review Summary — No Issues FoundThis PR adds PostgreSQL Critical Checkpoint Conclusions
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
This PR extends PostgreSQL CDC/streaming job support to better handle additional PostgreSQL types—most notably array types (via Debezium’s _type naming) and a few extra “string-mapped” types (macaddr8/xml/hstore)—and adds regression + unit tests to validate the behavior.
Changes:
- Add a new Postgres CDC regression test suite for common array types (snapshot + incremental).
- Extend PG→Doris type mapping to recognize Debezium underscore-prefixed array type names and map macaddr8/xml/hstore as STRING.
- Update Debezium JSON deserialization to emit proper JSON arrays (List) for ARRAY fields instead of stringifying them.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| regression-test/suites/job_p0/streaming_job/cdc/test_streaming_postgres_job_array_types.groovy | New regression suite covering PG array columns end-to-end (snapshot + incremental). |
| regression-test/data/job_p0/streaming_job/cdc/test_streaming_postgres_job_array_types.out | Expected outputs for the new array-types regression suite. |
| regression-test/suites/job_p0/streaming_job/cdc/test_streaming_postgres_job_all_type.groovy | Expand “all types” PG regression to include macaddr8/xml/hstore. |
| regression-test/data/job_p0/streaming_job/cdc/test_streaming_postgres_job_all_type.out | Update expected outputs to reflect the added PG types. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/SchemaChangeHelper.java | Map Debezium _type names to ARRAY<...> and add macaddr8/xml/hstore mapping. |
| fs_brokers/cdc_client/src/test/java/org/apache/doris/cdcclient/utils/SchemaChangeHelperTest.java | Add unit tests for new PG type mappings, including arrays. |
| fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/deserialize/DebeziumJsonDeserializer.java | Convert ARRAY values to Lists so JSON payloads contain real arrays (not strings). |
| fe/fe-core/src/main/java/org/apache/doris/datasource/jdbc/client/JdbcPostgreSQLClient.java | Extend supported array inner types and map macaddr8/xml/hstore to STRING. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
regression-test/suites/job_p0/streaming_job/cdc/test_streaming_postgres_job_array_types.groovy
Show resolved
Hide resolved
...rokers/cdc_client/src/test/java/org/apache/doris/cdcclient/utils/SchemaChangeHelperTest.java
Show resolved
Hide resolved
|
/review |
There was a problem hiding this comment.
Code Review Summary
This PR extends PostgreSQL CDC type support by adding macaddr8, xml, hstore scalar types and improving array element type conversion. The changes are well-structured, focused, and have good test coverage. Below are the critical checkpoint conclusions and minor findings.
Critical Checkpoint Conclusions
Goal & Correctness: The PR achieves its stated goals — new scalar types map correctly, array type mapping via _ prefix recursion works, and convertToArray() fixes timestamp arrays being stored as raw epoch microseconds. Tests prove both snapshot and incremental CDC paths.
Modification scope: Small, clear, and focused. Changes span 3 source files + 2 test files + 2 regression tests. Each change is directly related to the stated purpose.
Concurrency: No concurrency concerns. The modified methods are stateless type-mapping utilities and the deserializer's convert() method operates on per-record data.
Lifecycle / static init: No new lifecycle or static initialization concerns.
Configuration items: No new configuration items added.
Incompatible changes: No incompatible changes. New types are additive — previously unsupported types now have explicit mappings.
Parallel code paths: The JdbcPostgreSQLClient (FE JDBC catalog) and SchemaChangeHelper (CDC client) both had their type mappings updated consistently for macaddr8, xml, hstore. One intentional divergence: json/jsonb maps to JSON in CDC but STRING in JDBC catalog — this is pre-existing and outside the scope of this PR.
Test coverage: Good. Unit tests cover all new type mappings including array types, nested arrays, and case-insensitivity. Regression tests cover both snapshot and incremental CDC flows for both all-types and array-types scenarios. One minor issue: hstore is tested redundantly (see inline comment).
Observability: The existing LOG.warn for unrecognized types in SchemaChangeHelper provides sufficient observability.
Transaction/persistence: Not applicable.
FE-BE variable passing: Not applicable — changes are in FE and CDC client only.
Performance: No performance concerns. Type mapping is O(1) switch/case lookup. The recursive _ prefix handling adds at most 1-2 extra levels of recursion for multi-dimensional arrays.
Minor Issues (Non-blocking)
- Duplicate
hstoretest assertion — see inline comment - Theoretical edge case with
_prefix — user-defined PG types starting with_would be incorrectly treated as arrays inSchemaChangeHelper.pgTypeNameToDorisType. This is extremely unlikely in practice since_is PostgreSQL's reserved array-type prefix convention, and the fallback isARRAY<STRING>which is not catastrophic. A comment noting this assumption would be helpful.
Overall: The PR looks good. The issues found are minor and non-blocking.
...rokers/cdc_client/src/test/java/org/apache/doris/cdcclient/utils/SchemaChangeHelperTest.java
Show resolved
Hide resolved
fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/SchemaChangeHelper.java
Show resolved
Hide resolved
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 26873 ms |
TPC-DS: Total hot run time: 168972 ms |
|
run buildall |
TPC-H: Total hot run time: 26901 ms |
TPC-DS: Total hot run time: 167945 ms |
What problem does this PR solve?
Extend PostgreSQL CDC type support:
macaddr8,xml,hstore→ mapped to DorisSTRINGin bothSchemaChangeHelper(CDC DDL) andJdbcPostgreSQLClient(FE table creation)_*prefix handling inSchemaChangeHelper→ARRAY<T>(e.g._int4→ARRAY<INT>); expandedsupportedInnerTypeinJdbcPostgreSQLClientwithnumeric/json/jsonb/uuidconvertToArray()inDebeziumJsonDeserializerto recursively convert eachelement through the same type-aware logic as scalar columns, fixing timestamp array elements being stored as raw epoch
microseconds
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)