Fix: ClickHouse MATERIALIZED VIEW TO lineage extraction #27628
Jtss-ux wants to merge 3 commits into open-metadata:main
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help!
Code Review ✅ Approved: 3 resolved / 3 findings
Refactors ClickHouse MATERIALIZED VIEW lineage extraction by moving imports to the module level, refining regex patterns to exclude engine clauses, and adding unit tests. All findings have been resolved.
6e4e90b to 0f3b173
@harshach @nikhilchennam - I've addressed the edge case flagged by Gitar (tightened the regex to stop at ENGINE/POPULATE/SETTINGS) and added 6 unit tests covering the main variants (simple TO, ENGINE clause, IF NOT EXISTS, ON CLUSTER, no-TO passthrough, and end-to-end lineage validation). The branch is also rebased on latest main. Happy to make any further adjustments!
Summary
Fixes #26265
ClickHouse supports CREATE MATERIALIZED VIEW mv_name TO target_table AS SELECT ... syntax, where the TO clause designates the actual destination table for the data. The existing LineageParser passes this raw query directly to collate-sqllineage, which does not understand the TO clause and therefore misidentifies the Materialized View itself as the target — resulting in no lineage edge being drawn to the actual destination table.
Root Cause
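collate-sqllineage (the underlying parser used by LineageParser) treats CREATE MATERIALIZED VIEW as a DDL statement without any special handling for the ClickHouse-specific TO clause. As a result, the materialized view itself is reported as the lineage target, and the table named in the TO clause receives no lineage edge.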
Fix
Added a pre-processing step inside LineageParser.clean_raw_query that detects the ClickHouse MATERIALIZED VIEW ... TO ... AS SELECT pattern and rewrites it as a standard CREATE TABLE ... AS SELECT ... statement before it reaches LineageRunner.
This normalization approach requires no new dependencies.
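The rewrite described above can be sketched roughly as follows. This is an illustration only: the regex actually used in the PR is not shown in this thread, and the names `_CLICKHOUSE_MV_TO` and `rewrite_clickhouse_mv_to` are hypothetical. The sketch assumes simple (optionally quoted, dot-separated) identifiers and skips anything between the target table and `AS SELECT`, such as ENGINE, POPULATE, or SETTINGS clauses.

```python
import re

# Hypothetical pattern, not the PR's actual regex.
_CLICKHOUSE_MV_TO = re.compile(
    r"""
    ^\s*CREATE\s+MATERIALIZED\s+VIEW\s+
    (?:IF\s+NOT\s+EXISTS\s+)?            # optional IF NOT EXISTS
    (?P<view>[\w.`"]+)\s+                # materialized view name (discarded)
    (?:ON\s+CLUSTER\s+\S+\s+)?           # optional ON CLUSTER clause
    TO\s+(?P<target>[\w.`"]+)            # the real destination table
    .*?                                  # column list, ENGINE/SETTINGS, etc.
    \bAS\s+(?P<select>(?:SELECT|WITH)\b.*)$
    """,
    re.IGNORECASE | re.DOTALL | re.VERBOSE,
)


def rewrite_clickhouse_mv_to(query: str) -> str:
    """Rewrite 'CREATE MATERIALIZED VIEW mv TO t AS SELECT ...' as
    'CREATE TABLE t AS SELECT ...'; return other queries unchanged."""
    match = _CLICKHOUSE_MV_TO.match(query)
    if match is None:
        # No TO clause (or not a materialized view): pass through untouched.
        return query
    return f"CREATE TABLE {match.group('target')} AS {match.group('select')}"


if __name__ == "__main__":
    print(rewrite_clickhouse_mv_to(
        "CREATE MATERIALIZED VIEW default.my_mv TO default.my_target "
        "AS SELECT * FROM default.my_source"
    ))
    # prints: CREATE TABLE default.my_target AS SELECT * FROM default.my_source
```

Queries without a TO clause fall through unchanged, which mirrors the "no-TO passthrough" case mentioned in the unit tests.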
Changes
Testing
Verified locally using collate-sqllineage's LineageRunner:
Before fix:
```
query = 'CREATE MATERIALIZED VIEW default.my_mv TO default.my_target AS SELECT * FROM default.my_source'
source_tables: {default.my_source}
target_tables: {default.my_mv} <-- WRONG
```
After fix (query rewritten to):
```
CREATE TABLE default.my_target AS SELECT * FROM default.my_source
source_tables: {default.my_source}
target_tables: {default.my_target} <-- CORRECT
```
Related Issue
Closes #26265