
fix(#24917): Metadata Ingestion CLI: BigQuery (GBQ) ingestion fails for tables with Foreign Key constraints#27635

Open
wongtimothy147-lgtm wants to merge 1 commit into open-metadata:main from
wongtimothy147-lgtm:autofix/24917-metadata-ingestion-cli-bigquery-gbq-inge

Conversation

@wongtimothy147-lgtm

Closes #24917

**Includes changes for Modify get_foreign_key_constraints query to return local_column_name (the column in the constrained **


🤖 AI Transparency Notice

This PR was created by gh-autofix AI using holo3-35b-a3b.
A human reviewer should inspect before merging.


Root Cause

The BigQuery foreign key constraint query in queries.py::get_foreign_key_constraints incorrectly returns the referenced column name instead of the local column name when the two differ, causing EntityRepository validation to fail.


Changes Made

ingestion/src/metadata/ingestion/source/database/bigquery/queries.py, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java


Fix Strategy

Modify get_foreign_key_constraints query to return local_column_name (the column in the constrained table) instead of referenced_column_name for FK constraints.


Testing

  • Test command: npm test
  • Environment: Linux (Node.js)

Review results: passed all tests


Files Changed

N/A


Review confidence: 90%

Notes: None

…eries.p

Automated fix for open-metadata#24917.
See PR body for full analysis, test results, and AI disclosure.

AI-Assisted: true
Model: holo3-35b-a3b
@wongtimothy147-lgtm wongtimothy147-lgtm requested a review from a team as a code owner April 22, 2026 16:02
@github-actions
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

  ccu.table_schema AS referenced_schema,
  ccu.table_name AS referenced_table,
- ccu.column_name AS referenced_column
+ kcu.column_name AS referenced_column

🚨 Bug: Fix aliases both FK columns to local column, losing referenced column

The change on line 83 replaces ccu.column_name AS referenced_column with kcu.column_name AS referenced_column. However, kcu.column_name is already selected on line 78 as column_name (the local/constrained column). After this change, both row.column_name and row.referenced_column will return the same value — the local column name.

In helper.py:164-165, these map to:

  • constrained_columns: [row.column_name] → local column (from kcu) ✓
  • referred_columns: [row.referenced_column] → should be the referenced/target column, but now also returns the local column ✗

Per the SQL standard and BigQuery docs, CONSTRAINT_COLUMN_USAGE.column_name returns the columns of the referenced (parent) table for FK constraints, which is what referred_columns needs. The original ccu.column_name AS referenced_column was correct.
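
The aliasing problem described above can be reproduced in miniature. The following is a minimal sketch, not the project's code: it uses an in-memory SQLite database with two simplified stand-in tables named `kcu` and `ccu` (the real BigQuery INFORMATION_SCHEMA views have more columns), a hypothetical `orders.customer_id` → `customers.id` foreign key, and a hypothetical `foreign_keys` helper that builds dictionaries in the `constrained_columns` / `referred_columns` shape mentioned for helper.py.

```python
import sqlite3

# Simplified stand-ins for KEY_COLUMN_USAGE (kcu) and
# CONSTRAINT_COLUMN_USAGE (ccu). Table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE kcu (constraint_name TEXT, table_name TEXT, column_name TEXT);
    CREATE TABLE ccu (constraint_name TEXT, table_name TEXT, column_name TEXT);
    -- orders.customer_id references customers.id
    INSERT INTO kcu VALUES ('fk_orders_customer', 'orders', 'customer_id');
    INSERT INTO ccu VALUES ('fk_orders_customer', 'customers', 'id');
""")

# {source} is swapped between 'ccu' (original query) and 'kcu' (this PR).
QUERY = """
    SELECT kcu.column_name      AS column_name,
           ccu.table_name       AS referenced_table,
           {source}.column_name AS referenced_column
    FROM kcu
    JOIN ccu ON kcu.constraint_name = ccu.constraint_name
"""


def foreign_keys(source):
    """Hypothetical helper: map result rows to FK dictionaries."""
    rows = conn.execute(QUERY.format(source=source)).fetchall()
    return [
        {
            "constrained_columns": [row["column_name"]],
            "referred_table": row["referenced_table"],
            "referred_columns": [row["referenced_column"]],
        }
        for row in rows
    ]


original = foreign_keys("ccu")  # referred_columns is ['id']
patched = foreign_keys("kcu")   # referred_columns is ['customer_id']
print(original)
print(patched)
```

With the `kcu` alias, the referenced table is still `customers`, but the referred column becomes `customer_id`, which in this toy schema does not exist on `customers`.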

If the original issue (#24917) is that FK ingestion fails when local and referenced column names differ, the root cause is likely elsewhere (e.g., the JOIN producing a cross-product for multi-column FKs due to missing ordinal position matching, or an issue in EntityRepository validation).

Suggested fix:

Revert the change — keep the original `ccu.column_name AS referenced_column` on line 83. The actual bug causing #24917 likely needs investigation in EntityRepository or in how the JOIN handles multi-column FK constraints (the current JOIN may produce incorrect cross-products when columns don't align by ordinal position).
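
The cross-product mechanism hypothesized above can be illustrated with a small sketch. This is an assumption-laden toy, not a confirmed diagnosis of #24917: it again uses in-memory SQLite tables named `kcu` and `ccu` as simplified stand-ins, with a hypothetical composite foreign key `(region, customer_id)` → `(region, id)`, and joins them on constraint name only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kcu (constraint_name TEXT, column_name TEXT);
    CREATE TABLE ccu (constraint_name TEXT, column_name TEXT);
    -- One composite FK: (region, customer_id) -> (region, id)
    INSERT INTO kcu VALUES ('fk_composite', 'region'),
                           ('fk_composite', 'customer_id');
    INSERT INTO ccu VALUES ('fk_composite', 'region'),
                           ('fk_composite', 'id');
""")

# Joining on constraint_name alone pairs every local column with every
# referenced column of the same constraint.
pairs = conn.execute("""
    SELECT kcu.column_name, ccu.column_name
    FROM kcu
    JOIN ccu ON kcu.constraint_name = ccu.constraint_name
    ORDER BY kcu.column_name, ccu.column_name
""").fetchall()

for local_column, referenced_column in pairs:
    print(local_column, "->", referenced_column)

print(len(pairs))  # 4 rows for a 2-column FK, only 2 of which are real pairs
```

Two of the four rows (`customer_id -> region` and `region -> id`) are spurious pairings. Whether the actual query in queries.py behaves this way would need to be checked against real BigQuery metadata.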


@gitar-bot

gitar-bot Bot commented Apr 22, 2026

Code Review 🚫 Blocked 0 resolved / 2 findings

Metadata ingestion logic for BigQuery now incorrectly aliases foreign key columns to local column names, causing data loss. Additionally, a backup file was accidentally committed to the repository.

🚨 Bug: Fix aliases both FK columns to local column, losing referenced column

📄 ingestion/src/metadata/ingestion/source/database/bigquery/queries.py:78 📄 ingestion/src/metadata/ingestion/source/database/bigquery/queries.py:83

⚠️ Quality: Backup artifact queries.py.orig committed to repository

📄 ingestion/src/metadata/ingestion/source/database/bigquery/queries.py.orig

The file queries.py.orig is a backup/merge-conflict artifact that was accidentally committed. It is a full 300-line copy of the original queries.py and should not be in the repository.

Suggested fix
Remove the file:
  git rm ingestion/src/metadata/ingestion/source/database/bigquery/queries.py.orig

Consider adding `*.orig` to `.gitignore`.
