Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion #25104

Copilot · 2026-01-07T05:49:25Z

Describe your changes:

Adds queryStatementSource configuration property to Postgres and Timescale connectors, allowing users to specify a custom view/table for query logs instead of the default pg_stat_statements. This supports deployments that expose pg_stat_statements through a custom view for security policy compliance.

Changes:

Added queryStatementSource property to postgresConnection.json and timescaleConnection.json schemas (default: pg_stat_statements)
Parameterized POSTGRES_SQL_STATEMENT and POSTGRES_TEST_GET_QUERIES to use {query_statement_source} placeholder
Updated PostgresQueryParserSource.get_sql_statement() and connection test methods to pass the configured source
Added documentation for the new property in Postgres.md and Timescale.md
Added unit tests for default and custom source behavior
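The placeholder mechanism described above can be sketched as follows. This is a minimal sketch that assumes plain `str.format` substitution, which is what the review comments below describe. The template text and the `render_sql_statement` helper are hypothetical simplifications, not the actual `POSTGRES_SQL_STATEMENT` from `queries.py`. Passing `total_exec_time` (the pg_stat_statements timing column in PostgreSQL 13 and later) for `time_column_name` is likewise an assumption.

```python
# Hypothetical, simplified stand-in for POSTGRES_SQL_STATEMENT; the real
# template in queries.py has more columns and filters.
POSTGRES_SQL_STATEMENT_SKETCH = """
SELECT
  u.usename,
  d.datname database_name,
  s.query query_text,
  s.{time_column_name} duration
FROM {query_statement_source} s
JOIN pg_catalog.pg_database d ON s.dbid = d.oid
JOIN pg_catalog.pg_user u ON s.userid = u.usesysid
LIMIT {result_limit}
"""


def render_sql_statement(
    query_statement_source: str, time_column_name: str, result_limit: int
) -> str:
    """Fill the placeholders; the source string is inserted verbatim into FROM."""
    return POSTGRES_SQL_STATEMENT_SKETCH.format(
        query_statement_source=query_statement_source,
        time_column_name=time_column_name,
        result_limit=result_limit,
    )


# Default behavior versus the custom view from the example configuration.
default_sql = render_sql_statement("pg_stat_statements", "total_exec_time", 100)
custom_sql = render_sql_statement(
    "my_schema.custom_pg_stat_statements", "total_exec_time", 100
)
print(custom_sql)
```

Note that the configured value lands in the `FROM` clause exactly as written, with no escaping.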

Example configuration:

{
  "type": "Postgres",
  "username": "user",
  "hostPort": "localhost:5432",
  "database": "postgres",
  "queryStatementSource": "my_schema.custom_pg_stat_statements"
}

This mirrors the existing pattern in the Snowflake connector (accountUsageSchema).

Type of change:

New feature

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

The issue properly describes why the new feature is needed, what the goal is, and how we are building it. Any discussion or decision-making process is reflected in the issue.
I have updated the documentation.
I have added tests around the new logic.

Note on migrations: No migration script needed - this adds a new optional property with a default value that preserves existing behavior.
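The migration note relies on the default filling in for connections saved before the property existed. In the connector itself the default would come from the model generated from the JSON schema; the dict-based `resolve_query_statement_source` helper below is a hypothetical stand-in that shows the same backward-compatible behavior.

```python
import json

# Schema default for the new optional property.
DEFAULT_QUERY_STATEMENT_SOURCE = "pg_stat_statements"


def resolve_query_statement_source(connection_config: dict) -> str:
    """Hypothetical helper: fall back to the default when the key is absent or empty."""
    return connection_config.get("queryStatementSource") or DEFAULT_QUERY_STATEMENT_SOURCE


# A connection saved before this change has no queryStatementSource key.
legacy_config = json.loads(
    '{"type": "Postgres", "username": "user", "hostPort": "localhost:5432", "database": "postgres"}'
)
# The example configuration from the PR description.
custom_config = json.loads(
    '{"type": "Postgres", "username": "user", "hostPort": "localhost:5432",'
    ' "database": "postgres", "queryStatementSource": "my_schema.custom_pg_stat_statements"}'
)

print(resolve_query_statement_source(legacy_config))  # pg_stat_statements
print(resolve_query_statement_source(custom_config))  # my_schema.custom_pg_stat_statements
```

Treating an empty string as "unset" is a choice made in this sketch; a generated model might instead keep the empty string as given.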

Warning

Firewall rules blocked me from connecting to one or more addresses:

I tried to connect to the following addresses, but was blocked by firewall rules:

www.antlr.org
- Triggering command: /usr/bin/curl curl -O REDACTED (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Original prompt

This section details the original issue you should resolve.

<issue_title>Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion</issue_title>
<issue_description>Feature
Add feature issue reference

Add support to override the pg_stat_statements source table/view through configuration in the Postgres connector.

Default continues to use pg_stat_statements

If a custom source is provided, lineage queries should reference that instead

All existing filters and formatting remain unchanged

Describe the task
A clear and concise description of what the bug is.

Currently, the Postgres lineage ingestion relies directly on the pg_stat_statements extension to extract SQL queries:

Some deployments require restricting direct access to pg_stat_statements and instead exposing its contents through a custom view (e.g., my_schema.custom_pg_stat_statements).
This pattern already exists in the Snowflake connector, where lineage queries support overriding the statement source via configuration (SNOWFLAKE_SQL_STATEMENT).</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion #24865



github-actions · 2026-01-07T07:32:18Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

github-actions · 2026-01-07T07:35:38Z

TypeScript types have been updated based on the JSON schema changes in the PR

gitar-bot · 2026-01-07T23:32:19Z

ingestion/src/metadata/ingestion/source/database/postgres/queries.py

      SELECT
        u.usename,
        d.datname database_name,
        s.query query_text,


🚨 Security: SQL injection via unvalidated queryStatementSource parameter

Details

The queryStatementSource configuration value is directly interpolated into SQL queries using Python string formatting ({query_statement_source}) without any validation or sanitization. A malicious user could provide a value like pg_stat_statements; DROP TABLE users; -- which would be directly inserted into the SQL query, potentially allowing arbitrary SQL execution.

Impact: An attacker with access to configuration could execute arbitrary SQL commands against the database, leading to data theft, corruption, or complete database compromise.

Suggested fix: Add input validation to ensure queryStatementSource only contains valid identifier characters (alphanumeric, underscores, dots for schema qualification). Consider using a regex pattern like ^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)*$ in the JSON schema validation:

"queryStatementSource": {
  "title": "Query Statement Source",
  "description": "...",
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)*$",
  "default": "pg_stat_statements"
}

Additionally, consider using proper identifier quoting in Python code as a defense-in-depth measure.
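Identifier quoting, the defense-in-depth measure suggested above, could look roughly like this. It is a minimal sketch of PostgreSQL's quoted-identifier rule (wrap each part in double quotes and double any embedded double quote); `quote_pg_identifier` and `quote_qualified_name` are hypothetical helpers, not existing OpenMetadata code. If psycopg2 is available, composing `psycopg2.sql.Identifier` with `psycopg2.sql.SQL(...).format(...)` is an existing alternative.

```python
def quote_pg_identifier(part: str) -> str:
    """Quote one identifier part: wrap in double quotes, double any embedded quote."""
    return '"' + part.replace('"', '""') + '"'


def quote_qualified_name(name: str) -> str:
    """Quote each dot-separated part of a name such as schema.view (hypothetical helper)."""
    parts = name.split(".")
    if any(part == "" for part in parts):
        raise ValueError(f"Empty identifier part in {name!r}")
    return ".".join(quote_pg_identifier(part) for part in parts)


print(quote_qualified_name("my_schema.custom_pg_stat_statements"))
# "my_schema"."custom_pg_stat_statements"

# The injection payload becomes one (nonexistent) quoted identifier instead of extra SQL.
print(quote_qualified_name("pg_stat_statements; DROP TABLE users; --"))
```

One caveat: quoted identifiers are case-sensitive in PostgreSQL, whereas unquoted ones are folded to lower case, so a configured `My_Schema.My_View` would no longer resolve to `my_schema.my_view` once quoted. Splitting on `.` also means a name whose parts contain dots cannot be expressed.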

github-actions · 2026-01-07T23:42:03Z

🛡️ TRIVY SCAN RESULT 🛡️

Target: `openmetadata-ingestion-base-slim:trivy (debian 12.12)`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Java`

Vulnerabilities (33)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.12.7	2.15.0
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.13.4	2.15.0
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42003	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4.2
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42004	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4
`com.google.code.gson:gson`	CVE-2022-25647	🚨 HIGH	2.2.4	2.8.9
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.3.0	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.3.0	3.25.5, 4.27.5, 4.28.2
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.7.1	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.7.1	3.25.5, 4.27.5, 4.28.2
`com.nimbusds:nimbus-jose-jwt`	CVE-2023-52428	🚨 HIGH	9.8.1	9.37.2
`com.squareup.okhttp3:okhttp`	CVE-2021-0341	🚨 HIGH	3.12.12	4.9.2
`commons-beanutils:commons-beanutils`	CVE-2025-48734	🚨 HIGH	1.9.4	1.11.0
`commons-io:commons-io`	CVE-2024-47554	🚨 HIGH	2.8.0	2.14.0
`dnsjava:dnsjava`	CVE-2024-25638	🚨 HIGH	2.1.7	3.6.0
`io.netty:netty-codec-http2`	CVE-2025-55163	🚨 HIGH	4.1.96.Final	4.2.4.Final, 4.1.124.Final
`io.netty:netty-codec-http2`	GHSA-xpw8-rcwv-8f8p	🚨 HIGH	4.1.96.Final	4.1.100.Final
`io.netty:netty-handler`	CVE-2025-24970	🚨 HIGH	4.1.96.Final	4.1.118.Final
`net.minidev:json-smart`	CVE-2021-31684	🚨 HIGH	1.3.2	1.3.3, 2.4.4
`net.minidev:json-smart`	CVE-2023-1370	🚨 HIGH	1.3.2	2.4.9
`org.apache.avro:avro`	CVE-2024-47561	🔥 CRITICAL	1.7.7	1.11.4
`org.apache.avro:avro`	CVE-2023-39410	🚨 HIGH	1.7.7	1.11.3
`org.apache.derby:derby`	CVE-2022-46337	🔥 CRITICAL	10.14.2.0	10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
`org.apache.ivy:ivy`	CVE-2022-46751	🚨 HIGH	2.5.1	2.5.2
`org.apache.mesos:mesos`	CVE-2018-1330	🚨 HIGH	1.4.3	1.6.0
`org.apache.thrift:libthrift`	CVE-2019-0205	🚨 HIGH	0.12.0	0.13.0
`org.apache.thrift:libthrift`	CVE-2020-13949	🚨 HIGH	0.12.0	0.14.0
`org.apache.zookeeper:zookeeper`	CVE-2023-44981	🔥 CRITICAL	3.6.3	3.7.2, 3.8.3, 3.9.1
`org.eclipse.jetty:jetty-server`	CVE-2024-13009	🚨 HIGH	9.4.56.v20240826	9.4.57.v20241219
`org.lz4:lz4-java`	CVE-2025-12183	🚨 HIGH	1.8.0	1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Node.js`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Python`

Vulnerabilities (4)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`starlette`	CVE-2025-62727	🚨 HIGH	0.48.0	0.49.1
`urllib3`	CVE-2025-66418	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2025-66471	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2026-21441	🚨 HIGH	1.26.20	2.6.3

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/etc/ssl/private/ssl-cert-snakeoil.key`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/extended_sample_data.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/lineage.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data.json`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_data_aut.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage.json`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage.yaml`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/ingestion/pipelines/sample_usage_aut.yaml`

No Vulnerabilities Found

github-actions · 2026-01-07T23:42:42Z

🛡️ TRIVY SCAN RESULT 🛡️

Target: `openmetadata-ingestion:trivy (debian 12.12)`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Java`

Vulnerabilities (33)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.12.7	2.15.0
`com.fasterxml.jackson.core:jackson-core`	CVE-2025-52999	🚨 HIGH	2.13.4	2.15.0
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42003	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4.2
`com.fasterxml.jackson.core:jackson-databind`	CVE-2022-42004	🚨 HIGH	2.12.7	2.12.7.1, 2.13.4
`com.google.code.gson:gson`	CVE-2022-25647	🚨 HIGH	2.2.4	2.8.9
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.3.0	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.3.0	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.3.0	3.25.5, 4.27.5, 4.28.2
`com.google.protobuf:protobuf-java`	CVE-2021-22569	🚨 HIGH	3.7.1	3.16.1, 3.18.2, 3.19.2
`com.google.protobuf:protobuf-java`	CVE-2022-3509	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2022-3510	🚨 HIGH	3.7.1	3.16.3, 3.19.6, 3.20.3, 3.21.7
`com.google.protobuf:protobuf-java`	CVE-2024-7254	🚨 HIGH	3.7.1	3.25.5, 4.27.5, 4.28.2
`com.nimbusds:nimbus-jose-jwt`	CVE-2023-52428	🚨 HIGH	9.8.1	9.37.2
`com.squareup.okhttp3:okhttp`	CVE-2021-0341	🚨 HIGH	3.12.12	4.9.2
`commons-beanutils:commons-beanutils`	CVE-2025-48734	🚨 HIGH	1.9.4	1.11.0
`commons-io:commons-io`	CVE-2024-47554	🚨 HIGH	2.8.0	2.14.0
`dnsjava:dnsjava`	CVE-2024-25638	🚨 HIGH	2.1.7	3.6.0
`io.netty:netty-codec-http2`	CVE-2025-55163	🚨 HIGH	4.1.96.Final	4.2.4.Final, 4.1.124.Final
`io.netty:netty-codec-http2`	GHSA-xpw8-rcwv-8f8p	🚨 HIGH	4.1.96.Final	4.1.100.Final
`io.netty:netty-handler`	CVE-2025-24970	🚨 HIGH	4.1.96.Final	4.1.118.Final
`net.minidev:json-smart`	CVE-2021-31684	🚨 HIGH	1.3.2	1.3.3, 2.4.4
`net.minidev:json-smart`	CVE-2023-1370	🚨 HIGH	1.3.2	2.4.9
`org.apache.avro:avro`	CVE-2024-47561	🔥 CRITICAL	1.7.7	1.11.4
`org.apache.avro:avro`	CVE-2023-39410	🚨 HIGH	1.7.7	1.11.3
`org.apache.derby:derby`	CVE-2022-46337	🔥 CRITICAL	10.14.2.0	10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
`org.apache.ivy:ivy`	CVE-2022-46751	🚨 HIGH	2.5.1	2.5.2
`org.apache.mesos:mesos`	CVE-2018-1330	🚨 HIGH	1.4.3	1.6.0
`org.apache.thrift:libthrift`	CVE-2019-0205	🚨 HIGH	0.12.0	0.13.0
`org.apache.thrift:libthrift`	CVE-2020-13949	🚨 HIGH	0.12.0	0.14.0
`org.apache.zookeeper:zookeeper`	CVE-2023-44981	🔥 CRITICAL	3.6.3	3.7.2, 3.8.3, 3.9.1
`org.eclipse.jetty:jetty-server`	CVE-2024-13009	🚨 HIGH	9.4.56.v20240826	9.4.57.v20241219
`org.lz4:lz4-java`	CVE-2025-12183	🚨 HIGH	1.8.0	1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Node.js`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `Python`

Vulnerabilities (9)

Package	Vulnerability ID	Severity	Installed Version	Fixed Version
`Werkzeug`	CVE-2024-34069	🚨 HIGH	2.2.3	3.0.3
`aiohttp`	CVE-2025-69223	🚨 HIGH	3.12.12	3.13.3
`aiohttp`	CVE-2025-69223	🚨 HIGH	3.13.2	3.13.3
`deepdiff`	CVE-2025-58367	🔥 CRITICAL	7.0.1	8.6.1
`ray`	CVE-2025-62593	🔥 CRITICAL	2.47.1	2.52.0
`starlette`	CVE-2025-62727	🚨 HIGH	0.48.0	0.49.1
`urllib3`	CVE-2025-66418	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2025-66471	🚨 HIGH	1.26.20	2.6.0
`urllib3`	CVE-2026-21441	🚨 HIGH	1.26.20	2.6.3

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/etc/ssl/private/ssl-cert-snakeoil.key`

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: `/home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO`

No Vulnerabilities Found

github-actions · 2026-01-07T23:59:25Z

Jest test Coverage

UI tests summary

Statements	Branches	Functions
65.25% (52209/80016)	43.14% (25828/59875)	46.53% (8094/17394)

gitar-bot · 2026-01-08T00:18:16Z

ingestion/src/metadata/ingestion/source/database/postgres/queries.py

        u.usename,
        d.datname database_name,
        s.query query_text,
        s.{time_column_name} duration


🚨 Security: SQL injection via unvalidated queryStatementSource parameter

Details

The queryStatementSource configuration parameter is directly interpolated into SQL queries without any validation or sanitization. This creates a SQL injection vulnerability where a malicious user could execute arbitrary SQL commands.

Impact:

A malicious actor with configuration access could inject arbitrary SQL like pg_stat_statements; DROP TABLE users; -- or use UNION-based attacks to extract sensitive data

This could lead to data exfiltration, data modification, or denial of service

Locations affected:

queries.py lines 22-24: {query_statement_source} in POSTGRES_SQL_STATEMENT

queries.py line 140: {query_statement_source} in POSTGRES_TEST_GET_QUERIES

connection.py lines 9-10: Passing unvalidated value to query format

query_parser.py lines 44-45: Passing unvalidated value to query format

Suggested fix:

Add a validation pattern to the JSON schema to restrict input to valid SQL identifiers:

"queryStatementSource": {
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)?$",
  "default": "pg_stat_statements"
}

Additionally, validate/sanitize the value in Python before use:

import re


def validate_query_source(source: str) -> str:
    # re.fullmatch (rather than re.match with ^...$) also rejects a trailing newline.
    if not re.fullmatch(r"[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?", source):
        raise ValueError(f"Invalid query statement source: {source}")
    return source

This mirrors security patterns used elsewhere for similar configurable SQL identifiers.
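The review comments propose two slightly different patterns: the first ends in `)*`, allowing any number of dotted parts, while the later ones end in `)?`, allowing only a bare name or `schema.name`. The sketch below compares them with Python's standard `re` module; the sample values are illustrative. It also shows why `re.fullmatch` is safer than `re.match` with `^...$` for the Python-side check: without `re.MULTILINE`, `$` also matches just before a trailing newline.

```python
import re

# Pattern from the first review comment: any number of dotted parts.
ANY_DEPTH = r"[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)*"
# Pattern from the later comments: a bare name or schema.name only.
AT_MOST_SCHEMA = r"[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?"


def accepts(pattern: str, value: str) -> bool:
    # fullmatch requires the whole string to match, so a trailing newline fails.
    return re.fullmatch(pattern, value) is not None


samples = [
    "pg_stat_statements",
    "my_schema.custom_pg_stat_statements",
    "db.my_schema.view",
    "pg_stat_statements; DROP TABLE users; --",
]
for value in samples:
    print(f"{value!r}: any_depth={accepts(ANY_DEPTH, value)} "
          f"at_most_schema={accepts(AT_MOST_SCHEMA, value)}")

# re.match with ^...$ lets a trailing newline through; fullmatch does not.
print(re.match("^" + AT_MOST_SCHEMA + "$", "pg_stat_statements\n") is not None)  # True
print(accepts(AT_MOST_SCHEMA, "pg_stat_statements\n"))  # False
```

Both patterns are stricter than PostgreSQL's own rules for unquoted identifiers, which also allow `$` after the first character and non-Latin letters, so some legitimately named views would be rejected. How the schema-side `pattern` treats a trailing newline depends on the regex engine of the JSON Schema validator in use.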

gitar-bot · 2026-01-08T18:47:34Z

🔍 CI failure analysis for fb07505: Integration tests fail to compile due to applyTag() signature change from main branch

Issue

Both integration test jobs (integration-tests-postgres-opensearch and integration-tests-mysql-elasticsearch) are failing with compilation errors, not test failures.

Root Cause

Compilation Error in Integration Tests (unrelated to PR changes)

File: openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/OrphanedEntityReferenceIT.java

Problem: The applyTag method signature in CollectionDAO.TagUsageDAO interface has changed (likely from the recent main branch merge), but the test code hasn't been updated to match.

Error at lines 82, 131, and 194:

method applyTag in interface org.openmetadata.service.jdbi3.CollectionDAO.TagUsageDAO cannot be applied to given types;
  required: int,String,String,String,int,int,String,String
  found:    int,String,String,String,int,int,<nulltype>
  reason: actual and formal argument lists differ in length

The method now requires 8 parameters but the test code is passing only 7 parameters (with the 7th being null).

Details

This is a breaking API change introduced in the main branch, most likely from one of these recent commits merged into this PR:

d240960e07 - "Feat: 13884 - Allow creating column level custom Properties"
4152f9e13e - "Fix Do not throw 404 EntityNotFoundException when a relation is left over in entity_relationship #24266: Do not throw 404 EntityNotFoundException"
c66d9eebf6 - "Tagging explanation"
4c3f6dd1e3 - "Fix audit logs"

The "Tagging explanation" commit (c66d9eebf6) is the most likely culprit as it relates to tagging functionality.

Solution

The OrphanedEntityReferenceIT.java test file needs to be updated to pass the correct number of parameters to applyTag(). The missing 8th parameter needs to be identified from the interface definition and added to all three call sites (lines 82, 131, 194).

This is completely unrelated to the Postgres queryStatementSource feature - it's a merge conflict/compatibility issue from main branch changes.

Code Review 🚫 Blocked

Feature adds configurable query source for Postgres lineage but introduces a SQL injection vulnerability due to unvalidated user input being directly interpolated into SQL queries.

🚨 Security: SQL injection via unvalidated queryStatementSource parameter

📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:22 📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:140 📄 openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/postgresConnection.json:109 🔗 CWE-89: SQL Injection

The queryStatementSource parameter is directly interpolated into SQL queries via Python's .format() method without any validation or sanitization. This creates a SQL injection vulnerability.

Attack vector:
A user could provide a malicious value like:

"queryStatementSource": "pg_stat_statements; DROP TABLE users; --"
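To see why this value is dangerous, here is the text that plain `str.format` substitution produces from it. The one-line template is a hypothetical simplification; the real queries are longer, but the substitution works the same way.

```python
# Hypothetical one-line template using the same placeholder style as queries.py.
TEMPLATE = "SELECT s.query query_text FROM {query_statement_source} s"

malicious = "pg_stat_statements; DROP TABLE users; --"
rendered = TEMPLATE.format(query_statement_source=malicious)
print(rendered)
# SELECT s.query query_text FROM pg_stat_statements; DROP TABLE users; -- s
```

The value terminates the intended statement with `;`, appends a second statement, and `--` turns the leftover alias `s` into a comment. Whether the second statement actually executes depends on the driver and on how the query is run, so this illustrates the rendered SQL rather than demonstrating a working exploit.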

Impact:

Arbitrary SQL execution via SQL injection
Data exfiltration, modification, or deletion
Complete database compromise

Affected code:

queries.py line 22: {query_statement_source} s in POSTGRES_SQL_STATEMENT
queries.py line 140: {query_statement_source} s in POSTGRES_TEST_GET_QUERIES
The JSON schema at postgresConnection.json line 109 defines the field as a plain string with no pattern validation

Recommended fix:

Add a strict regex pattern in the JSON schema to only allow valid SQL identifiers:

"queryStatementSource": {
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)?$",
  "default": "pg_stat_statements"
}

Additionally, add server-side validation in Python to double-check the format before interpolation:

import re

# re.fullmatch (rather than re.match with ^...$) also rejects a trailing newline.
if not re.fullmatch(r"[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?", query_statement_source):
    raise ValueError("Invalid queryStatementSource format")

This mirrors security patterns used in other database connectors that accept table/view names as configuration.


What Works Well

The implementation follows existing patterns in the codebase (similar to Snowflake connector's accountUsageSchema), includes comprehensive unit tests for both default and custom source scenarios, and provides thorough documentation for users.

Recommendations

Add input validation at both the schema level (JSON Schema pattern constraint) and application level (Python regex validation) to ensure the queryStatementSource only contains valid SQL identifier patterns (e.g., ^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?$). This prevents malicious SQL injection while still allowing legitimate schema-qualified view names.



sonarqubecloud · 2026-01-08T19:12:43Z

Quality Gate passed for 'open-metadata-ui'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Initial plan

f4db374

Copilot AI assigned Copilot and SumanMaharana Jan 7, 2026

Copilot started work on behalf of SumanMaharana January 7, 2026 05:49

Add queryStatementSource configuration for custom pg_stat_statements …

5c62b9d

…views Co-authored-by: SumanMaharana <59608519+SumanMaharana@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add support for custom pg_stat_statements view in Postgres~~ Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion Jan 7, 2026

Copilot AI requested a review from SumanMaharana January 7, 2026 06:06

Copilot finished work on behalf of SumanMaharana January 7, 2026 06:06

SumanMaharana marked this pull request as ready for review January 7, 2026 07:32

SumanMaharana requested review from a team as code owners January 7, 2026 07:32

SumanMaharana had a problem deploying to test January 7, 2026 07:32 — with GitHub Actions Failure

Update generated TypeScript types

4147575

harshach added the safe to test label Jan 7, 2026

harshach had a problem deploying to test January 7, 2026 23:30 — with GitHub Actions Error

Merge branch 'main' into copilot/add-custom-pg-stat-statements-support

b303a90

harshach temporarily deployed to test January 7, 2026 23:31 — with GitHub Actions Inactive

gitar-bot bot reviewed Jan 7, 2026


gitar-bot bot reviewed Jan 8, 2026


Merge branch 'main' into copilot/add-custom-pg-stat-statements-support

fb07505

ulixius9 had a problem deploying to test January 8, 2026 18:42 — with GitHub Actions Failure

gitar-bot bot reviewed Jan 8, 2026

