
Copilot AI commented Jan 7, 2026

Describe your changes:

Adds queryStatementSource configuration property to Postgres and Timescale connectors, allowing users to specify a custom view/table for query logs instead of the default pg_stat_statements. This supports deployments that expose pg_stat_statements through a custom view for security policy compliance.

Changes:

  • Added queryStatementSource property to postgresConnection.json and timescaleConnection.json schemas (default: pg_stat_statements)
  • Parameterized POSTGRES_SQL_STATEMENT and POSTGRES_TEST_GET_QUERIES to use {query_statement_source} placeholder
  • Updated PostgresQueryParserSource.get_sql_statement() and connection test methods to pass the configured source
  • Added documentation for the new property in Postgres.md and Timescale.md
  • Added unit tests for default and custom source behavior
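
The parameterization listed above can be sketched roughly as follows. The template is a deliberately shortened, hypothetical stand-in for POSTGRES_SQL_STATEMENT (the real query has more columns and filters, and the joins shown here are an assumption), and `build_statement` and `DEFAULT_SOURCE` are illustrative names rather than the connector's actual API.

```python
# Hypothetical, shortened stand-in for the real POSTGRES_SQL_STATEMENT template.
SQL_TEMPLATE = """
SELECT
  u.usename,
  d.datname database_name,
  s.query query_text
FROM {query_statement_source} s
  JOIN pg_catalog.pg_database d ON s.dbid = d.oid
  JOIN pg_catalog.pg_user u ON s.userid = u.usesysid
"""

# Default named in the PR description; existing configs keep this behavior.
DEFAULT_SOURCE = "pg_stat_statements"


def build_statement(connection_config: dict) -> str:
    """Fill the placeholder from the config, falling back to the default."""
    source = connection_config.get("queryStatementSource") or DEFAULT_SOURCE
    return SQL_TEMPLATE.format(query_statement_source=source)


default_sql = build_statement({"type": "Postgres"})
custom_sql = build_statement(
    {"queryStatementSource": "my_schema.custom_pg_stat_statements"}
)
```

A configuration that omits the property renders the same FROM clause as before the change, which is why the default keeps existing deployments unaffected.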

Example configuration:

{
  "type": "Postgres",
  "username": "user",
  "hostPort": "localhost:5432",
  "database": "postgres",
  "queryStatementSource": "my_schema.custom_pg_stat_statements"
}

This mirrors the existing pattern in the Snowflake connector (accountUsageSchema).

Type of change:

  • New feature

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
  • The issue properly describes why the new feature is needed, what's the goal, and how we are building it. Any discussion
    or decision-making process is reflected in the issue.
  • I have updated the documentation.
  • I have added tests around the new logic.

Note on migrations: No migration script needed - this adds a new optional property with a default value that preserves existing behavior.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • www.antlr.org
    • Triggering command: /usr/bin/curl curl -O REDACTED (dns block)


Original prompt

This section details the original issue you should resolve

<issue_title>Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion</issue_title>
<issue_description>Feature
Add feature issue reference

Add support to override the pg_stat_statements source table/view through configuration in the Postgres connector.

  • Default continues to use pg_stat_statements
  • If a custom source is provided, lineage queries should reference that instead
  • All existing filters and formatting remain unchanged

Describe the task
A clear and concise description of what the bug is.

Currently, the Postgres lineage ingestion relies directly on the pg_stat_statements extension to extract SQL queries:

  • Some deployments require restricting direct access to pg_stat_statements and instead exposing its contents through a custom view (e.g., my_schema.custom_pg_stat_statements).
    This pattern already exists in the Snowflake connector, where lineage queries support overriding the statement source via configuration (SNOWFLAKE_SQL_STATEMENT).</issue_description>

Comments on the Issue (you are @copilot in this section)



…views

Co-authored-by: SumanMaharana <59608519+SumanMaharana@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support for custom pg_stat_statements view in Postgres" to "Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion" on Jan 7, 2026
Copilot AI requested a review from SumanMaharana January 7, 2026 06:06
@SumanMaharana SumanMaharana marked this pull request as ready for review January 7, 2026 07:32
@SumanMaharana SumanMaharana requested review from a team as code owners January 7, 2026 07:32

github-actions bot commented Jan 7, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!


github-actions bot commented Jan 7, 2026

TypeScript types have been updated based on the JSON schema changes in the PR

SELECT
u.usename,
d.datname database_name,
s.query query_text,

🚨 Security: SQL injection via unvalidated queryStatementSource parameter

Details

The queryStatementSource configuration value is directly interpolated into SQL queries using Python string formatting ({query_statement_source}) without any validation or sanitization. A malicious user could provide a value like pg_stat_statements; DROP TABLE users; -- which would be directly inserted into the SQL query, potentially allowing arbitrary SQL execution.

Impact: An attacker with access to configuration could execute arbitrary SQL commands against the database, leading to data theft, corruption, or complete database compromise.
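
To make the concern concrete, the following minimal sketch shows what plain string formatting produces for the value quoted above. The one-line template is hypothetical and much simpler than the connector's real query; only the `{query_statement_source}` placeholder and the `s` alias are taken from the PR.

```python
# Hypothetical one-line template; the alias "s" follows the placeholder as in the PR.
TEMPLATE = "SELECT s.query FROM {query_statement_source} s WHERE s.calls > 0"

malicious = "pg_stat_statements; DROP TABLE users; --"
rendered = TEMPLATE.format(query_statement_source=malicious)

# The rest of the original query ends up behind the "--" comment marker.
print(rendered)
```

Whether the second statement actually runs depends on the driver and protocol allowing multiple statements in one call, so this illustrates the rendered text only, not a confirmed exploit against the connector.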

Suggested fix: Add input validation to ensure queryStatementSource only contains valid identifier characters (alphanumeric, underscores, dots for schema qualification). Consider using a regex pattern like ^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)*$ in the JSON schema validation:

"queryStatementSource": {
  "title": "Query Statement Source",
  "description": "...",
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)*$",
  "default": "pg_stat_statements"
}

Additionally, consider using proper identifier quoting in Python code as a defense-in-depth measure.
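
As a rough illustration of the identifier quoting mentioned above, the sketch below double-quotes each dot-separated part and doubles any embedded quote character. The helper names are hypothetical and not part of the PR. Note the assumptions: splitting on dots cannot represent names that legitimately contain a dot, and quoted identifiers are case-sensitive in PostgreSQL, so quoting may change which object a mixed-case value resolves to.

```python
def quote_identifier(part: str) -> str:
    """Wrap one identifier in double quotes, doubling any embedded quote."""
    if not part or "\x00" in part:
        raise ValueError(f"Invalid identifier part: {part!r}")
    return '"' + part.replace('"', '""') + '"'


def quote_qualified_name(name: str) -> str:
    """Quote each dot-separated part of a name such as schema.view."""
    return ".".join(quote_identifier(part) for part in name.split("."))


quoted_view = quote_qualified_name("my_schema.custom_pg_stat_statements")
quoted_attack = quote_qualified_name("pg_stat_statements; DROP TABLE users; --")
```

With quoting, the malicious value becomes a single (nonexistent) identifier instead of extra SQL, which is why it works as a second layer behind pattern validation rather than a replacement for it.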


github-actions bot commented Jan 7, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.12)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found


github-actions bot commented Jan 7, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (9)

Package Vulnerability ID Severity Installed Version Fixed Version
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
aiohttp CVE-2025-69223 🚨 HIGH 3.13.2 3.13.3
deepdiff CVE-2025-58367 🔥 CRITICAL 7.0.1 8.6.1
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found


github-actions bot commented Jan 7, 2026

Jest test Coverage

UI tests summary

Lines: 65% (coverage badge)
Statements: 65.25% (52209/80016)
Branches: 43.14% (25828/59875)
Functions: 46.53% (8094/17394)

u.usename,
d.datname database_name,
s.query query_text,
s.{time_column_name} duration

🚨 Security: SQL injection via unvalidated queryStatementSource parameter

Details

The queryStatementSource configuration parameter is directly interpolated into SQL queries without any validation or sanitization. This creates a SQL injection vulnerability where a malicious user could execute arbitrary SQL commands.

Impact:

  • A malicious actor with configuration access could inject arbitrary SQL like pg_stat_statements; DROP TABLE users; -- or use UNION-based attacks to extract sensitive data
  • This could lead to data exfiltration, data modification, or denial of service

Locations affected:

  • queries.py line 22-24: {query_statement_source} in POSTGRES_SQL_STATEMENT
  • queries.py line 140: {query_statement_source} in POSTGRES_TEST_GET_QUERIES
  • connection.py line 9-10: Passing unvalidated value to query format
  • query_parser.py line 44-45: Passing unvalidated value to query format

Suggested fix:

  1. Add a validation pattern to the JSON schema to restrict input to valid SQL identifiers:
"queryStatementSource": {
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)?$",
  "default": "pg_stat_statements"
}
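  2. Additionally, validate/sanitize the value in Python before use:
import re

# Optionally schema-qualified identifier, e.g. my_schema.custom_pg_stat_statements
QUERY_SOURCE_PATTERN = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?')

def validate_query_source(source: str) -> str:
    # fullmatch anchors both ends; re.match with "$" would also accept a trailing newline
    if not QUERY_SOURCE_PATTERN.fullmatch(source):
        raise ValueError(f"Invalid query statement source: {source!r}")
    return source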

This mirrors security patterns used elsewhere for similar configurable SQL identifiers.
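
The pattern suggested in this comment ends in `?`, while the pattern in the earlier review comment ends in `*`. The sketch below compares the two on a few sample values; the variable names and the sample list are illustrative only. It uses `fullmatch`, so the `^` and `$` anchors are dropped.

```python
import re

# Pattern ending in "*": any number of dot-separated identifier parts.
MULTI_PART = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)*")
# Pattern ending in "?": at most one dot, i.e. view or schema.view.
TWO_PART = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?")

candidates = [
    "pg_stat_statements",
    "my_schema.custom_pg_stat_statements",
    "my_db.my_schema.my_view",
    "pg_stat_statements; DROP TABLE users; --",
    "1bad",
]

# Maps each candidate to (accepted by MULTI_PART, accepted by TWO_PART).
results = {
    value: (bool(MULTI_PART.fullmatch(value)), bool(TWO_PART.fullmatch(value)))
    for value in candidates
}

for value, (multi, two) in results.items():
    print(f"{value!r}: multi-part={multi}, two-part={two}")
```

Both variants reject the injection string; they differ only on three-part names such as `my_db.my_schema.my_view`, so the choice comes down to whether database-qualified sources should be allowed.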


gitar-bot bot commented Jan 8, 2026

🔍 CI failure analysis for fb07505: Integration tests fail to compile due to applyTag() signature change from main branch

Issue

Both integration test jobs (integration-tests-postgres-opensearch and integration-tests-mysql-elasticsearch) are failing with compilation errors, not test failures.

Root Cause

Compilation Error in Integration Tests (unrelated to PR changes)

File: openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/OrphanedEntityReferenceIT.java

Problem: The applyTag method signature in CollectionDAO.TagUsageDAO interface has changed (likely from the recent main branch merge), but the test code hasn't been updated to match.

Error at lines 82, 131, and 194:

method applyTag in interface org.openmetadata.service.jdbi3.CollectionDAO.TagUsageDAO cannot be applied to given types;
  required: int,String,String,String,int,int,String,String
  found:    int,String,String,String,int,int,<nulltype>
  reason: actual and formal argument lists differ in length

The method now requires 8 parameters but the test code is passing only 7 parameters (with the 7th being null).

Details

This is a breaking API change introduced in the main branch, most likely from one of these recent commits merged into this PR:

The "Tagging explanation" commit (c66d9eebf6) is the most likely culprit as it relates to tagging functionality.

Solution

The OrphanedEntityReferenceIT.java test file needs to be updated to pass the correct number of parameters to applyTag(). The missing 8th parameter needs to be identified from the interface definition and added to all three call sites (lines 82, 131, 194).

This is completely unrelated to the Postgres queryStatementSource feature - it's a merge conflict/compatibility issue from main branch changes.

Code Review 🚫 Blocked

Feature adds configurable query source for Postgres lineage but introduces a SQL injection vulnerability due to unvalidated user input being directly interpolated into SQL queries.

🚨 Security: SQL injection via unvalidated queryStatementSource parameter

📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:22 📄 ingestion/src/metadata/ingestion/source/database/postgres/queries.py:140 📄 openmetadata-spec/src/main/resources/json/schema/entity/services/connections/database/postgresConnection.json:109 🔗 CWE-89: SQL Injection

The queryStatementSource parameter is directly interpolated into SQL queries via Python's .format() method without any validation or sanitization. This creates a SQL injection vulnerability.

Attack vector:
A user could provide a malicious value like:

"queryStatementSource": "pg_stat_statements; DROP TABLE users; --"

Impact:

  • Arbitrary SQL execution against the database via SQL injection
  • Data exfiltration, modification, or deletion
  • Complete database compromise

Affected code:

  • queries.py line 22: {query_statement_source} s in POSTGRES_SQL_STATEMENT
  • queries.py line 140: {query_statement_source} s in POSTGRES_TEST_GET_QUERIES
  • The JSON schema at postgresConnection.json line 109 defines the field as a plain string with no pattern validation

Recommended fix:

  1. Add a strict regex pattern in the JSON schema to only allow valid SQL identifiers:
"queryStatementSource": {
  "type": "string",
  "pattern": "^[a-zA-Z_][a-zA-Z0-9_]*(\\.[a-zA-Z_][a-zA-Z0-9_]*)?$",
  "default": "pg_stat_statements"
}
  2. Additionally, add server-side validation in Python to double-check the format before interpolation:
import re

# fullmatch anchors both ends; re.match with "$" would also accept a trailing newline
if not re.fullmatch(r'[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?', query_statement_source):
    raise ValueError("Invalid queryStatementSource format")

This mirrors security patterns used in other database connectors that accept table/view names as configuration.

What Works Well

The implementation follows existing patterns in the codebase (similar to Snowflake connector's accountUsageSchema), includes comprehensive unit tests for both default and custom source scenarios, and provides thorough documentation for users.

Recommendations

Add input validation at both the schema level (JSON Schema pattern constraint) and application level (Python regex validation) to ensure the queryStatementSource only contains valid SQL identifier patterns (e.g., ^[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?$). This prevents malicious SQL injection while still allowing legitimate schema-qualified view names.
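
A minimal sketch of how the application-level half of this recommendation could be exercised in tests, covering the default source, a custom source, and a rejected value. `render_statement`, the shortened template, and the test class are hypothetical and are not the PR's actual code or test suite.

```python
import re
import unittest

# Same shape as the recommended pattern; anchors are implied by fullmatch.
SOURCE_PATTERN = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)?")
# Hypothetical, shortened template standing in for the real query.
TEMPLATE = "SELECT s.query FROM {query_statement_source} s"


def render_statement(source: str = "pg_stat_statements") -> str:
    """Validate the source name, then interpolate it into the template."""
    if not SOURCE_PATTERN.fullmatch(source):
        raise ValueError(f"Invalid queryStatementSource: {source!r}")
    return TEMPLATE.format(query_statement_source=source)


class RenderStatementTest(unittest.TestCase):
    def test_default_source(self):
        self.assertIn("FROM pg_stat_statements s", render_statement())

    def test_custom_source(self):
        sql = render_statement("my_schema.custom_pg_stat_statements")
        self.assertIn("FROM my_schema.custom_pg_stat_statements s", sql)

    def test_rejects_injection(self):
        with self.assertRaises(ValueError):
            render_statement("pg_stat_statements; DROP TABLE users; --")
```

Saved as a module, the tests can be run with `python -m unittest`. The schema-level pattern would still be needed separately, since it rejects bad values before they reach the ingestion code.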


sonarqubecloud bot commented Jan 8, 2026


Labels

safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Support for Custom pg_stat_statements View in Postgres Lineage Ingestion

4 participants