Fix RDS IAM Cross Account Auth and Clarify Dev Container Docs #27632

Open
aniruddhaadak80 wants to merge 5 commits into open-metadata:main from aniruddhaadak80:fix-rds-iam-and-dev-docs-27552-27517

Conversation

@aniruddhaadak80 commented Apr 22, 2026

This PR addresses two issues, #27552 and #27517. It adds support for the optional assumeRoleArn parameter in AwsRdsDatabaseAuthenticationProvider.java to enable cross-account IAM authentication for RDS. It also adds a dedicated Dev Containers section to DEVELOPER.md and comments in the .devcontainer configs noting that post-create.sh is the primary initialization script.


Summary by Gitar

  • Test Case cleanup:
    • Added postDelete logic in TestCaseRepository to perform hard-delete for associated results and resolution statuses.
  • Search index stability:
    • Updated SearchIndexClusterValidator and SearchClusterMetrics to include null-checks for clusterStats responses, preventing potential runtime exceptions.

This will update automatically on new commits.

Copilot AI review requested due to automatic review settings April 22, 2026 14:14
@github-actions

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copilot AI left a comment

Pull request overview

This PR adds cross-account support for AWS RDS IAM auth token generation by optionally assuming an STS role, and updates developer documentation/devcontainer configs to clarify Dev Container initialization.

Changes:

  • Add optional assumeRoleArn JDBC query param support in AwsRdsDatabaseAuthenticationProvider using STS assume-role credentials.
  • Document Dev Container workflows in DEVELOPER.md, clarifying that post-create.sh is the one-time initialization script shared by both devcontainer modes.
  • Add clarifying inline notes to Dev Container postCreateCommand configuration.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| openmetadata-service/src/main/java/org/openmetadata/service/util/jdbi/AwsRdsDatabaseAuthenticationProvider.java | Add optional STS assume-role credentials provider for cross-account RDS IAM token generation. |
| DEVELOPER.md | Add Dev Containers section clarifying the two configs and initialization flow. |
| .devcontainer/full-stack/devcontainer.json | Add clarification near postCreateCommand (but currently via JSON comment). |
| .devcontainer/dev/devcontainer.json | Add clarification near postCreateCommand (but currently via JSON comment). |

Comment on lines +46 to +66
AwsCredentialsProvider credentialsProvider = DefaultCredentialsProvider.create();

if (assumeRoleArn != null) {
  StsClient stsClient =
      StsClient.builder()
          .region(Region.of(awsRegion))
          .credentialsProvider(credentialsProvider)
          .build();

  AssumeRoleRequest assumeRoleRequest =
      AssumeRoleRequest.builder()
          .roleArn(assumeRoleArn)
          .roleSessionName("OpenMetadata-RDS-IAM-Auth")
          .build();

  credentialsProvider =
      StsAssumeRoleCredentialsProvider.builder()
          .stsClient(stsClient)
          .refreshRequest(assumeRoleRequest)
          .build();
}

Copilot AI Apr 22, 2026

assumeRoleArn triggers creating a new StsClient + StsAssumeRoleCredentialsProvider on every authenticate() call. In the IAM-auth path this runs per DB connection, so this will repeatedly call STS (latency + throttling risk) and also leaves the StsClient/provider unclosed, potentially leaking HTTP resources/threads. Consider constructing and reusing an assume-role credentials provider (e.g., cached by awsRegion+assumeRoleArn or initialized once per pool) and ensuring any SDK clients/providers are closed on shutdown.
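
The caching idea in this comment can be sketched with the JDK alone. This is a minimal illustration, not the PR's implementation: `KeyedResourceCache` and its `region|roleArn` key format are hypothetical names, and the factory stands in for whatever builds the real `StsAssumeRoleCredentialsProvider`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper: keeps one long-lived closeable resource per key. */
final class KeyedResourceCache<V extends AutoCloseable> implements AutoCloseable {
  private final Map<String, V> cache = new ConcurrentHashMap<>();
  private final Function<String, V> factory;

  KeyedResourceCache(Function<String, V> factory) {
    this.factory = factory;
  }

  /** Builds the cache key from region and role ARN (format is an assumption). */
  static String key(String awsRegion, String assumeRoleArn) {
    return awsRegion + "|" + assumeRoleArn;
  }

  /** Returns the cached resource, invoking the factory at most once per key. */
  V get(String awsRegion, String assumeRoleArn) {
    return cache.computeIfAbsent(key(awsRegion, assumeRoleArn), factory);
  }

  /** Closes every cached resource; meant to be called once on shutdown. */
  @Override
  public void close() {
    RuntimeException failure = null;
    for (V value : cache.values()) {
      try {
        value.close();
      } catch (Exception e) {
        // Keep closing the remaining resources and report the first failure.
        if (failure == null) {
          failure = new IllegalStateException("Failed to close cached resource", e);
        } else {
          failure.addSuppressed(e);
        }
      }
    }
    cache.clear();
    if (failure != null) {
      throw failure;
    }
  }
}
```

With a cache like this, `authenticate()` would look up the provider instead of building one per connection, and the application's shutdown hook would call `close()` once.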

Comment on lines +48 to +59
if (assumeRoleArn != null) {
  StsClient stsClient =
      StsClient.builder()
          .region(Region.of(awsRegion))
          .credentialsProvider(credentialsProvider)
          .build();

  AssumeRoleRequest assumeRoleRequest =
      AssumeRoleRequest.builder()
          .roleArn(assumeRoleArn)
          .roleSessionName("OpenMetadata-RDS-IAM-Auth")
          .build();

Copilot AI Apr 22, 2026

The check if (assumeRoleArn != null) will attempt an STS assume-role even when the query param is present but empty/whitespace (e.g. assumeRoleArn=), which will fail with an AWS SDK validation error. Treat blank values as “not provided” (e.g., check isBlank() and skip) or raise a clear configuration error.
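
One way to treat a blank optional parameter as "not provided" is to normalise it before the null check. The sketch below uses only the JDK; `OptionalParams.nonBlank` is a hypothetical helper, not something in the PR.

```java
import java.util.Optional;

/** Hypothetical helper for optional JDBC query parameters. */
final class OptionalParams {
  private OptionalParams() {}

  /** Returns the trimmed value, or empty when it is null, empty, or whitespace-only. */
  static Optional<String> nonBlank(String raw) {
    return Optional.ofNullable(raw).map(String::trim).filter(s -> !s.isEmpty());
  }
}
```

The assume-role branch could then run only when `OptionalParams.nonBlank(assumeRoleArn).isPresent()`, so `assumeRoleArn=` falls through to the default credentials path.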

Comment on lines +49 to +66
  StsClient stsClient =
      StsClient.builder()
          .region(Region.of(awsRegion))
          .credentialsProvider(credentialsProvider)
          .build();

  AssumeRoleRequest assumeRoleRequest =
      AssumeRoleRequest.builder()
          .roleArn(assumeRoleArn)
          .roleSessionName("OpenMetadata-RDS-IAM-Auth")
          .build();

  credentialsProvider =
      StsAssumeRoleCredentialsProvider.builder()
          .stsClient(stsClient)
          .refreshRequest(assumeRoleRequest)
          .build();
}

Copilot AI Apr 22, 2026

With the new STS assume-role path, failures (STS call errors, invalid role ARN, missing permissions, etc.) will throw AWS SDK runtime exceptions that currently bypass the catch (MalformedURLException e) and won’t be wrapped in DatabaseAuthenticationProviderException. Consider catching broader exceptions in authenticate() and wrapping them consistently to keep error handling aligned with other providers (e.g., AzureDatabaseAuthenticationProvider).
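
The wrapping this comment asks for might look roughly like the following. It is a sketch under assumptions: `AuthProviderException` is a hypothetical stand-in for `DatabaseAuthenticationProviderException` (whose real hierarchy is not shown in this thread), and `AuthErrors.wrapping` is an illustrative helper name.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the project's DatabaseAuthenticationProviderException. */
class AuthProviderException extends RuntimeException {
  AuthProviderException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Hypothetical helper that gives every runtime failure a consistent exception type. */
final class AuthErrors {
  private AuthErrors() {}

  /** Runs the action and rethrows any runtime failure as AuthProviderException. */
  static <T> T wrapping(String what, Supplier<T> action) {
    try {
      return action.get();
    } catch (AuthProviderException e) {
      throw e; // already the expected type, so do not double-wrap
    } catch (RuntimeException e) {
      throw new AuthProviderException(what + " failed: " + e.getMessage(), e);
    }
  }
}
```

Callers of `authenticate()` would then see a single exception type whether the failure came from URL parsing or from the STS call, with the original SDK exception preserved as the cause.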

"version": "22.17.0"
}
},
// Use post-create script for one-time environment initialization (ANTLR, dependencies, venv)

Copilot AI Apr 22, 2026

devcontainer.json is parsed by the repo’s pre-commit check-json hook (strict JSON). Adding // comments makes this file invalid JSON and will cause the hook/CI to fail. Please remove the comment or replace it with a normal JSON field (e.g., a _comment property) if you want to keep the explanation in-file.

Suggested change
// Use post-create script for one-time environment initialization (ANTLR, dependencies, venv)
"_comment": "Use post-create script for one-time environment initialization (ANTLR, dependencies, venv)",

Comment thread .devcontainer/dev/devcontainer.json Outdated
"version": "22.17.0"
}
},
// Use post-create script for one-time environment initialization (ANTLR, dependencies, venv)

Copilot AI Apr 22, 2026

devcontainer.json is parsed by the repo’s pre-commit check-json hook (strict JSON). Adding // comments makes this file invalid JSON and will cause the hook/CI to fail. Please remove the comment or replace it with a normal JSON field (e.g., a _comment property) if you want to keep the explanation in-file.

Suggested change
// Use post-create script for one-time environment initialization (ANTLR, dependencies, venv)
"_comment": "Use post-create script for one-time environment initialization (ANTLR, dependencies, venv)",

Copilot AI review requested due to automatic review settings April 22, 2026 15:45

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment on lines +47 to +54
if (CommonUtil.nullOrEmpty(awsRegion)) {
  throw new DatabaseAuthenticationProviderException(
      "Parameter `awsRegion` shall be provided in the jdbc url.");
}
if (CommonUtil.nullOrEmpty(allowPublicKeyRetrieval)) {
  throw new DatabaseAuthenticationProviderException(
      "Parameter `allowPublicKeyRetrieval` shall be provided in the jdbc url.");
}

Copilot AI Apr 22, 2026

CommonUtil.nullOrEmpty only checks isEmpty() and will treat whitespace-only values as present. That means values like awsRegion=%20 or assumeRoleArn=%20 will pass validation and then fail later (e.g., Region.of(" ") or STS AssumeRole with an invalid ARN), producing a harder-to-diagnose error. Consider validating these parameters with a blank-aware check (e.g., trim + empty, or StringUtils.isBlank) so whitespace-only inputs are rejected as missing.
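
The difference between the two checks can be shown side by side. In this sketch `ParamChecks` is hypothetical: `nullOrEmpty` only mirrors the behaviour the comment attributes to `CommonUtil.nullOrEmpty`, and `IllegalArgumentException` stands in for the project's own exception type.

```java
/** Hypothetical helpers contrasting an empty check with a blank-aware check. */
final class ParamChecks {
  private ParamChecks() {}

  /** Mirrors the described behaviour: whitespace-only values count as present. */
  static boolean nullOrEmpty(String value) {
    return value == null || value.isEmpty();
  }

  /** Blank-aware variant: whitespace-only values count as missing. */
  static boolean nullOrBlank(String value) {
    return value == null || value.trim().isEmpty();
  }

  /** Returns the trimmed value or fails early with a message naming the parameter. */
  static String requireParam(String name, String value) {
    if (nullOrBlank(value)) {
      throw new IllegalArgumentException(
          "Parameter `" + name + "` shall be provided in the jdbc url.");
    }
    return value.trim();
  }
}
```

A value such as `awsRegion=%20` decodes to a single space, which `nullOrEmpty` accepts and `nullOrBlank` rejects, so the error surfaces at validation rather than inside `Region.of`.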

Comment on lines +1085 to +1099
private static boolean checkIsAoss(ElasticSearchConfiguration config) {
  String hostConfig = config != null ? config.getHost() : null;
  if (StringUtils.isBlank(hostConfig)) {
    return false;
  }
  for (String host : hostConfig.split(",")) {
    String trimmedHost = host.trim().toLowerCase();
    if (trimmedHost.isEmpty()) {
      continue;
    }

    String hostname = trimmedHost;
    try {
      // Add protocol if missing to make URI parsing easier
      String uriString = trimmedHost.contains("://") ? trimmedHost : "https://" + trimmedHost;

💡 Edge Case: AOSS detection misses SEARCH_AWS_SERVICE_NAME=aoss config

The checkIsAoss method only checks if the hostname ends with .aoss.amazonaws.com. The PR description mentions also checking SEARCH_AWS_SERVICE_NAME=aoss as a detection mechanism. Users connecting through a proxy, VPN endpoint, or custom DNS CNAME would have a hostname that doesn't match the .aoss.amazonaws.com suffix, causing AOSS to go undetected and cluster-level API calls to fail.

Consider also checking the AWS service name configuration (if available in ElasticSearchConfiguration or AwsConfiguration) as a secondary signal.
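
A detection routine that uses the service name as a secondary signal could be sketched as below. This is an illustration only: `AossDetection.isAoss` is a hypothetical method taking plain strings, and how the service name is actually exposed by `ElasticSearchConfiguration` or `AwsConfiguration` is not shown in this thread.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical sketch: detect AOSS from the service name or the host suffix. */
final class AossDetection {
  private static final String AOSS_SUFFIX = ".aoss.amazonaws.com";

  private AossDetection() {}

  static boolean isAoss(String hostConfig, String awsServiceName) {
    // Secondary signal: an explicit service name covers proxies and CNAMEs.
    if (awsServiceName != null
        && "aoss".equals(awsServiceName.trim().toLowerCase(Locale.ROOT))) {
      return true;
    }
    if (hostConfig == null || hostConfig.trim().isEmpty()) {
      return false;
    }
    for (String host : hostConfig.split(",")) {
      String trimmed = host.trim().toLowerCase(Locale.ROOT);
      if (trimmed.isEmpty()) {
        continue;
      }
      String hostname = trimmed;
      try {
        // Add a scheme if missing so URI can separate host from port and path.
        String uriString = trimmed.contains("://") ? trimmed : "https://" + trimmed;
        String parsed = new URI(uriString).getHost();
        if (parsed != null) {
          hostname = parsed;
        }
      } catch (URISyntaxException e) {
        // Fall back to the raw value when it cannot be parsed as a URI.
      }
      if (hostname.endsWith(AOSS_SUFFIX)) {
        return true;
      }
    }
    return false;
  }
}
```

Checking the service name first means a custom endpoint such as an internal CNAME is still recognised, while the hostname suffix remains the fallback when no service name is configured.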

Copilot AI review requested due to automatic review settings April 23, 2026 09:06

Comment on lines +1534 to +1544
@Override
protected void postDelete(TestCase entity, boolean hardDelete) {
  super.postDelete(entity, hardDelete);
  if (hardDelete) {
    // Delete test case results and resolution statuses
    Entity.getEntityTimeSeriesRepository(Entity.TEST_CASE_RESULT)
        .delete(entity.getFullyQualifiedName());
    Entity.getEntityTimeSeriesRepository(Entity.TEST_CASE_RESOLUTION_STATUS)
        .delete(entity.getFullyQualifiedName());
  }
}

🚨 Bug: Duplicate postDelete method will cause compilation error

The class TestCaseRepository already defines postDelete(TestCase, boolean) at line 862 (which calls updateTestSuite(testCase)). The new postDelete added at line 1534 has the identical signature. Java does not allow two methods with the same name and parameter types in the same class — this will fail to compile.

Additionally, even if one were removed, the existing postDelete at line 862 calls updateTestSuite() which is needed to keep test suite counts in sync. The new method's hard-delete cleanup logic must be merged into the existing method, not added as a separate override.

Suggested fix:

Merge the hard-delete cleanup into the existing postDelete at line 862:

@Override
protected void postDelete(TestCase testCase, boolean hardDelete) {
  super.postDelete(testCase, hardDelete);
  updateTestSuite(testCase);
  if (hardDelete) {
    Entity.getEntityTimeSeriesRepository(Entity.TEST_CASE_RESULT)
        .delete(testCase.getFullyQualifiedName());
    Entity.getEntityTimeSeriesRepository(Entity.TEST_CASE_RESOLUTION_STATUS)
        .delete(testCase.getFullyQualifiedName());
  }
}

Then remove the duplicate method at line 1534.

Comment on lines +90 to +93
totalShards =
    (clusterStats.indices() != null && clusterStats.indices().shards() != null)
        ? clusterStats.indices().shards().total().intValue()
        : 0;

⚠️ Bug: Null check on shards().total() was dropped, risking NPE

The old code guarded against shards().total() being null before calling .intValue(). The refactored code adds null checks for indices() and shards() but drops the existing null check on total(), which returns a nullable Long. If the OpenSearch API returns a response where shards is present but total is null, this will throw a NullPointerException on .intValue().

This affects both SearchIndexClusterValidator.java and SearchClusterMetrics.java with the identical pattern.

Suggested fix:

Add back the null check on total():

totalShards =
    (clusterStats.indices() != null
        && clusterStats.indices().shards() != null
        && clusterStats.indices().shards().total() != null)
        ? clusterStats.indices().shards().total().intValue()
        : 0;
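
An alternative to the chained null checks is an `Optional` pipeline, which makes it harder to drop one link by accident. The sketch below is not the OpenSearch client API: the `ClusterStats`, `Indices`, and `Shards` interfaces are hypothetical stand-ins that only model the three nullable accessors named in this finding.

```java
import java.util.Optional;

/** Hypothetical sketch: null-safe shard total via an Optional chain. */
final class ShardTotals {
  private ShardTotals() {}

  // Minimal stand-ins for the response types; each accessor may return null.
  interface Shards {
    Long total();
  }

  interface Indices {
    Shards shards();
  }

  interface ClusterStats {
    Indices indices();
  }

  /** Returns the shard total, or 0 when any link in the chain is null. */
  static int totalShards(ClusterStats stats) {
    return Optional.ofNullable(stats)
        .map(ClusterStats::indices)
        .map(Indices::shards)
        .map(Shards::total)
        .map(Long::intValue)
        .orElse(0);
  }
}
```

Because `Optional.map` yields an empty result as soon as a mapper returns null, `indices()`, `shards()`, and `total()` are all covered by the same construct.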

gitar-bot Bot commented Apr 23, 2026

Code Review 🚫 Blocked 2 resolved / 5 findings

RDS IAM cross-account authentication and Dev Container documentation are updated, addressing resource leaks in StsClient. The build is currently broken due to a duplicate postDelete method and a missing null check, and AOSS detection requires additional configuration coverage.

🚨 Bug: Duplicate postDelete method will cause compilation error

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:1534-1544 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java:862-866

The class TestCaseRepository already defines postDelete(TestCase, boolean) at line 862 (which calls updateTestSuite(testCase)). The new postDelete added at line 1534 has the identical signature. Java does not allow two methods with the same name and parameter types in the same class — this will fail to compile.

Additionally, even if one were removed, the existing postDelete at line 862 calls updateTestSuite() which is needed to keep test suite counts in sync. The new method's hard-delete cleanup logic must be merged into the existing method, not added as a separate override.

Suggested fix
Merge the hard-delete cleanup into the existing postDelete at line 862:

@Override
protected void postDelete(TestCase testCase, boolean hardDelete) {
  super.postDelete(testCase, hardDelete);
  updateTestSuite(testCase);
  if (hardDelete) {
    Entity.getEntityTimeSeriesRepository(Entity.TEST_CASE_RESULT)
        .delete(testCase.getFullyQualifiedName());
    Entity.getEntityTimeSeriesRepository(Entity.TEST_CASE_RESOLUTION_STATUS)
        .delete(testCase.getFullyQualifiedName());
  }
}

Then remove the duplicate method at line 1534.

⚠️ Bug: Null check on shards().total() was dropped, risking NPE

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexClusterValidator.java:90-93 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchClusterMetrics.java:87-90

The old code guarded against shards().total() being null before calling .intValue(). The refactored code adds null checks for indices() and shards() but drops the existing null check on total(), which returns a nullable Long. If the OpenSearch API returns a response where shards is present but total is null, this will throw a NullPointerException on .intValue().

This affects both SearchIndexClusterValidator.java and SearchClusterMetrics.java with the identical pattern.

Suggested fix
Add back the null check on total():

totalShards =
    (clusterStats.indices() != null
        && clusterStats.indices().shards() != null
        && clusterStats.indices().shards().total() != null)
        ? clusterStats.indices().shards().total().intValue()
        : 0;

💡 Edge Case: AOSS detection misses SEARCH_AWS_SERVICE_NAME=aoss config

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchClient.java:1085-1099

The checkIsAoss method only checks if the hostname ends with .aoss.amazonaws.com. The PR description mentions also checking SEARCH_AWS_SERVICE_NAME=aoss as a detection mechanism. Users connecting through a proxy, VPN endpoint, or custom DNS CNAME would have a hostname that doesn't match the .aoss.amazonaws.com suffix, causing AOSS to go undetected and cluster-level API calls to fail.

Consider also checking the AWS service name configuration (if available in ElasticSearchConfiguration or AwsConfiguration) as a secondary signal.

✅ 2 resolved
Bug: StsClient is never closed, causing resource leak on every auth call

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/jdbi/AwsRdsDatabaseAuthenticationProvider.java:49-63
The StsClient created at line 49 is never closed. Since authenticate() is called by HikariCP every time a new database connection is obtained from the pool, this leaks an HTTP client and its associated resources (threads, connections) on every invocation.

Additionally, the StsAssumeRoleCredentialsProvider wrapping it is also never closed.

This will gradually exhaust file descriptors and memory under sustained load.

Performance: StsClient and STS credentials provider recreated on every connection

📄 openmetadata-service/src/main/java/org/openmetadata/service/util/jdbi/AwsRdsDatabaseAuthenticationProvider.java:46-60
Every call to authenticate() creates a new StsClient and StsAssumeRoleCredentialsProvider. These are heavyweight objects involving HTTP client setup and STS API calls. Since authenticate() is invoked on every HikariCP connection checkout, this adds significant latency and unnecessary STS API calls (which are also subject to throttling).

StsAssumeRoleCredentialsProvider already handles credential caching and refresh internally — it's designed to be long-lived.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
