Fix AWS OpenSearch Serverless AOSS support and search cluster stability #27618

Open
aniruddhaadak80 wants to merge 12 commits into open-metadata:main from aniruddhaadak80:fix-aoss-cluster-stats-27599

Conversation

@aniruddhaadak80 commented Apr 22, 2026

Description

This PR resolves #27599 by adding AWS OpenSearch Serverless (AOSS) detection and search backend stability fixes.

Key Changes

  • AOSS Detection: Added logic to OpenSearchClient.java to detect AOSS environments via hostname patterns or the SEARCH_AWS_SERVICE_NAME environment variable.
  • API Gating: Gated unsupported cluster-level API calls like cluster stats and nodes stats in OpenSearchGenericManager.java to prevent spurious error logs in AOSS.
  • Health Check Optimization: Replaced cluster health calls with client.info() (the GET root endpoint) for AOSS deployments to ensure correct health status reporting.
  • Null-Safety Improvements: Implemented guards in SearchClusterMetrics.java and SearchIndexClusterValidator.java to handle missing cluster statistics gracefully.
  • Integration Test Fix: Updated TestCaseResourceIT.java to handle 404 Not Found responses when listing resolution statuses for hard-deleted test cases, improving CI reliability.
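
The detection and health-check rules listed above could be sketched roughly as follows. This is an illustration, not the PR's code: the class `AossDetection`, its method names, and the case-insensitive matching are assumptions made for the example. Only the `.aoss.amazonaws.com` host suffix, the `aoss` service name, and the switch from `/_cluster/health` to the root endpoint come from this PR.

```java
import java.util.Locale;

/** Hypothetical helper; names are illustrative and not taken from the PR. */
final class AossDetection {
  private static final String AOSS_HOST_SUFFIX = ".aoss.amazonaws.com";
  private static final String AOSS_SERVICE_NAME = "aoss";

  private AossDetection() {}

  /** Returns true when the host suffix or the configured service name indicates AOSS. */
  static boolean isAoss(String host, String serviceName) {
    // Assumption: matching is case-insensitive and tolerant of surrounding whitespace.
    boolean hostMatches =
        host != null && host.trim().toLowerCase(Locale.ROOT).endsWith(AOSS_HOST_SUFFIX);
    boolean serviceMatches =
        serviceName != null && AOSS_SERVICE_NAME.equalsIgnoreCase(serviceName.trim());
    return hostMatches || serviceMatches;
  }

  /** Picks the health probe path: the root endpoint on AOSS, cluster health otherwise. */
  static String healthEndpoint(boolean isAoss) {
    return isAoss ? "/" : "/_cluster/health";
  }

  public static void main(String[] args) {
    boolean aoss = isAoss("abc123.us-east-1.aoss.amazonaws.com", null);
    System.out.println("isAoss=" + aoss + ", probe=" + healthEndpoint(aoss));
  }
}
```

Computing the flag once and passing it down keeps the gating decision in a single place, so each cluster-level call only needs to check a boolean.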

@github-actions
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Comment on lines +79 to 86
int totalNodes =
    clusterStats != null && clusterStats.nodes() != null && clusterStats.nodes().count() != null
        ? clusterStats.nodes().count().total()
        : 1;
int totalShards =
    clusterStats != null
            && clusterStats.indices() != null
            && clusterStats.indices().shards() != null
            && clusterStats.indices().shards().total() != null
        ? clusterStats.indices().shards().total().intValue()
        : 0;

int maxShardsPerNode = getMaxShardsPerNode(client);

💡 Edge Case: SearchIndexClusterValidator also calls clusterStats but not clusterSettings

In SearchIndexClusterValidator.java, clusterStats is now null-guarded (lines 79-84), but getMaxShardsPerNode(client) is called on line 86. If that method internally calls clusterSettings() or another unsupported AOSS API, it will also fail. Worth verifying that the full code path in this validator is AOSS-safe.
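
One way to make that lookup AOSS-safe is sketched below. Everything here is hypothetical: the PR does not show the body of `getMaxShardsPerNode`, so the class `MaxShardsLookup`, the `Supplier` standing in for the cluster-settings call, and the fallback of 1000 (assumed to match the usual `cluster.max_shards_per_node` default) are illustrative choices only.

```java
import java.util.function.Supplier;

/** Hypothetical sketch; not the validator's actual implementation. */
final class MaxShardsLookup {
  // Assumption: fall back to the commonly documented default of 1000 shards per node.
  static final int DEFAULT_MAX_SHARDS_PER_NODE = 1000;

  private MaxShardsLookup() {}

  /**
   * Returns the configured limit, or the default when running on AOSS or when the
   * settings lookup fails or yields no value.
   */
  static int maxShardsPerNode(boolean isAoss, Supplier<Integer> settingsLookup) {
    if (isAoss) {
      // Skip the cluster-level call entirely; the supplier is never invoked.
      return DEFAULT_MAX_SHARDS_PER_NODE;
    }
    try {
      Integer value = settingsLookup.get();
      return value != null ? value : DEFAULT_MAX_SHARDS_PER_NODE;
    } catch (RuntimeException e) {
      return DEFAULT_MAX_SHARDS_PER_NODE;
    }
  }

  public static void main(String[] args) {
    // On AOSS the failing supplier is never called, so no exception escapes.
    int onAoss = maxShardsPerNode(true, () -> { throw new IllegalStateException("unsupported"); });
    int managed = maxShardsPerNode(false, () -> 500);
    System.out.println(onAoss + " " + managed);
  }
}
```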

gitar-bot Bot commented Apr 22, 2026

Code Review 👍 Approved with suggestions 0 resolved / 1 findings

Eliminates spurious error logs in AWS OpenSearch Serverless by handling unsupported /_cluster/stats calls. Please also address the missing clusterSettings validation in SearchIndexClusterValidator.

💡 Edge Case: SearchIndexClusterValidator also calls clusterStats but not clusterSettings

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexClusterValidator.java:79-86

In SearchIndexClusterValidator.java, clusterStats is now null-guarded (lines 79-84), but getMaxShardsPerNode(client) is called on line 86. If that method internally calls clusterSettings() or another unsupported AOSS API, it will also fail. Worth verifying that the full code path in this validator is AOSS-safe.

@aniruddhaadak80 changed the title from "Fix: Unsupported /_cluster/stats call causes spurious ERROR logs on AWS OpenSearch Serverless (AOSS)" to "Fix AWS OpenSearch Serverless (AOSS) support and search cluster stability" on Apr 22, 2026
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Reduces false error logs and incorrect UNHEALTHY reporting when OpenMetadata is configured to use AWS OpenSearch Serverless (AOSS) by skipping unsupported cluster APIs and adding null-safe metric handling; also adjusts hard-delete behavior for TestCase children and adds an integration test to validate it.

Changes:

  • Detect AOSS in OpenSearchClient and propagate an isAoss flag to OpenSearchGenericManager.
  • Skip unsupported OpenSearch endpoints on AOSS (/_cluster/stats, /_nodes/stats, and /_cluster/health) and safely handle null cluster stats in metric/capacity calculators.
  • Fix TestCase hard-delete child cleanup behavior and add an IT to verify results/resolution statuses are removed.
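
The null-safe handling of cluster stats mentioned above can also be expressed with an `Optional` chain instead of nested null checks. The sketch below is only an illustration: `Stats`, `Nodes`, and `NodeCount` are hypothetical stand-ins for the client's response types, and the fallback of 1 node mirrors the default used in this PR's null guards.

```java
import java.util.Optional;

/** Hypothetical sketch with stand-in types; not the OpenSearch client's classes. */
final class NullSafeClusterStats {
  static final class NodeCount {
    private final Integer total;
    NodeCount(Integer total) { this.total = total; }
    Integer total() { return total; }
  }

  static final class Nodes {
    private final NodeCount count;
    Nodes(NodeCount count) { this.count = count; }
    NodeCount count() { return count; }
  }

  static final class Stats {
    private final Nodes nodes;
    Stats(Nodes nodes) { this.nodes = nodes; }
    Nodes nodes() { return nodes; }
  }

  private NullSafeClusterStats() {}

  /** Returns the node total, or 1 when any link in the chain is null. */
  static int totalNodes(Stats stats) {
    return Optional.ofNullable(stats)
        .map(Stats::nodes)
        .map(Nodes::count)
        .map(NodeCount::total)
        .orElse(1);
  }

  public static void main(String[] args) {
    System.out.println(totalNodes(null)); // stats unavailable, e.g. on AOSS
    System.out.println(totalNodes(new Stats(new Nodes(new NodeCount(3)))));
  }
}
```

Each `map` step short-circuits to an empty `Optional` when its input or result is null, so the default is applied in one place rather than repeated per field.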

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchGenericManager.java | Adds isAoss handling to skip unsupported APIs and uses GET / for AOSS health checks. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchClient.java | Implements AOSS detection and passes the flag into OpenSearchGenericManager. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchClusterMetrics.java | Adds null-safe defaults when clusterStats() can be null (e.g., AOSS). |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java | Fixes hard-delete child deletion loop and removes redundant async deletion. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/SearchIndexClusterValidator.java | Adds null-safe defaults for capacity computation when clusterStats() is null. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/TestCaseResourceIT.java | Adds an integration test ensuring hard delete removes results and resolution statuses. |

Comment on lines +116 to +121
boolean isAoss = false;
if (config != null
    && config.getHost() != null
    && config.getHost().endsWith(".aoss.amazonaws.com")) {
  isAoss = true;
} else if (awsConfig != null && "aoss".equals(awsConfig.getServiceName())) {
  isAoss = true;
}
Comment on lines +279 to +282
if (isAoss) {
LOG.debug("Skipping cluster stats fetch — AWS OpenSearch Serverless does not support /_cluster/stats");
return null;
}
Comment on lines +78 to +83
int totalNodes =
    clusterStats != null && clusterStats.nodes() != null && clusterStats.nodes().count() != null
        ? clusterStats.nodes().count().total()
        : 1;
int totalShards =
    clusterStats != null
            && clusterStats.indices() != null
            && clusterStats.indices().shards() != null
            && clusterStats.indices().shards().total() != null
        ? clusterStats.indices().shards().total().intValue()
        : 0;
Comment on lines 896 to +907
  protected void deleteChildren(
      List<CollectionDAO.EntityRelationshipRecord> children, boolean hardDelete, String updatedBy) {
    if (hardDelete) {
+     TestCaseResolutionStatusRepository testCaseResolutionStatusRepository =
+         (TestCaseResolutionStatusRepository)
+             Entity.getEntityTimeSeriesRepository(Entity.TEST_CASE_RESOLUTION_STATUS);
      for (CollectionDAO.EntityRelationshipRecord entityRelationshipRecord : children) {
        LOG.info(
-           "Recursively {} deleting {} {}",
-           hardDelete ? "hard" : "soft",
+           "Recursively hard deleting {} {}",
            entityRelationshipRecord.getType(),
            entityRelationshipRecord.getId());
-       TestCaseResolutionStatusRepository testCaseResolutionStatusRepository =
-           (TestCaseResolutionStatusRepository)
-               Entity.getEntityTimeSeriesRepository(Entity.TEST_CASE_RESOLUTION_STATUS);
-       for (CollectionDAO.EntityRelationshipRecord child : children) {
-         testCaseResolutionStatusRepository.deleteById(child.getId(), hardDelete);
-       }
+       testCaseResolutionStatusRepository.deleteById(entityRelationshipRecord.getId(), hardDelete);
@aniruddhaadak80 changed the title from "Fix AWS OpenSearch Serverless (AOSS) support and search cluster stability" to "Fix AWS OpenSearch Serverless AOSS support and search cluster stability" on Apr 22, 2026

Development

Successfully merging this pull request may close these issues.

OpenSearchGenericManager: Unsupported /_cluster/stats call causes spurious ERROR logs on AWS OpenSearch Serverless (AOSS)

2 participants