test(search): de-flake nightly search-it & ui-it reindex tests #29188
Open
mohityadav766 wants to merge 1 commit into
The JavaUIIT nightly workflow (search-it-nightly + ui-it-nightly) was failing on nearly every PR independent of content: timing/scope races in the #28637 reindex test suite, not product regressions. No production code changed; no assertions weakened (two made stricter).

- DbToEsCountReconciliationIT: wrap the cluster-wide per-type DB<->ES count check in a converge-poll so it absorbs post-reindex engine-refresh lag; keeps the umbrella, a real indexer regression still never converges.
- DistributedAutoTuneReindexUIIT: replace the fuzzy /v1/search/query?q=prefix cohort count (matched other parallel tests' entities -> 6900 vs 1500) with a strict name.keyword prefix count, cluster-alias-aware index resolution, and a converge-poll; remove Thread.sleep.
- SimpleReindexTriggerUIIT, SelectiveFieldReindexUIIT, LongCompoundNameSearchUIIT, PipelineOwnerIndexUIIT: Awaitility .ignoreNoExceptions() -> .ignoreExceptions() so a transient Playwright/search error retries instead of aborting the poll.
- SelectiveFieldReindexUIIT: move the two heavy global /data-quality page checks to direct testCase/testSuite search-index presence (the contract the DQ list is backed by); harden LongCompoundName's Explore probe into a re-navigating poll.

SearchAvailable{,AllKinds}DuringReindexUIIT intentionally keep .ignoreNoExceptions() (fail-fast on a zero-downtime violation).

Verified: each test green on local testcontainers Postgres + Elasticsearch.

Fixes #29187

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
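The converge-poll and the two exception policies described above can be sketched without any test library. The following is a minimal, stdlib-only illustration, not the PR's code and not Awaitility itself: `ConvergePoll`, `until`, `laggingCounts`, and `flakyThenTrue` are hypothetical names, and the simulated counts stand in for the real DB and search-index queries. Unlike Awaitility, which throws on timeout, this sketch simply returns `false`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Stdlib-only sketch of a converge-poll (hypothetical helper, not Awaitility). */
final class ConvergePoll {
    private ConvergePoll() {}

    /**
     * Re-evaluates the probe every intervalMillis until it returns true
     * or timeoutMillis has elapsed.
     *
     * retryOnException = true  mirrors a tolerant policy: a probe error is
     *                          swallowed and the probe runs again next tick.
     * retryOnException = false mirrors a fail-fast policy: the first probe
     *                          error is rethrown and the poll ends.
     */
    static boolean until(BooleanSupplier probe, long timeoutMillis,
                         long intervalMillis, boolean retryOnException) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try {
                if (probe.getAsBoolean()) {
                    return true; // condition converged
                }
            } catch (RuntimeException e) {
                if (!retryOnException) {
                    throw e; // fail fast on the first error
                }
                // tolerant policy: fall through and retry on the next tick
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // never converged within the timeout
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    /** Simulated index count that trails dbCount by one for the first lagTicks reads. */
    static BooleanSupplier laggingCounts(long dbCount, int lagTicks) {
        AtomicInteger reads = new AtomicInteger();
        return () -> {
            long esCount = reads.incrementAndGet() > lagTicks ? dbCount : dbCount - 1;
            return esCount == dbCount;
        };
    }

    /** Simulated probe that throws for the first `failures` calls, then succeeds. */
    static BooleanSupplier flakyThenTrue(int failures) {
        AtomicInteger calls = new AtomicInteger();
        return () -> {
            if (calls.incrementAndGet() <= failures) {
                throw new IllegalStateException("transient probe error");
            }
            return true;
        };
    }
}
```

With the tolerant policy, `ConvergePoll.until(ConvergePoll.flakyThenTrue(2), 2000, 5, true)` returns `true` after two swallowed errors, while the same probe with `retryOnException = false` throws on the first call. A probe that never becomes true still exhausts the timeout, which is why a genuine regression is not hidden by the poll.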
❌ PR checklist incomplete
This PR cannot be merged until the following are addressed on its linked issue:
The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically. Maintainers can bypass this check by adding the |
Code Review ✅ Approved
Replaces flaky polling and hardcoded sleeps with robust converge-polling and strict keyword matching in search integration tests. No issues found.
Describe your changes:
Fixes #29187
The `JavaUIIT Integration Tests (Nightly)` workflow (`search-it-nightly` + `ui-it-nightly`) was failing on nearly every PR independent of content (it even failed on a no-op version-bump PR). These are timing/scope races in the #28637 reindex test suite and are test-side only: no product behavior changed and no assertions weakened (two were made stricter).

Root causes & fixes
| Test | Root cause | Fix |
| --- | --- | --- |
| `DbToEsCountReconciliationIT` | Success; engine refresh lags → `es < db` | Wrap the cluster-wide per-type DB<->ES count check in a converge-poll |
| `DistributedAutoTuneReindexUIIT` | `q=<prefix>` count also matched other parallel `ui-it` tests' entities (`6900` vs `1500`) | `name.keyword` prefix count, cluster-alias-aware index resolution, converge-poll, removed `Thread.sleep` |
| `SimpleReindexTriggerUIIT`, `SelectiveFieldReindexUIIT`, `LongCompoundNameSearchUIIT`, `PipelineOwnerIndexUIIT` | `.ignoreNoExceptions()` aborted the poll on a transient Playwright/search error | `.ignoreExceptions()` so transient errors retry on the next tick |
| `SelectiveFieldReindexUIIT` | `/data-quality` page rendering under load | Direct `testCase`/`testSuite` search-index presence checks (the contract the DQ list is backed by) |

`SearchAvailableDuringReindexUIIT` / `SearchAvailableAllKindsDuringReindexUIIT` are intentionally left unchanged: their `.ignoreNoExceptions()` is a deliberate fail-fast for the zero-downtime guarantee (a mid-flight blackout/duplicate must fail immediately; transient 503 shard-lag is already absorbed inside `probeIndexToleratingShardLag`).

Type of change:
Tests:
Each modified test verified green on local testcontainers (Postgres + Elasticsearch, real Playwright browser):
- `DbToEsCountReconciliationIT`: 2/2 runs
- `DistributedAutoTuneReindexUIIT`, `SimpleReindexTriggerUIIT`, `SelectiveFieldReindexUIIT`, `LongCompoundNameSearchUIIT`, `PipelineOwnerIndexUIIT`: green
- `mvn test-compile` and `mvn spotless:check` both pass.

🤖 Generated with Claude Code
Greptile Summary
This PR de-flakes the nightly `search-it` and `ui-it` reindex test suite by addressing root causes (ES refresh lag, fuzzy query scope bleed, and `ignoreNoExceptions()` aborting on transient errors) without weakening any assertions; two are in fact made stricter.

- `DbToEsCountReconciliationIT`: converts the immediate post-reindex count comparison into an Awaitility convergence poll so ES refresh lag no longer causes spurious failures; a real regression still never converges and fails.
- `DistributedAutoTuneReindexUIIT`: replaces the fuzzy `q=<prefix>` probe + `Thread.sleep` with a strict `name.keyword` prefix count via `SearchAssertions` and cluster-alias-aware `IndexAliasInspector`, removing the cross-test count bleed (`6900` vs expected `1500`) that was the primary flake source.
- `ignoreNoExceptions()` → `ignoreExceptions()` (`SimpleReindexTriggerUIIT`, `LongCompoundNameSearchUIIT`, `PipelineOwnerIndexUIIT`, `SelectiveFieldReindexUIIT`): Awaitility's `ignoreNoExceptions()` aborts the poll on any exception, including transient Playwright/search errors; `ignoreExceptions()` lets those retry.
- `SelectiveFieldReindexUIIT`: replaces heavy `/data-quality` global page renders for `testCase`/`testSuite` with direct `name.keyword` index counts, the exact contract the DQ list is backed by.

Confidence Score: 5/5
All changes are test-only; no product behaviour is altered and no assertion is weakened.
Every change targets a clearly identified root cause — refresh lag, cross-test query bleed, or an incorrect Awaitility exception policy. The convergence poll in DbToEsCountReconciliationIT still catches genuine indexer regressions (they never converge). The switch from fuzzy q= to strict name.keyword prefix in DistributedAutoTuneReindexUIIT is strictly more precise. No logic shared with production code is touched.
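To make the "strictly more precise" point concrete, here is a small stdlib-only sketch of why a loose match over-counts a cohort that a strict prefix match counts exactly. It is an in-memory analogy, not the search engine's query semantics and not the PR's `SearchAssertions` code: `CohortCount`, `looseCount`, `strictPrefixCount`, and the entity names are all hypothetical, and the case-insensitive substring match only approximates what an analyzed full-text query might pull in.

```java
import java.util.List;
import java.util.Locale;

/** In-memory analogy for loose vs. strict cohort counting (hypothetical helper). */
final class CohortCount {
    private CohortCount() {}

    /**
     * Loose match standing in for a fuzzy full-text query: any name that
     * contains the token anywhere, ignoring case, is counted.
     */
    static long looseCount(List<String> names, String token) {
        String needle = token.toLowerCase(Locale.ROOT);
        return names.stream()
                .filter(n -> n.toLowerCase(Locale.ROOT).contains(needle))
                .count();
    }

    /**
     * Strict match standing in for a prefix query on an un-analyzed keyword
     * field: the name must start with the exact, case-sensitive prefix.
     */
    static long strictPrefixCount(List<String> names, String prefix) {
        return names.stream()
                .filter(n -> n.startsWith(prefix))
                .count();
    }
}
```

For the names `autotune_x1_table_0001`, `autotune_x1_table_0002`, `copy_of_autotune_x1_table`, and `AUTOTUNE_X1_dashboard`, the loose count for `autotune_x1` is 4 while the strict prefix count is 2. Only the strict count stays stable when other tests create look-alike entities in the same cluster.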
No files require special attention.
Important Files Changed