Remove fuzzy match on ngram; merge SearchUtils into single class; add more test coverage #27636
Pull request overview
This PR refines search-query fuzziness/max-expansion heuristics to prevent Lucene clause explosions, consolidates search helper utilities into SearchUtils, and adds regression coverage (unit + integration) around the affected search behavior and ranking.
Changes:
- Merged the former `SearchUtil` helpers into `SearchUtils` and updated call sites accordingly.
- Updated fuzzy-query heuristics to key off “alphanumeric sub-token” count (mirroring the ngram tokenizer split behavior) and added unit tests to pin the boundary behavior.
- Improved search configuration and integration coverage (e.g., adding `name.keyword` exact boost for tables; matrix tests to guard shard-failure regressions).
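The sub-token heuristic described above can be sketched roughly as follows. This is a minimal illustration, not the code in `SearchUtils`: the class `FuzzinessSketch`, the methods `countSubTokens` and `fuzzinessFor`, and the exact cut-off (fuzzy matching only for single-token queries) are assumptions based on the summary in this PR.

```java
import java.util.Arrays;

// Hypothetical sketch; names and thresholds are assumptions, not the PR's code.
final class FuzzinessSketch {
  private FuzzinessSketch() {}

  /**
   * Counts maximal runs of ASCII letters/digits in the query,
   * e.g. "dim_customer.v2" yields 3 ("dim", "customer", "v2").
   */
  static int countSubTokens(String query) {
    if (query == null || query.isBlank()) {
      return 0;
    }
    // Split on any run of non-alphanumeric characters; drop the empty
    // leading element that split() produces when the query starts with one.
    return (int) Arrays.stream(query.split("[^\\p{Alnum}]+"))
        .filter(s -> !s.isEmpty())
        .count();
  }

  /** Returns "AUTO" for a single sub-token, otherwise "0" (fuzzy matching off). */
  static String fuzzinessFor(String query) {
    return countSubTokens(query) == 1 ? "AUTO" : "0";
  }

  public static void main(String[] args) {
    for (String q : new String[] {"custmer", "dim_customer", "sales orders 2024"}) {
      System.out.println(
          q + " -> " + countSubTokens(q) + " sub-token(s), fuzziness=" + fuzzinessFor(q));
    }
  }
}
```

Keying on sub-tokens rather than whitespace-separated words means a query such as `dim_customer` counts as two tokens, which is how a tokenizer that splits on non-alphanumeric characters would see it.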
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| openmetadata-service/src/test/java/org/openmetadata/service/search/SearchUtilsTest.java | Adds parameterized unit tests for fuzziness/max_expansions heuristics and index classification routing. |
| openmetadata-service/src/main/resources/json/data/settings/searchSettings.json | Adds name.keyword exact-match boost for the table asset configuration. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSourceBuilderFactory.java | Switches static imports from SearchUtil to SearchUtils. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSourceBuilderFactory.java | Switches static imports from SearchUtil to SearchUtils. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchUtils.java | Incorporates index classification + fuzziness/max_expansions logic (formerly in SearchUtil) using sub-token counting. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchUtil.java | Removed (functionality merged into SearchUtils). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchSourceBuilderFactory.java | Switches static imports from SearchUtil to SearchUtils. |
| openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SearchMetadataTool.java | Updates static import to SearchUtils.mapEntityTypesToIndexNames. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/SearchResourceIT.java | Adds integration regression tests for dataAsset alias clause-explosion behavior, typo-tolerance guard, and exact-name ranking guard. |
| .gitignore | Broadens ignores for .claude/ content. |
…it/tests/SearchResourceIT.java Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Code Review ✅ Approved
Consolidates SearchUtils into a single class and removes ngram fuzzy matching while adding comprehensive test coverage. No issues found.
🔴 Playwright Results: 2 failure(s), 13 flaky
✅ 3696 passed · ❌ 2 failed · 🟡 13 flaky · ⏭️ 89 skipped
Genuine Failures (failed on all attempts)❌
fixes https://github.com/open-metadata/openmetadata-collate/issues/3793
Summary by Gitar
- Updated `getFuzziness` and `getMaxExpansions` heuristics to prevent `indices.query.bool.max_clause_count` overflow by disabling fuzzy matching for multi-token queries.
- Added `name.keyword` to `searchSettings.json` to ensure exact match ranking for table assets.
- Merged `SearchUtil` into `SearchUtils` and implemented index classification helpers (`isDataAssetIndex`, `isTimeSeriesIndex`, etc.).
- Added `testDataAssetAliasSearchMatrix` to validate search behavior across various query lengths and formats.
- Added `SearchUtilsTest` to verify heuristic boundary conditions and index classification logic.
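To see why multi-token fuzzy queries can overflow the clause limit, a rough upper-bound model helps: each sub-token may expand to up to `max_expansions` terms on every searched field. The sketch below is only that model. `ClauseBudgetSketch`, its methods, and the field counts are hypothetical, and the limit of 1024 is used purely as an illustrative value (it is Lucene's traditional default); the effective limit depends on the cluster configuration.

```java
// Hypothetical back-of-the-envelope model; not taken from the PR's code.
final class ClauseBudgetSketch {
  // Illustrative limit only; the real value depends on cluster settings.
  static final int ASSUMED_MAX_CLAUSES = 1024;

  private ClauseBudgetSketch() {}

  /**
   * Rough upper bound on generated clauses: tokens x expansions x fields.
   * An expansion count below 1 is treated as 1 (one exact term per token).
   */
  static long estimateClauses(int subTokens, int maxExpansions, int fields) {
    return (long) subTokens * Math.max(1, maxExpansions) * fields;
  }

  static boolean fitsBudget(int subTokens, int maxExpansions, int fields) {
    return estimateClauses(subTokens, maxExpansions, fields) <= ASSUMED_MAX_CLAUSES;
  }

  public static void main(String[] args) {
    // Six sub-tokens, 50 expansions each, across 10 fields (assumed numbers).
    System.out.println(estimateClauses(6, 50, 10) + " fits=" + fitsBudget(6, 50, 10));
    // Same query with fuzzy expansion disabled.
    System.out.println(estimateClauses(6, 1, 10) + " fits=" + fitsBudget(6, 1, 10));
  }
}
```

Under these assumed numbers, disabling expansion for the multi-token query drops the estimate from 3000 clauses to 60, which is the effect the heuristic change is aiming for.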