
Remove fuzzy match on ngram; merge SearchUtils into single class; add more test coverage #27636

Open
harshach wants to merge 2 commits into main from search_tests

@harshach (Collaborator) commented Apr 22, 2026

fixes https://github.com/open-metadata/openmetadata-collate/issues/3793

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • Search Optimization:
    • Added getFuzziness and getMaxExpansions heuristics to prevent indices.query.bool.max_clause_count overflow by disabling fuzzy matching for multi-token queries.
    • Added name.keyword to searchSettings.json to ensure exact match ranking for table assets.
  • Refactoring:
    • Consolidated SearchUtil into SearchUtils and implemented index classification helpers (isDataAssetIndex, isTimeSeriesIndex, etc.).
  • Testing:
    • Added testDataAssetAliasSearchMatrix to validate search behavior across various query lengths and formats.
    • Added exhaustive unit tests in SearchUtilsTest to verify heuristic boundary conditions and index classification logic.

This will update automatically on new commits.
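
The index classification helpers named in the summary (isDataAssetIndex, isTimeSeriesIndex) are not shown on this page. As a rough illustration only, such helpers could be simple set lookups. In the sketch below the class name, the method signatures, and every index name are hypothetical placeholders, not the contents of SearchUtils.

```java
import java.util.Set;

// Hypothetical sketch: NOT the actual SearchUtils code from this PR.
final class IndexClassificationSketch {
  // Assumed index names, chosen only for illustration.
  private static final Set<String> DATA_ASSET_INDEXES =
      Set.of("table_search_index", "dashboard_search_index");
  private static final Set<String> TIME_SERIES_INDEXES =
      Set.of("test_case_result_search_index");

  private IndexClassificationSketch() {}

  /** Returns true when the index name is in the assumed data-asset set. */
  static boolean isDataAssetIndex(String indexName) {
    // Set.of(...) rejects null lookups, so guard explicitly.
    return indexName != null && DATA_ASSET_INDEXES.contains(indexName);
  }

  /** Returns true when the index name is in the assumed time-series set. */
  static boolean isTimeSeriesIndex(String indexName) {
    return indexName != null && TIME_SERIES_INDEXES.contains(indexName);
  }
}
```

Keeping the classification in one place means query builders can branch on the index category instead of repeating string comparisons at each call site.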

Copilot AI review requested due to automatic review settings April 22, 2026 16:51
github-actions Bot added the backend and safe to test labels Apr 22, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR refines search-query fuzziness/max-expansion heuristics to prevent Lucene clause explosions, consolidates search helper utilities into SearchUtils, and adds regression coverage (unit + integration) around the affected search behavior and ranking.

Changes:

  • Merged the former SearchUtil helpers into SearchUtils and updated call sites accordingly.
  • Updated fuzzy-query heuristics to key off “alphanumeric sub-token” count (mirroring the ngram tokenizer split behavior) and added unit tests to pin the boundary behavior.
  • Improved search configuration and integration coverage (e.g., adding name.keyword exact boost for tables; matrix tests to guard shard-failure regressions).
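
The sub-token heuristic described above can be sketched roughly as follows. This is a minimal illustration, assuming that a sub-token is a maximal run of letters or digits and that fuzzy matching is kept only for single-sub-token queries. The class name, method names, and thresholds are hypothetical, and the real getFuzziness and getMaxExpansions in SearchUtils may differ.

```java
// Hypothetical sketch: NOT the actual SearchUtils implementation.
final class FuzzinessSketch {
  // Assumed value, for illustration only.
  private static final int SINGLE_TOKEN_MAX_EXPANSIONS = 10;

  private FuzzinessSketch() {}

  /** Counts maximal runs of letters/digits (char-based, ignores surrogate pairs). */
  static int countSubTokens(String query) {
    if (query == null) {
      return 0;
    }
    int count = 0;
    boolean inToken = false;
    for (int i = 0; i < query.length(); i++) {
      boolean alnum = Character.isLetterOrDigit(query.charAt(i));
      if (alnum && !inToken) {
        count++; // a new alphanumeric run starts here
      }
      inToken = alnum;
    }
    return count;
  }

  /** "AUTO" for exactly one sub-token; "0" (fuzzy matching off) otherwise. */
  static String fuzzinessFor(String query) {
    return countSubTokens(query) == 1 ? "AUTO" : "0";
  }

  /** Caps term expansions; multi-sub-token queries get the minimum of 1. */
  static int maxExpansionsFor(String query) {
    return countSubTokens(query) == 1 ? SINGLE_TOKEN_MAX_EXPANSIONS : 1;
  }
}
```

Under these assumptions, a query such as dim_customer.v2 counts as three sub-tokens (dim, customer, v2), so fuzzy matching is switched off for it, while a single-word typo like custmer still gets fuzziness AUTO.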

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| openmetadata-service/src/test/java/org/openmetadata/service/search/SearchUtilsTest.java | Adds parameterized unit tests for fuzziness/max_expansions heuristics and index classification routing. |
| openmetadata-service/src/main/resources/json/data/settings/searchSettings.json | Adds name.keyword exact-match boost for the table asset configuration. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSourceBuilderFactory.java | Switches static imports from SearchUtil to SearchUtils. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSourceBuilderFactory.java | Switches static imports from SearchUtil to SearchUtils. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchUtils.java | Incorporates index classification + fuzziness/max_expansions logic (formerly in SearchUtil) using sub-token counting. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchUtil.java | Removed (functionality merged into SearchUtils). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchSourceBuilderFactory.java | Switches static imports from SearchUtil to SearchUtils. |
| openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SearchMetadataTool.java | Updates static import to SearchUtils.mapEntityTypesToIndexNames. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/SearchResourceIT.java | Adds integration regression tests for dataAsset alias clause-explosion behavior, typo-tolerance guard, and exact-name ranking guard. |
| .gitignore | Broadens ignores for .claude/ content. |

pmbrull previously approved these changes Apr 22, 2026
…it/tests/SearchResourceIT.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 22, 2026 17:16
harshach added the To release label Apr 22, 2026
gitar-bot Bot commented Apr 22, 2026

Code Review ✅ Approved

Consolidates SearchUtils into a single class and removes ngram fuzzy matching while adding comprehensive test coverage. No issues found.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated no new comments.

github-actions (Contributor) commented

🔴 Playwright Results — 2 failure(s), 13 flaky

✅ 3696 passed · ❌ 2 failed · 🟡 13 flaky · ⏭️ 89 skipped

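| Shard | Passed | Failed | Flaky | Skipped |
| --- | --- | --- | --- | --- |
| 🔴 Shard 1 | 480 | 1 | 0 | 4 |
| 🟡 Shard 2 | 655 | 0 | 1 | 7 |
| 🟡 Shard 3 | 662 | 0 | 4 | 1 |
| 🟡 Shard 4 | 644 | 0 | 4 | 27 |
| 🔴 Shard 5 | 609 | 1 | 1 | 42 |
| 🟡 Shard 6 | 646 | 0 | 3 | 8 |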

Genuine Failures (failed on all attempts)

Pages/SearchSettings.spec.ts › Restore default search settings (shard 1)
Error: expect(received).toEqual(expected) // deep equality

- Expected  - 0
+ Received  + 5

@@ -45,10 +45,15 @@
        "boost": 20,
        "field": "displayName.keyword",
        "matchType": "exact",
      },
      Object {
+       "boost": 20,
+       "field": "name.keyword",
+       "matchType": "exact",
+     },
+     Object {
        "boost": 10,
        "field": "name",
        "matchType": "phrase",
      },
      Object {

Pages/Glossary.spec.ts › Add and Remove Assets (shard 5)
Test timeout of 180000ms exceeded.

🟡 13 flaky test(s) (passed on retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/LineagePipelineAnnotator.spec.ts › database service has pipeline service as downstream in service lineage (shard 3, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/AddRoleAndAssignToUser.spec.ts › Verify assigned role to new user (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContractInheritance.spec.ts › Remove Asset - Inherited contract no longer shown when asset is removed from Data Product (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for SearchIndex (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for File (shard 4, 1 retry)
  • Pages/ExplorePageRightPanel.spec.ts › Should verify deleted user not visible in owner selection for table (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Labels

  • backend
  • safe to test: Add this label to run secure Github workflows on PRs
  • To release: Will cherry-pick this PR into the release branch

3 participants