Improve component search scoring relevance by Mbeaulne · Pull Request #2426 · TangleML/tangle-ui

Mbeaulne · 2026-06-18T17:09:24Z

Description

Improves the lexical search scoring model with three enhancements:

Prefix match boost: For partial query terms (e.g. classif), components where the term is a prefix of a token now rank higher than components where it appears only as a mid-string substring.
IDF-style rare token weighting: Query tokens that match fewer components are weighted more heavily than common tokens, preventing high-frequency terms from dominating scores. For example, searching train xgboost will surface components mentioning xgboost above generic train matches.
All-query-tokens bonus: When a component matches every token in the query (across any fields), it receives an additional score bonus, ensuring more complete matches rank above partial ones.
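As a rough illustration of the prefix match boost, the sketch below classifies how a query term matches a field's tokens. The helper name matchKind and the multiplier values are hypothetical and chosen only for illustration; they are not taken from this PR.

```typescript
type MatchKind = "exact" | "prefix" | "substring" | "none";

// Hypothetical helper: returns the strongest kind of match between a query
// term and any token of a field (exact > prefix > substring > none).
function matchKind(term: string, fieldTokens: string[]): MatchKind {
  let best: MatchKind = "none";
  for (const token of fieldTokens) {
    if (token === term) return "exact";
    if (token.startsWith(term)) {
      best = "prefix";
    } else if (best === "none" && token.includes(term)) {
      best = "substring";
    }
  }
  return best;
}

// Illustrative multipliers only (assumed values, not the ones in the PR).
const MATCH_MULTIPLIER: Record<MatchKind, number> = {
  exact: 1.0,
  prefix: 0.8,
  substring: 0.4,
  none: 0,
};

function matchScore(term: string, fieldTokens: string[]): number {
  return MATCH_MULTIPLIER[matchKind(term, fieldTokens)];
}
```

With these assumed multipliers, classif scores higher against the tokens of classify_rows (a prefix match) than against a token such as misclassified (a mid-string match).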

The phrase match bonus previously applied only to the name field has been extended to all search fields using per-field bonus weights (FIELD_PHRASE_BONUS).

The tokenize function has been refactored to extract a reusable uniqueTokens helper. A new requiredQueryTokens function produces stemmed, deduplicated tokens from the raw query without synonym expansion; these tokens are used for the phrase and completeness checks.
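A minimal sketch of the completeness check behind the all-query-tokens bonus, assuming an entry is represented as a map from field name to normalized tokens. The EntryFields type and the matchesAllTokens name are hypothetical; the real entryMatchesToken logic may differ.

```typescript
// Hypothetical entry shape: field name -> normalized tokens for that field.
type EntryFields = Record<string, string[]>;

// True when every required (stemmed, synonym-free) query token is found in
// at least one field of the entry. Matching is by substring here, which is
// an assumption about how the real check behaves.
function matchesAllTokens(fields: EntryFields, requiredTokens: string[]): boolean {
  if (requiredTokens.length === 0) return false;
  const allTokens = Object.keys(fields).reduce<string[]>(
    (acc, fieldName) => acc.concat(fields[fieldName]),
    [],
  );
  return requiredTokens.every((required) =>
    allTokens.some((token) => token.includes(required)),
  );
}
```

An entry with train in its name and model in its description satisfies the query train model, while an entry that only mentions train does not, so only the first would receive the bonus.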

Related Issue and Pull requests

Type of Change

Checklist

I have tested this does not break current pipelines / runs functionality
I have tested the changes on staging

Test Instructions

Three new unit tests cover the added behaviors:

Search classif — verify classify_rows ranks above a component with classif as a non-prefix substring.
Search train xgboost — verify the component with the rare token xgboost ranks first.
Search train model — verify the component matching both tokens across fields ranks above one matching only train.

Run the test suite with:

npx jest componentSearchIndex

Additional Comments

Token weights are computed per-query using a smoothed inverse document frequency: 1 + log((N+1) / (df+1)), where N is the index size and df is the number of entries containing the token.
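The formula above can be sketched as follows, assuming log is the natural logarithm (Math.log) and that each index entry is reduced to a flat token list. The names rareTokenWeight and buildWeights are hypothetical, and exact-token membership stands in for whatever matching the real index uses.

```typescript
// Smoothed inverse document frequency: 1 + ln((N + 1) / (df + 1)).
function rareTokenWeight(indexSize: number, documentFrequency: number): number {
  return 1 + Math.log((indexSize + 1) / (documentFrequency + 1));
}

// Computes one weight per query token. Document frequency is counted with a
// plain counter rather than by building an intermediate filtered array.
function buildWeights(tokens: string[], index: string[][]): Map<string, number> {
  const weights = new Map<string, number>();
  for (const token of tokens) {
    let documentFrequency = 0;
    for (const entryTokens of index) {
      if (entryTokens.includes(token)) documentFrequency += 1;
    }
    weights.set(token, rareTokenWeight(index.length, documentFrequency));
  }
  return weights;
}
```

A token present in every entry gets weight exactly 1, since (N+1)/(N+1) = 1 and ln(1) = 0. Rarer tokens get progressively larger weights, which is why xgboost outweighs train in the example query.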

github-actions · 2026-06-18T17:09:40Z

🎩 Preview

A preview build has been created at: 06-18-improve_component_search_scoring_relevance/afd8b04

Mbeaulne · 2026-06-18T17:09:45Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

camielvs · 2026-06-19T22:07:01Z

🤖 Code review — Improve component search scoring relevance

This is the strongest PR in the stack so far. Four well-chosen relevance signals — word-boundary prefix bonus, IDF rare-token weighting, all-query-tokens bonus, and per-field phrase bonuses — and the tests are genuinely discriminating: each one is constructed so it fails if the specific signal is removed (e.g. the alphabetical tie-break deliberately favors the wrong candidate so only the bonus can flip it). The inline comments explaining that are exactly what a reviewer wants. The per-entry fieldTokensFor cache to avoid re-splitting field text inside the hot per-token loop is a nice touch.

Findings:

IDF down-weights common tokens but doesn't resolve the synonym/stem stacking from #2424 (Normalize component search tokens for better matching) and #2425 (Add synonym expansion to component lexical search). Scoring is still Σ over tokens of fieldWeight × tokenWeight, and tokens is the stem+synonym-expanded set. So a component containing several members of one synonym group (or both the inflected and stemmed form of a word) still accumulates a contribution per surface variant: IDF only scales each contribution, it doesn't collapse them to one concept. If you want one logical match to count once, dedupe by group/stem before the per-token loop. IDF genuinely helps (common expansions get ~1.0 weight), so this is lower-stakes than before, but the additive stacking is still there.
Math.max(0, inverseFrequency) is dead code. documentFrequency ≤ index.length always, so (N+1)/(df+1) ≥ 1 and log(...) ≥ 0. The clamp can never fire. Harmless, but either drop it or add a comment if it's guarding against a future change.
Per-keystroke cost is now ~2× a full index pass. buildRareTokenWeights scans the whole index once per query token, then scoreEntry scans again, and the all-tokens bonus re-runs entryMatchesToken per entry. lexicalSearch runs in the DashboardComponentsV2View render path on every query change (debouncing only lands in #2433, Debounce component search input), and synonym expansion multiplies the token count. Fine for hundreds of components; worth keeping in mind as libraries grow or if entryMatchesToken's substring scans get hotter. Minor: index.filter(...).length allocates an array just to count; a plain counter avoids it.
Inflected multi-word phrase bonus still won't fire (carryover from #2425, Add synonym expansion to component lexical search). The index normalization interleaves [inflected, stem] (training_testing → "... training train testing test"), so the stemmed requiredTokens join "train test" is never contiguous in the index. The common non-inflected case (train test split → train_test_split) works. Low priority.
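The interleaving problem in the last finding can be reproduced with a small sketch. The stemmer table, normalizeForIndex, and containsPhrase below are hypothetical stand-ins that only mimic the behavior described in the finding; they are not the functions in componentSearchIndex.ts.

```typescript
// Hypothetical stemmer covering just the words in this example.
const STEMS = new Map<string, string>([
  ["training", "train"],
  ["testing", "test"],
]);

function stem(word: string): string {
  return STEMS.get(word) ?? word;
}

// Mimics the described normalization: each word is emitted, followed by its
// stem when the stem differs from the word.
function normalizeForIndex(text: string): string {
  const out: string[] = [];
  for (const word of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (word === "") continue;
    out.push(word);
    const stemmed = stem(word);
    if (stemmed !== word) out.push(stemmed);
  }
  return out.join(" ");
}

// Contiguous phrase check; padding with spaces aligns the match to whole words.
function containsPhrase(normalized: string, requiredTokens: string[]): boolean {
  return ` ${normalized} `.includes(` ${requiredTokens.join(" ")} `);
}
```

Under these assumptions, training_testing normalizes to "training train testing test", where train and test are separated by testing, so the phrase "train test" is not found. The non-inflected train_test_split normalizes to "train test split" and matches as expected.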

Routing the phrase/all-token bonuses through the synonym-free, stemmed requiredTokens (rather than the expanded tokens) is the correct separation — good call.
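For the synonym/stem stacking in the first finding, one possible shape of "dedupe by group/stem" is to keep only the best contribution per concept instead of summing every surface variant. The ExpandedToken type and scoreByConcept function are hypothetical, and the sketch assumes each expanded token can be tagged with the concept (synonym group or stem) it came from.

```typescript
// Hypothetical expanded token: a surface form plus the concept it belongs to.
interface ExpandedToken {
  surface: string;
  concept: string;
}

// Sums the strongest contribution per concept, so several variants of one
// concept count once rather than stacking additively.
function scoreByConcept(
  tokens: ExpandedToken[],
  contribution: (surface: string) => number,
): number {
  const bestPerConcept = new Map<string, number>();
  for (const { surface, concept } of tokens) {
    const value = contribution(surface);
    const previous = bestPerConcept.get(concept) ?? 0;
    if (value > previous) bestPerConcept.set(concept, value);
  }
  let total = 0;
  bestPerConcept.forEach((value) => {
    total += value;
  });
  return total;
}
```

Taking the maximum per concept is only one choice; a capped sum or a diminishing-returns curve would also stop the stacking while still rewarding multiple variants slightly.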

This was referenced Jun 18, 2026

Expand component search indexing fields #2423

Open

Normalize component search tokens for better matching #2424

Open

Mbeaulne mentioned this pull request Jun 18, 2026

Add synonym expansion to component lexical search #2425

Open

8 tasks

Mbeaulne marked this pull request as ready for review June 18, 2026 17:10

Mbeaulne requested a review from a team as a code owner June 18, 2026 17:10

Mbeaulne commented Jun 18, 2026

Comment thread src/services/componentSearchIndex.ts Outdated

Mbeaulne commented Jun 18, 2026

Comment thread src/services/componentSearchIndex.ts Outdated

Mbeaulne commented Jun 18, 2026

Comment thread src/services/componentSearchIndex.test.ts Outdated

Mbeaulne commented Jun 18, 2026

Comment thread src/services/componentSearchIndex.test.ts Outdated

Mbeaulne force-pushed the 06-18-add_synonym_groups branch from 2655160 to dce82a1 Compare June 18, 2026 19:12

Mbeaulne force-pushed the 06-18-improve_component_search_scoring_relevance branch from bbd53a7 to 36032c1 Compare June 18, 2026 19:12

Mbeaulne force-pushed the 06-18-add_synonym_groups branch from dce82a1 to f5a29c0 Compare June 18, 2026 20:28

Mbeaulne force-pushed the 06-18-improve_component_search_scoring_relevance branch 2 times, most recently from d8e31f8 to d4d0a60 Compare June 18, 2026 20:49

This was referenced Jun 23, 2026

Decouple editor component search input #2452

Open

Add component search empty state suggestions #2453

Open

Limit dashboard component browse rendering #2454

Open

Add component search analytics baseline #2456

Open

camielvs reviewed Jun 23, 2026

Comment thread src/services/componentSearchIndex.ts

camielvs reviewed Jun 23, 2026

Comment thread src/services/componentSearchIndex.ts Outdated

camielvs reviewed Jun 23, 2026

Comment thread src/services/componentSearchIndex.ts Outdated

camielvs reviewed Jun 23, 2026

Comment thread src/services/componentSearchIndex.ts Outdated

This was referenced Jun 23, 2026

Add component discovery docs #2457

Open

Add context-aware component search suggestions #2458

Open

Simplify component search result cards #2459

Open

Mbeaulne force-pushed the 06-18-improve_component_search_scoring_relevance branch from d4d0a60 to d403991 Compare June 24, 2026 18:11

Mbeaulne force-pushed the 06-18-add_synonym_groups branch from f5a29c0 to 5adba4c Compare June 24, 2026 18:11

Mbeaulne requested a review from camielvs June 24, 2026 18:19

Mbeaulne force-pushed the 06-18-improve_component_search_scoring_relevance branch from d403991 to 5ff800e Compare June 24, 2026 19:52

Mbeaulne force-pushed the 06-18-add_synonym_groups branch from 5adba4c to 1b6c7b3 Compare June 24, 2026 19:52

Mbeaulne force-pushed the 06-18-improve_component_search_scoring_relevance branch 2 times, most recently from e9e9957 to 931f4ea Compare June 25, 2026 19:38

Mbeaulne force-pushed the 06-18-add_synonym_groups branch from 5e9dab4 to 1c2666a Compare June 25, 2026 19:38

maxy-shpfy reviewed Jun 26, 2026

Comment thread src/services/componentSearchIndex.ts

maxy-shpfy approved these changes Jun 26, 2026

Improve component search scoring relevance

afd8b04

Mbeaulne force-pushed the 06-18-improve_component_search_scoring_relevance branch from 931f4ea to afd8b04 Compare June 26, 2026 13:51

Mbeaulne force-pushed the 06-18-add_synonym_groups branch from 1c2666a to 8d10b47 Compare June 26, 2026 13:51

Mbeaulne mentioned this pull request Jun 26, 2026

Add AI model quick-select to app menu and injectable model config #2461

Open

8 tasks

Mbeaulne wants to merge 1 commit into 06-18-add_synonym_groups from 06-18-improve_component_search_scoring_relevance

3 participants
