Add source-diverse AI rerank candidate pool by Mbeaulne · Pull Request #2429 · TangleML/tangle-ui

Mbeaulne · 2026-06-18T17:40:42Z

Description

The AI rerank candidate pool now uses a source-diverse selection strategy instead of relying purely on lexical hits. Previously, the candidate pool was capped at 50 results drawn entirely from lexical matches, falling back to a plain alphabetical slice when no matches were found. This meant components from underrepresented sources (e.g. user uploads) could be crowded out entirely when a query produced many strong lexical hits from a single source.

The new approach builds the candidate pool in three layers:

Up to 60 of the strongest lexical hits for the query
An evenly-sampled set of up to 8 candidates per source (source-diverse browse)
An evenly-sampled alphabetical slice of the full index to fill remaining slots up to the new cap of 80
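The three layers above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `Match` shape, the constant names, and `buildCandidatePool` are assumptions, and the bodies of `sampleEvenly` and `appendUniqueMatches` are guesses based only on how they are described in this thread.

```typescript
// Hypothetical match shape; the real type in componentSearchV2Logic.ts is not shown here.
interface Match {
  digest: string; // assumed unique id used for de-duplication
  source: string;
  name: string;
}

// Limits taken from the PR description; the constant names are made up.
const MAX_LEXICAL = 60;
const MAX_PER_SOURCE = 8;
const POOL_CAP = 80;

// Pick up to `count` items at uniform intervals across `items`.
function sampleEvenly<T>(items: T[], count: number): T[] {
  if (count <= 0) return [];
  if (items.length <= count) return [...items];
  const step = items.length / count;
  return Array.from({ length: count }, (_, i) => items[Math.floor(i * step)]);
}

// Append matches that have not been seen yet; stop once the pool reaches the cap.
function appendUniqueMatches(pool: Match[], seen: Set<string>, extra: Match[], cap: number): void {
  for (const match of extra) {
    if (pool.length >= cap) return;
    if (seen.has(match.digest)) continue;
    seen.add(match.digest);
    pool.push(match);
  }
}

function buildCandidatePool(lexicalHits: Match[], index: Match[]): Match[] {
  const pool: Match[] = [];
  const seen = new Set<string>();
  const sorted = [...index].sort((a, b) => a.name.localeCompare(b.name));

  // Layer 1: strongest lexical hits (assumed to arrive already ranked).
  appendUniqueMatches(pool, seen, lexicalHits.slice(0, MAX_LEXICAL), POOL_CAP);

  // Layer 2: an even sample per source, so small sources are still represented.
  const bySource = new Map<string, Match[]>();
  for (const match of sorted) {
    const group = bySource.get(match.source);
    if (group) group.push(match);
    else bySource.set(match.source, [match]);
  }
  for (const group of bySource.values()) {
    appendUniqueMatches(pool, seen, sampleEvenly(group, MAX_PER_SOURCE), POOL_CAP);
  }

  // Layer 3: an even sample of the whole sorted index fills whatever room is left.
  appendUniqueMatches(pool, seen, sampleEvenly(sorted, POOL_CAP), POOL_CAP);
  return pool;
}
```

Because all three layers share one `seen` set and one cap, the order of the calls is what gives lexical hits priority. In this sketch the last layer can leave the pool below 80 when many of its samples were already added, which may or may not match the real implementation.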

The rerank base is now always set to aiCandidateMatches rather than switching between lexical and AI candidate lists depending on whether lexical results existed.

Related Issue and Pull requests

Type of Change

Checklist

I have tested this does not break current pipelines / runs functionality
I have tested the changes on staging

Screenshots (if applicable)

Test Instructions

Open the component search panel in the editor.
Enter a query that matches many components from a single source (e.g. a library with 100+ entries).
Click the AI rerank button and verify that components from other sources (e.g. user-uploaded files) still appear in the ranked results.
Verify that the total candidate count sent to the reranker does not exceed 80.
Run the unit tests in componentSearchV2Logic.test.ts to confirm the new source-diversity test passes.

Additional Comments

The new sampleEvenly helper picks items at uniform intervals so that the browse sample is representative of the full sorted list rather than just the top entries.
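To make the difference from a plain top slice concrete, here is one way such a helper could look. The signature and the index arithmetic are assumptions; the PR's real `sampleEvenly` may choose its positions differently (for example, it might always include the last item).

```typescript
// Assumed implementation: take `count` items spaced uniformly across the list.
function sampleEvenly<T>(items: T[], count: number): T[] {
  if (count <= 0) return [];
  if (items.length <= count) return [...items];
  const step = items.length / count; // fractional stride across the list
  return Array.from({ length: count }, (_, i) => items[Math.floor(i * step)]);
}

const sortedNames = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"];

// A plain slice only ever sees the start of the alphabet.
const topSlice = sortedNames.slice(0, 4); // ["a", "b", "c", "d"]

// The even sample reaches across the whole sorted list.
const evenSample = sampleEvenly(sortedNames, 4); // ["a", "c", "f", "h"]

console.log(topSlice, evenSample);
```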

github-actions · 2026-06-18T17:40:53Z

🎩 Preview

A preview build has been created at: 06-18-build_broader_ai_candidate_pools_for_component_search/f23f231

Mbeaulne · 2026-06-18T17:41:02Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

camielvs · 2026-06-19T22:10:58Z

🤖 Code review — Add source-diverse AI rerank candidate pool

Reasonable evolution of the candidate pool: top lexical hits (60) → source-diverse browse sample (8/source) → evenly-sampled whole-index fallback, deduped and capped at 80. sampleEvenly is a tidy spread, the tiered appendUniqueMatches with shared seenDigests + early cap is clean, and switching rerankBaseMatches to always be aiCandidateMatches is the right move now that rerank scores the whole pool.

Findings:

The source-diverse browse sample is query-independent, which limits its stated purpose. The comment says it lets AI "rescue plausible matches that literal scoring missed," but buildSourceDiverseBrowseMatches ignores the query entirely — it's an even alphabetical sample per source. For any specific query, the chance that the one lexically-missed-but-relevant component lands in an 8-wide even sample is low (and lower as a library grows). Its concrete, reliable effect is "guarantee each source has some representation + fill the pool to 80." If recall-on-missed-matches is the actual goal, query-aware per-source sampling (e.g. each source's top-k lexical hits, even if below the global cutoff) would spend the token budget far more effectively. Worth clarifying which goal this serves.
Cost + post-rerank list length both grow. Confirmed via the hook: displayedMatches = rerankedMatches(rerankData, rerankBaseMatches) and rerankBaseMatches is now the 80-item pool, with all 80 mapped to candidates and sent with scoreAllCandidates: true (max_output_tokens ≈ count × 100 ≈ 8k). So every rerank now scores 80 candidates regardless of how focused the query is, and rerankedMatches keeps even the ≤-threshold browse fillers (pushed to the bottom, unbadged, but still rendered) — the displayed list can jump from ~50 to ~80 after a rerank click. Both are probably acceptable (user-initiated), but consider slicing the displayed post-rerank list back to a sane cap so the tail of near-zero browse samples doesn't pad the results.
Perf nit: bySource.set(key, [...(bySource.get(key) ?? []), entry]) rebuilds the array on every entry → O(n²) over the index. Push into a mutable array ((bySource.get(key) ?? init).push(entry)), which matters more here than in the existing buildResultFolders idiom since the full index can be large.
Test coverage is thin for the new tiering. One test covers the 80-cap + user-source inclusion. sampleEvenly and the 3-tier dedup/fill interplay (e.g. lexical returns < 60, browse + whole-index fill the rest without duplicates) aren't directly exercised. A targeted test there would lock the behavior.
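Two of the findings above, the mutable grouping and query-aware per-source sampling, can be sketched together. This is only an illustration under assumptions: `ScoredEntry`, `groupBySource`, and `topKPerSource` are hypothetical names, and the `score` field stands in for whatever lexical score the real index exposes.

```typescript
// Hypothetical entry shape with a lexical score attached.
interface ScoredEntry {
  digest: string;
  source: string;
  score: number;
}

// Group entries by source in O(n): push into the stored array instead of
// rebuilding it with a spread on every entry.
function groupBySource(entries: ScoredEntry[]): Map<string, ScoredEntry[]> {
  const bySource = new Map<string, ScoredEntry[]>();
  for (const entry of entries) {
    let group = bySource.get(entry.source);
    if (!group) {
      group = [];
      bySource.set(entry.source, group); // store the new array so later pushes land in the map
    }
    group.push(entry);
  }
  return bySource;
}

// Query-aware alternative to an alphabetical sample: keep each source's k
// best-scoring entries, even if they fall below the global lexical cutoff.
function topKPerSource(entries: ScoredEntry[], k: number): ScoredEntry[] {
  const picked: ScoredEntry[] = [];
  for (const group of groupBySource(entries).values()) {
    const best = [...group].sort((a, b) => b.score - a.score).slice(0, k);
    picked.push(...best);
  }
  return picked;
}
```

The get-or-create step is written out in full because a bare `(bySource.get(key) ?? []).push(entry)` would push into a fresh array that never gets stored in the map.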

Mbeaulne mentioned this pull request Jun 18, 2026

Add negative constraint parsing to lexical search #2428

Open

8 tasks

Mbeaulne changed the title ~~Build broader AI candidate pools for component search~~ Add source-diverse AI rerank candidate pool Jun 18, 2026

Mbeaulne mentioned this pull request Jun 18, 2026

Add deep AI search to rerank all components in selected sources #2430

Open

8 tasks

Mbeaulne marked this pull request as ready for review June 18, 2026 17:56

Mbeaulne requested a review from a team as a code owner June 18, 2026 17:56

This was referenced Jun 18, 2026

Add search quality expectation tests for lexical search #2431

Open

add client-side embeddings cached in IndexedDB #2432

Open

Debounce component search input #2433

Open

Mbeaulne commented Jun 18, 2026

Comment thread src/routes/v2/pages/Editor/components/componentSearchV2Logic.ts Outdated

Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from d363ca7 to 60b076d Compare June 18, 2026 19:12

Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 97e37c0 to 4f20ff2 Compare June 18, 2026 19:12

Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 60b076d to 455266e Compare June 18, 2026 20:28

Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 4f20ff2 to 638c7b7 Compare June 18, 2026 20:28

Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 455266e to 4a246ee Compare June 18, 2026 20:49

Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 638c7b7 to 3f91762 Compare June 18, 2026 20:49

Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 4a246ee to 8cc6222 Compare June 18, 2026 21:02

Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 3f91762 to 790c426 Compare June 18, 2026 21:02

Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 8cc6222 to 88f3546 Compare June 18, 2026 21:16

Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 790c426 to 554c927 Compare June 18, 2026 21:16

This was referenced Jun 22, 2026

Disable AI search when literal search finds no matches #2444

Open

Add component search URL state #2447

Open

Add component lifecycle badges #2448

Open

camielvs reviewed Jun 23, 2026

Comment thread src/routes/v2/pages/Editor/components/componentSearchV2Logic.ts Outdated

camielvs reviewed Jun 23, 2026

Comment thread src/routes/v2/pages/Editor/components/componentSearchV2Logic.ts Outdated

camielvs reviewed Jun 23, 2026

Comment thread src/routes/v2/pages/Editor/hooks/useComponentSearchV2State.ts Outdated

camielvs reviewed Jun 23, 2026

Comment thread src/routes/v2/pages/Editor/components/componentSearchV2Logic.test.ts Outdated

This was referenced Jun 23, 2026

Add component discovery docs #2457

Open

Add context-aware component search suggestions #2458

Open

Simplify component search result cards #2459

Open

Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 88f3546 to 9942e6c Compare June 24, 2026 18:11

Mbeaulne requested a review from camielvs June 24, 2026 18:19

Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 9942e6c to 66fafa8 Compare June 24, 2026 19:52

Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch 2 times, most recently from 0b4da9b to 8365c6b Compare June 25, 2026 15:55

Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 66fafa8 to 13998df Compare June 25, 2026 15:55

Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 8365c6b to c28d286 Compare June 25, 2026 19:38

Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch 2 times, most recently from bf05e0b to 611c64f Compare June 25, 2026 19:43

Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from c28d286 to 3b7ef74 Compare June 25, 2026 19:43

maxy-shpfy reviewed Jun 26, 2026

Comment thread src/routes/v2/pages/Editor/hooks/useComponentSearchV2State.ts Outdated

Build broader AI candidate pools for component search

f23f231

Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 611c64f to f23f231 Compare June 26, 2026 13:51

Mbeaulne mentioned this pull request Jun 26, 2026

Add AI model quick-select to app menu and injectable model config #2461

Open

8 tasks

Mbeaulne wants to merge 1 commit into 06-18-parse_negative_constraints_without_not_no_exclude_ from 06-18-build_broader_ai_candidate_pools_for_component_search

3 participants