
Add source-diverse AI rerank candidate pool#2429

Open
Mbeaulne wants to merge 1 commit into 06-18-parse_negative_constraints_without_not_no_exclude_ from 06-18-build_broader_ai_candidate_pools_for_component_search


Mbeaulne (Collaborator) commented Jun 18, 2026

Description

The AI rerank candidate pool now uses a source-diverse selection strategy instead of relying purely on lexical hits. Previously, the candidate pool was capped at 50 results drawn entirely from lexical matches, falling back to a plain alphabetical slice when no matches were found. This meant components from underrepresented sources (e.g. user uploads) could be crowded out entirely when a query produced many strong lexical hits from a single source.

The new approach builds the candidate pool in three layers, skipping any candidate already added by an earlier layer:

  1. Up to 60 of the strongest lexical hits for the query
  2. An evenly sampled set of up to 8 candidates per source (source-diverse browse)
  3. An evenly sampled alphabetical slice of the full index to fill any remaining slots, up to the new cap of 80
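A minimal sketch of how the three layers might be merged, assuming each layer has already been computed in priority order and that candidates are deduplicated by a digest string. The review mentions an appendUniqueMatches helper and a shared seenDigests set, but their real signatures are not shown here; CandidateMatch and buildCandidatePool are hypothetical names.

```typescript
// Hypothetical minimal match shape; the real type in componentSearchV2Logic.ts may differ.
interface CandidateMatch {
  digest: string; // assumed to uniquely identify a component
  source: string;
}

const AI_CANDIDATE_CAP = 80;

// Appends candidates whose digest has not been seen yet, stopping at the cap.
// Returns true once the pool is full so the caller can skip later layers.
function appendUniqueMatches(
  pool: CandidateMatch[],
  candidates: readonly CandidateMatch[],
  seenDigests: Set<string>,
  cap: number,
): boolean {
  for (const match of candidates) {
    if (pool.length >= cap) break;
    if (seenDigests.has(match.digest)) continue;
    seenDigests.add(match.digest);
    pool.push(match);
  }
  return pool.length >= cap;
}

// Hypothetical: merges layers in priority order (e.g. lexical hits, per-source
// browse sample, whole-index fill). Earlier layers win when a digest repeats.
function buildCandidatePool(
  layers: readonly (readonly CandidateMatch[])[],
  cap: number = AI_CANDIDATE_CAP,
): CandidateMatch[] {
  const pool: CandidateMatch[] = [];
  const seenDigests = new Set<string>(); // shared across all layers
  for (const layer of layers) {
    if (appendUniqueMatches(pool, layer, seenDigests, cap)) break; // early exit at cap
  }
  return pool;
}
```

Because the seen-set is shared, a component that entered as a lexical hit is skipped when it reappears in the browse or fill layers, and no further layers are scanned once the cap is reached.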

The rerank base (rerankBaseMatches) now always uses aiCandidateMatches, rather than switching between the lexical and AI candidate lists depending on whether any lexical results existed.

Related Issue and Pull requests

Type of Change

  • Bug fix
  • New feature
  • Improvement
  • Cleanup/Refactor
  • Breaking change
  • Documentation update

Checklist

  • I have tested this does not break current pipelines / runs functionality
  • I have tested the changes on staging

Screenshots (if applicable)

Test Instructions

  1. Open the component search panel in the editor.
  2. Enter a query that matches many components from a single source (e.g. a library with 100+ entries).
  3. Click the AI rerank button and verify that components from other sources (e.g. user-uploaded files) still appear in the ranked results.
  4. Verify that the total candidate count sent to the reranker does not exceed 80.
  5. Run the unit tests in componentSearchV2Logic.test.ts to confirm the new source-diversity test passes.

Additional Comments

The new sampleEvenly helper picks items at uniform intervals so that the browse sample is representative of the full sorted list rather than just the top entries.
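A sketch of an even-interval sampler, assuming a signature of sampleEvenly(items, count) over an already-sorted list; the PR's actual helper may round indices differently or handle edge cases its own way.

```typescript
// Sketch only: assumes sampleEvenly(items, count) returns up to `count` items
// taken at uniform intervals across an already-sorted list.
function sampleEvenly<T>(items: readonly T[], count: number): T[] {
  if (count <= 0) return [];
  if (items.length <= count) return items.slice(); // nothing to thin out
  const step = items.length / count; // > 1 here, so the picked indices are distinct
  const sample: T[] = [];
  for (let i = 0; i < count; i++) {
    sample.push(items[Math.floor(i * step)]);
  }
  return sample;
}
```

For example, sampling 4 of the 10 items a through j picks indices 0, 2, 5 and 7 (a, c, f, h), whereas taking the top entries would return a through d.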

github-actions bot commented Jun 18, 2026

🎩 Preview

A preview build has been created at: 06-18-build_broader_ai_candidate_pools_for_component_search/f23f231

Mbeaulne (Collaborator, Author) commented Jun 18, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

Mbeaulne changed the title from "Build broader AI candidate pools for component search" to "Add source-diverse AI rerank candidate pool" (Jun 18, 2026)
Mbeaulne marked this pull request as ready for review (June 18, 2026 17:56)
Mbeaulne requested a review from a team as a code owner (June 18, 2026 17:56)
Outdated comment thread on src/routes/v2/pages/Editor/components/componentSearchV2Logic.ts
Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from d363ca7 to 60b076d (June 18, 2026 19:12)
Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 97e37c0 to 4f20ff2 (June 18, 2026 19:12)
Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 60b076d to 455266e (June 18, 2026 20:28)
Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 4f20ff2 to 638c7b7 (June 18, 2026 20:28)
Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 455266e to 4a246ee (June 18, 2026 20:49)
Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 638c7b7 to 3f91762 (June 18, 2026 20:49)
Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 4a246ee to 8cc6222 (June 18, 2026 21:02)
Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 3f91762 to 790c426 (June 18, 2026 21:02)
Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 8cc6222 to 88f3546 (June 18, 2026 21:16)
Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 790c426 to 554c927 (June 18, 2026 21:16)
camielvs (Collaborator) commented

🤖 Code review — Add source-diverse AI rerank candidate pool

Reasonable evolution of the candidate pool: top lexical hits (60) → source-diverse browse sample (8/source) → evenly-sampled whole-index fallback, deduped and capped at 80. sampleEvenly is a tidy spread, the tiered appendUniqueMatches with shared seenDigests + early cap is clean, and switching rerankBaseMatches to always be aiCandidateMatches is the right move now that rerank scores the whole pool.

Findings:

  • The source-diverse browse sample is query-independent, which limits its stated purpose. The comment says it lets AI "rescue plausible matches that literal scoring missed," but buildSourceDiverseBrowseMatches ignores the query entirely — it's an even alphabetical sample per source. For any specific query, the chance that the one lexically-missed-but-relevant component lands in an 8-wide even sample is low (and lower as a library grows). Its concrete, reliable effect is "guarantee each source has some representation + fill the pool to 80." If recall-on-missed-matches is the actual goal, query-aware per-source sampling (e.g. each source's top-k lexical hits, even if below the global cutoff) would spend the token budget far more effectively. Worth clarifying which goal this serves.

  • Cost + post-rerank list length both grow. Confirmed via the hook: displayedMatches = rerankedMatches(rerankData, rerankBaseMatches) and rerankBaseMatches is now the 80-item pool, with all 80 mapped to candidates and sent with scoreAllCandidates: true (max_output_tokens ≈ count × 100 ≈ 8k). So every rerank now scores 80 candidates regardless of how focused the query is, and rerankedMatches keeps even the ≤-threshold browse fillers (pushed to the bottom, unbadged, but still rendered) — the displayed list can jump from ~50 to ~80 after a rerank click. Both are probably acceptable (user-initiated), but consider slicing the displayed post-rerank list back to a sane cap so the tail of near-zero browse samples doesn't pad the results.

  • Perf nit: bySource.set(key, [...(bySource.get(key) ?? []), entry]) rebuilds the array on every entry → O(n²) over the index. Push into a mutable array instead (fetch the group, create and set it the first time a source is seen, then group.push(entry)); this matters more here than in the existing buildResultFolders idiom since the full index can be large.

  • Test coverage is thin for the new tiering. One test covers the 80-cap + user-source inclusion. sampleEvenly and the 3-tier dedup/fill interplay (e.g. lexical returns < 60, browse + whole-index fill the rest without duplicates) aren't directly exercised. A targeted test there would lock the behavior.
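The grouping fix from the perf nit and the query-aware alternative from the first finding can be sketched together. This is illustrative only: ScoredEntry, its score field, groupBySource and topKPerSource are hypothetical names, and score stands in for whatever lexical score the search already computes.

```typescript
// Hypothetical entry shape: `score` stands in for the existing lexical score.
interface ScoredEntry {
  name: string;
  source: string;
  score: number;
}

// Groups entries by source in O(n): each entry is pushed into the existing
// group array instead of copying the group with a spread on every insert.
function groupBySource<T extends { source: string }>(entries: readonly T[]): Map<string, T[]> {
  const bySource = new Map<string, T[]>();
  for (const entry of entries) {
    const group = bySource.get(entry.source);
    if (group) {
      group.push(entry);
    } else {
      bySource.set(entry.source, [entry]);
    }
  }
  return bySource;
}

// Hypothetical query-aware variant of the browse layer (the reviewer's
// suggestion, not what this PR implements): each source contributes its own
// top-k lexical hits, even when they fall below the global cutoff.
function topKPerSource(entries: readonly ScoredEntry[], k: number): ScoredEntry[] {
  const result: ScoredEntry[] = [];
  groupBySource(entries).forEach((group) => {
    const best = group
      .filter((entry) => entry.score > 0) // keep only entries the query matched
      .sort((a, b) => b.score - a.score) // filter() copied the array, so sorting is safe
      .slice(0, k);
    result.push(...best);
  });
  return result;
}
```

Unlike an even alphabetical sample, this spends each source's slots on its best-scoring entries for the current query, at the cost of returning nothing for a source with no lexical matches at all.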

Outdated comment thread on src/routes/v2/pages/Editor/components/componentSearchV2Logic.ts
Outdated comment thread on src/routes/v2/pages/Editor/components/componentSearchV2Logic.ts
Outdated comment thread on src/routes/v2/pages/Editor/hooks/useComponentSearchV2State.ts
Outdated comment thread on src/routes/v2/pages/Editor/components/componentSearchV2Logic.test.ts
Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 88f3546 to 9942e6c (June 24, 2026 18:11)
Mbeaulne requested a review from camielvs (June 24, 2026 18:19)
Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 9942e6c to 66fafa8 (June 24, 2026 19:52)
Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch 2 times, most recently from 0b4da9b to 8365c6b (June 25, 2026 15:55)
Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 66fafa8 to 13998df (June 25, 2026 15:55)
Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from 8365c6b to c28d286 (June 25, 2026 19:38)
Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch 2 times, most recently from bf05e0b to 611c64f (June 25, 2026 19:43)
Mbeaulne force-pushed the 06-18-parse_negative_constraints_without_not_no_exclude_ branch from c28d286 to 3b7ef74 (June 25, 2026 19:43)
Outdated comment thread on src/routes/v2/pages/Editor/hooks/useComponentSearchV2State.ts
Mbeaulne force-pushed the 06-18-build_broader_ai_candidate_pools_for_component_search branch from 611c64f to f23f231 (June 26, 2026 13:51)
