Add source-diverse AI rerank candidate pool #2429
🎩 Preview
A preview build has been created at:
🤖 Code review: Add source-diverse AI rerank candidate pool
Reasonable evolution of the candidate pool: top lexical hits (60) → source-diverse browse sample (8/source) → evenly-sampled whole-index fallback, deduped and capped at 80. Findings:

Description
The AI rerank candidate pool now uses a source-diverse selection strategy instead of relying purely on lexical hits. Previously, the candidate pool was capped at 50 results drawn entirely from lexical matches, falling back to a plain alphabetical slice when no matches were found. This meant components from underrepresented sources (e.g. user uploads) could be crowded out entirely when a query produced many strong lexical hits from a single source.
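The new approach builds the candidate pool in three layers: the top lexical hits (up to 60), then a source-diverse browse sample (up to 8 components per source), then an evenly sampled fallback drawn from the whole index. The combined pool is deduplicated and capped at 80 candidates.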
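A minimal TypeScript sketch of the layered selection follows. It is illustrative only: the Candidate shape, the constant names, the buildCandidatePool and sampleEvenly signatures, the assumption that allComponents arrives already sorted, and the round-robin interleaving across sources are all hypothetical. Only the limits (60 lexical hits, 8 per source, cap of 80) come from the review summary on this PR.

```typescript
// Hypothetical candidate shape; the real type in componentSearchV2Logic is not shown in the PR.
interface Candidate {
  id: string;
  source: string; // e.g. "library" or "user-upload" (assumed values)
}

// Limits as summarised in the PR's code review; the constant names are hypothetical.
const LEXICAL_LIMIT = 60;
const PER_SOURCE_LIMIT = 8;
const POOL_CAP = 80;

// Picks up to `count` items at uniform intervals across `items` (hypothetical signature).
function sampleEvenly<T>(items: readonly T[], count: number): T[] {
  if (count <= 0) return [];
  if (count >= items.length) return [...items];
  const step = items.length / count;
  return Array.from({ length: count }, (_, i) => items[Math.floor(i * step)]);
}

function buildCandidatePool(
  lexicalHits: readonly Candidate[],
  allComponents: readonly Candidate[], // assumed to be pre-sorted (e.g. by name)
): Candidate[] {
  const pool: Candidate[] = [];
  const seen = new Set<string>();
  // Adds a candidate unless it is a duplicate or the pool is already full.
  const add = (c: Candidate): void => {
    if (pool.length >= POOL_CAP || seen.has(c.id)) return;
    seen.add(c.id);
    pool.push(c);
  };

  // Layer 1: the strongest lexical matches.
  lexicalHits.slice(0, LEXICAL_LIMIT).forEach(add);

  // Layer 2: an evenly spaced browse sample per source. Interleaving the samples
  // round-robin is an assumption, chosen so small sources are not cut off by the cap.
  const bySource = new Map<string, Candidate[]>();
  for (const c of allComponents) {
    const group = bySource.get(c.source);
    if (group) group.push(c);
    else bySource.set(c.source, [c]);
  }
  const samples = Array.from(bySource.values()).map((g) => sampleEvenly(g, PER_SOURCE_LIMIT));
  for (let round = 0; round < PER_SOURCE_LIMIT; round++) {
    for (const sample of samples) {
      if (round < sample.length) add(sample[round]);
    }
  }

  // Layer 3: an evenly sampled fallback over the whole index fills any remaining slots.
  sampleEvenly(allComponents, POOL_CAP).forEach(add);

  return pool;
}
```

Interleaving the per-source samples means every source gets one slot before any source gets a second, so a small source such as user uploads still appears even when the lexical layer has already consumed most of the cap.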
The rerank base is now always set to aiCandidateMatches rather than switching between lexical and AI candidate lists depending on whether lexical results existed.
Related Issue and Pull requests
Type of Change
Checklist
Screenshots (if applicable)
Test Instructions
Run componentSearchV2Logic.test.ts to confirm the new source-diversity test passes.
Additional Comments
The new sampleEvenly helper picks items at uniform intervals so that the browse sample is representative of the full sorted list rather than just the top entries.
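To make the sampling rule concrete, here is a minimal sketch of an even-interval sampler. The signature sampleEvenly<T>(items, count) is an assumption: the PR text describes only the behaviour, so the real helper may handle edge cases (for example a count larger than the list) differently.

```typescript
// Hypothetical signature: the PR only describes sampleEvenly's behaviour.
// Returns up to `count` items spaced at uniform intervals across `items`,
// always starting from the first item.
function sampleEvenly<T>(items: readonly T[], count: number): T[] {
  if (count <= 0) return [];
  // Not enough items to thin out: return a copy of the whole list.
  if (count >= items.length) return [...items];
  // step > 1 here, so floor(i * step) yields strictly increasing, in-bounds indices.
  const step = items.length / count;
  const result: T[] = [];
  for (let i = 0; i < count; i++) {
    result.push(items[Math.floor(i * step)]);
  }
  return result;
}

// Usage: four picks from a ten-item sorted list (indices 0, 2, 5, 7).
const picks = sampleEvenly(["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"], 4);
console.log(picks); // → a, c, f, h
```

Taking the first four items instead would give a, b, c, d, which is the top-heavy bias the helper is meant to avoid.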