Add fuzzy/typo-tolerant matching for component name and I/O fields #2427
🎩 Preview: A preview build has been created at:
🤖 Code review: Add fuzzy/typo-tolerant matching for name and I/O fields
Well-scoped feature. Restricting fuzzy matching to
Findings:

Description
Adds typo tolerance to lexical search for component names and input/output fields. Query tokens of 4–6 characters may match with a single-character edit; tokens of 7 or more characters may match with up to two edits. Fuzzy matches score slightly lower than exact matches via a dedicated `FUZZY_MATCH_BONUS_MULTIPLIER`. Typo tolerance is intentionally restricted to the `name` and `io` fields. Descriptions and implementation text are excluded to avoid noisy results.

The implementation uses the standard dynamic-programming Levenshtein distance algorithm with an early-exit optimisation: the computation is abandoned as soon as the minimum value in the current row exceeds the allowed distance.
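A minimal TypeScript sketch of the edit-distance thresholds and the early-exit Levenshtein described above. The helper names `maxEditDistance` and `boundedLevenshtein` are hypothetical and are not taken from this PR's code. The early exit is safe because, in the two-row DP formulation, a row's minimum never decreases from one row to the next. Once that minimum exceeds the allowed distance, the final distance must exceed it as well. The sketch also adds a cheap length-difference pre-check, which the PR does not mention.

```typescript
// Hypothetical helper: allowed edits per query token, per the PR description.
function maxEditDistance(token: string): number {
  if (token.length >= 7) return 2; // 7+ chars: up to two edits
  if (token.length >= 4) return 1; // 4-6 chars: one edit
  return 0; // shorter tokens: exact match only
}

// Hypothetical helper: returns the true Levenshtein distance when it is
// <= maxDist; otherwise returns some value greater than maxDist.
function boundedLevenshtein(a: string, b: string, maxDist: number): number {
  // The length difference alone is a lower bound on the distance.
  if (Math.abs(a.length - b.length) > maxDist) return maxDist + 1;

  // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    let rowMin = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
      if (curr[j] < rowMin) rowMin = curr[j];
    }
    // Row minimums never decrease, so the final cell can only be larger.
    if (rowMin > maxDist) return maxDist + 1;
    prev = curr;
  }
  return prev[b.length];
}
```

Keeping only two rows makes memory proportional to the shorter comparison side rather than to the full matrix. For the short tokens typical of component names, the early exit usually fires within a few rows when the strings are unrelated.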
Related Issue and Pull requests
Type of Change
Checklist
Screenshots (if applicable)
Test Instructions
`filtr` (for `filter_rows`) and `datset` (for `dataset`) return the correct component.
`xgbost`) return no results.
Additional Comments
Fuzzy matching is skipped entirely when the computed max edit distance is 0 (tokens shorter than 4 characters), keeping short-token searches fast and precise.
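The sketch below shows how the field restriction, the score multiplier and the short-token skip could fit together when scoring a single token. It is a minimal TypeScript sketch under stated assumptions. The `SearchField` union, the function `scoreTokenMatch` and the multiplier value `0.8` are hypothetical; the PR names `FUZZY_MATCH_BONUS_MULTIPLIER` but not its value. The distance function is passed in so the gating logic does not depend on any particular Levenshtein implementation.

```typescript
// Hypothetical field identifiers; the PR only says "name" and "io" are fuzzy-matched.
type SearchField = "name" | "io" | "description" | "implementation";

// Hypothetical value: the PR names this constant but does not state its value.
const FUZZY_MATCH_BONUS_MULTIPLIER = 0.8;

// Only these fields receive typo tolerance.
const FUZZY_FIELDS: ReadonlySet<SearchField> = new Set<SearchField>(["name", "io"]);

// Any bounded edit-distance function, e.g. a Levenshtein with early exit.
type DistanceFn = (a: string, b: string, maxDist: number) => number;

function scoreTokenMatch(
  queryToken: string,
  fieldToken: string,
  field: SearchField,
  baseScore: number,
  distance: DistanceFn,
): number {
  // Exact matches get the full score in every field.
  if (queryToken === fieldToken) return baseScore;
  // No typo tolerance for descriptions or implementation text.
  if (!FUZZY_FIELDS.has(field)) return 0;
  // 0 edits below 4 chars, 1 edit for 4-6 chars, 2 edits for 7+ chars.
  const len = queryToken.length;
  const maxDist = len >= 7 ? 2 : len >= 4 ? 1 : 0;
  // Short tokens skip the distance computation entirely.
  if (maxDist === 0) return 0;
  // Fuzzy hits score slightly below exact hits.
  return distance(queryToken, fieldToken, maxDist) <= maxDist
    ? baseScore * FUZZY_MATCH_BONUS_MULTIPLIER
    : 0;
}
```

Checking the field and the token length before calling `distance` means the cheapest rejections happen first. The edit-distance computation only runs for tokens that could actually produce a fuzzy hit.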