feat(mem_wal): add overfetch_factor and refine_base_table to LSM vector search #6908

Open
jackye1995 wants to merge 1 commit into lance-format:main from jackye1995:jack/lsm-staleness-overfetch

@jackye1995
Contributor

Summary

  • Add configurable overfetch_factor to plan_search() — each LSM source fetches ceil(k * factor) candidates instead of k, mitigating stale reads when a PK's fresh version falls out of its source's top-k and the global dedup can't suppress the stale copy.
  • Simplify refine_factor: Option<u32> to refine_base_table: bool — refine is always factor=1 and only applies to the base table. When overfetch is active, base-table refine is auto-enabled so approximate index distances are re-ranked to exact before the cross-source merge.
  • Exposed in Python and Java bindings with backward-compatible overloads.
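The staleness scenario in the first bullet can be illustrated with a small, self-contained sketch. It is not the code from this PR: `Candidate`, `per_source_limit`, `top_n`, `merge_dedup`, and `example_sources` are hypothetical names, the `generation` field is an assumed stand-in for "which source is fresher", and the factor is assumed to be a floating-point value. Each source returns its nearest `ceil(k * factor)` rows, and the merge keeps only the freshest version of each PK before taking the global top-k.

```rust
use std::collections::HashMap;

/// Hypothetical candidate row. A higher `generation` means a fresher source.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    pk: u64,
    generation: u32,
    distance: f32,
}

/// Per-source fetch size: ceil(k * factor), as described in the summary.
fn per_source_limit(k: usize, overfetch_factor: f64) -> usize {
    ((k as f64) * overfetch_factor).ceil() as usize
}

/// Nearest `limit` candidates from a single source.
fn top_n(source: &[Candidate], limit: usize) -> Vec<Candidate> {
    let mut rows = source.to_vec();
    rows.sort_by(|a, b| a.distance.total_cmp(&b.distance));
    rows.truncate(limit);
    rows
}

/// Merge per-source results: keep the freshest version of each PK,
/// then return the global top-k by distance.
fn merge_dedup(sources: &[Vec<Candidate>], k: usize, overfetch_factor: f64) -> Vec<Candidate> {
    let limit = per_source_limit(k, overfetch_factor);
    let mut freshest: HashMap<u64, Candidate> = HashMap::new();
    for source in sources {
        for cand in top_n(source, limit) {
            // Replace the stored row only if this one comes from a fresher source.
            let is_fresher = freshest
                .get(&cand.pk)
                .map_or(true, |seen| cand.generation > seen.generation);
            if is_fresher {
                freshest.insert(cand.pk, cand);
            }
        }
    }
    let mut merged: Vec<Candidate> = freshest.into_values().collect();
    merged.sort_by(|a, b| a.distance.total_cmp(&b.distance));
    merged.truncate(k);
    merged
}

/// Illustrative data: PK 7 was updated. Its stale copy (generation 0) sits
/// close to the query in the base table, while its fresh copy (generation 1)
/// is far from the query in the newer source.
fn example_sources() -> Vec<Vec<Candidate>> {
    let base = vec![
        Candidate { pk: 7, generation: 0, distance: 0.10 },
        Candidate { pk: 9, generation: 0, distance: 0.50 },
    ];
    let newer = vec![
        Candidate { pk: 8, generation: 1, distance: 0.20 },
        Candidate { pk: 7, generation: 1, distance: 0.90 },
    ];
    vec![base, newer]
}
```

With `k = 1` and a factor of 1.0, the newer source returns only PK 8, so the merge never sees the fresh copy of PK 7 and the stale base-table row wins. With a factor of 2.0, both copies of PK 7 reach the merge, the stale one is suppressed, and PK 8 is returned.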

…or search

Add configurable overfetch_factor to plan_search() that makes each LSM
source fetch ceil(k * factor) candidates instead of k. The larger
candidate pool increases the chance that the global PK dedup sees both
the fresh and stale copies of an updated PK, mitigating stale reads
when the fresh version falls out of its source's top-k.

Also simplify refine_factor: Option<u32> to refine_base_table: bool,
since refine is always factor=1 and only applies to the base table.
When overfetch is active, base-table refine is auto-enabled so
approximate index distances are re-ranked to exact before the
cross-source merge.
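A rough sketch of what "refine with factor=1" could look like follows. It is an illustration under stated assumptions, not this PR's implementation: `IndexHit`, `effective_refine`, `l2_squared`, and `refine_exact` are hypothetical names, overfetch is assumed to count as "active" when the factor exceeds 1.0, and squared L2 is assumed as the exact metric. The candidate set stays the same size; only the distances are recomputed from the stored vectors and the rows re-sorted.

```rust
use std::collections::HashMap;

/// Hypothetical hit from the base table's approximate index.
#[derive(Debug, Clone)]
struct IndexHit {
    row_id: u64,
    approx_distance: f32,
}

/// Assumption: overfetch is "active" when the factor is greater than 1.0,
/// in which case base-table refine is switched on regardless of the flag.
fn effective_refine(refine_base_table: bool, overfetch_factor: f64) -> bool {
    refine_base_table || overfetch_factor > 1.0
}

/// Exact squared L2 distance between two vectors of equal length.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Refine with factor = 1: no extra candidates are fetched. Each hit's
/// distance is recomputed exactly and the hits are re-sorted, so the
/// base table's distances are comparable with those of other sources.
fn refine_exact(
    hits: &[IndexHit],
    query: &[f32],
    vectors: &HashMap<u64, Vec<f32>>,
) -> Vec<(u64, f32)> {
    let mut refined: Vec<(u64, f32)> = hits
        .iter()
        // Hits whose vector is missing from the lookup are skipped in this sketch.
        .filter_map(|hit| {
            vectors
                .get(&hit.row_id)
                .map(|v| (hit.row_id, l2_squared(query, v)))
        })
        .collect();
    refined.sort_by(|a, b| a.1.total_cmp(&b.1));
    refined
}
```

The point of re-ranking before the merge is that approximate index distances can order rows differently from exact distances, which would otherwise skew the cross-source top-k.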

Exposed in Python and Java bindings with backward-compatible overloads.

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added the enhancement, python, and java labels on May 22, 2026
@codecov

codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 90.90909% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...lance/src/dataset/mem_wal/scanner/vector_search.rs | 90.90% | 1 Missing and 1 partial ⚠️ |
