# fix: wrong merge logic for distributed IVF_PQ #5772
## PR Review: fix: wrong merge logic for distributed IVF_PQ

### Summary

This PR addresses incorrect merge logic in distributed IVF_PQ index building by changing how partial auxiliary shards are ordered during merging. The key change is moving from content-derived sorting (based on `min_fragment_id`, `min_row_id`) to lexicographic sorting by parent directory name, with an additional per-partition sort by `first_row_id` within each partition.

### P0/P1 Issues

**1. Excessive Debug Logging in Production Code (P1 - Performance)**

The PR introduces ~30+ unconditional info-level log statements. These logs will be emitted in production by default and may add log noise and overhead to the merge path.

Recommendation: either downgrade them to debug/trace level or remove them before merging. A hedged sketch of the log-level change follows this item.
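The sketch below is illustrative only: it assumes the new statements go through the standard `log` crate macros, and the message text and variable names (`shard_idx`, `pid`) are made up rather than taken from the PR.

```rust
// Before: emitted unconditionally at the default (info) level.
log::info!("merging auxiliary shard {} for partition {}", shard_idx, pid);

// After: only emitted when debug logging is enabled, so the merge path stays
// quiet in production while the detail remains available for troubleshooting.
log::debug!("merging auxiliary shard {} for partition {}", shard_idx, pid);
```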
**2. Duplicated Sorting Logic (P1 - Maintainability)**

The parent directory name extraction and sorting logic is duplicated in two places (~lines 486-498 and 1221-1233 in `index_merger.rs`):

```rust
// First occurrence
aux_paths.sort_by(|a, b| {
    let a_parts: Vec<_> = a.parts().collect();
    // ... same logic
});

// Second occurrence
shard_infos.sort_by(|(path_a, _lens_a), (path_b, _lens_b)| {
    let a_parts: Vec<_> = path_a.parts().collect();
    // ... same logic
});
```

Recommendation: Extract the comparison into a helper function to ensure consistency and reduce maintenance burden; see the sketch after this item.
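A minimal sketch of such a helper, assuming the paths are `object_store::path::Path` values (as the `.parts()` calls above suggest); the exact sort key and tie-breaking rules should be copied from the real closures in `index_merger.rs`:

```rust
use std::cmp::Ordering;

use object_store::path::Path;

/// Name of the parent directory (the component just before the file name),
/// if the path has one.
fn parent_dir_name(path: &Path) -> Option<String> {
    let parts: Vec<String> = path.parts().map(|p| p.as_ref().to_string()).collect();
    if parts.len() >= 2 {
        Some(parts[parts.len() - 2].clone())
    } else {
        None
    }
}

/// Shared comparator: order shard paths lexicographically by parent directory name.
fn cmp_by_parent_dir(a: &Path, b: &Path) -> Ordering {
    parent_dir_name(a).cmp(&parent_dir_name(b))
}
```

Both call sites could then delegate to the same comparator, e.g. `aux_paths.sort_by(|a, b| cmp_by_parent_dir(a, b))` and `shard_infos.sort_by(|(pa, _), (pb, _)| cmp_by_parent_dir(pa, pb))`.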
**3. No Tests for the Core Bug Fix (P1 - Test Coverage)**

The PR changes the fundamental ordering logic from content-derived keys to directory-name-based sorting, but there are no new tests verifying the new ordering behavior. The existing tests appear to have whitespace formatting changes only, with no new test cases for the actual fix; a sketch of one possible unit test follows this item.
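A sketch of the kind of test that could cover the ordering, reusing the hypothetical `parent_dir_name`/`cmp_by_parent_dir` helpers from the previous sketch; the directory layout shown is invented for illustration and should be replaced with whatever the distributed builder actually writes:

```rust
#[cfg(test)]
mod tests {
    use super::*;
    use object_store::path::Path;

    #[test]
    fn shards_are_ordered_by_parent_directory_name() {
        // Deliberately out-of-order input.
        let mut paths = vec![
            Path::from("index/shard_0002/auxiliary.idx"),
            Path::from("index/shard_0000/auxiliary.idx"),
            Path::from("index/shard_0001/auxiliary.idx"),
        ];

        paths.sort_by(|a, b| cmp_by_parent_dir(a, b));

        let dirs: Vec<String> = paths.iter().filter_map(parent_dir_name).collect();
        assert_eq!(dirs, vec!["shard_0000", "shard_0001", "shard_0002"]);
    }
}
```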
**4. Potential Re-opening of Files (P1 - Performance)**

In the new per-partition merge loop (lines 1281-1442), each shard's auxiliary file is re-opened for every partition:

```rust
for pid in 0..nlist {
    for (path, lens) in shard_infos.iter() {
        // Opens file again for each partition
        let fh = sched.open_file(aux, &CachedFileSize::unknown()).await?;
        let reader = V2Reader::try_open(...).await?;
    }
}
```
For large indices with many partitions (e.g., 1000+ partitions), this could result in thousands of file open operations per shard.

Recommendation: Consider caching readers across partitions or otherwise restructuring the loop to minimize file I/O; a sketch of one possible restructuring follows this item.
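A shape-only sketch of the caching approach. The `sched.open_file`, `CachedFileSize::unknown`, and `V2Reader::try_open` calls are copied from the snippet quoted above, their elided arguments are left as placeholders, and the surrounding variable names are assumptions rather than the PR's actual code:

```rust
// Open each shard's auxiliary file once, before the per-partition loop.
let mut readers = Vec::with_capacity(shard_infos.len());
for (path, lens) in shard_infos.iter() {
    // In the real code the opened path may be derived from `path`
    // (the quoted snippet uses a variable named `aux`).
    let fh = sched.open_file(path, &CachedFileSize::unknown()).await?;
    let reader = V2Reader::try_open(/* same arguments as the current code */).await?;
    readers.push((reader, lens));
}

// Reuse the cached readers for every partition instead of reopening files.
for pid in 0..nlist {
    for (reader, lens) in readers.iter() {
        // Read and merge partition `pid` from this shard, as the current loop does.
    }
}
```

This trades a small amount of memory (one open reader per shard) for removing the `nlist * shard_count` file-open pattern.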
### Minor Notes (not blocking)

### Questions for Author
**Codecov Report**: ❌ patch coverage is below 100% for this change; some modified lines are not covered by tests.
No description provided.