Predict IVF-PQ FP16 overflow and auto-switch to FP32#2246
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new ChangesFP16 Overflow Detection and CAGRA Integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Line 39: Remove the debug printf statement at line 39 in the
ivf_pq_fp16_overflow.cuh file. This printf executes within a loop that iterates
over every dimension of every sampled row, causing severe performance
degradation and output flooding. Simply delete the entire printf line as it
serves no purpose in production code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 086831a1-8cdd-42f2-aaaf-8ac33b75c173
📒 Files selected for processing (2)
cpp/src/neighbors/detail/cagra/cagra_build.cuhcpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh (1)
70-88:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHIGH: Missing validation could cause kernel launch with zero grid blocks.
If
n_rows == 0(empty dataset) or ifsample_fractionis extremely small with a moderate dataset size (e.g.,n_rows=5000, sample_fraction=0.0001),n_samplebecomes 0, resulting ingrid_size = 0. Launching a kernel with zero blocks is invalid.Consider adding a minimum sample size guard:
Suggested fix
int64_t n_sample = static_cast<int64_t>(sample_fraction * static_cast<double>(n_rows)); if (n_rows <= 1000) { n_sample = n_rows; // for small datasets, just use them all and skip the sampling overhead } else if (n_rows > 100000) { // cap the sample size to 100k for speed and keep memory use within the limited workspace resource n_sample = 100000; } + + // Handle empty dataset or degenerate sample size + if (n_sample <= 0) { return 0.0f; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh` around lines 70 - 88, Add a minimum sample size guard to prevent grid_size from being zero when launching the kernel. After the existing size constraint checks for n_rows (the checks for n_rows <= 1000 and n_rows > 100000), add a condition to ensure n_sample is at least 1, since if n_sample is 0, the subsequent calculation of grid_size will be 0, resulting in an invalid kernel launch with zero blocks. This guards against both empty datasets (n_rows == 0) and extremely small sample fractions.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Around line 70-88: Add a minimum sample size guard to prevent grid_size from
being zero when launching the kernel. After the existing size constraint checks
for n_rows (the checks for n_rows <= 1000 and n_rows > 100000), add a condition
to ensure n_sample is at least 1, since if n_sample is 0, the subsequent
calculation of grid_size will be 0, resulting in an invalid kernel launch with
zero blocks. This guards against both empty datasets (n_rows == 0) and extremely
small sample fractions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a794a6f7-8a2b-4f29-ac12-49b5d25d10ab
📒 Files selected for processing (1)
cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh
mfoerste4
left a comment
There was a problem hiding this comment.
Thanks @huuanhhuyn for the PR. Please utilize raft for the norm computation if possible.
mfoerste4
left a comment
There was a problem hiding this comment.
LGTM, thanks @huuanhhuyn for the PR!
Sampling is not necessary anyway.
tfeher
left a comment
There was a problem hiding this comment.
Thanks Huy for the PR! It looks good, but I have one concern regarding the "defensive margin". Let's see what others think about this questien.
| constexpr float kFp16Max = 65504.0f; | ||
| constexpr float kFp16DefensiveMargin = 0.25f; | ||
| const float overflow_detect_threshold = kFp16DefensiveMargin * kFp16Max; |
There was a problem hiding this comment.
While the intention is good, I am not sure if we ned so large kFp16DefensiveMargin. E.g. someone could have normalized 4096 dim embeddings, where the current setting would trigger fallback to fp32 (in case of L2 norm). Admittedly, such large embeddings are rare, but I would prefer to only fall back to fp32 when the distance can overflow, and not when we have precision loss due to the limited range of the numeric type. @achirkin what do you think?
There was a problem hiding this comment.
Thank you for checking it.
I have some arguments for keeping it this way:
- the precision loss would break the ivf-pq search when neighbor distances become indifferentiable. This will fall again into the "low self-included ratio" error message.
- The magnitude looks large but the bitwise distances between 65k and 16k are just 2 bits.
someone could have normalized 4096 dim embeddings, where the current setting would trigger fallback to fp32
Do you have the dataset? I could run a test on it
|
/ok to test 7f41b35 |
Large-magnitude unnormalized datasets (e.g. SIFT_1M) contains vectors whose norm overflowed the FP16 internal search types of IVF-PQ.
This PR detects it by sampling a fraction from the dataset, compute its L2 squared norm and estimate the max squared distance from the samples. When the distance reaches FP16 overflow range, the internal types fall back to FP32.
Sampling: We uniformly sampled

n_samplesamples from then_rowsof the datasetMeasured runtime: 10-15ms. It doesn't grow with n_rows. It grows with dim (max at 15ms for Wiki dataset where dim=1536)