Predict IVF-PQ FP16 overflow and auto-switch to FP32 by huuanhhuyn · Pull Request #2246 · NVIDIA/cuvs

huuanhhuyn · 2026-06-16T07:22:29Z

Large-magnitude unnormalized datasets (e.g. SIFT_1M) contains vectors whose norm overflowed the FP16 internal search types of IVF-PQ.

This PR detects it by sampling a fraction from the dataset, compute its L2 squared norm and estimate the max squared distance from the samples. When the distance reaches FP16 overflow range, the internal types fall back to FP32.

Sampling: We uniformly sampled n_sample samples from the n_rows of the dataset

Measured runtime: 10-15ms. It doesn't grow with n_rows. It grows with dim (max at 15ms for Wiki dataset where dim=1536)

copy-pr-bot · 2026-06-16T07:22:32Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-16T07:26:23Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a new ivf_pq_fp16_overflow.cuh header that samples dataset rows to estimate the maximum squared norm via map-reduce, derives a distance upper bound, and compares it against a defensive FP16 threshold. The CAGRA build(...) function includes this header and calls estimate_fp16_overflow to auto-upgrade distance dtypes to FP32 when overflow is detected.

Changes

FP16 Overflow Detection and CAGRA Integration

Layer / File(s)	Summary
FP16 overflow detection header: sampler and threshold logic `cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`	Adds includes for RAFT CUDA utilities, MDSpan, row-sampling, RNG, and distance utilities. Implements `detail::estimate_max_squared_norm` that samples a saturated fraction of dataset rows into device memory, computes mapped squared norms per row, reduces to the global maximum via map-reduce, synchronizes, and returns the scalar. Implements `helpers::estimate_fp16_overflow` that early-exits for empty datasets and `CosineExpanded` metrics, defines an FP16 max-based overflow threshold with defensive margin, derives a distance upper bound (scaled by `4.0` for `L2Expanded`), and returns whether the bound exceeds the threshold.
CAGRA build: include and FP16 guard wiring `cpp/src/neighbors/detail/cagra/cagra_build.cuh`	Adds the include for `ivf_pq_fp16_overflow.cuh` at the top of the file. Inserts a pre-build overflow-risk check in `build(...)` that, when `ivf_pq_params` is detected and either `internal_distance_dtype` or `coarse_search_dtype` is FP16, calls `estimate_fp16_overflow` and, on detected overflow risk, logs a warning and overrides both dtypes to `CUDA_R_32F` while leaving `lut_dtype` unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Predict IVF-PQ FP16 overflow and auto-switch to FP32' accurately and concisely describes the main changes: FP16 overflow prediction and automatic precision switching in IVF-PQ operations.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The pull request description clearly explains the problem (FP16 overflow in IVF-PQ for large datasets), the solution (sampling-based overflow detection), and implementation details (sampling strategy and fallback to FP32).

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Line 39: Remove the debug printf statement at line 39 in the
ivf_pq_fp16_overflow.cuh file. This printf executes within a loop that iterates
over every dimension of every sampled row, causing severe performance
degradation and output flooding. Simply delete the entire printf line as it
serves no purpose in production code.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 086831a1-8cdd-42f2-aaaf-8ac33b75c173

📥 Commits

Reviewing files that changed from the base of the PR and between 6672103 and faa1609.

📒 Files selected for processing (2)

cpp/src/neighbors/detail/cagra/cagra_build.cuh
cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh (1)
70-88: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

HIGH: Missing validation could cause kernel launch with zero grid blocks.

If n_rows == 0 (empty dataset) or if sample_fraction is extremely small with a moderate dataset size (e.g., n_rows=5000, sample_fraction=0.0001), n_sample becomes 0, resulting in grid_size = 0. Launching a kernel with zero blocks is invalid.

Consider adding a minimum sample size guard:
Suggested fix
   int64_t n_sample = static_cast<int64_t>(sample_fraction * static_cast<double>(n_rows));
   if (n_rows <= 1000) {
     n_sample = n_rows;  // for small datasets, just use them all and skip the sampling overhead
   } else if (n_rows > 100000) {
     // cap the sample size to 100k for speed and keep memory use within the limited workspace resource
     n_sample = 100000;
   }
+  
+  // Handle empty dataset or degenerate sample size
+  if (n_sample <= 0) { return 0.0f; }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh` around lines 70 - 88, Add
a minimum sample size guard to prevent grid_size from being zero when launching
the kernel. After the existing size constraint checks for n_rows (the checks for
n_rows <= 1000 and n_rows > 100000), add a condition to ensure n_sample is at
least 1, since if n_sample is 0, the subsequent calculation of grid_size will be
0, resulting in an invalid kernel launch with zero blocks. This guards against
both empty datasets (n_rows == 0) and extremely small sample fractions.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh`:
- Around line 70-88: Add a minimum sample size guard to prevent grid_size from
being zero when launching the kernel. After the existing size constraint checks
for n_rows (the checks for n_rows <= 1000 and n_rows > 100000), add a condition
to ensure n_sample is at least 1, since if n_sample is 0, the subsequent
calculation of grid_size will be 0, resulting in an invalid kernel launch with
zero blocks. This guards against both empty datasets (n_rows == 0) and extremely
small sample fractions.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a794a6f7-8a2b-4f29-ac12-49b5d25d10ab

📥 Commits

Reviewing files that changed from the base of the PR and between faa1609 and 3b5130f.

📒 Files selected for processing (1)

cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh

mfoerste4

Thanks @huuanhhuyn for the PR. Please utilize raft for the norm computation if possible.

… SIFT 1M)

mfoerste4

LGTM, thanks @huuanhhuyn for the PR!

Sampling is not necessary anyway.

tfeher

Thanks Huy for the PR! It looks good, but I have one concern regarding the "defensive margin". Let's see what others think about this questien.

tfeher · 2026-06-26T13:31:04Z

+  constexpr float kFp16Max              = 65504.0f;
+  constexpr float kFp16DefensiveMargin  = 0.25f;
+  const float overflow_detect_threshold = kFp16DefensiveMargin * kFp16Max;


While the intention is good, I am not sure if we ned so large kFp16DefensiveMargin. E.g. someone could have normalized 4096 dim embeddings, where the current setting would trigger fallback to fp32 (in case of L2 norm). Admittedly, such large embeddings are rare, but I would prefer to only fall back to fp32 when the distance can overflow, and not when we have precision loss due to the limited range of the numeric type. @achirkin what do you think?

Thank you for checking it.

I have some arguments for keeping it this way:

the precision loss would break the ivf-pq search when neighbor distances become indifferentiable. This will fall again into the "low self-included ratio" error message.

The magnitude looks large but the bitwise distances between 65k and 16k are just 2 bits.

someone could have normalized 4096 dim embeddings, where the current setting would trigger fallback to fp32

Do you have the dataset? I could run a test on it

tfeher · 2026-06-26T15:41:52Z

/ok to test 7f41b35

IVF-PQ FP16 overflow detection

faa1609

huuanhhuyn requested a review from a team as a code owner June 16, 2026 07:22

github-project-automation Bot added this to Unstructured Data Processing Jun 16, 2026

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated

Explain the use of mapping

3b5130f

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

mfoerste4 suggested changes Jun 16, 2026

View reviewed changes

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated

huuanhhuyn commented Jun 17, 2026

View reviewed changes

Comment thread cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated

huuanhhuyn added 3 commits June 22, 2026 08:50

Move params fallback to params init phase

3a4e491

Choose sampling equation

f4031a3

Use raft built-in kernels and remove manual one.

f330a7f

huuanhhuyn requested a review from mfoerste4 June 22, 2026 10:26

huuanhhuyn added 3 commits June 22, 2026 10:55

Add kDelay param to adjust the sampling fraction growth speed

22878f1

Reduce kSaturation to 20k (just sufficient to detect FP16 overflow in…

a2d9369

… SIFT 1M)

Explain uniform sampling decision

3415092

mfoerste4 approved these changes Jun 22, 2026

View reviewed changes

Comment thread cpp/src/neighbors/ivf_pq/ivf_pq_fp16_overflow.cuh Outdated

huuanhhuyn added 2 commits June 26, 2026 12:55

Avoid expensive random sampling over the full dataset.

b117006

Sampling is not necessary anyway.

Edit comments

7f41b35

tfeher requested changes Jun 26, 2026

View reviewed changes

tfeher added bug Something isn't working non-breaking Introduces a non-breaking change labels Jun 26, 2026

Uh oh!

Conversation

huuanhhuyn commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

copy-pr-bot Bot commented Jun 16, 2026

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

mfoerste4 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mfoerste4 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tfeher left a comment

Choose a reason for hiding this comment

Uh oh!

tfeher Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

huuanhhuyn Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tfeher commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

huuanhhuyn commented Jun 16, 2026 •

edited

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

huuanhhuyn Jun 26, 2026 •

edited

Loading