Improve the IVF-PQ Coarse Batch Size Workspace Estimation #1937
julianmi wants to merge 4 commits into rapidsai:release/26.04
Conversation
794088a to
9a14aca
achirkin left a comment
Thanks @julianmi for the PR!
I think the current fix most likely avoids the OOM for all reasonable use cases. However, it doesn't address the root problem: IVF-PQ search should never fail with OOM; it computes its workspace requirements internally and should scale the batch size accordingly.
Looking at the IVF-PQ search code, I think we need to make the coarse batch size estimate more precise. Something like this should work (untested):
```cpp
inline auto get_max_coarse_batch_size(raft::resources const& res,
                                      const search_params& params,
                                      uint32_t n_probes,
                                      uint32_t n_lists,
                                      uint32_t n_queries,
                                      uint32_t dim_ext,
                                      uint32_t rot_dim) -> uint32_t
{
  size_t gemm_elem_size;
  size_t qc_elem_size;
  switch (params.coarse_search_dtype) {
    case CUDA_R_32F: gemm_elem_size = 4; qc_elem_size = 4; break;
    case CUDA_R_16F: gemm_elem_size = 2; qc_elem_size = 2; break;
    case CUDA_R_8I: gemm_elem_size = 1; qc_elem_size = 4; break;
    default: RAFT_FAIL("Unexpected coarse_search_dtype (%d)", int(params.coarse_search_dtype));
  }
  // Persistent allocations that live for the entire search call.
  auto persistent_per_query = static_cast<size_t>(dim_ext) * gemm_elem_size +
                              static_cast<size_t>(rot_dim) * sizeof(float) +
                              static_cast<size_t>(n_probes) * sizeof(uint32_t);
  // Transient allocations during coarse search (select_clusters): qc_distances + cluster_dists.
  auto transient_per_query = static_cast<size_t>(n_lists + n_probes) * qc_elem_size;
  auto total_per_query     = persistent_per_query + transient_per_query;
  auto max_per_ws = raft::resource::get_workspace_free_bytes(res) / total_per_query;
  return std::max<uint32_t>(
    1,
    std::min<uint32_t>(max_per_ws,
                       std::min<uint32_t>(params.max_internal_batch_size, n_queries)));
}
```
Thank you, this is a much better fix. As discussed offline, we need to keep the |
The OpenAI 5M dataset runs OOM when using quantized data (`float` to `int8`) as shown in this example. The `kMinWorkspaceRatio` is too small for the `int8` data, such that the IVF-PQ search `max_internal_batch_size` is not reduced. Doubling `kMinWorkspaceRatio` fixes this. There is no performance degradation for the OpenAI 5M `float` dataset with this change. Let me know if I should test more datasets. CC @tfeher