
Improve the IVF-PQ Coarse Batch Size Workspace Estimation #1937

Open

julianmi wants to merge 4 commits into rapidsai:release/26.04 from julianmi:increase_min_ws_ratio

Conversation

@julianmi
Contributor

The OpenAI 5M dataset runs out of memory (OOM) when using quantized data (float to int8), as shown in this example. The kMinWorkspaceRatio is too small for the int8 data, so the IVF-PQ search max_internal_batch_size is not reduced. Doubling kMinWorkspaceRatio fixes this. There is no performance degradation for the OpenAI 5M float dataset with this change. Let me know if I should test more datasets.

CC @tfeher

@julianmi julianmi requested a review from a team as a code owner March 20, 2026 17:34
@aamijar aamijar added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Mar 20, 2026
@aamijar aamijar moved this to In Progress in Unstructured Data Processing Mar 20, 2026
@aamijar
Member

Thanks @julianmi! Is this targeted for release/26.04?

@tfeher tfeher requested a review from achirkin March 24, 2026 10:21
@julianmi julianmi force-pushed the increase_min_ws_ratio branch from 794088a to 9a14aca March 24, 2026 10:30
@julianmi julianmi requested review from a team as code owners March 24, 2026 10:30
@julianmi julianmi requested a review from bdice March 24, 2026 10:30
@julianmi julianmi changed the base branch from main to release/26.04 March 24, 2026 10:31
@achirkin achirkin removed request for a team March 24, 2026 10:47
@achirkin
Contributor

Thanks @julianmi for the PR!
I think the current fix most likely helps to avoid the OOM for all reasonable use cases. However, it doesn't address the underlying problem: IVF-PQ search shouldn't ever fail with OOM, since it does the workspace calculation internally and should scale correctly on its own.
Looking at the IVF-PQ search code, I think we need to make the coarse batch size estimate more precise. Something like this should work (untested):

inline auto get_max_coarse_batch_size(raft::resources const& res,
                                      const search_params& params,
                                      uint32_t n_probes,
                                      uint32_t n_lists,
                                      uint32_t n_queries,
                                      uint32_t dim_ext,
                                      uint32_t rot_dim) -> uint32_t
{
  size_t gemm_elem_size;
  size_t qc_elem_size;
  switch (params.coarse_search_dtype) {
    case CUDA_R_32F: gemm_elem_size = 4; qc_elem_size = 4; break;
    case CUDA_R_16F: gemm_elem_size = 2; qc_elem_size = 2; break;
    case CUDA_R_8I: gemm_elem_size = 1; qc_elem_size = 4; break;
    default: RAFT_FAIL("Unexpected coarse_search_dtype (%d)", int(params.coarse_search_dtype));
  }
  // Persistent allocations that live for the entire search call.
  auto persistent_per_query = static_cast<size_t>(dim_ext) * gemm_elem_size
                            + static_cast<size_t>(rot_dim) * sizeof(float)
                            + static_cast<size_t>(n_probes) * sizeof(uint32_t);
  // Transient allocations during coarse search (select_clusters): qc_distances + cluster_dists.
  auto transient_per_query =
    static_cast<size_t>(n_lists + n_probes) * qc_elem_size;
  auto total_per_query = persistent_per_query + transient_per_query;
  auto max_per_ws = raft::resource::get_workspace_free_bytes(res) / total_per_query;
  // Clamp the size_t byte-derived bound into uint32_t range before comparing,
  // to avoid a narrowing wrap-around for very large workspaces (needs <limits>).
  auto max_per_ws_u32 = static_cast<uint32_t>(
    std::min<size_t>(max_per_ws, std::numeric_limits<uint32_t>::max()));
  return std::max<uint32_t>(
    1,
    std::min<uint32_t>(max_per_ws_u32,
                       std::min<uint32_t>(params.max_internal_batch_size, n_queries)));
}

@julianmi
Contributor Author


Thank you, this is a much better fix. As discussed offline, we need to keep the max_per_ws / 2 factor to leave space for the inner-loop batching.

@julianmi julianmi changed the title Increase kMinWorkspaceRatio in build_knn_graph() Improve the IVF-PQ Coarse Batch Size Workspace Estimation Mar 25, 2026
@achirkin
Contributor

Thanks, looks good!

