
[FEA] Add Batching to KMeans#1886

Open
tarang-jain wants to merge 153 commits into rapidsai:release/26.04 from tarang-jain:batched-kmeans

Conversation

@tarang-jain
Contributor

@tarang-jain tarang-jain commented Mar 6, 2026

Merge after #1880

This PR adds support for streaming, out-of-core (dataset on host) k-means clustering. The idea is simple:

Batched accumulation of centroid updates: data is processed in batches, and batch-wise sums and cluster counts are accumulated until all batches, i.e., the full pass over the dataset, have been processed.
This PR simply introduces a batch-size parameter: batches of the dataset are loaded, and cluster assignments and (weighted) centroid contributions are computed per batch. The final centroid update, i.e. a single k-means iteration, only completes once these accumulated sums are averaged at the end of the full dataset pass.
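The accumulation scheme described above can be sketched in plain NumPy (a hypothetical illustration, not the cuvs implementation; `batched_kmeans_iteration` and its signature are made up for this sketch): one k-means iteration streams the dataset in batches, accumulates per-cluster sums and counts, and averages only after the full pass has completed.

```python
import numpy as np

def batched_kmeans_iteration(X, centroids, batch_size):
    """One Lloyd iteration computed by streaming X in batches of batch_size."""
    n_clusters, n_features = centroids.shape
    sums = np.zeros((n_clusters, n_features))
    counts = np.zeros(n_clusters)
    inertia = 0.0
    for start in range(0, len(X), batch_size):
        batch = X[start:start + batch_size]           # "load" one host batch
        # assign each sample in the batch to its nearest centroid
        d2 = ((batch[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        inertia += d2[np.arange(len(batch)), labels].sum()
        np.add.at(sums, labels, batch)                # accumulate batch-wise sums
        np.add.at(counts, labels, 1)                  # and cluster counts
    # the centroid update completes only after the whole dataset pass
    nonempty = counts > 0
    new_centroids = centroids.copy()
    new_centroids[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new_centroids, inertia
```

Because the per-cluster sums and counts are exact (not decayed as in mini-batch k-means), one batched iteration yields the same centroids as a full-batch Lloyd iteration regardless of the batch size, up to floating-point summation order.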

Contributor

@jinsolp jinsolp left a comment


Adding some questions and a suggestion!

@achirkin
Contributor

Thanks for working on this much-needed feature, @tarang-jain! Could you please add a short paragraph to the PR description describing how the new batching is implemented? This is extremely helpful not only for review, but also for future revisions using tools like git blame, because the PR description is copied into the commit message in the main branch history.

Contributor

@viclafargue viclafargue left a comment


Thanks @tarang-jain! Here are my comments.

Comment on lines +504 to +512
if (score < 1.0) {
// std::stringstream ss;
// ss << "Expected: " << raft::arr2Str(d_labels_ref.data(), 25, "d_labels_ref", stream);
// std::cout << (ss.str().c_str()) << '\n';
// ss.str(std::string());
// ss << "Actual: " << raft::arr2Str(d_labels.data(), 25, "d_labels", stream);
// std::cout << (ss.str().c_str()) << '\n';
// std::cout << "Score = " << score << '\n';
}
Contributor


What is this code for?

Contributor Author


It's a debug print. It is also present in the other kmeans and kmeans_balanced tests.

@tarang-jain
Contributor Author

@achirkin I updated the PR description.

@tarang-jain
Contributor Author

I have renamed batch_size to streaming_batch_size to make its purpose clearer and differentiate it from the batch_samples and batch_centroids parameters.

Contributor

@viclafargue viclafargue left a comment


Thanks for the great work @tarang-jain! LGTM, just minor comments left.

Comment on lines +449 to +453
// Inertia for the last iteration is always computed
if (!params.inertia_check) {
raft::copy(inertia.data_handle(), clustering_cost.data_handle(), 1, stream);
raft::resource::sync_stream(handle);
}
Contributor


Suggested change
// Inertia for the last iteration is always computed
if (!params.inertia_check) {
raft::copy(inertia.data_handle(), clustering_cost.data_handle(), 1, stream);
raft::resource::sync_stream(handle);
}
// Inertia for the last iteration is always computed after KMeans training

Do we truly need this copy here? This is probably dead code since we are computing the inertia unconditionally at the end.

centroids_regular, centroids_batched, rtol=1e-3, atol=1e-3
), f"max diff: {np.max(np.abs(centroids_regular - centroids_batched))}"

print(inertia_regular, inertia_batched)
Contributor


Do we want to keep the print here?

Comment on lines +175 to +176
auto workspace = rmm::device_uvector<char>(
batch_data.extent(0), stream, raft::resource::get_workspace_resource(handle));
Contributor


This is called once per batch. We could maybe move this outside of the loop in the caller function?
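The hoisting suggested above can be illustrated with a small Python analogue (illustrative only; `Workspace` and `process_all_batches` are hypothetical names, not the cuvs/rmm API): the scratch buffer is allocated once in the caller and only regrown when a batch exceeds its current capacity, instead of being allocated fresh on every batch.

```python
class Workspace:
    """A reusable scratch buffer that grows monotonically."""
    def __init__(self):
        self.buf = bytearray(0)

    def require(self, nbytes):
        # grow only when needed; reuse existing capacity otherwise
        if len(self.buf) < nbytes:
            self.buf = bytearray(nbytes)
        return self.buf

def process_all_batches(batches):
    ws = Workspace()                  # allocated once, outside the batch loop
    capacities = []
    for batch in batches:
        buf = ws.require(len(batch))  # no fresh allocation for smaller batches
        capacities.append(len(buf))
    return capacities
```

With batches of sizes 3, 1, and 5, only the first and third calls actually allocate; the second reuses the existing 3-byte buffer.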

Comment on lines +214 to +220
void fit(raft::resources const& handle,
const cuvs::cluster::kmeans::params& params,
raft::host_matrix_view<const float, int> X,
std::optional<raft::host_vector_view<const float, int>> sample_weight,
raft::device_matrix_view<float, int> centroids,
raft::host_scalar_view<float> inertia,
raft::host_scalar_view<int> n_iter);
Contributor


It looks like the host fit functions are provided for both <T, int> and <T, int64_t>, but predict and fit_predict are not (only int64_t). Is this expected? Shouldn't we only expose int64_t even for the device functions? Are there performance implications?


workspace.resize(n_samples, stream);

raft::linalg::reduce_rows_by_key(const_cast<DataT*>(X.data_handle()),
Contributor


Suggested change
raft::linalg::reduce_rows_by_key(const_cast<DataT*>(X.data_handle()),
raft::linalg::reduce_rows_by_key(X.data_handle(),

The casting is unnecessary here as the first argument is const anyway.


Labels

cpp · feature request (New feature or request) · non-breaking (Introduces a non-breaking change)

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

8 participants