GH-49677: [Python][C++][Compute] Add search sorted compute kernel #49679
Alex-PLACET wants to merge 8 commits into apache:main
Conversation
- Added a new benchmark file `vector_search_sorted_benchmark.cc` to evaluate the performance of the SearchSorted function for various data types including Int64, String, and Binary.
- Created a comprehensive test suite in `vector_search_sorted_test.cc` to validate the correctness of SearchSorted across different scenarios, including handling of null values, scalar needles, and run-end encoded arrays.
- Ensured that the benchmarks cover both left and right search options, as well as edge cases like empty arrays and arrays with leading/trailing nulls.
[
  0,
  0,
  3,
  5
]
@Alex-PLACET just noticing that the elements here require two space chars to be removed. Similarly for the next two example outputs.
pitrou left a comment
I haven't looked at the implementation yet, but have reviewed the tests and benchmarks (which are quite comprehensive, thank you!).
One missing item is support for chunked arrays. Besides that, see comments below :)
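Since each needle is searched independently against the same sorted values, a chunked needles input can in principle be processed chunk by chunk and the result chunks concatenated. A minimal Python sketch of that idea, using stdlib `bisect` as a stand-in for the kernel (the variable names are illustrative, not the Arrow API):

```python
import bisect

values = [1, 3, 5, 7]              # sorted haystack
needle_chunks = [[2, 3], [8], []]  # a chunked "needles" input

# Each chunk is searched independently; the output keeps the same chunking.
result_chunks = [[bisect.bisect_left(values, n) for n in chunk]
                 for chunk in needle_chunks]
print(result_chunks)  # [[1, 1], [4], []]
```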
SearchSortedOptions(SearchSortedOptions::Left)));
ASSERT_OK_AND_ASSIGN(auto right,
                     SearchSorted(Datum(values), Datum(needles),
                                  SearchSortedOptions(SearchSortedOptions::Right)));
Let's call ValidateFull() on both results?
SearchSortedOptions(SearchSortedOptions::Left)));
ASSERT_OK_AND_ASSIGN(auto right,
                     SearchSorted(Datum(values), Datum(needles),
                                  SearchSortedOptions(SearchSortedOptions::Right)));
Let's call ValidateFull on both results here too?
(also I'm curious, why not reuse CheckSimpleSearchSorted?)
std::string scalar_needle_json;
uint64_t expected_scalar_left;
uint64_t expected_scalar_right;
Note that you could also generate the scalar needle tests automatically by calling GetScalar on the array needles and the expected results. This would make this easier to maintain later.
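The suggestion above can be sketched in a few lines: deriving the scalar-needle expectations from the array case means that, for each index, searching the single needle must return the corresponding element of the array result. A hedged Python sketch with stdlib `bisect` standing in for the kernel (values, needles, and `expected_left` are made-up fixture data, not the PR's actual test cases):

```python
import bisect

values = [0, 10, 10, 20]   # sorted haystack
needles = [10, 15]
expected_left = [1, 3]     # expected array-needle result ("left" search)

# Scalar-needle expectations derived automatically: searching each needle
# on its own must match the corresponding element of the array result.
for i, needle in enumerate(needles):
    assert bisect.bisect_left(values, needle) == expected_left[i]
print("scalar cases derived from array case: OK")
```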
AssertArraysEqual(*ArrayFromJSON(uint64(), "[0, 1, 3, 4]"), *result.make_array());
}
Can we add tests for chunked arrays? (and, if not currently supported, support them :-))
return std::static_pointer_cast<BinaryArray>(builder.Finish().ValueOrDie());
}

std::shared_ptr<BinaryArray> BuildBinaryNeedles(int64_t size_bytes) {
I don't think it's worth benchmarking both string and binary, as they are expected to perform similarly.
void SetSearchSortedArgs(benchmark::internal::Benchmark* bench) {
  bench->Unit(benchmark::kMicrosecond);
  for (const auto size : kMemorySizes) {
Unfortunately, the largest size produces rather slow benchmark iterations; can we just keep {kL1Size, kL2Size}?
Which essentially means that all benchmarks become "quick" as per the definition of quick here :)
RunSearchSortedBenchmark(state, values, needles, side);
}

static void BM_SearchSortedInt64ScalarNeedle(benchmark::State& state,
If there is just one needle to search for, we're just benchmarking the function call overhead rather than any significant part of the sorted search kernel, right? Is it useful?
(benchmarks for other compute functions focus on array performance, not scalar performance)
->Apply(SetSearchSortedArgs);

// String and binary scalar cases specifically exercise the direct scalar fast path that
// avoids boxing a scalar needle into a temporary one-element array.
Even if we wanted to benchmark this (which I doubt we do), we would only need a single benchmark IMHO.
Rationale for this change
Add the implementation of the search sorted compute kernel, based on the NumPy function: https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html
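For reference, `numpy.searchsorted`'s left/right semantics match Python's stdlib `bisect` module: "left" returns the index of the first element greater than or equal to the needle, "right" the index of the first element strictly greater. A minimal sketch of the expected behavior (using `bisect` as a stand-in, not the Arrow API):

```python
import bisect

values = [1, 2, 2, 3, 5]  # sorted haystack
needles = [2, 4]

# "left": insertion point before any equal elements
left = [bisect.bisect_left(values, n) for n in needles]
# "right": insertion point after any equal elements
right = [bisect.bisect_right(values, n) for n in needles]

print(left)   # [1, 4]
print(right)  # [3, 4]
```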
What changes are included in this PR?
Implementation of the C++ kernel + Python API.
Tests in C++ and Python
Are these changes tested?
Yes
Are there any user-facing changes?
No breaking change
search_sorted kernel for all primitive types and run-end encoded arrays #49677