GH-49677: [Python][C++][Compute] Add search sorted compute kernel #49679
Alex-PLACET wants to merge 8 commits into apache:main
Conversation
- Added a new benchmark file `vector_search_sorted_benchmark.cc` to evaluate the performance of the SearchSorted function for various data types including Int64, String, and Binary.
- Created a comprehensive test suite in `vector_search_sorted_test.cc` to validate the correctness of SearchSorted across different scenarios, including handling of null values, scalar needles, and run-end encoded arrays.
- Ensured that the benchmarks cover both left and right search options, as well as edge cases like empty arrays and arrays with leading/trailing nulls.
[
  0,
  0,
  3,
  5
]
@Alex-PLACET just noticing that the elements here require two space chars to be removed. Similarly for the next two example outputs.
pitrou left a comment
I haven't looked at the implementation yet, but have reviewed the tests and benchmarks (which are quite comprehensive, thank you!).
One missing item is support for chunked arrays. Besides that, see comments below :)
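Since each needle is searched independently against the same sorted values, a chunked needles input can in principle be processed chunk by chunk and the result chunks concatenated. A minimal Python sketch of that idea, using stdlib `bisect` as a stand-in for the kernel (the variable names are illustrative, not the Arrow API):

```python
import bisect

values = [1, 3, 5, 7]              # sorted haystack
needle_chunks = [[2, 3], [8], []]  # a chunked "needles" input

# Each chunk is searched independently; the output keeps the same chunking.
result_chunks = [[bisect.bisect_left(values, n) for n in chunk]
                 for chunk in needle_chunks]
print(result_chunks)  # [[1, 1], [4], []]
```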
SearchSortedOptions(SearchSortedOptions::Left)));
ASSERT_OK_AND_ASSIGN(auto right,
                     SearchSorted(Datum(values), Datum(needles),
                                  SearchSortedOptions(SearchSortedOptions::Right)));
Let's call ValidateFull() on both results?
SearchSortedOptions(SearchSortedOptions::Left)));
ASSERT_OK_AND_ASSIGN(auto right,
                     SearchSorted(Datum(values), Datum(needles),
                                  SearchSortedOptions(SearchSortedOptions::Right)));
Let's call ValidateFull on both results here too?
(also I'm curious, why not reuse CheckSimpleSearchSorted?)
std::string scalar_needle_json;
uint64_t expected_scalar_left;
uint64_t expected_scalar_right;
Note that you could also generate the scalar needle tests automatically by calling GetScalar on the array needles and the expected results. This would make this easier to maintain later.
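The suggestion above can be sketched in a few lines: deriving the scalar-needle expectations from the array case means that, for each index, searching the single needle must return the corresponding element of the array result. A hedged Python sketch with stdlib `bisect` standing in for the kernel (values, needles, and `expected_left` are made-up fixture data, not the PR's actual test cases):

```python
import bisect

values = [0, 10, 10, 20]   # sorted haystack
needles = [10, 15]
expected_left = [1, 3]     # expected array-needle result ("left" search)

# Scalar-needle expectations derived automatically: searching each needle
# on its own must match the corresponding element of the array result.
for i, needle in enumerate(needles):
    assert bisect.bisect_left(values, needle) == expected_left[i]
print("scalar cases derived from array case: OK")
```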
AssertArraysEqual(*ArrayFromJSON(uint64(), "[0, 1, 3, 4]"), *result.make_array());
}
Can we add tests for chunked arrays? (and, if not currently supported, support them :-))
return std::static_pointer_cast<BinaryArray>(builder.Finish().ValueOrDie());
}

std::shared_ptr<BinaryArray> BuildBinaryNeedles(int64_t size_bytes) {
I don't think it's worth benchmarking both string and binary, as they are expected to perform similarly.
void SetSearchSortedArgs(benchmark::internal::Benchmark* bench) {
  bench->Unit(benchmark::kMicrosecond);
  for (const auto size : kMemorySizes) {
Unfortunately, the largest size produces rather slow benchmark iterations; can we just keep {kL1Size, kL2Size}?
Which essentially means that all benchmarks become "quick" as per the definition of quick here :)
RunSearchSortedBenchmark(state, values, needles, side);
}

static void BM_SearchSortedInt64ScalarNeedle(benchmark::State& state,
If there is just one needle to search for, we're just benchmarking the function call overhead rather than any significant part of the sorted search kernel, right? Is it useful?
(benchmarks for other compute functions focus on array performance, not scalar performance)
->Apply(SetSearchSortedArgs);

// String and binary scalar cases specifically exercise the direct scalar fast path that
// avoids boxing a scalar needle into a temporary one-element array.
Even if we wanted to benchmark this (which I doubt we do), we would only need a single benchmark IMHO.
Rationale for this change
Add the implementation of the search sorted compute kernel, based on the NumPy function: https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html
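For reference, `numpy.searchsorted`'s left/right semantics match Python's stdlib `bisect` module: "left" returns the index of the first element greater than or equal to the needle, "right" the index of the first element strictly greater. A minimal sketch of the expected behavior (using `bisect` as a stand-in, not the Arrow API):

```python
import bisect

values = [1, 2, 2, 3, 5]  # sorted haystack
needles = [2, 4]

# "left": insertion point before any equal elements
left = [bisect.bisect_left(values, n) for n in needles]
# "right": insertion point after any equal elements
right = [bisect.bisect_right(values, n) for n in needles]

print(left)   # [1, 4]
print(right)  # [3, 4]
```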
What changes are included in this PR?
Implementation of the C++ kernel + Python API.
Tests in C++ and Python
Are these changes tested?
Yes
Are there any user-facing changes?
No breaking change
search_sorted kernel for all primitive types and run-end encoded arrays #49677