Merged
158 changes: 158 additions & 0 deletions include/cuco/detail/open_addressing/open_addressing_ref_impl.cuh
@@ -962,6 +962,164 @@ class open_addressing_ref_impl {
}
}

/**
* @brief Executes a callback on every element in the container with key equivalent to the probe
* key.
*
* @note Passes an un-incrementable input iterator to the element whose key is equivalent to
* `key` to the callback.
*
* @tparam ProbeKey Input type which is convertible to 'key_type'
* @tparam CallbackOp Unary callback functor or device lambda
*
* @param key The key to search for
* @param callback_op Function to call on every element found
*/
template <class ProbeKey, class CallbackOp>
__device__ void for_each(ProbeKey const& key, CallbackOp&& callback_op) const noexcept
{
static_assert(cg_size == 1, "Non-CG operation is incompatible with the current probing scheme");
auto probing_iter = this->probing_scheme_(key, this->storage_ref_.window_extent());

while (true) {
// TODO atomic_ref::load if insert operator is present
auto const window_slots = this->storage_ref_[*probing_iter];

for (int32_t i = 0; i < window_size; ++i) {
switch (
this->predicate_.operator()<is_insert::NO>(key, this->extract_key(window_slots[i]))) {
case detail::equal_result::EMPTY: {
return;
}
case detail::equal_result::EQUAL: {
callback_op(const_iterator{&(*(this->storage_ref_.data() + *probing_iter))[i]});
continue;
}
default: continue;
}
}
++probing_iter;
}
}
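The scalar `for_each` above can be modeled on the host with plain linear probing. This is a minimal sketch, not the cuco implementation: `toy_multiset`, `kEmpty`, and `count_matches` are hypothetical names, the window size is assumed to be 1, and the hash is simply `key % capacity`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only: none of these names are cuco APIs.
constexpr int kEmpty = -1;  // assumed sentinel marking an unused slot

struct toy_multiset {
  std::vector<int> slots;  // flat slot array, i.e. window_size == 1

  explicit toy_multiset(std::size_t capacity) : slots(capacity, kEmpty) {}

  // Linear-probing insert; assumes at least one slot always stays empty.
  void insert(int key)
  {
    assert(key != kEmpty);
    std::size_t idx = static_cast<std::size_t>(key) % slots.size();
    while (slots[idx] != kEmpty) { idx = (idx + 1) % slots.size(); }
    slots[idx] = key;
  }

  // Calls `callback` with a pointer to every slot whose key equals `key`.
  // Probing stops at the first empty slot, mirroring the EMPTY case above.
  template <class Callback>
  void for_each(int key, Callback&& callback) const
  {
    std::size_t idx = static_cast<std::size_t>(key) % slots.size();
    while (true) {
      if (slots[idx] == kEmpty) { return; }
      if (slots[idx] == key) { callback(&slots[idx]); }
      idx = (idx + 1) % slots.size();
    }
  }
};

// Counts occurrences of `key` by accumulating inside the callback.
inline int count_matches(toy_multiset const& set, int key)
{
  int count = 0;
  set.for_each(key, [&](int const*) { ++count; });
  return count;
}
```

The callback receives a pointer to the slot rather than a copy, which loosely corresponds to the un-incrementable iterator described in the doc comment.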

/**
* @brief Executes a callback on every element in the container with key equivalent to the probe
* key.
*
* @note Passes an un-incrementable input iterator to the element whose key is equivalent to
* `key` to the callback.
*
* @note This function uses cooperative group semantics, meaning that any thread may call the
* callback if it finds a matching element. If multiple elements are found within the same group,
* each thread with a match will call the callback with its associated element.
*
* @note Synchronizing `group` within `callback_op` is undefined behavior.
*
* @tparam ProbeKey Input type which is convertible to 'key_type'
* @tparam CallbackOp Unary callback functor or device lambda
*
* @param group The Cooperative Group used to perform this operation
* @param key The key to search for
* @param callback_op Function to call on every element found
*/
template <class ProbeKey, class CallbackOp>
__device__ void for_each(cooperative_groups::thread_block_tile<cg_size> const& group,
Collaborator Author

Not sure why the unit test is failing. Seems like the logic in this function is flawed.

Collaborator Author

I think I found the problem. #509 should fix the issue.

ProbeKey const& key,
CallbackOp&& callback_op) const noexcept
{
auto probing_iter = this->probing_scheme_(group, key, this->storage_ref_.window_extent());
bool empty = false;

while (true) {
// TODO atomic_ref::load if insert operator is present
auto const window_slots = this->storage_ref_[*probing_iter];

for (int32_t i = 0; i < window_size and !empty; ++i) {
switch (
this->predicate_.operator()<is_insert::NO>(key, this->extract_key(window_slots[i]))) {
case detail::equal_result::EMPTY: {
empty = true;
continue;
}
case detail::equal_result::EQUAL: {
callback_op(const_iterator{&(*(this->storage_ref_.data() + *probing_iter))[i]});
continue;
}
default: {
continue;
}
}
}
if (group.any(empty)) { return; }

++probing_iter;
}
}
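The group-wide exit condition in the cooperative overload can be illustrated with a sequential host model. This is a sketch under stated assumptions: the lanes of the group are simulated by an inner loop, the window size is 1, the capacity is a multiple of the group size, and `group_for_each`, `collect_matches`, and `kEmptySlot` are hypothetical names rather than cuco APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only: none of these names are cuco APIs.
constexpr int kEmptySlot = -1;  // assumed sentinel for an unused slot

// Simulates `cg_size` lanes that each probe one slot per round.
// `callback(lane, index)` runs for every lane that sees a matching key.
template <class Callback>
void group_for_each(std::vector<int> const& slots,
                    std::size_t cg_size,
                    int key,
                    Callback&& callback)
{
  // Assumption of this model: capacity is a non-zero multiple of cg_size,
  // so every slot is visited at most once.
  assert(cg_size > 0 && !slots.empty() && slots.size() % cg_size == 0);

  std::size_t base         = static_cast<std::size_t>(key) % slots.size();
  std::size_t const rounds = slots.size() / cg_size;

  for (std::size_t round = 0; round < rounds; ++round) {
    bool any_empty = false;  // models the result of group.any(empty)
    for (std::size_t lane = 0; lane < cg_size; ++lane) {
      std::size_t const idx = (base + lane) % slots.size();
      if (slots[idx] == kEmptySlot) {
        any_empty = true;
      } else if (slots[idx] == key) {
        callback(lane, idx);
      }
    }
    if (any_empty) { return; }  // the whole group stops together
    base = (base + cg_size) % slots.size();
  }
}

// Collects the indices of all matching slots in the order the lanes report them.
inline std::vector<std::size_t> collect_matches(std::vector<int> const& slots,
                                                std::size_t cg_size,
                                                int key)
{
  std::vector<std::size_t> found;
  group_for_each(slots, cg_size, key, [&](std::size_t, std::size_t idx) { found.push_back(idx); });
  return found;
}
```

As in the diff, a lane that sees a match still invokes the callback in the same round in which another lane sees an empty slot; only the decision to stop is shared across the group.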

/**
* @brief Executes a callback on every element in the container with key equivalent to the probe
* key and can additionally perform work that requires synchronizing the Cooperative Group
* performing this operation.
*
* @note Passes an un-incrementable input iterator to the element whose key is equivalent to
* `key` to the callback.
*
* @note This function uses cooperative group semantics, meaning that any thread may call the
* callback if it finds a matching element. If multiple elements are found within the same group,
* each thread with a match will call the callback with its associated element.
*
* @note Synchronizing `group` within `callback_op` is undefined behavior.
*
* @note The `sync_op` function can be used to perform work that requires synchronizing threads in
* `group` in between probing steps, where the number of probing steps performed between
* synchronization points is capped by `window_size * cg_size`. The functor will be called right
* after the current probing window has been traversed.
*
* @tparam ProbeKey Input type which is convertible to 'key_type'
* @tparam CallbackOp Unary callback functor or device lambda
* @tparam SyncOp Functor or device lambda which accepts the current `group` object
*
* @param group The Cooperative Group used to perform this operation
* @param key The key to search for
* @param callback_op Function to call on every element found
* @param sync_op Function that is allowed to synchronize `group` in between probing windows
*/
template <class ProbeKey, class CallbackOp, class SyncOp>
__device__ void for_each(cooperative_groups::thread_block_tile<cg_size> const& group,
ProbeKey const& key,
CallbackOp&& callback_op,
SyncOp&& sync_op) const noexcept
{
auto probing_iter = this->probing_scheme_(group, key, this->storage_ref_.window_extent());
bool empty = false;

while (true) {
// TODO atomic_ref::load if insert operator is present
auto const window_slots = this->storage_ref_[*probing_iter];

for (int32_t i = 0; i < window_size and !empty; ++i) {
switch (
this->predicate_.operator()<is_insert::NO>(key, this->extract_key(window_slots[i]))) {
case detail::equal_result::EMPTY: {
empty = true;
continue;
}
case detail::equal_result::EQUAL: {
callback_op(const_iterator{&(*(this->storage_ref_.data() + *probing_iter))[i]});
continue;
}
default: {
continue;
}
}
}
sync_op(group);
if (group.any(empty)) { return; }

++probing_iter;
}
}
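One plausible use of `sync_op` is flushing a staging buffer once per probing round. The host model below sketches that idea; it is an assumption about how the hook might be used, not something this PR shows. `toy_group`, `group_for_each_sync`, `flush_result`, and `retrieve_with_flush` are hypothetical names, and the `staged` vector stands in for a shared-memory buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only: none of these names are cuco APIs.
constexpr int kUnused = -1;  // assumed sentinel for an unused slot

struct toy_group {
  std::size_t size;  // number of simulated lanes
};

template <class Callback, class SyncOp>
void group_for_each_sync(std::vector<int> const& slots,
                         toy_group const& group,
                         int key,
                         Callback&& callback,
                         SyncOp&& sync_op)
{
  // Assumption of this model: capacity is a non-zero multiple of the group size.
  assert(group.size > 0 && !slots.empty() && slots.size() % group.size == 0);

  std::size_t base         = static_cast<std::size_t>(key) % slots.size();
  std::size_t const rounds = slots.size() / group.size;

  for (std::size_t round = 0; round < rounds; ++round) {
    bool any_empty = false;
    for (std::size_t lane = 0; lane < group.size; ++lane) {
      std::size_t const idx = (base + lane) % slots.size();
      if (slots[idx] == kUnused) {
        any_empty = true;
      } else if (slots[idx] == key) {
        callback(idx);
      }
    }
    sync_op(group);  // runs once per round, before the exit check, as in the diff
    if (any_empty) { return; }
    base = (base + group.size) % slots.size();
  }
}

struct flush_result {
  std::vector<std::size_t> output;  // indices written out so far
  int flushes;                      // how many times sync_op ran
};

// Stages matches during a round and moves them to `output` in sync_op.
inline flush_result retrieve_with_flush(std::vector<int> const& slots,
                                        std::size_t cg_size,
                                        int key)
{
  std::vector<std::size_t> staged;
  flush_result result{{}, 0};
  group_for_each_sync(
    slots, toy_group{cg_size}, key,
    [&](std::size_t idx) { staged.push_back(idx); },
    [&](toy_group const&) {
      result.output.insert(result.output.end(), staged.begin(), staged.end());
      staged.clear();
      ++result.flushes;
    });
  return result;
}
```

Because `sync_op` runs before the exit check, it also fires in the final round, so matches staged in that round are still flushed.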

/**
* @brief Compares the content of the address `address` (old value) with the `expected` value and,
* only if they are the same, sets the content of `address` to `desired`.
110 changes: 110 additions & 0 deletions include/cuco/detail/static_multiset/static_multiset_ref.inl
@@ -22,6 +22,8 @@

#include <cooperative_groups.h>

#include <utility>

namespace cuco {

template <typename Key,
@@ -446,6 +448,114 @@ class operator_impl<
}
};

template <typename Key,
cuda::thread_scope Scope,
typename KeyEqual,
typename ProbingScheme,
typename StorageRef,
typename... Operators>
class operator_impl<
op::for_each_tag,
static_multiset_ref<Key, Scope, KeyEqual, ProbingScheme, StorageRef, Operators...>> {
using base_type = static_multiset_ref<Key, Scope, KeyEqual, ProbingScheme, StorageRef>;
using ref_type =
static_multiset_ref<Key, Scope, KeyEqual, ProbingScheme, StorageRef, Operators...>;

static constexpr auto cg_size = base_type::cg_size;

public:
/**
* @brief Executes a callback on every element in the container with key equivalent to the probe
* key.
*
* @note Passes an un-incrementable input iterator to the element whose key is equivalent to
* `key` to the callback.
*
* @tparam ProbeKey Input type which is convertible to 'key_type'
* @tparam CallbackOp Unary callback functor or device lambda
*
* @param key The key to search for
* @param callback_op Function to call on every element found
*/
template <class ProbeKey, class CallbackOp>
__device__ void for_each(ProbeKey const& key, CallbackOp&& callback_op) const noexcept
{
// CRTP: cast `this` to the actual ref type
auto const& ref_ = static_cast<ref_type const&>(*this);
ref_.impl_.for_each(key, std::forward<CallbackOp>(callback_op));
}

/**
* @brief Executes a callback on every element in the container with key equivalent to the probe
* key.
*
* @note Passes an un-incrementable input iterator to the element whose key is equivalent to
* `key` to the callback.
*
* @note This function uses cooperative group semantics, meaning that any thread may call the
* callback if it finds a matching element. If multiple elements are found within the same group,
* each thread with a match will call the callback with its associated element.
*
* @note Synchronizing `group` within `callback_op` is undefined behavior.
*
* @tparam ProbeKey Input type which is convertible to 'key_type'
* @tparam CallbackOp Unary callback functor or device lambda
*
* @param group The Cooperative Group used to perform this operation
* @param key The key to search for
* @param callback_op Function to call on every element found
*/
template <class ProbeKey, class CallbackOp>
__device__ void for_each(cooperative_groups::thread_block_tile<cg_size> const& group,
ProbeKey const& key,
CallbackOp&& callback_op) const noexcept
{
// CRTP: cast `this` to the actual ref type
auto const& ref_ = static_cast<ref_type const&>(*this);
ref_.impl_.for_each(group, key, std::forward<CallbackOp>(callback_op));
}

/**
* @brief Executes a callback on every element in the container with key equivalent to the probe
* key and can additionally perform work that requires synchronizing the Cooperative Group
* performing this operation.
*
* @note Passes an un-incrementable input iterator to the element whose key is equivalent to
* `key` to the callback.
*
* @note This function uses cooperative group semantics, meaning that any thread may call the
* callback if it finds a matching element. If multiple elements are found within the same group,
* each thread with a match will call the callback with its associated element.
*
* @note Synchronizing `group` within `callback_op` is undefined behavior.
*
* @note The `sync_op` function can be used to perform work that requires synchronizing threads in
* `group` in between probing steps, where the number of probing steps performed between
* synchronization points is capped by `window_size * cg_size`. The functor will be called right
* after the current probing window has been traversed.
*
* @tparam ProbeKey Input type which is convertible to 'key_type'
* @tparam CallbackOp Unary callback functor or device lambda
* @tparam SyncOp Functor or device lambda which accepts the current `group` object
*
* @param group The Cooperative Group used to perform this operation
* @param key The key to search for
* @param callback_op Function to call on every element found
* @param sync_op Function that is allowed to synchronize `group` in between probing windows
*/
template <class ProbeKey, class CallbackOp, class SyncOp>
__device__ void for_each(cooperative_groups::thread_block_tile<cg_size> const& group,
ProbeKey const& key,
CallbackOp&& callback_op,
SyncOp&& sync_op) const noexcept
{
// CRTP: cast `this` to the actual ref type
auto const& ref_ = static_cast<ref_type const&>(*this);
ref_.impl_.for_each(
group, key, std::forward<CallbackOp>(callback_op), std::forward<SyncOp>(sync_op));
}
};
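The `operator_impl` specialization above follows a tag-based CRTP mixin pattern: the ref type inherits one `operator_impl` per operator tag, and each mixin casts `this` back to the ref to reach the shared implementation. The following host-only sketch shows the pattern in isolation. Everything in namespace `toy` is hypothetical and heavily simplified (a `std::vector<int>` replaces the storage and probing logic).

```cpp
#include <cassert>
#include <vector>

namespace toy {

struct for_each_tag {};
struct count_tag {};

template <class Tag, class Ref>
class operator_impl;  // specialized once per operator tag

// The ref inherits one mixin per requested operator.
template <class... Operators>
class container_ref : public operator_impl<Operators, container_ref<Operators...>>... {
 public:
  explicit container_ref(std::vector<int> const& data) : data_{&data} {}

 private:
  template <class Tag, class Ref>
  friend class operator_impl;  // mixins may reach the private state

  std::vector<int> const* data_;
};

template <class... Operators>
class operator_impl<for_each_tag, container_ref<Operators...>> {
  using ref_type = container_ref<Operators...>;

 public:
  template <class Callback>
  void for_each(int key, Callback&& callback) const
  {
    // CRTP: cast `this` to the actual ref type
    auto const& ref_ = static_cast<ref_type const&>(*this);
    for (int const& value : *ref_.data_) {
      if (value == key) { callback(&value); }
    }
  }
};

template <class... Operators>
class operator_impl<count_tag, container_ref<Operators...>> {
  using ref_type = container_ref<Operators...>;

 public:
  int count(int key) const
  {
    auto const& ref_ = static_cast<ref_type const&>(*this);
    int n = 0;
    for (int value : *ref_.data_) { n += (value == key) ? 1 : 0; }
    return n;
  }
};

// Usage: a ref only exposes the operators named in its template arguments.
inline void usage()
{
  std::vector<int> const data{1, 2, 2, 3};

  container_ref<for_each_tag> const ref{data};
  int hits = 0;
  ref.for_each(2, [&](int const*) { ++hits; });
  assert(hits == 2);

  container_ref<for_each_tag, count_tag> const both{data};
  assert(both.count(2) == 2);
}

}  // namespace toy
```

In this sketch, calling `count` on `container_ref<for_each_tag>` would fail to compile, which is the opt-in behavior the tag list is meant to provide.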

template <typename Key,
cuda::thread_scope Scope,
typename KeyEqual,
6 changes: 6 additions & 0 deletions include/cuco/operator.hpp
@@ -62,6 +62,12 @@ struct count_tag {
struct find_tag {
} inline constexpr find; ///< `cuco::find` operator

/**
* @brief `for_each` operator tag
*/
struct for_each_tag {
} inline constexpr for_each; ///< `cuco::for_each` operator
Comment on lines +68 to +69
Member

I see for_each as an internal utility as opposed to an actual hash table operator. Need to think more on this.

Collaborator Author

From my standpoint, I would treat it as an extension to the STL API that is more suitable for the GPU. Having a "cooperative iterator" instead, which would be closer to the spirit of modern C++, has its drawbacks. For example, how do we ensure users only increment the iterator with the same CG? for_each solves this problem by making the probing part internal. We should even be able to redefine any lookup function that relies on probing (find, count, retrieve) in terms of for_each, giving us a proper abstraction layer for probing.

Collaborator Author

On a side note, I personally find this functional approach, i.e., "for each found key do X", very appealing. Historic evidence that it is indeed useful comes from warpcore, where many downstream applications (mostly genomics stuff) implemented their custom lookup operations through for_each functors.
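The idea of building other lookups on top of `for_each` can be sketched on the host as follows. This is an illustration of the argument only, not code from the PR: `for_each_match`, `toy_count`, `toy_contains`, and `toy_retrieve` are hypothetical names, and a linear scan stands in for the probing sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Host-side model only: none of these names are cuco APIs.
using toy_slot = std::pair<int, int>;  // {key, payload}

// Single traversal primitive; a linear scan stands in for probing.
template <class Callback>
void for_each_match(std::vector<toy_slot> const& slots, int key, Callback&& callback)
{
  for (auto const& slot : slots) {
    if (slot.first == key) { callback(slot); }
  }
}

// count: increment a counter for every match.
inline std::size_t toy_count(std::vector<toy_slot> const& slots, int key)
{
  std::size_t n = 0;
  for_each_match(slots, key, [&](toy_slot const&) { ++n; });
  return n;
}

// contains: remember whether the callback ever fired.
inline bool toy_contains(std::vector<toy_slot> const& slots, int key)
{
  bool found = false;
  for_each_match(slots, key, [&](toy_slot const&) { found = true; });
  return found;
}

// retrieve: copy the payload of every match.
inline std::vector<int> toy_retrieve(std::vector<toy_slot> const& slots, int key)
{
  std::vector<int> payloads;
  for_each_match(slots, key, [&](toy_slot const& slot) { payloads.push_back(slot.second); });
  return payloads;
}

inline void usage()
{
  std::vector<toy_slot> const slots{{1, 10}, {2, 20}, {1, 11}};
  assert(toy_count(slots, 1) == 2);
  assert(toy_contains(slots, 2));
  assert(!toy_contains(slots, 3));
  assert((toy_retrieve(slots, 1) == std::vector<int>{10, 11}));
}
```

One trade-off visible in this model: a callback-only interface offers no early exit, so `toy_contains` keeps scanning after the first match.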


} // namespace op
} // namespace cuco

3 changes: 2 additions & 1 deletion tests/CMakeLists.txt
@@ -100,7 +100,8 @@ ConfigureTest(STATIC_MULTISET_TEST
static_multiset/count_test.cu
static_multiset/custom_count_test.cu
static_multiset/find_test.cu
- static_multiset/insert_test.cu)
+ static_multiset/insert_test.cu
+ static_multiset/for_each_test.cu)

###################################################################################################
# - static_multimap tests -------------------------------------------------------------------------