diff --git a/doc/changelog.qbk b/doc/changelog.qbk index 56869615..e8169d3f 100644 --- a/doc/changelog.qbk +++ b/doc/changelog.qbk @@ -7,6 +7,11 @@ [section:changelog Changelog] +[heading Next release] + +* Added `pick` command for `algorithm::reduce`, which selects an arbitrary subset of bins from a category axis; unlike `slice`, the bins do not have to be adjacent + * New trait `axis::traits::is_pickable` detects whether an axis supports picking; user-defined axes can opt in by adding a special constructor, see the Axis concept + [heading Boost 1.89] * Update CMake minimum version and Python detection in CMake @@ -48,7 +53,7 @@ * Replace `detail::span` and `detail::make_span` with implementations in `boost::core` * Documentation improvements * Protect usage of `std::min` and `std::max` in some cases, contributed by Han Jiang (min,max macros are illegially set by popular Windows headers so we need to work around) -* Added test to catch usage of unprotected min,max tokens in the library in the future +* Added test to catch usage of unprotected min,max tokens in the library in the future * Fixes to support latest clang-14 and deduction guides in gcc-11+ [heading Boost 1.81] diff --git a/doc/concepts/Axis.qbk b/doc/concepts/Axis.qbk index dc13388c..a6668f12 100644 --- a/doc/concepts/Axis.qbk +++ b/doc/concepts/Axis.qbk @@ -53,6 +53,7 @@ An [*Axis] maps input values to indices. It holds state specific to that axis, l * `a` and `b` are values of type `A` * `i` and `j` are indices of type [headerref boost/histogram/fwd.hpp `boost::histogram::axis::index_type`] * `n` is a value of type `unsigned` +* `v` is a value of type `std::vector` * `M` is a metadata type that is [@https://en.cppreference.com/w/cpp/named_req/DefaultConstructible DefaultConstructible], [@https://en.cppreference.com/w/cpp/named_req/CopyConstructible CopyConstructible] and [@https://en.cppreference.com/w/cpp/named_req/CopyAssignable CopyAssignable]. 
It it supports moves, it must be *nothrow* [@https://en.cppreference.com/w/cpp/named_req/MoveAssignable MoveAssignable]. * `ar` is a value of an archive with Boost.Serialization semantics @@ -72,6 +73,13 @@ An [*Axis] maps input values to indices. It holds state specific to that axis, l Special constructor used by the reduce algorithm. `a` is the original axis instance, `i` and `j` are the index range to keep in the reduced axis. If `n` is larger than 1, `n` adjacent bins are merged into one larger cell. If this constructor is not implemented, [funcref boost::histogram::algorithm::reduce] throws an exception on an attempt to reduce this axis. ] ] +[ + [`A(a, v)`] + [] + [ + Special constructor used by the reduce algorithm to handle the pick command. `a` is the original axis instance, `v` is a `std::vector` of [headerref boost/histogram/fwd.hpp `boost::histogram::axis::index_type`] with the indices of the bins to keep, in the order in which they should appear in the new axis. Should only be implemented for axes which are not ordered, like the category axis. If this constructor is not implemented, [funcref boost::histogram::algorithm::reduce] throws an exception on an attempt to pick bins from this axis. + ] +] [ [`a.options()`] [`unsigned`] diff --git a/doc/guide.qbk b/doc/guide.qbk index fdc23084..31f3d353 100644 --- a/doc/guide.qbk +++ b/doc/guide.qbk @@ -293,9 +293,9 @@ The library provides the [funcref boost::histogram::algorithm::project] function [section Reduction] -A projection removes an axis completely. A less drastic way to obtain a smaller histogram is the [funcref boost::histogram::algorithm::reduce reduce] function, which allows one to /slice/, /shrink/ or /rebin/ individual axes. +A projection removes an axis completely. A less drastic way to obtain a smaller histogram is the [funcref boost::histogram::algorithm::reduce reduce] function, which allows one to /slice/, /shrink/, /pick/ or /rebin/ individual axes. 
-Shrinking means that the value range of an axis is reduced and the number of bins along that axis. Slicing does the same, but is based on axis indices while shrinking is based on the axis values. To /rebin/ means that adjacent bins are merged into larger bins, the histogram is made coarser. For N adjacent bins, a new bin is formed which covers the common interval of the merged bins and has their added content. These two operations can be combined and applied to several axes at once. Doing it in one step is much more efficient than doing it in several steps. +Shrinking reduces the value range of an axis and with it the number of bins along that axis. Slicing does the same, but is based on axis indices while shrinking is based on the axis values. To /rebin/ means that adjacent bins are merged into larger bins, which makes the histogram coarser. For N adjacent bins, a new bin is formed which covers the combined interval of the merged bins and holds the sum of their contents. Shrinking or slicing can be combined with rebinning, and these commands can be applied to several axes at once. Doing it in one step is much more efficient than doing it in several steps. Picking selects an arbitrary subset of bins by index from an axis which is not ordered, like the category axis. Unlike a slice, the picked bins do not have to be adjacent, and they appear in the new axis in the order in which their indices are given. A /pick/ cannot be combined with any other command for the same axis. The [funcref boost::histogram::algorithm::reduce reduce] function does not change the total count if all modified axes in the histogram have underflow and overflow bins. Counts in removed bins are added to the corresponding under- and overflow bins. As in case of the [funcref boost::histogram::algorithm::project project] function, such a histogram is guaranteed to be identical to one obtained from filling the original data. 
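The effect of a `pick` command on the bin contents can be illustrated without Boost.Histogram. The following is a minimal stand-alone sketch, assuming a one-dimensional unordered axis whose counts are stored in a plain `std::vector<double>` with an optional trailing overflow entry; `pick_counts` is a hypothetical helper, not part of the library. It follows the checks and the index mapping in the patch: every index must be in range and unique, picked bins keep the order in which their indices are given, and every other bin, including the original overflow bin, is added to the new overflow bin, or discarded if the axis has none.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-alone helper, not part of Boost.Histogram. `counts` holds one
// entry per bin of an unordered axis, followed by one overflow entry if
// `has_overflow` is true. Returns the counts of the reduced axis.
std::vector<double> pick_counts(const std::vector<double>& counts, bool has_overflow,
                                const std::vector<int>& indices) {
  const int nbins = static_cast<int>(counts.size()) - (has_overflow ? 1 : 0);
  if (indices.empty()) throw std::invalid_argument("at least one index required");
  for (auto it = indices.begin(); it != indices.end(); ++it) {
    if (*it < 0 || *it >= nbins) throw std::invalid_argument("index out of range");
    // an index that already appeared earlier in the list is a duplicate
    if (std::find(indices.begin(), it, *it) != it)
      throw std::invalid_argument("indices must be unique");
  }
  const std::size_t npicked = indices.size();
  // picked bins in the order given, then the overflow bin if the axis has one
  std::vector<double> result(npicked + (has_overflow ? 1 : 0), 0.0);
  for (std::size_t j = 0; j < counts.size(); ++j) {
    const auto it = std::find(indices.begin(), indices.end(), static_cast<int>(j));
    if (it != indices.end()) {
      result[static_cast<std::size_t>(it - indices.begin())] += counts[j];
    } else if (has_overflow) {
      // unpicked bins and the original overflow bin end up in the new overflow bin
      result[npicked] += counts[j];
    }
    // without an overflow bin, counts of unpicked bins are discarded
  }
  return result;
}
```

For instance, picking `{2, 0}` from the counts `{1, 2, 3, 6}` (three bins plus overflow) gives `{3, 1, 8}`, the same values as the first row of the two-dimensional test in `algorithm_reduce_test.cpp`. Like the patch, the sketch performs a linear `std::find` per bin, which is adequate for the short index lists typical of category axes.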
diff --git a/examples/guide_histogram_reduction.cpp b/examples/guide_histogram_reduction.cpp index 67762530..5d55ff3a 100644 --- a/examples/guide_histogram_reduction.cpp +++ b/examples/guide_histogram_reduction.cpp @@ -8,10 +8,12 @@ #include #include +#include int main() { using namespace boost::histogram; // import reduce commands into local namespace to save typing + using algorithm::pick; using algorithm::rebin; using algorithm::shrink; using algorithm::slice; @@ -43,6 +45,22 @@ int main() { assert(h3.axis(0) == h.axis(0)); // unchanged assert(h3.axis(1) == axis::regular<>(2, 0.0, 2.0)); + + // pick selects an arbitrary subset of bins from an axis which is not ordered, like + // the category axis; unlike a slice, the picked bins do not have to be adjacent + auto h4 = make_histogram(axis::category({"red", "green", "blue"})); + + h4("red"); + h4("green"); + h4("blue"); + + // pick the bins for "blue" and "red", in that order + auto h5 = algorithm::reduce(h4, pick({2, 0})); + + assert(h5.axis(0) == axis::category({"blue", "red"})); + assert(h5.at(0) == 1 && h5.at(1) == 1); + // the count for "green" was moved to the overflow bin of the category axis + assert(h5.at(2) == 1); } //] diff --git a/include/boost/histogram/algorithm/reduce.hpp b/include/boost/histogram/algorithm/reduce.hpp index c342d6aa..40c568f8 100644 --- a/include/boost/histogram/algorithm/reduce.hpp +++ b/include/boost/histogram/algorithm/reduce.hpp @@ -7,6 +7,8 @@ #ifndef BOOST_HISTOGRAM_ALGORITHM_REDUCE_HPP #define BOOST_HISTOGRAM_ALGORITHM_REDUCE_HPP +#include +#include #include #include #include @@ -21,6 +23,8 @@ #include #include #include +#include +#include namespace boost { namespace histogram { @@ -316,14 +320,67 @@ inline reduce_command slice_and_rebin(axis::index_type begin, axis::index_type e return slice_and_rebin(reduce_command::unset, begin, end, merge, mode); } -/** Shrink, crop, slice, and/or rebin axes of a histogram. +/** Pick command to be used in `reduce`. 
+ + Command is applied to axis with given index. + + Picking selects an arbitrary subset of bins by index. The new axis consists of the + picked bins in the order in which the indices are given, which may differ from their + order in the original axis. In contrast to `slice`, the picked bins do not have to be + adjacent. Each index must be valid and may only appear once. + + Picking only works on axes that are not ordered, like the category axis, since + removing an arbitrary subset of bins from an ordered axis would create gaps in the + axis range. The counts in bins that were not picked are added to the overflow bin, + if it is present. If it is not present, the counts are discarded. + + @param iaxis which axis to operate on. + @param indices indices of the bins to keep, must be unique. +*/ +inline reduce_command pick(unsigned iaxis, std::vector indices) { + if (indices.empty()) + BOOST_THROW_EXCEPTION(std::invalid_argument("at least one index required")); + for (auto it = indices.begin(); it != indices.end(); ++it) + if (std::find(indices.begin(), it, *it) != it) + BOOST_THROW_EXCEPTION(std::invalid_argument("indices must be unique")); + reduce_command r; + r.iaxis = iaxis; + r.range = reduce_command::range_t::indices_list; + r.indices = std::move(indices); + r.merge = 1; + r.crop = false; + return r; +} + +/** Pick command to be used in `reduce`. + + Command is applied to corresponding axis in order of reduce arguments. + + Picking selects an arbitrary subset of bins by index. The new axis consists of the + picked bins in the order in which the indices are given, which may differ from their + order in the original axis. In contrast to `slice`, the picked bins do not have to be + adjacent. Each index must be valid and may only appear once. + + Picking only works on axes that are not ordered, like the category axis, since + removing an arbitrary subset of bins from an ordered axis would create gaps in the + axis range. 
The counts in bins that were not picked are added to the overflow bin, + if it is present. If it is not present, the counts are discarded. + + @param indices indices of the bins to keep, must be unique. +*/ +inline reduce_command pick(std::vector indices) { + return pick(reduce_command::unset, std::move(indices)); +} + +/** Shrink, crop, slice, pick, and/or rebin axes of a histogram. Returns a new reduced histogram and leaves the original histogram untouched. The commands `rebin` and `shrink` or `slice` for the same axis are automatically combined, this is not an error. Passing a `shrink` and a `slice` command for the same axis or two `rebin` commands triggers an `invalid_argument` - exception. Trying to reducing a non-reducible axis triggers an `invalid_argument` + exception. The `pick` command cannot be combined with any other command for the + same axis. Trying to reduce a non-reducible axis triggers an `invalid_argument` exception. Histograms with non-reducible axes can still be reduced along the other axes that are reducible. @@ -331,9 +388,14 @@ inline reduce_command slice_and_rebin(axis::index_type begin, axis::index_type e @param hist original histogram. @param options iterable sequence of reduce commands: `shrink`, `slice`, `rebin`, - `shrink_and_rebin`, or `slice_and_rebin`. The element type of the iterable should be - `reduce_command`. + `pick`, `shrink_and_rebin`, or `slice_and_rebin`. The element type of the iterable + should be `reduce_command`. */ +#if BOOST_WORKAROUND(BOOST_MSVC, >= 0) +#pragma warning(push) +#pragma warning(disable : 4702) // unreachable code in the non-pickable static_if branch +#endif + template > Histogram reduce(const Histogram& hist, const Iterable& options) { using axis::index_type; @@ -351,6 +413,21 @@ Histogram reduce(const Histogram& hist, const Iterable& options) { if (o.merge > 0) { // option is set?
o.use_underflow_bin = AO::test(axis::option::underflow); o.use_overflow_bin = AO::test(axis::option::overflow); + if (o.range == reduce_command::range_t::indices_list) + return detail::static_if_c::value>( + [&o](const auto& a_in) { + using A = std::decay_t; + for (const auto idx : o.indices) + if (idx < 0 || idx >= a_in.size()) + BOOST_THROW_EXCEPTION(std::invalid_argument("index out of range")); + return A(a_in, o.indices); + }, + [iaxis](const auto& a_in) { + return BOOST_THROW_EXCEPTION(std::invalid_argument( + "axis " + std::to_string(iaxis) + " is not pickable")), + a_in; + }, + a_in); return detail::static_if_c::value>( [&o](const auto& a_in) { if (o.range == reduce_command::range_t::none) { @@ -412,21 +489,33 @@ Histogram reduce(const Histogram& hist, const Iterable& options) { bool skip = false; for (auto j : x.indices()) { - *i = (j - o->begin.index); - if (o->is_ordered && *i <= -1) { - *i = -1; - if (!o->use_underflow_bin) skip = true; - } else { - if (*i >= 0) - *i /= static_cast(o->merge); - else - *i = o->end.index; - const auto reduced_axis_end = - (o->end.index - o->begin.index) / static_cast(o->merge); - if (*i >= reduced_axis_end) { - *i = reduced_axis_end; + if (o->range == reduce_command::range_t::indices_list) { + // pick: map index to position in the list of picked indices; + // indices that are not picked are mapped to the overflow bin + const auto it = std::find(o->indices.begin(), o->indices.end(), j); + if (it != o->indices.end()) + *i = static_cast(std::distance(o->indices.begin(), it)); + else { + *i = static_cast(o->indices.size()); if (!o->use_overflow_bin) skip = true; } + } else { + *i = (j - o->begin.index); + if (o->is_ordered && *i <= -1) { + *i = -1; + if (!o->use_underflow_bin) skip = true; + } else { + if (*i >= 0) + *i /= static_cast(o->merge); + else + *i = o->end.index; + const auto reduced_axis_end = + (o->end.index - o->begin.index) / static_cast(o->merge); + if (*i >= reduced_axis_end) { + *i = reduced_axis_end; + if 
(!o->use_overflow_bin) skip = true; + } + } } ++i; @@ -439,21 +528,26 @@ Histogram reduce(const Histogram& hist, const Iterable& options) { return result; } -/** Shrink, slice, and/or rebin axes of a histogram. +#if BOOST_WORKAROUND(BOOST_MSVC, >= 0) +#pragma warning(pop) +#endif + +/** Shrink, crop, slice, pick, and/or rebin axes of a histogram. Returns a new reduced histogram and leaves the original histogram untouched. The commands `rebin` and `shrink` or `slice` for the same axis are automatically combined, this is not an error. Passing a `shrink` and a `slice` command for the same axis or two `rebin` commands triggers an invalid_argument - exception. It is safe to reduce histograms with some axis that are not reducible along + exception. The `pick` command cannot be combined with any other command for the + same axis. It is safe to reduce histograms with some axes that are not reducible along the other axes. Trying to reducing a non-reducible axis triggers an invalid_argument exception. An overload allows one to pass an iterable of reduce_command. @param hist original histogram. - @param opt first reduce command; one of `shrink`, `slice`, `rebin`, + @param opt first reduce command; one of `shrink`, `slice`, `rebin`, `pick`, `shrink_and_rebin`, or `slice_or_rebin`. @param opts more reduce commands. */ diff --git a/include/boost/histogram/axis/category.hpp b/include/boost/histogram/axis/category.hpp index 9802c95d..7772ee6e 100644 --- a/include/boost/histogram/axis/category.hpp +++ b/include/boost/histogram/axis/category.hpp @@ -145,6 +145,13 @@ class category : public iterator_mixin& indices) + : metadata_base(metadata_type(src.metadata())), vec_(src.get_allocator()) { + vec_.reserve(indices.size()); + for (const index_type idx : indices) vec_.emplace_back(src.vec_[idx]); + } + /// Return index for value argument.
index_type index(const value_type& x) const noexcept { const auto beg = vec_.begin(); diff --git a/include/boost/histogram/axis/traits.hpp b/include/boost/histogram/axis/traits.hpp index 26dce06b..10a2e6ef 100644 --- a/include/boost/histogram/axis/traits.hpp +++ b/include/boost/histogram/axis/traits.hpp @@ -23,6 +23,7 @@ #include #include #include +#include namespace boost { namespace histogram { @@ -178,6 +179,29 @@ using is_reducible = std::is_constructible +#ifndef BOOST_HISTOGRAM_DOXYGEN_INVOKED +using is_pickable = + std::is_constructible&>; +#else +struct is_pickable; +#endif + /** Get axis options for axis type. Doxygen does not render this well. This is a meta-function (template alias), it accepts diff --git a/include/boost/histogram/detail/reduce_command.hpp b/include/boost/histogram/detail/reduce_command.hpp index 3fb67b50..59a2169e 100644 --- a/include/boost/histogram/detail/reduce_command.hpp +++ b/include/boost/histogram/detail/reduce_command.hpp @@ -13,6 +13,7 @@ #include #include #include +#include namespace boost { namespace histogram { @@ -25,12 +26,14 @@ struct reduce_command { none, indices, values, + indices_list, } range = range_t::none; union { axis::index_type index; double value; } begin{0}, end{0}; - unsigned merge = 0; // default value indicates unset option + std::vector indices; // only used by range_t::indices_list + unsigned merge = 0; // default value indicates unset option bool crop = false; // for internal use by the reduce algorithm bool is_ordered = true; @@ -54,9 +57,12 @@ inline void normalize_reduce_commands(span out, o_out = o_in; } else { // Some command was already set for this axis, try to fuse commands. + // A pick command cannot be fused with any other command. 
if (!((o_in.range == reduce_command::range_t::none) ^ (o_out.range == reduce_command::range_t::none)) || - (o_out.merge > 1 && o_in.merge > 1)) + (o_out.merge > 1 && o_in.merge > 1) || + o_in.range == reduce_command::range_t::indices_list || + o_out.range == reduce_command::range_t::indices_list) BOOST_THROW_EXCEPTION(std::invalid_argument( "multiple conflicting reduce commands for axis " + std::to_string(o_in.iaxis == reduce_command::unset ? iaxis : o_in.iaxis))); diff --git a/test/CMakeLists.txt b/test/CMakeLists.txt index 92d7577c..f8533be7 100644 --- a/test/CMakeLists.txt +++ b/test/CMakeLists.txt @@ -51,7 +51,8 @@ boost_test(TYPE run SOURCES accumulators_weighted_mean_test.cpp) boost_test(TYPE run SOURCES accumulators_weighted_sum_test.cpp) boost_test(TYPE run SOURCES accumulators_collector_test.cpp) boost_test(TYPE run SOURCES algorithm_project_test.cpp) -boost_test(TYPE run SOURCES algorithm_reduce_test.cpp) +boost_test(TYPE run SOURCES algorithm_reduce_test.cpp + COMPILE_OPTIONS $<$:/bigobj>) boost_test(TYPE run SOURCES algorithm_sum_test.cpp) boost_test(TYPE run SOURCES algorithm_empty_test.cpp) boost_test(TYPE run SOURCES axis_boolean_test.cpp) diff --git a/test/algorithm_reduce_test.cpp b/test/algorithm_reduce_test.cpp index 49608318..3cbc0ae0 100644 --- a/test/algorithm_reduce_test.cpp +++ b/test/algorithm_reduce_test.cpp @@ -69,6 +69,17 @@ void run_tests() { // not allowed: reducing unreducible axis BOOST_TEST_THROWS((void)reduce(make(Tag(), unreducible{}), slice(0, 1)), std::invalid_argument); + // not allowed: pick with empty index list + BOOST_TEST_THROWS((void)pick(0, {}), std::invalid_argument); + // not allowed: pick with duplicated indices + BOOST_TEST_THROWS((void)pick(0, {1, 1}), std::invalid_argument); + // not allowed: pick on axis which is not pickable + BOOST_TEST_THROWS((void)reduce(h, pick(0, {1})), std::invalid_argument); + // not allowed: pick combined with any other command for the same axis + BOOST_TEST_THROWS((void)reduce(h, pick(0, 
{1}), rebin(0, 2)), std::invalid_argument); + BOOST_TEST_THROWS((void)reduce(h, slice(0, 0, 2), pick(0, {1})), + std::invalid_argument); + BOOST_TEST_THROWS((void)reduce(h, pick(0, {1}), pick(0, {2})), std::invalid_argument); } // shrink and crop behavior when value on edge and not on edge is inclusive: @@ -319,6 +330,12 @@ void run_tests() { BOOST_TEST_EQ(hr.axis(2), (CI{{2, 3}})); BOOST_TEST_EQ(hr.axis(3), u); BOOST_TEST_THROWS((void)algorithm::reduce(h, rebin(2, 2)), std::invalid_argument); + + auto hr2 = algorithm::reduce(h, shrink(0, 2, 4), pick(2, {2, 0})); + BOOST_TEST_EQ(hr2.axis(0), (R{2, 2, 4})); + BOOST_TEST_EQ(hr2.axis(1), (V{{1., 2., 3.}})); + BOOST_TEST_EQ(hr2.axis(2), (CI{{3, 1}})); + BOOST_TEST_EQ(hr2.axis(3), u); } // reduce on integer axis, rebin must fail @@ -417,6 +434,82 @@ void run_tests() { BOOST_TEST_EQ(hr[0], 1); BOOST_TEST_EQ(hr[1], 3); } + + // pick on category axis: bins which are not picked are added to overflow bin + { + auto h = make(Tag(), CI{{1, 2, 3}}); + std::fill(h.begin(), h.end(), 1); + // original: [1: 1, 2: 1, 3: 1, overflow: 1] + + // not allowed: pick index out of range + BOOST_TEST_THROWS((void)reduce(h, pick({3})), std::invalid_argument); + BOOST_TEST_THROWS((void)reduce(h, pick({-1})), std::invalid_argument); + + auto hr = reduce(h, pick({0, 2})); + // reduced: [1: 1, 3: 1, overflow: 2] + BOOST_TEST_EQ(hr.axis(), (CI{{1, 3}})); + BOOST_TEST_EQ(hr[0], 1); + BOOST_TEST_EQ(hr[1], 1); + BOOST_TEST_EQ(hr[2], 2); + BOOST_TEST_EQ(sum(hr), 4); + + // picked bins are returned in the order in which the indices are given + auto hr2 = reduce(h, pick({2, 0})); + BOOST_TEST_EQ(hr2.axis(), (CI{{3, 1}})); + BOOST_TEST_EQ(hr2[0], 1); + BOOST_TEST_EQ(hr2[1], 1); + BOOST_TEST_EQ(hr2[2], 2); + + // test overload that accepts iterable + std::vector opts{{pick(0, {0, 2})}}; + auto hr3 = reduce(h, opts); + BOOST_TEST_EQ(hr3, hr); + } + + // pick on category axis without overflow bin: bins which are not picked are discarded + { + using CIN = 
axis::category; + auto h = make(Tag(), CIN{{1, 2, 3}}); + std::fill(h.begin(), h.end(), 1); + // original: [1: 1, 2: 1, 3: 1] + auto hr = reduce(h, pick({1})); + // reduced: [2: 1] + BOOST_TEST_EQ(hr.axis(), (CIN{{2}})); + BOOST_TEST_EQ(hr.size(), 1); + BOOST_TEST_EQ(hr[0], 1); + BOOST_TEST_EQ(sum(hr), 1); + } + + // pick on category axis of 2d histogram: other axes are not affected + { + auto h = make_s(Tag(), std::vector(), CI{{1, 2, 3}}, ID(0, 2)); + + /* + matrix layout: + x (category) -> + y 1 2 3 of + | 0 1 2 3 6 + v 1 4 0 5 0 + */ + h.at(0, 0) = 1; + h.at(1, 0) = 2; + h.at(2, 0) = 3; + h.at(3, 0) = 6; // overflow of category axis + h.at(0, 1) = 4; + h.at(2, 1) = 5; + + auto hr = reduce(h, pick(0, {2, 0})); + BOOST_TEST_EQ(hr.rank(), 2); + BOOST_TEST_EQ(sum(hr), 21); + BOOST_TEST_EQ(hr.axis(0), (CI{{3, 1}})); + BOOST_TEST_EQ(hr.axis(1), ID(0, 2)); + BOOST_TEST_EQ(hr.at(0, 0), 3); + BOOST_TEST_EQ(hr.at(1, 0), 1); + BOOST_TEST_EQ(hr.at(2, 0), 8); // not picked + original overflow + BOOST_TEST_EQ(hr.at(0, 1), 5); + BOOST_TEST_EQ(hr.at(1, 1), 4); + BOOST_TEST_EQ(hr.at(2, 1), 0); + } } int main() { diff --git a/test/axis_traits_test.cpp b/test/axis_traits_test.cpp index 43d344d7..30f798e9 100644 --- a/test/axis_traits_test.cpp +++ b/test/axis_traits_test.cpp @@ -8,6 +8,7 @@ #include #include #include +#include #include "axis.hpp" #include "ostream.hpp" #include "throw_exception.hpp" @@ -69,6 +70,23 @@ int main() { BOOST_TEST_TRAIT_TRUE((traits::is_reducible>)); } + // is_pickable + { + struct not_pickable {}; + struct pickable { + pickable(const pickable&, const std::vector&); + }; + + BOOST_TEST_TRAIT_TRUE((traits::is_pickable)); + BOOST_TEST_TRAIT_FALSE((traits::is_pickable)); + + BOOST_TEST_TRAIT_FALSE((traits::is_pickable>)); + BOOST_TEST_TRAIT_FALSE((traits::is_pickable>)); + BOOST_TEST_TRAIT_FALSE((traits::is_pickable>)); + BOOST_TEST_TRAIT_FALSE((traits::is_pickable>)); + BOOST_TEST_TRAIT_TRUE((traits::is_pickable>)); + } + // get_options, options() { using 
A = integer<>;
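The opt-in mechanism for user-defined axes rests on constructor detection. Below is a minimal stand-alone sketch, assuming `index_type` is `int` as a stand-in for `boost::histogram::axis::index_type`: the alias `is_pickable` is modeled on the `std::is_constructible` check added to `axis/traits.hpp`, while `color_axis` and `ordered_axis` are hypothetical types, not part of the library. An axis opts in simply by providing a constructor that takes an existing instance and the list of indices to keep.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Assumption: stands in for boost::histogram::axis::index_type.
using index_type = int;

// Modeled on the trait in the patch: an axis type is pickable if it can be
// constructed from an existing instance and a list of bin indices.
template <class Axis>
using is_pickable =
    std::is_constructible<Axis, const Axis&, const std::vector<index_type>&>;

// Hypothetical unordered axis that opts in by providing the special constructor.
class color_axis {
 public:
  explicit color_axis(std::vector<std::string> labels) : labels_(std::move(labels)) {}

  // Keep the labels at the given indices, in the given order.
  color_axis(const color_axis& src, const std::vector<index_type>& indices) {
    labels_.reserve(indices.size());
    for (const index_type i : indices)
      labels_.push_back(src.labels_.at(static_cast<std::size_t>(i)));
  }

  index_type size() const { return static_cast<index_type>(labels_.size()); }
  const std::string& label(index_type i) const {
    return labels_.at(static_cast<std::size_t>(i));
  }

 private:
  std::vector<std::string> labels_;
};

// Hypothetical ordered axis without the special constructor.
class ordered_axis {
 public:
  explicit ordered_axis(int nbins) : nbins_(nbins) {}
  int size() const { return nbins_; }

 private:
  int nbins_;
};

static_assert(is_pickable<color_axis>::value, "color_axis opts in");
static_assert(!is_pickable<ordered_axis>::value, "ordered_axis does not");
```

Because the check is purely structural, any type with a matching constructor is reported as pickable; as the Axis concept states, it is up to the author to provide this constructor only for axes that are not ordered.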