Conversation
…<>(NewOperationTuple) returns.
…not have before. Completed NewFusedOperation, missing tests. Adding FusedInstantiableOperation. Adding size to NewOperationTuple
… nested FusedOperations.
…on, and the DivergentTransformDPP
… get<>(NewOperationTuple) to get_opt, to ease debugging, fixing tests
…tion, and renaming the new ones to the final name
… element. Created a variadic version of TypeListCat_t with fold expressions. Fixed a test that was creating instances of TypeList. Fixed an issue with AreVVEqCN, which was evaluating fk::cn with types that are not CUDA vectors
…ntations-with-fold-expressions-to-reduce-compiler-memory-usage-and-compilation-times * LTS-C++17: Commenting out incomplete code # Conflicts: # tests/examples/inlining_and_LDL_STL.h
Pull request overview
Refactors core compile-time plumbing around TypeList, OperationTuple, and FusedOperation to use fold expressions (reducing recursive template depth), and updates tests/examples accordingly while moving memory_operations.h to algorithms/basic_ops.
Changes:
- Reworks fk::TypeList and TypeListCat_t to reduce recursive instantiation (index-based leaf/getter approach).
- Replaces the recursive OperationTuple construction/access with make_new_operation_tuple(...) + get_opt<Idx>(...) and updates fusing/building code paths.
- Updates many tests/examples/includes to the new operation/fusion tuple APIs and the new memory_operations.h include location.
Reviewed changes
Copilot reviewed 38 out of 39 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| utests/core/execution_model/utest_executors.h | Updates include path and adjusts expected fused type alias. |
| utests/algorithm/image_processing/utest_saturate/utest_saturate_common.h | Simplifies saturate expected-value comparison logic. |
| tests/operation_test_utils.h | Updates read-type detection trait usage. |
| tests/operation/test_operation_types.h | Updates operation type lists and type-traits used in compile-time tests. |
| tests/operation/test_operation_tuple.h | Migrates to make_new_operation_tuple / get_opt-style tuple APIs and adds new tuple tests. |
| tests/operation/test_operation_fuser.h | Updates complex fused-type composition and simplifies test logic. |
| tests/operation/test_instantiable_operations.h | Updates type assertions for new operation tuple / instantiable-operation shapes. |
| tests/operation/test_fused_operation.h | Updates assertions to new tuple access and new fused operation type shapes. |
| tests/operation/test_filtered_index_sequence.h | Updates restriction/type-list usage to new operation types and restriction naming. |
| tests/examples/readme_test_code.h | Makes several example constants constexpr and updates include path. |
| tests/examples/inlining_and_LDL_STL.h | Replaces recursive operate() with fold-expression execution; updates pack indexing helper usage and test sizes. |
| tests/data/test_tuple.h | Updates operation tuple type extraction to new Operations typelist model. |
| tests/data/basic_test.h | Migrates operation tuple access to get_opt/TypeAt_t and adds more verbose failure output. |
| tests/buildAPI/batch_build_compilation_time.h | Updates include path and minor formatting. |
| tests/algorithm/test_warp.h | Updates include path. |
| tests/algorithm/test_deinterlace.h | Updates include path. |
| tests/algorithm/test_crop.h | Updates include path. |
| tests/algorithm/test_border_reader.h | Updates include path. |
| tests/algorithm/test_batchresize_build.h | Updates include path. |
| include/fused_kernel/core/utils/vector_utils.h | Updates typelist concatenation usage and minor trait/constexpr robustness tweaks. |
| include/fused_kernel/core/utils/type_lists.h | Reimplements TypeList indexing/catenation via fold expressions to reduce recursion depth. |
| include/fused_kernel/core/utils/parameter_pack_utils.h | Renames parameter-pack accessor to get_arg and updates helpers to use it. |
| include/fused_kernel/core/execution_model/thread_fusion.h | Updates source type list composition and adds explicit cast for elems_per_thread. |
| include/fused_kernel/core/execution_model/operation_model/parent_operations.h | Adds new parent types (MidWrite/Open/Closed) and macros to support new fused-operation kinds. |
| include/fused_kernel/core/execution_model/operation_model/operation_types.h | Extends operation “instance types” (Open/Closed) and introduces generalized OpIs. |
| include/fused_kernel/core/execution_model/operation_model/operation_tuple.h | Replaces recursive operation tuple structure with filtered tuple-of-instances + get_opt + make_new_operation_tuple. |
| include/fused_kernel/core/execution_model/operation_model/operation_data.h | Extends which instance types are considered “has params”. |
| include/fused_kernel/core/execution_model/operation_model/iop_fuser.h | Switches non-batch fusing to go through FusedOperation<>::build(...) and tightens type assertions. |
| include/fused_kernel/core/execution_model/operation_model/instantiable_operations.h | Updates fold-expression piping mechanics and adds Open/Closed instantiable op wrappers. |
| include/fused_kernel/core/execution_model/operation_model/fused_operation.h | Reimplements fused-operation execution using fold expressions and introduces Open/Closed semantics. |
| include/fused_kernel/core/execution_model/executors.h | Updates include path and parameter-pack access helper usage. |
| include/fused_kernel/core/execution_model/data_parallel_patterns.h | Updates parameter-pack access helper usage. |
| include/fused_kernel/core/data/tuple.h | Small reference/formatting adjustments for tuple get. |
| include/fused_kernel/core/data/rawptr.h | Zero-initializes dims members to avoid uninitialized state. |
| include/fused_kernel/core/data/ptr_nd.h | Value-initializes internal members to avoid uninitialized state. |
| include/fused_kernel/core/data/circular_tensor.h | Updates include path. |
| include/fused_kernel/algorithms/image_processing/resize.h | Updates include path. |
| include/fused_kernel/algorithms/image_processing/deinterlace.h | Updates include path. |
| include/fused_kernel/algorithms/basic_ops/memory_operations.h | Removes NVRTC guard around <vector> include and updates header content accordingly. |
Comments suppressed due to low confidence (1)
include/fused_kernel/algorithms/basic_ops/memory_operations.h:24
- <vector> is now included unconditionally. This header previously avoided including the full STL when NVRTC_COMPILER is defined (forward-declaring std::vector instead), which is important because NVRTC typically can’t parse many standard library headers. Reintroduce the NVRTC_COMPILER guard (or otherwise ensure NVRTC builds still work) before including <vector>.
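For reference, a minimal sketch of the guard pattern this comment asks to restore, showing only the conditional include (the previous forward-declaration fallback for std::vector is not reproduced here):

```cpp
// Sketch: only pull in the full STL header outside NVRTC builds; any
// std::vector-based helpers would need to sit behind the same guard.
#if !defined(NVRTC_COMPILER)
#include <vector>
#endif
```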
| #if __cplusplus < 202002L | ||
| namespace std { | ||
| template <typename T> | ||
| struct type_identity { | ||
| using type = T; | ||
| }; | ||
| } // namespace std | ||
| #endif |
This adds a custom std::type_identity in namespace std when compiling pre-C++20. Adding user-defined names to namespace std is undefined behavior, and some C++17 standard libraries already provide std::type_identity as an extension (which would cause a redefinition error). Prefer defining a fk::type_identity (or detail::type_identity) and using that, or gate the definition on a feature-test macro like __cpp_lib_type_identity.
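A minimal sketch of the non-UB alternative suggested above, assuming the library is free to add a detail-namespace helper (the names fk::detail::type_identity / type_identity_t are illustrative):

```cpp
#include <type_traits>

namespace fk { namespace detail {

// C++17-compatible stand-in for std::type_identity that does not inject
// anything into namespace std.
template <typename T>
struct type_identity { using type = T; };

template <typename T>
using type_identity_t = typename type_identity<T>::type;

}} // namespace fk::detail

// Alternatively, gate on the feature-test macro and only define a fallback
// when the standard type is missing:
// #if !defined(__cpp_lib_type_identity)
//   ... define the fallback above ...
// #endif
```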
@copilot open a new pull request to apply changes based on this feedback
| FK_HOST_DEVICE_FUSE void exec(const Point& thread, \ | ||
| const InputType& input, \ | ||
| const OperationDataType& opData) { \ | ||
| Parent::exec(thread, input, opData); \ | ||
| } \ |
DECLARE_OPEN_PARENT defines exec(...) as void, but OpenType operations are expected to return an OutputType (they feed into further computation and fold expressions). The void wrapper will break OpenInstantiableOperation operator| and any OpenType-based fusion. Change the wrapper to return OutputType and forward Parent::exec's return value.
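A sketch of the requested change, mirroring the macro fragment quoted above (this assumes Parent::exec does return OutputType for OpenType parents):

```cpp
// Sketch: declare OutputType as the return type and forward Parent::exec's result.
FK_HOST_DEVICE_FUSE OutputType exec(const Point& thread, \
                                    const InputType& input, \
                                    const OperationDataType& opData) { \
    return Parent::exec(thread, input, opData); \
} \
```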
@copilot open a new pull request to apply changes based on this feedback
Pull request overview
Copilot reviewed 38 out of 39 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (2)
utests/algorithm/image_processing/utest_saturate/utest_saturate_common.h:32
- cxp::cmp_greater::f(input, ...) returns a vector-of-bool when input is a CUDA vector type (used by addOneTestAllChannels), which is not usable as the if condition here and can fail to compile or behave incorrectly. The previous fk::get<0>(input) ensured a scalar comparison; consider restoring that (or explicitly comparing a scalar channel / reducing the boolean vector) so this works for both scalar and vector inputs.
include/fused_kernel/algorithms/basic_ops/memory_operations.h:24
- This header now unconditionally includes <vector> and uses std::vector in APIs like SplitWrite::build. Previously it was guarded for NVRTC_COMPILER, presumably because NVRTC builds don't have full libstdc++/libc++ support. Unconditional inclusion can break NVRTC compilation; consider restoring the #if !defined(NVRTC_COMPILER) guard (or moving std::vector-based helpers behind a host-only guard) so device/NVRTC builds don't depend on <vector>.
| #if __cplusplus < 202002L | ||
| namespace std { | ||
| template <typename T> | ||
| struct type_identity { | ||
| using type = T; | ||
| }; | ||
| } // namespace std | ||
| #endif |
Defining std::type_identity inside namespace std is undefined behavior (user code must not add new declarations to std, even conditionally). To avoid UB while keeping C++17 compatibility, define an internal fk::detail::type_identity (or a type_identity_t alias) and use that instead of injecting into std.
@copilot open a new pull request to apply changes based on this feedback
Pull request overview
Copilot reviewed 54 out of 54 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
include/fused_kernel/algorithms/basic_ops/memory_operations.h:24
- memory_operations.h now unconditionally includes <vector>. This header previously had NVRTC-specific handling, and the repo still uses NVRTC_COMPILER guards elsewhere. NVRTC typically cannot include the full STL headers, so this risks breaking NVRTC compilation. Consider restoring the #if !defined(NVRTC_COMPILER) include guard and/or using a forward declaration strategy for std::vector when NVRTC_COMPILER is defined.
…| in InstantiableOperations
Pull request overview
Copilot reviewed 54 out of 54 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
include/fused_kernel/algorithms/basic_ops/memory_operations.h:23
- memory_operations.h now unconditionally includes <vector>, but this header is consumed in NVRTC builds elsewhere in the project (there were previously NVRTC-specific guards here). If NVRTC compilation is still a supported configuration, this can break NVRTC because <vector> (and related parts of the standard library) are often unavailable/unsupported there. Consider restoring the #if !defined(NVRTC_COMPILER) conditional include (and the forward-declaration fallback) or moving the std::vector-dependent APIs behind an NVRTC guard.
| const ActiveThreads activeThreads = getActiveThreads(details, get_arg<0>(iOps...)); | ||
|
|
||
| if (x < activeThreads.x && y < activeThreads.y) { | ||
| Parent::execute_thread(thread, activeThreads, iOps...); | ||
| } | ||
| } | ||
| }; | ||
| #endif // defined(__NVCC__) || defined(__HIPCC__) || defined(__NVRTC__) || defined(NVRTC_COMPILER) | ||
| #endif // defined(__NVCC__) || CLANG_HOST_DEVICE |
The GPU specialization guard for TransformDPP<ParArch::GPU_NVIDIA,...> was narrowed to #if defined(__NVCC__) || CLANG_HOST_DEVICE, but surrounding comments (and another #endif comment later in the file) still reference __NVRTC__ / __HIPCC__. If NVRTC or HIP builds are still supported, this guard change will exclude them from the GPU implementation and likely break compilation/behavior. Please either restore the previous condition (including NVRTC/HIP macros as appropriate) or update the stale comments and build matrix to reflect an intentional support drop.
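For clarity, the broader guard described by the quoted #endif comments would look like this (a sketch, assuming NVRTC and HIP builds are meant to stay supported):

```cpp
// Sketch: keep NVRTC and HIP builds inside the GPU specialization.
#if defined(__NVCC__) || defined(__HIPCC__) || defined(__NVRTC__) || defined(NVRTC_COMPILER)
// ... TransformDPP<ParArch::GPU_NVIDIA, ...> implementation ...
#endif
```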
| #if CLANG_HOST_DEVICE | ||
| const int x = (blockDim.x * blockIdx.x) + threadIdx.x; | ||
| const int y = (blockDim.y * blockIdx.y) + threadIdx.y; | ||
| const int z = blockIdx.z; // So far we only consider the option of using the z dimension to specify n (x*y) thread planes |
The conditional inside TransformDPP<ParArch::GPU_NVIDIA>::exec was changed from #if VS2017_COMPILER || CLANG_HOST_DEVICE to #if CLANG_HOST_DEVICE. On VS2017 + nvcc, CLANG_HOST_DEVICE is false and VS2017_COMPILER is true, but cooperative_groups.h (and the cg alias) are explicitly not included for VS2017 at the top of this file. That means the #else branch using cg::thread_block will fail to compile on VS2017. Please restore the VS2017 condition here (or ensure cg is available in that configuration).
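A sketch of the previous condition this comment asks to restore (the cooperative-groups branch is elided):

```cpp
// Sketch: VS2017 + nvcc cannot use cooperative_groups here, so it must take the
// raw blockIdx/threadIdx path together with the Clang host/device case.
#if VS2017_COMPILER || CLANG_HOST_DEVICE
    const int x = (blockDim.x * blockIdx.x) + threadIdx.x;
    const int y = (blockDim.y * blockIdx.y) + threadIdx.y;
    const int z = blockIdx.z;
#else
    // cg::thread_block-based path (cooperative_groups), unavailable on VS2017
#endif
```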
Pull request overview
Copilot reviewed 54 out of 54 changed files in this pull request and generated 9 comments.
Comments suppressed due to low confidence (2)
include/fused_kernel/algorithms/basic_ops/memory_operations.h:36
- The return type declaration has changed from explicitly specifying the ThreadFusionType to using trailing return type syntax with auto. While this is cleaner, ensure that template argument deduction works correctly, especially for the ELEMS_PER_THREAD template parameter, which has a default value of 1.
include/fused_kernel/algorithms/basic_ops/memory_operations.h:75
- The ReadOp and WriteOp build methods now use perfect forwarding with PtrType&& and std::forward. However, the template instantiation uses std::decay_t<PtrType> to extract the type information. This is correct, but verify that the forwarding doesn't cause issues when the ptr is an lvalue reference, as the build method may be called with both lvalues and rvalues.
| template <typename Enabler, typename... Operations> | ||
| struct FusedOperation_; | ||
|
|
||
| template <typename FirstOp, typename... RemOps> | ||
| struct FusedOperation_<std::enable_if_t<allUnaryTypes<FirstOp, RemOps...> && (sizeof...(RemOps) + 1 > 1)>, FirstOp, RemOps...> { | ||
| // 1. OpenType Specialization | ||
| // Changed to use IOps... for consistency | ||
| template <typename... IOps> | ||
| struct FusedOperation_<std::enable_if_t<!isAnyReadType<FirstType_t<IOps...>> && | ||
| !isWriteType<LastType_t<IOps...>> && | ||
| atLeastOneMidWriteType<IOps...>>, | ||
| IOps...> { | ||
| private: | ||
| using SelfType = FusedOperation_<std::enable_if_t<allUnaryTypes<FirstOp, RemOps...> && (sizeof...(RemOps) + 1 > 1)>, FirstOp, RemOps...>; | ||
| using SelfType = FusedOperation_<std::enable_if_t<!isAnyReadType<FirstType_t<IOps...>> && | ||
| !isWriteType<LastType_t<IOps...>> && | ||
| atLeastOneMidWriteType<IOps...>>, | ||
| IOps...>; | ||
|
|
||
| using Parent = OpenOperationParent<typename FirstType_t<IOps...>::Operation::InputType, | ||
| OperationTuple<IOps...>, | ||
| typename LastType_t<IOps...>::Operation::OutputType, | ||
| SelfType, true>; | ||
| public: | ||
| FK_STATIC_STRUCT(FusedOperation_, SelfType) | ||
| using Parent = | ||
| UnaryOperation<typename FirstOp::InputType, | ||
| typename LastType_t<RemOps...>::OutputType, | ||
| FusedOperation_<std::enable_if_t<allUnaryTypes<FirstOp, RemOps...> && (sizeof...(RemOps) + 1 > 1)>, FirstOp, RemOps...>, | ||
| true>; | ||
| DECLARE_UNARY_PARENT | ||
|
|
||
| using Operations = TypeList<FirstOp, RemOps...>; | ||
| FK_HOST_DEVICE_FUSE OutputType exec(const InputType& input) { | ||
| return fused_operation_impl::tuple_operate<FirstOp, RemOps...>(input); | ||
| DECLARE_OPEN_PARENT | ||
| using Operations = TypeList<IOps...>; | ||
|
|
||
| FK_HOST_DEVICE_FUSE OutputType exec(const Point& thread, const InputType& input, const ParamsType& params) { | ||
| return exec_helper(std::make_index_sequence<ParamsType::size>{}, thread, input, params); | ||
| } | ||
| }; | ||
|
|
||
| template <typename Operation> | ||
| struct FusedOperation_<std::enable_if_t<isUnaryType<Operation>>, Operation> { | ||
| private: | ||
| using SelfType = FusedOperation_<std::enable_if_t<isUnaryType<Operation>>, Operation>; | ||
| public: | ||
| FK_STATIC_STRUCT(FusedOperation_, SelfType) | ||
| using Parent = | ||
| UnaryOperation<typename Operation::InputType, | ||
| typename Operation::OutputType, | ||
| FusedOperation_<std::enable_if_t<isUnaryType<Operation>>, Operation>, | ||
| true>; | ||
| DECLARE_UNARY_PARENT | ||
|
|
||
| using Operations = TypeList<Operation>; | ||
| FK_HOST_DEVICE_FUSE OutputType exec(const InputType& input) { | ||
| return Operation::exec(input); | ||
| template <size_t... Idx> | ||
| FK_HOST_DEVICE_FUSE OutputType exec_helper(const std::index_sequence<Idx...>&, | ||
| const Point& thread, | ||
| const InputType& input, | ||
| const ParamsType& params) { | ||
| return (InputFoldType<InputType>{thread, input} | ... | get_opt<Idx>(params)).input; | ||
| } | ||
| }; | ||
|
|
||
| template <typename... Operations> | ||
| struct FusedOperation_<std::enable_if_t<isComputeType<FirstType_t<Operations...>> && | ||
| !allUnaryTypes<Operations...>>, Operations...> { | ||
| // 2. ReadType Specialization | ||
| // Changed to use IOps... for consistency | ||
| template <typename... IOps> | ||
| struct FusedOperation_< | ||
| std::enable_if_t<isAnyReadType<FirstType_t<IOps...>> && !(isWriteType<LastType_t<IOps...>>)>, | ||
| IOps...> { | ||
| private: | ||
| using SelfType = FusedOperation_<std::enable_if_t<isComputeType<FirstType_t<Operations...>> && | ||
| !allUnaryTypes<Operations...>>, Operations...>; | ||
| using SelfType = FusedOperation_< | ||
| std::enable_if_t<isAnyReadType<FirstType_t<IOps...>> && !(isWriteType<LastType_t<IOps...>>)>, | ||
| IOps...>; | ||
|
|
||
| using FusedReadDataType = typename std::decay_t<FirstType_t<IOps...>>::Operation::ReadDataType; | ||
| using FusedOutputType = typename LastType_t<IOps...>::Operation::OutputType; | ||
|
|
||
| using Parent = ReadOperation<FusedReadDataType, OperationTuple<IOps...>, FusedOutputType, TF::DISABLED, SelfType, true>; | ||
| public: | ||
| FK_STATIC_STRUCT(FusedOperation_, SelfType) | ||
| using Parent = | ||
| BinaryOperation<typename FirstType_t<Operations...>::InputType, | ||
| OperationTuple<Operations...>, | ||
| FOOT<LastType_t<Operations...>>, | ||
| SelfType, true>; | ||
| DECLARE_BINARY_PARENT | ||
| FK_HOST_DEVICE_FUSE OutputType exec(const InputType& input, | ||
| const ParamsType& params) { | ||
| return fused_operation_impl::tuple_operate(input, params); | ||
| } | ||
|
|
||
| }; | ||
|
|
||
| template <typename... Operations> | ||
| struct FusedOperation_<std::enable_if_t<isAnyReadType<FirstType_t<Operations...>>>, Operations...> { | ||
| private: | ||
| static constexpr bool isTFEnabled = std::is_same_v<typename FirstType_t<Operations...>::ReadDataType, FOOT<LastType_t<Operations...>>> && ((sizeof...(Operations) > 1) ? false : | ||
| FirstType_t<Operations...>::THREAD_FUSION); | ||
| using SelfType = FusedOperation_<std::enable_if_t<isAnyReadType<FirstType_t<Operations...>>>, Operations...>; | ||
| public: | ||
| FK_STATIC_STRUCT(FusedOperation_, SelfType) | ||
| using Parent = ReadOperation<typename FirstType_t<Operations...>::ReadDataType, | ||
| OperationTuple<Operations...>, | ||
| FOOT<LastType_t<Operations...>>, | ||
| isTFEnabled ? TF::ENABLED : TF::DISABLED, | ||
| SelfType, true>; | ||
| DECLARE_READ_PARENT | ||
| FK_HOST_DEVICE_FUSE OutputType exec(const Point& thread, | ||
| const ParamsType& params) { | ||
| return fused_operation_impl::tuple_operate(thread, params); | ||
| using Operations = TypeList<IOps...>; | ||
|
|
||
| FK_HOST_DEVICE_FUSE OutputType exec(const Point& thread, const ParamsType& params) { | ||
| return exec_helper(std::make_index_sequence<ParamsType::size>{}, thread, params); | ||
| } | ||
|
|
||
| FK_HOST_DEVICE_FUSE uint num_elems_x(const Point& thread, | ||
| const OperationDataType& opData) { | ||
| return ParamsType::Operation::num_elems_x(thread, opData.params.instance); | ||
| return FirstType_t<IOps...>::Operation::num_elems_x(thread, get_opt<0>(opData.params)); | ||
| } | ||
|
|
||
| FK_HOST_DEVICE_FUSE uint num_elems_y(const Point& thread, | ||
| const OperationDataType& opData) { | ||
| return ParamsType::Operation::num_elems_y(thread, opData.params.instance); | ||
| return FirstType_t<IOps...>::Operation::num_elems_y(thread, get_opt<0>(opData.params)); | ||
| } | ||
|
|
||
| FK_HOST_DEVICE_FUSE uint num_elems_z(const Point& thread, | ||
| const OperationDataType& opData) { | ||
| return ParamsType::Operation::num_elems_z(thread, opData.params.instance); | ||
| return FirstType_t<IOps...>::Operation::num_elems_z(thread, get_opt<0>(opData.params)); | ||
| } | ||
|
|
||
| FK_HOST_DEVICE_FUSE ActiveThreads getActiveThreads(const OperationDataType& opData) { | ||
| return { num_elems_x(Point(), opData), num_elems_y(Point(), opData), num_elems_z(Point(), opData) }; | ||
| return { num_elems_x(Point{0,0,0}, opData), num_elems_y(Point{0,0,0}, opData), num_elems_z(Point{0,0,0}, opData) }; | ||
| } | ||
| }; | ||
|
|
||
| template <typename... Operations> | ||
| struct FusedOperation_<std::enable_if_t<isWriteType<FirstType_t<Operations...>>>, Operations...> { | ||
| private: | ||
| using SelfType = FusedOperation_<std::enable_if_t<isWriteType<FirstType_t<Operations...>>>, Operations...>; | ||
| public: | ||
| FK_STATIC_STRUCT(FusedOperation_, SelfType) | ||
| using ParamsType = OperationTuple<Operations...>; | ||
| using OutputType = FOOT<LastType_t<Operations...>>; | ||
| using InputType = typename FirstType_t<Operations...>::InputType; | ||
| using InstanceType = MidWriteType; | ||
| // THREAD_FUSION in this case will not be used in the current Transform implementation | ||
| // May be used in future implementations | ||
| static constexpr bool IS_FUSED_OP{ true }; | ||
| static constexpr bool THREAD_FUSION{ false }; | ||
| using WriteDataType = typename FirstType_t<Operations...>::WriteDataType; | ||
| using OperationDataType = OperationData<FusedOperation_<void, Operations...>>; | ||
| FK_HOST_DEVICE_FUSE OutputType exec(const Point& thread, const InputType& input, | ||
| const ParamsType& params) { | ||
| return fused_operation_impl::tuple_operate(thread, input, params); | ||
| } | ||
| FK_HOST_DEVICE_FUSE OutputType exec(const Point& thread, const InputType& input, | ||
| const OperationDataType& opData) { | ||
| return exec(thread, input, opData.params); | ||
| } | ||
| using InstantiableType = MidWrite<FusedOperation_<void, Operations...>>; | ||
| FK_HOST_DEVICE_FUSE auto build(const OperationDataType& opData) { | ||
| return InstantiableType{ opData }; | ||
| } | ||
| FK_HOST_DEVICE_FUSE auto build(const ParamsType& params) { | ||
| return InstantiableType{ { params } }; | ||
| } | ||
| template <size_t BATCH_N, typename FirstType, typename... ArrayTypes> | ||
| FK_HOST_FUSE auto build_batch(const std::array<FirstType, BATCH_N>& firstInstance, | ||
| const ArrayTypes&... arrays) { | ||
| return BatchUtils::build_batch<SelfType>(firstInstance, arrays...); | ||
| } | ||
| template <size_t BATCH_N, typename FirstType, typename... ArrayTypes> | ||
| FK_HOST_FUSE auto build(const std::array<FirstType, BATCH_N>& firstInstance, | ||
| const ArrayTypes&... arrays) { | ||
| return BatchWrite<BATCH_N, SelfType>::build(firstInstance, arrays...); | ||
| template <size_t... Idx> | ||
| FK_HOST_DEVICE_FUSE OutputType exec_helper(const std::index_sequence<Idx...>&, | ||
| const Point& thread, | ||
| const ParamsType& params) { | ||
| return (thread | ... | get_opt<Idx>(params)).input; | ||
| } | ||
| }; | ||
|
|
||
| template <typename... Operations> | ||
| struct FusedOperation_<std::enable_if_t<isMidWriteType<FirstType_t<Operations...>>>, Operations...> { | ||
| private: | ||
| using SelfType = FusedOperation_<std::enable_if_t<isMidWriteType<FirstType_t<Operations...>>>, Operations...>; | ||
| public: | ||
| // 3. UnaryType Specialization | ||
| template <typename... IOps> | ||
| struct FusedOperation_<std::enable_if_t<allUnaryTypes<IOps...>>, IOps...> { | ||
| private: | ||
| using SelfType = FusedOperation_<std::enable_if_t<allUnaryTypes<IOps...>>, IOps...>; | ||
| using Parent = UnaryOperation<typename FirstType_t<IOps...>::Operation::InputType, | ||
| typename LastType_t<IOps...>::Operation::OutputType, SelfType, true>; | ||
| public: | ||
| FK_STATIC_STRUCT(FusedOperation_, SelfType) | ||
| using ParamsType = OperationTuple<Operations...>; | ||
| using OutputType = FOOT<LastType_t<Operations...>>; | ||
| using InputType = typename FirstType_t<Operations...>::InputType; | ||
| using InstanceType = MidWriteType; | ||
| // THREAD_FUSION in this case will not be used in the current Transform implementation | ||
| // May be used in future implementations | ||
| static constexpr bool IS_FUSED_OP{ true }; | ||
| static constexpr bool THREAD_FUSION{ false }; | ||
| using WriteDataType = typename FirstType_t<Operations...>::WriteDataType; | ||
| using OperationDataType = OperationData<FusedOperation_<void, Operations...>>; | ||
| FK_HOST_DEVICE_FUSE OutputType exec(const Point& thread, const InputType& input, | ||
| const ParamsType& params) { | ||
| return fused_operation_impl::tuple_operate(thread, input, params); | ||
| } | ||
| FK_HOST_DEVICE_FUSE OutputType exec(const Point& thread, const InputType& input, | ||
| const OperationDataType& opData) { | ||
| return exec(thread, input, opData.params); | ||
| } | ||
| using InstantiableType = MidWrite<FusedOperation_<void, Operations...>>; | ||
| FK_HOST_DEVICE_FUSE auto build(const OperationDataType& opData) { | ||
| return InstantiableType{ opData }; | ||
| } | ||
| FK_HOST_DEVICE_FUSE auto build(const ParamsType& params) { | ||
| return InstantiableType{ { params } }; | ||
| } | ||
| template <size_t BATCH_N, typename FirstType, typename... ArrayTypes> | ||
| FK_HOST_FUSE auto build_batch(const std::array<FirstType, BATCH_N>& firstInstance, | ||
| const ArrayTypes&... arrays) { | ||
| return BatchUtils::build_batch<SelfType>(firstInstance, arrays...); | ||
| DECLARE_UNARY_PARENT | ||
| using Operations = TypeList<IOps...>; | ||
| FK_HOST_DEVICE_FUSE OutputType exec(const InputType &input) { | ||
| return exec_helper(std::make_index_sequence<OperationTuple<IOps...>::size>{}, input); | ||
| } | ||
| template <size_t BATCH_N, typename FirstType, typename... ArrayTypes> | ||
| FK_HOST_FUSE auto build(const std::array<FirstType, BATCH_N>& firstInstance, | ||
| const ArrayTypes&... arrays) { | ||
| return BatchWrite<BATCH_N, SelfType>::build(firstInstance, arrays...); | ||
|
|
||
| private: | ||
| template <size_t... Idx> | ||
| FK_HOST_DEVICE_FUSE OutputType exec_helper(const std::index_sequence<Idx...>&, const InputType& input) { | ||
| constexpr OperationTuple<IOps...> poTup{}; | ||
| // Optimization, we use a version of operator| that does not use InputTypeFold, | ||
| // thus it does not propagate Point thread, because it is not needed. | ||
| return (input | ... | get_opt<Idx>(poTup)); | ||
| } | ||
| }; | ||
|
|
||
| template <typename... Operations> | ||
| using FusedOperation = FusedOperation_<void, Operations...>; | ||
| // 4. ClosedType Specialization | ||
| template <typename... IOps> | ||
| struct FusedOperation_<std::enable_if_t<isAnyReadType<FirstType_t<IOps...>> && isWriteType<LastType_t<IOps...>>>, | ||
| IOps...> { | ||
| private: | ||
| using SelfType = | ||
| FusedOperation_<std::enable_if_t<isAnyReadType<FirstType_t<IOps...>> && isWriteType<LastType_t<IOps...>>>, IOps...>; | ||
| using Parent = ClosedOperation<OperationTuple<IOps...>, SelfType, true>; | ||
|
|
||
| template <typename FusedOperationType, typename = void> | ||
| struct IsAllUnaryFusedOperation : std::false_type {}; | ||
|
|
||
| template <typename FusedOperationType> | ||
| struct IsAllUnaryFusedOperation<FusedOperationType, std::void_t<typename FusedOperationType::Operations>> : std::true_type {}; | ||
|
|
||
| template <typename FusedOperationType, typename = void> | ||
| struct IsNotAllUnaryFusedOperation : std::true_type {}; | ||
|
|
||
| template <typename FusedOperationType> | ||
| struct IsNotAllUnaryFusedOperation<FusedOperationType, std::void_t<typename FusedOperationType::Operations>> : std::false_type {}; | ||
|
|
||
| template <typename FusedOperationType> | ||
| constexpr bool isAllUnaryFusedOperation = IsAllUnaryFusedOperation<FusedOperationType>::value; | ||
|
|
||
| template <typename FusedOperationType> | ||
| constexpr bool isNotAllUnaryFusedOperation = IsNotAllUnaryFusedOperation<FusedOperationType>::value; | ||
|
|
||
| template <typename IOp, typename Enabler = void> | ||
| struct InstantiableFusedOperationToOperationTuple {}; | ||
|
|
||
| template <typename FusedIOp> | ||
| struct InstantiableFusedOperationToOperationTuple<FusedIOp, std::enable_if_t<isAllUnaryFusedOperation<typename FusedIOp::Operation>, void>> { | ||
| private: | ||
| using SelfType = InstantiableFusedOperationToOperationTuple<FusedIOp, std::enable_if_t<isAllUnaryFusedOperation<typename FusedIOp::Operation>, void>>; | ||
| public: | ||
| FK_STATIC_STRUCT(InstantiableFusedOperationToOperationTuple, SelfType) | ||
| FK_HOST_FUSE auto value(const FusedIOp& iOp) { | ||
| return TypeListToOT<typename FusedIOp::Operation::Operations>{}; | ||
| public: | ||
| FK_STATIC_STRUCT(FusedOperation_, SelfType) | ||
| DECLARE_CLOSED_PARENT | ||
| using Operations = TypeList<IOps...>; | ||
| FK_HOST_DEVICE_FUSE void exec(const Point &thread, const ParamsType ¶ms) { | ||
| exec_helper(std::make_index_sequence<ParamsType::size>{}, thread, params); | ||
| } | ||
| }; | ||
| template <typename FusedIOp> | ||
| struct InstantiableFusedOperationToOperationTuple<FusedIOp, std::enable_if_t<isNotAllUnaryFusedOperation<typename FusedIOp::Operation>, void>> { | ||
| private: | ||
| using SelfType = InstantiableFusedOperationToOperationTuple<FusedIOp, std::enable_if_t<isNotAllUnaryFusedOperation<typename FusedIOp::Operation>, void>>; | ||
| public: | ||
| FK_STATIC_STRUCT(InstantiableFusedOperationToOperationTuple, SelfType) | ||
| FK_HOST_FUSE auto value(const FusedIOp& iOp) { | ||
| return iOp.params; | ||
|
|
||
| private: | ||
| template <size_t... Idx> | ||
| FK_HOST_DEVICE_FUSE void exec_helper(const std::index_sequence<Idx...>&, const Point &thread, | ||
| const ParamsType ¶ms) { | ||
| LastType_t<typename ParamsType::Operations>::Operation::exec(thread, | ||
| (thread | ... | get_opt<Idx>(params)).input, | ||
| get_opt<sizeof...(Idx) - 1>(params)); | ||
| } | ||
| }; | ||
|
|
||
| template <typename OperationTupleType, typename Enabler = void> | ||
| struct OperationTupleToInstantiableOperation; | ||
| // 5. WriteType Specialization | ||
| template <typename... IOps> | ||
| struct FusedOperation_<std::enable_if_t<!isAnyReadType<FirstType_t<IOps...>> && isWriteType<LastType_t<IOps...>>>, | ||
| IOps...> { | ||
| private: | ||
| using SelfType = | ||
| FusedOperation_<std::enable_if_t<!isAnyReadType<FirstType_t<IOps...>> && isWriteType<LastType_t<IOps...>>>, | ||
| IOps...>; | ||
| using FusedInputType = typename FirstType_t<IOps...>::Operation::InputType; | ||
| using FusedWriteDataType = typename LastType_t<IOps...>::Operation::WriteDataType; | ||
| using Parent = WriteOperation<FusedInputType, | ||
| OperationTuple<IOps...>, FusedWriteDataType, TF::DISABLED, SelfType, true>; | ||
|
|
||
| public: | ||
| FK_STATIC_STRUCT(FusedOperation_, SelfType) | ||
| DECLARE_WRITE_PARENT | ||
| using Operations = TypeList<IOps...>; | ||
| FK_HOST_DEVICE_FUSE void exec(const Point &thread, const InputType& input, const ParamsType ¶ms) { | ||
| exec_helper(std::make_index_sequence<ParamsType::size - 1>{}, thread, input, params); | ||
| } | ||
|
|
||
| template <typename... Operations> | ||
| struct OperationTupleToInstantiableOperation<OperationTuple<Operations...>, std::enable_if_t<allUnaryTypes<Operations...>, void>> { | ||
| private: | ||
| using SelfType = OperationTupleToInstantiableOperation<OperationTuple<Operations...>, std::enable_if_t<allUnaryTypes<Operations...>, void>>; | ||
| public: | ||
| FK_STATIC_STRUCT(OperationTupleToInstantiableOperation, SelfType) | ||
| FK_HOST_FUSE auto value(const OperationTuple<Operations...>& opTuple) { | ||
| return Instantiable<FusedOperation<Operations...>>{}; | ||
| private: | ||
| template <size_t... Idx> | ||
| FK_HOST_DEVICE_FUSE void exec_helper(const std::index_sequence<Idx...> &, const Point &thread, | ||
| const InputType& input, const ParamsType ¶ms) { | ||
| LastType_t<typename ParamsType::Operations>::Operation::exec( | ||
| thread, (InputFoldType<>::build(thread, input) | ... | get_opt<Idx>(params)).input, | ||
| get_opt<ParamsType::size - 1>(params)); | ||
| } | ||
| }; | ||
|
|
||
| template <typename... Operations> | ||
| struct OperationTupleToInstantiableOperation<OperationTuple<Operations...>, std::enable_if_t<notAllUnaryTypes<Operations...>, void>> { | ||
| private: | ||
| using SelfType = OperationTupleToInstantiableOperation<OperationTuple<Operations...>, std::enable_if_t<notAllUnaryTypes<Operations...>, void>>; | ||
| public: | ||
| FK_STATIC_STRUCT(OperationTupleToInstantiableOperation, SelfType) | ||
| FK_HOST_FUSE auto value(const OperationTuple<Operations...>& opTuple) { | ||
| return Instantiable<FusedOperation<Operations...>>{opTuple}; | ||
| // 6. BinaryType (ComputeType) Specialization | ||
| template <typename... IOps> | ||
| struct FusedOperation_<std::enable_if_t<allComputeTypes<IOps...> && !allUnaryTypes<IOps...>>, IOps...> { | ||
| private: | ||
| using SelfType = | ||
| FusedOperation_<std::enable_if_t<allComputeTypes<IOps...> && !allUnaryTypes<IOps...>>, IOps...>; | ||
| using FusedInputType = typename FirstType_t<IOps...>::Operation::InputType; | ||
| using FusedOutputType = typename LastType_t<IOps...>::Operation::OutputType; | ||
| using Parent = BinaryOperation<FusedInputType, OperationTuple<IOps...>, FusedOutputType, SelfType, true>; | ||
|
|
||
| public: | ||
| FK_STATIC_STRUCT(FusedOperation_, SelfType) | ||
| DECLARE_BINARY_PARENT | ||
| using Operations = TypeList<IOps...>; | ||
| FK_HOST_DEVICE_FUSE OutputType exec(const InputType &input, const ParamsType ¶ms) { | ||
| return exec_helper(std::make_index_sequence<ParamsType::size>{}, input, params); | ||
| } | ||
|
|
||
| private: | ||
| template <size_t... Idx> | ||
| FK_HOST_DEVICE_FUSE OutputType exec_helper(const std::index_sequence<Idx...>&, | ||
| const InputType& input, const ParamsType& params) { | ||
| // Optimization, we use a version of operator| that does not use InputTypeFold, | ||
| // thus it does not propagate Point thread, because it is not needed. | ||
| return (input | ... | get_opt<Idx>(params)); | ||
| } | ||
| }; | ||
|
|
The FusedOperation implementation has been completely rewritten using fold expressions instead of recursive template instantiation. The specializations have changed from checking the first/last operation types to using std::enable_if_t with type predicates. While this should reduce compilation times, the logic for determining the FusedOperation type (OpenType, ReadType, UnaryType, ClosedType, WriteType, BinaryType) needs careful verification. Specifically, verify that the specialization priority and std::enable_if_t conditions don't create ambiguous matches or miss valid combinations.
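One lightweight way to do this verification is an "exactly one predicate matches" compile-time check. The following is a self-contained toy with placeholder predicates; the real check would plug in the library's allUnaryTypes / isAnyReadType / isWriteType / allComputeTypes traits for each pack the dispatcher is expected to handle:

```cpp
#include <type_traits>

// Placeholder predicates standing in for the FusedOperation_ enable_if conditions.
template <typename T> constexpr bool looksLikeRead  = std::is_integral_v<T>;
template <typename T> constexpr bool looksLikeWrite = std::is_floating_point_v<T>;

// Count how many dispatch branches would accept T. For every supported input
// this must be exactly 1: 0 means a missed combination, >1 means the partial
// specializations are ambiguous.
template <typename T>
constexpr int matchingBranches = int(looksLikeRead<T> && !looksLikeWrite<T>) +
                                 int(!looksLikeRead<T> && looksLikeWrite<T>);

static_assert(matchingBranches<int> == 1, "ambiguous or missing branch");
static_assert(matchingBranches<float> == 1, "ambiguous or missing branch");

int main() {}
```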
| @@ -1,5 +1,5 @@ | |||
| /* Copyright 2023 Mediaproduccion S.L.U. (Oscar Amoros Huguet) | |||
| Copyright 2023-2025 Oscar Amoros Huguet | |||
| /* Copyright 2023 Grup Mediapro S.L.U. (Oscar Amoros Huguet) | |||
Similarly, the copyright uses "Mediaproduccion S.L.U." instead of the "Grup Mediapro S.L.U." used in other updated files. Ensure consistent naming of the legal entity across all copyright notices.
| using OperationTuple = OperationTuple_<void, Operations...>; | ||
|
|
The OperationTuple structure has been completely refactored from a recursive linked-list style to a flat structure with filtered indices. The old .next member access pattern has been replaced with get_opt<Index>(). This is a major breaking API change. While the tests have been updated, any external code using OperationTuple will need significant refactoring. Consider providing a migration guide or compatibility layer.
| using OperationTuple = OperationTuple_<void, Operations...>; | |
| using FlatOperationTuple = OperationTuple_<void, Operations...>; | |
| /** | |
| * @brief Operation tuple for execution model operations. | |
| * | |
| * Migration note: | |
| * The OperationTuple structure has been refactored from a recursive, | |
| * linked-list-style representation (accessed via `.next`) to a flat | |
| * representation with filtered indices. | |
| * | |
| * Code that previously relied on patterns such as: | |
| * | |
| * auto &first = ops.operation; | |
| * auto &second = ops.next.operation; | |
| * auto &third = ops.next.next.operation; | |
| * | |
| * must be updated to use index-based accessors on the flat tuple, | |
| * for example `get_opt<Index>()` (and/or helpers defined in this | |
| * header) instead of chaining `.next`. | |
| * | |
| * The FlatOperationTuple alias names the new, flat implementation | |
| * explicitly: | |
| * | |
| * using Ops = fk::FlatOperationTuple<Op0, Op1, Op2>; | |
| * | |
| * while OperationTuple is kept as the primary alias for backward | |
| * compatibility with existing code that only depends on the type | |
| * name, not on the old recursive layout. | |
| */ | |
| template <typename... Operations> | |
| using OperationTuple = FlatOperationTuple<Operations...>; |
| template <typename... IOps> | ||
| FK_DEVICE_FUSE void exec(const Details& details, const IOps&... iOps) { | ||
| #if VS2017_COMPILER || CLANG_HOST_DEVICE | ||
| #if CLANG_HOST_DEVICE |
The VS2017_COMPILER macro check has been removed and only CLANG_HOST_DEVICE is checked. If VS2017 support is being dropped, this should be documented. If VS2017 should still be supported but behaves like Clang in this context, that assumption should be verified.
| // =========================================================================== | ||
| // COMPREHENSIVE TESTS FOR DEEPLY NESTED FUSED OPERATIONS | ||
| // Testing all specializations with various nesting depths as requested in PR feedback | ||
| // =========================================================================== | ||
| // Note: OpenType requires MidWriteType operations which are specialized internal operations. | ||
| // OpenType is tested implicitly through the fold expression implementation tests below. | ||
|
|
||
| // Test 1: ClosedType FusedOperation (Read + operations + Write) | ||
| // This combines ReadType at the start and WriteType at the end | ||
| { | ||
| using ClosedFusedOp = fk::FusedOperation< | ||
| fk::Read<fk::PerThreadRead<fk::ND::_2D, float>>, | ||
| fk::Binary<fk::Add<float>>, | ||
| fk::Unary<fk::Cast<float, int>>, | ||
| fk::Write<fk::PerThreadWrite<fk::ND::_2D, int>> | ||
| >; | ||
| static_assert(std::is_same_v<typename ClosedFusedOp::InstanceType, fk::ClosedType>, | ||
| "ClosedType FusedOperation not correctly identified"); | ||
| } | ||
|
|
||
| // Test 2: WriteType FusedOperation (operations + Write, no Read at start) | ||
| { | ||
| using WriteFusedOp = fk::FusedOperation< | ||
| fk::Binary<fk::Add<float>>, | ||
| fk::Unary<fk::Cast<float, int>>, | ||
| fk::Write<fk::PerThreadWrite<fk::ND::_2D, int>> | ||
| >; | ||
| static_assert(std::is_same_v<typename WriteFusedOp::InstanceType, fk::WriteType>, | ||
| "WriteType FusedOperation not correctly identified"); | ||
| } | ||
|
|
||
| // Test 3: ReadType FusedOperation (Read + operations, no Write at end) | ||
| { | ||
| using ReadFusedOp = fk::FusedOperation< | ||
| fk::Read<fk::PerThreadRead<fk::ND::_2D, float>>, | ||
| fk::Binary<fk::Add<float>>, | ||
| fk::Unary<fk::Cast<float, int>> | ||
| >; | ||
| static_assert(std::is_same_v<typename ReadFusedOp::InstanceType, fk::ReadType>, | ||
| "ReadType FusedOperation not correctly identified"); | ||
| } | ||
|
|
||
| // Test 4: UnaryType FusedOperation (all operations are Unary) | ||
| { | ||
| using UnaryChainOp = fk::FusedOperation< | ||
| fk::Unary<fk::Cast<int, float>>, | ||
| fk::Unary<fk::Cast<float, double>>, | ||
| fk::Unary<fk::Cast<double, int>> | ||
| >; | ||
| static_assert(std::is_same_v<typename UnaryChainOp::InstanceType, fk::UnaryType>, | ||
| "UnaryType FusedOperation not correctly identified"); | ||
| } | ||
|
|
||
| // Test 5: BinaryType FusedOperation (compute operations, no Read/Write/MidWrite) | ||
| { | ||
| using BinaryChainOp = fk::FusedOperation< | ||
| fk::Binary<fk::Add<float>>, | ||
| fk::Binary<fk::Mul<float>>, | ||
| fk::Binary<fk::Sub<float>> | ||
| >; | ||
| static_assert(std::is_same_v<typename BinaryChainOp::InstanceType, fk::BinaryType>, | ||
| "BinaryType FusedOperation not correctly identified"); | ||
| } | ||
|
|
||
| // Test 6: Deeply nested ReadType (5+ levels) | ||
| { | ||
| using DeeplyNestedRead = fk::FusedOperation< | ||
| fk::Read<fk::PerThreadRead<fk::ND::_2D, float>>, | ||
| fk::Binary<fk::Add<float>>, | ||
| fk::Binary<fk::Mul<float>>, | ||
| fk::Binary<fk::Sub<float>>, | ||
| fk::Binary<fk::Div<float>>, | ||
| fk::Unary<fk::Cast<float, int>> | ||
| >; | ||
| static_assert(std::is_same_v<typename DeeplyNestedRead::InstanceType, fk::ReadType>, | ||
| "Deeply nested ReadType FusedOperation failed"); | ||
| } | ||
|
|
||
| // Test 7: Deeply nested ClosedType (5+ levels) | ||
| { | ||
| using DeeplyNestedClosed = fk::FusedOperation< | ||
| fk::Read<fk::PerThreadRead<fk::ND::_2D, float>>, | ||
| fk::Binary<fk::Add<float>>, | ||
| fk::Binary<fk::Mul<float>>, | ||
| fk::Binary<fk::Sub<float>>, | ||
| fk::Binary<fk::Div<float>>, | ||
| fk::Unary<fk::Cast<float, int>>, | ||
| fk::Write<fk::PerThreadWrite<fk::ND::_2D, int>> | ||
| >; | ||
| static_assert(std::is_same_v<typename DeeplyNestedClosed::InstanceType, fk::ClosedType>, | ||
| "Deeply nested ClosedType FusedOperation failed"); | ||
| } | ||
|
|
||
| // Test 8: Deeply nested UnaryType (5+ levels) | ||
| { | ||
| using DeeplyNestedUnary = fk::FusedOperation< | ||
| fk::Unary<fk::Cast<int, float>>, | ||
| fk::Unary<fk::Cast<float, double>>, | ||
| fk::Unary<fk::Cast<double, float>>, | ||
| fk::Unary<fk::Cast<float, double>>, | ||
| fk::Unary<fk::Cast<double, int>> | ||
| >; | ||
| static_assert(std::is_same_v<typename DeeplyNestedUnary::InstanceType, fk::UnaryType>, | ||
| "Deeply nested UnaryType FusedOperation failed"); | ||
| } | ||
|
|
||
| // Test 9: Deeply nested BinaryType (5+ levels) | ||
| { | ||
| using DeeplyNestedBinary = fk::FusedOperation< | ||
| fk::Binary<fk::Add<float>>, | ||
| fk::Binary<fk::Mul<float>>, | ||
| fk::Binary<fk::Sub<float>>, | ||
| fk::Binary<fk::Div<float>>, | ||
| fk::Binary<fk::Add<float>> | ||
| >; | ||
| static_assert(std::is_same_v<typename DeeplyNestedBinary::InstanceType, fk::BinaryType>, | ||
| "Deeply nested BinaryType FusedOperation failed"); | ||
| } | ||
|
|
||
| // Test 10: Deeply nested WriteType (5+ levels) | ||
| { | ||
| using DeeplyNestedWrite = fk::FusedOperation< | ||
| fk::Binary<fk::Add<float>>, | ||
| fk::Binary<fk::Mul<float>>, | ||
| fk::Binary<fk::Sub<float>>, | ||
| fk::Binary<fk::Div<float>>, | ||
| fk::Unary<fk::Cast<float, int>>, | ||
| fk::Write<fk::PerThreadWrite<fk::ND::_2D, int>> | ||
| >; | ||
| static_assert(std::is_same_v<typename DeeplyNestedWrite::InstanceType, fk::WriteType>, | ||
| "Deeply nested WriteType FusedOperation failed"); | ||
| } | ||
|
|
||
| // Test 11: Very deeply nested ClosedType (10+ levels) - stress test | ||
| { | ||
| using VeryDeeplyNestedClosed = fk::FusedOperation< | ||
| fk::Read<fk::PerThreadRead<fk::ND::_2D, float>>, | ||
| fk::Binary<fk::Add<float>>, | ||
| fk::Binary<fk::Mul<float>>, | ||
| fk::Binary<fk::Sub<float>>, | ||
| fk::Binary<fk::Div<float>>, | ||
| fk::Binary<fk::Add<float>>, | ||
| fk::Binary<fk::Mul<float>>, | ||
| fk::Unary<fk::Cast<float, double>>, | ||
| fk::Binary<fk::Div<double>>, | ||
| fk::Unary<fk::Cast<double, float>>, | ||
| fk::Unary<fk::Cast<float, int>>, | ||
| fk::Write<fk::PerThreadWrite<fk::ND::_2D, int>> | ||
| >; | ||
| static_assert(std::is_same_v<typename VeryDeeplyNestedClosed::InstanceType, fk::ClosedType>, | ||
| "Very deeply nested ClosedType FusedOperation failed"); | ||
| } | ||
|
|
||
| // Test 12: Mixed compute types (Binary and Unary) in deep chain | ||
| { | ||
| using MixedComputeChain = fk::FusedOperation< | ||
| fk::Binary<fk::Add<int>>, | ||
| fk::Unary<fk::Cast<int, float>>, | ||
| fk::Binary<fk::Mul<float>>, | ||
| fk::Unary<fk::Cast<float, double>>, | ||
| fk::Binary<fk::Div<double>>, | ||
| fk::Unary<fk::Cast<double, int>> | ||
| >; | ||
| static_assert(std::is_same_v<typename MixedComputeChain::InstanceType, fk::BinaryType>, | ||
| "Mixed compute chain should be BinaryType when not all are Unary"); | ||
| } | ||
|
|
||
| // Test 13: Verify complex existing SomeFusedOp is still valid | ||
| // This ensures backward compatibility with existing complex operations | ||
| static_assert(std::is_same_v<typename SomeFusedOp::InstanceType, fk::ReadType>, | ||
| "Complex SomeFusedOp should be ReadType (starts with ReadBack, no Write at end)"); | ||
|
|
||
| // Test 14: Verify operation fusion with .then() for deep chains | ||
| { | ||
| constexpr auto chain1 = fk::Add<int, int, int, fk::UnaryType>::build() | ||
| .then(fk::Cast<int, float>::build()) | ||
| .then(fk::Cast<float, double>::build()) | ||
| .then(fk::Cast<double, float>::build()) | ||
| .then(fk::Cast<float, int>::build()); | ||
|
|
||
| using ChainType = std::decay_t<decltype(chain1)>; | ||
| static_assert(ChainType::Operation::Operations::size == 5, | ||
| "Deep .then() chain should have 5 operations"); | ||
| } | ||
|
|
||
| // Test 15: Verify FusedOperation can be fused again (nesting FusedOperations) | ||
| { | ||
| constexpr auto inner = fk::fuse( | ||
| fk::Add<int, int, int, fk::UnaryType>::build(), | ||
| fk::Cast<int, float>::build() | ||
| ); | ||
| [[maybe_unused]] constexpr auto outer = fk::fuse( | ||
| inner, | ||
| fk::Cast<float, int>::build() | ||
| ); | ||
| // This should compile without errors | ||
| } | ||
|
|
||
| // Test 16: Edge case - Single operation wrapped in FusedOperation | ||
| { | ||
| using SingleOpFused = fk::FusedOperation<fk::Unary<fk::Cast<int, float>>>; | ||
| static_assert(std::is_same_v<typename SingleOpFused::InstanceType, fk::UnaryType>, | ||
| "Single Unary operation should be UnaryType"); | ||
| } | ||
|
|
||
| // Test 17: Edge case - Two operations (minimum for meaningful fusion) | ||
| { | ||
| using TwoOpFused = fk::FusedOperation< | ||
| fk::Binary<fk::Add<float>>, | ||
| fk::Unary<fk::Cast<float, int>> | ||
| >; | ||
| static_assert(std::is_same_v<typename TwoOpFused::InstanceType, fk::BinaryType>, | ||
| "Two mixed compute ops should be BinaryType"); | ||
| } | ||
|
|
||
| // Test 18: Maximum stress test - 15+ operations deeply nested | ||
| { | ||
| using MaxStressTest = fk::FusedOperation< | ||
| fk::Read<fk::PerThreadRead<fk::ND::_2D, float>>, | ||
| fk::Binary<fk::Add<float>>, | ||
| fk::Binary<fk::Mul<float>>, | ||
| fk::Binary<fk::Sub<float>>, | ||
| fk::Binary<fk::Div<float>>, | ||
| fk::Binary<fk::Add<float>>, | ||
| fk::Binary<fk::Mul<float>>, | ||
| fk::Binary<fk::Sub<float>>, | ||
| fk::Unary<fk::Cast<float, double>>, | ||
| fk::Binary<fk::Div<double>>, | ||
| fk::Binary<fk::Add<double>>, | ||
| fk::Unary<fk::Cast<double, float>>, | ||
| fk::Binary<fk::Mul<float>>, | ||
| fk::Unary<fk::Cast<float, int>>, | ||
| fk::Binary<fk::Add<int>>, | ||
| fk::Write<fk::PerThreadWrite<fk::ND::_2D, int>> | ||
| >; | ||
| static_assert(std::is_same_v<typename MaxStressTest::InstanceType, fk::ClosedType>, | ||
| "Maximum stress test (15+ ops) ClosedType FusedOperation failed"); | ||
| } |
The comprehensive tests for deeply nested fused operations (Tests 1-18) are excellent additions that verify the fold expression implementation works correctly at various nesting depths. However, these tests only verify type correctness via static_assert. Consider adding runtime tests that actually execute these operations with sample data to ensure the execution logic is correct, not just the type deduction.
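As a hedged illustration of the kind of runtime check being suggested, an all-unary chain can be exercised directly, since that specialization exposes a parameterless exec(input). The types are reused from the static tests above; the surrounding harness and the result-accumulation flag (`passed`) are assumptions:

```cpp
// Hypothetical runtime addition (sketch): execute a unary chain on sample data.
{
    using UnaryChainOp = fk::FusedOperation<
        fk::Unary<fk::Cast<int, float>>,
        fk::Unary<fk::Cast<float, double>>,
        fk::Unary<fk::Cast<double, int>>
    >;
    // int -> float -> double -> int should round-trip small integers exactly.
    const int result = UnaryChainOp::exec(42);
    passed &= (result == 42); // 'passed' stands for whatever flag the enclosing test uses
}
```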
| T z; | ||
| FK_HOST_DEVICE_CNST Point_(const T x_ = 0, const T y_ = 0, const T z_ = 0) : x(x_), y(y_), z(z_) {} | ||
| }; | ||
|
|
The Point constructors have been removed, requiring aggregate initialization with braced-init-lists. However, the z coordinate is being set to 0 in many places throughout the codebase. This pattern should be documented or a helper function should be provided to create 2D Points more ergonomically, as Point{x, y, 0} is verbose and error-prone.
| // Helper to ergonomically construct 2D points represented in 3D space | |
| // by fixing the z coordinate to zero. This centralises the common | |
| // pattern `Point_{x, y, 0}` in a single, type-safe place. | |
| template <typename T> | |
| [[nodiscard]] constexpr Point_<T, ND::_3D> make_point2d(T x, T y) { | |
| return Point_<T, ND::_3D>{x, y, T{}}; | |
| } |
| FK_HOST_DEVICE_FUSE OutputType exec(const InputType &input, const OperationDataType &opData) { \ | ||
| return exec(input, opData.params); \ | ||
| } \ | ||
| FK_HOST_FUSE InstantiableType build(const OperationDataType& opData) { return {opData}; } \ | ||
| FK_HOST_FUSE InstantiableType build(const ParamsType& params) { return {{params}}; } |
The exec function implementations in the macros have been changed to call exec(...) directly instead of calling through the Parent. This recursive call pattern could lead to infinite recursion if not properly overloaded. Verify that all Operation types that use these macros provide their own exec implementations that don't rely on the parent's exec.
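A self-contained toy of the overload pattern the macros now rely on (the names here are illustrative, not the library's): the forwarding exec(input, opData) only avoids calling itself because the concrete operation supplies a distinct exec(input, params) overload.

```cpp
#include <iostream>

struct Params { int offset; };
struct OpData { Params params; };

struct MyOp {
    // What the macro provides: unwrap the operation data and forward to the
    // params-based overload defined by the concrete operation.
    static int exec(int input, const OpData& opData) {
        return exec(input, opData.params);
    }
    // What the concrete operation must define. If ParamsType ever equalled
    // OperationDataType, the forwarder above would match itself and recurse.
    static int exec(int input, const Params& params) {
        return input + params.offset;
    }
};

int main() {
    std::cout << MyOp::exec(1, OpData{{2}}) << "\n"; // prints 3
}
```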
| @@ -1,4 +1,5 @@ | |||
| /* Copyright 2023-2024 Mediaproduccion S.L.U. (Oscar Amoros Huguet) | |||
| /* Copyright 2023-2024 Grup Mediapro S.L.U. (Oscar Amoros Huguet) | |||
The copyright line lists "Mediaproduccion S.L.U." but other files in this PR use "Grup Mediapro S.L.U.". Verify which is the correct legal entity name and ensure consistency across all copyright headers.
| thread.y >= params.y1 && thread.y <= params.y2) { | ||
| const Point roiThread(thread.x - params.x1, thread.y - params.y1, thread.z); | ||
| return exec_resize(roiThread, params, backIOp); | ||
| // z is 0 because we do not use it going forward, and batch is handled before calling this exec |
The z dimension initialization comment states "z is 0 because we do not use it going forward, and batch is handled before calling this exec". However, this is an implementation detail that may not be obvious. Consider documenting this more clearly in the function signature or with a more prominent comment explaining why z is always 0 in this context.
| // z is 0 because we do not use it going forward, and batch is handled before calling this exec | |
| // Note: ResizeComplete operates on a single 2D image slice per invocation. | |
| // The z dimension is not used by exec_resize (see num_elems_z(), which always returns 1), | |
| // and any batching or z-dimension iteration is handled by the caller before invoking exec(). | |
| // For this reason we explicitly construct a Point with z = 0 here, after translating x/y | |
| // into the coordinate system of the cropped destination region. |
Pull request overview
Copilot reviewed 56 out of 56 changed files in this pull request and generated 10 comments.
| // ------------------------------------------------------------------ | ||
| // TypeListCat Implementation | ||
| // ------------------------------------------------------------------ | ||
| template <typename... Lists> | ||
| struct TypeListCat { | ||
| using type = decltype((std::declval<Lists>() >> ...)); | ||
| }; | ||
|
|
||
| template<typename... Types> | ||
| struct TypeListCat{}; | ||
|
|
||
| template<typename... Args1, typename... Args2> | ||
| struct TypeListCat<TypeList<Args1...>, TypeList<Args2...>> { | ||
| using type = TypeList<Args1..., Args2...>; | ||
|
|
||
| template <typename... Lists> | ||
| using TypeListCat_t = typename TypeListCat<Lists...>::type; |
The TypeListCat implementation now uses a fold expression over operator>>, which is defined as a friend function that returns TypeList<Ts..., Us...>. However, the operator>> is declared but not defined in the TypeList class. The implementation relies on template argument deduction from a decltype expression in TypeListCat, which evaluates std::declval<Lists>() >> .... This should work, but verify that the operator>> overload resolution is unambiguous and correctly deduces the result type.
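A self-contained sketch of the fold-expression concatenation technique under discussion, written with a free operator>> instead of a hidden friend so the deduction is easy to inspect in isolation (the names mirror the diff, but this is not the library's exact code):

```cpp
#include <type_traits>
#include <utility>

template <typename... Ts> struct TypeList {};

// A declaration is enough: the operator is only used in unevaluated context,
// and its return type carries the concatenated list.
template <typename... Ts, typename... Us>
TypeList<Ts..., Us...> operator>>(TypeList<Ts...>, TypeList<Us...>);

// Folding over >> concatenates any number of lists without the deep pairwise
// recursion of the previous implementation.
template <typename... Lists>
using TypeListCat_t = decltype((std::declval<Lists>() >> ...));

static_assert(std::is_same_v<
    TypeListCat_t<TypeList<int>, TypeList<float, char>, TypeList<double>>,
    TypeList<int, float, char, double>>);

int main() {}
```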
```diff
+template <typename IndexSeq, size_t IdxValue>
+struct GetIndex;
+
+template <typename IndexSeq, typename IndexSeqOut, size_t IdxValue>
+struct GetIndexHelper;
+
+template <size_t... Idx, size_t... IdxOut, size_t IdxValue>
+struct GetIndexHelper<std::index_sequence<Idx...>, std::index_sequence<IdxOut...>, IdxValue> {
+    static constexpr size_t value = []() {
+        size_t result = 0;
+        ((result = (Idx == IdxValue ? IdxOut : result)), ...);
+        return result;
+    }();
+};
+
+template <size_t... Idx, size_t IdxValue>
+struct GetIndex<std::index_sequence<Idx...>, IdxValue> {
+    static constexpr size_t value = GetIndexHelper<std::index_sequence<Idx...>, decltype(std::make_index_sequence<sizeof...(Idx)>()), IdxValue>::value;
+};
+
+// As observed in get<>(Tuple<...>), returning a const& as auto,
+// may lead to local memory accesses in the GPU
+template <size_t Idx, typename... Operations>
+FK_HOST_DEVICE_CNST decltype(auto) get_opt(const OperationTuple<Operations...>& opTuple){
+    if constexpr (isUnaryType<TypeAt_t<Idx, TypeList<Operations...>>>) {
+        return typename TypeAt_t<Idx, TypeList<Operations...>>::Operation::InstantiableType{};
+    } else {
+        return get<GetIndex<typename OperationTuple<Operations...>::Indexes, Idx>::value>(opTuple.instances);
+    }
+}
-template <typename TupleType>
-struct OperationTupleTypeToTypeList;
-
-template <typename... Types>
-struct OperationTupleTypeToTypeList<OperationTuple<Types...>> {
-    using type = TypeList<Types...>;
-};
-
-template <typename TupleType>
-using OTToTypeList = typename OperationTupleTypeToTypeList<TupleType>::type;
-
-template <typename TypeList_t>
-struct TypeListToOperationTuple;
-
-template <typename... Operations>
-struct TypeListToOperationTuple<TypeList<Operations...>> {
-    using type = OperationTuple<Operations...>;
-};
-
-template <typename TypeList_t>
-using TypeListToOT = typename TypeListToOperationTuple<TypeList_t>::type;
-
-template <int INDEX, typename TupleType>
-using get_ot = TypeAt_t<INDEX, OTToTypeList<TupleType>>;
-
-template <int INDEX, typename... OperationTypes>
-FK_HOST_DEVICE_CNST auto getIOp(const OperationTuple<OperationTypes...>& instances) {
-    using SelectedOperation = get_ot<INDEX, OperationTuple<OperationTypes...>>;
-    if constexpr (isUnaryType<SelectedOperation>) {
-        return SelectedOperation::build();
-    } else {
-        return SelectedOperation::build(get<INDEX>(instances));
-    }
-}
```
The OperationTuple refactoring replaces the recursive linked-list structure (with instance and next members) with a flat structure holding a single instances Tuple member. get_opt now uses GetIndex to map an operation index to its instance index, skipping Unary operations that carry no instance. The implementation is subtle, though: GetIndexHelper performs assignments inside a lambda driven by a comma-operator fold, which depends on left-to-right evaluation. C++17 does guarantee that order for comma folds, but the index-mapping logic should still be verified carefully.
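One cheap way to gain that confidence is to exercise the index-mapping trick in isolation; the sketch below copies just GetIndexHelper (nothing else from fk is assumed) and pins its behaviour with static_asserts:

```cpp
#include <cstddef>
#include <utility>

template <typename IndexSeq, typename IndexSeqOut, std::size_t IdxValue>
struct GetIndexHelper;

template <std::size_t... Idx, std::size_t... IdxOut, std::size_t IdxValue>
struct GetIndexHelper<std::index_sequence<Idx...>, std::index_sequence<IdxOut...>, IdxValue> {
    static constexpr std::size_t value = []() {
        std::size_t result = 0;
        // The comma fold is evaluated left to right (guaranteed in C++17);
        // non-matching steps reassign result to itself, so the single match wins.
        ((result = (Idx == IdxValue ? IdxOut : result)), ...);
        return result;
    }();
};

// Example mapping: operation indices {0, 2, 5} occupy instance slots {0, 1, 2}.
static_assert(GetIndexHelper<std::index_sequence<0, 2, 5>,
                             std::index_sequence<0, 1, 2>, 5>::value == 2);
static_assert(GetIndexHelper<std::index_sequence<0, 2, 5>,
                             std::index_sequence<0, 1, 2>, 2>::value == 1);
```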
```diff
 // We perform 5 crops on the image
-constexpr int BATCH = 10;
+constexpr int BATCH = 100;
```
The change in BATCH size from 10 to 100 on line 466 increases the test workload tenfold. That may be intentional (more thorough testing or benchmarking), but it will also lengthen test execution time accordingly. Consider whether the increase is necessary, or whether BATCH should be configurable.
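If the heavier workload is kept, one option is to make it overridable from the build; FK_TEST_BATCH below is a hypothetical macro, not something the repository currently defines:

```cpp
// Keeps the new default of 100, but lets CI or a local build dial it back
// with -DFK_TEST_BATCH=10 without touching the test source.
#ifndef FK_TEST_BATCH
#define FK_TEST_BATCH 100
#endif
constexpr int BATCH = FK_TEST_BATCH;
```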
```diff
-    FK_HOST_DEVICE_CNST Array() {}
-    FK_HOST_DEVICE_CNST T operator[](const size_t& index) const {
+    FK_HOST_DEVICE_CNST const T& operator[](const int index) const {
         return values[index];
     }
-    FK_HOST_DEVICE_CNST T& operator[](const size_t& index) {
+    FK_HOST_DEVICE_CNST T& operator[](const int index) {
+        return values[index];
+    }
+    FK_HOST_DEVICE_CNST const T& operator[](const size_t index) const {
+        return values[index];
+    }
+    FK_HOST_DEVICE_CNST T& operator[](const size_t index) {
         return values[index];
     }
```
The Array struct changes its operator[] parameter type from const size_t& index to const int index and then adds separate overloads for both int and size_t. This could cause ambiguity issues when calling with unsigned integer types other than size_t. The const and non-const versions are also inconsistent in their parameter types (int vs size_t). Consider consolidating to a single template operator[] that accepts any integral type, or ensuring consistent parameter types across all overloads.
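A sketch of the suggested consolidation, using a standalone ArraySketch type with plain constexpr instead of the library's Array and FK_HOST_DEVICE_CNST:

```cpp
#include <cstddef>
#include <type_traits>

// One const/non-const pair templated on the index type accepts int, size_t,
// unsigned int, etc. without creating competing int/size_t candidates.
template <typename T, int SIZE>
struct ArraySketch {
    T values[SIZE];

    template <typename Int, std::enable_if_t<std::is_integral_v<Int>, int> = 0>
    constexpr const T& operator[](const Int index) const {
        return values[index];
    }
    template <typename Int, std::enable_if_t<std::is_integral_v<Int>, int> = 0>
    constexpr T& operator[](const Int index) {
        return values[index];
    }
};

constexpr ArraySketch<int, 3> arr{{1, 2, 3}};
static_assert(arr[1u] == 2 && arr[2] == 3);   // unsigned and signed indices both resolve
```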
```diff
@@ -169,12 +172,18 @@ FK_HOST_CNST auto then(const ContinuationIOp& cIOp, const ContinuationIOps&... c
 
     template <typename Input>
     FK_HOST_DEVICE_CNST friend auto operator|(Input&& input, const OperationData<Operation_t>& opData) {
-        const auto result = Operation::exec(std::forward<Input>(input).input, opData);
-        return InputFoldType<std::decay_t<decltype(result)>>(std::forward<Input>(input).thread, result);
+        return InputFoldType<>::build(std::forward<Input>(input).thread,
+                                      Operation::exec(std::forward<Input>(input).input, opData));
     }
+    FK_HOST_DEVICE_CNST
+    friend typename Operation::OutputType operator|(const typename Operation::InputType &input,
+                                                    const TernaryInstantiableOperation<Operation_t> &opData) {
+        return Operation::exec(input, opData.params, opData.backIOp);
+    }
 
     template <typename PreviousIOp, typename Fuser_t = Fuser>
-    FK_HOST_CNST friend auto operator&(PreviousIOp&& prevIOp, const TernaryInstantiableOperation<Operation_t>& self) {
+    FK_HOST_CNST friend auto operator&(PreviousIOp&& prevIOp,
+                                       const TernaryInstantiableOperation<Operation_t>& self) {
         return Fuser_t::fuse(std::forward<PreviousIOp>(prevIOp), self);
     }
 };
@@ -191,14 +200,21 @@ FK_HOST_CNST auto then(const ContinuationIOp& cIOp, const ContinuationIOps&... c
 struct UnaryInstantiableOperation {
     INSTANTIABLE_OPERATION_DETAILS_IS_ASSERT_THEN(UnaryType)
 
-    template <typename Input>
-    FK_HOST_DEVICE_CNST friend auto operator|(Input&& input, const UnaryInstantiableOperation<Operation_t>& opData) {
-        const auto result = Operation::exec(std::forward<Input>(input).input);
-        return InputFoldType<std::decay_t<decltype(result)>>(std::forward<Input>(input).thread, result);
-    }
+    template <typename InputType>
+    FK_HOST_DEVICE_CNST friend auto operator|(const InputFoldType<InputType>& input,
+                                              const UnaryInstantiableOperation<Operation_t>& opData) {
+        return InputFoldType<>::build(input.thread, Operation::exec(input.input));
+    }
 
+    FK_HOST_DEVICE_CNST friend typename Operation::OutputType operator|(
+        const typename Operation::InputType& input,
+        const UnaryInstantiableOperation<Operation_t> &opData) {
+        return Operation::exec(input);
+    }
```
The operator| overloads for UnaryInstantiableOperation, BinaryInstantiableOperation, and TernaryInstantiableOperation now have two versions: one that works with InputFoldType for fold expressions, and one that works directly with the input type. The second version (e.g., lines 148-150, 209-212) returns the OutputType directly without wrapping in InputFoldType. This dual behavior should be carefully documented as it affects how these operations can be composed. The non-fold version is simpler but can't be used in the middle of a fold expression chain.
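The toy below (Carried and TimesTwo are invented stand-ins, not the fk API) mirrors the dual-overload pattern and makes the composition difference concrete: the wrapped form keeps carrying the thread and can sit anywhere in a chain, while the raw form returns a bare output and is only usable as a terminal or entry-point convenience.

```cpp
#include <iostream>

struct Thread { int x; };

template <typename T>
struct Carried {              // stand-in for InputFoldType<T>
    Thread thread;
    T input;
};

struct TimesTwo {             // stand-in for a Unary instantiable operation
    static int exec(int v) { return v * 2; }

    template <typename T>
    friend auto operator|(const Carried<T>& in, TimesTwo) {
        return Carried<int>{in.thread, TimesTwo::exec(in.input)};  // still chainable
    }
    friend int operator|(int in, TimesTwo) {
        return TimesTwo::exec(in);                                 // bare output, chain ends here
    }
};

int main() {
    const auto chained = Carried<int>{Thread{0}, 3} | TimesTwo{} | TimesTwo{};
    const int  direct  = 3 | TimesTwo{};
    std::cout << chained.input << ' ' << direct << '\n';           // prints: 12 6
}
```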
```cpp
    FK_HOST_DEVICE_FUSE OutputType exec(const InputType &input, const OperationDataType &opData) { \
        return exec(input, opData.params); \
    } \
```
The exec() functions in the DECLARE_*_PARENT macros now call exec() with unpacked parameters (e.g., exec(input, opData.params)) instead of calling Parent::exec(input, opData). This turns a qualified call into an unqualified one, so overload resolution now picks from whatever exec() overloads are visible in the class the macro expands into, and may choose a different overload than the parent's. That is a subtle bug waiting to happen wherever several exec() overloads are in scope. Consider qualifying these calls explicitly (e.g., Child::exec(), or a CRTP pattern) to remove the ambiguity.
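A standalone illustration of the hazard (Parent and Child here are invented, not fk types): the unqualified call binds to whichever exec overload wins resolution in the expanding class, while the qualified call is pinned to the parent's.

```cpp
#include <iostream>

struct Parent {
    static int exec(int v, float p) { return static_cast<int>(v * p); }
};

struct Child : Parent {
    using Parent::exec;                       // parent overload stays visible
    static int exec(int v, double p) { return static_cast<int>(v + p); }

    static int run_unqualified(int v) {
        // Unqualified: both exec's compete; the double overload is the better
        // match for 1.0, so Child::exec is selected.
        return exec(v, 1.0);
    }
    static int run_qualified(int v) {
        // Qualified: always Parent::exec, regardless of what else is in scope.
        return Parent::exec(v, 1.0);
    }
};

int main() {
    std::cout << Child::run_unqualified(3) << ' '   // 4  (3 + 1.0)
              << Child::run_qualified(3)   << '\n'; // 3  (3 * 1.0f)
}
```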
```diff
@@ -257,14 +257,14 @@ namespace fk { // namespace FusedKernel
 #endif
         const Point thread{ x, y, z };
 
-        const ActiveThreads activeThreads = getActiveThreads(details, get<0>(iOps...));
+        const ActiveThreads activeThreads = getActiveThreads(details, get_arg<0>(iOps...));
 
         if (x < activeThreads.x && y < activeThreads.y) {
             Parent::execute_thread(thread, activeThreads, iOps...);
         }
     }
 };
-#endif // defined(__NVCC__) || defined(__HIPCC__) || defined(__NVRTC__) || defined(NVRTC_COMPILER)
+#endif // defined(__NVCC__) || CLANG_HOST_DEVICE
```
The removal of VS2017_COMPILER checks and replacement with just CLANG_HOST_DEVICE on lines 246 and 344 indicates dropping support for Visual Studio 2017. This should be documented in the PR description or release notes as it affects supported compilers.
```cpp
/**
 * @brief OpenInstantiableOperation: represents a IOp that takes the result of the previous IOp as input
 * (which will reside on GPU registers) and ParamsType, plus a Point thread.
 */
template <typename Operation_t>
struct OpenInstantiableOperation final : public OperationData<Operation_t> {
    INSTANTIABLE_OPERATION_DETAILS_IS(OpenType)
    static_assert(std::is_same_v<typename Operation::InstanceType, OpenType>,
                  "Operation is not OpenType");

    INSTANTIABLE_OPERATION_THEN

    template <typename Input>
    FK_HOST_DEVICE_CNST friend auto operator|(Input&& input, const OperationData<Operation_t>& opData) {
        return InputFoldType<>::build(input.thread, Operation::exec(std::forward<Input>(input).thread,
                                                                    std::forward<Input>(input).input, opData));
    }

    template <typename PreviousIOp, typename Fuser_t = Fuser>
    FK_HOST_CNST friend auto operator&(PreviousIOp&& prevIOp, const OpenInstantiableOperation<Operation_t>& self) {
        return Fuser_t::fuse(std::forward<PreviousIOp>(prevIOp), self);
    }
};

/**
 * @brief ClosedInstantiableOperation: represents a IOp that does not take an input and does not return an output
 * It only gets a thread index and it's parameters. It will read and write from/to global memory.
 */
template <typename Operation_t>
struct ClosedInstantiableOperation final : public OperationData<Operation_t> {
    INSTANTIABLE_OPERATION_DETAILS_IS_ASSERT(ClosedType)

    FK_HOST_DEVICE_CNST friend void operator|(const Point& thread, const OperationData<Operation_t>& opData) {
        Operation::exec(thread, opData);
    }
};
```
The OpenInstantiableOperation and ClosedInstantiableOperation classes are newly added (lines 249-284). These represent new operation types in the framework. However:
- OpenInstantiableOperation supports fusion via operator& but ClosedInstantiableOperation does not
- ClosedInstantiableOperation only supports operator| with Point, effectively making it a terminal operation
- The asymmetry between these two (Open can fuse, Closed cannot) should be documented
Verify that this design is intentional and that use cases requiring fusion of Closed operations are properly handled or documented as unsupported.
```diff
 int launch() {
 
-    constexpr fk::Rect test(fk::Point(16, 32), fk::Size(32, 64));
+    constexpr fk::Rect test(fk::Point{16, 32, 0}, fk::Size(32, 64));
```
The change from Point(x, y, z) constructor syntax to Point{x, y, z} aggregate initialization is applied consistently throughout the codebase. It relies on Point no longer having user-defined constructors (they were removed in point.h), which makes this a breaking API change: every Point construction must now use brace initialization, and external code still using the old constructor syntax will stop compiling.
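A standalone analogue of the break (PointAggregate is made up, not fk::Point): once the user-provided constructors are gone, the type is an aggregate and callers must switch from parentheses to braces.

```cpp
struct PointAggregate {   // no user-declared constructors => aggregate
    int x, y, z;
};

int main() {
    // PointAggregate p(16, 32, 0);   // constructor syntax: no longer compiles
    PointAggregate p{16, 32, 0};      // aggregate (brace) initialization: the required form
    return p.x + p.y + p.z;
}
```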
```diff
-    FK_HOST_DEVICE_CNST InputFoldType(const Point& thread_) : thread(thread_) {}
+    template <typename InputT>
+    FK_HOST_DEVICE_FUSE auto build(const Point& thread, InputT&& input) {
```
The new InputFoldType<> specialization (lines 72-77) provides a build() function that constructs an InputFoldType over the concrete input type, so the wrapper is now created on the fly at each step of a fold expression. The original InputFoldType only had a constructor taking a Point, so this changes how the type participates in fold chains. Verify that every use of operator| with a Point works with the new build pattern and that the automatic type deduction does not misbehave in more complex expressions.
Suggested change:

```diff
-    FK_HOST_DEVICE_FUSE auto build(const Point& thread, InputT&& input) {
+    FK_HOST_DEVICE_FUSE static auto build(const Point& thread, InputT&& input) {
```
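Beyond the static qualifier, the build-in-a-fold pattern itself can be exercised in isolation. The sketch below uses invented Carried/PlusOne/run names rather than the fk API: the empty specialization carries only the thread, and build() re-wraps each freshly deduced result so the chain keeps flowing through a left fold over operator|.

```cpp
#include <iostream>
#include <type_traits>
#include <utility>

struct Thread { int x; };

template <typename T = void>
struct Carried {                             // carries thread + current value
    Thread thread;
    T input;
};

template <>
struct Carried<void> {                       // seed, like InputFoldType<>
    Thread thread;

    template <typename InputT>
    static constexpr auto build(const Thread& thread, InputT&& input) {
        return Carried<std::decay_t<InputT>>{thread, std::forward<InputT>(input)};
    }
};

struct PlusOne {
    template <typename T>
    friend constexpr auto operator|(const Carried<T>& in, PlusOne) {
        return Carried<>::build(in.thread, in.input + 1);   // re-wrap the new result
    }
};

template <typename Input, typename... Ops>
constexpr auto run(const Thread& t, Input&& in, const Ops&... ops) {
    return (Carried<>::build(t, std::forward<Input>(in)) | ... | ops);   // left fold
}

int main() {
    constexpr auto out = run(Thread{0}, 1, PlusOne{}, PlusOne{}, PlusOne{});
    static_assert(out.input == 4);
    std::cout << out.input << '\n';          // prints: 4
}
```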
Pull request overview
Copilot reviewed 56 out of 56 changed files in this pull request and generated no new comments.
Pull request overview
Copilot reviewed 56 out of 56 changed files in this pull request and generated 4 comments.
No description provided.