* LTS-C++17: (157 commits)
  - Made all values that reside in registers be passed as const values instead of const references, to be more consistent with where these variables and values actually live. If nvcc were very strict, or if a function were not inlined, passing them by reference would trigger local memory usage.
  - Simplified get wrapper and get_opt implementations
  - Making get_opt generic like get
  - Added missing include and fixing comment typos
  - Removed unnecessary cast
  - Using correct macro in OpenType Instantiable Operation
  - Further simplifying operation_types file
  - Removing unnecessary code and adapting some benchmarks to the new Point aggregate
  - Making make and cat for OperationTuple host-only functions; they should be used only on the CPU
  - Fixing comments and removing unnecessary decltype(auto)
  - Making OperationTuple an aggregate
  - Removing unnecessary decltype(auto)
  - Making some changes to avoid confusion between the two types of operator| in InstantiableOperations
  - Added operator[] implementations with the index passed as size_t
  - Removing unnecessary decltype(auto) and adding a comment
  - Reducing function call depth in all operations' exec and build functions by moving their definitions into the macros
  - Fixing bugs in memory operations
  - Removing unnecessary decltype(auto)
  - Cleaning unnecessary decltype(auto) usage
  - Revert .gitignore changes: remove CodeQL entries
  - ...

Conflicts:
- .github/workflows/cmake-linux-amd64.yml
- .github/workflows/cmake-linux-arm64.yml
- .github/workflows/cmake-windows-amd64.yml
- CMakeLists.txt
- README.md
- cmake/libs/cuda/target_generation.cmake
- include/fused_kernel/core/execution_model/data_parallel_patterns.h
- include/fused_kernel/core/execution_model/operation_model/iop_fuser.h
- include/fused_kernel/core/utils/utils.h
- lib/export/FKLConfigVersion.cmake
- tests/examples/inlining_and_LDL_STL.h
- tests/examples/readme_test_code.h
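The first commit message describes passing register-resident values by value rather than by `const` reference. The sketch below illustrates the difference on hypothetical operations (`MulAdd`, `LinearIndex`) and a hypothetical aggregate `Point`; none of these are the library's actual types. A by-value parameter is just a copy of a few scalars, whereas a reference parameter refers to an object that has an address, which, according to the commit message, can push the value into local memory when nvcc does not inline the call.

```cpp
// Sketch: passing small, register-resident values by value.
// Point, MulAdd and LinearIndex are hypothetical stand-ins, not the library's types.

// An aggregate: no user-declared constructors, so it is brace-initialized.
struct Point {
    unsigned int x;
    unsigned int y;
    unsigned int z;
};

struct MulAdd {
    // Before (hypothetical): exec(const float& input, const float& scale, const float& offset).
    // After: the callee receives plain copies that can stay in registers; per the
    // commit message, references could force local memory use if not inlined.
    static constexpr float exec(float input, float scale, float offset) {
        return input * scale + offset;
    }
};

struct LinearIndex {
    // The thread coordinate is three 32-bit values, cheap to copy.
    static constexpr unsigned int exec(Point thread, unsigned int width) {
        return thread.y * width + thread.x;
    }
};

// Compile-time checks of the sketch.
static_assert(LinearIndex::exec(Point{3u, 2u, 0u}, 10u) == 23u, "row-major index");
static_assert(MulAdd::exec(2.0f, 3.0f, 1.0f) == 7.0f, "2 * 3 + 1");
```

Because `Point` has no constructors, call sites build it with `Point{x, y, z}`, which matches the "Point aggregate" wording in the commit list.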
Pull request overview
This PR merges the LTS-C++17 branch, bringing a broad refactor of the fused-kernel execution/operation model APIs (notably operation typing, tuple storage/access, and fusion/execution plumbing), plus corresponding updates across unit tests, examples, and benchmarks.
Changes:
- Refactors operation typing and fusion APIs (e.g., `opIs<>`, new `OperationTuple` storage/access via `get_opt`, updated fusers/executors).
- Shifts many operation `exec()` signatures from `const T&` to `T` (by value) for register-resident values and consistency across CPU/CUDA compilation.
- Updates tests/examples/benchmarks to the new APIs and data structures; adds a few new test/example files.
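The first bullet mentions a new `OperationTuple` storage model, and the commit list notes that `OperationTuple` was made an aggregate. As a minimal sketch of that general idea only, the block below builds an aggregate, recursively nested tuple with index-based access. It assumes nothing about the real `OperationTuple` layout or the semantics of `get_opt`; `AggTuple`, `make_agg_tuple` and `get_elem` are hypothetical names.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical aggregate tuple: each node stores one element plus the rest of
// the tuple. It has no user-declared constructors, so it stays an aggregate.
template <typename... Ts>
struct AggTuple;

template <>
struct AggTuple<> {};

template <typename T, typename... Ts>
struct AggTuple<T, Ts...> {
    T instance;            // element at this position
    AggTuple<Ts...> next;  // remaining elements
};

// Builds the tuple by nesting aggregate initializers (arguments taken by value).
constexpr AggTuple<> make_agg_tuple() { return {}; }

template <typename T, typename... Ts>
constexpr AggTuple<T, Ts...> make_agg_tuple(T first, Ts... rest) {
    return {first, make_agg_tuple(rest...)};
}

// Index-based access: walks I levels down the nested `next` members at compile time.
template <std::size_t I, typename T, typename... Ts>
constexpr const auto& get_elem(const AggTuple<T, Ts...>& t) {
    if constexpr (I == 0) {
        return t.instance;
    } else {
        return get_elem<I - 1>(t.next);
    }
}

static_assert(std::is_aggregate_v<AggTuple<int, float>>, "no constructors, still an aggregate");

constexpr auto example = make_agg_tuple(1, 2.5f);
static_assert(get_elem<1>(example) == 2.5f, "second element");
```

Keeping the type an aggregate means no constructor overloads have to be maintained; the trade-off is that construction goes through brace initialization or a factory function like the one above.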
Reviewed changes
Copilot reviewed 70 out of 70 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| utests/core/execution_model/utest_executors.h | Updates back-fusion test usage/API |
| utests/algorithm/image_processing/utest_saturate/utest_saturate_common.h | Adjusts expected-value helper signature |
| tests/operation_test_utils.h | Updates test utilities for new Point/tuple APIs |
| tests/operation/test_operation_types.h | Updates operation-type tests to opIs<>/new types |
| tests/operation/test_operation_tuple.h | Migrates to make_new_operation_tuple/get_opt |
| tests/operation/test_operation_fuser.h | Updates fuser tests for new fused op model |
| tests/operation/test_instantiable_operations.h | Updates instantiable op tests for new typing/tuple model |
| tests/operation/test_fused_operation.h | Updates fused-op type assertions; adds deep-type static asserts |
| tests/operation/test_filtered_index_sequence.h | Updates restriction/filter tests for new restriction/type model |
| tests/examples/readme_test_code.h | Updates README example to new includes/constexpr usage |
| tests/examples/compiler_explorer_example.h | Adds compiler-explorer oriented example |
| tests/data/test_tuple.h | Updates tuple tests + NVCC-guard logic |
| tests/data/test_rect.h | Updates Rect/Point construction usage |
| tests/data/test_ptr_nd.h | Updates Point usage in ptr tests |
| tests/data/basic_test.h | Updates includes/types/tuple access patterns |
| tests/cudabug/test_nvcc131_BugReproducer.h | Adds NVCC 13.1 bug reproducer |
| tests/buildAPI/batch_build_compilation_time.h | Adds build-API compilation-time benchmark/test |
| tests/algorithm/test_warp.h | Updates include path for memory operations |
| tests/algorithm/test_deinterlace.h | Updates include path for memory operations |
| tests/algorithm/test_crop.h | Updates type-trait checks to opIs<> |
| tests/algorithm/test_border_reader.h | Updates include path for memory operations |
| tests/algorithm/test_batchresize_build.h | Updates include path for memory operations |
| include/fused_kernel/core/utils/vector_utils.h | Updates static_get + type-list utilities |
| include/fused_kernel/core/utils/utils.h | Moves compiler macros into new header include |
| include/fused_kernel/core/utils/type_lists.h | Reimplements TypeList indexing/cat utilities |
| include/fused_kernel/core/utils/parameter_pack_utils.h | Replaces recursive pack get with tuple-based get_arg |
| include/fused_kernel/core/utils/compiler_macros.h.rej | (Artifact) rejected-hunks file added |
| include/fused_kernel/core/utils/compiler_macros.h | Introduces CLANG_HOST_DEVICE macro header |
| include/fused_kernel/core/execution_model/thread_fusion.h | Updates typelists + Point construction |
| include/fused_kernel/core/execution_model/parallel_architectures.h | Adjusts default arch selection + includes |
| include/fused_kernel/core/execution_model/operation_model/vector_operations.h | Passes inputs by value to exec() |
| include/fused_kernel/core/execution_model/operation_model/operation_types.h | Consolidates type predicates into OpIs/opIs |
| include/fused_kernel/core/execution_model/operation_model/operation_tuple.h | New OperationTuple storage model + get_opt/cat |
| include/fused_kernel/core/execution_model/operation_model/operation_data.h | Simplifies OperationData specializations; adds Open/Closed |
| include/fused_kernel/core/execution_model/operation_model/iop_fuser.h | Refactors fuse API + adds BackFuser |
| include/fused_kernel/core/execution_model/operation_model/instantiable_operations.h | Adds fold-based `operator\|` |
| include/fused_kernel/core/execution_model/executors.h | Refactors executor back-fusion plumbing using tuples/apply |
| include/fused_kernel/core/execution_model/data_parallel_patterns.h | Switches execution to fold-based `operator\|` |
| include/fused_kernel/core/data/vector_types.h | Simplifies NVRTC include handling |
| include/fused_kernel/core/data/rect.h | Passes Point/Size by value in Rect ctor |
| include/fused_kernel/core/data/rawptr.h | Initializes dims; passes Point by value in accessors |
| include/fused_kernel/core/data/ptr_nd.h | Makes Point usage consistent; updates at()/crop() signatures |
| include/fused_kernel/core/data/point.h | Removes explicit Point_ constructors (aggregate init) |
| include/fused_kernel/core/data/circular_tensor.h | Updates include path for memory operations |
| include/fused_kernel/core/data/array.h | Splits Array vs ArrayVector + adds size_t operator[] |
| include/fused_kernel/core/constexpr_libs/constexpr_vector.h | Switches to new static_get function + type aliases |
| include/fused_kernel/algorithms/image_processing/warping.h | Pass-by-value exec + type alias updates |
| include/fused_kernel/algorithms/image_processing/saturate.h | Pass-by-value exec |
| include/fused_kernel/algorithms/image_processing/resize.h | Updates traits (opIs) + Point init + type aliases |
| include/fused_kernel/algorithms/image_processing/interpolation.h | Pass-by-value exec + Point init + type aliases |
| include/fused_kernel/algorithms/image_processing/image.h | Updates crop/readAt Point handling |
| include/fused_kernel/algorithms/image_processing/deinterlace.h | Pass-by-value exec + Point init + type aliases |
| include/fused_kernel/algorithms/image_processing/crop.h | Pass-by-value exec + Point math/init |
| include/fused_kernel/algorithms/image_processing/color_conversion.h | Pass-by-value exec + NV12 plane offset fixes/cleanup |
| include/fused_kernel/algorithms/image_processing/border_reader.h | Pass-by-value exec + minor build() adjustment |
| include/fused_kernel/algorithms/basic_ops/vector_ops.h | Pass-by-value exec + static_get callsite updates |
| include/fused_kernel/algorithms/basic_ops/static_loop.h | Pass-by-value exec |
| include/fused_kernel/algorithms/basic_ops/set.h | Pass-by-value exec + Point init |
| include/fused_kernel/algorithms/basic_ops/logical.h | Pass-by-value exec |
| include/fused_kernel/algorithms/basic_ops/cast.h | Pass-by-value exec |
| include/fused_kernel/algorithms/basic_ops/arithmetic.h | Pass-by-value exec |
| include/fused_kernel/algorithms/basic_ops/algebraic.h | Pass-by-value exec |
| benchmarks/fusion/benchmark_horizontal_fusion.h | Updates Point init in crop benchmarking |
| benchmarks/fkBenchmarksCommon.h | Updates iteration types + Point init for comparisons |
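Two rows above mention a fold-based `operator|` without further detail. As an illustration only, the sketch below shows how a C++17 fold expression can chain a pack of operations through a binary `operator|` that fuses them into a single type. `OpTag`, `isOp`, `Composed`, `AddOne`, `Twice` and `run` are hypothetical names; the library's actual `operator|` overloads may be structured differently.

```cpp
#include <type_traits>

// Hypothetical marker so that operator| only applies to operation types.
struct OpTag {};

template <typename T>
constexpr bool isOp = std::is_base_of_v<OpTag, T>;

// Two hypothetical unary operations (not the library's types).
struct AddOne : OpTag { static constexpr int exec(int x) { return x + 1; } };
struct Twice  : OpTag { static constexpr int exec(int x) { return x * 2; } };

// Fused operation: runs First, then feeds its result into Second.
template <typename First, typename Second>
struct Composed : OpTag {
    static constexpr int exec(int x) { return Second::exec(First::exec(x)); }
};

// Binary operator| that fuses two operations into one type (no runtime state).
template <typename A, typename B,
          std::enable_if_t<isOp<A> && isOp<B>, int> = 0>
constexpr Composed<A, B> operator|(A, B) { return {}; }

// Left fold over the pack: ((op1 | op2) | op3) ... yields a single fused type
// whose exec() applies the operations in the order they were given.
template <typename... Ops>
constexpr int run(int x, Ops... ops) {
    static_assert(sizeof...(Ops) > 0, "need at least one operation");
    using Fused = std::decay_t<decltype((... | ops))>;
    return Fused::exec(x);
}

static_assert(run(3, AddOne{}, Twice{}) == 8, "(3 + 1) * 2");
```

Since composition is associative, a left or right fold gives the same result here; the left fold simply mirrors reading the operations left to right.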
Resolved review thread on include/fused_kernel/core/execution_model/operation_model/operation_types.h
Pull request overview
Copilot reviewed 70 out of 70 changed files in this pull request and generated no new comments.