Merge branch 'LTS-C++17' #205

Merged
morousg merged 1 commit into main from Update-from-LTS-C++17
Feb 19, 2026
@morousg morousg commented Feb 18, 2026

* LTS-C++17: (157 commits)
  Made all values that reside in registers be passed as const value instead of const reference, to be more consistent with the reality of the variables and values. If nvcc were very strict, or if a function were not inlined, passing by reference would trigger local memory usage.
  Simplified get wrapper and get_opt implementations
  Making get_opt generic like get
  Added missing include and fixing comment typos
  Removed unnecessary cast
  Using correct macro in OpenType Instantiable Operation
  further simplifying operation_types file
  Removing unnecessary code and adapting some benchmarks to new Point aggregate
  Making make and cat for OperationTuple host only functions. They should be used only on CPU.
  Fixing comments and removing unnecessary decltype(auto)
  Making OperationTuple an aggregate
  Removing unnecessary decltype(auto)
  Making some changes to avoid confusion with the two types of operator| in InstantiableOperations
  Added operator[] implementations with index passed as size_t
  Removing unnecessary decltype(auto), and adding a comment.
  Reducing function call depth in all operations exec and build functions, by moving their definition to the macros.
  Fixing bugs in memory operations
  Removing unnecessary decltype(auto)
  Cleaning unnecessary decltype(auto) usage
  Revert .gitignore changes - remove CodeQL entries
  ...

# Conflicts:
#	.github/workflows/cmake-linux-amd64.yml
#	.github/workflows/cmake-linux-arm64.yml
#	.github/workflows/cmake-windows-amd64.yml
#	CMakeLists.txt
#	README.md
#	cmake/libs/cuda/target_generation.cmake
#	include/fused_kernel/core/execution_model/data_parallel_patterns.h
#	include/fused_kernel/core/execution_model/operation_model/iop_fuser.h
#	include/fused_kernel/core/utils/utils.h
#	lib/export/FKLConfigVersion.cmake
#	tests/examples/inlining_and_LDL_STL.h
#	tests/examples/readme_test_code.h
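
Several of the commits above turn types such as Point and OperationTuple into aggregates. As a rough sketch of what that means in C++17, assuming a hypothetical `PointSketch` struct (its members are illustrative and may not match the library's Point):

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a point type; the library's Point may differ.
struct PointSketch {
    int x;
    int y;
    int z;
};

// With no user-declared constructors the type is an aggregate...
static_assert(std::is_aggregate_v<PointSketch>, "expected an aggregate");
// ...and it stays trivially copyable, so it is cheap to pass by value.
static_assert(std::is_trivially_copyable_v<PointSketch>, "expected trivially copyable");

// Brace initialization fills members in declaration order.
// Members without an initializer are value-initialized, so z is 0 here.
constexpr PointSketch make_point_2d(const int x, const int y) {
    return PointSketch{x, y};
}

inline void demo_point_sketch() {
    constexpr PointSketch p = make_point_2d(3, 4);
    assert(p.x == 3 && p.y == 4 && p.z == 0);
}
```

Call sites that previously used an explicit constructor would switch to brace initialization, which is presumably why tests and benchmarks had to be adapted.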
@morousg morousg self-assigned this Feb 18, 2026
Copilot AI review requested due to automatic review settings February 18, 2026 16:50
Copilot AI left a comment

Pull request overview

This PR merges the LTS-C++17 branch, bringing a broad refactor of the fused-kernel execution/operation model APIs (notably operation typing, tuple storage/access, and fusion/execution plumbing), plus corresponding updates across unit tests, examples, and benchmarks.

Changes:

  • Refactors operation typing and fusion APIs (e.g., opIs<>, new OperationTuple storage/access via get_opt, updated fusers/executors).
  • Shifts many operation exec() signatures from const T& to T (by-value) for register-resident values and consistency across CPU/CUDA compilation.
  • Updates tests/examples/benchmarks to the new APIs and data structures; adds a few new test/example files.
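
The second bullet (moving exec() from const T& to T) can be illustrated with a small sketch. `AddConstRef` and `AddByValue` are hypothetical operations written for this example, not types from the library; they only contrast the two signature styles.

```cpp
#include <cassert>

// Hypothetical operations, not part of the library.
struct AddConstRef {
    // A reference parameter formally requires the argument to have an address.
    // If the call is not inlined, the compiler may need to place the value in memory.
    static constexpr int exec(const int& input, const int& params) {
        return input + params;
    }
};

struct AddByValue {
    // Small trivially copyable values are passed as copies, which can stay in registers.
    static constexpr int exec(const int input, const int params) {
        return input + params;
    }
};

// Both styles compute the same result; only the parameter passing differs.
static_assert(AddConstRef::exec(2, 3) == AddByValue::exec(2, 3), "results must match");

inline void demo_exec_styles() {
    assert(AddByValue::exec(2, 3) == 5);
    assert(AddConstRef::exec(2, 3) == 5);
}
```

On the GPU side, a value that must be addressable can end up in local memory, which is the cost the commit message is trying to rule out.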

Reviewed changes

Copilot reviewed 70 out of 70 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| utests/core/execution_model/utest_executors.h | Updates back-fusion test usage/API |
| utests/algorithm/image_processing/utest_saturate/utest_saturate_common.h | Adjusts expected-value helper signature |
| tests/operation_test_utils.h | Updates test utilities for new Point/tuple APIs |
| tests/operation/test_operation_types.h | Updates operation-type tests to opIs<>/new types |
| tests/operation/test_operation_tuple.h | Migrates to make_new_operation_tuple/get_opt |
| tests/operation/test_operation_fuser.h | Updates fuser tests for new fused op model |
| tests/operation/test_instantiable_operations.h | Updates instantiable op tests for new typing/tuple model |
| tests/operation/test_fused_operation.h | Updates fused-op type assertions; adds deep-type static asserts |
| tests/operation/test_filtered_index_sequence.h | Updates restriction/filter tests for new restriction/type model |
| tests/examples/readme_test_code.h | Updates README example to new includes/constexpr usage |
| tests/examples/compiler_explorer_example.h | Adds compiler-explorer oriented example |
| tests/data/test_tuple.h | Updates tuple tests + NVCC-guard logic |
| tests/data/test_rect.h | Updates Rect/Point construction usage |
| tests/data/test_ptr_nd.h | Updates Point usage in ptr tests |
| tests/data/basic_test.h | Updates includes/types/tuple access patterns |
| tests/cudabug/test_nvcc131_BugReproducer.h | Adds NVCC 13.1 bug reproducer |
| tests/buildAPI/batch_build_compilation_time.h | Adds build-API compilation-time benchmark/test |
| tests/algorithm/test_warp.h | Updates include path for memory operations |
| tests/algorithm/test_deinterlace.h | Updates include path for memory operations |
| tests/algorithm/test_crop.h | Updates type-trait checks to opIs<> |
| tests/algorithm/test_border_reader.h | Updates include path for memory operations |
| tests/algorithm/test_batchresize_build.h | Updates include path for memory operations |
| include/fused_kernel/core/utils/vector_utils.h | Updates static_get + type-list utilities |
| include/fused_kernel/core/utils/utils.h | Moves compiler macros into new header include |
| include/fused_kernel/core/utils/type_lists.h | Reimplements TypeList indexing/cat utilities |
| include/fused_kernel/core/utils/parameter_pack_utils.h | Replaces recursive pack get with tuple-based get_arg |
| include/fused_kernel/core/utils/compiler_macros.h.rej | (Artifact) rejected-hunks file added |
| include/fused_kernel/core/utils/compiler_macros.h | Introduces CLANG_HOST_DEVICE macro header |
| include/fused_kernel/core/execution_model/thread_fusion.h | Updates typelists + Point construction |
| include/fused_kernel/core/execution_model/parallel_architectures.h | Adjusts default arch selection + includes |
| include/fused_kernel/core/execution_model/operation_model/vector_operations.h | Passes inputs by value to exec() |
| include/fused_kernel/core/execution_model/operation_model/operation_types.h | Consolidates type predicates into OpIs/opIs |
| include/fused_kernel/core/execution_model/operation_model/operation_tuple.h | New OperationTuple storage model + get_opt/cat |
| include/fused_kernel/core/execution_model/operation_model/operation_data.h | Simplifies OperationData specializations; adds Open/Closed |
| include/fused_kernel/core/execution_model/operation_model/iop_fuser.h | Refactors fuse API + adds BackFuser |
| include/fused_kernel/core/execution_model/operation_model/instantiable_operations.h | Adds fold-based `operator |
| include/fused_kernel/core/execution_model/executors.h | Refactors executor back-fusion plumbing using tuples/apply |
| include/fused_kernel/core/execution_model/data_parallel_patterns.h | Switches execution to fold-based `operator |
| include/fused_kernel/core/data/vector_types.h | Simplifies NVRTC include handling |
| include/fused_kernel/core/data/rect.h | Passes Point/Size by value in Rect ctor |
| include/fused_kernel/core/data/rawptr.h | Initializes dims; passes Point by value in accessors |
| include/fused_kernel/core/data/ptr_nd.h | Makes Point usage consistent; updates at()/crop() signatures |
| include/fused_kernel/core/data/point.h | Removes explicit Point_ constructors (aggregate init) |
| include/fused_kernel/core/data/circular_tensor.h | Updates include path for memory operations |
| include/fused_kernel/core/data/array.h | Splits Array vs ArrayVector + adds size_t operator[] |
| include/fused_kernel/core/constexpr_libs/constexpr_vector.h | Switches to new static_get function + type aliases |
| include/fused_kernel/algorithms/image_processing/warping.h | Pass-by-value exec + type alias updates |
| include/fused_kernel/algorithms/image_processing/saturate.h | Pass-by-value exec |
| include/fused_kernel/algorithms/image_processing/resize.h | Updates traits (opIs) + Point init + type aliases |
| include/fused_kernel/algorithms/image_processing/interpolation.h | Pass-by-value exec + Point init + type aliases |
| include/fused_kernel/algorithms/image_processing/image.h | Updates crop/readAt Point handling |
| include/fused_kernel/algorithms/image_processing/deinterlace.h | Pass-by-value exec + Point init + type aliases |
| include/fused_kernel/algorithms/image_processing/crop.h | Pass-by-value exec + Point math/init |
| include/fused_kernel/algorithms/image_processing/color_conversion.h | Pass-by-value exec + NV12 plane offset fixes/cleanup |
| include/fused_kernel/algorithms/image_processing/border_reader.h | Pass-by-value exec + minor build() adjustment |
| include/fused_kernel/algorithms/basic_ops/vector_ops.h | Pass-by-value exec + static_get callsite updates |
| include/fused_kernel/algorithms/basic_ops/static_loop.h | Pass-by-value exec |
| include/fused_kernel/algorithms/basic_ops/set.h | Pass-by-value exec + Point init |
| include/fused_kernel/algorithms/basic_ops/logical.h | Pass-by-value exec |
| include/fused_kernel/algorithms/basic_ops/cast.h | Pass-by-value exec |
| include/fused_kernel/algorithms/basic_ops/arithmetic.h | Pass-by-value exec |
| include/fused_kernel/algorithms/basic_ops/algebraic.h | Pass-by-value exec |
| benchmarks/fusion/benchmark_horizontal_fusion.h | Updates Point init in crop benchmarking |
| benchmarks/fkBenchmarksCommon.h | Updates iteration types + Point init for comparisons |
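
The parameter_pack_utils.h entry mentions replacing a recursive pack get with a tuple-based get_arg. A minimal sketch of that general technique, using only the standard library, might look like the following. The name `get_arg_sketch` and its signature are assumptions made for illustration; the library's actual get_arg may be written differently.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical helper: returns a copy of the Idx-th argument of a parameter pack
// without a hand-written recursive template.
template <std::size_t Idx, typename... Args>
constexpr auto get_arg_sketch(Args&&... args) {
    static_assert(Idx < sizeof...(Args), "index out of range");
    // forward_as_tuple builds a tuple of references, so the pack itself is not copied;
    // only the selected element is copied out by the auto return type.
    return std::get<Idx>(std::forward_as_tuple(std::forward<Args>(args)...));
}

// Usable in constant expressions.
static_assert(get_arg_sketch<1>(10, 20, 30) == 20, "second argument expected");

inline void demo_get_arg_sketch() {
    assert(get_arg_sketch<0>(10, 20, 30) == 10);
    assert(get_arg_sketch<2>(1, 2.5, 'c') == 'c');
}
```

Returning by value rather than by reference avoids handing back a reference to a temporary argument, at the cost of one copy of the selected element.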

Copilot AI left a comment

Pull request overview

Copilot reviewed 70 out of 70 changed files in this pull request and generated no new comments.



@morousg morousg merged commit 3d6ec1e into main Feb 19, 2026
20 checks passed
@morousg morousg deleted the Update-from-LTS-C++17 branch February 19, 2026 08:43