GPU Framework: fixed mismatch between CUDA and HIP launch bounds definition by cima22 · Pull Request #14632 · AliceO2Group/AliceO2

cima22 · 2025-08-27T14:06:55Z

Handled the mismatch between the CUDA and HIP __launch_bounds__ definitions. Performance remains the same. Added the GPUCA_CEIL_INT_DIV macro for integer division with ceiling; this ensures that when the computed number of warps is fractional (e.g., 1.4), the result is rounded up to the next integer (e.g., 2) so that enough active warps are allocated per execution unit.
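
As an illustration of the rounding described above, here is a minimal sketch of a ceiling integer division macro. The name CEIL_INT_DIV_SKETCH and its definition are assumptions made for this example; the actual definition of GPUCA_CEIL_INT_DIV is not shown in this conversation. The sketch is only valid for positive operands.

```cpp
// Hypothetical stand-in for GPUCA_CEIL_INT_DIV (the real definition is not shown here).
// Adds (b - 1) before dividing so that any remainder pushes the result up by one.
#define CEIL_INT_DIV_SKETCH(a, b) (((a) + (b) - 1) / (b))

// 90 threads at 64 threads per warp is about 1.4 warps; plain integer division truncates to 1.
static_assert(90 / 64 == 1, "plain integer division truncates");
// The ceiling variant rounds up to 2 warps.
static_assert(CEIL_INT_DIV_SKETCH(90, 64) == 2, "fractional results round up");
// Exact multiples are left unchanged.
static_assert(CEIL_INT_DIV_SKETCH(128, 64) == 2, "exact multiples stay the same");
```

The thread and warp counts in the assertions are made-up values chosen only to show the rounding behaviour.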

github-actions · 2025-08-27T14:07:06Z

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and remove <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5

davidrohr

Looks good, I have just one minor comment.
However, as discussed, I would not merge this now, since it can affect the performance. We should merge it once we have updated the performance defaults for MI50 and MI100.
Meanwhile, I'd switch it to a draft PR, so that we do not waste CI resources.

davidrohr · 2025-08-29T12:01:38Z

GPU/GPUTracking/Definitions/GPUDefParametersLoad.template.inc

+  if (par.par_LB_maxThreads[i] > 0) {                                                                                                                    \
+    o << "#define GPUCA_LB_" GPUCA_M_STR(name) " " << par.par_LB_maxThreads[i];                                                                          \
+    if (par.par_LB_minBlocks[i] > 0) {                                                                                                                   \
+      o << ", " << GPUCA_CEIL_INT_DIV(par.par_LB_maxThreads[i] * par.par_LB_minBlocks[i], (minBlockFactor ? minBlockFactor : par.par_LB_maxThreads[i])); \


Could you add this as a function to GPUCommonMath.h? I'd prefer to have proper functions instead of macros in C++ code.

Adding the function to GPUCommonMath.h makes ROOT complain: it cannot expand the GPUd() macros contained in GPUCommonMath.h when running

bash -c "echo -e '#define GPUCA_GPUTYPE_${GPU_ARCH}\\n#define PARAMETER_FILE \"GPUDefParametersDefaults.h\"\\ngInterpreter->AddIncludePath(\"${CMAKE_CURRENT_SOURCE_DIR}/../Common\");\\ngInterpreter->AddIncludePath(\"${CMAKE_CURRENT_SOURCE_DIR}/Definitions\");\\ngInterpreter->AddIncludePath(\"${ON_THE_FLY_DIR}\");\\n.x ${CMAKE_CURRENT_SOURCE_DIR}/Standalone/tools/dumpGPUDefParam.C\\n.x ${CMAKE_CURRENT_SOURCE_DIR}/Standalone/tools/dumpGPUDefParam.C(\"${PARAMFILE}\")\\n.q\\n'" | root -l -b > /dev/null

to generate the .par files. Example:

/home/gcimador/alice/O2/GPU/GPUTracking/../Common/GPUCommonMath.h:51:3: error: a type specifier is required for all declarations
  GPUd() static float2 MakeFloat2(float x, float y); // TODO: Find better appraoch that is constexpr
  ^

Do you know if there is a quick fix? I am afraid ROOT does not expand the macros correctly here.

ok, then just ignore my comment...
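
For reference, the function form requested in this thread could look roughly like the sketch below, applied to the expression from the diff. Both ceilIntDiv and secondLaunchBound are hypothetical names; neither is part of GPUCommonMath.h or of this PR, and the functions are plain constexpr without the GPUd() qualifier. The reading of minBlockFactor is also an assumption: a value of 0 is taken to mean that the second __launch_bounds__ argument stays a block count, as in CUDA (minimum blocks per multiprocessor), while a nonzero value is taken to be a divisor that converts threads into warps, matching HIP's warps per execution unit.

```cpp
// Hypothetical helper, not part of GPUCommonMath.h: ceiling division for positive integers.
constexpr int ceilIntDiv(int a, int b)
{
  return (a + b - 1) / b;
}

// Mirrors the expression in the diff that produces the second __launch_bounds__ argument.
// minBlockFactor == 0: divide by maxThreads, which gives back minBlocks.
// minBlockFactor != 0: divide by that factor instead (assumed here to be threads per warp).
constexpr int secondLaunchBound(int maxThreads, int minBlocks, int minBlockFactor)
{
  return ceilIntDiv(maxThreads * minBlocks, minBlockFactor ? minBlockFactor : maxThreads);
}

static_assert(secondLaunchBound(256, 2, 0) == 2, "factor 0 keeps the block count");
static_assert(secondLaunchBound(256, 2, 64) == 8, "512 threads at 64 per warp is 8 warps");
static_assert(secondLaunchBound(96, 1, 64) == 2, "1.5 warps round up to 2");
```

The numeric values in the assertions are illustrative and are not taken from the actual parameter files.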

github-actions · 2025-09-29T01:17:57Z

This PR did not have any update in the last 30 days. Is it still needed? Unless there is further action, it will be closed in 5 days.

GPU Framework: fixed mismatch between CUDA and HIP launch bounds defi…

173ade5

…nitions

cima22 requested a review from davidrohr as a code owner August 27, 2025 14:06

davidrohr reviewed Aug 29, 2025

cima22 marked this pull request as draft August 29, 2025 12:32

github-actions bot added the stale label Sep 29, 2025

github-actions bot closed this Oct 5, 2025

cima22 wants to merge 1 commit into AliceO2Group:dev from cima22:amd_launch_bounds_fix
