GPU Framework: fixed mismatch between CUDA and HIP launch bounds definition#14632
GPU Framework: fixed mismatch between CUDA and HIP launch bounds definition#14632cima22 wants to merge 1 commit intoAliceO2Group:devfrom
Conversation
|
REQUEST FOR PRODUCTION RELEASES: This will add The following labels are available |
davidrohr
left a comment
There was a problem hiding this comment.
Looks good, I have just one minor comment.
However, as discussed, I would not merge this now, since it can affect the performance. We should merge it once have update performance defaults for MI50 and MI100.
Meanwhile, I'd switch it to a draft PR, so that we do not waste CI resources.
| if (par.par_LB_maxThreads[i] > 0) { \ | ||
| o << "#define GPUCA_LB_" GPUCA_M_STR(name) " " << par.par_LB_maxThreads[i]; \ | ||
| if (par.par_LB_minBlocks[i] > 0) { \ | ||
| o << ", " << GPUCA_CEIL_INT_DIV(par.par_LB_maxThreads[i] * par.par_LB_minBlocks[i], (minBlockFactor ? minBlockFactor : par.par_LB_maxThreads[i])); \ |
There was a problem hiding this comment.
Could you add this as a function to GPUCommonMath.h? I'd prefer to have proper functions instead of macros in C++ code.
There was a problem hiding this comment.
Adding the function to GPUCommonMath.h makes ROOT complain. Basically it cannot expand GPUd() macros contained in GPUCommonMath.h when using
bash -c "echo -e '#define GPUCA_GPUTYPE_${GPU_ARCH}\\n#define PARAMETER_FILE \"GPUDefParametersDefaults.h\"\\ngInterpreter->AddIncludePath(\"${CMAKE_CURRENT_SOURCE_DIR}/../Common\");\\ngInterpreter->AddIncludePath(\"${CMAKE_CURRENT_SOURCE_DIR}/Definitions\");\\ngInterpreter->AddIncludePath(\"${ON_THE_FLY_DIR}\");\\n.x ${CMAKE_CURRENT_SOURCE_DIR}/Standalone/tools/dumpGPUDefParam.C\\n.x ${CMAKE_CURRENT_SOURCE_DIR}/Standalone/tools/dumpGPUDefParam.C(\"${PARAMFILE}\")\\n.q\\n'" | root -l -b > /dev/null
For generating the .par files. Example:
/home/gcimador/alice/O2/GPU/GPUTracking/../Common/GPUCommonMath.h:51:3: error: a type specifier is required for all declarations
GPUd() static float2 MakeFloat2(float x, float y); // TODO: Find better appraoch that is constexpr
^
Do you know if there is a quick fix? I am afraid ROOT does not expand correctly the macros here
There was a problem hiding this comment.
ok, then just ignore my comment...
|
This PR did not have any update in the last 30 days. Is it still needed? Unless further action in will be closed in 5 days. |
Handled mismatch between CUDA and HIP __launch_bounds__ definitions. Performances remain the same. Added GPUCA_CEIL_INT_DIV macro for integer division with ceiling; this ensures that when computing the number of warps (e.g., 1.4), the result is rounded up to the next integer (e.g., 2) so that enough active warps are allocated per execution unit.