Preheat host memory for OpenCL read_host_from_device #21134
|
Seems good on my GB10. This also seems faster than my PR #21069 although they aren't rebased on the same parent commit so the benchmark runs aren't directly comparable. Using the 61MP benchmark from https://darktable.info/performance/benchmarks-beispiele/benchmark/: NVIDIA driver version 580.159.03 Builds:
Run controls:
Measured pixelpipe times:
OpenCL image readback profiling:
OpenCL command queue totals:
notebook tiling
By the way, in #21069 (comment) you also mentioned using the "notebook" resource level. But it's a LOT slower. Run controls:
Measured pixelpipe times:
OpenCL image readback profiling:
OpenCL command queue totals:
Anyway, seems good. Ship it!!!!!!! |
|
Ok, so
|
As some systems are very slow in `clEnqueueReadImage()` if the host memory is still cold, we use the Linux-specific `madvise()` on the host memory beforehand.
1. Logs report errors in `madvise()`, and are also written when running in `-d verbose -d opencl` debugging mode.
2. The preheating feature is currently enabled via the hidden `opencl_preheated` conf
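For readers unfamiliar with the technique, here is a minimal sketch of what preheating a host buffer before a device-to-host read could look like. It is not the code from this PR: the helper name `preheat_host_memory`, its return codes, and the choice of the `MADV_POPULATE_WRITE` hint (Linux 5.14+) are assumptions made for illustration, since the description does not say which `madvise()` hint is used. The sketch defines `_DEFAULT_SOURCE`, the documented glibc feature-test macro that exposes `madvise()`, and falls back to touching one byte per page when the hint is unavailable or fails.

```c
#define _DEFAULT_SOURCE /* documented feature-test macro; exposes madvise() in glibc */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper, not darktable's actual code.
 * Faults in the pages backing [buf, buf + size) so a later device-to-host
 * copy does not pay for page faults on cold memory.
 * Returns 0 if madvise() populated the range, 1 if the page-touching
 * fallback was used, -1 on invalid arguments. Buffer contents are preserved. */
static int preheat_host_memory(void *buf, size_t size)
{
  if(buf == NULL || size == 0) return -1;

  const long page = sysconf(_SC_PAGESIZE);
  if(page <= 0) return -1;

  /* madvise() requires a page-aligned start address, so round the start
   * down and extend the length accordingly (page size is a power of two). */
  const uintptr_t mask = (uintptr_t)page - 1;
  const uintptr_t start = (uintptr_t)buf & ~mask;
  const size_t len = (size_t)((uintptr_t)buf + size - start);
  assert((start & mask) == 0);

#ifdef MADV_POPULATE_WRITE
  /* Assumed hint: prefault the pages writable. Older kernels reject it. */
  if(madvise((void *)start, len, MADV_POPULATE_WRITE) == 0) return 0;
  fprintf(stderr, "[preheat] madvise failed: %s, touching pages instead\n",
          strerror(errno));
#else
  (void)len; /* hint not known to these headers, only the fallback is built */
#endif

  /* Fallback: read and rewrite one byte per page, which faults the page in
   * for writing without changing its contents. */
  volatile unsigned char *p = (volatile unsigned char *)buf;
  for(size_t off = 0; off < size; off += (size_t)page) p[off] = p[off];
  p[size - 1] = p[size - 1];
  return 1;
}
```

In a readback path, a call such as `preheat_host_memory(host_ptr, width * height * bpp)` would go immediately before `clEnqueueReadImage()`; `host_ptr`, `width`, `height` and `bpp` are placeholder names here.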
a589317 to
9e36df0
|
In fact, no. If you provide a "-d opencl -d pipe" log when you observe this, maybe I can spot something "picky". |
That would be pinned/mapped mode and lots of mem transfers. |
|
Never mind, I was dumb, I didn't turn on the Before: After: About 10 seconds --> 0.8 second. This PR does in fact fix the problem and I'll be using it on my DGX Spark from now on. |
|
Fine. You might still want to check if buffer reading should use the same preheating. That would be Would be good to know so we can possibly backreport to nvidia folks. |
|
@TurboGit would you revert this right now please! Nothing for master in this form... I didn't notice I pushed any merge button and wasn't yet aware of my rights to do so. |
|
@victoryforce TIA |
|
@jenshannoschwalm Just above this, in the same line where the information that you "merged commit 475a9f7 into darktable-org:master" is, there is a "Revert" button. You definitely have the rights to do this. Maybe it would be better if you did the revert yourself, so that we don't doubt whether we understood you correctly? |
|
Did so. |


@dllu as you did the initial work on #21069, would you be able to test this?
@karolherbst would you mind reviewing? I am a bit worried about the `#define __USE_MISC` I had to add here; otherwise only `posix_madvise()` was available (having less precise hints).
I tested here on Strix Halo / Fedora 44, so lots of unified memory. No problems or instability; I could not see performance changes for either rusticl or ROCm.