nvCOMP decompress prefix-sum offsets: replace Python for loop with np.cumsum #1950

@brendancol

Where

xrspatial/geotiff/_gpu_decode.py:1192-1196 inside _try_nvcomp_batch_decompress.

What

The batched host->device upload path computes per-tile offsets via a Python for loop:

comp_sizes_list = [len(t) for t in raw_tiles]
comp_offsets_h = np.zeros(n_tiles, dtype=np.int64)
for i in range(1, n_tiles):
    comp_offsets_h[i] = comp_offsets_h[i - 1] + comp_sizes_list[i - 1]
total_comp = sum(comp_sizes_list)

The sibling batched D2H helper _batched_d2h_to_bytes at line 924 uses the vectorised form:

offsets = np.empty(len(d_tiles) + 1, dtype=np.int64)
offsets[0] = 0
np.cumsum(sizes, out=offsets[1:])

Both helpers compute the same prefix sum; aligning the decompress side keeps the codebase consistent and trims interpreter overhead.
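The two layouts differ only in length: the decompress path stores n_tiles start offsets, while the sibling helper stores n_tiles + 1 offsets whose last entry is the total size. A minimal sketch of that relationship, using made-up tile sizes (the `sizes` values are illustrative and not taken from the codebase):

```python
import numpy as np

# Hypothetical compressed tile sizes in bytes (stand-ins for len(t) per tile).
sizes = np.array([5, 3, 8, 2], dtype=np.int64)
n_tiles = len(sizes)

# Decompress-path layout: n_tiles start offsets (exclusive prefix sum).
starts = np.zeros(n_tiles, dtype=np.int64)
for i in range(1, n_tiles):
    starts[i] = starts[i - 1] + sizes[i - 1]

# Sibling-helper layout: n_tiles + 1 offsets, with the total in the last slot.
offsets = np.empty(n_tiles + 1, dtype=np.int64)
offsets[0] = 0
np.cumsum(sizes, out=offsets[1:])

print(starts.tolist())   # [0, 5, 8, 16]
print(offsets.tolist())  # [0, 5, 8, 16, 18]
```

Dropping the last entry of the n_tiles + 1 form gives the decompress-path offsets, and that last entry equals total_comp.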

Why it matters

Microbench on 1024 tiles with random sizes:

Method                  Time (us)
Python for loop         84
np.cumsum with out=     21

That is a 3.9x speedup, but it saves only ~60 us per nvCOMP decompress call, which is too small to show up as a perf bottleneck. The motivation is consistency with the sibling helper at line 924 and with the existing np.cumsum(comp_sizes, out=offsets[1:]) pattern in _nvcomp_batch_compress at line 2572.
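A comparison like this could be reproduced with a harness along the following lines. This is a sketch, not the script behind the numbers above: the function names `offsets_loop` and `offsets_cumsum`, the size range, and the seed are assumptions, and it times the size-gathering step together with the prefix sum, so absolute timings will differ.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n_tiles = 1024
# Hypothetical stand-in for raw_tiles: zero-filled byte strings of random length.
raw_tiles = [bytes(int(n)) for n in rng.integers(1, 4096, size=n_tiles)]

def offsets_loop(raw_tiles):
    # Current form: Python loop plus a separate sum() pass.
    n = len(raw_tiles)
    sizes = [len(t) for t in raw_tiles]
    offs = np.zeros(n, dtype=np.int64)
    for i in range(1, n):
        offs[i] = offs[i - 1] + sizes[i - 1]
    return offs, sum(sizes)

def offsets_cumsum(raw_tiles):
    # Proposed form: one np.cumsum call writing into offs[1:].
    n = len(raw_tiles)
    sizes = np.fromiter((len(t) for t in raw_tiles), dtype=np.int64, count=n)
    offs = np.zeros(n, dtype=np.int64)
    if n > 1:
        np.cumsum(sizes[:-1], out=offs[1:])
    return offs, int(sizes.sum())

for fn in (offsets_loop, offsets_cumsum):
    # Best of 5 repeats, 200 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(raw_tiles), number=200, repeat=5)) / 200
    print(f"{fn.__name__}: {best * 1e6:.1f} us")
```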

Suggested fix

comp_sizes_arr = np.fromiter((len(t) for t in raw_tiles), dtype=np.int64, count=n_tiles)
comp_offsets_h = np.zeros(n_tiles, dtype=np.int64)
if n_tiles > 1:
    np.cumsum(comp_sizes_arr[:-1], out=comp_offsets_h[1:])
total_comp = int(comp_sizes_arr.sum())

This drops the Python loop and the separate sum() pass over comp_sizes_list.
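The vectorised form can be checked against a pure-Python reference at the edge cases (zero tiles, one tile, several tiles). The wrapper `prefix_offsets` below is hypothetical and exists only to make the snippet testable in isolation; it is not a function in the codebase.

```python
from itertools import accumulate
import numpy as np

def prefix_offsets(raw_tiles):
    """Hypothetical wrapper around the suggested fix; returns (offsets, total)."""
    n_tiles = len(raw_tiles)
    comp_sizes_arr = np.fromiter(
        (len(t) for t in raw_tiles), dtype=np.int64, count=n_tiles
    )
    comp_offsets_h = np.zeros(n_tiles, dtype=np.int64)
    if n_tiles > 1:
        np.cumsum(comp_sizes_arr[:-1], out=comp_offsets_h[1:])
    return comp_offsets_h, int(comp_sizes_arr.sum())

for tiles in ([], [b"abc"], [b"abcde", b"xyz", b"12345678", b"ok"]):
    offs, total = prefix_offsets(tiles)
    sizes = [len(t) for t in tiles]
    # Reference: exclusive prefix sum via itertools.accumulate (Python 3.8+).
    expected = list(accumulate(sizes[:-1], initial=0)) if sizes else []
    assert offs.tolist() == expected
    assert total == sum(sizes)
```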

Severity

LOW. The fix is a few lines and aligns with the codebase's other batched-transfer helpers; the wall-time delta on a typical nvCOMP decode is below 0.1% of the kernel runtime.

Metadata

Labels: enhancement, gpu, performance
