Where
`xrspatial/geotiff/_gpu_decode.py:1192-1196`, inside `_try_nvcomp_batch_decompress`.
What
The batched host->device upload path computes per-tile offsets via a Python `for` loop:

```python
comp_sizes_list = [len(t) for t in raw_tiles]
comp_offsets_h = np.zeros(n_tiles, dtype=np.int64)
for i in range(1, n_tiles):
    comp_offsets_h[i] = comp_offsets_h[i - 1] + comp_sizes_list[i - 1]
total_comp = sum(comp_sizes_list)
```
The sibling batched D2H helper `_batched_d2h_to_bytes` at line 924 uses the vectorised form:

```python
offsets = np.empty(len(d_tiles) + 1, dtype=np.int64)
offsets[0] = 0
np.cumsum(sizes, out=offsets[1:])
```
Both helpers compute the same prefix sum; aligning the decompress side keeps the codebase consistent and trims interpreter overhead.
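As a quick illustration (with synthetic sizes standing in for the real `raw_tiles` lengths), the loop and the `cumsum`-into-a-slice form produce identical offsets:

```python
import numpy as np

# Synthetic stand-in for the per-tile compressed sizes.
sizes = np.array([10, 25, 7, 42], dtype=np.int64)
n = len(sizes)

# Loop form, as in _try_nvcomp_batch_decompress today.
offsets_loop = np.zeros(n, dtype=np.int64)
for i in range(1, n):
    offsets_loop[i] = offsets_loop[i - 1] + sizes[i - 1]

# Vectorised form, shaped like the suggested fix below.
offsets_vec = np.zeros(n, dtype=np.int64)
np.cumsum(sizes[:-1], out=offsets_vec[1:])

assert np.array_equal(offsets_loop, offsets_vec)  # both are [0, 10, 35, 42]
```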
Why it matters
Microbench on 1024 tiles with random sizes:
| Method | Time (us) |
| --- | --- |
| Python for loop | 84 |
| `np.cumsum` with `out=` | 21 |
That is a 3.9x speedup, but the absolute saving is only ~60us per nvCOMP decompress call -- low enough that it does not show up as a perf bottleneck. The motivation is therefore consistency: the sibling helper at line 924 and the existing `np.cumsum(comp_sizes, out=offsets[1:])` pattern in `_nvcomp_batch_compress` at line 2572 already use the vectorised form.
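For reference, a minimal harness along these lines reproduces the comparison; the size distribution and `timeit` setup are assumptions, not the original benchmark script:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
n_tiles = 1024
sizes = rng.integers(1, 1 << 16, size=n_tiles, dtype=np.int64)  # assumed size range

def loop_offsets():
    offsets = np.zeros(n_tiles, dtype=np.int64)
    for i in range(1, n_tiles):
        offsets[i] = offsets[i - 1] + sizes[i - 1]
    return offsets

def cumsum_offsets():
    offsets = np.zeros(n_tiles, dtype=np.int64)
    np.cumsum(sizes[:-1], out=offsets[1:])
    return offsets

for fn in (loop_offsets, cumsum_offsets):
    per_call = timeit.timeit(fn, number=1000) / 1000
    print(f"{fn.__name__}: {per_call * 1e6:.1f} us")
```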
Suggested fix
```python
comp_sizes_arr = np.fromiter((len(t) for t in raw_tiles), dtype=np.int64, count=n_tiles)
comp_offsets_h = np.zeros(n_tiles, dtype=np.int64)
if n_tiles > 1:
    np.cumsum(comp_sizes_arr[:-1], out=comp_offsets_h[1:])
total_comp = int(comp_sizes_arr.sum())
```
This drops the Python loop and the second pass over `comp_sizes_list` for `sum()`.
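One design note on the suggested fix: `np.fromiter` with an explicit `count=` preallocates the output array in one shot, which is why the intermediate Python list can be dropped entirely rather than just wrapped in `np.array(...)`. The `if n_tiles > 1` guard simply skips the cumsum when there is nothing to accumulate.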
Severity
LOW. The fix is a few lines and aligns with the codebase's other batched-transfer helpers; the wall-time delta on a typical nvCOMP decode is below 0.1% of the kernel runtime.