
GPU writer overview loop: redundant cupy.copy() before in-place NaN rewrite #1948

@brendancol

Description

Where

xrspatial/geotiff/_writers/gpu.py:498 in write_geotiff_gpu's COG overview generation loop.

What

After each call to make_overview_gpu, the loop rewrites NaN cells back to the nodata sentinel via:

current = make_overview_gpu(current, method=overview_resampling, nodata=nodata)
cumulative_factor *= 2
if (nodata is not None
        and np.dtype(str(current.dtype)).kind == 'f'
        and not np.isnan(float(nodata))):
    nan_mask = cupy.isnan(current)
    if bool(nan_mask.any().item()):
        current = current.copy()
        current[nan_mask] = np.dtype(
            str(current.dtype)).type(nodata)
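The guard only fires for float rasters with a finite (non-NaN) nodata sentinel. A runnable sketch of the same predicate, with numpy standing in for cupy (the helper name is illustrative, not from the codebase):

```python
import numpy as np

def needs_nan_rewrite(arr, nodata):
    """Mirror the guard from the overview loop: only float dtypes
    with a non-NaN nodata sentinel need the NaN -> sentinel rewrite."""
    return (nodata is not None
            and np.dtype(str(arr.dtype)).kind == 'f'
            and not np.isnan(float(nodata)))

f32 = np.zeros((2, 2), dtype=np.float32)
i16 = np.zeros((2, 2), dtype=np.int16)
print(needs_nan_rewrite(f32, -9999.0))  # True: float dtype, finite sentinel
print(needs_nan_rewrite(f32, np.nan))   # False: NaN already is the sentinel
print(needs_nan_rewrite(i16, -9999))    # False: integer dtype cannot hold NaN
```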

make_overview_gpu returns a freshly allocated cupy buffer at every call site:

  • 2D path: _block_reduce_2d_gpu returns the result of cupy.nanmean / cupy.nanmin / cupy.around(...).astype(...) / cropped[::2, ::2].copy() / cupy.asarray(cpu_result) -- all fresh allocations.
  • 3D path: cupy.stack(bands, axis=2) -- fresh allocation.

Nothing else aliases the buffer between the make_overview_gpu return and the in-place NaN rewrite, so the explicit current = current.copy() allocates a second chunk-sized GPU buffer just to mutate it.
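The fresh-allocation claim is easy to sanity-check with numpy equivalents of those reducers (illustrative shapes; numpy used here so the sketch runs without a GPU):

```python
import numpy as np

src = np.arange(16, dtype=np.float32).reshape(4, 4)

# Each reducer in the 2D path returns a freshly allocated buffer:
mean2 = np.nanmean(src.reshape(2, 2, 2, 2), axis=(1, 3))  # 2x2 block-mean downsample
near = src[::2, ::2].copy()                               # nearest-neighbour downsample
print(np.shares_memory(src, mean2))  # False: new allocation
print(np.shares_memory(src, near))   # False: .copy() detaches the strided view

# Without the .copy(), the strided slice WOULD alias src -- which is
# why the reducer copies before returning:
print(np.shares_memory(src, src[::2, ::2]))  # True

# The 3D path's stack is likewise a fresh allocation:
stacked = np.stack([src, src], axis=2)
print(np.shares_memory(src, stacked))  # False
```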

Why it matters

For an 8192x8192 float32 raster with 4 auto-generated overview levels, the extra allocations sum to roughly:

(2048x2048 + 1024x1024 + 512x512 + 256x256) * 4 bytes ≈ 21 MiB

per write. Modest, but the pattern is identical to the one fixed in #1934 for _apply_nodata_mask_gpu: replace the copy() + indexed write with cupy.putmask(current, nan_mask, sentinel) so the existing buffer is mutated in place and one chunk-sized device allocation per overview level is skipped.
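A quick check of that arithmetic (overview sizes taken from the estimate above):

```python
# Redundant copies for an 8192x8192 float32 raster with overviews
# at 2048, 1024, 512 and 256 px per side (sizes from the report):
sides = [2048, 1024, 512, 256]
extra_bytes = sum(s * s for s in sides) * 4  # float32 = 4 bytes/px
print(extra_bytes)                 # 22282240
print(round(extra_bytes / 2**20))  # 21 (MiB) per write
```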

Suggested fix

current = make_overview_gpu(current, method=overview_resampling, nodata=nodata)
cumulative_factor *= 2
if (nodata is not None
        and np.dtype(str(current.dtype)).kind == 'f'
        and not np.isnan(float(nodata))):
    nan_mask = cupy.isnan(current)
    if bool(nan_mask.any().item()):
        cupy.putmask(current, nan_mask,
                     np.dtype(str(current.dtype)).type(nodata))

The fix mirrors the in-place sentinel rewrite already applied to the freshly allocated GPU buffer at line 426 in the NaN->sentinel branch above, and the _apply_nodata_mask_gpu rewrite in #1934.
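As a sanity check, a minimal numpy sketch (numpy's putmask has the same semantics as cupy's) showing the two rewrites produce identical results:

```python
import numpy as np

nodata = -9999.0
arr = np.array([[1.0, np.nan], [np.nan, 4.0]], dtype=np.float32)
nan_mask = np.isnan(arr)

# Old pattern: copy, then indexed write (second chunk-sized allocation).
copied = arr.copy()
copied[nan_mask] = np.float32(nodata)

# Proposed pattern: mutate the existing buffer in place.
inplace = arr.copy()  # stand-in for the fresh make_overview_gpu result
np.putmask(inplace, nan_mask, np.float32(nodata))

print(np.array_equal(copied, inplace))  # True: same result, one fewer allocation
```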

Severity

LOW. Only writes that generate a multi-level pyramid with a float dtype, NaN cells, and a non-NaN nodata sentinel hit this path, and the extra allocations are bounded by overview size (each smaller than the full raster). The fix is a one-line cupy.putmask substitution with the same correctness contract.

Metadata

    Labels

        enhancement -- New feature or request
        gpu -- CuPy / CUDA GPU support
        performance -- PR touches performance-sensitive code
