
GPU writer overview loop: redundant cupy.copy() before in-place NaN rewrite #1948

@brendancol

Description

Where

xrspatial/geotiff/_writers/gpu.py:498 in write_geotiff_gpu's COG overview generation loop.

What

After each call to make_overview_gpu, the loop rewrites NaN cells back to the nodata sentinel via:

current = make_overview_gpu(current, method=overview_resampling, nodata=nodata)
cumulative_factor *= 2
if (nodata is not None
        and np.dtype(str(current.dtype)).kind == 'f'
        and not np.isnan(float(nodata))):
    nan_mask = cupy.isnan(current)
    if bool(nan_mask.any().item()):
        current = current.copy()
        current[nan_mask] = np.dtype(
            str(current.dtype)).type(nodata)
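The guard only fires for float rasters with a finite (non-NaN) nodata sentinel. A runnable sketch of the same predicate, with numpy standing in for cupy (the helper name is illustrative, not from the codebase):

```python
import numpy as np

def needs_nan_rewrite(arr, nodata):
    """Mirror the guard from the overview loop: only float dtypes
    with a non-NaN nodata sentinel need the NaN -> sentinel rewrite."""
    return (nodata is not None
            and np.dtype(str(arr.dtype)).kind == 'f'
            and not np.isnan(float(nodata)))

f32 = np.zeros((2, 2), dtype=np.float32)
i16 = np.zeros((2, 2), dtype=np.int16)
print(needs_nan_rewrite(f32, -9999.0))  # True: float dtype, finite sentinel
print(needs_nan_rewrite(f32, np.nan))   # False: NaN already is the sentinel
print(needs_nan_rewrite(i16, -9999))    # False: integer dtype cannot hold NaN
```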

make_overview_gpu returns a freshly allocated cupy buffer at every call site:

  • 2D path: _block_reduce_2d_gpu returns the result of cupy.nanmean / cupy.nanmin / cupy.around(...).astype(...) / cropped[::2, ::2].copy() / cupy.asarray(cpu_result) -- all fresh allocations.
  • 3D path: cupy.stack(bands, axis=2) -- fresh allocation.

Nothing else aliases the buffer between the make_overview_gpu return and the in-place NaN rewrite, so the explicit current = current.copy() allocates a second chunk-sized GPU buffer just to mutate it.
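The fresh-allocation claim is easy to sanity-check with numpy equivalents of those reducers (illustrative shapes; numpy used here so the sketch runs without a GPU):

```python
import numpy as np

src = np.arange(16, dtype=np.float32).reshape(4, 4)

# Each reducer in the 2D path returns a freshly allocated buffer:
mean2 = np.nanmean(src.reshape(2, 2, 2, 2), axis=(1, 3))  # 2x2 block-mean downsample
near = src[::2, ::2].copy()                               # nearest-neighbour downsample
print(np.shares_memory(src, mean2))  # False: new allocation
print(np.shares_memory(src, near))   # False: .copy() detaches the strided view

# Without the .copy(), the strided slice WOULD alias src -- which is
# why the reducer copies before returning:
print(np.shares_memory(src, src[::2, ::2]))  # True

# The 3D path's stack is likewise a fresh allocation:
stacked = np.stack([src, src], axis=2)
print(np.shares_memory(src, stacked))  # False
```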

Why it matters

For an 8192x8192 float32 raster with 4 auto-generated overview levels, the extra allocations sum to roughly:

(2048x2048 + 1024x1024 + 512x512 + 256x256) * 4 bytes ≈ 21 MiB

per write. Modest, but the pattern is identical to the one fixed in #1934 for _apply_nodata_mask_gpu: replace the copy() + indexed write with cupy.putmask(current, nan_mask, sentinel) so the existing buffer is mutated in place and one chunk-sized device allocation per overview level is skipped.
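A quick check of that arithmetic (overview sizes taken from the estimate above):

```python
# Redundant copies for an 8192x8192 float32 raster with overviews
# at 2048, 1024, 512 and 256 px per side (sizes from the report):
sides = [2048, 1024, 512, 256]
extra_bytes = sum(s * s for s in sides) * 4  # float32 = 4 bytes/px
print(extra_bytes)                 # 22282240
print(round(extra_bytes / 2**20))  # 21 (MiB) per write
```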

Suggested fix

current = make_overview_gpu(current, method=overview_resampling, nodata=nodata)
cumulative_factor *= 2
if (nodata is not None
        and np.dtype(str(current.dtype)).kind == 'f'
        and not np.isnan(float(nodata))):
    nan_mask = cupy.isnan(current)
    if bool(nan_mask.any().item()):
        cupy.putmask(current, nan_mask,
                     np.dtype(str(current.dtype)).type(nodata))

The fix mirrors the in-place sentinel rewrite already applied to the freshly allocated GPU buffer at line 426 in the NaN->sentinel branch above, and the _apply_nodata_mask_gpu rewrite in #1934.
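As a sanity check, a minimal numpy sketch (numpy's putmask has the same semantics as cupy's) showing the two rewrites produce identical results:

```python
import numpy as np

nodata = -9999.0
arr = np.array([[1.0, np.nan], [np.nan, 4.0]], dtype=np.float32)
nan_mask = np.isnan(arr)

# Old pattern: copy, then indexed write (second chunk-sized allocation).
copied = arr.copy()
copied[nan_mask] = np.float32(nodata)

# Proposed pattern: mutate the existing buffer in place.
inplace = arr.copy()  # stand-in for the fresh make_overview_gpu result
np.putmask(inplace, nan_mask, np.float32(nodata))

print(np.array_equal(copied, inplace))  # True: same result, one fewer allocation
```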

Severity

LOW. Only writes that generate a multi-level pyramid with a float dtype, NaN cells, and a non-NaN nodata sentinel hit this path, and the extra allocations are bounded by overview size (each smaller than the full raster). The fix is a one-line cupy.putmask substitution with the same correctness contract.

Metadata

    Labels

        enhancement -- New feature or request
        gpu -- CuPy / CUDA GPU support
        performance -- PR touches performance-sensitive code
