Skip to content

geotiff: to_geotiff turns no-georef int64 pixel coords into a fake transform on round-trip #1949

@brendancol

Description

@brendancol

A round-trip of a no-georef TIFF through open_geotiff -> to_geotiff -> open_geotiff silently injects a synthetic attrs['transform'] and switches the y/x coord dtype from int64 to float64.

The read side documents the no-georef contract: integer pixel coords [0, 1, 2, ...] with int64 dtype, no attrs['transform'] (#1710, #1753). The write side does not honour the same signal -- _coords_to_transform infers a unit GeoTransform(origin_x=-0.5, origin_y=-0.5, pixel_width=1.0, pixel_height=1.0) from [0, 1, 2, ...] coords and writes it as real ModelPixelScale / ModelTiepoint tags. On the next read the file looks like a georeferenced raster and the no-georef branch never fires.

Reproduction

import numpy as np
import tempfile, os
import tifffile
from xrspatial.geotiff import open_geotiff, to_geotiff

with tempfile.TemporaryDirectory() as td:
    src = os.path.join(td, "src.tif")
    arr = np.arange(20, dtype=np.float32).reshape(4, 5)
    tifffile.imwrite(src, arr, photometric='minisblack', planarconfig='contig')

    da = open_geotiff(src)
    print(da.coords['y'].dtype, da.coords['y'].values)
    # int64 [0 1 2 3]
    print('transform' in da.attrs)
    # False

    out = os.path.join(td, "out.tif")
    to_geotiff(da, out, compression='none')

    da2 = open_geotiff(out)
    print(da2.coords['y'].dtype, da2.coords['y'].values)
    # float64 [0. 1. 2. 3.]
    print(da2.attrs.get('transform'))
    # (1.0, 0.0, -0.5, 0.0, 1.0, -0.5)

Impact

  • Cat 2 (coords): coord dtype silently changes between input and output (HIGH per sweep criteria).
  • Cat 1 (attrs): writer adds an inferred transform attr the input never had (MEDIUM).
  • Downstream code that branches on 'transform' in da.attrs to detect georef status will mis-classify the round-tripped file as georeferenced.

Cause

xrspatial/geotiff/_writers/eager.py:519, _writers/eager.py:839, _writers/gpu.py:350 all fall back to _coords_to_transform(data) when attrs.get('transform') is missing. _coords_to_transform (_coords.py:235) reads da.coords[xdim].values, computes pixel_width = float(x[1] - x[0]), and returns GeoTransform(...) -- with no check for the integer-dtype signal that the read side uses to mean "no georef".

Affects every writer path: to_geotiff eager, to_geotiff dask streaming (via the same fallback), to_geotiff(vrt=...) per-tile, write_geotiff_gpu. The GPU dask+cupy writer goes through the same helper.

Proposed fix

_coords_to_transform returns None when either spatial coord array has an integer dtype (which only happens from the read-side no-georef fallback). Float coord arrays continue to produce a GeoTransform. Callers already handle None correctly: write() skips the geokey emission when geo_transform is None.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions