
geotiff: parallelise tile decode in _fetch_decode_cog_http_tiles (#1980) #1981

Open
brendancol wants to merge 1 commit into xarray-contrib:main from brendancol:deep-sweep-performance-geotiff-2026-05-15-1778875947


@brendancol
Contributor

Summary

  • The HTTP COG reader fetched tiles concurrently (#1480, #1487: "Fetch COG tiles concurrently in HTTP path to mask RTT") but decoded them sequentially in a Python for loop. The sibling local-file path in the same module (_read_tiles) already parallelises decode whenever tile_pixels >= 64 * 1024 and n_tiles > 1. The HTTP path was structurally similar but never adopted the same gate.
  • Mirror the local-path threshold and ThreadPoolExecutor in _fetch_decode_cog_http_tiles. Codec extensions (deflate, zstd, LZW) release the GIL inside their C implementations, so the pool overlaps decode work across cores. The placement loop that writes pixels into result stays serial to avoid contended writes.
  • Five new tests in test_cog_http_parallel_decode_2026_05_15.py: parallel and serial round-trip correctness, a structural check that the pool is instantiated above the threshold, a check that the single-tile path skips the decode pool, and a check that the _decode_strip_or_tile call count equals n_tiles.

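A worked example of the gate follows. It assumes that tile_pixels means tile width times tile height; the PR does not say whether samples per pixel are included. use_parallel_decode is a hypothetical helper, not a function in xrspatial.

```python
# Hypothetical helper mirroring the gate described above. The 64 * 1024
# constant comes from the PR text; the function name is illustrative only.
PARALLEL_DECODE_MIN_PIXELS = 64 * 1024


def use_parallel_decode(tile_width: int, tile_height: int, n_tiles: int) -> bool:
    # Assumption: tile_pixels = width * height (samples per pixel ignored).
    tile_pixels = tile_width * tile_height
    # A pool only pays off when tiles are large and there is more than one.
    return tile_pixels >= PARALLEL_DECODE_MIN_PIXELS and n_tiles > 1


# 256 x 256 tiles: 65,536 pixels, exactly at the threshold.
print(use_parallel_decode(256, 256, n_tiles=4))  # True
# 128 x 128 tiles: 16,384 pixels, below the threshold.
print(use_parallel_decode(128, 128, n_tiles=4))  # False
# A single large tile gains nothing from a pool.
print(use_parallel_decode(512, 512, n_tiles=1))  # False
```

Under that reading, a 256-pixel tile sits exactly on the threshold (256 × 256 = 64 × 1024). This may explain why the parallel-branch test uses tile_size=256.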
Pass 10 of /sweep-performance on the geotiff module. Closes #1980.

Test plan

  • New tests pass: python -m pytest xrspatial/geotiff/tests/test_cog_http_parallel_decode_2026_05_15.py.
  • Existing COG/HTTP suite passes: python -m pytest xrspatial/geotiff/tests/ -k 'cog or http' -> 262 passed.
  • Full geotiff suite passes apart from two pre-existing failures (test_predictor2_big_endian_gpu_1517, test_size_param_validation_gpu_vrt_1776::test_tile_size_positive_works) flagged in prior sweep notes.

The HTTP COG read path fetched tiles concurrently via
``read_ranges_coalesced`` (xarray-contrib#1480, xarray-contrib#1487) but then decoded them
sequentially in a Python for-loop. Local ``_read_tiles`` already
parallelises decode via ``ThreadPoolExecutor`` whenever
``tile_pixels >= 64 * 1024`` and ``n_tiles > 1`` (see _reader.py:2017);
the HTTP path was structurally similar but never adopted the same
gate, so wide windowed COG reads with many tiles left deflate/zstd
decoding single-threaded after a parallel fetch.

Mirror the local-file path's threshold and pool here. Codec
extensions release the GIL inside their C implementations, so a
``ThreadPoolExecutor`` overlaps decode work across cores. The
placement loop that writes pixels into ``result`` stays serial to
avoid contended writes.
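Below is a minimal sketch of that decode stage. It assumes deflate-compressed tiles decoded with Python's zlib, which releases the GIL while inflating, and an output array pre-allocated for the window. decode_tile and decode_and_place are hypothetical names, not xrspatial internals. The sketch omits predictors, other codecs, and the byte-order handling a real COG reader needs.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Threshold taken from the PR text; everything else here is illustrative.
PARALLEL_DECODE_MIN_PIXELS = 64 * 1024


def decode_tile(payload, tile_h, tile_w, dtype):
    # CPython's zlib releases the GIL around inflate(), so pool threads overlap.
    raw = zlib.decompress(payload)
    return np.frombuffer(raw, dtype=dtype).reshape(tile_h, tile_w)


def decode_and_place(payloads, positions, tile_h, tile_w, out, max_workers=None):
    """Decode compressed tiles and write them into ``out``.

    payloads[i] holds the deflate bytes of a full tile_h x tile_w tile whose
    top-left corner lands at positions[i] = (row, col) in ``out``.
    """
    def decode(payload):
        return decode_tile(payload, tile_h, tile_w, out.dtype)

    if tile_h * tile_w >= PARALLEL_DECODE_MIN_PIXELS and len(payloads) > 1:
        # Executor.map yields results in input order, so decoded tiles stay
        # aligned with their positions.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            tiles = list(pool.map(decode, payloads))
    else:
        tiles = [decode(p) for p in payloads]

    # Placement is serial: only this thread writes into ``out``.
    for (row, col), tile in zip(positions, tiles):
        # Clip edge tiles that extend past the output window.
        h = min(tile_h, out.shape[0] - row)
        w = min(tile_w, out.shape[1] - col)
        out[row:row + h, col:col + w] = tile[:h, :w]
    return out


# Round trip: split a 512 x 512 raster into four 256 x 256 deflate tiles.
TILE = 256
src = np.arange(512 * 512, dtype=np.uint32).reshape(512, 512)
positions = [(r, c) for r in (0, TILE) for c in (0, TILE)]
payloads = [zlib.compress(src[r:r + TILE, c:c + TILE].tobytes()) for r, c in positions]
result = decode_and_place(payloads, positions, TILE, TILE, np.zeros_like(src))
assert np.array_equal(result, src)
```

Only the decode step runs in the pool. Each decode returns its own array, and the single-threaded loop at the end is the only code that touches out, so no locking is needed.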

Pass 10 of the geotiff performance sweep.

Tests in test_cog_http_parallel_decode_2026_05_15.py:
- parallel-branch end-to-end correctness (deflate, tile_size=256)
- serial-branch end-to-end correctness (tile_size=128, single tile)
- ThreadPoolExecutor instantiation above the threshold
- single-tile path skips the decode pool
- _decode_strip_or_tile invoked exactly once per placement
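Two of these tests are structural: one counts decode calls, the other checks whether a pool is created. The sketch below shows how such checks could be written. It uses dependency injection instead of monkeypatching the reader, so it does not reflect how the PR's tests actually hook into _fetch_decode_cog_http_tiles. decode_all, CountingDecode and RecordingExecutor are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

PARALLEL_DECODE_MIN_PIXELS = 64 * 1024


def decode_all(payloads, tile_pixels, decode, executor_cls=ThreadPoolExecutor):
    # Hypothetical stand-in for the decode stage; injecting `decode` and
    # `executor_cls` makes both branches observable from a test.
    if tile_pixels >= PARALLEL_DECODE_MIN_PIXELS and len(payloads) > 1:
        with executor_cls() as pool:
            return list(pool.map(decode, payloads))
    return [decode(p) for p in payloads]


class CountingDecode:
    """Thread-safe call counter standing in for _decode_strip_or_tile."""

    def __init__(self):
        self.calls = 0
        self._lock = threading.Lock()

    def __call__(self, payload):
        with self._lock:
            self.calls += 1
        return payload


class RecordingExecutor(ThreadPoolExecutor):
    """ThreadPoolExecutor that counts how many times it is instantiated."""

    instances = 0

    def __init__(self, *args, **kwargs):
        type(self).instances += 1
        super().__init__(*args, **kwargs)


def test_parallel_branch_uses_pool_and_decodes_each_tile_once():
    RecordingExecutor.instances = 0
    decode = CountingDecode()
    out = decode_all(list(range(8)), 256 * 256, decode, RecordingExecutor)
    assert out == list(range(8))  # order preserved
    assert decode.calls == 8  # one decode per tile
    assert RecordingExecutor.instances == 1


def test_single_tile_skips_pool():
    RecordingExecutor.instances = 0
    decode = CountingDecode()
    decode_all([b"tile"], 256 * 256, decode, RecordingExecutor)
    assert decode.calls == 1
    assert RecordingExecutor.instances == 0


test_parallel_branch_uses_pool_and_decodes_each_tile_once()
test_single_tile_skips_pool()
```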
github-actions bot added the performance label (PR touches performance-sensitive code) on May 15, 2026.

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf(geotiff): parallelise tile decode in _fetch_decode_cog_http_tiles

1 participant