feat: unified chunk grid with rectilinear chunk/shard support #3802
maxrjones wants to merge 123 commits into zarr-developers:main
This reverts commit 9c0f582.
…metadata (#7)

* chore: simplify sharding codec validation against varying chunk grid metadata

* test: restore test strength
…chunk grid (#8)

* refactor: allow regular-style chunk grid declaration for rectilinear chunk grid

The rectilinear chunk grid spec allows bare integers per dimension (meaning "regular step size"), distinct from explicit single-element edge lists. This commit widens `RectilinearChunkGrid.chunk_shapes` to `tuple[int | tuple[int, ...], ...]` so bare ints are preserved for faithful JSON round-tripping.

Additionally:
- unifies `_validate_chunk_shapes` to handle both regular and rectilinear validation; `_parse_chunk_shape` now delegates to it
- adds `from_sizes` method to `ChunkGrid`, accepting `int | Sequence[int]` per dimension
- removes `from_regular` and `from_rectilinear` methods from `ChunkGrid`
- removes `parse_chunk_grid` from `chunk_grids.py` (JSON → ChunkGrid shortcut that bypassed the metadata layer)
- removes `serialize_chunk_grid`, `_infer_chunk_grid_name`, and serialization helpers from `chunk_grids.py` (ChunkGrid never needs to be serialized; metadata DTOs handle it)
- renames `parse_chunk_grid` in `v3.py` to `parse_chunk_grid_metadata` to disambiguate
- moves the rectilinear feature flag to `RectilinearChunkGrid.__post_init__`
- simplifies sharding codec validation into a single divisibility check for both regular and rectilinear grids
- updates `validate_rectilinear_edges` to skip bare-int dimensions
- refactors chunk grid tests to functional style with parametrization
- adds docstrings to all test functions

* chore: remove .claude

* refactor: rename chunk_grid parsing function

---------

Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>
Co-authored-by: Davis Bennett <davis.v.bennett@gmail.com>
            if spec is not None:
                yield spec

    def all_chunk_coords(
imo the name all_chunk_coords doesn't convey that this is an iterator, but iter_chunk_coords would. if we need all_chunk_coords for backwards compat, can we deprecate that method and make it call iter_chunk_coords instead?
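A minimal sketch of what that deprecation could look like. `GridSketch` and its `chunks_per_dim` attribute are hypothetical stand-ins, not the PR's `ChunkGrid`; only the two method names come from the comment above.

```python
import warnings
from collections.abc import Iterator
from itertools import product


class GridSketch:
    """Hypothetical stand-in; only the chunk count per dimension is modelled."""

    def __init__(self, chunks_per_dim: tuple[int, ...]) -> None:
        self.chunks_per_dim = chunks_per_dim

    def iter_chunk_coords(self) -> Iterator[tuple[int, ...]]:
        """Lazily yield every chunk coordinate, last dimension varying fastest."""
        return product(*(range(n) for n in self.chunks_per_dim))

    def all_chunk_coords(self) -> Iterator[tuple[int, ...]]:
        """Deprecated alias kept for backwards compatibility."""
        warnings.warn(
            "all_chunk_coords is deprecated; use iter_chunk_coords instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this wrapper
        )
        return self.iter_chunk_coords()
```

Existing callers keep working and see a `DeprecationWarning`, while new code can use the iterator-style name directly.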
            if spec is not None:
                yield spec.slices

    def get_nchunks(self) -> int:
nit: i imagine this name is here for backwards compatibility, but nchunks (like ndim) feels like a tighter name.
    BytesLike = bytes | bytearray | memoryview
    ShapeLike = Iterable[int | np.integer[Any]] | int | np.integer[Any]
    ChunksLike = ShapeLike | Sequence[Sequence[int]] | None
what would we lose if we used Iterable[Iterable[int]]? we'd have to call tuple before we could get a length, but we could also accept more inputs.
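To make the trade-off concrete, here is a rough sketch of the normalization step that accepting `Iterable[Iterable[int]]` would require. The helper name `normalize_chunk_edges` and its validation rule are assumptions for illustration, not code from this PR.

```python
from collections.abc import Iterable


def normalize_chunk_edges(
    chunks: Iterable[Iterable[int]],
) -> tuple[tuple[int, ...], ...]:
    """Materialize nested iterables once so lengths and indexing are available."""
    normalized = tuple(tuple(int(edge) for edge in dim) for dim in chunks)
    for axis, dim in enumerate(normalized):
        # Assumed rule: every dimension needs at least one positive edge length.
        if not dim or any(edge <= 0 for edge in dim):
            raise ValueError(f"invalid chunk edges for dimension {axis}: {dim}")
    return normalized
```

Generators and other one-shot iterables are consumed exactly once here, after which `len()` works on the result.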
Thanks for the thorough reviews, @d-v-b! I'll make […] I'd prefer to address the naming/API comments on […]
Summary
This PR contains an alternative implementation of the rectilinear chunk grid extension, building on the work in #3534 (RLE helpers, validation logic, and test cases were directly adopted). While the core feature of variable-sized chunks is the same, the internal architecture differs in ways that impact extensibility, performance, and release safety.
I appreciate the patience of those who contributed to #3534, and everyone who's been waiting on this feature. I know it's frustrating to see a new PR after #3534 was so close. That PR provided fundamental components, and I hope people will see the value here. I really believe it is worth the churn for the following reasons:
Key differences from #3534
- `DimensionGrid` protocol (`FixedDimension`, `VaryingDimension`). Adding a new dimension type (e.g. `TiledDimension` for periodic patterns like days-per-month) requires implementing that protocol, with no changes to indexing, codecs, or the `ChunkGrid` class. A prototype was built to verify this.
- `VaryingDimension` uses precomputed prefix sums for O(log n) lookups via binary search. See https://github.com/maxrjones/zarr-chunk-grid-tests for a performance comparison.
- `zarr.config.set({'array.rectilinear_chunks': True})` (or `ZARR_ARRAY__RECTILINEAR_CHUNKS=True`), disabled by default. This gives downstream libraries time to adapt before the API is finalized, and us an opportunity to gracefully finalize the API.

Design document:

`docs/design/chunk-grid.md` covers the full design, rationale, and a suggested PR sequence for splitting this into reviewable increments, if needed.

Downstream POCs (all passing):
TODO:
- `docs/user-guide/*.md`
- `changes/`
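For readers skimming the key differences, the per-dimension protocol and the prefix-sum lookup can be illustrated roughly as follows. This is a simplified sketch, not the PR's implementation: the class names (`DimensionSketch`, `FixedSketch`, `VaryingSketch`), the `chunk_index` method, and the `locate` helper are all hypothetical.

```python
from bisect import bisect_right
from collections.abc import Sequence
from itertools import accumulate
from typing import Protocol


class DimensionSketch(Protocol):
    """Hypothetical per-dimension interface: map an array offset to a chunk index."""

    extent: int

    def chunk_index(self, offset: int) -> int: ...


class FixedSketch:
    """Regular dimension: every chunk has the same size, so lookup is a division."""

    def __init__(self, size: int, extent: int) -> None:
        self.size = size
        self.extent = extent

    def chunk_index(self, offset: int) -> int:
        if not 0 <= offset < self.extent:
            raise IndexError(offset)
        return offset // self.size


class VaryingSketch:
    """Rectilinear dimension: explicit edge lengths, looked up via prefix sums."""

    def __init__(self, edges: Sequence[int]) -> None:
        self.edges = tuple(edges)
        # ends[i] is the exclusive end offset of chunk i (running sum of edges).
        self.ends = tuple(accumulate(self.edges))
        self.extent = self.ends[-1] if self.ends else 0

    def chunk_index(self, offset: int) -> int:
        if not 0 <= offset < self.extent:
            raise IndexError(offset)
        # Binary search over the sorted prefix sums: O(log n) per lookup.
        return bisect_right(self.ends, offset)


def locate(dims: Sequence[DimensionSketch], point: Sequence[int]) -> tuple[int, ...]:
    """Grid-level lookup that only relies on the protocol, not on concrete types."""
    return tuple(dim.chunk_index(offset) for dim, offset in zip(dims, point))
```

Because `locate` depends only on `chunk_index`, a further dimension type could be added in this sketch without touching it, which mirrors the extensibility argument made above.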