
Conversation

@d33bs (Member) commented Jan 23, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added chunked pixel data support for efficient storage and retrieval of large images
    • Automatic chunk generation and metadata serialization
    • Plane extraction and dense array reconstruction from chunked data
    • Configurable chunk dimensions and ordering
  • Chores

    • Updated pre-commit hook dependencies
  • Tests

    • Added comprehensive test suite for chunked pixel operations

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai bot commented Jan 23, 2026

📝 Walkthrough

This pull request adds comprehensive chunking support to the OME-Arrow library, enabling storage and retrieval of TCZYX image data in chunked form. New metadata structures, helper functions, and integration points across the ingest, export, transform, and visualization modules allow users to build, store, and extract chunked pixel data alongside the plane-based representation.

Changes

  • Configuration & Hook Version — .pre-commit-config.yaml
    Bumped the ruff pre-commit hook from v0.14.13 to v0.14.14; no functional impact.
  • Core Schema Extension — src/ome_arrow/meta.py
    Added two new public fields to OME_ARROW_STRUCT: chunk_grid (a struct with the axis order and per-axis chunk sizes) and chunks (a list of chunk entries with coordinates, shapes, and pixel data).
  • Core Data Handling — src/ome_arrow/core.py
    Updated dict input normalization in __init__ to filter and align keys against the schema; clarified the OME-Zarr default chunking behavior (1, 1, ≤4, ≤512, ≤512) in the export docstring.
  • Ingest & Chunking Pipeline — src/ome_arrow/ingest.py
    Introduced the _normalize_chunk_shape() and _build_chunks_from_planes() helpers; extended to_ome_arrow() and from_numpy() with new parameters (chunk_shape, chunk_order, build_chunks) to support automatic chunk generation and embedding; relaxed _struct_matches_ome_fields() to treat chunk-related fields as optional.
  • Export & Plane Extraction — src/ome_arrow/export.py
    Added a public plane_from_chunks() function for extracting single (t, c, z) planes from chunked data; extended to_numpy() to handle full TCZYX reconstruction from chunks; added a default-chunking heuristic in to_ome_zarr(); normalized non-StructScalar inputs in to_ome_parquet() and to_ome_vortex().
  • Slicing & Transformation — src/ome_arrow/transform.py
    Extended slice_ome_arrow() to detect and rebuild chunk_grid and chunks when slicing chunked data, using the imported helpers _build_chunks_from_planes() and _normalize_chunk_shape().
  • Visualization — src/ome_arrow/view.py
    Refactored view_matplotlib() to use the new plane_from_chunks() helper instead of manual plane-extraction logic, simplifying and unifying plane access.
  • Test Fixtures & Coverage — tests/conftest.py, tests/test_chunks.py
    Added chunked metadata and pixel data to the example_correct_data fixture; introduced a new test module covering chunk reconstruction via to_numpy(), chunk building via to_ome_arrow(), and plane extraction via plane_from_chunks().
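
To make the new surface concrete, here is a minimal end-to-end sketch. The function and parameter names come from the summaries above, but the exact signatures are assumptions and may differ from the implementation:

    import numpy as np

    from ome_arrow.export import plane_from_chunks, to_numpy
    from ome_arrow.ingest import from_numpy

    # Small 5D TCZYX volume (shape and dtype chosen purely for illustration).
    arr = np.random.randint(0, 2**16, size=(1, 2, 4, 256, 256), dtype=np.uint16)

    # Ingest with automatic chunk generation (signature assumed).
    record = from_numpy(
        arr,
        chunk_shape=(2, 128, 128),  # per-axis (Z, Y, X) chunk sizes
        chunk_order="ZYX",          # the only order the review notes as supported
        build_chunks=True,          # embed chunks alongside the plane-based data
    )

    # Dense reconstruction from chunks, and single-plane extraction.
    dense = to_numpy(record)                          # full TCZYX array
    plane = plane_from_chunks(record, t=0, c=1, z=3)  # 2D (Y, X) array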

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Ingest as Ingest Pipeline
    participant Chunker as Chunk Builder
    participant Meta as OME-Arrow Struct
    participant Export as Export / Transform
    participant Extract as Plane Extractor

    User->>Ingest: Provide image planes + chunk params
    activate Ingest
    Ingest->>Chunker: Call _build_chunks_from_planes()
    activate Chunker
    Chunker->>Chunker: Split planes by chunk dimensions
    Chunker-->>Ingest: Return chunk list with pixel data
    deactivate Chunker
    Ingest->>Meta: Construct OME_ARROW_STRUCT<br/>with chunk_grid & chunks fields
    activate Meta
    Meta-->>Ingest: StructScalar with chunking metadata
    deactivate Meta
    deactivate Ingest
    Ingest-->>User: OME-Arrow record

    User->>Export: Export or slice chunked data
    activate Export
    Export->>Chunker: Rebuild chunks (if slicing)
    Chunker-->>Export: Updated chunk_grid & chunks
    Export-->>User: Exported format (Zarr/Parquet) or sliced record
    deactivate Export

    User->>Extract: Extract single plane
    activate Extract
    Extract->>Extract: plane_from_chunks(data, t, c, z)
    Extract->>Extract: Locate chunk(s) + validate indices
    Extract->>Extract: Assemble or retrieve pixel data
    Extract-->>User: 2D numpy array (Y, X)
    deactivate Extract

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The changes introduce a substantial feature spanning multiple modules (ingest, export, transform, view, meta) with moderate-to-high logic density. Review requires tracing the chunking flow across new helper functions, validating schema changes, and assessing integration points with the existing plane-based and export pipelines. Moderate heterogeneity (chunking logic is distributed but follows consistent patterns) and reasonable test coverage partially offset the scope.

Poem

🐰 Chunks in the Warren

Carrots once whole, now diced with care,
Grid-aligned pieces stored everywhere!
Planes and chunks in harmony dance,
From pixels to zones—a structured prance. 🥕✨

— A grateful rabbit, organized at last

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 73.08%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them to satisfy the coverage threshold (a sketch follows below).

✅ Passed checks (2 passed)
  • Description Check — ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed. The title accurately describes the main feature added across multiple files: a comprehensive pixel chunking implementation for the OME-Arrow specification, including new schema fields, functions, and tests.
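
To illustrate the kind of docstring the failing check asks for, a hedged sketch for the new plane_from_chunks() helper; the parameter names and return type are assumed from the walkthrough, not copied from src/ome_arrow/export.py:

    import numpy as np

    def plane_from_chunks(data, t: int, c: int, z: int) -> np.ndarray:
        """Extract a single 2D (Y, X) plane from chunked pixel data.

        Sketch only: `data` is assumed to be an OME-Arrow record carrying
        chunk_grid and chunks fields, and t/c/z select the plane along the
        T, C, and Z axes.
        """
        ...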

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @.pre-commit-config.yaml:
- Around lines 41-42: The pre-commit hook for repo
https://github.com/astral-sh/ruff-pre-commit was bumped to rev "v0.14.14",
which changes autofix behavior (many fixes are now considered unsafe and are
not applied by default). Verify the new behavior by running the ruff-check
hook against the repo/CI and confirm that the reduced autofixing is
acceptable; if you want the previous behavior, enable unsafe fixes (e.g.,
unsafe-fixes = true in ruff.toml) or adjust the ruff-check hook settings so
that CI and local workflows apply the unsafe autofixes.

In `@src/ome_arrow/export.py`:
- Around lines 82-126: The loop over chunks only checks lower bounds; it must
also validate that each chunk's extent does not exceed the image bounds
before reshaping and assignment. For each chunk (variables
t, c, z, y, x, shape_z, shape_y, shape_x), check that z + shape_z <= sz,
y + shape_y <= sy, and x + shape_x <= sx (sy and sx are available from
out.shape[3] and out.shape[4]), and raise a clear ValueError such as
f"chunks[{i}] extent out of range: z+shape_z={z+shape_z} > sz={sz}" (with
analogous messages for y/x) before computing expected_len, reshaping, and
assigning into out[t, c, z:z+shape_z, y:y+shape_y, x:x+shape_x]. This
prevents confusing shape-mismatch errors during the arr3d assignment.
- Around lines 214-254: The code currently returns a zero-filled plane even
when no chunk matched, and it does not validate chunk bounds. Update the
chunks handling in the export routine (the loop that iterates over chunks
and uses chunk_grid, chunks, strict, _cast_plane, and plane) to (1) track a
boolean such as any_chunk_matched, set it to True whenever a chunk is
applied to the plane, and only return plane when it is True, otherwise
falling through to the existing fallback or raising; and (2) validate chunk
origins and shapes (z0, y0, x0 >= 0 and z0 + szc <= depth, y0 + syc <= sy,
x0 + sxc <= sx) before slicing and assigning, raising ValueError (or
honoring strict) for out-of-bounds or negative origins so malformed chunks
cannot silently write into or clip the plane.

In `@src/ome_arrow/ingest.py`:
- Around lines 329-347: The slab allocation hardcodes dtype=np.uint16, which
can truncate or misrepresent input planes. Modify the code that builds slabs
(the slab variable in the loop over plane_map and the chunks.append block)
to infer the dtype from an available plane (e.g., inspect
plane_map[(t, c, z0 + zi)] or a representative plane before the zi loop) or
to accept a dtype parameter, then allocate slab = np.zeros((sz, sy, sx),
dtype=inferred_dtype) and ensure pixels = slab.reshape(-1).tolist()
preserves that dtype/precision when serializing. If you add a parameter,
adjust callers of this ingest routine so the dtype flows through correctly.
🧹 Nitpick comments (3)
tests/test_chunks.py (1)

73-83: Consider adding edge case tests.

The happy-path test is solid. Consider adding tests for (see the sketch after this list):

  • Out-of-bounds indices (should raise appropriate errors)
  • Partial/boundary chunks (chunks that don't align perfectly with image dimensions)
  • Missing chunks scenario
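
A minimal sketch of one such edge-case test, assuming plane_from_chunks() rejects out-of-range indices (the exact exception type and signature are assumptions):

    import pytest

    from ome_arrow.export import plane_from_chunks

    def test_plane_from_chunks_out_of_bounds(example_correct_data):
        # A z index past the image extents should raise rather than silently
        # return a zero-filled plane (exception type assumed).
        with pytest.raises((ValueError, IndexError)):
            plane_from_chunks(example_correct_data, t=0, c=0, z=999)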
src/ome_arrow/transform.py (1)

11-11: Importing private helper functions across modules.

The _build_chunks_from_planes and _normalize_chunk_shape functions are prefixed with an underscore, indicating they are internal/private. Consider either:

  1. Renaming them without underscore prefix if they're part of the intended API
  2. Keeping them internal and re-exporting through __all__ if needed externally (see the sketch below)

This is a minor concern since both modules are within the same package.
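
If option 2 is preferred, a sketch of what the re-export could look like in src/ome_arrow/ingest.py (hypothetical; the module may already define __all__ differently):

    # Keep the underscore-prefixed names but advertise them as importable,
    # so transform.py depends on a declared surface rather than an accident.
    __all__ = [
        "to_ome_arrow",
        "from_numpy",
        "_build_chunks_from_planes",
        "_normalize_chunk_shape",
    ]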

src/ome_arrow/ingest.py (1)

756-775: Inconsistent chunking API across from_* functions.

from_numpy exposes chunk_shape, chunk_order, and build_chunks parameters, but from_tiff, from_ome_zarr, and from_stack_pattern_path don't expose them. Since build_chunks=True is the default in to_ome_arrow, chunks will be automatically built for all these functions.

Consider either:

  1. Adding chunking parameters to all from_* functions for consistency (see the sketch below)
  2. Documenting that chunking is always enabled for these functions
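
A sketch of option 1 for a single reader function, written as if inside src/ome_arrow/ingest.py; the real from_tiff() internals and the _read_tiff_planes helper are hypothetical, and only the forwarding of the chunking knobs to to_ome_arrow() is the point:

    from typing import Optional, Tuple

    def from_tiff(
        path: str,
        *,
        chunk_shape: Optional[Tuple[int, int, int]] = None,
        chunk_order: str = "ZYX",
        build_chunks: bool = True,
    ):
        # Hypothetical helper; however planes are read today, forward the
        # chunking parameters to to_ome_arrow() unchanged.
        planes = _read_tiff_planes(path)
        return to_ome_arrow(
            planes,
            chunk_shape=chunk_shape,
            chunk_order=chunk_order,
            build_chunks=build_chunks,
        )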

Comment on lines +82 to +126
chunks = data.get("chunks") or []
if chunks:
chunk_grid = data.get("chunk_grid") or {}
chunk_order = str(chunk_grid.get("chunk_order") or "ZYX").upper()
if chunk_order != "ZYX":
raise ValueError("Only chunk_order='ZYX' is supported for now.")

for i, ch in enumerate(chunks):
t = int(ch["t"])
c = int(ch["c"])
z = int(ch["z"])
y = int(ch["y"])
x = int(ch["x"])
shape_z = int(ch["shape_z"])
shape_y = int(ch["shape_y"])
shape_x = int(ch["shape_x"])

if not (0 <= t < st and 0 <= c < sc and 0 <= z < sz):
raise ValueError(
f"chunks[{i}] index out of range: (t,c,z)=({t},{c},{z})"
)
if y < 0 or x < 0 or shape_z <= 0 or shape_y <= 0 or shape_x <= 0:
raise ValueError(f"chunks[{i}] has invalid shape or origin.")

pix = ch["pixels"]
try:
n = len(pix)
except Exception as e:
raise ValueError(f"chunks[{i}].pixels is not a sequence") from e

expected_len = shape_z * shape_y * shape_x
if n != expected_len:
if strict:
raise ValueError(
f"chunks[{i}].pixels length {n} != expected {expected_len}"
)
if n > expected_len:
pix = pix[:expected_len]
else:
pix = list(pix) + [0] * (expected_len - n)

arr3d = np.asarray(pix).reshape(shape_z, shape_y, shape_x)
arr3d = _cast_plane(arr3d)
out[t, c, z : z + shape_z, y : y + shape_y, x : x + shape_x] = arr3d
return out

⚠️ Potential issue | 🟠 Major

Validate chunk extents against image bounds.

Only lower bounds are checked; chunks that extend past Z/Y/X can yield confusing shape-mismatch errors during assignment. Add upper-bound validation before reshaping/assignment.

🔧 Proposed fix
             if y < 0 or x < 0 or shape_z <= 0 or shape_y <= 0 or shape_x <= 0:
                 raise ValueError(f"chunks[{i}] has invalid shape or origin.")
+            if z + shape_z > sz or y + shape_y > sy or x + shape_x > sx:
+                raise ValueError(
+                    f"chunks[{i}] exceeds image bounds for (z,y,x)=({z},{y},{x})"
+                )
🧰 Tools
🪛 Ruff (0.14.13)

87-87: Avoid specifying long messages outside the exception class (TRY003)
100-102: Avoid specifying long messages outside the exception class (TRY003)
104-104: Avoid specifying long messages outside the exception class (TRY003)
110-110: Avoid specifying long messages outside the exception class (TRY003)
115-117: Avoid specifying long messages outside the exception class (TRY003)

Comment on lines +214 to +254
chunks = data.get("chunks") or []
if chunks:
chunk_grid = data.get("chunk_grid") or {}
chunk_order = str(chunk_grid.get("chunk_order") or "ZYX").upper()
if chunk_order != "ZYX":
raise ValueError("Only chunk_order='ZYX' is supported for now.")

plane = np.zeros((sy, sx), dtype=dtype)
for i, ch in enumerate(chunks):
if int(ch["t"]) != t or int(ch["c"]) != c:
continue
z0 = int(ch["z"])
szc = int(ch["shape_z"])
if not (z0 <= z < z0 + szc):
continue
y0 = int(ch["y"])
x0 = int(ch["x"])
syc = int(ch["shape_y"])
sxc = int(ch["shape_x"])
pix = ch["pixels"]
try:
n = len(pix)
except Exception as e:
raise ValueError(f"chunks[{i}].pixels is not a sequence") from e
expected_len = szc * syc * sxc
if n != expected_len:
if strict:
raise ValueError(
f"chunks[{i}].pixels length {n} != expected {expected_len}"
)
if n > expected_len:
pix = pix[:expected_len]
else:
pix = list(pix) + [0] * (expected_len - n)

slab = np.asarray(pix).reshape(szc, syc, sxc)
slab = _cast_plane(slab)
zi = z - z0
plane[y0 : y0 + syc, x0 : x0 + sxc] = slab[zi]

return plane

⚠️ Potential issue | 🟠 Major

Avoid returning empty planes when no chunk matches; validate chunk bounds.

If chunks exist but none intersect the requested plane, this currently returns a zero-filled plane and skips the planes fallback. Also, negative origins or over-extent chunks can slip through. Track whether any chunk matched and only return when it did; otherwise fall back (or error) and validate bounds.

🔧 Proposed fix
-        plane = np.zeros((sy, sx), dtype=dtype)
+        plane = np.zeros((sy, sx), dtype=dtype)
+        matched = False
         for i, ch in enumerate(chunks):
             if int(ch["t"]) != t or int(ch["c"]) != c:
                 continue
             z0 = int(ch["z"])
             szc = int(ch["shape_z"])
             if not (z0 <= z < z0 + szc):
                 continue
             y0 = int(ch["y"])
             x0 = int(ch["x"])
             syc = int(ch["shape_y"])
             sxc = int(ch["shape_x"])
+            if (
+                z0 < 0
+                or y0 < 0
+                or x0 < 0
+                or szc <= 0
+                or syc <= 0
+                or sxc <= 0
+                or z0 + szc > sz
+                or y0 + syc > sy
+                or x0 + sxc > sx
+            ):
+                raise ValueError(f"chunks[{i}] exceeds image bounds.")
             pix = ch["pixels"]
             try:
                 n = len(pix)
             except Exception as e:
                 raise ValueError(f"chunks[{i}].pixels is not a sequence") from e
@@
             slab = np.asarray(pix).reshape(szc, syc, sxc)
             slab = _cast_plane(slab)
             zi = z - z0
             plane[y0 : y0 + syc, x0 : x0 + sxc] = slab[zi]
-
-        return plane
+            matched = True
+
+        if matched:
+            return plane
+        # fall through to plane-based path if no chunk matched
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

    chunks = data.get("chunks") or []
    if chunks:
        chunk_grid = data.get("chunk_grid") or {}
        chunk_order = str(chunk_grid.get("chunk_order") or "ZYX").upper()
        if chunk_order != "ZYX":
            raise ValueError("Only chunk_order='ZYX' is supported for now.")

        plane = np.zeros((sy, sx), dtype=dtype)
        matched = False
        for i, ch in enumerate(chunks):
            if int(ch["t"]) != t or int(ch["c"]) != c:
                continue
            z0 = int(ch["z"])
            szc = int(ch["shape_z"])
            if not (z0 <= z < z0 + szc):
                continue
            y0 = int(ch["y"])
            x0 = int(ch["x"])
            syc = int(ch["shape_y"])
            sxc = int(ch["shape_x"])
            if (
                z0 < 0
                or y0 < 0
                or x0 < 0
                or szc <= 0
                or syc <= 0
                or sxc <= 0
                or z0 + szc > sz
                or y0 + syc > sy
                or x0 + sxc > sx
            ):
                raise ValueError(f"chunks[{i}] exceeds image bounds.")
            pix = ch["pixels"]
            try:
                n = len(pix)
            except Exception as e:
                raise ValueError(f"chunks[{i}].pixels is not a sequence") from e
            expected_len = szc * syc * sxc
            if n != expected_len:
                if strict:
                    raise ValueError(
                        f"chunks[{i}].pixels length {n} != expected {expected_len}"
                    )
                if n > expected_len:
                    pix = pix[:expected_len]
                else:
                    pix = list(pix) + [0] * (expected_len - n)
            slab = np.asarray(pix).reshape(szc, syc, sxc)
            slab = _cast_plane(slab)
            zi = z - z0
            plane[y0 : y0 + syc, x0 : x0 + sxc] = slab[zi]
            matched = True
        if matched:
            return plane
        # fall through to plane-based path if no chunk matched
🧰 Tools
🪛 Ruff (0.14.13)

219-219: Avoid specifying long messages outside the exception class (TRY003)
237-237: Avoid specifying long messages outside the exception class (TRY003)
241-243: Avoid specifying long messages outside the exception class (TRY003)

Comment on lines +329 to +347
                        slab = np.zeros((sz, sy, sx), dtype=np.uint16)
                        for zi in range(sz):
                            plane = plane_map.get((t, c, z0 + zi))
                            if plane is None:
                                continue
                            slab[zi] = plane[y0 : y0 + sy, x0 : x0 + sx]
                        chunks.append(
                            {
                                "t": t,
                                "c": c,
                                "z": z0,
                                "y": y0,
                                "x": x0,
                                "shape_z": sz,
                                "shape_y": sy,
                                "shape_x": sx,
                                "pixels": slab.reshape(-1).tolist(),
                            }
                        )

⚠️ Potential issue | 🟠 Major

Hardcoded np.uint16 dtype may cause data loss.

The slab array is always allocated as np.uint16 (line 329), but the input planes could have different dtypes. This could cause:

  • Data truncation for float arrays
  • Incorrect values for int32/int64 data
  • Unnecessary conversion overhead for uint8 data

Consider inferring the dtype from the plane data or accepting it as a parameter.

🔧 Suggested fix
 def _build_chunks_from_planes(
     *,
     planes: List[Dict[str, Any]],
     size_t: int,
     size_c: int,
     size_z: int,
     size_y: int,
     size_x: int,
     chunk_shape: Optional[Tuple[int, int, int]],
     chunk_order: str = "ZYX",
+    dtype: np.dtype = np.uint16,
 ) -> List[Dict[str, Any]]:

Then use dtype instead of np.uint16 at line 329:

-                        slab = np.zeros((sz, sy, sx), dtype=np.uint16)
+                        slab = np.zeros((sz, sy, sx), dtype=dtype)
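
As an alternative to the parameter, a sketch of dtype inference against the variable names in the snippet above (plane_map, t, c, z0, sz, sy, sx); placement within the loop structure is assumed:

    # Find a representative plane for this (t, c) slab, if any, and use its
    # dtype; fall back to uint16 when the slab has no available planes.
    rep = next(
        (p for p in (plane_map.get((t, c, z0 + zi)) for zi in range(sz))
         if p is not None),
        None,
    )
    inferred_dtype = rep.dtype if rep is not None else np.uint16
    slab = np.zeros((sz, sy, sx), dtype=inferred_dtype)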
