Stain normalization: foundations (PR 1 of 7) #1178

Open
timtreis wants to merge 1 commit into main from stain/pr1-foundations

timtreis (Member) commented May 14, 2026

Summary

Substrate for a new histopathology stain-normalization module. Targets Squidpy 2.0. No public API surface in this PR — every new symbol lives under the private squidpy.experimental.im._stain namespace and there are no re-exports.

What ships:

  • _constants.py — Ruifrok H/E/DAB canonical stain vectors (derived from skimage.color.rgb_from_hed), Ruderman RGB <-> LMS <-> Lab matrices, HistomicsTK-compatible SDA scale. Nothing else; no shared "default background", no luminosity threshold, no schema version.
  • _conversion.py — rgb_to_sda / sda_to_rgb and rgb_to_lab_ruderman / lab_ruderman_to_rgb on xr.DataArray. Each public function compiles to a single fused apply_ufunc per chunk; numpy- and dask-backed inputs both stay on the same code path; dask-backed inputs stay lazy end-to-end. background_intensity is a required np.ndarray of shape (3,); there is no library-wide default because no scanner produces a pure-white background.
  • _reference.py — minimal frozen StainReference dataclass holding either a (3, 3) stain matrix (Macenko/Vahadane, ships in PR 3) or mu / sigma Ruderman Lab channel statistics (Reinhard, ships in PR 2). background_intensity is required for decomposition methods and forbidden for Reinhard (Reinhard's color transfer is in Ruderman Lab and doesn't model absorbance). Cross-field validation only; no persistence, no cohort fields, no provenance metadata.
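
To make the cross-field rule for StainReference concrete, here is a minimal sketch of how such a frozen dataclass could validate itself. It is not the code in this PR: the class name StainReferenceSketch, the method discriminator field, and the error messages are assumptions introduced for illustration; only the field names stain_matrix, mu, sigma, and background_intensity and the required/forbidden rule come from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np


# eq=False: array-valued fields do not compare cleanly with ==, so the
# generated __eq__ is skipped in this sketch.
@dataclass(frozen=True, eq=False)
class StainReferenceSketch:
    """Hypothetical stand-in for the PR's StainReference (illustration only)."""

    method: str  # assumed discriminator: "macenko", "vahadane", or "reinhard"
    stain_matrix: np.ndarray | None = None
    mu: np.ndarray | None = None
    sigma: np.ndarray | None = None
    background_intensity: np.ndarray | None = None

    def __post_init__(self) -> None:
        if self.method in ("macenko", "vahadane"):
            # Decomposition methods: (3, 3) stain matrix plus a required white point.
            if self.stain_matrix is None or np.shape(self.stain_matrix) != (3, 3):
                raise ValueError("decomposition methods need a (3, 3) stain_matrix")
            if self.background_intensity is None or np.shape(self.background_intensity) != (3,):
                raise ValueError("decomposition methods need background_intensity of shape (3,)")
            if self.mu is not None or self.sigma is not None:
                raise ValueError("mu / sigma belong to Reinhard references only")
        elif self.method == "reinhard":
            # Reinhard: per-channel Lab statistics, no absorbance model.
            if self.mu is None or self.sigma is None:
                raise ValueError("Reinhard needs both mu and sigma")
            if np.shape(self.mu) != (3,) or np.shape(self.sigma) != (3,):
                raise ValueError("mu and sigma must have shape (3,)")
            if self.stain_matrix is not None:
                raise ValueError("stain_matrix belongs to decomposition methods only")
            if self.background_intensity is not None:
                raise ValueError("background_intensity is forbidden for Reinhard")
        else:
            raise ValueError(f"unknown method: {self.method!r}")


# A decomposition-style reference with an explicit, non-white background estimate.
decomposition_ref = StainReferenceSketch(
    method="macenko",
    stain_matrix=np.eye(3),
    background_intensity=np.array([240.0, 235.0, 245.0]),
)

# A Reinhard-style reference carries statistics only.
reinhard_ref = StainReferenceSketch(
    method="reinhard",
    mu=np.zeros(3),
    sigma=np.ones(3),
)
```

Passing background_intensity together with method="reinhard" raises ValueError at construction time in this sketch, which mirrors the "required for decomposition, forbidden for Reinhard" rule.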

Deliberately deferred to the PRs that actually consume them:

  • JSON save / load, schema versioning, fit_metadata, cohort_members, per_image_stats, max_concentrations — none have a producer in PR 1. Adding them now would version artifacts we have not shipped.
  • _validation.py (canonical reorder, third-column completion, StainFittingError, stain-matrix angle/rank checks) — purely supports the Macenko/Vahadane fit path, lands with it in PR 3.
  • A DEFAULT_BACKGROUND_INTENSITY constant or default kwarg — pure-white [255, 255, 255] is wrong for every real scanner. PR 3 ships estimate_background_intensity; until then callers must pass an estimate or, knowingly, an explicit np.array([255., 255., 255.]).
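
As a small illustration of why a pure-white default would be misleading, the sketch below evaluates the unscaled log ratio -ln((rgb + 1) / (I_0 + 1)) for a background pixel under two choices of I_0. The background values are made up for the example, and the helper unscaled_density is hypothetical; it omits whatever scale factor the PR's SDA applies.

```python
import numpy as np

# Hypothetical off-white scanner background; real values must be estimated.
true_background = np.array([240.0, 235.0, 245.0])
pure_white = np.array([255.0, 255.0, 255.0])

# A pixel that is pure background: no tissue, no stain.
pixel = true_background.copy()


def unscaled_density(rgb: np.ndarray, i0: np.ndarray) -> np.ndarray:
    """Return -ln((rgb + 1) / (I_0 + 1)) per channel, without any SDA scale."""
    return -np.log((rgb + 1.0) / (i0 + 1.0))


# With the matching background estimate the pixel carries no density.
with_estimate = unscaled_density(pixel, true_background)

# With pure white as I_0 the same pixel shows spurious positive density.
with_pure_white = unscaled_density(pixel, pure_white)

print("estimated background:", with_estimate)
print("pure-white background:", with_pure_white)
```

Under the pure-white assumption every channel of an empty background pixel reports some absorbance, which a downstream stain decomposition would then have to explain as stain.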

Design decisions:

  • Lives under experimental/im/ alongside the existing SpatialData-native modules (_detect_tissue, _qc_image, _make_tiles, _feature). The eventual experimental/im -> im promotion is a separate v2.0 effort.
  • Conversion primitives are xr.DataArray-native from the start, not numpy-only, so PR 2's lazy apply path reuses them unchanged.
  • SDA uses (rgb + 1) / (I_0 + 1) so that pixels at the supplied white point map exactly to zero. Documented in rgb_to_sda docstring.
  • No bespoke I/O. Image data is reached through sdata.images[key] (starting in PR 2 via the existing experimental/im/_utils.py::get_element_data helper); this module never adds a loader.
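
The SDA convention above can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not the PR's implementation: the function names rgb_to_sda_np and sda_to_rgb_np are hypothetical, the scale constant 255 / ln(256) is assumed from HistomicsTK's convention because the PR's actual constant is not shown here, and the xr.DataArray / apply_ufunc wrapping described in the summary is left out.

```python
import numpy as np

# Assumed scale, modeled on HistomicsTK's 255 / ln(256) convention.
SDA_SCALE = 255.0 / np.log(256.0)


def rgb_to_sda_np(rgb: np.ndarray, background_intensity: np.ndarray) -> np.ndarray:
    """Map RGB to SDA using the (rgb + 1) / (I_0 + 1) ratio."""
    rgb = np.asarray(rgb, dtype=np.float64)
    i0 = np.asarray(background_intensity, dtype=np.float64)
    return -np.log((rgb + 1.0) / (i0 + 1.0)) * SDA_SCALE


def sda_to_rgb_np(sda: np.ndarray, background_intensity: np.ndarray) -> np.ndarray:
    """Invert rgb_to_sda_np for the same background intensity."""
    sda = np.asarray(sda, dtype=np.float64)
    i0 = np.asarray(background_intensity, dtype=np.float64)
    return (i0 + 1.0) * np.exp(-sda / SDA_SCALE) - 1.0


# Hypothetical white point; shape (3,) broadcasts over the trailing channel axis.
background = np.array([240.0, 235.0, 245.0])

pixels = np.array(
    [
        [240.0, 235.0, 245.0],  # exactly at the supplied white point
        [120.0, 60.0, 150.0],   # stained tissue, darker than background
        [0.0, 0.0, 0.0],        # black; the +1 offset keeps the log finite
    ]
)

sda = rgb_to_sda_np(pixels, background)
roundtrip = sda_to_rgb_np(sda, background)
```

The first row maps to zero in every channel because the ratio is exactly 1 at the white point, and the +1 offset keeps the logarithm finite for black pixels.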

@timtreis timtreis force-pushed the stain/pr1-foundations branch from 88693af to 8ad5476 Compare May 14, 2026 12:49
codecov (Bot) commented May 14, 2026

Codecov Report

❌ Patch coverage is 96.52174% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.18%. Comparing base (093217d) to head (2d7143c).

Files with missing lines                              Patch %   Lines
src/squidpy/experimental/im/_stain/_reference.py      91.66%    2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1178      +/-   ##
==========================================
+ Coverage   73.82%   74.18%   +0.36%     
==========================================
  Files          45       48       +3     
  Lines        7013     7128     +115     
  Branches     1188     1202      +14     
==========================================
+ Hits         5177     5288     +111     
- Misses       1349     1351       +2     
- Partials      487      489       +2     
Files with missing lines                              Coverage Δ
src/squidpy/experimental/im/_stain/_constants.py      100.00% <100.00%> (ø)
src/squidpy/experimental/im/_stain/_conversion.py     100.00% <100.00%> (ø)
src/squidpy/experimental/im/_stain/_reference.py      91.66% <91.66%> (ø)

Substrate for a new histopathology stain-normalization module:
canonical Ruifrok H/E/DAB vectors, RGB <-> SDA and RGB <-> Ruderman
Lab conversions on xr.DataArray, and a minimal StainReference
dataclass holding either a 3x3 stain matrix or Reinhard channel
statistics.

Lives under squidpy.experimental.im._stain with no public re-export
and no SpatialData wiring; fit, apply, persistence, cohort, and
augmentation land in follow-up PRs alongside their first consumers.
Color conversions stay lazy on dask-backed inputs and compile to a
single fused apply_ufunc per chunk so the same primitives serve test
patches and whole-slide H&E without rewrite.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@timtreis timtreis force-pushed the stain/pr1-foundations branch from 8ad5476 to 2d7143c Compare May 14, 2026 13:11