Add tiling QC metric for tile-boundary segmentation artifacts #1157

timtreis wants to merge 27 commits into main
Conversation
Cells segmented in tiles get cut at tile borders, producing fragments with artificially straight edges. This adds:

- `sq.experimental.tl.calculate_tiling_qc`: per-cell scoring via collinearity-based straight-edge detection (max_straight_edge_ratio, cardinal_alignment_score, cut_score). Scores are stored in .obs of a QC AnnData table linked to the labels element via spatialdata_attrs; algorithm parameters are recorded in .uns["tiling_qc"].
- `sq.experimental.pl.tiling_qc`: diagnostic plot via spatialdata-plot (renders labels coloured by score; the tile grid emerges from the data).
- Cell-aware tiling infrastructure (_tiling.py) for scalable labels-only tile extraction without materialising full arrays.
- Test fixture with 400x400 dask-backed ellipsoid cells cut by a 3x3 tile grid, with ground-truth cut/intact classification.
- 35 tests (unit, integration, visual regression).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
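As a rough illustration of the core idea behind the straight-edge metric (not the actual implementation, which is JIT-compiled, resamples contours, and handles contour closure via rotations), a max_straight_edge_ratio score can be sketched as a two-pointer scan for collinear runs along the contour:

```python
import numpy as np


def max_straight_edge_ratio(contour: np.ndarray, tol: float = 0.5) -> float:
    """Fraction of the contour perimeter covered by the longest straight run.

    ``contour`` is an (n, 2) array of boundary points. A run i..j counts as
    straight when every interior point lies within ``tol`` pixels of the
    chord from point i to point j (two-pointer window scan).
    """
    n = len(contour)
    seg = np.linalg.norm(np.diff(contour, axis=0), axis=1)
    perimeter = float(seg.sum())
    if perimeter == 0.0:
        return 0.0
    best = 0.0
    i = 0
    for j in range(2, n):
        # advance the window start until points i..j are collinear within tol
        while i < j - 1:
            p0, p1 = contour[i], contour[j]
            d = p1 - p0
            chord = np.linalg.norm(d)
            if chord == 0.0:
                i += 1
                continue
            rel = contour[i + 1 : j] - p0
            # perpendicular distance of interior points to the chord p0 -> p1
            dist = np.abs(rel[:, 0] * d[1] - rel[:, 1] * d[0]) / chord
            if dist.max() <= tol:
                break
            i += 1
        best = max(best, float(seg[i:j].sum()))
    return best / perimeter
```

A cell cut by a tile border scores high (one side of its contour is a long straight run), while an intact round cell scores low.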
for more information, see https://pre-commit.ci
- Bump fixture from 40 cells on 400x400 to 120 cells on 600x600 for a more visible tile-grid pattern in diagnostic plots
- Pin spatialdata-plot>=0.3.3 for correct continuous color rendering
- Regenerate visual reference images
- Use _IMAGE_SIZE constant in centroid bounds test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report

❌ Patch coverage is …. Additional details and impacted files:

@@ Coverage Diff @@
## main #1157 +/- ##
==========================================
- Coverage 73.56% 72.56% -1.00%
==========================================
Files 44 47 +3
Lines 6929 7355 +426
Branches 1174 1245 +71
==========================================
+ Hits 5097 5337 +240
- Misses 1347 1507 +160
- Partials 485 511 +26
- JIT-compile the two-pointer collinearity scan with @njit for ~10-50x speedup on the per-cell hot path
- Cap contour points at 500 via arc-length resampling to bound the O(n²) scan
- Handle contour closure: scan 3 rotations so straight segments crossing the start/end junction are not split
- Vectorise _resample_contour with np.searchsorted (no Python loops)
- Replace the _zero_non_owned loop with a single np.isin pass
- Add a tqdm progress bar that tracks cells (not tiles) and updates on completion for correct parallel reporting
- Extract _SCORE_COLUMNS / _NAN_SCORES constants to deduplicate
- Precompute segment lengths once across rotations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
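The vectorised arc-length resampling mentioned above can be sketched like this (a simplified stand-in for the private _resample_contour helper, whose exact signature is not shown in the PR):

```python
import numpy as np


def resample_contour(points: np.ndarray, n_out: int = 500) -> np.ndarray:
    """Resample a 2D contour to ``n_out`` points equally spaced by arc length.

    Fully vectorised: cumulative segment lengths + np.searchsorted + linear
    interpolation, with no Python-level loop over points.
    """
    if len(points) <= n_out:
        return points
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, cum[-1], n_out)
    # index of the segment each target arc length falls into
    idx = np.searchsorted(cum, targets, side="right") - 1
    idx = np.clip(idx, 0, len(seg) - 1)
    frac = (targets - cum[idx]) / np.where(seg[idx] > 0, seg[idx], 1.0)
    return points[idx] + frac[:, None] * (points[idx + 1] - points[idx])
```

Capping at 500 points keeps the subsequent O(n²) collinearity scan bounded regardless of cell size.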
Drop 29 tests that over-tested private internals. Keep 4 behavioural tests (output structure, metric discrimination, tiling invariant, error handling) and 3 visual regression tests (one per score column). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace joblib parallelisation with dask.delayed + dask.compute for native integration with dask-backed zarr data. Tiles are scheduled as delayed tasks; the dask scheduler handles chunk caching and worker management.
- Add an n_jobs parameter (default -1 = all CPUs) using a threaded scheduler, and an optional dask.distributed.Client parameter for cluster execution. Warn via logger when both are specified.
- Add an affinity-aware cpu_count() to squidpy/_utils.py that respects cgroup limits (SLURM, Docker, taskset) via os.sched_getaffinity, replacing multiprocessing.cpu_count throughout the codebase.
- Fix NameError in pl.tiling_qc (**kwargs referenced after removal), keep the spatialdata_plot import lazy, remove unused typing imports.
- Replace assert with raise ValueError in verify_coverage.
- Add nogil=True to the numba collinearity scan for thread parallelism.
- Use the public API (sq.experimental.tl/pl) in tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
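The affinity-aware CPU count is small enough to show in full; this is a sketch of the described behaviour (the actual squidpy/_utils.py version may differ in details):

```python
import multiprocessing
import os


def cpu_count() -> int:
    """Number of CPUs actually available to this process.

    os.sched_getaffinity(0) reflects cgroup/affinity limits imposed by
    SLURM, Docker, or taskset, whereas multiprocessing.cpu_count() reports
    every CPU on the machine. Fall back where sched_getaffinity does not
    exist (e.g. macOS, Windows).
    """
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return multiprocessing.cpu_count()
```

On a SLURM job pinned to 4 cores of a 128-core node, this returns 4 instead of 128, avoiding oversubscribed thread pools.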
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compresses low scores toward zero so tile-boundary artifacts are more visually prominent without changing the stored metric values. Users can pass norm=Normalize() for a linear scale. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
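The commit does not show which norm class is used; assuming a matplotlib-style norm, this compression can be achieved with PowerNorm and gamma > 1, for example:

```python
from matplotlib.colors import Normalize, PowerNorm

# PowerNorm with gamma > 1 rescales values to [0, 1] and raises them to
# gamma, pushing low scores toward zero in the colour mapping while the
# stored metric values stay untouched.
norm = PowerNorm(gamma=2, vmin=0.0, vmax=1.0)

# The linear alternative a user can pass explicitly for an uncompressed scale:
linear = Normalize(vmin=0.0, vmax=1.0)
```

Under gamma=2 a mid-range score of 0.5 maps to colour position 0.25, so only genuinely high scores light up.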
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reference images from CI runner (ubuntu, py3.12). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reference images from CI runner (ubuntu, py3.12). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ier, nhood_outlier_fraction)

Two-stage spatial-context pipeline after per-tile scoring:

- smoothed_cut_score: cut_score x mean(k=10 neighbor cut_scores)
- is_outlier: MAD-based threshold on smoothed scores (nmads param, default 3)
- nhood_outlier_fraction: fraction of k-neighbors that are outliers

Also: update plot defaults to nhood_outlier_fraction/RdYlGn_r, add clean-dataset and few-cells edge case tests, generate visual references for new columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
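A minimal numpy sketch of this two-stage pipeline (brute-force kNN stands in for whatever neighbour search the implementation uses, and the function name is hypothetical):

```python
import numpy as np


def spatial_context_scores(centroids: np.ndarray, cut_score: np.ndarray,
                           k: int = 10, nmads: float = 3.0):
    """Stage 1: smooth each cut_score by its k nearest neighbours.
    Stage 2: flag MAD-based outliers on the smoothed scores, then report
    per cell the fraction of its neighbours that are outliers."""
    # brute-force pairwise distances; argsort, drop column 0 (self)
    d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    nbrs = np.argsort(d, axis=1)[:, 1 : k + 1]
    smoothed = cut_score * cut_score[nbrs].mean(axis=1)
    med = np.median(smoothed)
    mad = 1.4826 * np.median(np.abs(smoothed - med))  # MAD -> SD factor
    is_outlier = smoothed > med + nmads * mad
    nhood_outlier_fraction = is_outlier[nbrs].mean(axis=1)
    return smoothed, is_outlier, nhood_outlier_fraction
```

Multiplying by the neighbour mean suppresses isolated noisy cells: a cell only keeps a high smoothed score if its neighbourhood is also cut, which is exactly the signature of a tile seam.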
…cells) Replace rejection sampling with grid-based placement: deterministic, no collision checking needed, 5x more cells at the same memory footprint. Regenerate all visual references for the denser fixture. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
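Grid-based placement for a fixture like this can be sketched as follows (a hypothetical helper; sizes and margins are illustrative, not the fixture's actual values):

```python
import numpy as np


def grid_centroids(image_size: int, n_cells: int, margin: int) -> np.ndarray:
    """Deterministic centroid placement on a regular lattice.

    Because centroids sit on a grid with known spacing, no rejection
    sampling or collision checking is needed and the layout is exactly
    reproducible across runs.
    """
    side = int(np.ceil(np.sqrt(n_cells)))
    coords = np.linspace(margin, image_size - margin, side)
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    return np.column_stack([yy.ravel(), xx.ravel()])[:n_cells]
```

With a fixed lattice the minimum centre-to-centre spacing is known up front, so cell density can be raised until ellipses nearly touch without any overlap checks.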
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…y plot titles

- is_outlier now requires both the per-cell cut_score AND the spatially smoothed score to exceed their respective MAD thresholds (AND when both are enabled)
- Separate parameters: outlier_use_cut, outlier_use_smoothed, nmads_cut (default 1.5), nmads_smoothed (default 3)
- Validation: error when both gates are disabled or an nmads value is non-positive
- Plot titles mapped to human-friendly names per score column
- Regenerate visual references

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
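A sketch of the dual-gate logic with the described validation (parameter names are taken from the commit message; the helper function itself is hypothetical):

```python
import numpy as np


def _mad_threshold(x: np.ndarray, nmads: float) -> float:
    # upper MAD threshold; 1.4826 rescales MAD to SD for normal data
    med = np.median(x)
    return float(med + nmads * 1.4826 * np.median(np.abs(x - med)))


def flag_outliers(cut_score, smoothed_score, *, outlier_use_cut=True,
                  outlier_use_smoothed=True, nmads_cut=1.5,
                  nmads_smoothed=3.0):
    """AND of the enabled gates: a cell is an outlier only if every
    enabled score exceeds its own MAD threshold."""
    if not (outlier_use_cut or outlier_use_smoothed):
        raise ValueError("At least one outlier gate must be enabled.")
    if (outlier_use_cut and nmads_cut <= 0) or \
            (outlier_use_smoothed and nmads_smoothed <= 0):
        raise ValueError("nmads values must be positive.")
    mask = np.ones(len(cut_score), dtype=bool)
    if outlier_use_cut:
        mask &= cut_score > _mad_threshold(cut_score, nmads_cut)
    if outlier_use_smoothed:
        mask &= smoothed_score > _mad_threshold(smoothed_score, nmads_smoothed)
    return mask
```

The AND keeps the gate conservative: a cell must look cut on its own contour and sit in a suspicious neighbourhood before it is flagged.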
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…am, MAD constant

- _tiling.py: iterate `sorted(tile_to_cells.items())` instead of scanning the full grid; skips empty tiles directly and is sparse-friendly on large images.
- _tiling_qc.py: hoist parameter validation above the dask compute so bad inputs fail before tile scoring runs.
- _tiling_qc.py: promote `k = 10` to the public parameter `n_neighbors`; document the "8 grid neighbours + biological wiggle room" rationale. Renames `.uns["tiling_qc"]["nhood_k"]` to `["n_neighbors"]`.
- _tiling_qc.py: extract 1.4826 as the module-level `_MAD_TO_SD` constant with a comment explaining the MAD-to-SD consistency factor.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ater) Locally rendered placeholder for TestStitchVisual::test_plot_seam_before_after. The repo convention is platform-correct baselines downloaded from CI visual_test_results artifacts; this branch can't get one until either #1157 merges to main or test.yaml grows a workflow_dispatch trigger. Once CI runs against this branch, overwrite this PNG with the artifact version. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
LLehner left a comment:
Just a general question: how is it currently decided whether a function should go into experimental?
currently pretty much "anything that actually uses sdata"
NDArrayA = NDArray[Any]


def cpu_count() -> int:
isn't this change unrelated to this PR? Do we really need it? Like have you seen os.cpu_count cause errors so far?
update: I now remember _get_n_cores. Why don't we modify that one instead?
sdata: sd.SpatialData,
labels_key: str,
qc_key: str | None = None,
score_col: str = "nhood_outlier_fraction",
Suggested change:
-    score_col: str = "nhood_outlier_fraction",
+    score_col: Literal["nhood_outlier_fraction", "smoothed_cut_score", "cut_score", "max_straight_edge_ratio", "cardinal_alignment_score"] = "nhood_outlier_fraction",
nmads_smoothed: float = 3,
n_neighbors: int = 10,
n_jobs: int = -1,
client: Client | None = None,
I don't think taking client as an argument is the right approach here. In dask we would usually expect the client to be set with a context manager, I think. Something like this gets the client object if one exists:
# Remove client from the function signature and avoid importing
# dask.distributed at top level; pick up an already-active client instead.
try:
    import dask.distributed as dd

    client = dd.get_client()  # raises ValueError if no client is active
except (ImportError, ValueError):
    client = None
But I am not sure about this suggestion, so @flying-sheep, what do you think?
``n_jobs`` is ignored, and progress is reported via the dask
dashboard. Workers must have access to the underlying data
store (e.g. shared filesystem or cloud storage for zarr).
adata_key_added
Not a strong suggestion, just an opinion: I think key_added would be nicer, or table_key_added. As it stands, I am not sure whether this adds a key to the adata or adds the adata under this name.
def _scale_idx(k: str) -> int:
    num = "".join(c for c in k if c.isdigit())
    return int(num) if num else 0
This is a bit weird to me.
We should either:
- Match only the expected format, e.g. ^scale(\d+)$, and fail clearly for unexpected keys.
- or choose the coarsest level by actual spatial shape, like the nearby _choose_label_scale_for_image() logic does, since “coarsest” is a property of the array dimensions, not the key string.
selmanozleyen left a comment:
Besides the things I mentioned, it looks good to me. It really depends on the feedback this gets; otherwise digging deeper would feel like premature optimization.
Summary

- sq.experimental.tl.calculate_tiling_qc — per-cell scoring that detects cells artificially cut by tile boundaries during segmentation, using collinearity-based straight-edge detection on contours
- sq.experimental.pl.tiling_qc — diagnostic plot via spatialdata-plot where the tile grid emerges from high-scoring cells without requiring tile-border metadata
- Cell-aware tiling infrastructure (_tiling.py) for scalable labels-only tile extraction — will be shared with [EXPERIMENTAL]: Integrate cp-measure #982 when merged
- Scores stored in .obs of a QC AnnData table ({labels_key}_qc) with proper spatialdata_attrs linking and algorithm params in .uns["tiling_qc"]

Metrics

- max_straight_edge_ratio
- cardinal_alignment_score