Skip to content

Add tiling QC metric for tile-boundary segmentation artifacts#1157

Open
timtreis wants to merge 27 commits into
mainfrom
feature/tiling-qc-v2
Open

Add tiling QC metric for tile-boundary segmentation artifacts#1157
timtreis wants to merge 27 commits into
mainfrom
feature/tiling-qc-v2

Conversation

@timtreis
Copy link
Copy Markdown
Member

@timtreis timtreis commented Apr 10, 2026

Summary

  • Adds sq.experimental.tl.calculate_tiling_qc — per-cell scoring that detects cells artificially cut by tile boundaries during segmentation, using collinearity-based straight-edge detection on contours
  • Adds sq.experimental.pl.tiling_qc — diagnostic plot via spatialdata-plot where the tile grid emerges from high-scoring cells without requiring tile-border metadata
  • Includes cell-aware tiling infrastructure (_tiling.py) for scalable labels-only tile extraction — will be shared with [EXPERIMENTAL]: Integrate cp-measure #982 when merged
  • Scores stored in .obs of a QC AnnData table ({labels_key}_qc) with proper spatialdata_attrs linking and algorithm params in .uns["tiling_qc"]

Metrics

Metric Description
max_straight_edge_ratio Longest collinear boundary segment / equivalent diameter
cardinal_alignment_score Axis-alignment of that segment (1 = cardinal, 0 = diagonal)

timtreis and others added 4 commits April 11, 2026 00:11
Cells segmented in tiles get cut at tile borders, producing fragments
with artificially straight edges. This adds:

- `sq.experimental.tl.calculate_tiling_qc`: per-cell scoring via
  collinearity-based straight-edge detection (max_straight_edge_ratio,
  cardinal_alignment_score, cut_score). Scores stored in .obs of a
  QC AnnData table linked to the labels element via spatialdata_attrs.
  Algorithm parameters recorded in .uns["tiling_qc"].
- `sq.experimental.pl.tiling_qc`: diagnostic plot via spatialdata-plot
  (renders labels coloured by score; tile grid emerges from the data).
- Cell-aware tiling infrastructure (_tiling.py) for scalable
  labels-only tile extraction without materialising full arrays.
- Test fixture with 400x400 dask-backed ellipsoid cells cut by a
  3x3 tile grid, with ground-truth cut/intact classification.
- 35 tests (unit, integration, visual regression).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Bump fixture from 40 cells on 400x400 to 120 cells on 600x600
  for more visible tile-grid pattern in diagnostic plots
- Pin spatialdata-plot>=0.3.3 for correct continuous color rendering
- Regenerate visual reference images
- Use _IMAGE_SIZE constant in centroid bounds test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 56.44028% with 186 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.56%. Comparing base (9aa7c09) to head (a8bea24).

Files with missing lines Patch % Lines
src/squidpy/experimental/tl/_tiling_qc.py 63.88% 73 Missing and 18 partials ⚠️
src/squidpy/experimental/im/_tiling.py 42.76% 82 Missing and 5 partials ⚠️
src/squidpy/experimental/pl/_tiling_qc.py 62.50% 3 Missing and 3 partials ⚠️
src/squidpy/_utils.py 71.42% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1157      +/-   ##
==========================================
- Coverage   73.56%   72.56%   -1.00%     
==========================================
  Files          44       47       +3     
  Lines        6929     7355     +426     
  Branches     1174     1245      +71     
==========================================
+ Hits         5097     5337     +240     
- Misses       1347     1507     +160     
- Partials      485      511      +26     
Files with missing lines Coverage Δ
src/squidpy/_utils.py 57.51% <71.42%> (+0.29%) ⬆️
src/squidpy/experimental/pl/_tiling_qc.py 62.50% <62.50%> (ø)
src/squidpy/experimental/im/_tiling.py 42.76% <42.76%> (ø)
src/squidpy/experimental/tl/_tiling_qc.py 63.88% <63.88%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

timtreis and others added 20 commits April 11, 2026 00:43
- JIT-compile the two-pointer collinearity scan with @njit for ~10-50x
  speedup on the per-cell hot path
- Cap contour points at 500 via arc-length resampling to bound O(n²)
- Handle contour closure: scan 3 rotations so straight segments
  crossing the start/end junction are not split
- Vectorise _resample_contour with np.searchsorted (no Python loops)
- Replace _zero_non_owned loop with single np.isin pass
- Add tqdm progress bar that tracks cells (not tiles), updates on
  completion for correct parallel reporting
- Extract _SCORE_COLUMNS / _NAN_SCORES constants to deduplicate
- Precompute segment lengths once across rotations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Drop 29 tests that over-tested private internals. Keep 4 behavioural
tests (output structure, metric discrimination, tiling invariant, error
handling) and 3 visual regression tests (one per score column).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace joblib parallelisation with dask.delayed + dask.compute for
  native integration with dask-backed zarr data. Tiles are scheduled
  as delayed tasks; the dask scheduler handles chunk caching and
  worker management.
- Add n_jobs parameter (default -1 = all CPUs) using a threaded
  scheduler, and an optional dask.distributed.Client parameter for
  cluster execution. Warn via logger when both are specified.
- Add affinity-aware cpu_count() to squidpy/_utils.py that respects
  cgroup limits (SLURM, Docker, taskset) via os.sched_getaffinity,
  replacing multiprocessing.cpu_count throughout the codebase.
- Fix NameError in pl.tiling_qc (**kwargs referenced after removal),
  keep spatialdata_plot import lazy, remove unused typing imports.
- Replace assert with raise ValueError in verify_coverage.
- Add nogil=True to numba collinearity scan for thread parallelism.
- Use public API (sq.experimental.tl/pl) in tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compresses low scores toward zero so tile-boundary artifacts are
more visually prominent without changing the stored metric values.
Users can pass norm=Normalize() for a linear scale.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reference images from CI runner (ubuntu, py3.12).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reference images from CI runner (ubuntu, py3.12).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ier, nhood_outlier_fraction)

Two-stage spatial-context pipeline after per-tile scoring:
- smoothed_cut_score: cut_score x mean(k=10 neighbor cut_scores)
- is_outlier: MAD-based threshold on smoothed scores (nmads param, default 3)
- nhood_outlier_fraction: fraction of k-neighbors that are outliers

Also: update plot defaults to nhood_outlier_fraction/RdYlGn_r, add clean-dataset
and few-cells edge case tests, generate visual references for new columns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cells)

Replace rejection sampling with grid-based placement: deterministic,
no collision checking needed, 5x more cells at the same memory footprint.
Regenerate all visual references for the denser fixture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…y plot titles

- is_outlier now requires both per-cell cut_score AND spatial smoothed
  score to exceed their respective MAD thresholds (AND when both enabled)
- Separate parameters: outlier_use_cut, outlier_use_smoothed, nmads_cut
  (default 1.5), nmads_smoothed (default 3)
- Validation: error on both gates disabled or non-positive nmads
- Plot titles mapped to human-friendly names per score column
- Regenerate visual references

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@timtreis timtreis marked this pull request as ready for review April 16, 2026 12:28
@timtreis timtreis requested a review from selmanozleyen April 16, 2026 12:29
Comment thread src/squidpy/experimental/im/_tiling.py Outdated
Comment thread src/squidpy/experimental/tl/_tiling_qc.py Outdated
Comment thread src/squidpy/experimental/tl/_tiling_qc.py Outdated
Comment thread src/squidpy/experimental/tl/_tiling_qc.py Outdated
@timtreis timtreis requested a review from LLehner May 8, 2026 14:16
…am, MAD constant

- _tiling.py: iterate `sorted(tile_to_cells.items())` instead of scanning the
  full grid; skips empty tiles directly and is sparse-friendly on large images.
- _tiling_qc.py: hoist parameter validation above the dask compute so bad
  inputs fail before tile scoring runs.
- _tiling_qc.py: promote `k = 10` to public parameter `n_neighbors`; document
  the "8 grid neighbours + biological wiggle room" rationale. Renames
  `.uns["tiling_qc"]["nhood_k"]` to `["n_neighbors"]`.
- _tiling_qc.py: extract 1.4826 as module-level `_MAD_TO_SD` constant with a
  comment explaining the MAD-to-SD consistency factor.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@timtreis timtreis requested a review from selmanozleyen May 8, 2026 14:46
timtreis added a commit that referenced this pull request May 8, 2026
…ater)

Locally rendered placeholder for TestStitchVisual::test_plot_seam_before_after.
The repo convention is platform-correct baselines downloaded from CI
visual_test_results artifacts; this branch can't get one until either
#1157 merges to main or test.yaml grows a workflow_dispatch trigger.
Once CI runs against this branch, overwrite this PNG with the artifact
version.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@LLehner LLehner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a general question: how is it currently decided whether a function should go into experimental?

@timtreis
Copy link
Copy Markdown
Member Author

Just a general question: how is it currently decided whether a function should go into experimental?

currently pretty much "anything that actually uses sdata"

Comment thread src/squidpy/_utils.py
NDArrayA = NDArray[Any]


def cpu_count() -> int:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this change unrelated to this PR? Do we really need it? Like have you seen os.cpu_count cause errors so far?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update: I now remember _get_n_cores. Why don't we modify that one instead?

sdata: sd.SpatialData,
labels_key: str,
qc_key: str | None = None,
score_col: str = "nhood_outlier_fraction",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
score_col: str = "nhood_outlier_fraction",
score_col: Literal["nhood_outlier_fraction", "smoothed_cut_score", "cut_score", "max_straight_edge_ratio", "cardinal_alignment_score"] = "nhood_outlier_fraction",

nmads_smoothed: float = 3,
n_neighbors: int = 10,
n_jobs: int = -1,
client: Client | None = None,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think getting client as an arg is the right approach here. In dask we usually would expect setting the client with a context manager I think. Something like this to get the client object if it exists:

# removing client from the function signature and not importing dask distributed at top level.
client = None
try:
    import dask.distributed as dd
    client = dd.get_client() # value error if there is no client active
except (ImportError, ValueError):
    client = None

but I am not sure of this suggestion. so @flying-sheep what do you think?

``n_jobs`` is ignored, and progress is reported via the dask
dashboard. Workers must have access to the underlying data
store (e.g. shared filesystem or cloud storage for zarr).
adata_key_added
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a strong suggestion but an opinion. I think key_added would be nicer. Or table_key_added. Because now I am not sure if this adds to the adata or adds the adata in this name

Comment on lines +99 to +102
def _scale_idx(k: str) -> int:
num = "".join(c for c in k if c.isdigit())
return int(num) if num else 0

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit weird to me.
We should either:

  • Match only the expected format, e.g. ^scale(\d+)$, and fail clearly for unexpected keys.
  • or choose the coarsest level by actual spatial shape, like the nearby _choose_label_scale_for_image() logic does, since “coarsest” is a property of the array dimensions, not the key string.

Copy link
Copy Markdown
Member

@selmanozleyen selmanozleyen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the things I mentioned it looks good to me. Really depends on the feedback this will get, otherwise I feel like it would be early optimization to dig deeper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants