

@daharoni commented Dec 14, 2025

Follows #35

Summary

Refactored pipeline_bin() into a modular, testable structure with a clear separation of concerns. Apart from the bug fixes listed below, the refactor is behavior-preserving; numerical equivalence is verified by a golden snapshot test.

Changes

New Module Structure

src/indeca/pipeline/
├── binary_pursuit.py    # Main orchestration (new canonical API)
├── config.py            # Pydantic v2 configuration models
├── types.py             # Typed dataclasses for step outputs
├── preprocess.py        # Preprocessing functions
├── init.py              # AR initialization + deconvolver creation
├── iteration.py         # Per-iteration deconvolution step
├── metrics.py           # Metrics DataFrame construction
├── ar_update.py         # AR parameter update logic
├── convergence.py       # Convergence checking
└── pipeline.py          # Deprecated shim (backward compatibility)

Key Improvements

  1. Pydantic Configuration System

    • DeconvPipelineConfig with nested sub-configs
    • Type-validated, self-documenting parameters
    • Easy serialization for reproducibility
  2. Single-Responsibility Modules

    • Each module handles one logical step
    • Pure functions where possible
    • Explicit state passing (no hidden mutations)
  3. Testable Components

    • Functions can be unit tested independently
    • Golden snapshot test ensures behavior preservation
    • Clear contracts via typed interfaces
  4. Improved Readability

    • Top-down orchestration in binary_pursuit.py
    • Named functions replace inline blocks
    • Documented quirks and edge cases
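The points above describe explicit state passing, single-purpose steps, and top-down orchestration. Below is a minimal, self-contained sketch of that pattern. It assumes a hypothetical immutable `PipelineState` and toy step functions (`iteration_step`, `update_params`, `check_convergence`, `run_pipeline`) with a toy objective; none of these names come from indeca, and the real modules will differ.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class PipelineState:
    """Hypothetical immutable state threaded through the loop."""
    iteration: int
    theta: float
    errors: Tuple[float, ...] = ()


def iteration_step(state: PipelineState, target: float) -> float:
    """Pure step: compute an error for the current parameter (toy objective)."""
    return abs(state.theta - target)


def update_params(state: PipelineState, err: float, target: float) -> PipelineState:
    """Pure update: return a NEW state instead of mutating the old one."""
    new_theta = state.theta + 0.5 * (target - state.theta)
    return replace(state, theta=new_theta, errors=state.errors + (err,))


def check_convergence(errors: Tuple[float, ...], tol: float) -> bool:
    """Converged once the most recent error drops below tol."""
    return len(errors) > 0 and errors[-1] < tol


def run_pipeline(theta0: float, target: float,
                 max_iters: int = 20, tol: float = 1e-3) -> PipelineState:
    """Top-down orchestration: each named step is called explicitly, in order."""
    state = PipelineState(iteration=0, theta=theta0)
    for i in range(max_iters):
        state = replace(state, iteration=i)
        err = iteration_step(state, target)
        state = update_params(state, err, target)
        if check_convergence(state.errors, tol):
            break
    return state
```

Because every step returns a new state rather than mutating its input, each one can be unit tested on its own, and the orchestrator reads as a plain sequence of named calls.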

API Changes

New (recommended):

from indeca.pipeline import pipeline_bin_new, DeconvPipelineConfig
from indeca.pipeline.config import ConvergenceConfig  # assumed location of the sub-config

config = DeconvPipelineConfig(
    up_factor=2,
    convergence=ConvergenceConfig(max_iters=20),
)
opt_C, opt_S, metrics = pipeline_bin_new(Y, config=config)
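For reference, here is a stand-alone sketch of how a nested Pydantic v2 configuration like this can be declared, validated, and round-tripped through JSON for reproducibility. The classes are simplified stand-ins that reuse only the two field names shown above (`up_factor`, `max_iters`); the defaults, bounds, and `extra="forbid"` setting are assumptions, not the actual definitions in config.py.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class ConvergenceConfig(BaseModel):
    # Simplified stand-in: only max_iters appears in the PR text.
    model_config = ConfigDict(extra="forbid")  # reject unknown keys (assumption)
    max_iters: int = Field(default=50, ge=1)   # default and bound are hypothetical


class DeconvPipelineConfig(BaseModel):
    # Simplified stand-in for the top-level config with one nested sub-config.
    model_config = ConfigDict(extra="forbid")
    up_factor: int = Field(default=1, ge=1)    # default and bound are hypothetical
    convergence: ConvergenceConfig = Field(default_factory=ConvergenceConfig)


config = DeconvPipelineConfig(up_factor=2, convergence=ConvergenceConfig(max_iters=20))

# Serialize to JSON (e.g. to store next to results), then rebuild an equal config.
payload = config.model_dump_json()
restored = DeconvPipelineConfig.model_validate_json(payload)

# Invalid values are rejected when the model is constructed.
try:
    DeconvPipelineConfig(up_factor=0)
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} error(s)")
```

Validating at construction time means a bad parameter fails before any deconvolution runs, and the JSON payload captures the full nested configuration in one string.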

Legacy (deprecated, still supported):

from indeca.pipeline import pipeline_bin

opt_C, opt_S, metrics = pipeline_bin(Y, up_factor=2, max_iters=20, ...)
# Emits DeprecationWarning with migration guidance
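One way such a shim can be written, assuming it maps the flat legacy keyword arguments onto the nested config and delegates to the new entry point. The dataclass stand-ins, default values, warning text, and the echoing `pipeline_bin_new` placeholder are all hypothetical; the real shim in pipeline.py accepts more arguments than the two shown here.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class ConvergenceConfig:  # hypothetical stand-in for the real sub-config
    max_iters: int = 50


@dataclass
class DeconvPipelineConfig:  # hypothetical stand-in for the real config
    up_factor: int = 1
    convergence: ConvergenceConfig = field(default_factory=ConvergenceConfig)


def pipeline_bin_new(Y, config):
    # Placeholder: echoes the config it received so the mapping can be checked.
    # The real function returns (opt_C, opt_S, metrics).
    return config


def pipeline_bin(Y, up_factor=1, max_iters=50):
    """Deprecated shim: translate flat legacy kwargs into the nested config."""
    warnings.warn(
        "pipeline_bin() is deprecated; build a DeconvPipelineConfig and call "
        "pipeline_bin_new(Y, config=config) instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    config = DeconvPipelineConfig(
        up_factor=up_factor,
        convergence=ConvergenceConfig(max_iters=max_iters),
    )
    return pipeline_bin_new(Y, config=config)
```

Keeping the shim as a thin translation layer means all numerical work lives in one code path, so the legacy and new APIs cannot drift apart.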

Verification

Snapshot Test

Added tests/test_pipeline_bin_golden.py, which uses deterministic input:

  • Fixed random seed (42)
  • 2 cells × 200 timepoints
  • Known tau values

The test checks that:

  • opt_C and opt_S match the snapshot within rtol=1e-5
  • Spike counts match exactly
  • Metrics DataFrame values match within rtol=1e-4
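These checks map directly onto NumPy and pandas testing helpers. Here is a sketch that assumes the snapshot is loaded into a dict with hypothetical keys `opt_C`, `opt_S`, and `metrics`, and that "spike count" means the number of nonzero entries per cell. The helper name `assert_matches_golden` is also hypothetical.

```python
import numpy as np
import pandas as pd


def assert_matches_golden(opt_C, opt_S, metrics, golden):
    """Compare fresh pipeline outputs against a stored snapshot (hypothetical layout)."""
    # Traces and spikes: relative tolerance only (atol defaults to 0 here).
    np.testing.assert_allclose(opt_C, golden["opt_C"], rtol=1e-5)
    np.testing.assert_allclose(opt_S, golden["opt_S"], rtol=1e-5)
    # Spike counts per cell must match exactly.
    np.testing.assert_array_equal(
        np.count_nonzero(opt_S, axis=-1),
        np.count_nonzero(golden["opt_S"], axis=-1),
    )
    # Metrics: same columns and dtypes, values within rtol=1e-4.
    pd.testing.assert_frame_equal(
        metrics.reset_index(drop=True),
        golden["metrics"].reset_index(drop=True),
        check_exact=False,
        rtol=1e-4,
    )
```

With only `rtol` set, `np.testing.assert_allclose` uses `atol=0`, so entries that are exactly zero in the snapshot must remain exactly zero; that is strict, but appropriate for sparse spike arrays.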

Bugs Fixed During Refactor

  1. Stale theta after AR updates: theta was never updated after iteration 0, so reported metrics kept using the iteration-0 value.
  2. Exception-driven control flow: replaced try/except NameError with explicit iteration checks.
  3. Implicit "last cell wins" behavior: documented and made explicit in per-cell AR mode.
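Here is a minimal illustration of the first two fixes. For the sake of example, it assumes theta is the decay time constant implied by an AR(1) coefficient phi (for c_t = phi * c_{t-1}, tau = -1/ln(phi)) and that each metrics row records the change in error from the previous iteration. The function names and row layout are hypothetical, not indeca's code.

```python
import math
from typing import Dict, List, Optional


def ar1_tau(phi: float) -> float:
    """Decay time constant (in frames) of an AR(1) process c_t = phi * c_{t-1}."""
    return -1.0 / math.log(phi)


def collect_metrics(phis: List[float],
                    errors: List[float]) -> List[Dict[str, Optional[float]]]:
    """Build one metrics row per iteration from the AR estimate used in that iteration."""
    rows = []
    for i, (phi, err) in enumerate(zip(phis, errors)):
        # Fix 1 (illustrative): derive theta from the *current* AR estimate on
        # every iteration instead of reusing the value from iteration 0.
        theta = ar1_tau(phi)
        # Fix 2 (illustrative): an explicit first-iteration check replaces
        # probing an undefined variable inside try/except NameError.
        delta_err = None if i == 0 else err - errors[i - 1]
        rows.append({"iter": i, "theta": theta, "err": err, "delta_err": delta_err})
    return rows
```

The explicit `i == 0` branch makes the "no previous iteration" case visible in the code and avoids masking unrelated NameErrors.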

@daharoni mentioned this pull request Dec 14, 2025
@daharoni marked this pull request as ready for review December 14, 2025 02:13
