From d6897516fecffd1fe9fab0b63bf696c24ac3d4de Mon Sep 17 00:00:00 2001
From: Brendan Collins
Date: Thu, 16 Apr 2026 20:29:18 -0700
Subject: [PATCH] Claude cleanup: rename sweep commands, gitignore superpowers
 docs

Rename accuracy/performance sweep command and state files to the sweep-*
prefix convention. Add docs/superpowers/ to gitignore and remove previously
committed plans/specs so planning docs stay local.
---
 .../{accuracy-sweep.md => sweep-accuracy.md} | 32 +-
 .claude/commands/sweep-performance.md | 14 +-
 .claude/commands/sweep-security.md | 13 +-
 ...p-state.json => sweep-accuracy-state.json} | 0
 ...tate.json => sweep-performance-state.json} | 22 +-
 .gitignore | 1 +
 .../2026-03-23-lightweight-crs-parser.md | 1466 --------------
 .../plans/2026-03-24-dask-graph-utilities.md | 1057 ----------
 .../plans/2026-03-24-hypsometric-integral.md | 659 -------
 .../plans/2026-03-30-geotiff-perf-controls.md | 813 --------
 .../plans/2026-03-31-sweep-performance.md | 743 -------
 .../2026-04-01-multi-observer-viewshed.md | 1110 ----------
 .../2026-04-01-spatial-autocorrelation.md | 1740 -----------------
 .../2026-04-06-polygonize-simplification.md | 916 ---------
 ...026-03-23-lightweight-crs-parser-design.md | 166 --
 .../2026-03-24-dask-graph-utilities-design.md | 314 ---
 .../2026-03-24-hypsometric-integral-design.md | 120 --
 ...2026-03-30-geotiff-perf-controls-design.md | 147 --
 .../2026-03-31-sweep-performance-design.md | 368 ----
 ...26-04-01-multi-observer-viewshed-design.md | 192 --
 ...26-04-01-spatial-autocorrelation-design.md | 259 ---
 ...-04-06-polygonize-simplification-design.md | 138 --
 22 files changed, 43 insertions(+), 10247 deletions(-)
 rename .claude/commands/{accuracy-sweep.md => sweep-accuracy.md} (85%)
 rename .claude/{accuracy-sweep-state.json => sweep-accuracy-state.json} (100%)
 rename .claude/{performance-sweep-state.json => sweep-performance-state.json} (67%)
 delete mode 100644 docs/superpowers/plans/2026-03-23-lightweight-crs-parser.md
 delete mode 100644 docs/superpowers/plans/2026-03-24-dask-graph-utilities.md
 delete mode 100644 docs/superpowers/plans/2026-03-24-hypsometric-integral.md
 delete mode 100644 docs/superpowers/plans/2026-03-30-geotiff-perf-controls.md
 delete mode 100644 docs/superpowers/plans/2026-03-31-sweep-performance.md
 delete mode 100644 docs/superpowers/plans/2026-04-01-multi-observer-viewshed.md
 delete mode 100644 docs/superpowers/plans/2026-04-01-spatial-autocorrelation.md
 delete mode 100644 docs/superpowers/plans/2026-04-06-polygonize-simplification.md
 delete mode 100644 docs/superpowers/specs/2026-03-23-lightweight-crs-parser-design.md
 delete mode 100644 docs/superpowers/specs/2026-03-24-dask-graph-utilities-design.md
 delete mode 100644 docs/superpowers/specs/2026-03-24-hypsometric-integral-design.md
 delete mode 100644 docs/superpowers/specs/2026-03-30-geotiff-perf-controls-design.md
 delete mode 100644 docs/superpowers/specs/2026-03-31-sweep-performance-design.md
 delete mode 100644 docs/superpowers/specs/2026-04-01-multi-observer-viewshed-design.md
 delete mode 100644 docs/superpowers/specs/2026-04-01-spatial-autocorrelation-design.md
 delete mode 100644 docs/superpowers/specs/2026-04-06-polygonize-simplification-design.md

diff --git a/.claude/commands/accuracy-sweep.md b/.claude/commands/sweep-accuracy.md
similarity index 85%
rename from .claude/commands/accuracy-sweep.md
rename to .claude/commands/sweep-accuracy.md
index a28e9cc3..4e2fdc84 100644
--- a/.claude/commands/accuracy-sweep.md
+++ b/.claude/commands/sweep-accuracy.md
@@ -25,7 +25,7 @@
 Store results in a temporary variable -- do NOT write intermediate files.
 
 ## Step 2 -- Load inspection state
 
-Read the state file at `.claude/accuracy-sweep-state.json`.
+Read the state file at `.claude/sweep-accuracy-state.json`.
 If it does not exist, treat every module as never-inspected.
 
@@ -114,7 +114,12 @@ like this (adapt the module list to actual results):
 - Missing or wrong Earth curvature corrections
 - Backend inconsistencies (numpy vs cupy vs dask results differ)
 2. Run /rockout to fix the issue end-to-end (issue, worktree, fix, tests, docs)
-3. After completing rockout for ONE module, output ITERATION DONE
+3. Update .claude/sweep-accuracy-state.json in the worktree by adding or
+   updating the entry for the module:
+   { \"module_name\": { \"last_inspected\": \"ISO-DATE\", \"issue\": ISSUE_NUMBER } }
+   Then git add and commit it to the worktree branch so the state update
+   lands in the PR.
+4. After completing rockout for ONE module, output ITERATION DONE
 
 If you find no accuracy issues in the current target module, skip it and
 move to the next one.
@@ -129,24 +134,17 @@ Set `--max-iterations` to the number of target modules + 2 (buffer for retries).
 
 ```
 To run this sweep: copy the command above and paste it.
-To update state after a manual rockout: edit .claude/accuracy-sweep-state.json
-To reset all tracking: /accuracy-sweep --reset-state
+To update state after a manual rockout: edit .claude/sweep-accuracy-state.json
+To reset all tracking: /sweep-accuracy --reset-state
 ```
 
-## Step 6 -- Update state (ONLY when called from inside a ralph-loop)
+## Step 6 -- Update state
 
-This step is informational. The accuracy-sweep command itself does NOT update
-the state file. State is updated when `/rockout` completes -- the rockout
-workflow should append to `.claude/accuracy-sweep-state.json` after creating
-the issue.
-
-To enable this, print a note reminding the user that after each rockout
-iteration completes, they can manually record the inspection:
-
-```json
-// Add to .claude/accuracy-sweep-state.json after each rockout:
-{ "module_name": { "last_inspected": "ISO-DATE", "issue": ISSUE_NUMBER } }
-```
+The sweep-accuracy command itself does NOT update the state file. State is
+updated by the ralph-loop prompt generated in Step 5b, which instructs the
+agent to write and commit `.claude/sweep-accuracy-state.json` in the
+worktree branch as part of each rockout iteration so the state update is
+included in the PR.
 
 ---
 
diff --git a/.claude/commands/sweep-performance.md b/.claude/commands/sweep-performance.md
index 8f007e27..d0a0ea57 100644
--- a/.claude/commands/sweep-performance.md
+++ b/.claude/commands/sweep-performance.md
@@ -22,7 +22,7 @@ Parse $ARGUMENTS for these flags (multiple may combine):
 | `--only-focal` | Restrict to: focal, convolution, morphology, bilateral, edge_detection, glcm |
 | `--only-hydro` | Restrict to: flood, cost_distance, geodesic, surface_distance, viewshed, erosion, diffusion |
 | `--only-io` | Restrict to: geotiff, reproject, rasterize, polygonize |
-| `--reset-state` | Delete `.claude/performance-sweep-state.json` and treat all modules as never-inspected |
+| `--reset-state` | Delete `.claude/sweep-performance-state.json` and treat all modules as never-inspected |
 | `--skip-phase1` | Skip triage; reuse last state file; go straight to ralph-loop generation for unresolved HIGH items |
 | `--report-only` | Run Phase 1 triage but do not generate a ralph-loop command |
 | `--size small` | Phase 2 benchmarks use 128x128 arrays |
@@ -64,7 +64,7 @@ For every module in scope, collect:
 
 ### Load inspection state
 
-Read `.claude/performance-sweep-state.json`. If it does not exist, treat every
+Read `.claude/sweep-performance-state.json`. If it does not exist, treat every
 module as never-inspected. If `--reset-state` was set, delete the file first.
 
 State file schema:
@@ -344,7 +344,7 @@ rockout has full context.
 
 ## Step 5 -- Update state file
 
-Write `.claude/performance-sweep-state.json` with the triage results:
+Write `.claude/sweep-performance-state.json` with the triage results:
 
 ```json
 {
@@ -448,7 +448,9 @@ from the actual triage results):
 | dask+numpy | peak_rss_mb | 892 | 34 | 0.04x | IMPROVED |
 
 Thresholds: IMPROVED < 0.8x, REGRESSION > 1.2x, else UNCHANGED.
-6. Update .claude/performance-sweep-state.json with the issue number.
+6. Update .claude/sweep-performance-state.json with the issue number, then
+   `git add` and commit it to the worktree branch so the state update is
+   included in the PR.
 
 7. Output ITERATION DONE
 
@@ -489,8 +491,8 @@ Other options:
   on a known-numpy array), do not flag it.
 - The 30TB simulation constructs the dask task graph only; it NEVER calls
   `.compute()`.
-- State file (`.claude/performance-sweep-state.json`) is gitignored by
-  convention — do not add it to git.
+- State file (`.claude/sweep-performance-state.json`) is tracked in git.
+  Subagents must `git add` and commit it so the state update lands in the PR.
 - If $ARGUMENTS is empty, use defaults: audit all modules, benchmark at
   512x512, generate ralph-loop for HIGH items.
 - For subpackage modules (geotiff, reproject), the subagent should read ALL
diff --git a/.claude/commands/sweep-security.md b/.claude/commands/sweep-security.md
index 91dd5cb4..63254b8b 100644
--- a/.claude/commands/sweep-security.md
+++ b/.claude/commands/sweep-security.md
@@ -39,7 +39,7 @@
 Store results in memory -- do NOT write intermediate files.
 
 ## Step 2 -- Load inspection state
 
-Read `.claude/security-sweep-state.json`.
+Read `.claude/sweep-security-state.json`.
 If it does not exist, treat every module as never-inspected.
 
@@ -194,13 +194,16 @@ Also read xrspatial/utils.py to understand _validate_raster() behavior.
    For MEDIUM/LOW issues, document them but do not fix.
 
 5. After finishing (whether you found issues or not), update the inspection
-   state file .claude/security-sweep-state.json by reading its current
+   state file .claude/sweep-security-state.json by reading its current
    contents and adding/updating the entry for "{module}" with:
    - "last_inspected": today's ISO date
    - "issue": the issue number from rockout (or null if clean / MEDIUM-only)
    - "severity_max": highest severity found (or null if clean)
    - "categories_found": list of category numbers that had findings (e.g. [1, 2])
+   Then `git add .claude/sweep-security-state.json` and commit it to the
+   worktree branch so the state update is included in the PR.
+
 
 Important:
 - Only flag real, exploitable issues. False positives waste time.
 - Read the tests for this module to understand expected behavior.
@@ -229,7 +232,7 @@ State is updated by the subagents themselves (see agent prompt step 5).
 After completion, verify state with:
 
 ```
-cat .claude/security-sweep-state.json
+cat .claude/sweep-security-state.json
 ```
 
 To reset all tracking: `/sweep-security --reset-state`
@@ -241,8 +244,8 @@ To reset all tracking: `/sweep-security --reset-state`
 - Do NOT modify any source files directly. Subagents handle fixes via /rockout.
 - Keep the output concise -- the table and agent dispatch are the deliverables.
 - If $ARGUMENTS is empty, use defaults: top 3, no category filter, no exclusions.
-- State file (`.claude/security-sweep-state.json`) is gitignored by convention --
-  do not add it to git.
+- State file (`.claude/sweep-security-state.json`) is tracked in git.
+  Subagents must `git add` and commit it so the state update lands in the PR.
 - For subpackage modules (geotiff, reproject, hydro), the subagent should read ALL
   `.py` files in the subpackage directory, not just `__init__.py`.
 - Only flag patterns that are ACTUALLY present in the code. Do not report
Do not report diff --git a/.claude/accuracy-sweep-state.json b/.claude/sweep-accuracy-state.json similarity index 100% rename from .claude/accuracy-sweep-state.json rename to .claude/sweep-accuracy-state.json diff --git a/.claude/performance-sweep-state.json b/.claude/sweep-performance-state.json similarity index 67% rename from .claude/performance-sweep-state.json rename to .claude/sweep-performance-state.json index 377a4f70..c445dac3 100644 --- a/.claude/performance-sweep-state.json +++ b/.claude/sweep-performance-state.json @@ -1,29 +1,29 @@ { - "last_triage": "2026-04-14T12:00:00Z", + "last_triage": "2026-04-16T12:00:00Z", "modules": { "resample": { "last_inspected": "2026-04-15T12:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": "false-positive", "notes": "Downgraded. GPU-CPU-GPU round-trip only in aggregate path for non-integer scale factors. Interpolation (nearest/bilinear/cubic) stays on GPU. No GPU kernel exists for irregular per-pixel binning." }, - "polygon_clip": { "last_inspected": "2026-04-15T12:00:00Z", "oom_verdict": "SAFE", "bottleneck": "memory-bound", "high_count": 0, "issue": 1207, "notes": "Fixed: pass dask chunks to rasterize() so mask stays lazy for dask inputs." }, + "polygon_clip": { "last_inspected": "2026-04-16T12:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": 1207, "notes": "Re-audit 2026-04-16: fix verified SAFE. Mask stays lazy via rasterize chunks kwarg; per-chunk peak bounded." }, "kde": { "last_inspected": "2026-04-14T12:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null, "notes": "Graph construction serialized per-tile. _filter_points_to_tile scans all points per tile. No HIGH findings." }, "sieve": { "last_inspected": "2026-04-14T12:00:00Z", "oom_verdict": "WILL OOM", "bottleneck": "memory-bound", "high_count": 0, "issue": "false-positive", "notes": "False positive. 
Memory guards already in place on both dask paths. CCL is inherently global — documented limitation. CuPy CPU fallback is deliberate and documented." }, - "bump": { "last_inspected": "2026-04-15T12:00:00Z", "oom_verdict": "SAFE", "bottleneck": "memory-bound", "high_count": 0, "issue": 1206, "notes": "Fixed: int32 coords, default-count cap at 10M, memory guard, per-chunk dask partitioning via dask.delayed. Graph serialization reduced ~250x." }, + "bump": { "last_inspected": "2026-04-16T12:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": 1206, "notes": "Re-audit 2026-04-16: fix verified SAFE. No HIGH findings. MEDIUM: CuPy backend runs CPU kernel then transfers to GPU (documented limitation)." }, "reproject": { "last_inspected": "2026-04-15T12:00:00Z", "oom_verdict": "RISKY", "bottleneck": "compute-bound", "high_count": 0, "issue": "false-positive", "notes": "Downgraded to MEDIUM. Fast path .compute()-in-loop is intentional (~22x faster); falls back to map_blocks when output exceeds GPU VRAM. 16x16 thread blocks already mitigate register pressure in cubic kernel." }, "rasterize": { "last_inspected": "2026-04-15T12:00:00Z", "oom_verdict": "SAFE", "bottleneck": "graph-bound", "high_count": 0, "issue": "false-positive", "notes": "Downgraded. Tile-by-tile graph construction with per-tile geometry filtering is the correct pattern. Pre-filtering ensures each delayed task gets only its relevant subset. O(n_tiles) bbox check at graph-build time is vectorized and fast." }, "pathfinding": { "last_inspected": "2026-04-15T12:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": "false-positive", "notes": "Downgraded. CuPy .get() is required -- A* has no GPU kernel. Per-pixel .compute() is only 2 calls for start/goal validation. seg.values in multi_stop_search collects already-computed results for stitching." 
}, - "geotiff": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "IO-bound", "high_count": 0, "issue": null, "notes": "False positive. open_geotiff(chunks=N) returns lazy dask array. to_geotiff auto-routes dask inputs to write_streaming. Eager paths are by design for numpy/cupy." }, - "zonal": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "WILL OOM", "bottleneck": "memory-bound", "high_count": 4, "issue": 1110, "notes": "Memory guards improved, iterrows replaced with isin. da.unique().compute() confirmed safe (small result). regions() is inherently global - documented limitation." }, + "geotiff": { "last_inspected": "2026-04-16T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "IO-bound", "high_count": 0, "issue": "fixed-in-tree", "notes": "Fixed-in-tree 2026-04-16: (1) read_geotiff_dask caps total chunk count at 1M and auto-scales chunks with a UserWarning when exceeded (linear graph scaling confirmed: 16384 tasks/1109ms vs 16 tasks/65ms baseline). (2) _nvcomp_batch_compress deflate path now batches GPU->CPU adler32 transfers via a single contiguous .get() and memoryview slicing, eliminating N per-tile sync points. 435 geotiff tests pass (2 pre-existing matplotlib deepcopy unrelated)." }, + "zonal": { "last_inspected": "2026-04-16T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": "fixed-in-tree", "notes": "Fixed-in-tree 2026-04-16: rewrote hypsometric_integral dask path. Eliminated double-compute (_unique_finite_zones removed, each block discovers own zones). Replaced np.stack (O(n_blocks * n_zones) scheduler memory) with streaming dict-merge (O(n_zones)). 29 existing tests pass." }, "viewshed": { "last_inspected": "2026-04-05T12:00:00Z", "oom_verdict": "SAFE", "bottleneck": "memory-bound", "high_count": 0, "issue": "fixed-in-tree", "notes": "Tier B memory estimate tightened from 280 to 368 bytes/pixel (accounts for lexsort double-alloc + computed raster). 
astype copy=False avoids needless float64 copy." }, - "visibility": { "last_inspected": "2026-04-05T12:00:00Z", "oom_verdict": "SAFE", "bottleneck": "memory-bound", "high_count": 0, "issue": "fixed-in-tree", "notes": "Fixed: _extract_transect uses vindex for point extraction, cumulative_viewshed accumulates lazily with da.zeros + da.Array addition. All 3 HIGH findings resolved." }, + "visibility": { "last_inspected": "2026-04-16T12:00:00Z", "oom_verdict": "SAFE", "bottleneck": "memory-bound", "high_count": 0, "issue": "fixed-in-tree", "notes": "Re-audit after Numba-ize (PR 1177) confirms SAFE. @ngjit kernels clean, type-stable. MEDIUM: K-observer graph growth in cumulative_viewshed (recommend periodic persist)." }, "multispectral": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, "fire": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, "proximity": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "WILL OOM", "bottleneck": "memory-bound", "high_count": 3, "issue": 1111, "notes": "Memory guard added to line-sweep path. KDTree path (EUCLIDEAN/MANHATTAN + scipy) already had guards. GREAT_CIRCLE unbounded path already guarded." 
}, "emerging_hotspots": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, - "classify": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, + "classify": { "last_inspected": "2026-04-16T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": "fixed-in-tree", "notes": "Fixed-in-tree 2026-04-16: _run_dask_head_tail_breaks now persists data_clean once and fuses mean+head_count per iter (912ms -> 339ms, 0.37x IMPROVED); added _run_dask_box_plot that samples via _generate_sample_indices instead of boolean fancy indexing on dask array; _run_dask_cupy_box_plot likewise. 85 existing classify tests pass." }, "convolution": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, "morphology": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, "focal": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, "glcm": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null, "notes": "Downgraded to MEDIUM. da.stack without rechunk is scheduling overhead, not OOM risk." }, "surface_distance": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "memory-bound", "high_count": 0, "issue": 1128, "notes": "Memory guard added to dd_grid allocation." }, - "cost_distance": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "WILL OOM", "bottleneck": "memory-bound", "high_count": 2, "issue": 1118, "notes": "Memory guard added + da.block assembly. Finite max_cost path (map_overlap) was already safe." 
}, + "cost_distance": { "last_inspected": "2026-04-16T12:00:00Z", "oom_verdict": "WILL OOM", "bottleneck": "memory-bound", "high_count": 4, "issue": 1118, "notes": "Re-audit 2026-04-16 after PR 1192 Bellman-Ford fix. 4 HIGH re-surface in iterative tile_cache path (L645 full-dataset materialization, L1015 da.from_delayed wrapping computed tiles). Finite max_cost path remains SAFE. Unbounded path is fundamentally O(dataset) driver memory — covered by #1118." }, "mahalanobis": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null, "notes": "False positive. Numpy path materializes by design. Dask path uses lazy reductions + map_blocks." }, "bilateral": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, "sky_view_factor": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, @@ -33,15 +33,15 @@ "terrain": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "RISKY", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, "terrain_metrics": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "memory-bound", "high_count": 0, "issue": null }, "slope": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, - "hillshade": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, + "hillshade": { "last_inspected": "2026-04-16T12:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null, "notes": "Re-audit after Horn's method rewrite (PR 1175): clean stencil, map_overlap depth=(1,1), no materialization. Zero findings." 
}, "diffusion": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "WILL OOM", "bottleneck": "memory-bound", "high_count": 2, "issue": 1116, "notes": "Scalar diffusivity now passed as float to chunks. DataArray diffusivity passed as dask array via map_overlap." }, "perlin": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "WILL OOM", "bottleneck": "memory-bound", "high_count": 0, "issue": null }, "curvature": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, "normalize": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": 1124, "notes": "Boolean indexing replaced with lazy nanmin/nanmax/nanmean/nanstd." }, - "polygonize": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, + "polygonize": { "last_inspected": "2026-04-16T12:00:00Z", "oom_verdict": "RISKY", "bottleneck": "compute-bound", "high_count": 0, "issue": null, "notes": "Re-audit 2026-04-16 after PR 1190 NaN fix + 1176 simplification. No HIGH. MEDIUM: sequential per-chunk dask.compute loop at L1528 serializes work; _polygonize_cupy full GPU->CPU transfer at L665; per-value bin_mask alloc in _calculate_regions_cupy." }, "contour": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, "geodesic": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "N/A", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, - "balanced_allocation": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "WILL OOM", "bottleneck": "memory-bound", "high_count": 3, "issue": 1114, "notes": "Lazy source extraction + memory guard. Algorithm is inherently O(N*size) - documented limitation." 
}, + "balanced_allocation": { "last_inspected": "2026-04-16T12:00:00Z", "oom_verdict": "WILL OOM", "bottleneck": "memory-bound", "high_count": 8, "issue": 1114, "notes": "Re-audit 2026-04-16 after PR 1203 float32 fix. 8 HIGH found (friction.compute L339, argmin.compute in iter loop L182, double all_nan recompute L206, stacked cost_surfaces allocation). Covered by existing documented limitation on #1114. Not refiled." }, "corridor": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, "edge_detection": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "SAFE", "bottleneck": "compute-bound", "high_count": 0, "issue": null }, "erosion": { "last_inspected": "2026-03-31T18:00:00Z", "oom_verdict": "WILL OOM", "bottleneck": "memory-bound", "high_count": 2, "issue": 1120, "notes": "Memory guard added. Algorithm inherently global." }, diff --git a/.gitignore b/.gitignore index 11a8c9f2..3bffaafb 100644 --- a/.gitignore +++ b/.gitignore @@ -98,3 +98,4 @@ dmypy.json xrspatial-examples/ *.zarr/ .claude/worktrees/ +docs/superpowers/ diff --git a/docs/superpowers/plans/2026-03-23-lightweight-crs-parser.md b/docs/superpowers/plans/2026-03-23-lightweight-crs-parser.md deleted file mode 100644 index 630833d5..00000000 --- a/docs/superpowers/plans/2026-03-23-lightweight-crs-parser.md +++ /dev/null @@ -1,1466 +0,0 @@ -# Lightweight CRS Parser Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Make common reprojection (UTM, Web Mercator, LCC, Albers, etc.) work without pyproj by embedding CRS metadata for EPSG codes that have Numba fast paths. - -**Architecture:** New `CRS` class in `_lite_crs.py` with an embedded EPSG table. `_crs_utils.py` tries our `CRS` first, falls back to pyproj. 
`_grid.py` gets a `_transform_points()` helper that reuses existing Numba kernels for scatter-point boundary transforms. Chunk functions and `_source_footprint_in_target` use two-tier CRS resolution. - -**Tech Stack:** Python 3, NumPy, Numba (existing), pytest - -**Spec:** `docs/superpowers/specs/2026-03-23-lightweight-crs-parser-design.md` - ---- - -## File Structure - -| File | Responsibility | -|---|---| -| `xrspatial/reproject/_lite_crs.py` | **New.** `CRS` class, EPSG lookup table, WKT generation | -| `xrspatial/reproject/_crs_utils.py` | **Modify.** Two-tier `_resolve_crs()`, new `_crs_from_wkt()` | -| `xrspatial/reproject/_projections.py` | **Modify.** New `transform_points()` scatter-point helper | -| `xrspatial/reproject/_grid.py` | **Modify.** Use `transform_points()` with pyproj fallback | -| `xrspatial/reproject/__init__.py` | **Modify.** Use `_crs_from_wkt()` in chunk functions; restructure CuPy chunk; update `_source_footprint_in_target` | -| `xrspatial/tests/test_lite_crs.py` | **New.** Unit + integration tests | - ---- - -### Task 1: CRS class -- construction and basic interface - -**Files:** -- Create: `xrspatial/tests/test_lite_crs.py` -- Create: `xrspatial/reproject/_lite_crs.py` - -- [ ] **Step 1: Write failing tests for CRS construction and interface** - -```python -"""Tests for the lightweight CRS class.""" -from __future__ import annotations - -import pytest - - -class TestCRSConstruction: - def test_from_epsg_int(self): - from xrspatial.reproject._lite_crs import CRS - crs = CRS(4326) - assert crs.to_epsg() == 4326 - - def test_from_epsg_classmethod(self): - from xrspatial.reproject._lite_crs import CRS - crs = CRS.from_epsg(4326) - assert crs.to_epsg() == 4326 - - def test_from_authority_string(self): - from xrspatial.reproject._lite_crs import CRS - crs = CRS("EPSG:32632") - assert crs.to_epsg() == 32632 - - def test_unknown_epsg_raises(self): - from xrspatial.reproject._lite_crs import CRS - with pytest.raises(ValueError, match="not in 
the built-in table"): - CRS(9999) - - def test_to_authority(self): - from xrspatial.reproject._lite_crs import CRS - crs = CRS(4326) - assert crs.to_authority() == ('EPSG', '4326') - - def test_is_geographic_true(self): - from xrspatial.reproject._lite_crs import CRS - assert CRS(4326).is_geographic is True - assert CRS(4269).is_geographic is True - assert CRS(4267).is_geographic is True - - def test_is_geographic_false(self): - from xrspatial.reproject._lite_crs import CRS - assert CRS(3857).is_geographic is False - assert CRS(32632).is_geographic is False - assert CRS(5070).is_geographic is False - - -class TestCRSEquality: - def test_equal_same_epsg(self): - from xrspatial.reproject._lite_crs import CRS - assert CRS(4326) == CRS(4326) - - def test_not_equal_different_epsg(self): - from xrspatial.reproject._lite_crs import CRS - assert CRS(4326) != CRS(3857) - - def test_hash_equal(self): - from xrspatial.reproject._lite_crs import CRS - assert hash(CRS(4326)) == hash(CRS(4326)) - - def test_hash_in_set(self): - from xrspatial.reproject._lite_crs import CRS - s = {CRS(4326), CRS(4326), CRS(3857)} - assert len(s) == 2 -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pytest xrspatial/tests/test_lite_crs.py -v` -Expected: FAIL with `ModuleNotFoundError` or `ImportError` - -- [ ] **Step 3: Implement CRS class with EPSG table** - -Create `xrspatial/reproject/_lite_crs.py` with: - -```python -"""Lightweight CRS class for common EPSG codes. - -Drop-in replacement for pyproj.CRS for the subset of EPSG codes that -have Numba JIT fast paths in _projections.py. Allows basic reprojection -without installing pyproj. 
-""" -from __future__ import annotations - -import math -import re - -# ---- WGS84 ellipsoid constants (must match _projections.py) ---- -_WGS84_A = 6378137.0 -_WGS84_F = 1.0 / 298.257223563 - -# ---- Ellipsoid definitions: (a, f) ---- -_ELLIPSOID_WGS84 = (_WGS84_A, _WGS84_F) -_ELLIPSOID_CLARKE1866 = (6378206.4, 1.0 / 294.9786982) - - -# ---- Named EPSG entries ---- -# Each value: dict with PROJ4-style keys + '_is_geographic' flag -# Only codes with Numba fast paths in _projections.py belong here. -_NAMED_EPSG = { - # Geographic - 4326: { - 'proj': 'longlat', 'datum': 'WGS84', 'ellps': 'WGS84', - 'no_defs': True, '_is_geographic': True, - '_name': 'WGS 84', - }, - 4269: { - 'proj': 'longlat', 'datum': 'NAD83', 'ellps': 'GRS80', - 'no_defs': True, '_is_geographic': True, - '_name': 'NAD83', - }, - 4267: { - 'proj': 'longlat', 'datum': 'NAD27', 'ellps': 'clrk66', - 'no_defs': True, '_is_geographic': True, - '_name': 'NAD27', - }, - # Web Mercator - 3857: { - 'proj': 'merc', 'datum': 'WGS84', 'ellps': 'WGS84', - 'lon_0': 0.0, 'lat_ts': 0.0, 'x_0': 0.0, 'y_0': 0.0, - 'k_0': 1.0, 'units': 'm', 'no_defs': True, - '_is_geographic': False, - '_name': 'WGS 84 / Pseudo-Mercator', - }, - # Ellipsoidal Mercator - 3395: { - 'proj': 'merc', 'datum': 'WGS84', 'ellps': 'WGS84', - 'lon_0': 0.0, 'lat_ts': 0.0, 'x_0': 0.0, 'y_0': 0.0, - 'k_0': 1.0, 'units': 'm', 'no_defs': True, - '_is_geographic': False, - '_name': 'WGS 84 / World Mercator', - }, - # Lambert Conformal Conic -- France - 2154: { - 'proj': 'lcc', 'datum': 'WGS84', 'ellps': 'GRS80', - 'lon_0': 3.0, 'lat_0': 46.5, 'lat_1': 49.0, 'lat_2': 44.0, - 'x_0': 700000.0, 'y_0': 6600000.0, 'k_0': 1.0, 'units': 'm', - 'no_defs': True, '_is_geographic': False, - '_name': 'RGF93 v1 / Lambert-93', - }, - # Albers Equal Area -- CONUS - 5070: { - 'proj': 'aea', 'datum': 'NAD83', 'ellps': 'GRS80', - 'lon_0': -96.0, 'lat_0': 23.0, 'lat_1': 29.5, 'lat_2': 45.5, - 'x_0': 0.0, 'y_0': 0.0, 'units': 'm', - 'no_defs': True, '_is_geographic': 
False, - '_name': 'NAD83 / Conus Albers', - }, - # Lambert Azimuthal Equal Area -- Europe - 3035: { - 'proj': 'laea', 'datum': 'WGS84', 'ellps': 'GRS80', - 'lon_0': 10.0, 'lat_0': 52.0, - 'x_0': 4321000.0, 'y_0': 3210000.0, 'units': 'm', - 'no_defs': True, '_is_geographic': False, - '_name': 'ETRS89-extended / LAEA Europe', - }, - # Polar Stereographic South - 3031: { - 'proj': 'stere', 'datum': 'WGS84', 'ellps': 'WGS84', - 'lon_0': 0.0, 'lat_0': -90.0, 'lat_ts': -71.0, - 'x_0': 0.0, 'y_0': 0.0, 'k_0': 1.0, 'units': 'm', - 'no_defs': True, '_is_geographic': False, - '_name': 'WGS 84 / Antarctic Polar Stereographic', - }, - # Polar Stereographic North (NSIDC) - 3413: { - 'proj': 'stere', 'datum': 'WGS84', 'ellps': 'WGS84', - 'lon_0': -45.0, 'lat_0': 90.0, 'lat_ts': 70.0, - 'x_0': 0.0, 'y_0': 0.0, 'k_0': 1.0, 'units': 'm', - 'no_defs': True, '_is_geographic': False, - '_name': 'WGS 84 / NSIDC Sea Ice Polar Stereographic North', - }, - # Arctic Polar Stereographic - 3996: { - 'proj': 'stere', 'datum': 'WGS84', 'ellps': 'WGS84', - 'lon_0': 0.0, 'lat_0': 90.0, 'lat_ts': 75.0, - 'x_0': 0.0, 'y_0': 0.0, 'k_0': 1.0, 'units': 'm', - 'no_defs': True, '_is_geographic': False, - '_name': 'WGS 84 / Arctic Polar Stereographic', - }, - # Oblique Stereographic -- Netherlands RD New - 28992: { - 'proj': 'sterea', 'datum': 'WGS84', 'ellps': 'bessel', - 'lon_0': 5.38763888889, 'lat_0': 52.15616055556, - 'x_0': 155000.0, 'y_0': 463000.0, 'k_0': 0.9999079, - 'units': 'm', 'no_defs': True, '_is_geographic': False, - '_name': 'Amersfoort / RD New', - }, - # Cylindrical Equal Area - 6933: { - 'proj': 'cea', 'datum': 'WGS84', 'ellps': 'WGS84', - 'lon_0': 0.0, 'lat_ts': 30.0, - 'x_0': 0.0, 'y_0': 0.0, 'k_0': 1.0, 'units': 'm', - 'no_defs': True, '_is_geographic': False, - '_name': 'WGS 84 / NSIDC EASE-Grid 2.0 Global', - }, -} - - -def _utm_entry(epsg): - """Generate PROJ4 dict for a UTM EPSG code, or return None.""" - if 32601 <= epsg <= 32660: - zone = epsg - 32600 - south = False - elif 
32701 <= epsg <= 32760: - zone = epsg - 32700 - south = True - elif 26901 <= epsg <= 26923: - zone = epsg - 26900 - south = False - # NAD83 UTM - lon0 = (zone - 1) * 6 - 177 - return { - 'proj': 'utm', 'zone': zone, 'datum': 'NAD83', 'ellps': 'GRS80', - 'lon_0': float(lon0), 'lat_0': 0.0, - 'x_0': 500000.0, 'y_0': 0.0, - 'k_0': 0.9996, 'units': 'm', 'no_defs': True, - '_is_geographic': False, - '_name': f'NAD83 / UTM zone {zone}N', - } - else: - return None - lon0 = (zone - 1) * 6 - 177 - fn = 10000000.0 if south else 0.0 - hemisphere = 'S' if south else 'N' - return { - 'proj': 'utm', 'zone': zone, 'datum': 'WGS84', 'ellps': 'WGS84', - 'lon_0': float(lon0), 'lat_0': 0.0, - 'x_0': 500000.0, 'y_0': fn, - 'k_0': 0.9996, 'units': 'm', 'no_defs': True, - '_is_geographic': False, - '_name': f'WGS 84 / UTM zone {zone}{hemisphere}', - } - - -def _lookup(epsg): - """Look up an EPSG code in the built-in table. - - Returns the PROJ4-style dict or raises ValueError. - """ - entry = _NAMED_EPSG.get(epsg) - if entry is not None: - return entry - entry = _utm_entry(epsg) - if entry is not None: - return entry - raise ValueError( - f"EPSG:{epsg} is not in the built-in table. " - f"Install pyproj for full CRS support: " - f"pip install pyproj or: pip install xarray-spatial[reproject]" - ) - - -# Regex for AUTHORITY["EPSG","XXXX"] in WKT strings -_WKT_EPSG_RE = re.compile(r'AUTHORITY\s*\[\s*"EPSG"\s*,\s*"(\d+)"\s*\]') - - -class CRS: - """Lightweight CRS for common EPSG codes. - - Drop-in replacement for pyproj.CRS for the subset of codes that - have Numba JIT fast paths. Raises ValueError for codes not in - the embedded table. 
- """ - - def __init__(self, input_value): - if isinstance(input_value, int): - self._epsg = input_value - self._params = _lookup(input_value) - elif isinstance(input_value, str): - m = re.match(r'^EPSG:(\d+)$', input_value, re.IGNORECASE) - if m: - self._epsg = int(m.group(1)) - self._params = _lookup(self._epsg) - else: - # Try WKT - epsg = self._extract_epsg_from_wkt(input_value) - if epsg is not None: - self._epsg = epsg - self._params = _lookup(epsg) - else: - raise ValueError( - f"Cannot parse CRS from string (no EPSG code found). " - f"Install pyproj for full CRS support." - ) - else: - raise TypeError( - f"CRS() expects int or str, got {type(input_value).__name__}" - ) - - @classmethod - def from_epsg(cls, code): - """Construct from an integer EPSG code.""" - return cls(int(code)) - - @classmethod - def from_wkt(cls, wkt): - """Construct from a WKT string by extracting the EPSG code.""" - epsg = cls._extract_epsg_from_wkt(wkt) - if epsg is None: - raise ValueError( - "No AUTHORITY[\"EPSG\",\"...\"] found in WKT string. " - "Install pyproj for full CRS support." - ) - return cls(epsg) - - @staticmethod - def _extract_epsg_from_wkt(wkt): - """Extract the last EPSG code from a WKT string, or None.""" - matches = _WKT_EPSG_RE.findall(wkt) - if matches: - return int(matches[-1]) - return None - - def to_epsg(self): - """Return the integer EPSG code.""" - return self._epsg - - def to_authority(self): - """Return (authority_name, code) tuple.""" - return ('EPSG', str(self._epsg)) - - def to_dict(self): - """Return PROJ4-style parameter dict. - - Internal keys prefixed with '_' are stripped. 
- """ - return {k: v for k, v in self._params.items() - if not k.startswith('_')} - - @property - def is_geographic(self): - """True if this is a geographic (lat/lon) CRS.""" - return self._params['_is_geographic'] - - def to_wkt(self): - """Return an OGC WKT1 string for this CRS.""" - return _build_wkt(self._epsg, self._params) - - def __eq__(self, other): - if isinstance(other, CRS): - return self._epsg == other._epsg - # Duck-type: try to_epsg() on the other object (e.g. pyproj.CRS) - try: - return self._epsg == other.to_epsg() - except (AttributeError, TypeError): - return NotImplemented - - def __hash__(self): - return hash(('CRS', self._epsg)) - - def __repr__(self): - name = self._params.get('_name', '') - if name: - return f"CRS(EPSG:{self._epsg} -- {name})" - return f"CRS(EPSG:{self._epsg})" - - -def _build_wkt(epsg, params): - """Build a minimal OGC WKT1 string from EPSG + params. - - This does not need to be a perfect WKT -- it just needs to contain - AUTHORITY["EPSG","XXXX"] so that from_wkt() can round-trip, and - be accepted by pyproj.CRS.from_wkt() if the user mixes types. 
- """ - name = params.get('_name', f'EPSG:{epsg}') - if params['_is_geographic']: - return ( - f'GEOGCS["{name}",' - f'DATUM["unknown",SPHEROID["WGS 84",{_WGS84_A},{1/_WGS84_F}]],' - f'PRIMEM["Greenwich",0],' - f'UNIT["degree",0.0174532925199433],' - f'AUTHORITY["EPSG","{epsg}"]]' - ) - proj = params.get('proj', '') - fe = params.get('x_0', 0.0) - fn = params.get('y_0', 0.0) - lon0 = params.get('lon_0', 0.0) - lat0 = params.get('lat_0', 0.0) - k0 = params.get('k_0', 1.0) - lat1 = params.get('lat_1', 0.0) - lat2 = params.get('lat_2', 0.0) - return ( - f'PROJCS["{name}",' - f'GEOGCS["GCS_WGS_1984",' - f'DATUM["D_WGS_1984",SPHEROID["WGS_1984",{_WGS84_A},{1/_WGS84_F}]],' - f'PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]],' - f'PROJECTION["{proj}"],' - f'PARAMETER["false_easting",{fe}],' - f'PARAMETER["false_northing",{fn}],' - f'PARAMETER["central_meridian",{lon0}],' - f'PARAMETER["latitude_of_origin",{lat0}],' - f'PARAMETER["scale_factor",{k0}],' - f'PARAMETER["standard_parallel_1",{lat1}],' - f'PARAMETER["standard_parallel_2",{lat2}],' - f'UNIT["Meter",1],' - f'AUTHORITY["EPSG","{epsg}"]]' - ) -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pytest xrspatial/tests/test_lite_crs.py -v` -Expected: All PASS - -- [ ] **Step 5: Commit** - -```bash -git add xrspatial/reproject/_lite_crs.py xrspatial/tests/test_lite_crs.py -git commit -m "Add lightweight CRS class with embedded EPSG table (#1057)" -``` - ---- - -### Task 2: CRS to_dict() and to_wkt() correctness - -**Files:** -- Modify: `xrspatial/tests/test_lite_crs.py` - -- [ ] **Step 1: Write failing tests for to_dict() and to_wkt() round-trip** - -Add to `test_lite_crs.py`: - -```python -class TestCRSToDict: - """Verify to_dict() returns correct PROJ4 keys for each projection family.""" - - def test_geographic_4326(self): - from xrspatial.reproject._lite_crs import CRS - d = CRS(4326).to_dict() - assert d['proj'] == 'longlat' - assert d['datum'] == 'WGS84' - # No internal keys leaked - assert 
not any(k.startswith('_') for k in d) - - def test_utm_north_zone_32(self): - from xrspatial.reproject._lite_crs import CRS - d = CRS(32632).to_dict() - assert d['proj'] == 'utm' - assert d['zone'] == 32 - assert d['k_0'] == 0.9996 - assert d['x_0'] == 500000.0 - assert d['y_0'] == 0.0 # north hemisphere - assert d['lon_0'] == 9.0 # (32-1)*6 - 177 - - def test_utm_south_zone_55(self): - from xrspatial.reproject._lite_crs import CRS - d = CRS(32755).to_dict() - assert d['zone'] == 55 - assert d['y_0'] == 10000000.0 # south hemisphere - - def test_utm_nad83_zone_10(self): - from xrspatial.reproject._lite_crs import CRS - d = CRS(26910).to_dict() - assert d['datum'] == 'NAD83' - assert d['zone'] == 10 - assert d['lon_0'] == -123.0 - - def test_lcc_2154(self): - from xrspatial.reproject._lite_crs import CRS - d = CRS(2154).to_dict() - assert d['proj'] == 'lcc' - assert d['lat_1'] == 49.0 - assert d['lat_2'] == 44.0 - - def test_aea_5070(self): - from xrspatial.reproject._lite_crs import CRS - d = CRS(5070).to_dict() - assert d['proj'] == 'aea' - assert d['lon_0'] == -96.0 - - def test_web_mercator_3857(self): - from xrspatial.reproject._lite_crs import CRS - d = CRS(3857).to_dict() - assert d['proj'] == 'merc' - - def test_laea_3035(self): - from xrspatial.reproject._lite_crs import CRS - d = CRS(3035).to_dict() - assert d['proj'] == 'laea' - assert d['lon_0'] == 10.0 - assert d['lat_0'] == 52.0 - - def test_stere_3031(self): - from xrspatial.reproject._lite_crs import CRS - d = CRS(3031).to_dict() - assert d['proj'] == 'stere' - assert d['lat_0'] == -90.0 - - def test_sterea_28992(self): - from xrspatial.reproject._lite_crs import CRS - d = CRS(28992).to_dict() - assert d['proj'] == 'sterea' - - def test_cea_6933(self): - from xrspatial.reproject._lite_crs import CRS - d = CRS(6933).to_dict() - assert d['proj'] == 'cea' - - -class TestCRSWktRoundTrip: - def test_roundtrip_geographic(self): - from xrspatial.reproject._lite_crs import CRS - crs1 = CRS(4326) - wkt = 
crs1.to_wkt() - crs2 = CRS.from_wkt(wkt) - assert crs2.to_epsg() == 4326 - - def test_roundtrip_projected(self): - from xrspatial.reproject._lite_crs import CRS - crs1 = CRS(32632) - wkt = crs1.to_wkt() - crs2 = CRS.from_wkt(wkt) - assert crs2.to_epsg() == 32632 - - def test_roundtrip_all_named(self): - from xrspatial.reproject._lite_crs import CRS, _NAMED_EPSG - for epsg in _NAMED_EPSG: - crs = CRS(epsg) - wkt = crs.to_wkt() - assert CRS.from_wkt(wkt).to_epsg() == epsg, f"Failed for EPSG:{epsg}" - - def test_wkt_contains_authority(self): - from xrspatial.reproject._lite_crs import CRS - wkt = CRS(5070).to_wkt() - assert 'AUTHORITY["EPSG","5070"]' in wkt -``` - -- [ ] **Step 2: Run tests to verify they pass** - -Run: `pytest xrspatial/tests/test_lite_crs.py -v` -Expected: All PASS (implementation was written in Task 1) - -- [ ] **Step 3: Verify against pyproj (if installed)** - -Add a parametrized test that compares our `.to_dict()` against pyproj's output for every embedded code. This test skips when pyproj is not installed: - -```python -try: - import pyproj - HAS_PYPROJ = True -except ImportError: - HAS_PYPROJ = False - - -@pytest.mark.skipif(not HAS_PYPROJ, reason="pyproj not installed") -class TestCRSMatchesPyproj: - """Compare our CRS output against pyproj for correctness.""" - - @pytest.mark.parametrize("epsg", [ - 4326, 4269, 4267, 3857, 3395, 2154, 5070, 3035, - 3031, 3413, 3996, 28992, 6933, - 32601, 32632, 32660, 32701, 32755, 32760, - 26901, 26910, 26923, - ]) - def test_to_dict_proj_key_matches(self, epsg): - from xrspatial.reproject._lite_crs import CRS as LiteCRS - lite_d = LiteCRS(epsg).to_dict() - pyproj_d = pyproj.CRS.from_epsg(epsg).to_dict() - # The 'proj' key must match for dispatch to work - assert lite_d['proj'] == pyproj_d.get('proj', pyproj_d.get('type', '')), \ - f"EPSG:{epsg}: proj mismatch: {lite_d['proj']} vs {pyproj_d}" - - @pytest.mark.parametrize("epsg", [ - 4326, 4269, 4267, 3857, 3395, 2154, 5070, 3035, - 3031, 3413, 3996, 28992, 
6933, - 32632, 32755, 26910, - ]) - def test_is_geographic_matches(self, epsg): - from xrspatial.reproject._lite_crs import CRS as LiteCRS - assert LiteCRS(epsg).is_geographic == pyproj.CRS.from_epsg(epsg).is_geographic - - @pytest.mark.parametrize("epsg", [ - 4326, 3857, 32632, 5070, 3035, 3031, 28992, 6933, - ]) - def test_equality_with_pyproj(self, epsg): - from xrspatial.reproject._lite_crs import CRS as LiteCRS - lite = LiteCRS(epsg) - pp = pyproj.CRS.from_epsg(epsg) - assert lite == pp # our __eq__ calls pp.to_epsg() -``` - -- [ ] **Step 4: Run full test suite** - -Run: `pytest xrspatial/tests/test_lite_crs.py -v` -Expected: All PASS (pyproj tests pass or skip) - -- [ ] **Step 5: Commit** - -```bash -git add xrspatial/tests/test_lite_crs.py -git commit -m "Add to_dict/to_wkt correctness tests for lite CRS (#1057)" -``` - ---- - -### Task 3: Two-tier `_resolve_crs` and `_crs_from_wkt` - -**Files:** -- Modify: `xrspatial/reproject/_crs_utils.py` -- Modify: `xrspatial/tests/test_lite_crs.py` - -- [ ] **Step 1: Write failing tests for two-tier resolution** - -Add to `test_lite_crs.py`: - -```python -class TestTwoTierResolution: - def test_resolve_crs_int_uses_lite(self): - from xrspatial.reproject._crs_utils import _resolve_crs - from xrspatial.reproject._lite_crs import CRS as LiteCRS - crs = _resolve_crs(4326) - assert isinstance(crs, LiteCRS) - assert crs.to_epsg() == 4326 - - def test_resolve_crs_string_uses_lite(self): - from xrspatial.reproject._crs_utils import _resolve_crs - from xrspatial.reproject._lite_crs import CRS as LiteCRS - crs = _resolve_crs("EPSG:32632") - assert isinstance(crs, LiteCRS) - assert crs.to_epsg() == 32632 - - @pytest.mark.skipif(not HAS_PYPROJ, reason="pyproj not installed") - def test_resolve_crs_unknown_falls_back(self): - """Unknown EPSG codes should fall back to pyproj.""" - from xrspatial.reproject._crs_utils import _resolve_crs - from xrspatial.reproject._lite_crs import CRS as LiteCRS - crs = _resolve_crs(2193) # NZGD2000 
-- not in our table - assert not isinstance(crs, LiteCRS) - assert crs.to_epsg() == 2193 - - def test_resolve_crs_none_returns_none(self): - from xrspatial.reproject._crs_utils import _resolve_crs - assert _resolve_crs(None) is None - - def test_resolve_crs_passes_through_lite_crs(self): - from xrspatial.reproject._crs_utils import _resolve_crs - from xrspatial.reproject._lite_crs import CRS as LiteCRS - original = LiteCRS(4326) - result = _resolve_crs(original) - assert result is original - - @pytest.mark.skipif(not HAS_PYPROJ, reason="pyproj not installed") - def test_resolve_crs_passes_through_pyproj(self): - from xrspatial.reproject._crs_utils import _resolve_crs - original = pyproj.CRS.from_epsg(4326) - result = _resolve_crs(original) - assert result is original - - def test_crs_from_wkt_lite(self): - from xrspatial.reproject._crs_utils import _crs_from_wkt - from xrspatial.reproject._lite_crs import CRS as LiteCRS - wkt = LiteCRS(4326).to_wkt() - crs = _crs_from_wkt(wkt) - assert isinstance(crs, LiteCRS) - assert crs.to_epsg() == 4326 -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pytest xrspatial/tests/test_lite_crs.py::TestTwoTierResolution -v` -Expected: FAIL -- `_resolve_crs` still requires pyproj, `_crs_from_wkt` doesn't exist - -- [ ] **Step 3: Implement two-tier `_resolve_crs` and `_crs_from_wkt`** - -Replace `xrspatial/reproject/_crs_utils.py` contents: - -```python -"""CRS detection utilities with lightweight fallback.""" -from __future__ import annotations - - -def _require_pyproj(): - """Import and return the pyproj module, raising a clear error if missing.""" - try: - import pyproj - return pyproj - except ImportError: - raise ImportError( - "pyproj is required for CRS reprojection. " - "Install it with: pip install pyproj " - "or: pip install xarray-spatial[reproject]" - ) - - -def _resolve_crs(crs_input): - """Convert *crs_input* to a CRS object. 
- - Tries the lightweight built-in CRS first for known EPSG codes, - then falls back to pyproj.CRS for anything else. - - Returns None if *crs_input* is None. - """ - if crs_input is None: - return None - - # Pass through existing CRS objects (ours or pyproj) - from ._lite_crs import CRS as LiteCRS - if isinstance(crs_input, LiteCRS): - return crs_input - try: - pyproj = _try_import_pyproj() - if pyproj is not None and isinstance(crs_input, pyproj.CRS): - return crs_input - except Exception: - pass - - # Try lightweight CRS first - try: - return LiteCRS(crs_input) - except (ValueError, TypeError): - pass - - # Fall back to pyproj - pyproj = _require_pyproj() - return pyproj.CRS(crs_input) - - -def _crs_from_wkt(wkt): - """Reconstruct a CRS from a WKT string. - - Tries the lightweight CRS (extracts EPSG from AUTHORITY tag), - falls back to pyproj.CRS.from_wkt(). - """ - from ._lite_crs import CRS as LiteCRS - try: - return LiteCRS.from_wkt(wkt) - except (ValueError, TypeError): - pass - pyproj = _require_pyproj() - return pyproj.CRS.from_wkt(wkt) - - -def _try_import_pyproj(): - """Try to import pyproj, return None if not installed.""" - try: - import pyproj - return pyproj - except ImportError: - return None - - -def _detect_source_crs(raster): - """Auto-detect the CRS of a DataArray. - - Fallback chain: - 1. ``raster.attrs['crs']`` (EPSG int from xrspatial.geotiff) - 2. ``raster.attrs['crs_wkt']`` (WKT string from xrspatial.geotiff) - 3. ``raster.rio.crs`` (rioxarray, if installed) - 4. 
None - """ - # attrs (xrspatial.geotiff convention) - crs_attr = raster.attrs.get('crs') - if crs_attr is not None: - return _resolve_crs(crs_attr) - - crs_wkt = raster.attrs.get('crs_wkt') - if crs_wkt is not None: - return _resolve_crs(crs_wkt) - - # rioxarray fallback - try: - rio_crs = raster.rio.crs - if rio_crs is not None: - return _resolve_crs(rio_crs) - except Exception: - pass - - return None - - -def _detect_nodata(raster, nodata=None): - """Determine nodata value from explicit arg, rioxarray, or attrs.""" - if nodata is not None: - return float(nodata) - - # rioxarray - try: - rio_nd = raster.rio.nodata - if rio_nd is not None: - return float(rio_nd) - except Exception: - pass - - # attrs - for key in ('_FillValue', 'nodata', 'missing_value'): - val = raster.attrs.get(key) - if val is not None: - return float(val) - - return float('nan') -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pytest xrspatial/tests/test_lite_crs.py -v` -Expected: All PASS - -- [ ] **Step 5: Run existing reproject tests to verify no regressions** - -Run: `pytest xrspatial/tests/test_reproject.py -v -x` -Expected: All PASS (existing tests still work because pyproj is installed) - -- [ ] **Step 6: Commit** - -```bash -git add xrspatial/reproject/_crs_utils.py xrspatial/tests/test_lite_crs.py -git commit -m "Two-tier CRS resolution: lite CRS first, pyproj fallback (#1057)" -``` - ---- - -### Task 4: Scatter-point transform helper in `_projections.py` - -**Files:** -- Modify: `xrspatial/reproject/_projections.py` (add `transform_points()` near end of file) -- Modify: `xrspatial/tests/test_lite_crs.py` - -- [ ] **Step 1: Write failing tests for `transform_points`** - -Add to `test_lite_crs.py`: - -```python -@pytest.mark.skipif(not HAS_PYPROJ, reason="pyproj not installed") -class TestTransformPoints: - """Test the Numba scatter-point transform helper.""" - - def test_wgs84_to_web_mercator(self): - from xrspatial.reproject._projections import transform_points - from 
xrspatial.reproject._lite_crs import CRS - import numpy as np - src = CRS(4326) - tgt = CRS(3857) - xs = np.array([0.0, 10.0, -10.0]) - ys = np.array([0.0, 45.0, -30.0]) - tx, ty = transform_points(src, tgt, xs, ys) - assert tx is not None - # lon=0 -> x=0 in Web Mercator - assert abs(tx[0]) < 1.0 - - def test_wgs84_to_utm_zone32(self): - from xrspatial.reproject._projections import transform_points - from xrspatial.reproject._lite_crs import CRS - import numpy as np - src = CRS(4326) - tgt = CRS(32632) - xs = np.array([9.0]) # central meridian of zone 32 - ys = np.array([48.0]) - tx, ty = transform_points(src, tgt, xs, ys) - assert tx is not None - # Central meridian -> false easting = 500000 - assert abs(tx[0] - 500000.0) < 1.0 - - def test_unsupported_pair_returns_none(self): - from xrspatial.reproject._projections import transform_points - import pyproj - import numpy as np - # Two projected CRS -- no fast path - src = pyproj.CRS.from_epsg(32632) - tgt = pyproj.CRS.from_epsg(32633) - xs = np.array([500000.0]) - ys = np.array([5000000.0]) - result = transform_points(src, tgt, xs, ys) - assert result is None - - def test_matches_pyproj_transformer(self): - """Numba scatter-point transform matches pyproj within tolerance.""" - from xrspatial.reproject._projections import transform_points - from xrspatial.reproject._lite_crs import CRS - import numpy as np - src = CRS(4326) - tgt = CRS(5070) # Albers - xs = np.linspace(-100, -80, 20) - ys = np.linspace(30, 45, 20) - tx, ty = transform_points(src, tgt, xs, ys) - assert tx is not None - # Compare with pyproj - transformer = pyproj.Transformer.from_crs( - pyproj.CRS.from_epsg(4326), pyproj.CRS.from_epsg(5070), - always_xy=True - ) - px, py = transformer.transform(xs, ys) - np.testing.assert_allclose(tx, px, atol=0.01) - np.testing.assert_allclose(ty, py, atol=0.01) -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pytest xrspatial/tests/test_lite_crs.py::TestTransformPoints -v` -Expected: FAIL -- 
`transform_points` does not exist - -- [ ] **Step 3: Implement `transform_points()` in `_projections.py`** - -Add near the end of `xrspatial/reproject/_projections.py`, before the `_try_numba_transform_inner` line: - -```python -def transform_points(src_crs, tgt_crs, xs, ys): - """Transform scatter points between CRS using Numba kernels. - - Unlike try_numba_transform() which operates on a regular grid, this - accepts arbitrary (xs, ys) arrays. Used by _grid.py for boundary - estimation. - - Returns (tx, ty) numpy arrays, or None if no fast path exists. - """ - src_epsg = _get_epsg(src_crs) - tgt_epsg = _get_epsg(tgt_crs) - if src_epsg is None and tgt_epsg is None: - return None - - src_is_geo = _is_supported_geographic(src_epsg) - tgt_is_geo = _is_supported_geographic(tgt_epsg) - if not src_is_geo and not tgt_is_geo: - return None - - xs = np.asarray(xs, dtype=np.float64).ravel() - ys = np.asarray(ys, dtype=np.float64).ravel() - n = xs.shape[0] - tx = np.empty(n, dtype=np.float64) - ty = np.empty(n, dtype=np.float64) - - # Geographic -> Web Mercator - if src_is_geo and tgt_epsg == 3857: - merc_forward(xs, ys, tx, ty) - return tx, ty - if src_epsg == 3857 and tgt_is_geo: - merc_inverse(xs, ys, tx, ty) - return tx, ty - - # Geographic -> UTM - if src_is_geo: - utm = _utm_params(tgt_epsg) - if utm is not None: - lon0, k0, fe, fn = utm - Qn = k0 * _A_RECT - tmerc_forward(xs, ys, tx, ty, lon0, k0, fe, fn, Qn, _ALPHA, _CBG) - return tx, ty - utm_src = _utm_params(src_epsg) - if utm_src is not None and tgt_is_geo: - lon0, k0, fe, fn = utm_src - Qn = k0 * _A_RECT - tmerc_inverse(xs, ys, tx, ty, lon0, k0, fe, fn, Qn, _BETA, _CGB) - return tx, ty - - # Ellipsoidal Mercator - if src_is_geo and tgt_epsg == 3395: - emerc_forward(xs, ys, tx, ty, 1.0, _WGS84_E) - return tx, ty - if src_epsg == 3395 and tgt_is_geo: - emerc_inverse(xs, ys, tx, ty, 1.0, _WGS84_E) - return tx, ty - - # LCC - if src_is_geo: - params = _lcc_params(tgt_crs) - if params is not None: - lon0, nn, c, rho0, 
k0, fe, fn, to_m = params - lcc_forward(xs, ys, tx, ty, lon0, nn, c, rho0, k0, fe, fn, - _WGS84_E, _WGS84_A) - return tx, ty - if tgt_is_geo: - params = _lcc_params(src_crs) - if params is not None: - lon0, nn, c, rho0, k0, fe, fn, to_m = params - # lcc_inverse expects metres; convert from native units first - in_xs = xs * to_m if to_m != 1.0 else xs - in_ys = ys * to_m if to_m != 1.0 else ys - lcc_inverse(in_xs, in_ys, tx, ty, lon0, nn, c, rho0, k0, fe, fn, - _WGS84_E, _WGS84_A) - return tx, ty - - # AEA - if src_is_geo: - params = _aea_params(tgt_crs) - if params is not None: - lon0, nn, C, rho0, fe, fn = params - aea_forward(xs, ys, tx, ty, lon0, nn, C, rho0, fe, fn, - _WGS84_E, _WGS84_A) - return tx, ty - if tgt_is_geo: - params = _aea_params(src_crs) - if params is not None: - lon0, nn, C, rho0, fe, fn = params - aea_inverse(xs, ys, tx, ty, lon0, nn, C, rho0, fe, fn, - _WGS84_E, _WGS84_A, _QP, _APA) - return tx, ty - - # CEA - if src_is_geo: - params = _cea_params(tgt_crs) - if params is not None: - lon0, k0, fe, fn = params - cea_forward(xs, ys, tx, ty, lon0, k0, fe, fn, - _WGS84_E, _WGS84_A, _QP) - return tx, ty - if tgt_is_geo: - params = _cea_params(src_crs) - if params is not None: - lon0, k0, fe, fn = params - cea_inverse(xs, ys, tx, ty, lon0, k0, fe, fn, - _WGS84_E, _WGS84_A, _QP, _APA) - return tx, ty - - # LAEA - if src_is_geo: - params = _laea_params(tgt_crs) - if params is not None: - lon0, lat0, sinb1, cosb1, dd, xmf, ymf, rq, qp, fe, fn, mode = params - laea_forward(xs, ys, tx, ty, lon0, sinb1, cosb1, xmf, ymf, - rq, qp, fe, fn, _WGS84_E, _WGS84_A, _WGS84_E2, mode) - return tx, ty - if tgt_is_geo: - params = _laea_params(src_crs) - if params is not None: - lon0, lat0, sinb1, cosb1, dd, xmf, ymf, rq, qp, fe, fn, mode = params - laea_inverse(xs, ys, tx, ty, lon0, sinb1, cosb1, xmf, ymf, - rq, qp, fe, fn, _WGS84_E, _WGS84_A, _WGS84_E2, mode, _APA) - return tx, ty - - # Polar Stereographic - if src_is_geo: - params = _stere_params(tgt_crs) - if params 
is not None: - lon0, k0, akm1, fe, fn, is_south = params - stere_forward(xs, ys, tx, ty, lon0, akm1, fe, fn, _WGS84_E, is_south) - return tx, ty - if tgt_is_geo: - params = _stere_params(src_crs) - if params is not None: - lon0, k0, akm1, fe, fn, is_south = params - stere_inverse(xs, ys, tx, ty, lon0, akm1, fe, fn, _WGS84_E, is_south) - return tx, ty - - # Oblique Stereographic - if src_is_geo: - params = _sterea_params(tgt_crs) - if params is not None: - lon0, sinc0, cosc0, R2, C, K, ratexp, fe, fn, e = params - sterea_forward(xs, ys, tx, ty, lon0, sinc0, cosc0, R2, C, K, - ratexp, fe, fn, e) - return tx, ty - if tgt_is_geo: - params = _sterea_params(src_crs) - if params is not None: - lon0, sinc0, cosc0, R2, C, K, ratexp, fe, fn, e = params - sterea_inverse(xs, ys, tx, ty, lon0, sinc0, cosc0, R2, C, K, - ratexp, fe, fn, e) - return tx, ty - - return None -``` - -Note: This function mirrors the dispatch logic of `try_numba_transform` but operates on flat scatter-point arrays instead of building a regular grid. The existing batch Numba kernels (e.g. `merc_forward`, `tmerc_forward`, etc.) already accept 1-D arrays, so no new Numba code is needed. - -**Intentional omissions from `try_numba_transform`:** -- **Datum shift wrapping** -- `try_numba_transform` wraps projection kernels with Helmert datum shifts for NAD27 etc. For boundary estimation (the primary consumer of `transform_points`), the metre-level error from omitting the datum shift is sub-pixel and does not affect bounding box correctness. A comment in the code should document this. -- **Sinusoidal and Generic Transverse Mercator** -- these dispatch via `.to_dict()['proj']` not EPSG code, so they only fire when pyproj provides the CRS. Omitting them from `transform_points` means those CRS pairs fall through to the pyproj Transformer fallback in `_transform_boundary`, which is correct behavior. 
- -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pytest xrspatial/tests/test_lite_crs.py::TestTransformPoints -v` -Expected: All PASS - -- [ ] **Step 5: Commit** - -```bash -git add xrspatial/reproject/_projections.py xrspatial/tests/test_lite_crs.py -git commit -m "Add scatter-point transform_points() for boundary estimation (#1057)" -``` - ---- - -### Task 5: Wire `_grid.py` to use `transform_points` with pyproj fallback - -**Files:** -- Modify: `xrspatial/reproject/_grid.py` -- Modify: `xrspatial/tests/test_lite_crs.py` - -- [ ] **Step 1: Write failing test for pyproj-free grid computation** - -Add to `test_lite_crs.py`: - -```python -class TestGridWithoutPyproj: - def test_compute_output_grid_with_lite_crs(self): - """_compute_output_grid works with lite CRS objects (no pyproj).""" - from xrspatial.reproject._grid import _compute_output_grid - from xrspatial.reproject._lite_crs import CRS - src_crs = CRS(4326) - tgt_crs = CRS(32632) - source_bounds = (6.0, 47.0, 12.0, 55.0) # Germany - source_shape = (64, 64) - grid = _compute_output_grid( - source_bounds, source_shape, src_crs, tgt_crs - ) - assert 'bounds' in grid - assert 'shape' in grid - h, w = grid['shape'] - assert h > 0 and w > 0 - left, bottom, right, top = grid['bounds'] - assert right > left - assert top > bottom -``` - -- [ ] **Step 2: Run test to verify it fails** - -Run: `pytest xrspatial/tests/test_lite_crs.py::TestGridWithoutPyproj -v` -Expected: FAIL -- `_grid.py` still calls `_require_pyproj()` - -- [ ] **Step 3: Modify `_grid.py` to use `transform_points` with pyproj fallback** - -Replace the transformer creation and transform calls in `_compute_output_grid()`: - -```python -"""Output grid computation and chunk layout for reprojection.""" -from __future__ import annotations - -import numpy as np - - -def _transform_boundary(source_crs, target_crs, xs, ys): - """Transform scatter points, trying Numba fast path first.""" - try: - from ._projections import transform_points - result = 
transform_points(source_crs, target_crs, xs, ys) - if result is not None: - return result - except (ImportError, ModuleNotFoundError): - pass - # Fall back to pyproj - from ._crs_utils import _require_pyproj - pyproj = _require_pyproj() - transformer = pyproj.Transformer.from_crs( - source_crs, target_crs, always_xy=True - ) - tx, ty = transformer.transform(xs, ys) - return np.asarray(tx), np.asarray(ty) - - -def _compute_output_grid(source_bounds, source_shape, source_crs, target_crs, - resolution=None, bounds=None, width=None, height=None): - # ... (see step 3 body below) -``` - -The body of `_compute_output_grid` replaces all `transformer.transform(...)` calls with `_transform_boundary(source_crs, target_crs, ...)`. Remove the `_require_pyproj()` and `Transformer.from_crs()` calls at the top. The `source_crs.is_geographic` check stays as-is (works with both CRS types). Three call sites to replace: - -1. Line 79: `tx, ty = transformer.transform(xs, ys)` -> `tx, ty = _transform_boundary(source_crs, target_crs, xs, ys)` -2. Line 113: `itx, ity = transformer.transform(ixx.ravel(), iyy.ravel())` -> `itx, ity = _transform_boundary(source_crs, target_crs, ixx.ravel(), iyy.ravel())` -3. 
Lines 153-158: resolution estimation (3 transform calls) -> batch into one `_transform_boundary` call: - ```python - pts_x = np.array([center_x, center_x + src_res_x, center_x]) - pts_y = np.array([center_y, center_y, center_y + src_res_y]) - tp_x, tp_y = _transform_boundary(source_crs, target_crs, pts_x, pts_y) - tc_x, tc_y = float(tp_x[0]), float(tp_y[0]) - tx_x, tx_y = float(tp_x[1]), float(tp_y[1]) - ty_x, ty_y = float(tp_x[2]), float(tp_y[2]) - ``` - -- [ ] **Step 4: Run test to verify it passes** - -Run: `pytest xrspatial/tests/test_lite_crs.py::TestGridWithoutPyproj -v` -Expected: PASS - -- [ ] **Step 5: Run existing reproject tests to verify no regressions** - -Run: `pytest xrspatial/tests/test_reproject.py -v -x` -Expected: All PASS - -- [ ] **Step 6: Commit** - -```bash -git add xrspatial/reproject/_grid.py xrspatial/tests/test_lite_crs.py -git commit -m "Use Numba scatter-point transform for grid boundary estimation (#1057)" -``` - ---- - -### Task 6: Update chunk functions and `_source_footprint_in_target` - -**Files:** -- Modify: `xrspatial/reproject/__init__.py` -- Modify: `xrspatial/tests/test_lite_crs.py` - -- [ ] **Step 1: Write failing integration test** - -Add to `test_lite_crs.py`: - -```python -class TestReprojWithLiteCRS: - def test_reproject_wgs84_to_utm_with_lite_crs(self): - """Full reproject() works when _resolve_crs returns a lite CRS.""" - import xarray as xr - from xrspatial.reproject import reproject - from xrspatial.reproject._lite_crs import CRS as LiteCRS - import numpy as np - # Small test raster in WGS84 - h, w = 32, 32 - y = np.linspace(49, 47, h) - x = np.linspace(8, 10, w) - data = np.random.default_rng(42).random((h, w)) - raster = xr.DataArray( - data, dims=['y', 'x'], - coords={'y': y, 'x': x}, - attrs={'crs': 4326}, - ) - result = reproject(raster, target_crs=32632) - assert result.attrs['crs'] is not None - assert result.shape[0] > 0 and result.shape[1] > 0 -``` - -- [ ] **Step 2: Modify `_reproject_chunk_numpy` to use 
`_crs_from_wkt`** - -In `xrspatial/reproject/__init__.py`, change lines 195-199: - -Before: -```python - from ._crs_utils import _require_pyproj - pyproj = _require_pyproj() - src_crs = pyproj.CRS.from_wkt(src_wkt) - tgt_crs = pyproj.CRS.from_wkt(tgt_wkt) -``` - -After: -```python - from ._crs_utils import _crs_from_wkt - src_crs = _crs_from_wkt(src_wkt) - tgt_crs = _crs_from_wkt(tgt_wkt) -``` - -Also update the fallback Transformer creation (lines 214-217) to conditionally import pyproj only when needed: - -```python - from ._crs_utils import _require_pyproj - pyproj = _require_pyproj() - transformer = pyproj.Transformer.from_crs( - tgt_crs, src_crs, always_xy=True - ) -``` - -- [ ] **Step 3: Restructure `_reproject_chunk_cupy` to defer Transformer** - -In `xrspatial/reproject/__init__.py`, change lines 324-332: - -Before: -```python - from ._crs_utils import _require_pyproj - pyproj = _require_pyproj() - src_crs = pyproj.CRS.from_wkt(src_wkt) - tgt_crs = pyproj.CRS.from_wkt(tgt_wkt) - transformer = pyproj.Transformer.from_crs( - tgt_crs, src_crs, always_xy=True - ) -``` - -After: -```python - from ._crs_utils import _crs_from_wkt - src_crs = _crs_from_wkt(src_wkt) - tgt_crs = _crs_from_wkt(tgt_wkt) -``` - -Move the Transformer creation into the `else` block (line 372, CPU fallback): - -```python - else: - # CPU fallback (Numba JIT or pyproj) - from ._crs_utils import _require_pyproj - pyproj = _require_pyproj() - transformer = pyproj.Transformer.from_crs( - tgt_crs, src_crs, always_xy=True - ) - src_y, src_x = _transform_coords( - transformer, chunk_bounds_tuple, chunk_shape, transform_precision, - src_crs=src_crs, tgt_crs=tgt_crs, - ) -``` - -- [ ] **Step 4: Update `_source_footprint_in_target`** - -In `xrspatial/reproject/__init__.py`, change the function at line 1122: - -```python -def _source_footprint_in_target(src_bounds, src_wkt, tgt_wkt): - """Compute approximate bounding box of source raster in target CRS.""" - try: - from ._crs_utils import 
_crs_from_wkt
-        src_crs = _crs_from_wkt(src_wkt)
-        tgt_crs = _crs_from_wkt(tgt_wkt)
-    except Exception:
-        return None
-
-    sl, sb, sr, st = src_bounds
-    mx = (sl + sr) / 2
-    my = (sb + st) / 2
-    xs = np.array([sl, mx, sr, sl, mx, sr, sl, mx, sr, sl, sr, mx])
-    ys = np.array([sb, sb, sb, my, my, my, st, st, st, my, my, sb])
-
-    try:
-        # Try Numba scatter-point transform first
-        from ._projections import transform_points
-        result = transform_points(src_crs, tgt_crs, xs, ys)
-        if result is not None:
-            tx, ty = result
-            tx = [v for v in tx if np.isfinite(v)]
-            ty = [v for v in ty if np.isfinite(v)]
-            if not tx or not ty:
-                return None
-            return (min(tx), min(ty), max(tx), max(ty))
-    except (ImportError, ModuleNotFoundError):
-        pass
-
-    # Fall back to pyproj Transformer
-    try:
-        from ._crs_utils import _require_pyproj
-        pyproj = _require_pyproj()
-        transformer = pyproj.Transformer.from_crs(
-            src_crs, tgt_crs, always_xy=True
-        )
-        tx, ty = transformer.transform(xs.tolist(), ys.tolist())
-        tx = [v for v in tx if np.isfinite(v)]
-        ty = [v for v in ty if np.isfinite(v)]
-        if not tx or not ty:
-            return None
-        return (min(tx), min(ty), max(tx), max(ty))
-    except Exception:
-        return None
-```
-
-- [ ] **Step 5: Remove unconditional `_require_pyproj()` guards in `reproject()` and `merge()`**
-
-In `xrspatial/reproject/__init__.py`:
-
-1. **`reproject()` at line 525**: Remove the `_require_pyproj()` call. The `_resolve_crs()` call on line 528 handles importing pyproj only if needed.
-
-2. **`merge()` at line 1308**: Same pattern -- remove the unconditional `_require_pyproj()` call. The chunk functions and `_source_footprint_in_target` already handle their own fallback to pyproj when needed.
-
-Also remove the now-unused `from ._crs_utils import _require_pyproj` import at lines 516 and 1301 (these functions no longer call `_require_pyproj` directly).
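The footprint helper above boils down to: sample points over the source extent, transform them, drop non-finite results, and take min/max. A standalone sketch of that pattern, independent of any CRS machinery (`footprint_bbox` and `toy_transform` are illustrative names, not part of xrspatial):

```python
import numpy as np

def footprint_bbox(transform, bounds, n=3):
    """Approximate the bounding box of `bounds` under `transform`.

    `transform` maps (xs, ys) arrays -> (txs, tys); points that fail to
    project are expected to come back as NaN and are filtered out.
    """
    left, bottom, right, top = bounds
    # Sample an n x n grid over the source extent, including the edges,
    # so projection curvature is captured, not just the corners.
    gx, gy = np.meshgrid(np.linspace(left, right, n),
                         np.linspace(bottom, top, n))
    tx, ty = transform(gx.ravel(), gy.ravel())
    tx = tx[np.isfinite(tx)]
    ty = ty[np.isfinite(ty)]
    if tx.size == 0 or ty.size == 0:
        return None
    return (tx.min(), ty.min(), tx.max(), ty.max())

# Toy transform: scale x by 2, shift y by 10; one point "fails".
def toy_transform(xs, ys):
    txs = np.asarray(xs, dtype=float) * 2
    tys = np.asarray(ys, dtype=float) + 10
    txs[0] = np.nan  # simulate a point outside the projection domain
    return txs, tys

print(footprint_bbox(toy_transform, (0.0, 0.0, 4.0, 2.0)))
```

Filtering before min/max is what makes the helper robust to partially out-of-domain rasters: a single failed point shrinks the sample set instead of poisoning the whole bbox.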
- -- [ ] **Step 6: Run tests** - -Run: `pytest xrspatial/tests/test_lite_crs.py xrspatial/tests/test_reproject.py -v -x` -Expected: All PASS - -- [ ] **Step 7: Commit** - -```bash -git add xrspatial/reproject/__init__.py xrspatial/tests/test_lite_crs.py -git commit -m "Wire chunk functions and merge helper to use lite CRS (#1057)" -``` - ---- - -### Task 7: Integration test -- reprojection without pyproj - -**Files:** -- Modify: `xrspatial/tests/test_lite_crs.py` - -- [ ] **Step 1: Write integration test that mocks pyproj as unavailable** - -```python -class TestNoPyproj: - """Verify reprojection works for supported CRS pairs when pyproj is absent.""" - - def test_reproject_without_pyproj(self, monkeypatch): - """Reprojection should work without pyproj for known EPSG codes.""" - import sys - import xarray as xr - from xrspatial.reproject._lite_crs import CRS - import numpy as np - - # Block pyproj import - monkeypatch.setitem(sys.modules, 'pyproj', None) - - # Force fresh imports to pick up blocked pyproj - from xrspatial.reproject._crs_utils import _resolve_crs, _crs_from_wkt - - src_crs = CRS(4326) - tgt_crs = CRS(3857) - assert src_crs.to_epsg() == 4326 - assert tgt_crs.is_geographic is False - - # _crs_from_wkt round-trips without pyproj - wkt = src_crs.to_wkt() - restored = _crs_from_wkt(wkt) - assert restored.to_epsg() == 4326 - - def test_unknown_epsg_without_pyproj_raises(self, monkeypatch): - """Unknown EPSG codes raise clear error when pyproj is absent.""" - import sys - monkeypatch.setitem(sys.modules, 'pyproj', None) - from xrspatial.reproject._crs_utils import _resolve_crs - with pytest.raises((ImportError, ValueError)): - _resolve_crs(2193) -``` - -- [ ] **Step 2: Run tests** - -Run: `pytest xrspatial/tests/test_lite_crs.py::TestNoPyproj -v` -Expected: All PASS - -- [ ] **Step 3: Commit** - -```bash -git add xrspatial/tests/test_lite_crs.py -git commit -m "Add integration tests for pyproj-free CRS resolution (#1057)" -``` - ---- - -### Task 8: Final 
validation and cleanup - -**Files:** -- All modified files - -- [ ] **Step 1: Run full test suite** - -Run: `pytest xrspatial/tests/test_lite_crs.py xrspatial/tests/test_reproject.py -v` -Expected: All PASS - -- [ ] **Step 2: Verify no other pyproj imports leaked** - -Check that the chunk functions and grid code only import pyproj via `_require_pyproj()` or `_crs_from_wkt()`, not directly: - -Run: `grep -n "import pyproj" xrspatial/reproject/__init__.py xrspatial/reproject/_grid.py` -Expected: No direct `import pyproj` statements in these files - -- [ ] **Step 3: Commit any final cleanups** - -```bash -git add -u -git commit -m "Clean up pyproj imports in reproject module (#1057)" -``` diff --git a/docs/superpowers/plans/2026-03-24-dask-graph-utilities.md b/docs/superpowers/plans/2026-03-24-dask-graph-utilities.md deleted file mode 100644 index 759211c1..00000000 --- a/docs/superpowers/plans/2026-03-24-dask-graph-utilities.md +++ /dev/null @@ -1,1057 +0,0 @@ -# Dask Graph Utilities Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Add `fused_overlap` and `multi_overlap` utilities that reduce dask graph size by fusing multiple `map_overlap` calls into single passes. - -**Architecture:** Two standalone functions in `xrspatial/utils.py`. `fused_overlap` composes sequential overlap stages into one `map_overlap` call with summed depth. `multi_overlap` runs a multi-output kernel via `da.overlap.overlap()` + `da.map_blocks()` with `new_axis=0`. Both are exposed on the DataArray `.xrs` accessor. 
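The key assumption behind `fused_overlap` -- that running the stages back to back inside one pass padded by the summed depth matches running them as separate overlap passes -- can be checked on a plain NumPy array. A sketch under that assumption (`pad_nan`, `smooth`, and `double` are illustrative stand-ins, not the real helpers):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pad_nan(a, d):
    # NaN-pad a 2-D array by d cells on every side (promote to float for NaN)
    return np.pad(a.astype(float), d, mode='constant', constant_values=np.nan)

def smooth(chunk):
    # 3x3 mean: takes (H+2, W+2), returns (H, W)
    return sliding_window_view(chunk, (3, 3)).mean(axis=(-2, -1))

def double(chunk):
    # takes (H+2, W+2), returns (H, W)
    return chunk[1:-1, 1:-1] * 2

a = np.arange(36, dtype=float).reshape(6, 6)

# Sequential: pad and run each depth-1 stage separately (two "layers")
seq = double(pad_nan(smooth(pad_nan(a, 1)), 1))

# Fused: pad once by the summed depth (1 + 1 = 2), run both stages back to back
fused = double(smooth(pad_nan(a, 2)))

# Interior cells agree exactly; only edge cells differ, where NaN padding
# bleeds into the 3x3 windows
assert np.allclose(seq[2:-2, 2:-2], fused[2:-2, 2:-2])
```

This is why the dask path can sum the per-stage depths into one `map_overlap` call: each stage consumes its own depth from the shared halo, so one wide halo serves the whole pipeline.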
- -**Tech Stack:** dask.array (map_overlap, overlap.overlap, map_blocks), numpy, xarray - -**Spec:** `docs/superpowers/specs/2026-03-24-dask-graph-utilities-design.md` - ---- - -## File structure - -| File | Role | -|------|------| -| `xrspatial/utils.py` | Add `_normalize_depth`, `_pad_nan`, `fused_overlap`, `multi_overlap` after existing `rechunk_no_shuffle` (line 1102) | -| `xrspatial/accessor.py` | Add `fused_overlap`, `multi_overlap` to DataArray accessor after `rechunk_no_shuffle` (line 501) | -| `xrspatial/__init__.py` | Export both functions | -| `xrspatial/tests/test_fused_overlap.py` | New test file | -| `xrspatial/tests/test_multi_overlap.py` | New test file | -| `xrspatial/tests/test_accessor.py` | Add to expected methods list (line 93) | -| `docs/source/reference/utilities.rst` | Add API entries | -| `README.md` | Add rows to Utilities table (line 289) | -| `examples/user_guide/37_Fused_Overlap.ipynb` | New notebook | - ---- - -### Task 1: `_normalize_depth` and `_pad_nan` helpers - -**Files:** -- Modify: `xrspatial/utils.py` (append after line 1102) -- Test: `xrspatial/tests/test_fused_overlap.py` (new) - -- [ ] **Step 1: Write failing tests for `_normalize_depth`** - -Create `xrspatial/tests/test_fused_overlap.py`: - -```python -"""Tests for fused_overlap and helpers.""" - -import numpy as np -import pytest -import xarray as xr - -from xrspatial.utils import _normalize_depth, _pad_nan - - -class TestNormalizeDepth: - def test_int_input(self): - assert _normalize_depth(2, ndim=2) == {0: 2, 1: 2} - - def test_tuple_input(self): - assert _normalize_depth((3, 1), ndim=2) == {0: 3, 1: 1} - - def test_dict_input(self): - assert _normalize_depth({0: 2, 1: 4}, ndim=2) == {0: 2, 1: 4} - - def test_dict_missing_axis_raises(self): - with pytest.raises(ValueError, match="missing axes"): - _normalize_depth({0: 1}, ndim=2) - - def test_dict_extra_axis_raises(self): - with pytest.raises(ValueError, match="extra axes"): - _normalize_depth({0: 1, 1: 1, 2: 1}, ndim=2) 
- - def test_negative_depth_raises(self): - with pytest.raises(ValueError, match="non-negative"): - _normalize_depth(-1, ndim=2) - - def test_tuple_wrong_length_raises(self): - with pytest.raises(ValueError, match="length"): - _normalize_depth((1, 2, 3), ndim=2) - - -class TestPadNan: - def test_2d_pads_with_nan(self): - data = np.ones((4, 4), dtype=np.float32) - result = _pad_nan(data, depth=(1, 1)) - assert result.shape == (6, 6) - assert np.isnan(result[0, 0]) - np.testing.assert_array_equal(result[1:-1, 1:-1], data) - - def test_asymmetric_depth(self): - data = np.ones((4, 4), dtype=np.float32) - result = _pad_nan(data, depth=(2, 1)) - assert result.shape == (8, 6) - - def test_integer_dtype_promotes_to_float(self): - data = np.ones((4, 4), dtype=np.int32) - result = _pad_nan(data, depth=(1, 1)) - assert np.issubdtype(result.dtype, np.floating) -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pytest xrspatial/tests/test_fused_overlap.py -v` -Expected: FAIL with ImportError - -- [ ] **Step 3: Implement `_normalize_depth` and `_pad_nan`** - -Append to `xrspatial/utils.py` after line 1102: - -```python -def _normalize_depth(depth, ndim): - """Normalize depth to a dict {axis: int}. - - Accepts int, tuple, or dict. Validates completeness and - non-negativity. 
- """ - if isinstance(depth, dict): - expected = set(range(ndim)) - got = set(depth.keys()) - missing = expected - got - extra = got - expected - if missing: - raise ValueError( - f"_normalize_depth: missing axes {sorted(missing)} " - f"for ndim={ndim}" - ) - if extra: - raise ValueError( - f"_normalize_depth: extra axes {sorted(extra)} " - f"for ndim={ndim}" - ) - for v in depth.values(): - if v < 0: - raise ValueError( - f"_normalize_depth: depth must be non-negative, got {v}" - ) - return dict(depth) - - if isinstance(depth, int): - if depth < 0: - raise ValueError( - f"_normalize_depth: depth must be non-negative, got {depth}" - ) - return {ax: depth for ax in range(ndim)} - - if isinstance(depth, tuple): - if len(depth) != ndim: - raise ValueError( - f"_normalize_depth: tuple length {len(depth)} != ndim {ndim}" - ) - for v in depth: - if v < 0: - raise ValueError( - f"_normalize_depth: depth must be non-negative, got {v}" - ) - return {ax: d for ax, d in enumerate(depth)} - - raise TypeError( - f"_normalize_depth: expected int, tuple, or dict, got {type(depth).__name__}" - ) - - -def _pad_nan(data, depth): - """Pad a 2-D numpy or cupy array with NaN on each side. - - Parameters - ---------- - data : numpy or cupy array - depth : tuple of int - ``(d0, d1)`` cells to pad per axis. 
- """ - pad_width = tuple((d, d) for d in depth) - if is_cupy_array(data): - if np.issubdtype(data.dtype, np.integer): - data = data.astype(cupy.float64) - out = cupy.pad(data, pad_width, mode='constant', - constant_values=np.nan) - else: - # Promote integer dtypes so NaN fill works - if np.issubdtype(data.dtype, np.integer): - data = data.astype(np.float64) - out = np.pad(data, pad_width, mode='constant', - constant_values=np.nan) - return out -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pytest xrspatial/tests/test_fused_overlap.py -v` -Expected: all 10 tests PASS - -- [ ] **Step 5: Commit** - -```bash -git add xrspatial/utils.py xrspatial/tests/test_fused_overlap.py -git commit -m "Add _normalize_depth and _pad_nan helpers" -``` - ---- - -### Task 2: `fused_overlap` implementation - -**Files:** -- Modify: `xrspatial/utils.py` (append after `_pad_nan`) -- Modify: `xrspatial/tests/test_fused_overlap.py` (add tests) - -- [ ] **Step 1: Write failing tests for `fused_overlap`** - -Append to `xrspatial/tests/test_fused_overlap.py`: - -```python -da = pytest.importorskip("dask.array") - - -def _increment_interior(chunk): - """Stage func: adds 1 to every cell. Returns interior only.""" - # depth=1 means chunk is (H+2, W+2), interior is (H, W) - return chunk[1:-1, 1:-1] + 1 - - -def _double_interior(chunk): - """Stage func: doubles every cell. 
Returns interior only.""" - return chunk[1:-1, 1:-1] * 2 - - -def _make_dask_raster(shape=(64, 64), chunks=16, dtype=np.float32): - data = da.from_array( - np.random.RandomState(42).rand(*shape).astype(dtype), chunks=chunks - ) - return xr.DataArray(data, dims=['y', 'x']) - - -class TestFusedOverlapDask: - def test_single_stage_matches_map_overlap(self): - from xrspatial.utils import fused_overlap - raster = _make_dask_raster() - - fused = fused_overlap(raster, (_increment_interior, 1)) - - # Sequential reference - ref = raster.data.map_overlap( - _increment_interior, depth=1, boundary=np.nan, - meta=np.array(()), - ) - np.testing.assert_array_equal(fused.values, ref.compute()) - - def test_two_stages_match_sequential(self): - from xrspatial.utils import fused_overlap - raster = _make_dask_raster() - - fused = fused_overlap( - raster, - (_increment_interior, 1), - (_double_interior, 1), - ) - - # Sequential reference - step1 = raster.data.map_overlap( - _increment_interior, depth=1, boundary=np.nan, - meta=np.array(()), - ) - ref = step1.map_overlap( - _double_interior, depth=1, boundary=np.nan, - meta=np.array(()), - ) - np.testing.assert_array_equal(fused.values, ref.compute()) - - def test_three_stages(self): - from xrspatial.utils import fused_overlap - raster = _make_dask_raster() - - fused = fused_overlap( - raster, - (_increment_interior, 1), - (_double_interior, 1), - (_increment_interior, 1), - ) - - step1 = raster.data.map_overlap( - _increment_interior, depth=1, boundary=np.nan, - meta=np.array(()), - ) - step2 = step1.map_overlap( - _double_interior, depth=1, boundary=np.nan, - meta=np.array(()), - ) - ref = step2.map_overlap( - _increment_interior, depth=1, boundary=np.nan, - meta=np.array(()), - ) - np.testing.assert_array_equal(fused.values, ref.compute()) - - def test_nonsquare_depth(self): - from xrspatial.utils import fused_overlap - - def _stage_2_1(chunk): - return chunk[2:-2, 1:-1] + 1 - - raster = _make_dask_raster(shape=(64, 64), chunks=32) - 
fused = fused_overlap(raster, (_stage_2_1, (2, 1))) - - ref = raster.data.map_overlap( - _stage_2_1, depth=(2, 1), boundary=np.nan, - meta=np.array(()), - ) - np.testing.assert_array_equal(fused.values, ref.compute()) - - def test_returns_dataarray(self): - from xrspatial.utils import fused_overlap - raster = _make_dask_raster() - result = fused_overlap(raster, (_increment_interior, 1)) - assert isinstance(result, xr.DataArray) - - def test_fewer_graph_layers_than_sequential(self): - from xrspatial.utils import fused_overlap - raster = _make_dask_raster() - - fused = fused_overlap( - raster, - (_increment_interior, 1), - (_double_interior, 1), - ) - - step1 = raster.data.map_overlap( - _increment_interior, depth=1, boundary=np.nan, - meta=np.array(()), - ) - sequential = step1.map_overlap( - _double_interior, depth=1, boundary=np.nan, - meta=np.array(()), - ) - assert len(dict(fused.data.__dask_graph__())) < len( - dict(sequential.__dask_graph__()) - ) - - -class TestFusedOverlapNumpy: - def test_numpy_fallback_matches_dask(self): - from xrspatial.utils import fused_overlap - np_raster = xr.DataArray( - np.random.RandomState(42).rand(64, 64).astype(np.float32), - dims=['y', 'x'], - ) - dask_raster = np_raster.chunk(16) - - np_result = fused_overlap( - np_raster, - (_increment_interior, 1), - (_double_interior, 1), - ) - dask_result = fused_overlap( - dask_raster, - (_increment_interior, 1), - (_double_interior, 1), - ) - # Compare interior (edges may differ due to NaN propagation, - # but interior cells should match) - np.testing.assert_array_equal( - np_result.values[2:-2, 2:-2], - dask_result.values[2:-2, 2:-2], - ) - - -class TestFusedOverlapValidation: - def test_rejects_non_nan_boundary(self): - from xrspatial.utils import fused_overlap - raster = _make_dask_raster() - with pytest.raises(ValueError, match="boundary.*nan"): - fused_overlap(raster, (_increment_interior, 1), boundary='nearest') - - def test_rejects_empty_stages(self): - from xrspatial.utils 
import fused_overlap - raster = _make_dask_raster() - with pytest.raises(ValueError, match="at least one stage"): - fused_overlap(raster) - - def test_rejects_non_dataarray(self): - from xrspatial.utils import fused_overlap - with pytest.raises(TypeError): - fused_overlap(np.zeros((10, 10)), (_increment_interior, 1)) - - def test_rejects_chunks_smaller_than_total_depth(self): - from xrspatial.utils import fused_overlap - raster = _make_dask_raster(shape=(32, 32), chunks=4) - # total_depth = 5, chunks = 4 - def _big_depth(chunk): - return chunk[5:-5, 5:-5] + 1 - with pytest.raises(ValueError, match="[Cc]hunk size"): - fused_overlap(raster, (_big_depth, 5)) - - def test_small_chunks_barely_above_total_depth(self): - """Chunks just barely larger than total_depth should work.""" - from xrspatial.utils import fused_overlap - # chunks=6, total_depth=2 (two stages of depth 1) - raster = _make_dask_raster(shape=(24, 24), chunks=6) - result = fused_overlap( - raster, - (_increment_interior, 1), - (_double_interior, 1), - ) - assert result.shape == (24, 24) -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pytest xrspatial/tests/test_fused_overlap.py::TestFusedOverlapDask -v` -Expected: FAIL with ImportError (`fused_overlap` not found) - -- [ ] **Step 3: Implement `fused_overlap`** - -Append to `xrspatial/utils.py` after `_pad_nan`: - -```python -def fused_overlap(agg, *stages, boundary='nan'): - """Run multiple overlap operations in a single map_overlap call. - - Each stage is a ``(func, depth)`` pair. ``func`` receives a padded - chunk and returns the unpadded interior result. Stages are fused - into one ``map_overlap`` call with the sum of all depths, producing - one blockwise graph layer instead of N. - - Parameters - ---------- - agg : xr.DataArray - Input raster. - *stages : tuple of (callable, depth) - Each ``func`` takes array ``(H+2*d, W+2*d)`` -> ``(H, W)``. - ``depth`` is int, tuple, or dict. - boundary : str - Must be ``'nan'``. 
- - Returns - ------- - xr.DataArray - """ - if not isinstance(agg, xr.DataArray): - raise TypeError( - f"fused_overlap(): expected xr.DataArray, " - f"got {type(agg).__name__}" - ) - if not stages: - raise ValueError("fused_overlap(): need at least one stage") - if boundary != 'nan': - raise ValueError( - f"fused_overlap(): boundary must be 'nan', got {boundary!r}" - ) - - ndim = agg.ndim - - # Normalize and sum depths - stage_depths = [_normalize_depth(d, ndim) for _, d in stages] - total_depth = {ax: sum(sd[ax] for sd in stage_depths) - for ax in range(ndim)} - - # --- non-dask path --- - if not has_dask_array() or not isinstance(agg.data, da.Array): - result = agg.data - for i, (func, _) in enumerate(stages): - depth_tuple = tuple(stage_depths[i][ax] for ax in range(ndim)) - padded = _pad_nan(result, depth_tuple) - result = func(padded) - return agg.copy(data=result) - - # --- dask path --- - # Validate chunk sizes - for ax, d in total_depth.items(): - for cs in agg.chunks[ax]: - if cs < d: - raise ValueError( - f"Chunk size {cs} on axis {ax} is smaller than " - f"total depth {d}. Rechunk first." 
- ) - - funcs = [f for f, _ in stages] - - def _fused_wrapper(block): - result = block - for func in funcs: - result = func(result) - # result is now the interior; it still has enough valid - # overlap for the remaining stages - # re-pad to original block shape so map_overlap can crop - pad_width = tuple((total_depth[ax], total_depth[ax]) - for ax in range(block.ndim)) - if is_cupy_array(result): - padded = cupy.pad(result, pad_width, mode='constant', - constant_values=np.nan) - else: - padded = np.pad(result, pad_width, mode='constant', - constant_values=np.nan) - return padded - - out = agg.data.map_overlap( - _fused_wrapper, - depth=total_depth, - boundary=np.nan, - meta=np.array(()), - ) - - return agg.copy(data=out) -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pytest xrspatial/tests/test_fused_overlap.py -v` -Expected: all tests PASS - -- [ ] **Step 5: Commit** - -```bash -git add xrspatial/utils.py xrspatial/tests/test_fused_overlap.py -git commit -m "Add fused_overlap utility" -``` - ---- - -### Task 3: `multi_overlap` implementation - -**Files:** -- Modify: `xrspatial/utils.py` (append after `fused_overlap`) -- Create: `xrspatial/tests/test_multi_overlap.py` - -- [ ] **Step 1: Write failing tests for `multi_overlap`** - -Create `xrspatial/tests/test_multi_overlap.py`: - -```python -"""Tests for multi_overlap.""" - -import numpy as np -import pytest -import xarray as xr - -from xrspatial.utils import multi_overlap - -da = pytest.importorskip("dask.array") - - -def _triple_kernel(chunk): - """Return 3 bands from a padded (H+2, W+2) chunk.""" - interior = chunk[1:-1, 1:-1] - return np.stack([interior + 1, interior * 2, interior - 1], axis=0) - - -def _make_dask_raster(shape=(64, 64), chunks=16, dtype=np.float32): - data = da.from_array( - np.random.RandomState(99).rand(*shape).astype(dtype), chunks=chunks - ) - return xr.DataArray( - data, dims=['y', 'x'], - coords={'y': np.arange(shape[0]), 'x': np.arange(shape[1])}, - attrs={'crs': 
'EPSG:4326'}, - ) - - -class TestMultiOverlapDask: - def test_matches_sequential_stack(self): - raster = _make_dask_raster() - - multi = multi_overlap(raster, _triple_kernel, n_outputs=3, depth=1) - - # Sequential reference: run kernel 3 times extracting one band each - def _band_i(chunk, i=0): - return _triple_kernel(chunk)[i] - - bands = [] - for i in range(3): - from functools import partial - b = raster.data.map_overlap( - partial(_band_i, i=i), depth=1, boundary=np.nan, - meta=np.array(()), - ) - bands.append(b) - ref = da.stack(bands, axis=0).compute() - - np.testing.assert_array_equal(multi.values, ref) - - def test_output_shape(self): - raster = _make_dask_raster(shape=(32, 32), chunks=16) - result = multi_overlap(raster, _triple_kernel, n_outputs=3, depth=1) - assert result.shape == (3, 32, 32) - - def test_returns_dataarray_with_band_dim(self): - raster = _make_dask_raster() - result = multi_overlap(raster, _triple_kernel, n_outputs=3, depth=1) - assert isinstance(result, xr.DataArray) - assert result.dims[0] == 'band' - assert result.dims[1] == 'y' - assert result.dims[2] == 'x' - - def test_preserves_coords_and_attrs(self): - raster = _make_dask_raster() - result = multi_overlap(raster, _triple_kernel, n_outputs=3, depth=1) - assert result.attrs == raster.attrs - xr.testing.assert_equal( - result.coords['x'], raster.coords['x'] - ) - - def test_explicit_dtype(self): - raster = _make_dask_raster() - result = multi_overlap( - raster, _triple_kernel, n_outputs=3, depth=1, - dtype=np.float64, - ) - assert result.dtype == np.float64 - - def test_fewer_graph_tasks_than_sequential(self): - raster = _make_dask_raster() - multi = multi_overlap(raster, _triple_kernel, n_outputs=3, depth=1) - - from functools import partial - def _band_i(chunk, i=0): - return _triple_kernel(chunk)[i] - - bands = [] - for i in range(3): - b = raster.data.map_overlap( - partial(_band_i, i=i), depth=1, boundary=np.nan, - meta=np.array(()), - ) - bands.append(b) - sequential = 
da.stack(bands, axis=0) - - assert len(dict(multi.data.__dask_graph__())) < len( - dict(sequential.__dask_graph__()) - ) - - def test_single_output_matches_map_overlap(self): - def _single_kernel(chunk): - return (chunk[1:-1, 1:-1] + 1)[np.newaxis, :] - - raster = _make_dask_raster() - multi = multi_overlap(raster, _single_kernel, n_outputs=1, depth=1) - - def _ref_func(chunk): - return chunk[1:-1, 1:-1] + 1 - ref = raster.data.map_overlap( - _ref_func, depth=1, boundary=np.nan, meta=np.array(()), - ) - np.testing.assert_array_equal(multi.values[0], ref.compute()) - - def test_dtype_inference_defaults_to_input(self): - raster = _make_dask_raster(dtype=np.float32) - result = multi_overlap(raster, _triple_kernel, n_outputs=3, depth=1) - assert result.dtype == np.float32 - - def test_non_nan_boundary(self): - """multi_overlap supports all boundary modes.""" - raster = _make_dask_raster() - result = multi_overlap( - raster, _triple_kernel, n_outputs=3, depth=1, - boundary='nearest', - ) - assert result.shape == (3, 64, 64) - assert not np.any(np.isnan(result.values)) - - -class TestMultiOverlapNumpy: - def test_numpy_fallback(self): - raster = xr.DataArray( - np.random.RandomState(99).rand(32, 32).astype(np.float32), - dims=['y', 'x'], - ) - result = multi_overlap(raster, _triple_kernel, n_outputs=3, depth=1) - assert isinstance(result, xr.DataArray) - assert result.shape == (3, 32, 32) - - -class TestMultiOverlapValidation: - def test_rejects_non_2d(self): - raster = xr.DataArray(da.zeros((4, 32, 32), chunks=16), dims=['z', 'y', 'x']) - with pytest.raises(ValueError, match="2-D"): - multi_overlap(raster, _triple_kernel, n_outputs=3, depth=1) - - def test_rejects_n_outputs_zero(self): - raster = _make_dask_raster() - with pytest.raises(ValueError, match="n_outputs.*>= 1"): - multi_overlap(raster, _triple_kernel, n_outputs=0, depth=1) - - def test_rejects_depth_zero(self): - raster = _make_dask_raster() - with pytest.raises(ValueError, match="depth.*>= 1"): - 
multi_overlap(raster, _triple_kernel, n_outputs=3, depth=0) - - def test_rejects_chunks_smaller_than_depth(self): - raster = _make_dask_raster(shape=(32, 32), chunks=4) - with pytest.raises(ValueError, match="[Cc]hunk size"): - multi_overlap(raster, _triple_kernel, n_outputs=3, depth=5) - - def test_rejects_non_dataarray(self): - with pytest.raises(TypeError): - multi_overlap(np.zeros((10, 10)), _triple_kernel, 3, 1) -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pytest xrspatial/tests/test_multi_overlap.py -v` -Expected: FAIL with ImportError - -- [ ] **Step 3: Implement `multi_overlap`** - -Append to `xrspatial/utils.py` after `fused_overlap`: - -```python -def multi_overlap(agg, func, n_outputs, depth, boundary='nan', dtype=None): - """Run a multi-output kernel via a single overlap + map_blocks call. - - ``func`` receives a padded 2-D chunk and returns - ``(n_outputs, H, W)`` — the unpadded interior for each output band. - The result is a 3-D DataArray with a leading ``band`` dimension. - - Parameters - ---------- - agg : xr.DataArray - 2-D input raster. - func : callable - ``(H+2*dy, W+2*dx) -> (n_outputs, H, W)`` - n_outputs : int - Number of output bands (>= 1). - depth : int or tuple of int - Per-axis overlap (>= 1 on each axis). - boundary : str - Boundary mode: 'nan', 'nearest', 'reflect', or 'wrap'. - dtype : numpy dtype, optional - Output dtype. Defaults to input dtype. - - Returns - ------- - xr.DataArray - Shape ``(n_outputs, H, W)`` with ``band`` leading dimension. 
- """ - if not isinstance(agg, xr.DataArray): - raise TypeError( - f"multi_overlap(): expected xr.DataArray, " - f"got {type(agg).__name__}" - ) - if agg.ndim != 2: - raise ValueError( - f"multi_overlap(): input must be 2-D, got {agg.ndim}-D" - ) - if n_outputs < 1: - raise ValueError( - f"multi_overlap(): n_outputs must be >= 1, got {n_outputs}" - ) - - _validate_boundary(boundary) - - depth_dict = _normalize_depth(depth, agg.ndim) - for ax, d in depth_dict.items(): - if d < 1: - raise ValueError( - f"multi_overlap(): depth must be >= 1, got {d} on axis {ax}" - ) - - dtype = dtype or agg.dtype - - # --- non-dask path --- - if not has_dask_array() or not isinstance(agg.data, da.Array): - if boundary == 'nan': - depth_tuple = tuple(depth_dict[ax] for ax in range(agg.ndim)) - padded = _pad_nan(agg.data, depth_tuple) - else: - depth_tuple = tuple(depth_dict[ax] for ax in range(agg.ndim)) - padded = _pad_array(agg.data, depth_tuple, boundary) - result_data = func(padded).astype(dtype) - return xr.DataArray( - result_data, - dims=['band'] + list(agg.dims), - coords=agg.coords, - attrs=agg.attrs, - ) - - # --- dask path --- - import dask.array.overlap as _dask_overlap - - _validate_boundary(boundary) - boundary_val = _boundary_to_dask(boundary, is_cupy=is_cupy_backed(agg)) - - # Validate chunk sizes - for ax, d in depth_dict.items(): - for cs in agg.chunks[ax]: - if cs < d: - raise ValueError( - f"Chunk size {cs} on axis {ax} is smaller than " - f"depth {d}. Rechunk first." 
- ) - - # Step 1: pad with overlap - padded = _dask_overlap.overlap( - agg.data, depth=depth_dict, boundary=boundary_val - ) - - # Step 2: map_blocks — func returns (n_outputs, H, W) per block - out = da.map_blocks( - func, - padded, - dtype=dtype, - new_axis=0, - chunks=((n_outputs,),) + agg.data.chunks, - ) - - return xr.DataArray( - out, - dims=['band'] + list(agg.dims), - coords=agg.coords, - attrs=agg.attrs, - ) -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pytest xrspatial/tests/test_multi_overlap.py -v` -Expected: all tests PASS - -- [ ] **Step 5: Commit** - -```bash -git add xrspatial/utils.py xrspatial/tests/test_multi_overlap.py -git commit -m "Add multi_overlap utility" -``` - ---- - -### Task 4: Accessor and exports - -**Files:** -- Modify: `xrspatial/accessor.py` (line 501, after `rechunk_no_shuffle`) -- Modify: `xrspatial/__init__.py` (after `rechunk_no_shuffle` import) -- Modify: `xrspatial/tests/test_accessor.py` (line 93, expected methods list) - -- [ ] **Step 1: Write failing accessor tests** - -Append to `xrspatial/tests/test_fused_overlap.py`: - -```python -class TestFusedOverlapAccessor: - def test_accessor_delegates(self): - import xrspatial # noqa: F401 - from xrspatial.utils import fused_overlap - raster = _make_dask_raster() - direct = fused_overlap(raster, (_increment_interior, 1)) - via_acc = raster.xrs.fused_overlap((_increment_interior, 1)) - np.testing.assert_array_equal(direct.values, via_acc.values) -``` - -Append to `xrspatial/tests/test_multi_overlap.py`: - -```python -class TestMultiOverlapAccessor: - def test_accessor_delegates(self): - import xrspatial # noqa: F401 - raster = _make_dask_raster() - direct = multi_overlap(raster, _triple_kernel, n_outputs=3, depth=1) - via_acc = raster.xrs.multi_overlap(_triple_kernel, n_outputs=3, depth=1) - np.testing.assert_array_equal(direct.values, via_acc.values) -``` - -- [ ] **Step 2: Run accessor tests to verify they fail** - -Run: `pytest 
xrspatial/tests/test_fused_overlap.py::TestFusedOverlapAccessor -v` -Expected: FAIL (method not found) - -- [ ] **Step 3: Add accessor methods** - -In `xrspatial/accessor.py`, after the `rechunk_no_shuffle` method (line 501), add: - -```python - def fused_overlap(self, *stages, **kwargs): - from .utils import fused_overlap - return fused_overlap(self._obj, *stages, **kwargs) - - def multi_overlap(self, func, n_outputs, **kwargs): - from .utils import multi_overlap - return multi_overlap(self._obj, func, n_outputs, **kwargs) -``` - -- [ ] **Step 4: Add exports to `__init__.py`** - -After the existing `rechunk_no_shuffle` import line, add: - -```python -from xrspatial.utils import fused_overlap # noqa -from xrspatial.utils import multi_overlap # noqa -``` - -- [ ] **Step 5: Update expected methods list in test_accessor.py** - -In `xrspatial/tests/test_accessor.py`, find the `test_dataarray_accessor_has_expected_methods` function's expected list (around line 93). Add `'fused_overlap'` and `'multi_overlap'` to the list. - -- [ ] **Step 6: Run all tests** - -Run: `pytest xrspatial/tests/test_fused_overlap.py xrspatial/tests/test_multi_overlap.py xrspatial/tests/test_accessor.py -v` -Expected: all PASS - -- [ ] **Step 7: Commit** - -```bash -git add xrspatial/accessor.py xrspatial/__init__.py xrspatial/tests/test_accessor.py xrspatial/tests/test_fused_overlap.py xrspatial/tests/test_multi_overlap.py -git commit -m "Add fused_overlap and multi_overlap to accessor and exports" -``` - ---- - -### Task 5: Documentation and README - -**Files:** -- Modify: `docs/source/reference/utilities.rst` -- Modify: `README.md` (line 289, Utilities table) - -- [ ] **Step 1: Add API entries to utilities.rst** - -In `docs/source/reference/utilities.rst`, before the `Rechunking` section, add: - -```rst -Overlap Fusion -============== -.. 
autosummary:: - :toctree: _autosummary - - xrspatial.utils.fused_overlap - xrspatial.utils.multi_overlap -``` - -- [ ] **Step 2: Add README rows** - -In `README.md`, in the Utilities table (after the `rechunk_no_shuffle` row around line 289), add: - -```markdown -| [fused_overlap](xrspatial/utils.py) | Fuse sequential map_overlap calls into a single pass | Custom | 🔄 | ✅️ | 🔄 | ✅️ | -| [multi_overlap](xrspatial/utils.py) | Run multi-output kernel in a single overlap pass | Custom | 🔄 | ✅️ | 🔄 | ✅️ | -``` - -- [ ] **Step 3: Commit** - -```bash -git add docs/source/reference/utilities.rst README.md -git commit -m "Add fused_overlap and multi_overlap to docs and README" -``` - ---- - -### Task 6: User guide notebook - -**Files:** -- Create: `examples/user_guide/37_Fused_Overlap.ipynb` - -- [ ] **Step 1: Create the notebook** - -Create `examples/user_guide/37_Fused_Overlap.ipynb` with these cells: - -**Cell 1 (markdown):** -``` -# Fusing Overlap Operations - -When you chain spatial operations like erode then dilate, each one adds a -blockwise layer to the dask graph. `fused_overlap` runs them in a single -`map_overlap` call, and `multi_overlap` does the same for kernels that -produce multiple output bands. -``` - -**Cell 2 (code):** -```python -import numpy as np -import dask.array as da -import xarray as xr -import xrspatial -from xrspatial.utils import fused_overlap, multi_overlap -``` - -**Cell 3 (markdown):** -``` -## fused_overlap: chained operations in one pass - -Define two stage functions. Each takes a padded chunk and returns the -unpadded interior. -``` - -**Cell 4 (code):** -```python -def smooth_interior(chunk): - """3x3 mean filter. Takes (H+2, W+2), returns (H, W).""" - from numpy.lib.stride_tricks import sliding_window_view - windows = sliding_window_view(chunk, (3, 3)) - return np.nanmean(windows, axis=(-2, -1)) - -def threshold_interior(chunk): - """Binary threshold. 
Takes (H+2, W+2), returns (H, W).""" - interior = chunk[1:-1, 1:-1] - return (interior > 0.5).astype(np.float32) - -np.random.seed(42) -raw = np.random.rand(512, 512).astype(np.float32) -dem = xr.DataArray(da.from_array(raw, chunks=128), dims=['y', 'x']) -``` - -**Cell 5 (code):** -```python -# Fused: one map_overlap call -fused = fused_overlap(dem, (smooth_interior, 1), (threshold_interior, 1)) - -# Sequential: two map_overlap calls -step1 = dem.data.map_overlap(smooth_interior, depth=1, boundary=np.nan, meta=np.array(())) -sequential = step1.map_overlap(threshold_interior, depth=1, boundary=np.nan, meta=np.array(())) - -print(f'Fused graph: {len(dict(fused.data.__dask_graph__())):,} tasks') -print(f'Sequential graph: {len(dict(sequential.__dask_graph__())):,} tasks') -``` - -**Cell 6 (markdown):** -``` -## multi_overlap: N outputs in one pass -``` - -**Cell 7 (code):** -```python -def gradient_kernel(chunk): - """Compute dx and dy gradients. Takes (H+2, W+2), returns (2, H, W).""" - dx = (chunk[1:-1, 2:] - chunk[1:-1, :-2]) / 2.0 - dy = (chunk[2:, 1:-1] - chunk[:-2, 1:-1]) / 2.0 - return np.stack([dx, dy], axis=0) - -result = multi_overlap(dem, gradient_kernel, n_outputs=2, depth=1) -print(f'Output shape: {result.shape}') -print(f'Dimensions: {result.dims}') -print(f'Graph tasks: {len(dict(result.data.__dask_graph__())):,}') -``` - -**Cell 8 (code):** -```python -# Accessor syntax works too -fused_acc = dem.xrs.fused_overlap((smooth_interior, 1), (threshold_interior, 1)) -multi_acc = dem.xrs.multi_overlap(gradient_kernel, n_outputs=2, depth=1) -print('Accessor: OK') -``` - -- [ ] **Step 2: Commit** - -```bash -git add examples/user_guide/37_Fused_Overlap.ipynb -git commit -m "Add fused_overlap and multi_overlap user guide notebook" -``` - ---- - -### Task 7: Final verification - -- [ ] **Step 1: Run all new tests** - -```bash -pytest xrspatial/tests/test_fused_overlap.py xrspatial/tests/test_multi_overlap.py xrspatial/tests/test_accessor.py -v -``` - -Expected: 
all PASS - -- [ ] **Step 2: Run existing test suite to check for regressions** - -```bash -pytest xrspatial/tests/test_rechunk_no_shuffle.py -v -``` - -Expected: all PASS (no regressions in utils.py) diff --git a/docs/superpowers/plans/2026-03-24-hypsometric-integral.md b/docs/superpowers/plans/2026-03-24-hypsometric-integral.md deleted file mode 100644 index df0eacea..00000000 --- a/docs/superpowers/plans/2026-03-24-hypsometric-integral.md +++ /dev/null @@ -1,659 +0,0 @@ -# Hypsometric Integral Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Add a `hypsometric_integral(zones, values)` function to `xrspatial/zonal.py` that computes HI = (mean - min) / (max - min) per zone and paints the result back into a raster. - -**Architecture:** A new public function in `zonal.py` with four backend implementations (numpy, cupy, dask+numpy, dask+cupy) following the existing `ArrayTypeFunctionMapping` dispatch pattern. The numpy path uses `np.unique` + boolean masking per zone (simpler than `_sort_and_stride` since we only need min/mean/max; `_sort_and_stride` exists for the generic multi-stat case in `stats()`). The cupy path transfers to host and reuses the numpy path (same pattern as `_stats_dask_cupy`; spec's "device-side scatter/gather" is aspirational, host transfer is fine for this use case). The dask path uses module-level `@delayed` functions for per-block aggregation (min/max/sum/count), reduces to global per-zone HI, then `map_blocks` to paint back (preserving chunk structure). Exposed via the `.xrs` accessor and the top-level `xrspatial` namespace. 
- -**Tech Stack:** numpy, cupy (optional), dask (optional), xarray, pytest - -**Spec:** `docs/superpowers/specs/2026-03-24-hypsometric-integral-design.md` - ---- - -### Task 1: Write failing tests for numpy backend - -**Files:** -- Create: `xrspatial/tests/test_hypsometric_integral.py` - -Every test function needs `@pytest.mark.parametrize("backend", ...)` — there is no global `backend` fixture. This matches the pattern in `test_zonal.py`. - -- [ ] **Step 1: Create test file with hand-crafted test cases** - -```python -# xrspatial/tests/test_hypsometric_integral.py -try: - import dask.array as da -except ImportError: - da = None - -import numpy as np -import pytest -import xarray as xr - -from .general_checks import create_test_raster - - -# --- fixtures --------------------------------------------------------------- - -@pytest.fixture -def hi_zones(backend): - """Two zones (1, 2) plus nodata (0). - - Zone 1: 5 cells — (0,1), (0,2), (0,3), (1,1), (1,2) - Zone 2: 6 cells — (1,3), (2,1), (2,2), (2,3), (3,1), (3,2) - Nodata: 4 cells — column 0 and (3,3) - """ - data = np.array([ - [0, 1, 1, 1], - [0, 1, 1, 2], - [0, 2, 2, 2], - [0, 2, 2, 0], - ], dtype=np.float64) - return create_test_raster(data, backend, dims=['y', 'x'], - attrs={'res': (1.0, 1.0)}, chunks=(2, 2)) - - -@pytest.fixture -def hi_values(backend): - """Elevation values. 
- - Zone 1 cells: 10, 20, 30, 40, 50 - min=10, max=50, mean=30, HI=(30-10)/(50-10) = 0.5 - - Zone 2 cells: 100, 60, 70, 80, 90, 95 - min=60, max=100, mean=82.5, HI=(82.5-60)/(100-60) = 0.5625 - """ - data = np.array([ - [999., 10., 20., 30.], - [999., 40., 50., 100.], - [999., 60., 70., 80.], - [999., 90., 95., 999.], - ], dtype=np.float64) - return create_test_raster(data, backend, dims=['y', 'x'], - attrs={'res': (1.0, 1.0)}, chunks=(2, 2)) - - -# --- basic ------------------------------------------------------------------- - -@pytest.mark.parametrize("backend", ['numpy', 'dask+numpy', 'cupy', 'dask+cupy']) -def test_hypsometric_integral_basic(backend, hi_zones, hi_values): - from xrspatial.zonal import hypsometric_integral - - result = hypsometric_integral(hi_zones, hi_values) - - assert isinstance(result, xr.DataArray) - assert result.shape == hi_values.shape - assert result.dims == hi_values.dims - assert result.name == 'hypsometric_integral' - - out = result.values if not (da and isinstance(result.data, da.Array)) else result.compute().values - - # zone 0 (nodata) cells should be NaN - nodata_mask = np.array([ - [True, False, False, False], - [True, False, False, False], - [True, False, False, False], - [True, False, False, True], - ]) - assert np.all(np.isnan(out[nodata_mask])) - - # zone 1: HI = 0.5 - z1_mask = np.array([ - [False, True, True, True], - [False, True, True, False], - [False, False, False, False], - [False, False, False, False], - ]) - np.testing.assert_allclose(out[z1_mask], 0.5, rtol=1e-10) - - # zone 2: HI = 0.5625 - z2_mask = np.array([ - [False, False, False, False], - [False, False, False, True], - [False, True, True, True], - [False, True, True, False], - ]) - np.testing.assert_allclose(out[z2_mask], 0.5625, rtol=1e-10) - - -# --- edge cases -------------------------------------------------------------- - -@pytest.mark.parametrize("backend", ['numpy', 'dask+numpy', 'cupy', 'dask+cupy']) -def test_hypsometric_integral_flat_zone(backend): 
- """A zone with all identical values has range=0, so HI should be NaN.""" - from xrspatial.zonal import hypsometric_integral - - zones = create_test_raster( - np.array([[1, 1], [1, 1]], dtype=np.float64), backend, - chunks=(2, 2)) - values = create_test_raster( - np.array([[5.0, 5.0], [5.0, 5.0]]), backend, - chunks=(2, 2)) - - result = hypsometric_integral(zones, values, nodata=0) - out = result.values if not (da and isinstance(result.data, da.Array)) else result.compute().values - assert np.all(np.isnan(out)) - - -@pytest.mark.parametrize("backend", ['numpy', 'dask+numpy', 'cupy', 'dask+cupy']) -def test_hypsometric_integral_nan_in_values(backend): - """NaN elevation cells should be excluded from per-zone stats.""" - from xrspatial.zonal import hypsometric_integral - - zones = create_test_raster( - np.array([[1, 1], [1, 1]], dtype=np.float64), backend, - chunks=(2, 2)) - values = create_test_raster( - np.array([[10.0, np.nan], [20.0, 30.0]]), backend, - chunks=(2, 2)) - - result = hypsometric_integral(zones, values, nodata=0) - out = result.values if not (da and isinstance(result.data, da.Array)) else result.compute().values - - # zone 1 finite values: 10, 20, 30 -> HI = (20-10)/(30-10) = 0.5 - # the NaN cell should remain NaN in output - assert np.isnan(out[0, 1]) - np.testing.assert_allclose(out[0, 0], 0.5, rtol=1e-10) - np.testing.assert_allclose(out[1, 0], 0.5, rtol=1e-10) - np.testing.assert_allclose(out[1, 1], 0.5, rtol=1e-10) - - -@pytest.mark.parametrize("backend", ['numpy', 'dask+numpy', 'cupy', 'dask+cupy']) -def test_hypsometric_integral_single_cell_zone(backend): - """A zone with a single cell has range=0, so HI=NaN.""" - from xrspatial.zonal import hypsometric_integral - - zones = create_test_raster( - np.array([[1, 2]], dtype=np.float64), backend, - chunks=(1, 2)) - values = create_test_raster( - np.array([[10.0, 20.0]]), backend, - chunks=(1, 2)) - - result = hypsometric_integral(zones, values, nodata=0) - out = result.values if not (da and 
isinstance(result.data, da.Array)) else result.compute().values - # single cell -> range=0 -> NaN - assert np.all(np.isnan(out)) - - -@pytest.mark.parametrize("backend", ['numpy', 'dask+numpy', 'cupy', 'dask+cupy']) -def test_hypsometric_integral_all_nan_zone(backend): - """A zone whose elevation cells are all NaN should produce NaN.""" - from xrspatial.zonal import hypsometric_integral - - zones = create_test_raster( - np.array([[1, 1], [2, 2]], dtype=np.float64), backend, - chunks=(2, 2)) - values = create_test_raster( - np.array([[np.nan, np.nan], [10.0, 20.0]]), backend, - chunks=(2, 2)) - - result = hypsometric_integral(zones, values, nodata=0) - out = result.values if not (da and isinstance(result.data, da.Array)) else result.compute().values - - # zone 1: all NaN -> NaN - assert np.all(np.isnan(out[0, :])) - # zone 2: HI = (15-10)/(20-10) = 0.5 - np.testing.assert_allclose(out[1, :], 0.5, rtol=1e-10) - - -@pytest.mark.parametrize("backend", ['numpy', 'dask+numpy', 'cupy', 'dask+cupy']) -def test_hypsometric_integral_nodata_none(backend): - """When nodata=None, all zone IDs are included (even 0).""" - from xrspatial.zonal import hypsometric_integral - - zones = create_test_raster( - np.array([[0, 0], [1, 1]], dtype=np.float64), backend, - chunks=(2, 2)) - values = create_test_raster( - np.array([[10.0, 20.0], [30.0, 40.0]]), backend, - chunks=(2, 2)) - - result = hypsometric_integral(zones, values, nodata=None) - out = result.values if not (da and isinstance(result.data, da.Array)) else result.compute().values - - # zone 0: HI = (15-10)/(20-10) = 0.5 - np.testing.assert_allclose(out[0, :], 0.5, rtol=1e-10) - # zone 1: HI = (35-30)/(40-30) = 0.5 - np.testing.assert_allclose(out[1, :], 0.5, rtol=1e-10) -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `pytest xrspatial/tests/test_hypsometric_integral.py -v -x -k "numpy" --no-header 2>&1 | head -30` -Expected: FAIL — `ImportError: cannot import name 'hypsometric_integral' from 'xrspatial.zonal'` - 
-- [ ] **Step 3: Commit** - -```bash -git add xrspatial/tests/test_hypsometric_integral.py -git commit -m "Add failing tests for hypsometric_integral" -``` - ---- - -### Task 2: Implement numpy backend and public function - -**Files:** -- Modify: `xrspatial/zonal.py` (add `_hi_numpy` and `hypsometric_integral`) - -Note: `zonal.py` does not import `functools.partial`. The dispatcher uses lambdas to bind arguments, following the existing pattern in `stats()`. - -- [ ] **Step 1: Add numpy backend function to zonal.py** - -Add in a new section before the existing `_apply_numpy` helper function (search for `def _apply_numpy`): - -```python -# --------------------------------------------------------------------------- -# Hypsometric integral -# --------------------------------------------------------------------------- - -def _hi_numpy(zones_data, values_data, nodata): - """Numpy backend for hypsometric integral.""" - unique_zones = np.unique(zones_data[np.isfinite(zones_data)]) - if nodata is not None: - unique_zones = unique_zones[unique_zones != nodata] - - out = np.full(values_data.shape, np.nan, dtype=np.float64) - - for z in unique_zones: - mask = (zones_data == z) & np.isfinite(values_data) - if not np.any(mask): - continue - vals = values_data[mask] - mn, mx = vals.min(), vals.max() - if mx == mn: - continue # flat zone -> NaN - hi = (vals.mean() - mn) / (mx - mn) - out[mask] = hi - return out -``` - -- [ ] **Step 2: Add public `hypsometric_integral` function** - -Add immediately after `_hi_numpy`: - -```python -def hypsometric_integral( - zones, - values, - nodata=0, - column=None, - rasterize_kw=None, - name='hypsometric_integral', -): - """Hypsometric integral (HI) per zone, painted back to a raster. - - HI measures geomorphic maturity: ``(mean - min) / (max - min)`` - computed over elevations within each zone. Values range from 0 to 1. - - Parameters - ---------- - zones : xr.DataArray, GeoDataFrame, or list of (geometry, value) pairs - Zone definitions. 
Integer zone IDs. GeoDataFrame and list-of-pairs - inputs are rasterized using *values* as the template grid. - values : xr.DataArray - 2D elevation raster (float), same shape as *zones*. - nodata : int or None, default 0 - Zone ID that means "no zone". Excluded from computation; those - cells get NaN in the output. Set to ``None`` to include all IDs. - column : str, optional - Column in a GeoDataFrame containing zone IDs. - rasterize_kw : dict, optional - Extra keyword arguments for ``rasterize()`` when *zones* is vector. - name : str, default ``'hypsometric_integral'`` - Name for the output DataArray. - - Returns - ------- - xr.DataArray - Float64 raster, same shape/dims/coords as *values*. Each cell - holds the HI of its zone. NaN for nodata zones, non-finite - elevation cells, and flat zones (elevation range = 0). - """ - zones = _maybe_rasterize_zones(zones, values, column=column, - rasterize_kw=rasterize_kw) - validate_arrays(zones, values) - - _nodata = nodata # capture for closures - - mapper = ArrayTypeFunctionMapping( - numpy_func=lambda z, v: _hi_numpy(z, v, _nodata), - cupy_func=not_implemented_func, - dask_func=not_implemented_func, - dask_cupy_func=not_implemented_func, - ) - - out = mapper(zones)(zones.data, values.data) - - return xr.DataArray( - out, - name=name, - dims=values.dims, - coords=values.coords, - attrs=values.attrs, - ) -``` - -- [ ] **Step 3: Run numpy tests to verify they pass** - -Run: `pytest xrspatial/tests/test_hypsometric_integral.py -v -k "numpy and not dask and not cupy" --no-header 2>&1 | tail -20` -Expected: all `numpy` backend tests PASS - -- [ ] **Step 4: Commit** - -```bash -git add xrspatial/zonal.py -git commit -m "Add hypsometric_integral with numpy backend" -``` - ---- - -### Task 3: Implement cupy backend - -**Files:** -- Modify: `xrspatial/zonal.py` (add `_hi_cupy`, update dispatcher) - -- [ ] **Step 1: Add cupy backend function** - -Add after `_hi_numpy`: - -```python -def _hi_cupy(zones_data, values_data, nodata): 
- """CuPy backend for hypsometric integral — transfer to host, compute, return.""" - import cupy as cp - result_np = _hi_numpy(cp.asnumpy(zones_data), cp.asnumpy(values_data), nodata) - return cp.asarray(result_np) -``` - -- [ ] **Step 2: Update dispatcher in `hypsometric_integral`** - -Change `cupy_func=not_implemented_func` to: - -```python -cupy_func=lambda z, v: _hi_cupy(z, v, _nodata), -``` - -- [ ] **Step 3: Run tests (numpy still passes)** - -Run: `pytest xrspatial/tests/test_hypsometric_integral.py -v -k "numpy and not dask and not cupy" --no-header 2>&1 | tail -10` -Expected: PASS - -- [ ] **Step 4: Commit** - -```bash -git add xrspatial/zonal.py -git commit -m "Add cupy backend for hypsometric_integral" -``` - ---- - -### Task 4: Implement dask+numpy backend - -**Files:** -- Modify: `xrspatial/zonal.py` (add `_hi_dask_numpy`, update dispatcher) - -The dask path: (1) compute per-zone min/max/sum/count as delayed tasks across blocks, (2) reduce to global per-zone HI lookup, (3) `map_blocks` to paint HI back per chunk (preserving chunk structure). - -- [ ] **Step 1: Add module-level delayed helpers and dask+numpy backend function** - -Add three functions after `_hi_cupy`. The `@delayed` helpers are defined at module level to match the existing pattern in `zonal.py` (see `_single_stats_func` at line ~280). 
- -```python -@delayed -def _hi_block_stats(z_block, v_block, uzones): - """Per-chunk: return (n_zones, 4) array of [min, max, sum, count].""" - result = np.full((len(uzones), 4), np.nan, dtype=np.float64) - result[:, 3] = 0 # count starts at 0 - for i, z in enumerate(uzones): - mask = (z_block == z) & np.isfinite(v_block) - if not np.any(mask): - continue - vals = v_block[mask] - result[i, 0] = vals.min() - result[i, 1] = vals.max() - result[i, 2] = vals.sum() - result[i, 3] = len(vals) - return result - - -@delayed -def _hi_reduce(partials_list, uzones): - """Reduce per-block stats to global per-zone HI lookup dict.""" - stacked = np.stack(partials_list) # (n_blocks, n_zones, 4) - g_min = np.nanmin(stacked[:, :, 0], axis=0) - g_max = np.nanmax(stacked[:, :, 1], axis=0) - g_sum = np.nansum(stacked[:, :, 2], axis=0) - g_count = np.nansum(stacked[:, :, 3], axis=0) - - hi_lookup = {} - for i, z in enumerate(uzones): - if g_count[i] == 0 or g_max[i] == g_min[i]: - hi_lookup[z] = np.nan - else: - mean = g_sum[i] / g_count[i] - hi_lookup[z] = (mean - g_min[i]) / (g_max[i] - g_min[i]) - return hi_lookup - - -def _hi_dask_numpy(zones_data, values_data, nodata): - """Dask+numpy backend for hypsometric integral.""" - # Step 1: find all unique zones across all chunks - unique_zones = _unique_finite_zones(zones_data) - if nodata is not None: - unique_zones = unique_zones[unique_zones != nodata] - - if len(unique_zones) == 0: - return da.full(values_data.shape, np.nan, dtype=np.float64, - chunks=values_data.chunks) - - # Step 2: per-block aggregation -> global reduce - zones_blocks = zones_data.to_delayed().ravel() - values_blocks = values_data.to_delayed().ravel() - - partials = [ - _hi_block_stats(zb, vb, unique_zones) - for zb, vb in zip(zones_blocks, values_blocks) - ] - - # Compute the HI lookup eagerly so map_blocks can use it as a parameter. 
- hi_lookup = dask.compute(_hi_reduce(partials, unique_zones))[0] - - # Step 3: paint back using map_blocks (preserves chunk structure) - def _paint(zones_chunk, values_chunk, hi_map): - out = np.full(zones_chunk.shape, np.nan, dtype=np.float64) - for z, hi_val in hi_map.items(): - mask = (zones_chunk == z) & np.isfinite(values_chunk) - out[mask] = hi_val - return out - - return da.map_blocks( - _paint, zones_data, values_data, hi_map=hi_lookup, - dtype=np.float64, meta=np.array(()), - ) -``` - -Note: `delayed` and `dask` are already imported at module level in `zonal.py`. - -- [ ] **Step 2: Update dispatcher** - -Change `dask_func=not_implemented_func` to: - -```python -dask_func=lambda z, v: _hi_dask_numpy(z, v, _nodata), -``` - -- [ ] **Step 3: Run all tests including dask** - -Run: `pytest xrspatial/tests/test_hypsometric_integral.py -v -k "numpy or dask" --no-header 2>&1 | tail -20` -Expected: all numpy and dask+numpy tests PASS - -- [ ] **Step 4: Commit** - -```bash -git add xrspatial/zonal.py -git commit -m "Add dask+numpy backend for hypsometric_integral" -``` - ---- - -### Task 5: Implement dask+cupy backend - -**Files:** -- Modify: `xrspatial/zonal.py` (add `_hi_dask_cupy`, update dispatcher) - -Follows the same pattern as `_stats_dask_cupy`: convert dask+cupy chunks to numpy, delegate to the dask+numpy path. 
- -- [ ] **Step 1: Add dask+cupy backend function** - -Add after `_hi_dask_numpy`: - -```python -def _hi_dask_cupy(zones_data, values_data, nodata): - """Dask+cupy backend: convert chunks to numpy, delegate.""" - zones_cpu = zones_data.map_blocks( - lambda x: x.get(), dtype=zones_data.dtype, meta=np.array(()), - ) - values_cpu = values_data.map_blocks( - lambda x: x.get(), dtype=values_data.dtype, meta=np.array(()), - ) - return _hi_dask_numpy(zones_cpu, values_cpu, nodata) -``` - -- [ ] **Step 2: Update dispatcher** - -Change `dask_cupy_func=not_implemented_func` to: - -```python -dask_cupy_func=lambda z, v: _hi_dask_cupy(z, v, _nodata), -``` - -- [ ] **Step 3: Run tests (numpy and dask still pass)** - -Run: `pytest xrspatial/tests/test_hypsometric_integral.py -v -k "numpy and not cupy" --no-header 2>&1 | tail -10` -Expected: PASS - -- [ ] **Step 4: Commit** - -```bash -git add xrspatial/zonal.py -git commit -m "Add dask+cupy backend for hypsometric_integral" -``` - ---- - -### Task 6: Wire up public exports and accessor - -**Files:** -- Modify: `xrspatial/__init__.py` (add export) -- Modify: `xrspatial/accessor.py` (add accessor method) -- Modify: `xrspatial/tests/test_hypsometric_integral.py` (add accessor test) - -- [ ] **Step 1: Write failing test for accessor** - -Add to the end of `xrspatial/tests/test_hypsometric_integral.py`: - -```python -@pytest.mark.parametrize("backend", ['numpy', 'dask+numpy', 'cupy', 'dask+cupy']) -def test_hypsometric_integral_accessor(backend, hi_zones, hi_values): - """Verify the .xrs accessor method works.""" - result = hi_values.xrs.zonal_hypsometric_integral(hi_zones) - assert isinstance(result, xr.DataArray) - assert result.shape == hi_values.shape - - out = result.values if not (da and isinstance(result.data, da.Array)) else result.compute().values - z1_mask = np.array([ - [False, True, True, True], - [False, True, True, False], - [False, False, False, False], - [False, False, False, False], - ]) - 
np.testing.assert_allclose(out[z1_mask], 0.5, rtol=1e-10) - - -def test_hypsometric_integral_list_of_pairs_zones(): - """Vector zones via list of (geometry, value) pairs.""" - from shapely.geometry import box - from xrspatial.zonal import hypsometric_integral - - pytest.importorskip("shapely") - pytest.importorskip("rasterio") - - values_data = np.array([ - [10., 20., 30.], - [40., 50., 60.], - [70., 80., 90.], - ], dtype=np.float64) - values = xr.DataArray(values_data, dims=['y', 'x']) - values['y'] = [2.0, 1.0, 0.0] - values['x'] = [0.0, 1.0, 2.0] - values.attrs['res'] = (1.0, 1.0) - - # Zone 1 covers left half, zone 2 covers right half - zones_pairs = [ - (box(-0.5, -0.5, 1.5, 2.5), 1), - (box(1.5, -0.5, 2.5, 2.5), 2), - ] - - result = hypsometric_integral(zones_pairs, values, nodata=0) - assert isinstance(result, xr.DataArray) - assert result.shape == values.shape -``` - -- [ ] **Step 2: Run test to verify it fails** - -Run: `pytest xrspatial/tests/test_hypsometric_integral.py::test_hypsometric_integral_accessor -v -k "numpy and not dask and not cupy" --no-header 2>&1 | tail -10` -Expected: FAIL — `AttributeError` - -- [ ] **Step 3: Add accessor method to `xrspatial/accessor.py`** - -Add after the `zonal_crosstab` method (search for `def zonal_crosstab`): - -```python - def zonal_hypsometric_integral(self, zones, **kwargs): - from .zonal import hypsometric_integral - return hypsometric_integral(zones, self._obj, **kwargs) -``` - -- [ ] **Step 4: Add top-level export to `xrspatial/__init__.py`** - -Add after the existing zonal imports (search for `zonal_stats`): - -```python -from xrspatial.zonal import hypsometric_integral # noqa -``` - -- [ ] **Step 5: Run all tests** - -Run: `pytest xrspatial/tests/test_hypsometric_integral.py -v -k "numpy and not dask and not cupy" --no-header 2>&1 | tail -20` -Expected: all tests PASS - -- [ ] **Step 6: Commit** - -```bash -git add xrspatial/__init__.py xrspatial/accessor.py xrspatial/tests/test_hypsometric_integral.py -git 
np.testing.assert_allclose(out[z1_mask], 0.5, rtol=1e-10) - - -def test_hypsometric_integral_list_of_pairs_zones(): - """Vector zones via list of (geometry, value) pairs.""" - pytest.importorskip("shapely") - pytest.importorskip("rasterio") - - from shapely.geometry import box - from xrspatial.zonal import hypsometric_integral - - values_data = np.array([ - [10., 20., 30.], - [40., 50., 60.], - [70., 80., 90.], - ], dtype=np.float64) - values = xr.DataArray(values_data, dims=['y', 'x']) - values['y'] = [2.0, 1.0, 0.0] - values['x'] = [0.0, 1.0, 2.0] - values.attrs['res'] = (1.0, 1.0) - - # Zone 1 covers left half, zone 2 covers right half - zones_pairs = [ - (box(-0.5, -0.5, 1.5, 2.5), 1), - (box(1.5, -0.5, 2.5, 2.5), 2), - ] - - result = hypsometric_integral(zones_pairs, values, nodata=0) - assert isinstance(result, xr.DataArray) - assert result.shape == values.shape -``` - -- [ ] **Step 2: Run test to verify it fails** - -Run: `pytest xrspatial/tests/test_hypsometric_integral.py::test_hypsometric_integral_accessor -v -k "numpy and not dask and not cupy" --no-header 2>&1 | tail -10` -Expected: FAIL — `AttributeError` - -- [ ] **Step 3: Add accessor method to `xrspatial/accessor.py`** - -Add after the `zonal_crosstab` method (search for `def zonal_crosstab`): - -```python - def zonal_hypsometric_integral(self, zones, **kwargs): - from .zonal import hypsometric_integral - return hypsometric_integral(zones, self._obj, **kwargs) -``` - -- [ ] **Step 4: Add top-level export to `xrspatial/__init__.py`** - -Add after the existing zonal imports (search for `zonal_stats`): - -```python -from xrspatial.zonal import hypsometric_integral # noqa -``` - -- [ ] **Step 5: Run all tests** - -Run: `pytest xrspatial/tests/test_hypsometric_integral.py -v -k "numpy and not dask and not cupy" --no-header 2>&1 | tail -20` -Expected: all tests PASS - -- [ ] **Step 6: Commit** - -```bash -git add xrspatial/__init__.py xrspatial/accessor.py xrspatial/tests/test_hypsometric_integral.py -git 
commit -m "Wire up hypsometric_integral accessor and public export" -``` - ---- - -### Task 7: Final validation - -**Files:** (none — verification only) - -- [ ] **Step 1: Run the full zonal test suite to check for regressions** - -Run: `pytest xrspatial/tests/test_zonal.py -v -k "numpy and not dask and not cupy" --no-header -q 2>&1 | tail -10` -Expected: existing zonal tests still PASS - -- [ ] **Step 2: Run the hypsometric integral tests one final time** - -Run: `pytest xrspatial/tests/test_hypsometric_integral.py -v --no-header 2>&1` -Expected: all PASS (cupy/dask+cupy tests skip if no GPU) - -- [ ] **Step 3: Verify import works from top level** - -Run: `python -c "from xrspatial import hypsometric_integral; print(hypsometric_integral.__doc__[:60])"` -Expected: prints first 60 chars of docstring without error diff --git a/docs/superpowers/plans/2026-03-30-geotiff-perf-controls.md b/docs/superpowers/plans/2026-03-30-geotiff-perf-controls.md deleted file mode 100644 index e64d7666..00000000 --- a/docs/superpowers/plans/2026-03-30-geotiff-perf-controls.md +++ /dev/null @@ -1,813 +0,0 @@ -# GeoTIFF performance and memory controls implementation plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Add `dtype` to `open_geotiff`, `compression_level` to `to_geotiff`, and VRT tiled output when `to_geotiff` is given a `.vrt` path. Issue #1083. - -**Architecture:** Three independent features threaded into the existing geotiff module. `dtype` intercepts each read path after tile/strip decode. `compression_level` passes through `to_geotiff` → `write()` → `_write_tiled`/`_write_stripped` → `compress()`. VRT output adds a new code path in `to_geotiff` that slices the input into per-chunk GeoTIFFs and calls `write_vrt()`. 
- -**Tech Stack:** numpy, xarray, dask (optional), numba, cupy (optional). All existing dependencies. - ---- - -## File map - -| File | Role | Changes | -|------|------|---------| -| `xrspatial/geotiff/__init__.py` | Public API | Add `dtype` param to `open_geotiff`, `read_geotiff_dask`, `read_geotiff_gpu`, `_delayed_read_window`. Add `compression_level` param to `to_geotiff`, `write_geotiff_gpu`. Add VRT output path in `to_geotiff`. Add `_validate_dtype_cast()` helper. | -| `xrspatial/geotiff/_writer.py` | Tile/strip compression, file assembly | Thread `compression_level` through `write()`, `_write_tiled()`, `_write_stripped()`, `_prepare_tile()`. | -| `xrspatial/geotiff/_compression.py` | Codec dispatch | No changes needed -- `compress()` already accepts `level`. | -| `xrspatial/geotiff/tests/test_dtype_read.py` | New test file | Tests for `dtype` on eager, dask, validation. | -| `xrspatial/geotiff/tests/test_compression_level.py` | New test file | Tests for `compression_level` round-trips. | -| `xrspatial/geotiff/tests/test_vrt_write.py` | New test file | Tests for `.vrt` output path, dask streaming, numpy slicing, edge cases. | - ---- - -### Task 1: `compression_level` plumbing through the writer - -The simplest of the three features. Thread the level integer from the public API down to the `compress()` call. 
- -**Files:** -- Modify: `xrspatial/geotiff/_writer.py:298-403` (`_write_stripped`, `_prepare_tile`, `_write_tiled`, `write`) -- Modify: `xrspatial/geotiff/__init__.py:342-519` (`to_geotiff`, `write_geotiff_gpu`) -- Test: `xrspatial/geotiff/tests/test_compression_level.py` (create) - -- [ ] **Step 1: Write the failing test** - -Create `xrspatial/geotiff/tests/test_compression_level.py`: - -```python -"""Tests for compression_level parameter on to_geotiff.""" -import numpy as np -import os -import pytest -import xarray as xr - -from xrspatial.geotiff import open_geotiff, to_geotiff - - -@pytest.fixture -def sample_float32(tmp_path): - """100x100 float32 raster with coords and CRS.""" - arr = np.random.default_rng(42).random((100, 100), dtype=np.float32) - y = np.linspace(40.0, 41.0, 100) - x = np.linspace(-105.0, -104.0, 100) - da = xr.DataArray(arr, dims=['y', 'x'], - coords={'y': y, 'x': x}, - attrs={'crs': 4326}) - return da - - -class TestCompressionLevel: - """Round-trip tests: write with level, read back, verify data matches.""" - - def test_zstd_level_1_round_trip(self, sample_float32, tmp_path): - path = str(tmp_path / 'test_1083_zstd_l1.tif') - to_geotiff(sample_float32, path, compression='zstd', - compression_level=1) - result = open_geotiff(path) - np.testing.assert_array_almost_equal(result.values, - sample_float32.values, decimal=6) - - def test_zstd_level_22_round_trip(self, sample_float32, tmp_path): - path = str(tmp_path / 'test_1083_zstd_l22.tif') - to_geotiff(sample_float32, path, compression='zstd', - compression_level=22) - result = open_geotiff(path) - np.testing.assert_array_almost_equal(result.values, - sample_float32.values, decimal=6) - - def test_deflate_level_1_round_trip(self, sample_float32, tmp_path): - path = str(tmp_path / 'test_1083_deflate_l1.tif') - to_geotiff(sample_float32, path, compression='deflate', - compression_level=1) - result = open_geotiff(path) - np.testing.assert_array_almost_equal(result.values, - 
sample_float32.values, decimal=6) - - def test_deflate_level_9_round_trip(self, sample_float32, tmp_path): - path = str(tmp_path / 'test_1083_deflate_l9.tif') - to_geotiff(sample_float32, path, compression='deflate', - compression_level=9) - result = open_geotiff(path) - np.testing.assert_array_almost_equal(result.values, - sample_float32.values, decimal=6) - - def test_higher_level_produces_smaller_file(self, sample_float32, tmp_path): - path_l1 = str(tmp_path / 'test_1083_small_l1.tif') - path_l22 = str(tmp_path / 'test_1083_small_l22.tif') - to_geotiff(sample_float32, path_l1, compression='zstd', - compression_level=1) - to_geotiff(sample_float32, path_l22, compression='zstd', - compression_level=22) - assert os.path.getsize(path_l22) <= os.path.getsize(path_l1) - - def test_level_none_uses_default(self, sample_float32, tmp_path): - path = str(tmp_path / 'test_1083_default.tif') - to_geotiff(sample_float32, path, compression='zstd', - compression_level=None) - result = open_geotiff(path) - np.testing.assert_array_almost_equal(result.values, - sample_float32.values, decimal=6) - - def test_level_ignored_for_lzw(self, sample_float32, tmp_path): - """LZW has no level support; setting one should not error.""" - path = str(tmp_path / 'test_1083_lzw_level.tif') - to_geotiff(sample_float32, path, compression='lzw', - compression_level=5) - result = open_geotiff(path) - np.testing.assert_array_almost_equal(result.values, - sample_float32.values, decimal=6) - - def test_invalid_level_raises(self, sample_float32, tmp_path): - path = str(tmp_path / 'test_1083_bad_level.tif') - with pytest.raises(ValueError, match='compression_level'): - to_geotiff(sample_float32, path, compression='zstd', - compression_level=99) - - def test_invalid_deflate_level_raises(self, sample_float32, tmp_path): - path = str(tmp_path / 'test_1083_bad_deflate.tif') - with pytest.raises(ValueError, match='compression_level'): - to_geotiff(sample_float32, path, compression='deflate', - 
compression_level=10) -``` - -- [ ] **Step 2: Run test to verify it fails** - -Run: `cd .claude/worktrees/issue-1083 && python -m pytest xrspatial/geotiff/tests/test_compression_level.py -v --no-header -x 2>&1 | head -30` -Expected: FAIL -- `to_geotiff()` got an unexpected keyword argument `compression_level`. - -- [ ] **Step 3: Add `compression_level` validation to `to_geotiff`** - -In `xrspatial/geotiff/__init__.py`, change the `to_geotiff` signature and add validation before the write call. Add `compression_level: int | None = None` parameter after `compression`. Add this validation block before the `write()` call (before line 499): - -```python - # Validate compression_level - _LEVEL_RANGES = { - 'deflate': (1, 9), 'zstd': (1, 22), 'lz4': (0, 16), - } - if compression_level is not None: - level_range = _LEVEL_RANGES.get(compression) - if level_range is not None: - lo, hi = level_range - if not (lo <= compression_level <= hi): - raise ValueError( - f"compression_level={compression_level} out of range " - f"for {compression} (valid: {lo}-{hi})") -``` - -Pass `compression_level=compression_level` to the `write()` call at line 499. - -- [ ] **Step 4: Thread `compression_level` through `write()` → `_write_tiled` → `_prepare_tile`** - -In `xrspatial/geotiff/_writer.py`: - -1. Add `compression_level: int | None = None` parameter to `write()` (after `predictor`). -2. Pass `compression_level=compression_level` to `_write_tiled()` and `_write_stripped()` calls inside `write()`. -3. Add `compression_level: int | None = None` parameter to `_write_tiled()` and `_write_stripped()`. -4. Add `compression_level: int | None = None` parameter to `_prepare_tile()`. -5. In `_prepare_tile()`, change `return compress(tile_data, compression)` to `return compress(tile_data, compression, level=compression_level)` when `compression_level is not None`, else `return compress(tile_data, compression)`. 
Simplest: `return compress(tile_data, compression) if compression_level is None else compress(tile_data, compression, level=compression_level)`. -6. In `_write_stripped()`, do the same for the `compress(strip_data, compression)` call at the sequential path. -7. Pass `compression_level` through all `_prepare_tile` call sites in `_write_tiled`. - -The `compress()` function in `_compression.py` already accepts `level` as a keyword argument with default 6, so we just need to pass it when non-None. - -- [ ] **Step 5: Run tests to verify they pass** - -Run: `cd .claude/worktrees/issue-1083 && python -m pytest xrspatial/geotiff/tests/test_compression_level.py -v --no-header 2>&1 | tail -20` -Expected: All PASS. - -- [ ] **Step 6: Commit** - -```bash -cd .claude/worktrees/issue-1083 -git add xrspatial/geotiff/__init__.py xrspatial/geotiff/_writer.py xrspatial/geotiff/tests/test_compression_level.py -git commit -m "Add compression_level parameter to to_geotiff (#1083)" -``` - ---- - -### Task 2: `dtype` parameter on `open_geotiff` (eager and dask paths) - -**Files:** -- Modify: `xrspatial/geotiff/__init__.py:151-636` (`open_geotiff`, `read_geotiff_dask`, `_delayed_read_window`, `read_geotiff_gpu`) -- Test: `xrspatial/geotiff/tests/test_dtype_read.py` (create) - -- [ ] **Step 1: Write the failing test** - -Create `xrspatial/geotiff/tests/test_dtype_read.py`: - -```python -"""Tests for dtype parameter on open_geotiff.""" -import numpy as np -import pytest -import xarray as xr - -from xrspatial.geotiff import open_geotiff, to_geotiff - - -@pytest.fixture -def float64_tif(tmp_path): - """Write a float64 GeoTIFF for dtype cast tests.""" - arr = np.random.default_rng(99).random((80, 80)).astype(np.float64) - y = np.linspace(40.0, 41.0, 80) - x = np.linspace(-105.0, -104.0, 80) - da = xr.DataArray(arr, dims=['y', 'x'], - coords={'y': y, 'x': x}, - attrs={'crs': 4326}) - path = str(tmp_path / 'test_1083_f64.tif') - to_geotiff(da, path, compression='none') - return path, arr - - 
-@pytest.fixture -def uint16_tif(tmp_path): - """Write a uint16 GeoTIFF for dtype cast tests.""" - arr = np.random.default_rng(77).integers(0, 10000, (60, 60), - dtype=np.uint16) - y = np.linspace(40.0, 41.0, 60) - x = np.linspace(-105.0, -104.0, 60) - da = xr.DataArray(arr, dims=['y', 'x'], - coords={'y': y, 'x': x}, - attrs={'crs': 4326}) - path = str(tmp_path / 'test_1083_u16.tif') - to_geotiff(da, path, compression='none') - return path, arr - - -class TestDtypeEager: - """dtype parameter on eager (numpy) reads.""" - - def test_float64_to_float32(self, float64_tif): - path, orig = float64_tif - result = open_geotiff(path, dtype='float32') - assert result.dtype == np.float32 - np.testing.assert_array_almost_equal( - result.values, orig.astype(np.float32), decimal=6) - - def test_float64_to_float16(self, float64_tif): - path, orig = float64_tif - result = open_geotiff(path, dtype=np.float16) - assert result.dtype == np.float16 - - def test_uint16_to_int32(self, uint16_tif): - path, orig = uint16_tif - result = open_geotiff(path, dtype='int32') - assert result.dtype == np.int32 - np.testing.assert_array_equal(result.values, orig.astype(np.int32)) - - def test_uint16_to_uint8(self, uint16_tif): - """Narrowing int cast is allowed (user asked for it).""" - path, _ = uint16_tif - result = open_geotiff(path, dtype='uint8') - assert result.dtype == np.uint8 - - def test_float_to_int_raises(self, float64_tif): - path, _ = float64_tif - with pytest.raises(ValueError, match='float.*int'): - open_geotiff(path, dtype='int32') - - def test_dtype_none_preserves_native(self, float64_tif): - path, _ = float64_tif - result = open_geotiff(path, dtype=None) - assert result.dtype == np.float64 - - -class TestDtypeDask: - """dtype parameter on dask reads.""" - - def test_float64_to_float32_dask(self, float64_tif): - path, orig = float64_tif - result = open_geotiff(path, dtype='float32', chunks=40) - assert result.dtype == np.float32 - computed = result.values - 
np.testing.assert_array_almost_equal( - computed, orig.astype(np.float32), decimal=6) - - def test_chunks_are_target_dtype(self, float64_tif): - path, _ = float64_tif - result = open_geotiff(path, dtype='float32', chunks=40) - # Each chunk should be float32, not float64 - assert result.data.dtype == np.float32 - - def test_float_to_int_raises_dask(self, float64_tif): - path, _ = float64_tif - with pytest.raises(ValueError, match='float.*int'): - open_geotiff(path, dtype='int32', chunks=40) -``` - -- [ ] **Step 2: Run test to verify it fails** - -Run: `cd .claude/worktrees/issue-1083 && python -m pytest xrspatial/geotiff/tests/test_dtype_read.py -v --no-header -x 2>&1 | head -20` -Expected: FAIL -- `open_geotiff()` got an unexpected keyword argument `dtype`. - -- [ ] **Step 3: Add `_validate_dtype_cast` helper and `dtype` to `open_geotiff`** - -In `xrspatial/geotiff/__init__.py`, add a helper function after the `_geo_to_coords` function (around line 58): - -```python -def _validate_dtype_cast(source_dtype, target_dtype): - """Validate that casting source_dtype to target_dtype is allowed. - - Raises ValueError for float-to-int casts (lossy in a way users - often don't intend). All other casts are permitted -- the user - asked for them explicitly. - """ - src = np.dtype(source_dtype) - tgt = np.dtype(target_dtype) - if src.kind == 'f' and tgt.kind in ('u', 'i'): - raise ValueError( - f"Cannot cast float ({src}) to int ({tgt}). " - f"This loses fractional data and is usually unintentional. " - f"Cast explicitly after reading if you really want this.") -``` - -Then modify `open_geotiff` signature to add `dtype=None` after `source`. In the eager path (after `arr, geo_info = read_to_array(...)` at line 204), add: - -```python - if dtype is not None: - target = np.dtype(dtype) - _validate_dtype_cast(arr.dtype, target) - arr = arr.astype(target) -``` - -Pass `dtype=dtype` through to `read_geotiff_dask()` and `read_geotiff_gpu()` calls. 
- -- [ ] **Step 4: Add `dtype` to `read_geotiff_dask` and `_delayed_read_window`** - -In `read_geotiff_dask`: -1. Add `dtype` parameter to signature. -2. Before building dask blocks, validate: `if dtype is not None: target = np.dtype(dtype); _validate_dtype_cast(file_dtype, target)` where `file_dtype` is the dtype from the metadata read. -3. If dtype is set, use `target` instead of `dtype` (the file dtype) for `da.from_delayed(..., dtype=target)`. -4. Pass `dtype` to `_delayed_read_window`. - -In `_delayed_read_window`: -1. Add `target_dtype=None` parameter. -2. Inside the `_read()` closure, after the nodata masking, add: `if target_dtype is not None: arr = arr.astype(target_dtype)`. - -- [ ] **Step 5: Add `dtype` to `read_geotiff_gpu`** - -In `read_geotiff_gpu`: -1. Add `dtype` parameter to signature. -2. After the final `arr_gpu` is built (before building the DataArray), add: `if dtype is not None: target = np.dtype(dtype); _validate_dtype_cast(np.dtype(str(arr_gpu.dtype)), target); arr_gpu = arr_gpu.astype(target)`. - -- [ ] **Step 6: Run tests to verify they pass** - -Run: `cd .claude/worktrees/issue-1083 && python -m pytest xrspatial/geotiff/tests/test_dtype_read.py -v --no-header 2>&1 | tail -20` -Expected: All PASS. - -- [ ] **Step 7: Run existing tests to check for regressions** - -Run: `cd .claude/worktrees/issue-1083 && python -m pytest xrspatial/geotiff/tests/ -v --no-header -x -q 2>&1 | tail -20` -Expected: All PASS. 
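Step 4's key property — validate once against the file dtype, then cast inside each window read so peak memory follows the chunk, not the file — can be sketched without dask. `make_window_reader` below is illustrative only, not the real `_delayed_read_window`:

```python
import numpy as np


def make_window_reader(file_arr, target_dtype=None):
    """Return a per-window reader that casts at read time."""
    if target_dtype is not None:
        target = np.dtype(target_dtype)
        # Validate once, up front, against the source dtype
        if file_arr.dtype.kind == 'f' and target.kind in ('u', 'i'):
            raise ValueError('float -> int cast rejected up front')

    def read_window(r0, r1, c0, c1):
        win = file_arr[r0:r1, c0:c1]
        # Cast per window: no full-size intermediate is ever held
        if target_dtype is not None:
            win = win.astype(target_dtype)
        return win

    return read_window


src = np.arange(16, dtype=np.float64).reshape(4, 4)
reader = make_window_reader(src, target_dtype='float32')
block = reader(0, 2, 0, 2)
print(block.dtype)  # float32
```

In the real dask path, `read_window` is the body of the delayed closure and `target` is also what gets declared to `da.from_delayed(..., dtype=target)`, so chunk metadata and chunk contents agree.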
- -- [ ] **Step 8: Commit** - -```bash -cd .claude/worktrees/issue-1083 -git add xrspatial/geotiff/__init__.py xrspatial/geotiff/tests/test_dtype_read.py -git commit -m "Add dtype parameter to open_geotiff (#1083)" -``` - ---- - -### Task 3: VRT tiled output from `to_geotiff` - -**Files:** -- Modify: `xrspatial/geotiff/__init__.py:342-519` (`to_geotiff`) -- Test: `xrspatial/geotiff/tests/test_vrt_write.py` (create) - -- [ ] **Step 1: Write the failing tests** - -Create `xrspatial/geotiff/tests/test_vrt_write.py`: - -```python -"""Tests for VRT tiled output from to_geotiff.""" -import numpy as np -import os -import pytest -import xarray as xr - -from xrspatial.geotiff import open_geotiff, to_geotiff - - -@pytest.fixture -def sample_raster(): - """200x200 float32 raster with coords and CRS.""" - arr = np.random.default_rng(55).random((200, 200), dtype=np.float32) - y = np.linspace(41.0, 40.0, 200) # north-to-south - x = np.linspace(-106.0, -105.0, 200) - da = xr.DataArray(arr, dims=['y', 'x'], - coords={'y': y, 'x': x}, - attrs={'crs': 4326, 'nodata': -9999.0}) - return da - - -class TestVrtOutputNumpy: - """VRT output from numpy-backed DataArrays.""" - - def test_creates_vrt_and_tiles_dir(self, sample_raster, tmp_path): - vrt_path = str(tmp_path / 'out_1083.vrt') - to_geotiff(sample_raster, vrt_path) - assert os.path.exists(vrt_path) - tiles_dir = str(tmp_path / 'out_1083_tiles') - assert os.path.isdir(tiles_dir) - tile_files = os.listdir(tiles_dir) - assert len(tile_files) > 0 - assert all(f.endswith('.tif') for f in tile_files) - - def test_round_trip_numpy(self, sample_raster, tmp_path): - vrt_path = str(tmp_path / 'rt_1083.vrt') - to_geotiff(sample_raster, vrt_path) - result = open_geotiff(vrt_path) - np.testing.assert_array_almost_equal( - result.values, sample_raster.values, decimal=5) - - def test_tile_naming_convention(self, sample_raster, tmp_path): - vrt_path = str(tmp_path / 'named_1083.vrt') - to_geotiff(sample_raster, vrt_path, tile_size=100) - 
tiles_dir = str(tmp_path / 'named_1083_tiles') - files = sorted(os.listdir(tiles_dir)) - # 200x200 with tile_size=100 -> 2x2 grid - assert files == [ - 'tile_00_00.tif', 'tile_00_01.tif', - 'tile_01_00.tif', 'tile_01_01.tif', - ] - - def test_relative_paths_in_vrt(self, sample_raster, tmp_path): - vrt_path = str(tmp_path / 'rel_1083.vrt') - to_geotiff(sample_raster, vrt_path) - with open(vrt_path) as f: - content = f.read() - # Paths should be relative (no leading /) - assert 'rel_1083_tiles/' in content - assert str(tmp_path) not in content - - def test_compression_level_passed_to_tiles(self, sample_raster, tmp_path): - vrt_path = str(tmp_path / 'cl_1083.vrt') - to_geotiff(sample_raster, vrt_path, compression='zstd', - compression_level=1) - result = open_geotiff(vrt_path) - np.testing.assert_array_almost_equal( - result.values, sample_raster.values, decimal=5) - - -class TestVrtOutputDask: - """VRT output from dask-backed DataArrays.""" - - def test_dask_round_trip(self, sample_raster, tmp_path): - dask_da = sample_raster.chunk({'y': 100, 'x': 100}) - vrt_path = str(tmp_path / 'dask_1083.vrt') - to_geotiff(dask_da, vrt_path) - result = open_geotiff(vrt_path) - np.testing.assert_array_almost_equal( - result.values, sample_raster.values, decimal=5) - - def test_dask_one_tile_per_chunk(self, sample_raster, tmp_path): - dask_da = sample_raster.chunk({'y': 100, 'x': 100}) - vrt_path = str(tmp_path / 'chunks_1083.vrt') - to_geotiff(dask_da, vrt_path) - tiles_dir = str(tmp_path / 'chunks_1083_tiles') - # 200x200 chunked 100x100 -> 2x2 = 4 tiles - assert len(os.listdir(tiles_dir)) == 4 - - -class TestVrtEdgeCases: - """Edge cases and validation.""" - - def test_cog_with_vrt_raises(self, sample_raster, tmp_path): - vrt_path = str(tmp_path / 'cog_1083.vrt') - with pytest.raises(ValueError, match='cog.*vrt|vrt.*cog'): - to_geotiff(sample_raster, vrt_path, cog=True) - - def test_overview_levels_with_vrt_raises(self, sample_raster, tmp_path): - vrt_path = str(tmp_path / 
'ovr_1083.vrt') - with pytest.raises(ValueError, match='overview.*vrt|vrt.*overview'): - to_geotiff(sample_raster, vrt_path, overview_levels=[2, 4]) - - def test_nonempty_tiles_dir_raises(self, sample_raster, tmp_path): - tiles_dir = tmp_path / 'exist_1083_tiles' - tiles_dir.mkdir() - (tiles_dir / 'dummy.tif').write_text('x') - vrt_path = str(tmp_path / 'exist_1083.vrt') - with pytest.raises(FileExistsError): - to_geotiff(sample_raster, vrt_path) - - def test_empty_tiles_dir_ok(self, sample_raster, tmp_path): - tiles_dir = tmp_path / 'empty_1083_tiles' - tiles_dir.mkdir() - vrt_path = str(tmp_path / 'empty_1083.vrt') - to_geotiff(sample_raster, vrt_path) - assert os.path.exists(vrt_path) -``` - -- [ ] **Step 2: Run tests to verify they fail** - -Run: `cd .claude/worktrees/issue-1083 && python -m pytest xrspatial/geotiff/tests/test_vrt_write.py -v --no-header -x 2>&1 | head -20` -Expected: FAIL -- VRT path not yet handled. - -- [ ] **Step 3: Implement VRT output path in `to_geotiff`** - -In `xrspatial/geotiff/__init__.py`, add the VRT detection and dispatch at the top of `to_geotiff`, right after the docstring and before the GPU dispatch: - -```python - # VRT tiled output - if path.lower().endswith('.vrt'): - if cog: - raise ValueError( - "cog=True is not compatible with VRT output. " - "VRT writes tiled GeoTIFFs, not a single COG.") - if overview_levels is not None: - raise ValueError( - "overview_levels is not compatible with VRT output. 
" - "VRT tiles do not include overviews.") - _write_vrt_tiled(data, path, - crs=crs, nodata=nodata, - compression=compression, - compression_level=compression_level, - tile_size=tile_size, - predictor=predictor, - bigtiff=bigtiff, - gpu=gpu) - return -``` - -Then add the `_write_vrt_tiled` function (new function in `__init__.py`): - -```python -def _write_vrt_tiled(data, vrt_path: str, *, - crs=None, nodata=None, - compression='zstd', compression_level=None, - tile_size=256, predictor=False, - bigtiff=None, gpu=None): - """Write a DataArray as a directory of tiled GeoTIFFs with a VRT index. - - For dask inputs, each chunk is computed and written independently - so the full array never materialises in RAM. - """ - import os - import math - from ._vrt import write_vrt as _write_vrt_fn - - stem = os.path.splitext(os.path.basename(vrt_path))[0] - tiles_dir = os.path.join(os.path.dirname(vrt_path) or '.', f'{stem}_tiles') - - # Validate tiles directory - if os.path.isdir(tiles_dir) and os.listdir(tiles_dir): - raise FileExistsError( - f"Tiles directory already exists and is not empty: {tiles_dir}") - os.makedirs(tiles_dir, exist_ok=True) - - # Resolve metadata from the DataArray - epsg = None - wkt = None - nodata_val = nodata - geo_transform = None - - if isinstance(data, xr.DataArray): - geo_transform = _coords_to_transform(data) - if crs is None: - crs_attr = data.attrs.get('crs') - if isinstance(crs_attr, str): - epsg = _wkt_to_epsg(crs_attr) - if epsg is None: - wkt = crs_attr - elif crs_attr is not None: - epsg = int(crs_attr) - if epsg is None: - wkt_attr = data.attrs.get('crs_wkt') - if isinstance(wkt_attr, str): - epsg = _wkt_to_epsg(wkt_attr) - if epsg is None: - wkt = wkt_attr - elif isinstance(crs, int): - epsg = crs - elif isinstance(crs, str): - epsg = _wkt_to_epsg(crs) - if epsg is None: - wkt = crs - if nodata_val is None: - nodata_val = data.attrs.get('nodata') - - raw = data.data if isinstance(data, xr.DataArray) else data - is_dask = hasattr(raw, 
'dask') - is_cupy = hasattr(raw, 'device') or hasattr(raw, 'get') - - if is_dask: - # Dask path: one tile per chunk - import dask - chunks_y = raw.chunks[0] - chunks_x = raw.chunks[1] - n_rows = len(chunks_y) - n_cols = len(chunks_x) - else: - # Numpy/CuPy path: slice by tile_size - if is_cupy: - arr = raw - else: - arr = np.asarray(raw) - h, w = arr.shape[:2] - n_rows = math.ceil(h / tile_size) - n_cols = math.ceil(w / tile_size) - - # Floor at two digits so a 2x2 grid yields tile_00_00.tif, matching the tests - pad_width = max(2, len(str(max(n_rows, n_cols) - 1))) - tile_paths = [] - - if is_dask: - delayed_writes = [] - row_offset = 0 - for ri, ch_y in enumerate(chunks_y): - col_offset = 0 - for ci, ch_x in enumerate(chunks_x): - tile_name = f'tile_{ri:0{pad_width}d}_{ci:0{pad_width}d}.tif' - tile_path = os.path.join(tiles_dir, tile_name) - tile_paths.append(tile_path) - - # Extract the chunk as a dask array - chunk_slice = raw[ - row_offset:row_offset + ch_y, - col_offset:col_offset + ch_x, - ] - - # Build per-tile geo_transform - tile_gt = None - if geo_transform is not None: - t = geo_transform - tile_gt = GeoTransform( - origin_x=t.origin_x + col_offset * t.pixel_width, - origin_y=t.origin_y + row_offset * t.pixel_height, - pixel_width=t.pixel_width, - pixel_height=t.pixel_height, - ) - - delayed_writes.append( - dask.delayed(_write_single_tile)( - chunk_slice, tile_path, tile_gt, epsg, wkt, - nodata_val, compression, compression_level, - tile_size, predictor, bigtiff)) - - col_offset += ch_x - row_offset += ch_y - - dask.compute(*delayed_writes) - - else: - # Numpy/CuPy: slice and write sequentially - h, w = arr.shape[:2] - for ri in range(n_rows): - for ci in range(n_cols): - r0 = ri * tile_size - c0 = ci * tile_size - r1 = min(r0 + tile_size, h) - c1 = min(c0 + tile_size, w) - - tile_name = f'tile_{ri:0{pad_width}d}_{ci:0{pad_width}d}.tif' - tile_path = os.path.join(tiles_dir, tile_name) - tile_paths.append(tile_path) - - tile_data = arr[r0:r1, c0:c1] - - tile_gt = None - if geo_transform is not None: - t = geo_transform - tile_gt = 
GeoTransform( - origin_x=t.origin_x + c0 * t.pixel_width, - origin_y=t.origin_y + r0 * t.pixel_height, - pixel_width=t.pixel_width, - pixel_height=t.pixel_height, - ) - - _write_single_tile( - tile_data, tile_path, tile_gt, epsg, wkt, - nodata_val, compression, compression_level, - tile_size, predictor, bigtiff) - - # Generate VRT index with relative paths - write_vrt(vrt_path, tile_paths, relative=True, - nodata=nodata_val) - - -def _write_single_tile(chunk_data, path, geo_transform, epsg, wkt, - nodata, compression, compression_level, - tile_size, predictor, bigtiff): - """Write a single tile GeoTIFF. Used by _write_vrt_tiled.""" - if hasattr(chunk_data, 'compute'): - chunk_data = chunk_data.compute() - if hasattr(chunk_data, 'get'): - chunk_data = chunk_data.get() # CuPy -> numpy - - arr = np.asarray(chunk_data) - - # Auto-promote unsupported dtypes - if arr.dtype == np.float16: - arr = arr.astype(np.float32) - elif arr.dtype == np.bool_: - arr = arr.astype(np.uint8) - - # Restore NaN to nodata sentinel - if nodata is not None and arr.dtype.kind == 'f' and not np.isnan(nodata): - nan_mask = np.isnan(arr) - if nan_mask.any(): - arr = arr.copy() - arr[nan_mask] = arr.dtype.type(nodata) - - write(arr, path, - geo_transform=geo_transform, - crs_epsg=epsg, - crs_wkt=wkt if epsg is None else None, - nodata=nodata, - compression=compression, - tiled=True, - tile_size=tile_size, - predictor=predictor, - compression_level=compression_level, - bigtiff=bigtiff) -``` - -Note: The import of `GeoTransform` is already at the top of `__init__.py` (line 19). The import of `write_vrt` should come from `._vrt`. Adjust the import inside `_write_vrt_tiled` to: `from ._vrt import write_vrt as _write_vrt_fn` and call `_write_vrt_fn(...)`. - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `cd .claude/worktrees/issue-1083 && python -m pytest xrspatial/geotiff/tests/test_vrt_write.py -v --no-header 2>&1 | tail -30` -Expected: All PASS. 
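The per-tile origin arithmetic inside `_write_vrt_tiled` is worth checking in isolation. A standalone sketch, using a namedtuple stand-in for the package's `GeoTransform` (illustrative, not the real class), with padding floored at two digits so the names match the `tile_00_00.tif` convention the tests expect:

```python
import math
from collections import namedtuple

# Stand-in for xrspatial's GeoTransform (illustrative only)
GeoTransform = namedtuple(
    'GeoTransform', ['origin_x', 'origin_y', 'pixel_width', 'pixel_height'])


def tile_grid(h, w, tile_size, gt):
    """Yield (name, row0, col0, per-tile transform) for every tile."""
    n_rows = math.ceil(h / tile_size)
    n_cols = math.ceil(w / tile_size)
    pad = max(2, len(str(max(n_rows, n_cols) - 1)))
    for ri in range(n_rows):
        for ci in range(n_cols):
            r0, c0 = ri * tile_size, ci * tile_size
            # Each tile's origin shifts by whole pixels from the parent origin
            tile_gt = GeoTransform(
                origin_x=gt.origin_x + c0 * gt.pixel_width,
                origin_y=gt.origin_y + r0 * gt.pixel_height,
                pixel_width=gt.pixel_width,
                pixel_height=gt.pixel_height)
            yield f'tile_{ri:0{pad}d}_{ci:0{pad}d}.tif', r0, c0, tile_gt


gt = GeoTransform(-106.0, 41.0, 0.005, -0.005)
tiles = list(tile_grid(200, 200, 100, gt))
print([name for name, *_ in tiles])  # 2x2 grid: tile_00_00 through tile_01_01
```

Note the north-up convention: `pixel_height` is negative, so row offsets move the tile origin south, matching the north-to-south `y` coordinate in the fixtures.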
- -- [ ] **Step 5: Run full test suite to check for regressions** - -Run: `cd .claude/worktrees/issue-1083 && python -m pytest xrspatial/geotiff/tests/ -v --no-header -q 2>&1 | tail -20` -Expected: All PASS. - -- [ ] **Step 6: Commit** - -```bash -cd .claude/worktrees/issue-1083 -git add xrspatial/geotiff/__init__.py xrspatial/geotiff/tests/test_vrt_write.py -git commit -m "Add VRT tiled output from to_geotiff (#1083)" -``` - ---- - -### Task 4: Update documentation and README - -**Files:** -- Modify: `docs/source/reference/io.rst` (or equivalent -- check for existing geotiff docs) -- Modify: `README.md` - -- [ ] **Step 1: Update API docs** - -Check if `docs/source/reference/` has an entry for `open_geotiff`/`to_geotiff`. If so, no code change needed since the docstrings will auto-generate. If there's a manually maintained parameter list, add `dtype`, `compression_level`, and the `.vrt` extension behaviour. - -- [ ] **Step 2: Update README usage examples** - -In `README.md`, find the GeoTIFF I/O section (around line 140-201 based on the exploration). Add these examples to the existing list: - -```python -open_geotiff('dem.tif', dtype='float32') # half memory -open_geotiff('dem.tif', dtype='float32', chunks=512) # dask + half memory -to_geotiff(data, 'out.tif', compression_level=1) # fast scratch write -to_geotiff(data, 'out.tif', compression_level=22) # max compression -to_geotiff(dask_da, 'mosaic.vrt') # stream dask to VRT -``` - -- [ ] **Step 3: Commit** - -```bash -cd .claude/worktrees/issue-1083 -git add README.md docs/ -git commit -m "Update docs for dtype, compression_level, VRT output (#1083)" -``` - ---- - -### Task 5: User guide notebook - -**Files:** -- Create: `examples/user_guide/46_GeoTIFF_Performance.ipynb` - -- [ ] **Step 1: Create the notebook** - -Create `examples/user_guide/46_GeoTIFF_Performance.ipynb` with these cells: - -1. **Markdown: title** -- "GeoTIFF Performance Controls: dtype, compression_level, and VRT output" -2. 
**Code: imports** -- `import numpy as np, xarray as xr, os, tempfile` and `from xrspatial.geotiff import open_geotiff, to_geotiff` -3. **Markdown: dtype section** -- explain what `dtype` does and when to use it -4. **Code: create a float64 raster, write it, read back with dtype='float32'** -- show the memory savings (arr.nbytes before and after) -5. **Code: dask dtype** -- same with `chunks=256`, show `.dtype` on the result -6. **Markdown: compression_level section** -- explain the speed/size tradeoff -7. **Code: write same raster at level=1 and level=22** -- compare file sizes and write times with `%%time` -8. **Markdown: VRT output section** -- explain the streaming write and directory layout -9. **Code: create a larger raster, chunk it, write to .vrt** -- show the output directory listing -10. **Code: read the VRT back** -- round-trip verification -11. **Markdown: summary** -- one-paragraph recap - -- [ ] **Step 2: Run the notebook to verify it executes** - -Run: `cd .claude/worktrees/issue-1083 && jupyter nbconvert --to notebook --execute examples/user_guide/46_GeoTIFF_Performance.ipynb --output /dev/null 2>&1 | tail -5` -Expected: No errors. - -- [ ] **Step 3: Commit** - -```bash -cd .claude/worktrees/issue-1083 -git add examples/user_guide/46_GeoTIFF_Performance.ipynb -git commit -m "Add user guide notebook for geotiff performance controls (#1083)" -``` diff --git a/docs/superpowers/plans/2026-03-31-sweep-performance.md b/docs/superpowers/plans/2026-03-31-sweep-performance.md deleted file mode 100644 index 8615a41e..00000000 --- a/docs/superpowers/plans/2026-03-31-sweep-performance.md +++ /dev/null @@ -1,743 +0,0 @@ -# Sweep-Performance Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. 
- -**Goal:** Create a `/sweep-performance` slash command that audits all xrspatial modules for performance bottlenecks, OOM risk under 30TB dask workloads, and backend anti-patterns using parallel subagents, then generates a ralph-loop to fix HIGH-severity issues. - -**Architecture:** Single command file (`.claude/commands/sweep-performance.md`) containing all instructions for both phases. Phase 1 dispatches parallel subagents via the Agent tool for static analysis + 30TB graph simulation. Phase 2 generates a `/ralph-loop` command targeting HIGH-severity modules for real benchmarks and `/rockout` fixes. State persisted in `.claude/performance-sweep-state.json`. - -**Tech Stack:** Claude Code slash commands (markdown), Agent tool for subagent dispatch, Bash for git metadata and benchmark scripts, dask for graph simulation, tracemalloc/resource/cupy for memory measurement. - ---- - -## File Structure - -| File | Purpose | -|------|---------| -| Create: `.claude/commands/sweep-performance.md` | The slash command — all Phase 1 and Phase 2 logic | -| Create: `.claude/performance-sweep-state.json` | Runtime state file (created by the command at execution time, not committed) | - -This is a single-file deliverable. The command file contains all the instructions that Claude follows when `/sweep-performance` is invoked. No Python code, no library files — just a well-structured prompt document, same pattern as `accuracy-sweep.md`. - ---- - -### Task 1: Scaffold the Command Header and Argument Parsing - -**Files:** -- Create: `.claude/commands/sweep-performance.md` - -- [ ] **Step 1: Create the command file with title, description, and argument parsing** - -```markdown -# Performance Sweep: Parallel Triage and Fix Workflow - -Audit xrspatial modules for performance bottlenecks, OOM risk under 30TB dask -workloads, and backend-specific anti-patterns. Dispatches parallel subagents -for fast triage, then generates a ralph-loop to benchmark and fix HIGH-severity -issues. 
- -Optional arguments: $ARGUMENTS -(e.g. `--top 5`, `--exclude slope,aspect`, `--only-io`, `--reset-state`) - ---- - -## Step 0 -- Determine mode and parse arguments - -Parse $ARGUMENTS for these flags (multiple may combine): - -| Flag | Effect | -|------|--------| -| `--top N` | Limit Phase 1 to the top N scored modules (default: all) | -| `--exclude mod1,mod2` | Remove named modules from scope | -| `--only-terrain` | Restrict to: slope, aspect, curvature, terrain, terrain_metrics, hillshade, sky_view_factor | -| `--only-focal` | Restrict to: focal, convolution, morphology, bilateral, edge_detection, glcm | -| `--only-hydro` | Restrict to: flood, cost_distance, geodesic, surface_distance, viewshed, erosion, diffusion | -| `--only-io` | Restrict to: geotiff, reproject, rasterize, polygonize | -| `--reset-state` | Delete `.claude/performance-sweep-state.json` and treat all modules as never-inspected | -| `--skip-phase1` | Skip triage; reuse last state file; go straight to ralph-loop generation for unresolved HIGH items | -| `--report-only` | Run Phase 1 triage but do not generate a ralph-loop command | -| `--size small` | Phase 2 benchmarks use 128x128 arrays | -| `--size large` | Phase 2 benchmarks use 2048x2048 arrays | -| `--high-only` | Only report HIGH severity findings in the triage output | - -If `--skip-phase1` is set, jump to Step 6 (ralph-loop generation). -Otherwise proceed to Step 1. -``` - -- [ ] **Step 2: Verify the file was created correctly** - -Run: `head -40 .claude/commands/sweep-performance.md` -Expected: The title, description, and Step 0 argument table are present. 
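The Step 0 flag grammar is simple enough to validate with a throwaway parser. A sketch — the flag names come from the table above, everything else (function name, defaults) is illustrative:

```python
def parse_sweep_args(arguments: str) -> dict:
    """Parse the sweep-performance flag string into an options dict."""
    opts = {'top': None, 'exclude': [], 'only': None, 'reset_state': False,
            'skip_phase1': False, 'report_only': False, 'size': 'medium',
            'high_only': False}
    tokens = arguments.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == '--top':
            i += 1
            opts['top'] = int(tokens[i])
        elif tok == '--exclude':
            i += 1
            opts['exclude'] = tokens[i].split(',')
        elif tok.startswith('--only-'):
            opts['only'] = tok[len('--only-'):]
        elif tok == '--size':
            i += 1
            opts['size'] = tokens[i]
        elif tok in ('--reset-state', '--skip-phase1', '--report-only',
                     '--high-only'):
            # Boolean flags map directly onto option keys
            opts[tok[2:].replace('-', '_')] = True
        else:
            raise ValueError(f'unknown flag: {tok}')
        i += 1
    return opts


opts = parse_sweep_args('--top 5 --exclude slope,aspect --only-io')
print(opts['top'], opts['exclude'], opts['only'])  # 5 ['slope', 'aspect'] io
```

The command itself stays a prompt document — Claude does this parsing in-context — but the sketch pins down how combining flags should compose (e.g. `--top 5 --only-io` limits within the restricted set).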
- -- [ ] **Step 3: Commit** - -```bash -git add .claude/commands/sweep-performance.md -git commit -m "Add sweep-performance command scaffold with argument parsing" -``` - ---- - -### Task 2: Module Discovery and Scoring (Step 1-2) - -**Files:** -- Modify: `.claude/commands/sweep-performance.md` - -- [ ] **Step 1: Append Step 1 (module discovery) to the command file** - -Append the following to `.claude/commands/sweep-performance.md`: - -```markdown -## Step 1 -- Discover modules in scope - -Enumerate all candidate modules. For each, record its file path(s): - -**Single-file modules:** Every `.py` file directly under `xrspatial/`, excluding -`__init__.py`, `_version.py`, `__main__.py`, `utils.py`, `accessor.py`, -`preview.py`, `dataset_support.py`, `diagnostics.py`, `analytics.py`. - -**Subpackage modules:** The `geotiff/` and `reproject/` directories under -`xrspatial/`. Treat each subpackage as a single audit unit. List all `.py` -files within each (excluding `__init__.py`). - -Apply `--only-*` and `--exclude` filters from Step 0 to narrow the list. - -Store the filtered module list in memory (do NOT write intermediate files). 
-``` - -- [ ] **Step 2: Append Step 2 (git metadata and scoring) to the command file** - -Append the following: - -```markdown -## Step 2 -- Gather metadata and score each module - -For every module in scope, collect: - -| Field | How | -|-------|-----| -| **last_modified** | `git log -1 --format=%aI -- ` (for subpackages, use the most recent file) | -| **total_commits** | `git log --oneline -- \| wc -l` | -| **loc** | `wc -l < ` (for subpackages, sum all files) | -| **has_dask_backend** | grep the file(s) for `_run_dask`, `map_overlap`, `map_blocks` | -| **has_cuda_backend** | grep the file(s) for `@cuda.jit`, `import cupy` | -| **is_io_module** | module is geotiff or reproject | -| **has_existing_bench** | a file matching the module name exists in `benchmarks/benchmarks/` | - -### Load inspection state - -Read `.claude/performance-sweep-state.json`. If it does not exist, treat every -module as never-inspected. If `--reset-state` was set, delete the file first. - -State file schema: - -```json -{ - "last_triage": "ISO-DATE", - "modules": { - "slope": { - "last_inspected": "ISO-DATE", - "oom_verdict": "SAFE", - "bottleneck": "compute-bound", - "high_count": 0, - "issue": null - } - } -} -``` - -### Compute scores - -``` -days_since_inspected = (today - last_perf_inspected).days # 9999 if never -days_since_modified = (today - last_modified).days - -score = (days_since_inspected * 3) - + (loc * 0.1) - + (total_commits * 0.5) - + (has_dask_backend * 200) - + (has_cuda_backend * 150) - + (is_io_module * 300) - - (days_since_modified * 0.2) - - (has_existing_bench * 100) -``` - -Sort modules by score descending. If `--top N` is set, keep only the top N. 
-``` - -- [ ] **Step 3: Verify the appended steps read correctly** - -Run: `grep -c "^## Step" .claude/commands/sweep-performance.md` -Expected: `3` (Step 0, Step 1, Step 2) - -- [ ] **Step 4: Commit** - -```bash -git add .claude/commands/sweep-performance.md -git commit -m "Add module discovery and scoring to sweep-performance" -``` - ---- - -### Task 3: Phase 1 Subagent Dispatch (Step 3) - -**Files:** -- Modify: `.claude/commands/sweep-performance.md` - -- [ ] **Step 1: Append Step 3 (subagent dispatch and analysis instructions)** - -Append the following to `.claude/commands/sweep-performance.md`: - -````markdown -## Step 3 -- Dispatch parallel subagents for static triage - -For each module in the scored list, dispatch a subagent using the Agent tool. -Launch ALL subagents in a single message (parallel dispatch). Each subagent -receives the prompt below, with `MODULE_NAME` and `MODULE_FILES` substituted. - -**Subagent prompt template:** - -``` -You are auditing the xrspatial module "MODULE_NAME" for performance issues. - -Read these files: MODULE_FILES - -Perform ALL of the following analyses and return your findings as a single -JSON object. Do NOT modify any files. This is read-only analysis. - -## 1. Dask Path Analysis - -Trace every dask code path (_run_dask, _run_dask_cupy, or any function that -receives dask-backed DataArrays). 
Flag these patterns with severity: - -- HIGH: `.values` on a dask-backed DataArray or CuPy array (premature materialization) -- HIGH: `.compute()` inside a loop (materializes full graph each iteration) -- HIGH: `np.array()` or `np.asarray()` wrapping a dask or CuPy array -- MEDIUM: `da.stack()` without a following `.rechunk()` -- MEDIUM: `map_overlap` with depth >= chunk_size / 4 -- MEDIUM: Missing `boundary` argument in `map_overlap` -- MEDIUM: Same function called twice on same input without caching -- MEDIUM: Python `for` loop iterating over dask chunks (serializes the graph) - -If the module has NO dask code path, note "no dask backend" and skip. - -## 2. 30TB / 16GB OOM Verdict - -For each dask code path found in section 1: - -**Part A — Static trace:** Follow the code end-to-end. Answer: does peak -memory scale with total array size, or with chunk size? If any operation -forces full materialization, the verdict is WILL OOM. - -**Part B — Task graph simulation:** Write and run a Python script (in /tmp/ -with a unique name including "MODULE_NAME") that: - -```python -import dask.array as da -import xarray as xr -import json, sys - -arr = da.zeros((2560, 2560), chunks=(256, 256), dtype='float64') -raster = xr.DataArray(arr, dims=['y', 'x']) - -# Add coords if the function needs them (geodesic, slope with CRS, etc.) 
-# raster = raster.assign_coords(x=np.linspace(-180, 180, 2560), -# y=np.linspace(-90, 90, 2560)) - -try: - result = MODULE_FUNCTION(raster, **DEFAULT_ARGS) - graph = result.__dask_graph__() - task_count = len(graph) - tasks_per_chunk = task_count / 100.0 - - # Check for fan-out: any task key that depends on more than 4 other tasks - deps = dict(graph) - max_fan_in = 0 - for key, val in deps.items(): - if hasattr(val, '__dask_graph__'): - sub = val.__dask_graph__() - max_fan_in = max(max_fan_in, len(sub)) - - print(json.dumps({ - "success": True, - "task_count": task_count, - "tasks_per_chunk": round(tasks_per_chunk, 2), - "max_fan_in": max_fan_in, - "extrapolation_30tb": "~{} tasks at 57M chunks".format( - int(tasks_per_chunk * 57_000_000)) - })) -except Exception as e: - print(json.dumps({"success": False, "error": str(e)})) -``` - -Adapt the function call and imports for the specific module. Run the script -and capture its JSON output. If it errors, record the error and rely on -Part A alone. - -**Verdict:** One of: -- `SAFE` — memory bounded by chunk size, graph scales linearly -- `RISKY` — bounded but tight (e.g. large overlap depth, 3D intermediates) -- `WILL OOM` — forces full materialization or unbounded memory growth - -## 3. GPU Transfer Analysis - -Scan for CuPy/CUDA code paths. Flag: - -- HIGH: `.data.get()` followed by CuPy operations (GPU-CPU-GPU round-trip) -- HIGH: `cupy.asarray()` inside a loop (repeated CPU-GPU transfers) -- MEDIUM: Mixing NumPy and CuPy ops in same function without clear reason -- MEDIUM: Register pressure — count float64 local variables in `@cuda.jit` - kernels; flag if >20 -- MEDIUM: Thread blocks >16x16 on kernels with >20 float64 locals - -If the module has NO GPU code path, note "no GPU backend" and skip. - -## 4. 
Memory Allocation Patterns - -- MEDIUM: Unnecessary `.copy()` on arrays never mutated downstream -- MEDIUM: Large temporary arrays that could be fused into the kernel -- LOW: `np.zeros_like()` + fill loop where `np.empty()` would suffice - -## 5. Numba Anti-Patterns - -- MEDIUM: Missing `@ngjit` on nested for-loops over `.data` arrays -- MEDIUM: `@jit` without `nopython=True` (object-mode fallback risk) -- LOW: Type instability — initializing with int then assigning float -- LOW: Column-major iteration on row-major arrays (inner loop should be last axis) - -## 6. Bottleneck Classification - -Based on your analysis, classify the module as ONE of: -- `IO-bound` — dominated by disk reads/writes or serialization -- `memory-bound` — peak allocation is the limiting factor -- `compute-bound` — CPU/GPU time dominates, memory is fine -- `graph-bound` — dask task graph overhead dominates - -## Output Format - -Return EXACTLY this JSON structure (no extra text before or after): - -```json -{ - "module": "MODULE_NAME", - "files_read": ["list of files you read"], - "findings": [ - { - "severity": "HIGH|MEDIUM|LOW", - "category": "dask_materialization|dask_chunking|gpu_transfer|register_pressure|memory_allocation|numba_antipattern", - "file": "filename.py", - "line": 123, - "description": "what the issue is", - "fix": "how to fix it", - "backends_affected": ["dask+numpy", "dask+cupy", "cupy", "numpy"] - } - ], - "oom_verdict": { - "dask_numpy": "SAFE|RISKY|WILL OOM", - "dask_cupy": "SAFE|RISKY|WILL OOM", - "reasoning": "one-sentence explanation", - "estimated_peak_per_chunk_mb": 0.5, - "task_count": 3721, - "tasks_per_chunk": 37.21, - "graph_simulation_ran": true - }, - "bottleneck": "compute-bound|memory-bound|IO-bound|graph-bound", - "bottleneck_reasoning": "one-sentence explanation" -} -``` - -IMPORTANT: Only flag patterns that are ACTUALLY present in the code. Do not -report hypothetical issues. False positives are worse than missed issues. 
-If a pattern like `.values` is used on a known-numpy-only code path, do not -flag it. -``` - -Wait for all subagents to return before proceeding to Step 4. -```` - -- [ ] **Step 2: Verify the subagent prompt is well-formed** - -Run: `grep -c "## [0-9]" .claude/commands/sweep-performance.md` -Expected: At least 6 (the six analysis sections inside the subagent prompt) - -- [ ] **Step 3: Commit** - -```bash -git add .claude/commands/sweep-performance.md -git commit -m "Add Phase 1 subagent dispatch and analysis template" -``` - ---- - -### Task 4: Phase 1 Report Merging and State Update (Steps 4-5) - -**Files:** -- Modify: `.claude/commands/sweep-performance.md` - -- [ ] **Step 1: Append Step 4 (merge subagent results into report)** - -Append the following to `.claude/commands/sweep-performance.md`: - -````markdown -## Step 4 -- Merge results and print the triage report - -Parse the JSON returned by each subagent. If a subagent returned malformed -output, record the module as "audit failed" with a note. - -### 4a. Print the Module Risk Ranking Table - -Sort modules by score descending. Print: - -``` -## Performance Sweep — Static Triage Report - -### Module Risk Ranking -| Rank | Module | Score | OOM Verdict | Bottleneck | HIGH | MED | LOW | -|------|-----------------|--------|-----------------|---------------|------|-----|-----| -| 1 | geotiff | 31200 | WILL OOM (d+np) | IO-bound | 3 | 1 | 0 | -| 2 | viewshed | 30050 | RISKY (d+np) | memory-bound | 2 | 2 | 1 | -| ... | ... | ... | ... | ... | ... | ... | ... | -``` - -If `--high-only` is set, only count HIGH findings and omit modules with zero HIGH. - -### 4b. 
Print the 30TB / 16GB Verdict Summary - -Group modules by OOM verdict: - -``` -### 30TB on Disk / 16GB RAM — Out-of-Memory Analysis - -#### WILL OOM (fix required) -- **module_name**: reasoning from subagent - -#### RISKY (bounded but tight) -- **module_name**: reasoning from subagent - -#### SAFE (memory bounded by chunk size) -- module_name, module_name, module_name, ... -``` - -### 4c. Print Detailed Findings - -For each module that has findings, print a severity-grouped table: - -``` -### module_name (bottleneck: compute-bound, OOM: SAFE) - -| # | Severity | File:Line | Category | Description | Fix | -|---|----------|----------------|-------------------------|------------------------------|-------------------------------| -| 1 | HIGH | slope.py:142 | dask_materialization | .values on dask input | Use .data or stay lazy | -| 2 | MEDIUM | slope.py:88 | dask_chunking | map_overlap depth too large | Reduce depth or warn users | -``` - -### 4d. Print Actionable Rockout Commands - -For each HIGH-severity finding, print a ready-to-paste `/rockout` command: - -``` -### Ready-to-Run Fixes (HIGH severity only) - -1. **geotiff** — eager .values materialization (WILL OOM) - /rockout "Fix eager .values materialization in geotiff reader. - The dask read path at reader.py:87 calls .values which forces - the full array into memory. For 30TB inputs this will OOM on - a 16GB machine. Must stay lazy through the entire read path." - -2. **cost_distance** — iterative solver unbounded memory (WILL OOM) - /rockout "Fix cost_distance iterative solver to work within - bounded memory. Currently materializes the full distance matrix - each iteration. Must use chunked iteration for 30TB dask inputs." -``` - -Construct each `/rockout` command from the finding's description and fix fields. -Include the OOM verdict and bottleneck classification in the prompt text so -rockout has full context. 
-````
-
-- [ ] **Step 2: Append Step 5 (state file update)**
-
-Append the following:
-
-````markdown
-## Step 5 -- Update state file
-
-Write `.claude/performance-sweep-state.json` with the triage results:
-
-```json
-{
-  "last_triage": "<ISO-8601 timestamp>",
-  "modules": {
-    "<module_name>": {
-      "last_inspected": "<ISO-8601 timestamp>",
-      "oom_verdict": "<SAFE|RISKY|WILL OOM>",
-      "bottleneck": "<IO-bound|memory-bound|compute-bound|graph-bound>",
-      "high_count": <int>,
-      "issue": null
-    }
-  }
-}
-```
-
-If the file already exists, merge — update entries for modules that were
-just audited, keep entries for modules not in this run's scope.
-
-If `--report-only` is set, stop here. Do not proceed to Step 6.
-````
-
-- [ ] **Step 3: Verify both steps appended**
-
-Run: `grep -c "^## Step" .claude/commands/sweep-performance.md`
-Expected: `6` (Steps 0 through 5)
-
-- [ ] **Step 4: Commit**
-
-```bash
-git add .claude/commands/sweep-performance.md
-git commit -m "Add Phase 1 report merging and state update"
-```
-
----
-
-### Task 5: Phase 2 Ralph-Loop Generation (Step 6)
-
-**Files:**
-- Modify: `.claude/commands/sweep-performance.md`
-
-- [ ] **Step 1: Append Step 6 (ralph-loop generation)**
-
-Append the following to `.claude/commands/sweep-performance.md`:
-
-````markdown
-## Step 6 -- Generate the ralph-loop command
-
-Collect all modules from Step 4 (or from the state file if `--skip-phase1`)
-that have at least one HIGH-severity finding and no `issue` recorded in the
-state file (i.e. not yet fixed).
-
-Sort them by: WILL OOM first, then RISKY, then by HIGH count descending.
-
-Determine the benchmark array size from arguments:
-- `--size small` → 128x128
-- `--size large` → 2048x2048
-- default → 512x512
-
-### 6a. 
Print the ranked target list
-
-```
-### Phase 2 Targets (HIGH severity, unfixed)
-| # | Module        | HIGH Count | OOM Verdict | Bottleneck   |
-|---|---------------|------------|-------------|--------------|
-| 1 | geotiff       | 3          | WILL OOM    | IO-bound     |
-| 2 | cost_distance | 1          | WILL OOM    | memory-bound |
-| 3 | viewshed      | 2          | RISKY       | memory-bound |
-```
-
-If no modules qualify, print:
-"No HIGH-severity findings to fix. Run `/sweep-performance` without
-`--skip-phase1` to refresh the triage."
-Then stop.
-
-### 6b. Print the ralph-loop command
-
-Using the target list, generate and print:
-
-````
-/ralph-loop "Performance sweep Phase 2: benchmark and fix HIGH-severity findings.
-
-**Target modules in priority order:**
-1. <module> (<high_count> HIGH findings, <oom_verdict>) -- <bottleneck>
-2. ...
-...
-
-**For each module, in order:**
-
-1. Write a benchmark script at /tmp/perf_sweep_bench_<module>.py that:
-   - Imports the module's public functions
-   - Creates a test array (<size>x<size>, float64)
-   - For EACH available backend (numpy, dask+numpy; cupy and dask+cupy only if available):
-     a. Wrap the array in the appropriate DataArray type
-     b. Measure wall time: timeit.repeat(number=1, repeat=3), take median
-     c. Measure Python memory: tracemalloc.start() / tracemalloc.get_traced_memory()[1] for peak
-     d. Measure process memory: resource.getrusage(RUSAGE_SELF).ru_maxrss before and after
-     e. For CuPy backends: cupy.get_default_memory_pool().used_bytes() before and after
-   - Print results as JSON to stdout
-
-2. Run the benchmark script and capture results.
-
-3. Confirm the HIGH finding from Phase 1:
-   - If the dask backend uses significantly more memory than expected for
-     the chunk size, or wall time shows a materialization stall: CONFIRMED.
-   - If the benchmark shows no anomaly: downgrade to MEDIUM in state file,
-     print 'False positive — skipping' and move to the next module.
-
-4. If confirmed: run /rockout to fix the issue end-to-end (issue, worktree,
-   implementation, tests, docs). 
Include the benchmark numbers in the
-   issue body for context.
-
-5. After rockout completes: rerun the same benchmark script. Print a
-   before/after comparison:
-   | Backend    | Metric      | Before | After | Ratio | Verdict  |
-   |------------|-------------|--------|-------|-------|----------|
-   | numpy      | wall_ms     | 45.2   | 12.1  | 0.27x | IMPROVED |
-   | dask+numpy | peak_rss_mb | 892    | 34    | 0.04x | IMPROVED |
-   Thresholds: IMPROVED < 0.8x, REGRESSION > 1.2x, else UNCHANGED.
-
-6. Update .claude/performance-sweep-state.json with the issue number.
-
-7. Output ITERATION DONE
-
-If all targets have been addressed or confirmed as false positives:
-ALL PERFORMANCE ISSUES FIXED." --max-iterations <N> --completion-promise "ALL PERFORMANCE ISSUES FIXED"
-````
-
-Set `--max-iterations` to the number of target modules plus 2 (buffer for
-retries).
-
-### 6c. Print reminder text
-
-```
-Phase 1 triage complete. To proceed with fixes:
-  Copy the ralph-loop command above and paste it.
-
-Other options:
-  Fix one manually: copy any /rockout command from the report above
-  Rerun triage only: /sweep-performance --report-only
-  Skip Phase 1: /sweep-performance --skip-phase1 (reuses last triage)
-  Reset all tracking: /sweep-performance --reset-state
-```
-````
-
-- [ ] **Step 2: Verify the full command file structure**
-
-Run: `grep "^## Step" .claude/commands/sweep-performance.md`
-Expected output:
-```
-## Step 0 -- Determine mode and parse arguments
-## Step 1 -- Discover modules in scope
-## Step 2 -- Gather metadata and score each module
-## Step 3 -- Dispatch parallel subagents for static triage
-## Step 4 -- Merge results and print the triage report
-## Step 5 -- Update state file
-## Step 6 -- Generate the ralph-loop command
-```
-
-- [ ] **Step 3: Commit**
-
-```bash
-git add .claude/commands/sweep-performance.md
-git commit -m "Add Phase 2 ralph-loop generation to sweep-performance"
-```
-
----
-
-### Task 6: General Rules and Final Polish
-
-**Files:**
-- Modify: 
`.claude/commands/sweep-performance.md` - -- [ ] **Step 1: Append the General Rules section** - -Append the following to `.claude/commands/sweep-performance.md`: - -```markdown ---- - -## General Rules - -- Phase 1 subagents do NOT modify any source, test, or benchmark files. - Read-only analysis only. -- Phase 2 ralph-loop modifies code only through `/rockout`. -- Temporary benchmark scripts and graph simulation scripts go in `/tmp/` - with unique names including the module name (e.g. `/tmp/perf_sweep_bench_slope.py`, - `/tmp/perf_sweep_graph_slope.py`). Clean them up after capturing results. -- Only flag patterns that are ACTUALLY present in the code. Do not report - hypothetical issues or patterns that "could" occur. -- Include the exact file path and line number for every finding so the user - can navigate directly to the issue. -- False positives are worse than missed issues. If you are not confident a - pattern is actually harmful in context (e.g. `.values` used intentionally - on a known-numpy array), do not flag it. -- The 30TB simulation constructs the dask task graph only; it NEVER calls - `.compute()`. -- State file (`.claude/performance-sweep-state.json`) is gitignored by - convention — do not add it to git. -- If $ARGUMENTS is empty, use defaults: audit all modules, benchmark at - 512x512, generate ralph-loop for HIGH items. -- For subpackage modules (geotiff, reproject), the subagent should read ALL - `.py` files in the subpackage directory, not just `__init__.py`. -- When generating `/rockout` commands, include the OOM verdict, bottleneck - classification, and affected backends in the prompt text so rockout has - full performance context. -``` - -- [ ] **Step 2: Read the full file end-to-end and verify structure** - -Run: `wc -l .claude/commands/sweep-performance.md` -Expected: Roughly 300-400 lines. - -Run: `grep "^## " .claude/commands/sweep-performance.md` -Expected: Step 0 through Step 6, plus "General Rules". 
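A note on the "graph only, never `.compute()`" rule above: building and inspecting a dask task graph is cheap and allocates no chunk data, which is what makes the 30TB simulation safe on a small machine. A minimal illustrative sketch (not part of the command file):

```python
import dask.array as da

# Same geometry as the simulation template: a 10x10 grid of 256x256 chunks.
arr = da.zeros((2560, 2560), chunks=(256, 256), dtype='float64')
lazy = arr + 1  # any lazy elementwise op; nothing is evaluated here

# Materializing the graph as a dict walks every task key without ever
# allocating the ~52 MB of float64 chunk data.
graph = dict(lazy.__dask_graph__())
print(len(graph) >= 100)  # at least one task per chunk
```

Calling `.compute()` on `lazy` is the only step that would allocate the full array, and that call never appears anywhere in the sweep.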
- -- [ ] **Step 3: Commit** - -```bash -git add .claude/commands/sweep-performance.md -git commit -m "Add general rules and finalize sweep-performance command" -``` - ---- - -### Task 7: Smoke Test the Command - -**Files:** -- Read: `.claude/commands/sweep-performance.md` - -- [ ] **Step 1: Verify the command appears in the slash command list** - -Run: `ls .claude/commands/sweep-performance.md` -Expected: File exists. - -- [ ] **Step 2: Verify all cross-references are consistent** - -Check that: -- The state file path `.claude/performance-sweep-state.json` is spelled - the same everywhere in the file. -- The subagent JSON output schema field names match the fields referenced - in Step 4 (report merging). -- The `/rockout` and `/ralph-loop` command syntax matches the patterns used - in the existing `accuracy-sweep.md` and `rockout.md` commands. - -Run: `grep -c "performance-sweep-state.json" .claude/commands/sweep-performance.md` -Expected: At least 4 occurrences (Steps 2, 5, 6, and General Rules). - -Run: `grep -c "/rockout" .claude/commands/sweep-performance.md` -Expected: At least 3 occurrences (Steps 4d, 6b, General Rules). - -Run: `grep -c "/ralph-loop" .claude/commands/sweep-performance.md` -Expected: At least 2 occurrences (Step 6b, Step 6c). - -- [ ] **Step 3: Verify subagent output schema fields match report consumption** - -These field names must appear in both the subagent prompt (Step 3) and the -report merging logic (Step 4): -- `module` -- `findings` (with `severity`, `category`, `file`, `line`, `description`, `fix`, `backends_affected`) -- `oom_verdict` (with `dask_numpy`, `dask_cupy`, `reasoning`) -- `bottleneck` - -Run: `grep -c '"oom_verdict"' .claude/commands/sweep-performance.md` -Expected: At least 2 (schema definition + state file). - -Run: `grep -c '"bottleneck"' .claude/commands/sweep-performance.md` -Expected: At least 2. 
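To make the Step 3 / Step 4 contract concrete, here is a hypothetical validator sketch (the function name and its strictness are assumptions, not part of the command file) that reports which required fields a subagent payload is missing:

```python
import json

# Field lists mirror the schema checks above.
REQUIRED_TOP = {"module", "files_read", "findings", "oom_verdict", "bottleneck"}
REQUIRED_FINDING = {"severity", "category", "file", "line",
                    "description", "fix", "backends_affected"}
REQUIRED_VERDICT = {"dask_numpy", "dask_cupy", "reasoning"}


def validate_subagent_report(raw):
    """Return a list of problems; an empty list means the payload is usable."""
    try:
        report = json.loads(raw)
    except json.JSONDecodeError as exc:
        return ["malformed JSON: {}".format(exc)]
    problems = ["missing top-level field: {}".format(f)
                for f in sorted(REQUIRED_TOP - report.keys())]
    for i, finding in enumerate(report.get("findings", [])):
        if not isinstance(finding, dict):
            problems.append("findings[{}] is not an object".format(i))
            continue
        problems += ["findings[{}] missing: {}".format(i, f)
                     for f in sorted(REQUIRED_FINDING - finding.keys())]
    verdict = report.get("oom_verdict", {})
    if isinstance(verdict, dict):
        problems += ["oom_verdict missing: {}".format(f)
                     for f in sorted(REQUIRED_VERDICT - verdict.keys())]
    return problems
```

A payload that fails this kind of check is exactly what Step 4 records as "audit failed".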
- -- [ ] **Step 4: Final commit with all checks passing** - -```bash -git add .claude/commands/sweep-performance.md -git commit -m "Verify sweep-performance command integrity" -``` - -Only commit if there were fixes needed. If all checks passed with no -changes, skip this commit. diff --git a/docs/superpowers/plans/2026-04-01-multi-observer-viewshed.md b/docs/superpowers/plans/2026-04-01-multi-observer-viewshed.md deleted file mode 100644 index 41063ba9..00000000 --- a/docs/superpowers/plans/2026-04-01-multi-observer-viewshed.md +++ /dev/null @@ -1,1110 +0,0 @@ -# Multi-Observer Viewshed & Line-of-Sight Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Add `cumulative_viewshed`, `visibility_frequency`, and `line_of_sight` functions to a new `xrspatial/visibility.py` module, with tests, docs, notebook, and README updates. - -**Architecture:** New `visibility.py` composes over existing `viewshed()` for multi-observer work and implements Bresenham-based transect extraction for line-of-sight. No changes to `viewshed.py` internals. - -**Tech Stack:** numpy, xarray, dask.delayed (for lazy multi-observer), Bresenham's line algorithm, Fresnel zone math. - -**Spec:** `docs/superpowers/specs/2026-04-01-multi-observer-viewshed-design.md` - ---- - -### Task 1: Create worktree and scaffold module - -**Files:** -- Create: `xrspatial/visibility.py` - -- [ ] **Step 1: Create worktree** - -```bash -git worktree add .claude/worktrees/issue-1145 -b issue-1145 -``` - -- [ ] **Step 2: Create empty visibility module with docstring** - -Create `xrspatial/visibility.py` in the worktree: - -```python -""" -Multi-observer viewshed and line-of-sight profile tools. - -Functions ---------- -cumulative_viewshed - Count how many observers can see each cell. 
-visibility_frequency - Fraction of observers that can see each cell. -line_of_sight - Elevation profile and visibility along a straight line between two points. -""" -``` - -- [ ] **Step 3: Commit scaffold** - -```bash -git add xrspatial/visibility.py -git commit -m "Add empty visibility module scaffold (#1145)" -``` - ---- - -### Task 2: Implement and test `_bresenham_line` - -**Files:** -- Modify: `xrspatial/visibility.py` -- Create: `xrspatial/tests/test_visibility.py` - -- [ ] **Step 1: Write the failing test** - -Create `xrspatial/tests/test_visibility.py`: - -```python -import numpy as np -import pytest - -from xrspatial.visibility import _bresenham_line - - -class TestBresenhamLine: - def test_horizontal(self): - cells = _bresenham_line(0, 0, 0, 4) - assert cells == [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)] - - def test_vertical(self): - cells = _bresenham_line(0, 0, 4, 0) - assert cells == [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)] - - def test_diagonal(self): - cells = _bresenham_line(0, 0, 3, 3) - assert cells == [(0, 0), (1, 1), (2, 2), (3, 3)] - - def test_single_cell(self): - cells = _bresenham_line(2, 3, 2, 3) - assert cells == [(2, 3)] - - def test_steep_negative(self): - cells = _bresenham_line(4, 2, 0, 0) - # Must start at (4, 2) and end at (0, 0) - assert cells[0] == (4, 2) - assert cells[-1] == (0, 0) - assert len(cells) == 5 - - def test_includes_endpoints(self): - cells = _bresenham_line(1, 1, 5, 8) - assert cells[0] == (1, 1) - assert cells[-1] == (5, 8) -``` - -- [ ] **Step 2: Run test to verify it fails** - -```bash -pytest xrspatial/tests/test_visibility.py::TestBresenhamLine -v -``` - -Expected: FAIL with `ImportError` (function doesn't exist yet). - -- [ ] **Step 3: Implement `_bresenham_line`** - -Add to `xrspatial/visibility.py`: - -```python -def _bresenham_line(r0, c0, r1, c1): - """Return list of (row, col) cells along the line from (r0,c0) to (r1,c1). - - Uses Bresenham's line algorithm. Both endpoints are included. 
- """ - cells = [] - dr = abs(r1 - r0) - dc = abs(c1 - c0) - sr = 1 if r1 > r0 else -1 - sc = 1 if c1 > c0 else -1 - err = dr - dc - r, c = r0, c0 - while True: - cells.append((r, c)) - if r == r1 and c == c1: - break - e2 = 2 * err - if e2 > -dc: - err -= dc - r += sr - if e2 < dr: - err += dr - c += sc - return cells -``` - -- [ ] **Step 4: Run test to verify it passes** - -```bash -pytest xrspatial/tests/test_visibility.py::TestBresenhamLine -v -``` - -Expected: all 6 tests PASS. - -- [ ] **Step 5: Commit** - -```bash -git add xrspatial/visibility.py xrspatial/tests/test_visibility.py -git commit -m "Add Bresenham line algorithm for LOS transects (#1145)" -``` - ---- - -### Task 3: Implement and test `_extract_transect` - -**Files:** -- Modify: `xrspatial/visibility.py` -- Modify: `xrspatial/tests/test_visibility.py` - -- [ ] **Step 1: Write the failing test** - -Append to `xrspatial/tests/test_visibility.py`: - -```python -import xarray as xr -from xrspatial.visibility import _extract_transect - - -class TestExtractTransect: - def _make_raster(self, data): - h, w = data.shape - return xr.DataArray( - data, - dims=['y', 'x'], - coords={'y': np.arange(h, dtype=float), - 'x': np.arange(w, dtype=float)}, - ) - - def test_numpy_diagonal(self): - data = np.arange(25, dtype=float).reshape(5, 5) - raster = self._make_raster(data) - cells = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)] - elev, xs, ys = _extract_transect(raster, cells) - np.testing.assert_array_equal(elev, [0, 6, 12, 18, 24]) - np.testing.assert_array_equal(xs, [0, 1, 2, 3, 4]) - np.testing.assert_array_equal(ys, [0, 1, 2, 3, 4]) - - def test_dask_matches_numpy(self): - import dask.array as da - data = np.arange(25, dtype=float).reshape(5, 5) - raster_np = self._make_raster(data) - raster_dask = raster_np.copy() - raster_dask.data = da.from_array(data, chunks=(3, 3)) - cells = [(0, 0), (2, 3), (4, 4)] - elev_np, _, _ = _extract_transect(raster_np, cells) - elev_da, _, _ = _extract_transect(raster_dask, cells) 
- np.testing.assert_array_equal(elev_np, elev_da) -``` - -- [ ] **Step 2: Run test to verify it fails** - -```bash -pytest xrspatial/tests/test_visibility.py::TestExtractTransect -v -``` - -Expected: FAIL with `ImportError`. - -- [ ] **Step 3: Implement `_extract_transect`** - -Add to `xrspatial/visibility.py`: - -```python -import numpy as np -import xarray - -from .utils import has_cuda_and_cupy, has_dask_array, is_cupy_array - - -def _extract_transect(raster, cells): - """Extract elevation, x-coords, and y-coords for a list of (row, col) cells. - - For dask or cupy-backed rasters the values are pulled to numpy. - Returns (elevations, x_coords, y_coords) as 1-D numpy arrays. - """ - rows = np.array([r for r, c in cells]) - cols = np.array([c for r, c in cells]) - - x_coords = raster.coords['x'].values[cols] - y_coords = raster.coords['y'].values[rows] - - data = raster.data - if has_dask_array(): - import dask.array as da - if isinstance(data, da.Array): - data = data.compute() - if has_cuda_and_cupy() and is_cupy_array(data): - data = data.get() - - elevations = data[rows, cols].astype(np.float64) - return elevations, x_coords, y_coords -``` - -- [ ] **Step 4: Run test to verify it passes** - -```bash -pytest xrspatial/tests/test_visibility.py::TestExtractTransect -v -``` - -Expected: PASS. 
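Taken together, `_bresenham_line` and `_extract_transect` turn two grid endpoints into a 1-D elevation profile. A self-contained sketch of that composition in plain numpy (the line walk is re-implemented inline so the snippet runs without the new module; the coordinate lookup and dask/cupy handling are omitted):

```python
import numpy as np


def bresenham(r0, c0, r1, c1):
    """Minimal Bresenham walk, endpoints included (mirrors _bresenham_line)."""
    cells, dr, dc = [], abs(r1 - r0), abs(c1 - c0)
    sr = 1 if r1 > r0 else -1
    sc = 1 if c1 > c0 else -1
    err, r, c = dr - dc, r0, c0
    while True:
        cells.append((r, c))
        if r == r1 and c == c1:
            break
        e2 = 2 * err
        if e2 > -dc:
            err -= dc
            r += sr
        if e2 < dr:
            err += dr
            c += sc
    return cells


terrain = np.arange(25, dtype=float).reshape(5, 5)
cells = bresenham(0, 0, 4, 4)
rows = np.array([r for r, _ in cells])
cols = np.array([c for _, c in cells])
profile = terrain[rows, cols]  # fancy indexing: one gather, no Python loop
print(profile)  # diagonal values: [ 0.  6. 12. 18. 24.]
```

The same fancy-indexing gather is what `_extract_transect` performs once dask or cupy data has been pulled down to numpy.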
- -- [ ] **Step 5: Commit** - -```bash -git add xrspatial/visibility.py xrspatial/tests/test_visibility.py -git commit -m "Add transect extraction helper for LOS profiles (#1145)" -``` - ---- - -### Task 4: Implement and test `line_of_sight` - -**Files:** -- Modify: `xrspatial/visibility.py` -- Modify: `xrspatial/tests/test_visibility.py` - -- [ ] **Step 1: Write the failing tests** - -Append to `xrspatial/tests/test_visibility.py`: - -```python -from xrspatial.visibility import line_of_sight - - -def _make_raster(data): - """Module-level helper for creating test rasters.""" - h, w = data.shape - return xr.DataArray( - data, - dims=['y', 'x'], - coords={'y': np.arange(h, dtype=float), - 'x': np.arange(w, dtype=float)}, - ) - - -class TestLineOfSight: - def test_flat_terrain_all_visible(self): - data = np.zeros((5, 10), dtype=float) - raster = _make_raster(data) - result = line_of_sight(raster, x0=0, y0=2, x1=9, y1=2, - observer_elev=10, target_elev=10) - assert isinstance(result, xr.Dataset) - assert 'visible' in result - assert 'elevation' in result - assert 'los_height' in result - assert 'distance' in result - assert result['visible'].all() - - def test_obstruction_blocks_view(self): - data = np.zeros((1, 10), dtype=float) - data[0, 5] = 100 # tall wall in the middle - raster = _make_raster(data) - result = line_of_sight(raster, x0=0, y0=0, x1=9, y1=0, - observer_elev=1, target_elev=0) - vis = result['visible'].values - # observer cell is visible - assert vis[0] - # cells before the wall are visible - assert all(vis[:6]) - # at least some cells after the wall are blocked - assert not all(vis[6:]) - - def test_observer_equals_target(self): - data = np.ones((5, 5), dtype=float) - raster = _make_raster(data) - result = line_of_sight(raster, x0=2, y0=2, x1=2, y1=2) - assert len(result['sample']) == 1 - assert result['visible'].values[0] - - def test_elevation_offsets(self): - data = np.zeros((1, 5), dtype=float) - raster = _make_raster(data) - result = 
line_of_sight(raster, x0=0, y0=0, x1=4, y1=0, - observer_elev=10, target_elev=20) - los = result['los_height'].values - # LOS starts at 10, ends at 20 - assert abs(los[0] - 10.0) < 1e-10 - assert abs(los[-1] - 20.0) < 1e-10 - - def test_distance_monotonic(self): - data = np.zeros((5, 10), dtype=float) - raster = _make_raster(data) - result = line_of_sight(raster, x0=0, y0=0, x1=9, y1=4) - d = result['distance'].values - assert all(d[i] <= d[i + 1] for i in range(len(d) - 1)) - - def test_fresnel_zone(self): - data = np.zeros((1, 11), dtype=float) - raster = _make_raster(data) - result = line_of_sight(raster, x0=0, y0=0, x1=10, y1=0, - observer_elev=50, target_elev=50, - frequency_mhz=900) - assert 'fresnel_radius' in result - assert 'fresnel_clear' in result - # midpoint has largest Fresnel radius - fr = result['fresnel_radius'].values - mid = len(fr) // 2 - assert fr[mid] >= fr[1] - assert fr[mid] >= fr[-2] - # with 50m clearance and flat terrain, Fresnel should be clear - assert result['fresnel_clear'].all() - - def test_no_fresnel_by_default(self): - data = np.zeros((5, 5), dtype=float) - raster = _make_raster(data) - result = line_of_sight(raster, x0=0, y0=0, x1=4, y1=4) - assert 'fresnel_radius' not in result - assert 'fresnel_clear' not in result - - def test_xy_coords_in_output(self): - data = np.zeros((5, 10), dtype=float) - raster = _make_raster(data) - result = line_of_sight(raster, x0=0, y0=2, x1=9, y1=2) - # first point should match observer - assert abs(result['x'].values[0] - 0.0) < 1e-10 - assert abs(result['y'].values[0] - 2.0) < 1e-10 - # last point should match target - assert abs(result['x'].values[-1] - 9.0) < 1e-10 - assert abs(result['y'].values[-1] - 2.0) < 1e-10 -``` - -- [ ] **Step 2: Run tests to verify they fail** - -```bash -pytest xrspatial/tests/test_visibility.py::TestLineOfSight -v -``` - -Expected: FAIL with `ImportError`. 
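Before implementing, it helps to sanity-check the Fresnel math the last two tests rely on. The first Fresnel zone radius at distances d1 and d2 from the two endpoints is r = sqrt(wavelength * d1 * d2 / (d1 + d2)), which peaks at the midpoint of the link. A standalone numeric check with illustrative values (900 MHz over a 10 km path; these numbers are not from the plan):

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s, same constant the module will define


def fresnel_radius_1(d1, d2, freq_hz):
    """First Fresnel zone radius (metres) at distances d1/d2 from the endpoints."""
    total = d1 + d2
    if total == 0 or freq_hz == 0:
        return 0.0
    wavelength = SPEED_OF_LIGHT / freq_hz  # ~0.333 m at 900 MHz
    return math.sqrt(wavelength * d1 * d2 / total)


mid = fresnel_radius_1(5_000, 5_000, 900e6)   # midpoint of a 10 km link
near = fresnel_radius_1(100, 9_900, 900e6)    # close to one endpoint
print(round(mid, 1), round(near, 1))  # 28.9 5.7 -- radius peaks mid-path
```

This is why `test_fresnel_zone` asserts that the midpoint radius dominates, and why 50 m of clearance over flat terrain keeps the zone clear.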
- -- [ ] **Step 3: Implement `line_of_sight`** - -Add to `xrspatial/visibility.py`: - -```python -from .utils import _validate_raster - -SPEED_OF_LIGHT = 299_792_458.0 # m/s - - -def _fresnel_radius_1(d1, d2, freq_hz): - """First Fresnel zone radius at a point d1 from transmitter, d2 from receiver.""" - D = d1 + d2 - if D == 0 or freq_hz == 0: - return 0.0 - wavelength = SPEED_OF_LIGHT / freq_hz - return np.sqrt(wavelength * d1 * d2 / D) - - -def line_of_sight( - raster: xarray.DataArray, - x0: float, y0: float, - x1: float, y1: float, - observer_elev: float = 0, - target_elev: float = 0, - frequency_mhz: float = None, -) -> xarray.Dataset: - """Compute elevation profile and visibility along a straight line. - - Parameters - ---------- - raster : xarray.DataArray - Elevation raster. - x0, y0 : float - Observer location in data-space coordinates. - x1, y1 : float - Target location in data-space coordinates. - observer_elev : float - Height above terrain at the observer. - target_elev : float - Height above terrain at the target. - frequency_mhz : float, optional - Radio frequency in MHz. When set, first Fresnel zone clearance - is computed at each sample point. - - Returns - ------- - xarray.Dataset - Dataset with dimension ``sample`` containing variables - ``distance``, ``elevation``, ``los_height``, ``visible``, - ``x``, ``y``, and optionally ``fresnel_radius`` and - ``fresnel_clear``. 
- """ - _validate_raster(raster, func_name='line_of_sight', name='raster') - - x_coords = raster.coords['x'].values - y_coords = raster.coords['y'].values - - # snap to nearest grid cell - c0 = int(np.argmin(np.abs(x_coords - x0))) - r0 = int(np.argmin(np.abs(y_coords - y0))) - c1 = int(np.argmin(np.abs(x_coords - x1))) - r1 = int(np.argmin(np.abs(y_coords - y1))) - - cells = _bresenham_line(r0, c0, r1, c1) - elevations, xs, ys = _extract_transect(raster, cells) - - n = len(cells) - - # compute cumulative distance along the transect - distance = np.zeros(n, dtype=np.float64) - for i in range(1, n): - dx = xs[i] - xs[i - 1] - dy = ys[i] - ys[i - 1] - distance[i] = distance[i - 1] + np.sqrt(dx * dx + dy * dy) - - total_dist = distance[-1] if n > 1 else 0.0 - - # LOS height: linear interpolation from observer to target - obs_h = elevations[0] + observer_elev - tgt_h = elevations[-1] + target_elev if n > 1 else obs_h - if total_dist > 0: - los_height = obs_h + (tgt_h - obs_h) * (distance / total_dist) - else: - los_height = np.array([obs_h]) - - # visibility: track max elevation angle from observer - visible = np.ones(n, dtype=bool) - max_angle = -np.inf - for i in range(1, n): - if distance[i] == 0: - continue - angle = (elevations[i] - obs_h) / distance[i] - if angle >= max_angle: - max_angle = angle - else: - visible[i] = False - - data_vars = { - 'distance': ('sample', distance), - 'elevation': ('sample', elevations), - 'los_height': ('sample', los_height), - 'visible': ('sample', visible), - 'x': ('sample', xs), - 'y': ('sample', ys), - } - - if frequency_mhz is not None: - freq_hz = frequency_mhz * 1e6 - fresnel = np.zeros(n, dtype=np.float64) - fresnel_clear = np.ones(n, dtype=bool) - for i in range(n): - d1 = distance[i] - d2 = total_dist - d1 - fresnel[i] = _fresnel_radius_1(d1, d2, freq_hz) - clearance = los_height[i] - elevations[i] - if clearance < fresnel[i]: - fresnel_clear[i] = False - data_vars['fresnel_radius'] = ('sample', fresnel) - 
data_vars['fresnel_clear'] = ('sample', fresnel_clear) - - return xarray.Dataset(data_vars) -``` - -- [ ] **Step 4: Run tests to verify they pass** - -```bash -pytest xrspatial/tests/test_visibility.py::TestLineOfSight -v -``` - -Expected: all 8 tests PASS. - -- [ ] **Step 5: Commit** - -```bash -git add xrspatial/visibility.py xrspatial/tests/test_visibility.py -git commit -m "Add line_of_sight with Fresnel zone support (#1145)" -``` - ---- - -### Task 5: Implement and test `cumulative_viewshed` - -**Files:** -- Modify: `xrspatial/visibility.py` -- Modify: `xrspatial/tests/test_visibility.py` - -- [ ] **Step 1: Write the failing tests** - -Append to `xrspatial/tests/test_visibility.py`: - -```python -import dask.array as da -from xrspatial.visibility import cumulative_viewshed - - -class TestCumulativeViewshed: - def test_flat_terrain_all_visible(self): - """On flat terrain with elevated observers, every cell is visible.""" - data = np.zeros((10, 10), dtype=float) - raster = _make_raster(data) - observers = [ - {'x': 2.0, 'y': 2.0, 'observer_elev': 10}, - {'x': 7.0, 'y': 7.0, 'observer_elev': 10}, - ] - result = cumulative_viewshed(raster, observers) - assert result.dtype == np.int32 - # every cell should be seen by both observers - assert (result.values == 2).all() - - def test_single_observer_matches_viewshed(self): - """Single-observer cumulative should match binary viewshed.""" - from xrspatial import viewshed - from xrspatial.viewshed import INVISIBLE - data = np.random.RandomState(42).rand(15, 15).astype(float) * 100 - raster = _make_raster(data) - obs = {'x': 7.0, 'y': 7.0, 'observer_elev': 50} - result = cumulative_viewshed(raster, [obs]) - vs = viewshed(raster, x=7.0, y=7.0, observer_elev=50) - expected = (vs.values != INVISIBLE).astype(np.int32) - np.testing.assert_array_equal(result.values, expected) - - def test_wall_blocks_one_side(self): - """A tall wall blocks visibility from the other side.""" - data = np.zeros((1, 11), dtype=float) - data[0, 5] = 
1000 # tall wall - raster = _make_raster(data) - obs_left = {'x': 0.0, 'y': 0.0, 'observer_elev': 1} - obs_right = {'x': 10.0, 'y': 0.0, 'observer_elev': 1} - result = cumulative_viewshed(raster, [obs_left, obs_right]) - # the wall cell itself is visible to both - assert result.values[0, 5] == 2 - # cells immediately next to wall on each side visible to both - # cells far from wall visible to only one observer - assert result.values[0, 0] >= 1 - assert result.values[0, 10] >= 1 - - def test_per_observer_max_distance(self): - """Per-observer max_distance limits the analysis radius.""" - data = np.zeros((20, 20), dtype=float) - raster = _make_raster(data) - obs = {'x': 10.0, 'y': 10.0, 'observer_elev': 10, 'max_distance': 3} - result = cumulative_viewshed(raster, [obs]) - # corners should be 0 (beyond max_distance) - assert result.values[0, 0] == 0 - assert result.values[19, 19] == 0 - # center should be 1 - assert result.values[10, 10] == 1 - - def test_empty_observers_raises(self): - data = np.zeros((5, 5), dtype=float) - raster = _make_raster(data) - with pytest.raises(ValueError): - cumulative_viewshed(raster, []) - - def test_dask_matches_numpy(self): - """Dask backend should produce the same result as numpy.""" - data = np.random.RandomState(99).rand(15, 15).astype(float) * 50 - raster_np = _make_raster(data) - raster_dask = raster_np.copy() - raster_dask.data = da.from_array(data, chunks=(8, 8)) - observers = [ - {'x': 3.0, 'y': 3.0, 'observer_elev': 30}, - {'x': 12.0, 'y': 12.0, 'observer_elev': 30}, - ] - result_np = cumulative_viewshed(raster_np, observers) - result_dask = cumulative_viewshed(raster_dask, observers) - np.testing.assert_array_equal(result_np.values, result_dask.values) - - def test_preserves_coords_and_dims(self): - data = np.zeros((5, 5), dtype=float) - raster = _make_raster(data) - raster.attrs['crs'] = 'EPSG:4326' - observers = [{'x': 2.0, 'y': 2.0, 'observer_elev': 10}] - result = cumulative_viewshed(raster, observers) - assert 
result.dims == raster.dims - np.testing.assert_array_equal(result.coords['x'].values, - raster.coords['x'].values) - np.testing.assert_array_equal(result.coords['y'].values, - raster.coords['y'].values) - assert result.attrs.get('crs') == 'EPSG:4326' -``` - -- [ ] **Step 2: Run tests to verify they fail** - -```bash -pytest xrspatial/tests/test_visibility.py::TestCumulativeViewshed -v -``` - -Expected: FAIL with `ImportError`. - -- [ ] **Step 3: Implement `cumulative_viewshed`** - -Add to `xrspatial/visibility.py`: - -```python -from .viewshed import viewshed, INVISIBLE - - -def cumulative_viewshed( - raster: xarray.DataArray, - observers: list, - target_elev: float = 0, - max_distance: float = None, -) -> xarray.DataArray: - """Count how many observers can see each cell. - - Parameters - ---------- - raster : xarray.DataArray - Elevation raster (numpy, cupy, or dask-backed). - observers : list of dict - Each dict must have ``x`` and ``y`` keys (data-space coords). - Optional keys: ``observer_elev`` (default 0), ``target_elev`` - (overrides function-level default), ``max_distance`` (per-observer - analysis radius). - target_elev : float - Default target elevation for observers that don't specify one. - max_distance : float, optional - Default maximum analysis radius. - - Returns - ------- - xarray.DataArray - Integer raster (int32) with the count of observers that have - line-of-sight to each cell. 
- """ - _validate_raster(raster, func_name='cumulative_viewshed', name='raster') - if not observers: - raise ValueError("observers list must not be empty") - - count = np.zeros(raster.shape, dtype=np.int32) - - for obs in observers: - ox = obs['x'] - oy = obs['y'] - oe = obs.get('observer_elev', 0) - te = obs.get('target_elev', target_elev) - md = obs.get('max_distance', max_distance) - - vs = viewshed(raster, x=ox, y=oy, observer_elev=oe, - target_elev=te, max_distance=md) - - mask = vs.values if isinstance(vs.data, np.ndarray) else vs.values - count += (mask != INVISIBLE).astype(np.int32) - - return xarray.DataArray(count, coords=raster.coords, - dims=raster.dims, attrs=raster.attrs) -``` - -- [ ] **Step 4: Run tests to verify they pass** - -```bash -pytest xrspatial/tests/test_visibility.py::TestCumulativeViewshed -v -``` - -Expected: all 7 tests PASS. - -- [ ] **Step 5: Commit** - -```bash -git add xrspatial/visibility.py xrspatial/tests/test_visibility.py -git commit -m "Add cumulative_viewshed for multi-observer analysis (#1145)" -``` - ---- - -### Task 6: Implement and test `visibility_frequency` - -**Files:** -- Modify: `xrspatial/visibility.py` -- Modify: `xrspatial/tests/test_visibility.py` - -- [ ] **Step 1: Write the failing tests** - -Append to `xrspatial/tests/test_visibility.py`: - -```python -from xrspatial.visibility import visibility_frequency - - -class TestVisibilityFrequency: - def test_flat_terrain_all_ones(self): - data = np.zeros((10, 10), dtype=float) - raster = _make_raster(data) - observers = [ - {'x': 2.0, 'y': 2.0, 'observer_elev': 10}, - {'x': 7.0, 'y': 7.0, 'observer_elev': 10}, - ] - result = visibility_frequency(raster, observers) - assert result.dtype == np.float64 - np.testing.assert_allclose(result.values, 1.0) - - def test_equals_cumulative_divided_by_n(self): - data = np.random.RandomState(7).rand(15, 15).astype(float) * 100 - raster = _make_raster(data) - observers = [ - {'x': 3.0, 'y': 3.0, 'observer_elev': 50}, - {'x': 10.0, 
'y': 10.0, 'observer_elev': 50}, - {'x': 7.0, 'y': 2.0, 'observer_elev': 50}, - ] - freq = visibility_frequency(raster, observers) - cum = cumulative_viewshed(raster, observers) - expected = cum.values.astype(np.float64) / 3.0 - np.testing.assert_allclose(freq.values, expected) -``` - -- [ ] **Step 2: Run tests to verify they fail** - -```bash -pytest xrspatial/tests/test_visibility.py::TestVisibilityFrequency -v -``` - -Expected: FAIL with `ImportError`. - -- [ ] **Step 3: Implement `visibility_frequency`** - -Add to `xrspatial/visibility.py`: - -```python -def visibility_frequency( - raster: xarray.DataArray, - observers: list, - target_elev: float = 0, - max_distance: float = None, -) -> xarray.DataArray: - """Fraction of observers that can see each cell. - - Parameters are the same as :func:`cumulative_viewshed`. - - Returns - ------- - xarray.DataArray - Float64 raster with values in [0, 1]. - """ - cum = cumulative_viewshed(raster, observers, target_elev, max_distance) - freq = cum.astype(np.float64) / len(observers) - return freq -``` - -- [ ] **Step 4: Run tests to verify they pass** - -```bash -pytest xrspatial/tests/test_visibility.py::TestVisibilityFrequency -v -``` - -Expected: all 2 tests PASS. 
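Before committing, the counting/normalization contract these two functions share can be seen in isolation. A minimal NumPy sketch (the 0/1 `masks` below are hypothetical stand-ins for `viewshed(...) != INVISIBLE` results, not real xrspatial output):

```python
import numpy as np

# Mock binary visibility masks for three observers on a 3x3 grid:
# 1 = the observer has line-of-sight to the cell, 0 = blocked.
masks = [
    np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]]),
    np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]]),
    np.array([[0, 0, 0], [1, 1, 1], [1, 1, 1]]),
]

# cumulative_viewshed: per-cell count of observers with line-of-sight
cum = np.sum(masks, axis=0).astype(np.int32)

# visibility_frequency: the same counts normalized to [0, 1]
freq = cum.astype(np.float64) / len(masks)

print(cum)   # center cell is seen by all three observers
print(freq)  # so its frequency is 1.0
```

The invariant `test_equals_cumulative_divided_by_n` pins down is exactly `freq == cum / n_observers`, element-wise.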
- -- [ ] **Step 5: Commit** - -```bash -git add xrspatial/visibility.py xrspatial/tests/test_visibility.py -git commit -m "Add visibility_frequency wrapper (#1145)" -``` - ---- - -### Task 7: Integration — exports, accessor, docs, README - -**Files:** -- Modify: `xrspatial/__init__.py:121` (add imports after viewshed import) -- Modify: `xrspatial/accessor.py:213` (add accessor methods after viewshed) -- Modify: `docs/source/reference/surface.rst:82` (add section after viewshed) -- Modify: `README.md:318` (add rows after viewshed row) - -- [ ] **Step 1: Add imports to `__init__.py`** - -After line 121 (`from xrspatial.viewshed import viewshed # noqa`), add: - -```python -from xrspatial.visibility import cumulative_viewshed # noqa -from xrspatial.visibility import line_of_sight # noqa -from xrspatial.visibility import visibility_frequency # noqa -``` - -- [ ] **Step 2: Add accessor methods to `accessor.py`** - -After the `viewshed` method (line 213), add: - -```python - def cumulative_viewshed(self, observers, **kwargs): - from .visibility import cumulative_viewshed - return cumulative_viewshed(self._obj, observers, **kwargs) - - def visibility_frequency(self, observers, **kwargs): - from .visibility import visibility_frequency - return visibility_frequency(self._obj, observers, **kwargs) - - def line_of_sight(self, x0, y0, x1, y1, **kwargs): - from .visibility import line_of_sight - return line_of_sight(self._obj, x0, y0, x1, y1, **kwargs) -``` - -- [ ] **Step 3: Update docs reference** - -In `docs/source/reference/surface.rst`, after the Viewshed section (line 82), add: - -```rst - -Cumulative Viewshed -=================== -.. autosummary:: - :toctree: _autosummary - - xrspatial.visibility.cumulative_viewshed - -Visibility Frequency -==================== -.. autosummary:: - :toctree: _autosummary - - xrspatial.visibility.visibility_frequency - -Line of Sight -============= -.. 
autosummary:: - :toctree: _autosummary - - xrspatial.visibility.line_of_sight -``` - -- [ ] **Step 4: Update README feature matrix** - -After the Viewshed row (line 318), add: - -```markdown -| [Cumulative Viewshed](xrspatial/visibility.py) | Counts how many observers can see each cell | Custom | ✅️ | 🔄 | 🔄 | 🔄 | -| [Visibility Frequency](xrspatial/visibility.py) | Fraction of observers with line-of-sight to each cell | Custom | ✅️ | 🔄 | 🔄 | 🔄 | -| [Line of Sight](xrspatial/visibility.py) | Elevation profile and visibility along a point-to-point transect | Custom | ✅️ | 🔄 | 🔄 | 🔄 | -``` - -(These inherit viewshed's backend support. CuPy/Dask marked as 🔄 because cumulative_viewshed falls back through `viewshed()` and LOS always runs on CPU.) - -- [ ] **Step 5: Run full test suite to verify nothing is broken** - -```bash -pytest xrspatial/tests/test_visibility.py -v -``` - -Expected: all tests PASS. - -- [ ] **Step 6: Commit** - -```bash -git add xrspatial/__init__.py xrspatial/accessor.py docs/source/reference/surface.rst README.md -git commit -m "Add visibility module to exports, accessor, docs, and README (#1145)" -``` - ---- - -### Task 8: User guide notebook - -**Files:** -- Create: `examples/user_guide/37_Visibility_Analysis.ipynb` - -- [ ] **Step 1: Create the notebook** - -Create `examples/user_guide/37_Visibility_Analysis.ipynb` with cells: - -**Cell 1 (markdown):** -```markdown -# Visibility Analysis - -This notebook demonstrates multi-observer viewshed analysis and point-to-point -line-of-sight profiling. 
- -**Functions covered:** -- `cumulative_viewshed` -- count of observers with line-of-sight to each cell -- `visibility_frequency` -- normalized version (fraction of observers) -- `line_of_sight` -- elevation profile, visibility, and optional Fresnel zone clearance -``` - -**Cell 2 (code):** -```python -import numpy as np -import xarray as xr -import matplotlib.pyplot as plt -from matplotlib.colors import ListedColormap - -from xrspatial.terrain import generate_terrain -from xrspatial.hillshade import hillshade -from xrspatial.visibility import ( - cumulative_viewshed, - visibility_frequency, - line_of_sight, -) -``` - -**Cell 3 (markdown):** -```markdown -## Generate synthetic terrain -``` - -**Cell 4 (code):** -```python -terrain = generate_terrain(width=200, height=200, seed=42) -hs = hillshade(terrain) - -fig, ax = plt.subplots(figsize=(8, 8)) -hs.plot.imshow(ax=ax, cmap='gray', add_colorbar=False) -terrain.plot.imshow(ax=ax, alpha=0.4, cmap='terrain', add_colorbar=True) -ax.set_title('Synthetic terrain') -plt.tight_layout() -``` - -**Cell 5 (markdown):** -```markdown -## Cumulative viewshed - -Place three observers on the terrain and count how many can see each cell. 
-``` - -**Cell 6 (code):** -```python -xs = terrain.coords['x'].values -ys = terrain.coords['y'].values - -observers = [ - {'x': float(xs[40]), 'y': float(ys[40]), 'observer_elev': 20}, - {'x': float(xs[160]), 'y': float(ys[80]), 'observer_elev': 20}, - {'x': float(xs[100]), 'y': float(ys[160]), 'observer_elev': 20}, -] - -cum = cumulative_viewshed(terrain, observers) - -fig, ax = plt.subplots(figsize=(8, 8)) -hs.plot.imshow(ax=ax, cmap='gray', add_colorbar=False) -cum.plot.imshow(ax=ax, alpha=0.6, cmap='YlOrRd', - add_colorbar=True, vmin=0, vmax=3) -for obs in observers: - ax.plot(obs['x'], obs['y'], 'k^', markersize=10) -ax.set_title('Cumulative viewshed (observer count)') -plt.tight_layout() -``` - -**Cell 7 (markdown):** -```markdown -## Visibility frequency - -Same data, but normalized to [0, 1]. -``` - -**Cell 8 (code):** -```python -freq = visibility_frequency(terrain, observers) - -fig, ax = plt.subplots(figsize=(8, 8)) -hs.plot.imshow(ax=ax, cmap='gray', add_colorbar=False) -freq.plot.imshow(ax=ax, alpha=0.6, cmap='RdYlGn', - add_colorbar=True, vmin=0, vmax=1) -for obs in observers: - ax.plot(obs['x'], obs['y'], 'k^', markersize=10) -ax.set_title('Visibility frequency') -plt.tight_layout() -``` - -**Cell 9 (markdown):** -```markdown -## Line of sight - -Draw a transect between two points and check visibility along it. 
-``` - -**Cell 10 (code):** -```python -ox, oy = float(xs[20]), float(ys[100]) -tx, ty = float(xs[180]), float(ys[100]) - -los = line_of_sight(terrain, x0=ox, y0=oy, x1=tx, y1=ty, - observer_elev=15, target_elev=5) - -fig, ax = plt.subplots(figsize=(12, 4)) -ax.fill_between(los['distance'].values, 0, los['elevation'].values, - color='sienna', alpha=0.4, label='Terrain') -ax.plot(los['distance'].values, los['los_height'].values, - 'r--', label='LOS ray') - -vis = los['visible'].values -d = los['distance'].values -ax.scatter(d[vis], los['elevation'].values[vis], - c='green', s=8, label='Visible', zorder=3) -ax.scatter(d[~vis], los['elevation'].values[~vis], - c='red', s=8, label='Blocked', zorder=3) - -ax.set_xlabel('Distance') -ax.set_ylabel('Elevation') -ax.set_title('Line-of-sight profile') -ax.legend() -plt.tight_layout() -``` - -**Cell 11 (markdown):** -```markdown -## Fresnel zone clearance - -For radio link planning, check whether the first Fresnel zone is clear at 900 MHz. -``` - -**Cell 12 (code):** -```python -los_f = line_of_sight(terrain, x0=ox, y0=oy, x1=tx, y1=ty, - observer_elev=50, target_elev=50, - frequency_mhz=900) - -fig, ax = plt.subplots(figsize=(12, 4)) -d = los_f['distance'].values -ax.fill_between(d, 0, los_f['elevation'].values, - color='sienna', alpha=0.4, label='Terrain') -ax.plot(d, los_f['los_height'].values, 'r--', label='LOS ray') - -# Fresnel zone envelope -lh = los_f['los_height'].values -fr = los_f['fresnel_radius'].values -ax.fill_between(d, lh - fr, lh + fr, alpha=0.15, color='blue', - label='1st Fresnel zone') - -fc = los_f['fresnel_clear'].values -ax.scatter(d[fc], los_f['elevation'].values[fc], - c='green', s=8, label='Fresnel clear', zorder=3) -ax.scatter(d[~fc], los_f['elevation'].values[~fc], - c='red', s=8, label='Fresnel blocked', zorder=3) - -ax.set_xlabel('Distance') -ax.set_ylabel('Elevation') -ax.set_title('Fresnel zone clearance (900 MHz)') -ax.legend() -plt.tight_layout() -``` - -- [ ] **Step 2: Commit** - -```bash 
-git add examples/user_guide/37_Visibility_Analysis.ipynb -git commit -m "Add visibility analysis user guide notebook (#1145)" -``` - ---- - -### Task 9: Final validation - -- [ ] **Step 1: Run full test suite** - -```bash -pytest xrspatial/tests/test_visibility.py -v -``` - -Expected: all tests PASS. - -- [ ] **Step 2: Verify imports work** - -```bash -python -c "from xrspatial import cumulative_viewshed, visibility_frequency, line_of_sight; print('OK')" -``` - -Expected: `OK`. - -- [ ] **Step 3: Verify accessor works** - -```bash -python -c " -import numpy as np, xarray as xr -r = xr.DataArray(np.zeros((5,5)), dims=['y','x'], - coords={'y': np.arange(5.0), 'x': np.arange(5.0)}) -r.xrs.cumulative_viewshed([{'x':2.0,'y':2.0,'observer_elev':10}]) -print('accessor OK') -" -``` - -Expected: `accessor OK`. diff --git a/docs/superpowers/plans/2026-04-01-spatial-autocorrelation.md b/docs/superpowers/plans/2026-04-01-spatial-autocorrelation.md deleted file mode 100644 index d7c9f19c..00000000 --- a/docs/superpowers/plans/2026-04-01-spatial-autocorrelation.md +++ /dev/null @@ -1,1740 +0,0 @@ -# Spatial Autocorrelation Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Add Global Moran's I and Local Moran's I (LISA) with queen/rook contiguity weights, supporting all four backends. - -**Architecture:** New `xrspatial/autocorrelation.py` module following the existing `ArrayTypeFunctionMapping` dispatch pattern. Global Moran's I reuses the existing convolution infrastructure for spatial lag computation. LISA uses a fused `@ngjit` kernel for numpy (lag + permutation in one pass) and separate `map_overlap` calls for dask. All four backends: numpy, cupy, dask+numpy, dask+cupy. - -**Tech Stack:** NumPy, Numba (`@ngjit`), CuPy, Dask (`map_overlap`, `map_blocks`), xarray. 
CuPy backends fall back to CPU for the branching-heavy permutation step (same pattern as `emerging_hotspots.py`). - -**Spec:** `docs/superpowers/specs/2026-04-01-spatial-autocorrelation-design.md` - ---- - -### File Map - -| File | Action | Responsibility | -|------|--------|----------------| -| `xrspatial/autocorrelation.py` | Create | Public API (`morans_i`, `lisa`), backend dispatch, all backend implementations | -| `xrspatial/tests/test_autocorrelation.py` | Create | Tests for both functions across all backends and edge cases | -| `xrspatial/__init__.py` | Modify (line ~2) | Add `morans_i` and `lisa` exports | -| `docs/source/reference/autocorrelation.rst` | Create | Sphinx API docs | -| `docs/source/reference/index.rst` | Modify (line 10) | Add `autocorrelation` to toctree | -| `README.md` | Modify (line ~567) | Add Spatial Statistics section to feature matrix | -| `examples/user_guide/48_Spatial_Autocorrelation.ipynb` | Create | User guide notebook | - ---- - -### Task 1: Create Worktree and Scaffold Module - -**Files:** -- Create: `xrspatial/autocorrelation.py` -- Create: `xrspatial/tests/test_autocorrelation.py` - -- [ ] **Step 1: Create worktree** - -```bash -git worktree add .claude/worktrees/issue-1135 -b issue-1135 -``` - -- [ ] **Step 2: Create module scaffold** - -Create `xrspatial/autocorrelation.py` in the worktree: - -```python -"""Spatial autocorrelation statistics. - -Global and local measures of spatial autocorrelation for raster data, -using queen or rook contiguity weights derived from the grid structure. 
-""" - -import math -from functools import partial - -import numpy as np -import xarray as xr -from numba import jit, prange - -from xrspatial.convolution import ( - _convolve_2d_numpy, - _convolve_2d_numpy_boundary, -) -from xrspatial.utils import ( - ArrayTypeFunctionMapping, - _boundary_to_dask, - _validate_boundary, - _validate_raster, - cuda_args, - has_cuda_and_cupy, - has_dask_array, - is_cupy_array, - is_dask_cupy, - ngjit, -) - -# Contiguity kernels (center pixel excluded) -try: - import cupy - from xrspatial.convolution import _convolve_2d_cupy -except ImportError: - cupy = None - -# Contiguity kernels (center pixel excluded) -_QUEEN_KERNEL = np.array([[1, 1, 1], - [1, 0, 1], - [1, 1, 1]], dtype=np.float32) - -_ROOK_KERNEL = np.array([[0, 1, 0], - [1, 0, 1], - [0, 1, 0]], dtype=np.float32) - -VALID_CONTIGUITY = ('queen', 'rook') - - -def _contiguity_kernel(contiguity): - """Return the 3x3 weight kernel for the given contiguity type.""" - if contiguity == 'queen': - return _QUEEN_KERNEL.copy() - elif contiguity == 'rook': - return _ROOK_KERNEL.copy() - else: - raise ValueError( - f"Invalid contiguity '{contiguity}'. " - f"Expected one of {VALID_CONTIGUITY}" - ) - - -def _not_implemented(*args, **kwargs): - raise NotImplementedError("Backend not yet implemented") - - -def morans_i(raster, contiguity='queen', boundary='nan'): - """Global Moran's I statistic for spatial autocorrelation. - - Parameters - ---------- - raster : xr.DataArray - 2D raster of numeric values. - contiguity : str, default 'queen' - Contiguity type: 'queen' (8 neighbors) or 'rook' (4 neighbors). - boundary : str, default 'nan' - How to handle edges: 'nan', 'nearest', 'reflect', or 'wrap'. - - Returns - ------- - xr.DataArray - Scalar (0-dimensional) DataArray with the I statistic as its value. - Attrs include expected_I, variance_I, z_score, p_value, N, S0, - and contiguity. 
- """ - _validate_raster(raster, func_name='morans_i', name='raster', ndim=2) - _validate_boundary(boundary) - kernel = _contiguity_kernel(contiguity) - - mapper = ArrayTypeFunctionMapping( - numpy_func=partial(_morans_i_numpy, kernel=kernel, boundary=boundary), - cupy_func=partial(_morans_i_cupy, kernel=kernel, boundary=boundary), - dask_func=partial(_morans_i_dask_numpy, kernel=kernel, boundary=boundary), - dask_cupy_func=partial(_morans_i_dask_cupy, kernel=kernel, boundary=boundary), - ) - return mapper(raster)(raster) - - -def lisa(raster, contiguity='queen', n_permutations=999, boundary='nan'): - """Local Indicators of Spatial Association (Local Moran's I). - - Parameters - ---------- - raster : xr.DataArray - 2D raster of numeric values. - contiguity : str, default 'queen' - Contiguity type: 'queen' (8 neighbors) or 'rook' (4 neighbors). - n_permutations : int, default 999 - Number of random permutations for pseudo p-value computation. - boundary : str, default 'nan' - How to handle edges: 'nan', 'nearest', 'reflect', or 'wrap'. 
- - Returns - ------- - xr.Dataset - Dataset with variables: - - lisa_values (y, x) float32: local I_i per pixel - - p_values (y, x) float32: pseudo p-values from permutation - - cluster (y, x) int8: 0=NS, 1=HH, 2=LL, 3=HL, 4=LH - """ - _validate_raster(raster, func_name='lisa', name='raster', ndim=2) - _validate_boundary(boundary) - kernel = _contiguity_kernel(contiguity) - - mapper = ArrayTypeFunctionMapping( - numpy_func=partial(_lisa_numpy, kernel=kernel, - n_permutations=n_permutations, boundary=boundary), - cupy_func=partial(_lisa_cupy, kernel=kernel, - n_permutations=n_permutations, boundary=boundary), - dask_func=partial(_lisa_dask_numpy, kernel=kernel, - n_permutations=n_permutations, boundary=boundary), - dask_cupy_func=partial(_lisa_dask_cupy, kernel=kernel, - n_permutations=n_permutations, boundary=boundary), - ) - result = mapper(raster)(raster) - - dims_2d = raster.dims[-2:] - coords_2d = {k: v for k, v in raster.coords.items() - if k in dims_2d or set(v.dims).issubset(set(dims_2d))} - - lisa_vals, p_vals, cluster_vals = result - return xr.Dataset( - { - 'lisa_values': xr.DataArray(lisa_vals, dims=dims_2d, coords=coords_2d), - 'p_values': xr.DataArray(p_vals, dims=dims_2d, coords=coords_2d), - 'cluster': xr.DataArray(cluster_vals, dims=dims_2d, coords=coords_2d), - }, - attrs={ - 'n_permutations': n_permutations, - 'contiguity': contiguity, - }, - ) - - -# --- Backend stubs (filled in by subsequent tasks) --- - -_morans_i_numpy = _not_implemented -_morans_i_cupy = _not_implemented -_morans_i_dask_numpy = _not_implemented -_morans_i_dask_cupy = _not_implemented -_lisa_numpy = _not_implemented -_lisa_cupy = _not_implemented -_lisa_dask_numpy = _not_implemented -_lisa_dask_cupy = _not_implemented -``` - -- [ ] **Step 3: Create test file scaffold** - -Create `xrspatial/tests/test_autocorrelation.py` in the worktree: - -```python -"""Tests for xrspatial.autocorrelation (Moran's I, LISA).""" - -import numpy as np -import pytest -import xarray as xr - -from 
xrspatial.autocorrelation import morans_i, lisa, _contiguity_kernel
-
-
-# ---------------------------------------------------------------------------
-# Helpers
-# ---------------------------------------------------------------------------
-
-def _make_raster(data, dims=('y', 'x')):
-    """Wrap a numpy array as an xr.DataArray."""
-    return xr.DataArray(data.astype(np.float32), dims=dims)
-
-
-def _checkerboard(n=4):
-    """Return an n x n checkerboard of 0s and 1s (float32)."""
-    arr = np.indices((n, n)).sum(axis=0) % 2
-    return arr.astype(np.float32)
-
-
-def _gradient(n=4):
-    """Return an n x n array where each row is a constant (0..n-1)."""
-    return np.tile(np.arange(n, dtype=np.float32).reshape(-1, 1), (1, n))
-```
-
-- [ ] **Step 4: Commit scaffold**
-
-```bash
-cd .claude/worktrees/issue-1135
-git add xrspatial/autocorrelation.py xrspatial/tests/test_autocorrelation.py
-git commit -m "Scaffold autocorrelation module and tests (#1135)"
-```
-
----
-
-### Task 2: Global Moran's I -- NumPy Backend (TDD)
-
-**Files:**
-- Modify: `xrspatial/tests/test_autocorrelation.py`
-- Modify: `xrspatial/autocorrelation.py`
-
-- [ ] **Step 1: Write failing tests**
-
-Append to `xrspatial/tests/test_autocorrelation.py`:
-
-```python
-# ---------------------------------------------------------------------------
-# Global Moran's I -- NumPy
-# ---------------------------------------------------------------------------
-
-class TestMoransINumpy:
-    """Global Moran's I on numpy-backed DataArrays."""
-
-    def test_checkerboard_negative(self):
-        """Checkerboard under rook contiguity: alternating values -> I near -1."""
-        # Use rook, not queen: under queen weights the diagonal neighbors
-        # share the same color, pulling I back toward zero (~-0.09 on 6x6).
-        raster = _make_raster(_checkerboard(6))
-        result = morans_i(raster, contiguity='rook')
-        assert result.shape == ()
-        I = float(result)
-        assert I < -0.5, f"Checkerboard should have negative I, got {I}"
-
-    def test_gradient_positive(self):
-        """Row gradient: spatially smooth -> strong positive I."""
-        raster = _make_raster(_gradient(6))
-        result = morans_i(raster,
contiguity='queen') - I = float(result) - assert I > 0.3, f"Gradient should have positive I, got {I}" - - def test_rook_vs_queen_differ(self): - """Queen and rook should produce different I values.""" - raster = _make_raster(_checkerboard(6)) - I_queen = float(morans_i(raster, contiguity='queen')) - I_rook = float(morans_i(raster, contiguity='rook')) - assert I_queen != I_rook - - def test_attrs_present(self): - """Result attrs should contain analytical inference fields.""" - raster = _make_raster(_gradient(6)) - result = morans_i(raster, contiguity='queen') - for key in ('expected_I', 'variance_I', 'z_score', 'p_value', 'N', 'S0'): - assert key in result.attrs, f"Missing attr: {key}" - assert result.attrs['N'] == 36 - assert result.attrs['contiguity'] == 'queen' - - def test_constant_raster_nan(self): - """Constant raster (zero variance) -> NaN.""" - data = np.ones((4, 4), dtype=np.float32) - result = morans_i(_make_raster(data)) - assert np.isnan(float(result)) -``` - -- [ ] **Step 2: Run tests to verify they fail** - -```bash -cd .claude/worktrees/issue-1135 -python -m pytest xrspatial/tests/test_autocorrelation.py::TestMoransINumpy -v 2>&1 | head -30 -``` - -Expected: FAIL with `NotImplementedError` - -- [ ] **Step 3: Implement numpy backend** - -In `xrspatial/autocorrelation.py`, replace the `_morans_i_numpy = _not_implemented` stub with: - -```python -def _morans_i_numpy(raster, kernel, boundary='nan'): - """Global Moran's I -- numpy backend.""" - data = raster.values.astype(np.float32) - mask = ~np.isnan(data) - N = int(np.sum(mask)) - - if N < 2: - return _scalar_result(np.nan, contiguity='unknown') - - mean = np.nanmean(data) - z = np.where(mask, data - mean, np.nan) - var = np.nansum(z[mask] ** 2) / N - - if var == 0.0: - return _scalar_result(np.nan, contiguity='unknown') - - # Spatial lag via existing convolution - lag = _convolve_2d_numpy_boundary(z, kernel, boundary=boundary) - - # S0: total weight count (sum of neighbor counts for valid pixels) - 
mask_f = mask.astype(np.float32)
-    n_neighbors = _convolve_2d_numpy_boundary(mask_f, kernel, boundary=boundary)
-    S0 = float(np.nansum(n_neighbors[mask]))
-
-    # Moran's I
-    numerator = float(np.nansum(z * lag))
-    denominator = float(np.nansum(z[mask] ** 2))
-    I = (N / S0) * numerator / denominator
-
-    # Analytical inference (normality assumption, Cliff & Ord 1981)
-    S1 = 2.0 * S0  # symmetric binary weights: S1 = 2*S0
-    S2 = 4.0 * float(np.nansum(n_neighbors[mask] ** 2))
-    expected_I = -1.0 / (N - 1)
-    var_I = (
-        (N ** 2 * S1 - N * S2 + 3 * S0 ** 2)
-        / (S0 ** 2 * (N ** 2 - 1))
-    ) - expected_I ** 2
-    var_I = max(var_I, 0.0)
-
-    z_score = (I - expected_I) / math.sqrt(var_I) if var_I > 0 else np.nan
-    p_value = float(2.0 * (1.0 - _norm_cdf(abs(z_score)))) if not np.isnan(z_score) else np.nan
-
-    return xr.DataArray(
-        np.float64(I),
-        attrs={
-            'expected_I': expected_I,
-            'variance_I': var_I,
-            'z_score': z_score,
-            'p_value': p_value,
-            'N': N,
-            'S0': S0,
-            'contiguity': 'queen' if kernel.sum() == 8 else 'rook',
-        },
-    )
-```
-
-Also add these helper functions above the backend stubs:
-
-```python
-def _norm_cdf(x):
-    """Standard normal CDF via the error function (no scipy dependency)."""
-    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
-
-
-def _scalar_result(value, contiguity='unknown'):
-    """Return a scalar DataArray with NaN attrs."""
-    return xr.DataArray(
-        np.float64(value),
-        attrs={
-            'expected_I': np.nan,
-            'variance_I': np.nan,
-            'z_score': np.nan,
-            'p_value': np.nan,
-            'N': 0,
-            'S0': 0.0,
-            'contiguity': contiguity,
-        },
-    )
-```
-
-No changes to the `morans_i` mapper are needed to report contiguity: the numpy backend infers the label from the kernel sum (8 = queen, 4 = rook), as the code above already does.
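As a sanity check on the convolution-based backend above, the statistic can be recomputed by brute force. This standalone sketch (illustrative only; `morans_i_reference` is not part of the module) uses rook weights, under which a checkerboard is perfect negative autocorrelation:

```python
import numpy as np


def morans_i_reference(data):
    """Brute-force global Moran's I with binary rook (4-neighbor) weights.

    I = (N / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i**2,
    looping over every ordered neighbor pair instead of convolving.
    """
    rows, cols = data.shape
    z = data - data.mean()
    num = 0.0
    s0 = 0.0
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    num += z[i, j] * z[ni, nj]
                    s0 += 1.0  # each ordered pair contributes weight 1
    n = rows * cols
    return (n / s0) * num / float(np.sum(z * z))


# Checkerboard: every rook neighbor pair mixes colors -> I is exactly -1
board = (np.indices((6, 6)).sum(axis=0) % 2).astype(float)
print(round(morans_i_reference(board), 3))  # -> -1.0
```

Comparing `_morans_i_numpy` against a reference like this on small inputs is a useful extra guard beyond the sign-only assertions in the tests.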
-
-- [ ] **Step 4: Run tests to verify they pass**
-
-```bash
-cd .claude/worktrees/issue-1135
-python -m pytest xrspatial/tests/test_autocorrelation.py::TestMoransINumpy -v
-```
-
-Expected: all 5 tests PASS
-
-- [ ] **Step 5: Commit**
-
-```bash
-git add xrspatial/autocorrelation.py xrspatial/tests/test_autocorrelation.py
-git commit -m "Add Global Moran's I numpy backend (#1135)"
-```
-
----
-
-### Task 3: LISA -- NumPy Backend (TDD)
-
-**Files:**
-- Modify: `xrspatial/tests/test_autocorrelation.py`
-- Modify: `xrspatial/autocorrelation.py`
-
-- [ ] **Step 1: Write failing tests**
-
-Append to `xrspatial/tests/test_autocorrelation.py`:
-
-```python
-# ---------------------------------------------------------------------------
-# LISA -- NumPy
-# ---------------------------------------------------------------------------
-
-class TestLisaNumpy:
-    """Local Moran's I (LISA) on numpy-backed DataArrays."""
-
-    def test_returns_dataset(self):
-        """LISA returns a Dataset with expected variables."""
-        raster = _make_raster(_gradient(6))
-        ds = lisa(raster, n_permutations=99)
-        assert isinstance(ds, xr.Dataset)
-        assert 'lisa_values' in ds
-        assert 'p_values' in ds
-        assert 'cluster' in ds
-        assert ds['lisa_values'].shape == (6, 6)
-        assert ds['p_values'].dtype == np.float32
-        assert ds['cluster'].dtype == np.int8
-
-    def test_checkerboard_negative_lisa(self):
-        """Checkerboard under rook contiguity: interior local I_i negative."""
-        # Use rook: under queen weights the four diagonal (same-color)
-        # neighbors cancel the lag to zero for interior pixels.
-        raster = _make_raster(_checkerboard(6))
-        ds = lisa(raster, contiguity='rook', n_permutations=99)
-        vals = ds['lisa_values'].values
-        # Interior pixels (not on boundary) should be negative
-        interior = vals[1:-1, 1:-1]
-        valid = interior[~np.isnan(interior)]
-        assert np.all(valid < 0), "Checkerboard interior LISA should be negative"
-
-    def test_checkerboard_clusters_hl_lh(self):
-        """Checkerboard: significant clusters should be HL or LH."""
-        raster = _make_raster(_checkerboard(8))
-        ds = lisa(raster, contiguity='rook', n_permutations=199)
-        cluster = ds['cluster'].values
-        sig =
cluster[cluster != 0] - # All significant pixels should be HL (3) or LH (4) - assert np.all((sig == 3) | (sig == 4)), f"Unexpected clusters: {np.unique(sig)}" - - def test_gradient_positive_lisa(self): - """Gradient: interior pixels should have positive local I_i.""" - raster = _make_raster(_gradient(8)) - ds = lisa(raster, n_permutations=99) - vals = ds['lisa_values'].values - interior = vals[2:-2, 2:-2] - valid = interior[~np.isnan(interior)] - assert np.all(valid > 0), "Gradient interior LISA should be positive" - - def test_pvalues_in_range(self): - """p-values should be in [0, 1] for all valid pixels.""" - rng = np.random.default_rng(42) - data = rng.standard_normal((10, 10)).astype(np.float32) - ds = lisa(_make_raster(data), n_permutations=99) - p = ds['p_values'].values - valid = p[~np.isnan(p)] - assert np.all(valid >= 0.0) and np.all(valid <= 1.0) - - def test_cluster_codes_valid(self): - """Cluster codes should be in {0, 1, 2, 3, 4}.""" - rng = np.random.default_rng(42) - data = rng.standard_normal((10, 10)).astype(np.float32) - ds = lisa(_make_raster(data), n_permutations=99) - codes = ds['cluster'].values - assert set(np.unique(codes)).issubset({0, 1, 2, 3, 4}) - - def test_nan_propagation(self): - """NaN input pixels produce NaN in output.""" - data = np.ones((6, 6), dtype=np.float32) - data[0, 0] = np.nan - data[2, 3] = 5.0 # break constant to avoid zero-var - ds = lisa(_make_raster(data), n_permutations=99) - assert np.isnan(ds['lisa_values'].values[0, 0]) - assert np.isnan(ds['p_values'].values[0, 0]) - assert ds['cluster'].values[0, 0] == 0 - - def test_constant_raster_nan(self): - """Constant raster (zero variance) -> NaN LISA values.""" - data = np.full((4, 4), 5.0, dtype=np.float32) - ds = lisa(_make_raster(data), n_permutations=99) - assert np.all(np.isnan(ds['lisa_values'].values)) - - def test_attrs(self): - """Dataset attrs carry metadata.""" - ds = lisa(_make_raster(_gradient(4)), n_permutations=99) - assert ds.attrs['n_permutations'] == 99 
- assert ds.attrs['contiguity'] == 'queen' -``` - -- [ ] **Step 2: Run tests to verify they fail** - -```bash -cd .claude/worktrees/issue-1135 -python -m pytest xrspatial/tests/test_autocorrelation.py::TestLisaNumpy -v 2>&1 | head -30 -``` - -Expected: FAIL with `NotImplementedError` - -- [ ] **Step 3: Implement the @ngjit fused LISA kernel** - -In `xrspatial/autocorrelation.py`, add the Numba JIT function (place it after `_scalar_result` and before the backend stubs): - -```python -@ngjit -def _lisa_fused_ngjit(z, kernel, inv_var, n_perms, seed, - out_lisa, out_pval, out_cluster): - """Fused LISA computation: lag + permutation + cluster in one pass. - - Parameters - ---------- - z : float32 2D array (data - mean, NaN where input is NaN) - kernel : float32 3x3 contiguity kernel - inv_var : float (1 / variance) - n_perms : int - seed : int64 - out_lisa, out_pval : float32 2D arrays (output) - out_cluster : int8 2D array (output) - """ - rows, cols = z.shape - kr, kc = kernel.shape - hr, hc = kr // 2, kc // 2 - max_neighbors = kr * kc # 9 for 3x3, but center is 0 - - for i in range(rows): - for j in range(cols): - zi = z[i, j] - # NaN pixel - if zi != zi: - out_lisa[i, j] = np.nan - out_pval[i, j] = np.nan - out_cluster[i, j] = 0 - continue - - # Extract valid neighbors - nbr_z = np.empty(max_neighbors, dtype=np.float32) - nbr_w = np.empty(max_neighbors, dtype=np.float32) - n_nbr = 0 - for di in range(kr): - for dj in range(kc): - w = kernel[di, dj] - if w == 0.0: - continue - ni = i + di - hr - nj = j + dj - hc - if 0 <= ni < rows and 0 <= nj < cols: - val = z[ni, nj] - if val == val: # not NaN - nbr_z[n_nbr] = val - nbr_w[n_nbr] = w - n_nbr += 1 - - if n_nbr == 0: - out_lisa[i, j] = np.nan - out_pval[i, j] = np.nan - out_cluster[i, j] = 0 - continue - - # Observed spatial lag and LISA value - lag = 0.0 - for k in range(n_nbr): - lag += nbr_w[k] * nbr_z[k] - I_obs = zi * inv_var * lag - out_lisa[i, j] = I_obs - - # Permutation test (Fisher-Yates shuffle) - abs_I = 
abs(I_obs) - count = 0 - rng = np.int64(seed) + np.int64(i * cols + j) - - for p in range(n_perms): - # Shuffle neighbor z values in place - for k in range(n_nbr - 1, 0, -1): - rng = rng * np.int64(6364136223846793005) + np.int64(1442695040888963407) - idx = int((rng >> 33) & np.int64(0x7fffffff)) % (k + 1) - tmp = nbr_z[k] - nbr_z[k] = nbr_z[idx] - nbr_z[idx] = tmp - - perm_lag = 0.0 - for k in range(n_nbr): - perm_lag += nbr_w[k] * nbr_z[k] - I_perm = zi * inv_var * perm_lag - if abs(I_perm) >= abs_I: - count += 1 - - out_pval[i, j] = np.float32(count + 1) / np.float32(n_perms + 1) - - # Cluster classification (p <= 0.05) - if out_pval[i, j] > 0.05: - out_cluster[i, j] = 0 # not significant - elif zi > 0.0 and lag > 0.0: - out_cluster[i, j] = 1 # HH - elif zi < 0.0 and lag < 0.0: - out_cluster[i, j] = 2 # LL - elif zi > 0.0 and lag < 0.0: - out_cluster[i, j] = 3 # HL - else: - out_cluster[i, j] = 4 # LH -``` - -- [ ] **Step 4: Implement _lisa_numpy backend** - -Replace the `_lisa_numpy = _not_implemented` stub: - -```python -def _lisa_numpy(raster, kernel, n_permutations=999, boundary='nan'): - """LISA -- numpy backend.""" - data = raster.values.astype(np.float32) - mask = ~np.isnan(data) - N = int(np.sum(mask)) - - if N < 2: - nans = np.full(data.shape, np.nan, dtype=np.float32) - zeros = np.zeros(data.shape, dtype=np.int8) - return nans, nans.copy(), zeros - - mean = np.nanmean(data) - z = np.where(mask, data - mean, np.nan).astype(np.float32) - var = np.nansum(z[mask] ** 2) / N - - if var == 0.0: - nans = np.full(data.shape, np.nan, dtype=np.float32) - zeros = np.zeros(data.shape, dtype=np.int8) - return nans, nans.copy(), zeros - - inv_var = np.float32(1.0 / var) - out_lisa = np.empty(data.shape, dtype=np.float32) - out_pval = np.empty(data.shape, dtype=np.float32) - out_cluster = np.empty(data.shape, dtype=np.int8) - - _lisa_fused_ngjit(z, kernel, inv_var, n_permutations, 42, - out_lisa, out_pval, out_cluster) - return out_lisa, out_pval, out_cluster -``` - 
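The `(count + 1) / (n_perms + 1)` correction in the kernel guarantees a p-value that is never exactly zero and never exceeds 1.0. A minimal NumPy sketch of the same per-pixel test (a hypothetical `perm_pvalue` helper, using `np.random.default_rng` in place of the kernel's inline LCG shuffle) makes those bounds easy to check:

```python
import numpy as np

def perm_pvalue(zi, nbr_z, nbr_w, inv_var, n_perms=99, seed=42):
    """Two-sided permutation p-value for one pixel's local Moran's I.

    Mirrors the kernel's per-pixel logic; illustrative only -- the real
    kernel shuffles in place with an inline LCG inside @ngjit code.
    """
    rng = np.random.default_rng(seed)
    I_obs = zi * inv_var * float(np.dot(nbr_w, nbr_z))
    count = 0
    for _ in range(n_perms):
        perm = rng.permutation(nbr_z)  # shuffle neighbor values
        I_perm = zi * inv_var * float(np.dot(nbr_w, perm))
        if abs(I_perm) >= abs(I_obs):
            count += 1
    # +1 in numerator and denominator: p is in [1/(n+1), 1], never 0
    return (count + 1) / (n_perms + 1)

# Identical neighbor values: every permutation reproduces I_obs, so p == 1.0
p = perm_pvalue(zi=1.5, nbr_z=np.array([1.0, 1.0, 1.0]),
                nbr_w=np.ones(3), inv_var=0.5, n_perms=9)
print(p)  # 1.0
```

Because the observed statistic is counted as one of its own permutations, the smallest attainable p-value with `n_permutations=999` is 0.001, which is why the significance threshold of 0.05 is always reachable.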
-- [ ] **Step 5: Run tests to verify they pass** - -```bash -cd .claude/worktrees/issue-1135 -python -m pytest xrspatial/tests/test_autocorrelation.py::TestLisaNumpy -v -``` - -Expected: all 9 tests PASS - -- [ ] **Step 6: Commit** - -```bash -git add xrspatial/autocorrelation.py xrspatial/tests/test_autocorrelation.py -git commit -m "Add LISA numpy backend with permutation testing (#1135)" -``` - ---- - -### Task 4: Global Moran's I and LISA -- CuPy Backends - -**Files:** -- Modify: `xrspatial/tests/test_autocorrelation.py` -- Modify: `xrspatial/autocorrelation.py` - -- [ ] **Step 1: Write cupy tests** - -Append to `xrspatial/tests/test_autocorrelation.py`: - -```python -# --------------------------------------------------------------------------- -# CuPy backends -# --------------------------------------------------------------------------- - -@pytest.fixture -def cupy_available(): - return pytest.importorskip("cupy") - - -class TestMoransICuPy: - def test_matches_numpy(self, cupy_available): - cp = cupy_available - data = _gradient(6) - I_np = float(morans_i(_make_raster(data))) - I_cp = float(morans_i(_make_raster(cp.asarray(data)))) - np.testing.assert_allclose(I_cp, I_np, rtol=1e-5) - - -class TestLisaCuPy: - def test_matches_numpy(self, cupy_available): - cp = cupy_available - data = _gradient(8) - ds_np = lisa(_make_raster(data), n_permutations=99) - ds_cp = lisa(_make_raster(cp.asarray(data)), n_permutations=99) - np.testing.assert_allclose( - ds_cp['lisa_values'].values, - ds_np['lisa_values'].values, - rtol=1e-4, atol=1e-6, - ) -``` - -- [ ] **Step 2: Implement _morans_i_cupy** - -Replace the `_morans_i_cupy = _not_implemented` stub (cupy import is already in the scaffold): - -```python -def _morans_i_cupy(raster, kernel, boundary='nan'): - """Global Moran's I -- cupy backend. - - Transfers to CPU for the analytical inference step since it's - just scalar arithmetic. 
- """ - data = raster.data.astype(cupy.float32) - mask = ~cupy.isnan(data) - N = int(cupy.sum(mask)) - - if N < 2: - return _scalar_result(np.nan) - - mean = float(cupy.nanmean(data)) - z = cupy.where(mask, data - mean, cupy.nan).astype(cupy.float32) - var = float(cupy.nansum(z[mask] ** 2)) / N - - if var == 0.0: - return _scalar_result(np.nan) - - # Spatial lag via GPU convolution - lag = _convolve_2d_cupy(z, kernel, boundary=boundary) - - # S0 from neighbor count - mask_f = mask.astype(cupy.float32) - n_neighbors = _convolve_2d_cupy(mask_f, kernel, boundary=boundary) - S0 = float(cupy.nansum(n_neighbors[mask])) - - numerator = float(cupy.nansum(z * lag)) - denominator = float(cupy.nansum(z[mask] ** 2)) - I = (N / S0) * numerator / denominator - - # Analytical inference (same scalar math as numpy) - S1 = 2.0 * S0 - S2 = 4.0 * float(cupy.nansum(n_neighbors[mask] ** 2)) - expected_I = -1.0 / (N - 1) - var_I = ( - (N ** 2 * S1 - N * S2 + 3 * S0 ** 2) - / (S0 ** 2 * (N ** 2 - 1)) - ) - expected_I ** 2 - var_I = max(var_I, 0.0) - z_score = (I - expected_I) / math.sqrt(var_I) if var_I > 0 else np.nan - p_value = float(2.0 * (1.0 - _norm_cdf(abs(z_score)))) if not np.isnan(z_score) else np.nan - - return xr.DataArray( - np.float64(I), - attrs={ - 'expected_I': expected_I, 'variance_I': var_I, - 'z_score': z_score, 'p_value': p_value, - 'N': N, 'S0': S0, - 'contiguity': 'queen' if kernel.sum() == 8 else 'rook', - }, - ) -``` - -- [ ] **Step 3: Implement _lisa_cupy** - -Replace `_lisa_cupy = _not_implemented`: - -```python -def _lisa_cupy(raster, kernel, n_permutations=999, boundary='nan'): - """LISA -- cupy backend. - - Falls back to CPU for the permutation step (branching-heavy Fisher-Yates - shuffle is faster on CPU, same pattern as emerging_hotspots.py). 
    """
    data = raster.data.astype(cupy.float32)
    mask = ~cupy.isnan(data)
    N = int(cupy.sum(mask))

    if N < 2:
        shape = data.shape
        nans = cupy.full(shape, cupy.nan, dtype=cupy.float32)
        zeros = cupy.zeros(shape, dtype=cupy.int8)
        return nans, nans.copy(), zeros

    mean = float(cupy.nanmean(data))
    z_gpu = cupy.where(mask, data - mean, cupy.nan).astype(cupy.float32)
    var = float(cupy.nansum(z_gpu[mask] ** 2)) / N

    if var == 0.0:
        shape = data.shape
        nans = cupy.full(shape, cupy.nan, dtype=cupy.float32)
        zeros = cupy.zeros(shape, dtype=cupy.int8)
        return nans, nans.copy(), zeros

    # Transfer to CPU for branching-heavy permutation
    z_cpu = cupy.asnumpy(z_gpu)
    inv_var = np.float32(1.0 / var)
    out_lisa = np.empty(z_cpu.shape, dtype=np.float32)
    out_pval = np.empty(z_cpu.shape, dtype=np.float32)
    out_cluster = np.empty(z_cpu.shape, dtype=np.int8)

    _lisa_fused_ngjit(z_cpu, kernel, inv_var, n_permutations, 42,
                      out_lisa, out_pval, out_cluster)

    # All branches return cupy arrays, matching the backend's input type
    return cupy.asarray(out_lisa), cupy.asarray(out_pval), cupy.asarray(out_cluster)
```

- [ ] **Step 4: Run cupy tests**

```bash
cd .claude/worktrees/issue-1135
python -m pytest xrspatial/tests/test_autocorrelation.py::TestMoransICuPy -v
python -m pytest xrspatial/tests/test_autocorrelation.py::TestLisaCuPy -v
```

Expected: PASS (or SKIP if no GPU)

- [ ] **Step 5: Commit**

```bash
git add xrspatial/autocorrelation.py xrspatial/tests/test_autocorrelation.py
git commit -m "Add Moran's I and LISA cupy backends (#1135)"
```

---

### Task 5: Global Moran's I and LISA -- Dask Backends

**Files:**
- Modify: `xrspatial/tests/test_autocorrelation.py`
- Modify: `xrspatial/autocorrelation.py`

- [ ] **Step 1: Write dask tests**

Append to `xrspatial/tests/test_autocorrelation.py`:

```python
# ---------------------------------------------------------------------------
# Dask backends
# 
---------------------------------------------------------------------------

@pytest.fixture
def dask_available():
    return pytest.importorskip("dask.array")


def _make_dask_raster(data, chunks=(4, 4)):
    import dask.array as da
    darr = da.from_array(data.astype(np.float32), chunks=chunks)
    return xr.DataArray(darr, dims=('y', 'x'))


class TestMoransIDask:
    def test_matches_numpy(self, dask_available):
        data = _gradient(8)
        I_np = float(morans_i(_make_raster(data)))
        I_dask = float(morans_i(_make_dask_raster(data)))
        np.testing.assert_allclose(I_dask, I_np, rtol=1e-5)

    def test_checkerboard(self, dask_available):
        data = _checkerboard(8)
        result = morans_i(_make_dask_raster(data))
        assert float(result) < -0.5


class TestLisaDask:
    def test_matches_numpy(self, dask_available):
        data = _gradient(8)
        ds_np = lisa(_make_raster(data), n_permutations=99)
        ds_dask = lisa(_make_dask_raster(data, chunks=(8, 8)), n_permutations=99)
        np.testing.assert_allclose(
            ds_dask['lisa_values'].values,
            ds_np['lisa_values'].values,
            rtol=1e-4, atol=1e-6,
        )

    def test_chunked_pvalues_in_range(self, dask_available):
        rng = np.random.default_rng(42)
        data = rng.standard_normal((16, 16)).astype(np.float32)
        ds = lisa(_make_dask_raster(data, chunks=(8, 8)), n_permutations=99)
        p = ds['p_values'].values
        valid = p[~np.isnan(p)]
        assert np.all(valid >= 0.0) and np.all(valid <= 1.0)
```

- [ ] **Step 2: Add the spatial-lag chunk function and settle the dask design**

`map_overlap` returns a single array per call, but LISA needs three outputs
(values, p-values, cluster codes), so the dask backend cannot reuse the fused
numpy kernel directly. Instead, the work is split into four stages:

1. `map_overlap` -> spatial lag (one call)
2. `lisa_values = z * lag / var` (element-wise, lazy)
3. `map_overlap` -> p-values (one call, with its own permutation loop)
4. `map_blocks` -> cluster classification (element-wise from z, lag, p)

Only two `map_overlap` calls are needed. The p-value kernel recomputes the lag
internally, but with depth=1 the overlap is tiny and the permutation loop
dominates the cost, so that duplication is negligible. The permutation kernel is
written as an `@ngjit` inner function plus a thin plain-Python wrapper, because
the chunk function handed to `map_overlap` cannot be JIT-compiled directly; both
pieces are added in Step 4.

For now, add only the lag chunk function in `xrspatial/autocorrelation.py`,
after `_lisa_fused_ngjit`:

```python
def _lag_chunk_numpy(chunk, kernel):
    """Compute spatial lag for one dask chunk (called by map_overlap)."""
    return _convolve_2d_numpy(chunk.astype(np.float32), kernel)
```

- [ ] **Step 3: Implement _morans_i_dask_numpy**

Replace `_morans_i_dask_numpy = _not_implemented`:

```python
def _morans_i_dask_numpy(raster, kernel, boundary='nan'):
    """Global Moran's I -- dask+numpy backend."""
    import dask.array as da

    data = raster.data.astype(np.float32)
    mask = ~da.isnan(data)

    # Eagerly compute scalars
    mean, N = da.compute(da.nanmean(data), da.sum(mask))
    N = int(N)
    if N < 2:
        return _scalar_result(np.nan)

    z = da.where(mask, data - mean, np.nan)
    var_total = da.nansum(z ** 2) / N

    # Spatial lag via map_overlap
    _lag = partial(_lag_chunk_numpy, kernel=kernel)
    lag = z.map_overlap(_lag, depth=(1, 1),
                        boundary=_boundary_to_dask(boundary),
                        meta=np.array((), dtype=np.float32))

    # S0 from mask convolution
    mask_f = mask.astype(np.float32)
    n_neighbors = mask_f.map_overlap(_lag, depth=(1, 1),
                                     boundary=_boundary_to_dask(boundary),
                                     meta=np.array((), dtype=np.float32))
    S0 = da.nansum(da.where(mask, n_neighbors, 0.0))

    numerator = da.nansum(z * lag)
    denominator = da.nansum(z ** 2)

    # Compute all dask scalars at once
    I_num, I_den, S0_val, var_val, S2_inner = da.compute(
        numerator, denominator, S0,
        var_total,
        da.nansum(da.where(mask, n_neighbors ** 2, 0.0)),
    )
    S0_val = float(S0_val)

    if float(var_val) == 0.0:
        return _scalar_result(np.nan)

    I = (N / S0_val) * float(I_num) / float(I_den)

    S1 = 2.0 * S0_val
    S2 = 4.0 * float(S2_inner)
    expected_I = -1.0 / (N - 1)
    var_I = (
        (N ** 2 * S1 - N * S2 + 3 * S0_val ** 2)
        / (S0_val ** 2 * (N ** 2 - 1))
    ) - expected_I ** 2
    var_I = max(var_I, 0.0)
    z_score = (I - expected_I) / math.sqrt(var_I) if var_I > 0 else np.nan
    p_value = float(2.0 * (1.0 - _norm_cdf(abs(z_score)))) if not np.isnan(z_score) else np.nan

    return xr.DataArray(
        np.float64(I),
        attrs={
'expected_I': expected_I, 'variance_I': var_I, - 'z_score': z_score, 'p_value': p_value, - 'N': N, 'S0': S0_val, - 'contiguity': 'queen' if kernel.sum() == 8 else 'rook', - }, - ) -``` - -- [ ] **Step 4: Implement dask LISA chunk functions and _lisa_dask_numpy** - -Add chunk functions in `xrspatial/autocorrelation.py`: - -```python -def _lag_chunk_numpy(chunk, kernel): - """Spatial lag for one dask chunk. Called by map_overlap.""" - return _convolve_2d_numpy(chunk.astype(np.float32), kernel) - - -@ngjit -def _perm_pvalue_ngjit(z, kernel, inv_var, n_perms, seed, out_pval): - """Compute permutation p-values only (LISA value computed separately).""" - rows, cols = z.shape - kr, kc = kernel.shape - hr, hc = kr // 2, kc // 2 - max_nbr = kr * kc - - for i in range(rows): - for j in range(cols): - zi = z[i, j] - if zi != zi: - out_pval[i, j] = np.nan - continue - - nbr_z = np.empty(max_nbr, dtype=np.float32) - nbr_w = np.empty(max_nbr, dtype=np.float32) - n_nbr = 0 - for di in range(kr): - for dj in range(kc): - w = kernel[di, dj] - if w == 0.0: - continue - ni = i + di - hr - nj = j + dj - hc - if 0 <= ni < rows and 0 <= nj < cols: - val = z[ni, nj] - if val == val: - nbr_z[n_nbr] = val - nbr_w[n_nbr] = w - n_nbr += 1 - - if n_nbr == 0: - out_pval[i, j] = np.nan - continue - - # Observed lag and LISA - lag = 0.0 - for k in range(n_nbr): - lag += nbr_w[k] * nbr_z[k] - I_obs = zi * inv_var * lag - abs_I = abs(I_obs) - - count = 0 - rng = np.int64(seed) + np.int64(i * cols + j) - - for p in range(n_perms): - for k in range(n_nbr - 1, 0, -1): - rng = rng * np.int64(6364136223846793005) + np.int64(1442695040888963407) - idx = int((rng >> 33) & np.int64(0x7fffffff)) % (k + 1) - tmp = nbr_z[k] - nbr_z[k] = nbr_z[idx] - nbr_z[idx] = tmp - - perm_lag = 0.0 - for k in range(n_nbr): - perm_lag += nbr_w[k] * nbr_z[k] - if abs(zi * inv_var * perm_lag) >= abs_I: - count += 1 - - out_pval[i, j] = np.float32(count + 1) / np.float32(n_perms + 1) - - -def _perm_chunk_wrapper(chunk_z, 
kernel, inv_var, n_perms, seed): - """Wrapper for map_overlap: compute p-values for one chunk.""" - chunk_z = chunk_z.astype(np.float32) - out = np.empty(chunk_z.shape, dtype=np.float32) - _perm_pvalue_ngjit(chunk_z, kernel, inv_var, n_perms, seed, out) - return out -``` - -Replace `_lisa_dask_numpy = _not_implemented`: - -```python -def _lisa_dask_numpy(raster, kernel, n_permutations=999, boundary='nan'): - """LISA -- dask+numpy backend.""" - import dask.array as da - - data = raster.data.astype(np.float32) - mask = ~da.isnan(data) - - mean, var_sum, N = da.compute( - da.nanmean(data), - da.nansum((data - da.nanmean(data)) ** 2), - da.sum(mask), - ) - N = int(N) - if N < 2: - nans = np.full(data.shape, np.nan, dtype=np.float32) - zeros = np.zeros(data.shape, dtype=np.int8) - return nans, nans.copy(), zeros - - mean = float(mean) - var = float(var_sum) / N - if var == 0.0: - nans = np.full(data.shape, np.nan, dtype=np.float32) - zeros = np.zeros(data.shape, dtype=np.int8) - return nans, nans.copy(), zeros - - inv_var = np.float32(1.0 / var) - z = da.where(mask, data - mean, np.nan).astype(np.float32) - bnd = _boundary_to_dask(boundary) - - # 1. Spatial lag via map_overlap - lag = z.map_overlap( - partial(_lag_chunk_numpy, kernel=kernel), - depth=(1, 1), boundary=bnd, - meta=np.array((), dtype=np.float32), - ) - - # 2. LISA values (element-wise, lazy) - lisa_vals = z * lag * inv_var - - # 3. p-values via map_overlap (permutation) - p_vals = z.map_overlap( - partial(_perm_chunk_wrapper, kernel=kernel, - inv_var=inv_var, n_perms=n_permutations, seed=42), - depth=(1, 1), boundary=bnd, - meta=np.array((), dtype=np.float32), - ) - - # 4. 
Cluster classification (element-wise via map_blocks) - def _classify_block(z_blk, lag_blk, p_blk): - out = np.zeros(z_blk.shape, dtype=np.int8) - sig = p_blk <= 0.05 - out[sig & (z_blk > 0) & (lag_blk > 0)] = 1 # HH - out[sig & (z_blk < 0) & (lag_blk < 0)] = 2 # LL - out[sig & (z_blk > 0) & (lag_blk < 0)] = 3 # HL - out[sig & (z_blk < 0) & (lag_blk > 0)] = 4 # LH - nan_mask = np.isnan(z_blk) - out[nan_mask] = 0 - return out - - cluster = da.map_blocks( - _classify_block, z, lag, p_vals, - dtype=np.int8, meta=np.array((), dtype=np.int8), - ) - - # Compute all lazy arrays - lisa_vals, p_vals, cluster = da.compute(lisa_vals, p_vals, cluster) - - return ( - lisa_vals.astype(np.float32), - p_vals.astype(np.float32), - cluster.astype(np.int8), - ) -``` - -- [ ] **Step 5: Implement _morans_i_dask_cupy and _lisa_dask_cupy** - -Replace both dask+cupy stubs. These fall back to the dask+numpy implementations since the permutation step is CPU-bound anyway: - -```python -def _morans_i_dask_cupy(raster, kernel, boundary='nan'): - """Global Moran's I -- dask+cupy backend. - - Falls back to dask+numpy. The convolution is too small (3x3 kernel) - to benefit from GPU, and the reduction is scalar. - """ - import dask.array as da - data_np = raster.data.map_blocks( - lambda b: cupy.asnumpy(b), dtype=np.float32, - meta=np.array((), dtype=np.float32), - ) - raster_np = xr.DataArray(data_np, dims=raster.dims, coords=raster.coords) - return _morans_i_dask_numpy(raster_np, kernel=kernel, boundary=boundary) - - -def _lisa_dask_cupy(raster, kernel, n_permutations=999, boundary='nan'): - """LISA -- dask+cupy backend. - - Falls back to dask+numpy. Permutation is branching-heavy and runs - faster on CPU (same rationale as emerging_hotspots.py). 
- """ - import dask.array as da - data_np = raster.data.map_blocks( - lambda b: cupy.asnumpy(b), dtype=np.float32, - meta=np.array((), dtype=np.float32), - ) - raster_np = xr.DataArray(data_np, dims=raster.dims, coords=raster.coords) - return _lisa_dask_numpy(raster_np, kernel=kernel, - n_permutations=n_permutations, boundary=boundary) -``` - -- [ ] **Step 6: Run all dask tests** - -```bash -cd .claude/worktrees/issue-1135 -python -m pytest xrspatial/tests/test_autocorrelation.py::TestMoransIDask -v -python -m pytest xrspatial/tests/test_autocorrelation.py::TestLisaDask -v -``` - -Expected: PASS - -- [ ] **Step 7: Commit** - -```bash -git add xrspatial/autocorrelation.py xrspatial/tests/test_autocorrelation.py -git commit -m "Add dask backends for Moran's I and LISA (#1135)" -``` - ---- - -### Task 6: Edge Cases and Cross-Backend Tests - -**Files:** -- Modify: `xrspatial/tests/test_autocorrelation.py` - -- [ ] **Step 1: Add edge case and contiguity kernel tests** - -Append to `xrspatial/tests/test_autocorrelation.py`: - -```python -# --------------------------------------------------------------------------- -# Edge cases -# --------------------------------------------------------------------------- - -class TestEdgeCases: - def test_single_cell(self): - data = np.array([[5.0]], dtype=np.float32) - result = morans_i(_make_raster(data)) - assert np.isnan(float(result)) - - def test_all_nan(self): - data = np.full((4, 4), np.nan, dtype=np.float32) - result = morans_i(_make_raster(data)) - assert np.isnan(float(result)) - - def test_single_non_nan(self): - data = np.full((4, 4), np.nan, dtype=np.float32) - data[2, 2] = 1.0 - result = morans_i(_make_raster(data)) - assert np.isnan(float(result)) - - def test_nan_corners(self): - data = _gradient(6) - data[0, 0] = np.nan - data[0, -1] = np.nan - data[-1, 0] = np.nan - data[-1, -1] = np.nan - result = morans_i(_make_raster(data)) - assert not np.isnan(float(result)) - - def test_lisa_single_cell(self): - data = 
np.array([[5.0]], dtype=np.float32) - ds = lisa(_make_raster(data), n_permutations=9) - assert np.all(np.isnan(ds['lisa_values'].values)) - - def test_lisa_all_nan(self): - data = np.full((4, 4), np.nan, dtype=np.float32) - ds = lisa(_make_raster(data), n_permutations=9) - assert np.all(np.isnan(ds['lisa_values'].values)) - - def test_contiguity_kernel_invalid(self): - with pytest.raises(ValueError, match="Invalid contiguity"): - _contiguity_kernel('bishop') - - def test_contiguity_kernel_queen(self): - k = _contiguity_kernel('queen') - assert k.shape == (3, 3) - assert k[1, 1] == 0.0 - assert k.sum() == 8.0 - - def test_contiguity_kernel_rook(self): - k = _contiguity_kernel('rook') - assert k.shape == (3, 3) - assert k[1, 1] == 0.0 - assert k.sum() == 4.0 - assert k[0, 0] == 0.0 # corners are zero - - def test_rook_checkerboard_perfect_negative(self): - """4-connected rook on checkerboard: every neighbor is opposite.""" - raster = _make_raster(_checkerboard(8)) - I = float(morans_i(raster, contiguity='rook')) - assert I < -0.9, f"Rook checkerboard should be near -1, got {I}" -``` - -- [ ] **Step 2: Run edge case tests** - -```bash -cd .claude/worktrees/issue-1135 -python -m pytest xrspatial/tests/test_autocorrelation.py::TestEdgeCases -v -``` - -Expected: PASS - -- [ ] **Step 3: Commit** - -```bash -git add xrspatial/tests/test_autocorrelation.py -git commit -m "Add edge case and contiguity tests (#1135)" -``` - ---- - -### Task 7: Register Exports - -**Files:** -- Modify: `xrspatial/__init__.py` (line 2 area) - -- [ ] **Step 1: Add imports to __init__.py** - -Add after line 1 (`from xrspatial.aspect import aspect # noqa`): - -```python -from xrspatial.autocorrelation import lisa # noqa -from xrspatial.autocorrelation import morans_i # noqa -``` - -- [ ] **Step 2: Verify imports work** - -```bash -cd .claude/worktrees/issue-1135 -python -c "from xrspatial import morans_i, lisa; print('OK')" -``` - -Expected: `OK` - -- [ ] **Step 3: Run full test suite for 
autocorrelation** - -```bash -cd .claude/worktrees/issue-1135 -python -m pytest xrspatial/tests/test_autocorrelation.py -v -``` - -Expected: all tests PASS - -- [ ] **Step 4: Commit** - -```bash -git add xrspatial/__init__.py -git commit -m "Export morans_i and lisa from xrspatial (#1135)" -``` - ---- - -### Task 8: Documentation - -**Files:** -- Create: `docs/source/reference/autocorrelation.rst` -- Modify: `docs/source/reference/index.rst` (line 10) -- Modify: `README.md` (line ~567) - -- [ ] **Step 1: Create autocorrelation.rst** - -Create `docs/source/reference/autocorrelation.rst`: - -```rst -.. _reference.autocorrelation: - -*********************** -Spatial Autocorrelation -*********************** - -.. caution:: - - LISA uses ``dask.array.map_overlap`` with depth 1. Each chunk dimension - must be **at least 3 cells** (larger than the contiguity kernel radius). - -.. note:: - - Permutation-based p-values use a fixed internal seed for reproducibility - within a single call. Results are deterministic for the same input and - ``n_permutations`` value. - -Global Moran's I -================ -.. autosummary:: - :toctree: _autosummary - - xrspatial.autocorrelation.morans_i - -Local Moran's I (LISA) -====================== -.. 
autosummary:: - :toctree: _autosummary - - xrspatial.autocorrelation.lisa -``` - -- [ ] **Step 2: Add to docs toctree** - -In `docs/source/reference/index.rst`, add `autocorrelation` after line 9 (after the `:maxdepth: 2` line, before `classification`): - -```rst - autocorrelation - classification -``` - -- [ ] **Step 3: Add Spatial Statistics section to README** - -In `README.md`, insert after line 567 (`-----------` after Dasymetric section) and before `#### Usage`: - -```markdown - -### **Spatial Statistics** - -| Name | Description | Source | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray | -|:----------:|:------------|:------:|:----------------------:|:--------------------:|:-------------------:|:------:| -| [Moran's I](xrspatial/autocorrelation.py) | Global spatial autocorrelation with analytical inference | Cliff & Ord 1981 | ✅️ | ✅️ | ✅️ | ✅️ | -| [LISA](xrspatial/autocorrelation.py) | Local Indicators of Spatial Association with permutation p-values | Anselin 1995 | ✅️ | ✅️ | ✅️ | ✅️ | - ------------ -``` - -- [ ] **Step 4: Commit** - -```bash -git add docs/source/reference/autocorrelation.rst docs/source/reference/index.rst README.md -git commit -m "Add spatial autocorrelation docs and README entry (#1135)" -``` - ---- - -### Task 9: User Guide Notebook - -**Files:** -- Create: `examples/user_guide/48_Spatial_Autocorrelation.ipynb` - -- [ ] **Step 1: Create notebook** - -Create `examples/user_guide/48_Spatial_Autocorrelation.ipynb` with cells: - -**Cell 1 (markdown):** -```markdown -# Spatial Autocorrelation: Moran's I and LISA - -This notebook demonstrates how to measure spatial autocorrelation in raster data using `morans_i` (global) and `lisa` (local). - -**Spatial autocorrelation** measures whether nearby pixels tend to have similar values (positive autocorrelation) or dissimilar values (negative autocorrelation). It answers the question: "Is the spatial pattern in this raster clustered, dispersed, or random?" 
- -- **Global Moran's I** produces a single statistic for the entire raster. -- **LISA (Local Indicators of Spatial Association)** produces a per-pixel statistic, identifying where clusters and outliers are. -``` - -**Cell 2 (code):** -```python -import numpy as np -import xarray as xr -import matplotlib.pyplot as plt -from matplotlib.colors import ListedColormap - -from xrspatial import morans_i, lisa -from xrspatial.terrain import generate_terrain -``` - -**Cell 3 (markdown):** -```markdown -## Generate sample data - -We'll create a synthetic elevation surface with spatial structure, plus a random noise surface for comparison. -``` - -**Cell 4 (code):** -```python -# Spatially structured surface (elevation) -terrain = generate_terrain(canvas_width=200, canvas_height=200) - -# Random noise (no spatial structure) -rng = np.random.default_rng(42) -noise = xr.DataArray( - rng.standard_normal((200, 200)).astype(np.float32), - dims=('y', 'x'), -) - -fig, axes = plt.subplots(1, 2, figsize=(12, 5)) -terrain.plot(ax=axes[0], cmap='terrain') -axes[0].set_title('Elevation (spatially structured)') -noise.plot(ax=axes[1], cmap='RdBu_r') -axes[1].set_title('Random noise') -plt.tight_layout() -plt.show() -``` - -**Cell 5 (markdown):** -```markdown -## Global Moran's I - -A single statistic summarizing the degree of spatial autocorrelation: -- **I > E[I]**: positive autocorrelation (clustering) -- **I < E[I]**: negative autocorrelation (dispersion) -- **I ≈ E[I]**: random spatial pattern -``` - -**Cell 6 (code):** -```python -for name, raster in [('Elevation', terrain), ('Noise', noise)]: - result = morans_i(raster, contiguity='queen') - I = float(result) - print(f"{name}:") - print(f" Moran's I = {I:.4f}") - print(f" Expected = {result.attrs['expected_I']:.4f}") - print(f" z-score = {result.attrs['z_score']:.2f}") - print(f" p-value = {result.attrs['p_value']:.2e}") - print() -``` - -**Cell 7 (markdown):** -```markdown -## Queen vs Rook contiguity - -Queen contiguity uses all 8 
neighbors (including diagonals). Rook contiguity uses only 4 (up/down/left/right).
```

**Cell 8 (code):**
```python
I_queen = float(morans_i(terrain, contiguity='queen'))
I_rook = float(morans_i(terrain, contiguity='rook'))
print(f"Queen I = {I_queen:.4f}")
print(f"Rook I = {I_rook:.4f}")
```

**Cell 9 (markdown):**
```markdown
## LISA: Local spatial autocorrelation

LISA identifies **where** clustering occurs. Each pixel gets:
- A local I value (positive = similar neighbors, negative = dissimilar)
- A p-value from permutation testing
- A cluster classification: HH (hot spot), LL (cold spot), HL/LH (spatial outliers)
```

**Cell 10 (code):**
```python
ds = lisa(terrain, contiguity='queen', n_permutations=999)

fig, axes = plt.subplots(1, 3, figsize=(16, 5))

ds['lisa_values'].plot(ax=axes[0], cmap='RdBu_r', robust=True)
axes[0].set_title('Local Moran\'s I')

ds['p_values'].plot(ax=axes[1], cmap='YlOrRd_r', vmin=0, vmax=0.1)
axes[1].set_title('p-values')

# Cluster map with custom colors
cluster_cmap = ListedColormap(['lightgrey', 'red', 'blue', 'pink', 'lightblue'])
ds['cluster'].plot(ax=axes[2], cmap=cluster_cmap, vmin=0, vmax=4,
                   add_colorbar=False)
axes[2].set_title('Clusters: grey=NS, red=HH, blue=LL, pink=HL, lightblue=LH')

plt.tight_layout()
plt.show()
```

**Cell 11 (markdown):**
```markdown
## Comparison: structured vs random

On random noise, LISA should find few significant clusters (mostly not-significant grey).
-``` - -**Cell 12 (code):** -```python -ds_noise = lisa(noise, n_permutations=999) - -fig, axes = plt.subplots(1, 2, figsize=(12, 5)) -cluster_cmap = ListedColormap(['lightgrey', 'red', 'blue', 'pink', 'lightblue']) - -ds['cluster'].plot(ax=axes[0], cmap=cluster_cmap, vmin=0, vmax=4, add_colorbar=False) -axes[0].set_title('Elevation clusters') - -ds_noise['cluster'].plot(ax=axes[1], cmap=cluster_cmap, vmin=0, vmax=4, add_colorbar=False) -axes[1].set_title('Noise clusters (mostly NS)') - -plt.tight_layout() -plt.show() - -sig_terrain = float((ds['cluster'].values != 0).mean() * 100) -sig_noise = float((ds_noise['cluster'].values != 0).mean() * 100) -print(f"Significant pixels: elevation={sig_terrain:.1f}%, noise={sig_noise:.1f}%") -``` - -- [ ] **Step 2: Verify notebook runs** - -```bash -cd .claude/worktrees/issue-1135 -python -m jupyter nbconvert --to notebook --execute examples/user_guide/48_Spatial_Autocorrelation.ipynb --output /dev/null 2>&1 | tail -5 -``` - -If jupyter is not available, verify the key cells run as a script: - -```bash -cd .claude/worktrees/issue-1135 -python -c " -from xrspatial import morans_i, lisa -from xrspatial.terrain import generate_terrain -terrain = generate_terrain(canvas_width=50, canvas_height=50) -print('Global I:', float(morans_i(terrain))) -ds = lisa(terrain, n_permutations=99) -print('LISA vars:', list(ds.data_vars)) -print('Clusters:', set(ds['cluster'].values.flat)) -print('OK') -" -``` - -- [ ] **Step 3: Commit** - -```bash -git add examples/user_guide/48_Spatial_Autocorrelation.ipynb -git commit -m "Add spatial autocorrelation user guide notebook (#1135)" -``` - ---- - -### Task 10: Final Verification - -- [ ] **Step 1: Run the full autocorrelation test suite** - -```bash -cd .claude/worktrees/issue-1135 -python -m pytest xrspatial/tests/test_autocorrelation.py -v --tb=short -``` - -Expected: all tests PASS - -- [ ] **Step 2: Run a smoke test of the full xrspatial test suite** - -```bash -cd .claude/worktrees/issue-1135 
-python -m pytest xrspatial/tests/ -x -q --tb=line 2>&1 | tail -10 -``` - -Verify no regressions in existing tests. - -- [ ] **Step 3: Review git log** - -```bash -cd .claude/worktrees/issue-1135 -git log --oneline master..issue-1135 -``` - -Expected: 7-8 commits, each referencing #1135. diff --git a/docs/superpowers/plans/2026-04-06-polygonize-simplification.md b/docs/superpowers/plans/2026-04-06-polygonize-simplification.md deleted file mode 100644 index 1fe6d5be..00000000 --- a/docs/superpowers/plans/2026-04-06-polygonize-simplification.md +++ /dev/null @@ -1,916 +0,0 @@ -# Polygonize Geometry Simplification Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Add topology-preserving Douglas-Peucker simplification to `polygonize()` via shared-edge decomposition. - -**Architecture:** New `simplify_tolerance` and `simplify_method` parameters on `polygonize()`. Simplification runs after boundary tracing / chunk merging but before output conversion. A shared-edge approach decomposes all polygon rings into unique edge chains at junction vertices, simplifies each chain once with numba-compiled Douglas-Peucker, then reassembles rings. This guarantees adjacent polygons share identical simplified boundaries (no gaps/overlaps). - -**Tech Stack:** Python, numba (`@ngjit`), numpy, xarray, pytest. Optional: shapely (for topology tests only). 
- ---- - -### Task 1: Douglas-Peucker kernel - -**Files:** -- Modify: `xrspatial/polygonize.py` (insert after `_group_rings_into_polygons` ~line 920) -- Test: `xrspatial/tests/test_polygonize.py` - -- [ ] **Step 1: Write the failing test for `_douglas_peucker`** - -Add at the end of `xrspatial/tests/test_polygonize.py`: - -```python -class TestSimplifyHelpers: - """Tests for internal simplification helper functions.""" - - def test_douglas_peucker_straight_line(self): - """DP on a straight line should reduce to just endpoints.""" - from ..polygonize import _douglas_peucker - coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], - [3.0, 0.0], [4.0, 0.0]], dtype=np.float64) - result = _douglas_peucker(coords, 0.1) - expected = np.array([[0.0, 0.0], [4.0, 0.0]], dtype=np.float64) - assert_allclose(result, expected) - - def test_douglas_peucker_preserves_bend(self): - """DP should keep a vertex that exceeds tolerance.""" - from ..polygonize import _douglas_peucker - coords = np.array([[0.0, 0.0], [2.0, 3.0], [4.0, 0.0]], - dtype=np.float64) - # Distance of (2,3) from line (0,0)-(4,0) is 3.0 - result = _douglas_peucker(coords, 2.0) - assert len(result) == 3 # all points kept - - def test_douglas_peucker_removes_below_tolerance(self): - """DP should remove a vertex within tolerance.""" - from ..polygonize import _douglas_peucker - coords = np.array([[0.0, 0.0], [2.0, 0.5], [4.0, 0.0]], - dtype=np.float64) - # Distance of (2,0.5) from line (0,0)-(4,0) is 0.5 - result = _douglas_peucker(coords, 1.0) - expected = np.array([[0.0, 0.0], [4.0, 0.0]], dtype=np.float64) - assert_allclose(result, expected) - - def test_douglas_peucker_two_points(self): - """DP on two points should return them unchanged.""" - from ..polygonize import _douglas_peucker - coords = np.array([[0.0, 0.0], [4.0, 0.0]], dtype=np.float64) - result = _douglas_peucker(coords, 1.0) - assert_allclose(result, coords) -``` - -- [ ] **Step 2: Run test to verify it fails** - -Run: `pytest 
xrspatial/tests/test_polygonize.py::TestSimplifyHelpers -v` -Expected: FAIL with ImportError (`_douglas_peucker` not found) - -- [ ] **Step 3: Implement `_douglas_peucker`** - -Add to `xrspatial/polygonize.py` after `_group_rings_into_polygons` (~line 920), before `_merge_polygon_rings`: - -```python -@ngjit -def _perpendicular_distance(px, py, ax, ay, bx, by): - """Perpendicular distance from point (px,py) to line (ax,ay)-(bx,by).""" - dx = bx - ax - dy = by - ay - len_sq = dx * dx + dy * dy - if len_sq == 0.0: - return np.sqrt((px - ax) ** 2 + (py - ay) ** 2) - t = ((px - ax) * dx + (py - ay) * dy) / len_sq - t = max(0.0, min(1.0, t)) - proj_x = ax + t * dx - proj_y = ay + t * dy - return np.sqrt((px - proj_x) ** 2 + (py - proj_y) ** 2) - - -@ngjit -def _douglas_peucker(coords, tolerance): - """Douglas-Peucker line simplification on an Nx2 float64 array. - - Endpoints are always preserved. Returns a new Nx2 array with - only the retained vertices. - """ - n = len(coords) - if n <= 2: - return coords.copy() - - # Iterative DP using an explicit stack to avoid recursion depth issues. - keep = np.zeros(n, dtype=nb.boolean) - keep[0] = True - keep[n - 1] = True - - # Stack of (start, end) index pairs. 
- stack = [(np.int64(0), np.int64(n - 1))] - while len(stack) > 0: - start, end = stack.pop() - if end - start < 2: - continue - - ax, ay = coords[start, 0], coords[start, 1] - bx, by = coords[end, 0], coords[end, 1] - - max_dist = 0.0 - max_idx = start - for i in range(start + 1, end): - d = _perpendicular_distance( - coords[i, 0], coords[i, 1], ax, ay, bx, by) - if d > max_dist: - max_dist = d - max_idx = i - - if max_dist > tolerance: - keep[max_idx] = True - stack.append((start, max_idx)) - stack.append((max_idx, end)) - - count = 0 - for i in range(n): - if keep[i]: - count += 1 - - result = np.empty((count, 2), dtype=np.float64) - j = 0 - for i in range(n): - if keep[i]: - result[j, 0] = coords[i, 0] - result[j, 1] = coords[i, 1] - j += 1 - - return result -``` - -- [ ] **Step 4: Run test to verify it passes** - -Run: `pytest xrspatial/tests/test_polygonize.py::TestSimplifyHelpers -v` -Expected: PASS (all 4 tests) - -- [ ] **Step 5: Commit** - -```bash -git add xrspatial/polygonize.py xrspatial/tests/test_polygonize.py -git commit -m "Add Douglas-Peucker kernel for polygonize simplification (#1151)" -``` - ---- - -### Task 2: Shared-edge simplification orchestrator - -**Files:** -- Modify: `xrspatial/polygonize.py` (insert after `_douglas_peucker`) -- Test: `xrspatial/tests/test_polygonize.py` - -- [ ] **Step 1: Write the failing test for `_simplify_polygons`** - -Add to the `TestSimplifyHelpers` class in `xrspatial/tests/test_polygonize.py`: - -```python - def test_simplify_polygons_reduces_vertices(self): - """Simplification should reduce vertex count on staircase edges.""" - from ..polygonize import _simplify_polygons - - # L-shaped polygon (exterior only, staircase boundary). - # 3x3 grid, value in top-left 2x2 block. 
- raster = np.array([[1, 1, 0], - [1, 1, 0], - [0, 0, 0]], dtype=np.int64) - data = xr.DataArray(raster) - column_orig, pp_orig = polygonize(data, return_type="numpy") - - from ..polygonize import _simplify_polygons - pp_simplified = _simplify_polygons(pp_orig, tolerance=0.5) - - # Area must be preserved. - for orig_rings, simp_rings in zip(pp_orig, pp_simplified): - orig_area = sum(calc_boundary_area(r) for r in orig_rings) - simp_area = sum(calc_boundary_area(r) for r in simp_rings) - assert_allclose(simp_area, orig_area, atol=1e-10) - - def test_simplify_polygons_topology_preserved(self): - """Adjacent simplified polygons should not create gaps.""" - from ..polygonize import _simplify_polygons - - # Checkerboard-ish: two adjacent rectangles sharing an edge. - raster = np.array([[1, 1, 2, 2], - [1, 1, 2, 2], - [1, 1, 2, 2], - [1, 1, 2, 2]], dtype=np.int64) - data = xr.DataArray(raster) - column, pp = polygonize(data, return_type="numpy") - - pp_simplified = _simplify_polygons(pp, tolerance=0.0) - - # With tolerance=0, no vertices should be removed; output - # should match input exactly. - for orig_rings, simp_rings in zip(pp, pp_simplified): - assert len(orig_rings) == len(simp_rings) - for orig_ring, simp_ring in zip(orig_rings, simp_rings): - assert_allclose(simp_ring, orig_ring) - - def test_simplify_polygons_shared_edge_identical(self): - """Two polygons sharing an edge must have identical simplified edges.""" - from ..polygonize import _simplify_polygons - - # Create a raster where two regions share a staircase boundary. - raster = np.array([ - [1, 1, 1, 2, 2, 2], - [1, 1, 2, 2, 2, 2], - [1, 2, 2, 2, 2, 2], - [1, 1, 2, 2, 2, 2], - [1, 1, 1, 2, 2, 2], - ], dtype=np.int64) - data = xr.DataArray(raster) - column, pp = polygonize(data, return_type="numpy") - - pp_simplified = _simplify_polygons(pp, tolerance=1.5) - - # Extract edge vertices of each polygon. The shared boundary - # should appear in both polygons with identical coordinates. 
- # Check total area is preserved (which requires no gaps/overlaps). - total_orig = sum( - sum(calc_boundary_area(r) for r in rings) for rings in pp) - total_simp = sum( - sum(calc_boundary_area(r) for r in rings) for rings in pp_simplified) - assert_allclose(total_simp, total_orig, atol=1e-10) -``` - -- [ ] **Step 2: Run test to verify it fails** - -Run: `pytest xrspatial/tests/test_polygonize.py::TestSimplifyHelpers::test_simplify_polygons_reduces_vertices -v` -Expected: FAIL with ImportError (`_simplify_polygons` not found) - -- [ ] **Step 3: Implement shared-edge orchestrator functions** - -Add to `xrspatial/polygonize.py` after `_douglas_peucker`: - -```python -def _find_junctions(all_rings): - """Find junction vertices where 3+ ring boundaries meet. - - A junction is any (x, y) coordinate that appears in 3 or more - distinct rings. These vertices are pinned during simplification. - - Parameters - ---------- - all_rings : list of list of np.ndarray - polygon_points structure: list of polygons, each polygon is - a list of rings (Nx2 arrays, closed). - - Returns - ------- - set of (float, float) - """ - vertex_ring_count = {} # (x, y) -> set of ring identifiers - ring_id = 0 - for rings in all_rings: - for ring in rings: - for k in range(len(ring) - 1): # skip closing duplicate - pt = (ring[k, 0], ring[k, 1]) - if pt not in vertex_ring_count: - vertex_ring_count[pt] = set() - vertex_ring_count[pt].add(ring_id) - ring_id += 1 - - return {pt for pt, ids in vertex_ring_count.items() if len(ids) >= 3} - - -def _split_ring_at_junctions(ring, junctions): - """Split a closed ring into chains at junction vertices. - - Each chain starts and ends at a junction vertex (endpoints included - in the chain). If the ring contains no junctions, the entire ring - is returned as a single chain. - - Parameters - ---------- - ring : np.ndarray, shape (N, 2) - Closed ring (first == last vertex). 
- junctions : set of (float, float) - - Returns - ------- - list of np.ndarray - Each array is an Mx2 chain. Consecutive chains share their - endpoint/startpoint. - """ - n = len(ring) - 1 # number of unique vertices - - # Find indices of junction vertices within this ring. - junction_indices = [] - for k in range(n): - if (ring[k, 0], ring[k, 1]) in junctions: - junction_indices.append(k) - - if len(junction_indices) == 0: - # No junctions: return the whole ring as a single chain. - return [ring.copy()] - - # Rotate ring so that the first junction is at index 0. - first = junction_indices[0] - if first > 0: - # Rotate unique vertices, then re-close. - rotated = np.empty_like(ring) - rotated[:n - first] = ring[first:n] - rotated[n - first:n] = ring[:first] - rotated[n] = rotated[0] - ring = rotated - junction_indices = [(ji - first) % n for ji in junction_indices] - junction_indices.sort() - - # Split at each junction. - chains = [] - for i in range(len(junction_indices)): - start = junction_indices[i] - if i + 1 < len(junction_indices): - end = junction_indices[i + 1] - else: - end = n # wrap back to first junction (index 0 after rotation) - chains.append(ring[start:end + 1].copy()) - - return chains - - -def _chain_key(chain): - """Canonical key for deduplicating shared edge chains. - - Two chains that connect the same pair of junctions but are traversed - in opposite directions should map to the same key. We use the sorted - endpoint pair plus the frozenset of interior vertices. - """ - start = (chain[0, 0], chain[0, 1]) - end = (chain[-1, 0], chain[-1, 1]) - if start > end: - start, end = end, start - # Include chain length to disambiguate chains between the same - # junction pair with different paths. - interior = tuple( - (chain[k, 0], chain[k, 1]) for k in range(1, len(chain) - 1)) - # For reversed chains, interior order is reversed. 
- interior_rev = interior[::-1] - interior = min(interior, interior_rev) - return (start, end, interior) - - -def _simplify_polygons(polygon_points, tolerance): - """Topology-preserving simplification of all polygons. - - Uses shared-edge decomposition: finds junction vertices, splits - rings into chains at junctions, simplifies each unique chain once - with Douglas-Peucker, then reassembles rings. - - Parameters - ---------- - polygon_points : list of list of np.ndarray - Output of polygonize backend: list of polygons, each polygon - is [exterior_ring, *hole_rings]. - tolerance : float - Douglas-Peucker tolerance in coordinate units. - - Returns - ------- - list of list of np.ndarray - Same structure as input, with simplified coordinates. - """ - if tolerance <= 0: - return polygon_points - - # Step 1: Find junctions. - junctions = _find_junctions(polygon_points) - - # Step 2 & 3: Split rings into chains, deduplicate, simplify. - simplified_chains = {} # chain_key -> simplified np.ndarray - - # We also need to track how to reassemble each ring. - # ring_info[poly_idx][ring_idx] = list of (chain_key, is_reversed) - ring_info = [] - - for poly_idx, rings in enumerate(polygon_points): - poly_info = [] - for ring in rings: - chains = _split_ring_at_junctions(ring, junctions) - chain_refs = [] - for chain in chains: - key = _chain_key(chain) - if key not in simplified_chains: - simplified_chains[key] = _douglas_peucker(chain, tolerance) - # Determine if this chain was reversed relative to canonical. - start = (chain[0, 0], chain[0, 1]) - canonical_start = (simplified_chains[key][0, 0], - simplified_chains[key][0, 1]) - is_reversed = (start != canonical_start) - chain_refs.append((key, is_reversed)) - poly_info.append(chain_refs) - ring_info.append(poly_info) - - # Step 4: Reassemble rings. 
- result = [] - for poly_idx, rings in enumerate(polygon_points): - new_rings = [] - for ring_idx, chain_refs in enumerate(ring_info[poly_idx]): - if len(chain_refs) == 1 and len(chain_refs[0]) == 2: - key, is_reversed = chain_refs[0] - simplified = simplified_chains[key] - if is_reversed: - simplified = simplified[::-1].copy() - # Ensure ring is closed. - if not (simplified[0, 0] == simplified[-1, 0] and - simplified[0, 1] == simplified[-1, 1]): - simplified = np.vstack([simplified, simplified[:1]]) - new_rings.append(simplified) - else: - # Multiple chains: concatenate (drop duplicate junction points). - parts = [] - for key, is_reversed in chain_refs: - simplified = simplified_chains[key] - if is_reversed: - simplified = simplified[::-1].copy() - if parts: - # Skip first point (same as last of previous chain). - parts.append(simplified[1:]) - else: - parts.append(simplified) - assembled = np.vstack(parts) - # Ensure ring is closed. - if not (assembled[0, 0] == assembled[-1, 0] and - assembled[0, 1] == assembled[-1, 1]): - assembled = np.vstack([assembled, assembled[:1]]) - new_rings.append(assembled) - - # Drop degenerate rings (fewer than 4 vertices = triangle minimum). - filtered = [] - for ring in new_rings: - if len(ring) >= 4: - filtered.append(ring) - elif len(new_rings) > 0 and ring is new_rings[0]: - # Keep exterior even if degenerate (shouldn't happen with - # reasonable tolerances, but better than losing the polygon). 
- filtered.append(ring) - if filtered: - result.append(filtered) - else: - result.append(new_rings) - - return result -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pytest xrspatial/tests/test_polygonize.py::TestSimplifyHelpers -v` -Expected: PASS (all 7 tests) - -- [ ] **Step 5: Commit** - -```bash -git add xrspatial/polygonize.py xrspatial/tests/test_polygonize.py -git commit -m "Add shared-edge simplification orchestrator (#1151)" -``` - ---- - -### Task 3: Wire into `polygonize()` public API - -**Files:** -- Modify: `xrspatial/polygonize.py` (function signature + body at ~line 1021) -- Test: `xrspatial/tests/test_polygonize.py` - -- [ ] **Step 1: Write failing tests for the public API** - -Add to `xrspatial/tests/test_polygonize.py`: - -```python -class TestPolygonizeSimplify: - """Tests for simplify_tolerance and simplify_method parameters.""" - - def test_tolerance_none_no_change(self): - """tolerance=None should produce identical output to no-arg call.""" - raster = np.array([[1, 1, 2, 2], - [1, 1, 2, 2]], dtype=np.int64) - data = xr.DataArray(raster) - col1, pp1 = polygonize(data) - col2, pp2 = polygonize(data, simplify_tolerance=None) - assert col1 == col2 - for r1, r2 in zip(pp1, pp2): - for a, b in zip(r1, r2): - assert_allclose(a, b) - - def test_tolerance_zero_no_change(self): - """tolerance=0.0 should produce identical output.""" - raster = np.array([[1, 1, 2, 2], - [1, 1, 2, 2]], dtype=np.int64) - data = xr.DataArray(raster) - col1, pp1 = polygonize(data) - col2, pp2 = polygonize(data, simplify_tolerance=0.0) - assert col1 == col2 - for r1, r2 in zip(pp1, pp2): - for a, b in zip(r1, r2): - assert_allclose(a, b) - - def test_negative_tolerance_raises(self): - """Negative tolerance should raise ValueError.""" - raster = np.array([[1, 1], [1, 1]], dtype=np.int64) - data = xr.DataArray(raster) - with pytest.raises(ValueError, match="simplify_tolerance"): - polygonize(data, simplify_tolerance=-1.0) - - def 
test_visvalingam_not_implemented(self): - """Visvalingam-Whyatt should raise NotImplementedError.""" - raster = np.array([[1, 1], [1, 1]], dtype=np.int64) - data = xr.DataArray(raster) - with pytest.raises(NotImplementedError): - polygonize(data, simplify_tolerance=1.0, - simplify_method="visvalingam-whyatt") - - def test_invalid_method_raises(self): - """Unknown method should raise ValueError.""" - raster = np.array([[1, 1], [1, 1]], dtype=np.int64) - data = xr.DataArray(raster) - with pytest.raises(ValueError, match="simplify_method"): - polygonize(data, simplify_tolerance=1.0, - simplify_method="invalid") - - def test_simplify_reduces_vertices(self): - """Simplification should reduce total vertex count.""" - # Staircase boundary between two values. - raster = np.array([ - [1, 1, 1, 2, 2, 2], - [1, 1, 2, 2, 2, 2], - [1, 2, 2, 2, 2, 2], - [1, 1, 2, 2, 2, 2], - [1, 1, 1, 2, 2, 2], - ], dtype=np.int64) - data = xr.DataArray(raster) - _, pp_orig = polygonize(data) - _, pp_simp = polygonize(data, simplify_tolerance=1.5) - - orig_verts = sum(len(r) for rings in pp_orig for r in rings) - simp_verts = sum(len(r) for rings in pp_simp for r in rings) - assert simp_verts < orig_verts - - def test_simplify_preserves_area(self): - """Total area must be preserved after simplification.""" - raster = np.array([ - [1, 1, 1, 2, 2, 2], - [1, 1, 2, 2, 2, 2], - [1, 2, 2, 2, 2, 2], - [1, 1, 2, 2, 2, 2], - [1, 1, 1, 2, 2, 2], - ], dtype=np.int64) - data = xr.DataArray(raster) - _, pp_orig = polygonize(data) - _, pp_simp = polygonize(data, simplify_tolerance=1.5) - - total_orig = sum( - sum(calc_boundary_area(r) for r in rings) for rings in pp_orig) - total_simp = sum( - sum(calc_boundary_area(r) for r in rings) for rings in pp_simp) - assert_allclose(total_simp, total_orig, atol=1e-10) - - def test_simplify_with_geopandas(self): - """Simplification should work with geopandas return type.""" - pytest.importorskip("geopandas") - raster = np.array([ - [1, 1, 2, 2], - [1, 1, 2, 2], - [1, 
1, 2, 2], - ], dtype=np.int64) - data = xr.DataArray(raster) - gdf = polygonize(data, simplify_tolerance=0.5, - return_type="geopandas") - assert len(gdf) == 2 # two polygons - assert gdf.geometry.is_valid.all() - - def test_simplify_with_geojson(self): - """Simplification should work with geojson return type.""" - raster = np.array([ - [1, 1, 2, 2], - [1, 1, 2, 2], - ], dtype=np.int64) - data = xr.DataArray(raster) - fc = polygonize(data, simplify_tolerance=0.5, - return_type="geojson") - assert fc["type"] == "FeatureCollection" - assert len(fc["features"]) == 2 - - -@pytest.mark.skipif(da is None, reason="dask not installed") -class TestPolygonizeSimplifyDask: - """Simplification with dask backend.""" - - def test_simplify_dask_matches_numpy(self): - """Dask simplification should produce same areas as numpy.""" - raster = np.array([ - [1, 1, 1, 2, 2, 2], - [1, 1, 2, 2, 2, 2], - [1, 2, 2, 2, 2, 2], - [1, 1, 2, 2, 2, 2], - [1, 1, 1, 2, 2, 2], - ], dtype=np.int64) - - data_np = xr.DataArray(raster) - data_dask = xr.DataArray(da.from_array(raster, chunks=(3, 3))) - - _, pp_np = polygonize(data_np, simplify_tolerance=1.5) - _, pp_dask = polygonize(data_dask, simplify_tolerance=1.5) - - # Compare per-value area sums. 
- col_np, _ = polygonize(data_np, simplify_tolerance=1.5) - col_dask, _ = polygonize(data_dask, simplify_tolerance=1.5) - - areas_np = {} - for val, rings in zip(col_np, pp_np): - a = sum(calc_boundary_area(r) for r in rings) - areas_np[val] = areas_np.get(val, 0.0) + a - - areas_dask = {} - for val, rings in zip(col_dask, pp_dask): - a = sum(calc_boundary_area(r) for r in rings) - areas_dask[val] = areas_dask.get(val, 0.0) + a - - for val in areas_np: - assert_allclose(areas_dask[val], areas_np[val], atol=1e-10) -``` - -- [ ] **Step 2: Run test to verify it fails** - -Run: `pytest xrspatial/tests/test_polygonize.py::TestPolygonizeSimplify::test_negative_tolerance_raises -v` -Expected: FAIL with TypeError (unexpected keyword argument) - -- [ ] **Step 3: Modify `polygonize()` signature and body** - -In `xrspatial/polygonize.py`, update the `polygonize` function: - -**Signature** — add the two new parameters: - -```python -def polygonize( - raster: xr.DataArray, - mask: Optional[xr.DataArray] = None, - connectivity: int = 4, - transform: Optional[np.ndarray] = None, - column_name: str = "DN", - return_type: str = "numpy", - simplify_tolerance: Optional[float] = None, - simplify_method: str = "douglas-peucker", -): -``` - -**Docstring** — add parameter descriptions after the `return_type` entry: - -``` - simplify_tolerance: float, optional - Douglas-Peucker simplification tolerance in coordinate units. - When set, polygon boundaries are simplified using shared-edge - decomposition to preserve topology between adjacent polygons. - Default is None (no simplification). - - simplify_method: str, default="douglas-peucker" - Simplification algorithm. Currently only "douglas-peucker" is - supported. "visvalingam-whyatt" is reserved for future use. -``` - -**Validation** — add after the transform check block (~line 1112): - -```python - # Check simplification parameters. 
- if simplify_tolerance is not None and simplify_tolerance < 0: - raise ValueError( - "simplify_tolerance must be non-negative, " - f"got {simplify_tolerance}") - if simplify_method not in ("douglas-peucker", "visvalingam-whyatt"): - raise ValueError( - f"simplify_method must be 'douglas-peucker' or " - f"'visvalingam-whyatt', got '{simplify_method}'") - if (simplify_method == "visvalingam-whyatt" - and simplify_tolerance is not None - and simplify_tolerance > 0): - raise NotImplementedError( - "Visvalingam-Whyatt simplification is not yet implemented") -``` - -**Simplification call** — add after the `mapper(raster)(...)` call and before the return-type conversion block: - -```python - # Apply simplification if requested. - if simplify_tolerance is not None and simplify_tolerance > 0: - polygon_points = _simplify_polygons(polygon_points, simplify_tolerance) -``` - -- [ ] **Step 4: Run tests to verify they pass** - -Run: `pytest xrspatial/tests/test_polygonize.py::TestPolygonizeSimplify xrspatial/tests/test_polygonize.py::TestPolygonizeSimplifyDask -v` -Expected: PASS (all tests) - -- [ ] **Step 5: Run the full test suite to check for regressions** - -Run: `pytest xrspatial/tests/test_polygonize.py -v` -Expected: All existing tests PASS (new parameters have defaults so no breakage) - -- [ ] **Step 6: Commit** - -```bash -git add xrspatial/polygonize.py xrspatial/tests/test_polygonize.py -git commit -m "Wire simplify_tolerance into polygonize() public API (#1151)" -``` - ---- - -### Task 4: Update documentation - -**Files:** -- Modify: `docs/source/reference/utilities.rst` (no changes needed — autodoc picks up new params) -- Verify: docstring is complete and renders correctly - -- [ ] **Step 1: Verify docstring has the new parameters** - -Read `xrspatial/polygonize.py` and confirm `simplify_tolerance` and `simplify_method` are documented in the `polygonize` docstring. 
- -- [ ] **Step 2: Test docs build (if sphinx is available)** - -Run: `cd docs && make html 2>&1 | tail -20` -If sphinx is not set up, skip. The autodoc entry in `utilities.rst` already points to `xrspatial.polygonize.polygonize`, so new params will appear automatically. - -- [ ] **Step 3: Commit (if any doc changes were needed)** - -Only commit if manual changes were required. Autodoc should handle it. - ---- - -### Task 5: Create user guide notebook - -**Files:** -- Create: `examples/user_guide/50_Polygonize_Simplification.ipynb` - -- [ ] **Step 1: Create the notebook** - -Create `examples/user_guide/50_Polygonize_Simplification.ipynb` with these cells: - -**Cell 1 (markdown):** -```markdown -# Polygonize with Geometry Simplification - -`polygonize()` converts raster regions into vector polygons. On high-resolution rasters, the resulting geometries can have thousands of vertices per polygon, making them unwieldy for rendering, file export, and spatial joins. - -The `simplify_tolerance` parameter applies Douglas-Peucker simplification during polygonization. Topology is preserved: adjacent polygons share identical simplified boundaries, so no gaps or overlaps appear between neighbors. -``` - -**Cell 2 (code):** -```python -import numpy as np -import xarray as xr -import matplotlib.pyplot as plt -from matplotlib.patches import Polygon as MplPolygon -from matplotlib.collections import PatchCollection - -from xrspatial import polygonize -``` - -**Cell 3 (markdown):** -```markdown -## Generate a sample classified raster - -We'll create a synthetic land-cover raster with irregular region boundaries — the kind of output you'd get from a classification or segmentation step. -``` - -**Cell 4 (code):** -```python -rng = np.random.default_rng(42) -shape = (80, 120) - -# Start with smooth noise, then classify into 4 land-cover types. 
-from scipy.ndimage import gaussian_filter -noise = rng.standard_normal(shape) -smooth = gaussian_filter(noise, sigma=8) -classified = np.digitize(smooth, bins=[-0.5, 0.0, 0.5]) + 1 # values 1-4 - -raster = xr.DataArray(classified.astype(np.int32)) - -fig, ax = plt.subplots(figsize=(10, 6)) -raster.plot(ax=ax, cmap="Set2", add_colorbar=True) -ax.set_title("Classified raster (4 land-cover types)") -ax.set_aspect("equal") -plt.tight_layout() -plt.show() -``` - -**Cell 5 (markdown):** -```markdown -## Polygonize without simplification - -First, let's see what the raw pixel-boundary polygons look like. -``` - -**Cell 6 (code):** -```python -col_raw, pp_raw = polygonize(raster) -total_verts_raw = sum(len(r) for rings in pp_raw for r in rings) -print(f"Polygons: {len(pp_raw)}, Total vertices: {total_verts_raw}") -``` - -**Cell 7 (code):** -```python -def plot_polygons(polygon_points, column, title, ax): - """Plot polygons colored by value.""" - cmap = plt.cm.Set2 - vals = sorted(set(column)) - val_to_color = {v: cmap(i / max(len(vals) - 1, 1)) for i, v in enumerate(vals)} - - patches = [] - colors = [] - for val, rings in zip(column, polygon_points): - ext = rings[0] - patches.append(MplPolygon(ext[:, :2], closed=True)) - colors.append(val_to_color[val]) - - pc = PatchCollection(patches, facecolors=colors, edgecolors="black", - linewidths=0.3) - ax.add_collection(pc) - ax.set_xlim(0, 120) - ax.set_ylim(0, 80) - ax.set_aspect("equal") - ax.set_title(title) - ax.invert_yaxis() - -fig, ax = plt.subplots(figsize=(10, 6)) -plot_polygons(pp_raw, col_raw, f"Raw polygons ({total_verts_raw} vertices)", ax) -plt.tight_layout() -plt.show() -``` - -**Cell 8 (markdown):** -```markdown -## Polygonize with simplification - -Now apply Douglas-Peucker simplification with increasing tolerances. The `simplify_tolerance` is in the raster's coordinate units (pixels here, but would be meters or degrees with a georeferenced raster). 
- -Topology is preserved: adjacent polygons share identical simplified edges, so no gaps appear between them. -``` - -**Cell 9 (code):** -```python -fig, axes = plt.subplots(1, 3, figsize=(18, 6)) - -for ax, tol in zip(axes, [0.5, 1.5, 3.0]): - col, pp = polygonize(raster, simplify_tolerance=tol) - n_verts = sum(len(r) for rings in pp for r in rings) - reduction = 100 * (1 - n_verts / total_verts_raw) - plot_polygons(pp, col, - f"tolerance={tol} ({n_verts} verts, {reduction:.0f}% reduction)", - ax) - -plt.tight_layout() -plt.show() -``` - -**Cell 10 (markdown):** -```markdown -## GeoDataFrame output - -`simplify_tolerance` works with all return types. Here's a GeoDataFrame: -``` - -**Cell 11 (code):** -```python -gdf = polygonize(raster, simplify_tolerance=1.5, return_type="geopandas", - column_name="landcover") -print(gdf.head()) -print(f"\nAll geometries valid: {gdf.geometry.is_valid.all()}") -``` - -- [ ] **Step 2: Commit** - -```bash -git add examples/user_guide/50_Polygonize_Simplification.ipynb -git commit -m "Add polygonize simplification user guide notebook (#1151)" -``` - ---- - -### Task 6: Update README feature matrix - -**Files:** -- Modify: `README.md` (~line 514) - -- [ ] **Step 1: Verify no README change is needed** - -The simplification feature adds parameters to the existing `polygonize()` function. No new function is created, and backend support does not change. The existing README row: - -``` -| [Polygonize](xrspatial/polygonize.py) | Converts contiguous regions of equal value into vector polygons | Standard (CCL) | ✅️ | ✅️ | ✅️ | 🔄 | -``` - -is still accurate. **No change needed.** Skip this task. - ---- - -### Task 7: Final integration check - -- [ ] **Step 1: Run the full polygonize test suite** - -Run: `pytest xrspatial/tests/test_polygonize.py -v --tb=short` -Expected: All tests pass, including new simplification tests. 
- -- [ ] **Step 2: Verify backward compatibility** - -Run: `python -c "from xrspatial import polygonize; import xarray as xr, numpy as np; d = xr.DataArray(np.array([[1,2],[3,4]])); c, p = polygonize(d); print(f'{len(c)} polygons')"` -Expected: `4 polygons` (no error, same behavior as before) diff --git a/docs/superpowers/specs/2026-03-23-lightweight-crs-parser-design.md b/docs/superpowers/specs/2026-03-23-lightweight-crs-parser-design.md deleted file mode 100644 index d55f7808..00000000 --- a/docs/superpowers/specs/2026-03-23-lightweight-crs-parser-design.md +++ /dev/null @@ -1,166 +0,0 @@ -# Lightweight CRS Parser -- Drop-in `pyproj.CRS` Replacement - -**Issue:** [#1057](https://github.com/xarray-contrib/xarray-spatial/issues/1057) -**Date:** 2026-03-23 - -## Problem - -The reproject module uses pyproj for two things: - -1. **CRS metadata** (~1ms): parsing EPSG codes, extracting projection parameters, comparing CRS objects -2. **Coordinate transforms** (30-700ms): per-pixel projection math - -We already replaced #2 with Numba JIT kernels for the most common projection families. But #1 still requires pyproj at import time, making it a hard dependency even when the actual math never touches it. - -## Goal - -Make `pip install xarray-spatial` sufficient for basic reprojection (UTM, Web Mercator, LCC, Albers, etc.) without pyproj. Users with exotic CRS or datum requirements still benefit from installing pyproj. - -## Design - -### New file: `xrspatial/reproject/_lite_crs.py` - -A `CRS` class that implements the same interface surface as `pyproj.CRS` for EPSG codes that have Numba fast paths. - -**Constructor signatures:** - -```python -CRS(4326) # from EPSG int -CRS("EPSG:4326") # from authority string -CRS.from_epsg(4326) # classmethod -CRS.from_wkt(wkt_string) # extracts EPSG from AUTHORITY["EPSG","XXXX"] via regex -``` - -Raises `ValueError` if the EPSG code is not in the embedded table. 
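-
-The `from_wkt` regex extraction mentioned above might look like this (a minimal sketch, not the actual implementation; the pattern and the choice of the *last* match are assumptions, relying on well-formed WKT1 placing the CRS-level AUTHORITY node last):

```python
import re

# Matches AUTHORITY["EPSG","4326"] anywhere in a WKT1 string. Nested nodes
# (datum, spheroid) carry their own AUTHORITY entries; in well-formed WKT1
# the outermost (last) one identifies the CRS itself.
_AUTHORITY_RE = re.compile(r'AUTHORITY\[\s*"EPSG"\s*,\s*"(\d+)"\s*\]')

def epsg_from_wkt(wkt: str) -> int:
    """Extract the EPSG code of the CRS itself from a WKT1 string."""
    matches = _AUTHORITY_RE.findall(wkt)
    if not matches:
        raise ValueError("no EPSG AUTHORITY node found in WKT")
    return int(matches[-1])

wkt = ('GEOGCS["WGS 84",DATUM["WGS_1984",'
       'SPHEROID["WGS 84",6378137,298.257223563,'
       'AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],'
       'AUTHORITY["EPSG","4326"]]')
print(epsg_from_wkt(wkt))  # prints 4326
```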
- -**Interface (duck-type compatible with pyproj.CRS):** - -| Method/Property | Return type | Description | -|---|---|---| -| `.to_dict()` | `dict` | PROJ4-style parameter dict (`k_0` is the canonical scale factor key) | -| `.to_wkt()` | `str` | OGC WKT string | -| `.to_epsg()` | `int` | EPSG code | -| `.to_authority()` | `tuple[str, str]` | `('EPSG', '4326')` | -| `.is_geographic` | `bool` | True for lat/lon CRS | - -The `.to_dict()` return value uses PROJ4-style keys: `proj`, `datum`, `ellps`, `lon_0`, `lat_0`, `lat_1`, `lat_2`, `k_0`, `x_0`, `y_0`, `units`, `zone`. The scale factor is always stored as `k_0` (pyproj convention); the `_*_params()` extractors already handle `d.get('k_0', d.get('k', 1.0))` fallback, so this is compatible. - -**Equality and hashing:** Two `CRS` objects with the same EPSG code are equal and hash equally. Cross-type comparison with pyproj is one-directional: `lite_crs == pyproj_crs` works (our `__eq__` calls `.to_epsg()` on the other object), but `pyproj_crs == lite_crs` uses pyproj's WKT comparison and may return False. To avoid issues, ensure both CRS objects in any comparison pass through `_resolve_crs()` so they are the same type. - -### Embedded EPSG table - -Covers codes where Numba fast paths exist: - -| Category | EPSG codes | -|---|---| -| Geographic | 4326 (WGS84), 4269 (NAD83), 4267 (NAD27) | -| Web Mercator | 3857 | -| Ellipsoidal Mercator | 3395 | -| UTM North (WGS84) | 32601-32660 | -| UTM South (WGS84) | 32701-32760 | -| UTM (NAD83) | 26901-26923 | -| Lambert Conformal Conic | 2154 | -| Albers Equal Area | 5070 | -| Lambert Azimuthal Equal Area | 3035 | -| Polar Stereographic | 3031, 3413, 3996 | -| Oblique Stereographic | 28992 | -| Cylindrical Equal Area | 6933 | - -UTM zones are generated programmatically (not stored individually). Named codes store their full PROJ4 parameter dict and a WKT template string. 
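The programmatic UTM generation can be sketched as pure arithmetic over the EPSG ranges in the table (the function name is hypothetical; the parameter values follow standard UTM conventions):

```python
def utm_proj4(epsg):
    # WGS84 UTM North occupies EPSG 32601-32660, South 32701-32760;
    # zone N's central meridian is -183 + 6 * N degrees
    if 32601 <= epsg <= 32660:
        zone, south = epsg - 32600, False
    elif 32701 <= epsg <= 32760:
        zone, south = epsg - 32700, True
    else:
        raise ValueError(f"EPSG:{epsg} is not a WGS84 UTM code")
    params = {"proj": "utm", "zone": zone, "datum": "WGS84",
              "units": "m", "lon_0": -183 + 6 * zone}
    if south:
        params["south"] = True
    return params
```

For example, `utm_proj4(32633)` yields zone 33 with a central meridian of 15 degrees east, so 120 codes cost a few lines instead of 120 stored dicts.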
- -**Not in table:** - -- **Sinusoidal** -- has a Numba fast path (`_sinu_params`), but MODIS sinusoidal grids use custom WKT or SR-ORG codes, not a standard EPSG code. The fast path dispatches via `.to_dict()['proj'] == 'sinu'`, so it still fires when pyproj is installed and produces the right dict. Without pyproj, sinusoidal falls back to requiring pyproj. -- **Generic Transverse Mercator** (State Plane, national grids) -- `_tmerc_params` dispatches via `.to_dict()['proj'] == 'tmerc'` for hundreds of EPSG codes. Embedding all State Plane codes is out of scope. These fast paths only fire when pyproj provides the `.to_dict()`. -- **Oblique Mercator Hotine** -- `_omerc_params` exists but is disabled in the dispatch pending alignment with PROJ's variant handling. - -### Changes to `_crs_utils.py` - -`_resolve_crs()` gets a two-tier resolution strategy: - -``` -_resolve_crs(input): - 1. Try our CRS(input) - - int -> direct EPSG lookup - - "EPSG:XXXX" string -> parse and lookup - - WKT string -> regex for AUTHORITY["EPSG","XXXX"], then lookup - - existing CRS (ours or pyproj) -> pass through - 2. If step 1 raises ValueError (code not in table): - -> fall back to pyproj.CRS(input) - -> if pyproj not installed, raise ImportError -``` - -New helper `_crs_from_wkt(wkt)` for the chunk functions that reconstruct CRS from WKT strings. Same two-tier logic: try our `CRS.from_wkt()`, fall back to `pyproj.CRS.from_wkt()`. - -Note: `_detect_source_crs()` calls `_resolve_crs()` and benefits automatically. The rioxarray fallback path (`raster.rio.crs`) always returns a pyproj CRS, which passes through unchanged. - -### Changes to `_grid.py` - -`_compute_output_grid()` currently creates a `pyproj.Transformer` to project ~845 boundary and interior sample points. New flow: - -1. Build the boundary/interior sample points (same as today). -2. Call a new `_transform_points(src_crs, tgt_crs, xs, ys)` helper that accepts scatter points. -3. 
That helper extracts the forward/inverse point-level projection functions from `_projections.py` (e.g. `_merc_fwd_point`, UTM kernels, etc.) based on the CRS pair, then applies them in a batch loop over the sample points. This reuses the existing projection math without needing a synthetic grid. -4. If no lite fast path exists, fall back to `pyproj.Transformer` as before. - -Note: `_compute_output_grid` also reads `source_crs.is_geographic` (line 44) for coordinate clamping. The lite CRS must return the correct value for this property. - -### Changes to chunk functions - -`_reproject_chunk_numpy()` and `_reproject_chunk_cupy()` currently call `pyproj.CRS.from_wkt()` to reconstruct CRS objects. Changed to use `_crs_from_wkt()` which tries our `CRS.from_wkt()` first. - -`_reproject_chunk_cupy()` also creates a `pyproj.Transformer` unconditionally before checking the CUDA fast path. This must be restructured to defer Transformer creation to after the CUDA fast path check, matching the pattern in `_reproject_chunk_numpy()`. - -`_source_footprint_in_target()` (used by `merge()`) also constructs `pyproj.CRS` and Transformer objects. This function needs the same two-tier CRS resolution and Numba-based point transform treatment. - -When the Numba/CUDA fast path handles the transform, pyproj is never imported. - -### What still requires pyproj - -- CRS pairs without Numba fast paths (per-chunk Transformer fallback) -- WKT strings without an AUTHORITY/EPSG tag -- PROJ4 dict input, custom CRS definitions -- Generic Transverse Mercator / State Plane (dispatches via `.to_dict()['proj']`, not EPSG code) -- Sinusoidal (no standard EPSG code) -- Vertical datum transforms (`_vertical.py` and inline geoid code at `__init__.py:725-728`) -- ITRF frame transforms (`_itrf.py`) -- GeoTIFF CRS utilities (`from_user_input`) - -All of these paths already have `_require_pyproj()` guards. 
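Each of these fallbacks funnels through the same two-tier pattern described for `_resolve_crs()`. A minimal sketch -- `LiteCRS` and the two-entry table are stand-ins for the real `_lite_crs` module:

```python
_LITE_TABLE = {4326: {"proj": "longlat", "datum": "WGS84"},
               3857: {"proj": "merc", "k_0": 1.0}}

class LiteCRS:
    def __init__(self, epsg):
        if epsg not in _LITE_TABLE:
            raise ValueError(f"EPSG:{epsg} not in the built-in table")
        self.epsg = epsg

    def to_epsg(self):
        return self.epsg

def resolve_crs(epsg):
    try:
        return LiteCRS(epsg)          # tier 1: embedded table, no pyproj
    except ValueError:
        try:
            import pyproj             # tier 2: full pyproj resolution
        except ImportError:
            raise ImportError(
                f'pyproj is required for CRS "EPSG:{epsg}" '
                "(not in the built-in table)")
        return pyproj.CRS.from_epsg(epsg)
```

Tier 2 only imports pyproj on demand, so the common EPSG codes never touch it.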
- -### Error messages - -When pyproj is not installed and the user hits a code path that needs it: - -``` -pyproj is required for CRS "EPSG:9999" (not in the built-in table). -Install it with: pip install pyproj -or: pip install xarray-spatial[reproject] -``` - -### Testing - -- Unit tests for `CRS` construction, all methods, equality, hashing -- Round-trip: `CRS(epsg).to_wkt()` -> `CRS.from_wkt(wkt)` -> same object -- Parameter correctness: compare `.to_dict()` output against `pyproj.CRS` for every embedded code -- Integration: full reprojection without pyproj on path (mock pyproj as missing) -- Edge cases: unknown EPSG falls back to pyproj, WKT without AUTHORITY tag falls back - -### Files touched - -| File | Change | -|---|---| -| `xrspatial/reproject/_lite_crs.py` | New -- `CRS` class + EPSG table | -| `xrspatial/reproject/_crs_utils.py` | Two-tier resolution, `_crs_from_wkt()` helper | -| `xrspatial/reproject/__init__.py` | Use `_crs_from_wkt()` in chunk functions; restructure cupy chunk; update `_source_footprint_in_target` | -| `xrspatial/reproject/_grid.py` | Numba-based `_transform_points` for boundary transform | -| `xrspatial/tests/test_lite_crs.py` | New -- unit + integration tests | - -### Out of scope - -- WKT parser (complex grammar, not worth reimplementing) -- Non-EPSG CRS definitions -- Datum transformations beyond what Helmert already handles -- Changes to the GeoTIFF module's pyproj usage -- Oblique Mercator Hotine (kernel disabled pending PROJ alignment) -- Embedding all State Plane / national grid EPSG codes diff --git a/docs/superpowers/specs/2026-03-24-dask-graph-utilities-design.md b/docs/superpowers/specs/2026-03-24-dask-graph-utilities-design.md deleted file mode 100644 index b28a6ee3..00000000 --- a/docs/superpowers/specs/2026-03-24-dask-graph-utilities-design.md +++ /dev/null @@ -1,314 +0,0 @@ -# Dask Graph Utilities: fused_overlap and multi_overlap - -**Date:** 2026-03-24 -**Status:** Draft -**Issue:** TBD (to be created during 
implementation) - -## Problem - -Several xrspatial operations run multiple `map_overlap` passes over the same data when a single pass would produce the same result with a smaller dask graph: - -- **Morphological opening/closing** chains erode + dilate as two separate `map_overlap` calls (2 blockwise layers). -- **Flow direction MFD** runs the same 3x3 kernel 8 times to extract 8 output bands, then stacks them (8 blockwise layers + 1 stack). -- **GLCM texture** does the same per-metric extraction pattern. - -Each extra `map_overlap` call adds a blockwise layer to the dask graph. For large rasters with many chunks, this inflates task counts and scheduler overhead. - -**Not in scope:** Iterative operations like diffusion (N steps of depth-1) are a poor fit for fusion because `total_depth = N`, and for large N the overlap region dominates chunk data. Those are better handled by the existing iterative approach. - -## Solution - -Two new utilities in `xrspatial/utils.py`, both exposed on the DataArray `.xrs` accessor. - ---- - -### 1. `fused_overlap` - -Fuses a sequence of overlap operations into a single `map_overlap` call with combined depth. - -**Stage function contract:** Each stage function takes a padded array and returns the **unpadded interior result only**. That is, given input of shape `(H + 2*dy, W + 2*dx)`, the function returns shape `(H, W)`. This is different from `map_overlap`'s built-in convention (same-shape return). Existing chunk functions that follow the same-shape convention need a one-line adapter: `lambda chunk: func(chunk)[dy:-dy, dx:-dx]` (where `dy`, `dx` are the per-axis depths). - -**Boundary restriction:** Only `boundary='nan'` is supported. For non-NaN boundary modes (`nearest`, `reflect`, `wrap`), the fused result would differ from sequential execution at chunk/array edges because boundary fill happens once on the original data rather than after each stage. Restricting to NaN avoids this correctness gap. 
NaN boundaries cover the vast majority of spatial raster operations in this codebase. - -**Signature:** - -```python -def fused_overlap(agg, *stages, boundary='nan'): - """Run multiple overlap operations in a single map_overlap call. - - Parameters - ---------- - agg : xr.DataArray - Input raster. If not dask-backed, stages are applied - sequentially with numpy/cupy padding. - *stages : tuple of (func, depth) - Each stage is a ``(callable, depth)`` pair. ``func`` takes a - padded numpy/cupy array of shape ``(H + 2*d, W + 2*d)`` and - returns the interior result of shape ``(H, W)``. ``depth`` - is an int or tuple of ints (per-axis overlap). - boundary : str - Must be ``'nan'``. - - Returns - ------- - xr.DataArray - Result of applying all stages in sequence. - - Raises - ------ - ValueError - If no stages are provided, boundary is not 'nan', or any - chunk dimension is smaller than total_depth. - """ -``` - -**Usage:** - -```python -from xrspatial.utils import fused_overlap - -# morphological opening in one pass instead of two -result = fused_overlap( - data, - (erode_interior, (1, 1)), - (dilate_interior, (1, 1)), - boundary='nan', -) - -# via accessor -result = data.xrs.fused_overlap( - (erode_interior, (1, 1)), - (dilate_interior, (1, 1)), - boundary='nan', -) -``` - -**How it works (dask path):** - -1. Normalize each stage's depth to a per-axis dict via `_normalize_depth`. -2. Compute `total_depth` by summing depths across all stages. -3. Validate that every chunk dimension exceeds `total_depth`. -4. Build a wrapper function that operates on the chunk padded with `total_depth`: - - Stage 0 receives the full `(H + 2*T, W + 2*T)` block, returns `(H + 2*(T - d0), W + 2*(T - d0))` interior. - - Stage 1 receives that `(H + 2*(T - d0), W + 2*(T - d0))` block (which has exactly `T - d0` cells of valid overlap remaining). It returns `(H + 2*(T - d0 - d1), W + 2*(T - d0 - d1))`. 
- - This continues until the final stage, which has exactly `d_last` overlap and returns `(H, W)`. - - The wrapper then re-pads the `(H, W)` result back to `(H + 2*T, W + 2*T)` using NaN fill so that `map_overlap` can crop its expected `total_depth`. -5. Call `data.map_overlap(wrapper, depth=total_depth, boundary=np.nan)` once. - -**Worked example (two stages, each depth 1, total_depth 2):** - -``` -map_overlap gives wrapper a chunk of shape (H+4, W+4). - -Stage 0: receives (H+4, W+4), returns interior (H+2, W+2). - - The (H+2, W+2) block has 1 cell of valid overlap on each side. - -Stage 1: receives (H+2, W+2), returns interior (H, W). - -Wrapper: pads (H, W) back to (H+4, W+4) with NaN fill. -map_overlap crops total_depth=2 from each side -> final (H, W). Correct. -``` - -**Non-dask fallback:** For each stage in sequence: pad with `np.pad(..., mode='constant', constant_values=np.nan)` (or `cupy.pad` for cupy arrays), apply `func`, take interior. Note: the existing `_pad_array` helper does not support `boundary='nan'` directly, so the implementation must use `np.pad` with constant NaN fill for this path. - -**Result:** N stages produce 1 blockwise layer instead of N. - ---- - -### 2. `multi_overlap` - -Runs a multi-output kernel via a single overlap + map_blocks call. - -**Signature:** - -```python -def multi_overlap(agg, func, n_outputs, depth, boundary='nan', dtype=None): - """Run a multi-output kernel via a single map_overlap call. - - Parameters - ---------- - agg : xr.DataArray - 2-D input raster. - func : callable - Takes a padded numpy/cupy chunk of shape - ``(H + 2*dy, W + 2*dx)`` and returns an array of shape - ``(n_outputs, H, W)`` -- the interior result per output band. - n_outputs : int - Number of output bands (must be >= 1). - depth : int or tuple of int - Per-axis overlap. Must be >= 1 on each spatial axis. - boundary : str - Boundary mode: 'nan', 'nearest', 'reflect', or 'wrap'. - dtype : numpy dtype, optional - Output dtype. 
If None, uses the input dtype. - - Returns - ------- - xr.DataArray - 3-D DataArray of shape ``(n_outputs, H, W)`` with a leading - ``band`` dimension. - - Raises - ------ - ValueError - If n_outputs < 1, depth < 1, input is not 2-D, or any chunk - dimension is smaller than depth. - """ -``` - -**Usage:** - -```python -from xrspatial.utils import multi_overlap - -# flow direction MFD: one pass instead of 8 -bands = multi_overlap(data, mfd_kernel, n_outputs=8, depth=(1, 1)) - -# via accessor -bands = data.xrs.multi_overlap(mfd_kernel, n_outputs=8, depth=(1, 1)) -``` - -**How it works (dask path):** - -```python -import dask.array.overlap as _overlap - -def multi_overlap(agg, func, n_outputs, depth, boundary='nan', dtype=None): - depth_dict = _normalize_depth(depth, agg.ndim) - boundary_val = _boundary_to_dask(boundary) - dtype = dtype or agg.dtype - - # Validate depth >= 1 on each axis - for ax, d in depth_dict.items(): - if d < 1: - raise ValueError(f"depth must be >= 1, got {d} on axis {ax}") - - # Validate chunk sizes exceed depth - for ax, d in depth_dict.items(): - for cs in agg.chunks[ax]: - if cs < d: - raise ValueError( - f"Chunk size {cs} on axis {ax} is smaller than " - f"depth {d}. Rechunk first." - ) - - # Step 1: pad the dask array with overlap - padded = _overlap.overlap(agg.data, depth=depth_dict, boundary=boundary_val) - - # Step 2: map_blocks with new output axis - def _wrapper(block): - # func returns (n_outputs, H, W) from padded (H+2dy, W+2dx) - return func(block) - - out = da.map_blocks( - _wrapper, - padded, - dtype=dtype, - new_axis=0, - chunks=((n_outputs,),) + agg.data.chunks, - ) - - # Step 3: wrap in DataArray with band dimension - result = xr.DataArray( - out, - dims=['band'] + list(agg.dims), - coords=agg.coords, - attrs=agg.attrs, - ) - return result -``` - -This produces 1 overlap layer + 1 map_blocks layer = 2 layers total, versus N+1 for the current N-separate-calls + stack approach. 
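The `(n_outputs, H, W)` kernel contract can be demonstrated with a toy 8-neighbor extractor -- a stand-in for the real MFD kernel, run here on a NaN-padded numpy array exactly as the non-dask path would:

```python
import numpy as np

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]

def neighbors_kernel(padded):
    # toy multi-output kernel: band b holds each interior cell's value
    # shifted by OFFSETS[b]; padded (H+2, W+2) -> (8, H, W)
    h, w = padded.shape[0] - 2, padded.shape[1] - 2
    out = np.empty((8, h, w), dtype=padded.dtype)
    for b, (dy, dx) in enumerate(OFFSETS):
        out[b] = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return out

data = np.arange(16.0).reshape(4, 4)
padded = np.pad(data, 1, constant_values=np.nan)
bands = neighbors_kernel(padded)   # all 8 bands in one pass instead of 8
```

Boundary cells whose neighbor falls outside the raster pick up the NaN fill, matching the `boundary='nan'` semantics.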
- -**Non-dask fallback:** Pad the numpy/cupy array (using `_pad_array` for non-NaN boundaries, or `np.pad`/`cupy.pad` with `constant_values=np.nan` for NaN boundary), call `func` (returns `(n_outputs, H, W)`), wrap in DataArray. - ---- - -## Helpers - -### `_normalize_depth(depth, ndim)` - -Accepts `int`, `tuple`, or `dict` and returns a dict `{axis: int}` for all axes. Follows dask's conventions: - -- `int` -> same depth on all axes -- `tuple` -> one depth per axis -- `dict` -> passed through, validated that all axes `0..ndim-1` are present, all values are non-negative ints, and no extra axes exist - ---- - -## Accessor integration - -Both functions go on `XrsSpatialDataArrayAccessor` only. Not on the Dataset accessor -- these are chunk-level operations that don't generalize to "apply to every variable." - -```python -# DataArray accessor -def fused_overlap(self, *stages, **kwargs): - from .utils import fused_overlap - return fused_overlap(self._obj, *stages, **kwargs) - -def multi_overlap(self, func, n_outputs, **kwargs): - from .utils import multi_overlap - return multi_overlap(self._obj, func, n_outputs, **kwargs) -``` - -## Backend support - -- **numpy:** Direct application with `_pad_array` for overlap simulation. -- **dask+numpy:** Primary target. One `map_overlap` or `overlap` + `map_blocks` call. -- **cupy:** Works if the user's `func` handles cupy arrays. `_pad_array` already supports cupy. -- **dask+cupy:** Same as dask+numpy, with `is_cupy=True` passed to `_boundary_to_dask`. - -No `ArrayTypeFunctionMapping` needed. These are dask wrappers, not spatial operations. - -## What this does NOT include - -- **Non-NaN boundaries for `fused_overlap`.** Sequential boundary fill between stages gives different results than a single outer fill. NaN is the only mode where fusion is equivalent to sequential execution. -- **Diffusion / high-iteration fusion.** When `total_depth = N` for large N, the overlap dominates chunk data. 
The existing iterative approach is better for those cases. Practical limit: 2-5 stages. -- **Auto-rechunk between stages.** Separate concern -- `rechunk_no_shuffle` exists for that. -- **Dataset accessor methods.** These are per-array operations. -- **Refactoring existing call sites** (MFD, GLCM, morphology) to use the new utilities. That's follow-up work after the utilities ship. - -## File changes - -| File | Change | -|------|--------| -| `xrspatial/utils.py` | Add `fused_overlap`, `multi_overlap`, `_normalize_depth` helper | -| `xrspatial/accessor.py` | Add `fused_overlap`, `multi_overlap` to DataArray accessor | -| `xrspatial/__init__.py` | Export both functions | -| `xrspatial/tests/test_fused_overlap.py` | New test file | -| `xrspatial/tests/test_multi_overlap.py` | New test file | -| `xrspatial/tests/test_accessor.py` | Add to expected methods list | -| `docs/source/reference/utilities.rst` | Add API entries | -| `README.md` | Add rows to Utilities table | -| `examples/user_guide/37_Fused_Overlap.ipynb` | New notebook | - -## Testing strategy - -**fused_overlap:** -- Single stage produces same result as plain `map_overlap` -- Two stages (erode + dilate) matches sequential `map_overlap` calls -- Three+ stages work correctly -- Depth accumulation is correct (depth 1+1 = 2 total overlap) -- Non-square depth (e.g. `(2, 1)`) works -- Small chunks (barely larger than total_depth) produce correct results -- Rejects non-NaN boundary modes with clear error -- Rejects chunks smaller than total_depth with clear error -- Non-dask fallback (numpy) works -- Non-dask fallback (cupy) works -- Accessor delegates correctly -- Input validation (empty stages, non-DataArray, etc.) 
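The depth-accumulation behaviour these tests check can be exercised with a toy interior-returning stage (a 3x3 box mean, standing in for erode/dilate): fusing two depth-1 stages behind one depth-2 pad matches the sequential per-stage padding, including the NaN fringe.

```python
import numpy as np

def mean3(padded):
    # interior-contract stage: (H+2, W+2) -> (H, W) 3x3 box mean
    h, w = padded.shape[0] - 2, padded.shape[1] - 2
    acc = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0

def fused_numpy(data, stages):
    # non-dask fused sketch: pad once with the summed depth, then let
    # each stage consume its share of the overlap
    total_depth = len(stages)  # each toy stage has depth 1
    block = np.pad(data, total_depth, constant_values=np.nan)
    for func in stages:
        block = func(block)
    return block

data = np.ones((6, 6))
fused_result = fused_numpy(data, [mean3, mean3])

# sequential reference: pad/apply each stage independently
seq = data
for _ in range(2):
    seq = mean3(np.pad(seq, 1, constant_values=np.nan))
```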
- -**multi_overlap:** -- Single output matches plain `map_overlap` -- N outputs match N separate `map_overlap` calls + `da.stack` -- Output is an xr.DataArray with `band` leading dimension -- Output shape is `(n_outputs, H, W)` -- Values are identical to the sequential approach -- dtype inference works (None -> input dtype) -- Explicit dtype parameter is respected -- Rejects n_outputs < 1, depth < 1, non-2D input -- Rejects chunks smaller than depth -- Non-dask fallback (numpy) works -- Non-dask fallback (cupy) works -- Accessor delegates correctly -- Coords and attrs are preserved diff --git a/docs/superpowers/specs/2026-03-24-hypsometric-integral-design.md b/docs/superpowers/specs/2026-03-24-hypsometric-integral-design.md deleted file mode 100644 index d6c6350b..00000000 --- a/docs/superpowers/specs/2026-03-24-hypsometric-integral-design.md +++ /dev/null @@ -1,120 +0,0 @@ -# Hypsometric Integral — Design Spec - -## Summary - -Add a `hypsometric_integral` function to `xrspatial/zonal.py` that computes -the hypsometric integral (HI) per zone and returns a painted-back raster. - -HI is a geomorphic maturity indicator defined as: - -``` -HI = (mean - min) / (max - min) -``` - -where mean, min, and max are the elevation values within a zone (basin, -catchment, or arbitrary polygon). - -## API - -```python -def hypsometric_integral( - zones, - values, - nodata=0, - column=None, - rasterize_kw=None, - name='hypsometric_integral', -) -> xr.DataArray: -``` - -### Parameters - -| Parameter | Type | Description | -|-----------|------|-------------| -| `zones` | `DataArray`, `GeoDataFrame`, or list of `(geometry, value)` pairs | 2D integer zone IDs. Vector inputs are rasterized via `_maybe_rasterize_zones` using `values` as the template grid. | -| `values` | `DataArray` | 2D elevation raster (float), same shape as zones. | -| `nodata` | `int`, default `0` | Zone ID that represents "no zone". 
Cells with this zone ID are excluded from computation and filled with `NaN` in the output. Matches `apply()` convention. Set to `None` to include all zone IDs. | -| `column` | `str` or `None` | Column in a GeoDataFrame containing zone IDs. Required when `zones` is a GeoDataFrame. | -| `rasterize_kw` | `dict` or `None` | Extra keyword arguments passed to `rasterize()` when vector zones are provided. | -| `name` | `str` | Name for the output DataArray. Default `'hypsometric_integral'`. | - -### Returns - -`xr.DataArray` (dtype `float64`) — same shape, dims, coords, and attrs as -`values`. Each cell contains the HI of its zone. Cells belonging to the -`nodata` zone or with non-finite elevation values get `NaN`. Zones with zero -elevation range (flat) also get `NaN`. - -## Placement - -Lives in `xrspatial/zonal.py` alongside `stats`, `crosstab`, `apply`, etc. - -Rationale: HI requires a zones raster + a values raster (same signature as -other zonal functions) and computes a per-zone aggregate statistic. It is -structurally a zonal operation, not a local neighborhood transform. - -## Backends - -All four backends via `ArrayTypeFunctionMapping`: - -- **numpy**: use `_sort_and_stride` to group values by zone, compute - min/mean/max per zone, build a zone-to-HI lookup, paint back with - vectorized indexing. -- **cupy**: same logic using `cupy.unique` and device-side scatter/gather. -- **dask+numpy**: compute per-chunk partial aggregates (min, max, sum, count - per zone) via `map_blocks`, reduce across chunks to get global per-zone - stats, then `map_blocks` again to paint HI values back using the global - lookup. Zones and values chunks must be aligned (use `validate_arrays`). -- **dask+cupy**: same two-pass structure. Follows the existing pattern where - chunk functions use cupy internally (same as `_stats_dask_cupy`). - -## Algorithm - -1. Validate inputs (2D, matching shapes via `validate_arrays`). -2. Rasterize vector zones if needed (`_maybe_rasterize_zones`). -3. 
Identify unique zone IDs, excluding `nodata` zone and NaN. -4. For each zone `z`: - - Mask: cells where `zones == z` and `values` is finite. - - Compute `min_z`, `mean_z`, `max_z`. - - `hi_z = (mean_z - min_z) / (max_z - min_z)` if `max_z != min_z`, - else `NaN`. -5. Paint `hi_z` back into all cells belonging to zone `z`. -6. Fill remaining cells (nodata zone, non-finite values, flat zones) with - `NaN`. - -### Value nodata handling - -Only non-finite values (`NaN`, `inf`) are excluded from per-zone statistics. -Users with sentinel nodata values (e.g., -9999) should mask their DEM before -calling this function. This matches the convention used by `apply()`. - -## Accessor - -Expose via the existing `.xrs` accessor: - -```python -elevation.xrs.zonal_hypsometric_integral(zones) -``` - -Following the `zonal_` prefix convention used by `zonal_stats`, `zonal_apply`, -and `zonal_crosstab`. - -## Tests - -- **Hand-crafted case**: zones with known elevation distributions and - pre-computed HI values. -- **Edge cases**: single-cell zones, flat zones (range=0 returns NaN), - NaN cells within a zone (ignored in computation), zones with all-NaN values. -- **Cross-backend parity**: standard `general_checks` pattern comparing - numpy, cupy, dask+numpy, dask+cupy outputs. -- **Vector zones input**: verify GeoDataFrame and list-of-pairs rasterization - paths work. - -## Scope - -This is intentionally minimal. 
Future extensions (not in this iteration): -- Hypsometric curve data (normalized area-altitude distribution) -- Per-zone summary table output -- `zone_ids` parameter to restrict computation to a subset of zones -- Skewness / kurtosis of the hypsometric distribution -- Integration as a stat option in `zonal.stats()` diff --git a/docs/superpowers/specs/2026-03-30-geotiff-perf-controls-design.md b/docs/superpowers/specs/2026-03-30-geotiff-perf-controls-design.md deleted file mode 100644 index 761d5f55..00000000 --- a/docs/superpowers/specs/2026-03-30-geotiff-perf-controls-design.md +++ /dev/null @@ -1,147 +0,0 @@ -# GeoTIFF Performance and Memory Controls - -Adds three parameters to `open_geotiff` and `to_geotiff` that let callers -control memory usage, compression speed, and large-raster write strategy. -All three are opt-in; default behaviour is unchanged. - -## 1. `dtype` parameter on `open_geotiff` - -### API - -```python -open_geotiff(source, *, dtype=None, ...) -``` - -`dtype` accepts any numpy dtype string or object (`np.float32`, `'float32'`, -etc.). `None` preserves the file's native dtype (current behaviour). - -### Read paths - -| Path | Behaviour | -|------|-----------| -| Eager (numpy) | Output array allocated at target dtype. Each decoded tile/strip cast before copy-in. Peak overhead: one tile at native dtype. | -| Dask | Each delayed chunk function casts after decode. Output chunks are target dtype. Same per-tile overhead. | -| GPU (CuPy) | Cast on device after decode. | -| Dask + CuPy | Combination of dask and GPU paths. | - -### Numba LZW fast path - -The LZW decoder is a numba JIT function that emits values one at a time into a -byte buffer. A variant will decode each value and cast inline to the target -dtype so the per-tile buffer is never allocated at native dtype. Other codecs -(deflate, zstd) return byte buffers from C libraries where per-value -interception isn't possible, so those fall back to the tile-level cast. 
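The eager-path memory profile above can be sketched as follows; the decoder, tile layout, and function name are toy stand-ins, not the real reader:

```python
import numpy as np

def decode_tiles_cast(encoded_tiles, decode, shape, tile_shape, dtype):
    # allocate the output at the target dtype up front; each decoded tile
    # is cast on copy-in, so only one tile ever exists at native dtype
    out = np.empty(shape, dtype=dtype)
    th, tw = tile_shape
    for (r, c), buf in encoded_tiles.items():
        out[r:r + th, c:c + tw] = decode(buf)   # cast happens here
    return out

# toy setup: two 2x2 float64 tiles, read back as float32
tiles = {(0, 0): np.full((2, 2), 1.5).tobytes(),
         (0, 2): np.full((2, 2), 2.5).tobytes()}
decode = lambda buf: np.frombuffer(buf, dtype="float64").reshape(2, 2)
arr = decode_tiles_cast(tiles, decode, (2, 4), (2, 2), "float32")
```

With `dtype='float32'` on a float64 file, this halves the resident output array while keeping the per-tile overhead to a single native-dtype tile.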
- -### Validation - -- Narrowing float casts (float64 to float32): allowed. -- Narrowing int casts (int64 to int16): allowed (user asked for it explicitly). -- Widening casts (float32 to float64, uint8 to int32): allowed. -- Float to int: `ValueError` (lossy in a way users often don't intend). -- Unsupported casts (e.g. complex128 to uint8): `ValueError`. - -## 2. `compression_level` parameter on `to_geotiff` - -### API - -```python -to_geotiff(data, path, *, compression='zstd', compression_level=None, ...) -``` - -`compression_level` is `int | None`. `None` uses the codec's existing default. - -### Ranges - -| Codec | Range | Default | Direction | -|-------|-------|---------|-----------| -| deflate | 1 -- 9 | 6 | 1 = fastest, 9 = smallest | -| zstd | 1 -- 22 | 3 | 1 = fastest, 22 = smallest | -| lz4 | 0 -- 16 | 0 | 0 = fastest | -| lzw | n/a | n/a | No level support; ignored silently | -| jpeg | n/a | n/a | Quality is a separate axis; ignored | -| packbits | n/a | n/a | Ignored | -| none | n/a | n/a | Ignored | - -### Plumbing - -`to_geotiff` passes `compression_level` to `write()`, which passes it to -`compress()`. The internal `compress()` already accepts a `level` argument; we -just thread it through the two intermediate call sites that currently hardcode -it. - -### Validation - -- Out-of-range level for a codec that supports levels: `ValueError`. -- Level set for a codec without level support: silently ignored. - -### GPU path - -`write_geotiff_gpu` also accepts and forwards the level to nvCOMP batch -compression, which supports levels for zstd and deflate. - -## 3. VRT output from `to_geotiff` via `.vrt` extension - -### Trigger - -When `path` ends in `.vrt`, `to_geotiff` writes a tiled VRT instead of a -monolithic TIFF. No new parameter needed. - -### Output layout - -``` -output.vrt -output_tiles/ - tile_0000_0000.tif # row_col, zero-padded - tile_0000_0001.tif - ... -``` - -Directory name derived from the VRT stem (`foo.vrt` -> `foo_tiles/`). 
-Zero-padding width scales to the grid dimensions. - -### Behaviour per input type - -| Input | Tiling strategy | Memory profile | -|-------|----------------|----------------| -| Dask DataArray | One tile per dask chunk. Each task computes its chunk and writes one `.tif`. | One chunk in RAM at a time (scheduler controlled). | -| Dask + CuPy | Same, GPU compress per tile. | One chunk in GPU memory at a time. | -| Numpy / ndarray | Slice into `tile_size`-sized pieces, write each. | Source array already in RAM; tile slices are views (no duplication). | -| CuPy | Same as numpy but GPU compress. | Source on GPU; tiles are views. | - -### Per-tile properties - -- Same `compression`, `compression_level`, `predictor`, `nodata`, `crs` as the - parent call. -- `tiled=True` with the caller's `tile_size` (internal TIFF tiling within each - chunk-file). -- GeoTransform adjusted to each tile's spatial position (row/col offset from - the full raster origin). -- No COG overviews on individual tiles. - -### VRT generation - -After all tiles are written, call `write_vrt()` with relative paths. The VRT -XML references each tile by its spatial extent and band mapping. - -### Edge cases and validation - -- `cog=True` with a `.vrt` path: `ValueError` (mutually exclusive). -- Tiles directory exists and is non-empty: `FileExistsError` to prevent silent - overwrites. -- Tiles directory doesn't exist: created automatically. -- `overview_levels` with `.vrt` path: `ValueError` (overviews don't apply). - -### Dask scheduling - -For dask inputs, all delayed tile-write tasks are submitted to -`dask.compute()` at once. The scheduler manages parallelism and memory. Each -task is: compute chunk, compress, write tile file. No coordination between -tasks. - -## Out of scope - -- Streaming write of a monolithic `.tif` from dask input (tracked as a separate - issue). Users who need a single file from a large dask array can write to VRT - and convert externally, or ensure sufficient RAM. 
-- JPEG quality parameter (separate concern from compression level). -- Automatic chunk-size recommendation based on available memory. diff --git a/docs/superpowers/specs/2026-03-31-sweep-performance-design.md b/docs/superpowers/specs/2026-03-31-sweep-performance-design.md deleted file mode 100644 index d544e0af..00000000 --- a/docs/superpowers/specs/2026-03-31-sweep-performance-design.md +++ /dev/null @@ -1,368 +0,0 @@ -# Sweep-Performance: Parallel Performance Triage and Fix Workflow - -**Date:** 2026-03-31 -**Status:** Draft - -## Overview - -A `/sweep-performance` slash command that audits every xrspatial module for -performance bottlenecks, OOM risk under large-scale dask workloads, and -backend-specific anti-patterns. Uses parallel subagents for fast static triage, -then a sequential ralph-loop to benchmark and fix confirmed HIGH-severity -issues. - -The central question for every dask backend: "If the data on disk was 30TB -and the machine only had 16GB of RAM, would this tool cause an out-of-memory -error?" - -## Scope - -All `.py` modules under `xrspatial/` plus the `geotiff/` and `reproject/` -subpackages. Excludes `__init__.py`, `_version.py`, `__main__.py`, `utils.py`, -`accessor.py`, `preview.py`, `dataset_support.py`, `diagnostics.py`, -`analytics.py`. 
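The scope rule reduces to a small path filter; `modules_in_scope` is hypothetical and the real command may collect files differently:

```python
from pathlib import Path

EXCLUDE = {"__init__.py", "_version.py", "__main__.py", "utils.py",
           "accessor.py", "preview.py", "dataset_support.py",
           "diagnostics.py", "analytics.py"}

def modules_in_scope(package_root):
    # every .py under the package, including the geotiff/ and reproject/
    # subpackages, minus the exclusion list
    return sorted(p for p in Path(package_root).rglob("*.py")
                  if p.name not in EXCLUDE)
```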
- -## Architecture - -Two phases in a single invocation: - -``` -/sweep-performance - | - +-- Phase 1: Parallel Static Triage - | |-- Score & rank modules (git metadata + complexity heuristics) - | |-- Dispatch one subagent per module - | | |-- Static analysis (dask, GPU, memory, Numba patterns) - | | |-- 30TB/16GB OOM simulation (task graph construction, no compute) - | | +-- Return structured JSON findings - | |-- Merge results into ranked report - | +-- Update state file - | - +-- Phase 2: Ralph-Loop (HIGH severity only) - |-- Generate /ralph-loop command targeting HIGH modules - |-- Each iteration: - | |-- Real benchmarks (wall time, tracemalloc, RSS, CuPy pool) - | |-- Confirm finding is not false positive - | |-- /rockout to fix - | |-- Post-fix benchmark comparison - | +-- Update state file - +-- User pastes command to start -``` - ---- - -## Phase 1: Module Scoring - -For every module in scope, collect via git: - -| Field | Source | -|--------------------|-----------------------------------------------------------| -| `last_modified` | `git log -1 --format=%aI -- ` | -| `total_commits` | `git log --oneline -- \| wc -l` | -| `loc` | `wc -l < ` | -| `has_dask_backend` | grep for `_run_dask`, `map_overlap`, `map_blocks` | -| `has_cuda_backend` | grep for `@cuda.jit`, `import cupy` | -| `is_io_module` | module is in geotiff/ or reproject/ | -| `has_existing_bench` | matching file exists in `benchmarks/benchmarks/` | - -### Scoring Formula - -``` -days_since_inspected = (today - last_perf_inspected).days # 9999 if never -days_since_modified = (today - last_modified).days - -score = (days_since_inspected * 3) - + (loc * 0.1) - + (total_commits * 0.5) - + (has_dask_backend * 200) - + (has_cuda_backend * 150) - + (is_io_module * 300) - - (days_since_modified * 0.2) - - (has_existing_bench * 100) -``` - -Rationale: -- Never-inspected modules dominate (9999 * 3 = ~30,000). -- Dask and CUDA backends boosted: that is where OOM and perf bugs live. 
-- I/O modules get the highest boost: most relevant for 30TB question. -- Larger modules more likely to contain issues. -- Existing ASV benchmarks slightly deprioritize (perf already considered). - ---- - -## Phase 1: Subagent Static Analysis - -One subagent per module. Each performs the checks below and returns a -structured JSON blob. - -### Dask Path Analysis - -- `.values` on dask-backed DataArray (premature materialization) — **HIGH** -- `.compute()` inside a loop — **HIGH** -- `np.array()` / `np.asarray()` wrapping dask or CuPy array — **HIGH** -- `da.stack()` without `.rechunk()` — **MEDIUM** -- `map_overlap` with depth >= chunk_size / 4 — **MEDIUM** -- Missing `boundary` argument in `map_overlap` — **MEDIUM** -- Redundant computation (same function called twice on same input) — **MEDIUM** -- Python loops over dask chunks (serializes the graph) — **MEDIUM** - -### 30TB / 16GB OOM Verdict - -Two-part analysis for each dask code path: - -**Part 1 — Static trace.** Follow the dask code path and answer: does peak -memory scale with total array size, or with chunk size? If any step forces -full materialization, verdict is WILL OOM. - -**Part 2 — Task graph simulation.** Write and execute a script that: - -```python -import dask.array as da -import xarray as xr - -# Use a representative grid (2560x2560, 10x10 = 100 chunks) to inspect -# graph structure. The pattern is identical at any scale — what matters -# is whether the graph fans out, materializes, or stays chunk-local. 
-arr = da.zeros((2560, 2560), chunks=(256, 256), dtype='float64') -raster = xr.DataArray(arr, dims=['y', 'x']) - -# Call the function lazily -result = module_function(raster, **default_args) - -# Inspect the graph without executing -graph = result.__dask_graph__() -task_count = len(graph) -tasks_per_chunk = task_count / 100 # normalize to per-chunk - -# Check for fan-out patterns or full-materialization nodes -# Extrapolate to 30TB: ~57 million chunks at 256x256 float64 -# If tasks_per_chunk is constant => graph scales linearly => SAFE -# If any node depends on all chunks => full materialization => WILL OOM -``` - -The script constructs the graph only, never calls `.compute()`. Reports: -- Task count and tasks-per-chunk ratio -- Estimated peak memory per chunk (MB) -- Whether the graph contains fan-out or materialization nodes -- Extrapolation to 30TB: linear graph growth (SAFE) vs fan-out (WILL OOM) - -**Verdict**: `SAFE`, `RISKY` (bounded but tight), or `WILL OOM` (unbounded -or materializes). 
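
The fan-out check in Part 2 can be condensed into a small helper that flattens the graph and inspects per-task dependency counts. A sketch under the verdict rules above; `audit_graph` and its thresholds are illustrative, not part of the command:

```python
import dask.array as da
from dask.core import get_dependencies

def audit_graph(result, n_chunks):
    """Apply the verdict rules above to a lazy result.

    Constant tasks-per-chunk means the graph grows linearly with data (SAFE).
    A task whose dependency count reaches the chunk count is a fan-in /
    full-materialization node (WILL OOM at 30TB).
    """
    graph = dict(result.__dask_graph__())  # flatten the HighLevelGraph
    fan_in = max((len(get_dependencies(graph, k)) for k in graph), default=0)
    verdict = "WILL OOM" if fan_in >= n_chunks else "SAFE"
    return verdict, len(graph) / n_chunks

arr = da.zeros((2560, 2560), chunks=(256, 256), dtype="float64")
# A chunk-local elementwise op: every task depends on exactly one chunk.
verdict, tpc = audit_graph(arr + 1, n_chunks=100)
```

Running the same audit at two grid sizes and comparing tasks-per-chunk gives the linear-growth extrapolation described above.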
- -### GPU Transfer Analysis - -- `.data.get()` followed by CuPy ops (GPU-CPU-GPU round-trip) — **HIGH** -- `cupy.asarray()` inside a hot loop — **HIGH** -- Mixing NumPy/CuPy ops without reason — **MEDIUM** -- Register pressure: >20 float64 locals in `@cuda.jit` kernel — **MEDIUM** -- Thread blocks >16x16 on register-heavy kernels — **MEDIUM** - -### Memory Allocation Patterns - -- Unnecessary `.copy()` on arrays never mutated — **MEDIUM** -- `np.zeros_like()` + fill loop (could be `np.empty()`) — **LOW** -- Large temporary arrays that could be fused into the kernel — **MEDIUM** - -### Numba Anti-Patterns - -- Missing `@ngjit` on nested for-loops over `.data` arrays — **MEDIUM** -- `@jit` without `nopython=True` (object-mode fallback risk) — **MEDIUM** -- Type instability (int/float mixing in Numba functions) — **LOW** -- Column-major iteration on row-major arrays (cache-unfriendly) — **LOW** - -### Bottleneck Classification - -Based on static analysis, classify the module as one of: -- **IO-bound** — dominated by disk reads/writes or serialization -- **Memory-bound** — peak allocation is the limiting factor -- **Compute-bound** — CPU/GPU time dominates, memory is fine -- **Graph-bound** — dask task graph overhead dominates (too many small tasks) - -### Subagent Output Schema - -```json -{ - "module": "slope", - "files_read": ["xrspatial/slope.py"], - "findings": [ - { - "severity": "HIGH", - "category": "dask_materialization", - "file": "slope.py", - "line": 142, - "description": ".values on dask input in _run_dask", - "fix": "Use .data.compute() or restructure to stay lazy", - "backends_affected": ["dask+numpy", "dask+cupy"] - } - ], - "oom_verdict": { - "dask_numpy": "SAFE", - "dask_cupy": "SAFE", - "reasoning": "map_overlap with depth=1, memory bounded by chunk size", - "estimated_peak_per_chunk_mb": 0.5, - "task_count": 3721, - "graph_simulation_ran": true - }, - "bottleneck": "compute-bound", - "bottleneck_reasoning": "3x3 kernel with Numba JIT, no I/O, small 
overlap" -} -``` - ---- - -## Phase 1: Merged Report - -After all subagents return, print a consolidated report. - -### Module Risk Ranking Table - -``` -| Rank | Module | Score | OOM Verdict | Bottleneck | HIGH | MED | LOW | -|------|---------------|-------|-----------------|--------------|------|-----|-----| -| 1 | geotiff | 31200 | WILL OOM (d+np) | IO-bound | 3 | 1 | 0 | -| 2 | viewshed | 30050 | RISKY (d+np) | memory-bound | 2 | 2 | 1 | -| ... | ... | ... | ... | ... | ... | ... | ... | -``` - -### 30TB / 16GB Verdict Summary - -Grouped by verdict: - -- **WILL OOM (fix required):** list modules with reasoning -- **RISKY (bounded but tight):** list modules with reasoning -- **SAFE (memory bounded by chunk size):** list modules - -### Detailed Findings - -Per-module table of all findings grouped by severity (file:line, pattern, -description, fix). - -### Actionable Rockout Commands - -For each HIGH-severity finding, a ready-to-paste `/rockout` command. - -### State File Update - -Write `.claude/performance-sweep-state.json`: - -```json -{ - "last_triage": "2026-03-31T14:00:00Z", - "modules": { - "slope": { - "last_inspected": "2026-03-31T14:00:00Z", - "oom_verdict": "SAFE", - "bottleneck": "compute-bound", - "high_count": 0, - "issue": null - } - } -} -``` - ---- - -## Phase 2: Ralph-Loop for HIGH Severity Fixes - -Collect all modules with at least one HIGH-severity finding. Generate a -`/ralph-loop` command targeting them in priority order. - -### Each Iteration - -1. **Benchmark** the module on a moderate array (512x512 default) across all - available backends. Measure four metrics per backend per function: - - Wall time: `timeit.repeat(number=1, repeat=3)`, median - - Python memory: `tracemalloc.get_traced_memory()` peak - - Process memory: `resource.getrusage(RUSAGE_SELF).ru_maxrss` delta - - GPU memory (if CuPy): `cupy.get_default_memory_pool().used_bytes()` delta - -2. **Confirm the static finding** from Phase 1 is real. 
If the benchmark - shows the issue does not manifest (false positive), downgrade to MEDIUM - in the report and skip to next module. - -3. **Classify the bottleneck** with measured data: - - IO-bound: wall time dominated by read/write, low CPU - - Memory-bound: peak RSS much larger than expected for chunk size - - Compute-bound: CPU pegged, memory stable - - Graph-bound: dask task count extremely high, scheduler overhead visible - -4. **Run `/rockout`** to fix the confirmed issue (GitHub issue, worktree, - implementation, tests, docs). - -5. **Post-fix benchmark** — rerun the same benchmark. Report before/after - delta. - -6. **Update state** — record the fix in - `.claude/performance-sweep-state.json` with issue number. - -7. Output `ITERATION DONE`. - -### Generated Command Shape - -``` -/ralph-loop "Performance sweep Phase 2: benchmark and fix HIGH-severity findings. - -**Target modules in priority order:** -1. geotiff (3 HIGH findings, WILL OOM) -- eager .values materialization -2. cost_distance (1 HIGH finding, WILL OOM) -- iterative solver unbounded memory - -**For each module:** -1. Write and run a benchmark script measuring wall time, peak memory - (tracemalloc + RSS + CuPy pool) across all available backends -2. Confirm the HIGH finding from Phase 1 triage is real -3. If confirmed: run /rockout to fix it end-to-end -4. After rockout: rerun benchmark, report before/after delta -5. Update .claude/performance-sweep-state.json -6. Output ITERATION DONE - -If all targets addressed: ALL PERFORMANCE ISSUES FIXED." ---max-iterations {N+2} --completion-promise "ALL PERFORMANCE ISSUES FIXED" -``` - -### Reminder Text - -``` -Phase 1 triage complete. To proceed with fixes: - Copy the ralph-loop command above and paste it. 
- -Other options: - Fix one manually: copy any /rockout command from the report above - Rerun triage only: /sweep-performance --report-only - Skip Phase 1: /sweep-performance --skip-phase1 (reuses last triage) - Reset all tracking: /sweep-performance --reset-state -``` - ---- - -## Arguments - -| Argument | Effect | -|--------------------|------------------------------------------------------------| -| `--top N` | Limit Phase 1 subagents to top N scored modules (default: all) | -| `--exclude m1,m2` | Remove named modules from scope | -| `--only-terrain` | slope, aspect, curvature, terrain, terrain_metrics, hillshade, sky_view_factor | -| `--only-focal` | focal, convolution, morphology, bilateral, edge_detection, glcm | -| `--only-hydro` | flood, cost_distance, geodesic, surface_distance, viewshed, erosion, diffusion | -| `--only-io` | geotiff, reproject, rasterize, polygonize | -| `--reset-state` | Delete state file and start fresh | -| `--skip-phase1` | Reuse last triage state, go straight to ralph-loop generation | -| `--report-only` | Run Phase 1 only, no ralph-loop command | -| `--size small` | Benchmark at 128x128 | -| `--size large` | Benchmark at 2048x2048 | -| `--high-only` | Only report HIGH severity findings | - -Default (no arguments): audit all modules, benchmark at 512x512, generate -ralph-loop for HIGH items. - ---- - -## General Rules - -- Phase 1 subagents do NOT modify source files. Read-only analysis. -- Phase 2 ralph-loop modifies code only through `/rockout`. -- Temporary benchmark scripts go in `/tmp/` with unique names. -- Only flag patterns actually present in the code; no hypothetical issues. -- Include exact file path and line number for every finding. -- False positives are worse than missed issues. -- The 30TB simulation constructs the dask graph only; it never calls `.compute()`. -- State file (`.claude/performance-sweep-state.json`) is gitignored by convention. 
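
The four per-iteration metrics can be sketched as a minimal CPU-only harness (stdlib only; `resource` is Unix-specific, the CuPy pool delta is omitted, and RSS is reported as a high-water mark rather than a delta — names here are illustrative):

```python
import resource
import timeit
import tracemalloc

def bench(fn, *args, repeat=3):
    """Median wall time plus Python-level and process-level peak memory."""
    wall = sorted(timeit.repeat(lambda: fn(*args), number=1, repeat=repeat))
    tracemalloc.start()
    fn(*args)
    _, py_peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # ru_maxrss is a lifetime high-water mark: KiB on Linux, bytes on macOS.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return {
        "wall_s": wall[repeat // 2],
        "py_peak_mb": py_peak / 1e6,
        "ru_maxrss": rss,
    }

stats = bench(lambda n: sum(range(n)), 100_000)
```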
diff --git a/docs/superpowers/specs/2026-04-01-multi-observer-viewshed-design.md b/docs/superpowers/specs/2026-04-01-multi-observer-viewshed-design.md deleted file mode 100644 index 7ebea641..00000000 --- a/docs/superpowers/specs/2026-04-01-multi-observer-viewshed-design.md +++ /dev/null @@ -1,192 +0,0 @@ -# Multi-Observer Viewshed and Line-of-Sight Profiles - -**Issue:** xarray-contrib/xarray-spatial#1145 -**Date:** 2026-04-01 -**Module:** `xrspatial/visibility.py` (new) - -## Overview - -Add three public functions for multi-observer visibility analysis and -point-to-point line-of-sight profiling. All functions build on the existing -single-observer `viewshed()` rather than reimplementing the sweep algorithm. - -## Public API - -### `cumulative_viewshed` - -```python -def cumulative_viewshed( - raster: xarray.DataArray, - observers: list[dict], - target_elev: float = 0, - max_distance: float | None = None, -) -> xarray.DataArray: -``` - -**Parameters:** - -- `raster` -- elevation DataArray (any backend). -- `observers` -- list of dicts. Required keys: `x`, `y`. Optional keys: - `observer_elev` (default 0), `target_elev` (overrides the function-level - default), `max_distance` (per-observer override). -- `target_elev` -- default target elevation for observers that don't specify - their own. -- `max_distance` -- default maximum analysis radius for observers that don't - specify their own. - -**Returns:** integer DataArray where each cell value is the count of observers -with line-of-sight to that cell. Cells visible to zero observers are 0. - -**Algorithm:** - -1. For each observer, call `viewshed(raster, ...)` with that observer's - parameters. -2. Convert the result to a binary mask (1 where value != INVISIBLE, else 0). -3. Sum all masks element-wise. - -**Backend behaviour:** - -| Backend | Strategy | -|---|---| -| NumPy | Loop over observers, accumulate in-place into a numpy int32 array. | -| CuPy | Delegates to `viewshed()` which handles CuPy dispatch. 
Accumulate on device. | -| Dask+NumPy | Wrap each `viewshed()` call as a `dask.delayed` task, convert each result to a binary dask array, sum lazily. The graph is submitted once at the end. | -| Dask+CuPy | Same as Dask+NumPy -- `viewshed()` handles the backend internally. | - -For dask backends, `max_distance` should be set (either globally or -per-observer) to keep each viewshed computation tractable. Without it, each -observer viewshed loads the full raster into memory. - -### `visibility_frequency` - -```python -def visibility_frequency( - raster: xarray.DataArray, - observers: list[dict], - target_elev: float = 0, - max_distance: float | None = None, -) -> xarray.DataArray: -``` - -Thin wrapper: returns `cumulative_viewshed(...) / len(observers)` cast to -float64. Same parameters and backend behaviour. - -### `line_of_sight` - -```python -def line_of_sight( - raster: xarray.DataArray, - x0: float, y0: float, - x1: float, y1: float, - observer_elev: float = 0, - target_elev: float = 0, - frequency_mhz: float | None = None, -) -> xarray.Dataset: -``` - -**Parameters:** - -- `raster` -- elevation DataArray. -- `x0, y0` -- observer coordinates in data space. -- `x1, y1` -- target coordinates in data space. -- `observer_elev` -- height above terrain at the observer point. -- `target_elev` -- height above terrain at the target point. -- `frequency_mhz` -- if set, compute first Fresnel zone clearance. 
- -**Returns:** `xarray.Dataset` with dimension `sample` (one entry per cell -along the transect) containing: - -| Variable | Type | Description | -|---|---|---| -| `distance` | float64 | Distance from observer along the transect | -| `elevation` | float64 | Terrain elevation at the sample point | -| `los_height` | float64 | Height of the line-of-sight ray at that point | -| `visible` | bool | Whether the cell is visible from the observer | -| `x` | float64 | x-coordinate of the sample point | -| `y` | float64 | y-coordinate of the sample point | -| `fresnel_radius` | float64 | First Fresnel zone radius (only if `frequency_mhz` set) | -| `fresnel_clear` | bool | Whether Fresnel zone is clear of terrain (only if `frequency_mhz` set) | - -**Algorithm:** - -1. Convert (x0, y0) and (x1, y1) to grid indices using the raster's - coordinate arrays. -2. Walk the line between the two grid cells using Bresenham's algorithm to - get the sequence of (row, col) pairs. -3. Extract terrain elevation at each cell. For dask/cupy rasters, pull only - the transect cells to numpy. -4. Compute the straight-line LOS height at each sample point by linear - interpolation between observer and target heights (terrain + offsets). -5. Walk forward from the observer tracking the maximum elevation angle seen - so far. A cell is visible if no prior cell has a higher angle. -6. If `frequency_mhz` is set, compute the first Fresnel zone radius at each - point: `F1 = sqrt(d1 * d2 * c / (f * D))` where d1 and d2 are distances - from observer and target, D is total distance, f is frequency, and c is - the speed of light. A point has Fresnel clearance if the terrain is at - least F1 below the LOS height. - -**Backend behaviour:** Always runs on CPU. For CuPy-backed rasters, the -transect elevations are copied to host. For dask-backed rasters, the transect -slice is computed. The transect is at most `max(H, W)` cells long so this is -always cheap. 
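
Steps 2 and 5 of the algorithm above — the Bresenham walk and the max-angle visibility sweep — can be sketched in plain NumPy (illustrative helpers, not the module's actual internals):

```python
import numpy as np

def bresenham(r0, c0, r1, c1):
    """All-quadrant Bresenham walk; returns the (row, col) cells visited."""
    dr, dc = abs(r1 - r0), -abs(c1 - c0)
    sr, sc = (1 if r0 < r1 else -1), (1 if c0 < c1 else -1)
    err, cells = dr + dc, []
    while True:
        cells.append((r0, c0))
        if (r0, c0) == (r1, c1):
            return cells
        e2 = 2 * err
        if e2 >= dc:
            err, r0 = err + dc, r0 + sr
        if e2 <= dr:
            err, c0 = err + dr, c0 + sc

def visible_along(elev, cells, observer_elev=0.0):
    """Max elevation-angle sweep: a cell is visible only if no earlier
    cell along the transect subtends a higher angle from the observer."""
    z0 = elev[cells[0]] + observer_elev
    max_tan, vis = -np.inf, [True]
    for r, c in cells[1:]:
        d = np.hypot(r - cells[0][0], c - cells[0][1])
        tan = (elev[r, c] - z0) / d
        vis.append(tan >= max_tan)
        max_tan = max(max_tan, tan)
    return vis
```

On a flat transect every cell passes the sweep; a single tall cell shadows everything behind it, matching the test cases listed below.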
- -## Module structure - -``` -xrspatial/visibility.py - cumulative_viewshed() -- public - visibility_frequency() -- public - line_of_sight() -- public - _bresenham_line() -- private, returns list of (row, col) pairs - _extract_transect() -- private, pulls elevation values from any backend - _fresnel_radius() -- private, first Fresnel zone calculation -``` - -## Integration points - -- `xrspatial/__init__.py` -- add imports for all three functions. -- `xrspatial/accessor.py` -- add accessor methods for all three functions. -- `docs/source/reference/surface.rst` -- add a "Visibility Analysis" section - with autosummary entries. -- `README.md` -- add rows for `cumulative_viewshed`, `visibility_frequency`, - and `line_of_sight` in the feature matrix. -- `examples/user_guide/37_Visibility_Analysis.ipynb` -- new notebook. - -## Testing strategy - -Tests go in `xrspatial/tests/test_visibility.py`. - -**cumulative_viewshed / visibility_frequency:** - -- Flat terrain: all cells visible from all observers, count == n_observers. -- Single tall wall: observers on opposite sides, verify cells behind the wall - are only visible to the observer on their side. -- Single observer: result matches `(viewshed(...) != INVISIBLE).astype(int)`. -- Per-observer parameters: verify that `observer_elev` and `max_distance` - overrides work. -- Dask backend: verify result matches numpy backend. - -**line_of_sight:** - -- Flat terrain: all cells visible, LOS height matches linear interpolation. -- Single obstruction: cells behind the peak are not visible. -- Observer/target elevation offsets: verify LOS line shifts up. -- Fresnel zone: known geometry where the zone is partially obstructed. -- Edge case: observer == target (single cell, trivially visible). -- Bresenham correctness: verify the line visits expected cells for known - endpoints. - -## Scope boundaries - -**In scope:** the three functions described above, tests, docs, notebook, -README update. 
- -**Out of scope:** - -- Refactoring existing `viewshed.py` internals. -- GPU-specific kernels for cumulative viewshed (composition via `viewshed()` - is sufficient). -- Weighted observer contributions (each observer counts as 1). -- Earth curvature correction for line-of-sight (the transect is typically - short enough that curvature is negligible; users working at very long - distances should use geodesic viewshed). diff --git a/docs/superpowers/specs/2026-04-01-spatial-autocorrelation-design.md b/docs/superpowers/specs/2026-04-01-spatial-autocorrelation-design.md deleted file mode 100644 index a34bb580..00000000 --- a/docs/superpowers/specs/2026-04-01-spatial-autocorrelation-design.md +++ /dev/null @@ -1,259 +0,0 @@ -# Spatial Autocorrelation: Moran's I and LISA - -**Issue:** #1135 (partial -- Global Moran's I, Local Moran's I, queen/rook contiguity) -**Date:** 2026-04-01 - -## Scope - -This spec covers the first increment of #1135: - -- Global Moran's I with analytical inference -- Local Moran's I (LISA) with permutation-based pseudo p-values -- Queen and rook contiguity weights (3x3 kernel) - -Geary's C, join count statistics, and distance-band weights are deferred to follow-up issues. - -## Public API - -Two functions in `xrspatial/autocorrelation.py`: - -### `morans_i` - -```python -def morans_i( - raster: xr.DataArray, - contiguity: str = 'queen', - boundary: str = 'nan', -) -> xr.DataArray: -``` - -Returns a scalar (0-dimensional) DataArray. The `.item()` value is the I statistic. 
-Attrs carry analytical inference results: - -| Attr | Type | Description | -|------|------|-------------| -| `expected_I` | float | -1/(N-1) | -| `variance_I` | float | Cliff & Ord analytical variance | -| `z_score` | float | (I - E[I]) / sqrt(Var[I]) | -| `p_value` | float | Two-sided, from normal approximation | -| `N` | int | Count of non-NaN pixels | -| `S0` | float | Sum of all weights | -| `contiguity` | str | 'queen' or 'rook' | - -### `lisa` - -```python -def lisa( - raster: xr.DataArray, - contiguity: str = 'queen', - n_permutations: int = 999, - boundary: str = 'nan', -) -> xr.Dataset: -``` - -Returns a Dataset with three DataVariables: - -| Variable | Dims | Dtype | Description | -|----------|------|-------|-------------| -| `lisa_values` | (y, x) | float32 | Local I_i per pixel | -| `p_values` | (y, x) | float32 | Pseudo p-value from permutation | -| `cluster` | (y, x) | int8 | 0=NS, 1=HH, 2=LL, 3=HL, 4=LH | - -Dataset attrs: `n_permutations`, `contiguity`, `global_morans_I`. - -Cluster codes use significance threshold p <= 0.05. Pixels with p > 0.05 get code 0 regardless of their quadrant. - -## Mathematics - -### Global Moran's I - -``` -z = x - mean(x) -lag = convolve(z, W) # W = queen or rook kernel -I = (N / S0) * sum(z * lag) / sum(z^2) -``` - -S0 (total weight sum) accounts for border effects and NaN gaps by convolving a non-NaN mask with the weight kernel and summing the result. - -Analytical expected value and variance follow Cliff & Ord (1981): - -``` -E[I] = -1 / (N - 1) -Var[I] uses S0, S1, S2, N, and the kurtosis of z -``` - -Where S1 = (1/2) * sum_ij (w_ij + w_ji)^2 and S2 = sum_i (sum_j w_ij + sum_j w_ji)^2. For symmetric binary weights (queen/rook), S1 = 2 * S0 and S2 simplifies. - -### Local Moran's I (LISA) - -``` -I_i = (z_i / var(x)) * sum_j(w_ij * z_j) -``` - -### Permutation pseudo p-value - -For each pixel i: -1. Extract the neighbor values (up to 8 for queen, 4 for rook). -2. 
Draw replacement values for the neighbor positions at random from the remaining non-NaN pixels, n_permutations times (conditional permutation). Shuffling the fixed neighbor set among itself would leave the equal-weight lag, and hence I_i, unchanged. -3. Recompute I_i with each permuted set. -4. p_value = (count(|I_perm| >= |I_obs|) + 1) / (n_permutations + 1) - -The +1 correction includes the observed value as one permutation (Davison & Hinkley, 1997). - -### Cluster classification - -| Code | Label | Condition (when p <= 0.05) | -|------|-------|---------------------------| -| 0 | NS | not significant | -| 1 | HH | z_i > 0 and lag_i > 0 | -| 2 | LL | z_i < 0 and lag_i < 0 | -| 3 | HL | z_i > 0 and lag_i < 0 | -| 4 | LH | z_i < 0 and lag_i > 0 | - -### NaN handling - -- NaN input pixels produce NaN in lisa_values and p_values, 0 in cluster. -- NaN neighbors are excluded from lag sums (their weight drops to zero). -- Constant rasters (zero variance) produce NaN for all statistics. - -## Contiguity kernels - -Queen (8 neighbors): -``` -[[1, 1, 1], - [1, 0, 1], - [1, 1, 1]] -``` - -Rook (4 neighbors): -``` -[[0, 1, 0], - [1, 0, 1], - [0, 1, 0]] -``` - -## Internal Architecture - -### Backend dispatch - -Both public functions validate input via `_validate_raster`, build the contiguity kernel, then dispatch through `ArrayTypeFunctionMapping`: - -``` -morans_i / lisa - -> _validate_raster(raster, ndim=2) - -> kernel = _contiguity_kernel(contiguity) - -> ArrayTypeFunctionMapping(numpy, cupy, dask, dask_cupy)(raster)(...) -``` - -### Backend implementations - -**numpy:** Compute mean/var eagerly. Spatial lag via `_convolve_2d_numpy` (imported from convolution module) or a local implementation. LISA permutation via `@ngjit` loop that iterates pixels, extracts 8 neighbors, draws random replacement neighbors n_permutations times. - -**cupy:** Same structure but GPU arrays. Spatial lag via CuPy convolution. Permutation via `@cuda.jit` kernel where each thread handles one pixel. RNG uses `numba.cuda.random.xoroshiro128p_uniform_float32` for the permutation draws. 
Pass as scalars to chunk function via `partial()`. Single `map_overlap(depth=1, boundary=...)` call with fused chunk function that computes lag + I_i + permutation + cluster. Returns `(3, H, W)` float32 array (lisa_values, p_values, cluster). Unpacked into Dataset after. - -**dask+cupy:** Same structure as dask+numpy but chunk function dispatches to CUDA kernel internally. - -### Fused LISA chunk function - -```python -def _lisa_chunk_numpy(chunk, kernel, global_mean, global_var, n_permutations, seed): - """Single-pass: lag + LISA + permutation + cluster for one chunk.""" - rows, cols = chunk.shape - z = chunk - global_mean - out = np.empty((3, rows, cols), dtype=np.float32) - # For each pixel: read neighbors, compute lag, permute, classify - _lisa_fused_ngjit(z, kernel, global_var, n_permutations, seed, out) - return out -``` - -The `@ngjit` inner function handles the pixel loop with neighbor extraction, lag computation, Fisher-Yates permutation, and cluster assignment in a single pass. - -### Global Moran's I backend flow - -``` -numpy: - z = data - mean - lag = convolve(z, kernel) # reuse focal convolution - mask = ~isnan(data) - S0 = sum(convolve(mask, kernel)) # total weight count - N = sum(mask) - I = (N / S0) * nansum(z * lag) / nansum(z^2) - # analytical variance via S0, S1, S2, N - -dask: - mean, var, N = da.compute(da.nanmean(data), da.nanvar(data), da.sum(~da.isnan(data))) - z = data - mean - lag = map_overlap(_convolve_chunk, depth=1, ...) - S0 = da.sum(map_overlap(_convolve_mask_chunk, depth=1, ...)) - I = (N / S0) * da.nansum(z * lag) / da.nansum(z**2) - I = I.compute() -``` - -## map_overlap specifics - -- **depth:** 1 in both dimensions (3x3 kernel, half-size = 1) -- **boundary:** passed through `_boundary_to_dask(boundary)`, default `np.nan` -- **meta:** `np.array((), dtype=np.float32)` for numpy, `cupy.array((), dtype=cupy.float32)` for cupy -- **Fused output:** LISA chunk function receives a 2D chunk and returns a `(3, H, W)` array. 
To make `map_overlap` accept this shape change, pass `new_axis=0` so dask knows the output gains a leading dimension. After the `map_overlap` call, slice `result[0]`, `result[1]`, `result[2]` to get the three output bands. If `new_axis` with `map_overlap` proves brittle at runtime, fall back to three separate `map_overlap` calls (one per output band) at the cost of 3x neighbor reads. - -## Seed handling - -Permutation tests need reproducible results for cross-backend testing. - -- numpy/dask+numpy: `np.random.SeedSequence(seed)` spawns per-pixel child sequences. In practice, the `@ngjit` function uses a simple LCG seeded with `seed + pixel_linear_index` for the per-pixel permutation draws; an LCG is good enough for 8-element neighborhoods. -- cupy/dask+cupy: `xoroshiro128p` states initialized with `create_xoroshiro128p_states(n_threads, seed)`. -- Public API does not expose seed. Internal backends accept it for testing. Default seed is 0, so results are reproducible across calls; permutation p-values are not sensitive to seed choice for n >= 999 in any case. 
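
The numpy flow for global Moran's I sketched above can be condensed into a standalone function. Assumptions: a zero-padded 3x3 focal sum stands in for `_convolve_2d_numpy`, and summing the mask convolution over valid cells is one reasonable reading of the S0 rule; names are illustrative:

```python
import numpy as np

QUEEN = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
ROOK = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

def _focal_sum(a, k):
    """3x3 weighted neighbor sum with zero padding (border weights drop out)."""
    p = np.pad(a, 1)
    out = np.zeros(a.shape, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += k[di, dj] * p[di:di + a.shape[0], dj:dj + a.shape[1]]
    return out

def morans_i_numpy(data, kernel=QUEEN):
    """Global Moran's I; NaN cells contribute nothing to the lags or to S0."""
    mask = ~np.isnan(data)
    n = mask.sum()
    z = np.where(mask, data - np.nanmean(data), 0.0)
    lag = _focal_sum(z, kernel)                          # spatial lag
    s0 = _focal_sum(mask.astype(float), kernel)[mask].sum()  # total weight
    return (n / s0) * (z * lag).sum() / (z[mask] ** 2).sum()
```

On a 4x4 checkerboard with the rook kernel this yields exactly -1.0, matching the known-value test below.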
- -## Testing - -File: `xrspatial/tests/test_autocorrelation.py` - -### Known-value tests - -| Input | Expected I | Rationale | -|-------|-----------|-----------| -| 4x4 checkerboard | I < -0.8 | Strong negative autocorrelation | -| 4x4 row gradient | I > 0.5 | Positive autocorrelation | -| 8x8 random (seeded) | -0.3 < I < 0.3 | Near zero | -| Constant value | NaN | Zero variance | - -### LISA tests - -- Checkerboard: all I_i negative, all clusters HL or LH, all p-values < 0.05 -- Gradient: center pixels positive (HH or LL), p-values < 0.05 -- NaN corners: NaN in output at those positions, valid elsewhere - -### Edge cases - -- Single-cell raster: returns NaN -- All-NaN raster: returns NaN -- Raster with one non-NaN pixel: returns NaN - -### Cross-backend parity - -- `assert_numpy_equals_dask_numpy` for both functions -- `assert_numpy_equals_cupy` (skip if no GPU) -- Fixed seed ensures identical permutation sequences - -### Contiguity - -- Queen vs rook produce different I values on same input -- Rook on 4x4 checkerboard: I = -1.0 (perfect negative, all 4 neighbors opposite) - -## Documentation - -- Add API entry in `docs/source/reference/` (new section or extend focal tools) -- Add row to README feature matrix under a new "Spatial Statistics" category - -## Files changed - -| File | Change | -|------|--------| -| `xrspatial/autocorrelation.py` | New module | -| `xrspatial/__init__.py` | Export `morans_i`, `lisa` | -| `xrspatial/tests/test_autocorrelation.py` | New test file | -| `docs/source/reference/autocorrelation.rst` | New API docs | -| `README.md` | Feature matrix row | -| `examples/user_guide/` | New notebook | diff --git a/docs/superpowers/specs/2026-04-06-polygonize-simplification-design.md b/docs/superpowers/specs/2026-04-06-polygonize-simplification-design.md deleted file mode 100644 index 63d3c82a..00000000 --- a/docs/superpowers/specs/2026-04-06-polygonize-simplification-design.md +++ /dev/null @@ -1,138 +0,0 @@ -# Polygonize Geometry Simplification - 
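
The cluster-code assignment from the classification table above reduces to a few masked writes. A sketch with illustrative names (the real chunk function fuses this into the `@ngjit` pixel loop):

```python
import numpy as np

def classify_clusters(z, lag, p, alpha=0.05):
    """Cluster codes per the table: 0=NS, 1=HH, 2=LL, 3=HL, 4=LH.

    Pixels with p > alpha stay 0 regardless of quadrant.
    """
    cluster = np.zeros(z.shape, dtype=np.int8)
    sig = p <= alpha
    cluster[sig & (z > 0) & (lag > 0)] = 1  # high surrounded by high
    cluster[sig & (z < 0) & (lag < 0)] = 2  # low surrounded by low
    cluster[sig & (z > 0) & (lag < 0)] = 3  # high outlier in low region
    cluster[sig & (z < 0) & (lag > 0)] = 4  # low outlier in high region
    return cluster
```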
-**Issue:** #1151 -**Date:** 2026-04-06 - -## Problem - -`polygonize()` produces exact pixel-boundary polygons. On high-resolution -rasters this creates dense geometries with thousands of vertices per polygon, -making output slow to render, large on disk, and unwieldy for spatial joins. - -The current workaround chains GDAL's `gdal_polygonize.py` with -`ogr2ogr -simplify`, adding an external dependency and intermediate file. - -## API - -Two new parameters on `polygonize()`: - -```python -def polygonize( - raster, mask=None, connectivity=4, transform=None, - column_name="DN", return_type="numpy", - simplify_tolerance=None, # float, coordinate units - simplify_method="douglas-peucker", # str -): -``` - -- `simplify_tolerance=None` or `0.0`: no simplification (backward compatible). -- `simplify_tolerance > 0`: apply Douglas-Peucker with given tolerance. -- `simplify_method="visvalingam-whyatt"`: raises `NotImplementedError`. -- Negative tolerance raises `ValueError`. - -## Algorithm: Shared-Edge Douglas-Peucker - -Topology-preserving simplification via shared-edge decomposition, same -approach used by TopoJSON and GRASS `v.generalize`. - -### Pipeline position - -Simplification runs between boundary tracing and output conversion: - -``` -CCL -> boundary tracing -> [simplification] -> output conversion -``` - -For dask backends, simplification runs after chunk merging. - -### Steps - -1. **Find junctions.** Scan all ring vertices. A junction is any coordinate - that appears as a vertex in 3 or more distinct rings. These points are - pinned and never removed by simplification. - -2. **Split rings into edge chains.** Walk each ring and split at junction - vertices. Each resulting chain connects two junctions (or forms a closed - loop when the ring contains no junctions). Each chain is shared by at - most 2 adjacent polygons. - -3. 
**Deduplicate chains.** Normalize each chain by its sorted endpoint pair - so shared edges between adjacent polygons are identified and simplified - only once. - -4. **Simplify each chain.** Apply Douglas-Peucker to each unique chain. - Junction endpoints are fixed. The DP implementation is numba-compiled - (`@ngjit`) for performance on large coordinate arrays. - -5. **Reassemble rings.** Replace each ring's chain segments with their - simplified versions and rebuild the ring coordinate arrays. - -### Why this preserves topology - -Adjacent polygons reference the same physical edge chain. Simplifying -each chain once means both neighbors get identical simplified boundaries. -No gaps or overlaps can arise because there is no independent simplification -of shared geometry. - -## Implementation - -All new code lives in `xrspatial/polygonize.py` as internal functions. - -### New functions - -| Function | Decorator | Purpose | -|---|---|---| -| `_find_junctions(all_rings)` | pure Python | Scan rings, return set of junction coords | -| `_split_ring_at_junctions(ring, junctions)` | pure Python | Break one ring into chains at junctions | -| `_normalize_chain(chain)` | pure Python | Canonical key for deduplication | -| `_douglas_peucker(coords, tolerance)` | `@ngjit` | DP simplification on Nx2 array | -| `_simplify_polygons(polygon_points, tolerance)` | pure Python | Orchestrator: junctions -> split -> DP -> reassemble | - -### Integration point - -In `polygonize()`, after the `mapper(raster)(...)` call returns -`(column, polygon_points)` and before the return-type conversion block: - -```python -if simplify_tolerance and simplify_tolerance > 0: - polygon_points = _simplify_polygons(polygon_points, simplify_tolerance) -``` - -### Backend behavior - -- **NumPy / CuPy:** simplification runs on CPU-side coordinate arrays - returned by boundary tracing (CuPy already transfers to CPU for tracing). 
-- **Dask:** simplification runs after `_merge_chunk_polygons()`, on the - fully merged result. -- No GPU-side simplification. Boundary tracing is already CPU-bound; - simplification follows the same pattern. - -## Constraints - -- No Visvalingam-Whyatt yet. The `simplify_method` parameter is present - in the API for forward compatibility; passing `"visvalingam-whyatt"` - raises `NotImplementedError`. -- No streaming simplification. The full polygon set must fit in memory, - same constraint as existing boundary tracing. -- Minimum ring size after simplification: exterior rings keep at least 4 - vertices (3 unique + closing). Degenerate rings (area below tolerance - squared) are dropped. - -## Testing - -- Correctness: known 4x4 raster, verify simplified polygon areas match - originals (simplification must not change topology, only vertex count). -- Vertex reduction: verify output has fewer vertices than unsimplified. -- Topology: verify no gaps between adjacent polygons (union of simplified - polygons equals union of originals, within floating-point tolerance). -- Edge cases: tolerance=0, tolerance=None, negative tolerance, single-pixel - raster, raster with one uniform value. -- Backend parity: numpy and dask produce same results. -- Return types: simplification works with all five return types. - -## Out of scope - -- Visvalingam-Whyatt implementation (future PR). -- GPU-accelerated simplification. -- Per-chunk simplification for dask (simplification is post-merge only). -- Area-weighted simplification or other adaptive tolerance schemes.
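
The chain simplifier in step 4 can be sketched as plain recursive Douglas-Peucker on an (N, 2) coordinate array. The real version would be `@ngjit`-compiled and iterative; junction endpoints stay fixed because DP always keeps the first and last vertex:

```python
import numpy as np

def douglas_peucker(coords, tol):
    """Recursive Douglas-Peucker; keeps endpoints, drops vertices whose
    perpendicular distance to the start-end segment is within tol."""
    if len(coords) < 3:
        return coords
    start, end = coords[0], coords[-1]
    dx, dy = end - start
    seg_len = np.hypot(dx, dy)
    rx, ry = (coords[1:-1] - start).T
    if seg_len == 0:
        # Degenerate/closed segment: fall back to distance from the endpoint.
        dists = np.hypot(rx, ry)
    else:
        # Perpendicular distance via the 2D cross product magnitude.
        dists = np.abs(dx * ry - dy * rx) / seg_len
    i = int(np.argmax(dists)) + 1
    if dists[i - 1] > tol:
        left = douglas_peucker(coords[: i + 1], tol)
        right = douglas_peucker(coords[i:], tol)
        return np.vstack([left[:-1], right])
    return np.array([start, end])
```

Small wiggles inside the tolerance collapse to the endpoints, while a vertex farther than the tolerance splits the chain and survives — which is exactly why shared chains simplified once keep adjacent polygons gap-free.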