Skip to content

refactor: unify as_dataarray; split broadcasting from coords validation#726

Merged
FBumann merged 8 commits into
fix/bounds-coords-broadcastfrom
feat/unify-as-dataarray-coords
May 27, 2026
Merged

refactor: unify as_dataarray; split broadcasting from coords validation#726
FBumann merged 8 commits into
fix/bounds-coords-broadcastfrom
feat/unify-as-dataarray-coords

Conversation

@FBumann
Copy link
Copy Markdown
Collaborator

@FBumann FBumann commented May 24, 2026

Closes #723. Stacked on #725 (which is stacked on #722).

What changes

as_dataarray_in_coords is folded into as_dataarray and split along the seam between "broadcast arr against coords" and "enforce the coords contract":

  • as_dataarray(arr, coords) — the broadcasting primitive. For every input type, the result is aligned with coords: positional inputs (numpy / unnamed pandas / scalar) are labeled by position, shared-dim coords are reindexed when values are equal in a different order, dims present in coords but not in arr are expanded, and the result is transposed to coords order. Extra dims and disagreeing value sets on shared dims pass through, so xarray broadcasting in expression arithmetic keeps working.
  • assert_compatible_with_coords(arr, coords) — the validation companion. Raises if arr introduces dims not in coords (was: as_dataarray_in_coords's extra-dim raise) or if a shared dim has disagreeing coord values (was: its "do not match" raise).

add_variables and add_constraints now call as_dataarray followed by assert_compatible_with_coords for lower / upper / mask. The previous as_dataarray_in_coords helper is deleted.

_coords_to_dict now filters MultiIndex level coords out of xarray.Coordinates inputs, so the new strict-by-default path treats e.g. station (and not its derived letter / num levels) as the dim. This was already a latent issue once strict semantics governed as_dataarray's coords arg.

Audit summary (the 12 call sites listed in #723)

Call site Strictness Behavior
model.py:705/706 lower/upper in add_variables strict as_dataarray + assert_compatible_with_coords
model.py:715 mask in add_variables strict same
model.py:979 mask in add_constraints strict same
expressions.py:584/613/1105/1668/2154 arithmetic broadcast call signatures unchanged; benefits from positional labeling + reindex for free, with no value-set raise on shared dims
variables.py:330 to_linexpr(coefficient) broadcast unchanged
expressions.py:341/2002/2030/2289, model.py:912/919/972, variables.py:1369 no coords unaffected

Breaking changes for end users

This PR introduces no new user-visible breaking changes. It's a structural refactor that keeps the strict contract at the add_variables / add_constraints call sites via assert_compatible_with_coords. The breaking changes a user sees when this PR lands on master are the union of what #722 and #725 already deliver:

From #725 (mask handling)

Today (master) After this stack Migration
Mask with sparse coords silently fills missing entries with False and emits FutureWarning Raises ValueError mask.reindex({...}, fill_value=False)
Mask with extra dims raises AssertionError Raises ValueError Same fix; only the exception type changed
Mask as pd.Series / pd.DataFrame missing a dim was sometimes silently dropped Broadcast like bounds None needed (strictly more cases work)

From #722 (bounds handling)

Today (master) After this stack
Pandas Series / DataFrame bound missing a dim is silently dropped Broadcast to coords (#709 fix)
DataArray bound dim order depends on bound type (#706) Always follows coords order
DataArray bound with extra dim was silently accepted Raises ValueError
DataArray bound with mismatched coord values raised varied downstream errors Raises ValueError: ...coordinates do not match
Piecewise._broadcast_points had hash-randomized dim order across processes Iterates in declaration order

Refactor-only changes (no user-visible effect on the strict sites)

  • as_dataarray itself no longer raises on extra dims or shared-dim value-set mismatch — every public site that used to depend on those raises now gets them from the explicit assert_compatible_with_coords call. Surface behavior at add_variables / add_constraints is preserved.
  • Two existing tests updated to reflect the new "coords is source of truth, extras broadcast in" semantics of as_dataarray itself:
    • test_as_dataarray_with_ndarray_coords_dict_set_dims_not_aligned: extra coord entries now expand into the result (the test was pinning an undocumented "dims wins, coord entry dropped" corner of the old lax path).
    • test_dataarray_extra_dims: rewritten so the subset check fires (rather than the value-mismatch check) — the assertion still raises on extra dims, just via the new assert_compatible_with_coords helper.

Test plan

  • test/test_common.py — added five tests pinning the new split (extra-dim preservation, disjoint shared-dim values, assert_compatible_with_coords extra-dim raise, value-mismatch raise, subset-dims allowed).
  • Full suite: 3698 passed, 29 skipped (parity with base).
  • pre-commit run --all-files clean (ruff/format/blackdoc/codespell).
  • Microbenchmark (dev-scripts/benchmark_as_dataarray.py, untracked): flat timings vs the base branch on both add_variables-heavy (≈22 ms / 50–57 ms mean) and arithmetic-heavy (≈80–82 ms) workloads.

🤖 Generated with Claude Code

FBumann and others added 6 commits May 24, 2026 15:37
…onstraints

Routes ``mask`` through ``as_dataarray_in_coords(mask, data.coords)``
instead of ``as_dataarray(...) + broadcast_mask(...)``, so pandas
``Series`` / ``DataFrame`` masks missing a dimension are broadcast
to the variable / constraint shape (parallel to the bounds fix in
the previous PR). The ``add_variables`` ``mask`` type hint widens
to ``MaskLike`` to match ``add_constraints``.

The deprecation announced via ``FutureWarning`` in ``broadcast_mask``
("Missing values will be filled with False ... In a future version,
this will raise an error") is now in effect: masks whose
coordinates are a sparse subset of the data's coordinates raise
``ValueError`` instead of silently filling missing entries.
Mask dims not in the data raise ``ValueError`` instead of
``AssertionError`` for consistency with the bounds path.

``broadcast_mask`` had no other callers and is removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #723. Folds the body of `as_dataarray_in_coords` into `as_dataarray`
and extracts the contract checks into `assert_compatible_with_coords`, so
linopy now has one broadcasting primitive and one validation companion.

`as_dataarray(arr, coords)` aligns the result against `coords` for every
input type: labels positional inputs (numpy / unnamed pandas / scalar) by
position, reindexes same-values-different-order, expands missing dims,
and transposes to coords order. Extra dims and disagreeing value sets on
shared dims pass through unchanged, so xarray broadcasting in expression
arithmetic keeps working.

`assert_compatible_with_coords(arr, coords)` enforces the strict contract
(`arr.dims ⊆ coords.dims`, plus exact coord-value equality on shared
dims). `add_variables` and `add_constraints` now call it after
`as_dataarray` for `lower` / `upper` / `mask`, replacing the deleted
`as_dataarray_in_coords` helper.

`_coords_to_dict` filters MultiIndex level coords out of
`xarray.Coordinates` inputs so the new strict-by-default path treats
`station` (and not its derived `letter` / `num` levels) as the dim.

Test suite: 3698 passed (no regressions). Two existing tests were
updated to reflect the new "coords is source of truth" semantics:
`test_as_dataarray_with_ndarray_coords_dict_set_dims_not_aligned`
(extra coord entries now broadcast in) and
`test_dataarray_extra_dims` (now triggers the subset check rather than
the value-mismatch check).

Microbenchmark in dev-scripts/benchmark_as_dataarray.py shows flat
timings vs the base branch on both add_variables-heavy and arithmetic-
heavy workloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes a silent-failure gap in the strict coords-as-truth path: when the
caller passed ``coords=[[1, 2, 3]], dims=["x"]`` to ``add_variables``,
``_coords_to_dict`` returned an empty mapping (unnamed sequences carry
no dim name), so the strict checks short-circuited and bounds with
extra dims or mismatched values flowed through unchecked, producing
variables with frankenstein outer-joined coord values.

``_coords_to_dict`` now accepts an optional ``dims`` argument that
names unnamed sequence entries by position. ``as_dataarray`` and
``assert_compatible_with_coords`` plumb it through; ``add_variables``
forwards ``kwargs.get("dims")`` to the assertions for ``lower`` and
``upper``. ``coords=[[1, 2, 3]], dims=["x"]`` now enforces the same
contract as ``coords={"x": [1, 2, 3]}`` or
``coords=[pd.Index([1, 2, 3], name="x")]``.

Docstring of ``add_variables.coords`` documents the contract
(subset-of-dims, dim order, value match with auto-reindex, missing-dim
broadcast) and includes four doctests pinning it: the extra-dim raise,
the value-mismatch raise, the same-values-different-order auto-reindex,
and the unnamed-coords-plus-dims opt-in.

Test suite: 3698 passed (parity with the previous commit on this
branch). ``pytest --doctest-modules linopy/model.py -k add_variables``
also green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce align_to_coords to wrap as_dataarray and assert_compatible_with_coords
with user-facing labels (lower bound, upper bound, mask). Errors now name the
argument and distinguish extra dimensions, coordinate mismatches, and conversion
failures. Extend mask validation to use coords+dims= when provided.

Co-authored-by: Cursor <cursoragent@cursor.com>
…coords

Three cleanups on top of align_to_coords:

- Drop the trailing ``.broadcast_like(data.labels)`` in ``add_variables``
  and ``add_constraints`` mask paths. ``as_dataarray`` already expands
  missing dims to ``coords`` shape, so the broadcast was a no-op.
- Stop overriding the caller's ``dims=`` in the ``add_variables`` mask
  path when ``coords is None``. The previous code stripped ``dims`` and
  forced ``dims=data.dims``; with ``data.coords`` being an xarray
  ``Coordinates`` with already-named dims, the user's ``dims`` is
  harmless to forward and the override was just hiding intent. Mask
  now goes through one ``align_to_coords`` call regardless of whether
  ``coords`` is supplied.
- Split the exception handler in ``align_to_coords``: ``TypeError`` from
  unsupported input types is re-raised as ``TypeError`` (still labeled),
  while ``ValueError`` / ``CoordinateValidationError`` stay
  ``ValueError``. Preserves the original type signature for callers
  that want to ``except TypeError``.

New test ``test_align_to_coords_preserves_type_errors`` pins the
TypeError pass-through. Suite: 3703 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Base automatically changed from fix/mask-coords-broadcast to fix/bounds-coords-broadcast May 27, 2026 08:57
@FabianHofmann
Copy link
Copy Markdown
Collaborator

@FBumann we have some conflicts here now, should I resolve them? I would still need to figure out where things branch off currently

@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented May 27, 2026

I can

FBumann added a commit that referenced this pull request May 27, 2026
…_validate_update

Constraint
- sanitize_zeros now writes _data via _assign_data directly (no
  longer round-trips through update(variables=DataArray), which
  would self-trigger the new deprecation warning).
- Constraint.update(variables=...) emits FutureWarning when passed a
  raw DataArray of integer labels; Variable is the supported input.
  The path stays accepted for back-compat and will be removed
  alongside the v1 cleanup.

Variable
- Extract Variable._validate_update(*, lower, upper) — validates,
  broadcasts, and runs the cross-bound (lb<=ub) check, returning a
  dict ready for assignment. update() body shrinks to ~3 lines.
  Coord validation (parity with add_variables) deferred to #726 land.

Tests
- test_constraint_coef_dirty's variables= test now passes a Variable
  instead of a DataArray (matches the supported input).
- test_constraint_vars_setter_with_array wrapped in
  pytest.warns(FutureWarning) — locks in the deprecation contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o feat/unify-as-dataarray-coords

# Conflicts:
#	doc/release_notes.rst
#	linopy/common.py
#	linopy/model.py
#	test/test_constraints.py
#	test/test_variables.py
FabianHofmann pushed a commit that referenced this pull request May 27, 2026
* feat: Variable.update() / Constraint.update() as canonical mutation API

Introduces typed ``.update()`` methods on Variable and Constraint as
the single, validated, multi-attribute mutation entry point.

- ``Variable.update(lower=, upper=)``: validates non-constant
  inputs are rejected, new dims are rejected, and the resulting
  ``lower <= upper`` invariant holds across all coords. Returns
  ``self`` for chaining.
- ``Constraint.update(rhs=, sign=)``: constant RHS only. The
  legacy ``c.rhs = variable`` rearrange-to-lhs path stays on the
  setter (different semantic, deserves its own explicit method).

The existing ``.lower`` / ``.upper`` / ``.sign`` setters become
thin shims that forward to ``.update()``, so single-attribute
writes (``z.lower = 2``) stay ergonomic and the canonical
validation runs in one place. The ``.rhs`` setter forwards
constants through ``.update()`` and keeps the expression-rhs
rearrange behaviour.

This is the on-top experiment for the design discussion on #718.
``.lhs`` / ``.coeffs`` / ``.vars`` setters are untouched for now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(update): Constraint.update accepts Variable/Expression rhs

Mirrors the existing ``c.rhs = expr`` setter and ``add_constraints``
which both accept mixed-side input and rearrange the residual onto
lhs. ``c.update(rhs=x + 5)`` now subtracts ``x`` from lhs and stores
``5`` on rhs. ``.rhs`` setter collapses to a one-line shim.

Variable bound rejection of Variable/Expression is kept (bounds are
numeric, not symbolic); docstring clarified to spell out that
pandas / xarray / numpy arrays are first-class (time-varying bounds).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(update): extend Constraint.update to lhs/coeffs/vars; shim all setters

Adds lhs / coeffs / vars to the canonical mutation API. All
.lhs / .coeffs / .vars setters now forward to .update() — every
Constraint mutation goes through one method with one validation
path, one place that flips _coef_dirty.

Composition rules:
- lhs= replaces the whole expression first; subsequent rhs=
  rearrangement (Variable/Expression in rhs) sees the new lhs.
- lhs= and coeffs= / vars= are mutually exclusive (whole
  replacement vs partial array update).
- sign= is applied last so it composes cleanly.

Internal Constraint.sanitize_zeros migrated to update(vars=,
coeffs=) — no more internal setter calls in linopy/.

389 tests pass across mutation + persistent-solver suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(update): rename Constraint.update kwarg vars= -> variables=

Avoids shadowing Python's vars() builtin. The .vars attribute on
Constraint stays (it parallels the .data.vars internal name);
only the kwarg gets the unambiguous spelling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(update): accept positional ConstraintLike in Constraint.update

Mirrors add_constraints' dispatch: c.update(x + 5 <= 3) is now
shorthand for c.update(lhs=x, sign='<=', rhs=-2), extracted from
the AnonymousConstraint / ConstraintBase the comparison produces.

Mutually exclusive with the per-attribute kwargs; clear error when
mixed.

Also reverts the internal sanitize_zeros migration. The setters
are pure shims forwarding to update(), so the migration didn't
change behaviour or cost — just spelling. The original setter
syntax reads more naturally there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(update): note kwarg form is the targeted, cheap path

The positional ConstraintLike form (c.update(x + 5 <= 3)) always
rewrites lhs / sign / rhs and flips _coef_dirty. For hot loops that
only touch one part, kwarg form (c.update(rhs=...)) skips the
unchanged attributes and is materially cheaper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(persistent): default ModelDiff.from_snapshot(same_model=False)

Closes the A1 residual from the #718 review. The flag-trust path
(`skip_coef_compare = same_model and not coef_dirty`) is correct
through Constraint.update() (set in one place, shims forward), but
`c.coeffs.values[...] = ...` still bypasses _coef_dirty. With
same_model=True as the default, that bypass silently produces wrong
diffs.

Flip the default to False. Cross-model paths (the only production
caller, Solver._update_locked, passes explicitly) are unaffected.
Same-model warm-update paths now value-diff the CSR data — small
perf hit (50-200ms at Mayk-scale per Mayk's bench), correct by
default. Solver-aware callers who own the mutation contract can
opt back into the optimization with `same_model=True`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: teach .update() in tutorials; mark setters as syntactic sugar

- examples/manipulating-models.ipynb: rewrite mutation cells to use
  Variable.update / Constraint.update; setter form is mentioned in
  notes as syntactic sugar for the same call.
- examples/creating-constraints.ipynb: reframe the CSRConstraint vs
  Constraint API table around .update() as the mutation API; setters
  are sugar.
- Setter docstrings now say 'syntactic sugar for Constraint/Variable
  .update; do not add logic here so the contract stays single-sourced'
  — a directive to future contributors as much as to readers.

No deprecation, no breaking change. .update() is the documented
canonical mutation API; the seven setters continue to exist as
one-line shims.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* deprecate(update): warn on mutation setters; promote .update() in docs

Adds DeprecationWarning to all seven mutation setters (Variable.lower,
Variable.upper, Constraint.coeffs, Constraint.vars, Constraint.sign,
Constraint.rhs, Constraint.lhs). Each setter still forwards to .update()
so existing code keeps working; the warning points at the canonical API.

Internal sanitize_zeros migrated off setters (the last linopy/ caller).
api.rst gains Modification sections listing .update() for both Variable
and Constraint; tutorial notes rewritten to teach .update() and flag
setters as deprecated. Release note added.

dual.setter / solution.setter untouched — result assignment, not
mutation, different deprecation track.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(update): edge-case coverage; document rhs-rearrangement invariant

Constraint.update tests: lhs-only, coeffs-only (vars preserved),
compound lhs+sign, mutually-exclusive lhs+coeffs and lhs+variables.
Variable.update tests: upper-only, valid array bound.

Migrate test_constraint_coef_dirty.py from the now-deprecated setters
to .update(), exercising the canonical path; add positional-form and
no-op cases. Net effect: same dirty-flag invariants, 7 fewer warnings
per pytest run.

Docs: Constraint.update rhs= gains a worked example showing the two
forms (constant vs variable/expression). add_constraints rhs gets a
matching note pointing at the linopy invariant so the rearrangement
rule is documented at the creation site too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review(update): address inline feedback on #727

- Constraint._assign_lhs_expr → _assign_lhs (drop redundant suffix;
  the method already takes a LinearExpression, so the type was in the
  signature, not the name).
- Add Constraint._assign_data(**fields) helper. Wraps the four
  ``self._data = assign_multiindex_safe(self.data, **kw)`` callsites
  inside update() (rhs / coeffs / vars / sign). Untouched: the same
  pattern in dual.setter, sanitize_missings, sanitize_infinities —
  those aren't update() and stay out of scope here.
- Add types.CONSTANT_TYPES tuple, derived from ConstantLike via
  get_args so the two cannot drift. Variable.update bound validation
  flipped from negative (isinstance against a hand-rolled
  non_constant tuple) to positive (isinstance against CONSTANT_TYPES);
  drops a redundant in-function ``from linopy import expressions``
  (the module-level import already covered it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* deprecate(update): FutureWarning on DataArray as variables=; extract _validate_update

Constraint
- sanitize_zeros now writes _data via _assign_data directly (no
  longer round-trips through update(variables=DataArray), which
  would self-trigger the new deprecation warning).
- Constraint.update(variables=...) emits FutureWarning when passed a
  raw DataArray of integer labels; Variable is the supported input.
  The path stays accepted for back-compat and will be removed
  alongside the v1 cleanup.

Variable
- Extract Variable._validate_update(*, lower, upper) — validates,
  broadcasts, and runs the cross-bound (lb<=ub) check, returning a
  dict ready for assignment. update() body shrinks to ~3 lines.
  Coord validation (parity with add_variables) deferred to #726 land.

Tests
- test_constraint_coef_dirty's variables= test now passes a Variable
  instead of a DataArray (matches the supported input).
- test_constraint_vars_setter_with_array wrapped in
  pytest.warns(FutureWarning) — locks in the deprecation contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(constraints): _assign_data → _update_data; manage _coef_dirty inside

The helper now writes fields AND flips _coef_dirty when the written
set includes coeffs or vars. Callsites in update() (coeffs / vars
branches) and sanitize_zeros drop their explicit `self._coef_dirty = True`
lines — the rule lives in one place, can't be forgotten by future
field additions.

rhs / sign writes still don't dirty (correctly). _assign_lhs is
untouched: it uses a different write mechanic (drop_vars + assign)
and manages its own flag.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread linopy/common.py Outdated
@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented May 27, 2026

@FabianHofmann I think #722 is kind of blocking here atm. Not hard, but will produce conflicts.
We should probably fold that in

Per PR review: align on the project's `validate_*` naming convention
and remove the implicit "AssertionError" connotation of `assert_*`.
Pairs naturally with `align_to_coords`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented May 27, 2026

@FabianHofmann Ready for final review.

EDIT: If you dont mind, i would merge this and give the final combined PR a proper review.

@FabianHofmann
Copy link
Copy Markdown
Collaborator

@FabianHofmann Ready for final review.

EDIT: If you dont mind, i would merge this and give the final combined PR a proper review.

already approved. sounds good to me

@FBumann FBumann merged commit 48de61b into fix/bounds-coords-broadcast May 27, 2026
3 checks passed
@FBumann FBumann deleted the feat/unify-as-dataarray-coords branch May 27, 2026 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants