Skip to content

fix: broadcast pandas/DataArray bounds in coords and preserve dim order#722

Closed
FBumann wants to merge 16 commits into
masterfrom
fix/bounds-coords-broadcast
Closed

fix: broadcast pandas/DataArray bounds in coords and preserve dim order#722
FBumann wants to merge 16 commits into
masterfrom
fix/bounds-coords-broadcast

Conversation

@FBumann
Copy link
Copy Markdown
Collaborator

@FBumann FBumann commented May 23, 2026

Closes #706, #709. Partially addresses #723 (helper relocated to common.py; call-site audit deferred to follow-up — see issue comment for scope).

What changes

Principle. 0.7.0 made coords the source of truth for DataArray bounds in add_variables. This PR closes the two remaining gaps so the rule holds across every bound type and dimension order:

Implementation. _validate_dataarray_bounds and the downstream as_dataarray(arr, coords) call in add_variables collapse into a single helper, as_dataarray_in_coords (in linopy/common.py, alongside as_dataarray). It converts any input (pandas with named axes via to_xarray, otherwise via as_dataarray), validates the result against coords, expands missing dims, transposes to coords order, and reconstructs the coord variables in that order. expand_dims({}) and identity transpose are metadata-only no-ops, so scalar bounds and full-dim DataArray bounds already in coords order pay no extra cost.

Pandas conversion. A new _named_pandas_to_dataarray helper handles pandas with fully named axes (including MultiIndex on either axis) by stacking all column levels into the row index in one call: arr.stack(level=list(range(arr.columns.nlevels)), future_stack=True), followed by to_xarray(). Pandas with any unnamed axis falls through to as_dataarray.

Piecewise. Same family of fix: linopy.piecewise._broadcast_points built its expand_dims map from a set, giving a hash-randomized dim order across processes. Iterates expressions and dims in declaration order now.

Supersedes

Test plan

  • test/test_variable.py::TestAddVariablesBoundsWithCoords — 57/57. Includes:
    • test_bound_broadcast_missing_dim parametrized over DataArray, Series, DataFrame, Series-multiindex, DataFrame-multicolumns, DataFrame-multiindex.
    • test_dataarray_broadcast_missing_dim_order parametrized over scalar/DA bound combinations.
    • test_dataarray_coord_reorder for the same-values-different-order reindex.
    • test_pandas_bound_with_unnamed_axis_falls_through, test_unnamed_coords_short_circuit, test_dataarray_bound_with_multiindex_coord covering edge branches in the new helpers.
  • test/test_piecewise_constraints.py::test_broadcast_points_dim_order_follows_exprs.
  • Broader sweep: test_model.py, test_linear_expression.py, test_constraint.py, test_constraints.py, test_piecewise_constraints.py, test_variable_assignment.py, test_variables.py — 784 pass, no regressions.
  • Pre-commit (ruff/format/blackdoc/codespell) clean.

🤖 Generated with Claude Code

FBumann and others added 6 commits May 24, 2026 00:38
`add_variables` had two related bugs when `lower`/`upper` were arrays:

- pandas Series/DataFrame bounds missing a dimension in `coords` had
  the missing dimension silently dropped (#709), unlike DataArray
  bounds which were already broadcast.
- DataArray bounds missing a dimension were expanded with
  `DataArray.expand_dims`, which prepends new dimensions and produces
  a `coords`-mismatched dimension order in the resulting variable
  (#706). The order depended on the type of the bounds, so scalar
  bounds worked but two array bounds missing the same dimension did
  not.

Replace `_validate_dataarray_bounds` plus the downstream
`as_dataarray(..., coords)` call with a single helper
`_as_dataarray_in_coords`. It converts any input (pandas with named
axes via `to_xarray`, otherwise via `as_dataarray`), validates the
result against `coords`, expands missing dims, transposes to coords
order, and reconstructs the coord variables in that order.
`expand_dims` and `transpose` are no-ops when the array already
matches, so scalar / full-dim DataArray bounds keep their fast path.

Also fix `linopy.piecewise._broadcast_points`, which built the
`expand_dims` map from a `set`, producing a hash-randomized dimension
order across processes. Iterate expressions and dims in declaration
order instead.

Closes #706 and #709. Supersedes #710 and #719.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restate #706/#709's fix as a single principle in the docstring,
release note, and `_as_dataarray_in_coords` helper docstring:
when `coords` is provided to `add_variables`, it is the source of
truth for dimensions, dimension order, and coordinate values, and
`lower` / `upper` are broadcast and aligned to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0.7.0 already shipped "add_variables no longer ignores coords when
lower / upper are DataArrays". Recast the new bullet as extending
that fix to the remaining gaps (pandas bounds; dim order across
bound types) so the continuity is visible from the release notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ng gaps"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Parametrize test_bound_broadcast_missing_dim with three additional
  cases: Series with MultiIndex(time, colour), DataFrame with
  MultiIndex columns(space, colour), and DataFrame with MultiIndex
  index(time, space). Exercises the `while DataFrame: unstack()`
  loop and the MultiIndex branch of `_named_pandas_to_dataarray`.
- Add test_dataarray_coord_reorder for the same-values-different-order
  reindex branch (previously only the unequal-values raise was
  covered).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Relocate `_as_dataarray_in_coords` and its helpers
(`_coords_to_dict`, `_named_pandas_to_dataarray`) from `model.py`
into `common.py`, alongside the existing `as_dataarray` they
parallel. Rename to `as_dataarray_in_coords` (no leading underscore)
since it is no longer file-local — other modules can import the
strict-coords variant when migrating call sites.

Pure relocation: no behavior change, no call-site changes beyond
`add_variables`'s import. Refs #723.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread linopy/model.py Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would make sense to give mask the same treatment as the bounds to avoid the same pandas issue. Then mask could have the same type hint as lower/upper bounds and the user could for example pass pd.DataFrame for all 3

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would! But this is somewhat breaking (we had a FutureWarning in place, but still...). I will split this into a separate PR!

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed?

Comment thread linopy/model.py Outdated
FBumann and others added 2 commits May 24, 2026 13:15
…anches

Replace the unstack-while-loop / split named-check structure with a
single up-front "all axes named" check and a single
``DataFrame.stack(level=list(range(nlevels)), future_stack=True)``
call that collapses all column levels into the row MultiIndex in
one shot. Same observable behaviour, fewer moving parts, no
defensive unreachable branches.

Add tests covering the unnamed-axis fall-through path, the
empty-coords short-circuit in ``as_dataarray_in_coords``, and the
``MultiIndex``-on-a-dim ``continue`` in the validation loop.
Together with the restructure these bring the new helper code to
full patch coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pandas allows any hashable in ``pd.Index.names`` (tuples, ints,
etc.), but only strings map cleanly to xarray dim names. Reject
anything non-string up front so the pandas falls back to
``as_dataarray`` instead of producing a DataArray with an awkward
non-string dim name that downstream validation would reject with a
confusing "extra dimensions" error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FBumann and others added 3 commits May 24, 2026 15:11
Inputs without their own meaningful labels — numpy arrays, polars
Series, pandas with unnamed axes — fell through ``as_dataarray_in_coords``
via a short-circuit return. That meant:

- The default ``dim_0`` / ``dim_1`` axis names from ``as_dataarray``
  leaked into the result, so a pandas Series without an index name
  combined with another bound carrying a named coord produced a
  spurious 2-D variable.
- Shape mismatches surfaced further downstream as confusing
  "coordinates do not match" errors against the auto-generated
  ``RangeIndex``.

The fall-through now: (a) defaults ``dims`` to coords' keys so axes
get labelled correctly; (b) runs the same validate / expand /
transpose path as labelled inputs; (c) re-assigns coords from
``expected`` on the resulting DataArray so positional inputs align
to coords by position. A shape mismatch surfaces as xarray's clear
``conflicting sizes`` from ``assign_coords``. MultiIndex coords are
left alone (re-assigning a PandasMultiIndex emits a FutureWarning).

Replaces the tautological ``test_pandas_bound_with_unnamed_axis_falls_through``
(which sneaked past by naming the coord ``"dim_0"`` to match the
auto-generated dim) with ``test_positional_bound_aligns_to_coords``
that asserts actual positional alignment across numpy / Series /
DataFrame, plus ``test_positional_bound_wrong_size_raises_clear_error``
for the shape-mismatch path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o_dict

``reformulate_sos1`` / ``reformulate_sos2`` built the coords for
the indicator variable as ``[var.coords[d] for d in var.dims]``,
which is a list of ``xarray.DataArray`` coord objects. The rest
of linopy passes ``coords`` as a list of ``pd.Index``. The mix
slipped through under the old short-circuit fall-through but
broke once the helper started defaulting ``dims`` from
``_coords_to_dict(coords)`` — non-``pd.Index`` entries were
silently dropped, so ``len(dims) < len(coords)`` and xarray
raised ``different number of dimensions on data and dims: 2 vs 1``.

Use ``var.indexes[d]`` instead — it returns the actual
``pd.Index`` (regular or MultiIndex) for the dim and preserves
structure that ``pd.Index(coord.values, ...)`` would flatten.

Also widen ``_coords_to_dict`` to accept any entry with a
``.name`` (xarray DataArrays included) so a future caller passing
mixed types doesn't silently lose coords. The reformulator fix
removes the only known producer of mixed-type coords; this is
belt-and-suspenders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the permissive ``getattr(c, "name", None)`` check with an
explicit allow-list: ``pd.Index`` (named or not — unnamed silently
skip as before) and unnamed sequences (``list`` / ``tuple`` /
``range`` / ``numpy.ndarray``). Any other type (notably
``xarray.DataArray``, but also ``pd.Series`` and friends) now
raises ``TypeError`` with a hint to pass ``variable.indexes[<dim>]``
instead. This would have caught the SOS-reformulator bug at the
source instead of letting it surface as a confusing xarray error
about mismatched dim counts ten frames down.

Drop ``DataArray`` from the matching ``coords`` type hints in
``model.py`` / ``expressions.py`` so the documented and runtime
type sets agree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FabianHofmann and others added 2 commits May 27, 2026 10:54
- _coords_to_dict: explicitly handle pd.MultiIndex — register under
  .name if set, raise TypeError with guidance if .name is missing
- _named_pandas_to_dataarray: use DataArray(df) directly for
  single-level DataFrames; reserve stack() for MultiIndex axes
- as_dataarray_in_coords: validate MultiIndex dims with .equals()
  instead of silently skipping them
- Move MultiIndex tests into dedicated TestAddVariablesMultiIndexCoords
  class with shared fixture
…nts (#725)

* fix(model): apply coords-as-truth rule to mask in add_variables/add_constraints

Routes ``mask`` through ``as_dataarray_in_coords(mask, data.coords)``
instead of ``as_dataarray(...) + broadcast_mask(...)``, so pandas
``Series`` / ``DataFrame`` masks missing a dimension are broadcast
to the variable / constraint shape (parallel to the bounds fix in
the previous PR). The ``add_variables`` ``mask`` type hint widens
to ``MaskLike`` to match ``add_constraints``.

The deprecation announced via ``FutureWarning`` in ``broadcast_mask``
("Missing values will be filled with False ... In a future version,
this will raise an error") is now in effect: masks whose
coordinates are a sparse subset of the data's coordinates raise
``ValueError`` instead of silently filling missing entries.
Mask dims not in the data raise ``ValueError`` instead of
``AssertionError`` for consistency with the bounds path.

``broadcast_mask`` had no other callers and is removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update doc/release_notes.rst

Co-authored-by: Fabian Hofmann <fab.hof@gmx.de>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Fabian Hofmann <fab.hof@gmx.de>
…on (#726)

* fix(model): apply coords-as-truth rule to mask in add_variables/add_constraints

Routes ``mask`` through ``as_dataarray_in_coords(mask, data.coords)``
instead of ``as_dataarray(...) + broadcast_mask(...)``, so pandas
``Series`` / ``DataFrame`` masks missing a dimension are broadcast
to the variable / constraint shape (parallel to the bounds fix in
the previous PR). The ``add_variables`` ``mask`` type hint widens
to ``MaskLike`` to match ``add_constraints``.

The deprecation announced via ``FutureWarning`` in ``broadcast_mask``
("Missing values will be filled with False ... In a future version,
this will raise an error") is now in effect: masks whose
coordinates are a sparse subset of the data's coordinates raise
``ValueError`` instead of silently filling missing entries.
Mask dims not in the data raise ``ValueError`` instead of
``AssertionError`` for consistency with the bounds path.

``broadcast_mask`` had no other callers and is removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: unify as_dataarray; split broadcasting from coords validation

Closes #723. Folds the body of `as_dataarray_in_coords` into `as_dataarray`
and extracts the contract checks into `assert_compatible_with_coords`, so
linopy now has one broadcasting primitive and one validation companion.

`as_dataarray(arr, coords)` aligns the result against `coords` for every
input type: labels positional inputs (numpy / unnamed pandas / scalar) by
position, reindexes same-values-different-order, expands missing dims,
and transposes to coords order. Extra dims and disagreeing value sets on
shared dims pass through unchanged, so xarray broadcasting in expression
arithmetic keeps working.

`assert_compatible_with_coords(arr, coords)` enforces the strict contract
(`arr.dims ⊆ coords.dims`, plus exact coord-value equality on shared
dims). `add_variables` and `add_constraints` now call it after
`as_dataarray` for `lower` / `upper` / `mask`, replacing the deleted
`as_dataarray_in_coords` helper.

`_coords_to_dict` filters MultiIndex level coords out of
`xarray.Coordinates` inputs so the new strict-by-default path treats
`station` (and not its derived `letter` / `num` levels) as the dim.

Test suite: 3698 passed (no regressions). Two existing tests were
updated to reflect the new "coords is source of truth" semantics:
`test_as_dataarray_with_ndarray_coords_dict_set_dims_not_aligned`
(extra coord entries now broadcast in) and
`test_dataarray_extra_dims` (now triggers the subset check rather than
the value-mismatch check).

Microbenchmark in dev-scripts/benchmark_as_dataarray.py shows flat
timings vs the base branch on both add_variables-heavy and arithmetic-
heavy workloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: dims= names unnamed coords; doctest the add_variables contract

Closes a silent-failure gap in the strict coords-as-truth path: when the
caller passed ``coords=[[1, 2, 3]], dims=["x"]`` to ``add_variables``,
``_coords_to_dict`` returned an empty mapping (unnamed sequences carry
no dim name), so the strict checks short-circuited and bounds with
extra dims or mismatched values flowed through unchecked, producing
variables with frankenstein outer-joined coord values.

``_coords_to_dict`` now accepts an optional ``dims`` argument that
names unnamed sequence entries by position. ``as_dataarray`` and
``assert_compatible_with_coords`` plumb it through; ``add_variables``
forwards ``kwargs.get("dims")`` to the assertions for ``lower`` and
``upper``. ``coords=[[1, 2, 3]], dims=["x"]`` now enforces the same
contract as ``coords={"x": [1, 2, 3]}`` or
``coords=[pd.Index([1, 2, 3], name="x")]``.

Docstring of ``add_variables.coords`` documents the contract
(subset-of-dims, dim order, value match with auto-reindex, missing-dim
broadcast) and includes four doctests pinning it: the extra-dim raise,
the value-mismatch raise, the same-values-different-order auto-reindex,
and the unnamed-coords-plus-dims opt-in.

Test suite: 3698 passed (parity with the previous commit on this
branch). ``pytest --doctest-modules linopy/model.py -k add_variables``
also green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: add align_to_coords with semantic validation error messages

Introduce align_to_coords to wrap as_dataarray and assert_compatible_with_coords
with user-facing labels (lower bound, upper bound, mask). Errors now name the
argument and distinguish extra dimensions, coordinate mismatches, and conversion
failures. Extend mask validation to use coords+dims= when provided.

Co-authored-by: Cursor <cursoragent@cursor.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor(model): simplify mask align; preserve TypeError in align_to_coords

Three cleanups on top of align_to_coords:

- Drop the trailing ``.broadcast_like(data.labels)`` in ``add_variables``
  and ``add_constraints`` mask paths. ``as_dataarray`` already expands
  missing dims to ``coords`` shape, so the broadcast was a no-op.
- Stop overriding the caller's ``dims=`` in the ``add_variables`` mask
  path when ``coords is None``. The previous code stripped ``dims`` and
  forced ``dims=data.dims``; with ``data.coords`` being an xarray
  ``Coordinates`` with already-named dims, the user's ``dims`` is
  harmless to forward and the override was just hiding intent. Mask
  now goes through one ``align_to_coords`` call regardless of whether
  ``coords`` is supplied.
- Split the exception handler in ``align_to_coords``: ``TypeError`` from
  unsupported input types is re-raised as ``TypeError`` (still labeled),
  while ``ValueError`` / ``CoordinateValidationError`` stay
  ``ValueError``. Preserves the original type signature for callers
  that want to ``except TypeError``.

New test ``test_align_to_coords_preserves_type_errors`` pins the
TypeError pass-through. Suite: 3703 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: rename assert_compatible_with_coords to validate_alignment

Per PR review: align on the project's `validate_*` naming convention
and remove the implicit "AssertionError" connotation of `assert_*`.
Pairs naturally with `align_to_coords`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented May 27, 2026

Superseded by a fresh PR that consolidates the full scope of this branch (#722 original + #725 + #726 + #729 + follow-ups) with a single comprehensive description. The branch fix/bounds-coords-broadcast is unchanged; only the PR thread is being reset for review clarity.

@FBumann
Copy link
Copy Markdown
Collaborator Author

FBumann commented May 27, 2026

→ Reopened as #732 with consolidated scope and a single description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add_variables: DataArray bounds with missing dimensions yield wrong dimension order

3 participants