# `obs` column silently shadows `var` gene expression when key exists in both

**Environment**: `spatialdata-plot` `0.3.4.dev` (main, commit `5cfedc7`), Python 3.13

---

## Problem

When the same key exists in both `table.obs.columns` **and** `table.var_names`, the `obs` column wins and no warning is emitted. Users who intend to color by gene expression (from the `X` matrix via `var_names`) get `obs` data instead, with no indication that anything unexpected has happened.

The root cause is an `elif` in spatialdata's `_get_table_origins`:

```python
if value_key in element.obs.columns:
    origins.append(_ValueOrigin(origin="obs", ...))
elif value_key in element.var_names:   # ← skipped when obs matches
    origins.append(_ValueOrigin(origin="var", ...))
```

Because `elif` is used, finding the key in `obs` entirely prevents the `var` check. The spatialdata-plot layer (`utils.py:1074–1078`) already handles the multi-origin case with a descriptive `ValueError`, but that branch is never reached because only one origin (`obs`) is returned.

This is particularly dangerous when:
- `obs[gene]` stores a pre-computed aggregate or values from a different assay
- the column of `X` selected by `gene` in `var_names` holds the per-cell expression the user wants to visualize

---

## Minimal reproducible example

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; select before importing pyplot
import matplotlib.pyplot as plt
import anndata as ad
import dask
import numpy as np
import pandas as pd

dask.config.set({"dataframe.query-planning": False})  # set before importing spatialdata
import spatialdata as sd
from spatialdata.models import PointsModel, TableModel
import spatialdata_plot  # registers the `.pl` accessor

pts = PointsModel.parse(pd.DataFrame({"x": [1., 2., 3., 4.], "y": [1., 2., 3., 4.]}))

obs = pd.DataFrame({
    "instance_id": [0, 1, 2, 3],
    "region": ["pts"] * 4,
    "GeneA": [0.9, 0.8, 0.7, 0.6],   # obs: summary/aggregate values, all similar
})
obs.index = obs.index.astype(str)

# var GeneA expression has a very different range: [1.0, 0.8, 0.3, 0.1]
X = np.array([[1.0, 0.5], [0.8, 0.2], [0.3, 0.9], [0.1, 0.7]])
adata = ad.AnnData(X=X, obs=obs, var=pd.DataFrame(index=["GeneA", "GeneB"]))
table = TableModel.parse(adata, region=["pts"], region_key="region", instance_key="instance_id")
sdata = sd.SpatialData(points={"pts": pts}, tables={"t": table})

# User expects gene expression from var — but gets obs values
sdata.pl.render_points("pts", color="GeneA", table_name="t").pl.show()
# No error, no warning — silently uses obs GeneA [0.9, 0.8, 0.7, 0.6]
# instead of var GeneA expression [1.0, 0.8, 0.3, 0.1]
```

---

## Expected behaviour

When a key exists in both `obs` and `var_names`, either:
- a **`UserWarning`** is emitted explaining that `obs` is being used and `var` is being shadowed, with a hint to disambiguate
- **or** a `ValueError` is raised asking the user to specify which source they want

## Actual behaviour

No warning and no error. The plot uses the `obs["GeneA"]` values `[0.9, 0.8, 0.7, 0.6]`, whereas the user intended the `var`-sourced expression values `[1.0, 0.8, 0.3, 0.1]`.
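
Until the lookup is changed, affected keys can be found by intersecting the two name collections before plotting. The helper below is a minimal sketch; `find_shadowed_keys` is a hypothetical name, not part of spatialdata or spatialdata-plot, and it only compares names without touching any data.

```python
def find_shadowed_keys(obs_columns, var_names):
    """Return the keys present in both collections, sorted for stable output.

    Hypothetical helper: it accepts any iterables of names, for example
    `table.obs.columns` and `table.var_names` of an AnnData table.
    """
    return sorted(set(map(str, obs_columns)) & set(map(str, var_names)))


# Names taken from the reproducible example above.
collisions = find_shadowed_keys(["instance_id", "region", "GeneA"], ["GeneA", "GeneB"])
print(collisions)  # ['GeneA']
```

With the objects from the example, the call would be `find_shadowed_keys(sdata.tables["t"].obs.columns, sdata.tables["t"].var_names)`.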

---

## Fix sketch

In `_get_table_origins` (upstream spatialdata), change `elif` to a second `if` for the `var` check. When both `obs` AND `var` match, both origins are appended. The spatialdata-plot layer at `utils.py:1074–1078` already handles multiple origins with a descriptive `ValueError` that explains the ambiguity and asks the user to resolve it — this code would then be triggered correctly.
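
The effect of this change can be illustrated with a simplified stand-in for the lookup. This is a sketch, not the spatialdata implementation: `ValueOrigin`, `origins_elif`, and `origins_if` are hypothetical names, and plain lists replace the AnnData table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueOrigin:
    """Hypothetical stand-in for spatialdata's internal origin record."""

    origin: str
    value_key: str


def origins_elif(value_key, obs_columns, var_names):
    """Current behaviour: the var check is skipped once obs matches."""
    origins = []
    if value_key in obs_columns:
        origins.append(ValueOrigin("obs", value_key))
    elif value_key in var_names:
        origins.append(ValueOrigin("var", value_key))
    return origins


def origins_if(value_key, obs_columns, var_names):
    """Proposed behaviour: both checks run, so a collision yields two origins."""
    origins = []
    if value_key in obs_columns:
        origins.append(ValueOrigin("obs", value_key))
    if value_key in var_names:
        origins.append(ValueOrigin("var", value_key))
    return origins


obs_columns = ["instance_id", "region", "GeneA"]
var_names = ["GeneA", "GeneB"]

shadowed = origins_elif("GeneA", obs_columns, var_names)
ambiguous = origins_if("GeneA", obs_columns, var_names)
print([o.origin for o in shadowed])   # ['obs']
print([o.origin for o in ambiguous])  # ['obs', 'var']
```

With two origins returned, a caller can detect the ambiguity and refuse to guess, which is what the existing `ValueError` path in spatialdata-plot is described as doing.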

Alternatively, if obs-first priority is the intended behavior, emit a `UserWarning` at the spatialdata-plot layer when the value was found in `obs` but would also match `var_names`.
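
A minimal sketch of the warning-based alternative, using only the standard `warnings` module. `pick_origin` is a hypothetical helper and the message wording is an assumption; where such a check would live in spatialdata-plot is left open.

```python
import warnings


def pick_origin(value_key, obs_columns, var_names):
    """Hypothetical helper: keep obs-first priority but warn on a collision."""
    in_obs = value_key in obs_columns
    in_var = value_key in var_names
    if in_obs and in_var:
        warnings.warn(
            f"'{value_key}' is present in both `obs.columns` and `var_names`; "
            "using the `obs` column. Rename one of them to disambiguate.",
            UserWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
    if in_obs:
        return "obs"
    if in_var:
        return "var"
    raise KeyError(value_key)
```

This keeps existing plots unchanged while making the shadowing visible; users who prefer a hard failure could escalate the warning with `warnings.simplefilter("error", UserWarning)`.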

---

**Triage tier**: Tier 3

obs column silently shadows var gene expression when key exists in both #621
