obs column silently shadows var gene expression when key exists in both
Environment: spatialdata-plot 0.3.4.dev (main, commit 5cfedc7), Python 3.13
Problem
When the same key name exists in both table.obs.columns and table.var_names, the obs value silently wins with no warning. Users who intend to color by gene expression (from the X matrix via var_names) get obs data instead — with no indication that anything unexpected has happened.
The root cause is an elif in spatialdata's _get_table_origins:
if value_key in element.obs.columns:
    origins.append(_ValueOrigin(origin="obs", ...))
elif value_key in element.var_names:  # ← skipped when obs matches
    origins.append(_ValueOrigin(origin="var", ...))
Because elif is used, finding the key in obs entirely prevents the var check. The spatialdata-plot layer (utils.py:1074–1078) handles the multi-origin case with a descriptive ValueError, but it never gets the chance because only one origin (obs) is returned.
This is particularly dangerous when:
- obs[gene] stores a pre-computed aggregate or a different assay
- var[gene] is the per-cell expression matrix the user wants to visualize
Minimal reproducible example
import matplotlib; matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np, pandas as pd, anndata as ad
import dask; dask.config.set({"dataframe.query-planning": False})
import spatialdata as sd
from spatialdata.models import PointsModel, TableModel
import spatialdata_plot
pts = PointsModel.parse(pd.DataFrame({"x": [1., 2., 3., 4.], "y": [1., 2., 3., 4.]}))
obs = pd.DataFrame({
    "instance_id": [0, 1, 2, 3],
    "region": ["pts"] * 4,
    "GeneA": [0.9, 0.8, 0.7, 0.6],  # obs: summary/aggregate values, all similar
})
obs.index = obs.index.astype(str)
# var GeneA expression has a very different range: [1.0, 0.8, 0.3, 0.1]
X = np.array([[1.0, 0.5], [0.8, 0.2], [0.3, 0.9], [0.1, 0.7]])
adata = ad.AnnData(X=X, obs=obs, var=pd.DataFrame(index=["GeneA", "GeneB"]))
table = TableModel.parse(adata, region=["pts"], region_key="region", instance_key="instance_id")
sdata = sd.SpatialData(points={"pts": pts}, tables={"t": table})
# User expects gene expression from var — but gets obs values
sdata.pl.render_points("pts", color="GeneA", table_name="t").pl.show()
# No error, no warning — silently uses obs GeneA [0.9, 0.8, 0.7, 0.6]
# instead of var GeneA expression [1.0, 0.8, 0.3, 0.1]
Expected behaviour
When a key exists in both obs and var_names, either:
- A UserWarning is raised explaining that obs is being used and var is being shadowed, with a hint to disambiguate
- Or: a ValueError is raised asking the user to specify which source they want
Actual behaviour
No warning. The plot uses obs["GeneA"] values [0.9, 0.8, 0.7, 0.6] — the user intended the var-sourced expression values [1.0, 0.8, 0.3, 0.1].
Fix sketch
In _get_table_origins (upstream spatialdata), change elif to a second if for the var check. When both obs AND var match, both origins are appended. The spatialdata-plot layer at utils.py:1074–1078 already handles multiple origins with a descriptive ValueError that explains the ambiguity and asks the user to resolve it — this code would then be triggered correctly.
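The elif-to-if change described above can be illustrated with a small standalone sketch. It does not reproduce the real upstream function: `ValueOrigin` and `get_table_origins` are hypothetical stand-ins, and plain lists replace `element.obs.columns` and `element.var_names`, so only the control flow is meant to carry over.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueOrigin:
    """Hypothetical stand-in for spatialdata's _ValueOrigin."""

    origin: str
    value_key: str


def get_table_origins(obs_columns, var_names, value_key):
    """Collect every place in a table where value_key is found."""
    origins = []
    if value_key in obs_columns:
        origins.append(ValueOrigin(origin="obs", value_key=value_key))
    # A second `if` (not `elif`), so a match in obs no longer hides a match in var.
    if value_key in var_names:
        origins.append(ValueOrigin(origin="var", value_key=value_key))
    return origins


# Same key layout as the reproducer: "GeneA" is both an obs column and a var name.
origins = get_table_origins(
    obs_columns=["instance_id", "region", "GeneA"],
    var_names=["GeneA", "GeneB"],
    value_key="GeneA",
)
print([o.origin for o in origins])
```

With both origins returned, a caller that rejects multi-origin results can report the ambiguity instead of silently picking obs.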
Alternatively, if obs-first priority is the intended behavior, emit a UserWarning at the spatialdata-plot layer when the value was found in obs but would also match var_names.
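A minimal sketch of the warning-based alternative, assuming obs-first priority is kept. The helper name `warn_if_var_shadowed` and its arguments are hypothetical and not part of spatialdata-plot; a real implementation would read the keys from the table and would probably word the hint differently.

```python
import warnings


def warn_if_var_shadowed(value_key, obs_columns, var_names):
    """Hypothetical helper: warn when an obs column hides a same-named var entry.

    Returns True if the key is present in both places, False otherwise.
    """
    shadowed = value_key in obs_columns and value_key in var_names
    if shadowed:
        warnings.warn(
            f"'{value_key}' is both an obs column and a var name; "
            "the obs column is used. Rename one of them to disambiguate.",
            UserWarning,
            stacklevel=2,
        )
    return shadowed


# Same key layout as the reproducer: this call emits a UserWarning.
warn_if_var_shadowed("GeneA", ["instance_id", "region", "GeneA"], ["GeneA", "GeneB"])
```

Unlike the ValueError route, this keeps existing plots working and only surfaces the collision, which may be preferable if some users rely on the current obs-first behaviour.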
Triage tier: Tier 3