gene_symbols= column typo produces misleading 'unable to locate color key' error #618

@timtreis

The actual problem (wrong column name in gene_symbols=) is hidden behind a generic error.


Environment

  • spatialdata-plot 0.3.4.dev (main, commit 5cfedc7)
  • spatialdata 0.5.0
  • Python 3.13

Problem

When gene_symbols= is passed a column name that doesn't exist in adata.var, the specific KeyError raised by _resolve_gene_symbols is caught by a broad except KeyError block and discarded. The table is then removed from the candidate list, and the caller raises a generic error that gives no indication gene_symbols is involved:

# utils.py:2868-2876
try:
    resolved_var_name = _resolve_gene_symbols(sdata[annotates], col_for_color, gene_symbols)
except KeyError:                    # ← swallows the helpful message
    tables.remove(annotates)        # silently skips the table

The helpful message ("Column 'WRONGCOL' not found in adata.var. Cannot use it as gene_symbols lookup.") is only visible when table_name= is explicitly provided, because that call path bypasses this except branch.
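
The swallowing pattern can be illustrated without spatialdata-plot at all. The sketch below is a toy model of the control flow described above, not the library's code: resolve_gene_symbols and locate_color_key are hypothetical stand-ins, and tables are reduced to sets of var column names.

```python
# Toy stand-ins for illustration only; none of these names come from spatialdata-plot.
def resolve_gene_symbols(var_columns, gene_symbols):
    """Raise a specific KeyError when the lookup column is missing."""
    if gene_symbols not in var_columns:
        raise KeyError(
            f"Column '{gene_symbols}' not found in adata.var. "
            f"Available var columns: {sorted(var_columns)}"
        )
    return gene_symbols


def locate_color_key(tables, color, gene_symbols):
    """Mimic the reported flow: drop every table whose lookup fails."""
    candidates = list(tables)
    for name in list(candidates):
        try:
            resolve_gene_symbols(tables[name], gene_symbols)
        except KeyError:
            candidates.remove(name)  # the specific message is lost here
    if not candidates:
        raise KeyError(f"Unable to locate color key '{color}'.")
    return candidates


try:
    locate_color_key({"t": {"genename"}}, "GeneA", "WRONGCOL")
except KeyError as err:
    message = err.args[0]
    print(message)
```

The message that reaches the caller names only the color key; the column name and the list of available columns never surface.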


Minimal reproducible example

import matplotlib; matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np, pandas as pd, geopandas as gpd, anndata as ad
import dask; dask.config.set({"dataframe.query-planning": False})
import spatialdata as sd
from spatialdata.models import ShapesModel, TableModel
import spatialdata_plot
from shapely.geometry import Point

shapes = ShapesModel.parse(gpd.GeoDataFrame({"geometry": [Point(5,5)], "radius": [2.0]}))
obs = pd.DataFrame({"instance_id": [0], "region": pd.Categorical(["s"])})
var = pd.DataFrame({"genename": ["GeneA"]}, index=["gene1"])   # column is "genename"
adata = ad.AnnData(X=np.zeros((1, 1)), obs=obs, var=var)
table = TableModel.parse(adata, region=["s"], region_key="region", instance_key="instance_id")
sdata = sd.SpatialData(shapes={"s": shapes}, tables={"t": table})

# User has a typo: "WRONGCOL" instead of "genename"
# Without table_name= — generic error, gene_symbols not mentioned:
fig, ax = plt.subplots()
sdata.pl.render_shapes("s", color="GeneA", gene_symbols="WRONGCOL").pl.show(ax=ax)

Expected behaviour

KeyError: "Column 'WRONGCOL' not found in `adata.var`. Cannot use it as `gene_symbols` lookup.
Available var columns: ['genename']"

Actual behaviour

KeyError: "Unable to locate color key 'GeneA' for element 's'.
Please ensure the key exists in a table annotating this element."

No mention of gene_symbols, the typo, or available column names. Users spend time checking their color= argument instead of their gene_symbols= argument.

With table_name="t" passed explicitly, the helpful error is raised, because the except branch is not reached. This inconsistency between the two call paths is the core of the bug.


Fix sketch

Replace the except KeyError handler that swallows the error with a re-raise (or a specific exception type check) so the helpful error from _resolve_gene_symbols propagates to the user:

# utils.py:2868-2876 — instead of swallowing the error:
try:
    resolved_var_name = _resolve_gene_symbols(sdata[annotates], col_for_color, gene_symbols)
except KeyError as e:
    raise KeyError(
        f"Column '{gene_symbols}' specified in gene_symbols= was not found in "
        f"adata.var for table '{annotates}'. "
        f"Available columns: {sdata[annotates].var.columns.tolist()}"
    ) from e

At minimum, do not silently swallow the error — let it propagate so users see the specific message.
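
The "specific exception type check" variant could look roughly like the sketch below. It is a self-contained toy, not a patch: GeneSymbolsColumnError, resolve_gene_symbols, and locate_color_key are hypothetical names, and tables are modelled as plain dicts mapping var column names to their values. It also assumes that a gene missing from a table raises a plain KeyError, which the toy resolver does by construction.

```python
class GeneSymbolsColumnError(KeyError):
    """Hypothetical subclass: the gene_symbols column is absent from var."""


def resolve_gene_symbols(var, color, gene_symbols):
    """Toy resolver: `var` maps column names to lists of values."""
    if gene_symbols not in var:
        raise GeneSymbolsColumnError(
            f"Column '{gene_symbols}' not found in adata.var. "
            f"Available var columns: {sorted(var)}"
        )
    if color not in var[gene_symbols]:
        raise KeyError(f"'{color}' not found in var column '{gene_symbols}'.")
    return color


def locate_color_key(tables, color, gene_symbols):
    """Skip tables that lack the gene, but remember column-name errors."""
    candidates, column_errors = [], []
    for name, var in tables.items():
        try:
            resolve_gene_symbols(var, color, gene_symbols)
        except GeneSymbolsColumnError as err:
            column_errors.append(err)  # likely a typo; keep the message
        except KeyError:
            continue  # gene not in this table; try the next one
        else:
            candidates.append(name)
    if candidates:
        return candidates
    if column_errors:
        raise column_errors[0]  # surface the specific message
    raise KeyError(f"Unable to locate color key '{color}'.")


tables = {"t": {"genename": ["GeneA"]}}
try:
    locate_color_key(tables, "GeneA", "WRONGCOL")
except KeyError as err:
    typo_message = err.args[0]
found = locate_color_key(tables, "GeneA", "genename")
```

Deferring the raise until no candidate table remains keeps multi-table lookups working when only some tables carry the column, while a typo that matches no table still reports the column name and the available columns. Because the subclass derives from KeyError, existing callers that catch KeyError would keep working.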


Related

  • finding_gene_symbols_bad_col_error_swallowed.md

Triage tier: Tier 2
