Replacing colData with a named DataFrame silently misaligns assay columns when row order differs #91

@hindrek

Hi!

I noticed (the hard way) that when replacing the colData, the row order of the new DataFrame has to match the column order of the SE object, even when the DataFrame has row names. Otherwise, the assay column names and the assay values no longer align. This is easy to do unintentionally if one needs to update sample metadata after constructing the SE object. Unfortunately, the misalignment is silent and difficult to detect downstream.

In contrast, updating rowData with a reordered DataFrame does not affect assay alignment, which makes the behaviour asymmetric and potentially confusing for users.

suppressPackageStartupMessages(library(SummarizedExperiment))
#> Warning: replacing previous import 'S4Arrays::makeNindexFromArrayViewport' by
#> 'DelayedArray::makeNindexFromArrayViewport' when loading 'SummarizedExperiment'

counts <- matrix(1:16, nrow = 4)
rowData <- DataFrame(x = 1:4, row.names = paste0("feature", 1:4))
colData <- DataFrame(x = 1:4, row.names = paste0("sample", 1:4))

# Construct SE object
se <- SummarizedExperiment(
  assays = SimpleList(counts = counts),
  rowData = rowData,
  colData = colData
)
assay(se)
#>          sample1 sample2 sample3 sample4
#> feature1       1       5       9      13
#> feature2       2       6      10      14
#> feature3       3       7      11      15
#> feature4       4       8      12      16

# Update rowData
rowData(se) <- rev(rowData)
assay(se)
#>          sample1 sample2 sample3 sample4
#> feature1       1       5       9      13
#> feature2       2       6      10      14
#> feature3       3       7      11      15
#> feature4       4       8      12      16

# Update colData
colData(se) <- rev(colData)
assay(se)
#>          sample4 sample3 sample2 sample1
#> feature1       1       5       9      13
#> feature2       2       6      10      14
#> feature3       3       7      11      15
#> feature4       4       8      12      16

Created on 2026-01-10 with reprex v2.1.1

Session info

sessionInfo()
#> R version 4.5.2 (2025-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#> 
#> Matrix products: default
#>   LAPACK version 3.12.1
#> 
#> locale:
#> [1] LC_COLLATE=Estonian_Estonia.utf8  LC_CTYPE=Estonian_Estonia.utf8   
#> [3] LC_MONETARY=Estonian_Estonia.utf8 LC_NUMERIC=C                     
#> [5] LC_TIME=Estonian_Estonia.utf8    
#> 
#> time zone: Europe/Tallinn
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] SummarizedExperiment_1.40.0 Biobase_2.70.0             
#>  [3] GenomicRanges_1.62.1        Seqinfo_1.0.0              
#>  [5] IRanges_2.44.0              S4Vectors_0.48.0           
#>  [7] BiocGenerics_0.56.0         generics_0.1.4             
#>  [9] MatrixGenerics_1.22.0       matrixStats_1.5.0          
#> 
#> loaded via a namespace (and not attached):
#>  [1] cli_3.6.5           knitr_1.51          rlang_1.1.6        
#>  [4] xfun_0.55           DelayedArray_0.36.0 glue_1.8.0         
#>  [7] htmltools_0.5.9     rmarkdown_2.30      grid_4.5.2         
#> [10] evaluate_1.0.5      abind_1.4-8         fastmap_1.2.0      
#> [13] yaml_2.3.12         lifecycle_1.0.5     compiler_4.5.2     
#> [16] fs_1.6.6            XVector_0.50.0      lattice_0.22-7     
#> [19] digest_0.6.39       SparseArray_1.10.8  reprex_2.1.1       
#> [22] Matrix_1.7-4        tools_4.5.2         withr_3.0.2        
#> [25] S4Arrays_1.10.1

Here, the assay values remain in the original column order, but the column names are reordered to match the new colData, resulting in misalignment between assay values and column names.
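
As a possible user-side workaround (a minimal sketch, not a fix in the package), the replacement DataFrame can be reordered by row name to match `colnames(se)` before it is assigned. The object `new_cd` and its `batch` column are hypothetical and only stand in for updated sample metadata that arrives in a different order.

```r
suppressPackageStartupMessages(library(SummarizedExperiment))

counts <- matrix(1:16, nrow = 4)
se <- SummarizedExperiment(
  assays = SimpleList(counts = counts),
  colData = DataFrame(x = 1:4, row.names = paste0("sample", 1:4))
)

# Hypothetical updated metadata whose rows are in reverse sample order
new_cd <- DataFrame(
  x = 4:1,
  batch = c("b", "b", "a", "a"),
  row.names = paste0("sample", 4:1)
)

# Guard: the new metadata must describe exactly the samples in `se`
stopifnot(
  nrow(new_cd) == ncol(se),
  setequal(rownames(new_cd), colnames(se))
)

# Reorder rows by name so they line up with the assay columns, then assign
colData(se) <- new_cd[colnames(se), , drop = FALSE]
```

With this reordering the column names stay in their original order, so the assay values and sample names should remain aligned. The guard stops with an error if the sample names differ, which avoids a silent mismatch.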

Best regards
Hindrek
