Create merged Seurat and AnnData objects from Cell Ranger filtered feature barcode matrices.
This workflow processes 10X Chromium single-cell data from Cell Ranger output and creates both:
- Seurat objects (`.qs`) via the R/Bioconductor ecosystem
- AnnData objects (`.h5ad`) via the Python/Scanpy ecosystem
It supports:
- Unimodal data: Gene expression only
- Multimodal data: Gene expression + Antibody Capture (CITE-seq)
The workflow can optionally attach:
- Sample assignments from SNP-based demultiplexing
- Cell type annotations
- Ambient RNA profiles
- Sample-level subsetting (keep only cells from a specified cohort)
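The sample-assignment and cohort-subsetting options can be pictured as a left join on barcode followed by a filter. The sketch below is a plain-Python illustration only, not the workflow's code. The function names, the `doublet` and `unassigned` labels, and the choice to treat cells missing from the demultiplexing output as unassigned are all assumptions made for the example.

```python
# Hypothetical labels; the real demultiplexing output may use different ones.
DOUBLET = "doublet"
UNASSIGNED = "unassigned"


def attach_assignments(barcodes, assignments):
    """Left join on barcode: every cell is kept.

    Cells absent from `assignments` are labelled UNASSIGNED (an assumption).
    """
    return {bc: assignments.get(bc, UNASSIGNED) for bc in barcodes}


def subset_to_cohort(cell_samples, cohort):
    """Keep cohort singlets plus doublets and unassigned cells."""
    keep = set(cohort) | {DOUBLET, UNASSIGNED}
    return {bc: sample for bc, sample in cell_samples.items() if sample in keep}


# Toy data: four cells, three of which appear in the demultiplexing output.
barcodes = ["AAAC-1", "AAAG-1", "AACC-1", "AAGT-1"]
assignments = {"AAAC-1": "patientA", "AAAG-1": "patientB", "AACC-1": "doublet"}

cell_samples = attach_assignments(barcodes, assignments)
kept = subset_to_cohort(cell_samples, cohort=["patientA"])
print(sorted(kept))  # ['AAAC-1', 'AACC-1', 'AAGT-1']
```

Only the singlet assigned to a sample outside the cohort (`patientB`) is dropped. Doublets and unassigned cells pass through so that later steps can decide how to handle them.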
This workflow can be run in either standalone mode or module mode.
In "standalone" mode, the data is included in the same repo as the workflow. This mode is used mainly for testing.

    ./run_test.sh

This workflow can be embedded into a dataset as a git submodule.
To use in module mode:
- Add this workflow as a submodule to your dataset
- Copy and configure the config file
- Run the workflow using `run_mod.sh`

For example:

    # From the dataset root
    git submodule add <repo-url> modules/mkobj
    mkdir -p config/mkobj
    cp modules/mkobj/config/template.yaml config/mkobj/config.yaml
    # Edit config.yaml for your dataset
    ./modules/mkobj/run_mod.sh

The workflow is organized into parallel Seurat and AnnData pipelines that run concurrently:

    Cell Ranger matrices
    │
    ├──► create_seurat_object (per capture) ──► merge_captures ──► merged.qs
    │
    └──► create_anndata_object (per capture) ──► merge_anndata_captures ──► merged.h5ad
- `create_seurat_object`: Create individual Seurat objects per capture
  - Reads Cell Ranger filtered matrices (`barcodes.tsv.gz`, `features.tsv.gz`, `matrix.mtx.gz`)
  - Handles multimodal data (GEX + AB stored as a separate assay)
  - Attaches sample assignments, annotations, and ambient profiles to cell metadata
  - Optionally subsets cells to a specified cohort via `samples.csv`
  - Prefixes barcodes with capture ID for uniqueness across captures
- `merge_captures`: Merge all per-capture objects into one
  - Combines all captures using Seurat's `merge()` function
  - Joins layers for proper integration
  - Output: `merged.qs`
- `create_anndata_object`: Create individual AnnData objects per capture
  - Reads Cell Ranger filtered matrices (`barcodes.tsv.gz`, `features.tsv.gz`, `matrix.mtx.gz`)
  - Handles multimodal data: gene expression in `X`, antibody capture in `obsm['AB']` with feature names in `uns['AB_features']`
  - Attaches sample assignments, annotations, and ambient profiles to `obs`
  - Optionally subsets cells to a specified cohort via `samples.csv` (keeps cohort singlets, doublets, and unassigned cells)
  - Prefixes barcodes with capture ID for uniqueness across captures
  - Converts string columns with NA values to proper string type for h5ad compatibility
- `merge_anndata_captures`: Merge all per-capture objects into one
  - Combines all captures using `anndata.concat()` with `join='outer'`
  - Preserves multimodal data in `obsm` across captures
  - Output: `merged.h5ad`
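Two of the steps above, barcode prefixing and the outer-join merge, are easy to misread, so here is a minimal plain-Python sketch of the idea. It does not use Seurat or AnnData and is not the workflow's implementation. The `_` separator, the function names, the toy dictionary layout, and filling missing features with `0` are assumptions made for illustration.

```python
def prefix_barcodes(capture_id, barcodes, sep="_"):
    """Make barcodes unique across captures by prepending the capture ID."""
    return [f"{capture_id}{sep}{bc}" for bc in barcodes]


def outer_merge(captures, fill=0):
    """Merge per-capture count tables on the union of their features.

    `captures` maps capture ID -> dict with "barcodes", "features" and
    "counts" (a cells x features list of lists). Features missing from a
    capture are filled with `fill`.
    """
    # Union of features, in first-seen order.
    all_features, seen = [], set()
    for cap in captures.values():
        for feature in cap["features"]:
            if feature not in seen:
                seen.add(feature)
                all_features.append(feature)

    merged_barcodes, merged_counts = [], []
    for capture_id, cap in captures.items():
        column = {f: j for j, f in enumerate(cap["features"])}
        new_barcodes = prefix_barcodes(capture_id, cap["barcodes"])
        for barcode, row in zip(new_barcodes, cap["counts"]):
            merged_barcodes.append(barcode)
            merged_counts.append(
                [row[column[f]] if f in column else fill for f in all_features]
            )
    return merged_barcodes, all_features, merged_counts


# Two toy captures that share a raw barcode and differ in one feature.
captures = {
    "cap1": {"barcodes": ["AAAC-1", "AAAG-1"],
             "features": ["CD3E", "MS4A1"],
             "counts": [[5, 0], [0, 7]]},
    "cap2": {"barcodes": ["AAAC-1"],
             "features": ["CD3E", "LYZ"],
             "counts": [[2, 9]]},
}

merged_barcodes, merged_features, merged_counts = outer_merge(captures)
print(merged_barcodes)   # ['cap1_AAAC-1', 'cap1_AAAG-1', 'cap2_AAAC-1']
print(merged_features)   # ['CD3E', 'MS4A1', 'LYZ']
print(merged_counts)     # [[5, 0, 0], [0, 7, 0], [2, 0, 9]]
```

Without the prefix, the raw barcode `AAAC-1` would appear twice in the merged object. With an outer join, no feature is dropped just because one capture lacks it.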
See the configuration guide for detailed instructions.
Quick start:

    deps:
      cellranger: "data/counts"
      captures: "config/mkobj/captures.csv"
      demux: "data/demux"
    outs:
      results: "data/objects"
      logs: "logs/mkobj"

| File | Description |
|---|---|
| `merged.qs` | Merged Seurat object (R, serialized with qs) |
| `merged.h5ad` | Merged AnnData object (Python, HDF5-backed) |
Both outputs contain the same cells, metadata, and (where applicable) multimodal assays, each stored in its ecosystem's native format.
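One way to spot-check that the two outputs describe the same cells is to compare their barcode lists. The sketch below assumes the barcodes have already been exported from each object (for example from `colnames()` of the Seurat object in R and `adata.obs_names` in Python). The function name and the report layout are hypothetical, and the workflow itself may not ship such a check.

```python
def compare_cell_sets(seurat_barcodes, anndata_barcodes):
    """Report barcodes unique to either object and the size of the overlap."""
    seurat_set = set(seurat_barcodes)
    anndata_set = set(anndata_barcodes)
    return {
        "only_in_seurat": sorted(seurat_set - anndata_set),
        "only_in_anndata": sorted(anndata_set - seurat_set),
        "shared": len(seurat_set & anndata_set),
    }


# Toy example: the AnnData export is missing one cell.
report = compare_cell_sets(
    ["cap1_AAAC-1", "cap1_AAAG-1", "cap2_AAAC-1"],
    ["cap1_AAAC-1", "cap2_AAAC-1"],
)
print(report)
```

Empty `only_in_seurat` and `only_in_anndata` lists indicate that both objects contain the same cells. Any entries point to a capture or filtering step worth inspecting.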
Requirements:
- Snakemake >= 8.0
- snakemake-executor-plugin-cluster-generic
- qxub (for PBS cluster submission)
- Conda
- Apptainer (for containerized environments)
Rule-level conda environments are defined in `workflow/envs/` and installed automatically:

| Environment | Key packages |
|---|---|
| `seurat.yaml` | R, Seurat 5.1, SeuratObject, qs, tidyverse |
| `scanpy.yaml` | Python >= 3.10, scanpy >= 1.10, anndata >= 0.10, pandas, numpy, scipy |
Originally developed as part of the Swarbrick Lab data processing pipelines.