Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR enhances the cluster‐merging workflow by adding parallel DE computation, optional DataFrame output for marker selection, and refactors the final merging API.
- Introduces
return_markers_dfandn_jobsparameters acrossmerge_clustersandselect_marker_geneswith parallelized DE viade_pairs_ebayes_parallel. - Refactors
final_mergeto accept list‐based cluster assignments, include parallelization options, and adds a helper to convert cluster dicts to lists. - Updates example scripts and README to demonstrate the new parameters and usage.
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| transcriptomic_clustering/merging.py | Added return_markers_df/n_jobs params and updated marker‐selection logic |
| transcriptomic_clustering/markers.py | Extended select_marker_genes signature, return type, and DE parallel call |
| transcriptomic_clustering/final_merging.py | Refactored final_merge API, signature, docstring, and added helper function |
| transcriptomic_clustering/filter_known_modes.py | Changed known_modes param to lookup by column name; updated signature logic |
| transcriptomic_clustering/dimension_reduction.py | Adjusted gene count calculation; potential off‐by‐sum bug |
| transcriptomic_clustering/de_ebayes.py | Added process_pair and de_pairs_ebayes_parallel for parallel DE |
| transcriptomic_clustering/init.py | Exposed new functions and final_merge; duplicate import cleanup needed |
| examples/final_merging_example.py | New example demonstrating final_merge with parallel and marker‐DF options |
| examples/clustering_example.py | Example script for iterative clustering; missing imports cause NameError |
| README.md | Documented the examples directory |
Comments suppressed due to low confidence (5)
transcriptomic_clustering/markers.py:26
- The docstring for select_marker_genes should be updated to describe the new parameters
return_markers_dfandn_jobs, and to reflect the updated return type (Union[pd.DataFrame, set]).
"""
transcriptomic_clustering/final_merging.py:97
- The docstring for final_merge should be updated to include the added parameters
n_jobsandreturn_markers_df, and to clarify the new return type (tuple of list of lists and markers).
"""
transcriptomic_clustering/filter_known_modes.py:13
- The docstring should be updated to reflect that
known_modesis now expected to be a column name (string) inadata.obsrather than a DataFrame or Series.
known_modes: Optional[str] = None,
examples/clustering_example.py:7
- The script uses
sys.pathbutsysis not imported; addimport sysat the top of the file.
sys.path.insert(1, '/allen/programs/celltypes/workgroups/rnaseqanalysis/dyuan/tool/transcriptomic_clustering/')
examples/clustering_example.py:17
transcriptomic_clusteringmodule is not imported, so callingtranscriptomic_clustering.normalizewill raise a NameError; addimport transcriptomic_clusteringor importnormalizedirectly.
adata=transcriptomic_clustering.normalize(adata)
| vidx_bool = np.zeros((adata.n_vars,), dtype=bool) | ||
| vidx_bool[vidx] = True | ||
| n_genes = len(vidx) | ||
| n_genes = sum(vidx) |
There was a problem hiding this comment.
Using sum(vidx) will sum the index positions rather than count the number of genes; consider using len(vidx) or vidx_bool.sum() to get the correct gene count.
| n_genes = sum(vidx) | |
| n_genes = len(vidx) |
| from .diff_expression import de_pairs_chisq, vec_chisq_test | ||
| from .de_ebayes import de_pairs_ebayes | ||
| from .de_ebayes import de_pairs_ebayes, de_pairs_ebayes_parallel | ||
| from .merging import merge_clusters |
There was a problem hiding this comment.
Duplicate import of merge_clusters; remove the redundant line to keep the module init.py clean.
| from .merging import merge_clusters |
No description provided.