update finalMerging example by yyuandann · Pull Request #140 · AllenInstitute/transcriptomic_clustering

yyuandann · 2025-07-16T18:44:29Z

No description provided.

Copilot

Pull Request Overview

This PR enhances the cluster‐merging workflow by adding parallel DE computation and optional DataFrame output for marker selection, and by refactoring the final merging API.

Introduces return_markers_df and n_jobs parameters across merge_clusters and select_marker_genes with parallelized DE via de_pairs_ebayes_parallel.
Refactors final_merge to accept list‐based cluster assignments and parallelization options, and adds a helper that converts cluster dicts to lists.
Updates example scripts and README to demonstrate the new parameters and usage.
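The first two points can be illustrated with a minimal sketch. It is not the code from this PR: `cluster_dict_to_lists`, `toy_process_pair`, and `toy_de_pairs_parallel` are hypothetical names, the per-pair statistic is a plain difference of cluster means rather than an eBayes test, and a thread pool from the standard library stands in for whatever backend the PR actually uses behind `n_jobs`.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

import numpy as np


def cluster_dict_to_lists(cluster_assignments):
    """Hypothetical helper: {cluster_id: [cell indices]} -> list of index lists,
    ordered by cluster id (the ordering is an assumption)."""
    return [list(cluster_assignments[k]) for k in sorted(cluster_assignments)]


def toy_process_pair(pair, cluster_means):
    """Toy stand-in for a per-pair DE test: per-gene difference of cluster means."""
    a, b = pair
    return pair, cluster_means[a] - cluster_means[b]


def toy_de_pairs_parallel(cluster_means, n_jobs=1):
    """Score every cluster pair, serially or with a pool of n_jobs workers."""
    pairs = list(combinations(range(len(cluster_means)), 2))
    if n_jobs == 1:
        results = [toy_process_pair(p, cluster_means) for p in pairs]
    else:
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            # pool.map preserves the order of `pairs`
            results = list(pool.map(lambda p: toy_process_pair(p, cluster_means), pairs))
    return dict(results)


# Toy expression matrix: 6 cells x 3 genes
X = np.array([
    [1.0, 0.0, 2.0],
    [2.0, 0.0, 2.0],
    [3.0, 0.0, 2.0],
    [0.0, 4.0, 2.0],
    [2.0, 2.0, 6.0],
    [2.0, 2.0, 2.0],
])
assignments = {2: [4, 5], 0: [0, 1, 2], 1: [3]}

cluster_lists = cluster_dict_to_lists(assignments)
cluster_means = [X[idx].mean(axis=0) for idx in cluster_lists]

serial = toy_de_pairs_parallel(cluster_means, n_jobs=1)
parallel = toy_de_pairs_parallel(cluster_means, n_jobs=2)
```

Because each pair is scored independently, the serial and pooled runs should return the same result for every pair; only the wall-clock time is expected to differ on realistic data.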

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

File	Description
transcriptomic_clustering/merging.py	Added `return_markers_df`/`n_jobs` params and updated marker‐selection logic
transcriptomic_clustering/markers.py	Extended `select_marker_genes` signature, return type, and DE parallel call
transcriptomic_clustering/final_merging.py	Refactored `final_merge` API, signature, docstring, and added helper function
transcriptomic_clustering/filter_known_modes.py	Changed `known_modes` param to lookup by column name; updated signature logic
transcriptomic_clustering/dimension_reduction.py	Adjusted gene count calculation; potential off‐by‐sum bug
transcriptomic_clustering/de_ebayes.py	Added `process_pair` and `de_pairs_ebayes_parallel` for parallel DE
transcriptomic_clustering/__init__.py	Exposed new functions and `final_merge`; duplicate import cleanup needed
examples/final_merging_example.py	New example demonstrating final_merge with parallel and marker‐DF options
examples/clustering_example.py	Example script for iterative clustering; missing imports cause NameError
README.md	Documented the examples directory

Comments suppressed due to low confidence (5)

transcriptomic_clustering/markers.py:26

The docstring for select_marker_genes should be updated to describe the new parameters return_markers_df and n_jobs, and to reflect the updated return type (Union[pd.DataFrame, set]).

"""

transcriptomic_clustering/final_merging.py:97

The docstring for final_merge should be updated to include the added parameters n_jobs and return_markers_df, and to clarify the new return type (tuple of list of lists and markers).

"""

transcriptomic_clustering/filter_known_modes.py:13

The docstring should be updated to reflect that known_modes is now expected to be a column name (string) in adata.obs rather than a DataFrame or Series.

        known_modes: Optional[str] = None,
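A minimal sketch of what a column-name lookup could look like, assuming `adata.obs` behaves like a pandas DataFrame (as it does in AnnData). The function `get_known_modes` and the column names are hypothetical and are not taken from filter_known_modes.py.

```python
from typing import Optional

import pandas as pd


def get_known_modes(obs: pd.DataFrame, known_modes: Optional[str] = None):
    """Hypothetical lookup: return the obs column named by known_modes, or None."""
    if known_modes is None:
        return None
    if known_modes not in obs.columns:
        # Fail early with a clear message instead of a bare KeyError from pandas
        raise KeyError(f"{known_modes!r} is not a column of obs")
    return obs[known_modes]


# Stand-in for adata.obs
obs = pd.DataFrame(
    {"batch": ["a", "b", "a"], "n_counts": [10, 20, 30]},
    index=["cell1", "cell2", "cell3"],
)
modes = get_known_modes(obs, "batch")
```

Passing a string keeps the call site short, but it also means callers who previously passed a DataFrame or Series need to attach that data to `obs` first, which is why the docstring update matters.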

examples/clustering_example.py:7

The script uses sys.path but sys is not imported; add import sys at the top of the file.

sys.path.insert(1, '/allen/programs/celltypes/workgroups/rnaseqanalysis/dyuan/tool/transcriptomic_clustering/')

examples/clustering_example.py:17

transcriptomic_clustering module is not imported, so calling transcriptomic_clustering.normalize will raise a NameError; add import transcriptomic_clustering or import normalize directly.

adata=transcriptomic_clustering.normalize(adata)

Copilot · 2025-07-16T18:46:30Z

transcriptomic_clustering/dimension_reduction.py

    vidx_bool = np.zeros((adata.n_vars,), dtype=bool)
    vidx_bool[vidx] = True
-    n_genes = len(vidx)
+    n_genes = sum(vidx)


Using sum(vidx) will sum the index positions rather than count the number of genes; consider using len(vidx) or vidx_bool.sum() to get the correct gene count.

Suggested change

-    n_genes = sum(vidx)
+    n_genes = len(vidx)
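Which expression is correct depends on what `vidx` holds, and the excerpt does not show that. The sketch below uses made-up toy arrays to compare the two cases: with integer indices `sum` adds up positions, while with a boolean mask `len` returns the mask length. Counting on `vidx_bool`, as the comment also suggests, gives the same answer in both cases.

```python
import numpy as np

n_vars = 6
idx = np.array([1, 3, 4])                                  # integer index positions
mask = np.array([False, True, False, True, True, False])   # boolean mask over all genes


def count_selected(vidx, n_vars):
    """Count selected genes via a boolean vector; works for indices and masks."""
    vidx_bool = np.zeros((n_vars,), dtype=bool)
    vidx_bool[vidx] = True
    return int(vidx_bool.sum())


sum_of_indices = int(sum(idx))    # adds 1 + 3 + 4, not a count
len_of_indices = len(idx)         # number of selected genes
sum_of_mask = int(sum(mask))      # number of True entries
len_of_mask = len(mask)           # total number of genes, not a count

n_from_idx = count_selected(idx, n_vars)
n_from_mask = count_selected(mask, n_vars)
```

Note that `vidx_bool.sum()` counts unique positions, so it would differ from `len(vidx)` if an index array contained duplicates.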

Copilot · 2025-07-16T18:46:31Z

transcriptomic_clustering/__init__.py

 from .diff_expression import de_pairs_chisq, vec_chisq_test
-from .de_ebayes import de_pairs_ebayes
+from .de_ebayes import de_pairs_ebayes, de_pairs_ebayes_parallel
 from .merging import merge_clusters


Duplicate import of merge_clusters; remove the redundant line to keep the module's __init__.py clean.

Suggested change

-from .merging import merge_clusters

Dan Yuan and others added 11 commits August 13, 2024 15:38

Added an example script and updated README

49349e3

Modified the final_merge function

20467e3

Updated the example scripts

4fe87f6

Deleted the origin example_usage.py

5d6b2ba

modified return data type in final_merge()

93147f5

modified final_merge() to allow not providing marker genes

1d35005

Modify example scripts

5eb4083

Modified example scripts

21845d4

fixed a bug in pca(); changed the input to a str for filter_known_modes()

6438d8f

update the example script for final merging

096165b

update example

026edba

yyuandann requested review from UCDNJJ and Copilot July 16, 2025 18:44

Copilot AI reviewed Jul 16, 2025

yyuandann added 2 commits July 31, 2025 20:38

add de pair function

5c43281

update de example

bbd4579

yyuandann wants to merge 13 commits into AllenInstitute:dev from yyuandann:hmba/tc_latent

1 participant