Add snpclustering subworkflow #11059

Open
dbaku42 wants to merge 3 commits into nf-core:master from dbaku42:add/snpclustering

dbaku42 commented Mar 26, 2026

Description

This PR adds the snpclustering subworkflow for end-to-end unsupervised clustering of genomic samples directly from multi-sample VCF files.

Features

  • Variant filtering (MAF + missingness) with bcftools/filter
  • LD pruning with plink2/indeppairwise
  • Export pruned VCF with plink2/recodevcf
  • PCA with flashpca2
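
To make the first step concrete, here is a minimal Python sketch of what a MAF plus missingness filter computes per variant. It is only an illustration: the subworkflow itself delegates this to bcftools/filter, and the thresholds (`min_maf=0.05`, `max_missing=0.1`), the function names, and the toy genotypes are assumptions, not values taken from this PR.

```python
from typing import Dict, List, Optional

# Alt-allele count per diploid sample: 0, 1, 2, or None for a missing call.
Genotype = Optional[int]


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """MAF of one variant, computed over non-missing diploid calls only."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of samples without a genotype call at this variant."""
    return sum(g is None for g in calls) / len(calls)


def keep_variant(calls: List[Genotype],
                 min_maf: float = 0.05,      # hypothetical threshold
                 max_missing: float = 0.1    # hypothetical threshold
                 ) -> bool:
    """A variant passes when it is common enough and well genotyped."""
    return (minor_allele_frequency(calls) >= min_maf
            and missing_rate(calls) <= max_missing)


# Toy data for 12 samples (made up for illustration).
variants: Dict[str, List[Genotype]] = {
    "snp_common": [0, 1, 2, 1, 0, 1, 1, 0, 2, 1, 0, 1],
    "snp_rare": [0] * 11 + [1],
    "snp_gappy": [0, 1, None, None, 2, 1, None, 0, 1, 1, 0, 1],
}

kept = [name for name, calls in variants.items() if keep_variant(calls)]
print(kept)
```

Here `snp_rare` fails the MAF threshold and `snp_gappy` fails the missingness threshold, so only `snp_common` would move on to LD pruning.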

The subworkflow was developed in relation to the accepted nf-core proposal for the consepopgen pipeline.

Related to:

Checklist

  • nf-core subworkflows lint snpclustering passed
  • nf-core subworkflows test snpclustering passed
  • Follows nf-core subworkflow conventions

Closes # (no specific issue)

famosab commented Apr 2, 2026

Please join the nf-core organization on GitHub to enable the CI-tests to run on your PR. You can request to join the organization via #github-invitations in the nf-core slack. You can join the nf-core slack via https://nf-co.re/join.

famosab left a comment

Thank you for your contribution to nf-core! We really appreciate it. I added a few comments to your PR.

}

then {
assert workflow.success

famosab (Contributor)

We also want a snapshot here (look at other subworkflows)

dbaku42 (Author)

The test now passes when run directly with nf-test. The failure with nf-core subworkflows test is caused by a temporarily missing Wave container for the plink2/vcf module (manifest unknown). The logic and snapshot are correct.

famosab (Contributor)

Not needed anymore

missing

main:
versions = Channel.empty()

famosab (Contributor)

Check whether each module still exports versions. I think at least bcftools/filter no longer does.

FLASHPCA2 ( PLINK2_RECODE_VCF.out.vcf )
versions = versions.mix(FLASHPCA2.out.versions.first())

// TODO: add KMeans/DBSCAN/plot here once the local modules are created

famosab (Contributor)

Is there still something to add?

dbaku42 (Author)

Thank you for your comment, @famosab.

You’re absolutely right — the clustering components (KMeans, DBSCAN), internal validation metrics (Silhouette, Calinski–Harabasz, Davies–Bouldin), non-linear embeddings (t-SNE/UMAP), and the final HTML report still need to be integrated.
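
For readers unfamiliar with the internal validation metrics named above, the following is a minimal pure-Python sketch of the mean silhouette coefficient applied to points in PC space. It is a from-scratch illustration under stated assumptions: the function names and the toy coordinates are hypothetical, and nothing in this thread confirms how the original pipeline computes the metric.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points: List[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy PC1/PC2 coordinates for six samples (made up for illustration).
pcs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the two visible groups
bad_labels = [0, 1, 0, 1, 0, 1]   # arbitrary assignment

print(silhouette_score(pcs, good_labels))
print(silhouette_score(pcs, bad_labels))
```

A labelling that matches the two separated groups scores close to 1, while the arbitrary labelling scores much lower, which is the property that makes the metric useful for comparing KMeans or DBSCAN settings.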

These features are already implemented in the original pipeline (https://github.com/dbaku42/nf-core-snpclustering). I intentionally left them out of this PR to keep the subworkflow minimal and easier to review.

I’m happy to proceed in either of the following ways:

  1. Include all these components directly in this PR (my preferred option), or
  2. Add them in a dedicated follow-up PR immediately after this one is merged.

Please let me know which approach you’d prefer.

Thanks again!

Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>