Please join the nf-core organization on GitHub to enable the CI tests to run on your PR. You can request to join the organization via #github-invitations in the nf-core Slack. You can join the nf-core Slack via https://nf-co.re/join.
famosab
left a comment
Thank you for your contribution to nf-core! We really appreciate it. I added a few comments to your PR.
    }

    then {
        assert workflow.success
We also want a snapshot here (see other subworkflows for examples).
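As a rough illustration of the snapshot request, an nf-test file with a snapshot assertion might look like the sketch below. The workflow name `SNPCLUSTERING`, the test name, and the input tuple (including the `test.vcf.gz` path) are assumptions for illustration only and are not taken from this PR; `snapshot(workflow.out).match()` is the standard nf-test snapshot call.

```groovy
nextflow_workflow {

    name "Test Subworkflow SNPCLUSTERING"   // assumed workflow name
    script "../main.nf"
    workflow "SNPCLUSTERING"                // assumed workflow name

    test("multi-sample vcf - snapshot") {   // hypothetical test name

        when {
            workflow {
                """
                // Hypothetical input: replace with the test data actually used in this PR
                input[0] = Channel.of([ [ id:'test' ], file('test.vcf.gz', checkIfExists: true) ])
                """
            }
        }

        then {
            assertAll(
                // The workflow must finish without errors
                { assert workflow.success },
                // Compare all emitted channels against the stored snapshot file
                { assert snapshot(workflow.out).match() }
            )
        }
    }
}
```

On the first run nf-test writes the snapshot file next to the test; later runs fail if the emitted outputs differ from it.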
The test now passes when run directly with nf-test. The failure under `nf-core subworkflows test` is caused by a temporarily missing Wave container for the plink2/vcf module (error: manifest unknown). The logic and snapshot are correct.
    missing

    main:
    versions = Channel.empty()
Check for each module whether it still emits a versions channel; I think at least bcftools/filter no longer does.
    FLASHPCA2 ( PLINK2_RECODE_VCF.out.vcf )
    versions = versions.mix(FLASHPCA2.out.versions.first())

    // TODO: here we will add KMeans/DBSCAN/plot once we create the local modules
Is there still something to add?
Thank you for your comment, @famosab.
You’re absolutely right: the clustering components (KMeans, DBSCAN), internal validation metrics (Silhouette, Calinski–Harabasz, Davies–Bouldin), non-linear embeddings (t-SNE/UMAP), and the final HTML report still need to be integrated.
These features are already implemented in the original pipeline (https://github.com/dbaku42/nf-core-snpclustering). I intentionally left them out of this PR to keep the subworkflow minimal and easier to review.
I’m happy to proceed in either of the following ways:
- Include all these components directly in this PR (my preferred option), or
- Add them in a dedicated follow-up PR immediately after this one is merged.
Please let me know which approach you’d prefer.
Thanks again!
Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>
Description
This PR adds the `snpclustering` subworkflow for end-to-end unsupervised clustering of genomic samples directly from multi-sample VCF files.

Features
- bcftools/filter
- plink2/indeppairwise
- plink2/recodevcf
- flashpca2

The subworkflow was developed in relation to the accepted nf-core proposal for the `consepopgen` pipeline.

Related to:

Checklist
- `nf-core subworkflows lint snpclustering` passed
- `nf-core subworkflows test snpclustering` passed

Closes # (no specific issue)