Skip to content

new module: vcfpartition#11098

Open
camlloyd wants to merge 4 commits intonf-core:masterfrom
camlloyd:vcfpartition
Open

new module: vcfpartition#11098
camlloyd wants to merge 4 commits intonf-core:masterfrom
camlloyd:vcfpartition

Conversation

@camlloyd
Copy link
Copy Markdown
Member

PR checklist

Closes #11097 by adding new module: vcfpartition

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

@camlloyd camlloyd marked this pull request as draft March 31, 2026 17:28
@camlloyd camlloyd marked this pull request as ready for review March 31, 2026 17:42
Copy link
Copy Markdown
Contributor

@Joon-Klaps Joon-Klaps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition! Minor suggestions.
Wondering if this is actually not a subcommand of a larger tool bio2zarr, which would make the name of the tool modules/nf-core/bio2zarr/vcfpartition, thoughts?

Comment on lines +34 to +36
sanitizeOutput(process.out),
file(process.out.partitions[0][1]).readLines().size(),
process.out.findAll { key, val -> key.startsWith("versions") }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sanitizeOutput(process.out),
file(process.out.partitions[0][1]).readLines().size(),
process.out.findAll { key, val -> key.startsWith("versions") }
sanitizeOutput(process.out),

You're snapshotting the entire ouput, so we can remove individual file checks

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right - I wanted to be able to see the number of partitions at a glance - since:

Note that both the number of partitions and sizes are a target, and the returned number of partitions may not exactly correspond 1.

Happy to remove!

Footnotes

  1. https://sgkit-dev.github.io/bio2zarr/vcfpartition/cli_ref.html

Comment on lines +67 to +68
file(process.out.partitions[0][1]).readLines().size(),
process.out.findAll { key, val -> key.startsWith("versions") }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
file(process.out.partitions[0][1]).readLines().size(),
process.out.findAll { key, val -> key.startsWith("versions") }

Comment on lines +99 to +100
file(process.out.partitions[0][1]).readLines().size(),
process.out.findAll { key, val -> key.startsWith("versions") }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
file(process.out.partitions[0][1]).readLines().size(),
process.out.findAll { key, val -> key.startsWith("versions") }

Comment on lines +131 to +132
file(process.out.partitions[0][1]).readLines().size(),
process.out.findAll { key, val -> key.startsWith("versions") }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
file(process.out.partitions[0][1]).readLines().size(),
process.out.findAll { key, val -> key.startsWith("versions") }

tuple val(meta), path("*.icf"), emit: icf
tuple val("${task.process}"), val('vcf2zarr'), eval('vcf2zarr --version |& sed -n "s/.* //p"'), emit: versions_vcf2zarr, topic: versions
tuple val(meta), path("*.tsv"), emit: partitions
tuple val("${task.process}"), val('vcfpartition'), eval("vcfpartition --version | sed 's/vcfpartition, version //'"), topic: versions, emit: versions_vcfpartition
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tuple val("${task.process}"), val('vcfpartition'), eval("vcfpartition --version | sed 's/vcfpartition, version //'"), topic: versions, emit: versions_vcfpartition
tuple val("${task.process}"), val('vcfpartition'), eval("vcfpartition --version | sed 's/.* //'"), topic: versions, emit: versions_vcfpartition

@camlloyd
Copy link
Copy Markdown
Member Author

camlloyd commented Apr 1, 2026

Nice addition! Minor suggestions. Wondering if this is actually not a subcommand of a larger tool bio2zarr, which would make the name of the tool modules/nf-core/bio2zarr/vcfpartition, thoughts?

Good question. I think one argument is that bio2zarr is the repository/package name, and vcfpartition exists at the tool level. The package also contains vcf2zarr, which has its own subtools convert, explode etc.

I previously added the modules vcf2zarr/convert and vcf2zarr/explode.

I didn't consider the other tools then, but now I want to work on plink2zarr, so I'm happy to re-think this approach.

@camlloyd
Copy link
Copy Markdown
Member Author

camlloyd commented Apr 1, 2026

Or, more simply, there is no bio2zarr command.

@jfy133
Copy link
Copy Markdown
Member

jfy133 commented Apr 1, 2026

Nice addition! Minor suggestions. Wondering if this is actually not a subcommand of a larger tool bio2zarr, which would make the name of the tool modules/nf-core/bio2zarr/vcfpartition, thoughts?

Good question. I think one argument is that bio2zarr is the repository/package name, and vcfpartition exists at the tool level. The package also contains vcf2zarr, which has its own subtools convert, explode etc.

Are there other commands within the bio2zarr package/repository? If so all would go under bio2zarr/<command name> in my opinion.

But even then, if vcfpartition is the sole tool within the repository name, it's a very generic name 😅 so prefixing with bio2zarr would be nicer.

I previously added the modules vcf2zarr/convert and vcf2zarr/explode.

I didn't consider the other tools then, but now I want to work on plink2zarr, so I'm happy to re-think this approach.

I see why you would opt for that here though... it's a bit nasty from the developers...

When did you add the previous modules? Do you think they are already being used in pipelines, or are they fresh modules that we could rename them (e.g. to `bio2zarr/vcf2zarrexplode)?

Another annoying example i've encoutnered is the conda recipe being called ncbi-datasets but the actual executed command is just datasets (very generic)

But let's also see what other maintainers think... ultimately it's about findabilty at this poitn

@camlloyd
Copy link
Copy Markdown
Member Author

camlloyd commented Apr 1, 2026

One yesterday, one last year. Neither used in pipelines as far as I can tell.

bio2zarr contains:

  • vcf2zarr
  • plink2zarr
  • tskit2zarr
  • vcfpartition

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

new module: vcfpartition

3 participants