-
Notifications
You must be signed in to change notification settings - Fork 955
traitar module is drafted #9605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a new nf-core module for Traitar3, which performs phenotype prediction from assembled nucleotide FASTA files using protein families to infer microbial traits.
- Implements Traitar3 phenotype mode for trait prediction
- Provides both script and stub implementations for testing
- Includes comprehensive test configurations and expected outputs
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| modules/nf-core/traitar/main.nf | Core process implementation with Traitar3 phenotype analysis, including script and stub sections for prediction outputs |
| modules/nf-core/traitar/meta.yml | Module metadata defining inputs/outputs, tool information, and EDAM ontology annotations |
| modules/nf-core/traitar/environment.yml | Conda environment specification pinning traitar to version 3.0.1 |
| modules/nf-core/traitar/tests/main.nf.test | nf-test implementation with stub test for proteome input |
| modules/nf-core/traitar/tests/main.nf.test.snap | Test snapshot file with expected MD5 checksums for stub outputs |
| modules/nf-core/traitar/tests/nextflow.config | Test-specific Nextflow configuration for module execution |
| modules/nf-core/traitar/tests/nf-test.config | Additional nf-test configuration settings |
| modules/nf-core/traitar/tests/config/nf-test.config | Alternative test configuration file for different test scenarios |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Hi @rpetit3 could you please review the module? :) |
Joon-Klaps
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @brovolia,
Thanks for contributing Traitar3. I've left a couple of comments but they don't adress everything.
I've noticed that there are some files that shouldn't be there, .nftignore, nf-test.config, config/, ... All this are typicall for pipelines but not within nf-core modules.
I also noticed that the main.nf contains a bit of complexity and bash scripting, try to minimize this as much as possible. If you are struggeling decompressing files, there are plenty of examples already out there that resolve this issue, see this search
I would suggest having a read through the docs on does and don'ts of nf-core modules. I would also suggest to have a look at some already made modules like samtools/stats or trimgalore to get an idea of how the modules are structured.
e6e0e7b to
03c8b12
Compare
- Real data test commented out (non-deterministic output across conda/container) - Regenerated snapshots for stub test only (deterministic) - Cleaned up obsolete snapshots - CI will pass with stub test - Real test can be run locally with: nf-test test tests/main.nf.test --profile=+singularity
890405a to
a9d7474
Compare
Joon-Klaps
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still a couple of stuff that needs double checking
| // Real data test - run locally with: nf-test test tests/main.nf.test --profile=+singularity | ||
| // Commented out for CI: non-deterministic output across conda/container environments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Real data test - run locally with: nf-test test tests/main.nf.test --profile=+singularity | |
| // Commented out for CI: non-deterministic output across conda/container environments |
Comment says non-deterministic but you snapshot for everything, so I think we can remove
| // Real data test - run locally with: nf-test test tests/main.nf.test --profile=+singularity | ||
| // Commented out for CI: non-deterministic output across conda/container environments | ||
| /* | ||
| test("traitar - proteomes") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would add another test, where we also test unzipped input given that we have several lines of code attributed to it.
| "gene_prediction": [ | ||
|
|
||
| ], | ||
| "pfam_annotation": [ | ||
|
|
||
| ], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familar with traitar, is there an easy way to have output files for these variables, by adding more tests?
| # Download PFAM data for traitar annotation | ||
| traitar pfam pfam_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be given as an input variable; create a new module for it.
Have a look at nextclade/get_dataset as an example. Snapshotting both the db version as well as traitar is best practice.
For testing this module, you can use the downloaded one or use the one from nf-core/testdatasets (look for pfam)
| } | ||
|
|
||
| """ | ||
| mkdir -p input_dir pfam_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| mkdir -p input_dir pfam_data |
| input_dir \\ | ||
| samples.txt \\ | ||
| ${input_type} \\ | ||
| ${prefix} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ${prefix} \ | |
| ${prefix} \\ |
| 'community.wave.seqera.io/library/hmmer_prodigal_pandas_pip_pruned:a83f0296374a52e6' }" | ||
|
|
||
| input: | ||
| tuple val(meta), path(proteins) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| tuple val(meta), path(proteins) | |
| tuple val(meta), path(fasta) |
Keep it simple, it was proteins or nucleotides for input.
| description: Protein sequences in FASTA format (or nucleotide sequences if input_type | ||
| is from_nucleotides) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| description: Protein sequences in FASTA format (or nucleotide sequences if input_type | |
| is from_nucleotides) | |
| description: Sequences in FASTA format (protein or nucleotide sequences) |
| type: file | ||
| description: Protein sequences in FASTA format (or nucleotide sequences if input_type | ||
| is from_nucleotides) | ||
| pattern: "*.{fa,fasta,faa,fna}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also be gzipped right?
| - edam: "http://edamontology.org/format_1929" | ||
| - input_type: | ||
| type: string | ||
| description: Input type specifying the format of input sequences (from_nucleotides, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double check if it's accurate
This module runs Traitar3 in phenotype mode to infer phenotype profiles from assembled nucleotide FASTA files, producing tabular trait prediction results per sample.