
Conversation

@brovolia

This module runs Traitar3 in phenotype mode to infer phenotype profiles from assembled nucleotide FASTA files, producing tabular trait prediction results per sample.

@brovolia brovolia linked an issue Dec 22, 2025 that may be closed by this pull request
4 tasks
@brovolia brovolia self-assigned this Dec 22, 2025
@brovolia brovolia requested a review from Copilot December 22, 2025 14:04
@brovolia brovolia requested a review from rpetit3 December 22, 2025 14:04
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new nf-core module for Traitar3, which performs phenotype prediction from assembled nucleotide FASTA files using protein families to infer microbial traits.

  • Implements Traitar3 phenotype mode for trait prediction
  • Provides both script and stub implementations for testing
  • Includes comprehensive test configurations and expected outputs

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| modules/nf-core/traitar/main.nf | Core process implementation with Traitar3 phenotype analysis, including script and stub sections for prediction outputs |
| modules/nf-core/traitar/meta.yml | Module metadata defining inputs/outputs, tool information, and EDAM ontology annotations |
| modules/nf-core/traitar/environment.yml | Conda environment specification pinning traitar to version 3.0.1 |
| modules/nf-core/traitar/tests/main.nf.test | nf-test implementation with stub test for proteome input |
| modules/nf-core/traitar/tests/main.nf.test.snap | Test snapshot file with expected MD5 checksums for stub outputs |
| modules/nf-core/traitar/tests/nextflow.config | Test-specific Nextflow configuration for module execution |
| modules/nf-core/traitar/tests/nf-test.config | Additional nf-test configuration settings |
| modules/nf-core/traitar/tests/config/nf-test.config | Alternative test configuration file for different test scenarios |

@brovolia
Author

brovolia commented Jan 5, 2026

Hi @rpetit3 could you please review the module? :)

Contributor

@Joon-Klaps Joon-Klaps left a comment

Hi @brovolia,

Thanks for contributing Traitar3. I've left a couple of comments, but they don't address everything.

I've noticed some files that shouldn't be there: .nftignore, nf-test.config, config/, ... These are typical for pipelines, but they don't belong in nf-core modules.

I also noticed that main.nf contains a fair amount of complexity and bash scripting; try to minimize this as much as possible. If you are struggling with decompressing files, there are plenty of existing examples that solve this issue, see this search
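To illustrate the decompression point, here is a minimal shell sketch of the kind of logic that usually suffices: decompress only when the file name ends in `.gz`, otherwise copy the file as-is. The `stage_fasta` helper, the file names, and the `input_dir` layout are hypothetical and not taken from the module; in an nf-core module the same check is often done with a small Groovy conditional around a single `gzip -c -d` call.

```shell
set -eu

# Hypothetical helper: place an uncompressed copy of a FASTA file in a directory.
# Files ending in .gz are decompressed; anything else is copied unchanged.
stage_fasta() {
    src="$1"
    dest_dir="$2"
    name="$(basename "$src")"
    mkdir -p "$dest_dir"
    case "$name" in
        *.gz) gzip -cd "$src" > "$dest_dir/${name%.gz}" ;;
        *)    cp "$src" "$dest_dir/$name" ;;
    esac
}

# Demo on throwaway files (illustrative names only)
workdir="$(mktemp -d)"
printf '>seq1\nMKV\n' > "$workdir/sample.faa"
gzip -c "$workdir/sample.faa" > "$workdir/sample.faa.gz"

stage_fasta "$workdir/sample.faa.gz" "$workdir/input_dir"   # compressed input
stage_fasta "$workdir/sample.faa"    "$workdir/plain_dir"   # uncompressed input
```

Both calls leave a plain `sample.faa` in the target directory, so the downstream command does not need to know which form it was given.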

I would suggest reading through the docs on the dos and don'ts of nf-core modules. I would also suggest looking at some existing modules, such as samtools/stats or trimgalore, to get an idea of how modules are structured.

@brovolia brovolia force-pushed the 9575-new-module-traitar3 branch from e6e0e7b to 03c8b12 Compare January 14, 2026 12:27
@brovolia brovolia force-pushed the 9575-new-module-traitar3 branch from 890405a to a9d7474 Compare January 15, 2026 15:03
Contributor

@Joon-Klaps Joon-Klaps left a comment

There are still a couple of things that need double-checking.

Comment on lines +12 to +13
// Real data test - run locally with: nf-test test tests/main.nf.test --profile=+singularity
// Commented out for CI: non-deterministic output across conda/container environments

Suggested change
// Real data test - run locally with: nf-test test tests/main.nf.test --profile=+singularity
// Commented out for CI: non-deterministic output across conda/container environments

The comment says the output is non-deterministic, but you snapshot everything, so I think we can remove it.

// Real data test - run locally with: nf-test test tests/main.nf.test --profile=+singularity
// Commented out for CI: non-deterministic output across conda/container environments
/*
test("traitar - proteomes") {

I would add another test that also covers unzipped input, given that several lines of code are dedicated to it.

Comment on lines +58 to +63
"gene_prediction": [

],
"pfam_annotation": [

],

I'm not familiar with traitar. Is there an easy way to get output files for these variables by adding more tests?

Comment on lines +52 to +53
# Download PFAM data for traitar annotation
traitar pfam pfam_data

These should be given as an input variable; create a new module for it.

Have a look at nextclade/get_dataset as an example. Snapshotting both the db version as well as traitar is best practice.

For testing this module, you can use the downloaded one or the one from nf-core/test-datasets (look for pfam).

}

"""
mkdir -p input_dir pfam_data

Suggested change
mkdir -p input_dir pfam_data

input_dir \\
samples.txt \\
${input_type} \\
${prefix} \

Suggested change
${prefix} \
${prefix} \\
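For context on this suggestion: inside a Nextflow `"""` script block, a doubled backslash is rendered as a single backslash in the generated shell script, where it acts as a line continuation. The sketch below shows only the shell side of that, namely that arguments split across lines with a trailing backslash still reach one command. The `count_args` function and the argument values are illustrative stand-ins, not the module's actual command.

```shell
set -eu

# Hypothetical stand-in for the real command: report how many arguments arrived.
count_args() {
    echo "$#"
}

# Each trailing backslash joins the next line to the same command invocation.
n=$(count_args input_dir \
    samples.txt \
    from_nucleotides \
    sample1)

echo "$n"   # prints 4
```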

'community.wave.seqera.io/library/hmmer_prodigal_pandas_pip_pruned:a83f0296374a52e6' }"

input:
tuple val(meta), path(proteins)

Suggested change
tuple val(meta), path(proteins)
tuple val(meta), path(fasta)

Keep it simple; the input can be either proteins or nucleotides.

Comment on lines +30 to +31
description: Protein sequences in FASTA format (or nucleotide sequences if input_type
is from_nucleotides)

Suggested change
description: Protein sequences in FASTA format (or nucleotide sequences if input_type
is from_nucleotides)
description: Sequences in FASTA format (protein or nucleotide sequences)

type: file
description: Protein sequences in FASTA format (or nucleotide sequences if input_type
is from_nucleotides)
pattern: "*.{fa,fasta,faa,fna}"

Could also be gzipped, right?

- edam: "http://edamontology.org/format_1929"
- input_type:
type: string
description: Input type specifying the format of input sequences (from_nucleotides,

Double-check that this is accurate.

Development

Successfully merging this pull request may close these issues.

new module: traitar3

3 participants