labgem/kegg-module-mapper

kegg-module-mapper

A Rust implementation of a KEGG MAPPER 'reconstruct'-like tool to identify KEGG modules based on a list of KEGG orthology groups (KO) and the definitions of the KEGG modules.


Usage

Map KEGG KO list to a set of KEGG module, using the Boolean formula definition

Usage: kegg-module-mapper --ko-list <KO_LIST> --ko-definitions <KO_DEFINITIONS> --output <OUTPUT_FILE>

Options:
  -i, --ko-list <KO_LIST>
          A list of KEGG KO, one identifier per line.
  -d, --ko-definitions <KO_DEFINITIONS>
          A TSV file with the KEGG module name in column 1 and its definition in column 2.
  -o, --output <OUTPUT_FILE>
          A list with the KEGG module name whose definition is satisfied by the KEGG module.
  -h, --help
          Print help
  -V, --version
          Print version

Usage example on an E. coli KO set

Given a file eco_ko.list listing the KO of Escherichia coli, one identifier per line, and a tab-separated file module_definitions.tsv whose column 1 holds the module identifier and column 2 its definition as a formula over KEGG KO identifiers, the following command produces the list of satisfied KEGG module identifiers:

kegg-module-mapper --ko-list eco_ko.list --ko-definitions module_definitions.tsv --output eco_modules.list

The output file eco_modules.list will contain the identifiers of the KEGG modules whose definition, read as a Boolean formula, evaluates to true when each KO present in the given list is a true literal and each absent KO is a false literal.
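To make the two input formats concrete, here is a minimal sketch of how they could be read. The function names `parse_ko_list` and `parse_definitions` are hypothetical and are not taken from the kegg-module-mapper sources, and the module identifier and definition in `main` are toy data.

```rust
use std::collections::HashSet;

/// Collect KO identifiers, one per line, ignoring blank lines.
/// (Hypothetical helper, not the tool's actual code.)
fn parse_ko_list(text: &str) -> HashSet<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Split each line into (module identifier, definition) at the first tab.
/// Lines without a tab are skipped.
fn parse_definitions(text: &str) -> Vec<(String, String)> {
    text.lines()
        .filter_map(|line| line.split_once('\t'))
        .map(|(id, def)| (id.trim().to_string(), def.trim().to_string()))
        .collect()
}

fn main() {
    // Toy inputs for illustration only.
    let kos = parse_ko_list("K00844\nK01810\n\nK00850\n");
    let defs = parse_definitions("M_EXAMPLE\t(K00844,K12407) K01810\n");
    println!("{} KO loaded, {} module definition(s)", kos.len(), defs.len());
}
```

Storing the KO in a `HashSet` keeps each presence test cheap when the same list is checked against many module definitions.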

Setup guide

From source

To install kegg-module-mapper from source, you will need Rust installed on your machine.

First, clone the git repository:

git clone https://gitlab.com/sortion/kegg-module-mapper.git
cd kegg-module-mapper

Then, compile kegg-module-mapper:

cargo build --release

The kegg-module-mapper binary will then be available at ./target/release/kegg-module-mapper.

Prepare the module_definitions.tsv file

A companion Rust program, prepare_kegg_definitions, extracts a two-column TSV file from the KEGG FTP kegg/module/module file: the first column is the module identifier and the second is the KEGG module definition expressed in terms of KEGG KO.

The following commands generate such a file, given a KEGG FTP dump in the directory kegg/:

# First compile the dedicated program
cd prepare_kegg_definitions
cargo build --release
# Then, extract the KEGG module definitions TSV from the large KEGG module dump file
./target/release/prepare_kegg_definitions --module kegg/module/module --output module_definitions.tsv
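For readers curious about what this extraction involves, the sketch below pulls the ENTRY identifier and the DEFINITION field out of KEGG flat-file records, which end with a `///` line. It is not the code of prepare_kegg_definitions: the function name `extract_definitions` is hypothetical, the sample record is made up, and wrapping each line of a multi-line definition in parentheses before joining with spaces is an assumption chosen so that line breaks keep acting as a logical AND.

```rust
/// Extract (module identifier, definition) pairs from KEGG flat-file text.
/// Hypothetical sketch; the real tool may differ.
fn extract_definitions(flat: &str) -> Vec<(String, String)> {
    let mut out = Vec::new();
    let mut entry: Option<String> = None;
    let mut lines: Vec<String> = Vec::new();
    let mut in_definition = false;
    for line in flat.lines() {
        if line.starts_with("///") {
            // End of record: emit it if both fields were seen.
            if let Some(id) = entry.take() {
                if lines.len() == 1 {
                    out.push((id, lines[0].clone()));
                } else if lines.len() > 1 {
                    // Assumption: parenthesize each line, join with AND (space).
                    let joined: Vec<String> =
                        lines.iter().map(|l| format!("({})", l)).collect();
                    out.push((id, joined.join(" ")));
                }
            }
            lines.clear();
            in_definition = false;
        } else if let Some(rest) = line.strip_prefix("ENTRY") {
            entry = rest.split_whitespace().next().map(String::from);
            in_definition = false;
        } else if let Some(rest) = line.strip_prefix("DEFINITION") {
            lines.push(rest.trim().to_string());
            in_definition = true;
        } else if in_definition && line.starts_with(' ') {
            // Continuation line of a multi-line DEFINITION field.
            lines.push(line.trim().to_string());
        } else {
            in_definition = false;
        }
    }
    out
}

fn main() {
    // Made-up record for illustration; not real KEGG content.
    let flat = "ENTRY       M99999            Pathway   Module\n\
                NAME        Toy module\n\
                DEFINITION  (K00844,K12407) K01810\n\
                ///\n";
    for (id, def) in extract_definitions(flat) {
        println!("{}\t{}", id, def);
    }
}
```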

How does this work?

kegg-module-mapper parses each KEGG module definition into a Boolean formula abstract syntax tree, with the following semantics associated with the KEGG module definition symbols:

  • Space ' ' corresponds to logical AND
  • Comma ',' corresponds to logical OR
  • Plus '+' corresponds to essential components, and we treat them as a logical AND
  • Minus '-' corresponds to optional components. A KEGG orthology group with a leading minus sign evaluates to true regardless of its presence in the list of KO, as does any parenthesized sub-expression following a minus sign.
  • Parentheses '(' and ')' wrap sub-expressions
  • Line breaks correspond to mediators, and we treat them as logical AND
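The symbol table above can be sketched as a small tokenizer. This is an illustration under assumptions, not the tool's actual lexer: the `Token` enum and the `tokenize` function are hypothetical names, and runs of whitespace (including line breaks) are collapsed into a single AND token.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Ko(String), // a KO identifier such as K00844
    And,        // space or line break
    Or,         // comma
    Plus,       // '+', essential component, treated as AND
    Minus,      // '-', marks the following term as optional
    Open,       // '('
    Close,      // ')'
}

/// Turn a module definition into tokens (hypothetical sketch).
fn tokenize(definition: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = definition.trim().chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_alphanumeric() {
            // Read a whole identifier.
            let mut id = String::new();
            while let Some(&d) = chars.peek() {
                if !d.is_alphanumeric() {
                    break;
                }
                id.push(d);
                chars.next();
            }
            tokens.push(Token::Ko(id));
            continue;
        }
        chars.next();
        let token = match c {
            ',' => Token::Or,
            '+' => Token::Plus,
            '-' => Token::Minus,
            '(' => Token::Open,
            ')' => Token::Close,
            c if c.is_whitespace() => {
                // Collapse consecutive whitespace into one AND.
                if tokens.last() == Some(&Token::And) {
                    continue;
                }
                Token::And
            }
            other => return Err(format!("unexpected character {:?}", other)),
        };
        tokens.push(token);
    }
    Ok(tokens)
}

fn main() {
    println!("{:?}", tokenize("(K01,K02) K03+K04 -K05"));
}
```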

A '+' sign has higher precedence than a ',' sign, that is to say: $K01,K02+K03$ is equivalent to $K01,(K02+K03)$.

Once the KEGG definition is expressed as a Boolean formula abstract syntax tree, we recursively evaluate its branches to check whether the expression is true when each KO literal is set to true if it appears in the KO list and to false otherwise.
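The parse-then-evaluate approach described above can be illustrated with a compact recursive-descent sketch. It is not the tool's implementation: the names `Expr`, `Parser`, `eval` and `satisfied` are hypothetical, and the grammar assumes that a space binds tighter than a comma (the text above only states this for '+'). Other KEGG notations, such as a standalone `--`, are not handled.

```rust
use std::collections::HashSet;
use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug)]
enum Expr {
    Ko(String),
    And(Vec<Expr>),
    Or(Vec<Expr>),
    Optional(Box<Expr>), // term after '-', always true
}

struct Parser<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> Parser<'a> {
    // or := and (',' and)*
    fn or(&mut self) -> Result<Expr, String> {
        let mut alts = vec![self.and()?];
        while self.chars.peek() == Some(&',') {
            self.chars.next();
            alts.push(self.and()?);
        }
        Ok(if alts.len() == 1 { alts.remove(0) } else { Expr::Or(alts) })
    }

    // and := unit ((whitespace | '+')* unit)*   (assumed precedence)
    fn and(&mut self) -> Result<Expr, String> {
        let mut parts = vec![self.unit()?];
        loop {
            let mut saw_sep = false;
            while matches!(self.chars.peek(), Some(&c) if c == '+' || c.is_whitespace()) {
                self.chars.next();
                saw_sep = true;
            }
            match self.chars.peek() {
                Some(&'-') => {}
                Some(&c) if saw_sep && (c == '(' || c.is_alphanumeric()) => {}
                _ => break,
            }
            parts.push(self.unit()?);
        }
        Ok(if parts.len() == 1 { parts.remove(0) } else { Expr::And(parts) })
    }

    // unit := '-' unit | '(' or ')' | KO
    fn unit(&mut self) -> Result<Expr, String> {
        match self.chars.peek().copied() {
            Some('-') => {
                self.chars.next();
                Ok(Expr::Optional(Box::new(self.unit()?)))
            }
            Some('(') => {
                self.chars.next();
                let inner = self.or()?;
                match self.chars.next() {
                    Some(')') => Ok(inner),
                    other => Err(format!("expected ')', found {:?}", other)),
                }
            }
            Some(c) if c.is_alphanumeric() => {
                let mut id = String::new();
                while let Some(&d) = self.chars.peek() {
                    if !d.is_alphanumeric() {
                        break;
                    }
                    id.push(d);
                    self.chars.next();
                }
                Ok(Expr::Ko(id))
            }
            other => Err(format!("unexpected {:?}", other)),
        }
    }
}

/// Recursively evaluate the tree: a KO is true when it is in `present`.
fn eval(expr: &Expr, present: &HashSet<&str>) -> bool {
    match expr {
        Expr::Ko(id) => present.contains(id.as_str()),
        Expr::And(xs) => xs.iter().all(|x| eval(x, present)),
        Expr::Or(xs) => xs.iter().any(|x| eval(x, present)),
        Expr::Optional(_) => true,
    }
}

/// Parse a definition and test it against the set of present KO.
fn satisfied(definition: &str, present: &HashSet<&str>) -> Result<bool, String> {
    let mut parser = Parser { chars: definition.trim().chars().peekable() };
    let expr = parser.or()?;
    if parser.chars.peek().is_some() {
        return Err("trailing input after expression".to_string());
    }
    Ok(eval(&expr, present))
}

fn main() {
    let present = HashSet::from(["K01", "K04"]);
    for def in ["K01,K02+K03", "(K01,K02) K03", "K04 -(K02,K03)"] {
        println!("{} -> {:?}", def, satisfied(def, &present));
    }
}
```

Because optional terms are kept as their own node rather than dropped at parse time, the tree still mirrors the original definition, which can help when reporting which components of a module are missing.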
