A Rust implementation of a KEGG MAPPER 'reconstruct'-like tool to identify KEGG modules based on a list of KEGG orthology groups (KO) and the definitions of the KEGG modules.
Map a KEGG KO list to a set of KEGG modules, using their Boolean formula definitions
Usage: kegg-module-mapper --ko-list <KO_LIST> --ko-definitions <KO_DEFINITIONS> --output <OUTPUT_FILE>
Options:
-i, --ko-list <KO_LIST>
A list of KEGG KO, one identifier per line.
-d, --ko-definitions <KO_DEFINITIONS>
A TSV file with the KEGG module name in column 1 and its definition in column 2.
-o, --output <OUTPUT_FILE>
A list of the KEGG modules whose definitions are satisfied by the KO list.
-h, --help
Print help
-V, --version
Print version
Suppose we have a file eco_ko.list containing the KO of Escherichia coli, one identifier per line, and a tab-separated file module_definitions.tsv whose column 1 holds the module identifier and whose column 2 holds its definition as a formula on KEGG KO identifiers. We can then obtain the list of KEGG module identifiers with the following command:
kegg-module-mapper --ko-list eco_ko.list --ko-definitions module_definitions.tsv --output eco_modules.list

The output file eco_modules.list will contain the KEGG module identifiers whose 'Boolean' definition formula evaluates to true when the KEGG KO in the given list are treated as true literals and the absent ones as false literals.
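As an illustration of the two input formats described above, here is a minimal sketch of how they could be read in Rust. The function names `parse_ko_list` and `parse_definitions` are hypothetical and are not taken from the kegg-module-mapper source; the sketch works on text already loaded in memory and silently skips lines without a tab, which is an assumption rather than the tool's documented behavior.

```rust
use std::collections::HashSet;

/// Collect KO identifiers, one per line, skipping blank lines.
/// (Hypothetical helper, not part of kegg-module-mapper.)
fn parse_ko_list(text: &str) -> HashSet<String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Split each TSV line into (module identifier, definition) at the first tab.
/// Lines without a tab are ignored (an assumption made for this sketch).
fn parse_definitions(text: &str) -> Vec<(String, String)> {
    text.lines()
        .filter_map(|line| line.split_once('\t'))
        .map(|(id, def)| (id.trim().to_string(), def.trim().to_string()))
        .collect()
}
```

Storing the KO in a `HashSet` makes the later "is this KO present?" check a constant-time lookup on average.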
To install kegg-module-mapper from source, you will need Rust installed on your machine.
First, clone the git repository:
git clone https://gitlab.com/sortion/kegg-module-mapper.git
cd kegg-module-mapper

Then, compile kegg-module-mapper:
cargo build --release

The kegg-module-mapper binary will be available in ./target/release/kegg-module-mapper.
A companion Rust program, prepare_kegg_definitions, extracts a two-column TSV file from the KEGG FTP kegg/module/module file: the first column is the identifier of the module and the second is the KEGG module definition expressed in terms of KEGG KO.
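To give an idea of what such an extraction involves, here is a rough sketch. It is not the code of prepare_kegg_definitions. It assumes the usual KEGG flat-file layout, in which a field starts with a key such as ENTRY or DEFINITION, indented lines continue the previous field, and `///` closes a record. The function name `extract_definitions` is hypothetical, and continuation lines are joined with a space purely for this illustration.

```rust
/// Extract (module identifier, definition) pairs from KEGG flat-file text.
/// Assumed layout: "ENTRY" and "DEFINITION" keys start a field, indented
/// lines continue the previous field, and "///" ends a record.
fn extract_definitions(flat: &str) -> Vec<(String, String)> {
    let mut out = Vec::new();
    let mut id: Option<String> = None;
    let mut def: Vec<String> = Vec::new();
    let mut in_def = false;
    for line in flat.lines() {
        if line.starts_with("///") {
            // End of record: emit it if both an id and a definition were seen.
            if let Some(module) = id.take() {
                if !def.is_empty() {
                    out.push((module, def.join(" ")));
                }
            }
            def.clear();
            in_def = false;
        } else if let Some(rest) = line.strip_prefix("ENTRY") {
            // The module identifier is the first token after the key.
            id = rest.split_whitespace().next().map(String::from);
            in_def = false;
        } else if let Some(rest) = line.strip_prefix("DEFINITION") {
            def.push(rest.trim().to_string());
            in_def = true;
        } else if line.starts_with(' ') && in_def {
            // Continuation line of a multi-line definition.
            def.push(line.trim().to_string());
        } else {
            // Any other key ends the definition field.
            in_def = false;
        }
    }
    out
}
```

Each returned pair can then be written out as `identifier<TAB>definition` to produce the TSV file.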
The following commands generate such a file, given a KEGG FTP dump in the directory kegg/:
# First compile the dedicated program
cd prepare_kegg_definitions
cargo build --release
# Then, extract the KEGG module definitions TSV from the large KEGG module dump file
./target/release/prepare_kegg_definitions --module kegg/module/module --output module_definitions.tsv

kegg-module-mapper parses the KEGG module definition as a Boolean formula abstract syntax tree with the following semantics associated to the KEGG module definition symbols:
- Space ' ' corresponds to logical AND
- Comma ',' corresponds to logical OR
- Plus '+' marks essential components, and we treat it as a logical AND
- Minus '-' marks optional components. KEGG orthology groups with a leading minus sign evaluate to true regardless of their presence in the list of KO, and so do parenthesized sub-expressions following a minus sign.
- Parentheses '(' and ')' wrap sub-expressions
- Line breaks correspond to mediators, and we treat them as a logical AND
A '+' sign has higher precedence than a ',' sign, that is to say, an expression of the form A+B,C is read as (A+B),C rather than A+(B,C).
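A minimal recursive-descent sketch of these rules is given here; it is not the parser used by kegg-module-mapper. The names `Expr`, `Parser` and `parse_definition` are hypothetical. Only the precedence of '+' over ',' comes from the text above. That ',' binds tighter than whitespace, and that '-' is handled at the same level as '+', are assumptions of this sketch.

```rust
/// Hypothetical AST for a KEGG module definition.
#[derive(Debug, PartialEq)]
enum Expr {
    Ko(String),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    Optional(Box<Expr>),
}

struct Parser<'a> {
    chars: std::iter::Peekable<std::str::Chars<'a>>,
}

impl<'a> Parser<'a> {
    // steps := alternatives (whitespace alternatives)*   (space, line break: AND)
    fn steps(&mut self) -> Result<Expr, String> {
        let mut left = self.alternatives()?;
        loop {
            let mut saw_space = false;
            while matches!(self.chars.peek(), Some(c) if c.is_whitespace()) {
                self.chars.next();
                saw_space = true;
            }
            match self.chars.peek().copied() {
                Some(c) if saw_space && c != ')' => {
                    let right = self.alternatives()?;
                    left = Expr::And(Box::new(left), Box::new(right));
                }
                _ => return Ok(left),
            }
        }
    }

    // alternatives := complex (',' complex)*   (comma: OR)
    fn alternatives(&mut self) -> Result<Expr, String> {
        let mut left = self.complex()?;
        while self.chars.peek().copied() == Some(',') {
            self.chars.next();
            let right = self.complex()?;
            left = Expr::Or(Box::new(left), Box::new(right));
        }
        Ok(left)
    }

    // complex := unit (('+' unit) | ('-' atom))*   ('+': AND, '-': optional part)
    fn complex(&mut self) -> Result<Expr, String> {
        let mut left = self.unit()?;
        loop {
            let right = match self.chars.peek().copied() {
                Some('+') => {
                    self.chars.next();
                    self.unit()?
                }
                Some('-') => self.unit()?, // unit() consumes the '-'
                _ => return Ok(left),
            };
            left = Expr::And(Box::new(left), Box::new(right));
        }
    }

    // unit := '-' atom | atom
    fn unit(&mut self) -> Result<Expr, String> {
        if self.chars.peek().copied() == Some('-') {
            self.chars.next();
            return Ok(Expr::Optional(Box::new(self.atom()?)));
        }
        self.atom()
    }

    // atom := '(' steps ')' | KO identifier
    fn atom(&mut self) -> Result<Expr, String> {
        if self.chars.peek().copied() == Some('(') {
            self.chars.next();
            let inner = self.steps()?;
            return match self.chars.next() {
                Some(')') => Ok(inner),
                other => Err(format!("expected ')', found {:?}", other)),
            };
        }
        let mut id = String::new();
        while let Some(c) = self.chars.peek().copied() {
            if !c.is_ascii_alphanumeric() {
                break;
            }
            id.push(c);
            self.chars.next();
        }
        if id.is_empty() {
            Err(format!("expected a KO identifier, found {:?}", self.chars.peek()))
        } else {
            Ok(Expr::Ko(id))
        }
    }
}

/// Parse a whole definition, rejecting trailing unparsed characters.
fn parse_definition(input: &str) -> Result<Expr, String> {
    let mut parser = Parser { chars: input.trim().chars().peekable() };
    let expr = parser.steps()?;
    match parser.chars.next() {
        None => Ok(expr),
        Some(c) => Err(format!("unexpected character {:?}", c)),
    }
}
```

Each grammar level calls the next tighter-binding level, so the precedence order is encoded directly in the call chain `steps` → `alternatives` → `complex` → `unit` → `atom`.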
Once the KEGG definition is expressed as a Boolean formula abstract syntax tree, we recursively evaluate the branches of the tree to check whether the expression evaluates to true, with each KEGG KO Boolean literal set to true when the KO is in the KO list and to false otherwise.
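The recursive evaluation can be sketched as follows. This is an illustration under stated assumptions, not the actual kegg-module-mapper code: the `Expr` enum and the `eval` function are hypothetical names, and the tree is assumed to have already been built from the definition.

```rust
use std::collections::HashSet;

/// Hypothetical AST for a KEGG module definition.
enum Expr {
    Ko(String),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    Optional(Box<Expr>),
}

/// Recursively evaluate the tree; a KO literal is true when it is in `present`.
fn eval(expr: &Expr, present: &HashSet<String>) -> bool {
    match expr {
        Expr::Ko(id) => present.contains(id),
        Expr::And(left, right) => eval(left, present) && eval(right, present),
        Expr::Or(left, right) => eval(left, present) || eval(right, present),
        // Optional components are true regardless of the KO list.
        Expr::Optional(_) => true,
    }
}
```

A module is reported when `eval` returns true for its definition tree and the set of KO read from the input list.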