A set of libraries to handle peptide-centric mass spectrometry calculations. Built to handle very complex peptidoforms in a sensible way. Centered around the following HUPO-PSI standards:
- ProForma: a standard notation for proteo/peptidoforms allowing for highly complex definitions
- mzSpecLib: a standard notation for spectral libraries
- mzPAF: a standard notation for peak fragment annotation
- mzTab: a standard notation for matched peptidoforms from database and de novo searches
For support of raw-data-centered HUPO-PSI standards (e.g. mzML, USI) see mzdata.
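To give a feel for the ProForma notation, the following is a minimal, self-contained sketch that splits a simple peptidoform string such as `EM[Oxidation]EVEES[+79.9663]PEK` into residues and their bracketed modifications. It does not use the API of these libraries; the names `Residue` and `tokenize` are hypothetical, and only the `X[modification]` form is handled, which is a small fraction of what the standard allows.

```rust
/// One residue with an optional bracketed modification, e.g. `M[Oxidation]`.
/// Hypothetical type for illustration only, not part of mzcore.
#[derive(Debug, PartialEq)]
struct Residue {
    amino_acid: char,
    modification: Option<String>,
}

/// Split a simple ProForma string into residues and modifications.
/// Terminal modifications, ranges, cross-links, ambiguity and the other
/// advanced features of the standard are rejected as unsupported.
fn tokenize(proforma: &str) -> Result<Vec<Residue>, String> {
    let mut residues: Vec<Residue> = Vec::new();
    let mut chars = proforma.chars();
    while let Some(c) = chars.next() {
        match c {
            'A'..='Z' => residues.push(Residue {
                amino_acid: c,
                modification: None,
            }),
            '[' => {
                // Collect everything up to the closing bracket (no nesting).
                let mut name = String::new();
                let mut closed = false;
                for m in chars.by_ref() {
                    if m == ']' {
                        closed = true;
                        break;
                    }
                    name.push(m);
                }
                if !closed {
                    return Err("unclosed '['".to_string());
                }
                let last = residues
                    .last_mut()
                    .ok_or_else(|| "modification before any residue".to_string())?;
                if last.modification.is_some() {
                    return Err("multiple modifications on one residue".to_string());
                }
                last.modification = Some(name);
            }
            other => return Err(format!("unsupported character '{}'", other)),
        }
    }
    Ok(residues)
}
```

Calling `tokenize("EM[Oxidation]EVEES[+79.9663]PEK")` yields ten residues, with the modifications attached to the `M` and the `S`.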
| Crate | Crates.io | Docs |
|---|---|---|
| 🦀 mzcore | | |
| 🦀 mzannotate | | |
| 🦀 mzalign | | |
| 🦀 imgt | | |
| 🦀 mzident | | |
| 🦀 mzcv | | |
| 🐍 rustyms-py | | |
- mzcore
- mzannotate
  - Generate theoretical fragments with control over the fragmentation model from any ProForma peptidoform
  - Complex features supported: chimeric spectra, cross-links (also disulfides), modifications of unknown position
  - Generate peptide backbone (a, b, c, x, y, and z) and satellite ion fragments (d, v, and w)
  - Generate glycan fragments (B, Y, and internal fragments)
  - Integrated with mzdata
  - Read and write mzSpecLib and mzPAF
  - Match spectra to the generated fragments
- mzalign
  - Align peptides based on mass
  - Consecutive alignment of one sequence on a stretch of multiple sequences
  - Indexed alignment for fast alignment of big datasets
  - Multiple sequence alignment based on the same mass-based alignment
- imgt
  - Fast access to the IMGT database of antibody germlines
- mzident
  - Reading of multiple PSM file formats (amongst others: mzTab, FASTA, MaxQuant, MSFragger, Novor, OPair, Peaks, and Sage)
  - Writing of mzTab files
- mzcv
  - Handle ontologies, both statically included and updated at runtime
- rustyms-py
  - Python bindings are provided for several core components of the libraries. Go to the Python documentation for more information.
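To make the fragment generation and spectrum matching features above more concrete, here is a minimal sketch of the underlying arithmetic for the simplest case: singly charged b and y ions of an unmodified linear peptide, matched to observed peaks within a ppm tolerance. It does not use the API of these libraries; the function names are hypothetical, and modifications, other ion series, neutral losses, higher charge states, and glycans are all left out.

```rust
const PROTON: f64 = 1.00727647; // mass of a proton in Da
const WATER: f64 = 18.01056468; // monoisotopic mass of H2O in Da

/// Monoisotopic residue mass of an unmodified standard amino acid.
fn residue_mass(aa: char) -> Option<f64> {
    Some(match aa {
        'G' => 57.02146, 'A' => 71.03711, 'S' => 87.03203, 'P' => 97.05276,
        'V' => 99.06841, 'T' => 101.04768, 'C' => 103.00919,
        'L' | 'I' => 113.08406, 'N' => 114.04293, 'D' => 115.02694,
        'Q' => 128.05858, 'K' => 128.09496, 'E' => 129.04259,
        'M' => 131.04049, 'H' => 137.05891, 'F' => 147.06841,
        'R' => 156.10111, 'Y' => 163.06333, 'W' => 186.07931,
        _ => return None,
    })
}

/// Singly charged b and y ion m/z values (b1.., y1..) for a peptide.
/// Returns `None` if the sequence contains an unknown residue.
fn b_y_ions(sequence: &str) -> Option<(Vec<f64>, Vec<f64>)> {
    let masses: Vec<f64> = sequence
        .chars()
        .map(residue_mass)
        .collect::<Option<Vec<f64>>>()?;
    let n = masses.len();
    let (mut b, mut y) = (Vec::new(), Vec::new());
    // b ions: N-terminal prefixes, excluding the full peptide.
    let mut prefix = 0.0;
    for m in &masses[..n.saturating_sub(1)] {
        prefix += m;
        b.push(prefix + PROTON);
    }
    // y ions: C-terminal suffixes plus water, excluding the full peptide.
    let mut suffix = 0.0;
    for m in masses[1.min(n)..].iter().rev() {
        suffix += m;
        y.push(suffix + WATER + PROTON);
    }
    Some((b, y))
}

/// Indices of theoretical m/z values with an observed peak within `ppm`.
fn matched(theoretical: &[f64], observed: &[f64], ppm: f64) -> Vec<usize> {
    let mut hits = Vec::new();
    for (i, &t) in theoretical.iter().enumerate() {
        if observed.iter().any(|&o| (o - t).abs() / t * 1e6 <= ppm) {
            hits.push(i);
        }
    }
    hits
}
```

For `PEPTIDE` this gives six b ions and six y ions. A real fragmentation model has to deal with far more cases than this sketch covers.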
The final goal is to support all open standards (or at least the ones that are widely used) for both reading and writing. The formats that are currently supported are listed below.
| Format | Version | crate | Reading | Writing | Comment |
|---|---|---|---|---|---|
| ProForma | 2.0 & 2.1 | mzcore | β | β | Nearly full 2.1 support (full support is planned) |
| mzPAF | 1.0 | mzannotate | β | β | |
| mzSpecLib | 1.0 | mzannotate | β | β | Not all metadata is used |
| FASTA | - | mzident | β | β | |
| mzTab | 1.0 | mzident | β | β | Not all metadata is accessible, peptides and small molecules are ignored |
| Spectrum Sequence List (SSL) | - | mzident | β | β | Small molecules are ignored |
For raw data related formats (MGF/mzML/USI) see mzdata.
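As an illustration of one of the simpler formats in the table above, the following is a minimal sketch of reading FASTA text into header and sequence pairs. It is not the reader provided by mzident; `FastaEntry` and `parse_fasta` are hypothetical names, and the header is kept as an unparsed string.

```rust
/// One FASTA record: the header line (without '>') and the joined sequence.
/// Hypothetical type for illustration only, not part of mzident.
#[derive(Debug, PartialEq)]
struct FastaEntry {
    header: String,
    sequence: String,
}

/// Parse FASTA text, joining multi-line sequences and skipping blank lines.
fn parse_fasta(text: &str) -> Vec<FastaEntry> {
    let mut entries: Vec<FastaEntry> = Vec::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if let Some(header) = line.strip_prefix('>') {
            // A new record starts at every header line.
            entries.push(FastaEntry {
                header: header.trim().to_string(),
                sequence: String::new(),
            });
        } else if let Some(entry) = entries.last_mut() {
            entry.sequence.push_str(line);
        }
        // Sequence lines before the first header are skipped in this sketch.
    }
    entries
}
```

A production reader would also validate the residues and report the line and column of any error, which this sketch leaves out.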
Support for the following formats is planned. Open an issue if you need one of these or have thoughts on the implementation. PRs that add support for them are also very welcome. Any (open) standard can be suggested for inclusion, whether or not it appears in this list.
| Format | Version | crate | Comment |
|---|---|---|---|
| mzIdentML | 1.3 | mzident | Including support for cross-linked identifications and mzSpecLib like annotated spectra |
| PEFF | 1.0 | mzident | |
These are the main libraries. They contain all source code, databases (Unimod etc.), and example data.
Some examples of how to use the libraries are provided here; see the readme file in the examples themselves for more details.
The harness to fuzz test the libraries for increased stability; see the readme for more details.
This Rust library provides Python bindings (using PyO3) for rustyms.
Using rustyms-generate-databases, the definitions for the elemental data can be updated. See the readme for the download locations of all databases. Then run `cargo run -p rustyms-generate-databases` (from the root folder of this repository).
Using mzcore-update, the modification databases can be updated. All databases except RESID will be downloaded and updated. For RESID, download the file ftp://ftp.proteininformationresource.org/pir_databases/other_databases/resid/RESIDUES.XML and place it at mzcore-update/data/RESID.xml. A version is already provided there, and as RESID is no longer developed you likely do not need to download the file. Then run `cargo run --release -p mzcore-update` (from the root folder of this repository).
Using imgt-update, the definitions for the germlines can be updated. Download the imgt.dat.Z file from https://www.imgt.org/download/LIGM-DB/imgt.dat.Z, put it in the imgt-update/data directory, and unpack it. Then run `cargo run --release -p imgt-update` (from the root folder of this repository).
Any contribution is welcome (especially adding/fixing documentation, as that is very hard to do as the main developer).