Implementation: prefer traits approach or fixed structures approach #10
Replies: 7 comments 9 replies
-
|
Ideally I would like to check the three proposed solutions:
|
Beta Was this translation helpful? Give feedback.
-
|
Keep in mind the trade-offs of generics in Rust. This also applies to traits when used as generic bounds, which is often preferable to
I like the ultra-generic designs though, because they are high art. |
Beta Was this translation helpful? Give feedback.
-
|
Over the last days I have been thinking about this a bit more, I have warmed up to the use of traits. I would propose to use something similar to the traits I listed in the original discussion, but otherwise try to keep the use of generics reasonable as commented on above by @mobiusklein. My reasoning for the quite intense generics in the traits I posted above are the following:
Note: for ion series I envisioned to use a post processor on the output from the written trait that generates all feasible ions given the method based on offsets from the given possible links broken. The same post processing method can be applied for neutral losses and charge states as well. Additional traits that are needed:
PS: Implementation note |
Beta Was this translation helpful? Give feedback.
-
|
I think there are merits to both approaches (concrete-only vs traits + providing concrete types). In any other language, I would suggest only have the concrete types and be done with it - but since this is Rust, there might be a substantial portion of users that care deeply about performance/memory use. Joshua's concerns are completely merited, and tbh I tend to avoid the use of traits - unless absolutely necessary - because of their complexity. I suggested traits because I wasn't a huge fan of the initial proposed API with 64-byte AminoAcids and AminoAcidTables everywhere. If there is no need to support extensibility on the concrete type-level (e.g. defining a new AA), then there is less need for traits. That being said, I see traits as a nice way to handle interop between different Rust programs and the rusteomics ecosystem. Obviously, there are not many public MS/Rust projects yet, but I expect we will see more soon. If we can make it so that rusty-ms, mzdata and Sage can all plug-in by just defining a couple traits on top of their existing concrete types, that seems like an easy win. Of course, the same thing could be achieved by converting to rusteomics types as needed... I like the high-art composability of traits - very Haskell-ish. Two considerations:
|
Beta Was this translation helpful? Give feedback.
-
|
Just committed my changes: Will do a PR tomorrow |
Beta Was this translation helpful? Give feedback.
-
|
The PR #2 is now flying. In this PR I tried to provide new layers of abstraction using traits, but I also tried to keep certain degree of flexibility to define custom amino acids. I also tried the provide an API which can hide all this complexity, for an example see https://github.com/david-bouyssie/mzcore/blob/b63e27993ee7276ac48928580f6029701c111050/mzcore-rs/src/ms/mass_calc.rs#L112 Thus, this should work (with the proper imports): let mono_mass_from_seq = "INTERSTELLAR".amino_acids_as_bytes().fold(0.0, |mass_sum, aa_byte| mass_sum + aa_byte.mono_mass()) + WATER_MONO_MASS;I hope this new API is a good baseline for a consensus. Regarding ProForma, since this a large topic I would prefer to treat this in a specific branch (and maybe a specific crate feature). |
Beta Was this translation helpful? Give feedback.
-
|
@mobiusklein I just discoverd that you developed this other related crate: There is a strong overlap between this crate, rustyms and now rusteomics. This PR is heavily inspired by rustms, but I could also have a deeper look at your own implem to see if we are missing stuffs. Regarding deisotopping I was thinking about putting this within the ms module of mzcore, and eventually set this as an optional mzcore feature. I'm looking forward, I think we can make quick progress if we manage to work together :) |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
For reference here my post from the discussion in [mzcore#1]:
My implementation
My own code I organised by creating a fixed memory representation for the basic types (an enum for
AminoAcidand quite an intense struct forPeptide) and I wrote all my functions to work on these types. This mean it would impossible to design custom amino acids (except as mods to existing aas), but I have never needed them in my work. For the peptides I would be quite cautious to use a standard library memory layout -- especially&strbecause this adds a lot of complexity in terms of handling weird chars in unicode while something like[u8]is a bit more sensible -- this makes it really easy to write incorrect amino acid codes which I see as a major reason to use Rust in my code. I saw in the feature list somewhere that ProForma support is a goal, which means that we wil have to build is support for a lot of very 'weird' things in spectra -- global isotope modifications, ambiguous modifications, chimeric spectra, glycans, cross linking, other charge carriers -- and handling those nicely steered me to creating my own memory representation.To consolidate a bit, my goals for rustyms was to represent anything that can be written in ProForma, as this is an exhaustive specification for my use cases. So I have never thought about the reasoning for adding user defined amino acids.
Generic traits
If for extensibility besides the standard representation used in
mzcoreother memory layouts are to be supported with traits I would prefer a trait selection that tries to minimise the number of tasks a single trait has to do. So that would give you something likeHasMassmass + average weightHasPKaLinearSequenceany linear sequence of elements which have a defined mass, can contain all fragmentation code for peptides, rna, dna, whatever you can dream of that fragments (essentiallyimpl Iterator<Item=impl HasMass>)ComplexSequencea more complex sequence like branched / cyclic peptides + could contain glycan branches for glycan fragmentation (essentially a graph type where the element implements HasMass)If extensibility is a primary design goal such a set of composable traits is preferred over a design that has a smaller number of traits but where each trait needs to support so many separate methods that implementing this on your own get pretty much impossible.
Differences
The code for any design that uses traits however will be quite a bit harder to develop because there are many more 'moving parts'. To any casual user both designs (with defined memory layout, and with extensive use of traits) will be identical, they would only use the defined functions on the predefined standard types. The downside of using fixed memory layouts is mainly memory usage, if a
Peptidestruct always takes 64 bytes for example for features you in your case do not need, but you have a layout which does it in 2 bytes (&[u8]) you could potentially have a higher memory usage then absolutely needed, but I have not seen this as a problem. For performance there should not be a difference between a fully featured struct and a trimmed down one except for the difference in time it takes to copy this number of bytes and as long as we are not talking about simd where specific memory layouts can be very beneficial.Wrap up
To me the question boils down to: is the proposed extensibility needed? My answer would be no, as long as the provided implementation works for any peptide that is valid in ProForma. But I would say if extensibility is deemed to be necessary, then go all in and make it as generic as possible (and reflect this in the used names).
Very generic traits example
As I have not tried a potential traits setup I was trying it out now, below is my try for a set of traits that I think can encapsulate all the features I had to build in to support ProForma entirely. I tried to define the traits as general as possible, this allows us to write the (hard!) fragmentation for branched and internal frgaments once and apply it to all things we want (like branched peptides, glycans, and whatever people tend to trow on mass specs nowadays). Potentially simpler traits could be defined that do not have branched structures, or multiple 'masses' per element, which could then use speedier code for the fragments generation, but that would be 'just' a slimmed down version of the big traits and where possible I would vote against making multiple traits but instead focus our efforts on making the generic case as fast as possible.
Click to see the code
Ping @lazear, you will likely be interested and this is maybe easier to find than the slack.
2 votes ·
Beta Was this translation helpful? Give feedback.
All reactions