Right now a `ProteinSequence` is just an unchecked ASCII sequence.
We should make an `AminoAcid` type, like `Nucleotide`, that represents only the valid amino acids `ABCDEFGHIKLMNPQRSTVWYZ`, and then an `AminoAcidAmbiguous` type that also includes the ambiguity code `X`. See #18.
Thought: if we assign the discriminants by ASCII code, like `enum AminoAcid { A = 65, B = 66, ... }` (with `#[repr(u8)]`), then we can cast `&[u8]` to `&[AminoAcid]` after validation without copying. But maybe that's overkill if we normally have to strip whitespace and uppercase the string anyway.
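A minimal sketch of the cast idea, to make the trade-off concrete. It assumes `#[repr(u8)]` on the enum and uses the letter set listed above; the names `is_valid_byte` and `cast_validated` are hypothetical and not part of the current crate.

```rust
/// One variant per valid amino acid letter, with the ASCII code as discriminant.
/// `#[repr(u8)]` guarantees the same size and alignment as `u8`.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AminoAcid {
    A = b'A', B = b'B', C = b'C', D = b'D', E = b'E', F = b'F',
    G = b'G', H = b'H', I = b'I', K = b'K', L = b'L', M = b'M',
    N = b'N', P = b'P', Q = b'Q', R = b'R', S = b'S', T = b'T',
    V = b'V', W = b'W', Y = b'Y', Z = b'Z',
}

impl AminoAcid {
    /// Returns true if `byte` is the ASCII code of one of the variants above.
    /// (Hypothetical helper: uppercase only, no whitespace, no `X`.)
    pub fn is_valid_byte(byte: u8) -> bool {
        matches!(
            byte,
            b'A'..=b'I' | b'K'..=b'N' | b'P'..=b'T' | b'V' | b'W' | b'Y' | b'Z'
        )
    }
}

/// Reinterprets `bytes` as amino acids without copying, or returns `None`
/// if any byte is not a valid code. (Hypothetical helper.)
pub fn cast_validated(bytes: &[u8]) -> Option<&[AminoAcid]> {
    if bytes.iter().all(|&b| AminoAcid::is_valid_byte(b)) {
        // SAFETY: `AminoAcid` is `repr(u8)`, so it has the same layout as `u8`,
        // and every byte was just checked to be a valid discriminant.
        Some(unsafe {
            std::slice::from_raw_parts(bytes.as_ptr() as *const AminoAcid, bytes.len())
        })
    } else {
        None
    }
}
```

With this, `cast_validated(b"MKV")` yields a borrowed `&[AminoAcid]`, while `cast_validated(b"mkv")` or input containing whitespace returns `None`. That is the catch noted above: if the input has to be normalized first, a copy happens anyway and the zero-copy cast only saves the second pass.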
Consideration: what about `.` (deletion) and `*` (terminator)? Should we treat those as ambiguity codes or as something else?