feat: VRS Correctness #106
Open
bencap wants to merge 2 commits into
Add `vrs_utils.identify_allele` and `normalize_and_identify`, and switch `vrs_map._construct_vrs_allele` to use them in place of direct `ga4gh_identify` calls.

The GA4GH Merkle-tree caches sub-object digests on the object after first identification, so any subsequent mutation (notably the pre-map `refgetAccession` swap, normalization, or state coercion) leaves a stale id unless the cached digests are cleared first. Clearing both the location and allele digests before identification ensures the id always reflects current content.

- vrs_utils: new module centralizing the digest-correctness invariant
- vrs_map: route both the ref-identical and SNV/delins branches through the new helpers; the ref-identical branch also gains a reassigned `normalize` return value as a side-effect
- tests/test_vrs_utils: offline coverage for content-addressing, the stale-digest clearing invariant, the normalize+identify pairing, and the malformed-input error path
- align: incidental formatter-driven whitespace tweak
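To make the stale-digest failure mode concrete, here is a minimal standard-library sketch. It is a toy model, not the `ga4gh.vrs` API and not the code in this PR: `ToyLocation`, `ToyAllele`, `toy_identify`, and `toy_identify_allele` are hypothetical names, and the digest scheme (truncated SHA-512 over JSON) only imitates the idea of content addressing with cached sub-object digests.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def _sha(payload: dict) -> str:
    """Toy content digest: SHA-512 over canonical JSON, truncated for readability."""
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha512(blob).hexdigest()[:24]


@dataclass
class ToyLocation:
    refget_accession: str
    start: int
    end: int
    digest: Optional[str] = None  # cached after first identification


@dataclass
class ToyAllele:
    location: ToyLocation
    state: str
    digest: Optional[str] = None  # cached after first identification


def toy_identify(allele: ToyAllele) -> str:
    """Merkle-style id: the allele digest embeds the location digest.

    Both digests are cached on the objects and reused if already present.
    """
    loc = allele.location
    if loc.digest is None:
        loc.digest = _sha(
            {"acc": loc.refget_accession, "start": loc.start, "end": loc.end}
        )
    if allele.digest is None:
        allele.digest = _sha({"location": loc.digest, "state": allele.state})
    return f"toy.VA.{allele.digest}"


def toy_identify_allele(allele: ToyAllele) -> str:
    """Clear both cached digests first so the id reflects current content."""
    allele.location.digest = None
    allele.digest = None
    return toy_identify(allele)


allele = ToyAllele(ToyLocation("SQ.pre_map", 10, 11), "T")
id_before = toy_identify(allele)

# Mutate after identification, analogous to the pre-map accession swap.
allele.location.refget_accession = "SQ.post_map"

id_stale = toy_identify(allele)         # cached digests are reused
id_fresh = toy_identify_allele(allele)  # digests cleared, then recomputed

print(id_stale == id_before)  # the naive call still returns the old id
print(id_fresh == id_before)  # the clearing helper returns a different id
```

Both cached digests have to be cleared: resetting only the allele digest would recompute it from the stale location digest, and the id would still not reflect the swapped accession.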
This pull request introduces a new utility module to centralize and enforce correct GA4GH VRS allele identification, addressing a subtle bug where cached digests could result in stale or incorrect identifiers after object mutation or normalization. All allele identification is now routed through new helper functions to guarantee digest correctness. The change also includes comprehensive tests for these helpers and updates existing code to use them.
VRS allele identification and normalization improvements:
- Added `vrs_utils.py` with `identify_allele` and `normalize_and_identify` helpers, ensuring that all GA4GH VRS allele identifiers are recomputed from current object content and not affected by stale cached digests.
- Updated `_construct_vrs_allele` in `vrs_map.py` to use `normalize_and_identify` and `identify_allele` instead of calling `ga4gh_identify` directly, enforcing the new digest-correctness invariant. [1] [2]
- Removed direct use of `ga4gh_identify` from `vrs_map.py`, further centralizing identification logic in the new utility.
- Imported the new helpers into `vrs_map.py` for use in allele construction.

Testing:

- Added `test_vrs_utils.py` with thorough tests covering digest correctness, mutation safety, normalization pairing, and error handling in the new helper functions.

Other:

- Formatter-driven whitespace change in `_run_blat` for readability (no functional change).
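The normalize-then-identify pairing described above can be illustrated with another toy sketch. Everything here is an assumption made for illustration: the dict layout, `toy_normalize` (a stand-in that merely upper-cases the state and returns a copy), and `toy_normalize_and_identify` are hypothetical and do not reproduce the real `normalize` or `ga4gh_identify` behavior. The point is only that a normalizer returning a new object must have its return value reassigned, and that digests copied along from the input must be cleared before identifying.

```python
import copy
import hashlib
import json


def toy_digest(obj: dict) -> str:
    """Toy digest over every field except the object's own cached 'digest'."""
    content = {k: v for k, v in obj.items() if k != "digest"}
    blob = json.dumps(content, sort_keys=True).encode()
    return hashlib.sha512(blob).hexdigest()[:24]


def toy_identify(allele: dict) -> str:
    """Compute and cache digests, reusing any that are already present."""
    loc = allele["location"]
    if "digest" not in loc:
        loc["digest"] = toy_digest(loc)
    if "digest" not in allele:
        allele["digest"] = toy_digest(allele)
    return "toy.VA." + allele["digest"]


def toy_normalize(allele: dict) -> dict:
    """Stand-in normalizer: returns a NEW object and leaves its input unchanged."""
    normalized = copy.deepcopy(allele)  # cached digests are copied along too
    normalized["state"] = normalized["state"].upper()
    return normalized


def toy_normalize_and_identify(allele: dict) -> tuple:
    """Reassign the normalized object, clear cached digests, then identify."""
    normalized = toy_normalize(allele)
    normalized["location"].pop("digest", None)
    normalized.pop("digest", None)
    return normalized, toy_identify(normalized)


raw = {"location": {"acc": "SQ.example", "start": 10, "end": 11}, "state": "t"}
raw_id = toy_identify(raw)  # caches digests on the un-normalized object

toy_normalize(raw)          # return value dropped: raw is still un-normalized
normalized, normalized_id = toy_normalize_and_identify(raw)

print(raw["state"], normalized["state"])
print(raw_id == normalized_id)
```

Returning the normalized object together with its id keeps the two in step, so a caller cannot identify one object and then carry a different one forward.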