feat: VRS Correctness #106

Open
bencap wants to merge 2 commits into mavedb-dev from feature/bencap/vrs-correctness

bencap commented Jun 10, 2026

Collaborator

This pull request introduces a new utility module to centralize and enforce correct GA4GH VRS allele identification, addressing a subtle bug where cached digests could result in stale or incorrect identifiers after object mutation or normalization. All allele identification is now routed through new helper functions to guarantee digest correctness. The change also includes comprehensive tests for these helpers and updates existing code to use them.

VRS allele identification and normalization improvements:

  • Added a new module `vrs_utils.py` with `identify_allele` and `normalize_and_identify` helpers, ensuring that all GA4GH VRS allele identifiers are recomputed from current object content and not affected by stale cached digests.
  • Updated `_construct_vrs_allele` in `vrs_map.py` to use `normalize_and_identify` and `identify_allele` instead of calling `ga4gh_identify` directly, enforcing the new digest-correctness invariant.
  • Removed the direct import of `ga4gh_identify` from `vrs_map.py`, further centralizing identification logic in the new utility.
  • Imported the new helpers in `vrs_map.py` for use in allele construction.
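
For readers unfamiliar with the stale-digest problem, here is a minimal, self-contained sketch of the idea. It is not the code in `vrs_utils.py` and does not use the GA4GH library: `ToyLocation`, `ToyAllele`, `toy_identify`, `identify_allele_sketch`, the `toy:VA.` prefix, and the `SQ.placeholder_*` accessions are all hypothetical stand-ins, and the digest is a toy hash over JSON rather than the real GA4GH serialization. It only assumes, as the description says, that identification caches digests on the object and reuses them on later calls.

```python
import base64
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def _toy_digest(payload: dict) -> str:
    """Toy content digest over canonical JSON (not the real GA4GH serialization)."""
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(hashlib.sha512(blob).digest()[:24]).decode("ascii")


@dataclass
class ToyLocation:
    refget_accession: str
    start: int
    end: int
    digest: Optional[str] = None  # cached after the first identification


@dataclass
class ToyAllele:
    location: ToyLocation
    sequence: str
    digest: Optional[str] = None  # cached after the first identification


def toy_identify(allele: ToyAllele) -> str:
    """Stand-in for an identifier that trusts any digest already cached on the object."""
    loc = allele.location
    if loc.digest is None:
        loc.digest = _toy_digest(
            {"accession": loc.refget_accession, "start": loc.start, "end": loc.end}
        )
    if allele.digest is None:
        allele.digest = _toy_digest({"location": loc.digest, "sequence": allele.sequence})
    return "toy:VA." + allele.digest


def identify_allele_sketch(allele: ToyAllele) -> str:
    """Clear both cached digests, then identify from current content."""
    allele.location.digest = None
    allele.digest = None
    return toy_identify(allele)


allele = ToyAllele(ToyLocation("SQ.placeholder_a", 10, 11), "T")
first_id = toy_identify(allele)

# Mutate after identification, similar to an accession swap on the location.
allele.location.refget_accession = "SQ.placeholder_b"

stale_id = toy_identify(allele)            # cached digests are reused
fresh_id = identify_allele_sketch(allele)  # digests are cleared first
print(stale_id == first_id, fresh_id == first_id)  # True False
```

In this toy model the second plain `toy_identify` call returns the pre-mutation id, while clearing the location and allele digests first yields the id a freshly built object with the same content would get.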

Testing:

  • Added a new test module `test_vrs_utils.py` with thorough tests covering digest correctness, mutation safety, normalization pairing, and error handling in the new helper functions.
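
As a rough illustration of what those four test categories can look like, the sketch below exercises them against a toy model. It is not taken from `test_vrs_utils.py`: the dict-based alleles, `toy_identify`, `toy_normalize`, `make_allele`, the `*_sketch` helpers, the `(normalized, id)` return shape, and the choice of `ValueError` for malformed input are all assumptions made for the example, and the real helpers may differ on each point.

```python
import copy
import hashlib
import json


def _toy_digest(payload):
    """Toy content digest (not the real GA4GH serialization)."""
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha512(blob).hexdigest()[:32]


def toy_identify(allele):
    """Stand-in identifier that reuses digests cached on the dicts."""
    loc = allele["location"]
    if "digest" not in loc:
        loc["digest"] = _toy_digest([loc["accession"], loc["start"], loc["end"]])
    if "digest" not in allele:
        allele["digest"] = _toy_digest([loc["digest"], allele["state"]])
    return "toy:VA." + allele["digest"]


def toy_normalize(allele):
    """Stand-in normalizer: copies the allele (cached digests included), upper-cases the state."""
    normalized = copy.deepcopy(allele)
    normalized["state"] = normalized["state"].upper()
    return normalized


def identify_allele_sketch(allele):
    """Reject malformed input, drop cached digests, then identify."""
    if not isinstance(allele, dict) or not {"location", "state"} <= allele.keys():
        raise ValueError("allele needs a location and a state")
    allele["location"].pop("digest", None)
    allele.pop("digest", None)
    return toy_identify(allele)


def normalize_and_identify_sketch(allele):
    """Normalize, then identify the normalized object from its current content."""
    normalized = toy_normalize(allele)
    return normalized, identify_allele_sketch(normalized)


def make_allele(accession="SQ.placeholder_a", state="T"):
    return {"location": {"accession": accession, "start": 10, "end": 11}, "state": state}


def test_equal_content_gives_equal_id():
    assert identify_allele_sketch(make_allele()) == identify_allele_sketch(make_allele())


def test_mutation_after_identification_changes_id():
    allele = make_allele()
    before = identify_allele_sketch(allele)
    allele["location"]["accession"] = "SQ.placeholder_b"
    after = identify_allele_sketch(allele)
    assert after != before
    assert after == identify_allele_sketch(make_allele(accession="SQ.placeholder_b"))


def test_normalization_is_paired_with_fresh_identification():
    allele = make_allele(state="t")
    toy_identify(allele)  # caches digests for the un-normalized content
    normalized, allele_id = normalize_and_identify_sketch(allele)
    assert normalized["state"] == "T"
    assert allele_id == identify_allele_sketch(make_allele(state="T"))


def test_malformed_input_raises():
    try:
        identify_allele_sketch({"state": "T"})
    except ValueError:
        return
    raise AssertionError("expected ValueError for an allele without a location")


# Run directly so the sketch works without a test runner.
for check in (
    test_equal_content_gives_equal_id,
    test_mutation_after_identification_changes_id,
    test_normalization_is_paired_with_fresh_identification,
    test_malformed_input_raises,
):
    check()
```

Because the toy model needs no network access or sequence data, checks of this shape can run offline.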

Other:

  • Minor code formatting improvement in `_run_blat` for readability (no functional change).

bencap added 2 commits June 3, 2026 11:11
Add `vrs_utils.identify_allele` and `normalize_and_identify`, and switch
`vrs_map._construct_vrs_allele` to use them in place of direct
`ga4gh_identify` calls. GA4GH Merkle-tree identification caches sub-object digests
on the object after first identification, so any subsequent mutation
(notably the pre-map `refgetAccession` swap, normalization, or state
coercion) leaves a stale id unless the cached digests are cleared first.
Clearing both the location and allele digests before identification
ensures the id always reflects current content.

- vrs_utils: new module centralizing the digest-correctness invariant
- vrs_map: route both the ref-identical and SNV/delins branches through
  the new helpers; the ref-identical branch also gains a reassigned
  `normalize` return value as a side-effect
- tests/test_vrs_utils: offline coverage for content-addressing, the
  stale-digest clearing invariant, the normalize+identify pairing, and
  the malformed-input error path
- align: incidental formatter-driven whitespace tweak
@coveralls

Coverage Status

coverage: 0.0%. remained the same — feature/bencap/vrs-correctness into mavedb-dev
