Remove assembly and gene from TargetAccession; derive from accession during mapping #696

@bencap


Background

TargetAccession stores assembly (e.g. "hg38") and gene (e.g. "BRCA1") as user-supplied free-text fields. Both are derivable from the accession itself via CDOT, which the application already uses downstream in the mapping pipeline. Storing them as user input creates a data integrity risk: a user can submit accession=NM_007294.3, gene=TP53, assembly=hg19 and the API accepts it without complaint, even though NM_007294.3 is a BRCA1 transcript.

Proposed Changes

Remove assembly

assembly is used in four places, none load-bearing:

  • Full-text search (lib/score_sets.py:128) — gene and accession provide sufficient search signal
  • Statistics endpoint (routers/statistics.py:395) — marginal utility; assembly is already implicit in the versioned accession
  • Target gene identity check (lib/target_genes.py:50) — accession + is_base_editor is a sufficient composite key; same accession with differing assembly strings indicates a data problem, not a distinct target
  • The validator requiring either gene or assembly (view_models/target_accession.py:20), which is removed along with the fields

A versioned accession (e.g. NM_007294.3) already encodes assembly context implicitly via CDOT. The free-text field adds no information and creates a place for it to be wrong.
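As a rough illustration of the identity-check change, here is a minimal sketch that compares targets on accession and is_base_editor only. The TargetAccession dataclass and the helpers target_identity_key and is_same_target are hypothetical stand-ins, not the code at lib/target_genes.py:50.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TargetAccession:
    # Hypothetical stand-in: only the fields that participate in identity after the change.
    accession: str
    is_base_editor: bool


def target_identity_key(target: TargetAccession) -> tuple[str, bool]:
    # Composite key proposed in this issue: accession + is_base_editor (assembly is not part of it).
    return (target.accession, target.is_base_editor)


def is_same_target(a: TargetAccession, b: TargetAccession) -> bool:
    # Two targets are the same if and only if their composite keys match.
    return target_identity_key(a) == target_identity_key(b)


brca1 = TargetAccession(accession="NM_007294.3", is_base_editor=False)
brca1_base_editor = TargetAccession(accession="NM_007294.3", is_base_editor=True)
print(is_same_target(brca1, brca1))              # True
print(is_same_target(brca1, brca1_base_editor))  # False
```

Because assembly never enters the key, two records that differ only in an assembly string would be treated as the same target, which matches the view that such a difference signals bad data rather than a distinct target.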

Remove gene; derive it during mapping

gene is derivable from the accession via CDOT. It should be populated on TargetGene by the mapping job, consistent with how mapped_hgnc_name is already handled today. The mapping job becomes the authoritative source of gene symbol for accession-based targets.
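A minimal sketch of the mapping-side derivation, assuming the mapping job has some accession-to-gene-symbol lookup backed by CDOT transcript data. The lookup here is a plain dictionary for illustration only; GeneLookup, make_static_lookup, populate_gene_symbol, and the gene attribute on the trimmed-down TargetGene are hypothetical names, not CDOT's API or the project's actual model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Hypothetical lookup type: versioned transcript accession -> gene symbol (None if unresolved).
GeneLookup = Callable[[str], Optional[str]]


@dataclass
class TargetGene:
    # Hypothetical, trimmed-down TargetGene; `gene` is an illustrative field name.
    accession: str
    gene: Optional[str] = None


def make_static_lookup(table: Mapping[str, str]) -> GeneLookup:
    # Illustrative stand-in for a CDOT-backed lookup.
    def lookup(accession: str) -> Optional[str]:
        return table.get(accession)

    return lookup


def populate_gene_symbol(target: TargetGene, lookup: GeneLookup) -> TargetGene:
    # The mapping job, not user input, supplies the symbol.
    symbol = lookup(target.accession)
    if symbol is None:
        raise ValueError(f"could not resolve a gene symbol for {target.accession}")
    target.gene = symbol
    return target


lookup = make_static_lookup({"NM_007294.3": "BRCA1"})
target = populate_gene_symbol(TargetGene(accession="NM_007294.3"), lookup)
print(target.gene)  # BRCA1
```

This sketch raises when the lookup misses instead of falling back to a user-supplied value, so there is no path by which a mismatched symbol such as TP53 could be attached to a BRCA1 transcript.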

After this change, TargetAccessionCreate accepts only accession and is_base_editor.
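A sketch of the slimmed-down create payload, using a stdlib dataclass in place of the project's actual view model (assumed to live in view_models/target_accession.py). from_payload is a hypothetical helper that explicitly rejects leftover assembly or gene keys from older clients.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class TargetAccessionCreate:
    # Sketch of the create payload after the change: assembly and gene are gone.
    accession: str
    is_base_editor: bool

    @classmethod
    def from_payload(cls, payload: Mapping[str, Any]) -> TargetAccessionCreate:
        # Hypothetical helper: reject removed or unknown keys instead of silently dropping them.
        allowed = {f.name for f in fields(cls)}
        unexpected = sorted(set(payload) - allowed)
        if unexpected:
            raise ValueError(f"unexpected fields: {unexpected}")
        # Missing required keys surface as a TypeError from the dataclass __init__.
        return cls(**payload)


ok = TargetAccessionCreate.from_payload({"accession": "NM_007294.3", "is_base_editor": False})
print(ok)  # TargetAccessionCreate(accession='NM_007294.3', is_base_editor=False)

try:
    TargetAccessionCreate.from_payload(
        {"accession": "NM_007294.3", "is_base_editor": False, "gene": "TP53", "assembly": "hg19"}
    )
except ValueError as exc:
    print(exc)  # unexpected fields: ['assembly', 'gene']
```

If the real model is a pydantic model, note that pydantic ignores unknown fields by default; rejecting stale assembly or gene keys, rather than dropping them quietly, would require configuring the model to forbid extra fields.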

Breaking Changes

| Model | Change | Breaking? |
|---|---|---|
| TargetAccessionCreate | Remove assembly, gene fields | Yes (input) |
| TargetAccession (response) | Remove assembly, gene fields | Yes (response) |

Migration Notes

  • assembly and gene columns dropped from target_accessions
  • Existing data can be discarded — assembly is implicit in the accession; gene will be re-derived by the mapping job

Labels: app: backend, app: database, app: frontend, app: worker, type: enhancement, type: maintenance
