Description
Background
TargetAccession stores assembly (e.g. "hg38") and gene (e.g. "BRCA1") as user-supplied free-text fields. Both are derivable from the accession itself via CDOT, which the application already uses downstream in the mapping pipeline. Storing them as user input creates a data integrity risk — a user can submit accession=NM_007294.3, gene=TP53, assembly=hg19 and the API accepts it without complaint.
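To make the risk concrete, here is a minimal sketch of how such a submission passes validation. The dataclass is a hypothetical stand-in for the current create schema, not the real view model; only the field names and the "gene OR assembly" rule are taken from this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LegacyTargetAccessionCreate:
    """Hypothetical stand-in for the current create schema (not the real class)."""

    accession: str
    is_base_editor: bool = False  # default is an assumption for this sketch
    gene: Optional[str] = None
    assembly: Optional[str] = None

    def __post_init__(self) -> None:
        # Mirrors the existing "gene OR assembly required" rule. Nothing here
        # checks that either value agrees with the accession.
        if not (self.gene or self.assembly):
            raise ValueError("either gene or assembly is required")


# The contradictory submission from the paragraph above is accepted as-is.
contradictory = LegacyTargetAccessionCreate(
    accession="NM_007294.3", gene="TP53", assembly="hg19"
)
```

The only check that runs is presence, so any non-empty string satisfies it regardless of what the accession refers to.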
Proposed Changes
Remove assembly
`assembly` is used in four places, none load-bearing:
- Full-text search (`lib/score_sets.py:128`): `gene` and `accession` provide sufficient search signal
- Statistics endpoint (`routers/statistics.py:395`): marginal utility; assembly is already implicit in the versioned accession
- Target gene identity check (`lib/target_genes.py:50`): `accession` + `is_base_editor` is a sufficient composite key; the same accession with differing assembly strings indicates a data problem, not a distinct target
- The `gene OR assembly` required validator (`view_models/target_accession.py:20`): removed along with the field
A versioned accession (e.g. NM_007294.3) already encodes assembly context implicitly via CDOT. The free-text field adds no information and creates a place for it to be wrong.
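The identity check described above could be sketched as follows. This is an illustration under assumptions, not the code in `lib/target_genes.py`: `AccessionTarget`, `identity_key`, and `find_existing` are hypothetical names, and the real model carries more fields.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class AccessionTarget:
    """Hypothetical, simplified record of an accession-based target."""

    accession: str
    is_base_editor: bool


def identity_key(target: AccessionTarget) -> Tuple[str, bool]:
    # Composite key proposed in this issue: assembly is deliberately absent.
    return (target.accession, target.is_base_editor)


def find_existing(
    candidate: AccessionTarget, existing: Iterable[AccessionTarget]
) -> Optional[AccessionTarget]:
    """Return the first stored target with the same composite key, if any."""
    key = identity_key(candidate)
    for target in existing:
        if identity_key(target) == key:
            return target
    return None


stored = [AccessionTarget("NM_007294.3", is_base_editor=False)]
match = find_existing(AccessionTarget("NM_007294.3", False), stored)
no_match = find_existing(AccessionTarget("NM_007294.3", True), stored)
```

With this key, two submissions of the same accession can no longer be split into separate targets by a differing assembly string.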
Remove gene; derive it during mapping
gene is derivable from the accession via CDOT. It should be populated on TargetGene by the mapping job, consistent with how mapped_hgnc_name is already handled today. The mapping job becomes the authoritative source of gene symbol for accession-based targets.
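A minimal sketch of that derivation step, assuming the mapping job has some way to resolve an accession to a gene symbol. The `lookup` callable is a hypothetical stand-in for the CDOT query (no real CDOT API is shown), and `gene_symbol` is an illustrative field name rather than the actual `TargetGene` column.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TargetGeneRecord:
    """Hypothetical, simplified stand-in for TargetGene."""

    accession: str
    gene_symbol: Optional[str] = None  # illustrative field name


def derive_gene(
    target: TargetGeneRecord,
    lookup: Callable[[str], Optional[str]],
) -> TargetGeneRecord:
    """Populate the gene symbol from the accession during mapping."""
    symbol = lookup(target.accession)
    if symbol is None:
        # Surface unresolvable accessions instead of storing a guess.
        raise LookupError(f"no gene symbol found for {target.accession!r}")
    target.gene_symbol = symbol
    return target


# Fake lookup table used only for this example.
fake_lookup = {"NM_007294.3": "BRCA1"}.get
mapped = derive_gene(TargetGeneRecord("NM_007294.3"), fake_lookup)
```

Passing the lookup in as a parameter keeps the sketch independent of how the pipeline actually talks to CDOT and makes the step easy to test with a fake.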
After this change, TargetAccessionCreate accepts only accession and is_base_editor.
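The reduced input shape might look like the sketch below. It uses a plain dataclass rather than the project's real view model, and it assumes that payloads still carrying the removed fields are rejected outright; the issue does not say whether they should be rejected or silently ignored. The `False` default for `is_base_editor` is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict

ALLOWED_FIELDS = {"accession", "is_base_editor"}


@dataclass(frozen=True)
class TargetAccessionCreateSketch:
    """Hypothetical post-change create schema: accession and is_base_editor only."""

    accession: str
    is_base_editor: bool = False  # assumed default


def parse_create_payload(payload: Dict[str, Any]) -> TargetAccessionCreateSketch:
    # Assumption: removed fields such as gene and assembly are treated as errors.
    unexpected = set(payload) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    if "accession" not in payload:
        raise ValueError("accession is required")
    return TargetAccessionCreateSketch(
        accession=payload["accession"],
        is_base_editor=bool(payload.get("is_base_editor", False)),
    )


created = parse_create_payload({"accession": "NM_007294.3"})
```

Rejecting the old fields makes the breaking change visible to clients immediately; ignoring them would be gentler but would hide stale integrations.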
Breaking Changes
| | Change | Breaking? |
|---|---|---|
| `TargetAccessionCreate` | Remove `assembly`, `gene` fields | input |
| `TargetAccession` response | Remove `assembly`, `gene` fields | response |
Migration Notes
- `assembly` and `gene` columns dropped from `target_accessions`
- Existing data can be discarded: assembly is implicit in the accession, and gene will be re-derived by the mapping job