Summing position and read length exceeds chromosome size? #257

Hello,

I have a question about the position and read lengths in the read headers produced by NanoSim. I am using the `human_giab_hg002_sub1M_kitv14_dorado_v3.2.1` pretrained model to simulate sequences from hg002, hg003 and hg004. I observed some interesting behaviour when mapping the simulated reads, so I wanted to plot the coverage the reads should provide, based on where they were sampled from, and compare it with the coverage I actually get after mapping them.

I assumed I could use the position in the header as the start of the region each read was sampled from in the original genome, and the position plus the length of the alignable middle region as its end. When I do that, however, some reads get end positions beyond the length of the chromosome they were sampled from. Some reads even have start positions greater than the length of the entire chromosome.
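A minimal base-R sketch of that calculation, assuming NanoSim genome-mode read names of the form `<chr>_<pos>_<aligned|unaligned>_<index>_<F|R>_<head>_<middle>_<tail>` (an assumption about the header layout, not checked against the NanoSim source). The helpers `parse_nanosim_names()` and `flag_overshoot()` are hypothetical, the example read names are made up, and end = pos + middle length is the interpretation being asked about here, not confirmed NanoSim semantics. Strand is parsed but not used to adjust coordinates.

```r
# Hypothetical helper. ASSUMES read names look like
#   <chr>_<pos>_<aligned|unaligned>_<index>_<F|R>_<head>_<middle>_<tail>
parse_nanosim_names <- function(read_names) {
  # Anchor the fixed fields at the end so chromosome names containing "_" survive
  pattern <- "^(.+)_([0-9]+)_(aligned|unaligned)_([0-9]+)_([FR])_([0-9]+)_([0-9]+)_([0-9]+)$"
  m <- regmatches(read_names, regexec(pattern, read_names))
  ok <- lengths(m) == 9  # full match + 8 captured fields
  if (!all(ok)) stop("unrecognised read name(s): ", paste(read_names[!ok], collapse = ", "))
  parts <- do.call(rbind, m)
  data.frame(
    chr = parts[, 2],
    pos = as.numeric(parts[, 3]),
    status = parts[, 4],
    strand = parts[, 6],
    len_head = as.numeric(parts[, 7]),
    len_middle_region = as.numeric(parts[, 8]),
    len_tail = as.numeric(parts[, 9]),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: end = pos + middle-region length (the assumption under
# question); returns only reads whose end lies past the chromosome size.
flag_overshoot <- function(reads, chr_sizes) {
  reads <- merge(reads, chr_sizes, by = "chr", all.x = TRUE)
  reads$end <- reads$pos + reads$len_middle_region
  reads$overshoot <- reads$end - reads$size
  reads[!is.na(reads$size) & reads$end > reads$size, ]
}

# Made-up read names; sizes are the hg38 lengths of chr21 and chrM
reads <- parse_nanosim_names(c(
  "chr21_46708000_aligned_0_F_10_5000_12",
  "chrM_100_aligned_1_R_0_300_0"
))
chr_sizes <- data.frame(chr = c("chr21", "chrM"), size = c(46709983, 16569))
flag_overshoot(reads, chr_sizes)  # one row: chr21, overshoot 3017
```

Matching the fixed fields from the right keeps contig names that themselves contain underscores (for example `chr1_KI270706v1_random`) intact, which a plain split on `_` would break.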

Is this expected behaviour? If so, can you help me better understand what the position and read lengths mean?

I am also wondering what the position means for reverse-strand reads: is it the start or the end point?

Below are the first few lines of some of my reads that show this pattern:
<img width="1194" height="228" alt="Image" src="https://github.com/user-attachments/assets/e66badd4-b886-4691-a1ea-7a381fa720c2" />

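I created this in R using the code below. The input `comb` table was created by reading in all the read headers from a NanoSim output file and processing them, and `chr_sizes` is another table with the lengths of the chromosomes in my reference genome (hg38).

```r
library(dplyr)

# comb: one row per simulated read; chr_sizes: hg38 chromosome lengths (column `size`)
comb %>%
  left_join(chr_sizes) %>%                              # joins on the shared chromosome column
  filter(pos + len_middle_region > size) %>%            # reads whose end overshoots the chromosome
  mutate(len_diff = size - pos - len_middle_region) %>% # negative = bases past the chromosome end
  arrange(len_diff)                                     # largest overshoot first
```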

I'd really appreciate any insights you could give me. Thanks!
