
Commit 9028ee7

README.md update with results (#12)
1 parent 129231a commit 9028ee7

3 files changed, 118 additions, 14 deletions

.gitignore

Lines changed: 0 additions & 1 deletion
```diff
@@ -4,7 +4,6 @@ ramexample.root
 *.perf
 *.bam
 *.sam
-*.png
 *.bam.bai
 *.root
 *.root.idx
```

README.md

Lines changed: 118 additions & 13 deletions
````diff
@@ -1,26 +1,131 @@
-ROOT scripts to convert a SAM file to a RAM (ROOT Alignment/Map) file and to work on RAM files.
+# RAMTools - ROOT Alignment/Map Format Tools
 
-- To convert a SAM file to a RAM file do:
+RAMTools provides efficient tools for converting SAM files to RAM (ROOT Alignment/Map) files, which are stored in ROOT's columnar RNTuple format, and for working with the resulting genomic alignment data.
 
+## Features
+
+- High-performance SAM to RAM conversion with an RNTuple backend
+- Chromosome-based splitting for parallel processing
+- Region-based querying capabilities
+
+## Requirements
+
+- ROOT 6.26+
+- C++17-compatible compiler
+- CMake 3.16+
+
+## Quick Start
+```bash
+# 1. Build the tools
+mkdir build && cd build
+cmake ..
+make -j$(nproc)
+
+# 2. Convert a SAM file to the RAM format
+./tools/samtoramntuple ../test/samexample.sam output.root
+
+# 3. Query a specific region from the command line
+./tools/ramntupleview output.root "chr1:15700-15800"
+```
+
+## Command-Line Tools
+
+The primary way to interact with RAMTools is through these command-line executables.
+
+### SAM to RAM Conversion
+
+Convert a standard SAM file into the optimized RNTuple-based RAM format.
 ```bash
-$ root
-root [0] .x samtoram.C
-root [1] .q
+# Basic conversion
+./tools/samtoramntuple input.sam output.root
+
+# Split by chromosome for parallel processing
+# (Creates output-chr1.root, output-chr2.root, etc.)
+./tools/samtoramntuple input.sam output -split
 ```
 
-- To test read a RAM file do:
+### Region Querying (RNTuple)
+
+Query a specific genomic region from a RAM file, similar to `samtools view`.
+```bash
+# Usage: ./tools/ramntupleview [input.root] "[chromosome]:[start]-[end]"
+./tools/ramntupleview output.root "chr1:10150-10300"
+```
+
+## Benchmark Results
+
+Tested with the HG00154 sample from the 1000 Genomes Project (196M reads, 72 GB SAM file):
+
+![File Size Comparison](./img/file-size-comparison.jpg)
+
+### Region Query Performance (LZMA Compression)
+
+| Format | Region | Time (s) | CPU (s) | Reads/sec | Total Reads |
+|--------|--------|----------|---------|-----------|-------------|
+| **TTree** | Small (100bp) | 4.18 | 1.50 | 4.7 | 7 |
+| | Gene (BRCA2) | 1.87 | 1.79 | 24,010 | 42,961 |
+| | 10Mb | 41.2 | 39.7 | 75,105 | 2,977,922 |
+| | 100Mb | 36.3 | 35.4 | 80,492 | 2,852,438 |
+| **RNTuple** | Small (100bp) | 4.10 | 2.55 | 2.7 | 7 |
+| | Gene (BRCA2) | 1.20 | 1.15 | 37,308 | 42,961 |
+| | 10Mb | 7.40 | 6.67 | 446,331 | 2,977,922 |
+| | 100Mb | 6.46 | 6.36 | 448,688 | 2,852,438 |
+
+### Region Query Performance (LZ4 Compression)
+
+| Format | Region | Time (s) | CPU (s) | Reads/sec | Total Reads |
+|--------|--------|----------|---------|-----------|-------------|
+| **TTree** | Small (100bp) | 3.23 | 1.86 | 3.8 | 7 |
+| | Gene (BRCA2) | 0.941 | 0.842 | 51,030 | 42,961 |
+| | 10Mb | 14.5 | 11.0 | 269,797 | 2,977,922 |
+| | 100Mb | 9.05 | 9.05 | 315,088 | 2,852,438 |
+| **RNTuple** | Small (100bp) | 2.98 | 1.67 | 4.2 | 7 |
+| | Gene (BRCA2) | 1.01 | 0.948 | 45,321 | 42,961 |
+| | 10Mb | 7.43 | 6.68 | 445,709 | 2,977,922 |
+| | 100Mb | 6.47 | 6.29 | 453,736 | 2,852,438 |
+
+### Region Query Performance (ZLIB Compression)
+
+| Format | Region | Time (s) | CPU (s) | Reads/sec | Total Reads |
+|--------|--------|----------|---------|-----------|-------------|
+| **TTree** | Small (100bp) | 4.19 | 2.31 | 3.0 | 7 |
+| | Gene (BRCA2) | 1.22 | 1.11 | 38,661 | 42,961 |
+| | 10Mb | 18.2 | 16.3 | 183,021 | 2,977,922 |
+| | 100Mb | 14.4 | 14.4 | 197,815 | 2,852,438 |
+| **RNTuple** | Small (100bp) | 2.85 | 1.73 | 4.0 | 7 |
+| | Gene (BRCA2) | 1.19 | 1.14 | 37,529 | 42,961 |
+| | 10Mb | 7.40 | 6.62 | 449,599 | 2,977,922 |
+| | 100Mb | 6.49 | 6.41 | 445,148 | 2,852,438 |
+
+**Key Findings**:
+- For large regions (10Mb and 100Mb), RNTuple queries are **1.4-2.5x faster** than TTree with LZ4 or ZLIB compression, and about **5.6x faster** with LZMA
+- LZ4 gives the best overall query performance of the three compression algorithms, most clearly for TTree; for large regions, RNTuple query times are nearly identical across LZMA, LZ4 and ZLIB
+- For a 100Mb region query, RNTuple with LZ4 processes **453,736 reads/sec**, compared with **315,088 reads/sec** for TTree with LZ4 and **197,815 reads/sec** for TTree with ZLIB
+
+## TTree Implementation (Legacy)
+
+ROOT scripts to convert a SAM file to a RAM (ROOT Alignment/Map) file using the older TTree format and to work with those files.
+
+### Convert SAM to RAM with TTree
+```bash
+$ root
+root [0] .x samtoram.C
+root [1] .q
+```
 
+### Read a RAM file (TTree)
 ```bash
-$ root
-root [0] .x ramreader.C
-root [1] .q
+$ root
+root [0] .x ramreader.C
+root [1] .q
 ```
 
-- To view a region, the equivalent of `samtools view bamexample.bam chr1:10150-10300`, do:
+### View a specific region (TTree)
 
+To view a region, the equivalent of `samtools view bamexample.bam chr1:10150-10300`:
 ```bash
-$ root
-root [0] .x ramview.C("ramexample.root","chr1:10150-10300")
-root [1] .q
+$ root
+root [0] .x ramview.C("ramexample.root","chr1:10150-10300")
+root [1] .q
 ```
 
````

img/file-size-comparison.jpg

20.5 KB