Commit 6e8395f
YT-26680: Optimize rows digest computer (8-33x faster)
Before (MD5 + sort for each row):
```
--------------------------------------------------------------------------------------------------
Benchmark                                                   Time         CPU  Iterations
--------------------------------------------------------------------------------------------------
BM_TRowsDigestComputer_ProcessRowsFewColumns_mean         203 ns      203 ns           5
BM_TRowsDigestComputer_ProcessRowsManyColumns_mean       7678 ns     7677 ns           5
BM_TRowsDigestComputer_ProcessRowsRandomOrder_mean       2794 ns     2794 ns           5
BM_TRowsDigestComputer_ProcessRowsMixedTypes_mean         496 ns      496 ns           5
BM_TRowsDigestComputer_ProcessRowsLongStrings_mean       1707 ns     1707 ns           5
BM_TRowsDigestComputer_ProcessRowsDynamicColumns_mean     885 ns      885 ns           5
BM_TRowsDigestComputer_ProcessRowsSparse_mean             435 ns      435 ns           5
```
After (XXH3 + (binary search + insert for each new column) + batching buffer):
```
--------------------------------------------------------------------------------------------------
Benchmark                                                   Time         CPU  Iterations
--------------------------------------------------------------------------------------------------
BM_TRowsDigestComputer_ProcessRowsFewColumns_mean        23.8 ns     23.8 ns           5
BM_TRowsDigestComputer_ProcessRowsManyColumns_mean        232 ns      232 ns           5
BM_TRowsDigestComputer_ProcessRowsRandomOrder_mean       99.5 ns     99.5 ns           5
BM_TRowsDigestComputer_ProcessRowsMixedTypes_mean        38.4 ns     38.4 ns           5
BM_TRowsDigestComputer_ProcessRowsLongStrings_mean       89.0 ns     89.0 ns           5
BM_TRowsDigestComputer_ProcessRowsDynamicColumns_mean    56.9 ns     56.9 ns           5
BM_TRowsDigestComputer_ProcessRowsSparse_mean            35.0 ns     35.0 ns           5
```
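The 8-33x range in the title can be checked directly against the "Before" and "After" tables above. The snippet below is only an arithmetic check: the array and function names are made up for illustration, and the numbers are the mean times copied from those two tables.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <utility>

// Mean times in ns, copied from the "Before" and "After" tables, in the same
// benchmark order: FewColumns, ManyColumns, RandomOrder, MixedTypes,
// LongStrings, DynamicColumns, Sparse.
constexpr std::array<double, 7> BeforeNs = {203, 7678, 2794, 496, 1707, 885, 435};
constexpr std::array<double, 7> AfterNs = {23.8, 232, 99.5, 38.4, 89.0, 56.9, 35.0};

// Returns {smallest, largest} speedup factor (before / after) across benchmarks.
inline std::pair<double, double> SpeedupRange()
{
    double lo = BeforeNs[0] / AfterNs[0];
    double hi = lo;
    for (std::size_t i = 1; i < BeforeNs.size(); ++i) {
        const double ratio = BeforeNs[i] / AfterNs[i];
        lo = std::min(lo, ratio);
        hi = std::max(hi, ratio);
    }
    return {lo, hi};
}
```

The smallest ratio comes from FewColumns (203 / 23.8, roughly 8.5x) and the largest from ManyColumns (7678 / 232, roughly 33x).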
Without batching (XXH3 + (binary search + insert for each new column)):
```
--------------------------------------------------------------------------------------------------
Benchmark                                                   Time         CPU  Iterations
--------------------------------------------------------------------------------------------------
BM_TRowsDigestComputer_ProcessRowsFewColumns_mean        42.4 ns     42.4 ns           5
BM_TRowsDigestComputer_ProcessRowsManyColumns_mean        554 ns      554 ns           5
BM_TRowsDigestComputer_ProcessRowsRandomOrder_mean        228 ns      228 ns           5
BM_TRowsDigestComputer_ProcessRowsMixedTypes_mean        73.3 ns     73.3 ns           5
BM_TRowsDigestComputer_ProcessRowsLongStrings_mean       94.7 ns     94.7 ns           5
BM_TRowsDigestComputer_ProcessRowsDynamicColumns_mean     117 ns      117 ns           5
BM_TRowsDigestComputer_ProcessRowsSparse_mean            65.9 ns     65.9 ns           5
```
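The commit message only names the techniques ("binary search + insert for each new column" and a "batching buffer"), so the following is a rough guess at how they could fit together, not the actual `TRowsDigestComputer` code. Everything here is hypothetical: the class `TSketchDigestComputer`, the row representation, the field encoding, and the use of FNV-1a as a dependency-free stand-in for XXH3. The idea sketched is that a column's rank in name order is computed once, when the column is first seen, so rows no longer need a per-row sort, and that serialized bytes are accumulated and fed to the hash in larger chunks.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Stand-in streaming 64-bit hash (FNV-1a). The commit uses XXH3 instead.
inline uint64_t Fnv1aUpdate(uint64_t state, std::string_view data)
{
    for (unsigned char c : data) {
        state ^= c;
        state *= 1099511628211ull;
    }
    return state;
}

// Hypothetical sketch; assumes column names are unique and ids index into names.
class TSketchDigestComputer
{
public:
    // A row is a list of (column id, value) pairs in arbitrary order.
    using TRow = std::vector<std::pair<int, std::string>>;

    explicit TSketchDigestComputer(std::vector<std::string> names, std::size_t flushThreshold = 4096)
        : Names_(std::move(names))
        , IdToRank_(Names_.size(), -1)
        , FlushThreshold_(flushThreshold)
    { }

    void ProcessRow(const TRow& row)
    {
        // Only columns never seen before pay for binary search + insert.
        for (const auto& entry : row) {
            if (IdToRank_[entry.first] < 0) {
                RegisterColumn(entry.first);
            }
        }
        // Place each value into the slot given by its column's rank (O(1) lookup).
        Slots_.assign(SortedIds_.size(), nullptr);
        for (const auto& entry : row) {
            Slots_[IdToRank_[entry.first]] = &entry.second;
        }
        // Serialize present fields in rank (name) order into the batching buffer.
        for (std::size_t rank = 0; rank < Slots_.size(); ++rank) {
            if (Slots_[rank]) {
                AppendField(Names_[SortedIds_[rank]]);
                AppendField(*Slots_[rank]);
            }
        }
        Buffer_ += ';';  // Row terminator.
        if (Buffer_.size() >= FlushThreshold_) {
            Flush();
        }
    }

    uint64_t GetDigest()
    {
        Flush();
        return State_;
    }

private:
    void RegisterColumn(int id)
    {
        auto it = std::lower_bound(
            SortedIds_.begin(), SortedIds_.end(), id,
            [&] (int lhs, int rhs) { return Names_[lhs] < Names_[rhs]; });
        it = SortedIds_.insert(it, id);
        // Ranks of the new column and of every column after it shift by one.
        for (; it != SortedIds_.end(); ++it) {
            IdToRank_[*it] = static_cast<int>(it - SortedIds_.begin());
        }
    }

    void AppendField(std::string_view field)
    {
        // Length-prefixed encoding keeps field boundaries unambiguous.
        Buffer_ += std::to_string(field.size());
        Buffer_ += ':';
        Buffer_.append(field.data(), field.size());
    }

    void Flush()
    {
        State_ = Fnv1aUpdate(State_, Buffer_);
        Buffer_.clear();
    }

    std::vector<std::string> Names_;
    std::vector<int> IdToRank_;
    std::vector<int> SortedIds_;
    std::vector<const std::string*> Slots_;
    std::string Buffer_;
    std::size_t FlushThreshold_;
    uint64_t State_ = 14695981039346656037ull;  // FNV-1a offset basis.
};
```

In this sketch two computers fed the same row with its values in a different order return the same digest, and the flush threshold changes only how often the hash is called, not the result, because the stand-in hash consumes bytes sequentially.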
XXH64 is slower than XXH3 by about 5-46% (XXH64 + (binary search + insert for each new column)):
```
--------------------------------------------------------------------------------------------------
Benchmark                                                   Time         CPU  Iterations
--------------------------------------------------------------------------------------------------
BM_TRowsDigestComputer_ProcessRowsFewColumns_mean        47.2 ns     47.2 ns           5
BM_TRowsDigestComputer_ProcessRowsManyColumns_mean        712 ns      712 ns           5
BM_TRowsDigestComputer_ProcessRowsRandomOrder_mean        289 ns      289 ns           5
BM_TRowsDigestComputer_ProcessRowsMixedTypes_mean        87.8 ns     87.8 ns           5
BM_TRowsDigestComputer_ProcessRowsLongStrings_mean        129 ns      129 ns           5
BM_TRowsDigestComputer_ProcessRowsDynamicColumns_mean     144 ns      144 ns           5
BM_TRowsDigestComputer_ProcessRowsSparse_mean            79.0 ns     79.0 ns           5
```
commit_hash:5e99e75ed9baf15ffdd32bd0a7a132a7911f0829
1 parent fbafe03
File tree
14 files changed, +22 -14 lines changed
- yt/yt
- client
- api
- formats
- queue_client
- table_client
- library/formats