Reduce size of `Bytes` #20

emilyyyylime · 2025-06-04T14:20:51Z

I was looking at the source of this package for inspiration on an unrelated project that needed a compressed trie data structure, and I soon noticed that the `Bytes` struct, whose size constitutes a bottleneck for the maximum length of words that can be hyphenated without allocation, carried a significant amount of extraneous storage.

The `array::IntoIter` type stores two `usize` indices so that it can implement `DoubleEndedIterator`, and we stored an extra `usize` just for the length of the iterator, which can already be derived from the iterator itself.
Switching to storing the array inline and reimplementing the iterator methods lowers this overhead by 16 bytes on a 64-bit target (three `usize`s down to one). This doesn't even make the code look much lower-level.
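
A minimal sketch of this layout change, not the crate's actual code: `OldInline` and `NewInline` are hypothetical stand-ins for the inline storage before and after, and the buffer length of 40 is chosen only for illustration.

```rust
use core::mem::size_of;

/// Hypothetical "before" layout: a by-value array iterator plus a
/// separately stored length.
#[allow(dead_code)]
pub struct OldInline<const N: usize> {
    iter: core::array::IntoIter<u8, N>,
    len: usize,
}

/// Hypothetical "after" layout: the array stored inline with one cursor.
pub struct NewInline<const N: usize> {
    data: [u8; N],
    pos: usize,
}

impl<const N: usize> NewInline<N> {
    pub fn new(data: [u8; N]) -> Self {
        Self { data, pos: 0 }
    }
}

impl<const N: usize> Iterator for NewInline<N> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        // `get` returns `None` once the cursor reaches the end.
        let byte = *self.data.get(self.pos)?;
        self.pos += 1;
        Some(byte)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // The remaining length is derived from the cursor, not stored.
        let rest = N - self.pos;
        (rest, Some(rest))
    }
}

impl<const N: usize> ExactSizeIterator for NewInline<N> {}

/// Returns the `(old, new)` sizes in bytes for a 40-byte buffer.
pub fn iter_sizes() -> (usize, usize) {
    (size_of::<OldInline<40>>(), size_of::<NewInline<40>>())
}
```

Calling `iter_sizes()` shows the difference on a given target. The exact layout of `array::IntoIter` is a standard library implementation detail, so the saving should be measured rather than assumed.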

We can reduce memory usage further by realising that our indices have low upper bounds, so a single `u8` suffices for the index.
Changing that into a `NonZeroU8` also removes the memory overhead when `alloc` is enabled, thanks to niche optimisation.
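
To illustrate the niche idea, here is a sketch under assumptions: `Cursor`, `PlainIndex` and `NicheIndex` are hypothetical names, the index is assumed to be stored as `index + 1`, and the 47-byte buffer is picked to mirror the sizes discussed in this PR. Rust does not guarantee the enum layout, so the size comparison is something to measure, not rely on.

```rust
use core::mem::size_of;
use core::num::NonZeroU8;

/// Hypothetical cursor that stores `index + 1`, so zero is never a
/// valid value and stays free for the compiler to use as a niche.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Cursor(NonZeroU8);

impl Cursor {
    pub fn start() -> Self {
        Cursor(NonZeroU8::new(1).expect("1 is non-zero"))
    }

    /// The zero-based index this cursor points at.
    pub fn index(self) -> usize {
        usize::from(self.0.get()) - 1
    }

    /// Moves one step forward; `None` if the stored value would exceed 255.
    pub fn advance(self) -> Option<Self> {
        self.0.checked_add(1).map(Cursor)
    }
}

/// Inline storage with a plain `u8` index: the enum typically needs a
/// separate tag to tell the variants apart.
#[allow(dead_code)]
pub enum PlainIndex {
    Inline([u8; 47], u8),
    Heap(Vec<u8>),
}

/// Inline storage with a non-zero index: the unused zero value can
/// encode the `Heap` variant, so no separate tag may be needed.
#[allow(dead_code)]
pub enum NicheIndex {
    Inline([u8; 47], Cursor),
    Heap(Vec<u8>),
}

/// Returns the `(plain, niche)` enum sizes in bytes on this target.
pub fn enum_sizes() -> (usize, usize) {
    (size_of::<PlainIndex>(), size_of::<NicheIndex>())
}
```

The trade-off is a small amount of arithmetic on each access (`index()` subtracts one) in exchange for letting the compiler fold the discriminant into the index byte.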

Previously, the maximum word size was documented as 40 bytes in some places and 41 in others, whilst in practice it was only 38, since two bytes are reserved for matching the start and end of words.
With this change, the (true) maximum increases to 45 bytes, because alignment requirements mean that any maximum between 38 and 45 (inclusive) results in the same size for the `Bytes` struct.

The previous size of `Bytes` was 72 bytes, bringing the size of `Syllables` to 96 bytes. The new sizes are 48 and 72 bytes, respectively.
If we were to grow `Bytes` back up to its previous size of 72 bytes, we could store as many as 69 bytes of word without allocating.
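
The numbers above can be reproduced with a small model. It assumes 8-byte alignment (a 64-bit target), the two reserved boundary bytes and a one-byte index with no separate tag; the constant and function names are hypothetical.

```rust
/// Assumed alignment of the struct on a 64-bit target.
pub const ALIGN: usize = 8;
/// Bytes reserved for matching the start and end of a word.
pub const RESERVED_BYTES: usize = 2;
/// One byte for the index.
pub const INDEX_BYTES: usize = 1;

/// Modelled size in bytes of the inline storage for a given maximum
/// word length: payload plus overhead, rounded up to the alignment.
pub const fn modeled_size(max_word_len: usize) -> usize {
    let raw = max_word_len + RESERVED_BYTES + INDEX_BYTES;
    (raw + ALIGN - 1) / ALIGN * ALIGN
}

/// Largest word length that fits in `size` bytes under the same model.
pub const fn max_len_for_size(size: usize) -> usize {
    size.saturating_sub(RESERVED_BYTES + INDEX_BYTES)
}
```

Under this model, every maximum from 38 to 45 rounds up to 48 bytes, 46 spills into the next 8-byte step, and a 72-byte budget leaves room for 69 bytes of word.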

Additionally, I opted to expose a new public constant, `MAX_WORD_LEN`, to avoid a future desync between the actual maximum length and the documented one, especially for downstream users.
I also added tests to ensure that the constant is correct. The old code passes those tests for a `MAX_WORD_LEN` value of 38.
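
As a rough illustration of what such a test checks (this is not the crate's test code), the sketch below uses a hypothetical `Levels` type that keeps data inline up to `MAX_WORD_LEN` bytes and falls back to the heap beyond that. The value 45 is taken from this PR's description; everything else is assumed.

```rust
/// Maximum word length in bytes that is stored without allocating
/// (value taken from the PR description).
pub const MAX_WORD_LEN: usize = 45;
/// Two extra bytes for the start-of-word and end-of-word markers.
const BOUNDARY_BYTES: usize = 2;

/// Hypothetical storage that is inline for short words and
/// heap-allocated for long ones.
#[allow(dead_code)]
pub enum Levels {
    Inline { buf: [u8; MAX_WORD_LEN + BOUNDARY_BYTES], len: u8 },
    Heap(Vec<u8>),
}

impl Levels {
    /// Picks the storage for a word based on its length in bytes.
    pub fn for_word(word: &str) -> Self {
        let needed = word.len() + BOUNDARY_BYTES;
        if word.len() <= MAX_WORD_LEN {
            // `needed` is at most 47 here, so the cast cannot truncate.
            Levels::Inline {
                buf: [0; MAX_WORD_LEN + BOUNDARY_BYTES],
                len: needed as u8,
            }
        } else {
            Levels::Heap(vec![0; needed])
        }
    }

    pub fn is_inline(&self) -> bool {
        matches!(self, Levels::Inline { .. })
    }
}
```

A boundary test then asserts that a word of exactly `MAX_WORD_LEN` bytes stays inline while one byte more spills to the heap, so the constant cannot silently drift from the real limit.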

I haven't found a performance regression on my machine using `cargo bench -p hypher-bench`. I'm not sure if that's how that extra crate is meant to be used; usually I just see benchmarks specified in the `[[bench]]` section of `Cargo.toml`.

emilyyyylime · 2025-06-04T14:23:48Z

Sorry, oversight on my part there.

saecki

Thank you! I quite like this optimization. I've left a few little nitpicky comments, but otherwise I think it looks good!

laurmaedje

just two minor nitpicks. otherwise, lgtm!

emilyyyylime · 2025-06-13T16:22:43Z

rust-analyzer trolling me

emilyyyylime · 2025-06-19T16:09:01Z

Assuming you forgot to take a second look at this; if not, all good ^^

laurmaedje · 2025-06-19T16:11:40Z

It was still on my board and back of my mind, but I hadn't yet gotten back to working through the board. But no more blockers here. Thanks!

emilyyyylime force-pushed the size branch from e10e7fd to 5d2ae76 Compare June 4, 2025 14:23

laurmaedje requested a review from saecki June 12, 2025 08:08

saecki reviewed Jun 12, 2025

emilyyyylime force-pushed the size branch from 5d2ae76 to 731c035 Compare June 13, 2025 08:00

saecki approved these changes Jun 13, 2025

laurmaedje reviewed Jun 13, 2025

emilyyyylime force-pushed the size branch from 731c035 to 11cdf29 Compare June 13, 2025 16:17

Reduce size of Bytes

34637a1

emilyyyylime force-pushed the size branch from 11cdf29 to 34637a1 Compare June 13, 2025 16:22

laurmaedje merged commit 1e888ff into typst:main Jun 19, 2025
1 check passed
