Conversation


@viirya viirya commented Dec 29, 2025

Optimize translate function by reusing HashMap and Vec buffers across all rows instead of allocating new ones for each row.

Changes:

  • Moved HashMap and Vec allocations outside the main loop
  • Clear and reuse buffers for each row instead of reallocating
  • Use explicit loops instead of chained iterators for better control
  • Added benchmark to measure performance improvements

Optimization:

  • Before: Allocated HashMap + 4 Vecs for every row
  • After: Single set of reusable buffers cleared for each row (see the sketch below)
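
For illustration, here is a minimal sketch of the reuse pattern. It is not the actual DataFusion translate kernel: the function name `translate_row` and the buffer layout are hypothetical, and the real implementation works over Arrow string arrays and hoists additional Vec buffers in the same way.

```rust
use std::collections::HashMap;

/// Hypothetical row-level translate: maps each char in `from` to the char at
/// the same position in `to`, or drops it when `to` is shorter.
/// The caller owns the buffers, so no per-row allocation takes place.
fn translate_row(
    value: &str,
    from: &str,
    to: &str,
    char_map: &mut HashMap<char, Option<char>>, // reused across rows
    output: &mut Vec<char>,                     // reused across rows
) -> String {
    // Clearing keeps the capacity allocated by previous rows.
    char_map.clear();
    output.clear();

    let to_chars: Vec<char> = to.chars().collect();
    for (i, c) in from.chars().enumerate() {
        // First occurrence in `from` wins, matching SQL translate semantics.
        char_map.entry(c).or_insert_with(|| to_chars.get(i).copied());
    }

    for c in value.chars() {
        match char_map.get(&c) {
            Some(Some(replacement)) => output.push(*replacement),
            Some(None) => {}        // mapped to nothing: drop the char
            None => output.push(c), // not in `from`: keep as-is
        }
    }
    output.iter().collect()
}

fn main() {
    // Before: equivalents of these buffers were created for every row.
    // After: a single set of buffers is shared by all rows.
    let mut char_map = HashMap::new();
    let mut output = Vec::new();
    for row in ["hello", "world"] {
        println!("{}", translate_row(row, "lo", "01", &mut char_map, &mut output));
    }
}
```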

Benchmark Results:

  • size=1024, str_len=8: 234.6 µs → 147.9 µs (37% faster)
  • size=1024, str_len=32: 628.6 µs → 394.2 µs (37% faster)
  • size=4096, str_len=8: 964.4 µs → 575.2 µs (40% faster)
  • size=4096, str_len=32: 2.54 ms → 1.56 ms (39% faster)

The optimization yields consistent 37-40% performance improvements across all test cases. The HashMap reuse is particularly impactful, since creating and dropping a HashMap for every row incurs significant overhead from bucket allocation and internal bookkeeping. Combined with eliminating four Vec allocations per row, this is the most significant optimization in this series.
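
As a rough sketch of how such a benchmark can be parameterized over batch size and string length with the criterion crate (this is not the benchmark added in this PR; it assumes the hypothetical `translate_row` helper from the sketch above is in scope and that criterion is available as a dev-dependency):

```rust
use std::collections::HashMap;
use std::hint::black_box;

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

fn bench_translate(c: &mut Criterion) {
    let mut group = c.benchmark_group("translate");
    for size in [1024usize, 4096] {
        for str_len in [8usize, 32] {
            // Deterministic input: `size` rows of `str_len` digits each.
            let rows: Vec<String> = (0..size)
                .map(|i| format!("{:0width$}", i, width = str_len))
                .collect();
            group.bench_with_input(
                BenchmarkId::new(format!("str_len={str_len}"), size),
                &rows,
                |b, rows| {
                    b.iter(|| {
                        // Buffers hoisted out of the per-row loop, as in this PR.
                        let mut char_map = HashMap::new();
                        let mut output = Vec::new();
                        for row in rows {
                            black_box(translate_row(
                                row,
                                "0123456789",
                                "abcdefghij",
                                &mut char_map,
                                &mut output,
                            ));
                        }
                    })
                },
            );
        }
    }
    group.finish();
}

criterion_group!(benches, bench_translate);
criterion_main!(benches);
```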

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the functions (Changes to functions implementation) label Dec 29, 2025
@viirya viirya force-pushed the translate_optimize branch from c452244 to d8d7f97 on December 29, 2025 02:39
@viirya viirya force-pushed the translate_optimize branch from d8d7f97 to 647b56a on December 29, 2025 02:51
@viirya viirya requested a review from andygrove December 29, 2025 08:02
@rluvaton rluvaton added the performance (Make DataFusion faster) label Dec 29, 2025