Conversation


@viirya viirya commented Dec 29, 2025

Optimize translate function by reusing HashMap and Vec buffers across all rows instead of allocating new ones for each row.

Changes:

  • Moved HashMap and Vec allocations outside the main loop
  • Clear and reuse buffers for each row instead of reallocating
  • Use explicit loops instead of chained iterators for better control
  • Added benchmark to measure performance improvements

Optimization:

  • Before: Allocated HashMap + 4 Vecs for every row
  • After: Single set of reusable buffers cleared for each row (see the sketch below)
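
For illustration, here is a minimal sketch of the reuse pattern. It is not the actual DataFusion translate kernel: the function name `translate_row` and the buffer layout are hypothetical, and the real implementation works over Arrow string arrays and hoists additional Vec buffers in the same way.

```rust
use std::collections::HashMap;

/// Hypothetical row-level translate: maps each char in `from` to the char at
/// the same position in `to`, or drops it when `to` is shorter.
/// The caller owns the buffers, so no per-row allocation takes place.
fn translate_row(
    value: &str,
    from: &str,
    to: &str,
    char_map: &mut HashMap<char, Option<char>>, // reused across rows
    output: &mut Vec<char>,                     // reused across rows
) -> String {
    // Clearing keeps the capacity allocated by previous rows.
    char_map.clear();
    output.clear();

    let to_chars: Vec<char> = to.chars().collect();
    for (i, c) in from.chars().enumerate() {
        // First occurrence in `from` wins, matching SQL translate semantics.
        char_map.entry(c).or_insert_with(|| to_chars.get(i).copied());
    }

    for c in value.chars() {
        match char_map.get(&c) {
            Some(Some(replacement)) => output.push(*replacement),
            Some(None) => {}        // mapped to nothing: drop the char
            None => output.push(c), // not in `from`: keep as-is
        }
    }
    output.iter().collect()
}

fn main() {
    // Before: equivalents of these buffers were created for every row.
    // After: a single set of buffers is shared by all rows.
    let mut char_map = HashMap::new();
    let mut output = Vec::new();
    for row in ["hello", "world"] {
        println!("{}", translate_row(row, "lo", "01", &mut char_map, &mut output));
    }
}
```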

Benchmark Results:

  • size=1024, str_len=8: 234.6 µs → 147.9 µs (37% faster)
  • size=1024, str_len=32: 628.6 µs → 394.2 µs (37% faster)
  • size=4096, str_len=8: 964.4 µs → 575.2 µs (40% faster)
  • size=4096, str_len=32: 2.54 ms → 1.56 ms (39% faster)

The optimization yields consistent 37-40% performance improvements across all test cases. The HashMap reuse is particularly impactful, since creating and dropping a HashMap for every row incurs significant overhead from bucket allocation and internal bookkeeping. Combined with eliminating four Vec allocations per row, this is the most significant optimization in this series.
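
As a rough sketch of how such a benchmark can be parameterized over batch size and string length with the criterion crate (this is not the benchmark added in this PR; it assumes the hypothetical `translate_row` helper from the sketch above is in scope and that criterion is available as a dev-dependency):

```rust
use std::collections::HashMap;
use std::hint::black_box;

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

fn bench_translate(c: &mut Criterion) {
    let mut group = c.benchmark_group("translate");
    for size in [1024usize, 4096] {
        for str_len in [8usize, 32] {
            // Deterministic input: `size` rows of `str_len` digits each.
            let rows: Vec<String> = (0..size)
                .map(|i| format!("{:0width$}", i, width = str_len))
                .collect();
            group.bench_with_input(
                BenchmarkId::new(format!("str_len={str_len}"), size),
                &rows,
                |b, rows| {
                    b.iter(|| {
                        // Buffers hoisted out of the per-row loop, as in this PR.
                        let mut char_map = HashMap::new();
                        let mut output = Vec::new();
                        for row in rows {
                            black_box(translate_row(
                                row,
                                "0123456789",
                                "abcdefghij",
                                &mut char_map,
                                &mut output,
                            ));
                        }
                    })
                },
            );
        }
    }
    group.finish();
}

criterion_group!(benches, bench_translate);
criterion_main!(benches);
```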

Which issue does this PR close?

  • Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the functions (Changes to functions implementation) label Dec 29, 2025
@viirya viirya force-pushed the translate_optimize branch from c452244 to d8d7f97 on December 29, 2025 02:39
@viirya viirya force-pushed the translate_optimize branch from d8d7f97 to 647b56a on December 29, 2025 02:51
@viirya viirya requested a review from andygrove December 29, 2025 08:02
@rluvaton rluvaton added the performance (Make DataFusion faster) label Dec 29, 2025