@andygrove andygrove commented Dec 27, 2025

Which issue does this PR close?

  • Closes #.

Rationale for this change

  • Optimizing for scalar arguments delivers a 3.6x-8x speedup for the common case of starts_with(column, 'literal') or ends_with(column, 'literal').
  • StringViewArray benefits even more (~6.4-8x) than StringArray (~3.6-3.8x).
  • The optimization uses Arrow's Scalar wrapper to avoid broadcasting scalar values to full arrays.
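To make the broadcasting cost concrete, here is a minimal standalone sketch in plain Rust. It is not the code from this PR and does not use Arrow or DataFusion: `starts_with_broadcast` and `starts_with_scalar` are hypothetical names, and `Vec<String>` is only a stand-in for a string array.

```rust
/// Hypothetical "before" path: expand the literal into a full column
/// (one owned copy per row), then compare the two columns element-wise.
fn starts_with_broadcast(column: &[String], literal: &str) -> Vec<bool> {
    // This allocation grows with the number of rows.
    let broadcast: Vec<String> = vec![literal.to_string(); column.len()];
    column
        .iter()
        .zip(broadcast.iter())
        .map(|(value, prefix)| value.starts_with(prefix.as_str()))
        .collect()
}

/// Hypothetical "after" path: borrow the single literal for every row,
/// so no per-row copy of the literal is ever made.
fn starts_with_scalar(column: &[String], literal: &str) -> Vec<bool> {
    column
        .iter()
        .map(|value| value.starts_with(literal))
        .collect()
}
```

Both functions return the same booleans for the same inputs; the second simply skips building the temporary column, which is the kind of work the scalar path is meant to avoid.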

starts_with

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| StringArray + scalar | 32.38 µs | 8.49 µs | 3.8x |
| StringViewArray + scalar | 78.15 µs | 9.82 µs | 8.0x |

ends_with

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| StringArray + scalar | 32.76 µs | 9.06 µs | 3.6x |
| StringViewArray + scalar | 76.44 µs | 12.04 µs | 6.4x |

What changes are included in this PR?

  • Handle all combinations of array and scalar arguments without converting scalars to arrays.
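As a rough illustration of what "all combinations" means, the sketch below dispatches on the four array/scalar pairings in plain Rust. It is an assumption-laden model, not the PR's implementation: `Arg`, `Output`, `check`, and this `ends_with` are hypothetical names, slices stand in for Arrow arrays, and `None` models SQL NULL.

```rust
/// Hypothetical stand-in for a function argument: a full column or a
/// single value. `None` models SQL NULL.
enum Arg<'a> {
    Array(&'a [Option<&'a str>]),
    Scalar(Option<&'a str>),
}

/// Hypothetical result: one nullable boolean per row, or a single
/// nullable boolean when both inputs are scalars.
#[derive(Debug, PartialEq)]
enum Output {
    Array(Vec<Option<bool>>),
    Scalar(Option<bool>),
}

/// NULL in either position yields NULL; otherwise compare the strings.
fn check(value: Option<&str>, suffix: Option<&str>) -> Option<bool> {
    Some(value?.ends_with(suffix?))
}

/// Handle each pairing directly, never expanding a scalar into an array.
fn ends_with(left: Arg<'_>, right: Arg<'_>) -> Output {
    match (left, right) {
        (Arg::Array(values), Arg::Array(suffixes)) => {
            assert_eq!(values.len(), suffixes.len(), "array lengths must match");
            Output::Array(
                values
                    .iter()
                    .zip(suffixes)
                    .map(|(value, suffix)| check(*value, *suffix))
                    .collect(),
            )
        }
        (Arg::Array(values), Arg::Scalar(suffix)) => {
            Output::Array(values.iter().map(|value| check(*value, suffix)).collect())
        }
        (Arg::Scalar(value), Arg::Array(suffixes)) => {
            Output::Array(suffixes.iter().map(|suffix| check(value, *suffix)).collect())
        }
        (Arg::Scalar(value), Arg::Scalar(suffix)) => Output::Scalar(check(value, suffix)),
    }
}
```

In this model, calling `ends_with(Arg::Array(&column), Arg::Scalar(Some(".csv")))` walks the column once while reusing the single borrowed suffix, and the scalar/scalar case returns a single value rather than a one-row array.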

Are these changes tested?

Yes, new unit tests added in this PR.

Are there any user-facing changes?

No, just faster performance.

@github-actions github-actions bot added the functions Changes to functions implementation label Dec 27, 2025
@andygrove andygrove marked this pull request as ready for review December 27, 2025 19:57
@andygrove andygrove requested a review from Dandandan December 27, 2025 20:24
@Dandandan Dandandan added this pull request to the merge queue Dec 28, 2025
Merged via the queue into apache:main with commit bb4e0ec Dec 28, 2025
31 checks passed