perf: Optimize lpad, rpad for ASCII strings by neilconway · Pull Request #20278 · apache/datafusion

neilconway · 2026-02-10T19:45:44Z

The previous implementation incurred the overhead of Unicode machinery, even for the common case in which both the input string and the fill string consist only of ASCII characters. For the ASCII-only case, we can assume that the length in bytes equals the length in characters, and avoid expensive grapheme-based segmentation. This follows similar optimizations applied elsewhere in the codebase.
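To make the fast path concrete, here is a minimal sketch of the idea, not the code from this PR. The function name `lpad_sketch` is hypothetical, it works on a single `&str` rather than Arrow arrays, and the slow path counts `char`s as a stand-in for the grapheme segmentation the real implementation uses (which needs an external crate).

```rust
/// Hypothetical helper (not DataFusion's API): left-pad `s` to `target`
/// characters using `fill`, truncating `s` on the right if it is too long.
fn lpad_sketch(s: &str, target: usize, fill: &str) -> String {
    if s.is_ascii() && fill.is_ascii() {
        // ASCII fast path: every character is exactly one byte, so the byte
        // length is the character length and byte slicing is always valid.
        if target <= s.len() {
            return s[..target].to_string();
        }
        if fill.is_empty() {
            return s.to_string();
        }
        let pad = target - s.len();
        let mut out = String::with_capacity(target);
        // Whole repetitions of the fill string, then a partial one.
        for _ in 0..pad / fill.len() {
            out.push_str(fill);
        }
        out.push_str(&fill[..pad % fill.len()]);
        out.push_str(s);
        out
    } else {
        // Slow path. ASSUMPTION: counts Unicode scalar values (`char`s) as a
        // simplified stand-in for grapheme clusters.
        let chars: Vec<char> = s.chars().collect();
        if target <= chars.len() {
            return chars[..target].iter().collect();
        }
        let fill_chars: Vec<char> = fill.chars().collect();
        if fill_chars.is_empty() {
            return s.to_string();
        }
        let pad = target - chars.len();
        let mut out = String::new();
        out.extend(fill_chars.iter().cycle().take(pad));
        out.push_str(s);
        out
    }
}
```

The `is_ascii()` checks are a linear scan over the bytes, which is consistent with the small extra cost on non-ASCII input: those strings pay for the check and then still take the slow path.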

Benchmarks indicate this is a significant performance win for ASCII-only input (4x-10x faster) but only a mild regression for Unicode input (2-5% slower).

Along the way:

Replace a few instances of `write_str(str)? + append_value("")` with `append_value(str)`, which saves a few cycles
Add a missing test case for truncating the input string
Add benchmarks for Unicode input

Which issue does this PR close?

Closes #20277 (Optimize lpad, rpad for ASCII-only strings).

Are these changes tested?

Covered by existing tests. Added new benchmarks for Unicode inputs.

Are there any user-facing changes?

No.

neilconway · 2026-02-10T19:46:42Z

Benchmark results:

     Running benches/pad.rs (target/release/deps/pad-b74c12aa445bf68e)
Gnuplot not found, using plotters backend
lpad size=1024/lpad utf8 [size=1024, str_len=5, target=20]
                        time:   [13.059 µs 13.073 µs 13.086 µs]
                        change: [−75.614% −75.346% −75.137%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
lpad size=1024/lpad stringview [size=1024, str_len=5, target=20]
                        time:   [11.552 µs 11.560 µs 11.569 µs]
                        change: [−78.830% −78.528% −78.298%] (p = 0.00 < 0.05)
                        Performance has improved.
lpad size=1024/lpad utf8 [size=1024, str_len=20, target=50]
                        time:   [11.373 µs 11.420 µs 11.458 µs]
                        change: [−93.139% −92.998% −92.888%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
lpad size=1024/lpad stringview [size=1024, str_len=20, target=50]
                        time:   [11.857 µs 11.871 µs 11.887 µs]
                        change: [−92.972% −92.825% −92.700%] (p = 0.00 < 0.05)
                        Performance has improved.
lpad size=1024/lpad utf8 unicode [size=1024, target=20]
                        time:   [92.289 µs 93.798 µs 95.872 µs]
                        change: [−4.0607% −2.2791% −0.0744%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
lpad size=1024/lpad stringview unicode [size=1024, target=20]
                        time:   [95.919 µs 96.579 µs 97.458 µs]
                        change: [+3.0933% +4.1235% +5.2351%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

lpad size=4096/lpad utf8 [size=4096, str_len=5, target=20]
                        time:   [55.219 µs 55.744 µs 56.437 µs]
                        change: [−74.845% −74.463% −74.067%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
lpad size=4096/lpad stringview [size=4096, str_len=5, target=20]
                        time:   [47.605 µs 47.737 µs 47.887 µs]
                        change: [−78.282% −78.097% −77.945%] (p = 0.00 < 0.05)
                        Performance has improved.
lpad size=4096/lpad utf8 [size=4096, str_len=20, target=50]
                        time:   [46.430 µs 47.324 µs 48.286 µs]
                        change: [−93.049% −92.852% −92.662%] (p = 0.00 < 0.05)
                        Performance has improved.
lpad size=4096/lpad stringview [size=4096, str_len=20, target=50]
                        time:   [47.352 µs 48.110 µs 49.133 µs]
                        change: [−92.810% −92.629% −92.423%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
lpad size=4096/lpad utf8 unicode [size=4096, target=20]
                        time:   [376.29 µs 378.75 µs 381.86 µs]
                        change: [+1.7985% +2.5712% +3.4954%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
lpad size=4096/lpad stringview unicode [size=4096, target=20]
                        time:   [380.75 µs 383.62 µs 387.43 µs]
                        change: [−1.2725% −0.2318% +0.8972%] (p = 0.70 > 0.05)
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe

rpad size=1024/rpad utf8 [size=1024, str_len=5, target=20]
                        time:   [13.597 µs 13.665 µs 13.748 µs]
                        change: [−77.429% −77.259% −77.102%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
rpad size=1024/rpad stringview [size=1024, str_len=5, target=20]
                        time:   [13.854 µs 13.908 µs 13.970 µs]
                        change: [−76.702% −76.483% −76.293%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
rpad size=1024/rpad utf8 [size=1024, str_len=20, target=50]
                        time:   [12.804 µs 12.850 µs 12.903 µs]
                        change: [−92.564% −92.437% −92.325%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
rpad size=1024/rpad stringview [size=1024, str_len=20, target=50]
                        time:   [13.173 µs 13.204 µs 13.238 µs]
                        change: [−92.356% −92.207% −92.115%] (p = 0.00 < 0.05)
                        Performance has improved.
rpad size=1024/rpad utf8 unicode [size=1024, target=20]
                        time:   [98.236 µs 98.714 µs 99.357 µs]
                        change: [+2.2886% +3.1339% +3.9890%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
rpad size=1024/rpad stringview unicode [size=1024, target=20]
                        time:   [97.562 µs 103.38 µs 113.92 µs]
                        change: [−0.4527% +5.6605% +16.577%] (p = 0.30 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

rpad size=4096/rpad utf8 [size=4096, str_len=5, target=20]
                        time:   [57.742 µs 58.722 µs 59.893 µs]
                        change: [−76.131% −75.713% −75.202%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
rpad size=4096/rpad stringview [size=4096, str_len=5, target=20]
                        time:   [57.256 µs 58.176 µs 59.196 µs]
                        change: [−75.652% −75.151% −74.661%] (p = 0.00 < 0.05)
                        Performance has improved.
rpad size=4096/rpad utf8 [size=4096, str_len=20, target=50]
                        time:   [52.659 µs 55.964 µs 61.240 µs]
                        change: [−92.224% −91.701% −90.893%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
rpad size=4096/rpad stringview [size=4096, str_len=20, target=50]
                        time:   [50.029 µs 50.950 µs 51.995 µs]
                        change: [−92.638% −92.455% −92.266%] (p = 0.00 < 0.05)
                        Performance has improved.
rpad size=4096/rpad utf8 unicode [size=4096, target=20]
                        time:   [368.78 µs 370.27 µs 371.98 µs]
                        change: [−6.9765% −5.8825% −4.9873%] (p = 0.00 < 0.05)
                        Performance has improved.
rpad size=4096/rpad stringview unicode [size=4096, target=20]
                        time:   [372.78 µs 374.26 µs 376.59 µs]
                        change: [−6.8942% −6.1219% −5.3899%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
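When reading the log above, it can help to convert Criterion's relative `change` into a speedup factor. Criterion reports the change as (new − old) / old, so the speedup is 1 / (1 + change). The helper name `speedup_factor` below is hypothetical and only illustrates the arithmetic.

```rust
/// Hypothetical helper: convert a relative change in percent, as printed by
/// Criterion (negative means faster), into a speedup factor old_time / new_time.
fn speedup_factor(change_pct: f64) -> f64 {
    1.0 / (1.0 + change_pct / 100.0)
}
```

By this formula, a change of about −75% corresponds to roughly a 4x speedup, and a change of about −93% to roughly 14x.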


neilconway · 2026-02-15T15:33:34Z

@martin-g Is this okay to land in main, do you think? Lmk if you have other feedback or concerns.

github-actions bot added the functions Changes to functions implementation label Feb 10, 2026

neilconway force-pushed the neilc/optimize-lpad-rpad branch 3 times, most recently from f3b449e to 53b7236 Compare February 10, 2026 21:32

github-actions bot added the documentation Improvements or additions to documentation label Feb 10, 2026

martin-g approved these changes Feb 12, 2026


neilconway mentioned this pull request Feb 13, 2026

[EPIC] Optimize performance for slow expressions apache/datafusion-comet#2986

Open

neilconway force-pushed the neilc/optimize-lpad-rpad branch 2 times, most recently from 880029d to f6cc464 Compare February 15, 2026 00:31

neilconway force-pushed the neilc/optimize-lpad-rpad branch from f6cc464 to d6c1374 Compare February 15, 2026 01:10

neilconway wants to merge 1 commit into apache:main from neilconway:neilc/optimize-lpad-rpad
