feat: Optimize from_bitwise_binary_op with 64-bit alignment #9441

Open
kunalsinghdadhwal wants to merge 1 commit into apache:main from kunalsinghdadhwal:kunal/optimize-bitwise-binary-op-9378

@kunalsinghdadhwal
Contributor

Which issue does this PR close?

Rationale for this change

This PR implements the optimizations listed in the issue description:

  • Align to 8 bytes
  • Don't try to return a buffer with bit_offset 0 but round it to a multiple of 64
  • Use chunk_exact for the fallback path
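To illustrate the third point, here is a minimal sketch of a word-at-a-time loop built on the standard library's `chunks_exact`. It is not the code in this PR: the helper name `bitwise_bytes_wordwise` is hypothetical, and it works on plain byte slices of equal length rather than on Arrow buffers.

```rust
/// Hypothetical helper (not the PR's implementation): applies `op` to two
/// equal-length byte slices, processing 8 bytes (one u64 word) per step.
fn bitwise_bytes_wordwise(left: &[u8], right: &[u8], op: impl Fn(u64, u64) -> u64) -> Vec<u8> {
    assert_eq!(left.len(), right.len(), "inputs must have equal length");
    let mut out = Vec::with_capacity(left.len());

    // `chunks_exact(8)` yields only full 8-byte chunks, so the loop body
    // never has to handle a short chunk.
    let mut l_chunks = left.chunks_exact(8);
    let mut r_chunks = right.chunks_exact(8);
    for (l, r) in l_chunks.by_ref().zip(r_chunks.by_ref()) {
        let (mut lb, mut rb) = ([0u8; 8], [0u8; 8]);
        lb.copy_from_slice(l);
        rb.copy_from_slice(r);
        let word = op(u64::from_le_bytes(lb), u64::from_le_bytes(rb));
        out.extend_from_slice(&word.to_le_bytes());
    }

    // The bytes that did not fill a whole word are handled one at a time.
    for (l, r) in l_chunks.remainder().iter().zip(r_chunks.remainder()) {
        out.push(op(u64::from(*l), u64::from(*r)) as u8);
    }
    out
}
```

Calling `bitwise_bytes_wordwise(&a, &b, |x, y| x & y)` should give the same bytes as a byte-by-byte AND, with the hot loop operating on whole words.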

What changes are included in this PR?

When both inputs share the same sub-64-bit alignment (left_offset % 64 == right_offset % 64), the optimized path is used. This covers the common cases (both offset 0, both sliced equally, etc.). The BitChunks fallback is retained only when the two offsets have different sub-64-bit alignment.
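The condition above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `Bits` is a hypothetical stand-in for a bit buffer backed by `u64` words (least significant bit first), `binary_op_aligned` is a made-up name, and the result is assumed to keep `offset % 64` as its bit offset.

```rust
/// Hypothetical stand-in for a bit buffer: `len` bits starting at bit
/// `offset`, stored least-significant-bit first in u64 words.
struct Bits {
    words: Vec<u64>,
    offset: usize,
    len: usize,
}

impl Bits {
    /// Reads logical bit `i`, taking the offset into account.
    fn get(&self, i: usize) -> bool {
        let bit = self.offset + i;
        (self.words[bit / 64] >> (bit % 64)) & 1 == 1
    }
}

/// Sketch of the fast path: returns `None` when the offsets do not share the
/// same sub-64-bit alignment, which is where a fallback would take over.
fn binary_op_aligned(left: &Bits, right: &Bits, op: impl Fn(u64, u64) -> u64) -> Option<Bits> {
    assert_eq!(left.len, right.len, "inputs must have equal length");
    if left.offset % 64 != right.offset % 64 {
        return None;
    }
    let shift = left.offset % 64;
    // Number of whole words that cover bits [shift, shift + len).
    let n_words = (shift + left.len + 63) / 64;
    let l = &left.words[left.offset / 64..][..n_words];
    let r = &right.words[right.offset / 64..][..n_words];
    // No bit shifting is needed: matching words line up bit for bit.
    let words = l.iter().zip(r).map(|(a, b)| op(*a, *b)).collect();
    // Assumption: the result keeps the shared sub-word offset instead of
    // being shifted down to offset 0. Bits outside the logical range are
    // whatever `op` produced for them.
    Some(Bits { words, offset: shift, len: left.len })
}
```

For example, offsets 69 and 5 both leave a remainder of 5 modulo 64, so they would take this path, while offsets 69 and 6 would not.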

Are these changes tested?

Yes. The existing tests have been updated and are included in this PR.

Are there any user-facing changes?

Yes, this is a minor breaking change to from_bitwise_binary_op:

  • The returned BooleanBuffer may now have a non-zero offset (previously always 0)
  • The returned BooleanBuffer may have padding bits set outside the logical range in values()
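For callers that read the raw bytes, the practical consequence is that only bits in the range [offset, offset + len) are meaningful. The sketch below shows the idea on a plain byte slice with Arrow's least-significant-bit-first ordering. The function `count_set_bits` here is a hypothetical free function written for illustration, not a call into the arrow crate.

```rust
/// Hypothetical helper: counts set bits in the logical range
/// [offset, offset + len) of an LSB-first bitmap, ignoring padding bits.
fn count_set_bits(values: &[u8], offset: usize, len: usize) -> usize {
    (offset..offset + len)
        .filter(|&i| (values[i / 8] >> (i % 8)) & 1 == 1)
        .count()
}
```

With `values = [0xFF, 0xFF]`, `offset = 3` and `len = 6`, the logical count is 6, whereas summing `count_ones()` over the raw bytes gives 16. Code that scans `values()` directly without applying the offset and length would pick up the padding bits.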

Signed-off-by: Kunal Singh Dadhwal <kunalsinghdadhwal@gmail.com>
@github-actions bot added the arrow (Changes to the arrow crate) label on Feb 19, 2026
@kunalsinghdadhwal
Contributor Author

@Dandandan kindly review

@Dandandan
Contributor

run benchmark boolean_kernels

@kunalsinghdadhwal
Contributor Author

kunalsinghdadhwal commented Feb 19, 2026

and                     time:   [129.08 ns 129.76 ns 130.46 ns]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

or                      time:   [134.48 ns 135.29 ns 136.17 ns]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

not                     time:   [91.808 ns 92.431 ns 93.130 ns]
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

and_sliced_1            time:   [596.55 ns 600.04 ns 604.23 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

or_sliced_1             time:   [599.21 ns 601.99 ns 604.87 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild

not_sliced_1            time:   [90.421 ns 90.955 ns 91.544 ns]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

and_sliced_24           time:   [116.06 ns 116.83 ns 117.75 ns]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

or_sliced_24            time:   [116.09 ns 116.94 ns 117.91 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild

not_slice_24            time:   [90.518 ns 91.550 ns 92.754 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Here is the comparison:

| Benchmark | main | optimized | speedup |
|---|---|---|---|
| and | 128.33 ns | 130.22 ns | 0.98x |
| or | 132.71 ns | 134.03 ns | 0.99x |
| not | 91.78 ns | 91.78 ns | 1.00x |
| and_sliced_1 | 656.07 ns | 650.42 ns | 1.01x |
| or_sliced_1 | 669.51 ns | 662.51 ns | 1.01x |
| not_sliced_1 | 114.27 ns | 112.00 ns | 1.02x |
| and_sliced_24 | 141.51 ns | 139.42 ns | 1.01x |
| or_sliced_24 | 138.28 ns | 114.78 ns | 1.20x |
| not_slice_24 | 90.24 ns | 113.18 ns | 0.80x |

@kunalsinghdadhwal
Contributor Author

@Dandandan

Successfully merging this pull request may close these issues.

Optimize from_bitwise_binary_op