
Conversation

@robsdedude
Member

Fixes: DRIVERS-162

`Vector.from_native` and `Vector.to_native` both used to have alternative
implementations based on NumPy or the Rust extensions (when available) to speed
up the conversion. It turns out that neither followed the reference Python
implementation (which internally relies on `struct.pack`/`struct.unpack`) for
32-bit float values.

 * Python 3.14+ preserves the signaling/quieting bit of `NaN` values, whereas
   NumPy doesn't, and Rust doesn't make strong guarantees about how f64s are
   cast to f32s:
   https://github.com/rust-lang/rfcs/blob/master/text/3514-float-semantics.md
 * Both NumPy and Rust turn large f64s into `inf` when casting to f32, while
   Python raises an `OverflowError` (see the sketch below).
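
To illustrate the second point, here is a minimal sketch (not taken from the
driver's code base) comparing how the standard library and NumPy handle an f64
that doesn't fit into an f32:

```python
import struct

import numpy as np  # only needed for the comparison

too_large = 1e300  # representable as f64, but far outside the f32 range

# Reference Python behavior: packing as a 32-bit float raises.
try:
    struct.pack("<f", too_large)
except OverflowError as exc:
    print("struct.pack:", exc)  # float too large to pack with f format

# NumPy instead saturates to infinity (possibly with a RuntimeWarning).
print("NumPy cast:", np.float64(too_large).astype(np.float32))  # inf
```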

To harmonize the behavior, only the Python implementation remains. It has been
significantly sped up: instead of iterating over the elements and calling
`struct.(un)pack` once per value, it exploits the fact that the format string
allows specifying a count of values to (un)pack in a single call. With this
change, the Python implementation was at most 5 times slower than Rust/NumPy,
and for low-dimension vectors it was even faster. Maintaining multiple
implementations and assuring their behavioral parity is deemed not worth the
effort for such a small performance gain.
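
As a rough sketch of the multi-value (un)pack idea (simplified, not the
driver's actual code; the format character and endianness are placeholders):

```python
import struct


def pack_f32_slow(values):
    # One struct call per element: roughly what the old loop-based code did.
    return b"".join(struct.pack("<f", v) for v in values)


def pack_f32_fast(values):
    # A single struct call: the count in the format string (e.g. "<3f")
    # packs all elements at once.
    return struct.pack(f"<{len(values)}f", *values)


def unpack_f32_fast(data):
    return struct.unpack(f"<{len(data) // 4}f", data)


payload = pack_f32_fast([1.0, 2.5, -3.0])
assert payload == pack_f32_slow([1.0, 2.5, -3.0])
assert unpack_f32_fast(payload) == (1.0, 2.5, -3.0)
```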

The same multi-value (un)pack approach has been applied to the pure Python
implementation of byte swapping, improving its performance by a factor of
roughly 30 (still about 10 times slower than NumPy/Rust).
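
The byte swapping can be sketched with the same trick, shown here for 4-byte
elements (a simplified illustration, not necessarily the driver's exact
implementation):

```python
import struct


def byteswap_4(data: bytes) -> bytes:
    # Read the buffer as big-endian 32-bit integers and re-emit it
    # little-endian (or vice versa): two struct calls swap every element
    # instead of looping over them in Python.
    count = len(data) // 4
    return struct.pack(f"<{count}I", *struct.unpack(f">{count}I", data))


assert byteswap_4(bytes(range(8))) == bytes([3, 2, 1, 0, 7, 6, 5, 4])
```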

@RichardIrons-neo4j left a comment

Good change, makes the code much easier to understand.

@robsdedude merged commit 3908319 into neo4j:6.x on Nov 5, 2025
31 checks passed
@robsdedude deleted the fix/vector-type-transformations branch on November 5, 2025 at 17:28