perf: remove dtype + fill val handling per chunk#124
Conversation
c7fe896 to
4eca1b8
Compare
|
@LDeakin Up to you if we wait for |
This reverts commit 86b8118.
|
Locally I'm getting " |
Just run a |
LDeakin
left a comment
There was a problem hiding this comment.
Nice one Ilan! I might rerun some perf benchmarks with this after we release
Tried that out but oddly enough didn't work. I'm cycling through the other options available but at least I see this is a local-only problem now, so I'll handle it. |
Calling
to_native_dtype+__str__came up as one of the only python-CPU-bound things when doing some benchmarking. My use-case is quite contrived (generating thousands ofWithSubsetobjects) but I think it's probably worth investigating getting rid of these calls. Some observations:1. I wonder if all getting theSo it turns out the array actually doesn't have to be created yet when the pipeline is generated. So I ended up doing something similar with the metadata (making it an actualdtypeandfill_valbe wrapped up in just relying on https://docs.rs/zarrs/latest/zarrs/array/struct.Array.html#method.open and then using the values directly (there are probably other benefits of doing this) but I think this is a separate PRStructand then working with that to get fill and dtype).2. Regardless, most of this refactor is around removing
Basicanyway so that chunk handling is independent of the ability. I noticed thatChunkRepresentationrequires ownership over its arguments which means we copy per-chunk. Not sure what would go into making that a reference, but it's no worse than the previous situation where I think we were generating copies repeatedly, but from PyO3 calling python --> Hooray! No longer!The benefit wasn't crazy ~5% but I think going in this direction is good (see point 1)