WIP: Python buffer (PEP 3118) implementation#70

Open
jakebolewski wants to merge 5 commits into JuliaPy:master from jakebolewski:jcb/pybuffer

Conversation

@jakebolewski
Collaborator

This is very much a work in progress branch to explore replacing PyCall's numpy dependency with Python Buffers / memory views (#38, and JuliaLang/IJulia.jl#49). Link to PEP proposal: http://legacy.python.org/dev/peps/pep-3118/
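As background on what the buffer protocol exposes, Python's built-in memoryview surfaces the same PEP 3118 fields (format, shape, strides, contiguity) that this branch reads from the C-level Py_buffer struct. A quick stdlib-only illustration, no numpy required:

```python
import struct

# Pack three C doubles and view them through the buffer protocol.
data = bytearray(struct.pack("3d", 1.0, 2.0, 3.0))
m = memoryview(data).cast("d")

print(m.format)        # 'd'  (struct-style format string)
print(m.shape)         # (3,)
print(m.strides)       # (8,)
print(m.c_contiguous)  # True
m[1] = 42.0            # writes through to the underlying bytearray
print(struct.unpack("3d", bytes(data)))  # (1.0, 42.0, 3.0)
```

These are exactly the fields (format, shape, strides) that the Julia-side PyBuffer wrapper needs to interpret.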

Note: to get something working I've ignored, for the time being, important things like proper reference counting and Python initialization. I have also removed the numpy code to make testing easier.

I want to get some feedback and to bring up some questions / problems I can foresee in implementing the rest of this. This pull request currently only works for Python versions above 3.2, as the layout of the buffer structure changed in 3.3. Python initialization is deferred until runtime, so is it best to defer PyBuffer type creation until runtime as well? Now with faster startup times, is there a reason not to initialize libpython when the module is first loaded? Another wrinkle is that buffer support has only been backported as far as Python 2.7. The other option is to keep the pointer opaque and copy the data upon buffer creation.

Currently the buffer implementation matches the functionality of the numpy interface (most of the code is largely the same, with few modifications) under the assumption that the buffers are contiguous. Python's buffer interface allows for non-contiguous buffers, so this will have to be implemented. Should there be a distinction between the two PyArray wrapper types? There have been recent discussions about Julia's Array type hierarchy reflecting the underlying data layout.
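For a concrete picture of the non-contiguous case: slicing a memoryview with a step produces exactly such a buffer, where the strides no longer match the itemsize (stdlib-only sketch):

```python
data = bytes(range(8))
m = memoryview(data)
s = m[::2]           # every other byte: still zero-copy
print(s.tolist())    # [0, 2, 4, 6]
print(s.strides)     # (2,) -- itemsize is 1, so this view is non-contiguous
print(s.contiguous)  # False
```

A wrapper that assumes contiguity would read the wrong bytes here, which is why the strides field has to be honored.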

The biggest advantage of utilizing Python's buffer interface is that it supports heterogeneous structure types. This will allow for conversions between numpy's record arrays and Julia arrays of immutable types. Parsing the buffer format specification is going to take a bit more work. The main question I have is how to best generate the types for these buffers. For instance, upon being passed a numpy record array specifying some structure, do we create the requisite type on the fly, or look to the global environment and try to match an existing type? I would be interested in any thoughts on how this should work.
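For reference, PEP 3118 format strings for such records extend the struct module's syntax, and the stdlib can already parse the unnamed-field subset, which hints at what a Julia-side parser must handle (numpy's named-field extensions like T{...} go beyond what struct accepts):

```python
import struct

# A record of (int32, float64), like a simple numpy record array element.
fmt = "=id"            # '=' requests standard sizes and no alignment padding
itemsize = struct.calcsize(fmt)
print(itemsize)        # 12

buf = struct.pack(fmt, 7, 2.5) * 3   # three packed records back to back
records = list(struct.iter_unpack(fmt, buf))
print(records)         # [(7, 2.5), (7, 2.5), (7, 2.5)]
```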

The last thing to do would be to implement the buffer interface for Julia's array types to support the Julia -> Python direction. What is the best way to do this?

@stevengj
Member

Hi Jake, thanks for doing this!

I don't think non-contiguous buffers should be handled differently. As long as it is strided data, it falls under the same DenseArray abstract class (or whatever we decide to call it in 0.3).

The reason I don't initialize when first loading the module is to allow more freedom in how Python is initialized. But maybe I should switch to doing that purely via environment variables?

I would generate a separate type for each heterogeneous structure, but cache the types. This way, PyCall will return consistent types for multiple calls with the same data, which I think is important. I don't think we should look at the global environment for matching structures, as that could lead to very surprising and inconsistent results.
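A minimal sketch of that caching policy, in Python terms with hypothetical names: generate one record type per distinct buffer description and memoize it, so repeated conversions of identically-described data yield the identical type:

```python
from collections import namedtuple

_record_type_cache = {}

def record_type_for(format_key, field_names):
    """Return a cached record type for a given buffer format description."""
    key = (format_key, tuple(field_names))
    if key not in _record_type_cache:
        _record_type_cache[key] = namedtuple(
            "Record_%d" % len(_record_type_cache), field_names)
    return _record_type_cache[key]

T1 = record_type_for("=id", ["x", "y"])
T2 = record_type_for("=id", ["x", "y"])
print(T1 is T2)   # True: multiple calls with the same data give one type
```

The cache key is the full structural description, so distinct layouts get distinct types while repeats are shared.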

To implement Julia arrays as Python buffers, we'll want to implement a new Python type that implements the requisite method slots. See, for example, jl_IOType in io.jl.

Comment thread src/buffer.jl
Member


If we stick with the current pyinitialize pattern for now, you could always define PyBuffer at initialization time by calling eval.

Collaborator Author


Ok, that seems like the least disruptive change.

Member


@jakebolewski, why do you say that the obj field is not in the struct for versions ≤ 3.2?

I just looked in the Python 2.7.6 Include/object.h header file, and Py_buffer is defined as

/* Py3k buffer interface */
typedef struct bufferinfo {
    void *buf;
    PyObject *obj;        /* owned reference */
    Py_ssize_t len;
    Py_ssize_t itemsize;  /* This is Py_ssize_t so it can be                    
                             pointed to by strides in simple case.*/
    int readonly;
    int ndim;
    char *format;
    Py_ssize_t *shape;
    Py_ssize_t *strides;
    Py_ssize_t *suboffsets;
    Py_ssize_t smalltable[2];  /* static store for shape and strides of         
                                  mono-dimensional buffers. */
    void *internal;
} Py_buffer;

which seems identical except for the smalltable field (which is missing in Python 3.4). However, we can just add some padding to the structure for compatibility, since we don't actually access those fields ourselves.
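To illustrate the padding idea, one can mirror both layouts with ctypes and measure the difference; the fields are taken from the headers quoted above, and this is only a model of the struct, not the live Py_buffer:

```python
import ctypes

Py_ssize_t = ctypes.c_ssize_t
common = [
    ("buf", ctypes.c_void_p),
    ("obj", ctypes.c_void_p),
    ("len", Py_ssize_t),
    ("itemsize", Py_ssize_t),
    ("readonly", ctypes.c_int),
    ("ndim", ctypes.c_int),
    ("format", ctypes.c_char_p),
    ("shape", ctypes.POINTER(Py_ssize_t)),
    ("strides", ctypes.POINTER(Py_ssize_t)),
    ("suboffsets", ctypes.POINTER(Py_ssize_t)),
]

class PyBuffer27(ctypes.Structure):   # Python 2.7 layout
    _fields_ = common + [("smalltable", Py_ssize_t * 2),
                         ("internal", ctypes.c_void_p)]

class PyBuffer33(ctypes.Structure):   # Python 3.3+ layout (no smalltable)
    _fields_ = common + [("internal", ctypes.c_void_p)]

# Declaring the larger 2.7 layout everywhere over-allocates by two
# Py_ssize_t, which is harmless since we never touch those trailing fields.
pad = ctypes.sizeof(PyBuffer27) - ctypes.sizeof(PyBuffer33)
print(pad)   # two Py_ssize_t: 16 on a 64-bit platform
```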

Collaborator Author


@stevengj I had a look; that comment is just wrong. Perhaps I wrote it initially based on the documentation: comparing Python 2.7 vs Python 3.4, the obj field is only documented in the latter.

It was a bit speculative at the time, but I really should finish this PR. Bidirectional zero copy buffers would be a really nice feature.

@jakebolewski
Collaborator Author

@stevengj thanks for the comments, I'll work towards integrating your suggestions. This was prompted by doing a bit of profiling with pyjulia. Using memory views over numpy data types delivers a performance boost with Cython; since buffers eliminate object introspection for numpy arrays, I'm hoping the same will be true for PyCall when all is said and done.

@stevengj
Member

stevengj commented Apr 6, 2014

Yes, the NumPy array interface that we are currently using seems like it will have greater overhead than the buffer-interface C calls. Though for operations on large arrays the overhead of introspection shouldn't be an issue.

Comment thread src/buffer.jl Outdated


copied from?

@jakevdp

jakevdp commented May 2, 2014

Just catching up on this - why did you go with buffer objects rather than memoryview objects? Or maybe the thing to do here is to actually define a new JuliaArray object which correctly exposes the buffer protocol?

@jakebolewski
Collaborator Author

PyBuffer should be low-level; it just redefines the Py_buffer struct in Julia so we can pass the Julia object by reference in ccall when working with the buffer C API directly. We could wrap this in a Julia memoryview object similar to how Cython works, but this is really what PyArray is doing. Maybe PyArray should be renamed PyMemoryView, as it is similar to Cython's memory view object.

To expose Julia objects to Python through the buffer protocol, you need to define a new type as you suggested, the way @stevengj did with Julia's IO type in src/io.jl. He would be the best person to comment on how to go about doing this.

@jakevdp

jakevdp commented May 2, 2014

Cool – I spent some time reading through src/io.jl and I think I understand what's going on. It looks like src/pytype.jl defines a bunch of convenience routines and macros for creating a new C-level Python object within Julia itself.

What needs to be done is to use this to create a Python object which exposes the buffer interface and contains a reference to a Julia array object. Once that's done, it's just a matter of creating a PyObject constructor specialization for the Julia array type which returns this custom Python type.

I think that if we went that route, we wouldn't need the current PyArray type at all. We'd just need the above PyObject constructor which returns the appropriate Python-viewable Julia structure.

Does that sound about right?

@stevengj
Member

stevengj commented May 3, 2014

@jakevdp, yes, that sounds about right.

However, that would be for Julia -> Python conversions; PyArray is for Python -> Julia conversions, and we certainly want some Array (eventually DenseArray) subtype that provides a transparent copy-free wrapper around NumPy arrays and Python buffers. I'm fine with using the buffer interface rather than the NumPy array interface for this.

@jakevdp

jakevdp commented May 3, 2014

I think the best way to do numpy->Julia would be via the buffer interface as well. If we make a method whereby anything with a buffer interface can be transparently viewed as a Julia array, then numpy will come for free! I haven't thought much about that direction, but it seems like it wouldn't be too much extra work.
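The same "comes for free" generality can be seen in pure Python: a consumer written against memoryview accepts bytes, bytearray, array.array, mmap objects, numpy arrays, and anything else exposing the protocol, with no copying:

```python
import array

def nbytes_of(obj):
    """Works for ANY producer of the buffer protocol, not just one type."""
    with memoryview(obj) as m:
        return m.nbytes

print(nbytes_of(b"abcd"))                       # 4
print(nbytes_of(bytearray(10)))                 # 10
print(nbytes_of(array.array("d", [1.0, 2.0])))  # 16
```

A Julia-side wrapper driven by PyObject_GetBuffer would inherit the same breadth.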

@stevengj
Member

stevengj commented May 3, 2014

@jakevdp, that's basically what @jakebolewski has done with PyBuffer. Re-implementing PyArray on top of PyBuffer (or replacing the former with the latter) should be simple; I would prefer to keep the name PyArray for this functionality.

@jakevdp

jakevdp commented May 3, 2014

OK - the functionality I desire, though, is to have something like this in Python:

j = julia.Julia()
x = j.run("[1:10]")  # x is now a JuliaArray object, which exposes the buffer interface
xA = np.asarray(x)  # xA is a numpy array view of the julia array

Is this possible with the current PyBuffer/PyArray approach?

@jakevdp

jakevdp commented May 3, 2014

I should specify that I want any julia expression which returns an object compatible with a strided array to have the same behavior.

@stevengj
Member

stevengj commented May 3, 2014

@jakevdp, that's the Julia -> Python direction, and yes, that requires defining a new type like in io.jl. (Straightforward, but a bit tedious, and it requires some care because of the dangerous nature of such low-level coding.)

@jakevdp

jakevdp commented May 3, 2014

Do you see any way to have the Python->Julia direction and Julia->Python direction using the same framework? I don't think I've quite wrapped my mind around the whole problem yet.

@stevengj
Member

stevengj commented May 3, 2014

No, the two directions generally require different code. If you look at the PyCall source, you'll see that for every converted type T there are two functions: a PyObject(x::T) function (Julia→Python) and a convert(::Type{T}, o::PyObject) function (Python→Julia).
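The shape of that two-function-per-type scheme can be sketched in Python (hypothetical names; PyCall itself uses Julia's multiple dispatch rather than dictionaries):

```python
# One converter per direction, registered per type; this mirrors PyCall's
# PyObject(x::T) and convert(::Type{T}, o::PyObject) pairing.
to_py = {}    # Julia -> Python direction
from_py = {}  # Python -> Julia direction

to_py[list] = lambda xs: tuple(xs)   # stand-in conversion for illustration
from_py[list] = lambda o: list(o)

obj = to_py[list]([1, 2, 3])   # "convert to Python"
back = from_py[list](obj)      # "convert back to Julia"
print(obj, back)               # (1, 2, 3) [1, 2, 3]
```

The two tables share a key (the type) but no code, which is the point being made here.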

@jakevdp

jakevdp commented May 4, 2014

Just had a thought on this: what if the JuliaArray object contains a constructor which uses Python's buffer interface to create the internal Julia array? Then the single object structure could handle both directions for a very general set of Python array-like objects.

@stevengj
Member

stevengj commented May 4, 2014

@jakevdp, whether you put the function in PyObject(x::DenseArray) or in PyArray(x::DenseArray), you still have to implement separate functions for Julia→Python and Python→Julia conversions. But you need the former in any case, because a PyObject constructor is needed in PyCall for every Julia type that is going to get passed to Python.

(Also, in Julia, functions don't really "belong" to objects in the way that they do in an OO language like Python.)

@jakevdp

jakevdp commented May 5, 2014

I understand that it's two distinct functions that are needed - the point I was making was that perhaps you could take advantage of Python's buffer interface for both directions, rather than re-implementing the concept within the PyObject constructor for only numpy arrays. Then rather than requiring a numpy array to create a Julia array via PyObject, you'd be able to create the Julia array from any Python object which defines the buffer interface. It seems like that would be much more general and much more useful in the long run.

@stevengj
Member

stevengj commented May 5, 2014

@jakevdp, I think we all agree that it would be better to make a buffer object for Julia→Python than relying on NumPy.

@jakevdp

jakevdp commented May 5, 2014

@stevengj Yes - but my primary point is that it would also be good to exploit the buffer interface for Python→Julia.

Comment thread src/buffer.jl Outdated


I think that buf and obj are switched: https://docs.python.org/3.3/c-api/buffer.html#Py_buffer


@jakevdp

jakevdp commented May 5, 2014

> @stevengj Yes - but my primary point is that it would also be good to exploit the buffer interface for Python→Julia.

Just looking more closely at this... I see that this is already done in the PyArray object. Sorry for the confusion on that.

Just to be clear, it sounds like what you have in mind is to keep the current PyArray object and create a new Python object (let's say JuliaArray) which implements the Python buffer interface by a means similar to that in src/io.jl. So then there would basically be two object types: PyArray, which is a Julia object that uses the buffer interface to convert Python→Julia, and JuliaArray, which is a Python object (defined in Julia) that uses the buffer interface to convert Julia→Python.

My question here (which may have been lost in my own confusion on things) is this: would it not be simpler to define a single structure which accomplishes both these things? The components at the beginning of the structure could define the Python side of things, as in src/io.jl, and additional pieces could be used to define what Julia needs. Then there could be both Python-side and Julia-side constructors and operations on this unified interface object.

Is there any particular reason to separate the two functionalities rather than taking this unified approach?

@stevengj
Member

stevengj commented May 5, 2014

The problem is that JuliaArray is not a type in Julia, it is a type in Python. Hence it cannot be the same structure as PyArray, and in fact will have virtually no code in common with PyArray.

As an analogy, look at the conversion of Function objects to/from Python callable objects. Converting Python→Julia has literally zero code in common with converting Julia→Python, nor could the two conceivably share any code.

@jakevdp

jakevdp commented May 5, 2014

Ah, OK. I think I'm convinced now 😄 Thanks for the patience.

I've been working today on understanding the Python buffer protocol. There's not much out there, so I'm writing a quick tutorial that I'll put on my blog. Once I've figured that out, I'll take a stab at implementing it in Julia.

turn off np.frombuffer tests and switch to RECORDS buffer protocol support
Only immutable types have C-ABI compatibility in Julia.
Here we make the Py_buffer struct immutable and wrap it with PyBuffer
so we can attach a finalizer for automatic memory management. This will
enable us to reuse the Py_buffer struct for the Julia -> Python buffer
implementation.
…on of the original buffer. Calling asarray preserves this information.
Comment thread src/buffer.jl
Member


Why is the length zero for ndim == 0 ... aren't zero-dimensional arrays normally length 1? Why do you have this check here?

The Python documentation says: "The number of dimensions the memory represents as an n-dimensional array. If it is 0, buf points to a single item representing a scalar."
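For reference, a zero-dimensional buffer really does carry exactly one scalar item; the stdlib can produce one via memoryview.cast with an empty shape:

```python
import struct

raw = struct.pack("d", 3.5)         # one C double: 8 bytes
m = memoryview(raw).cast("d", ())   # cast to a 0-dimensional view
print(m.ndim)    # 0
print(m.shape)   # ()
print(m.nbytes)  # 8: one scalar item, not zero bytes
print(struct.unpack("d", m.tobytes()))  # (3.5,)
```

So treating ndim == 0 as length zero would drop the scalar; its length in items is 1.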

Member


Also, if shape is NULL, then itemsize should be disregarded and assumed to be 1, according to the docs.

@stevengj
Member

stevengj commented Mar 3, 2015

@jakebolewski, I merged a modified subset of this PR in order to fix JuliaPy/PyPlot.jl#118 — basically, my PyObject(::IO) wrappers weren't working because they need to use the buffer interface to get access to the raw bytes in order to implement write.

It would be good to have an updated PR which adds the other stuff I omitted, in particular your new NumPy-free PyArray (probably in a separate pyarray.jl file).

@PallHaraldsson
Contributor

PallHaraldsson commented Oct 15, 2016

"I merged a modified subset of this PR in order to fix JuliaPy/PyPlot.jl#118"

[Because of this and the old comments] I only skimmed this issue; should it still be open?

Is my understanding correct that even without this, PyCall is pretty good (at least the multidimensional aspects)? This issue would just be icing on the cake: not really faster, just losing a dependency? [I see (unrelated) bugs fixed all the time in PyCall, I think, and the README seems to support that it mostly works, rather than the opposite.]

In the source code (can ignore if you want, I'm just trying to understand, so I can maybe help..):

function NpyArray{T<:NPY_TYPES}(a::StridedArray{T}, revdims::Bool) #not needing default =false (as I think not exported)
[..]

PyReverseDims{T<:NPY_TYPES}(a::StridedArray{T}) = NpyArray(a, true)
PyReverseDims(a::BitArray) = PyReverseDims(Array(a))

[doc]
PyReverseDims(a::AbstractArray) #this stray line is puzzling to me.. I thought a function body needed, can't do similar myself..

[PyReverseDims of course comes at an unavoidable performance cost, unless you choose not to flip, which is often OK; it's unclear from my reading of the code whether there's a cost for 1D arrays, if someone uses PyReverseDims out of habit.]
