feat: async and multi-result set APIs WIP#3607
feat: async and multi-result set APIs WIP#3607zeroshade wants to merge 5 commits intoapache:mainfrom
Conversation
| /// being returned successfully. ADBC_STATUS_NOT_FOUND is returned | ||
| /// when there are no more result sets. | ||
| ADBC_EXPORT | ||
| AdbcStatusCode AdbcStatementNextResultSet(struct AdbcStatement* statement, |
There was a problem hiding this comment.
It would be nice if this also worked for "schema-only" evaluation. (I started down this road last year and my work-in-progress was at main...CurtHagenlocher:arrow-adbc:MoreResults.)
There was a problem hiding this comment.
We have AdbcStatementExecuteSchema for schema-only evaluation, so you're suggesting a NextResultSet function for that one too?
|
Do we want async variants of things like GetObjects, Prepare, etc.? |
|
From a tactical perspective, should 1.2-related work all happen on a separate branch (at least until we feel the proposed changes have been proven out)? On an unrelated note, are there Arrow interop tests yet for the device APIs? I seem to recall that there weren't any a year ago. |
Yeah, I think that'll have to be what happens due to the header changes. |
c/include/arrow-adbc/adbc.h
Outdated
| /// | ||
| /// A partition can be retrieved from AdbcPartitions. | ||
| /// | ||
| /// This AdbcConnection must outlive the ArrowAsyncDeviceStreamHandler. |
There was a problem hiding this comment.
I think we need to comment on the lifetime of serialized_partition too - does it need to live until the call returns or until the callback finishes?
c/include/arrow-adbc/adbc.h
Outdated
| ADBC_EXPORT | ||
| AdbcStatusCode AdbcStatementExecuteQueryAsync( | ||
| struct AdbcStatement* statement, struct ArrowAsyncDeviceStreamHandler* handler, | ||
| int64_t* rows_affected, struct AdbcError* error); |
There was a problem hiding this comment.
When would rows_affected be populated? Maybe it should be included in schema metadata or something instead...
There was a problem hiding this comment.
I guess it could go in ArrowAsyncProducer's additional_metadata. Or, provide an AdbcAsyncProducerGetMetadata(ArrowAsyncProducer*) like how we have an AdbcErrorFromArrayStream. (Speaking of which, we need an equivalent of that for the async stream.)
There was a problem hiding this comment.
Hmm, that's a good point. Since it's an asynchronous execution it wouldn't be populated before returning. I guess we can define a canonical metadata key to indicate total rows?
|
Not to expand the scope too much, but looking at #1514 I wonder if we should add this parameter to Execute. (It might be cleaner to just add a separate call for it, though.) |
I'm not sure if an async variant of Would an async variant for Prepare be something like |
|
Added more functions based on the comments and back-and-forth here. The current collection of new functions is:
personally, I'd prefer a separate call for this and do it in a follow-up PR rather than in this one. But I'm open to it here if there's consensus. |
|
Prepare may have to do I/O, hence it should have an async variant. |
paleolimbot
left a comment
There was a problem hiding this comment.
Awesome! Probably an async wrapper around a sync array stream in nanoarrow would help fill some of these in.
It's a bummer we have to repeat so much documentation but I suppose the existing documentation has mostly stayed the same.
Not to expand the scope too much, but looking at #1514 I wonder if we should add this parameter to Execute. (It might be cleaner to just add a separate call for it, though.)
I would love this (although I think a separate function makes sense). I can write that up as separate PR if that would be helpful.
Wouldn't it make more sense to just have nanoarrow implement the Async stream handler from the C Data interface?
It would! Thanks! |
Added StatementPrepareAsync |
|
|
Just a shower thought (that can be split into a separate discussion) but I wonder, instead of adding async and non-async versions of APIs that don't return data, we could just add a set of |
I like the idea that lets us avoid needing to have async and non-async versions of the functions, but I think this might be too complex. @CurtHagenlocher @paleolimbot thoughts? |
| /// being returned successfully. ADBC_STATUS_NOT_FOUND is returned | ||
| /// when there are no more result sets. |
There was a problem hiding this comment.
I don't like overloading the return code. I think it'd be more consistent to return a schema with no release callback.
There was a problem hiding this comment.
I can update this to do that, then I'll attempt to implement these APIs and see how it goes
Extracted from #3607 with influence by the comments there and main...CurtHagenlocher:arrow-adbc:MoreResults, this contains a proposal for handling multi-result set query execution via ADBC by adding a new function for drivers, `AdbcStatementNextResultSet`. This also includes the necessary changes for an ADBC API Revision 1.2.0 (macro defines and so on). The comment above the function includes all the semantic definitions of the behavior.
A draft for a possible update to the ADBC API to add several driver methods:
This would also be a 1.2.0 revision of the ADBC API
Actually implementing these functions would be done in a separate PR once we've reached consensus on this addition to the API