feat: route use_sea=True through ADBC-Rust kernel via PyO3 #782
Draft
vikrantpuppala wants to merge 1 commit into main from
Conversation
Adds a new backend, `AdbcDatabricksClient`, that delegates query execution to the `databricks_adbc_pyo3` extension module (PyO3 bindings over the Databricks ADBC Rust kernel). When `use_sea=True` is passed to `sql.connect`, requests now flow through the Rust kernel instead of the existing Python-SEA backend. This is the Python-side companion to the satellite PyO3 binding being prototyped in adbc-drivers/databricks#423.

**Draft** while that binding is not yet on PyPI — `import databricks_adbc_pyo3` will fail unless the binding is installed locally via `maturin develop`.

What's wired through the public API:

- `sql.connect(..., use_sea=True)` → Rust kernel
- `cursor.execute(...)` → SEA + CloudFetch
- `cursor.fetchone()` / `fetchmany(n)` / `fetchall()` → `Row` tuples
- `cursor.fetchall_arrow()` / `fetchmany_arrow(n)` → `pyarrow.Table`
- `cursor.description` → PEP-249 7-tuples
- Iteration (`for row in cursor`), context managers

What is NOT yet wired (raises `NotImplementedError`):

- Parameterized queries (`parameters=[...]`)
- Async execution (`async_op=True`)
- Metadata methods (catalogs, schemas, tables, columns)

Auth: PAT only for now; OAuth M2M / U2M / Azure SP / external credential providers are not yet plumbed through the Rust binding.

Code layout:

    src/databricks/sql/backend/adbc/
      __init__.py    — re-exports AdbcDatabricksClient
      client.py      — DatabricksClient impl, delegates to PyO3
      result_set.py  — ResultSet impl over the streaming PyO3 ResultSet,
                       with batch buffering for fetchone / fetchmany

The old `backend/sea/` tree is left in place and unreachable from `sql.connect`; deletion is a separate cleanup once this backend reaches parity.
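The batch buffering described for `result_set.py` can be sketched roughly as below. `BufferedResultSet` and the plain-tuple rows are illustrative stand-ins — the real code wraps the streaming PyO3 `ResultSet` and Arrow record batches, not Python lists:

```python
from collections import deque
from typing import Iterable, List, Optional, Tuple

Row = Tuple  # stand-in for the driver's Row namedtuple

class BufferedResultSet:
    """Buffer streamed batches so fetchone()/fetchmany(n) can cross
    batch boundaries without materializing the whole result set."""

    def __init__(self, batches: Iterable[List[Row]]):
        self._batches = iter(batches)  # e.g. the streaming PyO3 ResultSet
        self._buffer: deque = deque()  # rows from a partially consumed batch

    def _fill(self) -> bool:
        # Pull batches from the stream until one yields rows.
        for batch in self._batches:
            if batch:
                self._buffer.extend(batch)
                return True
        return False  # stream exhausted

    def fetchone(self) -> Optional[Row]:
        if not self._buffer and not self._fill():
            return None
        return self._buffer.popleft()

    def fetchmany(self, n: int) -> List[Row]:
        out: List[Row] = []
        while len(out) < n:
            row = self.fetchone()
            if row is None:
                break
            out.append(row)
        return out

    def fetchall(self) -> List[Row]:
        out: List[Row] = []
        while (row := self.fetchone()) is not None:
            out.append(row)
        return out
```

A real implementation would pull rows out of `pyarrow.RecordBatch` objects rather than Python lists, but the boundary-crossing logic is the same.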
## Summary

Draft / RFC. Companion to the PyO3 satellite binding being prototyped in adbc-drivers/databricks#423.

Adds a new backend, `AdbcDatabricksClient`, that delegates query execution to the `databricks_adbc_pyo3` extension module (PyO3 bindings over the Databricks ADBC Rust kernel). When `use_sea=True` is passed to `sql.connect`, requests now flow through the Rust kernel instead of the existing Python-SEA backend.

## What this proves out
The kernel-strategy design (`docs/kernel-strategy-final-recommendation.md` in the kernel repo) calls for `use_sea=True` to be powered by a single Rust SEA implementation shared across all Databricks language drivers. This PR is the Python-side wiring to validate that path end-to-end.

Performance vs the existing Thrift backend on a dogfood warehouse, randomized interleaved benchmark, median wall time, `fetchall_arrow` path:

(benchmark table elided; queries included `SELECT 1`)

## What's wired through the public API
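The entry-point routing can be sketched as follows. The `select_backend` helper and the stub classes here are hypothetical illustrations; only `AdbcDatabricksClient` and the `use_sea` flag come from this PR:

```python
# Hypothetical sketch of how sql.connect might pick a backend.
# Stub classes stand in for the real client implementations.

class ThriftDatabricksClient:
    name = "thrift"  # existing default backend

class AdbcDatabricksClient:
    name = "adbc"    # delegates to databricks_adbc_pyo3 in the real backend

def select_backend(use_sea: bool = False):
    """Route use_sea=True to the Rust-kernel-backed client;
    everything else stays on the existing Thrift backend."""
    if use_sea:
        return AdbcDatabricksClient()
    return ThriftDatabricksClient()
```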
- `sql.connect(..., use_sea=True)` opens a Rust-kernel-backed session
- `cursor.execute(sql)` runs queries (sync, PAT-only)
- `cursor.fetchone()` / `fetchmany(n)` / `fetchall()` return `Row` namedtuples
- `cursor.fetchall_arrow()` / `fetchmany_arrow(n)` return `pyarrow.Table` (zero-copy from Rust via the Arrow C Data Interface)
- `cursor.description` returns PEP-249 7-tuples derived from the Arrow schema
- Iteration (`for row in cursor`) and context managers

## What is NOT yet wired (raises `NotImplementedError`)

- Parameterized queries (`parameters=[...]`)
- Async execution (`async_op=True`) and `cancel()`
- Metadata methods (`cursor.catalogs()` / `schemas()` / `tables()` / `columns()`)
- Logging
- Mapping onto `DatabaseError` / `OperationalError` / `ProgrammingError`

## Code layout
    src/databricks/sql/backend/adbc/
      __init__.py    — re-exports AdbcDatabricksClient
      client.py      — DatabricksClient impl, delegates to PyO3
      result_set.py  — ResultSet impl over the streaming PyO3 ResultSet,
                       with batch buffering for fetchone / fetchmany

The old `backend/sea/` tree is left in place and unreachable from `sql.connect`; deletion is a separate cleanup once this backend reaches parity with the rest of the design doc.

## Why draft?
`databricks_adbc_pyo3` is not yet on PyPI. CI here will fail to import the new backend until the satellite is published. To run locally, build and install the binding from the kernel repo with `maturin develop`.

## Open questions
- The design doc (`python-driver-rust-adbc-sea-design.md` in the kernel repo) plans for deletion of the existing `backend/sea/` tree. Is keeping it in place for one release acceptable for the migration window?

## Test plan
- `import databricks.sql` works
- `sql.connect(use_sea=True)` succeeds against a dogfood warehouse with a PAT
- `fetchone()` / `fetchall_arrow()` exercised end-to-end
- `fetchmany(n)` slices correctly across batch boundaries
- `cursor.description` returns sensible types

This pull request and its description were written by Isaac.
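The `cursor.description` check above can be illustrated with a minimal sketch. The `description_from_schema` helper and the Arrow-to-type-code mapping are hypothetical, not the driver's actual implementation:

```python
# Illustrative sketch: deriving PEP-249 description 7-tuples
# (name, type_code, display_size, internal_size, precision, scale, null_ok)
# from an Arrow-style schema. The type mapping below is an assumption.

ARROW_TO_TYPE_CODE = {  # hypothetical mapping, not the driver's table
    "int64": "int",
    "double": "float",
    "string": "string",
    "timestamp[us]": "timestamp",
}

def description_from_schema(schema):
    """schema: iterable of (field_name, arrow_type, nullable) triples.

    Sizes/precision/scale are left as None, as many PEP-249 drivers do
    when the wire format does not carry them.
    """
    return [
        (name,
         ARROW_TO_TYPE_CODE.get(arrow_type, arrow_type),
         None, None, None, None,
         nullable)
        for name, arrow_type, nullable in schema
    ]
```

A real backend would read field names and types off the `pyarrow.Schema` that crosses the C Data Interface boundary.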