-
Notifications
You must be signed in to change notification settings - Fork 2
feat(duckdb): Bake in spatial and excel extensions
#36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(duckdb): Bake in spatial and excel extensions
#36
Conversation
📝 WalkthroughWalkthroughThe PR updates _get_duckdb_connection to create a logger, iterate over extensions ["spatial", "excel"], and attempt to Sequence DiagramsequenceDiagram
participant App as Application
participant getConn as _get_duckdb_connection()
participant importExt as import_extension()
participant DuckDB as DuckDB Connection
participant Logger as Logger
App->>getConn: request/init connection
getConn->>Logger: create logger
rect rgb(248,249,250)
Note over getConn,importExt: Loop over extensions ["spatial","excel"]
loop for each extension
getConn->>importExt: import_extension(name, force_install=True, con)
alt import_ext success
importExt-->>getConn: wheel available
getConn->>DuckDB: load_extension(extension_name)
alt load success
DuckDB-->>getConn: extension loaded
else load fails
DuckDB-->>getConn: error
getConn->>Logger: logger.warning(...)
end
else import_ext fails
importExt-->>getConn: error
getConn->>Logger: logger.warning(...)
end
end
end
getConn->>DuckDB: set pandas_analyze_sample
alt DuckDB >= 1.1.0
getConn->>DuckDB: set python_scan_all_frames = true/false
end
getConn-->>App: return connection (extensions attempted, connection usable)
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (1)
🧰 Additional context used🧬 Code graph analysis (1)tests/unit/test_duckdb_sql.py (1)
🪛 Ruff (0.14.5)tests/unit/test_duckdb_sql.py28-28: Remove implied (PT003) 37-37: Possible SQL injection vector through string-based query construction (S608) 46-46: Possible SQL injection vector through string-based query construction (S608) 67-67: Boolean positional value in function call (FBT003) 73-73: Boolean positional value in function call (FBT003) 130-130: Possible SQL injection vector through string-based query construction (S608) 135-135: Possible SQL injection vector through string-based query construction (S608) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
🔇 Additional comments (3)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #36 +/- ##
==========================================
+ Coverage 72.89% 72.92% +0.03%
==========================================
Files 94 94
Lines 5168 5175 +7
Branches 754 755 +1
==========================================
+ Hits 3767 3774 +7
Misses 1157 1157
Partials 244 244
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
|
📦 Python package built successfully!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
poetry.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
deepnote_toolkit/sql/duckdb_sql.py(2 hunks)pyproject.toml(1 hunks)tests/unit/test_duckdb_sql.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/test_duckdb_sql.py (1)
deepnote_toolkit/sql/duckdb_sql.py (3)
_get_duckdb_connection(33-71)_set_sample_size(74-75)_set_scan_all_frames(78-81)
🪛 Ruff (0.14.5)
tests/unit/test_duckdb_sql.py
11-11: scope='function' is implied in @pytest.fixture()
Remove implied scope argument
(PT003)
29-29: Possible SQL injection vector through string-based query construction
(S608)
38-38: Possible SQL injection vector through string-based query construction
(S608)
59-59: Boolean positional value in function call
(FBT003)
65-65: Boolean positional value in function call
(FBT003)
90-90: Possible SQL injection vector through string-based query construction
(S608)
95-95: Possible SQL injection vector through string-based query construction
(S608)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test - Python 3.13
- GitHub Check: Test - Python 3.9
- GitHub Check: Test - Python 3.11
- GitHub Check: Test - Python 3.10
- GitHub Check: Test - Python 3.12
- GitHub Check: Typecheck - 3.13
- GitHub Check: Typecheck - 3.12
🔇 Additional comments (6)
deepnote_toolkit/sql/duckdb_sql.py (1)
4-4: LGTM: Clean import.tests/unit/test_duckdb_sql.py (5)
26-40: LGTM: Comprehensive extension verification.Parametrized test correctly verifies both extensions are installed and loaded. SQL injection warnings are false positives—extension names are from controlled parametrize values.
43-47: LGTM: Singleton pattern verified.
50-55: LGTM: Configuration test.
58-69: LGTM: Toggle behavior verified.Boolean positional arguments are clear in context despite static analysis warning.
72-97: LGTM: Thorough Excel integration test.Tests both spatial and excel extensions' ability to read Excel files, verifying round-trip integrity. SQL injection warnings are false positives—paths are from controlled tmp_path fixture.
|
🚀 Review App Deployment Started
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
deepnote_toolkit/sql/duckdb_sql.py(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Test - Python 3.11
- GitHub Check: Test - Python 3.9
- GitHub Check: Test - Python 3.10
- GitHub Check: Build and push artifacts for Python 3.13
- GitHub Check: Build and push artifacts for Python 3.12
- GitHub Check: Build and push artifacts for Python 3.10
- GitHub Check: Build and push artifacts for Python 3.11
- GitHub Check: Build and push artifacts for Python 3.9
🔇 Additional comments (2)
deepnote_toolkit/sql/duckdb_sql.py (2)
4-4: LGTM.Import is necessary for the new bundled extension approach.
57-61: Version constraints are already properly configured.Extension packages (spatial, excel) and DuckDB are pinned to matching version ranges (1.x) in pyproject.toml. No changes needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
deepnote_toolkit/sql/duckdb_sql.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
deepnote_toolkit/sql/duckdb_sql.py
68-68: Do not catch blind exception: Exception
(BLE001)
69-69: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
69-69: Logging statement uses f-string
(G004)
🔇 Additional comments (2)
deepnote_toolkit/sql/duckdb_sql.py (2)
4-8: LGTM: Imports support new extension loading.The new imports are correct and necessary for bundled extension management.
46-59: LGTM: Logger and comments are clear.Logger initialization is correct, and comments accurately explain the extension loading strategy and future considerations.
Artmann
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
1b8269a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deepnote_toolkit/sql/duckdb_sql.py (1)
39-41: Tighten logging style and align docstring with behaviorImplementation looks good and matches the tests; a couple of small polish points:
- The docstring still says “installs the spatial extension” while you now handle both spatial and excel; consider updating it to mention both.
- You intentionally need a broad
except Exceptionhere, but Ruff flags it; you can acknowledge that and fix the logging hint at the same time:- # Install and load the spatial extension. Primary use case: reading xlsx files + # Install and load spatial and excel extensions. Primary use case: reading xlsx files @@ - for extension_name in ["spatial", "excel"]: - try: + for extension_name in ["spatial", "excel"]: + try: import_extension( name=extension_name, force_install=True, con=_DEEPNOTE_DUCKDB_CONNECTION, ) - _DEEPNOTE_DUCKDB_CONNECTION.load_extension(extension_name) - except Exception as e: + _DEEPNOTE_DUCKDB_CONNECTION.load_extension(extension_name) + except Exception as exc: # noqa: BLE001 # Extensions are optional and connection still works, users are able to load # them manually if needed (pulling them from internet in this case as fallback) - logger.warning(f"Failed to load DuckDB {extension_name} extension: {e}") + logger.warning( + "Failed to load DuckDB %s extension: %s", extension_name, exc + )This keeps the current behavior, while making the linter intent explicit and avoiding the f-string logging warning. Based on static analysis hints.
Also applies to: 53-55, 58-71
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
deepnote_toolkit/sql/duckdb_sql.py(2 hunks)tests/unit/test_duckdb_sql.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/test_duckdb_sql.py (1)
deepnote_toolkit/sql/duckdb_sql.py (3)
_get_duckdb_connection(35-81)_set_sample_size(84-85)_set_scan_all_frames(88-91)
🪛 Ruff (0.14.5)
deepnote_toolkit/sql/duckdb_sql.py
68-68: Do not catch blind exception: Exception
(BLE001)
71-71: Logging statement uses f-string
(G004)
tests/unit/test_duckdb_sql.py
28-28: scope='function' is implied in @pytest.fixture()
Remove implied scope argument
(PT003)
37-37: Possible SQL injection vector through string-based query construction
(S608)
46-46: Possible SQL injection vector through string-based query construction
(S608)
67-67: Boolean positional value in function call
(FBT003)
73-73: Boolean positional value in function call
(FBT003)
94-94: Avoid equality comparisons to False; use not result["loaded"].all(): for false checks
Replace with not result["loaded"].all()
(E712)
109-109: Avoid equality comparisons to False; use not result["loaded"].all(): for false checks
Replace with not result["loaded"].all()
(E712)
130-130: Possible SQL injection vector through string-based query construction
(S608)
135-135: Possible SQL injection vector through string-based query construction
(S608)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Test - Python 3.10
- GitHub Check: Test - Python 3.12
- GitHub Check: Test - Python 3.9
- GitHub Check: Test - Python 3.11
- GitHub Check: Build and push artifacts for Python 3.9
- GitHub Check: Build and push artifacts for Python 3.13
- GitHub Check: Build and push artifacts for Python 3.10
- GitHub Check: Build and push artifacts for Python 3.12
- GitHub Check: Build and push artifacts for Python 3.11
🔇 Additional comments (2)
deepnote_toolkit/sql/duckdb_sql.py (1)
4-7: Imports and logger wiring look soundUsing
duckdb_extensions.import_extensionandLoggerManager().get_logger()here is consistent with the rest of the module and keeps logging centralized; no change needed.Also applies to: 46-46
tests/unit/test_duckdb_sql.py (1)
14-31: Connection reset helper and fixture are appropriate
fresh_duckdb_connectionplus the per-testduckdb_connectionfixture give clean, isolated in‑memory connections and correctly reset the singleton before and after use. Keepingscope="function"explicit here also matches the documented intent. Based on learnings.
Summary by CodeRabbit
New Features
Bug Fixes
Chores
Tests
✏️ Tip: You can customize this high-level summary in your review settings.