Skip to content

Conversation

@m1so
Copy link
Member

@m1so m1so commented Nov 19, 2025

Summary by CodeRabbit

  • New Features

    • DuckDB spatial and Excel extensions bundled for offline use; sample-size and scan-all-frames settings enabled when supported.
  • Bug Fixes

    • More resilient extension handling: connection remains usable if extension install/load fails and warnings are logged.
  • Chores

    • Packaged DuckDB extension dependencies added to avoid fetching at startup.
  • Tests

    • Added unit and integration tests covering extensions, connection behavior, settings toggles, failure scenarios, and Excel read/write round-trip.

✏️ Tip: You can customize this high-level summary in your review settings.

@linear
Copy link

linear bot commented Nov 19, 2025

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 19, 2025

📝 Walkthrough

Walkthrough

The PR updates _get_duckdb_connection to create a logger, iterate over extensions ["spatial", "excel"], and attempt to import_extension(name=..., force_install=True, con=_DEEPNOTE_DUCKDB_CONNECTION) then load_extension(extension_name) inside try/except blocks that log warnings on failure so the connection remains usable if extensions are unavailable. It retains setting of pandas_analyze_sample and adds a conditional to enable python_scan_all_frames for DuckDB >= 1.1.0. pyproject.toml adds duckdb-extensions, duckdb-extension-spatial, and duckdb-extension-excel dependencies (noting they are baked in). Unit tests exercise extension installation/loading (including mocked failures), the connection singleton, sample-size and scan-all-frames settings, and an Excel read/write round‑trip.

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant getConn as _get_duckdb_connection()
    participant importExt as import_extension()
    participant DuckDB as DuckDB Connection
    participant Logger as Logger

    App->>getConn: request/init connection
    getConn->>Logger: create logger

    rect rgb(248,249,250)
    Note over getConn,importExt: Loop over extensions ["spatial","excel"]
    loop for each extension
        getConn->>importExt: import_extension(name, force_install=True, con)
        alt import_ext success
            importExt-->>getConn: wheel available
            getConn->>DuckDB: load_extension(extension_name)
            alt load success
                DuckDB-->>getConn: extension loaded
            else load fails
                DuckDB-->>getConn: error
                getConn->>Logger: logger.warning(...)
            end
        else import_ext fails
            importExt-->>getConn: error
            getConn->>Logger: logger.warning(...)
        end
    end
    end

    getConn->>DuckDB: set pandas_analyze_sample
    alt DuckDB >= 1.1.0
        getConn->>DuckDB: set python_scan_all_frames = true/false
    end

    getConn-->>App: return connection (extensions attempted, connection usable)
Loading

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: baking in spatial and excel extensions to avoid internet fetches.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1b8269a and 4d932e7.

📒 Files selected for processing (1)
  • tests/unit/test_duckdb_sql.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/test_duckdb_sql.py (1)
deepnote_toolkit/sql/duckdb_sql.py (3)
  • _get_duckdb_connection (35-81)
  • _set_sample_size (84-85)
  • _set_scan_all_frames (88-91)
🪛 Ruff (0.14.5)
tests/unit/test_duckdb_sql.py

28-28: scope='function' is implied in @pytest.fixture()

Remove implied scope argument

(PT003)


37-37: Possible SQL injection vector through string-based query construction

(S608)


46-46: Possible SQL injection vector through string-based query construction

(S608)


67-67: Boolean positional value in function call

(FBT003)


73-73: Boolean positional value in function call

(FBT003)


130-130: Possible SQL injection vector through string-based query construction

(S608)


135-135: Possible SQL injection vector through string-based query construction

(S608)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Build and push artifacts for Python 3.12
  • GitHub Check: Build and push artifacts for Python 3.13
  • GitHub Check: Build and push artifacts for Python 3.9
  • GitHub Check: Build and push artifacts for Python 3.10
  • GitHub Check: Build and push artifacts for Python 3.11
  • GitHub Check: Test - Python 3.9
  • GitHub Check: Test - Python 3.11
  • GitHub Check: Test - Python 3.13
  • GitHub Check: Test - Python 3.10
🔇 Additional comments (3)
tests/unit/test_duckdb_sql.py (3)

14-26: Good test isolation pattern.

Context manager properly resets singleton state and ensures cleanup.


51-55: LGTM.

Singleton test is straightforward and correct.


58-63: LGTM.

Sample size test correctly validates the setting.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Nov 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.92%. Comparing base (9dadd3d) to head (4d932e7).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #36      +/-   ##
==========================================
+ Coverage   72.89%   72.92%   +0.03%     
==========================================
  Files          94       94              
  Lines        5168     5175       +7     
  Branches      754      755       +1     
==========================================
+ Hits         3767     3774       +7     
  Misses       1157     1157              
  Partials      244      244              
Flag Coverage Δ
combined 72.92% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions
Copy link

github-actions bot commented Nov 19, 2025

📦 Python package built successfully!

  • Version: 1.1.4.dev7+0589f07
  • Wheel: deepnote_toolkit-1.1.4.dev7+0589f07-py3-none-any.whl
  • Install:
    pip install "deepnote-toolkit @ https://deepnote-staging-runtime-artifactory.s3.amazonaws.com/deepnote-toolkit-packages/1.1.4.dev7%2B0589f07/deepnote_toolkit-1.1.4.dev7%2B0589f07-py3-none-any.whl"

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9dadd3d and e136775.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (3)
  • deepnote_toolkit/sql/duckdb_sql.py (2 hunks)
  • pyproject.toml (1 hunks)
  • tests/unit/test_duckdb_sql.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/test_duckdb_sql.py (1)
deepnote_toolkit/sql/duckdb_sql.py (3)
  • _get_duckdb_connection (33-71)
  • _set_sample_size (74-75)
  • _set_scan_all_frames (78-81)
🪛 Ruff (0.14.5)
tests/unit/test_duckdb_sql.py

11-11: scope='function' is implied in @pytest.fixture()

Remove implied scope argument

(PT003)


29-29: Possible SQL injection vector through string-based query construction

(S608)


38-38: Possible SQL injection vector through string-based query construction

(S608)


59-59: Boolean positional value in function call

(FBT003)


65-65: Boolean positional value in function call

(FBT003)


90-90: Possible SQL injection vector through string-based query construction

(S608)


95-95: Possible SQL injection vector through string-based query construction

(S608)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Test - Python 3.13
  • GitHub Check: Test - Python 3.9
  • GitHub Check: Test - Python 3.11
  • GitHub Check: Test - Python 3.10
  • GitHub Check: Test - Python 3.12
  • GitHub Check: Typecheck - 3.13
  • GitHub Check: Typecheck - 3.12
🔇 Additional comments (6)
deepnote_toolkit/sql/duckdb_sql.py (1)

4-4: LGTM: Clean import.

tests/unit/test_duckdb_sql.py (5)

26-40: LGTM: Comprehensive extension verification.

Parametrized test correctly verifies both extensions are installed and loaded. SQL injection warnings are false positives—extension names are from controlled parametrize values.


43-47: LGTM: Singleton pattern verified.


50-55: LGTM: Configuration test.


58-69: LGTM: Toggle behavior verified.

Boolean positional arguments are clear in context despite static analysis warning.


72-97: LGTM: Thorough Excel integration test.

Tests both spatial and excel extensions' ability to read Excel files, verifying round-trip integrity. SQL injection warnings are false positives—paths are from controlled tmp_path fixture.

coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 19, 2025
@deepnote-bot
Copy link

deepnote-bot commented Nov 19, 2025

🚀 Review App Deployment Started

📝 Description 🌐 Link / Info
🌍 Review application ra-36
🔑 Sign-in URL Click to sign-in
📊 Application logs View logs
🔄 Actions Click to redeploy
🚀 ArgoCD deployment View deployment
Last deployed 2025-11-20 10:48:03 (UTC)
📜 Deployed commit 8193ae825d803a1f78bedf02f63e8a236eb60bdf
🛠️ Toolkit version 0589f07

@m1so m1so marked this pull request as ready for review November 19, 2025 16:04
@m1so m1so requested a review from a team as a code owner November 19, 2025 16:04
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e136775 and 49419a7.

📒 Files selected for processing (1)
  • deepnote_toolkit/sql/duckdb_sql.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Test - Python 3.11
  • GitHub Check: Test - Python 3.9
  • GitHub Check: Test - Python 3.10
  • GitHub Check: Build and push artifacts for Python 3.13
  • GitHub Check: Build and push artifacts for Python 3.12
  • GitHub Check: Build and push artifacts for Python 3.10
  • GitHub Check: Build and push artifacts for Python 3.11
  • GitHub Check: Build and push artifacts for Python 3.9
🔇 Additional comments (2)
deepnote_toolkit/sql/duckdb_sql.py (2)

4-4: LGTM.

Import is necessary for the new bundled extension approach.


57-61: Version constraints are already properly configured.

Extension packages (spatial, excel) and DuckDB are pinned to matching version ranges (1.x) in pyproject.toml. No changes needed.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 49419a7 and 7ce9993.

📒 Files selected for processing (1)
  • deepnote_toolkit/sql/duckdb_sql.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
deepnote_toolkit/sql/duckdb_sql.py

68-68: Do not catch blind exception: Exception

(BLE001)


69-69: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


69-69: Logging statement uses f-string

(G004)

🔇 Additional comments (2)
deepnote_toolkit/sql/duckdb_sql.py (2)

4-8: LGTM: Imports support new extension loading.

The new imports are correct and necessary for bundled extension management.


46-59: LGTM: Logger and comments are clear.

Logger initialization is correct, and comments accurately explain the extension loading strategy and future considerations.

coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 19, 2025
Artmann
Artmann previously approved these changes Nov 19, 2025
Copy link
Member

@Artmann Artmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

FilipPyrek
FilipPyrek previously approved these changes Nov 19, 2025
@m1so m1so dismissed stale reviews from FilipPyrek, Artmann, and coderabbitai[bot] via 1b8269a November 20, 2025 10:31
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
deepnote_toolkit/sql/duckdb_sql.py (1)

39-41: Tighten logging style and align docstring with behavior

Implementation looks good and matches the tests; a couple of small polish points:

  • The docstring still says “installs the spatial extension” while you now handle both spatial and excel; consider updating it to mention both.
  • You intentionally need a broad except Exception here, but Ruff flags it; you can acknowledge that and fix the logging hint at the same time:
-        # Install and load the spatial extension. Primary use case: reading xlsx files
+        # Install and load spatial and excel extensions. Primary use case: reading xlsx files
@@
-        for extension_name in ["spatial", "excel"]:
-            try:
+        for extension_name in ["spatial", "excel"]:
+            try:
                 import_extension(
                     name=extension_name,
                     force_install=True,
                     con=_DEEPNOTE_DUCKDB_CONNECTION,
                 )
-                _DEEPNOTE_DUCKDB_CONNECTION.load_extension(extension_name)
-            except Exception as e:
+                _DEEPNOTE_DUCKDB_CONNECTION.load_extension(extension_name)
+            except Exception as exc:  # noqa: BLE001
                 # Extensions are optional and connection still works, users are able to load
                 # them manually if needed (pulling them from internet in this case as fallback)
-                logger.warning(f"Failed to load DuckDB {extension_name} extension: {e}")
+                logger.warning(
+                    "Failed to load DuckDB %s extension: %s", extension_name, exc
+                )

This keeps the current behavior, while making the linter intent explicit and avoiding the f-string logging warning. Based on static analysis hints.

Also applies to: 53-55, 58-71

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7ce9993 and 1b8269a.

📒 Files selected for processing (2)
  • deepnote_toolkit/sql/duckdb_sql.py (2 hunks)
  • tests/unit/test_duckdb_sql.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/test_duckdb_sql.py (1)
deepnote_toolkit/sql/duckdb_sql.py (3)
  • _get_duckdb_connection (35-81)
  • _set_sample_size (84-85)
  • _set_scan_all_frames (88-91)
🪛 Ruff (0.14.5)
deepnote_toolkit/sql/duckdb_sql.py

68-68: Do not catch blind exception: Exception

(BLE001)


71-71: Logging statement uses f-string

(G004)

tests/unit/test_duckdb_sql.py

28-28: scope='function' is implied in @pytest.fixture()

Remove implied scope argument

(PT003)


37-37: Possible SQL injection vector through string-based query construction

(S608)


46-46: Possible SQL injection vector through string-based query construction

(S608)


67-67: Boolean positional value in function call

(FBT003)


73-73: Boolean positional value in function call

(FBT003)


94-94: Avoid equality comparisons to False; use not result["loaded"].all(): for false checks

Replace with not result["loaded"].all()

(E712)


109-109: Avoid equality comparisons to False; use not result["loaded"].all(): for false checks

Replace with not result["loaded"].all()

(E712)


130-130: Possible SQL injection vector through string-based query construction

(S608)


135-135: Possible SQL injection vector through string-based query construction

(S608)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Test - Python 3.10
  • GitHub Check: Test - Python 3.12
  • GitHub Check: Test - Python 3.9
  • GitHub Check: Test - Python 3.11
  • GitHub Check: Build and push artifacts for Python 3.9
  • GitHub Check: Build and push artifacts for Python 3.13
  • GitHub Check: Build and push artifacts for Python 3.10
  • GitHub Check: Build and push artifacts for Python 3.12
  • GitHub Check: Build and push artifacts for Python 3.11
🔇 Additional comments (2)
deepnote_toolkit/sql/duckdb_sql.py (1)

4-7: Imports and logger wiring look sound

Using duckdb_extensions.import_extension and LoggerManager().get_logger() here is consistent with the rest of the module and keeps logging centralized; no change needed.

Also applies to: 46-46

tests/unit/test_duckdb_sql.py (1)

14-31: Connection reset helper and fixture are appropriate

fresh_duckdb_connection plus the per-test duckdb_connection fixture give clean, isolated in‑memory connections and correctly reset the singleton before and after use. Keeping scope="function" explicit here also matches the documented intent. Based on learnings.

@m1so m1so merged commit 249e9d6 into main Nov 20, 2025
35 checks passed
@m1so m1so deleted the michalbaumgartner/blu-5180-bake-in-duckdb-extensions-to-not-load-them-from-internet branch November 20, 2025 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants