fix(sql_execution): Fix is large number check to use 2**53 as cutoff #73

Draft
tkislan wants to merge 6 commits into main from tk/fix-is-large-number-serialization
@tkislan tkislan commented Mar 10, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Integers above the float64 exact range (2**53) and other unsafe numeric values are now represented as strings in query results, JSON, and Parquet exports.
  • New Features

    • Added numeric-safety utilities that detect large/unsafe numeric values and convert only affected columns to strings during serialization.
  • Tests

    • Expanded coverage for boundary values, mixed-type columns, Decimals, infinities/NaN, and large-integer behavior.
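
The 2**53 cutoff in the summary above follows from float64 having a 53-bit significand. A minimal sketch of the boundary, using only the standard library; the helper name `round_trips_through_float64` is hypothetical, and the constant's value is assumed from the PR title:

```python
# Assumed value of the constant named in the PR: the float64 exact-integer limit.
MAX_SAFE_FLOAT64_INTEGER = 2**53


def round_trips_through_float64(n: int) -> bool:
    """Return True if converting n to float64 and back yields n unchanged."""
    return int(float(n)) == n


# The cutoff itself is still exact; the next integer is rounded to its neighbour.
at_cutoff = round_trips_through_float64(MAX_SAFE_FLOAT64_INTEGER)
above_cutoff = round_trips_through_float64(MAX_SAFE_FLOAT64_INTEGER + 1)
print(at_cutoff, above_cutoff)
```

This is why the check needs a strict "greater than 2**53" comparison: 2**53 is safe, while 2**53 + 1 is the first integer that is not.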

coderabbitai bot commented Mar 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fa1482f6-e439-461d-bbd6-c5841dc8a4cc

📥 Commits

Reviewing files that changed from the base of the PR and between ed33db8 and ed6dea6.

📒 Files selected for processing (1)
  • deepnote_toolkit/ocelots/pandas/utils.py

📝 Walkthrough

Walkthrough

Replaces a local large-number helper in deepnote_toolkit/sql/sql_execution.py with is_large_number from deepnote_toolkit.ocelots.pandas.utils and removes the local _is_large_number and Decimal import. Adds MAX_SAFE_FLOAT64_INTEGER, is_large_number, and cast_large_numbers_to_string in deepnote_toolkit/ocelots/pandas/utils.py. Calls cast_large_numbers_to_string(df_copy) from deepnote_toolkit/ocelots/pandas/implementation.py inside to_records when mode="json". Expands tests and fixtures to cover 2**53 boundary behavior, large integers, Decimals, mixed-type columns, and precision-preservation cases.
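
The utilities described in the walkthrough could look roughly like the sketch below. The names `MAX_SAFE_FLOAT64_INTEGER`, `is_large_number`, and `cast_large_numbers_to_string` come from the PR, but the bodies are assumptions reconstructed from the review comments, not the merged code; in particular, the real implementation reportedly uses the helpers `is_pure_numeric` and `safe_convert_to_string`, which are not reproduced here.

```python
from decimal import Decimal
from typing import Any

import numpy as np
import pandas as pd

# 2**53: every integer with magnitude up to this value is exact in float64.
MAX_SAFE_FLOAT64_INTEGER = 2**53


def is_large_number(x: Any) -> bool:
    """Return True if x is numeric and its magnitude exceeds 2**53."""
    is_num = isinstance(x, (int, float, Decimal, np.integer, np.floating))
    if not is_num:
        return False
    # Assumption: treat Decimal NaN as not large; ordering comparisons on it raise.
    if isinstance(x, Decimal) and x.is_nan():
        return False
    return bool(abs(x) > MAX_SAFE_FLOAT64_INTEGER)


def cast_large_numbers_to_string(df: pd.DataFrame) -> pd.DataFrame:
    """Stringify only the columns that hold at least one large number."""
    result = df.copy()  # leave the caller's frame untouched
    for column in result.columns:
        if result[column].map(is_large_number).any():
            result[column] = result[column].map(str)
    return result


df = pd.DataFrame({"small": [1, 2], "big": [1, 2**53 + 1]})
safe = cast_large_numbers_to_string(df)
print(safe["big"].tolist())
```

Converting the whole column rather than individual cells keeps each column's type uniform for consumers, at the cost of also stringifying the small values that share a column with a large one.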

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant Ocelots as Ocelots.to_records
  participant Utils as utils.cast_large_numbers_to_string
  participant DF as DataFrame
  Caller->>Ocelots: to_records(df_copy, mode="json")
  Ocelots->>Utils: cast_large_numbers_to_string(df_copy)
  Utils->>DF: inspect columns using is_large_number / MAX_SAFE_FLOAT64_INTEGER
  DF-->>Utils: convert large-number columns to strings
  Utils-->>Ocelots: return sanitized df_copy
  Ocelots->>Caller: return records (from sanitized df_copy)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 73.91% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Updates Docs | ⚠️ Warning | No documentation updates found in deepnote-toolkit repo for new public API functions. Primary repos cannot be verified. | Update docs in deepnote-toolkit repo and verify updates in deepnote/deepnote OSS and deepnote/deepnote-internal repos. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: refactoring large-number detection to use 2**53 as the float64 precision cutoff across multiple modules. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions bot commented Mar 10, 2026

📦 Python package built successfully!

  • Version: 2.1.2.dev15+ee8ddc8
  • Wheel: deepnote_toolkit-2.1.2.dev15+ee8ddc8-py3-none-any.whl
  • Install:
    pip install "deepnote-toolkit @ https://deepnote-staging-runtime-artifactory.s3.amazonaws.com/deepnote-toolkit-packages/2.1.2.dev15%2Bee8ddc8/deepnote_toolkit-2.1.2.dev15%2Bee8ddc8-py3-none-any.whl"

coderabbitai bot previously approved these changes Mar 10, 2026
deepnote-bot commented Mar 10, 2026

🚀 Review App Deployment Started

| 📝 Description | 🌐 Link / Info |
| --- | --- |
| 🌍 Review application | ra-73 |
| 🔑 Sign-in URL | Click to sign-in |
| 📊 Application logs | View logs |
| 🔄 Actions | Click to redeploy |
| 🚀 ArgoCD deployment | View deployment |
| Last deployed | 2026-03-10 20:43:05 (UTC) |
| 📜 Deployed commit | 6128b89bd47afba10f13ecb12e7d5895c6444e95 |
| 🛠️ Toolkit version | ee8ddc8 |

codecov bot commented Mar 10, 2026

❌ 1 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --- | --- | --- | --- |
| 874 | 1 | 873 | 38 |
View the top 1 failed test(s) by shortest run time
tests/unit/test_sql_execution_internal.py::test_is_large_number
Stack Traces | 0.001s run time
tests/unit/test_sql_execution_internal.py:250: in test_is_large_number
    assert se._is_large_number(2**53) is False
E   AttributeError: module 'deepnote_toolkit.sql.sql_execution' has no attribute '_is_large_number'. Did you mean: 'is_large_number'?
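
The failure above is an `AttributeError` caused by the removed private helper, not a wrong boundary result. A hedged sketch of how the boundary assertions might read against the public name follows; a local stand-in for `is_large_number` is defined so the snippet runs without `deepnote_toolkit` installed, and its body is an assumption limited to standard-library types.

```python
from decimal import Decimal
from typing import Any

MAX_SAFE_FLOAT64_INTEGER = 2**53


def is_large_number(x: Any) -> bool:
    """Local stand-in for the shared helper (assumed behaviour, stdlib types only)."""
    if not isinstance(x, (int, float, Decimal)):
        return False
    return abs(x) > MAX_SAFE_FLOAT64_INTEGER


def test_is_large_number() -> None:
    # The cutoff itself is exact in float64, so it is not flagged.
    assert is_large_number(2**53) is False
    # Anything strictly beyond the cutoff, in either direction, is flagged.
    assert is_large_number(2**53 + 1) is True
    assert is_large_number(-(2**53) - 1) is True
    assert is_large_number(Decimal(2**53) + 1) is True
    # Non-numeric input is never flagged.
    assert is_large_number("9007199254740993") is False


test_is_large_number()
```

In the real test module the stand-in would be replaced by an import of the shared helper from `deepnote_toolkit.ocelots.pandas.utils`.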


…JSON compatibility

- Introduced `cast_large_numbers_to_string` function to convert numeric values exceeding the float64 safe integer range (2**53) to strings, preserving precision for JSON serialization.
- Updated `PandasImplementation.to_json` method to utilize the new function.
- Added unit tests to ensure correct behavior for large numbers in dataframes.
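
To illustrate why the commit serializes such values as strings, the sketch below emulates a JSON consumer that parses every number as a float64 (as JavaScript's `JSON.parse` does) by passing `parse_int=float` to Python's `json.loads`. The `id` field is a hypothetical example and is not taken from the PR.

```python
import json

big = 2**53 + 1  # 9007199254740993, one past the float64 exact range

# Emulate a consumer that reads every JSON integer as a float64.
as_number = json.loads(json.dumps({"id": big}), parse_int=float)
as_string = json.loads(json.dumps({"id": str(big)}), parse_int=float)

# Sent as a JSON number, the value is silently rounded by the consumer.
lost = int(as_number["id"]) != big
# Sent as a JSON string, the digits arrive intact.
kept = int(as_string["id"]) == big
print(lost, kept)
```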

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepnote_toolkit/ocelots/pandas/utils.py`:
- Around line 119-120: The return statement exceeds the 88-character limit;
break it into multiple shorter lines by splitting the isinstance check and the
comparison across lines or by assigning parts to temporaries: e.g., compute a
boolean like is_num = isinstance(x, (int, float, Decimal, np.integer,
np.floating)) on one line and then return is_num and abs(x) >
MAX_SAFE_FLOAT64_INTEGER on the next, updating the return in the function
containing that line to use these shorter lines.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fa49302c-c577-464b-8417-e4db1b3b012a

📥 Commits

Reviewing files that changed from the base of the PR and between b12a99b and 731e71b.

📒 Files selected for processing (5)
  • deepnote_toolkit/ocelots/pandas/implementation.py
  • deepnote_toolkit/ocelots/pandas/utils.py
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/helpers/testing_dataframes.py
  • tests/unit/test_dataframe_utils.py

tkislan added 2 commits March 10, 2026 19:11
- Reformatted the `is_large_number` and `cast_large_numbers_to_string` functions for better readability by using multi-line expressions.
- Updated unit tests to call the `is_large_number` function directly instead of through a different module, ensuring consistency and clarity in test cases.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepnote_toolkit/ocelots/pandas/utils.py`:
- Around line 128-142: The function cast_large_numbers_to_string is missing
explicit type hints; update its signature to include pandas DataFrame input and
return types (e.g., df: pd.DataFrame -> pd.DataFrame) and ensure pandas is
imported as pd in the module if not already; keep the implementation and
references to helpers (is_pure_numeric, is_large_number, safe_convert_to_string)
unchanged so callers and static type checkers recognize the typed signature.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ff36bff8-c27f-468a-83ad-596bc6377706

📥 Commits

Reviewing files that changed from the base of the PR and between 731e71b and 7fbfcf1.

📒 Files selected for processing (3)
  • deepnote_toolkit/ocelots/pandas/utils.py
  • deepnote_toolkit/sql/sql_execution.py
  • tests/unit/test_sql_execution_internal.py

Add explicit pd.DataFrame input and return type annotations to
cast_large_numbers_to_string so callers and static type checkers
(mypy) recognise the typed signature.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepnote_toolkit/ocelots/pandas/utils.py`:
- Around line 112-125: Add an explicit type hint for the parameter in
is_large_number by changing its signature to is_large_number(x: Any) -> bool and
ensure typing.Any is imported at the top of the module; keep the existing
isinstance checks (Decimal, np.integer, np.floating, int, float) and return
logic using MAX_SAFE_FLOAT64_INTEGER unchanged so type checking tools accept the
function signature.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 845868d1-da3a-41ce-a2a6-36905b079c94

📥 Commits

Reviewing files that changed from the base of the PR and between 7fbfcf1 and ed33db8.

📒 Files selected for processing (1)
  • deepnote_toolkit/ocelots/pandas/utils.py

Import typing.Any and annotate the x parameter of is_large_number
so static type checkers (mypy) accept the function signature.