fix: convert ISO datetime string columns before VegaFusion pre-transform #110
base: main
```diff
@@ -1,20 +1,61 @@
-from typing import Any, List, Optional
+from typing import Any, List, Optional, Set

 import pandas as pd

 import deepnote_toolkit.ocelots as oc


-def sanitize_dataframe_for_chart(pd_df: pd.DataFrame):
+def sanitize_dataframe_for_chart(
+    pd_df: pd.DataFrame, temporal_fields: Optional[Set[str]] = None
+) -> pd.DataFrame:
     sanitized_dataframe = pd_df.copy()

     oc.pandas.utils.deduplicate_columns(sanitized_dataframe)
     _convert_timedelta_columns_to_seconds(sanitized_dataframe)
     _convert_column_names_to_string(sanitized_dataframe)
+    _convert_datetime_string_columns(sanitized_dataframe, temporal_fields)

     return sanitized_dataframe


+def _convert_datetime_string_columns(
+    pd_df: pd.DataFrame, temporal_fields: Optional[Set[str]] = None
+) -> None:
+    """
+    Converts object columns that contain ISO datetime strings to datetime64.
+
+    VegaFusion treats datetime64 columns as temporal natively. When columns are
+    left as object (string), VegaFusion attempts to parse them using the axis
+    display format from the Vega-Lite spec (e.g. ``'%B %d, %Y %H:%M'``), which
+    fails for ISO 8601 strings and raises a DataFusion ValueError.
+
+    Only columns named in ``temporal_fields`` (fields the spec encodes as
+    ``"type": "temporal"``) are converted. This avoids turning nominal string
+    axes such as years (``"2020"``), months (``"2024-01"``) or numeric codes
+    into time scales just because they happen to parse as ISO 8601. When
+    ``temporal_fields`` is ``None`` or empty, no columns are converted.
+
+    WARNING: This function modifies the DataFrame in-place.
+    """
+    if not temporal_fields:
+        return
+
+    for col in pd_df.columns:
+        if col not in temporal_fields:
+            continue
+        if pd_df[col].dtype != object:
+            continue
+        non_null = pd_df[col].dropna()
+        if len(non_null) == 0:
+            continue
+        try:
+            converted = pd.to_datetime(non_null, format="ISO8601", utc=True, errors="coerce")
```
Contributor

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check if minimum pandas version is specified and compatible with ISO8601 format
echo "Checking pandas version constraints..."
# Check pyproject.toml or setup.py for pandas version requirements
fd -t f -e toml -e py -e txt | rg -l "pandas" | head -5 | while read -r file; do
  echo "=== $file ==="
  rg -C2 "pandas" "$file"
done
# Check if there's a minimum pandas version that guarantees format="ISO8601" support (2.0.0+)
echo -e "\n=== Searching for pandas version constraints ==="
rg -n "pandas\s*[><=]" --type toml --type py --type txt
```

Repository: deepnote/deepnote-toolkit · Length of output: 1602

🏁 Script executed:

```bash
cd deepnote_toolkit/chart && wc -l utils.py
```

Repository: deepnote/deepnote-toolkit · Length of output: 82

🏁 Script executed:

```bash
cd deepnote_toolkit/chart && sed -n '45,60p' utils.py
```

Repository: deepnote/deepnote-toolkit · Length of output: 600

🏁 Script executed:

```bash
cd deepnote_toolkit/chart && sed -n '35,75p' utils.py
```

Repository: deepnote/deepnote-toolkit · Length of output: 1332

🏁 Script executed:

```bash
rg -n "pandas|version|ISO8601" deepnote_toolkit/chart/utils.py
```

Repository: deepnote/deepnote-toolkit · Length of output: 356

Handle pandas < 2.0.0 incompatibility with …

The …
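The review comment above flags a pandas < 2.0.0 incompatibility, and its analysis script notes that `format="ISO8601"` support in `pandas.to_datetime` starts at pandas 2.0.0. A minimal sketch of a version-tolerant parse is shown below, assuming that on pandas 1.x it is acceptable to fall back to pandas' default parsing (no explicit `format`) for ISO 8601 input. `parse_iso8601_utc` is a hypothetical helper name, not part of deepnote-toolkit, and checking only the major number of `pd.__version__` is a simplification; a `packaging.version` comparison would be more robust.

```python
import pandas as pd


def parse_iso8601_utc(series: pd.Series) -> pd.Series:
    """Hypothetical helper: parse ISO 8601 strings into UTC timestamps; bad values become NaT."""
    # format="ISO8601" is only accepted by pandas.to_datetime from pandas 2.0 on.
    pandas_major = int(pd.__version__.split(".")[0])
    if pandas_major >= 2:
        return pd.to_datetime(series, format="ISO8601", utc=True, errors="coerce")
    # pandas 1.x fallback (assumption): omit format and rely on pandas' default parsing.
    return pd.to_datetime(series, utc=True, errors="coerce")


# Mixed offsets are normalized to UTC; the unparseable string is coerced to NaT.
values = pd.Series(["2024-01-15T10:30:00Z", "2024-01-16T08:00:00+02:00", "not a date"])
parsed = parse_iso8601_utc(values)
```

Keeping the `utc=True, errors="coerce"` arguments identical in both branches means the caller's `notna().all()` check behaves the same way on either pandas line.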
```diff
+            if converted.notna().all():
+                pd_df[col] = pd.to_datetime(pd_df[col], format="ISO8601", utc=True, errors="coerce")
+        except Exception:
+            pass
+
+
 def _convert_column_names_to_string(pd_df: pd.DataFrame):
     """
     Converts dataframe column names to strings.
```
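In `_convert_datetime_string_columns`, each matching column is parsed twice: once as `non_null` to check that every value converts, then again in full before assignment. One possible single-pass variant parses the whole column once and reuses the result. It is sketched here under two assumptions: pandas >= 2.0 (for `format="ISO8601"`), and unique column names, which the earlier `deduplicate_columns` call in `sanitize_dataframe_for_chart` appears intended to guarantee. `convert_temporal_string_columns` is a hypothetical name used only for illustration.

```python
from typing import Optional, Set

import pandas as pd


def convert_temporal_string_columns(
    df: pd.DataFrame, temporal_fields: Optional[Set[str]] = None
) -> None:
    """Hypothetical single-parse variant of the PR's helper; modifies df in place.

    Assumes pandas >= 2.0 (for format="ISO8601") and unique column names.
    """
    if not temporal_fields:
        return

    for col in df.columns:
        # Same filters as the PR: only spec-declared temporal fields stored as object.
        if col not in temporal_fields or df[col].dtype != object:
            continue
        was_present = df[col].notna()
        if not was_present.any():
            continue
        try:
            # Parse the full column once; missing inputs simply become NaT.
            converted = pd.to_datetime(
                df[col], format="ISO8601", utc=True, errors="coerce"
            )
        except Exception:
            # Mirror the PR's defensive behavior: leave the column untouched.
            continue
        # Convert only if every originally non-null value parsed successfully.
        if converted[was_present].notna().all():
            df[col] = converted
```

With `temporal_fields={"ts"}`, a `"year"` column holding strings such as `"2020"` stays `object` whether or not it would parse, which is the behavior the PR's docstring asks for; a listed column with even one unparseable non-null value is also left unchanged.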
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use explicit type hint for return value.

Return type should be `Set[str]` instead of `set` per coding guidelines.

📝 Proposed fix

Add `Set` to imports at top of file:

```diff
+from typing import Set
```

Source: Coding guidelines
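The suggestion above concerns a function annotated as returning `set`; that function lies outside this excerpt. To illustrate the `Set[str]` style, here is a hypothetical helper, `collect_temporal_fields`, that gathers the field names a Vega-Lite spec encodes with `"type": "temporal"`. This is the kind of set `temporal_fields` is documented to hold. Which spec shapes it walks (top-level `encoding`, list-valued channels such as `tooltip`, and `layer`/`concat`/`hconcat`/`vconcat`/`spec` sub-views) is an assumption, not the toolkit's actual logic.

```python
from typing import Any, Dict, Set


def collect_temporal_fields(spec: Dict[str, Any]) -> Set[str]:
    """Hypothetical helper: names of fields a Vega-Lite spec encodes as temporal."""
    fields: Set[str] = set()

    for channel_def in spec.get("encoding", {}).values():
        # Some channels (e.g. "tooltip") may hold a list of field definitions.
        field_defs = channel_def if isinstance(channel_def, list) else [channel_def]
        for field_def in field_defs:
            if (
                isinstance(field_def, dict)
                and field_def.get("type") == "temporal"
                and isinstance(field_def.get("field"), str)
            ):
                fields.add(field_def["field"])

    # Recurse into composite views: layers and concatenations ...
    for key in ("layer", "concat", "hconcat", "vconcat"):
        for sub_spec in spec.get(key, []):
            fields |= collect_temporal_fields(sub_spec)
    # ... and the inner "spec" used by facet and repeat.
    if isinstance(spec.get("spec"), dict):
        fields |= collect_temporal_fields(spec["spec"])

    return fields
```

Annotating the result as `Set[str]` rather than a bare `set` documents that callers receive field names, which lines up with the `Optional[Set[str]]` parameter of `sanitize_dataframe_for_chart`.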