20 changes: 11 additions & 9 deletions deepnote_toolkit/ocelots/pandas/utils.py
@@ -34,15 +34,17 @@ def flatten_column_name(item):


 def fix_nan_category(df):
-    for i in range(len(df.columns)):
-        column = df.iloc[
-            :, i
-        ]  # We need to use iloc because it works if column names have duplicates
-
-        # If the column is categorical, we need to create a category for nan
-        if column.dtype.name == "category":
-            df.iloc[:, i] = column.cat.add_categories("nan")
-
+    # Collect indices of categorical columns to avoid repeated dtype checks
+    categorical_indices = [
+        i for i, dtype in enumerate(df.dtypes) if dtype.name == "category"
+    ]
Comment on lines +38 to +40

🧹 Nitpick | 🔵 Trivial

Consider using pandas API for dtype checking.

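dtype.name == "category" works, but a type-based check is more idiomatic and robust than comparing dtype names. Note that pd.api.types.is_categorical_dtype(dtype), suggested below, is deprecated as of pandas 2.1 in favor of isinstance(dtype, pd.CategoricalDtype).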

     categorical_indices = [
-        i for i, dtype in enumerate(df.dtypes) if dtype.name == "category"
+        i for i, dtype in enumerate(df.dtypes) if pd.api.types.is_categorical_dtype(dtype)
     ]
🤖 Prompt for AI Agents
In deepnote_toolkit/ocelots/pandas/utils.py around lines 38 to 40, replace the
dtype name string comparison with the pandas API for checking categorical
dtypes: use pd.api.types.is_categorical_dtype(dtype) when filtering df.dtypes so
detection is more idiomatic and robust; update the list comprehension
accordingly and ensure pd.api.types is imported/accessible in the module.
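
To illustrate the dtype check discussed in this comment, here is a minimal sketch that collects the positional indices of categorical columns with isinstance(dtype, pd.CategoricalDtype). This form also sidesteps the deprecation of is_categorical_dtype in pandas 2.1. The helper name categorical_positions and the sample frame are hypothetical and not part of the PR.

```python
import pandas as pd


def categorical_positions(df):
    """Return positional indices of categorical columns (hypothetical helper)."""
    # Iterating df.dtypes by position keeps this safe for duplicate column names.
    return [
        i for i, dtype in enumerate(df.dtypes)
        if isinstance(dtype, pd.CategoricalDtype)
    ]


# Sample frame with a duplicated column name ("a") to mirror the PR's concern.
df = pd.DataFrame(
    {
        "first": pd.Categorical(["x", "y"]),
        "second": [1, 2],
        "third": pd.Categorical(["p", "q"]),
    }
)
df.columns = ["a", "a", "b"]

print(categorical_positions(df))  # [0, 2]
```

Because the check works on positions rather than labels, it does not depend on column names being unique.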

+    if not categorical_indices:
+        return df
+
+    # Apply add_categories in bulk for categorical columns
+    for i in categorical_indices:
+        column = df.iloc[:, i]
+        df.iloc[:, i] = column.cat.add_categories("nan")
Comment on lines +45 to +47

🧹 Nitpick | 🔵 Trivial

Minor: simplify the assignment.

The two statements in the loop body can be combined into one, without the intermediate variable.

     for i in categorical_indices:
-        column = df.iloc[:, i]
-        df.iloc[:, i] = column.cat.add_categories("nan")
+        df.iloc[:, i] = df.iloc[:, i].cat.add_categories("nan")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    for i in categorical_indices:
-        column = df.iloc[:, i]
-        df.iloc[:, i] = column.cat.add_categories("nan")
+    for i in categorical_indices:
+        df.iloc[:, i] = df.iloc[:, i].cat.add_categories("nan")
🤖 Prompt for AI Agents
In deepnote_toolkit/ocelots/pandas/utils.py around lines 45 to 47, the code uses
an intermediate variable 'column' to add a category; simplify by replacing the
two statements with a single assignment that updates the DataFrame column in
place, e.g. assign the result of df.iloc[:, i].cat.add_categories("nan")
directly back to df.iloc[:, i].
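
For context on what the add_categories("nan") call does, the sketch below applies it to a single categorical Series. It assumes the extra category exists so that missing values can later be replaced by the literal string "nan"; the PR itself does not state this purpose, and the sample data is made up.

```python
import pandas as pd

# A categorical Series with one missing value (hypothetical sample data).
s = pd.Series(["a", None, "b"], dtype="category")

# add_categories returns a new Series; the original is left unchanged.
extended = s.cat.add_categories("nan")
print(list(s.cat.categories))         # ['a', 'b']
print(list(extended.cat.categories))  # ['a', 'b', 'nan']

# With the category registered, missing values can be filled with "nan".
filled = extended.fillna("nan")
print(filled.tolist())  # ['a', 'nan', 'b']
```

Since add_categories does not modify the Series it is called on, the result has to be written back to the DataFrame, which is what the assignment in the loop is for.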

     return df
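
As a rough end-to-end sketch of the refactored function, the version below writes each extended column back by position with DataFrame.isetitem instead of df.iloc[:, i] = .... This is a swapped-in technique, not the PR's code: it assumes pandas 1.5 or later (where isetitem is available) and is used here because isetitem always replaces the column rather than attempting an in-place write. The name fix_nan_category_sketch and the sample frame are hypothetical.

```python
import pandas as pd


def fix_nan_category_sketch(df):
    """Add a "nan" category to every categorical column (illustrative sketch)."""
    positions = [
        i for i, dtype in enumerate(df.dtypes)
        if isinstance(dtype, pd.CategoricalDtype)
    ]
    for i in positions:
        # Positional read and write, so duplicate column names are handled.
        # Assumption: isetitem (pandas >= 1.5) stands in for the PR's iloc assignment.
        df.isetitem(i, df.iloc[:, i].cat.add_categories("nan"))
    return df


# Sample frame with a duplicated column name ("k").
df = pd.DataFrame(
    {
        "x": pd.Categorical(["a", None]),
        "y": [1, 2],
        "z": pd.Categorical(["p", "q"]),
    }
)
df.columns = ["k", "k", "z"]

out = fix_nan_category_sketch(df)
print(list(out.iloc[:, 0].cat.categories))  # ['a', 'nan']
print(list(out.iloc[:, 2].cat.categories))  # ['p', 'q', 'nan']
```

Like the code in the diff, this sketch mutates the input frame and returns it, and non-categorical columns are left untouched.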

