Skip to content

Conversation

@phoeenniixx
Copy link
Member

This PR solves the bug in data_module where the static_categorical_features and static_continuous_features were not correctly calculated in __getitem__ of nested class

@phoeenniixx
Copy link
Member Author

This solves the bug that was found while writing the integration tests for tft model in #1812

@codecov
Copy link

codecov bot commented May 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@5685c59). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1834   +/-   ##
=======================================
  Coverage        ?   86.82%           
=======================================
  Files           ?       51           
  Lines           ?     5668           
  Branches        ?        0           
=======================================
  Hits            ?     4921           
  Misses          ?      747           
  Partials        ?        0           
Flag Coverage Δ
cpu 86.82% <100.00%> (?)
pytest 86.82% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

fkiraly
fkiraly previously approved these changes May 14, 2025
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It fixes the bug, so it is approved.

But, the new code seems unnecessarily complicated?
Is it not possible to avoid the loops by simply using array operations smartly?

@phoeenniixx
Copy link
Member Author

Sure I'll try to find a better way

@fkiraly fkiraly added the bug Something isn't working label May 14, 2025
@fkiraly
Copy link
Collaborator

fkiraly commented May 14, 2025

the "basic principle" is to replace sth like

my_list = [] * len(my_vector)
for i, x in enumerate(my_vector):
    my_list[i] = 42*x

by something like

my_list = list(42*my_vector)

@phoeenniixx
Copy link
Member Author

phoeenniixx commented May 14, 2025

the "basic principle" is to replace sth like

my_list = [] * len(my_vector)
for i, x in enumerate(my_vector):
    my_list[i] = 42*x

by something like

my_list = list(42*my_vector)

so we can use this approach by creating a mask right? then this mask can easily be mutiplied to get the final st_cat_values_for_item?
Like

is_categorical_mask = [
                self.data_module.time_series_metadata["col_type"].get(col_name) == "C"
                for col_name in static_col_names
            ]
is_continuous_mask = [not b for b in is_categorical_mask]
st_cat_values_for_item = raw_st_tensor[is_categorical_mask] 
st_cont_values_for_item = raw_st_tensor[is_continuous_mask]

(similar to what we do in timeseries to get the groups)

Thinking of which, we can use tensors here, then in as in place of

is_continuous_mask = [not b for b in is_categorical_mask]

it will just be

is_continuous_mask = ~is_categorical_mask

that is simpler?

@fkiraly fkiraly moved this to PR in progress in May - Sep 2025 mentee projects May 15, 2025
@fkiraly fkiraly moved this from PR in progress to PR under review in May - Sep 2025 mentee projects May 15, 2025
@fkiraly
Copy link
Collaborator

fkiraly commented May 15, 2025

yes, exactly

@fkiraly
Copy link
Collaborator

fkiraly commented May 15, 2025

plus, naively, I would expect something like

is_categorical_mask =  self.data_module.time_series_metadata["col_type"] == "C"

to work.

@phoeenniixx
Copy link
Member Author

plus, naively, I would expect something like

is_categorical_mask =  self.data_module.time_series_metadata["col_type"] == "C"

to work.

We just need the static_categorical in this case, not all categorical
and i am not sure this comparison works with dict, I am sure similar way is possible with pandas, but with dict?

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is better!

@fkiraly fkiraly merged commit bfbe6b2 into sktime:main May 16, 2025
34 of 35 checks passed
@github-project-automation github-project-automation bot moved this from PR under review to Done in May - Sep 2025 mentee projects May 16, 2025
@phoeenniixx phoeenniixx deleted the datamodule-bug branch May 16, 2025 20:45
@agobbifbk
Copy link

Hello! I have a comment on the current situation: we are assuming that the data are already sorted and there is no gap in the temporal index. Since the output is taken and windowing in the d2 layer, we need to be sure that all the data is equally spaced.
Maybe it is better to check it somehow during the getitem, and in a second moment to enlarge the dataset and put nans in the holes. What do you think?

@phoeenniixx
Copy link
Member Author

Hello! I have a comment on the current situation: we are assuming that the data are already sorted and there is no gap in the temporal index. Since the output is taken and windowing in the d2 layer, we need to be sure that all the data is equally spaced. Maybe it is better to check it somehow during the getitem, and in a second moment to enlarge the dataset and put nans in the holes. What do you think?

Yes! ig we should do this, but I think we could add it in next iterations? First complete a basic e2e prototype? We have added the warning, so the users know the code still has loop holes.

@agobbifbk
Copy link

Ok it was just a comment, we need to remember this :-)

PranavBhatP pushed a commit to PranavBhatP/pytorch-forecasting that referenced this pull request May 22, 2025
This PR solves the bug in `data_module` where the
`static_categorical_features` and `static_continuous_features` were not
correctly calculated in `__getitem__` of nested class
PranavBhatP pushed a commit to PranavBhatP/pytorch-forecasting that referenced this pull request May 22, 2025
This PR solves the bug in `data_module` where the
`static_categorical_features` and `static_continuous_features` were not
correctly calculated in `__getitem__` of nested class
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

Status: Fixed/resolved

Development

Successfully merging this pull request may close these issues.

3 participants