[BUG] EXPERIMENTAL PR: Solve the bug in data_module
#1834
This solves the bug that was found while writing the integration tests for |
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #1834 +/- ##
=======================================
Coverage ? 86.82%
=======================================
Files ? 51
Lines ? 5668
Branches ? 0
=======================================
Hits ? 4921
Misses ? 747
Partials ? 0
Flags with carried forward coverage won't be shown.
|
fkiraly
left a comment
It fixes the bug, so it is approved.
But, the new code seems unnecessarily complicated?
Is it not possible to avoid the loops by simply using array operations smartly?
|
Sure I'll try to find a better way |
|
the "basic principle" is to replace something like

    my_list = [None] * len(my_vector)
    for i, x in enumerate(my_vector):
        my_list[i] = 42 * x

by something like

    my_list = list(42 * my_vector)
|
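A minimal sketch of the principle above, assuming `my_vector` is a NumPy array (the values are made up for illustration). Both versions produce the same numbers; the second one leaves the looping to the array library.

```python
import numpy as np

# Hypothetical input vector, for illustration only.
my_vector = np.array([1, 2, 3])

# Loop version: preallocate a list, then fill it element by element.
looped = [None] * len(my_vector)
for i, x in enumerate(my_vector):
    looped[i] = 42 * x

# Vectorized version: a single array operation, converted to a list afterwards.
vectorized = list(42 * my_vector)

# Both approaches yield the same values.
assert looped == vectorized
```
|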
so we can use this approach by creating a mask, right? Then this mask can easily be multiplied to get the final

    is_categorical_mask = [
        self.data_module.time_series_metadata["col_type"].get(col_name) == "C"
        for col_name in static_col_names
    ]
    is_continuous_mask = [not b for b in is_categorical_mask]
    st_cat_values_for_item = raw_st_tensor[is_categorical_mask]
    st_cont_values_for_item = raw_st_tensor[is_continuous_mask]

(similar to what we do in

Thinking of which, we can use tensors here, then in as in place of

    is_continuous_mask = [not b for b in is_categorical_mask]

it will just be

    is_continuous_mask = ~is_categorical_mask

that is simpler?
|
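A rough sketch of the mask idea with an array-valued mask. NumPy arrays stand in for the tensors here; boolean `torch` tensors support the same `~` negation and mask indexing. The metadata dict, column names and values below are hypothetical, not taken from the PR.

```python
import numpy as np

# Hypothetical metadata and static columns, for illustration only.
col_type = {"region": "C", "store_size": "F", "brand": "C", "avg_price": "F"}
static_col_names = ["region", "store_size", "brand", "avg_price"]
raw_st = np.array([3.0, 120.5, 7.0, 9.99])

# Build the boolean mask once, as an array rather than a Python list.
is_categorical_mask = np.array(
    [col_type.get(name) == "C" for name in static_col_names]
)

# With an array (or tensor) mask, negation is a single operator.
is_continuous_mask = ~is_categorical_mask

# Boolean indexing splits the static values without an explicit loop.
st_cat_values = raw_st[is_categorical_mask]
st_cont_values = raw_st[is_continuous_mask]
```

Note that `~` only works on the array form of the mask; a plain Python list of bools does not support it.
|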
|
yes, exactly |
|
plus, naively, I would expect something like `is_categorical_mask = self.data_module.time_series_metadata["col_type"] == "C"` to work. |
We just need the |
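One possible reading of the suggestion above, assuming `col_type` is a plain dict keyed by column name (an assumption; the `.get(col_name)` call earlier in the thread suggests a mapping). The names and type codes below are hypothetical. Comparing the mapping itself to `"C"` gives a single bool, so the types have to be gathered into an array before the comparison becomes elementwise.

```python
import numpy as np

# Hypothetical stand-ins for the metadata mapping and the static column order.
col_type = {"region": "C", "store_size": "F", "brand": "C"}
static_col_names = ["region", "store_size", "brand"]

# Comparing the mapping itself to a string yields one plain bool, not a mask.
naive = col_type == "C"

# Collect the types into an array first; then == compares elementwise.
col_types = np.array([col_type[name] for name in static_col_names])
is_categorical_mask = col_types == "C"
```
|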
fkiraly
left a comment
Yes, I think this is better!
|
Hello! I have a comment on the current situation: we are assuming that the data are already sorted and that there are no gaps in the temporal index. Since the output is taken and windowed in the d2 layer, we need to be sure that all the data is equally spaced. |
Yes! I guess we should do this, but I think we could add it in the next iterations? First complete a basic e2e prototype? We have added the warning, so users know the code still has loopholes.
|
Ok it was just a comment, we need to remember this :-) |
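For reference, a minimal sketch of what such a check could look like in a later iteration. The function name is hypothetical and not part of the code base; it assumes a numeric (for example integer) time index for a single series.

```python
import numpy as np


def is_sorted_and_evenly_spaced(time_idx):
    """Return True if time_idx is strictly increasing with a constant step."""
    time_idx = np.asarray(time_idx)
    if time_idx.size < 2:
        # Zero or one time point is trivially sorted and gap-free.
        return True
    steps = np.diff(time_idx)
    # Every step must be positive (sorted) and equal to the first step (no gaps).
    return bool(np.all(steps > 0) and np.all(steps == steps[0]))
```

For example, `is_sorted_and_evenly_spaced([0, 1, 2, 3])` holds, while `[0, 1, 3]` (a gap) and `[2, 1, 0]` (unsorted) both fail. With a floating-point index, the exact equality on the steps would need a tolerance instead.
|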
This PR solves the bug in `data_module` where the `static_categorical_features` and `static_continuous_features` were not correctly calculated in `__getitem__` of nested class