[ENH] skforecast integration for time series hyperparameter tuning #208
New file (skforecast integration example):

@@ -0,0 +1,63 @@
"""
Skforecast Integration Example - Hyperparameter Tuning for Time Series Forecasting

This example demonstrates how to use Hyperactive to tune the hyperparameters of a
skforecast ForecasterRecursive model. It uses the SkforecastOptCV class, which
provides a familiar sklearn-like API for integrating skforecast models with
Hyperactive's optimization algorithms.

Characteristics:
- Integration with skforecast's backtesting functionality
- Tuning of regressor hyperparameters (e.g., RandomForestRegressor)
- Uses the HillClimbing optimizer (can be swapped for any Hyperactive optimizer)
- Time series cross-validation via backtesting
"""

import numpy as np
import pandas as pd
from skforecast.recursive import ForecasterRecursive
from sklearn.ensemble import RandomForestRegressor
from hyperactive.opt import HillClimbing
from hyperactive.integrations.skforecast import SkforecastOptCV

# Generate synthetic data
data = pd.Series(
    np.random.randn(100),
    index=pd.date_range(start="2020-01-01", periods=100, freq="D"),
    name="y",
)

# Define forecaster
forecaster = ForecasterRecursive(
    regressor=RandomForestRegressor(random_state=123), lags=5
)

# Define optimizer
optimizer = HillClimbing(
    search_space={
        "n_estimators": list(range(10, 100, 10)),
        "max_depth": list(range(2, 10)),
    },
    n_iter=10,
)

# Define SkforecastOptCV
opt_cv = SkforecastOptCV(
    forecaster=forecaster,
    optimizer=optimizer,
    steps=5,
    metric="mean_squared_error",
    initial_train_size=50,
    verbose=True,
)

# Fit
print("Fitting...")
opt_cv.fit(y=data)

# Predict
print("Predicting...")
predictions = opt_cv.predict(steps=5)
print("Predictions:")
print(predictions)
print("Best params:", opt_cv.best_params_)
Changes to the optional dependency groups (pyproject.toml):

@@ -52,6 +52,16 @@ sklearn-integration = [
sktime-integration = [
    "skpro",
    'sktime; python_version < "3.14"',
    'skforecast; python_version < "3.14"',
]
skforecast-integration = [
    'skforecast; python_version < "3.14"',
]
integrations = [
    "scikit-learn <1.8.0",
    "skpro",
    'sktime; python_version < "3.14"',
    'skforecast; python_version < "3.14"',
]
build = [
    "setuptools",

@@ -77,7 +87,6 @@ all_extras = [
    "lightning",
]

[project.urls]
"Homepage" = "https://github.com/SimonBlanke/Hyperactive"
"Bug Reports" = "https://github.com/SimonBlanke/Hyperactive/issues"

Collaborator comment (on the 'skforecast' entry added under sktime-integration):
I am not sure if it makes sense to add it in here; I would rather leave it out.
Edit: @fkiraly Is it enough that the dependency is added to the general "integrations" extra, or does it really belong with the sktime integrations?
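For context on that question: with the extras as drafted above, users would get skforecast support either through the dedicated extra, e.g. `pip install "hyperactive[skforecast-integration]"`, or implicitly via the broader `integrations` extra; whether it should additionally sit in `sktime-integration` is exactly the open point in the comment. (The extra names are taken from this diff; that the package is published under the name `hyperactive` is an assumption.)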
New file (skforecast experiment adapter):

@@ -0,0 +1,230 @@
"""Experiment adapter for skforecast backtesting experiments."""
# copyright: hyperactive developers, MIT License (see LICENSE file)

import copy

from hyperactive.base import BaseExperiment


class SkforecastExperiment(BaseExperiment):
| """Experiment adapter for skforecast backtesting experiments. | ||
| This class is used to perform backtesting experiments using a given | ||
| skforecast forecaster. It allows for hyperparameter tuning and evaluation of | ||
| the model's performance. | ||
| Parameters | ||
| ---------- | ||
| forecaster : skforecast forecaster | ||
| skforecast forecaster to benchmark. | ||
| y : pandas Series | ||
| Target time series used in the evaluation experiment. | ||
| exog : pandas Series or DataFrame, default=None | ||
| Exogenous variable/s used in the evaluation experiment. | ||
| steps : int | ||
| Number of steps to predict. | ||
| metric : str or callable | ||
Omswastik-11 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| Metric used to quantify the goodness of fit of the model. | ||
| If string, it must be a metric name allowed by skforecast | ||
| (e.g., 'mean_squared_error'). | ||
| If callable, it must take (y_true, y_pred) and return a float. | ||
| initial_train_size : int | ||
| Number of samples in the initial training set. | ||
| refit : bool, default=False | ||
| Whether to re-fit the forecaster in each iteration. | ||
| fixed_train_size : bool, default=False | ||
| If True, the train size doesn't increase but moves by `steps` in each iteration. | ||
| gap : int, default=0 | ||
| Number of samples to exclude from the end of each training set and the | ||
| start of the test set. | ||
| allow_incomplete_fold : bool, default=True | ||
| If True, the last fold is allowed to have fewer samples than `steps`. | ||
| return_best : bool, default=False | ||
| If True, the best model is returned. | ||
| n_jobs : int or 'auto', default="auto" | ||
| Number of jobs to run in parallel. | ||
| verbose : bool, default=False | ||
| Print summary figures. | ||
| show_progress : bool, default=False | ||
| Whether to show a progress bar. | ||
| higher_is_better : bool, default=False | ||
| Whether higher metric values indicate better performance. | ||
| Set to False (default) for error metrics like MSE, MAE, MAPE where | ||
| lower values are better. Set to True for metrics like R2 where | ||
| higher values indicate better model performance. | ||
| """ | ||
|
|
||
Omswastik-11 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
    _tags = {
        "authors": ["Omswastik-11", "JoaquinAmatRodrigo"],
        "maintainers": ["Omswastik-11", "fkiraly", "JoaquinAmatRodrigo", "SimonBlanke"],
        "python_dependencies": "skforecast",
    }

    def __init__(
        self,
        forecaster,
        y,
        steps,
        metric,
        initial_train_size,
        exog=None,
        refit=False,
        fixed_train_size=False,
        gap=0,
        allow_incomplete_fold=True,
        return_best=False,
        n_jobs="auto",
        verbose=False,
        show_progress=False,
        higher_is_better=False,
    ):
        self.forecaster = forecaster
        self.y = y
        self.steps = steps
        self.metric = metric
        self.initial_train_size = initial_train_size
        self.exog = exog
        self.refit = refit
        self.fixed_train_size = fixed_train_size
        self.gap = gap
        self.allow_incomplete_fold = allow_incomplete_fold
        self.return_best = return_best
        self.n_jobs = n_jobs
        self.verbose = verbose
        self.show_progress = show_progress
        self.higher_is_better = higher_is_better

        super().__init__()

        # Set the optimization direction based on the higher_is_better parameter
        higher_or_lower = "higher" if higher_is_better else "lower"
        self.set_tags(**{"property:higher_or_lower_is_better": higher_or_lower})

    @classmethod
    def get_test_params(cls, parameter_set="default"):
        """Return testing parameter settings for the estimator.

        Parameters
        ----------
        parameter_set : str, default="default"
            Name of the parameter set to return.

        Returns
        -------
        params : dict or list of dict, default = {}
            Parameters to create testing instances of the class.
            Each dict contains parameters to construct an "interesting" test
            instance, i.e., MyClass(**params) or MyClass(**params[i]) creates a
            valid test instance.
            create_test_instance uses the first (or only) dictionary in `params`.
        """
        from skbase.utils.dependencies import _check_soft_dependencies

        if not _check_soft_dependencies("skforecast", severity="none"):
            return []

        import numpy as np
        import pandas as pd
        from skforecast.recursive import ForecasterRecursive
        from sklearn.ensemble import RandomForestRegressor

        forecaster = ForecasterRecursive(
            regressor=RandomForestRegressor(random_state=123),
            lags=2,
        )

        y = pd.Series(
            np.random.randn(20),
            index=pd.date_range(start="2020-01-01", periods=20, freq="D"),
            name="y",
        )

        params = {
            "forecaster": forecaster,
            "y": y,
            "steps": 3,
            "metric": "mean_squared_error",
            "initial_train_size": 10,
        }
        return [params]

    @classmethod
    def _get_score_params(cls):
        """Return settings for testing score/evaluate functions. Used in tests only.

        Returns a list; the i-th element should be valid arguments for
        self.evaluate and self.score, of an instance constructed with
        self.get_test_params()[i].

        Returns
        -------
        list of dict
            The parameters to be used for scoring.
        """
        return [{"n_estimators": 5}]

    def _evaluate(self, params):
        """Evaluate the parameters.

        Parameters
        ----------
        params : dict with string keys
            Parameters to evaluate.

        Returns
        -------
        float
            The value of the parameters as per evaluation.
        dict
            Additional metadata about the search.
        """
        from skforecast.model_selection import TimeSeriesFold, backtesting_forecaster

        forecaster = copy.deepcopy(self.forecaster)
        forecaster.set_params(params)

        cv = TimeSeriesFold(
            steps=self.steps,
            initial_train_size=self.initial_train_size,
            refit=self.refit,
            fixed_train_size=self.fixed_train_size,
            gap=self.gap,
            allow_incomplete_fold=self.allow_incomplete_fold,
        )

        results, _ = backtesting_forecaster(
            forecaster=forecaster,
            y=self.y,
            cv=cv,
            metric=self.metric,
            exog=self.exog,
            n_jobs=self.n_jobs,
            verbose=self.verbose,
            show_progress=self.show_progress,
        )

        if isinstance(self.metric, str):
            metric_name = self.metric
        else:
            metric_name = (
                self.metric.__name__ if hasattr(self.metric, "__name__") else "score"
            )

        # backtesting_forecaster returns a DataFrame of metric values
        res_float = results[metric_name].iloc[0]

        return res_float, {"results": results}
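For review purposes, here is a minimal sketch of exercising SkforecastExperiment on its own, assembled from get_test_params and _get_score_params above. The import path and the exact score/evaluate signatures are not shown in this diff, so both are assumptions; the docstrings only state that the dicts from _get_score_params are valid arguments for self.score and self.evaluate.

```python
# Hedged sketch: run the backtesting experiment standalone, outside SkforecastOptCV.
# Assumptions: the module path below is hypothetical, and BaseExperiment exposes
# a score() method accepting a parameter dict, as the docstrings above suggest.
import numpy as np
import pandas as pd
from skforecast.recursive import ForecasterRecursive
from sklearn.ensemble import RandomForestRegressor

from hyperactive.experiment.integrations import SkforecastExperiment  # hypothetical path

y = pd.Series(
    np.random.randn(20),
    index=pd.date_range(start="2020-01-01", periods=20, freq="D"),
    name="y",
)

experiment = SkforecastExperiment(
    forecaster=ForecasterRecursive(
        regressor=RandomForestRegressor(random_state=123), lags=2
    ),
    y=y,
    steps=3,
    metric="mean_squared_error",
    initial_train_size=10,
)

# Score one candidate configuration, mirroring _get_score_params().
result = experiment.score({"n_estimators": 5})
print(result)
```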
New file (skforecast integration package __init__):

@@ -0,0 +1,6 @@
"""Skforecast integration package."""
# copyright: hyperactive developers, MIT License (see LICENSE file)

from hyperactive.integrations.skforecast.skforecast_opt_cv import SkforecastOptCV

__all__ = ["SkforecastOptCV"]
Review comment:
We need to find another solution for this. This code depends on the runner image structure, may break with updates, and must be maintained in multiple blocks. It also feels 'hacky' to run sudo rm -rf commands in the CI.

Reply:
Thanks @SimonBlanke!! Can you suggest what alternate solution could work?