XGBoost time series forecasting with sktime [ep#2]

Patiparn Nualchan
6 min read · Dec 29, 2022

Build, fine-tune, and evaluate an ML model on time series data with sktime

Hello again! In this ep#2 we are going to cover steps 3–5 (listed below). Let's not waste time, let's go!

  1. Time series data (rough concepts for dealing with a single sales series)
  2. Time series analysis (EDA and stationarity testing)
  3. Time series forecasting (modeling and prediction)
  4. Time series cross-validation (temporal cross-validation)
  5. Fine-tuning XGBoost (getting the best parameters)

To briefly recap: from ep#1 we have the data ready as y_train and y_test (via temporal_train_test_split) and the forecasting horizon fh (via ForecastingHorizon). Our goal is to create a model that predicts sales 12 weeks (3 months) ahead.
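For readers who skipped ep#1, here is a minimal sketch of that setup. It assumes the weekly sales series is a pandas Series named y; the toy series below is only a stand-in so the snippet runs on its own.

import pandas as pd
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split

# stand-in for the weekly sales series used in ep#1
y = pd.Series(range(104), index=pd.period_range("2020-01-05", periods=104, freq="W"))

# hold out the last 12 weeks (3 months) as the test set
y_train, y_test = temporal_train_test_split(y, test_size=12)

# absolute horizon built from the test index, i.e. the 12 weeks we want to forecast
fh = ForecastingHorizon(y_test.index, is_relative=False)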

3 Time series forecasting: sktime reduces the time sequence to a supervised learning problem ready for a machine learning task, so we can say we are now working on regression. I picked the following regression models.

  • AutoARIMA* (a classic Time series technique)
  • KNeighborsRegressor
  • LinearRegression
  • XgbRegressor

For model performance I used mean_absolute_percentage_error (MAPE) from sktime.performance_metrics.forecasting. [MAPE measures the error as a percentage relative to the test data, taking the absolute value of the percentage forecast error rather than squaring it.] code here.

from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
print('MAPE: %.4f' % mean_absolute_percentage_error(y_test, y_pred, symmetric=False))
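To make the metric concrete, here is a tiny hand-check on made-up numbers (not from our sales data); it should match the sktime function.

import numpy as np
import pandas as pd

y_true = pd.Series([100.0, 200.0, 400.0])
y_hat = pd.Series([110.0, 180.0, 300.0])

# MAPE = mean(|y_true - y_pred| / |y_true|)
print(np.mean(np.abs(y_true - y_hat) / np.abs(y_true)))               # 0.15
print(mean_absolute_percentage_error(y_true, y_hat, symmetric=False))  # 0.15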

AutoARIMA

In AutoARIMA, the model itself searches for the optimal p, d, and q values for the data set, so it can provide better forecasts without manual order selection.

code here,

from sktime.forecasting.arima import AutoARIMA
from sktime.utils.plotting import plot_series

# search ARIMA orders starting from p=8 (capped at p=9)
forecaster = AutoARIMA(start_p=8, max_p=9, suppress_warnings=True)

forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"])
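If you are curious which order AutoARIMA actually settled on, you can inspect the fitted parameters after fitting. A small sketch; the exact keys in the returned dict depend on your sktime version:

# the dict typically includes the selected (p, d, q) order and
# information criteria such as AIC (exact keys vary by sktime version)
print(forecaster.get_fitted_params())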

As you can see, y_pred cannot capture any up-and-down trend, and the MAPE came out at 0.75 (75% error).

KNeighborsRegressor

The KNN algorithm uses ‘feature similarity’ to predict the values of any new data points. This means that the new point is assigned a value based on how closely it resembles the points in the training set.

code here,

from sklearn.neighbors import KNeighborsRegressor
from sktime.forecasting.compose import make_reduction

regressor = KNeighborsRegressor(n_neighbors=3)

# reduce the forecasting task to regression on lagged windows
# (window_length was defined earlier in the notebook)
forecaster = make_reduction(regressor, strategy="recursive", window_length=window_length)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"])

As shown above, KNeighborsRegressor can predict the up-and-down trend, but there is a gap between the actual and predicted values (marked in my yellow hand-writing on the plot), formally reflected by a MAPE of 0.8094. However, since it captures the trend line, I kept it as a candidate model.
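To see roughly what make_reduction with strategy="recursive" is doing under the hood, here is my own illustration (not sktime code): each target value is regressed on the previous window_length observations, and at prediction time each new forecast is fed back in as an input for the next step.

import numpy as np

def make_lag_table(y, window_length):
    """Turn a 1-D series into (X, y) pairs of lagged windows and next values."""
    y = np.asarray(y, dtype=float)
    X, targets = [], []
    for t in range(window_length, len(y)):
        X.append(y[t - window_length:t])  # the last window_length values
        targets.append(y[t])              # the value one step ahead
    return np.array(X), np.array(targets)

# toy example: series 1..10 with a window of 3
X, t = make_lag_table(range(1, 11), window_length=3)
print(X[0], t[0])  # [1. 2. 3.] 4.0 -> predict 4 from [1, 2, 3]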

LinearRegression

Linear regression models the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis).

code here,

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()

forecaster = make_reduction(regressor, strategy="recursive", window_length=window_length)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"])

As shown above, linear regression predicted heavy up-and-down swings. This was an effect of noise in the data, and the MAPE was very bad at 1.87. (A further step would be to remove that noise, re-predict, and see how much it improves.)

XgbRegressor

XGBoost is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.), artificial neural networks tend to outperform other algorithms, but on tabular data like ours, tree-based methods such as XGBoost are usually a strong choice.

code here,

from xgboost import XGBRegressor
regressor = XGBRegressor(objective='reg:squarederror', random_state=42)

forecaster = make_reduction(regressor, strategy="recursive", window_length=window_length)
forecaster.fit(y=y_train)
y_pred = forecaster.predict(fh=fh)
plot_series(y_train, y_test, y_pred, labels=["y_train", "y_test", "y_pred"])

XgbRegressor can predict the up-and-down trend at an acceptable level, but with almost the same error as KNN (the yellow gap) and a MAPE of 0.96. However, since it captures the trend, I picked it as a candidate model to improve.

4 Time series cross-validation: to make the model more robust and better performing, we have to do cross-validation.

First, we don't use the familiar K-Fold for time series: sequential data can't be shuffled or sampled at random. So cross-validation for time series is special (temporal cross-validation).

SlidingWindowSplitter and ExpandingWindowSplitter are the two main temporal cross-validation splitters.

In this article, I used ExpandingWindowSplitter and kept only the 2 candidate models, KNeighborsRegressor and XgbRegressor.

ExpandingWindowSplitter concept by author
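A quick way to see what ExpandingWindowSplitter does is to enumerate its splits: the training window starts at initial_window observations and grows by step_length each fold, while the test window is always the 12-step horizon. A small sketch on a stand-in weekly index (not the real sales data):

import numpy as np
import pandas as pd
from sktime.forecasting.model_selection import ExpandingWindowSplitter

y_demo = pd.Series(np.arange(104.0),
                   index=pd.period_range("2020-01-05", periods=104, freq="W"))

cv = ExpandingWindowSplitter(step_length=12, fh=np.arange(1, 13), initial_window=52)

for i, (train_idx, test_idx) in enumerate(cv.split(y_demo)):
    print(f"fold {i}: train size = {len(train_idx)}, "
          f"test = positions {test_idx[0]}..{test_idx[-1]}")
# the train window expands (52, 64, 76, ...) while each test window stays 12 weeks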

KNeighborsRegressor: ExpandingWindowSplitter

from sktime.forecasting.model_selection import ExpandingWindowSplitter
from sktime.forecasting.model_evaluation import evaluate

regressor = KNeighborsRegressor(n_neighbors=3)
forecaster = make_reduction(regressor, strategy="recursive", window_length=window_length)

# start with 52 weeks of history, add 12 weeks per fold, forecast 12 weeks each time
cv = ExpandingWindowSplitter(step_length=12, fh=fh, initial_window=52)
results = evaluate(
    forecaster=forecaster, y=y, cv=cv, strategy="refit", return_data=True
)
results.iloc[:, :5].head()

KNeighborsRegressor with ExpandingWindowSplitter: the prediction line did follow the up-and-down trend, but on the last fold the MAPE was higher than on the previous ones. By eye we can see a big gap as well.

XgbRegressor: ExpandingWindowSplitter

regressor = XGBRegressor(objective='reg:squarederror', random_state=42)
forecaster = make_reduction(regressor, strategy="recursive", window_length=window_length)

cv = ExpandingWindowSplitter(step_length=12, fh=fh, initial_window=52)
results = evaluate(
    forecaster=forecaster, y=y, cv=cv, strategy="refit", return_data=True
)
results.iloc[:, :5].head()

As shown above, the XgbRegressor + ExpandingWindowSplitter prediction line got closer to the actual line compared with KNeighborsRegressor, and the summary table shows the MAPE improving as len_train_window grows. >>>>>> Pick XgbRegressor to fine-tune >>>>>>

5 XgbRegressor fine-tuning: hyperparameter tuning is the process of determining the right combination of hyperparameters that maximizes model performance.

code here,

import numpy as np
from sktime.forecasting.model_selection import (ForecastingGridSearchCV,
                                                ExpandingWindowSplitter)

# grid over the XGBoost regressor wrapped inside the reduction forecaster
param_grid = {
    'estimator__max_depth': [3, 6, 10, 15],
    'estimator__learning_rate': [0.01, 0.1, 0.2, 0.3],
    # note: with the default step this is just [0.4]; pass a step (e.g. 0.2) for more values
    'estimator__colsample_bytree': np.arange(0.4, 1.0),
    'estimator__n_estimators': [100, 500, 1000]
}

regressor = XGBRegressor(objective='reg:squarederror', random_state=42)
forecaster = make_reduction(regressor, strategy="recursive")

cv = ExpandingWindowSplitter(step_length=12, fh=fh, initial_window=52)
gscv = ForecastingGridSearchCV(
    forecaster, cv=cv, param_grid=param_grid, strategy="refit"
)
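The block above only constructs the search object; to run the tuning you still have to fit it and read off the winning configuration. A sketch of the remaining steps (best_params_, best_score_ and predict are standard ForecastingGridSearchCV behaviour; the final scoring assumes the same MAPE metric as before):

# run the search: the forecaster is refit on every CV fold for each parameter combo
gscv.fit(y_train)

# winning parameter combination and its cross-validation score
print(gscv.best_params_)
print(gscv.best_score_)

# forecast 12 weeks ahead with the best forecaster and score it on the test set
y_pred = gscv.predict(fh)
print('MAPE: %.4f' % mean_absolute_percentage_error(y_test, y_pred, symmetric=False))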

The MAPE improved from 0.9653 at the beginning (single model), to 0.5319 on the last fold of the ExpandingWindowSplitter validation, to 0.5114 after hyperparameter tuning. Hyperparameter tuning gives a 47% improvement over the single model and 4% over validation.

Conclusion

  • We explored 4 models to predict sales 12 weeks ahead
  • KNeighborsRegressor and XgbRegressor were the 2 candidate models taken forward for more robustness checks
  • The final model is XgbRegressor with hyperparameter tuning

For future work, we can rework the XGBoost model with noise removal, more training data, or more features, up to feature engineering on the date terms to generate more interesting features.
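As one concrete direction for that, here is a minimal sketch of date-based feature engineering. It assumes the sales series has a weekly PeriodIndex (or DatetimeIndex); the function and column names are hypothetical.

import pandas as pd

def add_date_features(y):
    """Expand a weekly sales Series into a frame with calendar feature columns."""
    ts = y.index.to_timestamp() if hasattr(y.index, "to_timestamp") else y.index
    return pd.DataFrame({
        "sales": y.values,
        "week_of_year": ts.isocalendar().week.values,
        "month": ts.month,
        "quarter": ts.quarter,
    }, index=y.index)

Such calendar columns could then be passed to the reduction forecaster as exogenous variables X in fit and predict.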

Thanks for reading to the end. I'm not an expert in time series or the sktime world; I'm just someone who wants to learn and practice by doing and taking notes. I would appreciate any advice from experts on points I got wrong, and any other comments.

You can find the full code on my GitHub here: https://github.com/MossMojito/sktime_Xgboost/blob/main/sktime_xgboost.ipynb
