Need help on poor forecast results

#2673, opened 2 months ago by aranes-rc

Hi, I'm pretty new to the scene and I've been stuck for days trying to fix these random spikes in my forecasts.

Image

My data

This is what my dataset looks like:

    Date        Arrivals
    2008-01-01  279338
    2008-02-01  265827
    2008-03-01  263862
    2008-04-01  235895
    2008-05-01  242822
    ...
    2025-03-01  ...

As you can see, it's monthly data. I've followed most of the tips the docs provide for non-daily data.

Even my holidays are shifted to the first of the month to match my aggregated data (I'm not sure if this is what the docs are telling me to do):

Maundy Thursday: 2008-03-20 -> Maundy Thursday: 2008-03-01
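The date shift above can be sketched with pandas as follows. This is only a minimal sketch, assuming the holidays live in a DataFrame with `holiday` and `ds` columns (the layout Prophet expects for its `holidays` argument); the `holidays_daily` name and its single sample row are made up for illustration.

```python
import pandas as pd

# Illustrative daily-dated holiday table (only the row quoted above).
holidays_daily = pd.DataFrame({
    "holiday": ["Maundy Thursday"],
    "ds": pd.to_datetime(["2008-03-20"]),
})

# Snap each date to the first day of its month so it lines up with
# the month-start timestamps of the aggregated series.
holiday_adjusted = holidays_daily.copy()
holiday_adjusted["ds"] = holiday_adjusted["ds"].dt.to_period("M").dt.to_timestamp()

# A holiday that occurs more than once within one month would produce
# repeated rows after snapping, so drop exact (holiday, ds) duplicates.
holiday_adjusted = holiday_adjusted.drop_duplicates(subset=["holiday", "ds"])
print(holiday_adjusted)
```

Note that a holiday which moves between months from year to year (as Easter-related dates do) will land on different month starts in different years.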

Plotting the data gives the following: Image

(Jan 2022 is a missing value from my dataset that I kinda just filled with a temporary 'get the mean of neighboring months' solution)
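For reference, the "mean of neighboring months" fill can be written as a linear interpolation, since interpolating a single gap linearly gives exactly the average of the two adjacent values. The sketch below uses a toy series with made-up numbers, not the real arrivals data.

```python
import pandas as pd

# Toy monthly series with made-up values; NaN marks the missing month.
s = pd.Series(
    [100.0, float("nan"), 140.0],
    index=pd.to_datetime(["2021-12-01", "2022-01-01", "2022-02-01"]),
    name="y",
)

# method="linear" treats the points as equally spaced, so a single gap
# is filled with the mean of its two neighbours.
filled = s.interpolate(method="linear")
print(filled.loc[pd.Timestamp("2022-01-01")])
```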

From the docs:

"In monthly data, yearly seasonality can also be modeled with binary extra regressors. In particular, the model can use 12 extra regressors like is_jan, is_feb, etc. where is_jan is 1 if the date is in Jan and 0 otherwise. This approach would avoid the within-month unidentifiability seen above. Be sure to use yearly_seasonality=False if monthly extra regressors are being added."

I also applied this tip:

    months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
    for i, month in enumerate(months, 1):
        prophet_df[f'is_{month}'] = (prophet_df['ds'].dt.month == i).astype(int)
    for month in months:
        model.add_regressor(f'is_{month}')
    # ...

Handling the COVID shock

My dataset is also affected by the COVID-19 pandemic, so again I followed most of the tips from the docs.

I mainly used the following tips:

Treating COVID-19 lockdowns as one-off holidays

    lockdowns = pd.DataFrame([
        {'holiday': 'lockdown_1', 'ds': '2020-03-01', 'lower_window': 0, 'ds_upper': '2020-07-01'},
        {'holiday': 'lockdown_2', 'ds': '2020-08-01', 'lower_window': 0, 'ds_upper': '2021-03-01'},
        {'holiday': 'lockdown_3', 'ds': '2021-04-01', 'lower_window': 0, 'ds_upper': '2021-12-01'},
        {'holiday': 'recovery_phase', 'ds': '2022-01-01', 'lower_window': 0, 'ds_upper': '2022-07-01'},
        {'holiday': 'recovery_phase_2', 'ds': '2022-08-01', 'lower_window': 0, 'ds_upper': '2022-09-01'},
    ])
    for t_col in ['ds', 'ds_upper']:
        lockdowns[t_col] = pd.to_datetime(lockdowns[t_col])
    lockdowns['upper_window'] = (lockdowns['ds_upper'] - lockdowns['ds']).dt.days
    lockdowns

Changes in seasonality between pre- and post-COVID

Here I'm not quite sure how to tweak the custom monthly seasonality I added. I might need help with this :/

    covid_outbreak_date = '2020-03-21'
    prophet_df['pre_covid'] = pd.to_datetime(prophet_df['ds']) < pd.to_datetime(covid_outbreak_date)
    prophet_df['post_covid'] = ~prophet_df['pre_covid']

    monthly_period = 30.5
    fourier_order = 5
    model.add_seasonality(name='monthly_pre_covid', period=monthly_period,
                          fourier_order=fourier_order, condition_name='pre_covid')
    model.add_seasonality(name='monthly_post_covid', period=monthly_period,
                          fourier_order=fourier_order, condition_name='post_covid')
    # ...

My model

    import pandas as pd
    from prophet import Prophet

    # prophet_df, lockdowns and holiday_adjusted are built as described above.
    model = Prophet(
        yearly_seasonality=False,
        seasonality_mode='multiplicative',
        holidays=pd.concat([lockdowns, holiday_adjusted])
    )

    # Month-of-year indicator regressors (replace the built-in yearly seasonality).
    months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
    for i, month in enumerate(months, 1):
        prophet_df[f'is_{month}'] = (prophet_df['ds'].dt.month == i).astype(int)
    for month in months:
        model.add_regressor(f'is_{month}')

    # Conditional monthly seasonalities, one before and one after the outbreak.
    covid_outbreak_date = '2020-03-21'
    prophet_df['pre_covid'] = pd.to_datetime(prophet_df['ds']) < pd.to_datetime(covid_outbreak_date)
    prophet_df['post_covid'] = ~prophet_df['pre_covid']

    monthly_period = 30.5
    fourier_order = 10
    model.add_seasonality(name='monthly_pre_covid', period=monthly_period,
                          fourier_order=fourier_order, condition_name='pre_covid')
    model.add_seasonality(name='monthly_post_covid', period=monthly_period,
                          fourier_order=fourier_order, condition_name='post_covid')

    model.fit(prophet_df)

    # Forecast 6 years ahead at month-start frequency; the future frame
    # needs the same condition and regressor columns as the training frame.
    future = model.make_future_dataframe(periods=12*6, freq='MS')
    future['pre_covid'] = pd.to_datetime(future['ds']) < pd.to_datetime(covid_outbreak_date)
    future['post_covid'] = ~future['pre_covid']
    for i, month in enumerate(months, 1):
        future[f'is_{month}'] = (future['ds'].dt.month == i).astype(int)

    forecast = model.predict(future)

plot_components displays the following: Image

The cross-validation error metrics are insanely high.


I need help!!!

  • Is it overfitting?
  • Also, how can I tell whether my model is overfitting or not?
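One common heuristic for the overfitting question, not specific to Prophet, is to compare the error on the data the model was fitted to against the error on held-out months: a model that fits the history closely but does far worse on unseen data is likely overfitting. Below is a minimal sketch of that comparison using MAPE. All numbers are made up for illustration, and the `mape` helper is a hypothetical function written for this example, not part of any library.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (0.1 == 10%)."""
    pairs = list(zip(actual, predicted))
    return sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)

# Made-up numbers purely for illustration.
train_actual = [100.0, 110.0, 120.0, 130.0]   # history the model was fitted on
train_fitted = [101.0, 109.0, 121.0, 129.0]   # in-sample predictions
test_actual = [140.0, 150.0]                  # held-out months
test_forecast = [180.0, 105.0]                # out-of-sample predictions

train_error = mape(train_actual, train_fitted)
test_error = mape(test_actual, test_forecast)

# A held-out error many times larger than the in-sample error is a warning sign.
gap_ratio = test_error / train_error
print(f"train MAPE={train_error:.3f}, test MAPE={test_error:.3f}, ratio={gap_ratio:.1f}")
```

With a real model, the in-sample predictions would be the `yhat` values on the training dates and the held-out predictions would come from a model fitted without those months.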