Time Series Forecasting
Basics
Terminology
- Market Efficiency: Measures the level of difficulty in forecasting future values
- Arbitrage: Buy and sell commodities and make a safe profit while the price adjusts
- Sample EDA
- Intervals between data points should be identical
- Encode / Impute missing data / time periods as needed
- Setting the frequency (the df.asfreq() method) will add rows corresponding to any missing periods for the desired frequency
- When handling missing data, it is usually NOT a good idea to fill the missing values with the mean
- This is appropriate only when all data points for the column fluctuate heavily around the mean, which is rarely the case
- See Handle Missing Data for sample code
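As a minimal sketch of the frequency and imputation points above, the snippet below uses a small made-up daily series (the values and dates are assumptions for illustration only). It shows asfreq adding the missing days as NaN rows, followed by time-based interpolation as one possible alternative to mean filling.

```python
import pandas as pd

# Hypothetical daily readings with 2024-01-03 and 2024-01-04 missing.
df = pd.DataFrame(
    {"value": [10.0, 12.0, 18.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"]),
)

# asfreq("D") inserts a row (filled with NaN) for every missing day.
daily = df.asfreq("D")

# Time-based linear interpolation follows the local level of the series
# instead of pulling the gaps towards the global mean.
filled = daily.interpolate(method="time")
```

Interpolation is only one option here; forward fill or a model-based imputation may suit other data better.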
- We normally use dates as the dataframe indexes for time series data
- When resampling data, the date column is automatically set as the index column
- In time series analysis, we normally work with one dependent variable at a time
- We cannot shuffle time series data as they need to be in chronological order
- If the data has a non-linear distribution (quadratic, polynomial, logarithmic, exponential, etc.), converting the values to get a more linear trend can be helpful
- Various Power Transformation techniques can be used in such scenarios
- Smoothing helps reduce the noise and bring out underlying trends
- Used only when there is no trend or seasonality in the series
- Moving Average Smoothing is the process of creating a new series where the values correspond to the "moving averages" of the original values
- Involves determining the window width and position (leading, trailing, centered etc.)
- Centered and Leading positions require a knowledge of future values and are not useful in predictions (since future value is what we are trying to predict)
- Exponential Smoothing is the process of creating a new series where the values correspond to the "weighted averages" of the original values with larger weight given to the more recent values
Note
- A Fast Learner model is one where the smoothing constant (\(\alpha\)) is closer to 1
- More importance given to the newer values
- A Slow Learner model is one where the smoothing constant (\(\alpha\)) is closer to 0
- More importance given to the older values
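The smoothing ideas above can be sketched with pandas as follows. The series values and the two alpha settings are assumptions chosen for illustration; with adjust=False, ewm applies the recursion s_t = alpha * x_t + (1 - alpha) * s_{t-1}, starting from the first observation.

```python
import pandas as pd

# Hypothetical series, used only for illustration.
s = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 18.0])

# Trailing moving average: each value averages the current and 2 previous points.
trailing_ma = s.rolling(window=3).mean()

# Centered moving average: uses one future point, so it cannot be used to forecast.
centered_ma = s.rolling(window=3, center=True).mean()

# Exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}, with s_0 = x_0.
fast = s.ewm(alpha=0.9, adjust=False).mean()  # "fast learner", tracks recent values
slow = s.ewm(alpha=0.1, adjust=False).mean()  # "slow learner", leans on older values
```

Comparing the last values of fast and slow against the final observation shows how much more closely the fast learner follows recent data.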
- Some time series models expect de-trended and de-seasonalized data, and removing these components may result in better forecasting for some of the models
- Use Differencing to remove the trend and seasonality from the data in such cases
- Use Lag 1 Differencing to get rid of linear trends
- To get rid of quadratic trends, apply differencing again to the "differenced" series
- Use Lag 7 Differencing to get rid of weekly seasonality, Lag 12 Differencing to get rid of monthly seasonality (a yearly cycle in monthly data), and so on
- This is applied to the "differenced" data that was used to remove trends
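A small sketch of the differencing steps described above, using synthetic noise-free series (the slopes and the weekly pattern are assumptions for illustration). It applies lag-1 differencing once for a linear trend, twice for a quadratic trend, and lag-7 differencing on top of lag-1 for a weekly cycle.

```python
import numpy as np
import pandas as pd

t = np.arange(28)

# Linear trend: one round of lag-1 differencing leaves a constant.
linear = pd.Series(3.0 * t + 5.0)
linear_diff = linear.diff(1).dropna()

# Quadratic trend: differencing the differenced series again leaves a constant.
quadratic = pd.Series(2.0 * t**2)
quadratic_diff2 = quadratic.diff(1).diff(1).dropna()

# Linear trend plus a repeating weekly pattern (hypothetical daily data).
weekly_pattern = np.tile([0.0, 1.0, 3.0, 2.0, 0.0, -2.0, -4.0], 4)
daily = pd.Series(0.5 * t + weekly_pattern)

# Lag-1 differencing removes the trend; lag-7 differencing of that result
# then removes the weekly cycle.
stationary = daily.diff(1).diff(7).dropna()
```

Each differencing step drops observations from the start of the series (one for lag 1, seven for lag 7), which matters for short series.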
- Plot data to understand patterns in data and use corresponding feature engineering techniques to address the patterns
- For example, use Lag Scatter plots to understand if there is any relation between lag periods and current periods
- Use Lag Features to address these patterns
- Refer to Sample EDA for more examples
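Lag features of the kind mentioned above can be built with shift. The sketch below uses made-up daily values and hypothetical column names (lag_1, lag_2); the correlation it computes is the numeric counterpart of what a lag scatter plot shows visually (pandas.plotting.lag_plot can draw such a plot when matplotlib is installed).

```python
import pandas as pd

# Hypothetical daily values.
df = pd.DataFrame(
    {"value": [5.0, 7.0, 6.0, 9.0, 8.0, 11.0]},
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# Lag features: the value observed 1 and 2 periods earlier.
df["lag_1"] = df["value"].shift(1)
df["lag_2"] = df["value"].shift(2)

# Rows without a full lag history contain NaN and are dropped before modelling.
features = df.dropna()

# Correlation between the current value and its lag-1 value.
lag1_corr = features["value"].corr(features["lag_1"])
```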
- For a good model, the residuals will be random (stationary white noise) and the autocorrelation coefficients of the residuals will not be significant
- For train-test split we need to ensure that
- Training data is from beginning to a certain cut off point in time
- Test data starts at the cut off point and continues till the end
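A chronological split along these lines might look like the sketch below. The series and the cut-off date are assumptions for illustration; the point is that rows are assigned by time and never shuffled.

```python
import pandas as pd

# Hypothetical daily series.
series = pd.Series(
    range(10),
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
)

cutoff = pd.Timestamp("2024-01-08")  # hypothetical cut-off date

# No shuffling: everything before the cut-off trains, the rest tests.
train = series[series.index < cutoff]
test = series[series.index >= cutoff]
```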
White Noise
- A special type of time series where the data does not follow a pattern
- Is a random series
- Hence difficult to model or forecast
- Conditions:
- Have a constant mean and variance
- No autocorrelation in any period
- No clear relation between the past and present values in the time series
- Is stationary
Random Walk (Drunkard Walk)
- A special type of time series where the next value is only dependent on the current value
- The best estimator of today's value is yesterday's value
- The best estimator of tomorrow's value is today's value
- This process is also referred to as Naive Forecasting
- Apart from the dependency on the previous value, the entire series is random
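To make the link between white noise, a random walk, and naive forecasting concrete, here is a minimal simulation. The seed, sample size, and normal distribution are assumptions for illustration. The errors of the naive forecast are simply the white-noise steps, which is why nothing better than "yesterday's value" is available for a pure random walk.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# White noise: independent draws with constant mean and variance.
noise = pd.Series(rng.normal(loc=0.0, scale=1.0, size=500))

# Random walk: each value is the previous value plus a white-noise step.
walk = noise.cumsum()

# Naive forecast: predict each value with the one before it.
naive_forecast = walk.shift(1)

# Forecast errors equal the random steps themselves.
errors = (walk - naive_forecast).dropna()
```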
Stationarity
- Implies that consecutive sets of data of the same size should have identical covariances regardless of the starting point
- Also referred to as weak-form stationarity or covariance stationarity
- Assumptions:
- Have a constant mean and variance
- Consistent covariance between periods at a constant distance from one another
- White Noise is an example of a weak-form stationary process
- Dickey-Fuller Test (also called DF Test) can be used to check if data is from a stationary process
- Null hypothesis is that the data comes from a non-stationary process
- Reject Null hypothesis if the test statistic is less than the critical value for the desired significance level in the Dickey-Fuller table
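The decision rule above can be illustrated with a hand-rolled version of the basic (non-augmented) Dickey-Fuller regression with a constant. This is a sketch, not a replacement for a library routine such as adfuller in statsmodels.tsa.stattools. The function name, seed, and the rounded large-sample 5% critical value of about -2.86 are assumptions made for illustration.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic of gamma in: diff(y)_t = a + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])  # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # OLS coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
steps = rng.normal(size=500)

CRITICAL_5PCT = -2.86  # approximate large-sample value, constant-only case

stat_noise = dickey_fuller_stat(steps)            # white noise (stationary)
stat_walk = dickey_fuller_stat(np.cumsum(steps))  # random walk (non-stationary)

# Reject the null of non-stationarity when the statistic is below the critical value.
reject_noise = stat_noise < CRITICAL_5PCT
reject_walk = stat_walk < CRITICAL_5PCT
```

The statistic does not follow the usual t-distribution, so it has to be compared against Dickey-Fuller critical values rather than standard t-tables.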
Seasonality
- Suggests that certain trends in the data appear on a cyclical basis
- If the data is seasonal, we need to consider factors other than the current period for prediction
- Testing approaches
- Decomposition (Naive Decomposition)
- Splits the time series into 3 effects:
- Trend -> Consistent patterns in data
- Seasonal -> Cyclical patterns
- Residual/Noise -> Prediction Error or Random Variation
- Expects a linear relationship between the three effects
- Uses the previous period values as trend-setter
- Types (approaches):
- Additive: For any time period, the observed value is the sum of the three effects
- Multiplicative: For any time period, the observed value is the product of the three effects
- If the data is seasonal, the resulting plot will show the seasonal pattern
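An additive decomposition of this kind can be sketched by hand with pandas. The data below is synthetic (a linear trend plus a zero-sum weekly pattern, no noise), and the approach shown, a centered moving average for the trend and per-position averages for the seasonal effect, is one common simple scheme, assumed here for illustration. Library routines such as seasonal_decompose in statsmodels.tsa.seasonal offer both additive and multiplicative models.

```python
import numpy as np
import pandas as pd

period = 7
t = np.arange(8 * period)
pattern = np.array([2.0, 1.0, 0.0, -1.0, -2.0, -1.0, 1.0])  # sums to zero

# Hypothetical observed series = linear trend + weekly pattern (no noise).
observed = pd.Series(
    0.5 * t + np.tile(pattern, 8),
    index=pd.date_range("2024-01-01", periods=len(t), freq="D"),
)

# Trend: centered moving average over one full period.
trend = observed.rolling(window=period, center=True).mean()

# Seasonal: average detrended value for each position in the cycle.
detrended = observed - trend
seasonal = detrended.groupby(t % period).transform("mean")

# Residual: what is left after removing the trend and seasonal effects.
residual = observed - trend - seasonal
```

For a multiplicative model the subtractions would become divisions, so that observed = trend * seasonal * residual.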
Autocorrelation
- Represents the correlation between an observation and a "lagged" version of itself
- Includes both direct and indirect effects
- The effect of lag t on the current value includes the effects passed through the intermediate lags t-1, t-2, etc. (which may indirectly affect the current data point)
- Autocorrelation in data
- at daily frequency checks for correlation between yesterday's and today's data
- at monthly frequency checks for correlation between last month and current month data
- In the ACF plot,
- X axis represents the lags
- Y axis shows the correlation values
- The blue region around the x axis marks the significance band (values inside it are not statistically significant)
ACF Interpretation
- If the autocorrelation values are higher than the blue region, it suggests that the coefficients are significant indicating that there is time dependence in the data
- If the values fall within the blue area, the coefficients are not significant
- A sharp drop-off indicates the lag beyond which correlations are not significant
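The reading of an ACF plot described above can be reproduced numerically. The sketch below computes sample autocorrelations with NumPy and compares them with an approximate 95% band of ±1.96/sqrt(n), which assumes white noise. The helper name acf, the seed, and the simulated series with a 0.8 carry-over are assumptions for illustration; plotting libraries may draw a slightly different band.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = x @ x
    return np.array([(x[k:] @ x[: len(x) - k]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(1)
n = 400
noise = rng.normal(size=n)

# Series with time dependence: each value carries over 80% of the previous one.
ar1 = np.zeros(n)
for i in range(1, n):
    ar1[i] = 0.8 * ar1[i - 1] + noise[i]

band = 1.96 / np.sqrt(n)  # approximate 95% band under the white-noise assumption
acf_ar1 = acf(ar1, max_lag=10)
acf_noise = acf(noise, max_lag=10)

# Lags whose autocorrelation falls outside the band are treated as significant.
significant_lags = [k for k in range(1, 11) if abs(acf_ar1[k]) > band]
```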
Partial Autocorrelation
- Measures the correlation between an observation and its lagged values while adjusting for the effects of intervening observations
- Helps identify the direct effect of a specific lagged version on the time series
- Removes the influence of other intermediate lags on the concerned lags
- The effect of lag t on the current value does not include the effects passed through the intermediate lags t-1, t-2, etc. (which may indirectly affect the current data point)
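One way to see the "direct effect only" idea is to estimate the partial autocorrelation at lag k as the last coefficient of a regression of the current value on lags 1 through k. The sketch below does this with ordinary least squares in NumPy. The helper name pacf_ols, the seed, and the simulated series are assumptions for illustration, and library implementations may use other estimators.

```python
import numpy as np

def pacf_ols(x, max_lag):
    """Partial autocorrelation at lags 1..max_lag via successive regressions."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    values = []
    for k in range(1, max_lag + 1):
        # Regress x_t on x_{t-1}, ..., x_{t-k}; the coefficient on x_{t-k}
        # is the effect of lag k after adjusting for the lags in between.
        target = x[k:]
        lags = np.column_stack([x[k - j : len(x) - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(lags, target, rcond=None)
        values.append(coef[-1])
    return np.array(values)

rng = np.random.default_rng(7)
n = 1000
noise = rng.normal(size=n)

# Each value depends directly only on the previous one (coefficient 0.8).
series = np.zeros(n)
for i in range(1, n):
    series[i] = 0.8 * series[i - 1] + noise[i]

pacf_values = pacf_ols(series, max_lag=5)
```

For a series like this one, the autocorrelation decays gradually over many lags, while the partial autocorrelation is expected to be large at lag 1 and close to zero afterwards.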
Model Selection
Tip
When comparing models, we should select the one with Higher Log Likelihood and Lower Information Criteria (AIC and BIC values)
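The criteria in the tip can be computed directly from a model's log likelihood using the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L). The candidate models, their log likelihoods, and parameter counts below are made-up numbers for illustration.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fitted models: (log likelihood, number of parameters).
candidates = {"model_a": (-520.0, 3), "model_b": (-518.5, 6)}
n_obs = 200

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in candidates.items()
}

# Lower is better for both criteria.
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

In this made-up case model_b has the higher log likelihood, but its extra parameters are penalized, so the two signals in the tip can disagree and the information criteria act as the tie-breaker.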