Time Series Analysis: Developing the Intuition
Time series data often exhibits a trend over time or a seasonal effect. How does this affect the way it is analyzed to predict the future?
A time series can be defined as a sequence of measurements taken over time, most often at regular intervals. Ordering is another defining property of a time series: each observation depends on the ones before it, so changing the order changes the meaning of the data.
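Much of the theory for time series is based on the assumption of second-order (weak) stationarity: the mean and variance are constant over time, and the covariance between two observations depends only on the lag between them. Real-life data are often not stationary; they exhibit a trend over time, a seasonal effect, or both.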
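To make the idea concrete, here is a minimal sketch of differencing, one common way to remove a trend or a seasonal pattern before modeling. The series and the `difference` helper are hypothetical and exist only for illustration.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def mean(values):
    return sum(values) / len(values)


# Hypothetical monthly data: a linear trend plus a repeating 12-month pattern.
pattern = [5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 3, 4]
series = [10 + 2 * t + pattern[t % 12] for t in range(48)]

# The mean drifts upward between the two halves, so the raw series
# is not stationary.
drift = mean(series[24:]) - mean(series[:24])

# A lag-12 difference cancels the seasonal pattern and reduces the
# linear trend to a constant (2 units per month * 12 months).
seasonal_diff = difference(series, lag=12)

# One further lag-1 difference removes that remaining constant.
stationary = difference(seasonal_diff, lag=1)
```

On real data the differenced series will not be perfectly flat, but its mean should no longer drift with time.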
What Is the Objective of Any Time Series Analysis?
The basic objective of time series analysis is to determine a model that can describe the pattern of the time series data to answer questions like:
- What are the important features of the time series pattern?
- How well does the past explain the future?
- Can we forecast the future based on the historic time series?
Time series can also be described as a set of statistics, collected at regular intervals. Time series data occurs naturally in many areas:
- Economics: Monthly unemployment data
- Healthcare: Hospital admissions, daily claim submissions, denials, rework
- Environmental: Daily air quality, rainfall
The Classical Decomposition of Time Series
One simple method of defining a time series is the notion of decomposing it into four main elements:
- Trend (Tt): Long-term movements in the mean.
- Seasonal Effect (St): Calendar-related cyclical fluctuations.
- Cycles (Ct): Business cycle-related fluctuations.
- Residuals (Et): Random, irregular fluctuations that remain once the other elements are removed.
The idea is to model each of these elements separately and then combine them, either additively or multiplicatively.
To choose between the additive and multiplicative forms, keep the following points in mind:
- If the magnitude of the seasonal component is relatively constant regardless of changes in the trend, an additive model is suitable.
- If it varies with changes in the trend, a multiplicative model is suitable.
However, if the series contains values close or equal to zero and the magnitude of the seasonal component appears to be dependent on the trend level, then the pseudo-additive model is most appropriate.
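The additive case can be sketched in a few lines. The following is a simplified illustration, not a production routine: it estimates the trend with a centered moving average, averages the detrended values by position in the cycle to get the seasonal effect, and treats whatever is left as the residual. The business cycle is folded into the trend here, and the function names and sample data are hypothetical.

```python
def centered_moving_average(series, period):
    """Trend estimate; None where the averaging window does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: the two end points get half weight each.
            total = sum(window) - 0.5 * window[0] - 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def decompose_additive(series, period):
    """Split series into trend + seasonal + residual."""
    trend = centered_moving_average(series, period)
    # Group detrended values by their position in the seasonal cycle.
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    raw = [sum(b) / len(b) for b in buckets]
    # Center the seasonal effects so they sum to zero over one cycle.
    offset = sum(raw) / period
    seasonal = [raw[t % period] - offset for t in range(len(series))]
    residual = [
        None if tr is None else y - tr - s
        for y, tr, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, residual


# Hypothetical monthly data: linear trend plus a fixed 12-month pattern.
pattern = [5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 3, 4]
series = [10 + 2 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, residual = decompose_additive(series, period=12)
```

A multiplicative version follows the same steps with division in place of subtraction, which is why it breaks down when the series contains values at or near zero.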
Types of Time Series Models
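AR model (autoregressive model): This type of model forecasts the next value as a linear function of one or more of its own prior values in the series. The simplest case, AR(1), uses only the immediately preceding value. Because the forecast is a weighted sum of past values, the model is considered a linear model.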
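As a rough sketch, an AR forecast is just an intercept plus a weighted sum of the most recent values. The helper names and coefficient values below are assumptions chosen for illustration; in practice the coefficients are estimated from the data.

```python
import random


def forecast_ar(history, coeffs, intercept=0.0):
    """One-step forecast: intercept + sum(coeffs[i] * history[-1 - i])."""
    return intercept + sum(c * history[-1 - i] for i, c in enumerate(coeffs))


def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Generate n values of y[t] = phi * y[t-1] + noise, starting at 0."""
    rng = random.Random(seed)
    values = [0.0]
    for _ in range(n - 1):
        values.append(phi * values[-1] + rng.gauss(0.0, sigma))
    return values


history = [1.0, 2.0, 4.0]
ar1_forecast = forecast_ar(history, coeffs=[0.5])        # uses 4.0 only
ar2_forecast = forecast_ar(history, coeffs=[0.5, 0.25])  # uses 4.0 and 2.0
sample_path = simulate_ar1(phi=0.7, n=50)
```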
MA model (moving average model): These models are quite different from autoregressive models, and a finite-order MA model is always stationary. An MA model specifies that the output variable depends linearly on the current and past values of a random error term. Each moving average term in a time series model is a past error multiplied by a coefficient.
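One way to see the difference from an AR model is to feed a single shock through an MA model and watch how long it lingers. The sketch below uses a hypothetical `ma_output` helper and an arbitrary coefficient of 0.6.

```python
def ma_output(shocks, thetas, mean=0.0):
    """y[t] = mean + e[t] + sum(thetas[i] * e[t - 1 - i])."""
    output = []
    for t, shock in enumerate(shocks):
        past = sum(
            theta * shocks[t - 1 - i]
            for i, theta in enumerate(thetas)
            if t - 1 - i >= 0
        )
        output.append(mean + shock + past)
    return output


# A single unit shock at time 0, then nothing.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
response = ma_output(impulse, thetas=[0.6])
```

In an MA(1) model the shock affects only the current and the next observation and then disappears completely, whereas in an AR model its influence would fade gradually. This finite memory is why a finite-order MA model is stationary regardless of its coefficients.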
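ARIMA model (autoregressive integrated moving average model): As the name suggests, it combines the above two modeling techniques. The "integrated" part refers to differencing the series to make it stationary before the AR and MA terms are applied. Two special cases are worth keeping in mind: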
- When a model involves only autoregressive terms, it's an AR model.
- When a model involves only moving average terms, it's an MA model.
The ARIMA model is also known as the Box-Jenkins model, and it may include autoregressive terms, moving average terms, and differencing.
Note: It is worth mentioning that within ARIMA, there are seasonal and non-seasonal models that can be further studied to gain a solid understanding of various time series models.
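To show how the pieces fit together, here is a minimal sketch of forecasting with an ARIMA(1,1,0) model, meaning one autoregressive term applied to the first difference of the series and no moving average term. The function name and the coefficient `phi` are assumptions; a real workflow would estimate `phi` from the data.

```python
def forecast_arima_110(history, phi, steps):
    """Forecast an ARIMA(1,1,0) process that has no constant term."""
    last_level = history[-1]
    last_change = history[-1] - history[-2]  # most recent first difference
    forecasts = []
    for _ in range(steps):
        last_change = phi * last_change        # AR(1) step on the differences
        last_level = last_level + last_change  # integrate back to the level
        forecasts.append(last_level)
    return forecasts


history = [10.0, 12.0, 16.0]
forecasts = forecast_arima_110(history, phi=0.5, steps=3)
```

The forecast changes shrink by a factor of `phi` at every step, so the predicted level flattens out instead of following the last observed jump forever.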
Now that we have a basic understanding of time series analysis, it's a good idea to look at how a time series modeling activity is carried out. The forecasting process is a set of connected activities that transforms one or more inputs into one or more outputs. The activities are as follows:
Problem definition: This activity involves developing an understanding of how the forecast will be consumed, along with the expectations of the customer, client, or end user. The success of a forecasting model depends on how well it predicts and how closely it aligns with the expectations captured during this phase.
Data collection: Data collection consists of obtaining the relevant history for the variable(s) to be forecast, including historical information on potential predictor variables. The key word here is "relevant": it is often necessary to deal with missing values of some variables, potential outliers, or other data-related problems that have occurred in the past.
Data analysis: This phase is an important preliminary step in selecting the most effective forecasting model. Potential outliers should be identified and flagged here. Numerical summaries such as the sample mean, standard deviation, percentiles, and autocorrelations should also be evaluated.
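These summaries can be computed with nothing more than Python's standard library. The sketch below uses made-up data and a hypothetical `autocorrelation` helper based on the usual sample autocorrelation formula.

```python
import statistics


def autocorrelation(series, lag):
    """Sample autocorrelation of the series with itself shifted by lag."""
    center = statistics.fmean(series)
    denominator = sum((y - center) ** 2 for y in series)
    numerator = sum(
        (series[t] - center) * (series[t - lag] - center)
        for t in range(lag, len(series))
    )
    return numerator / denominator


# Hypothetical observations, for illustration only.
series = [12, 15, 14, 18, 21, 19, 24, 27, 25, 30, 33, 31]

summary = {
    "mean": statistics.fmean(series),
    "std_dev": statistics.stdev(series),
    "quartiles": statistics.quantiles(series, n=4),
    "acf_lag1": autocorrelation(series, 1),
}
```

A lag-1 autocorrelation well above zero, as a steadily rising series like this one produces, is an early hint that past values carry information about future ones.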
Model selection and fitting: This phase consists of choosing one or more forecasting models and fitting them to the data. Fitting in this context means estimating the unknown model parameters, commonly by the method of least squares.
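As an illustration of least-squares fitting, the sketch below estimates the two parameters of an AR(1) model, y[t] = c + phi * y[t-1], by regressing each value on the one before it. The function name and the noise-free sample series are assumptions made so that the result is easy to check by hand.

```python
def fit_ar1_least_squares(series):
    """Estimate (c, phi) in y[t] = c + phi * y[t-1] by ordinary least squares."""
    previous = series[:-1]
    current = series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    covariance = sum(
        (p - mean_prev) * (c - mean_curr) for p, c in zip(previous, current)
    )
    variance = sum((p - mean_prev) ** 2 for p in previous)
    phi = covariance / variance
    intercept = mean_curr - phi * mean_prev
    return intercept, phi


# Noise-free series generated with c = 2 and phi = 0.5, starting at 10.
series = [10.0]
for _ in range(8):
    series.append(2.0 + 0.5 * series[-1])

intercept, phi = fit_ar1_least_squares(series)
```

Because the sample series contains no noise, the fit recovers the generating parameters almost exactly. With real data the estimates are only approximations, and their quality depends on how much history is available.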
Model validation: This consists of evaluating how well the model is likely to forecast, and it should reveal the magnitude of the forecast error to expect when the model is applied to fresh data. Fitting errors are typically smaller than forecast errors, because the parameters were tuned to the very data being fitted.
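A simple way to estimate the forecast error is to hold back the most recent observations, fit the model on the rest, and compare the two error figures. The sketch below does this with a straight-line trend model on made-up data; the helper names are hypothetical. The split keeps the data in time order, since shuffling would let the model peek at the future.

```python
def fit_line(values):
    """Least-squares line through the points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    slope = sxy / sxx
    return mean_y - slope * mean_t, slope


def mean_abs_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical series that accelerates gently over time.
series = [0.1 * t * t for t in range(20)]

split = 16  # the first 16 points train the model, the last 4 test it
train, test = series[:split], series[split:]

intercept, slope = fit_line(train)
fitted = [intercept + slope * t for t in range(split)]
forecast = [intercept + slope * t for t in range(split, len(series))]

fit_error = mean_abs_error(train, fitted)
forecast_error = mean_abs_error(test, forecast)
```

Here the forecast error comes out several times larger than the fitting error, because the straight line was tuned to the training window and cannot follow the curvature beyond it.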
Lastly, some common forecast accuracy evaluation statistics:
- ME: Mean error
- MAD: Mean absolute deviation
- MSE: Mean squared error
- MAPE: Mean absolute percentage error
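The four statistics above can be computed directly from paired actual and forecast values. The sketch below assumes the error is defined as actual minus forecast; the function name and sample numbers are made up for illustration.

```python
def forecast_accuracy(actual, forecast):
    """Return ME, MAD, MSE, and MAPE for paired actual/forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "MAD": sum(abs(e) for e in errors) / n,
        "MSE": sum(e * e for e in errors) / n,
        # MAPE is undefined when any actual value is zero.
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
scores = forecast_accuracy(actual, forecast)
```

In this example the mean error is zero even though two of the three forecasts are off, because the positive and negative errors cancel. That is why ME is best read as a measure of bias, alongside MAD, MSE, or MAPE as measures of error size.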
Published at DZone with permission of Sunil Kappal, DZone MVB. See the original article here.