Autoregression: Time Series, Models, Trading, Python and more


Autoregression emerges as a powerful tool for anticipating future values in time-based data. This data, known as a time series, consists of observations collected at various timestamps, spaced either regularly or irregularly. Leveraging historical trends, patterns, and other hidden influences, autoregression models unlock the capability to forecast the value for the next time step.

By analysing and learning from past data, these models (including various options beyond autoregression) paint a picture of future outcomes. This article delves deeper into one particular type: the autoregression model, often abbreviated as the AR model.



What is autoregression?

Autoregression models time-series data as a linear function of its past values. It assumes that the value of a variable today is a weighted sum of its previous values.

For example, you could analyse the past month's prices of Apple (ticker: AAPL) to predict its future performance.


Formula of autoregression

In simpler terms, autoregression says: "Today's value depends on yesterday's value, the day before that, and so on."

We express this relationship mathematically using a formula:

$$X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t$$

Where,

  • X_t is the current value in the time series.
  • c is a constant or intercept term.
  • φ_1, φ_2, ..., φ_p are the autoregressive coefficients.
  • X_(t-1), X_(t-2), ..., X_(t-p) are the past values of the time series.
  • ε_t is the error term representing the random fluctuations or unobserved factors.

Autoregression calculation

The autoregressive coefficients (φ_1, φ_2, ..., φ_p) are typically estimated using statistical methods like least squares regression.

In the context of autoregressive (AR) models, the coefficients represent the weights assigned to the lagged values of the time series to predict the current value. These coefficients capture the relationship between the current observation and its past values.

The goal is to find the coefficients that best fit the historical data, allowing the model to accurately capture the underlying patterns and trends. Once the coefficients are determined, they can be used to forecast future values in the time series based on the observed values from previous time points. Hence, the autoregression calculation helps to create an autoregressive model for time series forecasting.
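As a rough illustration of the least squares estimation described above, the sketch below regresses each value on its p previous values using NumPy. The helper name `fit_ar_ols`, the simulated series and the parameter values are assumptions made for this example; they do not come from real market data.

```python
import numpy as np

def fit_ar_ols(x, p):
    """Estimate c and phi_1..phi_p of an AR(p) model by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    # Column k holds x_(t-k) for t = p .. n-1, so each row is [x_(t-1), ..., x_(t-p)]
    lags = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    design = np.column_stack([np.ones(len(lags)), lags])  # leading 1 for the constant c
    target = x[p:]
    coef = np.linalg.lstsq(design, target, rcond=None)[0]
    return coef[0], coef[1:]

# Simulated AR(2) series with known (illustrative) parameters
rng = np.random.default_rng(0)
true_c, true_phi = 1.0, np.array([0.6, -0.3])
x = np.zeros(2000)
for t in range(2, len(x)):
    x[t] = true_c + true_phi[0] * x[t - 1] + true_phi[1] * x[t - 2] + rng.normal()

c_hat, phi_hat = fit_ar_ols(x, p=2)
print(f"c = {c_hat:.2f}, phi = {np.round(phi_hat, 2)}")
```

With a long enough series, the estimates should land close to the values used in the simulation, which is a quick way to check that the regression is set up correctly.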



Autoregression model

Before delving into autoregression, it's beneficial to revisit the concept of a regression model.⁽¹⁾

A regression model serves as a statistical method to determine the association between a dependent variable (often denoted as y) and an independent variable (typically represented as X). Thus, in regression analysis, the focus is on understanding the relationship between these two variables.

For instance, consider having the stock prices of Bank of America (ticker: BAC) and J.P. Morgan (ticker: JPM).

If the objective is to forecast the stock price of JPM based on BAC's stock price, then JPM's stock price would be the dependent variable, y, while BAC's stock price would act as the independent variable, X. Assuming a linear association between X and y, the regression equation would be:

$$y=mX + c$$

here,
m represents the slope, and c denotes the intercept of the equation.

However, if you possess only one set of data, such as the stock prices of JPM, and wish to forecast its future values based on its past values, you can employ autoregression. Let's denote the stock price at time t as y_t.

The relationship between y_t and its preceding value y_(t-1) can be modelled using:

$$\text{AR(1)}: \quad y_t = \phi_1 y_{t-1} + c$$

Here, φ_1 is the model parameter, and c remains the constant. This equation represents an autoregressive model of order 1, signifying regression against a variable's own earlier values.

Similar to linear regression, the autoregressive model presupposes a linear relationship between y_t and y_(t-1). The strength of that relationship is measured by autocorrelation, which is explored in more detail later.
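To make the AR(1) relationship concrete, here is a minimal simulation sketch. The values of `c` and `phi_1` are illustrative assumptions, and a random noise term is added at each step (the error term from the general formula) so that the series does not simply settle at a constant.

```python
import numpy as np

rng = np.random.default_rng(42)
c, phi_1, n = 0.5, 0.8, 2000      # illustrative values, not estimated from real prices

y = np.empty(n)
y[0] = c / (1 - phi_1)            # start at the long-run mean of the process
for t in range(1, n):
    # y_t = c + phi_1 * y_(t-1) + noise
    y[t] = c + phi_1 * y[t - 1] + rng.normal(scale=1.0)

# Correlation between the series and itself shifted by one step
lag1_corr = np.corrcoef(y[1:], y[:-1])[0, 1]
print(f"lag-1 autocorrelation: {lag1_corr:.2f} (phi_1 used: {phi_1})")
```

For a stationary AR(1) process, the lag-1 autocorrelation should be close to φ_1, which is why the two concepts are so closely linked.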

Autoregression models of order 2 and generalising to order p

Let's delve into autoregression models, starting with order 2 and then generalising to order p.

Autoregression Model of Order 2 (AR(2))

In an autoregression model of order 2 (AR(2)), the current value y_t is predicted based on its two most recent lagged values, y_(t-1) and y_(t-2).

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t$$

Where,

  • c is a constant
  • φ_1 and φ_2 are the autoregressive coefficients for the first and second lags, respectively
  • ε_t represents the error term

Generalising to order p (AR(p))

For an autoregression model of order p (AR(p)), the current value y_t is predicted based on its p most recent lagged values.

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$

Where,

  • c is a constant
  • φ_1, φ_2, ..., φ_p are the autoregressive coefficients for the respective lagged terms y_(t-1), y_(t-2), ..., y_(t-p)
  • ε_t represents the error term

In essence, an AR(p) model considers the influence of the p previous observations on the current value. The choice of p depends on the specific time series data and is often determined using methods like information criteria or examination of autocorrelation and partial autocorrelation plots.

The higher the order p, the more complex the model becomes, capturing more historical information but also potentially becoming more prone to overfitting. Therefore, it's essential to strike a balance and select an appropriate p based on the data characteristics and model diagnostics.


Autoregression vs Autocorrelation


Now, let us find out the difference between autoregression and autocorrelation in a simplified manner below.

| Aspect | Autoregression | Autocorrelation |
|---|---|---|
| Modelling | Incorporates past observations to predict future values. | Describes the linear relationship between a variable and its lags. |
| Output | Model coefficients (lags) and forecasted values. | Correlation coefficients at various lags. |
| Diagnostics | ACF and PACF plots to determine model order. | ACF plot to visualise autocorrelation at different lags. |
| Applications | Stock price forecasting, weather prediction, etc. | Signal processing, econometrics, quality control, etc. |


Autoregression vs Linear Regression

Now, let us see the difference between autoregression and linear regression below.

| Aspect | Autoregression | Linear Regression |
|---|---|---|
| Model Type | Specifically for time series data where past values predict the future. | Generalised for any data with independent and dependent variables. |
| Predictors | Past values of the same variable (lags). | Independent variables can be diverse (not necessarily past values). |
| Purpose | Forecasting future values based on historical data. | Predicting an outcome based on one or more input variables. |
| Assumptions | Time series stationarity, no multicollinearity among lags. | Linearity, independence, homoscedasticity, no multicollinearity. |
| Diagnostics | ACF and PACF mainly. | Residual plots, Quantile-Quantile plots, etc. |
| Applications | Stock price prediction, economic forecasting, etc. | Marketing analytics, medical research, machine learning, etc. |


Autoregression vs spatial autoregression

Further, let us figure out the difference between autoregression and spatial autoregression.

| Feature | Autoregressive (AR) | Spatial Autoregression (SAR) |
|---|---|---|
| Focus | Temporal dependence: How a variable at a given time point depends on its own past values | Spatial dependence: How a variable at a specific location depends on the values of the same variable at neighboring locations |
| Model structure | AR(p): Y_t = φ_1 * Y_(t-1) + ... + φ_p * Y_(t-p) + ε_t | SAR: Y_i = β * Y_(i-neighbors) + γ * AR(p) term + ε_i |
| Applications | Forecasting future values, analyzing time series trends | Identifying spatial patterns, modeling spillover effects, understanding spatial diffusion |
| Examples | Predicting daily temperature (Y_t) based on its values from the previous 3 days (AR(3)) | Modeling house price (Y_i) influenced by average price in surrounding neighborhood (Y_(i-neighbors)) and historical price trends (AR(p) term) |
| Complexity | Relatively simple | More complex due to defining spatial weight matrix and potential interaction with AR component |
| Combining models | AR can be incorporated into SAR | Not applicable |
| Choice of model | Depends on data nature and research question | More suitable for data with spatial dependence |


Autocorrelation Function and Partial Autocorrelation Function

Let's walk through how to create Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots using Python's statsmodels library and then interpret them with examples.⁽²⁾ ⁽³⁾ ⁽⁴⁾

Step 1: Install Required Libraries

First, ensure you have the necessary libraries installed:

Step 2: Import Libraries

Step 3: Create Sample Time Series Data

Let's create a simple synthetic time series for demonstration:

Step 4: Plot ACF and PACF

Now, plot the ACF and PACF plots for the time series:

Output:

Figure: Autocorrelation function (ACF) and partial autocorrelation function (PACF) plots of the sample time series

Interpretation

ACF Plot:

  • Observations at lag 1, 2, etc., are significantly correlated with the original series. This means values at consecutive time steps show a noticeable pattern of relationship.
  • The ACF decreases gradually, suggesting persistence or a trend in the data.

The ACF measures the correlation between a time series and its lagged values. A decreasing ACF suggests that the relationship between today's value and its past values weakens as the lag increases.

PACF Plot:

  • The PACF drops off after lag 1, indicating that observations beyond the first lag are not significantly correlated with the original series after controlling for the effect of intervening lags.

Hence, when we look at the Partial Autocorrelation Function (PACF) plot, we see that the correlation between our data point and its immediate previous point (lag 1) is strong. However, after that, the correlation with even earlier points (like lag 2, lag 3, etc.) becomes less important.

  • This suggests that an autoregressive model of order 1 (AR(1)) may be appropriate for modelling this time series. The data is mainly influenced by its most recent value, so a simple model that looks just one step back may be a good fit.

By examining the ACF and PACF plots and their significant lags, you can gain insights into the temporal dependencies within the time series and make informed decisions about model specification in Python.


Steps to build an autoregressive model

Building an autoregressive model involves several steps to ensure that the model is appropriately specified, validated, and optimised for forecasting. Here are the steps to build an autoregressive model:

Step 1: Data Collection

  • Gather historical time series data for the variable of interest.
  • Ensure the data covers a sufficiently long period and is consistent in terms of frequency (e.g., daily, monthly).

Step 2: Data Exploration and Visualisation

  • Plot the time series data to visualise trends, seasonality, and any other patterns.
  • Check for outliers or missing values that may require preprocessing.

Step 3: Data Preprocessing

  • Ensure the data is stationary. If not, apply differencing techniques or transformations (e.g., logarithmic) to achieve stationarity.
  • Handle missing values using appropriate methods such as interpolation or imputation.

Step 4: Model Specification

  • Determine the appropriate lag order (p) based on the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots.
  • Decide on the inclusion of any exogenous variables or external predictors that may improve the model's forecasting ability.

Step 5: Model Estimation

  • Use estimation techniques such as ordinary least squares (OLS) or maximum likelihood to estimate the model parameters.
  • Consider using regularisation techniques like ridge regression if multicollinearity is a concern.

Step 6: Model Validation

  • Split the data into training and validation sets.
  • Fit the model on the training data and validate its performance on the validation set.
  • Use metrics such as Mean Absolute Error (MAE), Root Mean Square Error (RMSE), or forecast accuracy to assess the model's predictive accuracy.
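The validation step can be sketched as a chronological split followed by one-step-ahead forecasts on the held-out data. The example below is a minimal NumPy-only illustration on a simulated AR(1) series. The 80/20 split and the comparison against a naive "training mean" baseline are assumptions added for this example.

```python
import numpy as np

rng = np.random.default_rng(5)
y = np.empty(600)
y[0] = 10.0
for t in range(1, len(y)):
    y[t] = 2.0 + 0.8 * y[t - 1] + rng.normal()

split = int(len(y) * 0.8)             # chronological split: never shuffle a time series
train, test = y[:split], y[split:]

# Fit AR(1) on the training set only: regress y_t on [1, y_(t-1)]
X_train = np.column_stack([np.ones(len(train) - 1), train[:-1]])
c_hat, phi_hat = np.linalg.lstsq(X_train, train[1:], rcond=None)[0]

# One-step-ahead forecasts: each prediction uses the actual previous observation
previous = y[split - 1 : -1]
pred = c_hat + phi_hat * previous

mae = np.mean(np.abs(test - pred))
rmse = np.sqrt(np.mean((test - pred) ** 2))
rmse_naive = np.sqrt(np.mean((test - train.mean()) ** 2))  # baseline: always predict the training mean

print(f"MAE = {mae:.2f}, RMSE = {rmse:.2f}, naive RMSE = {rmse_naive:.2f}")
```

Comparing against a simple baseline helps judge whether the AR model adds any predictive value beyond what a constant forecast would give.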

Step 7: Model Refinement

  • If the model performance is unsatisfactory, consider adjusting the lag order, incorporating additional predictors, or applying transformation techniques.
  • Conduct residual analysis to diagnose any remaining issues such as autocorrelation or heteroscedasticity.

Step 8: Model Deployment and Forecasting

  • Once satisfied with the model's performance, deploy it to make forecasts for future time periods.
  • Continuously monitor and evaluate the model's forecasts against actual outcomes to assess its ongoing reliability and relevance.

Step 9: Documentation and Communication

  • Document the model's specifications, assumptions, and validation results.
  • Communicate the model's findings, limitations, and implications to stakeholders or end-users.

By following these steps systematically and iteratively refining the model as needed, you can develop a robust autoregressive model tailored to the specific characteristics and requirements of your time series data.


Example of autoregressive model in Python for trading

Below is a step-by-step example demonstrating how to build an autoregressive (AR) model for time series forecasting in trading using Python. We'll use historical stock price data for Bank of America Corp (ticker: BAC) and the statsmodels library to construct the AR model.⁽⁵⁾

Let us now see the steps in Python below.

Step 1: Install Required Packages

If you haven't already, install the necessary Python packages:

Step 2: Import Libraries

Step 3: Load Historical Stock Price Data

Output:

Figure: WEAT price autoregression analysis

Step 4: Train the AR model using ARIMA

Let us train the AR(1) model using the ARIMA method from the statsmodels library.⁽³⁾

The ARIMA method can be imported as below.

Using the ARIMA method, the autoregressive model can be trained as

$$ARIMA(data, (p, d, q))$$

where,

  • p is the AR parameter that needs to be defined.
  • d is the difference parameter. This will be zero in case of AR models. You will learn about this later.
  • q is the MA parameter. This will also be zero in case of an AR model. You will learn about this later.

Hence, the autoregressive model can be trained as

$$ARIMA(data, (p, 0, 0))$$

Output:

const     11.55
ar.L1      1.00
sigma2     0.05
dtype: float64

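From the output above, you can see that

const = 11.55
φ_1 = 1.00 (rounded)

Note that the ARIMA class in statsmodels reports const as the estimated mean of the series rather than the intercept c of the AR equation. The two are related by c = const × (1 − φ_1).

Therefore, the model becomes

$$\text{AR(1)}: \quad y_t = 11.55 + 1.00\,(y_{t-1} - 11.55)$$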

Step 5: Evaluate model performance

Output:

The Mean Absolute Error is 0.28
The Mean Squared Error is 0.12
The Root Mean Squared Error is 0.34
The Mean Absolute Percentage Error is 4.93
Figure: Autoregression model performance (predicted vs observed values, residuals, and residual autocorrelation)
  • From the first plot above, you can see that the predicted values are close to the observed value.
  • From the second plot above, you can see that the residuals are random and are more negative than positive. Hence the model made higher predictions in most cases.
  • From the third plot above, you can see that there is no autocorrelation between the residuals as all the points lie within the blue region.
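The four error metrics reported above can be computed with a few lines of NumPy. The helper name `forecast_errors` and the three toy data points are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
import numpy as np

def forecast_errors(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (in percent) for a set of forecasts."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = actual - predicted          # negative residual = prediction too high
    mse = np.mean(residuals ** 2)
    return {
        "MAE": np.mean(np.abs(residuals)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAPE": np.mean(np.abs(residuals / actual)) * 100,  # requires non-zero actual values
    }

# Toy example: residuals are -1, +2 and -4
metrics = forecast_errors([10.0, 20.0, 40.0], [11.0, 18.0, 44.0])
for name, value in metrics.items():
    print(f"The {name} is {value:.2f}")
```

Here the absolute errors are 1, 2 and 4, so the MAE is 7/3 and the MSE is (1 + 4 + 16)/3 = 7. Each prediction is off by 10% of the actual value, which gives a MAPE of 10.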

**Note: You can log into quantra.quantinsti.com and enrol in the course on  Financial Time Series to find out the detailed autoregressive model in Python.**

Note that the predicted prices may at times be above or below the actual prices.

Here are a couple of reasons why the predicted prices may fall below the actual prices:

  • Underestimation: The model underestimates the future values of the stock prices, indicating that it might not fully capture the underlying trends, patterns, or external factors influencing the stock price movement.
  • Model Accuracy: The predictive accuracy of the AR model may be suboptimal, suggesting potential limitations in the model's specification or the need for additional explanatory variables.

Also, here are some reasons why the predicted prices may be above the actual prices:

  • Model Misspecification: The AR model's assumptions or specifications may not align with the true data-generating process, leading to biased forecasts.
  • Lag Selection: Incorrectly specifying the lag order in the AR model can result in misleading predictions. Including too many or too few lags may distort the model's predictive accuracy.
  • Missed Trends or Seasonality: The AR model may not adequately capture underlying trends, seasonality, or other temporal patterns in the data, leading to inaccurate predictions.
  • External Factors: Unaccounted external variables or events that influence the time series but are not included in the model can lead to discrepancies between predicted and actual prices.
  • Data Anomalies: Outliers, anomalies, or sudden shocks in the data that were not accounted for in the model can distort the predictions, especially if the model is sensitive to extreme values.
  • Stationarity Assumption: If the time series is not stationary, applying an AR model can produce unreliable forecasts. Stationarity is a key assumption for the validity of AR models.

Hence, you may need to perform additional data preprocessing, model diagnostics, and validation to develop a robust trading model.


Applications of autoregression model in trading

Autoregression (AR) models have been applied in various ways within the realm of trading and finance. Here are some applications of autoregression in trading:

  • Technical Analysis: Traders often use autoregressive models to analyse historical price data and identify patterns or trends that might indicate potential future price movements. For instance, if there's a strong autocorrelation between today's price and yesterday's price, traders might expect a continuation of the trend.
  • Risk Management: Autoregression can be used to model and forecast volatility in financial markets. By understanding past volatility patterns, traders can better manage their risk exposure and make informed decisions about position sizing and leverage.
  • Pairs Trading: In pairs trading, traders identify two assets that historically move together (have a cointegrated relationship). Autoregressive models can help in understanding the historical relationship between the prices of these assets and formulating trading strategies based on deviations from their historical relationship.
  • Market Microstructure: Autoregression can be used to model the behaviour of individual market participants, such as high-frequency traders or market makers. Understanding the trading strategies and patterns of these participants can provide insights into market dynamics and liquidity provision.

Common challenges of autoregression models

Following are common challenges of the autoregression model:

  • Overfitting: Autoregressive models can become too complex and fit the noise in the data rather than the underlying trend or pattern. This can lead to poor out-of-sample performance and unreliable forecasts.
  • Stationarity: Many financial time series exhibit non-stationary behaviour, meaning their statistical properties (like mean and variance) change over time. Autoregressive models assume stationarity, so failure to account for non-stationarity can result in inaccurate model estimates.
  • Model Specification: Determining the appropriate lag order (p) in an autoregressive model is challenging. Too few lags might miss important information, while too many lags can introduce unnecessary complexity.
  • Multicollinearity: In models with multiple lagged terms, there can be high correlation among the predictors (lagged values). This multicollinearity can destabilise coefficient estimates and make them sensitive to small changes in the data.
  • Seasonality and Periodicity: Autoregressive models might not capture seasonal patterns or other periodic effects present in the data, leading to biased forecasts.
  • Model Validation: Proper validation techniques, such as out-of-sample testing, are crucial for assessing the predictive performance of autoregressive models. Inadequate validation can result in overly optimistic performance estimates.
  • Computational Complexity: As the number of lagged terms increases, the computational complexity of estimating the model parameters also increases, which can be problematic for large datasets.

Tips for optimising autoregressive model performance

Now, let us see some tips for optimising the autoregressive model’s performance below.

  • Data Preprocessing: Ensure the data is stationary or apply techniques like differencing to achieve stationarity before fitting the autoregressive model.
  • Model Selection: Use information criteria (e.g., AIC, BIC) or cross-validation techniques to select the appropriate lag order (p) and avoid overfitting.
  • Regularisation: Consider using regularisation techniques like ridge regression or LASSO to mitigate multicollinearity and stabilise coefficient estimates.
  • Include Exogenous Variables: Incorporate relevant external factors or predictors that might improve the model's forecasting accuracy.
  • Model Diagnostics: Conduct thorough diagnostics, such as examining residuals for autocorrelation, heteroscedasticity, and other anomalies, to ensure the model's assumptions are met.
  • Ensemble Methods: Combine multiple autoregressive models or integrate with other forecasting methods (e.g., moving averages, exponential smoothing) to leverage the strengths of each approach.
  • Continuous Monitoring and Updating: Financial markets and economic conditions evolve over time. Regularly re-evaluate and update the model to incorporate new data and adapt to changing dynamics.
  • Domain Knowledge: Incorporate domain expertise and market insights into the model-building process to ensure the model captures relevant patterns and relationships in the data.
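As an illustration of the regularisation tip above, the sketch below estimates AR coefficients with a ridge penalty using the closed-form solution. The helper name `fit_ar_ridge`, the penalty value `alpha=50.0` and the simulated series are assumptions for this example. In practice the penalty would be tuned, for instance by out-of-sample validation.

```python
import numpy as np

def fit_ar_ridge(x, p, alpha):
    """Estimate AR(p) coefficients with an L2 (ridge) penalty of strength alpha."""
    x = np.asarray(x, dtype=float)
    lags = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    target = x[p:]
    # Centre the data so that the intercept is not penalised
    lag_means, target_mean = lags.mean(axis=0), target.mean()
    Xc, yc = lags - lag_means, target - target_mean
    # Closed-form ridge solution: (X'X + alpha * I)^(-1) X'y
    phi = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    c = target_mean - lag_means @ phi
    return c, phi

rng = np.random.default_rng(9)
x = np.zeros(300)
for t in range(1, len(x)):
    x[t] = 0.8 * x[t - 1] + rng.normal()

# Deliberately over-specified model (8 lags) to show the shrinkage effect
_, phi_ols = fit_ar_ridge(x, p=8, alpha=0.0)     # alpha = 0 reduces to ordinary least squares
_, phi_ridge = fit_ar_ridge(x, p=8, alpha=50.0)

print("OLS coefficient norm:  ", round(float(np.linalg.norm(phi_ols)), 3))
print("Ridge coefficient norm:", round(float(np.linalg.norm(phi_ridge)), 3))
```

The ridge penalty shrinks the coefficient vector towards zero, which trades a little bias for more stable estimates when the lagged predictors are highly correlated.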

By addressing these challenges and following the optimisation tips, practitioners can develop more robust and reliable autoregressive models for forecasting and decision-making in trading and finance.


Conclusion

Utilising time series modelling, specifically Autoregression (AR), offers insights into predicting future values based on historical data. We comprehensively covered the AR model, its formula, calculations, and applications in trading.

By understanding the nuances between autoregression, autocorrelation, and linear regression, traders can make informed decisions, optimise model performance, and navigate challenges in forecasting financial markets. Last but not least, continuous monitoring, model refinement, and incorporating domain knowledge are vital for enhancing predictive accuracy and adapting to dynamic market conditions.

You can learn more in our course on Financial Time Series Analysis for Trading, which covers the analysis of financial time series in detail. With this course, you will learn the concepts of Time Series Analysis and also how to implement them in live trading markets. Starting from basic AR and MA models, to advanced models like SARIMA, ARCH and GARCH, this course will help you learn it all. You will also be able to apply time series analysis to data exhibiting characteristics like seasonality and non-constant volatility.


Author: Chainika Thakar (Originally written by Satyapriya Chaudhari)


Note: The original post has been revamped on 25th January 2024 for accuracy, and recentness.

Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.
