AutoRegressive Moving Average (ARMA) models: A Comprehensive Guide

By José Carlos Gonzáles Tanaka

The ARMA model is one of the most powerful econometric models for trading. Here you will find a comprehensive guide. The first part will walk you through the theoretical aspects of the different versions of the model. Part 2 will concentrate on the application of the model in Python and Part 3 will do the same in R. Let's learn about ARMA modelling!

In this article, we will delve into AutoRegressive Moving Average (ARMA) models, covering topics such as:

Stationarity
Lag Operators
Moving average processes and Invertibility
Autoregressive process and Stationarity
Is ARMA a linear model?
Is the ARMA better than just AR or MA?
What is the difference between an ARMA and an ARIMA model?

We will explore how ARMA models serve as a fundamental tool for time series analysis, balancing simplicity and power for forecasting and understanding time series data structure.

This blog is for you if you are motivated by:

Ideation: Delving into the theoretical underpinnings of ARMA models and their place within time series analysis.
Implementation: Learning to construct and utilise ARMA models for practical applications in forecasting.
Comparative Analysis: Understanding the nuances of ARMA compared to AR, MA, and ARIMA models.

Reading Level: Intermediate

Prerequisites:

To fully benefit from this content, it is recommended to follow a structured learning path covering time series fundamentals, stationarity, mean reversion, and multivariate modeling techniques before diving into ARMA.
Start with Introduction to Time Series to grasp the core concepts of trend analysis, seasonality, and autocorrelation.
If you’re interested in a deep learning alternative to traditional time series methods, explore Time Series vs LSTM Models to understand the differences between statistical and neural network-based forecasting techniques.
Since ARMA models require stationary data, it's crucial to study Stationarity to learn how to transform non-stationary series.
Complement this with The Hurst Exponent to analyze long-term dependencies in time series data, and Mean Reversion in Time Series to understand reversion-to-the-mean behavior, a key assumption in many ARMA-based trading strategies.
Once you’re comfortable with these concepts, progress to more advanced econometric techniques. Vector Autoregression (VAR) introduces multivariate time series modeling, while Johansen Cointegration explains how multiple asset prices move together over time.
If you're interested in dynamic forecasting, Time-Varying-Parameter VAR (TVP-VAR) explores stochastic volatility and model adaptability.
This structured roadmap ensures you gain the necessary theoretical and practical background to fully grasp ARMA models. Basic proficiency in R or Python is also recommended for implementing these models in time series forecasting. If you're new to Python, start with Basics of Python Programming. Additionally, the Python for Trading: Basic free course provides a structured approach to learning Python.

Stationarity

If you are a trader who is new to autoregressive moving average (ARMA) models, the first thing you should know is that you can only fit an ARMA model to a time series that is stationary.

How?

Well, if the time series has a trend, then you should remove the trend from the series. This process is called detrending. If instead the time series has to be differenced to become stationary, i.e., you need to subtract the previous value from the current value, then the process is called differencing.

The process of differencing a time series is the following: If you have a time series named Y which is I(1), i.e. it has an order of integration of 1, then you need to difference the time series once, as follows:

$$\Delta Y_{t} = Y_{t}-Y_{t-1}$$

Where

$ \Delta Y_{t} \text{: is stationary.}$

If the time series Y is I(2), i.e. it has an order of integration of 2, then you need to difference the time series twice, as follows:

$$\Delta Y_{t} = Y_{t}-Y_{t-1}$$ $$\Delta^2 Y_{t} = \Delta Y_{t} - \Delta Y_{t-1}$$

Where

$ Y_{t}\text{: is I(2).}$

$ \Delta Y_{t}\text{: is I(1).}$

$ \Delta^2 Y_{t}\text{: is I(0), i.e., it's stationary.}$

Now, you can guess that if a time series is I(d), then you have to difference the time series “d” times. This ‘d’ is called the order of integration of the observed time series.
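As a minimal sketch of the mechanics, the differencing operator can be applied d times with a few lines of plain Python. The function name `difference` and the toy series are illustrative assumptions; the series is a deterministic quadratic chosen only because its differences are easy to check by hand, not a real I(2) price series.

```python
def difference(series, d=1):
    """Apply the first-difference operator d times to a list of numbers."""
    values = list(series)
    for _ in range(d):
        # Each pass computes Y_t - Y_{t-1} and shortens the series by one point.
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# Hypothetical toy series: a quadratic trend, which needs two differences.
y = [t ** 2 for t in range(6)]   # [0, 1, 4, 9, 16, 25]
print(difference(y, d=1))        # [1, 3, 5, 7, 9]
print(difference(y, d=2))        # [2, 2, 2, 2]
```

Note that every round of differencing costs you one observation at the start of the sample.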

How do you determine the order of integration of any time series?

Econometric tools make this easy to compute: you apply a unit root test. There are several unit root tests available, the most famous being the Augmented Dickey-Fuller (ADF) test. The algorithm to find the order of integration goes like this:

Imagine you have a time series called Y, then:

You apply the ADF to Y and:
If you reject the null hypothesis, then the process is I(0), i.e., Y is stationary.
If you don’t reject the null hypothesis, then you continue
You apply the ADF to the first difference and:
If you reject the null hypothesis, then Y is I(1), i.e., d(Y), or the first difference of Y, is stationary.
If you don’t reject the null hypothesis, then you continue
You apply the ADF to the second difference, third, etc. until you get to reject the null hypothesis.
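The loop above can be sketched in plain Python. To keep the example self-contained, it swaps the ADF test for a simplified Dickey-Fuller regression with a constant and no lagged differences, and compares the t-statistic with -2.86, the approximate large-sample 5% critical value for that case. Both simplifications, as well as the names `df_tstat` and `integration_order`, are assumptions made for illustration; in practice you would rely on a library routine such as `adfuller` from statsmodels.

```python
import math
import random


def df_tstat(series):
    """t-statistic of rho in the regression dY_t = a + rho * Y_{t-1} + e_t."""
    x = series[:-1]                                   # Y_{t-1}
    dy = [b - a for a, b in zip(series, series[1:])]  # Y_t - Y_{t-1}
    n = len(dy)
    x_bar, dy_bar = sum(x) / n, sum(dy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    rho = sxy / sxx
    a = dy_bar - rho * x_bar
    ssr = sum((di - a - rho * xi) ** 2 for xi, di in zip(x, dy))
    se_rho = math.sqrt(ssr / (n - 2) / sxx)
    return rho / se_rho


def integration_order(series, critical_value=-2.86, max_d=3):
    """Difference until the unit-root null is rejected; return the number of differences."""
    values = list(series)
    for d in range(max_d + 1):
        if df_tstat(values) < critical_value:   # reject the unit-root null
            return d
        values = [b - a for a, b in zip(values, values[1:])]
    raise ValueError("unit-root null not rejected within max_d differences")


# Simulated data, for illustration only.
rng = random.Random(42)
noise = [rng.gauss(0, 1) for _ in range(500)]
walk, level = [], 0.0
for shock in noise:
    level += shock
    walk.append(level)

print(integration_order(noise))  # white noise needs no differencing
print(integration_order(walk))   # a random walk is typically found to be I(1)
```

Because the test has a 5% significance level, a true random walk is occasionally classified as I(0); this is a property of the test, not of the loop.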

You can check these articles about stationarity and the ADF unit root test to learn more.

To understand stationarity better, and to do the math of stationary processes, it's useful to first learn about lag operators.

Lag Operators

Time series can be identified as a trend:

$$Y_{t} = t$$

Where t is time.

Or it can be understood as a constant:

$$Y_{t} = c$$

Or it can be described as a Gaussian white noise process (or any other distribution):

$$Y_{t} = \epsilon_t$$

To sum up, we can describe the time series y(t) as a function of something else, such as y = f(x) or y = w(x,z). Here f() would be an operator whose input is the series x, and w() an operator whose inputs are the series x and z.

A time series “operator” allows us to transform a time series Y into a new time series.

We can have a multiplication operator for y(t):

$$Y_{t} = \beta t$$

Or an addition operator:

$$Y_{t} = x_t + z_t$$

Now, let’s look at the lag operator.

So, imagine we have the following representation of y(t):

$$Y_{t} = x_{t-1}$$

You can apply a lag operator to the whole time series x(t). The representation is going to use the letter “L” in this way:

$$L x_{t} = x_{t-1}$$

If you would like to have x in time (t-2), you would do something like this:

$$L(Lx_{t}) = x_{t-2}$$

This double L can also be represented as

$$L^2 x_{t} = x_{t-2}$$

Generally speaking, you can write as follows:

$$L^kx_{t} = x_{t-k}$$

For example:

$$L^5 x_{t} = x_{t-5}$$

We’ll learn more about the importance of the lag operator in the following sections. It will be useful for writing out ARMA model examples.
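If it helps to see the lag operator as code, here is a small sketch on a plain Python list. The function name `lag` and the choice of `None` for values that fall before the start of the sample are assumptions made for this illustration.

```python
def lag(series, k=1):
    """Apply L^k: position t of the result holds x_{t-k}; the first k slots are unknown."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if k == 0:
        return list(series)
    return [None] * min(k, len(series)) + list(series[:-k])


x = [10, 11, 12, 13, 14]
print(lag(x, 1))                       # [None, 10, 11, 12, 13]
print(lag(lag(x, 1), 1) == lag(x, 2))  # True: applying L twice equals L^2
```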

Moving average processes and Invertibility

From now on, you'll learn some basic ARMA model equations.

The first-order moving average process, also known as MA(1) can be mathematically described as

$$Y_{t} = \mu+\epsilon_{t}+\theta\epsilon_{t-1}$$

Where:

$ Y_{t} \text{: The asset price time series you want to model.}$

$ \epsilon_{t} \text{: An independently and identically distributed (i.i.d.) random time series with mean 0 and variance } \sigma^2\text{.}$

$ \epsilon_{t-1} \text{: The first lag of the previous random time series } \epsilon\text{.}$

$ \theta \text{: The estimator/parameter of } \epsilon_{t-1}\text{.}$

A bit of calculation (see Hamilton, 1994) gives the following properties:

$E(Y_t) = mean(Y_t) = \mu$

$E(Y_t-\mu)^2 = \gamma_0 = (1+\theta^2)\sigma^2$

The first autocovariance is:

$$E(Y_t-\mu)(Y_{t-1}-\mu) = \theta\sigma^2$$

Higher autocovariances are equal to zero.

The first autocorrelation is given by:

$$\rho_1 = \frac{\theta\sigma^2}{(1+\theta^2)\sigma^2} = \frac{\theta}{(1+\theta^2)}$$

Higher autocorrelations are equal to zero.
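To make these MA(1) formulas concrete, the following sketch evaluates them for given parameter values. The function name `ma1_moments` and the numbers plugged in are hypothetical, chosen only for illustration.

```python
def ma1_moments(mu, theta, sigma2):
    """Theoretical mean, variance, first autocovariance and first autocorrelation of an MA(1)."""
    variance = (1 + theta ** 2) * sigma2
    gamma_1 = theta * sigma2
    return {"mean": mu, "variance": variance, "gamma_1": gamma_1,
            "rho_1": gamma_1 / variance}


# Hypothetical parameter values.
print(ma1_moments(mu=0.0, theta=0.5, sigma2=1.0)["rho_1"])  # 0.4
print(ma1_moments(mu=0.0, theta=2.0, sigma2=1.0)["rho_1"])  # 0.4
```

Notice that theta = 0.5 and theta = 2 (its reciprocal) produce the same first autocorrelation. This is one reason why the invertibility condition discussed later in this section matters: it picks one of the two representations.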

The qth-order moving average process, MA(q) is characterized by:

$$Y_t = \mu + \epsilon_t + \theta_1\epsilon_{t-1} + \theta_2\epsilon_{t-2} + … + \theta_q\epsilon_{t-q}$$

Can you guess what the mean would be for this process? Since for any lag of the error, the mean is always zero, then you get:

$$E(Y_t) = mean(Y_t) = \mu$$

The variance, i.e., the autocovariance at lag 0, is:

$$\gamma_0 = \sigma^2\left(1+\theta_1^2+\theta_2^2+ … + \theta_q^2\right)$$

And the following autocovariance functions can be described as

$\gamma_j = \left(\theta_j + \theta_{j+1}\theta_1 + \theta_{j+2}\theta_2 + \dots + \theta_{q}\theta_{q-j}\right)\sigma^2 \text{, for j = 1, 2, …, q}$
$\gamma_j = 0 \text{, for }j>q$

For example, for an MA(2) process:

$\gamma_0 = \left(1+\theta_1^2+\theta_2^2\right)\sigma^2$
$\gamma_1 = \left(\theta_1+\theta_{2}\theta_1\right)\sigma^2$
$\gamma_2 = \theta_{2}\sigma^2$
$\gamma_3 = \gamma_4 = … = 0$
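The MA(q) autocovariances can be computed for any q with a short loop. This is a sketch under the convention that the coefficient on the current shock is 1; the function name `ma_autocovariances` and the MA(2) coefficients used below are hypothetical.

```python
def ma_autocovariances(thetas, sigma2=1.0, max_lag=None):
    """Autocovariances gamma_0..gamma_max_lag of an MA(q) with coefficients theta_1..theta_q."""
    psi = [1.0] + list(thetas)      # theta_0 = 1 multiplies the current shock
    q = len(thetas)
    if max_lag is None:
        max_lag = q + 1
    gammas = []
    for j in range(max_lag + 1):
        if j > q:
            gammas.append(0.0)      # no shared shocks beyond lag q
        else:
            gammas.append(sigma2 * sum(psi[i] * psi[i + j] for i in range(q - j + 1)))
    return gammas


# Hypothetical MA(2) with theta_1 = 0.5, theta_2 = 0.25 and sigma^2 = 1.
print(ma_autocovariances([0.5, 0.25]))  # [1.3125, 0.625, 0.25, 0.0]
```

The printed values match the MA(2) formulas: 1 + 0.25 + 0.0625 for the variance, 0.5 + 0.25 × 0.5 for lag 1, 0.25 for lag 2, and zero afterwards.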

Would the MA(q) process of any order q be stationary?

Yes! The reason is that the MA(q) model is a finite weighted sum of error terms which are i.i.d. with finite mean and variance, so its mean, variance and autocovariances do not depend on time. Thus, an MA(q) model will always be stationary.

Let’s now talk about Invertibility.

An MA model is invertible if you can convert it into an infinite AR model for the asset price time series.

How? Let’s see:

Consider a MA(1) model:

$$Y_t - \mu = \epsilon_t + \theta\epsilon_{t-1}$$

The model can be rewritten using a lag operator as

$$Y_t - \mu = (1+\theta L)\epsilon_t$$

With

$E(\epsilon_t\epsilon_{\tau}) = \sigma^2 \text{, for t = }\tau\text{, 0 otherwise.}$

Provided that theta in absolute value is less than one, you can convert this model into

$$\left(1 - \theta L + \theta^2 L^2 - \theta^3 L^3 + \dots \right)\left(Y_t - \mu\right) = \epsilon_t$$

Which is an infinite autoregressive model. Whenever you estimate a MA or ARMA model, you have to ascertain that the model is invertible.
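The inversion of (1 + theta L) can be sketched numerically. Its weights are the powers of -theta, so they alternate in sign and shrink only when theta is below one in absolute value. In the example, the function names, the toy shocks, and the assumption that the shock before the sample is zero are all illustrative choices.

```python
def ar_infinity_weights(theta, n_terms):
    """Coefficients of (1 + theta L)^(-1) = sum_j (-theta)^j L^j, truncated at n_terms."""
    if abs(theta) >= 1:
        raise ValueError("MA(1) is not invertible when |theta| >= 1")
    return [(-theta) ** j for j in range(n_terms)]


def recover_shock(y_minus_mu, theta, n_terms=50):
    """Approximate epsilon_t at the last date from current and past values of Y_t - mu."""
    weights = ar_infinity_weights(theta, n_terms)
    recent_first = list(reversed(y_minus_mu))
    return sum(w * y for w, y in zip(weights, recent_first))


# Hypothetical shocks; the pre-sample shock is assumed to be zero.
theta = 0.5
shocks = [1.0, -0.5, 0.25, 2.0, -1.0]
y = [shocks[0]] + [e + theta * e_prev for e_prev, e in zip(shocks, shocks[1:])]

print(ar_infinity_weights(theta, 4))  # [1.0, -0.5, 0.25, -0.125]
print(recover_shock(y, theta))        # -1.0, the last shock
```

With theta at or above one in absolute value, the weights would not die out, and past values of the series could not be used to recover the current shock.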

At this stage, you may wonder: What is an autoregressive model?

Read on!

Autoregressive process and Stationarity

Let’s first begin with the first-order autoregressive model, also known as the AR(1) model.

$$Y_t = c + \phi Y_{t-1} + \epsilon_t$$

Where

$Y_t\text{: The asset price time series at time t}$

$c\text{: constant}$

$Y_{t-1}\text{: The asset price time series at time }t-1\text{, i.e. the first lag of }Y_t$

$\phi\text{: The parameter (coefficient) of }Y_{t-1}$

$\epsilon_t\text{: The error term that follows an i.i.d. distribution with mean zero and variance }\sigma^2$

Is this AR(1) model stationary?

Well, you know that financial time series are not always stationary. In fact, they’re often non-stationary. If you know that the true process of an asset price time series is an AR(1), you can do the following conversion to check whether the time series is stationary. We’re going to use the lag operator for this purpose:

$Y_t = c + \phi L Y_t + \epsilon_t$
$Y_t - \phi L Y_t = c + \epsilon_t$
$\left(1-\phi L\right)Y_t = c + \epsilon_t$

Where

$(1-\phi L)\text{: The characteristic polynomial.}$

The stationarity of the process depends on the value of phi, which determines the root of this polynomial.

In order to analyze the stationarity of this AR(1) process, you need to find the root lambda of the first-order characteristic polynomial. You can write it in two equivalent ways:

$\lambda-\phi=0$
$1-\phi\lambda=0$

In the first equation, lambda equals phi, and the model is stationary if lambda is lower than 1 in absolute value. In the second equation, lambda equals 1/phi, and the model is stationary if lambda is higher than 1 in absolute value.

Thus, if phi in absolute value is less than one, then the AR(1) process is stationary. If it’s equal to or higher than one, we say the process is not stationary.

Let’s look at the second-order autoregressive model, also known as AR(2) model.

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \epsilon_t$$

Where

$Y_t\text{: The asset price time series at time t}$

$c\text{: constant}$

$Y_{t-1}\text{: The asset price time series at time }t-1\text{, i.e. the first lag of }Y_t$

$\phi_1\text{: The parameter (coefficient) of }Y_{t-1}$

$Y_{t-2}\text{: The asset price time series at time }t-2\text{, i.e. the second lag of }Y_t$

$\phi_2\text{: The parameter (coefficient) of }Y_{t-2}$

$\epsilon_t\text{: The error term that follows an i.i.d. distribution with mean zero and variance }\sigma^2$

Let’s check for stationarity.

$$Y_t - \phi_1 Y_{t-1} - \phi_2 Y_{t-2} = c + \epsilon_t$$ $$\left(1-\phi_1 L - \phi_2 L^2\right) Y_t = c + \epsilon_t$$

Then, we convert this equation to its characteristic polynomial as:

$$\lambda^2 - \phi_1 \lambda - \phi_2 = 0$$

We know that the solution to this polynomial is:

$$\lambda_1, \lambda_2 = \frac{\phi_1 \pm \sqrt{\phi_1^2 + 4 \phi_2}}{2}$$

If both lambdas are less than 1 in absolute value, or if they’re complex numbers and their modulus is less than 1, then the model is stationary.
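The root formula and the stationarity rule can be checked with a short sketch that uses the standard `cmath` module, so real and complex roots are handled in the same way. The function names and the coefficient pairs tried below are hypothetical.

```python
import cmath


def ar2_roots(phi1, phi2):
    """Roots of lambda^2 - phi1 * lambda - phi2 = 0 (possibly complex)."""
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    return (phi1 + disc) / 2, (phi1 - disc) / 2


def is_stationary_ar2(phi1, phi2):
    """An AR(2) is stationary when both roots lie strictly inside the unit circle."""
    return all(abs(root) < 1 for root in ar2_roots(phi1, phi2))


# Hypothetical coefficient pairs.
print(is_stationary_ar2(0.5, 0.3))   # True: two real roots inside the unit circle
print(is_stationary_ar2(1.0, -0.5))  # True: complex roots with modulus below 1
print(is_stationary_ar2(0.5, 0.6))   # False: one root is larger than 1
print(is_stationary_ar2(1.0, 0.0))   # False: a root equal to 1 (random walk)
```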

You can trust us (or check Hamilton’s book) that the main moments of a stationary AR(2) process are the following:

The average of the model's time series:

$$\mu = c/\left(1-\phi_1 - \phi_2\right)$$

Autocovariance functions:

$$\gamma_j = \phi_1 \gamma_{j-1} + \phi_2 \gamma_{j-2}\text{, for j = 1,2,...}$$

Autocorrelation functions:

$$\rho_j = \phi_1 \rho_{j-1}+\phi_2 \rho_{j-2}$$

Autocovariance at lag 0 or variance:

$$\gamma_0 = \frac{\left(1-\phi_2\right) \sigma^2}{\left(1+\phi_2\right)\left(\left(1-\phi_2\right)^2-\phi_1^2\right)}$$
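As a rough numerical check of these AR(2) formulas, the sketch below computes the mean, the variance and the first few autocorrelations. The starting value rho_1 = phi_1 / (1 - phi_2) is not stated above; it follows from the autocorrelation recursion at j = 1, using rho_0 = 1 and the symmetry rho_{-1} = rho_1. The function name and parameter values are hypothetical.

```python
def ar2_moments(c, phi1, phi2, sigma2, max_lag=5):
    """Mean, autocovariances and autocorrelations of a stationary AR(2)."""
    mean = c / (1 - phi1 - phi2)
    gamma0 = (1 - phi2) * sigma2 / ((1 + phi2) * ((1 - phi2) ** 2 - phi1 ** 2))
    rho = [1.0, phi1 / (1 - phi2)]          # rho_0 and rho_1
    for j in range(2, max_lag + 1):
        rho.append(phi1 * rho[j - 1] + phi2 * rho[j - 2])
    gammas = [gamma0 * r for r in rho]
    return mean, gammas, rho


# Hypothetical parameters of a stationary AR(2).
mean, gammas, rho = ar2_moments(c=1.0, phi1=0.5, phi2=0.3, sigma2=1.0)
print(round(mean, 6))
print([round(r, 4) for r in rho])

# Consistency check: gamma_0 = phi_1 * gamma_1 + phi_2 * gamma_2 + sigma^2.
print(abs(gammas[0] - (0.5 * gammas[1] + 0.3 * gammas[2] + 1.0)) < 1e-9)  # True
```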

Generalizing, the pth-order autoregressive process is as follows:

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + ... + \phi_p Y_{t-p}+\epsilon_t$$

Where the autocovariance functions are:

$\gamma_j=\phi_1 \gamma_{j-1}+\phi_2 \gamma_{j-2} + \dots + \phi_p \gamma_{j-p}\text{, for j = 1, 2, …}$

$\gamma_0 = \phi_1 \gamma_1+\phi_2 \gamma_2 + \dots + \phi_p \gamma_p + \sigma^2$

The autocorrelation for lag j follows the same structure as the autocovariance functions.

We can combine the AR and MA models to arrive at an ARMA model.

A stationary ARMA(p,q) model is presented as:

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + ... + \phi_p Y_{t-p}+\epsilon_t+ \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + ... + \theta_q \epsilon_{t-q}$$

Here, too, you have to check both stationarity and invertibility. Once you ascertain both in your model, you can continue to check the statistics of your model.
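To get a feel for the ARMA(p,q) equation, you can simulate it directly. The sketch below is a plain-Python illustration, not an estimation routine: it assumes Gaussian shocks, starts from zero pre-sample values, and discards a burn-in period so that the starting values matter less. The function name `simulate_arma` and all parameter values are hypothetical.

```python
import random


def simulate_arma(c, phis, thetas, n, sigma=1.0, seed=0, burn_in=200):
    """Simulate n observations of an ARMA(p, q) with Gaussian shocks."""
    rng = random.Random(seed)
    p, q = len(phis), len(thetas)
    y = [0.0] * p            # pre-sample values, assumed to be zero
    eps = [0.0] * q          # pre-sample shocks, assumed to be zero
    for _ in range(n + burn_in):
        shock = rng.gauss(0.0, sigma)
        ar_part = sum(phi * y[-i] for i, phi in enumerate(phis, start=1))
        ma_part = sum(theta * eps[-j] for j, theta in enumerate(thetas, start=1))
        y.append(c + ar_part + shock + ma_part)
        eps.append(shock)
    return y[-n:]            # drop the pre-sample values and the burn-in


# Hypothetical ARMA(1,1): stationary (|phi| < 1) and invertible (|theta| < 1).
path = simulate_arma(c=1.0, phis=[0.5], thetas=[0.4], n=1000, seed=1)
print(len(path))
print(sum(path) / len(path))  # should be in the neighborhood of c / (1 - phi) = 2
```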

Brief of Box-Jenkins methodology

With the last equation, you might have asked yourself: How many lags p and q should I choose to create my ARMA model? How should I proceed?

You can follow the Box-Jenkins methodology to create your model. Follow this procedure:

Once you have your data, find the integration order to make your data stationary.
Identify the lags of AR and the MA components of your model.
For AR models, the sample ACF decays smoothly and gradually, and the PACF is significant only up to lag p.
For MA models, the sample PACF decays smoothly and gradually, while the ACF is significant only up to lag q.
For ARMA models, both functions decay gradually. As a starting point, take “p” from the number of significant PACF lags and “q” from the number of significant ACF lags.
Estimate the ARMA(p,q) model and check if your residuals are uncorrelated.
If that’s the case, congratulations! You have your ARMA(p,q) model for your time series.
If that’s not the case, re-estimate your model with different values of p and q until you find a model that has uncorrelated residuals.
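The identification step relies on the sample ACF and PACF, which can be sketched in plain Python as follows. The PACF is obtained from the autocorrelations with the Durbin-Levinson recursion, and the band of 1.96 divided by the square root of the sample size is the usual approximate 95% rule of thumb; both are choices made for this illustration, as are the function names.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    gamma0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / n / gamma0
            for k in range(max_lag + 1)]


def pacf_from_acf(rho):
    """Durbin-Levinson recursion: partial autocorrelations for lags 1..len(rho)-1."""
    pacf, prev = [], []
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def significant_lags(values, n_obs, first_lag=1):
    """Lags whose value lies outside the approximate 95% band of +/- 1.96 / sqrt(n)."""
    band = 1.96 / math.sqrt(n_obs)
    return [first_lag + i for i, v in enumerate(values) if abs(v) > band]


# Theoretical ACF of an AR(1) with phi = 0.5: the PACF cuts off after lag 1.
ar1_rho = [0.5 ** k for k in range(4)]
print(pacf_from_acf(ar1_rho))  # [0.5, 0.0, 0.0]
```

In practice you would pass `sample_acf(your_series, max_lag)` to `pacf_from_acf`, and then use `significant_lags` on both sets of values (skipping lag 0 of the ACF) to read off starting values for p and q. The same ACF check can be applied to the residuals of the estimated model.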

Is ARMA a linear model?

Yes, it is. In econometrics, a model is linear whenever it is linear in its parameters. What does that mean? It means that when you take the partial derivative of the model with respect to a parameter, the derivative does not depend on any of the parameters.

So let’s present two models:

$\text{Model A: }Y_t = \phi_1 Y_{t-1} + \epsilon_t$

$\text{Model B: }Y_t = \phi_1^{\phi_2}Y_{t-1} + \epsilon_t$

Which of these models is linear?

Let’s take the first partial derivative of models A and B

$\text{Model A: }\frac{\partial Y_t}{\partial\phi_1} = Y_{t-1}$
$\text{Model B: }\frac{\partial Y_t}{\partial\phi_1} = \phi_2 \phi_1^{\phi_2-1} Y_{t-1}$

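Model A is the AR(1) and it is linear, because its derivative does not contain any parameter. Model B is not linear, because its derivative still depends on the parameters.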
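You can verify this numerically with a central finite difference. In the sketch below, the error term is left out because it does not depend on the parameters, and the lagged value of 2.0, the exponent phi_2 = 2 and the two evaluation points for phi_1 are hypothetical numbers.

```python
def model_a(phi1, y_lag):
    """Deterministic part of Model A."""
    return phi1 * y_lag


def model_b(phi1, phi2, y_lag):
    """Deterministic part of Model B."""
    return (phi1 ** phi2) * y_lag


def d_dphi1(f, phi1, step=1e-6):
    """Central finite-difference approximation of the derivative of f at phi1."""
    return (f(phi1 + step) - f(phi1 - step)) / (2 * step)


y_lag = 2.0
for phi1 in (0.3, 0.8):
    da = d_dphi1(lambda p: model_a(p, y_lag), phi1)
    db = d_dphi1(lambda p: model_b(p, 2.0, y_lag), phi1)
    print(phi1, round(da, 4), round(db, 4))
```

The derivative of Model A stays at 2.0 (the lagged value) whatever phi_1 is, while the derivative of Model B changes with phi_1.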

Is the ARMA better than just AR or MA?

Not necessarily! It depends on the data. You have to estimate the best model, i.e., the model that best fits your time series data.

What is the difference between an ARMA and an ARIMA model?

It’s almost the same. The ARIMA model is described as ARIMA(p,d,q) where d is the order of integration of the time series.

So, imagine you have a time series

$$\{Y_{t}\}^T_{t=0}$$

which is I(1). If we want to fit an ARMA model, we need to difference the data once before using it. So,

$$\Delta Y_t \sim \text{ ARMA(p,q)} \text{ or } Y_t \sim \text{ ARIMA(p,1,q)}$$

In case the time series

$$\{Y_{t}\}^T_{t=0}$$

is I(2), then:

$$\Delta^2 Y_t \sim \text{ ARMA(p,q)} \text{ or } Y_t \sim \text{ ARIMA(p,2,q)}$$

And so on.
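One practical consequence is that an ARMA model fitted to the differenced series produces forecasts of the changes, which then have to be cumulated to get back to price levels. Here is a minimal sketch for the I(1) case; the function name `undifference`, the toy prices and the forecast values are hypothetical and do not come from a fitted model.

```python
from itertools import accumulate


def undifference(last_level, diff_forecasts):
    """Turn forecasts of Delta Y into forecasts of Y by cumulating from the last observed level."""
    return [last_level + s for s in accumulate(diff_forecasts)]


prices = [100, 102, 101, 105]                          # toy I(1)-style series
diffs = [b - a for a, b in zip(prices, prices[1:])]    # [2, -1, 4]

# Cumulating the observed differences from the first price rebuilds the series.
print(undifference(prices[0], diffs))                  # [102, 101, 105]

# Hypothetical forecasts of the next three changes.
print(undifference(prices[-1], [1.0, 0.5, 0.25]))      # [106.0, 106.5, 106.75]
```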

Conclusion

We have learned the basic theory of ARMA models. We have gone through the basic ARMA models. Now you are able to deduce how an ARMA with higher values of p and q can be understood. In the second and third parts, you will learn how to implement this model in Python and R, respectively.

This model is an econometric model. Do you want to learn more about this topic and other algo trading models? Don’t hesitate to subscribe to our course Algorithmic Trading for Beginners! You’ll learn a lot!

After understanding the foundations of time series modeling, strengthen your skills by exploring Autocorrelation & Autocovariance to learn how past values influence future observations in time series data. Expand on ARMA’s principles by studying ARIMA Models for integrated forecasting and ARFIMA Models for analyzing long-memory processes in financial markets.

If your goal is trading strategy development, incorporating multiple techniques can enhance your ability to discover alpha. Consider applying Technical Analysis to detect price trends and patterns, integrating Trading Risk Management to control market exposure, experimenting with Pairs Trading to capitalize on asset correlations, and studying Market Microstructure to understand the impact of order flows and liquidity on price movements.

For a structured approach to algorithmic trading and quantitative strategy development, consider enrolling in the Executive Programme in Algorithmic Trading (EPAT). This rigorous course provides hands-on learning in time series analysis (stationarity, ACF, PACF), advanced statistical modeling (ARIMA, ARCH, GARCH), and Python-based trading strategies, equipping you with the expertise needed to apply ARMA and related models effectively in live trading environments.

Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.
