AutoRegressive Moving Average (ARMA) models: A Comprehensive Guide


By José Carlos Gonzáles Tanaka

The ARMA model is one of the most powerful econometric models for trading. Here you will find a comprehensive guide. The first part will walk you through the theoretical aspects of the different versions of the model. Part 2 will concentrate on the application of the model in Python and Part 3 will do the same in R. Let's learn about ARMA modelling!


So, you are probably a trader who is new to autoregressive moving average models (ARMA models). The first thing you should know is that, to build an ARMA model from a time series, the time series must be stationary.


Well, if the time series has a trend, you should remove the trend from the series. This process is called detrending. If the time series instead has to be differenced to become stationary, i.e., you subtract the previous value from the current value, the process is called differencing.

The process of differencing a time series is the following: If you have a time series named Y which is I(1), i.e. it has an order of integration of 1, then you need to difference the time series once, as follows:

$$\Delta Y_{t} = Y_{t}-Y_{t-1}$$


\( \Delta Y_{t} \text{: is stationary.}\)

If the time series Y is I(2), i.e. it has an order of integration of 2, then you need to difference the time series twice, as follows:

$$\Delta Y_{t} = Y_{t}-Y_{t-1}$$ $$\Delta^2 Y_{t} = \Delta Y_{t} - \Delta Y_{t-1}$$


  • \( Y_{t}\text{: is I(2).}\)
  • \( \Delta Y_{t}\text{: is I(1).}\)
  • \( \Delta^2 Y_{t}\text{: is I(0), i.e., it's stationary.}\)

    Now, you can guess that if a time series is I(d), then you have to difference the time series “d” times. This ‘d’ is called the order of integration of the observed time series.
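As a quick illustration, here is a minimal NumPy sketch (the helper name `difference` is ours, not a library function) that builds an I(1) and an I(2) series by cumulating random shocks and then differences the I(2) series twice to get back to the stationary shocks. `np.diff(y, n=d)` does the same job in one call.

```python
import numpy as np

def difference(y, d=1):
    """Apply Delta Y_t = Y_t - Y_{t-1} to y, d times (each pass drops one observation)."""
    y = np.asarray(y, dtype=float)
    for _ in range(d):
        y = y[1:] - y[:-1]
    return y

rng = np.random.default_rng(42)
eps = rng.standard_normal(1_000)   # stationary shocks: I(0)
y_i1 = np.cumsum(eps)              # random walk: I(1)
y_i2 = np.cumsum(y_i1)             # cumulated random walk: I(2)

# Differencing the I(2) series twice recovers the I(0) shocks (minus the first two).
print(np.allclose(difference(y_i2, d=2), eps[2:]))   # True
```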

    How do you determine the order of integration of any time series?


    Econometric tools make this straightforward: you apply a unit root test. Several unit root tests are available, the best known being the Augmented Dickey-Fuller (ADF) test. The algorithm to find the order of integration goes like this:

    Imagine you have a time series called Y. Then:

    1. Apply the ADF test to Y. If you reject the null hypothesis of a unit root, the process is I(0), i.e., Y is stationary, and you stop. Otherwise, continue.
    2. Apply the ADF test to the first difference of Y. If you reject the null hypothesis, the process is I(1), i.e., d(Y), the first difference of Y, is stationary. Otherwise, continue.
    3. Apply the ADF test to the second difference, then the third, and so on, until you reject the null hypothesis. The number of times you had to difference Y is its order of integration.
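The loop above can be sketched in Python. This is a minimal, self-contained sketch that assumes NumPy only: `adf_tstat` and `integration_order` are hypothetical helper names, the ADF regression uses a fixed number of augmentation lags (`lags=1`) with a constant and no trend, and the decision compares the t-statistic with the approximate 5% asymptotic critical value of -2.86 instead of computing an exact p-value. For real work, `statsmodels.tsa.stattools.adfuller` runs the ADF test with proper p-values and lag selection.

```python
import numpy as np

# Approximate 5% asymptotic critical value for the ADF test with a constant
# and no trend (assumption: this sketch skips exact p-values).
ADF_CRIT_5PCT = -2.86

def adf_tstat(y, lags=1):
    """t-statistic on rho in: dY_t = a + rho*Y_{t-1} + b_1*dY_{t-1} + ... + b_lags*dY_{t-lags} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[lags:]                     # dY_t for the usable sample
    n = len(target)
    cols = [np.ones(n), y[lags:-1]]        # constant and lagged level Y_{t-1}
    for i in range(1, lags + 1):
        cols.append(dy[lags - i:-i])       # lagged differences dY_{t-i}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    s2 = resid @ resid / (n - X.shape[1])  # residual variance
    se_rho = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se_rho

def integration_order(y, max_d=2, lags=1):
    """Difference y until the unit-root null is rejected; return d, or None if never rejected."""
    series = np.asarray(y, dtype=float)
    for d in range(max_d + 1):
        if adf_tstat(series, lags) < ADF_CRIT_5PCT:
            return d                       # series differenced d times looks stationary
        series = np.diff(series)
    return None

rng = np.random.default_rng(0)
eps = rng.standard_normal(1_000)
print(integration_order(eps))              # white noise: 0
print(integration_order(np.cumsum(eps)))   # random walk: usually 1
```

Because any test rejects a true null about 5% of the time, a random walk will occasionally be labelled I(0); that is a property of the test, not of the loop.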

    You can check these articles about stationarity and the ADF unit root test to learn more.

    To understand stationarity better, it helps to know the lag operator, which simplifies the math of stationary processes.


    Lag Operators

    A time series can be described as a deterministic trend:

    $$Y_{t} = t$$

    Where t is time.

    Or it can be understood as a constant:

    $$Y_{t} = c$$

    Or it can be described as a Gaussian white noise process (or any other distribution):

    $$Y_{t} = \epsilon_t$$

    More generally, we can write a time series y(t) as a function of something else, e.g., y = f(x) or y = w(x, z), where f() and w() are operators that take a single series x, or a group of series x and z, as input.

    A time series “operator” allows us to transform a time series Y into a new time series.

    We can have a multiplication operator for y(t):

    $$Y_{t} = \beta t$$

    Or an addition operator:

    $$Y_{t} = x_t + z_t$$

    Now, let’s look at the lag operator.

    So, imagine we have the following representation of y(t):

    $$Y_{t} = x_{t-1}$$

    You can apply a lag operator to the whole time series x(t). The representation is going to use the letter “L” in this way:

    $$Lx_{t} = x_{t-1}$$

    If you would like to have x in time (t-2), you would do something like this:

    $$L(Lx_{t}) = x_{t-2}$$

    This double L can also be represented as

    $$L^2x_{t} = x_{t-2}$$

    Generally speaking, you can write as follows:

    $$L^kx_{t} = x_{t-k}$$

    For example:

    $$L^5x_{t} = x_{t-5}$$

    We’ll see why the lag operator matters in the following sections, where it is used to write ARMA models compactly.
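The lag operator is easy to mimic on an array. The sketch below assumes NumPy; `lag` is a hypothetical helper that shifts a series by k periods and marks the first k values as missing (NaN), much like pandas' `Series.shift(k)`. It checks that applying L twice equals \(L^2\), and that the first difference can be written as \((1 - L)x_t = x_t - x_{t-1}\).

```python
import numpy as np

def lag(x, k=1):
    """L^k applied to x: element t holds x_{t-k}; the first k entries are NaN."""
    x = np.asarray(x, dtype=float)
    if k == 0:
        return x.copy()
    out = np.full_like(x, np.nan)
    out[k:] = x[:-k]
    return out

x = np.arange(1.0, 8.0)                  # x_t = 1, 2, ..., 7
print(lag(x, 1))                         # x_{t-1}: nan, then 1, 2, ..., 6
print(np.array_equal(lag(lag(x, 1), 1), lag(x, 2), equal_nan=True))  # L(Lx) equals L^2 x: True
print(x - lag(x, 1))                     # (1 - L)x_t: nan, then all ones
```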

    Moving average processes and Invertibility

    From now on, you'll learn some basic ARMA model equations.

    The first-order moving average process, also known as MA(1) can be mathematically described as

    $$Y_{t} = \mu+\epsilon_{t}+\theta\epsilon_{t-1}$$


  • \( Y_{t} \text{: The asset price time series you want to model.}\)
  • \( \epsilon_{t} \text{: An independent and identically distributed (i.i.d.) random time series with mean 0 and variance } \sigma^2\text{.}\)
  • \( \epsilon_{t-1} \text{: The first lag of } \epsilon_t\text{.}\)
  • \( \theta \text{: The coefficient (parameter) on } \epsilon_{t-1}\text{.}\)

    With a bit of calculation (see Hamilton ⁽¹⁾, 1994), you get the following properties:

  • \(E(Y_t) = mean(Y_t) = \mu\)
  • \(Var(Y_t) = E(Y_t-\mu)^2 = (1+\theta^2)\sigma^2\)

    The first autocovariance is:

    $$E(Y_t-\mu)(Y_{t-1}-\mu) = \theta\sigma^2$$

    Higher autocovariances are equal to zero.

    The first autocorrelation is given by:

    $$\rho_1 = \frac{\theta\sigma^2}{(1+\theta^2)\sigma^2} = \frac{\theta}{(1+\theta^2)}$$

    Higher autocorrelations are equal to zero.
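You can check these MA(1) properties by simulation. A minimal sketch, assuming NumPy and Gaussian shocks; `simulate_ma1` and `autocorr` are illustrative helper names, and the parameter values (\(\mu = 0.1\), \(\theta = 0.5\), \(\sigma = 1\)) are arbitrary. With a long sample, the sample mean, variance, and first autocorrelation should land close to \(\mu\), \((1+\theta^2)\sigma^2 = 1.25\), and \(\theta/(1+\theta^2) = 0.4\), while the lag-2 autocorrelation should be close to zero.

```python
import numpy as np

def simulate_ma1(mu, theta, sigma, n, seed=0):
    """Y_t = mu + eps_t + theta * eps_{t-1} with i.i.d. N(0, sigma^2) shocks."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, n + 1)   # one extra draw supplies eps_{t-1} at t = 0
    return mu + eps[1:] + theta * eps[:-1]

def autocorr(y, k):
    """Sample autocorrelation at lag k (k >= 1)."""
    z = y - y.mean()
    return (z[k:] @ z[:-k]) / (z @ z)

mu, theta, sigma = 0.1, 0.5, 1.0
y = simulate_ma1(mu, theta, sigma, n=200_000)
print(y.mean(), mu)                            # both close to 0.1
print(y.var(), (1 + theta**2) * sigma**2)      # both close to 1.25
print(autocorr(y, 1), theta / (1 + theta**2))  # both close to 0.4
print(autocorr(y, 2))                          # close to 0
```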

    The qth-order moving average process, MA(q), is characterized by:

    $$Y_t = \mu + \epsilon_t + \theta_1\epsilon_{t-1} + \theta_2\epsilon_{t-2} + \dots + \theta_q\epsilon_{t-q}$$

    Can you guess the mean of this process? Since every error term, at any lag, has mean zero, you get:

    $$E(Y_t) = mean(Y_t) = \mu$$

    The variance, i.e., the autocovariance at lag 0, is:

    $$\gamma_0 = \sigma^2\left(1+\theta_1^2+\theta_2^2+ … + \theta_q^2\right)$$

    The autocovariances at higher lags are:

    \(\gamma_j = \left(\theta_j + \theta_{j+1}\theta_1 + \theta_{j+2}\theta_2 + \dots + \theta_{q}\theta_{q-j}\right)\sigma^2 \text{, for j = 1, 2, …, q}\)
    \(\gamma_j = 0 \text{, for }j>q\)

    For example, for an MA(2) process:

    \(\gamma_0 = \left(1+\theta_1^2+\theta_2^2\right)\sigma^2\)
    \(\gamma_1 = \left(\theta_1+\theta_{2}\theta_1\right)\sigma^2\)
    \(\gamma_2 = \theta_{2}\sigma^2\)
    \(\gamma_3 = \gamma_4 = … = 0\)
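The autocovariance formulas can be computed for any q. In this sketch (NumPy assumed; `ma_autocovariances` is a hypothetical helper), we set \(\psi_0 = 1\) and \(\psi_i = \theta_i\), so that \(\gamma_j = \sigma^2 \sum_{i=0}^{q-j} \psi_i \psi_{i+j}\), which expands to exactly the \(\gamma_0\) and \(\gamma_j\) expressions above. The MA(2) example uses \(\theta_1 = 0.5\), \(\theta_2 = 0.3\), and \(\sigma^2 = 1\).

```python
import numpy as np

def ma_autocovariances(thetas, sigma2, max_lag):
    """Theoretical gamma_0..gamma_max_lag of an MA(q) with coefficients theta_1..theta_q."""
    psi = np.concatenate(([1.0], np.asarray(thetas, dtype=float)))  # (1, theta_1, ..., theta_q)
    q = len(psi) - 1
    gammas = np.zeros(max_lag + 1)               # gamma_j = 0 for j > q
    for j in range(min(q, max_lag) + 1):
        # sum over i of psi_i * psi_{i+j}, i = 0..q-j
        gammas[j] = sigma2 * (psi[: q + 1 - j] @ psi[j:])
    return gammas

print(ma_autocovariances([0.5, 0.3], sigma2=1.0, max_lag=4))
# approximately [1.34, 0.65, 0.3, 0.0, 0.0]
```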

    Would the MA(q) process of any order q be stationary?

    Yes! An MA(q) model is a finite weighted sum of i.i.d. error terms with finite mean and variance, so its mean and autocovariances do not depend on t. Thus, an MA(q) model is always stationary.

    Let’s now talk about Invertibility.

    An MA model is invertible if you can convert it into an infinite AR model for the asset price time series.

    How? Let’s see:

    Consider a MA(1) model:

    $$Y_t - \mu = \epsilon_t + \theta\epsilon_{t-1}$$

    The model can be rewritten using a lag operator as

    $$Y_t - \mu = (1+\theta L)\epsilon_t$$


    where \(E(\epsilon_t\epsilon_{\tau}) = \sigma^2 \text{ for } t = \tau\text{, and 0 otherwise.}\)

    Provided that theta in absolute value is less than one, you can invert \((1+\theta L)\) and convert this model into

    $$\left(1 - \theta L + \theta^2 L^2 - \theta^3 L^3 + \dots \right)\left(Y_t - \mu\right) = \epsilon_t$$

    This is an infinite-order autoregressive model. Whenever you estimate an MA or ARMA model, you have to make sure that the model is invertible.
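To see invertibility at work, the sketch below recovers the shocks of a simulated MA(1) by applying a truncated version of the infinite AR polynomial, i.e., \(\epsilon_t \approx \sum_{k=0}^{K} (-\theta)^k (Y_t - \mu)\) with the k-th term using \(Y_{t-k}\). Dropping the terms beyond K leaves an error of size \(|\theta|^{K+1}|\epsilon_{t-K-1}|\), which vanishes quickly when \(|\theta| < 1\). NumPy is assumed, and `invert_ma1`, the truncation length `n_terms`, and the parameter values are illustrative choices.

```python
import numpy as np

def invert_ma1(y, mu, theta, n_terms=50):
    """Recover eps_t ~= sum_{k=0}^{n_terms-1} (-theta)**k * (y_{t-k} - mu).

    Only valid (convergent) when |theta| < 1; the first n_terms-1 values are NaN."""
    if abs(theta) >= 1:
        raise ValueError("MA(1) is not invertible for |theta| >= 1")
    z = np.asarray(y, dtype=float) - mu
    weights = (-theta) ** np.arange(n_terms)          # 1, -theta, theta^2, ...
    eps_hat = np.full(len(z), np.nan)
    for t in range(n_terms - 1, len(z)):
        eps_hat[t] = weights @ z[t - np.arange(n_terms)]  # z_t, z_{t-1}, ..., z_{t-n_terms+1}
    return eps_hat

rng = np.random.default_rng(1)
eps = rng.standard_normal(501)
mu, theta = 2.0, 0.5
y = mu + eps[1:] + theta * eps[:-1]                   # y[t] uses eps[t+1] and eps[t]
eps_hat = invert_ma1(y, mu, theta)
print(np.allclose(eps_hat[49:], eps[50:]))            # True: shocks recovered
```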

    At this stage, you may wonder: What is an autoregressive model?

    Read on!

    Autoregressive process and Stationarity

    Let’s first begin with the first-order autoregressive model, also known as the AR(1) model.

    $$Y_t = c + \phi Y_{t-1} + \epsilon_t$$


  • \(Y_t\text{: The asset price time series at time t}\)
  • \(c\text{: constant}\)
  • \(Y_{t-1}\text{: The asset price time series at time }t-1\text{, i.e. the first lag of }Y_t\)
  • \(\phi\text{: The coefficient on }Y_{t-1}\)
  • \(\epsilon_t\text{: The error term that follows an i.i.d. distribution with mean zero and variance }\sigma^2\)

    Is this AR(1) model stationary?

    Well, financial time series are often non-stationary. If you know that the true process of an asset price time series is an AR(1), you can check whether it is stationary with the following conversion, which uses the lag operator:

    \(Y_t = c + \phi L Y_t + \epsilon_t\)
    \(Y_t - \phi L Y_t = c + \epsilon_t\)
    \(\left(1-\phi L\right)Y_t = c + \epsilon_t\)


    \((1-\phi L)\text{: The characteristic polynomial.}\)

    Whether the process is stationary depends on the root of this polynomial, which is determined by phi.

    To analyze the stationarity of this AR(1) process, you check the first-order characteristic polynomial \(\lambda - \phi\). There are two equivalent ways to set it up, and in both you solve for lambda:

    $$\lambda - \phi = 0 \;\Rightarrow\; \lambda = \phi$$ $$1 - \phi\lambda = 0 \;\Rightarrow\; \lambda = \frac{1}{\phi}$$

    In the first (second) equation, if lambda in absolute value is lower (higher) than 1, then the model is stationary.

    Thus, if phi in absolute value is less than one, the AR(1) process is stationary. If it is equal to or greater than one, the process is not stationary; phi = 1 is the unit-root case that the ADF test looks for.
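A quick simulation illustrates the stationary case. This is a sketch under stated assumptions: NumPy, Gaussian shocks, and arbitrary values \(c = 1\), \(\phi = 0.5\), \(\sigma = 1\); `simulate_ar1` is a hypothetical helper. For \(|\phi| < 1\) the series fluctuates around a fixed mean \(c/(1-\phi)\) with a finite variance \(\sigma^2/(1-\phi^2)\) (standard AR(1) results, see Hamilton, 1994), so here the sample mean and variance should be close to 2 and 1.333.

```python
import numpy as np

def simulate_ar1(c, phi, sigma, n, burn_in=500, seed=0):
    """Y_t = c + phi*Y_{t-1} + eps_t with Gaussian shocks; drops a burn-in period."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, n + burn_in)
    y = np.empty(n + burn_in)
    y[0] = c / (1 - phi) if abs(phi) < 1 else 0.0   # start at the mean when it exists
    for t in range(1, n + burn_in):
        y[t] = c + phi * y[t - 1] + eps[t]
    return y[burn_in:]

c, phi, sigma = 1.0, 0.5, 1.0
y = simulate_ar1(c, phi, sigma, n=200_000)
print(y.mean(), c / (1 - phi))              # both close to 2.0
print(y.var(), sigma**2 / (1 - phi**2))     # both close to 1.333
```

Rerunning with `phi=1.0` gives a random walk, whose sample variance keeps growing with the sample length instead of settling down.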

    Let’s look at the second-order autoregressive model, also known as AR(2) model.

    $$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \epsilon_t$$


  • \(Y_t\text{: The asset price time series at time t}\)
  • \(c\text{: constant}\)
  • \(Y_{t-1}\text{: The asset price time series at time }t-1\text{, i.e. the first lag of }Y_t\)
  • \(\phi_1\text{: The coefficient on }Y_{t-1}\)
  • \(Y_{t-2}\text{: The asset price time series at time }t-2\text{, i.e. the second lag of }Y_t\)
  • \(\phi_2\text{: The coefficient on }Y_{t-2}\)
  • \(\epsilon_t\text{: The error term that follows an i.i.d. distribution with mean zero and variance }\sigma^2\)

    Let’s check for stationarity.

    $$Y_t - \phi_1 Y_{t-1} - \phi_2 Y_{t-2} = \left(1-\phi_1 L - \phi_2 L^2\right)Y_t = c + \epsilon_t$$

    Then, we convert the lag polynomial into its characteristic polynomial:

    $$\lambda^2 - \phi_1 \lambda - \phi_2 = 0$$

    The roots of this polynomial are:

    $$\lambda_1, \lambda_2 = \frac{\phi_1 \pm \sqrt{\phi_1^2 + 4 \phi_2}}{2}$$

    If both roots are less than 1 in absolute value (or, when they are complex numbers, their modulus is less than 1), then the model is stationary.
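The root condition can be checked directly with the quadratic formula above. A minimal sketch using Python's standard `cmath` module so that complex roots are handled; the helper names and parameter values are illustrative.

```python
import cmath

def ar2_roots(phi1, phi2):
    """Roots of lambda^2 - phi1*lambda - phi2 = 0 (may be complex)."""
    disc = cmath.sqrt(phi1**2 + 4 * phi2)
    return (phi1 + disc) / 2, (phi1 - disc) / 2

def ar2_is_stationary(phi1, phi2):
    """Stationary when both roots have modulus (absolute value) strictly below 1."""
    return all(abs(lam) < 1 for lam in ar2_roots(phi1, phi2))

print(ar2_is_stationary(0.5, 0.3))    # True: real roots about 0.852 and -0.352
print(ar2_is_stationary(1.0, -0.5))   # True: complex roots 0.5 +/- 0.5i, modulus about 0.707
print(ar2_is_stationary(0.5, 0.6))    # False: one root is about 1.064
```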

    For a stationary AR(2), you can trust us (or check Hamilton’s book) that the following results hold:

    The mean of the time series:

    $$\mu = c/\left(1-\phi_1 - \phi_2\right)$$

    Autocovariance functions:

    $$\gamma_j = \phi_1 \gamma_{j-1} + \phi_2 \gamma_{j-2}\text{, for j = 1,2,...}$$

    Autocorrelation functions:

    $$\rho_j = \phi_1 \rho_{j-1}+\phi_2 \rho_{j-2}\text{, for j = 1,2,...}$$

    Autocovariance at lag 0 or variance:

    $$\gamma_0 = \frac{\left(1-\phi_2\right) \sigma^2}{\left(1+\phi_2\right)\left(\left(1-\phi_2\right)^2-\phi_1^2\right)}$$
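These formulas are easy to evaluate and cross-check. The sketch below (plain Python; `ar2_moments` is a hypothetical helper, and \(c = 0.2\), \(\phi_1 = 0.5\), \(\phi_2 = 0.3\), \(\sigma^2 = 1\) are arbitrary values) computes \(\mu\) and \(\gamma_0\) from the closed forms, gets \(\gamma_1 = \phi_1\gamma_0/(1-\phi_2)\) from the recursion at j = 1 (using \(\gamma_{-1} = \gamma_1\)), and then iterates the recursion for higher lags. As a consistency check, \(\gamma_0\) should also satisfy \(\gamma_0 = \phi_1\gamma_1 + \phi_2\gamma_2 + \sigma^2\) (Hamilton, 1994).

```python
def ar2_moments(c, phi1, phi2, sigma2, max_lag=5):
    """Mean and autocovariances gamma_0..gamma_max_lag (max_lag >= 1) of a stationary AR(2)."""
    mu = c / (1 - phi1 - phi2)
    gamma0 = (1 - phi2) * sigma2 / ((1 + phi2) * ((1 - phi2) ** 2 - phi1 ** 2))
    gammas = [gamma0, phi1 * gamma0 / (1 - phi2)]     # gamma_1 from j = 1, with gamma_{-1} = gamma_1
    for j in range(2, max_lag + 1):
        gammas.append(phi1 * gammas[j - 1] + phi2 * gammas[j - 2])
    return mu, gammas

mu, g = ar2_moments(0.2, 0.5, 0.3, 1.0)
rho = [gj / g[0] for gj in g]                         # autocorrelations follow the same recursion
print(mu)                                             # about 1.0
print(g[0], 0.5 * g[1] + 0.3 * g[2] + 1.0)            # the two values agree (about 2.244)
print(rho[:3])                                        # about [1.0, 0.714, 0.657]
```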

    Generalizing, the pth-order autoregressive process, AR(p), is as follows:

    $$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + ... + \phi_p Y_{t-p}+\epsilon_t$$

    Where the autocovariance functions are:

  • \(\gamma_j=\phi_1 \gamma_{j-1}+\phi_2 \gamma_{j-2} +... + \phi_p \gamma_{j-p}\text{, for j = 1, 2, …}\)
  • \(\gamma_0 = \phi_1 \gamma_1+\phi_2 \gamma_2 + ... + \phi_p \gamma_p + \sigma^2\)

    The autocorrelation for lag j follows the same structure as the autocovariance functions.

    We can combine the AR and MA models to arrive at an ARMA model.

    A stationary ARMA(p,q) model is presented as:

    $$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p}+\epsilon_t+ \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q}$$

    Here, too, you have to check stationarity (for the AR part) and invertibility (for the MA part). Don’t forget these two conditions. Once both hold for your model, you can go on to check its statistics.
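Both checks reduce to root conditions, which you can verify numerically for any p and q. A sketch assuming NumPy (`np.roots`); `arma_checks` is a hypothetical helper. The AR part is stationary when all roots of \(\lambda^p - \phi_1\lambda^{p-1} - \dots - \phi_p\) lie inside the unit circle (the same condition used for AR(1) and AR(2)), and, by the same logic, the MA part is invertible when all roots of \(\lambda^q + \theta_1\lambda^{q-1} + \dots + \theta_q\) lie inside the unit circle, which for MA(1) reduces to \(|\theta| < 1\).

```python
import numpy as np

def arma_checks(phis, thetas):
    """Return (stationary, invertible) for an ARMA(p, q) given phi_1..phi_p and theta_1..theta_q."""
    # Characteristic polynomials, highest power first, as np.roots expects.
    ar_roots = np.roots(np.r_[1.0, -np.asarray(phis, dtype=float)])
    ma_roots = np.roots(np.r_[1.0, np.asarray(thetas, dtype=float)])
    stationary = bool(np.all(np.abs(ar_roots) < 1))   # no AR terms -> trivially True
    invertible = bool(np.all(np.abs(ma_roots) < 1))   # no MA terms -> trivially True
    return stationary, invertible

print(arma_checks([0.5], [0.4]))        # (True, True)
print(arma_checks([1.0], [0.4]))        # (False, True): unit root in the AR part
print(arma_checks([0.5, 0.3], [1.5]))   # (True, False): MA part not invertible
```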

    A brief overview of the Box-Jenkins methodology

    With the last equation, you might have asked yourself: How many lags p and q should I choose to create my ARMA model? How should I proceed?

    You can follow the Box-Jenkins methodology to create your model. Follow this procedure:

    1. Once you have your data, find its order of integration and difference it until it is stationary.
    2. Identify the number of AR and MA lags of your model with the sample autocorrelation function (ACF) and partial autocorrelation function (PACF):
      • For AR(p) models, the sample ACF decays smoothly and gradually, while the PACF is significant only up to lag p.
      • For MA(q) models, the sample PACF decays smoothly and gradually, while the ACF is significant only up to lag q.
      • For ARMA models, both decay gradually, so take the number of significant PACFs as a starting value for p and the number of significant ACFs as a starting value for q.
    3. Estimate the ARMA(p,q) model and check whether its residuals are uncorrelated.
    4. If they are, congratulations! You have your ARMA(p,q) model for your time series.
    5. If they are not, estimate your model again with different values of p and q until you find a model with uncorrelated residuals.
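The identification step relies on the sample ACF and PACF. The following sketch assumes NumPy; `sample_acf` and `pacf_from_acf` are hypothetical helpers, and the PACF is computed from the autocorrelations with the Durbin-Levinson recursion. It simulates an AR(1) with \(\phi = 0.6\), for which the ACF should decay roughly like \(0.6^k\) while the PACF should be close to 0.6 at lag 1 and fall inside the usual \(\pm 1.96/\sqrt{n}\) band afterwards. In practice, `statsmodels.tsa.stattools.acf` and `pacf` (and the `plot_acf`/`plot_pacf` helpers) do this for you.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag."""
    z = np.asarray(y, dtype=float) - np.mean(y)
    denom = z @ z
    return np.array([1.0] + [z[k:] @ z[:-k] / denom for k in range(1, max_lag + 1)])

def pacf_from_acf(rho, max_lag):
    """Durbin-Levinson recursion: the PACF at lag k is phi_kk, the last
    coefficient of the best linear predictor of Y_t using k lags."""
    pacf = np.ones(max_lag + 1)
    phi_prev = np.array([])                       # phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = rho[k] - phi_prev @ rho[k - 1:0:-1]   # rho_k minus sum of phi_{k-1,j} rho_{k-j}
        den = 1.0 - phi_prev @ rho[1:k]             # 1 minus sum of phi_{k-1,j} rho_j
        phi_kk = num / den
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
    return pacf

# Simulated AR(1) with phi = 0.6 (illustrative data).
rng = np.random.default_rng(7)
n, phi = 100_000, 0.6
eps = rng.standard_normal(n)
y = np.empty(n)
y[0] = eps[0]
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

rho = sample_acf(y, 5)
pacf = pacf_from_acf(rho, 5)
band = 1.96 / np.sqrt(n)
print(np.round(rho, 3))          # decays roughly like 0.6**k
print(np.round(pacf, 3))         # about 0.6 at lag 1, then near 0
print(np.abs(pacf[2:]) < band)   # mostly True: cut-off after lag p = 1
```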

    Is ARMA a linear model?

    Yes, it is. In econometrics, a model is linear whenever it is linear in its parameters. What does that mean? Whenever you take the partial derivative of the model with respect to a parameter, the result does not contain any parameters, either multiplied or divided.

    So let’s present two models:

  • \(\text{Model A: }Y_t = \phi_1 Y_{t-1} + \epsilon_t\)
  • \(\text{Model B: }Y_t = \phi_1^{\phi_2}Y_{t-1} + \epsilon_t\)

    Which of these models is linear?

    Let’s take the first partial derivative of models A and B

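    \(\text{Model A: }\frac{\partial Y_t}{\partial\phi_1} = Y_{t-1}\)
    \(\text{Model B: }\frac{\partial Y_t}{\partial\phi_1} = \phi_2\phi_1^{\phi_2-1} Y_{t-1}\)

    Model A is the AR(1) model, and it is linear: its derivative does not depend on any parameter. Model B is not linear, because its derivative still depends on \(\phi_1\) and \(\phi_2\).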

    Is the ARMA better than just AR or MA?

    Not necessarily! It depends on the data. You have to estimate the best model, i.e., the model that best fits your time series data.

    What is the difference between an ARMA and an ARIMA model?

    It’s almost the same. The ARIMA model is described as ARIMA(p,d,q) where d is the order of integration of the time series.

    So, imagine you have a time series \(Y_t\) which is I(1).

    If we want to create an ARMA model, we need to difference the data once before using it. So,

    $$\Delta Y_t \sim \text{ ARMA(p,q)} \text{ or } Y_t \sim \text{ ARIMA(p,1,q)}$$

    In case the time series \(Y_t\) is I(2), then:

    $$\Delta^2 Y_t \sim \text{ ARMA(p,q)} \text{ or } Y_t \sim \text{ ARIMA(p,2,q)}$$

    And so on.
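Going from \(\Delta Y_t\) back to \(Y_t\) reverses the differencing by summing ("integrating") the differences, which is where the "I" in ARIMA comes from. A minimal NumPy sketch; `undifference` is a hypothetical helper, and the I(2) series is simulated. Each round of differencing loses one starting value, so undoing d differences needs d initial values.

```python
import numpy as np

def undifference(dy, y0):
    """Invert one round of differencing: rebuild Y_0, Y_1, ... from Delta Y and the first level Y_0."""
    return y0 + np.concatenate(([0.0], np.cumsum(dy)))

rng = np.random.default_rng(3)
y = np.cumsum(np.cumsum(rng.standard_normal(500)))   # an I(2) series

d1 = np.diff(y)          # Delta Y_t, I(1)
d2 = np.diff(y, n=2)     # Delta^2 Y_t, I(0): what an ARIMA(p,2,q) models as an ARMA(p,q)

# Undo the two differences, each time supplying the first value that was lost.
d1_rebuilt = undifference(d2, d1[0])
y_rebuilt = undifference(d1_rebuilt, y[0])
print(np.allclose(y_rebuilt, y))   # True
```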


    We have learned the basic theory of ARMA models and gone through the basic AR, MA, and ARMA models. From here, you can work out how ARMA models with higher values of p and q are built. In the second and third parts, you will learn how to implement this model in Python and R, respectively.

    This model is an econometric model. Do you want to learn more about this topic and other algo trading models? Don’t hesitate to subscribe to our course Algorithmic Trading for Beginners! You’ll learn a lot!

    Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.
