If we had to explain the Kalman filter in one line, we would say that it is used to provide an accurate prediction of a variable that cannot be measured directly. In fact, one of the earliest uses of the Kalman filter was by NASA, to calculate the position of the Apollo spacecraft and make sure they were on the right path.

But how is it applicable in trading? We can use the Kalman filter to implement pairs trading, or even to find arbitrage opportunities in the futures market. But before we look at its applications, let us understand how it works. In this blog, we will cover the following topics:

- Statistical terms and concepts used in Kalman Filter
- Equations in Kalman Filter
- Pairs trading using Kalman Filter in Python

The Kalman filter can be a heavy topic when it comes to math and statistics, so we will go through a few terms before we dig into the equations. Feel free to skip this section and head directly to the equations if you wish.

## Statistical terms and concepts used in Kalman Filter

The Kalman filter uses the concept of a **normal distribution** in its equations to give us an idea of the accuracy of the estimate. Let us step back a little and understand how we get a normal distribution of a variable.

Let us suppose we have a football team of ten players who are playing in the nationals. As part of a standard health check-up, we measure their weights. The weights of the players are given below.

| Player Number | Weight (kg) |
|---|---|
| 1 | 72 |
| 2 | 75 |
| 3 | 76 |
| 4 | 69 |
| 5 | 65 |
| 6 | 71 |
| 7 | 70 |
| 8 | 74 |
| 9 | 76 |
| 10 | 72 |

Now, if we calculate the average weight, i.e. the **mean**, we get (Total of all player weights) / (Total number of players) = 720/10 = 72 kg.

The mean is usually denoted by the Greek letter μ. If we denote the players’ weights by w_1, w_2, ..., w_N, where N is the total number of players, we can write it as:

$$ \mu = \frac{w_1 + w_2 + w_3 + \dots + w_N}{N} $$

Or

$$ \mu = \frac{1}{N}\sum_{i=1}^{N} w_i $$

Now, on a hunch, we decide to see how much each player’s weight varies from the mean. This can be easily calculated by subtracting the mean from each player’s weight.

The first player’s weight varies from the mean as follows: (Individual player’s weight) - (Mean value) = 72 - 72 = 0.

Similarly, the second player’s weight varies by 75 - 72 = 3. Let’s update the table.

| Player Number | Weight (kg) | Difference from mean (kg) |
|---|---|---|
| 1 | 72 | 0 |
| 2 | 75 | 3 |
| 3 | 76 | 4 |
| 4 | 69 | -3 |
| 5 | 65 | -7 |
| 6 | 71 | -1 |
| 7 | 70 | -2 |
| 8 | 74 | 2 |
| 9 | 76 | 4 |
| 10 | 72 | 0 |

Now, we want to see how much the team’s weights as a whole vary from the mean. Simply adding up each player’s difference from the mean gives 0, because the positive and negative differences cancel out. Thus, we square each player’s difference from the mean and take the average. Squaring eliminates the negative signs and penalises larger deviations from the mean more heavily.

The updated table is as follows:

| Player Number | Weight (kg) | Difference from mean (kg) | Squared difference from the mean (kg²) |
|---|---|---|---|
| 1 | 72 | 0 | 0 |
| 2 | 75 | 3 | 9 |
| 3 | 76 | 4 | 16 |
| 4 | 69 | -3 | 9 |
| 5 | 65 | -7 | 49 |
| 6 | 71 | -1 | 1 |
| 7 | 70 | -2 | 4 |
| 8 | 74 | 2 | 4 |
| 9 | 76 | 4 | 16 |
| 10 | 72 | 0 | 0 |

Now, if we take the average of the squared differences, we get:

$$ \frac{1}{N}\sum_{i=1}^{N} (w_i-\mu)^2 = \frac{108}{10} = 10.8 $$

This quantity is called the **variance**, and it tells us how spread out the weights are. Since the variance is an average of squared differences, it is measured in kg², so we take its square root to get a measure of spread in the same units as the weights. We call this the **standard deviation** and denote it by σ.
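The arithmetic in the tables above can be checked with a few lines of plain Python. This sketch uses the ten weights from the table and divides by N, i.e. it computes the population variance and standard deviation.

```python
# Weights (kg) of the ten players from the table above
weights = [72, 75, 76, 69, 65, 71, 70, 74, 76, 72]

n = len(weights)
mean = sum(weights) / n                     # 720 / 10
deviations = [w - mean for w in weights]    # difference from the mean
squared = [d ** 2 for d in deviations]      # squared difference from the mean
variance = sum(squared) / n                 # population variance: 108 / 10
std_dev = variance ** 0.5                   # population standard deviation

print(mean, sum(deviations), variance, round(std_dev, 2))
# prints: 72.0 0.0 10.8 3.29
```

The sum of the deviations comes out as exactly zero, which is why the squared differences are needed to measure spread.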

Since the standard deviation is denoted by σ, the variance is denoted by σ².

But why do we need the standard deviation? We calculated the variance and standard deviation for one football team, but we could do the same for all the football teams in the tournament or, if we are more ambitious, for all the football teams in the world. That would be a large dataset.

One thing to understand is that for our small dataset we used all the values, i.e. the entire population, to compute these quantities. For a large dataset, however, we usually draw a random sample from the population and estimate the values from it. In this case, we divide by (N-1) instead of N; this is known as Bessel’s correction, and it removes the bias in the sample estimate of the population variance. Of course, working with a sample still introduces some error, but we will ignore it for now.

Thus, the updated equation for the standard deviation is:

$$ \sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (w_i-\mu)^2} = \sqrt{\frac{108}{9}} = \sqrt{12} \approx 3.46 $$

When a large dataset is approximately normally distributed, most of the data is concentrated around the mean, with about 68% of the values lying within one standard deviation of the mean.

This means that if we had data about millions of football players and obtained the same mean and standard deviation as we did here, we would say that the probability that a player’s weight is within ±3.46 kg of 72 kg is 68.26%. In other words, 68.26% of the players’ weights would lie between 68.54 kg and 75.46 kg.

Of course, for this to hold, the data should be random and approximately normally distributed.
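Python’s standard `statistics` module can reproduce these numbers: `stdev` divides by (N-1), i.e. applies Bessel’s correction, while `pstdev` divides by N. The sketch below reuses the weights from the table and computes the one-standard-deviation band around the mean.

```python
import statistics

weights = [72, 75, 76, 69, 65, 71, 70, 74, 76, 72]

mean = statistics.mean(weights)              # 72
sample_sd = statistics.stdev(weights)        # divides by N - 1: sqrt(108 / 9) = sqrt(12)
population_sd = statistics.pstdev(weights)   # divides by N: sqrt(108 / 10)

# One-standard-deviation band around the mean, using the sample estimate
low, high = mean - sample_sd, mean + sample_sd
print(round(sample_sd, 2), round(population_sd, 2), round(low, 2), round(high, 2))
# prints: 3.46 3.29 68.54 75.46
```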

Let’s draw a graph to understand this further. It is just a reference for how the distribution would look if we had the weights of 100 people with a mean of 72 kg and a standard deviation of 3.46 kg.

This shows how the weights are concentrated around the mean and taper off towards the extremes. If we draw a curve through the distribution, we find that it is shaped like a bell, so we call it a **bell curve**. The normal distribution of the weights with a mean of 72 and a standard deviation of 3.46 will look similar to the following diagram.
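If you want to see this shape without plotting software, the sketch below draws 100 synthetic weights (not real players) from a normal distribution with mean 72 and standard deviation 3.46, prints a text histogram with 2 kg bins, and reports the share of values within one standard deviation. The seed and bin edges are arbitrary choices.

```python
import random
import statistics

random.seed(42)            # fixed seed so the simulation is reproducible
MU, SIGMA = 72, 3.46       # mean and standard deviation from the example (kg)

# 100 simulated weights drawn from a normal distribution (synthetic data)
sample = [random.gauss(MU, SIGMA) for _ in range(100)]

# Text histogram with 2 kg wide bins from 62 kg to 82 kg
for lo in range(62, 82, 2):
    count = sum(lo <= w < lo + 2 for w in sample)
    print(f"{lo}-{lo + 2} kg | {'#' * count}")

within_one_sd = sum(abs(w - MU) <= SIGMA for w in sample) / len(sample)
print(f"sample mean {statistics.mean(sample):.2f}, sample std dev {statistics.stdev(sample):.2f}")
print(f"share within one standard deviation: {within_one_sd:.0%}")
```

With only 100 values, the share within one standard deviation will be close to, but usually not exactly, 68%.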

The normal distribution is described by a probability density function. While its derivation is quite lengthy, we can note a few observations about it:

- One standard deviation on either side of the mean contains **68.26**% of the population.
- Two standard deviations contain **95.44**% of the population, while three contain 99.73%.

The probability density function is given as follows:

$$ f(w, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(w-\mu)^2}{2\sigma^2}} $$

You can find out more about the probability density function in this blog. We talked about the normal distribution because it forms an important part of the Kalman filter. Let’s now move on to the main topic in the next section of the Kalman filter tutorial.
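The density formula can be coded directly, and the share of the population within k standard deviations of the mean equals erf(k/√2), which Python provides as `math.erf`. This is a small sketch using the example’s mean and standard deviation.

```python
import math

def normal_pdf(w, mu, var):
    """Normal probability density f(w, mu, sigma^2) with var = sigma^2."""
    return math.exp(-((w - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def within_k_sd(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

mu, sigma = 72, 3.46
print(round(normal_pdf(72, mu, sigma ** 2), 4))   # height of the bell curve at the mean
for k in (1, 2, 3):
    print(k, f"{within_k_sd(k):.2%}")             # 68.27%, 95.45%, 99.73%
```

To two decimals the exact values are 68.27%, 95.45% and 99.73%; figures such as 68.26% and 95.44% are the same values truncated rather than rounded.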

## Equations in Kalman Filter

The Kalman filter is a type of prediction algorithm, so its success depends on how close our estimated values are to the actual values, i.e. on the variance of the estimates around the actual values. The Kalman filter assumes that, given the previous state, we can predict the next state.

At the outset, we would like to clarify that this Kalman filter tutorial is not about deriving the equations but about explaining how the equations help us estimate or predict a value. As we said earlier, we are trying to predict the value of something that cannot be measured directly, so there will be some error between the predicted value and the actual value.

If the measuring system itself introduces errors, this is called measurement noise. For example, if the weighing scale shows different readings for the same football player, that is measurement noise.

If the process being measured is affected by factors that our model does not take into account, this is called process noise. For example, if we are predicting the Apollo rocket’s position and we do not account for the wind during the initial blast-off phase, we will see some error between the actual location and the predicted location.

The Kalman filter is used to reduce the effect of these errors and predict the next state accurately.

Now, suppose we pick one player and weigh that individual 10 times. We might get different values each time due to measurement errors.

Rudolf Kalman developed the state update equation taking into account three values, namely:

- True value
- The estimated or predicted value
- Measured value

### State update equation

The **state update equation** is as follows:

Current state estimated value = Predicted value of current state + Kalman Gain * (Measured value - Predicted value of current state)

Let us understand this equation further.

In our example, we can say that given the measured values of all ten measurements, we will take the average of the values to estimate the true value.

To apply this equation, we take one measurement at a time as the measured value. In the initial step, we guess the predicted value.

Since we are computing an average in this example, the Kalman gain for the n-th measurement is 1/n. With each successive iteration, the gain, and therefore the correction term (the second part of the equation), becomes smaller, so the estimate settles closer to the true value.

We should note that the current estimated value becomes the predicted value of the current state in the next iteration.
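The averaging example can be written as a loop over the state update equation. In this sketch the readings are hypothetical, made up for illustration, and the gain at the n-th measurement is 1/n. After the last step the estimate equals the ordinary mean of the readings, whatever the initial guess, because the first gain of 1 replaces the guess with the first reading.

```python
def state_update(predicted, measured, gain):
    """Current estimate = predicted + gain * (measured - predicted)."""
    return predicted + gain * (measured - predicted)

# Hypothetical repeated scale readings (kg) of one player, for illustration only
readings = [72.4, 71.8, 72.9, 71.5, 72.2, 72.6, 71.9, 72.1, 72.7, 71.7]

estimate = 70.0  # initial guess for the predicted value (arbitrary)
for n, measured in enumerate(readings, start=1):
    gain = 1 / n                                   # gain for a running average
    estimate = state_update(estimate, measured, gain)
    # this estimate is used as the predicted value in the next iteration
    print(f"n={n:2d} gain={gain:.3f} estimate={estimate:.3f}")

print(abs(estimate - sum(readings) / len(readings)) < 1e-9)   # True
```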

So far, we assumed that the actual weight is constant, which made it easy to estimate. But what if the state of the system (the weight, in this case) changes over time? For that, we will now move on to the next equation in the Kalman filter tutorial, i.e. the state extrapolation equation.

### State extrapolation equation

The state extrapolation equation helps us find the relation between the current state and the next state, i.e. predict the next state of the system.

So far, we have seen that the Kalman filter is **recursive** in nature and uses the previous values to predict the next value in a system. While we could simply give the formula and be done with it, we want to understand exactly why it is used. For that, we will take another example to illustrate the state extrapolation equation.

Now, let’s take the example of a company trying to develop a robotic bike. If you think about it, someone riding a bike has to balance it, control the accelerator, turn, and so on.

Let’s say that we have a straight road and we have to control the bike’s velocity. For this, we need to know the bike’s position. As a simple case, we measure the wheels’ rotation to estimate how far the bike has moved. Recall that the distance travelled by an object is equal to its velocity multiplied by the time travelled.

Now, let’s suppose we measure the rotation at regular time intervals of Δt.

If we say that the bike has a constant velocity v, then we can say the following:

The predicted position of the bike is equal to the current estimated position of the bike + the distance covered by the bike in time Δt.

Here the distance covered by the bike will be the result of Δt multiplied by the velocity of the bike.

Suppose that the velocity is kept constant at 2 m/s and the time Δt is 5 seconds. That means the bike moves 10 metres between successive measurements.

But what if, at the next measurement, we find that the bike has moved 12 metres? This gives us an error of 2 metres, which could mean one of two things:

- The device used to measure the velocity has an error (measurement error)
- The bike is moving with a changing velocity, perhaps because it is going down a slope (process error)
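The constant-velocity extrapolation and the 2-metre discrepancy can be written out as follows. This is a sketch that assumes the bike starts at position 0; the velocity, Δt and the 12 m reading are the numbers from the example.

```python
def extrapolate_position(position, velocity, dt):
    """State extrapolation for constant velocity: next position = position + velocity * dt."""
    return position + velocity * dt

position = 0.0   # current estimated position (m), assumed starting point
velocity = 2.0   # m/s, assumed constant
dt = 5.0         # seconds between measurements

predicted = extrapolate_position(position, velocity, dt)   # 10 m ahead
measured = 12.0                                            # reading from the example
residual = measured - predicted                            # error to be explained
print(predicted, residual)
# prints: 10.0 2.0
```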

We try to minimise this error by applying different gains in the state update equation.

Now, we will introduce a new concept in this Kalman filter tutorial, i.e. the **α-β filter**.

Now, recall that the state update equation was given as:

Current state estimated value = Predicted value of current state + Kalman Gain * (Measured value - Predicted value of current state)

In the α-β filter, α takes the place of the Kalman gain for the position: it decides how much of the measurement error (measured position minus predicted position) is used to correct the predicted position of the object.

If we put α in place of the Kalman gain, you can deduce that a high value of α gives more importance to the measured value, while a low value of α gives it less weight. In this way, we can reduce the error while predicting the position.

Now, if the bike is moving with a changing velocity, we need another equation to estimate the velocity, which in turn leads to a better prediction of the bike’s position. Here we use β in place of the Kalman gain to update the estimate of the bike’s velocity.
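A minimal α-β filter for the bike might look like the sketch below. It uses the common textbook form in which α scales the position correction and β/Δt scales the velocity correction. The readings, which correspond to a bike actually moving at 2.4 m/s, and the values α = 0.5 and β = 0.3 are hypothetical choices for illustration.

```python
def alpha_beta_filter(measurements, x0, v0, dt, alpha, beta):
    """Minimal alpha-beta filter tracking position and velocity along a straight road."""
    x, v = x0, v0
    estimates = []
    for z in measurements:
        # State extrapolation: predict position assuming constant velocity
        x_pred = x + v * dt
        v_pred = v
        residual = z - x_pred                 # measured minus predicted position
        # State update: alpha corrects the position, beta corrects the velocity
        x = x_pred + alpha * residual
        v = v_pred + (beta / dt) * residual
        estimates.append((x, v))
    return estimates

# Hypothetical position readings (m) every 5 s for a bike moving at about 2.4 m/s
readings = [12.0, 24.0, 36.0, 48.0, 60.0]
estimates = alpha_beta_filter(readings, x0=0.0, v0=2.0, dt=5.0, alpha=0.5, beta=0.3)
for x, v in estimates:
    print(f"position {x:6.2f} m, velocity {v:.3f} m/s")
```

Starting from the assumed 2 m/s, the velocity estimate climbs towards the true 2.4 m/s as the repeated position errors are fed back through β.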

We have seen how α and β affect the predicted value. But how do we know the correct values of α and β that bring the predicted value closer to the actual value?

Let us move on to the next equation in the Kalman filter tutorial, i.e. the Kalman gain equation.

### Kalman Gain equation

Recall that we talked about the **normal distribution** in the initial part of this blog. We now assume that the errors, whether measurement or process errors, are random and normally distributed. Taking this further, there is roughly a 68% chance that an estimated value will be within one standard deviation of the actual value.

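Now, the Kalman gain depends on the uncertainty in the estimate. To put it simply, we denote the estimate uncertainty by ρ.

Similarly, since σ denotes the standard deviation, we denote the variance σ² of the measurement, i.e. the measurement uncertainty, by r.

In the Kalman filter, the **Kalman gain** decides how much each new measurement changes the estimate: when the estimate uncertainty is large compared with the measurement uncertainty, the gain is close to 1 and the estimate follows the measurement; when it is small, the gain is close to 0 and the estimate stays close to the prediction.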
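In the standard one-dimensional form, the Kalman gain is the estimate uncertainty divided by the sum of the estimate uncertainty and the measurement uncertainty (both variances). The sketch below names these `p` and `r`, and the measurement variance of 4 kg² is an assumed value.

```python
def kalman_gain(p, r):
    """1-D Kalman gain: estimate uncertainty / (estimate uncertainty + measurement uncertainty)."""
    return p / (p + r)

r = 4.0  # measurement variance, e.g. a scale with a standard deviation of 2 kg (assumed)
for p in (100.0, 4.0, 0.25):   # very uncertain, equally uncertain, very confident estimate
    print(p, round(kalman_gain(p, r), 3))
# gains: 0.962, 0.5 and 0.059
```

When the estimate is much less certain than the measurement the gain approaches 1; when it is much more certain the gain approaches 0, matching the behaviour described above.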

Now that we have seen how the Kalman gain is computed, the next equation shows how to update the estimate uncertainty.

Before we move to the next equation in the Kalman filter tutorial, let us recap the concepts we have gone through so far. We first looked at the state update equation, which is the main equation of the Kalman filter. We then saw how to extrapolate the current estimated value to the predicted value, which is used in the next step. The third equation is the Kalman gain equation, which tells us how the uncertainty in the error plays a role in calculating the Kalman gain. Next, we will see how the estimate uncertainty, which drives the Kalman gain, is updated. Let’s move on to the fourth equation in the Kalman filter tutorial.

### Estimate uncertainty update

In this Kalman filter tutorial, we saw that the Kalman gain depends on the uncertainty in the estimate. With every successive step, the Kalman filter updates the predicted value so that the estimate gets closer to the actual value of the variable, so we also need to see how the uncertainty in the estimate is reduced.

While the derivation of the equation is lengthy, we are only concerned with the result.

The estimate uncertainty update equation tells us that the estimate uncertainty of the current state equals the predicted (prior) estimate uncertainty multiplied by (1 - Kalman gain). We can also call this the covariance update equation.
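In one dimension, the update multiplies the predicted uncertainty by (1 - K). The values below are assumptions chosen so the numbers come out round; the gain is computed with the standard one-dimensional formula p / (p + r).

```python
def update_uncertainty(p_pred, gain):
    """Estimate uncertainty update: p = (1 - K) * p_pred."""
    return (1 - gain) * p_pred

p_pred, r = 16.0, 4.0               # assumed predicted uncertainty and measurement variance
gain = p_pred / (p_pred + r)        # 1-D Kalman gain
p_new = update_uncertainty(p_pred, gain)
print(round(gain, 3), round(p_new, 3))
# prints: 0.8 3.2
```

The updated uncertainty (3.2) is smaller than both the predicted uncertainty (16) and the measurement variance (4): combining the prediction with a measurement always leaves us more certain than either source alone.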

This brings us to the last equation of the Kalman filter tutorial, which we will see below.

### Estimate uncertainty extrapolation

The Kalman filter is popular because it continuously updates its state using the predicted and measured current values. Recall that in the second equation we extrapolated the state estimate to the next step. Similarly, the estimate uncertainty of the current state is used to predict the uncertainty of the estimate in the next state.
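For the simplest case, where the state is assumed constant apart from random process noise with variance q, the uncertainty extrapolation just adds q. Here q is a tuning choice and the values are assumptions for illustration.

```python
def extrapolate_uncertainty(p, q):
    """Uncertainty extrapolation for a state assumed constant apart from process noise q."""
    return p + q

p = 3.2                       # estimate uncertainty after the update step (assumed)
for q in (0.0, 0.1, 1.0):     # process noise variance, a tuning choice
    print(q, round(extrapolate_uncertainty(p, q), 3))
```

With q = 0 the uncertainty never grows back, so the gain keeps shrinking and the filter gradually stops reacting to new measurements; a positive q keeps it responsive if the true value drifts. In the matrix form used in practice, the same step reads P = F P Fᵀ + Q.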

OK. That was a simple, no-equations way to describe the Kalman filter. If you are confused, let us go through the process and see what we have learned so far.

As input, we have the measured value. Initially, we choose starting values for the predicted value and its estimate uncertainty, along with the measurement uncertainty; together these determine the Kalman gain.

Now we use the Kalman filter equation to find the next predicted value.

In each iteration, we update and extrapolate the estimate uncertainty, which in turn modifies the Kalman gain.

Thus, we get a new predicted value which will be used as our current estimate in the next phase.

In this way, with each step, we would get closer to predicting the actual value with a reasonable amount of success.
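Putting the five equations together gives a minimal one-dimensional Kalman filter for a quantity that should stay constant, such as a player’s weight. This sketch runs the predict step before the update step in each loop, which is equivalent to the order described above. The readings are simulated around an assumed true weight of 72 kg with a 2 kg scale error, and the starting guess of 60 kg is deliberately poor.

```python
import random

def kalman_1d(measurements, x0, p0, r, q):
    """Minimal 1-D Kalman filter for a (nearly) constant quantity."""
    x, p = x0, p0                          # initial guess and its uncertainty
    history = []
    for z in measurements:
        # Predict: state extrapolation (constant model) and uncertainty extrapolation
        x_pred, p_pred = x, p + q
        # Update: Kalman gain, state update and estimate uncertainty update
        k = p_pred / (p_pred + r)
        x = x_pred + k * (z - x_pred)
        p = (1 - k) * p_pred
        history.append((x, p, k))
    return history

random.seed(0)
TRUE_WEIGHT = 72.0                         # assumed true value, used only to simulate readings
readings = [random.gauss(TRUE_WEIGHT, 2.0) for _ in range(10)]   # scale noise sigma = 2 kg

history = kalman_1d(readings, x0=60.0, p0=100.0, r=4.0, q=0.0)
for n, (x, p, k) in enumerate(history, start=1):
    print(f"n={n:2d} estimate={x:6.2f} uncertainty={p:5.2f} gain={k:.3f}")
```

The gain and the uncertainty fall at every step, and the estimate moves quickly away from the poor initial guess towards the true value.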

That is all there is to it. We would reiterate in this Kalman filter tutorial that the Kalman filter is popular because it needs only the previous estimate (and its uncertainty) as input and, depending on the uncertainty in the measurement, predicts the resulting value. In real-world applications the state usually has several variables, so the Kalman filter is implemented with matrix operations. If you are interested in the math behind the Kalman filter, you can go through this resource to find many examples illustrating the individual equations of the Kalman filter.

When it comes to trading, the Kalman filter can form an important component of a pairs trading strategy. Let us build a simple pairs trading strategy in Python now.

## Pairs trading using Kalman Filter in Python

(Thanks to Chamundeswari Koppisetti for providing the code.)

Let us start by importing the necessary libraries for the Kalman filter.

We will consider four years (Aug 2015 - Aug 2019) of adjusted close price data for Bajaj Auto Limited (BAJAJ-AUTO.NS) and Hero MotoCorp Limited (HEROMOTOCO.NS).

We have included the data file in the zip file along with the code for you to run on your system later. The link to download the files can be found at the end of the blog.

The output will be as follows:

The hyperparameters of the Kalman filter can be changed, for instance:

- Multi-dimensional transition matrices, to use more past information when making each prediction
- Different values of the observation and transition covariances
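The article’s own code is in the download and is not reproduced here. As a rough, numpy-only sketch of the idea, a Kalman filter can track a time-varying hedge ratio β and intercept α in y = β·x + α + noise, treating [β, α] as a random walk. The names `delta` (which sets the transition covariance) and `obs_var` (the observation covariance) are hypothetical tuning parameters, and synthetic prices stand in for the BAJAJ-AUTO.NS and HEROMOTOCO.NS data.

```python
import numpy as np

def kalman_hedge_ratio(y, x, delta=1e-5, obs_var=1.0):
    """Track a time-varying hedge ratio (beta) and intercept (alpha) of y on x.

    State theta_t = [beta_t, alpha_t] follows a random walk; observation is
    y_t = beta_t * x_t + alpha_t + noise. delta and obs_var are illustrative values.
    """
    n = len(y)
    theta = np.zeros(2)                    # initial [beta, alpha]
    P = np.eye(2)                          # initial state covariance
    Q = delta / (1 - delta) * np.eye(2)    # transition covariance
    betas, alphas = np.empty(n), np.empty(n)
    for t in range(n):
        H = np.array([x[t], 1.0])          # observation vector at time t
        P = P + Q                          # predict: random walk, so only the covariance grows
        e = y[t] - H @ theta               # innovation: measured minus predicted price
        S = H @ P @ H + obs_var            # innovation variance
        K = P @ H / S                      # Kalman gain (2-vector)
        theta = theta + K * e              # state update
        P = P - np.outer(K, H @ P)         # covariance update: (I - K H) P
        betas[t], alphas[t] = theta
    return betas, alphas

# Synthetic prices (not real data) with a known hedge ratio of 1.5
rng = np.random.default_rng(0)
x = 100 + np.cumsum(rng.normal(0, 1, 500))
y = 1.5 * x + 10 + rng.normal(0, 0.5, 500)
betas, alphas = kalman_hedge_ratio(y, x)
print(round(betas[-1], 3), round(alphas[-1], 3))
```

Raising `delta` lets the hedge ratio adapt faster but makes it noisier; raising `obs_var` does the opposite.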

### Pairs trading strategy

In a pairs trading strategy, we buy one stock and sell the other, with the quantities set by the hedge ratio. You can learn more about pairs trading strategies in the statistical arbitrage course on Quantra.
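One common way to turn a hedge ratio into trades is a rolling z-score rule on the spread y - β·x. The sketch below is not necessarily what the downloadable code does: the window and the entry and exit thresholds are hypothetical, and the usage example uses a synthetic spread with a single jump.

```python
import numpy as np

def pair_positions(y, x, beta, window=20, entry_z=2.0, exit_z=0.5):
    """Positions on the spread y - beta * x using a rolling z-score.

    +1: buy 1 unit of y and sell beta units of x; -1: the reverse; 0: flat.
    window, entry_z and exit_z are illustrative thresholds.
    """
    spread = y - beta * x
    positions = np.zeros(len(y))
    current = 0
    for t in range(window, len(y)):
        recent = spread[t - window:t]
        sd = recent.std()
        if sd > 0:
            z = (spread[t] - recent.mean()) / sd
            if current == 0 and z > entry_z:
                current = -1          # spread unusually high: sell y, buy x
            elif current == 0 and z < -entry_z:
                current = 1           # spread unusually low: buy y, sell x
            elif current != 0 and abs(z) < exit_z:
                current = 0           # spread back near its mean: close the trade
        positions[t] = current
    return positions

# Synthetic example: flat price x, hedge ratio 1, and a spread that jumps once at t = 40
steps = np.arange(60)
x = np.full(60, 100.0)
spread = 0.1 * (-1.0) ** steps
spread[40] = 5.0
y = x + spread
positions = pair_positions(y, x, beta=np.ones(60))
print(positions[38:43])
```

Passing the Kalman-filtered hedge ratios as `beta` lets the quantities of each leg change over time.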

The output is: 0.12282433836398741

## Conclusion

In this Kalman filter tutorial, we saw how to estimate a value that cannot be measured directly by using a measurement that is indirectly related to it, and how to predict the next value using the Kalman gain.

You can learn more about different algorithmic trading strategies in the learning track, Automated Trading using Python & Interactive Brokers.

Do let us know if you loved the article and any other feedback in the comments below.

### Files in the Download

- Data file containing price data
- Pairs trading strategy using Kalman Filter code

All data and information provided in this article are for informational purposes only. QuantInsti® makes no representations as to accuracy, completeness, currentness, suitability, or validity of any information in this article and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.