Essential Mathematical Concepts for Algorithmic Trading

By Chainika Thakar

If you have landed on this article, one thing is for sure: you aim to learn algorithmic trading from a mathematical point of view. Before starting with the mathematical concepts of algorithmic trading, let us understand how important maths is in trading.

And before that, let us take a look at two important roles in trading: the Trader and the Quant (Quantitative Analyst).

Who is a Trader?

In simple words, any individual who buys and sells financial assets in any financial market is a trader. A trader can also trade on behalf of another person. A trader usually trades over shorter time periods than an investor. This simply means that a trader holds assets for a short period to profit from short-term trends, whereas an investor tends to hold assets for the longer term.

Now, who is a Quant/Quantitative Analyst?

A quantitative analyst is someone who designs complex frameworks that help financial institutions price and trade securities in the financial market. Quants can be of two types:

  • Front office quants - These are the ones who directly provide the trader with the price of the financial securities or the trading tools.
  • Back office quants - These quants are there to validate the framework and create new strategies after conducting thorough research.

Moving ahead, now let us find out more about algorithmic trading and its association with Maths.

Why does Algorithmic Trading require Maths?

Usually, when quants work, they keep an eye on the performance of the market.

But the interesting part is: “How do they predict or forecast on the basis of market data?”

And the answer is: They do it with MATHS!

Digging deeper, in this process, data is bought from the stock market and analysed. On the basis of this data, quants estimate the odds (say, 65% or 75%) of stock prices moving in a given direction. This is known as predicting or forecasting the movement of stock prices in the long term or the short term.

Those involved in creating algorithms for High-Frequency Trading (HFT) keep in mind the involvement of a large number of trades in a short period.

For example, in one millisecond the price may go up or go down, and thus, thousands of trades happen in every passing second in HFT.

When and How Mathematics made it to Trading: A historical tour

It was not until the late sixties that mathematicians made their first entry into the financial world of trading. It all started with a professor of mathematics called Edward Thorp, at the University of California, who published a book called Beat the Market in 1967. In this book, he claimed to provide a foolproof way of earning money on the stock market. The method was based on a system that he had devised earlier for beating casinos at blackjack. That blackjack system, described in his earlier book Beat the Dealer, became so famous that casinos were forced to change their rules.

Specifically, the Beat the Market concept was the process of selling stocks and bonds at one price and then buying them back at a lower price. This strategy became so popular that Edward Thorp founded a hedge fund named Princeton/Newport Partners, which went on to be highly successful. Soon after, a generation of physicists entered a depressed job market and, on observing the amount of money that could be made on Wall Street, many of them moved into finance.

It was also observed that, in Britain, the fall of the Soviet Union brought an influx of scientists from the Warsaw Pact countries. They brought with them a new methodology based on the concept of “analysing the data”, along with the understanding that sufficient computing power can help predict the market. This gave rise to quantitative analysis, and a maths genius named Jim Simons became famous for his work in this sphere. In 1982, Jim Simons founded an exceptional hedge fund management company called Renaissance Technologies.

All in all, this was a brief account of how mathematics took off in algorithmic trading and why it has been so successful. Now let us head to the mathematical concepts for algorithmic trading, which are the core of this article.

Mathematical Concepts

Starting with the mathematical concepts of trading, it is a must to mention that mathematical concepts play an important role in algorithmic trading. Let us take a look at the broad categories of different concepts here:

  • Descriptive Statistics
  • Probability Theory
  • Linear Algebra
  • Linear Regression
  • Calculus

Descriptive Statistics

Let us walk through descriptive statistics, which summarize a given data set with brief descriptive coefficients. These can be a representation of either the whole or a sample from the population.

Measure of Central Tendency

Here, Mean, Median and Mode are the basic measures of central tendency. These are quite useful when it comes to taking out average value from a data set consisting of various values. Let us understand each measure one by one:

Mean

This is the most widely used concept across the fields of mathematics and, in simple words, it is the average of the given data set. Thus, if we take six numbers in a data set, say, 12, 13, 6, 7, 19, 21, the formula of the mean is

Mean = (sum of all values) / (number of values)

which makes it:

(12 + 13 + 6 + 7 + 19 + 21)/6 = 78/6 = 13

Furthermore, the trader tries to initiate the trade on the basis of the mean (moving average) or moving average crossover.

Here, let us understand two types of moving averages based on the ranges (number of days) of the time period they are calculated in and the moving average crossover:

  1. Faster moving average (Shorter time period) -

A faster moving average is the mean of a data set (stock prices) calculated over a short period of time, say the past 20 days.

  2. Slower moving average (Longer time period) -

A slower moving average is the mean of a data set (stock prices) calculated over a longer time period, say 50 days.

Now, a faster-moving average and a slower moving average also come to a position together where a “crossover” occurs.

According to Wikipedia, “A crossover occurs when a faster-moving average (i.e., a shorter period moving average) crosses a slower moving average (i.e. a longer period moving average). In other words, this is when the shorter period moving average line crosses a longer period moving average line.”

Here, to explain it better, the graph above shows three lines. The blue one shows the stock price itself. The green line is the slower moving average and the orange line is the faster moving average.

Starting with the green line (slower moving average), it shows the mean of stock prices over longer time periods. The lines follow a zig-zag pattern and there are several crossovers.

For example, there is a crossover between October 2018 and January 2019 where the orange line (faster moving average) comes from above and crosses the green one (slower moving average) while going down. This indicates a slump in the market, so an individual or firm would typically sell the stock at this point.

This crossover point is called the “meeting point”. After the meeting point, both lines go down and then go up again to create one more (and then other) crossover(s).

Since there are many crossovers in the graph, you should be able to identify each of them on your own now.

Now, it is very important to note here that the “meeting point” is considered bullish if the faster moving average crosses the slower moving average from below and continues upward. On the contrary, it is considered bearish if the faster moving average drops below the slower moving average and continues downward. This is so because the former scenario shows that prices have recently trended upward, whereas the latter shows that prices have trended downward over the past few days.

For example, let us take the 20-day moving average as the faster moving average and the 50-day moving average as the slower moving average. If the 20-day moving average rises and crosses the 50-day moving average, it signals a bullish market, since it indicates an upward trend in prices over the past 20 days. If the 20-day moving average falls below the 50-day moving average, it is bearish, since it means that prices fell over the past 20 days.

According to Wikipedia, “In stock investing, this meeting point is used either to enter (buy or sell) or exit (sell or buy) the market.”
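The crossover rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the price list is made up, the helper names (moving_average, find_crossovers) are hypothetical, and the windows are 3 and 5 days instead of 20 and 50 so that the example stays small.

```python
import numpy as np

def moving_average(prices, window):
    """Simple moving average; returns one value per full window."""
    weights = np.ones(window) / window
    return np.convolve(prices, weights, mode="valid")

def find_crossovers(prices, fast_window, slow_window):
    """Return (price_index, 'bullish' or 'bearish') for each crossover."""
    prices = np.asarray(prices, dtype=float)
    fast = moving_average(prices, fast_window)
    slow = moving_average(prices, slow_window)
    # Keep only the days on which both averages exist.
    fast = fast[-len(slow):]
    diff_sign = np.sign(fast - slow)
    first_index = slow_window - 1  # price index of the first slow average
    crossovers = []
    for i in range(1, len(diff_sign)):
        if diff_sign[i] > 0 and diff_sign[i - 1] <= 0:
            crossovers.append((first_index + i, "bullish"))
        elif diff_sign[i] < 0 and diff_sign[i - 1] >= 0:
            crossovers.append((first_index + i, "bearish"))
    return crossovers

# Hypothetical prices: a rise, a fall, then another rise.
prices = [10, 11, 12, 13, 14, 13, 12, 11, 10, 9, 10, 11, 12, 13, 14, 15]
crossovers = find_crossovers(prices, fast_window=3, slow_window=5)
print(crossovers)
```

On this made-up series, the faster average falls below the slower one during the decline (a bearish crossover) and rises back above it during the recovery (a bullish crossover).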

In short, Mean is a statistical indicator used for estimating a company’s or even the market’s stock performance over a period of time. This period of time can be days, months and even years.

Going forward, mean can also be computed with the help of an excel sheet, with the following formula:

=Average(B2: B6)

Let us understand what we have done in the image above. The image shows the stock prices of different companies belonging to an industry over a period of time (can be days, months, or years).

Now, to get the mean stock price of this industry in this particular time period, we need the formula =Average(B2: B6) to be applied against “Mean stock price”. This formula tells Excel to average the stock prices of all the companies listed in cells B2 to B6.

As we apply this formula and press “Enter” we get the result 330. This is one of the simplest methods to compute the mean. Let us see how to compute the same in Python code ahead.

For further use, in all the concepts, let us assume values on the basis of Apple’s (AAPL) data set. In order to keep it universal, we have taken the daily stock price data of Apple, Inc. from Dec 26, 2018, to Dec 26, 2019. You can download historical data from Yahoo Finance.

Now, for downloading the Apple closing price data, we will use the following for all Python-based calculations ahead:

import numpy as np
import yfinance as yf
# auto_adjust=False keeps the 'Adj Close' column on recent yfinance versions
aapl = yf.download('AAPL', start='2018-12-26', end='2019-12-26', auto_adjust=False)

In Python, for taking out the mean of closing prices, the code will be as follows:

# Mean of the adjusted closing prices
mean = np.mean(aapl['Adj Close'])
print(mean)

The Output is: 330

Ahead we will see how Median differs from Mean and how to compute it.

Median

Sometimes, the data set values can have a few values which are at the extreme ends, and this might cause the mean of the data set to portray an incorrect picture. Thus, we use the median, which gives the middle value of the sorted data set.

To find the median, you have to arrange the numbers in ascending order and then find the middle value. If the data set contains an even number of values, you take the mean of the middle two values. For example, if the list of numbers is 12, 13, 6, 7, 19, then,

In ascending order, the numbers are: 6, 7, 12, 13, 19

Now, we know there are in total 5 numbers and the formula for Median is:

(n+1)/2 value.

Hence, it will be n = 5 and

(5+1)/2 value will be 6/2= 3rd value.

Here, the 3rd value in the list is 12.

So, the median becomes 12 here.

Mainly, the advantage of the median is that, unlike the mean, it remains robust when the data set contains extreme values, which is often the case with stock prices.

The median is preferred when a data set contains outliers, because it then represents the typical value better than the mean does.
For example, consider the following data set, with values in INR:

75,000, 82,500, 60,000, 50,000, 1,00,000, 70,000 and 90,000.

Calculation of the median needs the prices to be first placed in ascending order, thus, prices in ascending order are:

50,000, 60,000, 70,000, 75,000, 82,500, 90,000, 1,00,000

Now, the calculation of the median will be:

As there are 7 items, the median is (7+1)/2 item, which makes it the 4th item. The 4th item in the ascending order is INR 75,000.

As you can see, INR 75,000 is a good representation of the data set.

In the financial world, where market prices vary time and again, a few extreme values can pull the mean away from the typical value.

In such cases, one needs to use the median to find the one value that represents the entire data set appropriately.
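To see the robustness of the median in numbers, here is a minimal sketch using the INR prices from the example above. The extra value of INR 10,00,000 is a hypothetical outlier added purely for illustration.

```python
import statistics

# INR prices from the example above.
prices_inr = [75000, 82500, 60000, 50000, 100000, 70000, 90000]

median_price = statistics.median(prices_inr)
mean_price = statistics.mean(prices_inr)
print("median:", median_price, "mean:", round(mean_price, 2))

# Hypothetical outlier: one unusually expensive item.
with_outlier = prices_inr + [1000000]
median_with_outlier = statistics.median(with_outlier)
mean_with_outlier = statistics.mean(with_outlier)
print("median:", median_with_outlier, "mean:", mean_with_outlier)
```

With the outlier added, the median moves only from 75,000 to 78,750, while the mean more than doubles, which is why the median is the safer summary when extreme values are present.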

Excel sheet helps in the following way to compute median:

=Median(B2:B6)

In the case of the median also, in the image above, we have the stock prices of different companies belonging to a particular industry over a period of time (can be days, months, or years). Here, to get the median stock price of the industry in this particular time period, we have used the formula =Median(B2: B6). This formula tells Excel to compute the median and, as we input the same, we get the result 100.

Let us learn how to compute the same in Python code.

The Python code here will be:

# Median of the adjusted closing prices
median = np.median(aapl['Adj Close'])
print(median)

The Output is: 100

Great! Now as you have got a fair idea about Mean and Median, let us move to another method now.

Mode

Mode is a very simple concept: it is the value that occurs most often in the data set. The mode is also known as the modal value, representing the highest count of occurrences in a group of data. Like the mean and the median, the mode is a single value that represents the whole data set. Note that in some cases there can be more than one mode in a given data set; a data set with two modes is known as bimodal.
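As a small illustration of a bimodal data set, the sketch below uses statistics.multimode from the Python standard library (available from Python 3.8). The price list is hypothetical.

```python
import statistics

# Hypothetical closing prices in which two values repeat equally often.
closing_prices = [101, 102, 102, 103, 104, 104, 105]

# multimode returns every value that shares the highest count,
# in the order in which the values first appear.
modes = statistics.multimode(closing_prices)
print(modes)

is_bimodal = len(modes) == 2
print("bimodal:", is_bimodal)
```

Here both 102 and 104 occur twice, so the list has two modes and the data set is bimodal.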

In the excel sheet, the mode can be calculated as follows:

=Mode.SNGL(B1: B5)

Similar to Mean and Median, Mode can also be calculated in the excel sheet as shown in the image above. For example, you can put in the values of different companies in the excel sheet and take out the Mode with the formula =Mode.SNGL(B1: B5)

(B1: B5) - represents the values from cell B1 till B5

Now, if we take the closing prices of Apple from Dec 26, 2018, to Dec 26, 2019, we will find there is no repeating value, and hence the mode of closing prices does not exist.

So when you try to calculate the mode in Python with the following code:

import statistics
mode = statistics.mode(aapl['Adj Close'])

it will throw a StatisticsError on Python versions before 3.8, because no unique mode exists. From Python 3.8 onward, statistics.mode instead returns the first value it encounters, which is equally uninformative here.

Hence, the mode does not make sense while observing closing price values.

Coming to the significance of the mode, it is most helpful when you need to find the stock price that occurred most often over a particular past time period. This time period can be days, months or even years. Basically, the mode shows which price level the stock has revisited most frequently.

Also, the mode is best utilised when you want to plot histograms and visualize the frequency distribution.

Amazing! This brings you to the end of the Measures of Central Tendency. Second, in the list of Descriptive Statistics is Measure of Dispersion. Let us take a look at yet another interesting concept.

Measure of Dispersion

You will find the meaning of “Measure of Dispersion” right in its title since it displays how scattered the data is around the central point. It simply tells the variation of each data value from one another, which helps to give a representation of the distribution of the data. Also, it portrays the homogeneity and heterogeneity of the distribution of the observations.

In short, it simply shows how much the entire data varies from their average value.

Measure of dispersion can be divided into:

  • Range
  • Quartile Deviation
  • Mean Absolute Deviation or Mean Deviation
  • Variance
  • Standard Deviation

Now, let us understand the concept of each category.

Range

This is the most simple out of all the measures of dispersion and is also easy to understand. Range simply implies the difference between two extreme observations or numbers of the data set.

For example, let X max and X min be two extreme observations or numbers. Here, Range will be the difference between the two of them.

Hence,

Range = X max - X min
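As a quick illustration of the formula, the sketch below computes the range of the INR price list used earlier in the median example.

```python
# INR prices from the median example earlier in the article.
prices_inr = [75000, 82500, 60000, 50000, 100000, 70000, 90000]

x_max = max(prices_inr)
x_min = min(prices_inr)
price_range = x_max - x_min  # Range = X max - X min
print(price_range)
```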

It is also very important to note that quant analysts follow ranges closely, because ranges help determine the entry as well as the exit points of trades. Ranges also help traders and investors keep a check on trading periods.

This leads investors and traders to use range-bound trading strategies, which rely on a security's price moving between two trendlines. The upper trendline is formed by the security's recent high prices and the lower trendline by its recent low prices. In this strategy, the trader can purchase the security near the lower trendline and sell it near the upper trendline to earn profits.

Hence, in Python, this simple code will find the needed values for you. The range is the difference between the max and min rows of its output:

aapl['Adj Close'].describe()

The output is:

Let us take a look at how another measure, Quartile Deviation functions.

Quartile Deviation:

This measure divides a data set into quarters. It consists of the First Quartile (Q1), the Second Quartile (Q2) and the Third Quartile (Q3).

Here,

Q1 - is the value that comes between the smallest value and the median of the sorted data; 25% of the values lie below it

Q2 - is the median of the data; 50% of the values lie below it

Q3 - is the value that comes between the median and the largest value; 75% of the values lie below it, so the top 25% lie above it

n - is the total number of values

And the formula for Quartile deviation is Q = ½ * (Q3 - Q1)

Since,

Q1 marks the lower 25%, it is the value at position ¼ (n+1) in the sorted data

Q3 marks the upper 25%, so it is the value at position ¾ (n+1) in the sorted data

Hence, Quartile deviation = ½ * [(value at position ¾ (n+1)) - (value at position ¼ (n+1))]

The major advantage, as well as the disadvantage, of using this formula is that it uses only the middle half of the data to show the dispersion, so it ignores the extreme values.
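The position-based formula above can be sketched as follows, using the seven INR prices from the median example. The helper names (value_at_position, quartile_deviation) are hypothetical, and positions that fall between two values are handled by linear interpolation, which is an assumption rather than something the formula itself specifies.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    n = len(sorted_values)
    position = min(max(position, 1), n)  # clamp to the valid range
    lower = int(position)                # whole part of the position
    fraction = position - lower
    if lower == n:
        return float(sorted_values[-1])
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + fraction * (above - below)

def quartile_deviation(values):
    """Return (Q1, Q3, quartile deviation) using the (n+1) position rule."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2

prices_inr = [75000, 82500, 60000, 50000, 100000, 70000, 90000]
q1, q3, qd = quartile_deviation(prices_inr)
print(q1, q3, qd)
```

With n = 7, Q1 is the 2nd value (60,000) and Q3 is the 6th value (90,000), so the quartile deviation is 15,000. Library functions such as numpy.percentile use a slightly different position convention by default, so their quartiles can differ a little on small data sets.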

You can use this measure for studying the dispersion of the observations that lie in the middle of the data set, since it ignores the smallest and the largest quarters of the values.

In the financial world, when you have to study a large data set (stock prices) across different time periods and want to understand how far prices are dispersed around the median, quartile deviation can be used.

The Python code below computes the quartiles and the interquartile range (IQR = Q3 - Q1) of the Apple closing prices; the quartile deviation is half of the IQR:

from numpy import percentile
# calculate the quartiles Q1, Q2 (median) and Q3
All_quartiles = percentile(aapl['Adj Close'], [25, 50, 75])
# calculate min/max
Minimum, Maximum = aapl['Adj Close'].min(), aapl['Adj Close'].max()
# print the five number summary
print(Minimum)
print(All_quartiles[0])  # first quartile (Q1)
print(All_quartiles[1])  # second quartile (Q2, the median)
print(All_quartiles[2])  # third quartile (Q3)
print(Maximum)
# interquartile range
IQR = All_quartiles[2] - All_quartiles[0]
print(IQR)

The output is:

Great, moving ahead, Mean absolute deviation is yet another measure which is explained ahead.

Mean Absolute Deviation:

This type of dispersion is the arithmetic mean of the absolute deviations of the numbers in a given data set from their mean (or median).

Hence,

The formula of Mean Absolute Deviation is:

(|D0| + |D1| + |D2| + |D3| + … + |D(n-1)|) / n

Here,

n = Total number of values in the data set and

D0, D1, D2, D3, … are the deviations of each value from the mean (or median) of the data set, and D(n-1) is the deviation of the last value in the data set.

Explaining the Mean deviation, we will take a look at the image below, which shows a “computed mean” of a data set and the difference between each value (in the dataset) from the mean value. These differences or the deviations are shown as D0, D1, D2, and D3, …..D7.

For instance, if the data values are as follows:

3, 6, 6, 7, 8, 11, 15, 16

Then, the mean will be calculated using the mean formula:

(3 + 6 + 6 + 7 + 8 + 11 + 15 + 16) / 8 = 72 / 8 = 9

As the mean comes out to be 9, the next step is to find the deviation of each data value from the mean. So, let us subtract 9 from each value to find D0, D1, D2, D3, D4, D5, D6 and D7, which gives us:

-6, -3, -3, -2, -1, 2, 6, 7

Taking the absolute values and averaging them gives the mean absolute deviation: (6 + 3 + 3 + 2 + 1 + 2 + 6 + 7) / 8 = 30 / 8 = 3.75

As we are now clear about all the deviations, let us see the mean value and all the deviations in the form of an image to get even more clarity on the same:

Hence, even for a large data set, the mean deviation summarises in a single number how far the values typically lie from the mean.
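The steps of the worked example can be reproduced with a few lines of plain Python. This is a minimal sketch using the eight values from the mean calculation above (3, 6, 6, 7, 8, 11, 15, 16).

```python
values = [3, 6, 6, 7, 8, 11, 15, 16]

# Step 1: the mean of the data set.
mean_value = sum(values) / len(values)

# Step 2: the deviation of each value from the mean (D0 ... D7).
deviations = [v - mean_value for v in values]

# Step 3: the mean of the absolute deviations.
mean_abs_dev = sum(abs(d) for d in deviations) / len(values)

print(mean_value)
print(deviations)
print(mean_abs_dev)
```

Note that the plain deviations always sum to zero, which is why the absolute values are averaged instead.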

In python code, the computation of Mean deviation is as follows:

from numpy import mean, absolute 
Mean_deviation = mean(absolute(aapl['Adj Close'] - mean(aapl['Adj Close'])))
Mean_deviation

The output is 26.252199899582642

It is important to note that the mean deviation is useful for a large data set with widely varying values, which is especially the case in the stock market.
Going ahead, variance is a related concept and is explained next.

Variance

Variance is a dispersion measure that gives the average of the squared differences from the mean. It is similar to the mean deviation, but here the deviations are squared instead of being taken as absolute values.

So,

Variance = (D0² + D1² + D2² + … + D(N-1)²) / N

Here, N = number of values in the data set and

D0, D1, D2, D3, … are the deviations of each value in the data set from the mean.

Here, taking the values from the example above, we simply square each deviation and then divide the sum of the squared deviations by the total number of values in the following manner:

(36 + 9 + 9 + 4 + 1 + 4 + 36 + 49) / 8 = 148 / 8 = 18.5
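As a cross-check on the arithmetic, here is a minimal sketch with the Python standard library, using the same eight values as the mean deviation example (3, 6, 6, 7, 8, 11, 15, 16). It also shows the sample variance, which divides by N - 1 instead of N; that variant is not discussed in this article and is included only to flag that the two conventions give different numbers.

```python
import math
import statistics

values = [3, 6, 6, 7, 8, 11, 15, 16]

# Population variance: sum of squared deviations divided by N.
population_variance = statistics.pvariance(values)

# Sample variance: the same sum divided by N - 1.
sample_variance = statistics.variance(values)

# The standard deviation is the square root of the variance.
population_std = math.sqrt(population_variance)

print(population_variance, sample_variance, population_std)
```

NumPy's np.var and np.std divide by N by default (ddof=0), so they correspond to the population versions shown here.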

In python code, it is as follows:

variance = np.var (aapl['Adj Close'])
variance

The output is 1154.50. Let us jump to another measure called Standard Deviation now.

Standard Deviation

In simple words, the standard deviation measures how spread out the numbers in a data set are. The symbol σ (sigma) represents the standard deviation, and the formula is:

σ = √Variance

Also,

σ = √[(D0² + D1² + D2² + … + D(N-1)²) / N]

is the formula of standard deviation.

Here, let us take the same values as in the two examples above and calculate the standard deviation. Hence,

σ = √(148 / 8) = √18.5 ≈ 4.30

Further, in Python code, the standard deviation can be computed using the NumPy library, as follows:

std = np.std(aapl['Adj Close'])
std

The output is: 34.0586724687285

All the measures of dispersion describe how far the values in a data set lie from a central value, giving you insight into the variability of a variable, which can be price, time, etc.

It is important to note that mean absolute deviation, variance and standard deviation all quantify how far the values deviate from the average in a given data set.

Visualization

Visualization helps analysts make decisions on the basis of an organized view of the data distribution. There are four such types of visualization, which are:

  • Histogram
  • Bar Chart
  • Pie Chart
  • Line Chart

Histogram

[Figure: histogram with Age groups on the x-axis and Frequency on the y-axis]

Here, in the image above, you can see the histogram with random data on the x-axis (Age groups) and the y-axis (Frequency). Since it summarises a large data set, it is mainly used for describing a single variable.

For example, the x-axis represents age groups from 0 to 100 and the y-axis represents how frequently people in each age group turned up for a routine eye check-up. The histogram shows that the frequency of people showing up was highest in the age group between 40 and 50.
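The counting behind such a histogram can be sketched with NumPy. The ages below are hypothetical and chosen only so that the 40 to 50 group comes out highest, as in the description above; a plotting library would draw one bar per bin from these counts.

```python
import numpy as np

# Hypothetical ages of people who showed up for an eye check-up.
ages = [5, 23, 37, 41, 44, 45, 48, 49, 52, 58, 63, 71]

# Ten-year age groups from 0 to 100.
bin_edges = np.arange(0, 101, 10)
counts, edges = np.histogram(ages, bins=bin_edges)

# The age group with the highest frequency.
busiest = int(np.argmax(counts))
busiest_group = (int(edges[busiest]), int(edges[busiest + 1]))

print(counts.tolist())
print(busiest_group)
```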

Since histogram can be used for only a single variable, let us move on and see how bar chart differs.

Bar chart

In the image above, you can see the bar chart. This type of visualization helps you compare the value of a variable across categories and over time, for example, the number of sales made by different teams in different years. The bar chart above covers two years, shown as Period 1 and Period 2.

In Period 1 (first year), Team 2 and Team 4 made almost the same number of sales. Team 1 scored decently, while Team 3 scored the least.

In Period 2 (second year), Team 1 outperformed all the other teams and scored the maximum, with Team 4 close behind. Team 3 also scored decently well, whereas Team 2 scored the least.

Since this visual representation can take into consideration more than one variable and different periods in time, the bar chart is quite helpful for representing a large data set with several variables.

Let us now see ahead how Pie chart is useful in showing values in a data set.

Pie Chart

Above is the image of a Pie chart, and this representation helps you to present the percentage of each variable from the total data set. Whenever you have a data set in percentage form and you need to present it in a way that it shows different performances of different teams, this is the apt one.

For an example, in the Pie chart above, it is clearly visible that Team 2 and Team 4 have similar performance without even having to look at the actual numbers. Both the teams have outperformed the rest. Also, it shows that Team 1 did better than Team 3. Since it is so visually presentable, a Pie chart helps you in drawing an apt conclusion.

Moving further, the last in the series is a Line chart.

Line Chart

With this kind of representation, the relationship between two variables is clearer with the help of both y-axis and x-axis. This type also helps you to find trends between the mentioned variables.

In the Line chart above, there are two trend lines showing the performance of 4 different teams in two Periods (or two years). The two trend lines make the performance of the different teams clear and make it easier to compare two consecutive years. The chart clearly shows that in Period 1, Team 2 and Team 4 performed well, whereas in Period 2, Team 1 outperformed the rest.

Okay, as we have a better understanding of Descriptive Statistics, we can move on to other mathematical concepts, their formulas as well as applications in algorithmic trading.

Probability Theory

Now let us go back in time and recall the example of finding the probabilities of a dice roll. This is one finding that we all have studied. Given the numbers on a die, i.e. 1, 2, 3, 4, 5 and 6, the probability of rolling a 1 is 1 out of 6, or ⅙. Such a probability is known as discrete, since there is a fixed number of possible results.

Similarly, the probability of rolling a 2 is 1 out of 6, the probability of rolling a 3 is also 1 out of 6, and so on. A probability distribution lists all the outcomes of a given event together with their probabilities, and a simple list works for a limited set of outcomes such as this. When the number of outcomes is large, functions are used instead.

If the probability is discrete, we call the function a probability mass function. In the case of dice roll, it will be P(x) = 1/6 where x = {1,2,3,4,5,6}.
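The probability mass function of a fair die can be written down directly. The sketch below uses exact fractions from the standard library; the expected value is included only as one example of what a probability mass function lets you compute.

```python
from fractions import Fraction

# P(x) = 1/6 for each face x in {1, 2, 3, 4, 5, 6}.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all outcomes must add up to 1.
total_probability = sum(die_pmf.values())

# Expected value of a roll: each face weighted by its probability.
expected_roll = sum(face * p for face, p in die_pmf.items())

print(total_probability, expected_roll)
```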

For discrete probabilities, there are certain cases which are so extensively studied that their probability distribution has become standardised. Let’s take, for example, the Bernoulli distribution, which describes the probability of getting heads or tails when we toss a coin. We write its probability function as p^x (1 – p)^(1 – x). Here x is the outcome, which could be written as heads = 0 and tails = 1, and p is the probability that x = 1.
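The Bernoulli probability function translates directly into code. This is a minimal sketch; the function name bernoulli_pmf and the value p = 0.3 are assumptions made for illustration.

```python
def bernoulli_pmf(x, p):
    """Probability of outcome x (0 or 1) when P(x = 1) is p."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    return p ** x * (1 - p) ** (1 - x)

# Hypothetical biased coin: tails (x = 1) comes up 30% of the time.
p_tails = bernoulli_pmf(1, 0.3)
p_heads = bernoulli_pmf(0, 0.3)
print(p_tails, p_heads)
```

For x = 1 the formula reduces to p, and for x = 0 it reduces to 1 - p, so the two probabilities always add up to 1.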

Now, let us look into the Monte Carlo simulation and understand how it approaches future possibilities. The Monte Carlo method is a stochastic one (that is, it relies on sampling random inputs) for solving a statistical problem.

Simply speaking, Monte Carlo simulation obtains a distribution of results for a statistical problem by sampling a large number of random inputs over and over again. It does not remove risk; rather, it helps to quantify the range of outcomes that may occur.

One example of Monte Carlo simulation is rolling a die several million times to get a representative distribution of the possible outcomes. With so many samples, the observed frequencies come very close to the true probabilities, which makes estimates of future outcomes more reliable. Ideally, these simulations are to be run efficiently and quickly.
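The dice example can be simulated in a few lines. This sketch uses 100,000 rolls rather than several million to keep it quick, and a fixed seed so that the run is repeatable; both choices, and the function name simulate_die_rolls, are assumptions for illustration.

```python
import random
from collections import Counter

def simulate_die_rolls(num_rolls, seed=42):
    """Roll a fair die num_rolls times and return the frequency of each face."""
    rng = random.Random(seed)  # fixed seed for a repeatable run
    counts = Counter(rng.randint(1, 6) for _ in range(num_rolls))
    return {face: counts[face] / num_rolls for face in range(1, 7)}

frequencies = simulate_die_rolls(100_000)
for face, frequency in frequencies.items():
    print(face, round(frequency, 4))
```

Each observed frequency should land close to the theoretical value of 1/6 (about 0.1667), and the gap shrinks as the number of rolls grows.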

Although asset prices do not work by rolling a dice, they also resemble a random walk. Let us learn about Random walk now.

What is Random walk?

Random walk theory suggests that changes in stock prices have the same distribution and are independent of each other. Hence, the future price cannot be predicted from the past trend of a stock price. It also holds that it is impossible to outperform the market without bearing some amount of risk.
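A random walk is easy to simulate: each price is the previous price plus an independent random change drawn from the same distribution. The sketch below assumes normally distributed changes, a starting price of 100 and a fixed seed; none of these come from the article.

```python
import random

def simulate_random_walk(start_price, num_steps, step_std=1.0, seed=7):
    """Prices where each change is an independent draw from one distribution."""
    rng = random.Random(seed)
    prices = [start_price]
    for _ in range(num_steps):
        change = rng.gauss(0.0, step_std)  # does not depend on earlier changes
        prices.append(prices[-1] + change)
    return prices

walk = simulate_random_walk(start_price=100.0, num_steps=250)
print(walk[:5])
print("final price:", round(walk[-1], 2))
```

Because every change is drawn independently, knowing the path so far gives no information about the direction of the next step, which is the core claim of the random walk view.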

Coming back to Monte Carlo simulation, it works by considering a wide range of possibilities, on the assumption that doing so helps reduce uncertainty. The problem arises when only one roll of the die, or only a few probable outcomes, are taken into consideration. Hence, the solution is to compare multiple future possibilities and customize the model of assets and portfolios accordingly.

After the Monte Carlo simulation, it is also important to understand Bayes’ theorem, since it estimates probabilities based on related past occurrences.

In simple words, Bayes’ theorem gives the probability of an event given that a related condition has been observed. If A is the event and B is the observed condition, then P(A|B) = P(B|A) × P(A) / P(B).

For example, say the age group between 50 and 55 recorded the maximum number of arthritis cases in the months of December and January in each of the last two years. Then, in the same months this year, a person in that age group would be assigned a higher probability of being diagnosed with arthritis.

This can be applied to stock prices, wherein, based on past occurrences, the probability of future price movements can be estimated and updated as new evidence arrives.
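Here is a minimal sketch of Bayes’ theorem applied to a trading-style question. All the numbers are hypothetical: a 50% prior that the price rises tomorrow, a signal that appears on 70% of the days before a rise, and on 40% of the days before a fall. The function name bayes_posterior is also an assumption.

```python
def bayes_posterior(prior, likelihood, likelihood_if_not):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) from total probability."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical inputs:
# P(rise) = 0.5, P(signal | rise) = 0.7, P(signal | no rise) = 0.4
posterior_rise = bayes_posterior(prior=0.5, likelihood=0.7, likelihood_if_not=0.4)
print(round(posterior_rise, 4))
```

Under these assumed numbers, seeing the signal lifts the probability of a rise from 50% to roughly 64%. The update is only as good as the inputs, which in practice have to be estimated from historical data.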

There is yet another one of the most important concepts of Mathematics, known as Linear Algebra which now we will learn about.

Linear Algebra

What is linear algebra?

In simple words, linear algebra is the branch of mathematics that deals with linear equations, such as a1 x1 + … + an xn = b. The most important thing to note here is that linear algebra is the mathematics of data, wherein matrices and vectors are the core representations of data.

  • What are matrices?

A matrix or the matrices are an accumulation of numbers arranged in a particular number of rows and columns. Numbers included in a matrix can be real or complex numbers or both.

For example, M is a 3 by 3 matrix with the following numbers:

0      1      3

4      5      6

2      4      7

  • What are the vectors?

In simple words, Vector is that concept of linear algebra that has both, a direction and a magnitude.

For example, V is:

[9]

[6]

[-5]

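Now, if X is a column vector of three unknowns, x1, x2 and x3, then

MX = V, which becomes the system of linear equations:

0·x1 + 1·x2 + 3·x3 = 9
4·x1 + 5·x2 + 6·x3 = 6
2·x1 + 4·x2 + 7·x3 = -5

When a vector is drawn as an arrow, the arrowhead shows the direction and the length of the arrow shows the magnitude.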

The above examples must have given you a fair idea that linear algebra is all about linear combinations. These combinations make use of columns of numbers called vectors and arrays of numbers known as matrices, and they result in new columns and arrays of numbers.
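To make the equation MX = V concrete, the sketch below solves it for X with NumPy, using the matrix M and vector V given above. Treating X as the unknown column vector is an assumption about what the example intends.

```python
import numpy as np

# Matrix M and vector V from the examples above.
M = np.array([[0, 1, 3],
              [4, 5, 6],
              [2, 4, 7]], dtype=float)
V = np.array([9, 6, -5], dtype=float)

# Solve M X = V for the unknown vector X.
X = np.linalg.solve(M, V)
print(X)

# Check: multiplying M by X should reproduce V.
print(M @ X)
```

np.linalg.solve is generally preferred over computing the inverse of M explicitly, since it is both faster and numerically more stable.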

Linear algebra is heavily involved in building algorithms and in numerical computation. Hence, linear algebra routines have been optimised extensively and are accessible from most programming languages. Libraries such as BLAS and LAPACK provide tuned implementations, and some of them configure their algorithms automatically to suit the specific nature of the computer system, like cache size, number of cores and so on.

In Python code:

import numpy as np
A = np.array([[0, 1, 3], [4, 5, 6], [2, 4, 7]])
print('rank of A:', np.linalg.matrix_rank(A))
print('Trace of A:', np.trace(A))
print('Determinant of A:', np.linalg.det(A))

# Inverse of matrix A
print("\nInverse of A:\n", np.linalg.inv(A))
# Matrix A multiplied by itself three times
print("\nMatrix A raised to power 3:\n", np.linalg.matrix_power(A, 3))

The first three print statements output:

rank of A: 3

Trace of A: 12

Determinant of A: 2.0000000000000004
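Returning to the matrix M and vector V from the earlier example, the sketch below solves MX = V, taking X to be a vector of three unknowns. It uses `np.linalg.solve`, and then checks that M times X is the same as a linear combination of the columns of M weighted by the entries of X. The variable name `combination` is an illustrative choice.

```python
import numpy as np

# M and V from the linear algebra example above.
M = np.array([[0, 1, 3], [4, 5, 6], [2, 4, 7]])
V = np.array([9, 6, -5])

# Solve M X = V for the unknown vector X.
X = np.linalg.solve(M, V)
print("X:", X)

# M @ X is a linear combination of the columns of M,
# with the entries of X as the weights.
combination = X[0] * M[:, 0] + X[1] * M[:, 1] + X[2] * M[:, 2]
print("M @ X:", M @ X)
print("Column combination:", combination)
```

Solving by hand gives x1 = 87, x2 = -120 and x3 = 43, and the solver should agree up to floating-point rounding. A unique solution exists because the determinant of the matrix is non-zero.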

Let us move ahead to another known concept used in algorithmic trading called Linear Regression.

Linear Regression

Linear Regression is another topic that helps in creating algorithms. It is a model that was originally developed in statistics: an approach for modelling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted x.

Although it is a statistical model, it is also widely used as a machine learning algorithm, because it captures the relationship between numerical input and output variables.

Now, how is Machine Learning helpful in creating algorithms?

In machine learning, a person first feeds the machine with a program and data for a task, after which the system improves its own performance based on the situations it encounters. This makes it closely tied to computational statistics, which is the interface between computer science and mathematical statistics. Used for predictive analysis, it analyses current and historical events to predict future ones, and trading algorithms can be built on those predictions.

In short, machine learning, with its systematic approach to predicting future events, helps create algorithms for automated trading.

Moving further with linear regression, its basic formula is:

y = mx + b

If you wish to read more on Linear regression and its advanced equations, refer to the link here.

Below, you will see the representations of x & y clearly in the graph:

In the graph above, both the x-axis and the y-axis show variables (x and y). As the demand for handsets (x-axis) rises, the supply of handsets (y-axis) rises to meet it, which produces the upward-sloping line.

Simply, y = how much the trend line goes up (Supply)

x = how far the trend line goes (Demand)

m = slope of the trend line (how much y rises for each unit increase in x)

b = intercept of y (where the line crosses the y-axis)

In linear regression, the input values (x) are combined linearly to produce the predicted output value (y) for that set of inputs. Both the input values and the output value are numeric.
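A minimal sketch of fitting y = mx + b in Python is shown below. The demand and supply figures are hypothetical numbers invented for illustration, and `np.polyfit` with degree 1 is just one of several ways to fit a straight line by least squares.

```python
import numpy as np

# Hypothetical data, invented for illustration:
# x = demand for handsets, y = supply of handsets.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Fit a degree-1 polynomial y = m*x + b by least squares.
# polyfit returns the coefficients from highest degree down: [m, b].
m, b = np.polyfit(x, y, 1)
print("slope m:", m)
print("intercept b:", b)

# Use the fitted line to predict y for a new input value.
x_new = 6.0
y_pred = m * x_new + b
print("predicted y at x = 6:", y_pred)
```

Because this toy data lies exactly on a line, the fit should recover a slope of about 2 and an intercept of about 1. Real market data is noisy, so the fitted line would only approximate the points.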

To read more, please refer to the blog here.

As we move ahead, let us take a look at another concept called Calculus which is also imperative for algorithmic trading.

Calculus

Calculus is one of the main concepts in algorithmic trading. It was originally called infinitesimal calculus, meaning the study of quantities that are too small to be measured.

In general, Calculus is a study of continuous change and hence, very important for stock markets as they keep undergoing frequent changes.

Coming to the types of calculus, there are two broad terms:

  • Differential Calculus - It calculates instantaneous rates of change and the slopes of curves.
  • Integral Calculus - It calculates the accumulation of quantities, such as the area under a curve.

In calculus, a classic example is the distance (d) covered in a particular time period (t):

d = at^2

where d is distance, t is time, and a is a constant set by the acceleration (for a body starting from rest, half the acceleration).

Now, to simplify this calculation, let us suppose ‘a’ is ‘5’.

So,
d = 5t^2

Now, if time (t) is 1 second, the distance covered in that time is:

d = 5(1)^2 = 5 meters

This shows that the distance covered in 1 second is 5 meters.

But if you want to find the speed at exactly 1 second (the current speed), you need a small change in time, which we will call Δt.

Since Δt is very small, t + Δt denotes a moment just after t.

Let us calculate the speed between t and t + Δt seconds. From the previous calculation, we know that at 1 second the distance covered was 5 m.

With the same formula, we can also find the distance covered at t + Δt seconds:

d = 5t^2

d = 5(t + Δt)^2 m

d = 5(1 + Δt)^2 m

Expanding (1 + Δt)^2, we get 1 + 2Δt + (Δt)^2

That brings us to d = 5(1 + 2Δt + (Δt)^2) m

Solving it further, we get d = 5 + 10Δt + 5(Δt)^2 m

The distance covered between 1 second and 1 + Δt seconds is the difference between the two distances:

(5 + 10Δt + 5(Δt)^2) - 5 = 10Δt + 5(Δt)^2 m

Coming to the final step,

Speed = distance / time, so here, speed = (10Δt + 5(Δt)^2) m / Δt s

This brings us to 10 + 5Δt m/s

Since the speed is to be calculated at a single instant (the current speed), Δt is taken closer and closer to zero, so the term 5Δt vanishes.

Therefore, the current speed = 10 m/s.
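The same limit can be checked numerically. The sketch below uses d = 5t^2 from the text and shrinks the small change in time (called `dt` in the code) to see the average speed approach the current speed. The function names `distance` and `average_speed` are illustrative choices.

```python
def distance(t):
    """Distance in metres after t seconds, using d = 5t^2 from the text."""
    return 5 * t ** 2


def average_speed(t, dt):
    """Average speed in m/s between t and t + dt seconds."""
    return (distance(t + dt) - distance(t)) / dt


t = 1.0
# Shrink the change in time and watch the average speed settle.
for dt in [1.0, 0.1, 0.01, 0.001]:
    print(f"dt = {dt}: average speed = {average_speed(t, dt):.4f} m/s")
```

As the change in time gets smaller, the printed average speed moves closer to 10 m/s, which matches the result derived by hand.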

This study of continuous change works well alongside linear algebra and probability theory. With linear algebra, it can be used to find the linear approximation for a set of values, and in probability theory, it can determine the probability that a continuous random variable falls within a given range.

Calculus also underlies the normal distribution, where probabilities are found as areas under the curve by integration. To read more on normal distribution, read here.
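To illustrate how integral calculus gives probabilities for a continuous random variable, here is a minimal sketch that integrates the standard normal density between -1 and 1 using the trapezoidal rule. The helper `normal_pdf`, the chosen range and the number of grid points are all assumptions made for this example.

```python
import math

import numpy as np


def normal_pdf(x, mean=0.0, std=1.0):
    """Probability density of a normal distribution at x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * math.sqrt(2 * math.pi))


# Range over which to integrate: one standard deviation around the mean.
lower, upper = -1.0, 1.0
x = np.linspace(lower, upper, 10001)
y = normal_pdf(x)

# Trapezoidal rule: sum the areas of thin trapezoids under the curve.
probability = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))

# Closed-form value for comparison, using the error function.
exact = math.erf(1 / math.sqrt(2))

print("Numerical probability:", probability)
print("Exact probability:", exact)
```

Both values should be close to 0.68, the familiar share of a normal distribution that lies within one standard deviation of the mean.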

Awesome! This brings us to the end of all the essential mathematical concepts required for Quants/HFT/Algorithmic Trading.

Conclusion

In this article, we have covered various topics in mathematics and statistics, along with their related subtopics. Since algorithmic trading requires a thorough knowledge of mathematical concepts, we have learnt several necessary ones, namely:

  • Descriptive Statistics
  • Probability Theory
  • Linear Algebra
  • Linear Regression
  • Calculus

Each topic comes with subtopics that cover its important and deeper aspects, along with the mathematical equations and their computation on platforms like Excel and Python.

As the entire article is aimed to get you closer to your next step in algorithmic trading, we wish you the best.

Disclaimer: All data and information provided in this article are for informational purposes only. QuantInsti® makes no representations as to accuracy, completeness, currentness, suitability, or validity of any information in this article and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.