In pair trading, a pair of stocks is usually traded in a market-neutral strategy: it doesn’t matter whether the market is trending upwards or downwards, because the two open positions, one in each stock, hedge each other. To pair trade, the key challenges are to:

- Choose a pair which will give you good statistical arbitrage opportunities over time
- Choose the entry/exit points

**Correlation**

Though not common, a few pair trading strategies look at correlation to find a suitable pair to trade. Correlation is a measure of the relationship between two variables, in this case the log returns of the prices of stocks A and B. If the correlation is high, say 0.8, traders may choose that pair. A high value indicates a strong relationship between the two stocks: if A goes up, the chances of B going up are also quite high. Based on this assumption, a market-neutral strategy is played in which A is bought and B is sold; the buy and sell decisions are made based on their individual patterns.

Looking at correlation alone can give spurious results. For instance, if your strategy is based on the spread between the prices of the two stocks, both prices may keep rising without the spread ever reverting to its mean.

Spread = log(a) – *n* log(b), where ‘a’ and ‘b’ are the prices of stocks A and B respectively. For each share of A bought, you have sold *n* shares of B.

Now suppose both ‘a’ and ‘b’ increase in such a way that the value of the spread decreases. This results in a loss, since stock A is rising at a lower rate than stock B and you are short stock B.
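To make this pitfall concrete, here is a minimal sketch on synthetic data. The drift and volatility figures are made-up assumptions, not market estimates. It builds two price series whose log returns are highly correlated, yet whose spread (with *n* = 1 for simplicity) trends steadily downward instead of reverting.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 1000

# Hypothetical daily log returns: a shared market shock plus small
# stock-specific noise. Stock B is given a higher drift than stock A.
common = rng.normal(0.0, 0.01, n_days)
ret_a = 0.0002 + common + rng.normal(0.0, 0.003, n_days)
ret_b = 0.0010 + common + rng.normal(0.0, 0.003, n_days)

# Log prices are cumulative sums of log returns (both start at log(100)).
log_a = np.log(100.0) + np.cumsum(ret_a)
log_b = np.log(100.0) + np.cumsum(ret_b)

# Correlation is measured on log returns, as described in the text.
corr = np.corrcoef(ret_a, ret_b)[0, 1]

# Spread = log(a) - n * log(b), with the assumed hedge ratio n = 1.
spread = log_a - 1.0 * log_b

print(f"correlation of log returns: {corr:.2f}")
print(f"spread on first day: {spread[0]:+.3f}, on last day: {spread[-1]:+.3f}")
```

A trader who is long A and short B here sees a high correlation figure, but the position loses money as the spread keeps falling.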

**Cointegration**

The most common test for pair trading is the cointegration test. Cointegration is a statistical property of two or more time series which indicates that some linear combination of them is stationary. Let us unpack this statement. The two time series in this case are the logs of the prices of stocks A and B. A linear combination of these variables can be the linear equation defining the spread:

Spread = log(a) – *n* log(b), where ‘a’ and ‘b’ are the prices of stocks A and B respectively. For each share of A bought, you have sold *n* shares of B.

If A and B are cointegrated, the spread defined by the equation above is stationary. A stationary process has valuable properties for modelling pair trading strategies. For instance, if the spread is stationary, its mean and variance remain constant over time. So if we start with a value of *n*, called the hedge ratio, chosen so that spread = 0, stationarity implies that the expected value of the spread remains 0. Any deviation from this expected value is a statistical abnormality, and hence a case for trading!
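As a rough illustration of what a stationary spread looks like, the sketch below uses synthetic data that is cointegrated by construction. The hedge ratio of 1.5 and the noise levels are arbitrary assumptions. It compares the mean and standard deviation of the spread over the first and second halves of the sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 2000
true_n = 1.5  # hypothetical hedge ratio used to build the data

# log(b) is a random walk; log(a) tracks true_n * log(b) plus noise
# that does not accumulate, so the pair is cointegrated by construction.
log_b = np.log(50.0) + np.cumsum(rng.normal(0.0, 0.01, n_days))
log_a = true_n * log_b + rng.normal(0.0, 0.02, n_days)

# Spread = log(a) - n * log(b)
spread = log_a - true_n * log_b

# Split the sample in two and compare the summary statistics.
first, second = spread[: n_days // 2], spread[n_days // 2 :]
print(f"mean: {first.mean():+.4f} vs {second.mean():+.4f}")
print(f"std:  {first.std():.4f} vs {second.std():.4f}")
```

Both halves should show a mean close to 0 and a similar standard deviation, even though log(a) and log(b) individually wander like random walks.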

**How to choose a pair of stocks for trading?**

- For any pair of stocks, define the spread as below:

Spread = log(a) – *n* log(b), where ‘a’ and ‘b’ are the prices of stocks A and B respectively.

Assumption: *n*, the hedge ratio, is a constant.

- Calculate *n* using regression so that the spread is as close to 0 as possible. Since the spread is defined on log prices, we regress log(a) on log(b) and take the slope as the hedge ratio.

- Run the Dickey-Fuller (DF) test on the spread values, inserting the value of *n* (a more elaborate and more popular version is the Augmented Dickey-Fuller test, or ADF). The DF test is a hypothesis test whose null hypothesis is that the series is non-stationary (has a unit root), and it gives a p-value as the result. If this value is less than 0.05 or 0.01, we can reject that null at the 5% or 1% significance level, treat the spread as stationary, and choose this pair.
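The steps above can be sketched as follows on a synthetic pair; this is an illustration under stated assumptions, not a production implementation. The helper names `hedge_ratio` and `df_t_statistic` are hypothetical, the data is made up, and the test is the plain Dickey-Fuller regression with a constant and no lagged differences. Instead of a p-value, the sketch compares the t-statistic with -2.86, the commonly tabulated large-sample 5% critical value for that form of the test. In practice a library routine such as `adfuller` in statsmodels reports a p-value directly.

```python
import numpy as np

def hedge_ratio(log_a, log_b):
    """Slope and intercept from an OLS regression of log(a) on log(b)."""
    slope, intercept = np.polyfit(log_b, log_a, 1)
    return slope, intercept

def df_t_statistic(series):
    """Dickey-Fuller t-statistic with a constant and no lagged differences."""
    delta = np.diff(series)   # s[t] - s[t-1]
    lagged = series[:-1]      # s[t-1]
    X = np.column_stack([np.ones_like(lagged), lagged])
    beta, *_ = np.linalg.lstsq(X, delta, rcond=None)
    resid = delta - X @ beta
    sigma2 = resid @ resid / (len(delta) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    # t-statistic of the coefficient on the lagged level
    return beta[1] / np.sqrt(cov[1, 1])

# Synthetic, made-up pair: log(b) is a random walk and log(a) follows
# 0.3 + 1.5 * log(b) plus mean-reverting AR(1) noise.
rng = np.random.default_rng(42)
n_days = 2000
log_b = np.log(50.0) + np.cumsum(rng.normal(0.0, 0.01, n_days))
shocks = rng.normal(0.0, 0.005, n_days)
noise = np.zeros(n_days)
for t in range(1, n_days):
    noise[t] = 0.9 * noise[t - 1] + shocks[t]
log_a = 0.3 + 1.5 * log_b + noise

# Step 1: estimate the hedge ratio n by regression.
n_hat, intercept = hedge_ratio(log_a, log_b)

# Step 2: build the spread with the estimated n.
spread = log_a - n_hat * log_b

# Step 3: test the spread for stationarity.
t_stat = df_t_statistic(spread)
CRITICAL_5PCT = -2.86  # tabulated DF value: constant only, large sample

print(f"estimated hedge ratio n: {n_hat:.3f}")
print(f"DF t-statistic: {t_stat:.2f}")
if t_stat < CRITICAL_5PCT:
    print("unit root rejected: spread looks stationary")
else:
    print("cannot reject a unit root")
```

Because *n* is estimated from the same data, the stricter Engle-Granger critical values are the more appropriate benchmark for a real pair, so treat the threshold here as indicative only.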

Further reading on statistical arbitrage: https://blog.quantinsti.com/incorrect-notions-statistical-arbitrage/