Answers: Is the equation over-fitting?
This was the first question I had asked. The best way to find out whether your algorithm is overfitting is to compare the prediction error it makes on the train data with the error it makes on the test data. An overfit algorithm shows a much lower error on the train data than on the test data.
To do this, we need to add a small piece of code to the code we have already written.
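A minimal sketch of that check, assuming a plain linear regression on synthetic prices. The model, the features `Close_lag1` and `Ret_lag1`, and the 80/20 chronological split are illustrative choices, not the code from the previous blog. The idea is to fit on the earlier part of the series and compare the root-mean-squared error on the train and test portions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical daily closing prices (a random walk); swap in real data here.
rng = np.random.default_rng(42)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0002, 0.01, 600))))

# Features use only yesterday's information.
df = pd.DataFrame({"Close": close})
df["Close_lag1"] = df["Close"].shift(1)
df["Ret_lag1"] = np.log(df["Close"]).diff().shift(1)
df = df.dropna()

X, y = df[["Close_lag1", "Ret_lag1"]], df["Close"]
split = int(0.8 * len(df))  # chronological split: never shuffle a time series
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = LinearRegression().fit(X_train, y_train)
train_rmse = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
test_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"Train RMSE: {train_rmse:.4f}")
print(f"Test RMSE:  {test_rmse:.4f}")

# Overfitting usually shows up as a test error well above the train error.
if test_rmse < train_rmse:
    print("Test error is below train error: worth investigating.")
```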
First, let me begin my explanation by apologizing for breaking a coding norm: some lines go beyond the 80-column mark.
Second, if we run this piece of code, the output looks something like this:
Our algorithm is doing better on the test data than on the train data. This observation in itself is a red flag. There are a few reasons why the test error could be lower than the train error:
- If the train data had greater volatility (daily range) than the test data, the predictions on the train data would also be more volatile, which inflates the train error.
- If there was an inherent trend in the market that helped the algo make better predictions.
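Both explanations can be checked directly. The sketch below uses hypothetical synthetic OHLC data in which the first 400 days are deliberately more volatile than the last 100, then compares the mean daily range (as a fraction of the close) and the mean daily log return of the two periods. The 400/100 split and the volatility levels are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n, split = 500, 400
# Hypothetical data: 2% daily volatility in the train period, 0.8% in the test period.
sigma = np.where(np.arange(n) < split, 0.02, 0.008)
close = 100 * np.exp(np.cumsum(rng.normal(0.0, sigma)))
high = close * (1 + np.abs(rng.normal(0.0, sigma)))
low = close * (1 - np.abs(rng.normal(0.0, sigma)))
df = pd.DataFrame({"High": high, "Low": low, "Close": close})

# Daily range relative to the close, so periods with different price levels compare fairly.
df["Range"] = (df["High"] - df["Low"]) / df["Close"]
train_range = df["Range"].iloc[:split].mean()
test_range = df["Range"].iloc[split:].mean()

# A crude trend check: the average daily log return in each period.
log_ret = np.log(df["Close"]).diff()
train_drift = log_ret.iloc[:split].mean()
test_drift = log_ret.iloc[split:].mean()

print(f"Mean daily range   train: {train_range:.4f}  test: {test_range:.4f}")
print(f"Mean daily return  train: {train_drift:.5f}  test: {test_drift:.5f}")
```

If the train range is much larger than the test range, a higher train error is expected and says little about overfitting on its own.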
Next, to check if there was a trend, let us pass more data from a different time period.
If we run the code, the result looks like this:
So, giving more data did not make our algorithm work better; it made it worse. In time series data, the inherent trend plays a very important role in how the algorithm performs on the test data. As we saw above, it can sometimes yield better-than-expected results. The main reason our algo was doing so well was that the test data kept following the main pattern observed in the train data.
So, if our algorithm can detect the underlying trend and use a strategy suited to that trend, then it should give better results. I will explain this in more detail:
- Can the machine learning algorithm detect the inherent trend or market phase (bull/bear/sideways/breakout/panic)?
- Can the dataset be trimmed so that different algos are trained for different situations?
We can divide the market into different regimes and then use these signals to trim the data and train different algorithms on these datasets. To achieve this, I chose to use an unsupervised machine learning algorithm.
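A minimal sketch of the "one model per regime" idea, assuming the regime labels already exist. Here they are hypothetical random labels and the target is synthetic, built so that the relationship flips sign between regimes; the point is only to show how the data is split by regime and a separate `LinearRegression` is trained on each subset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({"Ret_lag1": rng.normal(0.0, 0.01, n)})
df["Regime"] = rng.integers(0, 2, n)  # hypothetical regime labels: 0 or 1

# Synthetic target: momentum in regime 0, mean reversion in regime 1.
slope = np.where(df["Regime"] == 0, 0.5, -0.5)
df["Ret"] = slope * df["Ret_lag1"] + rng.normal(0.0, 0.002, n)

# Trim the dataset by regime and train one model per subset.
models = {}
for label, subset in df.groupby("Regime"):
    models[label] = LinearRegression().fit(subset[["Ret_lag1"]], subset["Ret"])
    print(f"Regime {label}: {len(subset)} rows, slope {models[label].coef_[0]:+.3f}")

# At prediction time, route the new observation to its regime's model.
current_regime = 1  # hypothetical: would come from the regime detector
new_day = pd.DataFrame({"Ret_lag1": [0.004]})
print(models[current_regime].predict(new_day))
```

A single model fitted to all rows would average the two opposite slopes towards zero, which is the motivation for trimming the data by regime.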
From here on, this blog will be dedicated to creating an algorithm that can detect the inherent trend in the market without explicitly training for it.
First, let us import the necessary libraries.
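The sketches in this walkthrough assume these libraries: NumPy and pandas for the data, matplotlib for the chart, and scikit-learn's `GaussianMixture` for the regime model. The exact set the original code imported is not shown here, so treat this list as an assumption.

```python
import numpy as np                            # numerical arrays
import pandas as pd                           # time-indexed OHLC data
import matplotlib.pyplot as plt               # plotting the regimes
from sklearn.mixture import GaussianMixture   # unsupervised regime model
```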
Then we fetch the OHLC data from Google and shift it by one day so that the algorithm is trained only on past data.
Then we drop all the NaN values.
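A minimal sketch of these two steps. Because the Google feed used in the original may no longer be reachable, `load_ohlc` below is a hypothetical stand-in that generates synthetic daily OHLC prices; replace it with your own data source.

```python
import numpy as np
import pandas as pd

def load_ohlc(n=750, seed=7):
    """Hypothetical stand-in for a data download: synthetic daily OHLC prices."""
    rng = np.random.default_rng(seed)
    # Calm (0.7%) and turbulent (2%) stretches alternate every 150 days.
    sigma = np.where((np.arange(n) // 150) % 2 == 0, 0.007, 0.02)
    close = 100 * np.exp(np.cumsum(rng.normal(0.0003, sigma)))
    open_ = close * (1 + rng.normal(0.0, sigma / 3))
    high = np.maximum(open_, close) * (1 + np.abs(rng.normal(0.0, sigma / 2)))
    low = np.minimum(open_, close) * (1 - np.abs(rng.normal(0.0, sigma / 2)))
    index = pd.bdate_range("2014-01-01", periods=n, name="Date")
    return pd.DataFrame({"Open": open_, "High": high, "Low": low, "Close": close}, index=index)

raw = load_ohlc()
# Shift by one day: each date now carries the previous day's prices,
# so the model only ever sees data that was known before that day.
df = raw.shift(1)
# The first row has no previous day and becomes NaN; drop it.
df = df.dropna()
print(df.head())
```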
Next, we will instantiate an unsupervised machine learning algorithm using the ‘Gaussian mixture’ model from sklearn.
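A minimal sketch of the instantiation, assuming scikit-learn's `GaussianMixture`. The variable name `unsup` and the settings `covariance_type="full"`, `n_init=10` and `random_state=7` are illustrative choices, not necessarily those of the original code; `n_components=4` matches the four regimes.

```python
from sklearn.mixture import GaussianMixture

# Four mixture components, i.e. four market regimes. No labels are involved:
# the model decides on its own how to split the data.
unsup = GaussianMixture(
    n_components=4,          # number of regimes
    covariance_type="full",  # each regime gets its own full covariance matrix
    n_init=10,               # run EM from 10 initializations and keep the best fit
    random_state=7,          # makes the regime numbering reproducible
)
print(unsup)
```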
In the above code, I created an unsupervised algo that will divide the market into 4 regimes based on criteria of its own choosing. Unlike in the previous blog, we have not provided a train dataset with labels.
Next, we fit the model to the data and predict the regimes, storing these predictions in a new variable called regime.
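A minimal, self-contained sketch of the fit-and-predict step. It assumes two scale-free features built from the previous day's data, the log return and the high-low range relative to the close, rather than raw OHLC levels; this feature choice and the synthetic prices are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
n = 600
# Hypothetical prices: calm and turbulent stretches alternate every 150 days.
sigma = np.where((np.arange(n) // 150) % 2 == 0, 0.007, 0.02)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, sigma))))
high = close * (1 + np.abs(rng.normal(0.0, sigma)))
low = close * (1 - np.abs(rng.normal(0.0, sigma)))

# Assumed features, shifted by one day so each row only uses past data.
X = pd.DataFrame({
    "Return": np.log(close).diff(),
    "Range": (high - low) / close,
}).shift(1).dropna()

unsup = GaussianMixture(n_components=4, covariance_type="full", n_init=10, random_state=7)
unsup.fit(X)               # EM estimates a mean and covariance for each regime
regime = unsup.predict(X)  # most likely regime (0 to 3) for every day
print(np.bincount(regime, minlength=4))  # days assigned to each regime
```

The regime numbers are arbitrary labels: refitting with a different `random_state` can permute them, so interpret each regime by its statistics rather than its number.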
Now let us calculate the returns of the day.
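A minimal sketch, assuming log returns computed from consecutive closes (simple returns via `pct_change()` are an equally valid choice). The four closing prices are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices.
close = pd.Series([100.0, 101.0, 99.99, 102.0],
                  index=pd.bdate_range("2024-01-01", periods=4))

# Log return of each day relative to the previous close.
# The first day has no previous close, so its return is NaN.
ret = np.log(close / close.shift(1))
print(ret)
```

Log returns add up over time, which is why the three defined returns above sum to the log of 102/100.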
Then, we create a dataframe called Regimes that holds the OHLC and Return values along with the corresponding regime classification.
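A minimal, self-contained sketch of building `Regimes`, again on hypothetical synthetic OHLC data and with two assumed features (the previous day's log return and relative range) driving the mixture model. Each row ends up with the day's prices, its return and the regime predicted from the data known before that day.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
n = 600
index = pd.bdate_range("2015-01-01", periods=n)
# Hypothetical OHLC prices with alternating calm and turbulent stretches.
sigma = np.where((np.arange(n) // 150) % 2 == 0, 0.007, 0.02)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, sigma))), index=index)
open_ = close * (1 + rng.normal(0.0, sigma / 3))
ohlc = pd.DataFrame({
    "Open": open_,
    "High": np.maximum(open_, close) * (1 + np.abs(rng.normal(0.0, sigma / 2))),
    "Low": np.minimum(open_, close) * (1 - np.abs(rng.normal(0.0, sigma / 2))),
    "Close": close,
})

# Daily log return.
ret = np.log(ohlc["Close"] / ohlc["Close"].shift(1))

# Assumed regime features, shifted one day so only past data is used.
features = pd.DataFrame({
    "Return": ret,
    "Range": (ohlc["High"] - ohlc["Low"]) / ohlc["Close"],
}).shift(1).dropna()
unsup = GaussianMixture(n_components=4, n_init=10, random_state=7).fit(features)

# One row per day: OHLC, that day's return, and the predicted regime.
Regimes = ohlc.loc[features.index].copy()
Regimes["Return"] = ret.loc[features.index]
Regimes["Regime"] = unsup.predict(features)
print(Regimes.head())
```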
After this, let us create a list called ‘order’ that holds the regime labels, and then plot the data by regime to see how well the algo has classified the market.
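A minimal sketch of the plot using matplotlib scatter, one colour per regime, on the same kind of hypothetical synthetic data and assumed features. The colour assigned to each regime number is arbitrary here, so it will not necessarily match the colours in the chart discussed below.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
n = 600
index = pd.bdate_range("2015-01-01", periods=n)
# Hypothetical prices with alternating calm and turbulent stretches.
sigma = np.where((np.arange(n) // 150) % 2 == 0, 0.007, 0.02)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, sigma))), index=index)
high = close * (1 + np.abs(rng.normal(0.0, sigma)))
low = close * (1 - np.abs(rng.normal(0.0, sigma)))
ret = np.log(close / close.shift(1))

# Assumed features from the previous day; fit and label every remaining day.
features = pd.DataFrame({"Return": ret, "Range": (high - low) / close}).shift(1).dropna()
unsup = GaussianMixture(n_components=4, n_init=10, random_state=7).fit(features)
Regimes = pd.DataFrame({"Close": close, "Return": ret}).loc[features.index].copy()
Regimes["Regime"] = unsup.predict(features)

order = [0, 1, 2, 3]                          # regime labels, in legend order
colors = ["red", "purple", "green", "blue"]   # hypothetical colour per regime

fig, ax = plt.subplots(figsize=(10, 5))
for label, color in zip(order, colors):
    days = Regimes[Regimes["Regime"] == label]
    ax.scatter(days.index, days["Close"], s=6, color=color, label=f"Regime {label}")
ax.set_xlabel("Date")
ax.set_ylabel("Close")
ax.legend()
plt.show()
```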
The final regime differentiation would look like this:
This graph looks pretty good to me. Without looking at the factors the classification was based on, we can conclude a few things just from the chart.
- The red zone is the low-volatility or sideways zone.
- The purple zone is the high-volatility or panic zone.
- The green zone is a breakout zone.
- The blue zone: not entirely sure, but let us find out.
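One way to find out is to inspect what the fitted model learned for each regime. A minimal, self-contained sketch on hypothetical synthetic data, with two assumed features (the previous day's log return and relative range): after fitting, `GaussianMixture` exposes each component's mean in `means_` and, with `covariance_type="full"`, its covariance matrix in `covariances_`. For a single feature, the "covariance" is that feature's variance, i.e. the diagonal entry printed below.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
n = 600
# Hypothetical prices with alternating calm and turbulent stretches.
sigma = np.where((np.arange(n) // 150) % 2 == 0, 0.007, 0.02)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, sigma))))
high = close * (1 + np.abs(rng.normal(0.0, sigma)))
low = close * (1 - np.abs(rng.normal(0.0, sigma)))
X = pd.DataFrame({
    "Return": np.log(close).diff(),
    "Range": (high - low) / close,
}).shift(1).dropna()

unsup = GaussianMixture(n_components=4, covariance_type="full",
                        n_init=10, random_state=7).fit(X)

# Print each regime's mean and variance (diagonal of its covariance matrix) per feature.
for i in range(unsup.n_components):
    variances = np.diag(unsup.covariances_[i])
    print(f"Regime {i}")
    for name, mean, var in zip(X.columns, unsup.means_[i], variances):
        print(f"  {name:<6} mean = {mean:+.5f}   variance = {var:.6f}")
```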
The output would look like this:
The output can be interpreted as follows:
- Regime 0: low mean and high covariance.
- Regime 1: high mean and high covariance.
- Regime 2: high mean and low covariance.
- Regime 3: low mean and low covariance.
To rephrase Morpheus,
This is your last chance. After this, there is no turning back. You take the blue pill—the story ends, you wake up in your bed and believe that you can trade manually. You take the red pill—you stay in the Algoland, and I show you how deep the rabbit hole goes.
Remember: all I'm offering is the truth. Nothing more.
Next Step
At this moment, AI and Machine Learning have already progressed far enough that they can predict stock prices with a great level of accuracy. So what makes this possible? Read our post on 'Machine Learning For Trading – How To Predict Stock Prices Using Regression?' to know more.