Machine Learning Classification Strategy In Python

By Ishan Shah and Rekhit Pachanekar

Before you move on to build your machine learning classification algorithm, here’s a question for you. Do you know what Tesla and Netflix have in common?

Both companies’ CEOs prefer to follow the ancient philosophy of “First-principles”.

And what is the philosophy of first principles?

In simple terms, when faced with a difficult issue, you break it down into smaller parts and build your way up from there.

But why are ‘first principles’ mentioned in a blog on a machine learning classification model?

Because before we build a classification model in Python, we will start from the basics, which is machine learning itself.

A brief on machine learning

Let us take a simple definition of machine learning. Machine learning gives machines the ability to learn autonomously from past experiences, observations and patterns within a given data set, without being explicitly programmed.

To break it down, earlier programmers used to explicitly code what the machine should do. Take the example of a car with power steering. If the driver turns the steering wheel to the right, there is a program which tells the car to move the wheels to the right. This will make the car turn right.

And what happens when we add machine learning to it?

We can essentially fit the car with a camera, and some proximity sensors which will help the program “see” what is in front of the car. And we will teach the machine to check if there is any obstacle in front of the car, and if not, then move forward.

This is obviously an over-simplification, but this is one of the first principles when it comes to making a “driverless” car.  And companies like Tesla, Ford, and BMW, are using machine learning and AI to do all of these tasks.

If you want to know more about the fundamentals of machine learning, do take a look at the machine learning basics blog.

How is machine learning being implemented in trading?

We can use data, which could be the OHLC or price data, fundamental data, or alternative data such as tweets and news data about a certain asset, to create the machine learning model, and then use it to predict the future.

When we say predict the future, we mean that the machine learning model will give us signals on whether to buy or sell an asset to increase our gains. Of course, nobody is perfect, and machine learning still has a long way to go before it accurately predicts the future.

Because if it could, everyone using it would be millionaires by now. However, in the trading domain, machine learning could give you that edge when it comes to outperforming the competition.

When it comes to machine learning, there are different types of approaches taken to solve an issue. Broadly speaking, you can put machine learning models in three different categories: supervised learning, unsupervised learning and reinforcement learning.

In the next section, let us try to understand how machine learning can help predict the future using a machine learning classification model.

What is a classification algorithm in machine learning?

Humans are essentially beings who like to classify things.  In fact, it is one of the ways we learn to identify things.

Imagine a baby who is trying to understand the world he is seeing.

  • He sees his parents, who have two hands and two legs.
  • But once when he was in a garden, he also noticed a creature which had four legs and a tail.
  • The baby’s mom said that the creature was a dog.
  • Then the baby’s father showed a picture of a cat and said this is a different creature.
  • Now the baby can identify humans, cats and dogs.

This is a simplified version of how a machine learning model learns by classification as well. Initially, we provide the machine learning model with some examples to train the model. Then, we check how much the machine has learnt by giving it some examples as a test.

There are various types of machine learning models which cater to different types of classification tasks. Let’s check a few types of machine learning algorithms in the next section.

Types of classification algorithms in machine learning according to classification tasks

The following are the various types of classification in machine learning, grouped by the kind of task they solve.

Binary classification

In this type of classification, you only have two classes for classification. A common example is spam or not spam, or buy or not buy.

Multi-class classification

As the name implies you have more than two types of classes in this type of classification task. For example, the model would have to classify as buy, hold, and sell.

Imbalanced classification

If one class makes up the large majority of the data, it is an imbalanced classification task. For example, suppose we take the S&P500 data and the task is to predict whether we should hold it for a year or not. Since the S&P500 has risen in most years, a model can score well by always saying hold, without learning anything useful.
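To make the imbalance problem concrete, here is a minimal sketch. The labels are made up for illustration (8 "hold" years and 2 "not hold" years) and are not actual S&P500 data. It shows how accurate a "model" would look if it simply predicted the majority class every time.

```python
import pandas as pd

# Hypothetical labels for illustration only, not real S&P500 history
labels = pd.Series(["hold"] * 8 + ["not hold"] * 2)

# Count how often each class appears
counts = labels.value_counts()
majority_class = counts.idxmax()

# Accuracy of always predicting the majority class
baseline_accuracy = counts.max() / len(labels)

print(counts)
print(f"Always predicting '{majority_class}' is right {baseline_accuracy:.0%} of the time")
```

Any classifier trained on such data should be compared against this baseline rather than against 50%.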

Depending on the tasks, you can select different classification models. In fact, check out the machine learning in trading book, where we have explained various machine learning models, including classification in Python. If you want to know about the different types of classification models in Python, you can read them here.

Let us now build a classification model in Python for helping us make a decision whether to buy an asset or not.

Implementing Classification in Python

A brief disclaimer here, you can build a machine learning classification model in other programming languages as well. But here, we focus on classification in Python because of its versatility and ease of use.

While it will take some time for you to master Python, you can understand the code and what it is doing, thanks to the support of various libraries in Python right from day one.

Without these libraries, you would have to spend time coding the various functions to perform the task you want. But with these libraries, you just have to write a few lines of code and the task will be done. Here we will use the “sklearn” Python library to create a simple classification model.

We will explain the code as we go along and you will understand the simplicity of the Python language.

We will implement a machine learning classification algorithm on S&P500 using Support Vector Classifier (SVC).

SVCs are supervised learning classification models. The machine learning classification algorithm is given a set of training examples, each labelled with one of the categories.

For instance, the categories can be to either buy or sell a stock. The classification algorithm builds a model based on the training data and then, classifies the test data into one of the categories.

Step 1: Import the libraries

Before we start to code any trading strategy, we have to import the necessary Python libraries that will be needed. In our case, these include the sklearn machine learning library.

Further, we use the pandas library to store as well as perform various data manipulation tasks on the data. Finally, we use matplotlib and the seaborn library to visualise the data.

Step 2: Fetch data

We will download the S&P500 data from Yahoo! Finance using the yfinance Python library.

After that, we will drop the missing values from the data and plot the S&P500 close price series.
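A minimal sketch of this step could look like the following. The ticker symbol "^GSPC", the date range and the helper names fetch_sp500 and clean_prices are our own assumptions for illustration, not something fixed by this blog. Since downloading needs network access, the cleaning step is demonstrated on a tiny hand-made table instead.

```python
import pandas as pd

def fetch_sp500(start="2015-01-01", end="2020-01-01"):
    """Download daily S&P500 index data from Yahoo! Finance (needs network)."""
    import yfinance as yf  # imported here so the rest of the sketch runs offline
    # "^GSPC" is assumed to be the Yahoo! Finance symbol for the S&P500 index
    return yf.Ticker("^GSPC").history(start=start, end=end)

def clean_prices(df):
    """Drop every row that contains a missing value."""
    return df.dropna()

# Hand-made sample standing in for the downloaded data (values are made up)
sample = pd.DataFrame(
    {
        "Open": [100.0, 101.0, None],
        "High": [102.0, 103.0, 104.0],
        "Low": [99.0, 100.0, 101.0],
        "Close": [101.0, 102.0, 103.0],
    },
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)
cleaned = clean_prices(sample)
print(cleaned)

# With network access you could instead run:
# df = clean_prices(fetch_sp500())
# df["Close"].plot(title="S&P500 close price")
```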

Price Graph of S&P500

Step 3: Determine the target variable

The target variable is that variable which the machine learning classification algorithm will predict. In this example, the target variable is whether the S&P500 price will increase or decrease the next trading day. In other words, whether the S&P500 will close up or close down on the next trading day.

We will first determine the actual trading signal using the following logic:

  • If the next trading day's close price is greater than today's close price, then we will buy the S&P500 index.
  • Else we will sell the S&P500 index.
  • We will store +1 for the buy signal and -1 for the sell signal.

We use .shift to shift the close price column. A parameter of -1 moves every value up by one row, so that the next trading day’s close price sits on the same row as today’s close price.
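The logic above can be sketched on a handful of made-up close prices. The column names Next_Close and Signal are our own choice for illustration. Note that the last row has no next-day price, so we drop it rather than let it silently become a sell signal.

```python
import numpy as np
import pandas as pd

# Made-up close prices for illustration
df = pd.DataFrame({"Close": [100.0, 102.0, 101.0, 101.0, 105.0]})

# shift(-1) pulls the next day's close onto today's row
df["Next_Close"] = df["Close"].shift(-1)

# The final row has no next day, so remove it
df = df.dropna(subset=["Next_Close"]).copy()

# +1 (buy) if tomorrow's close is higher than today's, else -1 (sell)
df["Signal"] = np.where(df["Next_Close"] > df["Close"], 1, -1)

print(df)
```

Here an unchanged close (101 followed by 101) is stored as -1, because the rule only buys on a strictly higher close.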

The target variable is one side of the equation. Now, we need something which will help the model learn, and these are the predictor variables. We will look at them in the next section.

Step 4: Creation of predictor variables

The X is a dataset that holds the predictor variables which are used to predict the target variable, ‘y’. If you ask someone how they would predict the next day’s price, they might say that they will look at today’s price, or rather, the close price to be specific.

In this manner, the close price can be a predictor variable. However, you should not use the close price directly, as the target variable is itself derived from it. Thus, we use different variables here.

While you can use various technical indicator values, or fundamental data as well, we are going to focus on the OHLC/price data for simplicity.

Instead of directly using the OHLC data, we try to find a combination of them. Thus, the X consists of variables such as 'Open - Close' and 'High - Low'. These can be understood as indicators based on which the algorithm will predict the trading signal.
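As a quick sketch, the two predictors can be built with plain column arithmetic in pandas. The prices below are made up, and the column names 'Open-Close' and 'High-Low' are just one possible naming.

```python
import pandas as pd

# Made-up OHLC rows for illustration
df = pd.DataFrame(
    {
        "Open": [100.0, 102.0, 101.0],
        "High": [103.0, 104.0, 102.5],
        "Low": [99.0, 100.5, 100.0],
        "Close": [102.0, 101.0, 102.0],
    }
)

# Predictor 1: how far the price moved between open and close
df["Open-Close"] = df["Open"] - df["Close"]

# Predictor 2: the day's trading range
df["High-Low"] = df["High"] - df["Low"]

# X holds only the predictor columns
X = df[["Open-Close", "High-Low"]]
print(X)
```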

All right. We have both the predictor variables, as well as the target variable. So how do we proceed?

Remember that in the initial sections, we said that the machine needs some data to learn first, before it can start predicting. We call the learning phase “training the model”, and we test whether the model has learnt correctly or not in the “testing the model” phase.

Step 5: Test and train dataset split

Now, you cannot use the entire dataset to train the classification model in Python. Think about it, if you knew the exact questions that would be asked in an exam test, would it be a fair exam?

In a similar manner, we divide the dataset into train and test datasets to evaluate if the machine learning model has learned correctly or not. In this step, we will split data into the train dataset and the test dataset.

  1. First, 80% of data is used for training and the remaining data for testing
  2. X_train and y_train are train dataset
  3. X_test and y_test are test dataset
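One simple way to do this split, sketched below on placeholder data, is to slice by position so that the first 80% of the rows form the train dataset and the most recent 20% form the test dataset. Slicing instead of shuffling is our assumption here; it keeps the test data strictly after the training data in time.

```python
import numpy as np
import pandas as pd

# Placeholder predictors and target, 10 rows in time order
n = 10
X = pd.DataFrame(
    {
        "Open-Close": np.arange(n, dtype=float),
        "High-Low": np.arange(n, dtype=float) + 1,
    }
)
y = np.where(np.arange(n) % 2 == 0, 1, -1)

# Position of the 80% cut-off
split_percentage = 0.8
split = int(split_percentage * len(X))

# Earlier rows for training, later rows for testing
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

print(len(X_train), len(X_test))
```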

Step 6: Create the machine learning classification model using the train dataset

We will first create the classification model in Python based on the train dataset. This model will be later used to predict the trading signal in the test dataset.
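A minimal sketch of this step with sklearn's SVC is shown below, using default parameters. The data is randomly generated and the labels are tied to the first predictor purely so that there is something to learn; real market data will not be this tidy. The variable name cls matches the one used later for prediction.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

# Synthetic predictors for illustration only
rng = np.random.default_rng(42)
X = pd.DataFrame(
    {
        "Open-Close": rng.normal(0, 1, 200),
        "High-Low": rng.uniform(0.5, 3.0, 200),
    }
)
# Toy labelling rule: +1 when Open-Close is negative, else -1
y = np.where(X["Open-Close"] < 0, 1, -1)

# Same 80/20 positional split as in Step 5
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Create the Support Vector Classifier and fit it on the train dataset
cls = SVC().fit(X_train, y_train)

# The fitted model can now label unseen rows
predicted = cls.predict(X_test)
print(predicted[:10])
```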

Step 7: The classification model accuracy_score in Python

We will compute the accuracy of the classification model on the train and test dataset, by comparing the actual values of the trading signal with the predicted values of the trading signal. The function accuracy_score() will be used to calculate the accuracy.

Syntax: accuracy_score(target_actual_value,target_predicted_value)

  1. target_actual_value: correct signal values
  2. target_predicted_value: predicted signal values

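An accuracy above 50% on the test data suggests that the classification model is doing better than a coin toss on this two-class problem. It does not, by itself, prove that the strategy is profitable.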
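To see what accuracy_score() returns, here is a small sketch with hand-written signal values standing in for the actual and predicted signals. In the full workflow, the predicted values would come from the model fitted on the train dataset.

```python
from sklearn.metrics import accuracy_score

# Hand-written values for illustration
target_actual_value = [1, -1, 1, 1, -1]
target_predicted_value = [1, 1, 1, -1, -1]

# Fraction of positions where the prediction matches the actual signal
accuracy = accuracy_score(target_actual_value, target_predicted_value)

# 3 of the 5 signals match
print(f"Accuracy: {accuracy:.2%}")
```

Running the same call once with the train data and once with the test data lets you compare the two numbers. A train accuracy far above the test accuracy is a common sign of overfitting.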

Step 8: Prediction

We will predict the signal (buy or sell) for the test data set, using the cls.predict() function. Then, we will compute the strategy returns based on the signal predicted by the model in the test dataset. We save it in the column 'Strategy_Return'.
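The return calculation can be sketched as follows. The close prices and predicted signals are made up, with the signal column standing in for the output of cls.predict(). We assume here that a signal generated on one day is traded over the following day, which is why the signal is shifted by one row before multiplying. The column names Return and Predicted_Signal are our own choice.

```python
import pandas as pd

# Made-up test-period data; Predicted_Signal stands in for cls.predict(X_test)
df = pd.DataFrame(
    {
        "Close": [100.0, 110.0, 99.0, 108.9],
        "Predicted_Signal": [1, -1, 1, 1],
    }
)

# Daily percentage return of the index
df["Return"] = df["Close"].pct_change()

# Yesterday's signal decides today's position: +1 earns the return, -1 earns its negative
df["Strategy_Return"] = df["Return"] * df["Predicted_Signal"].shift(1)

print(df)
```

In this toy example the signals happen to be right every day, so each strategy return is about +10%, even on the day the index fell.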

Step 9: Plotting classification data in matplotlib

Finally, we will plot the cumulative strategy returns using the matplotlib Python package.
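A minimal plotting sketch is given below, using three made-up daily strategy returns. We compound the returns here, which is an assumption on our part; a simple cumulative sum is another common convention. The "Agg" backend line only makes the code run without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily strategy returns for illustration
strategy_return = pd.Series(
    [0.10, 0.10, 0.10],
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

# Compound the daily returns into a cumulative return
cumulative = (1 + strategy_return).cumprod() - 1

# Plot the cumulative return in percent
fig, ax = plt.subplots(figsize=(10, 5))
(cumulative * 100).plot(ax=ax)
ax.set_title("Cumulative strategy returns")
ax.set_ylabel("Strategy returns (%)")
fig.tight_layout()
```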

Strategy returns of machine learning classifier model

As seen from the graph, the classification strategy in Python generates a return of around 25% in the test data set.

Of course, you can tweak the predictor variables or select different ones and check how it affects the accuracy and performance of the strategy.

In this way, we have learnt what classification is in machine learning. Further, you also built a classification model in Python to predict whether the S&P500 will close up or down on the next trading day.

If you want to learn more about classification models in Python, you can check the Trading with Machine Learning: Classification and SVM course which will help you create the classification strategy as well as live trade it.

If you want to become an ML expert, you can check Quantra’s Learning Track on Machine Learning & Deep Learning in Financial Markets, which is a set of 7 courses and covers all ML models including decision trees and neural networks.

File in the download

  • Machine Learning Classification Strategy Python Code

Update - We have noticed that some users are facing challenges while downloading the market data from Yahoo and Google Finance platforms. In case you are looking for an alternative source for market data, you can use Quandl for the same.

Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.
