Convolutional Neural Networks in Trading with Python: A Complete Guide for CNN


By Chainika Thakar

Convolutional neural networks have gained immense popularity recently, and you may be wondering what a convolutional neural network is. A convolutional neural network (CNN) is a deep learning technique used mainly for image recognition and computer vision tasks. Since charts and other visual representations of data are integral to algorithmic trading, CNNs are widely used there as well.

The key characteristic of a CNN is its ability to automatically learn and extract features from raw input data through its convolutional layers. These layers apply a set of filters (also called kernels) to the input data for learning.

These filters enable the network to detect different patterns and features at multiple spatial scales. The filters slide over the input data, performing element-wise multiplications and summations to generate feature maps.
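To make the sliding-filter idea concrete, here is a minimal NumPy sketch of a single filter moving over a tiny image with no padding and a stride of 1. The 4x4 image and the 2x2 edge filter are made-up values for illustration; in a real CNN the filter weights are learned during training rather than written by hand.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Element-wise multiplication followed by a sum
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# Toy 4x4 "image": dark left half, bright right half
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge
kernel = np.array([
    [-1, 1],
    [-1, 1],
], dtype=float)

feature_map = convolve2d_valid(image, kernel)
print(feature_map)
```

The feature map is non-zero only in the middle column, where the filter sits on top of the edge. As in most deep learning libraries, the filter is applied without flipping it.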

In the trading domain, the performance and effectiveness of a CNN depend on the quality of the data, the design of the model architecture, and the size and diversity of the training data.

This blog covers convolutional neural networks (CNNs) with the help of examples. It will help you understand how to use this type of deep learning model to make informed decisions and to create trading strategies that aim for desirable returns. It also discusses the uses and applications of CNNs in trading.

Last but not least, the blog walks through a Python implementation for training a CNN model to generate predictions based on your chosen parameters.

Let us dive deeper into the topic of convolutional neural networks and find out about CNN for trading.



Layers of convolutional neural networks

Besides convolutional layers, CNNs include other types of layers, such as pooling layers and fully connected layers.

Pooling layers reduce the spatial dimensionality of the feature maps, which reduces the number of parameters and computations in subsequent layers. They also make the network more robust to small spatial translations or distortions in the input data.
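As a small illustration of pooling, the NumPy sketch below applies non-overlapping 2x2 max pooling to a 4x4 feature map. The input values are made up, and the window size of 2 is just a common choice rather than a requirement.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling with a square window (stride equals window size)."""
    h, w = feature_map.shape
    out_h, out_w = h // size, w // size
    # Trim edge rows/columns that do not fill a complete window
    trimmed = feature_map[:out_h * size, :out_w * size]
    windows = trimmed.reshape(out_h, size, out_w, size)
    # Keep only the largest value inside each window
    return windows.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
], dtype=float)

pooled = max_pool2d(feature_map)
print(pooled)  # the 4x4 map is reduced to 2x2
```

Each output value summarises a 2x2 region, so the layers that follow see a quarter as many values.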

The fully connected layers are responsible for the final classification or regression tasks, where the learned features are combined and mapped to the output labels.

Going forward, let us briefly look at how a convolutional neural network works.


How do convolutional neural networks work?

Working of CNN

In general, the working of a CNN can be seen in the image above: the network takes an image as input and passes it through convolutional and pooling layers, where features are extracted and learned. The fully connected layers then perform classification or regression, depending on the specific objective, and feed the output layer.

To give an overview of the working, it goes as follows.

Feature extraction

  • Input layer: The first step is to define the input layer, which specifies the shape and size of the input images.
  • Convolutional layer + ReLU or the feature maps: The convolutional layer performs convolution operations by applying filters or kernels to the input images. These filters or kernels extract local features from the images, capturing patterns such as edges, textures, and shapes. This process creates feature maps that highlight the presence of specific features in different spatial locations. After the convolution operation, an activation function (ReLU) is applied element-wise to introduce non-linearity into the network.
  • Pooling layer: Pooling layers are used to downsample the feature maps generated by the convolutional layers. This layer reduces their spatial dimensions while retaining the most important information.

Classification

  • Flatten layer: At this stage, the feature maps from the previous layers are flattened into a 1-dimensional vector. This step converts the spatial representation of the features into a format that can be processed by fully connected layers.
  • Fully Connected layer: Fully connected layers are traditional neural network layers where each neuron is connected to every neuron in the previous and next layers. These layers are responsible for learning high-level representations by combining the extracted features from the previous layers. The fully connected layers often have a large number of parameters and are followed by activation functions.

Probabilistic distribution

  • Output layer: The output layer is the final layer of the network, responsible for producing the desired output. The number of neurons in this layer depends on the specific task. For example, in image classification, the output layer may have one neuron per class. A softmax activation function is often used to convert the output into probability scores. In trading, these scores can serve as the model's predictions, for example the probability that the price of a financial instrument will move up or down.
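The softmax step in the output layer can be sketched in a few lines of plain Python. The three class names and the raw scores below are hypothetical; they only show how arbitrary scores become probabilities that sum to 1.

```python
import math

def softmax(logits):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() numerically stable without changing the result
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes
classes = ["down", "flat", "up"]
logits = [0.5, 0.1, 1.4]

probs = softmax(logits)
predicted = classes[probs.index(max(probs))]
print({c: round(p, 3) for c, p in zip(classes, probs)}, "->", predicted)
```

The class with the largest raw score also gets the largest probability, so the predicted class does not change; softmax only rescales the scores so that they can be read as probabilities.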

Types of convolutional neural networks

There are also several named CNN architectures, each introduced for a particular purpose. You can see them below.

Types of CNN

The above image shows when each type of CNN was introduced. The timeline goes as follows.

  • ConvNet (1989) - ConvNet is simply short for convolutional neural network. A ConvNet is a specific type of neural network architecture designed for processing and analysing visual data, such as images and videos. ConvNets are particularly effective in tasks like image classification, object detection, and image segmentation.
  • LeNet (1998) - LeNet, usually referring to the LeNet-5 version, is one of the pioneering CNN architectures, developed by Yann LeCun et al. in the 1990s. It was primarily designed for handwritten digit recognition and played a crucial role in advancing the field of deep learning.
  • AlexNet (2012) - AlexNet is a CNN architecture that gained prominence after winning the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. It introduced several key innovations, such as the use of Rectified Linear Units (ReLU), local response normalisation, and dropout regularisation. AlexNet played a significant role in popularising deep learning and CNNs.
  • GoogLeNet or Inception (2014), followed by Inception V2, V3 and V4 - GoogLeNet, also known as Inception, is an influential CNN architecture that introduced the concept of "inception modules." Inception modules allow the network to capture features at multiple scales by using parallel convolutional layers with different filter sizes. This architecture significantly reduced the number of parameters compared to previous models while maintaining performance.
  • VGG (2014) - The VGG network, developed by the Visual Geometry Group (VGG) at the University of Oxford, comes in 16-layer and 19-layer variants built from small 3x3 filters. It emphasised deeper networks and a uniform architecture throughout the layers, which led to better performance but increased computational complexity.
  • ResNet (2015) - Residual Network (ResNet) is a groundbreaking CNN architecture that addressed the problem of vanishing gradients in very deep networks. ResNet introduced skip connections, also known as residual connections, that allow the network to learn residual mappings instead of directly trying to learn the desired mapping. This design enables the training of extremely deep CNNs with improved performance.
  • DenseNet (2016) - DenseNet introduced the idea of densely connected layers, where each layer is connected to every other layer in a feed-forward manner. This architecture promotes feature reuse, reduces the number of parameters, and mitigates the vanishing gradient problem.
  • ResNext (2017) - ResNext is an extension of ResNet that introduces the concept of "cardinality" to capture richer feature representations. It uses grouped convolutions and increases the model's capacity without significantly increasing the computational complexity.
  • Channel Boosted CNN (2018) - Channel Boosted CNN aimed to improve the performance of CNNs by explicitly modelling interdependencies between channels. It employed a channel attention mechanism to dynamically recalibrate the importance of each channel in the feature maps.
  • EfficientNet (2019/20) - EfficientNet used a compound scaling method to balance model depth, width, and resolution for efficient resource utilisation. It achieved state-of-the-art accuracy on ImageNet while being computationally efficient, making it suitable for mobile and edge devices.
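Of the ideas in this timeline, the skip connection from ResNet is easy to show in code. The sketch below is a simplification: two small matrix multiplications stand in for the convolutional layers of a real residual block, and the weights are set to zero by hand purely to show what the shortcut does.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), with two dense transforms standing in for conv layers."""
    fx = relu(x @ w1) @ w2   # the residual mapping F(x)
    return relu(fx + x)      # the skip connection adds the input back

x = np.array([1.0, 2.0, 3.0])

# With all-zero weights F(x) is 0, so the block passes x through unchanged
zeros = np.zeros((3, 3))
y = residual_block(x, zeros, zeros)
print(y)
```

Because the input is added back, a block only has to learn the difference from its input, which is the "residual mapping" mentioned for ResNet above.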

Using convolutional neural networks in trading

Let us see how a CNN works in the trading domain with the image below.

How CNN works in trading

Briefly, the following are the steps for using CNNs in the trading domain:

  • The data is fetched and labelled.
  • Images are created from the data.
  • The images are passed through the CNN, built with Keras in TensorFlow, which produces the financial evaluation result.

Working of convolutional neural networks in trading

The working of a convolutional neural network (CNN) in trading involves several steps, including data preprocessing, model architecture design, training, evaluation with validation, and prediction.

Here's a general overview of how a CNN can be applied in trading:

Step 1 - Data preprocessing

The first step is to gather relevant financial data, such as historical price and volume data, which is used for future price predictions and for making trading decisions.

This data needs to be preprocessed and transformed into a suitable format for inputting into the CNN model. For example, the time series data may be organised into input matrices or image pixels as shown in the image above.

The convolutional layers then apply filters to this input, allowing the network to automatically learn features and patterns from the data. (Learn neural network trading in detail in the Quantra course).
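One simple way to organise a price series into input matrices is a sliding window, sketched below with NumPy. The closing prices are made up, and the window length of 5 and the "next price is higher" labelling rule are assumptions chosen for illustration, not a recommended setup.

```python
import numpy as np

def make_windows(prices, window=5):
    """Build (samples, window) inputs and up/down labels from a 1-D price series."""
    prices = np.asarray(prices, dtype=float)
    X, y = [], []
    for start in range(len(prices) - window):
        chunk = prices[start:start + window]
        # Scale each window to [0, 1] so it can be treated like pixel intensities
        lo, hi = chunk.min(), chunk.max()
        scaled = (chunk - lo) / (hi - lo) if hi > lo else np.zeros(window)
        X.append(scaled)
        # Label: 1 if the next price is above the last price in the window
        y.append(int(prices[start + window] > chunk[-1]))
    return np.array(X), np.array(y)

# Made-up closing prices, for illustration only
prices = [100, 101, 103, 102, 104, 107, 106, 108]

X, y = make_windows(prices, window=5)
print(X.shape, y)
```

Each row of X can be reshaped into a one-row image, or several indicators can be stacked as rows to form a two-dimensional input for the convolutional layers.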

Step 2 - Model architecture design

This step defines the layers and the training settings of the CNN model before training begins. These are:

  1. Input layer: Specifies the dimensions of the input data, such as the image width, height, and number of channels (e.g., RGB or grayscale).

2. Convolutional layers

  • Decide on the number of convolutional layers and their parameters, including the number of filters/kernels, filter size, stride, padding, and activation functions (e.g., ReLU).
  • Determine the architecture of each convolutional layer, including the number of filters and their sizes.
  • Consider using techniques like batch normalisation or dropout for regularisation and improving generalisation.

3. Pooling layers

  • Select the pooling strategy (e.g., max pooling, average pooling) and the pooling size.
  • Determine the stride and padding parameters for the pooling operation.
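Filter size, stride and padding together determine how large each layer's output is. Along one dimension the standard relation is output = floor((n + 2p - k) / s) + 1, where n is the input size, k the filter or pooling size, s the stride and p the padding. The short sketch below applies it to a hypothetical 64-pixel-wide input.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

# Hypothetical 64-pixel-wide input
after_conv = conv_output_size(64, kernel=3)                    # 3x3 convolution, no padding
after_pool = conv_output_size(after_conv, kernel=2, stride=2)  # 2x2 pooling with stride 2
same_size = conv_output_size(64, kernel=3, padding=1)          # padding of 1 keeps the size

print(after_conv, after_pool, same_size)
```

Working through these numbers before building the model helps confirm that the feature maps do not shrink to nothing after a few layers.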

4. Fully connected layers

  • Decide on the number of fully connected (dense) layers and their sizes. Choose the activation functions for the fully connected layers.
  • Consider regularisation techniques like dropout or L2 regularisation.

5. Output Layer

  • Determine the number of output units, which depends on the specific task (e.g., binary classification, multi-class classification, regression).
  • Choose the appropriate activation function for the output layer (e.g., sigmoid, softmax for classification; linear for regression).

6. Loss Function

Select the appropriate loss function based on the task (e.g., binary cross-entropy, categorical cross-entropy, mean squared error).
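The three loss functions named above can be written out in plain Python to show what each one measures. This is only a sketch for single examples; libraries such as Keras compute the same quantities over whole batches.

```python
import math

def categorical_cross_entropy(probs, true_class):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(probs[true_class])

def binary_cross_entropy(p, label):
    """`label` is 0 or 1, `p` is the predicted probability of class 1."""
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def mean_squared_error(predicted, actual):
    """Average squared difference, typically used for regression."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# A model that spreads probability evenly over 10 classes scores ln(10)
uniform = [0.1] * 10
cce = categorical_cross_entropy(uniform, true_class=3)
bce = binary_cross_entropy(0.9, 1)
mse = mean_squared_error([101.0, 99.0], [100.0, 100.0])

print(round(cce, 4), round(bce, 4), mse)
```

The uniform case is a useful reference point: a classifier over 10 classes whose loss stays near ln(10), about 2.30, is doing no better than guessing.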

7. Optimization Algorithm

Choose an optimization algorithm to update the model's parameters during training, such as stochastic gradient descent (SGD), Adam, or RMSprop.

Step 3 - Training

Next, the CNN needs to be trained using labelled training data. The training data typically consists of historical data with corresponding labels, such as price movements or trading signals.

During training, the CNN learns to optimise its internal parameters (weights and biases) to minimise a loss function, which measures the difference between predicted and actual labels. This process includes forward propagation, backpropagation, and gradient descent.
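The training cycle described above (forward propagation, backpropagation, gradient descent) can be seen in miniature on a model with just one weight and one bias. The sketch below is not a CNN; it is a deliberately tiny stand-in with toy data, an assumed learning rate of 0.5 and 200 epochs, so that each step of the loop is visible.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.5, epochs=200):
    """Fit p = sigmoid(w * x + b) by gradient descent on binary cross-entropy."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        # Forward propagation: predictions and average loss
        preds = [sigmoid(w * x + b) for x in xs]
        loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                    for p, y in zip(preds, ys)) / len(xs)
        history.append(loss)
        # Backpropagation: gradients of the loss with respect to w and b
        grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        grad_b = sum(p - y for p, y in zip(preds, ys)) / len(xs)
        # Gradient descent: move the parameters against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history

# Toy data: negative inputs labelled 0, positive inputs labelled 1
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b, history = train(xs, ys)
print(f"first loss {history[0]:.4f}, last loss {history[-1]:.4f}")
```

A CNN follows the same loop, only with many more parameters and with the gradients computed automatically by the framework.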

Step 4 - Evaluation and validation

After training, the performance of the CNN is evaluated using validation data. This helps assess how well the model generalises to unseen data and can guide the selection of hyperparameters or adjustments to the model architecture if necessary. Various evaluation metrics, such as accuracy, precision, recall, or profit/loss measures, can be used depending on the trading strategy and objectives.
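Accuracy, precision and recall can be computed directly from the true and predicted labels, as in the sketch below. The two label lists are invented for the example, with 1 standing for an "up" move and 0 for a "down" move.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Invented labels: 1 = price moved up, 0 = price moved down
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

Precision answers "when the model predicted an up move, how often was it right?", while recall answers "of all the up moves, how many did the model catch?". Which one matters more depends on the trading strategy.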

Step 5 - Prediction

Once the CNN is trained and evaluated, it can be used to make price predictions that feed into a trading strategy. The trained model takes the input data, applies the learned features and patterns, and generates predictions or trading signals.

These predictions can be used to make trading decisions, such as whether to buy, sell, or hold an asset.
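How a prediction becomes a decision is a design choice of the strategy. One hypothetical rule is sketched below: it maps the predicted probability of an up move to buy, sell or hold using two thresholds. The 0.6 and 0.4 cut-offs and the sample probabilities are assumptions for illustration, not tuned values.

```python
def to_signal(prob_up, buy_above=0.6, sell_below=0.4):
    """Map a predicted probability of an up move to a trading signal."""
    if prob_up >= buy_above:
        return "buy"
    if prob_up <= sell_below:
        return "sell"
    return "hold"

# Hypothetical model outputs for three consecutive bars
probs_up = [0.72, 0.55, 0.31]

signals = [to_signal(p) for p in probs_up]
print(signals)
```

Leaving a "hold" band between the two thresholds avoids trading on predictions the model is not confident about.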


Steps to use convolutional neural networks in trading with Python

We will now see a simple model with the CNN architecture for images of candlestick patterns. The model will be trained for 10 epochs. Here, one epoch is one complete pass of the training data through the model.

Training typically continues for more epochs as long as the validation error keeps decreasing.

The Conv2D layers define the convolutional layers with ReLU activation, while the MaxPooling2D layers downsample the feature maps. The Dense layers are used for classification.

Hence, the final outcome will help you find out the performance of the model.

Step 1: Importing necessary libraries

We will first of all import TensorFlow and will use tf.keras.

Step 2: Generate random train and test data for demonstration
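A minimal sketch of this step with NumPy is shown below. The sample counts, the 64x64 RGB image size and the 10 classes are all assumptions made for the demonstration; the random arrays simply stand in for real chart images and labels.

```python
import numpy as np

# All sizes below are assumptions for the demonstration
NUM_TRAIN, NUM_TEST = 1000, 200
IMG_HEIGHT, IMG_WIDTH, CHANNELS = 64, 64, 3
NUM_CLASSES = 10

rng = np.random.default_rng(seed=42)

# Random "images" with pixel values in 0..255 and random integer class labels
x_train = rng.integers(0, 256, size=(NUM_TRAIN, IMG_HEIGHT, IMG_WIDTH, CHANNELS), dtype=np.uint8)
y_train = rng.integers(0, NUM_CLASSES, size=NUM_TRAIN)
x_test = rng.integers(0, 256, size=(NUM_TEST, IMG_HEIGHT, IMG_WIDTH, CHANNELS), dtype=np.uint8)
y_test = rng.integers(0, NUM_CLASSES, size=NUM_TEST)

print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

Because both the images and the labels are random, there is no relationship for a model to learn; the data is only useful for checking that the pipeline runs end to end.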

Step 3: Define the CNN model

Now, we will define the CNN model that will help with prediction in trading.

The model is defined using the Sequential API, and the layers are added sequentially. The architecture consists of several Conv2D layers with ReLU activation, followed by MaxPooling2D layers to reduce spatial dimensions. The final layers include a Flatten layer to flatten the output, fully connected Dense layers, and an output layer with softmax activation for classification.
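The architecture described above could look like the following sketch in tf.keras. The number of filters, the Dense layer size, the 64x64x3 input shape and the 10 output classes are assumptions chosen for illustration; they are not taken from the original notebook.

```python
import tensorflow as tf

def build_cnn(input_shape=(64, 64, 3), num_classes=10):
    """Sequential CNN: Conv2D/MaxPooling2D blocks, then Flatten and Dense layers."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
        tf.keras.layers.Flatten(),                        # feature maps -> 1-D vector
        tf.keras.layers.Dense(64, activation="relu"),     # fully connected layer
        tf.keras.layers.Dense(num_classes, activation="softmax"),  # class probabilities
    ])

model = build_cnn()
model.summary()
```

The summary lists the output shape and parameter count of every layer, which is a quick way to check that the architecture matches what you intended.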

Step 4: Normalise the training and test data
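For 8-bit images, normalising usually means scaling the pixel values from the 0 to 255 range down to 0 to 1. A minimal sketch, assuming the images are stored as unsigned 8-bit integers:

```python
import numpy as np

def normalise_images(images):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0] as float32."""
    return images.astype("float32") / 255.0

# Three sample pixel values: darkest, mid-range and brightest
raw = np.array([[0, 128, 255]], dtype=np.uint8)

scaled = normalise_images(raw)
print(scaled)
```

The same function should be applied to both the training and the test images so that the model sees inputs on the same scale in both cases.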

Step 5: Compile and train the model

Finally, the model is compiled, trained, and used to make predictions on new images.

The model is compiled with the Adam optimizer, sparse categorical cross-entropy loss function, and accuracy as the evaluation metric.
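A sketch of the compile, train and predict calls is given below. To keep it quick to run, it uses a deliberately tiny stand-in model, small random 16x16 images and only 2 epochs; these are assumptions for the demonstration, and in practice you would pass in your own model, data and epoch count.

```python
import numpy as np
import tensorflow as tf

def compile_and_train(model, x_train, y_train, x_test, epochs=10):
    """Compile with Adam and sparse categorical cross-entropy, train, then predict."""
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # labels are integer class ids
        metrics=["accuracy"],
    )
    history = model.fit(x_train, y_train, epochs=epochs, verbose=0)
    predictions = model.predict(x_test, verbose=0)
    return history, predictions

# Tiny stand-in model and random data so the sketch runs quickly
rng = np.random.default_rng(0)
x_train = rng.random((64, 16, 16, 3), dtype=np.float32)
y_train = rng.integers(0, 10, size=64)
x_test = rng.random((16, 16, 16, 3), dtype=np.float32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(16, 16, 3)),
    tf.keras.layers.Conv2D(8, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

history, predictions = compile_and_train(model, x_train, y_train, x_test, epochs=2)
predicted_classes = predictions.argmax(axis=1)  # most probable class per image
print(history.history["loss"], predicted_classes[:5])
```

Sparse categorical cross-entropy is used because the labels are plain integers rather than one-hot vectors. Setting verbose to 1 in fit prints a per-epoch progress log.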

Output:
Epoch 1/10 32/32 [==============================] - 8s 223ms/step - loss: 2.3030 - accuracy: 0.0990
Epoch 2/10 32/32 [==============================] - 10s 330ms/step - loss: 2.2998 - accuracy: 0.1200
Epoch 3/10 32/32 [==============================] - 5s 172ms/step - loss: 2.3015 - accuracy: 0.1200
Epoch 4/10 32/32 [==============================] - 6s 201ms/step - loss: 2.2994 - accuracy: 0.1200
Epoch 5/10 32/32 [==============================] - 6s 183ms/step - loss: 2.2996 - accuracy: 0.1200
Epoch 6/10 32/32 [==============================] - 5s 170ms/step - loss: 2.2981 - accuracy: 0.1200
Epoch 7/10 32/32 [==============================] - 7s 210ms/step - loss: 2.2987 - accuracy: 0.1200
Epoch 8/10 32/32 [==============================] - 5s 168ms/step - loss: 2.2981 - accuracy: 0.1200
Epoch 9/10 32/32 [==============================] - 7s 216ms/step - loss: 2.2993 - accuracy: 0.1200
Epoch 10/10 32/32 [==============================] - 5s 167ms/step - loss: 2.2975 - accuracy: 0.1200
7/7 [==============================] - 0s 43ms/step

The above output shows the training loss and accuracy for each epoch; the final line is the prediction pass over the test images.

In this specific output, the model did not achieve a high accuracy: it stays at around 0.12. Hence, the output does not indicate good performance.

Also, the loss values barely decrease over the epochs, indicating that the model is not learning or improving its predictions. This is expected here, since the demonstration data is random and contains no pattern to learn. A loss near 2.30 (about ln 10) with an accuracy near 0.10 is what random guessing among ten classes would produce.

To make the loss decrease over the epochs and to reach a higher accuracy, you need to train the model on real, labelled data. You can then adjust the number of epochs and the other parameters accordingly.

In a similar manner, you can fetch the image data (candlestick pattern, line chart) for a stock (for example, AAPL, TSLA, GOOGL etc.) and train the model for a certain number of epochs.

Python codes for trading with CNN

For trading, the same workflow is applied to images built from market data. In this case too, the result will be the computation of the final loss and accuracy.
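One piece that differs from the random-data demonstration is turning market data into images. The NumPy sketch below draws a few candles into a small pixel grid. The OHLC rows are made up, and the image height, the candle width and the pixel intensities used for wicks and bodies are arbitrary choices for illustration; real data would come from your own data source.

```python
import numpy as np

def candles_to_image(ohlc, height=32, candle_width=3):
    """Rasterise (open, high, low, close) rows into a 2-D array of pixel intensities."""
    ohlc = np.asarray(ohlc, dtype=float)
    lo, hi = ohlc[:, 2].min(), ohlc[:, 1].max()

    def row(price):
        # Map a price to a pixel row; row 0 is the top of the image.
        # Assumes the candles span more than one price level (hi > lo).
        return int(round((hi - price) / (hi - lo) * (height - 1)))

    image = np.zeros((height, len(ohlc) * candle_width), dtype=np.float32)
    for i, (o, h, l, c) in enumerate(ohlc):
        left = i * candle_width
        mid = left + candle_width // 2
        image[row(h):row(l) + 1, mid] = 0.5             # wick from high to low
        top, bottom = row(max(o, c)), row(min(o, c))
        body = 1.0 if c >= o else 0.75                  # brighter body for an up candle
        image[top:bottom + 1, left:left + candle_width] = body
    return image

# Made-up OHLC rows, for illustration only
ohlc = [
    (100, 103, 99, 102),
    (102, 105, 101, 104),
    (104, 106, 100, 101),
]

image = candles_to_image(ohlc)
print(image.shape)
```

Before such an image is passed to a Conv2D layer it needs a batch dimension and a channel dimension, for example with image[np.newaxis, ..., np.newaxis].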

And, we reach the end of this blog! You can now use the convolutional neural networks on your own for training the CNN model.

You simply need to define your own parameters on the fetched dataset of your preferred financial instruments.

With Python code along these lines, you can train your model for a chosen number of epochs and tune it to improve the accuracy of its predictions, which in turn may help improve the expected returns of your strategy.



Conclusion

We discussed the basics of convolutional neural networks along with the technical types of the same and some applications in trading. With this, we covered the most crucial aspects of using convolutional neural networks with Python.

If you wish to learn more about neural networks, enrol in our course on neural networks in trading, where you can use advanced neural network techniques and research models such as LSTM and RNN to predict markets and find trading opportunities. Keras, the relevant Python library, is used in the course for a smooth experience.


Download Data File

  • Convolutional Neural Networks in Trading with Python.ipynb


Disclaimer: All data and information provided in this article are for informational purposes only. QuantInsti® makes no representations as to accuracy, completeness, currentness, suitability, or validity of any information in this article and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.
