Don't be deceived by the past. In the rapidly evolving domains of data science and financial machine learning, quicker calculations and more effective processing techniques are becoming more and more important. These days, a new set of open-source software libraries called RAPIDS is gaining popularity.
RAPIDS leverages GPU capabilities to expedite data science tasks. This post will look at every aspect of RAPIDS, including its libraries, hardware specifications, setup guidelines, useful applications, and drawbacks. Last but not least, as usual, I'm going to offer a trading strategy based on the RAPIDS suite!
We cover:
- Understanding RAPIDS Libraries
- RAPIDS Libraries Installation Guide
- Practical Examples of the RAPIDS Libraries
- A trading strategy using machine learning and the GPU
- Limitations of the Up-to-Date Libraries
Understanding RAPIDS Libraries
A new approach to speeding up data science and machine learning procedures is provided by the open-source software libraries collectively known as RAPIDS. It is necessary to use all RAPIDS libraries to fully take advantage of the computational and data analysis capabilities of GPUs.
Let’s look at the main RAPIDS Librarieshere:
- cuDF: A GPU-accelerated data frame manipulation and operation tool similar to Pandas but optimised for GPUs. It has a Pandas-like user interface and accelerates processing through GPU parallelism.
- cuML: This library is used for machine learning tasks. It provides GPU-accelerated algorithms for various tasks, such as clustering, regression, and classification. These algorithms are made to improve performance without compromising accuracy, which makes them suitable for use with large-scale datasets.
- cuPy: Identical in appearance to NumPy, cuPy is intended to be a GPU-accelerated array library that enables fast GPU array operations. It mimics NumPy's functionality to seamlessly transfer array-based code to GPU architectures, increasing computational speed.
These libraries are combined to create a single system that helps with data manipulation, analysis, and machine learning tasks by utilizing the parallel processing power of GPUs. This acceleration makes it possible to develop models and analyze data more quickly, which is helpful for tasks involving big datasets. It shortens processing times as well.
To make the most of GPU-accelerated computing, researchers, machine learning experts, and data scientists must grasp the nuances of the RAPIDS libraries. These libraries provide high-performance computing capabilities along with the ability to speed up and simplify a multitude of data processing tasks.
RAPIDS Libraries Installation Guide
The RAPIDS libraries can be installed using the following steps:
Step 1: System requirements
Please confirm that your system satisfies the requirements before proceeding with the installation. It is imperative to have a compatible GPU because RAPIDS libraries are optimized for NVIDIA GPUs. It only works in Linux-based operating systems. In case you have Windows, you can use WSL2 to have Ubuntu as a virtual machine. Verify that the Linux version on your machine is supported (such as Ubuntu or CentOS). Installing NVIDIA drivers that are compatible with your GPU is also required.
Step 2: Installing Conda
The installation and management of RAPIDS libraries require the use of Conda, a package manager and environment manager. Installing Miniconda or Anaconda, two Python distribution platforms that support Conda, should be your first step.
Follow the installation guidelines on the official website to download and install Miniconda or Anaconda.
For RAPIDS, create a new Conda environment to keep the setup tidy and isolated. The following command can be used to create an environment with the name "rapids" or any other desired name:
Step 3: Install the RAPIDS Libraries
Use the following command to activate the Conda environment after it has been created:
Next, use the following command to install RAPIDS libraries:
This command will install the RAPIDS suite in the specified Conda environment. The rapids=0.21 refers to the version of RAPIDS being installed.
Step 4: Verifying the Installation
Once the installation process is complete, you can verify that RAPIDS libraries have been successfully installed in your Conda environment. Open a Python interpreter within the Conda environment and import the desired libraries (e.g., cuDF, cuML, cuPy) to ensure they are accessible and functioning properly.
If the import statements execute without errors, it indicates the successful installation of RAPIDS libraries.
Practical Examples of the RAPIDS Libraries
Let’s understand how to use the 3 libraries from above. The examples will give a glimpse of what you can do with these libraries. As you’ll discover, they act very similar to numpy, pandas and scikit-learn. So you will not get confused at all while using them. They’re easy to handle and you’ll start coding quickly.
Ready to have some fun?
Let's explore!
cuPy Examples
We now create two random arrays with 10,000 observations. Then we multiply them.
Example 1: In this example, we create 10,000 random numbers and dot-multiply them to get a unique value as the result.
Example 2: Here we create two 2x2 matrices and compute the multiplication of both. We then print the resulting matrix.
cuDF Examples
Example 1: Next, we create a GPU-based dataframe with 2 columns A and B and 3 observations each and sum both columns and the result we save it in column C. So simple, right?
Example 2: Here we create a pandas dataframe obtained with a dictionary. Then we upload the pandas-based dataframe to the GPU memory using the cudf library. Then we print the dataframe.
cuML Examples
Example 1: We provide in this example two cupy arrays with 1000 random numbers each and use them to fit a k-means clustering algorithm with the cuml library. We then predict the labels of the features as per the model.
Example 2: Finally, in this example, we create random input and prediction features using the cuml library. Then, we split the data into train and test data and next perform a random forest classifier to the data. Finally we predict the X test data and show only 10 predictions.
Did you notice?
It’s like using CPU-based libraries! So smooth the coding, right?
A trading strategy using machine learning and the GPU
Using RAPIDS libraries, one can design a machine learning-based trading strategy. By integrating cuDF for data manipulation, cuML for predictive modelling, and cuPy for numerical operations, a trader can develop a strategy based on historical market data, applying various machine learning algorithms for predictive analysis to make trading decisions. Each day we estimate a random forest and forecast the next signal. We purge the train data to avoid having test data information overlapped in it.
Once we create the signal, we get the cumulative returns for a buy-and-hold and the strategy.
Let’s see the graph
We got good returns! But, be careful! Check always the strategy performance and do cross-validation to verify the edge of your strategy. Always add a risk management process. Watch out!
Limitations of the Up-to-Date Libraries
The limitations of these libraries can be listed as follows:
- By the time of the most recent update in March 2024, RAPIDS has advanced significantly. Like any developing technology, it has drawbacks as well, such as the fact that there are fewer algorithms implemented in cuML than in well-known CPU-based libraries like scikit-learn.
- Additionally, its reliance on NVIDIA GPUs limits its application on computers without this technology.
- Be careful of reproducibility, n_streams equal to 1 make the model have reproducibility, but a higher number will not make it.
- The VRAM might not be sufficient enough for a complex machine learning model and data. Whenever there’s a cuda memory error, you might need to decrease the model’s complexity or decrease the dataframe dimensions to have it run smoothly as per your hardware specifications.
Conclusion
As a new collection of libraries, RAPIDS uses GPU acceleration for activities related to data science and machine learning. Though it has a lot of potential, it is important to be aware of several algorithmic limits as well as hardware requirements. However, RAPIDS's ongoing development and community support indicate a promising trajectory for transforming the data science field.
Even with the limitations, we were able to create a trading strategy. Want to learn more about Python for trading? Please check this comprehensive 6-course learning track about Machine Learning and Deep Learning! You’ll find there are ML and Deep learning models to be applied to trading strategies. You can start using them with the Rapids library! Try it!
Ready to create your own strategy?
Go algo!
File in the Download:
- The Rapids AI library (Python notebook)
Author: José Carlos Gonzáles Tanaka
Disclaimer: All investments and trading in the stock market involve risk. Any decision to place trades in the financial markets, including trading in stock or options or other financial instruments is a personal decision that should only be made after thorough research, including a personal risk and financial assessment and the engagement of professional assistance to the extent you believe necessary. The trading strategies or related information mentioned in this article is for informational purposes only.