In this post, we have seen what decision trees are and how to build them for classification and regression problems. We also saw that they are very useful for making qualitative or quantitative forecasts and that they are not especially difficult to construct.
However, it is necessary to be aware of the problems that decision trees usually have: the greedy approach the algorithm uses to build the tree does not always produce the best possible output, and the precision and robustness of the forecasts are not as stable as we would like.
Typically, Decision Trees have the following known problems:
- Underfitting (bias)
- Overfitting (variance)
In this post, we will focus on Bagging and Boosting, which build decision trees in parallel and in sequence respectively, and ensemble their outputs to improve forecast accuracy and reduce variance.
What is Bagging?
Bagging, short for bootstrap aggregating, is an ensemble method to reduce the variance in the forecasts.
The bagging algorithm builds N trees in parallel, each trained on one of N datasets generated by randomly sampling the training data with replacement. The final result is the average of all the trees' outputs (for regression trees) or the majority vote (for classification trees).
So, the steps that the Bagging algorithm executes are simple:
- Generate N random datasets by sampling the training dataset with replacement.
- Fit N predictors in parallel, one tree per random training dataset.
- Average the results for regression trees, or take the majority vote for classification trees.
We must consider that this method will not always be the best way to improve our decision tree: it helps mainly when we have a complex tree that shows high variance in its forecasts.
We use the BaggingRegressor and BaggingClassifier classes from the library sklearn.ensemble to implement the bagging algorithm.
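As a minimal sketch of BaggingRegressor, here is a toy example on hypothetical synthetic data (the sine-curve data and every parameter value are illustrative choices, not from the original post). The default base learner of BaggingRegressor is already a decision tree:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

# Hypothetical toy data: a noisy sine curve with one feature.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

# N bootstrap samples -> N trees fit in parallel; predictions are averaged.
model = BaggingRegressor(
    n_estimators=50,   # N trees (default base learner is a decision tree)
    bootstrap=True,    # sample the training data with replacement
    n_jobs=-1,         # fit the trees in parallel
    random_state=0,
)
model.fit(X, y)
print(model.predict([[0.5]]))  # averaged prediction of the 50 trees
```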
Building a Decision Tree in Python
Let's remember the main steps to build a decision tree in Python:
- Retrieve and sanitize market data for a financial instrument.
- Introduce the Predictor variables (i.e. Technical indicators, Sentiment indicators, Breadth indicators, etc.)
- Setup the Target variable or the desired output.
- Split data between training and test data.
- Train the model to generate the decision tree.
- Test and analyze the model.
You can refer to this blog to learn more about how to build a decision tree, decision tree classifiers and much more.
Enhancing a Decision Tree with Bagging
We can reuse almost all of the steps listed above for Decision Trees; however, when we reach step five, we replace the single tree with the Bagging algorithm.
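A hedged sketch of that substitution, using synthetic data from make_classification as a stand-in for the market data and indicator features of the original workflow (all dataset sizes and parameters are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the sanitized market data and predictor variables.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Step 4: split data between training and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Step 5 (replaced): fit a bagged ensemble of trees instead of one tree.
single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
bagged = BaggingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Step 6: test and compare; bagging typically stabilises a high-variance tree.
print("single tree:", single_tree.score(X_test, y_test))
print("bagged trees:", bagged.score(X_test, y_test))
```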
What is Boosting?
Unlike bagging, which is a parallel ensemble technique, boosting works sequentially. It aims to convert weak learners into strong learners by sequentially improving on the previous classification, thus minimizing the bias error as we move forward.
How does Boosting work?
Initially, boosting begins much like bagging, by randomly sampling a dataset from the training data. It builds a classification model on this sample and tests the model on the full training dataset.
Some of the data points from the training dataset are correctly classified. When building the next random dataset, the instances or data points that were wrongly classified in the previous round are given higher priority, which simply means that these instances have a higher likelihood of being selected in the next dataset.
In this way, boosting sequentially builds N random datasets, each one informed by the errors made on the previously chosen instances.
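The re-weighting step above can be illustrated with an AdaBoost-style weight update on a handful of hypothetical points (the five points and their misclassification pattern are invented for the example; this is not a full boosting implementation):

```python
import numpy as np

# Five hypothetical training points, all equally weighted at the start.
weights = np.full(5, 1 / 5)
misclassified = np.array([False, True, False, False, True])

error = weights[misclassified].sum()        # weighted error of this round
alpha = 0.5 * np.log((1 - error) / error)   # this learner's "say" (AdaBoost formula)

# Up-weight the mistakes, down-weight the correct points, then renormalise:
# the misclassified points are now more likely to enter the next dataset.
weights = weights * np.exp(np.where(misclassified, alpha, -alpha))
weights /= weights.sum()
print(weights)
```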
Types of Boosting Algorithms
The Adaptive Boosting (AdaBoost) technique works iteratively to improve the classification happening at a certain stage. It uses decision stumps as its weak learners.
What is a Decision Stump?
A decision stump is basically a one-level decision tree that makes its decision based on a single feature.
The decision stumps are used to classify the data points and are iteratively improved by increasing the priority of the misclassified data points. Different decision stumps can also be combined into a stronger classifier that fits the data with far fewer errors.
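A stump is simply a tree capped at depth one, which can be seen directly in sklearn (the toy data below, where the class flips once along a single feature, is an invented example):

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one feature, class flips above ~2.5.
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]

# max_depth=1 makes this a decision stump: one split on one feature.
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
print(stump.predict([[1], [4]]))  # -> [0 1]: a single threshold decides everything
```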
We can use the AdaBoostRegressor and AdaBoostClassifier from the library sklearn.ensemble to implement the AdaBoost algorithm.
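A hedged sketch of AdaBoostClassifier on synthetic data (dataset and parameter values are illustrative; by default the base learner is a decision stump):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Each of the 100 rounds re-weights the misclassified points before
# fitting the next stump, then combines all stumps into one classifier.
ada = AdaBoostClassifier(n_estimators=100, random_state=1).fit(X_train, y_train)
print("test accuracy:", ada.score(X_test, y_test))
```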
Just as AdaBoost focuses on minimizing the misclassified points, gradient boosting focuses on minimizing the loss function of the model.
Once a loss function is defined for a particular model, gradient boosting minimizes its value: each new tree is fit to the gradient of the loss (for squared error, the residuals of the current ensemble), so that adding it reduces the remaining error on the data points.
The GradientBoostingRegressor and GradientBoostingClassifier classes from the library sklearn.ensemble can be used to implement this method in Python.
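A hedged sketch of GradientBoostingRegressor on synthetic regression data (dataset and all hyperparameter values are illustrative assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in dataset.
X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2)

# Each new shallow tree is fit to the residuals of the current ensemble,
# and its contribution is shrunk by the learning rate before being added.
gbr = GradientBoostingRegressor(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=3,
    random_state=2,
).fit(X_train, y_train)
print("test R^2:", gbr.score(X_test, y_test))
```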
XGBoost, or Extreme Gradient Boosting, is an implementation of gradient boosting, available as a library.
You can read about it in detail and learn how to implement it in R here.
Conclusion
In the fabulous world of ML, these enhancement methods for our decision trees help us to reduce the variance and increase the accuracy. However, it is necessary to know when to apply them: if our decision trees suffer from other problems, it will be more convenient to identify them and apply other serial or parallel enhancement methods.
You can learn more about the implementation of decision trees and ensemble methods in Python in this course.
Disclaimer: All data and information provided in this article are for informational purposes only. QuantInsti® makes no representations as to accuracy, completeness, currentness, suitability, or validity of any information in this article and will not be liable for any errors, omissions, or delays in this information or any losses, injuries, or damages arising from its display or use. All information is provided on an as-is basis.