# Gini Index For Decision Trees

Decision trees are a popular machine learning algorithm. The hierarchical structure of a decision tree leads us to the final outcome as we traverse the nodes of the tree. Each internal node tests an attribute or feature and splits into further nodes as we move down the tree. But how do we decide:

• Which attribute/feature should be placed at the root node?
• Which features will act as internal nodes or leaf nodes?

To decide this, and how to split the tree, we use splitting measures like Gini Index, Information Gain, etc. In this blog, we will learn how the Gini Index can be used to split a decision tree.

Before starting with the Gini Index, let us first understand what splitting is and which measures are used to perform it.

## What are Splitting Measures?

With more than one attribute taking part in the decision-making process, it is necessary to decide the relevance and importance of each attribute, placing the most relevant one at the root node and then moving down the tree by splitting the nodes further.

As we move further down the tree, the level of impurity or uncertainty decreases, leading to a better classification, that is, a better split at every node. Splitting measures such as Information Gain and the Gini Index are used to choose these splits.

## What is Information Gain?

Information Gain is used to determine which feature/attribute gives us the maximum information about a class.

• Information Gain is based on the concept of entropy, which is the degree of uncertainty, impurity or disorder.
• Information Gain aims to reduce the level of entropy from the root node down to the leaf nodes.

### Formula for Entropy

$E(S)=\sum_{i=1}^{c}-p_{i}\log_{2}p_{i}$

where $p_{i}$ denotes the probability of class $i$, $c$ is the number of classes, and $E(S)$ denotes the entropy.

Entropy is often not the preferred measure because evaluating the logarithm adds computational cost.
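The entropy formula above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `entropy` and the sample label lists are assumptions made for the example, not part of the original article.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Only classes that actually occur are counted, so every p is > 0
    # and log2(p) is always defined.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical label lists used purely for illustration.
print(entropy(["Up", "Up", "Down", "Down"]))  # evenly split between two classes
print(entropy(["Up", "Up", "Up", "Up"]))      # a pure node
```

An even two-class split gives the highest entropy for two classes, while a pure node gives zero.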

## What is Gini Index?

The Gini index, or Gini impurity, measures the probability that a randomly chosen element would be wrongly classified if it were labeled at random according to the class distribution of the node.

But what is actually meant by ‘impurity’?

If all the elements belong to a single class, then the node can be called pure. The Gini index varies between 0 and 1, where:

• 0 denotes that all elements belong to a single class, or that only one class exists, and
• values approaching 1 denote that the elements are spread evenly across many classes.

For a two-class problem, the maximum value is 0.5, which is reached when the elements are equally distributed between the two classes.

## Formula for Gini Index

$Gini=1-\sum_{i=1}^{n}(p_{i})^{2}$

where $p_{i}$ is the probability of an object being classified to class $i$, and $n$ is the number of classes.
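As a minimal sketch, the formula can be written as a small Python function that works directly on a list of class labels. The name `gini` and the sample inputs are assumptions chosen for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini index of a sequence of class labels: 1 - sum(p_i ** 2)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

# Hypothetical label lists, used only to show the range of values.
print(gini(["Up"] * 4))                 # pure node
print(gini(["Up", "Up", "Down", "Down"]))  # two classes, evenly split
print(gini(["a", "b", "c", "d"]))       # four classes, evenly split
```

The three calls show a pure node, the two-class maximum, and how the value grows as more classes are mixed evenly.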

While building the decision tree, we prefer the attribute/feature with the lowest weighted Gini index as the root node.

Let’s work through a simple example to see how the Gini Index works.
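When an attribute splits the data into several groups, the Gini indices of the groups are combined into a weighted sum. The notation below is an assumption introduced for illustration: $S$ is the set of rows at the node, $A$ is the attribute, and $S_{v}$ is the subset of rows where $A$ takes the value $v$.

$Gini_{split}(A)=\sum_{v}\frac{|S_{v}|}{|S|}\,Gini(S_{v})$

Each group's Gini index is weighted by the fraction of rows that fall into that group.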

## Example of Gini Index

| Past Trend | Open Interest | Trading Volume | Return |
|---|---|---|---|
| Positive | Low | High | Up |
| Negative | High | Low | Down |
| Positive | Low | High | Up |
| Positive | High | High | Up |
| Negative | Low | High | Down |
| Positive | Low | Low | Down |
| Negative | High | High | Down |
| Negative | Low | High | Down |
| Positive | Low | Low | Down |
| Positive | High | High | Up |

Table: Gini Index example
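To follow the hand calculations, it can help to hold the table in code and count how often each feature value occurs together with each Return. The sketch below does this with plain tuples; the names `COLUMNS`, `ROWS` and `crosstab` are hypothetical and chosen only for this example.

```python
from collections import Counter

COLUMNS = ("Past Trend", "Open Interest", "Trading Volume", "Return")

# The ten rows of the example table, in the same order.
ROWS = [
    ("Positive", "Low", "High", "Up"),
    ("Negative", "High", "Low", "Down"),
    ("Positive", "Low", "High", "Up"),
    ("Positive", "High", "High", "Up"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Negative", "High", "High", "Down"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Positive", "High", "High", "Up"),
]

def crosstab(rows, col):
    """Count (feature value, Return) pairs for the feature in column `col`."""
    return Counter((row[col], row[-1]) for row in rows)

for col, name in enumerate(COLUMNS[:-1]):
    print(name, dict(crosstab(ROWS, col)))
```

These counts are the numerators used in the probabilities of the following sections.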

## Calculating the Gini Index

### Calculating the Gini Index for Past Trend

P(Past Trend=Positive): 6/10

P(Past Trend=Negative): 4/10

• If (Past Trend = Positive & Return = Up), probability = 4/6
• If (Past Trend = Positive & Return = Down), probability = 2/6

Gini index = 1 - ((4/6)^2 + (2/6)^2) ≈ 0.444

• If (Past Trend = Negative & Return = Up), probability = 0
• If (Past Trend = Negative & Return = Down), probability = 4/4

Gini index = 1 - ((0)^2 + (4/4)^2) = 0

• Weighted sum of the Gini Indices can be calculated as follows:

Gini Index for Past Trend = (6/10) × 0.444 + (4/10) × 0 ≈ 0.27
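The Past Trend calculation can be checked with exact fractions, which avoids any rounding along the way. This is a sketch under the assumption that only the Past Trend and Return columns are needed; `PAIRS`, `gini`, `groups` and `weighted` are names made up for the example.

```python
from collections import Counter
from fractions import Fraction

# (Past Trend, Return) pairs copied from the example table, in row order.
PAIRS = [
    ("Positive", "Up"), ("Negative", "Down"), ("Positive", "Up"),
    ("Positive", "Up"), ("Negative", "Down"), ("Positive", "Down"),
    ("Negative", "Down"), ("Negative", "Down"), ("Positive", "Down"),
    ("Positive", "Up"),
]

def gini(labels):
    """Exact Gini index of a list of labels, returned as a Fraction."""
    total = len(labels)
    return 1 - sum(Fraction(n, total) ** 2 for n in Counter(labels).values())

# Group the Return labels by the value of Past Trend.
groups = {}
for trend, ret in PAIRS:
    groups.setdefault(trend, []).append(ret)

# Weight each group's Gini index by its share of the rows.
weighted = sum(Fraction(len(g), len(PAIRS)) * gini(g) for g in groups.values())

for trend, labels in groups.items():
    print(trend, gini(labels))
print("weighted:", weighted, "approx", round(float(weighted), 2))
```

Working in fractions shows that the Positive branch has a Gini index of 4/9 and that the weighted value is 4/15, which rounds to 0.27.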

### Calculation of Gini Index for Open Interest

P(Open Interest=High): 4/10

P(Open Interest=Low): 6/10

• If (Open Interest = High & Return = Up), probability = 2/4
• If (Open Interest = High & Return = Down), probability = 2/4

Gini index = 1 - ((2/4)^2 + (2/4)^2) = 0.5

• If (Open Interest = Low & Return = Up), probability = 2/6
• If (Open Interest = Low & Return = Down), probability = 4/6

Gini index = 1 - ((2/6)^2 + (4/6)^2) ≈ 0.444

• Weighted sum of the Gini Indices can be calculated as follows:

Gini Index for Open Interest = (4/10) × 0.5 + (6/10) × 0.444 ≈ 0.47

### Calculation of Gini Index for Trading Volume

P(Trading Volume=High): 7/10

P(Trading Volume=Low): 3/10

• If (Trading Volume = High & Return = Up), probability = 4/7
• If (Trading Volume = High & Return = Down), probability = 3/7

Gini index = 1 - ((4/7)^2 + (3/7)^2) = 0.49

• If (Trading Volume = Low & Return = Up), probability = 0
• If (Trading Volume = Low & Return = Down), probability = 3/3

Gini index = 1 - ((0)^2 + (1)^2) = 0

• Weighted sum of the Gini Indices can be calculated as follows:

Gini Index for Trading Volume = (7/10) × 0.49 + (3/10) × 0 ≈ 0.34

### Gini Index attributes or features

| Attributes/Features | Gini Index |
|---|---|
| Past Trend | 0.27 |
| Open Interest | 0.47 |
| Trading Volume | 0.34 |

Table 1: Gini Index attributes or features

From the above table, we observe that ‘Past Trend’ has the lowest Gini Index, so it is chosen as the root node of the decision tree.

We will repeat the same procedure to determine the sub-nodes or branches of the decision tree. The ‘Negative’ branch of Past Trend is already pure (all four rows have Return = Down, so its Gini index is 0) and needs no further split.
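The comparison of the three features can be reproduced with a short script. It is a sketch that assumes the table is stored as tuples with Return in the last position; `FEATURES`, `ROWS`, `gini` and `weighted_gini` are hypothetical names for this example.

```python
from collections import Counter

FEATURES = ("Past Trend", "Open Interest", "Trading Volume")

# Rows of the example table: three features followed by Return.
ROWS = [
    ("Positive", "Low", "High", "Up"),
    ("Negative", "High", "Low", "Down"),
    ("Positive", "Low", "High", "Up"),
    ("Positive", "High", "High", "Up"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Negative", "High", "High", "Down"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Positive", "High", "High", "Up"),
]

def gini(labels):
    """Gini index of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def weighted_gini(rows, col):
    """Weighted Gini index obtained by splitting `rows` on column `col`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

scores = {name: weighted_gini(ROWS, col) for col, name in enumerate(FEATURES)}
root = min(scores, key=scores.get)  # feature with the lowest weighted Gini index

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("root:", root)
```

Taking the minimum of the weighted scores is the selection rule used for the root node here.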

We will calculate the Gini Index for the ‘Positive’ branch of Past Trend as follows:

| Past Trend | Open Interest | Trading Volume | Return |
|---|---|---|---|
| Positive | Low | High | Up |
| Positive | Low | High | Up |
| Positive | High | High | Up |
| Positive | Low | Low | Down |
| Positive | Low | Low | Down |
| Positive | High | High | Up |

Table: Gini Index calculation for the Positive branch of Past Trend

### Calculation of Gini Index of Open Interest for Positive Past Trend

P(Open Interest=High): 2/6

P(Open Interest=Low): 4/6

• If (Open Interest = High & Return = Up), probability = 2/2
• If (Open Interest = High & Return = Down), probability = 0

Gini index = 1 - ((2/2)^2 + (0)^2) = 0

• If (Open Interest = Low & Return = Up), probability = 2/4
• If (Open Interest = Low & Return = Down), probability = 2/4

Gini index = 1 - ((2/4)^2 + (2/4)^2) = 0.50

• Weighted sum of the Gini Indices can be calculated as follows:

Gini Index for Open Interest = (2/6) × 0 + (4/6) × 0.50 ≈ 0.33

### Calculation of Gini Index of Trading Volume for Positive Past Trend

P(Trading Volume=High): 4/6

P(Trading Volume=Low): 2/6

• If (Trading Volume = High & Return = Up), probability = 4/4
• If (Trading Volume = High & Return = Down), probability = 0

Gini index = 1 - ((4/4)^2 + (0)^2) = 0

• If (Trading Volume = Low & Return = Up), probability = 0
• If (Trading Volume = Low & Return = Down), probability = 2/2

Gini index = 1 - ((0)^2 + (2/2)^2) = 0

• Weighted sum of the Gini Indices can be calculated as follows:

Gini Index for Trading Volume = (4/6) × 0 + (2/6) × 0 = 0

### Gini Index attributes or features

| Attributes/Features | Gini Index |
|---|---|
| Open Interest | 0.33 |
| Trading Volume | 0 |

Table 2: Gini Index attributes or features

We will split the node further using the ‘Trading Volume’ feature, as it has the minimum Gini index.
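Repeating the "pick the lowest weighted Gini index, then split" step on every branch can be sketched as a small recursive function. This is a simplified illustration rather than a full decision tree implementation: it splits on every value of a categorical feature, stops when a node is pure or no features remain, and all names (`FEATURES`, `ROWS`, `gini`, `weighted_gini`, `build`) are assumptions made for the example.

```python
from collections import Counter

FEATURES = ("Past Trend", "Open Interest", "Trading Volume")

# Rows of the example table: three features followed by Return.
ROWS = [
    ("Positive", "Low", "High", "Up"),
    ("Negative", "High", "Low", "Down"),
    ("Positive", "Low", "High", "Up"),
    ("Positive", "High", "High", "Up"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Negative", "High", "High", "Down"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Positive", "High", "High", "Up"),
]

def gini(labels):
    """Gini index of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def weighted_gini(rows, col):
    """Weighted Gini index obtained by splitting `rows` on column `col`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

def build(rows, cols):
    """Return a leaf label, or a nested dict {feature: {value: subtree}}."""
    labels = [row[-1] for row in rows]
    # Stop when the node is pure or there is nothing left to split on.
    if len(set(labels)) == 1 or not cols:
        return Counter(labels).most_common(1)[0][0]
    best = min(cols, key=lambda c: weighted_gini(rows, c))
    remaining = [c for c in cols if c != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build(subset, remaining)
    return {FEATURES[best]: branches}

tree = build(ROWS, [0, 1, 2])
print(tree)
```

On this data the sketch places Past Trend at the root, labels the Negative branch as Down, and splits the Positive branch on Trading Volume, matching the hand calculation.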


## Conclusion

The Gini Index, unlike Information Gain, does not require the logarithm function that is used to calculate entropy, so it is less computationally intensive. This is one reason why the Gini Index is often preferred over Information Gain.

You can learn more about different splitting measures, including the Gini Index and Information Gain, in this course on Decision Trees offered by Dr. Ernest Chan, which teaches how to predict markets and find trading opportunities using AI techniques.
