Classification and regression trees

Classification is a type of supervised learning. It assigns data elements to one of a set of classes and is best used when the output takes finite, discrete values.

Regression is also a type of supervised learning. The difference between the two tasks is that the target attribute is numerical for regression and categorical for classification.

In both cases, we can grow a feature tree. In a classification problem, the goal of each split is to reduce the impurity of the nodes, while in a regression problem the goal is to reduce variance.

Impurity measures for binary classification (where p is the proportion of positives in a node):

Entropy: -p log2(p) - (1-p) log2(1-p)
Gini Index: 2p(1-p)
Minority Class: min(p, 1-p)
√Gini Index: √(2p(1-p))
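All four criteria depend only on the positive-class proportion p, peak at p = 0.5, and vanish at pure nodes. A minimal sketch in Python (the function names are my own):

```python
import math

def entropy(p):
    """Binary entropy: -p*log2(p) - (1-p)*log2(1-p); zero at pure nodes."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gini(p):
    """Gini index for a binary node: 2*p*(1-p)."""
    return 2 * p * (1 - p)

def minority_class(p):
    """Minority-class error: min(p, 1-p)."""
    return min(p, 1 - p)

def sqrt_gini(p):
    """Square root of the Gini index, another concave impurity measure."""
    return math.sqrt(gini(p))
```

Note the different scales: at p = 0.5, entropy reaches 1 while the Gini index reaches only 0.5.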

Consider the following example for growing a feature tree:

[Image: the ten training examples for the 'Dolphin' classification task]

We have five positives and five negatives among our examples, each labelled according to whether it belongs to the 'Dolphin' category or not. Now consider each attribute and count the positives and negatives against each of its values.

Length: {[3,4,5]} - {[2+, 0-], [1+, 3-], [2+, 2-]}

Gills: {[yes, no]} - {[0+, 4-], [5+, 1-]}

Beak: {[yes, no]} - {[5+, 3-], [0+, 2-]}

Teeth: {[few, many]} - {[3+, 4-], [2+, 1-]}

Total Impurity = weighted average of the individual impurities = Σ (weight × impurity), where the weight of a child node is its share of the examples and the impurity can be either Entropy or the Gini Index.
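The weighted average above can be sketched as a small helper that takes the (positives, negatives) counts per child node and any of the impurity functions; names are my own:

```python
import math

def entropy(p):
    """Binary entropy, zero at pure nodes."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def weighted_impurity(splits, impurity):
    """splits: list of (positives, negatives) per child node.
    Returns sum over children of (child size / total size) * impurity(child)."""
    total = sum(pos + neg for pos, neg in splits)
    return sum((pos + neg) / total * impurity(pos / (pos + neg))
               for pos, neg in splits if pos + neg > 0)
```

For example, the Gills split [0+, 4-], [5+, 1-] gives a weighted entropy of about 0.39, matching the calculation below.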

Note: In the case of length = 3, the rule forms a pure node: it yields only positives and no negatives, so its impurity is automatically zero. In the case of length = 5, the node is perfectly balanced (50 percent positives and 50 percent negatives), so its entropy is at the maximum of 1. (Go ahead and compute the values with one of the impurity criteria; you will obtain the same numbers.)

Total Entropy(length[3,4,5]) = (2/10) (0) + (4/10) [(-1/4)log2(1/4) - (3/4)log2(3/4)] + (4/10) (1) = 0.73

Total Entropy(Gills[yes,no]) = (4/10) (0) + (6/10) [(-5/6)log2(5/6) - (1/6)log2(1/6)] = 0.39

Total Entropy(Beak[yes,no]) = (8/10) [(-5/8)log2(5/8) - (3/8)log2(3/8)] + (2/10)(0) = 0.76

Total Entropy(Teeth[few,many]) = (7/10) [(-3/7)log2(3/7) - (4/7)log2(4/7)] + (3/10) [(-2/3)log2(2/3) - (1/3)log2(1/3)] = 0.97

The least impurity is observed for Gills, so we select Gills as the feature to split on.
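The four totals above can be reproduced and compared in a few lines; a sketch with the (positives, negatives) counts copied from the tables above:

```python
import math

def entropy(p):
    """Binary entropy, zero at pure nodes."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def total_entropy(splits):
    """Weighted average entropy over the child nodes of a split."""
    total = sum(p + n for p, n in splits)
    return sum((p + n) / total * entropy(p / (p + n)) for p, n in splits)

# (positives, negatives) per attribute value, as counted above
features = {
    "Length": [(2, 0), (1, 3), (2, 2)],
    "Gills":  [(0, 4), (5, 1)],
    "Beak":   [(5, 3), (0, 2)],
    "Teeth":  [(3, 4), (2, 1)],
}
scores = {name: total_entropy(splits) for name, splits in features.items()}
best = min(scores, key=scores.get)  # Gills, with the lowest weighted entropy
```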

In a similar manner, we can apply the Gini Index criterion; the values we obtain are as follows:

Total Gini(Length) = (2/10)(0) + (4/10)[(2)(1/4)(3/4)] + (4/10)[(2)(2/4)(2/4)] = 0.35

Total Gini(Gills) = (4/10)(0) + (6/10)[(2)(5/6)(1/6)] = 0.17

Total Gini(Beak) = (8/10)[(2)(5/8)(3/8)] + (2/10)(0) = 0.38

Total Gini(Teeth) = (7/10)[(2)(3/7)(4/7)] + (3/10)[(2)(2/3)(1/3)] = 0.48
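The same comparison with the Gini criterion is one substitution away; a sketch reusing the counts from the tables above:

```python
def gini(p):
    """Gini index for a binary node: 2*p*(1-p)."""
    return 2 * p * (1 - p)

def total_gini(splits):
    """Weighted average Gini index over the child nodes of a split."""
    total = sum(p + n for p, n in splits)
    return sum((p + n) / total * gini(p / (p + n)) for p, n in splits)

features = {
    "Length": [(2, 0), (1, 3), (2, 2)],
    "Gills":  [(0, 4), (5, 1)],
    "Beak":   [(5, 3), (0, 2)],
    "Teeth":  [(3, 4), (2, 1)],
}
ranking = sorted(features, key=lambda f: total_gini(features[f]))
# Gills again ranks first (lowest impurity)
```

The absolute numbers differ from the entropy totals, but the ordering of the features is the same here, which is why both criteria pick Gills.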

As seen, with the Gini Index as the impurity criterion we again get 'Gills' as the best feature to split on. So 'Gills' is the first split of the tree. To deduce the next split, count the positives and negatives among the examples with Gills = yes:

Length: {[4,5]} - {[0+, 2-], [0+, 2-]}

Beak: {[yes, no]} - {[0+, 2-], [0+, 2-]}

Teeth: {[few, many]} - {[0+, 0-], [0+, 4-]}

Now, as observed for Gills = yes, every count is negative: the node is already pure (impurity = 0). So Gills = yes is itself a leaf of the tree, and no further rule needs to be added to this branch.

So, now we count the positives and negatives with respect to Gills = no and calculate the impurity in the same manner. Try this yourself and compare your solution. The final tree would look something like this:
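The whole procedure, split on the lowest-impurity feature, recurse into each child, stop at pure nodes, can be wrapped in a short greedy routine. A sketch under the entropy criterion (all names are my own, and real implementations add stopping rules and pruning):

```python
import math
from collections import Counter

def entropy_of(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Pick the feature whose partition has the lowest weighted entropy."""
    def score(f):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[f], []).append(y)
        return sum(len(g) / len(labels) * entropy_of(g)
                   for g in groups.values())
    return min(features, key=score)

def grow(rows, labels, features):
    """Recursively grow a feature tree; leaves hold the majority class."""
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    f = best_split(rows, labels, features)
    children = {}
    for value in set(row[f] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[f] == value]
        children[value] = grow([rows[i] for i in idx],
                               [labels[i] for i in idx],
                               [g for g in features if g != f])
    return (f, children)

# Toy usage: Gills separates the classes perfectly, so it becomes the root.
rows = [{"Gills": "no", "Beak": "yes"}, {"Gills": "no", "Beak": "yes"},
        {"Gills": "yes", "Beak": "yes"}, {"Gills": "yes", "Beak": "no"}]
labels = ["+", "+", "-", "-"]
tree = grow(rows, labels, ["Gills", "Beak"])
```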

[Image: the final decision tree, with Gills as the root split]

This is the internal working of building a feature tree. In the next article, I will explain reduction of variance and how to find the best split for regression trees.



Article by Jayalakshmi Iyer
