Classification and regression trees
Classification is a type of supervised learning: it assigns each data point to a class, and it is best used when the output takes finite, discrete values.
Regression is also a type of supervised learning; the difference between the two tasks is that the dependent attribute is numerical for regression and categorical for classification.
In both cases we can grow a feature tree. In a classification problem the goal is to reduce the impurity of the nodes, while in a regression problem the goal is to reduce the variance.
Impurity measures for binary classification, where p is the proportion of positives at a node (a short code sketch follows this list):
Entropy: -p log2(p) - (1-p) log2(1-p)
Gini Index: 2p(1-p)
Minority Class: min(p, 1-p)
√Gini Index: √(2p(1-p))
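As a quick sketch, here are these four measures in Python (the function names are my own; p is the proportion of positives at a node):

```python
import math

def entropy(p):
    # A pure node (p = 0 or 1) has zero entropy by convention.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gini(p):
    return 2 * p * (1 - p)

def minority_class(p):
    return min(p, 1 - p)

def sqrt_gini(p):
    return math.sqrt(2 * p * (1 - p))

# A perfectly balanced node (p = 0.5) maximises every measure:
print(entropy(0.5), gini(0.5), minority_class(0.5), sqrt_gini(0.5))
# -> 1.0 0.5 0.5 0.7071067811865476
```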
Consider the following example for growing a feature tree:
We have five positive and five negative examples, where the label indicates whether an instance belongs to the 'Dolphin' category or not. Now consider each attribute and tally the number of positives and negatives against each of its values.
Length: {[3,4,5]} - {[2+, 0-], [1+, 3-], [2+, 2-]}
Gills: {[yes, no]} - {[0+, 4-], [5+, 1-]}
Beak: {[yes, no]} - {[5+, 3-], [0+, 2-]}
Teeth: {[many, few]} - {[3+, 4-], [2+, 1-]}
Total impurity of a split = weighted average of the impurities of its child nodes = Σ (weight × impurity), where each child's weight is its fraction of the examples and the impurity can be entropy, the Gini Index, or any of the measures above.
Note: Length = 3 forms a pure node, i.e. it contains only the positive class and no negatives, so its impurity is automatically zero. Length = 5 forms a balanced node of 50 percent positives and 50 percent negatives, so its impurity is maximal (1 in the case of entropy). (Go ahead and calculate the values using one of the impurity criteria; you will obtain the same extremes.)
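To make the weighted average concrete, here is a minimal sketch of a helper (the name weighted_impurity is my own) that takes a split as a list of (positives, negatives) counts per child node:

```python
import math

def entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def weighted_impurity(children, impurity):
    """children: one (positives, negatives) pair per child node of the split."""
    total = sum(p + n for p, n in children)
    return sum((p + n) / total * impurity(p / (p + n))
               for p, n in children if p + n > 0)

# Length = {3, 4, 5} partitions the 10 examples into [2+,0-], [1+,3-], [2+,2-]:
print(weighted_impurity([(2, 0), (1, 3), (2, 2)], entropy))  # ~0.7245
```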
Total Entropy(Length[3,4,5]) = (2/10) (0) + (4/10) [(-1/4)log2(1/4) - (3/4)log2(3/4)] + (4/10) (1) = 0.72
Total Entropy(Gills[yes,no]) = (4/10) (0) + (6/10) [(-5/6)log2(5/6) - (1/6)log2(1/6)] = 0.39
Total Entropy(Beak[yes,no]) = (8/10) [(-5/8)log2(5/8) - (3/8)log2(3/8)] + (2/10)(0) = 0.76
Total Entropy(Teeth[many,few]) = (7/10) [(-3/7)log2(3/7) - (4/7)log2(4/7)] + (3/10) [(-2/3)log2(2/3) - (1/3)log2(1/3)] = 0.97
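Reusing entropy and weighted_impurity from the sketch above, all four totals can be checked in one loop:

```python
splits = {
    "Length": [(2, 0), (1, 3), (2, 2)],
    "Gills":  [(0, 4), (5, 1)],
    "Beak":   [(5, 3), (0, 2)],
    "Teeth":  [(3, 4), (2, 1)],
}
for name, children in splits.items():
    print(name, round(weighted_impurity(children, entropy), 2))
# Length 0.72, Gills 0.39, Beak 0.76, Teeth 0.97
```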
The least impurity is observed for Gills, so we select Gills to split on.
In a similar manner, we can evaluate the Gini Index criterion as well; the values we obtain are as follows:
Total Gini(Length) = (2/10)(0) + (4/10)[(2)(1/4)(3/4)] + (4/10)[(2)(2/4)(2/4)] = 0.35
Total Gini(Gills) = (4/10)(0) + (6/10)[(2)(5/6)(1/6)] = 0.17
Total Gini(Beak) = (8/10)[(2)(5/8)(3/8)] + (2/10)(0) = 0.38
Total Gini(Teeth) = (7/10)[(2)(3/7)(4/7)] + (3/10)[(2)(2/3)(1/3)] = 0.48
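The same helper verifies the Gini totals once we swap in the Gini function:

```python
def gini(p):
    return 2 * p * (1 - p)

for name, children in splits.items():
    print(name, round(weighted_impurity(children, gini), 2))
# Length 0.35, Gills 0.17, Beak 0.38, Teeth 0.48
```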
As seen, the Gini Index criterion also picks 'Gills' as the best feature to split on. So 'Gills' is the first split of the tree. To find the next split, tally the positives and negatives with respect to Gills = yes:
Length: {[4,5]} - {[0+, 2-], [0+, 2-]}
Beak: {[yes, no]} - {[0+, 2-], [0+, 2-]}
Teeth: {[few, many]} - {[0+, 0-], [0+, 4-]}
Now, as observed, every example with Gills = yes is negative, so this branch is already pure (impurity = 0): Gills = yes is itself a leaf of the tree, and no further rules need to be added under this decision rule.
So we now tally the positives and negatives with respect to Gills = no and calculate the impurities in the same manner. Please try this and come up with the solution yourself; the short script below grows the complete tree and prints it, so you can check your answer.
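Since this article only lists per-attribute counts, the sketch below first reconstructs one possible dataset that matches every tally given above (this reconstruction is an assumption; any dataset with the same counts yields the same tree), then grows the tree greedily by entropy:

```python
import math
from collections import Counter

# One possible dataset consistent with every tally in this article.
# Only per-attribute counts are given above, so this reconstruction is
# an assumption; any dataset matching those counts gives the same tree.
# Each row: (Length, Gills, Beak, Teeth, label)
data = [
    (3, "no",  "yes", "many", "+"), (4, "no",  "yes", "many", "+"),
    (3, "no",  "yes", "few",  "+"), (5, "no",  "yes", "many", "+"),
    (5, "no",  "yes", "few",  "+"), (5, "yes", "yes", "many", "-"),
    (4, "yes", "yes", "many", "-"), (5, "yes", "no",  "many", "-"),
    (4, "yes", "no",  "many", "-"), (4, "no",  "yes", "few",  "-"),
]
FEATURES = {"Length": 0, "Gills": 1, "Beak": 2, "Teeth": 3}

def node_entropy(rows):
    p = sum(1 for r in rows if r[-1] == "+") / len(rows)
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def weighted_entropy(rows, feat):
    # Weighted average of child entropies for a split on feat.
    i = FEATURES[feat]
    counts = Counter(r[i] for r in rows)
    return sum(c / len(rows) * node_entropy([r for r in rows if r[i] == v])
               for v, c in counts.items())

def grow(rows, feats, depth=0):
    labels = Counter(r[-1] for r in rows)
    if len(labels) == 1 or not feats:   # pure node (or no features left): leaf
        print("    " * depth + "-> leaf:", dict(labels))
        return
    best = min(feats, key=lambda f: weighted_entropy(rows, f))
    i = FEATURES[best]
    for v in sorted({r[i] for r in rows}, key=str):
        print("    " * depth + f"{best} = {v}")
        grow([r for r in rows if r[i] == v],
             [f for f in feats if f != best], depth + 1)

grow(data, list(FEATURES))
```

The printed tree has Gills = yes as a pure negative leaf, while the Gills = no branch splits further on Length, with Teeth resolving the mixed Length = 4 node.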
This is the internal working of growing a feature tree. In the next article, I will explain reduction of variance and how to find the best split in the case of regression trees.