Decision Tree

The decision tree is one of the most popular and widely used machine learning algorithms.

Decision trees are used for both classification and regression problems.

Why Decision trees?

There are plenty of other algorithms available, so why choose decision trees?

Well, there might be many reasons, but I believe a few stand out:

  1. Decision trees often mimic human-level thinking, so it is simple to understand the data and make good interpretations.
  2. Decision trees actually let you see the logic behind how the data is interpreted (unlike black-box algorithms such as SVM, NN, etc.).

For example: if we are classifying a bank loan application for a customer, the decision tree may look like this:


Here we can see the logic by which it makes the decision.

It’s simple and clear.

So what is the decision tree?


A decision tree is a tree where each node represents a feature (attribute), each link (branch) represents a decision (rule), and each leaf represents an outcome (a categorical or continuous value).

The whole idea is to create a tree like this for the entire dataset and arrive at a single outcome at every leaf (or minimize the error at every leaf).
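Since the tree is just nodes (features), branches (rules), and leaves (outcomes), a minimal sketch of that structure in Python could look like this (the class and field names are my own, not from any library):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    # An internal node splits on a feature; a leaf carries an outcome.
    feature: Optional[str] = None   # e.g. "outlook"; None for a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)  # branch value -> subtree
    outcome: Optional[str] = None   # class label at a leaf, e.g. "yes"
```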

There are a couple of algorithms available to build a decision tree (both metrics are sketched in code right after this list):

  1. CART (Classification and Regression Trees) → uses the Gini index (classification) as its metric.
  2. ID3 (Iterative Dichotomiser 3) → uses the entropy function and information gain as its metrics.
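To make the two metrics concrete, here is a small sketch (helper names are my own, assuming plain Python lists of class labels) of how the Gini index and entropy are computed:

```python
import math
from collections import Counter

def gini(labels):
    """Gini index used by CART: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy used by ID3: -sum of p * log2(p) over class proportions p."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

labels = ["yes"] * 9 + ["no"] * 5   # the class counts of the weather dataset used below
print(round(gini(labels), 3))       # 0.459
print(round(entropy(labels), 3))    # 0.940
```

Both measures are 0 for a pure node and grow as the class mix approaches 50/50; ID3, which we use below, works with entropy.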

Classification using the ID3 algorithm

Let’s take a famous dataset from the machine learning world: the weather dataset (playing a game, Y or N, based on the weather conditions).

We have four X values (outlook, temp, humidity, and windy), all categorical, and one y value (play: Y or N), which is also categorical.

So we need to learn the mapping between X and y (which is what machine learning always does).
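For reference, here is the classic 14-row weather (play-tennis) dataset as it is commonly published, transcribed into plain Python (treat the exact rows as an illustrative transcription, not as taken from this article):

```python
# Columns: outlook, temp, humidity, windy, play
data = [
    ("sunny",    "hot",  "high",   False, "no"),
    ("sunny",    "hot",  "high",   True,  "no"),
    ("overcast", "hot",  "high",   False, "yes"),
    ("rainy",    "mild", "high",   False, "yes"),
    ("rainy",    "cool", "normal", False, "yes"),
    ("rainy",    "cool", "normal", True,  "no"),
    ("overcast", "cool", "normal", True,  "yes"),
    ("sunny",    "mild", "high",   False, "no"),
    ("sunny",    "cool", "normal", False, "yes"),
    ("rainy",    "mild", "normal", False, "yes"),
    ("sunny",    "mild", "normal", True,  "yes"),
    ("overcast", "mild", "high",   True,  "yes"),
    ("overcast", "hot",  "normal", False, "yes"),
    ("rainy",    "mild", "high",   True,  "no"),
]
X = [row[:4] for row in data]   # the four categorical features
y = [row[4] for row in data]    # the target: play yes/no (9 yes, 5 no)
```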

This is a binary classification problem, so let’s build the tree using the ID3 algorithm.

To create a tree, we first need a root node, and we know that nodes are features/attributes (outlook, temp, humidity, and windy),

so which one do we pick first?

Determine the attribute that best classifies the training data and use this attribute at the root of the tree. Then repeat the process for each branch.

This means we are performing top-down, greedy search through the space of possible decision trees.

Okay, so how do we choose the best attribute?

In ID3, use the attribute with the highest information gain.

“In order to define information gain precisely, we begin by defining a measure commonly used in information theory, called entropy, that characterizes the (im)purity of an arbitrary collection of examples.”
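Concretely, for a set S with class proportions p_i, Entropy(S) = −Σ p_i · log₂(p_i), and the information gain of an attribute A is Gain(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v), where S_v is the subset of S taking value v for A. A small sketch (helper names are mine), applied to the outlook column of the weather data transcribed earlier:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Gain(S, A) = Entropy(S) - sum over values v of (|Sv|/|S|) * Entropy(Sv)."""
    subsets = defaultdict(list)
    for value, label in zip(feature_values, labels):
        subsets[value].append(label)
    n = len(labels)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]
print(round(information_gain(outlook, play), 3))  # ~0.247
```

With this transcription, outlook has the highest gain of the four attributes (humidity ≈ 0.152, windy ≈ 0.048, temp ≈ 0.029), so it becomes the root of the tree.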

Algorithm:

Generate_decision_tree. Generate a decision tree from the training tuples of data partition D.

Input:

  • Data partition, D, which is a set of training tuples and their associated class labels;
  • attribute_list, the set of candidate attributes;
  • Attribute_selection_method, a procedure to determine the splitting criterion that “best” partitions the data tuples into individual classes. This criterion consists of a splitting attribute and, possibly, either a split point or a splitting subset.

Output: A decision tree.

Method:

(1) create a node N;
(2) if tuples in D are all of the same class, C, then
(3)     return N as a leaf node labeled with the class C;
(4) if attribute_list is empty then
(5)     return N as a leaf node labeled with the majority class in D; // majority voting
(6) apply Attribute_selection_method(D, attribute_list) to find the “best” splitting criterion;
(7) label node N with the splitting criterion;
(8) if splitting_attribute is discrete-valued and multiway splits are allowed then // not restricted to binary trees
(9)     attribute_list ← attribute_list − splitting_attribute; // remove the splitting attribute
(10) for each outcome j of the splitting criterion // partition the tuples and grow subtrees for each partition
(11)     let Dj be the set of data tuples in D satisfying outcome j; // a partition
(12)     if Dj is empty then
(13)         attach a leaf labeled with the majority class in D to node N;
(14)     else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
endfor
(15) return N;
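Putting the pieces together, here is a minimal runnable sketch of the method above in Python. The structure and names are mine (dict-based nodes, multiway splits on categorical attributes, information gain as the Attribute_selection_method), not the textbook’s exact formulation:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # Attribute_selection_method: pick the attribute with the highest information gain.
    def gain(attr):
        subsets = defaultdict(list)
        for row, label in zip(rows, labels):
            subsets[row[attr]].append(label)
        remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
        return entropy(labels) - remainder
    return max(attributes, key=gain)

def generate_decision_tree(rows, labels, attributes):
    # Steps (2)-(3): all tuples in the same class -> leaf labeled with that class.
    if len(set(labels)) == 1:
        return labels[0]
    # Steps (4)-(5): no attributes left -> leaf labeled with the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps (6)-(7): choose and record the "best" splitting attribute.
    attr = best_attribute(rows, labels, attributes)
    node = {attr: {}}
    remaining = [a for a in attributes if a != attr]  # step (9)
    # Steps (10)-(14): one subtree per observed value of the splitting attribute
    # (iterating only over observed values means Dj is never empty here).
    for value in set(row[attr] for row in rows):
        sub_rows = [r for r in rows if r[attr] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[attr] == value]
        node[attr][value] = generate_decision_tree(sub_rows, sub_labels, remaining)
    return node

# Tiny demo (two rows, one attribute); branch order may vary:
tree = generate_decision_tree(
    [{"outlook": "sunny"}, {"outlook": "overcast"}],
    ["no", "yes"],
    ["outlook"],
)
print(tree)  # {'outlook': {'sunny': 'no', 'overcast': 'yes'}}
```

Run over the 14-row weather data (with each row as a dict of the four attributes), the root it selects is outlook, matching the hand calculation above.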
