Decision Tree

What is a Decision Tree?

A decision tree is a predictive model that uses a flowchart-like structure to make decisions based on input data. It divides data into branches and assigns outcomes to leaf nodes. Decision trees are used for classification and regression tasks, providing easy-to-understand models.

A decision tree is a hierarchical model used in decision support that depicts decisions and their potential outcomes, incorporating chance events, resource expenses, and utility. This algorithmic model uses conditional control statements and is a non-parametric, supervised learning method, useful for both classification and regression tasks. The tree structure is comprised of a root node, branches, internal nodes, and leaf nodes, forming a hierarchical, tree-like structure.

Decision Tree Terminology:

  • Root Node – the node at the very top of a decision tree; from this node the population starts dividing according to various features.
  • Decision Nodes – the nodes obtained after splitting the root node are called decision nodes.
  • Leaf Nodes – the nodes where further splitting is not possible; also called terminal nodes.
  • Sub-tree – just as a small portion of a graph is called a sub-graph, a sub-section of a decision tree is called a sub-tree.
  • Pruning – cutting down some nodes to prevent overfitting.

Example of a Decision Tree:

Suppose there is a candidate who has a job offer and wants to decide whether he should accept it or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by an attribute selection measure, or ASM). The root node splits into the next decision node (distance from the office) and one leaf node, based on the corresponding labels. That decision node splits further into one decision node (cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (offer accepted and offer declined).
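The decision path above can be sketched as nested conditionals. The thresholds here (salary cutoff, distance in km) are purely illustrative, not learned from data:

```python
def evaluate_offer(salary, distance_km, has_cab):
    """Hypothetical decision path for the job-offer example.
    All thresholds are illustrative, not derived from real data."""
    if salary < 50000:        # root node: salary attribute
        return "Declined"     # leaf node
    if distance_km > 30:      # decision node: distance from office
        if has_cab:           # decision node: cab facility
            return "Accepted"  # leaf node
        return "Declined"      # leaf node
    return "Accepted"          # leaf node

print(evaluate_offer(60000, 40, True))   # Accepted
print(evaluate_offer(60000, 40, False))  # Declined
```

A learned tree has exactly this shape; the difference is that an algorithm such as CART picks the split attributes and thresholds automatically using measures like those below.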

Some important terms related to decision trees:

Entropy

In machine learning, entropy is a measure of the randomness in the information being processed. The higher the entropy, the harder it is to draw any conclusions from that information.
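For a set of class labels, entropy is the sum of -p·log2(p) over the class proportions p. A minimal sketch using only the standard library:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in counts.values())

print(entropy(["yes", "yes", "no", "no"]))  # 1.0 -- maximally mixed
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 -- pure node
```

A 50/50 split gives the maximum entropy of 1 bit for two classes, while a pure node has entropy 0, so conclusions are trivial to draw.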


Information Gain

Information gain can be defined as the amount of information gained about a random variable or signal from observing another random variable. It can be considered as the difference between the entropy of the parent node and the weighted average entropy of the child nodes.
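That definition translates directly into code: compute the parent entropy, then subtract each child's entropy weighted by the fraction of samples it receives. A self-contained sketch:

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in counts.values())

def information_gain(parent, children):
    """Entropy of the parent node minus the weighted average
    entropy of the child nodes produced by a split."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "yes", "no", "no", "no"]
# A perfect split separates the classes completely, so the gain
# equals the full parent entropy:
print(information_gain(parent, [["yes"] * 3, ["no"] * 3]))  # 1.0
```

A useless split (children with the same class mix as the parent) yields a gain of 0, which is why the tree-building algorithm picks the attribute with the highest gain at each node.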

Gini Index

Gini index is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset.

The Gini index is lower bounded by 0, which occurs when the data set contains only one class.
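The Gini index is 1 minus the sum of the squared class proportions, which is exactly the mislabeling probability described above. A minimal sketch:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: probability that a randomly chosen element
    is mislabeled when labeled according to the class distribution."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

print(gini(["a", "a", "a", "a"]))  # 0.0 -- single class, the lower bound
print(gini(["a", "a", "b", "b"]))  # 0.5 -- evenly mixed two classes
```

Like entropy, it is minimized on pure nodes; CART-style trees split on whichever attribute most reduces the weighted Gini impurity.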

