A brief introduction to the Random Forest algorithm in Machine Learning

Random forest is a supervised learning algorithm, used most often for classification. It builds a forest out of a number of decision trees, each trained on a random sample of the data: rows are drawn at random from the training set, and a separate tree is grown on each sample. In a random forest classifier, adding more trees generally improves accuracy, although the gains level off once the forest is large enough.
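The idea of growing several trees on random samples and letting them vote can be sketched in plain Python. This is a simplified illustration, not a full random forest: each "tree" is a one-split stump on one randomly chosen feature, the samples are assumed to be drawn with replacement (bootstrap samples), and the function names and toy data are made up for the example.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]

def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, rng):
    """Fit a one-split 'tree' on a single randomly chosen feature."""
    feature = rng.randrange(len(rows[0][0]))
    values = sorted({x[feature] for x, _ in rows})
    best = None
    # Candidate thresholds are midpoints between neighbouring values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in rows if x[feature] <= threshold]
        right = [y for x, y in rows if x[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # Only one distinct value in this sample: always predict the majority.
        label = majority([y for _, y in rows])
        return (feature, float("inf"), label, label)
    return (feature, best[1], best[2], best[3])

def predict_stump(stump, x):
    feature, threshold, left_label, right_label = stump
    return left_label if x[feature] <= threshold else right_label

def fit_forest(rows, n_trees, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng), rng) for _ in range(n_trees)]

def predict_forest(forest, x):
    """Every stump votes; the most common label wins."""
    return majority([predict_stump(stump, x) for stump in forest])

# Toy data (invented): two well-separated groups labelled "a" and "b".
data = [((1.0, 1.2), "a"), ((1.5, 0.8), "a"), ((2.0, 1.0), "a"),
        ((6.0, 6.5), "b"), ((7.0, 5.5), "b"), ((6.5, 7.0), "b")]
forest = fit_forest(data, n_trees=25)
print(predict_forest(forest, (1.2, 1.0)))
print(predict_forest(forest, (6.8, 6.0)))
```

A real implementation grows full trees rather than stumps, but the two ingredients shown here, random samples and voting, are the same.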

The random forest algorithm belongs to the decision tree family. A decision tree represents a classification or regression model in the form of a tree: each internal node tests a feature of the input, each branch corresponds to a decision (an outcome of that test), and each leaf at the end of a branch holds the corresponding output value.
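To make the node, branch and leaf structure concrete, here is a minimal sketch of a hand-written decision tree stored as nested dictionaries. The features ("income", "credit_score"), the thresholds and the output labels are hypothetical and chosen only for illustration.

```python
# A hypothetical, hand-written tree. Internal nodes test one feature,
# the "left"/"right" branches are the two outcomes, leaves hold the output.
tree = {
    "feature": "income",
    "threshold": 50000,
    "left": {"leaf": "reject"},            # income <= 50000
    "right": {                             # income > 50000
        "feature": "credit_score",
        "threshold": 650,
        "left": {"leaf": "review"},        # credit_score <= 650
        "right": {"leaf": "approve"},      # credit_score > 650
    },
}

def predict(node, sample):
    """Walk from the root to a leaf, following one branch per node."""
    while "leaf" not in node:
        goes_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node["leaf"]

print(predict(tree, {"income": 80000, "credit_score": 700}))
```

A learned tree has the same shape; the difference is that the features and thresholds are chosen from the training data instead of being written by hand.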

Random forest features:

  • The random forest algorithm can be used for both classification and regression problems.
  • Some random forest implementations can handle missing values as well.
  • With more trees in the forest, the random forest classifier generally gives a more accurate result.
  • It can be used with both categorical and numerical features.
  • The algorithm is very stable: when new data is introduced, the model as a whole is not affected much, since the new data may affect only a few of the trees while the rest still vote as before.
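The claim that more trees give a more accurate result can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that every tree is correct with the same probability and that the trees make their mistakes independently. Real trees in a forest are correlated, so the actual gains are smaller; the function name is made up for this example.

```python
from math import comb

def majority_vote_accuracy(n_trees, p_correct):
    """Probability that more than half of n_trees independent voters are right.

    Assumes an odd n_trees (no ties) and independent, equally accurate trees.
    """
    needed = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct)**(n_trees - k)
        for k in range(needed, n_trees + 1)
    )

for n in (1, 3, 11, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions the accuracy of the vote rises steadily as trees are added, which is also why the improvement flattens out once most of the benefit has been collected.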

One important feature of random forest is that it is a reasonable fit for a wide range of ML problems.

Some examples of where the random forest algorithm is used:

  • Banking, Stock Market, E-commerce websites

There are a few disadvantages of Random forest as well:

  • It is slow when a large number of trees is generated.
  • It is complex, and the model is harder to understand than a single decision tree.

The Python library scikit-learn ("sklearn") provides classes and methods for applying random forest to input datasets.
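As a rough illustration of the sklearn route, the sketch below trains a `RandomForestClassifier` on a synthetic dataset. The dataset size, the number of trees and the `random_state` values are arbitrary choices for this example, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, generated only so that the example is self-contained.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# n_estimators is the number of trees in the forest.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # mean accuracy on held-out data
print(f"test accuracy: {accuracy:.3f}")
```

For regression problems, `RandomForestRegressor` from the same `sklearn.ensemble` module is used in the same way.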



