Difference between random forest and Gradient boosting Algo.

The evolution of machine learning from Random Forest to the Gradient Boosting method

Let’s talk about random forest first. Random forest is an ensemble learning method used mostly for classification and regression. It works in two parts: the first involves the bootstrapping technique for drawing training samples (the out-of-bag data is then available for testing), and the second uses decision trees for the actual prediction. Now, like every other predictive modelling technique, the goal is to minimize the generalization error. To do that, we balance bias against variance, a balance known as the bias-variance tradeoff.

What exactly is the bias-variance tradeoff?

1. Bias is the error from the wrongful assumptions we make while building the learning algorithm. It is the primary reason for underfitting the model. When the bias is high, the model misses much of the relationship between the regressors and the response variable, hence an underfit model.

2. Variance is the error due to sensitivity to fluctuations in the training set. When our model has high variance, it is going to overfit.


To achieve the lowest generalization error we need to find the best tradeoff between bias and variance.
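For squared-error loss this tradeoff can be written out explicitly. With f the true function, f̂ the fitted model, and σ² the irreducible noise, the standard decomposition of the expected prediction error at a point x is:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  \;+\; \underbrace{\operatorname{Var}\big[\hat{f}(x)\big]}_{\text{variance}}
  \;+\; \sigma^2
```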


If our decision tree is shallow then it has high bias and low variance, and if it is too deep then it has low bias but high variance.
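A quick way to see this is to vary the tree depth on a synthetic dataset. This is only an illustrative sketch using scikit-learn; the depths, noise level, and data are arbitrary assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (illustrative; any regression dataset would do).
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, 3, 20):  # shallow -> deep
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    # Train R^2 keeps rising with depth (bias falls); test R^2 eventually
    # drops again as the deep tree starts fitting noise (variance rises).
    print(depth, tree.score(X_train, y_train), tree.score(X_test, y_test))
```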

The idea of bagging in a random forest is very important. Bagging means bootstrap aggregation. Bootstrapping means generating random samples from the dataset with replacement. The main purpose of bagging is to reduce the variance of the model class. So with bagging we basically go for deep trees: individually they have low bias but high variance, and averaging many of them is what brings the variance down. The bias, however, is left as it is, so bagging alone does not take care of the bias.
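Here is a minimal from-scratch sketch of the bagging idea, assuming scikit-learn decision trees as the base learners; the function names and n_trees are illustrative choices, not any fixed API:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_trees(X, y, n_trees=100, seed=0):
    """Train n_trees deep trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.RandomState(seed)
    n = len(X)  # X, y are numpy arrays
    trees = []
    for _ in range(n_trees):
        idx = rng.randint(0, n, size=n)   # sample row indices WITH replacement
        tree = DecisionTreeRegressor()    # unpruned/deep tree: low bias, high variance
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def bagged_predict(trees, X):
    # Averaging the individually high-variance trees reduces the variance
    # of the ensemble; it does not reduce their (already low) bias.
    return np.mean([tree.predict(X) for tree in trees], axis=0)
```

A full random forest additionally samples a random subset of the features at each split; scikit-learn’s RandomForestRegressor and RandomForestClassifier handle both parts for you.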

Now, to take care of minimizing the bias, we incorporate the idea of boosting.

What is boosting?

1. Boosting takes care of minimizing the bias: each new model concentrates on the errors of the previous ones. By itself, though, it does not nullify the risk of overfitting.

It helps find a predictor that is a weighted combination of all the models used. It manipulates the training set so that later models work on the areas where we find high errors, adding new trees on top of the existing ones to push the accuracy up. The steps are as follows (a code sketch is given after the list):

1. Define the overall predictor as a weighted sum of the models, and initialize it (for example with a constant such as the mean of the response).

2. Train the first model on the training set against the original response variable.

3. For gradient boosting, redefine the supervised response variable as some kind of residual between the ground truth and the overall prediction (in general, the negative gradient of the loss).

4. Train a new model on the training set with this updated response variable and add it to the overall predictor.

5. Repeat steps 3 and 4 above.

Basically, for classification, boosting uses the simple technique of a weighted majority vote across all the models’ classifications.
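To make the steps concrete, here is a minimal from-scratch sketch of gradient boosting for regression with squared-error loss, assuming scikit-learn trees as the weak learners; the names (gradient_boost, n_rounds) and hyperparameter values are illustrative, not any fixed API. For squared error, the negative gradient in step 3 is exactly the residual y − F(x):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    pred = np.full(len(y), y.mean())       # step 1: initialize the overall predictor
    trees = []
    for _ in range(n_rounds):
        residual = y - pred                # step 3: residual = negative gradient of L2 loss
        tree = DecisionTreeRegressor(max_depth=max_depth)  # shallow "weak" learner
        tree.fit(X, residual)              # step 4: fit the next tree to the residuals
        pred += learning_rate * tree.predict(X)
        trees.append(tree)                 # step 5: repeat
    return y.mean(), trees

def gb_predict(base, trees, X, learning_rate=0.1):
    # The final predictor is the base value plus a weighted sum of all trees.
    return base + learning_rate * sum(tree.predict(X) for tree in trees)
```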

Now let’s come to the differences between gradient boosting and random forest.

1. Gradient boosting fits regression trees (to the residuals, even in classification problems), whereas a random forest grows ordinary decision trees on bootstrap samples.

2. The boosting strategy for training takes care of minimizing the bias, which the random forest lacks.

3. A random forest is easy to parallelize because its trees are built independently, but boosted trees are hard to parallelize because each tree depends on the ones before it.

4. Random forests overfit individual bootstrap samples of the training data and then reduce the overfitting by simply averaging the predictors, whereas a GBM repeatedly trains trees on the residuals of the previous predictors.
5. In practice a random forest is easy to use: we can apply RF almost blindly and get decent performance with little chance of overfitting, whereas a GBM is close to useless without cross-validation and needs much more care to set up. Since in a GBM we can tune hyperparameters such as the number of trees, the depth, and the learning rate (see the sketch after this list), its predictions and performance can be better than the random forest’s.
6. One last advantage of GBM is about modelling: because boosted trees are derived by optimizing an objective function, a GBM can be used for almost any objective for which we can write out a gradient, including things like ranking and Poisson regression, which are harder to achieve with RF.
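As a hedged sketch of the tuning and parallelization points above, using scikit-learn with synthetic data; the hyperparameter values are purely illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)

# RF: trees are independent, so training parallelizes (n_jobs) and the
# defaults usually give decent performance out of the box.
rf = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0)

# GBM: trees are built sequentially; the number of trees, the depth and
# the learning rate interact and generally need tuning plus cross-validation.
gbm = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                learning_rate=0.05, random_state=0)

print(cross_val_score(rf, X, y, cv=5).mean())
print(cross_val_score(gbm, X, y, cv=5).mean())
```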

If I missed anything, please provide feedback. This is my understanding; please correct me if I am wrong.

I'm not sure I agree with the statement (or maybe I misunderstand how you've phrased it): "Boosting itself nullifies the overfitting issue and it takes care of the minimizing the bias." My understanding is that boosting (without e.g., regularization) can easily lead to overfitting of the training data, especially when large ensembles of trees are used: you keep refitting trees to the residuals in the training data until they're practically zero, but then the ensemble doesn't generalize well to new data. Regularization, data subsampling, and hyperparameter tuning (e.g., maximum tree depth), as well as stopping rules for model training, can reduce overfitting. But boosting itself does not nullify the problem of overfitting; boosted trees are more prone to overfitting than random forests (in my understanding).

"If our decision tree is shallow then we have high bias and low variance and if our decision tree is too deep then it has low bias but high variance." Here you are saying deep trees then it will have high variance. But in below statement, you are saying the exact opposite. "So it’s obvious that if we are using bagging then we are basically going for deep trees as they have the low variance" I think deep trees means you are making more strict rules and that's why it will have High Variance.
