XGBoost Vs LightGBM

XGBoost Algorithm:

XGBoost (Extreme Gradient Boosting) is a very popular and in-demand algorithm, often cited as the winning algorithm in competitions across many platforms. It is an improved version of the Gradient Boosting algorithm, with the Gradient Boosted Decision Tree as its base. Its strong predictive power and easy-to-implement approach have made it a fixture in many machine learning notebooks. Some key points of the algorithm are as follows:

  1. It does not build the full tree structure exhaustively; it grows each tree greedily.
  2. Compared to LightGBM, it splits trees level-wise rather than leaf-wise.
  3. Plain Gradient Boosting uses only the negative gradient to optimize the loss function; XGBoost instead uses a second-order Taylor expansion of the loss (gradient and hessian).
  4. A regularization term penalizes overly complex tree models.
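Point 3 can be made concrete. For a binary-logistic objective, the second-order Taylor expansion means the booster works with the first derivative (gradient) and second derivative (hessian) of the loss with respect to the current raw prediction. A minimal plain-Python sketch of those derivatives (an illustration of the math, not XGBoost's actual internals):

```python
import math

def logistic_loss_derivatives(raw_score, label):
    """Gradient and hessian of the logistic loss w.r.t. the raw score,
    the quantities used in a second-order Taylor approximation."""
    p = 1.0 / (1.0 + math.exp(-raw_score))  # sigmoid of the raw margin
    grad = p - label          # first derivative of the loss
    hess = p * (1.0 - p)      # second derivative of the loss
    return grad, hess

# Example: raw score 0.0 (probability 0.5) against a positive label
g, h = logistic_loss_derivatives(0.0, 1)
print(g, h)  # -0.5 0.25
```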

Some parameters which can be tuned to increase the performance are as follows:

General Parameters include the following:

  1. booster: Has two options, gbtree (tree-based models) and gblinear (linear models).
  2. silent: If set to 1, no running messages are shown while the code executes.
  3. nthread: Used for parallel processing; the number of CPU cores is specified here.

Booster Parameters include the following:

  1. eta: The learning rate; makes the model more robust by shrinking the tree weights at each step.
  2. max_depth: Should be set carefully to avoid overfitting.
  3. max_leaf_nodes: If this parameter is defined, the model ignores max_depth.
  4. gamma: Specifies the minimum loss reduction required to make a split.
  5. lambda: The L2 regularization term on the weights.
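These booster parameters are typically collected into a parameter dictionary. The sketch below shows one hypothetical combination; the parameter names follow XGBoost's documentation, but the values are illustrative only, not recommendations:

```python
# Illustrative XGBoost parameter dictionary -- example values, not tuned recommendations
params = {
    "booster": "gbtree",
    "eta": 0.1,          # shrinks the weight of each new tree (learning rate)
    "max_depth": 6,      # cap tree depth to limit overfitting
    "gamma": 0.5,        # minimum loss reduction required to split
    "lambda": 1.0,       # L2 regularization on leaf weights
    "objective": "binary:logistic",
    "nthread": 4,        # number of cores for parallel processing
}
# A dictionary like this would typically be passed as:
#   xgb.train(params, dtrain, num_boost_round=100)
```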

Learning Task Parameters include the following:

1) objective: Defines the loss function to be used.

  • binary:logistic – logistic regression for binary classification; returns the predicted probability (not the class)
  • multi:softmax – multiclass classification using the softmax objective; returns the predicted class (not the probabilities)
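The difference in return values can be sketched without the library itself: binary:logistic applies a sigmoid to the raw score and returns a probability, while multi:softmax applies softmax over the class scores and returns the argmax class index. A plain-Python illustration (not XGBoost's code):

```python
import math

def binary_logistic(raw_score):
    """Return a predicted probability, as binary:logistic does."""
    return 1.0 / (1.0 + math.exp(-raw_score))

def multi_softmax(raw_scores):
    """Return the predicted class index, as multi:softmax does."""
    exps = [math.exp(s) for s in raw_scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs))  # class with the highest probability

print(binary_logistic(2.0))            # ~0.88: a probability, not a class
print(multi_softmax([0.1, 2.3, 0.4]))  # 1: a class index, not probabilities
```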

Light Gradient Boosting Machine:

LightGBM is a fast, distributed, high-performance gradient boosting framework based on a popular machine learning algorithm, the Decision Tree. It can be used for classification, regression, and many other machine learning tasks. The algorithm grows trees leaf-wise, choosing the leaf with the maximum delta loss to grow. LightGBM uses histogram-based algorithms, whose advantages are as follows:

  • Less memory usage
  • Reduced communication cost for parallel learning
  • Reduced cost of calculating the gain for each split in the decision tree
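The histogram idea behind these savings can be sketched in plain Python: continuous feature values are bucketed into a fixed number of bins, so only one small integer per value is stored and the tree evaluates one split candidate per bin edge instead of one per unique value. This is a simplified sketch of the concept, not LightGBM's actual implementation:

```python
def to_histogram_bins(values, max_bin):
    """Map continuous values to integer bin indices over [min, max].
    Simplified sketch of histogram-based split finding, not LightGBM's code."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / max_bin or 1.0  # guard against all-equal values
    # Each value becomes a small integer, capped at max_bin - 1
    return [min(int((v - lo) / width), max_bin - 1) for v in values]

feature = [0.3, 1.7, 2.2, 9.8, 5.5, 4.1]
bins = to_histogram_bins(feature, max_bin=4)
print(bins)  # [0, 0, 0, 3, 2, 1] -- only 4 candidate splits to scan
```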

Although LightGBM trains much faster, it can sometimes overfit. So, let us see which parameters can be tuned to obtain a better model.

To get the best fit, the following parameters must be tuned:

  1. num_leaves: Since LightGBM grows leaf-wise, this value should be less than 2^(max_depth) to avoid overfitting.
  2. min_data_in_leaf: For large datasets, set its value in the hundreds to thousands.
  3. max_depth: A key parameter whose value should be set carefully to avoid overfitting.
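The num_leaves rule in point 1 is easy to encode as a quick sanity check when tuning (a hypothetical helper for illustration, not part of LightGBM's API):

```python
def num_leaves_is_safe(max_depth, num_leaves):
    """Check the heuristic that num_leaves should stay below 2**max_depth
    for a leaf-wise grower. Hypothetical helper, not LightGBM API."""
    return num_leaves < 2 ** max_depth

print(num_leaves_is_safe(7, 70))   # True: 70 < 128
print(num_leaves_is_safe(7, 200))  # False: 200 >= 128, risks overfitting
```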

For achieving better accuracy, the following parameters must be tuned:

  1. Adding more training data to the model can increase accuracy (this can also be external, unseen data).
  2. num_leaves: Increasing its value can increase accuracy, since splitting happens leaf-wise, but overfitting may also occur.
  3. max_bin: A high value can improve accuracy but may eventually lead to overfitting.

