Bagging Vs Boosting



Bagging

  • Bagging is used when the objective is to reduce the variance of a decision tree.
  • Random Forest is an extension of bagging. In addition to bootstrapping the observations, it also selects a random subset of features rather than using all features to grow each tree.
  • The following steps are taken to implement a Random Forest; the idea is to create several subsets of data from the training sample, each chosen randomly with replacement.
    • Consider X observations and Y features in the training data set. First, a sample is drawn from the training data set at random with replacement.
    • A tree is grown to its maximum depth.
    • These steps are repeated, and the final prediction is based on the aggregation of the predictions from the n trees (see the sketch below).
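
A minimal sketch of these steps with scikit-learn's RandomForestClassifier. The dataset, train/test split, and hyper-parameter values below are illustrative assumptions, not part of the original text:

```python
# Illustrative sketch: Random Forest = bagging of decision trees + random feature selection.
# Dataset and hyper-parameter values are assumptions for demonstration.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)            # X observations, Y features
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(
    n_estimators=100,      # n trees, each fit on a bootstrap sample (drawn with replacement)
    max_features="sqrt",   # random subset of features considered at each split
    max_depth=None,        # each tree is grown to its maximum depth
    random_state=42,
)
forest.fit(X_train, y_train)

# The final prediction aggregates the predictions of all n trees (majority vote here).
print("Accuracy:", accuracy_score(y_test, forest.predict(X_test)))
```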

Advantages of using Random Forest

  • It handles higher-dimensional data sets well.
  • It handles missing values.

Disadvantages of using Random Forest

  • Because the final prediction is the mean of the predictions from the subset trees, it won't give precise continuous values for a regression model.

Boosting

  • Boosting is another ensemble procedure used to build a collection of predictors: models are fitted sequentially, with each new model concentrating on the examples that the previous ones handled poorly.
  • Gradient Boosting is an extension of the boosting procedure: Gradient Boosting = Gradient Descent + Boosting.
  • It uses a gradient descent algorithm that can optimize any differentiable loss function (see the sketch after this list).
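
A minimal sketch with scikit-learn's GradientBoostingRegressor. The dataset, loss choice, and hyper-parameter values are illustrative assumptions; the loss name "squared_error" assumes scikit-learn 1.0 or newer (older releases used "ls"):

```python
# Illustrative sketch: gradient boosting fits trees sequentially, each one on the
# negative gradient (pseudo-residuals) of a differentiable loss function.
# Dataset and hyper-parameter values are assumptions for demonstration.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbr = GradientBoostingRegressor(
    loss="squared_error",   # any supported differentiable loss, e.g. "absolute_error", "huber"
    learning_rate=0.05,     # step size of the gradient descent in function space
    n_estimators=300,       # number of boosting stages
    max_depth=3,
    random_state=0,
)
gbr.fit(X_train, y_train)
print("Test MSE:", mean_squared_error(y_test, gbr.predict(X_test)))
```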

Advantages of using Gradient Boosting

  • Works well with feature interactions.
  • Supports different loss functions

Disadvantages of using Gradient Boosting

  • Requires careful tuning of several hyper-parameters.

Difference between Bagging and Boosting

| Bagging | Boosting |
| --- | --- |
| Various training data subsets are randomly drawn with replacement from the whole training dataset. | Each new subset contains the examples that were misclassified by previous models. |
| Attempts to tackle the over-fitting issue. | Tries to reduce bias. |
| If the classifier is unstable (high variance), apply bagging. | If the classifier is stable and simple (high bias), apply boosting. |
| Every model receives an equal weight. | Models are weighted by their performance. |
| The objective is to decrease variance, not bias. | The objective is to decrease bias, not variance. |
| It is the simplest way of combining predictions of the same type. | It is a way of combining predictions that belong to different types. |
| Every model is built independently. | New models are influenced by the performance of previously built models. |
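
The contrast in the table can be seen directly in code: below is a minimal, assumed sketch fitting a bagging ensemble (independent deep trees, equal weight) and a boosting ensemble (sequential shallow trees, weighted by performance) on the same data. The dataset and hyper-parameters are illustrative, and the `estimator` parameter name assumes scikit-learn 1.2 or newer (older releases used `base_estimator`):

```python
# Illustrative comparison: bagging trains trees independently on bootstrap samples,
# boosting trains them sequentially, re-weighting misclassified examples.
# Dataset and hyper-parameters are assumptions for demonstration.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Bagging: deep trees (low bias, high variance); averaging reduces the variance.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=None),
    n_estimators=100, random_state=0,
)

# Boosting: shallow "weak" trees (high bias); sequential fitting reduces the bias.
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100, random_state=0,
)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```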

