Bagging and Boosting

3 min read · Jul 23, 2021

Table of Contents

Intuition
What is Ensemble Learning?
What is Bagging?
What is Boosting?
How Boosting works?
Types of Boosting
AdaBoost
Gradient Boosting
XGBoost

Intuition

Let’s think of a scenario where a machine has to recognize whether an animal is a cat or a dog. It could rely on several different rules:

If it has pointed ears then it is a cat.
If it has cat-shaped eyes then it is a cat.
If it has bigger limbs then it is a dog.
If it has sharpened claws then it is a cat.
If it has a wider mouth structure then it is a dog.

Each of these rules is a weak learner: applied on its own, no single rule is reliable. We have to apply all the rules and combine their outputs to predict the outcome. This is where Bagging and Boosting come in.

To combine the rules, we can use either a majority vote or a weighted average.
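To make the two options concrete, here is a minimal sketch that turns the five rules above into votes and combines them both ways. The feature names, the fallback answer when a rule does not fire, and the weights are all hypothetical, chosen only for illustration.

```python
from collections import Counter

def rule_votes(animal):
    # One vote per rule from the list above. Assumption: when a rule
    # does not fire, it votes for the other class.
    return [
        "cat" if animal["pointed_ears"] else "dog",
        "cat" if animal["cat_shaped_eyes"] else "dog",
        "dog" if animal["bigger_limbs"] else "cat",
        "cat" if animal["sharp_claws"] else "dog",
        "dog" if animal["wide_mouth"] else "cat",
    ]

def majority_vote(votes):
    # Every rule counts equally; the most common answer wins.
    return Counter(votes).most_common(1)[0][0]

def weighted_vote(votes, weights):
    # Each rule contributes its weight to the class it voted for.
    totals = {}
    for vote, weight in zip(votes, weights):
        totals[vote] = totals.get(vote, 0.0) + weight
    return max(totals, key=totals.get)

# Hypothetical animal and hypothetical rule weights.
animal = {
    "pointed_ears": True,
    "cat_shaped_eyes": True,
    "bigger_limbs": True,
    "sharp_claws": False,
    "wide_mouth": True,
}
votes = rule_votes(animal)             # ['cat', 'cat', 'dog', 'dog', 'dog']
weights = [0.4, 0.3, 0.1, 0.1, 0.1]    # the first two rules are trusted more

print(majority_vote(votes))            # dog (3 votes against 2)
print(weighted_vote(votes, weights))   # cat (weight 0.7 against 0.3)
```

The same votes give different answers depending on how they are combined, which is roughly the difference between treating all learners equally and trusting some more than others.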

What is Ensemble Learning?

It is a method used to enhance the performance of an ML model by combining several learners. Compared to a single model, this type of learning builds models with improved efficiency and accuracy.

What is Bagging?

Weak learners are produced in parallel during the training phase. Performance increases because many weak learners are trained independently, each on a bootstrapped dataset (a sample drawn with replacement from the original data).

Example: Random Forest

In short, bagging draws several bootstrapped datasets from the original dataset, runs a weak learner on each of them in parallel, and combines their predictions.

Boosting does this sequentially instead, updating the sample weights after each learner depending on which samples were misclassified.
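The bagging procedure described above can be sketched in a few lines of plain Python. This is a toy illustration, not how Random Forest is implemented: the 1-D dataset, the stump learner, and the helper names (`fit_stump`, `bagging_fit`, `bagging_predict`) are assumptions made for the example, and the learners are trained in a simple loop even though they are independent and could run in parallel.

```python
import random
from collections import Counter

def fit_stump(points):
    # Weak learner: pick the threshold on x with the fewest mistakes,
    # predicting class 1 when x > threshold and class 0 otherwise.
    best_threshold, best_errors = None, len(points) + 1
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(y) for x, y in points)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagging_fit(points, n_learners, rng):
    stumps = []
    for _ in range(n_learners):
        # Bootstrapped dataset: same size as the original, drawn with replacement.
        sample = [rng.choice(points) for _ in points]
        stumps.append(fit_stump(sample))
    return stumps

def bagging_predict(stumps, x):
    # Majority vote over all weak learners.
    votes = [int(x > threshold) for threshold in stumps]
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(0)
data = [(x, int(x > 5)) for x in range(11)]   # toy data: label 1 when x > 5
stumps = bagging_fit(data, n_learners=25, rng=rng)

print(bagging_predict(stumps, 2), bagging_predict(stumps, 8))
```

Using an odd number of learners avoids ties in the vote for a two-class problem.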

What is Boosting?

It is a process that combines a set of weak learners into a strong learner in order to increase the model's accuracy.

How Boosting works?

Generate multiple weak learners and combine their predictions to form one strong rule. These weak learners are generated by applying a base machine learning algorithm to different distributions of the dataset. The base learning algorithm is usually a decision tree.

The base learner generates a weak rule in each iteration. After multiple iterations, the weak rules are combined into a strong learner.

Note: A decision stump is a single-level decision tree that tries to classify the data points.

Steps to perform boosting:

The base algorithm reads the data and assigns an equal weight to each sample observation.
Observations that the base learner predicts incorrectly are given a higher weight and passed on to the next base learner.
Repeat the previous step until the algorithm classifies the data correctly or a preset number of learners is reached.
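As a rough illustration of the reweighting step, the sketch below runs a single round on five samples. The labels and the first learner's predictions are made up, and the exponential update rule is the AdaBoost-style one, which is an assumption here since the steps above do not fix a particular formula.

```python
import math

# Hypothetical labels and first-learner predictions (+1 / -1).
# The learner gets the sample at index 1 wrong.
labels      = [1,  1, -1, -1, 1]
predictions = [1, -1, -1, -1, 1]

# Start: every sample observation gets the same weight.
weights = [1 / len(labels)] * len(labels)

# Weighted error = total weight of the misclassified samples.
error = sum(w for w, y, p in zip(weights, labels, predictions) if y != p)

# Assumed AdaBoost-style update: alpha measures how much to trust the learner;
# wrong samples (y * p == -1) are scaled up, correct ones scaled down.
alpha = 0.5 * math.log((1 - error) / error)
weights = [w * math.exp(-alpha * y * p)
           for w, y, p in zip(weights, labels, predictions)]

# Renormalize so the weights sum to 1 again.
total = sum(weights)
weights = [w / total for w in weights]

print([round(w, 3) for w in weights])   # [0.125, 0.5, 0.125, 0.125, 0.125]
```

After one round the misclassified sample carries half of the total weight, so the next base learner is pushed to get it right.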

Types of Boosting

AdaBoost (Adaptive Boosting)

It combines several weak learners into a single strong learner. It can be used for both regression and classification problems, though it is more commonly used for classification.

Steps:

Assign equal weight to all the data points for the first decision stump.
Assign more weight to the observations that were misclassified.
Build a new decision stump that treats the observations with higher weight as more significant.
Again, if any observations are misclassified, give them more weight.
Continue the process until all the observations fall into the right class, or until a preset number of stumps is reached.
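The steps above can be sketched as a small AdaBoost loop over decision stumps. This is a minimal, unoptimized sketch on a made-up 1-D dataset where no single stump can classify everything correctly; the helper names and the choice of three rounds are assumptions for the example.

```python
import math

def stump_predict(x, threshold, polarity):
    # Decision stump: one split on x, labels are +1 / -1.
    return polarity if x < threshold else -polarity

def fit_stump(xs, ys, weights):
    # Try every midpoint threshold and both polarities,
    # keep the stump with the lowest weighted error.
    sorted_xs = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(sorted_xs, sorted_xs[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost_fit(xs, ys, n_rounds):
    weights = [1.0 / len(xs)] * len(xs)      # equal weights for the first stump
    ensemble = []
    for _ in range(n_rounds):
        error, threshold, polarity = fit_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified observations get more weight, correct ones less.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    # Weighted vote of all stumps.
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]      # toy labels
ensemble = adaboost_fit(xs, ys, n_rounds=3)
accuracy = sum(adaboost_predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)
print(accuracy)
```

On this dataset the best single stump still misclassifies three of the ten points, while the weighted vote of three stumps classifies all of the training points correctly.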

Gradient Boosting

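Base learners are generated sequentially so that each new base learner improves on the combined result of the previous ones.

Unlike AdaBoost, it does not reweight misclassified samples. Instead, each new learner is fit to the errors (residuals) left by the learners before it.

In this way, every step optimizes the loss function of the model built so far.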

It has three main components:

Loss function: needs to be optimized.
Weak learner: used for computing predictions and forming the strong learner.
Additive model: adds weak learners one at a time so that each addition reduces the loss.
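Here is a minimal sketch of how the three components fit together for a regression problem. It assumes squared-error loss (so the quantity each weak learner fits is simply the residual), uses a one-split regression stump as the weak learner, and adds a `learning_rate` parameter; the dataset and all helper names are hypothetical.

```python
def fit_regression_stump(xs, residuals):
    # Weak learner: one split on x, predicting the mean residual on each side.
    sorted_xs = sorted(set(xs))
    best = None
    for a, b in zip(sorted_xs, sorted_xs[1:]):
        threshold = (a + b) / 2
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def stump_value(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean

def gradient_boost_fit(xs, ys, n_rounds, learning_rate):
    base = sum(ys) / len(ys)                 # start from a constant prediction
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Loss function: squared error. Its negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        # Additive model: add the new weak learner to the running prediction.
        predictions = [p + learning_rate * stump_value(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps

def gradient_boost_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, x) for s in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

xs = list(range(10))
ys = [x * x for x in xs]                     # toy target: y = x^2
model = gradient_boost_fit(xs, ys, n_rounds=50, learning_rate=0.3)

mse_before = mse(ys, [model[0]] * len(xs))   # constant prediction only
mse_after = mse(ys, [gradient_boost_predict(model, x) for x in xs])
print(mse_before, mse_after)
```

A smaller `learning_rate` makes each step more cautious, which usually means more rounds are needed to reach the same training error.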

XGBoost (Extreme Gradient Boosting)

Its main aim is to increase computational speed and model efficiency.

It supports parallelization within the construction of each decision tree (for example, when searching for the best split), although the trees themselves are still added sequentially.

Supports distributed computing for training large or complex models.

Uses out-of-core computing to handle huge and varied datasets that do not fit in memory.

Implements cache optimizations to make the best use of your hardware resources.