Random Forest

Dec 14, 2021

1. Intuition

▹Create a bootstrapped dataset:

✓ Randomly select samples from the original dataset until the bootstrapped dataset is the same size as the original.

✓ The same sample can be picked more than once (sampling with replacement).
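A minimal Python sketch of the bootstrapping step, assuming the dataset is addressed by row index. The function name `bootstrap_sample` is illustrative and does not come from any library. It draws as many indices as there are rows, with replacement, and also reports the rows that were never drawn.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Return (in_bag, out_of_bag) row indices for one bootstrapped dataset."""
    # Same size as the original dataset, drawn with replacement,
    # so some rows appear more than once and others not at all.
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    picked = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in picked]
    return in_bag, out_of_bag


rng = random.Random(42)
in_bag, out_of_bag = bootstrap_sample(10, rng)
print("bootstrapped rows:", in_bag)
print("rows never picked:", out_of_bag)
```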

▹Create a decision tree using the bootstrapped dataset, but only consider a random subset of variables (columns) at each step.

✓ E.g., consider only 2 randomly chosen variables at each step.

✓ Build the decision tree this way, picking a fresh random subset of variables at every step.
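The random-subset idea can be sketched for a single step of tree building. This sketch assumes numeric columns and Gini impurity as the split criterion, and the names `best_split` and `n_features_per_split` are hypothetical. A full tree would call something like this recursively on each side of the chosen split.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, n_features_per_split, rng):
    """Search for the best (feature, threshold) split, but only among a random
    subset of n_features_per_split columns, as a random forest does at each step."""
    n_features = len(X[0])
    # Draw a fresh random subset of columns for this step.
    candidates = rng.sample(range(n_features), n_features_per_split)
    best = None  # (weighted_gini, feature, threshold)
    for f in candidates:
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return candidates, best


# Toy data: 4 rows, 4 numeric columns; only column 2 separates the classes perfectly.
X = [
    [1, 5, 0.1, 7],
    [3, 3, 0.2, 1],
    [2, 4, 0.9, 2],
    [4, 1, 0.8, 8],
]
y = ["no", "no", "yes", "yes"]

candidates, best = best_split(X, y, n_features_per_split=2, rng=random.Random(0))
print("columns considered:", candidates)
print("best split (gini, column, threshold):", best)
```

With `n_features_per_split=4` every column is considered, and the perfectly separating column 2 is found. With 2, the step only sees the columns it drew. That restriction is what makes the trees in the forest differ from each other.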

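Repeat the two steps above many times to build a wide variety of trees (the forest). To classify a new sample, run it down every tree and see which option got the most votes. That will be the answer. Bootstrapping the data plus using the aggregate to make a decision is called Bagging (short for bootstrap aggregating).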
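Once the forest is built, the aggregation part of bagging is just a vote. In this sketch each tree is assumed to be any callable that maps a sample to a class label. The three lambda "trees" and the `chest_pain`/`weight` fields are hypothetical stand-ins for real trained decision trees.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Aggregate step of bagging: every tree votes, the most common class wins.
    On a tie, Counter.most_common returns the class that was voted for first."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for three trained decision trees.
trees = [
    lambda s: "yes" if s["chest_pain"] else "no",
    lambda s: "yes" if s["weight"] > 170 else "no",
    lambda s: "no",
]

print(forest_predict(trees, {"chest_pain": True, "weight": 180}))  # → yes
```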

2. How to check the accuracy:

Typically, about one third of the original samples do not end up in a given bootstrapped dataset (the expected fraction tends to 1/e, about 37%, as the dataset grows). These left-out samples are called the Out-of-Bag dataset.
Run each Out-of-Bag sample through all the trees that were built without it, and let those trees vote to see whether the sample is classified correctly.
Repeat this for every Out-of-Bag sample and count how many were correctly labeled.
Ultimately, we can measure how accurate our random forest is by the proportion of Out-of-Bag samples that were correctly classified by the random forest. The proportion of Out-of-Bag samples that were incorrectly classified is the Out-of-Bag Error.
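The Out-of-Bag Error can be computed without a separate test set. This sketch assumes each tree is a callable and that we kept the row indices each tree was trained on (`in_bag_indices`, a hypothetical name). Samples that landed in every bootstrapped dataset get no Out-of-Bag vote, so they are skipped.

```python
from collections import Counter


def oob_error(X, y, trees, in_bag_indices):
    """Fraction of Out-of-Bag samples the forest misclassifies.

    Each sample is voted on only by the trees whose bootstrapped dataset
    did not contain it. Samples that were in every bootstrapped dataset
    have no Out-of-Bag vote and are skipped.
    """
    in_bag_sets = [set(rows) for rows in in_bag_indices]
    wrong = scored = 0
    for i, (sample, label) in enumerate(zip(X, y)):
        votes = Counter(
            tree(sample)
            for tree, in_bag in zip(trees, in_bag_sets)
            if i not in in_bag
        )
        if not votes:
            continue
        scored += 1
        if votes.most_common(1)[0][0] != label:
            wrong += 1
    return wrong / scored if scored else float("nan")


# Toy example with two hypothetical "trees" and the rows each was trained on.
X = [[0], [1], [2], [3]]
y = ["a", "a", "b", "b"]
trees = [
    lambda s: "a" if s[0] < 2 else "b",  # trained on rows 0, 0, 1, 2
    lambda s: "b",                       # trained on rows 1, 2, 3, 3
]
in_bag_indices = [[0, 0, 1, 2], [1, 2, 3, 3]]

print(oob_error(X, y, trees, in_bag_indices))  # → 0.5
```

Row 0 is judged only by the second tree, which gets it wrong. Row 3 is judged only by the first tree, which gets it right. Rows 1 and 2 were in both bootstrapped datasets, so the Out-of-Bag Error here is 1/2.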

3. Change the number of variables used per step, then repeat all the steps (build the forest and measure its Out-of-Bag Error) a bunch of times and choose the setting that is most accurate.

Typically, we start by using the square root of the number of variables and then try a few settings above and below that value.
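The tuning loop can be sketched as a small search. Here `oob_error_for` is a hypothetical callable standing in for "build a forest with k variables per step and return its Out-of-Bag Error". The `spread` of 2 values on each side is also an arbitrary choice. In scikit-learn, the corresponding setting on `RandomForestClassifier` is the `max_features` parameter.

```python
import math


def candidate_feature_counts(n_features, spread=2):
    """Start near sqrt(n_features) and add a few values above and below it."""
    start = max(1, round(math.sqrt(n_features)))
    return [k for k in range(start - spread, start + spread + 1) if 1 <= k <= n_features]


def pick_feature_count(n_features, oob_error_for):
    """Return the variables-per-step setting with the lowest Out-of-Bag Error.

    oob_error_for(k) is a hypothetical callable: build a forest that considers
    k random variables per step and return its Out-of-Bag Error.
    """
    return min(candidate_feature_counts(n_features), key=oob_error_for)


print(candidate_feature_counts(16))  # → [2, 3, 4, 5, 6]
```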