Principal Component Analysis

Jul 19, 2021

Principal Component Analysis, or PCA, is a dimensionality-reduction method that is often used to reduce the dimensionality of large data sets by transforming a large set of variables into a smaller one that still contains most of the information in the large set.

Table of Contents

Why do we need it?
What is it?
Steps to perform it
PCA with Python

Why do we need it?

Machine learning works wonders when you have more data. But large data also becomes difficult to work with, because it increases the complexity. This leads to the curse of dimensionality: the more features a dataset has, the more dimensions it has, and the harder it becomes to model.
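To make the curse of dimensionality concrete, here is a small sketch. It assumes uniformly random data, which is an illustrative choice and not from the original, and the helper `distance_contrast` is hypothetical. As the number of dimensions grows, the nearest and farthest points from a query end up almost equally far away, so distance-based patterns become harder to detect.

```python
import numpy as np

def distance_contrast(n_points: int, n_dims: int, seed: int = 0) -> float:
    """Relative gap between the farthest and nearest point from one random query.

    Hypothetical helper: points are drawn uniformly from the unit hypercube
    [0, 1]^n_dims. A value close to 0 means every point looks about equally far away.
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distance to each point
    return (dists.max() - dists.min()) / dists.min()

# The contrast shrinks sharply as the number of dimensions grows.
for d in (2, 10, 100, 1000):
    print(d, round(distance_contrast(500, d), 3))
```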

What is it?

PCA is a dimensionality-reduction technique that identifies correlations and patterns in a dataset so that the dataset can be transformed into one of significantly lower dimensions while retaining most of the important information.

If two predictive features in a dataset are highly correlated, they carry largely redundant information, and the model's output can become heavily biased toward those two features.
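A minimal sketch of why correlated features are redundant, assuming a hypothetical two-feature dataset where `x2` is just `x1` plus a little noise. The eigenvalues of the covariance matrix measure the variance along each principal direction, and nearly all of it falls along a single direction, so one dimension is enough to describe both features.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000
# Hypothetical data: x2 is x1 plus small noise, so the two features are highly correlated.
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])

corr = np.corrcoef(X, rowvar=False)[0, 1]
print(f"correlation between x1 and x2: {corr:.3f}")

# eigvalsh returns eigenvalues of a symmetric matrix in ascending order; reverse to descending.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
share = eigvals / eigvals.sum()
print("share of variance per direction:", np.round(share, 3))
```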

Remove data that is nonessential:

Inconsistencies
Redundant data
Highly correlated features

But make sure that important data is not removed.
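One common way to avoid discarding important data is to keep just enough components to explain a chosen share of the total variance. The sketch below assumes a 95% default threshold, which is a widely used rule of thumb rather than something the original specifies; `n_components_for_variance` is a hypothetical helper that takes the covariance eigenvalues.

```python
import numpy as np

def n_components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of components whose eigenvalues explain >= threshold of total variance.

    Hypothetical helper; `eigenvalues` are those of the covariance matrix, in any order.
    """
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(vals) / vals.sum()  # explained-variance share after k components
    # searchsorted returns the first index where cumulative >= threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(vals))  # guard against floating-point rounding near 1.0

print(n_components_for_variance([5, 3, 1, 1], threshold=0.85))  # 3
```

With eigenvalues 5, 3, 1 and 1, the cumulative shares are 0.5, 0.8, 0.9 and 1.0, so three components are needed to pass 85%.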

Steps to perform it

Standardize the data
Compute the covariance matrix
Calculate the eigenvectors and eigenvalues
Compute the principal components
Reduce the dimensions of the data
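The five steps above can be sketched from scratch with NumPy. This is a minimal illustration under a few assumptions, not the author's implementation: `pca` is a hypothetical function name, the input is a 2-D array of shape (samples, features), and no feature has zero variance (standardization would divide by zero). The usage at the end builds hypothetical 4-feature data in which features 3 and 4 are noisy copies of features 1 and 2, so two components should capture almost all of the variance.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int):
    """Project X (n_samples x n_features) onto its top n_components principal components."""
    # 1. Standardize: zero mean and unit variance for every feature.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features (n_features x n_features).
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigenvectors and eigenvalues; eigh is meant for symmetric matrices
    #    and returns eigenvalues in ascending order, so reorder them descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Principal components: the eigenvectors with the largest eigenvalues.
    components = eigvecs[:, :n_components]
    # 5. Reduce dimensions by projecting the standardized data onto those components.
    X_reduced = X_std @ components
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return X_reduced, components, explained_ratio

# Usage on hypothetical data: features 3 and 4 are noisy copies of features 1 and 2.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.hstack([base, base + 0.05 * rng.normal(size=(200, 2))])
X_reduced, components, ratio = pca(X, n_components=2)
print(X_reduced.shape)  # (200, 2)
print(ratio.sum())      # share of variance kept by the two components
```

Using `eigh` rather than `eig` is a design choice: a covariance matrix is symmetric, and `eigh` exploits that to return real eigenvalues and orthonormal eigenvectors.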


Written by Pavini Jain