Principal Component Analysis

Pavini Jain
2 min read · Jul 19, 2021

Principal Component Analysis, or PCA, is a dimensionality-reduction method often used on large data sets. It transforms a large set of variables into a smaller one that still contains most of the information in the original set.

Table of Contents

  1. Why do we need it?
  2. What is it?
  3. Steps to perform it
  4. PCA with Python

Why do we need it?

Machine learning usually works better when you have more data. But data with many features is harder to work with, because every feature adds a dimension and every dimension adds complexity. This problem is known as the curse of dimensionality.

What is it?

PCA is a dimensionality-reduction technique that identifies correlations and patterns in a dataset so that it can be transformed into a dataset of significantly lower dimension while retaining most of the important information.

If two predictive features in a dataset are highly correlated, they carry largely the same information, and a model's output can become biased toward those two features.

PCA helps remove data that is nonessential:

  • Inconsistencies
  • Redundant data
  • Highly correlated features

But make sure that important data is not removed.
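
One common way to check that important information is kept is to look at how much of the total variance each principal component explains. The sketch below is only an illustration, assuming NumPy is available; the helper names `explained_variance_ratio` and `components_needed`, the 95% threshold, and the toy data are all hypothetical choices, not part of any library.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    # Standardize each feature (assumes no feature is constant).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Z, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]
    return eigenvalues / eigenvalues.sum()

def components_needed(X, threshold=0.95):
    """Smallest number of components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(cumulative))

# Toy data: the second feature is almost a copy of the first (redundant).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + 0.01 * rng.normal(size=200), rng.normal(size=200)])

ratios = explained_variance_ratio(X)
k = components_needed(X, threshold=0.95)
print(ratios, k)
```

Because two of the three features are nearly identical, two components should be enough here; the third adds almost nothing.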

Steps to perform it

  1. Standardize the data
  2. Compute the covariance matrix
  3. Calculate the eigenvectors and eigenvalues
  4. Compute the principal components
  5. Reduce the dimensions of the data
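
The five steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not a production implementation: the function name `pca_from_scratch` and the toy data are hypothetical, and it assumes no feature has zero variance.

```python
import numpy as np

def pca_from_scratch(X, n_components):
    # Step 1: standardize each feature to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # Step 2: covariance matrix of the standardized features.
    cov = np.cov(Z, rowvar=False)

    # Step 3: eigenvalues and eigenvectors (eigh suits symmetric matrices).
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Step 4: sort by descending eigenvalue; the leading eigenvectors
    # are the principal components.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order][:, :n_components]

    # Step 5: project the data onto the kept components.
    projected = Z @ components
    return projected, components, eigenvalues

# Toy data: 100 samples, 4 features, the last one strongly tied to the first.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
X[:, 3] = 3 * X[:, 0] + rng.normal(scale=0.1, size=100)

projected, components, eigenvalues = pca_from_scratch(X, n_components=2)
print(projected.shape)  # (100, 2)
```

The variance of each projected column equals the corresponding eigenvalue, which is why sorting by eigenvalue keeps the directions that carry the most information.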

PCA with Python
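
One way to run PCA in Python is with scikit-learn. The sketch below assumes scikit-learn and NumPy are installed and uses made-up data with two nearly duplicated feature pairs; the choice of two components is only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Made-up data: 4 features, where features 2 and 3 nearly repeat 0 and 1.
rng = np.random.default_rng(0)
base = rng.normal(size=(150, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 1],
    base[:, 0] + 0.05 * rng.normal(size=150),
    base[:, 1] - 0.05 * rng.normal(size=150),
])

# Standardize first, so every feature contributes on the same scale.
X_scaled = StandardScaler().fit_transform(X)

# Keep the two directions with the largest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # share of variance per component
```

If the ratios printed on the last line add up to a value close to 1, the reduced dataset keeps most of the information in the original four features.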

Pavini Jain

Student at Jaypee Institute of Information Technology