class: center, middle, inverse, title-slide

.title[
# Dimension Reduction: PCA
]
.author[
### Anastasios Panagiotelis
]
.institute[
### University of Sydney
]

---

# Outline

- What is PCA?

--

- Application of PCA

--

- Algebraic understanding

--

- Geometric understanding

--

- Latent factor model understanding

---
class: center, middle, inverse

# Principal Components Analysis

---

# Explaining Variance

- Let there be `\(n\)` observations of `\(p\)` variables; `\(x_{ij}\)` denotes observation `\(i\)` and variable `\(j\)`.

--

- Find some linear combination of the variables that has maximal variance.

--

- Find `\(w_1,w_2,\dots,w_p\)` such that

`$$y_i=w_1x_{i1}+w_2x_{i2}+\dots+w_px_{ip}$$`

has the biggest possible variance.

--

- This is the first principal component (PC).

---

# More PCs

- After finding the first principal component, we can look for a linear combination that

--

  + Has maximum variance
  + Is uncorrelated with the first PC

--

- This is called the second principal component.

--

- This continues until there are as many PCs as variables.

---

# No cheating...

- Arbitrarily big weights
--
 `\(\rightarrow\)` arbitrarily big variance.

--

  + Constrain `\(\sum w^2_j=1\)`

--

- Sensitive to units of measurement.

--

  + Center all variables by subtracting the mean.
  + Standardise all variables to have unit variance.

---
class: center, middle, inverse

# An application

---

# Implementation

R code to implement PCA for the World Bank data:

--


```r
library(tidyverse)
library(broom)
wb <- read_csv('../data/WorldBankClean.csv')
wb %>%
  select_if(., is.numeric) %>% # Use numeric data
  scale() %>%                  # Standardise
  prcomp() -> pca              # Compute PCs
wbPC <- augment(pca, wb)       # Add PCs to dataframe
```

---

# Explaining variance

- The variance of the first PC is 28.81.

--

  + This represents 44.32% of the total variance of the data.

--

- The variance of the second PC is 7.88.

--

  + This represents 12.12% of the total variance of the data.

--

- Together the first 5 PCs represent 77.48% of the total variance of the data.

---

# Scree plot

<img src="02PCA_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

# Plot
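A minimal sketch of one way the first two PCs in `wbPC` could be plotted, assuming the `.fittedPC1` and `.fittedPC2` score columns created by `broom::augment()` (ggplot2 is loaded with the tidyverse above):


```r
# Sketch: scatterplot of the scores on the first two principal components.
# .fittedPC1 / .fittedPC2 are the score columns created by broom::augment().
wbPC %>%
  ggplot(aes(x = .fittedPC1, y = .fittedPC2)) +
  geom_point() +
  labs(x = "Principal Component 1", y = "Principal Component 2")
```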
---

# Uncovering Structure

- Countries towards the right tend to be more economically developed.

--

- Countries towards the bottom tend to be larger in population.

--

- Countries that are similar to one another are closer together on the plot.

--

- A small number of PCs explains a large proportion of the variance.

---
class: middle, center, inverse

# PCA: The Algebra

---

# PCA as optimisation

- The linear combination (LC) is given by `\(\by=\bX\bw\)`.

--

- Since the variables are centered, the variance of the LC is `\(\frac{1}{n-1}\sum_{i=1}^n y^2_i=\frac{1}{n-1}\by'\by\)`.

--

- The optimisation problem is

`$$\underset{\bw}{\max}\,\frac{1}{n-1}\bw'\bX'\bX\bw$$`

subject to `\(\bw'\bw=1\)`.

--

- Substitute `\(\bS=\frac{1}{n-1}\bX'\bX\)`.

---

# Solution

- The Lagrangian is

`$$\calL=\bw'\bS\bw-\lambda(\bw'\bw-1)$$`

--

- Setting the first order condition to zero,

`$$\frac{\partial\calL}{\partial{\bw}}=2\bS\bw-2\lambda\bw=\mathbf{0}$$`

--

- we need to find `\(\bw\)` satisfying

`$$\bS\bw=\lambda\bw$$`

---

# Eigenvalue Decomposition

- Solutions are given by the eigenvalue decomposition of `\(\bS\)`.

--

- There are multiple solutions. The eigenvector corresponding to the largest eigenvalue gives the weights of the first principal component.

--

- The eigenvector corresponding to the second largest eigenvalue gives the weights of the second principal component.

--

- And so on... (a numerical check appears in the appendix at the end of the deck).

---

# Data compression

- When the `\(\lambda_j\)` and `\(\bw_j\)` are the eigenvalues and eigenvectors,

`$$\bS=\sum_{j=1}^p \lambda_j\bw_j\bw_j'$$`

- This can be approximated by keeping only the `\(\color{blue}{m}<p\)` largest eigenvalues

`$$\bS\approx\sum_{j=1}^{\color{blue}{m}} \lambda_j\bw_j\bw_j'$$`

---
class: inverse, middle, center

# PCA: The geometry

---

# Rotations

- For symmetric positive semi-definite matrices, the matrix of eigenvectors `\(\bW\)` is a rotation matrix

--

  + Columns/rows are orthogonal
  + Columns/rows have unit length

--

- Multiplying a vector by a rotation matrix literally rotates that vector.

---

# Rotation is PCA

- The principal components are given by `\(\bY=\bX\bW\)`.

--

- Each observation (row of `\(\bX\)`) is rotated to the new components.

--

- This is best seen with a simple example.

---

# A simple case

<img src="02PCA_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

IT.NET.USER.ZS = people using the internet; SH.ANM.NPRG.ZS = prevalence of anaemia among non-pregnant women.

---

# Components

<img src="02PCA_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

# Animation

<img src="02PCA_files/figure-html/anim-.gif" style="display: block; margin: auto;" />

---

# Or as new coordinates

<img src="02PCA_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---

# Or as new coordinates

<img src="02PCA_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

The first PC projects onto the orange line, the second PC onto the blue line.

---

# PCA and Factor Models

- Suppose the data are generated from the following statistical model

`$$\bx_i=\bA\by_i+\boldsymbol{\xi}_i$$`

--

- where

  + `\(\bx_i\)` is a `\(p\times 1\)` data vector,
  + `\(\by_i\)` is an `\(m\times 1\)` vector of latent factors,
  + `\(\bA\)` is a `\(p\times m\)` matrix of factor loadings,
  + `\(\boldsymbol{\xi}_i\)` is a `\(p\times 1\)` error vector.

--

- The `\(\by_i\)` can be estimated using PCs.

---

# Summary

- PCA can be thought of as:

--

  + Compressing the data with a matrix decomposition.
  + Rotating the data.
  + Constructing new coordinates.
  + Projecting onto a low-dimensional hyperplane.
  + A technique for estimating latent factors.

--

- All of these intuitions are useful.

---
class: inverse, center, middle

# Questions?
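---

# Appendix: PCA and the eigendecomposition in R

A minimal sketch checking the algebra section numerically: the eigenvalues of `\(\bS\)` should equal the PC variances, and the eigenvectors should match the `prcomp()` weights up to sign. It assumes the `wb` and `pca` objects created in the earlier chunk.


```r
# Sketch: eigen() on S = X'X/(n-1) should reproduce prcomp()'s output.
X <- wb %>% select_if(is.numeric) %>% scale()   # standardised data matrix
S <- cov(X)                                     # sample covariance of the scaled data
eig <- eigen(S)

all.equal(eig$values, pca$sdev^2)                       # eigenvalues = PC variances
all.equal(abs(eig$vectors), abs(unname(pca$rotation)))  # weights, up to sign
```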