class: center, middle, inverse, title-slide

.title[
# Dimension Reduction: MDS
]
.author[
### Anastasios Panagiotelis
]
.institute[
### University of Sydney
]

---

# Outline

- Classical Multidimensional Scaling

--

  + Euclidean distance
  + Non-Euclidean distance

--

- Some applications

--

- Sammon Mapping

---

class: inverse, middle, center

# Multidimensional Scaling (MDS)

---

# The idea

- Input points: `\(\bx_i\in\bbR^p\)` for `\(i=1,\dots,n\)`
- Output points: `\(\by_i\in\bbR^m\)` for `\(i=1,\dots,n\)`
  + `\(m<<p\)`

--

- Denote the distance between `\(\bx_i\)` and `\(\bx_j\)` by `\(\delta_{ij}\)`.

--

- Denote the distance between `\(\by_i\)` and `\(\by_j\)` by `\(d_{ij}\)`.

--

- We want `\(\delta_{ij}\)` to be similar to `\(d_{ij}\)`.

---

# Strain

- Assume Euclidean distances

--

for now!

--

- The objective is to minimise the strain, defined as

`$$\sum\limits_{i=1}^{n-1}\sum\limits_{j=i+1}^n (\delta_{ij}^2-d_{ij}^2)$$`

--

- This is known as classical MDS.

--

- How can we solve this?

---

# Solution

1. Construct the `\(n\times n\)` matrix of squared interpoint distances `\(\bDelta^{(2)}=\left\{\delta^2_{ij}\right\}\)`

--

2. Double centre `\(\bDelta^{(2)}\)` by computing `\(\bB=-\frac{1}{2}\bH\bDelta^{(2)}\bH\)` where `\(\bH=\bI-(1/n)\biota\biota'\)`

--

3. Find the eigenvalue decomposition `\(\bB=\bU\bLambda\bU'\)`

--

4. The output coordinates are given by the first `\(m\)` columns of `\(\bU\bLambda^{1/2}\)`

--

These steps are also sketched by hand in R after the plot.

---

# Implementation

```r
library(tidyverse)
wb <- read_csv('../data/WorldBankClean.csv')
wb %>%
  select_if(., is.numeric) %>%              # Keep the numeric variables
  scale() %>%                               # Standardise
  dist() %>%                                # Compute the distance matrix
  cmdscale() %>%                            # Classical MDS
  as_tibble(.name_repair = 'universal') %>%
  cbind(wb) -> wb_mds
```

---

# Plot
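
---

# The solution by hand

A minimal sketch of steps 1 to 4, assuming `wb` has been loaded as in the implementation slide. Up to a reflection of each axis, the coordinates should agree with those returned by `cmdscale()`.

```r
library(tidyverse)

X <- wb %>% select_if(is.numeric) %>% scale()   # standardised numeric data
n <- nrow(X)

Delta2 <- as.matrix(dist(X))^2                  # 1. squared interpoint distances
H <- diag(n) - matrix(1 / n, n, n)              #    centring matrix H = I - (1/n) ii'
B <- -0.5 * H %*% Delta2 %*% H                  # 2. double centring
eig <- eigen(B, symmetric = TRUE)               # 3. eigenvalue decomposition
m <- 2
Y <- eig$vectors[, 1:m] %*% diag(sqrt(eig$values[1:m]))  # 4. first m output coordinates
```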
---

# Look familiar??

- This is almost identical to Principal Components Analysis

--

- The axes have been flipped!

--

- PCA is invariant to reflections.

--

Why?

--

- MDS is also invariant to rotations.

--

Why?

---

# Why are they the same?

- The proof is a bit involved.

--

- The key idea is to show that `\(\bB\)` is related to `\(\bX\bX'\)` while `\(\bS\)` is related to `\(\bX'\bX\)`

--

- `\(\bX'\bX\)` and `\(\bX\bX'\)` have the same non-zero eigenvalues.

--

- Geometrically, the result makes sense by thinking about the extreme case where the data lie on an `\(m\)`-dimensional plane.

---

class: inverse, middle, center

# Beyond Euclidean Distance

---

# Non-Euclidean Distance

- What if a non-Euclidean distance is used?

--

- In this case classical MDS does not minimise the strain defined previously, but instead minimises `\(tr(\bB-\bB^{*})\)`.

--

- Here `\(\bB^{*}\)` is the doubly centred squared (Euclidean) distance matrix in the output space.

--

- Distances between output points still faithfully represent distances between input points.

--

- Only use eigenvectors corresponding to non-negative eigenvalues.

---

# Implementation (L1)

```r
library(tidyverse)
wb <- read_csv('../data/WorldBankClean.csv')
wb %>%
  select_if(., is.numeric) %>%              # Keep the numeric variables
  scale() %>%                               # Standardise
  dist(method = 'manhattan') %>%            # Compute the Manhattan (L1) distance matrix
  cmdscale() %>%                            # Classical MDS
  as_tibble(.name_repair = 'universal') %>%
  cbind(wb) -> wb_mds_L1
```

---

# Plot (L1)

<img src="03MDS_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

# Plot (L2)

<img src="03MDS_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---

# Why is this useful?

- We can have distances/dissimilarities between all sorts of objects

--

  + Time series
  + Functions
  + Probability distributions
  + Strings/texts

---

# A toy example

- Consider the word for "mother" in different languages

--

- The Levenshtein distance can be computed between strings
  + It counts the number of insertions, deletions and substitutions needed to convert one string into another

--

- Pairwise Levenshtein distances are computed and then classical multidimensional scaling is applied (a short R sketch follows the Languages slide).

---

# Languages
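
---

# Levenshtein distances in R

A small sketch of the idea on the previous slides; the word list here is illustrative (accents dropped) rather than the exact list behind the plot. Base R's `adist()` returns pairwise Levenshtein distances, which are then passed to `cmdscale()`.

```r
# Words for "mother" in a few languages (illustrative list)
mother <- c(English = "mother", German  = "mutter", Dutch      = "moeder",
            French  = "mere",   Spanish = "madre",  Portuguese = "mae",
            Swedish = "moder",  Polish  = "matka")

lev <- adist(mother)                        # pairwise Levenshtein distances
rownames(lev) <- colnames(lev) <- names(mother)

coords <- cmdscale(as.dist(lev), k = 2)     # classical MDS on a non-Euclidean distance
plot(coords, type = "n", xlab = "", ylab = "")
text(coords, labels = rownames(coords))
```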
---

# Distance between pdfs

- Consider the electricity smart meter data.

--

- The distance between pdfs can be measured using a Jensen-Shannon distance.

--

- This is the square root of the average of the Kullback-Leibler divergence from `\(P\)` to `\(Q\)` and from `\(Q\)` to `\(P\)`.

--

- For log-normal distributions this is available in closed form (a short R sketch follows on a later slide).

--

- Consider one household, so that each observation corresponds to a time of week.

---

# Distance between pdfs
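
---

# Closed form in R

A sketch of the closed-form distance, following the definition given above (the square root of the average of the two Kullback-Leibler divergences). The parameter values at the end are made up purely for illustration.

```r
kl_lognormal <- function(mu1, sigma1, mu2, sigma2) {
  # KL divergence from LN(mu1, sigma1^2) to LN(mu2, sigma2^2);
  # identical to the KL divergence between the corresponding normals.
  log(sigma2 / sigma1) + (sigma1^2 + (mu1 - mu2)^2) / (2 * sigma2^2) - 0.5
}

js_lognormal <- function(mu1, sigma1, mu2, sigma2) {
  # Square root of the average of the two KL divergences
  sqrt(0.5 * (kl_lognormal(mu1, sigma1, mu2, sigma2) +
              kl_lognormal(mu2, sigma2, mu1, sigma1)))
}

js_lognormal(mu1 = 0.1, sigma1 = 0.5, mu2 = 0.4, sigma2 = 0.8)
```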
---

class: inverse, middle, center

# MDS: Beyond Linearity

---

# Beyond Classical MDS

- Classical MDS is designed to minimise strain.

--

- An alternative objective function, called stress, can be minimised instead

--

`$$\mbox{Stress}=\sum\limits_{i=1}^{n-1}\sum\limits_{j>i}\frac{(\delta_{ij}-d_{ij})^2}{\delta_{ij}}$$`

--

- The difference between `\(\delta_{ij}\)` and `\(d_{ij}\)` acts like an error.

--

- The `\(\delta_{ij}\)` in the denominator acts as a weight.

---

# Weighting

- For large `\(\delta_{ij}\)` the observations are far apart in the input space.

--

- For these pairs errors are more easily tolerated.

--

- For small `\(\delta_{ij}\)` the observations are close together in the input space.

--

- For these pairs errors are not tolerated.

--

- The most accuracy is achieved for nearby points,

--

- so local structure is preserved.

---

# Sammon mapping

- The Sammon mapping is found by numerical optimisation (gradient descent); a small R sketch is given in an appendix slide at the end.

--

- It is different from the classical solution

--

- It is not based on an eigenvalue decomposition

--

- It is not based on a rotation

--

- It is a non-linear mapping.

---

# Example

- The following is a simulated toy example to motivate non-linear dimension reduction.

--

- Consider the case where the input points are 2D and the output points are 1D.

--

- The specific problem of multidimensional scaling where the lower dimension is 1 is called *seriation*.

---

# Original Data

<img src="03MDS_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

# Original Data

<img src="03MDS_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

# Rotate (Classical Solution)

<img src="03MDS_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

# Keep 1 Dimension

<img src="03MDS_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

---

# Sammon Mapping

<img src="03MDS_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

---

# Discussion

- Classical MDS cannot account for non-linearity.
  + The dark blue and yellow points are represented as close to one another.

--

- The Sammon mapping does account for non-linearity.
  + The blue and yellow points are represented as far apart.

--

- The Sammon mapping preserves local structure.

---

# Summary

- Classical MDS with Euclidean input distances is identical (up to rotation and reflection) to PCA

--

- This no longer holds with non-Euclidean input distances

--

  + Classical MDS then gives output points that approximate the information in the double-centred squared distance matrix.

--

- The Sammon mapping highlights the importance of preserving local structure.

---

class: inverse, center, middle

# Questions?
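
---

# Appendix: Sammon mapping in R

A minimal sketch using `MASS::sammon()` on the standardised World Bank data, as referenced on the Sammon mapping slide. Note that `MASS` normalises the Stress by the sum of the input distances, so the reported value differs from the formula given earlier by a constant factor.

```r
library(MASS)
library(tidyverse)

wb <- read_csv('../data/WorldBankClean.csv')
d  <- wb %>% select_if(is.numeric) %>% scale() %>% dist()

sam <- sammon(d, k = 2)   # gradient-based minimisation of Stress,
                          # initialised at the classical MDS solution
head(sam$points)          # output coordinates
sam$stress                # final (normalised) Stress
```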