
Multidimensional Scaling

High Dimensional Data Analysis

Anastasios Panagiotelis & Ruben Loaiza-Maya

Lecture 5


Motivation

  • Previously we looked at the concept of distance between observations.
  • We looked at our usual understanding of distance, known as Euclidean distance.
  • We also looked at higher-dimensional versions of Euclidean distance.
  • Other distance metrics, including Jaccard distance, can be used for categorical data.

Can we see distance?

  • Suppose we have n observations and the distance between each possible pair of observations.
  • A scatterplot shows whether observations are close together or far apart.
  • This works nicely when there are 2 variables.

Higher-dimensional plots

  • Suppose we have p variables where p is large.
  • Consider p-dimensional Euclidean distances.
  • Can we represent these using just 2 dimensions?
  • Unfortunately the answer is no...
  • ... but we can get a good approximation.

Multidimensional Scaling

  • Multidimensional scaling (MDS) finds a low (usually 2) dimensional representation.
  • The pairwise 2D Euclidean distances in this representation should be as close as possible to the original distances.
  • The meaning of close can vary since there are different ways to do MDS.
  • However, MDS always begins with a matrix of distances and ends with a low-dimensional representation that can be plotted.

An optical illusion with Beyonce

[Photo: Beyonce and the Eiffel Tower]

Why does the illusion work?

  • The photo is a 2D representation of a 3D reality.
  • In reality the distance between Beyonce's hand and the Eiffel Tower is large.
  • In the 2D photo, this distance is small.
  • This is a misleading representation for understanding the distance between Beyonce's hand and the Eiffel Tower.
  • A much more informative representation could be found by rotation.

Why do we care?

  • An important issue in business is to profile the market. For example:
    • Which products do customers perceive to be similar to one another?
    • Who is my closest competitor?
    • Are there ‘gaps’ in the market, where a new product can be introduced?
  • Multidimensional scaling can help us to produce a simple visualisation that can address these questions.

Beer Example

[MDS plot of the beer data]

Beer Example

  • The plot on the previous slide is an MDS solution for the beer dataset.
  • The data are 5-dimensional so we cannot use a scatterplot.
  • MDS shows Olympia Gold Light and Pabst Extra Light are similar (both light beers).
  • This also suggests that St Pauli Girl has few close competitors.
  • This may also reflect that the attributes of St Pauli Girl are not desired by customers.
  • How did we get the plot?

Beer Data

beer                  rating origin  avail    price cost calories sodium alcohol light
Olympia Gold Light    Fair   USA     Regional 2.75  0.46       72      6     2.9 LIGHT
Pabst Extra Light     Fair   USA     National 2.29  0.38       68     15     2.3 LIGHT
Schlitz Light         Fair   USA     National 2.79  0.47       97      7     4.2 LIGHT
Blatz                 Fair   USA     Regional 1.79  0.30      144     13     4.6 NONLIGHT
Hamms                 Fair   USA     Regional 2.59  0.43      136     19     4.4 NONLIGHT
Heilmans Old Style    Fair   USA     Regional 2.59  0.43      144     24     4.9 NONLIGHT
Rolling Rock          Fair   USA     Regional 2.15  0.36      144      8     4.7 NONLIGHT
Scotch Buy (Safeway)  Fair   USA     Regional 1.59  0.27      145     18     4.5 NONLIGHT
St Pauli Girl         Fair   Germany Regional 4.59  0.77      144     21     4.7 NONLIGHT
Tuborg                Fair   USA     Regional 2.59  0.43      155     13     5.0 NONLIGHT

Details

  • To keep the example simple, only the beers rated Fair are used.
  • In general, all the beers can be used.
  • Also to keep things simple, we only consider the metric variables so that we can use Euclidean distance.
  • In general, we can use distance metrics that work for categorical data.

Metric Variables

  • After standardising, Euclidean distances are formed between every possible pair of beers.
  • For example, the distance between Blatz and Tuborg is given by

$$\delta(\text{Blatz},\text{Tbrg}) = \sqrt{\sum_{h=1}^{5}\left(\text{Blatz}_h - \text{Tbrg}_h\right)^2}$$

  • Both the notation $\delta_{ij}$ and $\delta(i,j)$ will be used interchangeably.

Doing it in R

To obtain the distance matrix in R:

library(dplyr) # for filter, select_if and pull

filter(Beer,rating=='Fair')%>% #Only fair beers
  select_if(is.numeric)%>%     #Only metric data
  scale%>%                     #Standardise
  dist->delta                  #Distance
filter(Beer,rating=='Fair')%>% #Only fair beers
  pull(beer)%>%                #Get beer names
  abbreviate(6)->              #Abbreviate
  attributes(delta)$Labels     #Assign labels to delta

MDS in R

We can do what is known as classical MDS in R using the cmdscale function:

mdsout<-cmdscale(delta)
mdsout

##              [,1]       [,2]
## OlymGL -1.9758212 -1.6276821
## PbstEL -2.1860282 -1.1600914
## SchltL -0.7420968 -0.7994497
## Blatz  -0.3386684  1.4929936
## Hamms   0.6053483  0.2720245
## HlmnOS  1.3641181  0.6556403
## RllngR -0.2932490  0.9661501
## SB(Sf) -0.2067650  1.7951323
## StPlGr  2.9648884 -2.2976842
## Tuborg  0.8082737  0.7029665

Two new variables

  • We have just created two new variables for visualising the distances.
  • The distances that we visualise will be 2-dimensional distances. For example

$$d(\text{Blatz},\text{Tbrg}) = \sqrt{(-0.339-0.808)^2 + (1.493-0.703)^2}$$

Not exact

  • In this example d(Blatz,Tuborg) = 1.3927 while δ(Blatz,Tuborg) = 1.4762. Notice that

$$d(\text{Blatz},\text{Tuborg}) \neq \delta(\text{Blatz},\text{Tuborg})$$

  • But they are close.
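
We can check both numbers directly in R. A quick sketch, assuming the delta and mdsout objects created on the previous slides:

# High-dimensional distance from the standardised data
as.matrix(delta)['Blatz','Tuborg']
# 2D distance between the same two beers in the MDS solution
sqrt(sum((mdsout['Blatz',] - mdsout['Tuborg',])^2))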

Getting the plot

library(ggplot2)

mdsout%>%
  as_tibble%>%
  ggplot(aes(x=V1,y=V2))+geom_point()

Getting the plot with names

mdsout%>%
  as_tibble(rownames='BeerName')%>%
  ggplot(aes(x=V1,y=V2,label=BeerName))+geom_text()

The math behind classical MDS

  • In classical MDS the objective is to minimise strain

$$\text{Strain} = \sum_{i=1}^{n-1}\sum_{j>i}\left(\delta_{ij}^2 - d_{ij}^2\right)$$

  • Note that the $\delta_{ij}$ are high-dimensional distances that come from the true data.
  • The $d_{ij}$ are low-dimensional distances that come from the solution.

When can this be solved?

  • The above problem has a tractable solution when Euclidean distance is used.
  • This solution depends on an eigenvalue decomposition.
  • This solution rotates the points until we get a 2D view that represents the true distances as accurately as possible.
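
To make the eigenvalue step concrete, here is a minimal sketch of the classical solution computed by hand, assuming the delta object from the beer example. It double-centres the squared distances and keeps the two leading eigenvectors, which is essentially what cmdscale does internally:

D2 <- as.matrix(delta)^2           # squared high-dimensional distances
n  <- nrow(D2)
J  <- diag(n) - matrix(1/n, n, n)  # centring matrix
B  <- -0.5 * J %*% D2 %*% J        # double-centred matrix
eg <- eigen(B)
# Coordinates: leading eigenvectors scaled by the square roots of their eigenvalues
X  <- eg$vectors[, 1:2] %*% diag(sqrt(eg$values[1:2]))
# X reproduces cmdscale(delta) up to a sign flip of each column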

Summary

  • When Euclidean distance is used the solution provided by classical MDS:
    • Minimises the strain.
    • Results in eigenvalues that are all positive.
  • Can we use classical MDS when distances are non-Euclidean?

An example: Road distances

  • Suppose that we have the road distances between different cities in Australia.
  • The road distances are non-Euclidean since roads can be quite wiggly.
  • We want to create a 2-dimensional map with the locations of the cities using only these road distances.
  • Classical MDS can give an approximation that is quite close to a real map.

Road Distances

               Cairns Brisbane Sydney Melbourne Adelaide Perth Darwin Alice Springs
Cairns              0     1717   2546      3054     3143  5954   2727          2324
Brisbane         1717        0    996      1674     2063  4348   3415          3012
Sydney           2546      996      0       868     1420  4144   4000          2644
Melbourne        3054     1674    868         0      728  3452   3781          2270
Adelaide         3143     2063   1420       728        0  2724   3053          1542
Perth            5954     4348   4144      3452     2724     0   4045          3630
Darwin           2727     3415   4000      3781     3053  4045      0          1511
Alice Springs    2324     3012   2644      2270     1542  3630   1511             0

Australia

MDS Solution

Rotate

Back with Map

Rotating

  • Once a solution is available, we rotate the points within 2 dimensions.
    • The 2D rotation does not change any of the distances.
    • It can help us to interpret the axes.
  • In the previous example the x-axis represents East-West direction and the y-axis represents North-South.

Evaluating MDS

How good is this representation?

  • In theory, as long as the original distances are Euclidean, strain is minimised.
  • What if the optimal solution is still bad?
  • Use two goodness of fit measures.
  • Think of these in a similar fashion to R-squared in regression modelling.

Goodness of Fit Measures?

  • These values depend on the eigenvalues:

$$GF_1 = \frac{\sum_{i=1}^{2}|\lambda_i|}{\sum_{i=1}^{n}|\lambda_i|} \qquad GF_2 = \frac{\sum_{i=1}^{2}\max(0,\lambda_i)}{\sum_{i=1}^{n}\max(0,\lambda_i)}$$

  • For Euclidean distances $\delta_{ij}$ the eigenvalues are always positive and $GF_1 = GF_2$.

Beer Example

  • In R, obtain the GoF measures using the option eig=TRUE in the cmdscale function.
  • For the Beer data:

mdsout<-cmdscale(delta,eig=TRUE)
str(mdsout$GOF)

## num [1:2] 0.854 0.854
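
These values can also be reproduced by hand from the eigenvalues that cmdscale returns; a sketch using the mdsout object above:

lambda <- mdsout$eig
sum(abs(lambda[1:2]))/sum(abs(lambda))        # GF1
sum(pmax(0,lambda[1:2]))/sum(pmax(0,lambda))  # GF2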

GoF Measure

  • You may notice that the GoF measures are the same.
  • This is always the case when Euclidean distance is used.
  • This arises since all eigenvalues are positive when the distance matrix is based on Euclidean distance.

Non-Euclidean distances

  • In theory non-Euclidean distances can lead to negative eigenvalues. In this case:
    • Classical MDS may not minimise strain.
    • It minimises a slightly different function of the distances.
    • The two fit measures will differ.
  • Overall, we can use classical MDS for non-Euclidean distances but must be more careful.

Australia data

cmdscale(doz,eig=TRUE)->dozout
str(dozout$eig)

## num [1:8] 1.97e+07 1.25e+07 2.62e+06 5.96e+04 -3.26e-09 ...

str(dozout$eig[6:8])

## num [1:3] -311786 -1083294 -2179888

str(dozout$GOF)

## num [1:2] 0.837 0.923

Evaluating the Result

  • There are negative eigenvalues.
    • This occurs since road distances are not Euclidean.
    • This also implies that classical MDS does not minimise strain.
  • Both goodness of fit measures are quite high.
    • The solution is an accurate representation.

Another example: Cheese

The following example comes from ‘Multidimensional Scaling of Sorting Data Applied to Cheese Perception’, Food Quality and Preference, 6, pp. 91-98. The purpose of this study was to visualise the difference between types of cheese.

Another example: Cheese

  • The motivation is to investigate the similarities and differences between types of cheese.
  • In principle one could measure attributes of the cheese.
  • However, the purpose of this study was to ask customers about their perceptions.
  • How do we ask customers about distances?
  • Could you walk out on to the street and ask someone about the Euclidean distance between Brie and Camembert?

Constructing the Survey

  • Customers can be asked: on a scale of 1 to 10, with 1 being the most similar and 10 being the most different, how similar are the following cheeses?
    • Brie and Camembert
    • Brie and Roquefort
    • Camembert and Roquefort
  • The dissimilarity scores can be averaged over all customers and used in an MDS.
  • This is not a good method when there is a large number of products.

A more feasible approach

  • In the study there are 16 cheeses, and therefore 120 possible pairwise comparisons.
  • It is not practical to ask survey participants to make 120 comparisons!
  • Instead of being asked to make so many comparisons, customers were asked to put similar cheeses into groups.
  • The proportion of customers who put two cheeses in the same group is a similarity score.
  • The proportion of customers who put two cheeses in different groups is a dissimilarity score.

Consider four customers

  • Suppose there are four customers sorting cheeses:
    • Customer A: Brie and Camembert together, Roquefort and Blue Vein together
    • Customer B: Roquefort and Blue Vein together, all others separate
    • Customer C: All cheeses in their own category
    • Customer D: All cheeses in one category

Comparisons

  • Customers A and D have Brie and Camembert in the same group; customers B and C have them in different groups.
    • The distance between Brie and Camembert is 0.5.
  • Customers A, B and D have Roquefort and Blue Vein in the same group; customer C has them in different groups.
    • The distance between Roquefort and Blue Vein is 0.25.
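
As a sketch, the same calculation can be coded directly. The group labels below are a hypothetical encoding of the four sortings described above; equal labels within a row mean that customer put those cheeses together:

# Rows are customers, columns are cheeses
sorting <- rbind(
  A = c(Brie=1, Camembert=1, Roquefort=2, BlueVein=2),
  B = c(Brie=1, Camembert=2, Roquefort=3, BlueVein=3),
  C = c(Brie=1, Camembert=2, Roquefort=3, BlueVein=4),
  D = c(Brie=1, Camembert=1, Roquefort=1, BlueVein=1))

# Dissimilarity: proportion of customers sorting a pair into different groups
diss <- function(i,j) mean(sorting[,i] != sorting[,j])
diss('Brie','Camembert')      # 0.5
diss('Roquefort','BlueVein')  # 0.25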

MDS

  • The study on cheese did not use classical MDS but something called Kruskal's algorithm.
  • There are many alternatives to classical MDS.
  • We now briefly cover some of the ideas behind them.


Beyond Classical MDS

  • Classical MDS is designed to minimise strain.
  • An alternative objective function called stress can be minimised instead:

$$\text{Stress} = \sum_{i=1}^{n-1}\sum_{j>i}\frac{\left(\delta_{ij} - d_{ij}\right)^2}{\delta_{ij}}$$

  • The difference between $\delta_{ij}$ and $d_{ij}$ acts like an error.
  • The $\delta_{ij}$ in the denominator acts as a weight.
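
A minimal sketch of this objective as an R function, assuming delta is the original distance object (as in the beer example) and X is a candidate low-dimensional configuration:

stress <- function(X, delta){
  d <- dist(X)              # low-dimensional distances of the configuration
  sum((delta - d)^2/delta)  # each pair counted once, weighted by delta
}
stress(cmdscale(delta), delta)  # stress achieved by the classical solution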

Weighting

  • For large δ, observations are far away in the original space.
    • For these pairs errors are more easily tolerated.
  • For small δ, observations are close in the original space.
    • For these pairs errors are not tolerated.
  • The most accuracy is achieved for nearby points.
  • The local structure is preserved.

Sammon mapping

  • The Sammon mapping is solved by numerical optimisation.
  • It is different from the classical solution:
    • It is not based on an eigenvalue decomposition.
    • It is not based on rotation.
    • It is a non-linear mapping.
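
In R a Sammon mapping can be fitted with the sammon function from the MASS package. A sketch using the beer distances from earlier:

library(MASS)
sammonBeer <- sammon(delta)  # numerical optimisation of a weighted stress
head(sammonBeer$points)      # the low-dimensional coordinates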

Example

  • Consider the case where points are in 2D space and the aim is to summarise them in 1D space (along a line).
  • The specific problem of doing multidimensional scaling where the lower dimension is 1 is called seriation.
  • It provides a ranking of the observations.
  • In marketing it can be used to elicit preferences.
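
As a sketch, a 1-dimensional classical solution can be requested through the k argument of cmdscale. Applied to the beer distances from earlier, the single coordinate induces a ranking of the beers:

oneD <- cmdscale(delta, k=1)  # one coordinate per observation
rank(oneD[,1])                # the implied ordering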

Original Data

Rotate (Classical Solution)

Keep 1 Dimension

Rug plot (classical solution)

Sammon Mapping

Discussion

  • Classical MDS cannot account for non-linearity.
  • The dark blue and yellow points are represented as close to one another.
  • The Sammon mapping does account for non-linearity.
  • The blue and yellow points are represented as far apart.
  • Although they are not so far apart in the original space, these observations are downweighted relative to the local structure.

Kruskal algorithm

  • Kruskal's algorithm minimises a slightly different criterion.
  • This is still often called stress, which is admittedly confusing.
  • Kruskal's algorithm is implemented in R using the isoMDS function from the MASS package.

Monotone transformations

  • Kruskal's algorithm is invariant to monotone transformations of the distances.
  • By monotone transformation we mean any function of the distance that is either always increasing or always decreasing.
    • The exponential function is monotone.
    • The sine function is not monotone.
  • By invariant we mean that the solution provided by Kruskal's algorithm does not change if we transform the input distances.

Example

library(MASS)
isoMDS(delta)->kBeer

## initial value 9.127089
## iter 5 value 5.688460
## final value 5.611143
## converged

Make plot

kBeer$points%>%
  as_tibble()%>%
  ggplot(aes(x=V1,y=V2))+
  geom_point(size=10)

Squared distances

isoMDS(delta^2)->kBeer2

## initial value 11.274285
## iter 5 value 6.447929
## iter 10 value 5.697285
## final value 5.603035
## converged

Solution

Comparison

  • Squaring the distances provides the same solution with two caveats:
    • The stress is slightly different. Numerical optimisation can vary a little depending on starting values.
    • The points in one plot are slightly rotated compared to the other.
  • Why is the invariance to monotone transformations important?

Non-metric MDS

  • In some cases, the distances themselves are not metric but ordinal.
  • Suppose we only know

$$\delta_{\text{Bri.,Cam.}} < \delta_{\text{Roq.,Cam.}} < \delta_{\text{Roq.,Bri.}}$$

  • Brie and Roquefort are more different compared to Brie and Camembert.
  • We do not know how big the distance between Brie and Roquefort is compared to the distance between Brie and Camembert.

Non-metric MDS

  • In this case we minimise stress subject to constraints, e.g.

$$\hat{\delta}_{\text{Bri.,Cam.}} < \hat{\delta}_{\text{Roq.,Cam.}} < \hat{\delta}_{\text{Roq.,Bri.}}$$

Non-metric MDS

  • Taking the ranks is an example of a monotone transformation.
  • Therefore the solution of isoMDS only requires the ranks of the distances and not the distances themselves.
  • This is a very useful algorithm for marketing, since survey participants cannot easily and reliably assign numbers to the difference between products.

Modern MDS

  • Methods for finding a low-dimensional representation of high-dimensional data continue to be used today.
  • These mostly go by the name of manifold learning methods.
  • They are not only used for visualisation.
  • The low-dimensional co-ordinates can also be used as features in classification and regression.

Examples

  • Local Linear Embedding (LLE)
  • IsoMap
  • Laplacian Eigenmap
  • t-SNE
  • Kohonen Map
  • ... and others.

Properties

  • For most of the modern methods two characteristics are common.
    • The idea that local structure should be preserved. The first step of many algorithms is to find the nearest neighbours of each point.
    • In many algorithms an eigenvalue decomposition forms part of the solution, as is the case in classical MDS.
