
Multidimensional Scaling

High Dimensional Data Analysis

Anastasios Panagiotelis & Ruben Loaiza-Maya

Lecture 5


Motivation

  • Previously we looked at the concept of distance between observations.
  • We looked at our usual understanding of distance, known as Euclidean distance.
  • We also looked at higher-dimensional versions of Euclidean distance.
  • Other distance metrics, including Jaccard distance, can be used for categorical data.

Can we see distance?

  • Suppose we have n observations and the distance between each possible pair of observations.
  • A scatterplot shows whether observations are close together or far apart.
  • This works nicely when there are 2 variables.

Higher-dimensional plots

  • Suppose we have p variables where p is large.
  • Consider p-dimensional Euclidean distances.
  • Can we represent these using just 2 dimensions?
  • Unfortunately the answer is no...
  • ... but we can get a good approximation.

Multidimensional Scaling

  • Multidimensional scaling (MDS) finds a low (usually 2) dimensional representation.
  • The pairwise 2D Euclidean distances in this representation should be as close as possible to the original distances.
  • The meaning of close can vary since there are different ways to do MDS.
  • However, MDS always begins with a matrix of distances and ends with a low-dimensional representation that can be plotted.

An optical illusion with Beyonce

[Photo: Beyonce and the Eiffel Tower]

Why does the illusion work?

  • The photo is a 2D representation of a 3D reality.
  • In reality the distance between Beyonce's hand and the Eiffel Tower is large.
  • In the 2D photo, this distance is small.
  • This is a misleading representation for understanding the distance between Beyonce's hand and the Eiffel Tower.
  • A much more informative representation could be found by rotation.

Why do we care?

  • An important issue in business is to profile the market. For example:
    • Which products do customers perceive to be similar to one another?
    • Who is my closest competitor?
    • Are there ‘gaps’ in the market, where a new product can be introduced?
  • Multidimensional scaling can help us to produce a simple visualisation that can address these questions.

Beer Example

[MDS plot of the beer data]

Beer Example

  • The plot on the previous slide is an MDS solution for the beer dataset.
  • The data are 5-dimensional so we cannot use a scatterplot.
  • MDS shows Olympia Gold Light and Pabst Extra Light are similar (both light beers).
  • This also suggests that St Pauli Girl has few close competitors.
  • This may also reflect that the attributes of St Pauli Girl are not desired by customers.
  • How did we get the plot?

Beer Data

beer                  rating origin  avail    price cost calories sodium alcohol light
Olympia Gold Light    Fair   USA     Regional 2.75  0.46       72      6     2.9 LIGHT
Pabst Extra Light     Fair   USA     National 2.29  0.38       68     15     2.3 LIGHT
Schlitz Light         Fair   USA     National 2.79  0.47       97      7     4.2 LIGHT
Blatz                 Fair   USA     Regional 1.79  0.30      144     13     4.6 NONLIGHT
Hamms                 Fair   USA     Regional 2.59  0.43      136     19     4.4 NONLIGHT
Heilmans Old Style    Fair   USA     Regional 2.59  0.43      144     24     4.9 NONLIGHT
Rolling Rock          Fair   USA     Regional 2.15  0.36      144      8     4.7 NONLIGHT
Scotch Buy (Safeway)  Fair   USA     Regional 1.59  0.27      145     18     4.5 NONLIGHT
St Pauli Girl         Fair   Germany Regional 4.59  0.77      144     21     4.7 NONLIGHT
Tuborg                Fair   USA     Regional 2.59  0.43      155     13     5.0 NONLIGHT

Details

  • To keep the example simple, only the beers rated Fair are used.
  • In general, all the beers can be used.
  • Also to keep things simple, we only consider the metric variables so that we can use Euclidean distance.
  • In general, we can use distance metrics that work for categorical data.

Metric Variables

  • After standardising, Euclidean distances are formed between every possible pair of beers.
  • For example, the distance between Blatz and Tuborg is given by

$$\delta(\text{Blatz},\text{Tbrg}) = \sqrt{\sum_{h=1}^{5}\left(\text{Blatz}_h - \text{Tbrg}_h\right)^2}$$

  • Both the notation $\delta_{ij}$ and $\delta(i,j)$ will be used interchangeably.

Doing it in R

To obtain the distance matrix in R:

library(dplyr) # for filter, select_if and pull

filter(Beer,rating=='Fair')%>% #Only fair beers
  select_if(is.numeric)%>%     #Only metric data
  scale%>%                     #Standardise
  dist->delta                  #Distance
filter(Beer,rating=='Fair')%>% #Only fair beers
  pull(beer)%>%                #Get beer names
  abbreviate(6)->              #Abbreviate
  attributes(delta)$Labels     #Assign labels to delta

MDS in R

We can do what is known as classical MDS in R using the cmdscale function:

mdsout<-cmdscale(delta)
mdsout

##              [,1]       [,2]
## OlymGL -1.9758212 -1.6276821
## PbstEL -2.1860282 -1.1600914
## SchltL -0.7420968 -0.7994497
## Blatz  -0.3386684  1.4929936
## Hamms   0.6053483  0.2720245
## HlmnOS  1.3641181  0.6556403
## RllngR -0.2932490  0.9661501
## SB(Sf) -0.2067650  1.7951323
## StPlGr  2.9648884 -2.2976842
## Tuborg  0.8082737  0.7029665

Two new variables

  • We have just created two new variables for visualising the distances.
  • The distances that we visualise will be 2-dimensional distances. For example

$$d(\text{Blatz},\text{Tbrg}) = \sqrt{(-0.339-0.808)^2 + (1.493-0.703)^2}$$

Not exact

  • In this example d(Blatz,Tuborg) = 1.3927 while δ(Blatz,Tuborg) = 1.4762. Notice that

$$d(\text{Blatz},\text{Tuborg}) \neq \delta(\text{Blatz},\text{Tuborg})$$

  • But they are close.
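
We can check both numbers directly in R. A quick sketch, assuming the delta and mdsout objects created on the previous slides:

# High-dimensional distance from the standardised data
as.matrix(delta)['Blatz','Tuborg']
# 2D distance between the same two beers in the MDS solution
sqrt(sum((mdsout['Blatz',] - mdsout['Tuborg',])^2))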

Getting the plot

library(ggplot2)

mdsout%>%
  as_tibble%>%
  ggplot(aes(x=V1,y=V2))+geom_point()

Getting the plot with names

mdsout%>%
  as_tibble(rownames='BeerName')%>%
  ggplot(aes(x=V1,y=V2,label=BeerName))+geom_text()

The math behind classical MDS

  • In classical MDS the objective is to minimise strain

$$\text{Strain} = \sum_{i=1}^{n-1}\sum_{j>i}\left(\delta_{ij}^2 - d_{ij}^2\right)$$

  • Note that the $\delta_{ij}$ are high-dimensional distances that come from the true data.
  • The $d_{ij}$ are low-dimensional distances that come from the solution.

When can this be solved?

  • The above problem has a tractable solution when Euclidean distance is used.
  • This solution depends on an eigenvalue decomposition.
  • This solution rotates the points until we get a 2D view that represents the true distances as accurately as possible.
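
To make the eigenvalue step concrete, here is a minimal sketch of the classical solution computed by hand, assuming the delta object from the beer example. It double-centres the squared distances and keeps the two leading eigenvectors, which is essentially what cmdscale does internally:

D2 <- as.matrix(delta)^2           # squared high-dimensional distances
n  <- nrow(D2)
J  <- diag(n) - matrix(1/n, n, n)  # centring matrix
B  <- -0.5 * J %*% D2 %*% J        # double-centred matrix
eg <- eigen(B)
# Coordinates: leading eigenvectors scaled by the square roots of their eigenvalues
X  <- eg$vectors[, 1:2] %*% diag(sqrt(eg$values[1:2]))
# X reproduces cmdscale(delta) up to a sign flip of each column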

Summary

  • When Euclidean distance is used the solution provided by classical MDS:
    • Minimises the strain.
    • Results in eigenvalues that are all positive.
  • Can we use classical MDS when distances are non-Euclidean?

An example: Road distances

  • Suppose that we have the road distances between different cities in Australia.
  • The road distances are non-Euclidean since roads can be quite wiggly.
  • We want to create a 2-dimensional map with the locations of the cities using only these road distances.
  • Classical MDS can give an approximation that is quite close to a real map.

Road Distances

               Cairns Brisbane Sydney Melbourne Adelaide Perth Darwin Alice Springs
Cairns              0     1717   2546      3054     3143  5954   2727          2324
Brisbane         1717        0    996      1674     2063  4348   3415          3012
Sydney           2546      996      0       868     1420  4144   4000          2644
Melbourne        3054     1674    868         0      728  3452   3781          2270
Adelaide         3143     2063   1420       728        0  2724   3053          1542
Perth            5954     4348   4144      3452     2724     0   4045          3630
Darwin           2727     3415   4000      3781     3053  4045      0          1511
Alice Springs    2324     3012   2644      2270     1542  3630   1511             0

Australia

MDS Solution

Rotate

Back with Map

Rotating

  • Once a solution is available, we rotate the points within 2 dimensions.
    • The 2D rotation does not change any of the distances.
    • It can help us to interpret the axes.
  • In the previous example the x-axis represents East-West direction and the y-axis represents North-South.

Evaluating MDS

How good is this representation?

  • In theory, as long as the original distances are Euclidean, strain is minimised.
  • What if the optimal solution is still bad?
  • Use two goodness of fit measures.
  • Think of these in a similar fashion to R-squared in regression modelling.

Goodness of Fit Measures?

  • These values depend on the eigenvalues:

$$GF_1 = \frac{\sum_{i=1}^{2}|\lambda_i|}{\sum_{i=1}^{n}|\lambda_i|} \qquad GF_2 = \frac{\sum_{i=1}^{2}\max(0,\lambda_i)}{\sum_{i=1}^{n}\max(0,\lambda_i)}$$

  • For Euclidean distances $\delta_{ij}$ the eigenvalues are always positive and $GF_1 = GF_2$.

Beer Example

  • In R, obtain the GoF measures using the option eig=TRUE in the cmdscale function.
  • For the Beer data:

mdsout<-cmdscale(delta,eig=TRUE)
str(mdsout$GOF)

## num [1:2] 0.854 0.854
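
These values can also be reproduced by hand from the eigenvalues that cmdscale returns; a sketch using the mdsout object above:

lambda <- mdsout$eig
sum(abs(lambda[1:2]))/sum(abs(lambda))        # GF1
sum(pmax(0,lambda[1:2]))/sum(pmax(0,lambda))  # GF2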

GoF Measure

  • You may notice that the GoF measures are the same.
  • This is always the case when Euclidean distance is used.
  • This arises since all eigenvalues are positive when the distance matrix is based on Euclidean distance.

Non-Euclidean distances

  • In theory non-Euclidean distances can lead to negative eigenvalues. In this case:
    • Classical MDS may not minimise strain.
    • It minimises a slightly different function of the distances.
    • The two fit measures will differ.
  • Overall, we can use classical MDS for non-Euclidean distances but must be more careful.

Australia data

cmdscale(doz,eig=TRUE)->dozout
str(dozout$eig)

## num [1:8] 1.97e+07 1.25e+07 2.62e+06 5.96e+04 -3.26e-09 ...

str(dozout$eig[6:8])

## num [1:3] -311786 -1083294 -2179888

str(dozout$GOF)

## num [1:2] 0.837 0.923

Evaluating the Result

  • There are negative eigenvalues.
    • This occurs since road distances are not Euclidean.
    • This also implies that classical MDS does not minimise strain.
  • Both goodness of fit measures are quite high.
    • The solution is an accurate representation.

Another example: Cheese

The following example comes from ‘Multidimensional Scaling of Sorting Data Applied to Cheese Perception’, Food Quality and Preference, 6, pp. 91-98. The purpose of this study was to visualise the difference between types of cheese.

Another example: Cheese

  • The motivation is to investigate the similarities and differences between types of cheese.
  • In principle one could measure attributes of the cheese.
  • However, the purpose of this study was to ask customers about their perceptions.
  • How do we ask customers about distances?
  • Could you walk out on to the street and ask someone about the Euclidean distance between Brie and Camembert?

Constructing the Survey

  • Customers can be asked: on a scale of 1 to 10, with 1 being the most similar and 10 being the most different, how similar are the following cheeses?
    • Brie and Camembert
    • Brie and Roquefort
    • Camembert and Roquefort
  • The dissimilarity scores can be averaged over all customers and used in an MDS.
  • This is not a good method when there is a large number of products.

A more feasible approach

  • In the study there are 16 cheeses, and therefore 120 possible pairwise comparisons.
  • It is not practical to ask survey participants to make 120 comparisons!
  • Instead of being asked to make so many comparisons, customers were asked to put similar cheeses into groups.
  • The proportion of customers who put two cheeses in the same group is a similarity score.
  • The proportion of customers who put two cheeses in different groups is a dissimilarity score.

Consider four customers

  • Suppose there are four customers sorting cheeses:
    • Customer A: Brie and Camembert together, Roquefort and Blue Vein together
    • Customer B: Roquefort and Blue Vein together, all others separate
    • Customer C: All cheeses in their own category
    • Customer D: All cheeses in one category

Comparisons

  • Customers A and D have Brie and Camembert in the same group; customers B and C have them in different groups.
    • The distance between Brie and Camembert is 0.5.
  • Customers A, B and D have Roquefort and Blue Vein in the same group; customer C has them in different groups.
    • The distance between Roquefort and Blue Vein is 0.25.
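
As a sketch, the same calculation can be coded directly. The group labels below are a hypothetical encoding of the four sortings described above; equal labels within a row mean that customer put those cheeses together:

# Rows are customers, columns are cheeses
sorting <- rbind(
  A = c(Brie=1, Camembert=1, Roquefort=2, BlueVein=2),
  B = c(Brie=1, Camembert=2, Roquefort=3, BlueVein=3),
  C = c(Brie=1, Camembert=2, Roquefort=3, BlueVein=4),
  D = c(Brie=1, Camembert=1, Roquefort=1, BlueVein=1))

# Dissimilarity: proportion of customers sorting a pair into different groups
diss <- function(i,j) mean(sorting[,i] != sorting[,j])
diss('Brie','Camembert')      # 0.5
diss('Roquefort','BlueVein')  # 0.25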

MDS

  • The study on cheese did not use classical MDS but something called Kruskal's algorithm.
  • There are many alternatives to classical MDS.
  • We now briefly cover some of the ideas behind them.


Beyond Classical MDS

  • Classical MDS is designed to minimise strain.
  • An alternative objective function called stress can be minimised instead:

$$\text{Stress} = \sum_{i=1}^{n-1}\sum_{j>i}\frac{\left(\delta_{ij} - d_{ij}\right)^2}{\delta_{ij}}$$

  • The difference between $\delta_{ij}$ and $d_{ij}$ acts like an error.
  • The $\delta_{ij}$ in the denominator acts as a weight.
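
A minimal sketch of this objective as an R function, assuming delta is the original distance object (as in the beer example) and X is a candidate low-dimensional configuration:

stress <- function(X, delta){
  d <- dist(X)              # low-dimensional distances of the configuration
  sum((delta - d)^2/delta)  # each pair counted once, weighted by delta
}
stress(cmdscale(delta), delta)  # stress achieved by the classical solution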

Weighting

  • For large δ, observations are far away in the original space.
    • For these pairs errors are more easily tolerated.
  • For small δ, observations are close in the original space.
    • For these pairs errors are not tolerated.
  • The most accuracy is achieved for nearby points.
  • The local structure is preserved.

Sammon mapping

  • The Sammon mapping is solved by numerical optimisation.
  • It is different from the classical solution:
    • It is not based on an eigenvalue decomposition.
    • It is not based on rotation.
    • It is a non-linear mapping.
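
In R a Sammon mapping can be fitted with the sammon function from the MASS package. A sketch using the beer distances from earlier:

library(MASS)
sammonBeer <- sammon(delta)  # numerical optimisation of a weighted stress
head(sammonBeer$points)      # the low-dimensional coordinates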

Example

  • Consider the case where points are in 2D space and the aim is to summarise them in 1D space (along a line).
  • The specific problem of doing multidimensional scaling where the lower dimension is 1 is called seriation.
  • It provides a ranking of the observations.
  • In marketing it can be used to elicit preferences.
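
As a sketch, a 1-dimensional classical solution can be requested through the k argument of cmdscale. Applied to the beer distances from earlier, the single coordinate induces a ranking of the beers:

oneD <- cmdscale(delta, k=1)  # one coordinate per observation
rank(oneD[,1])                # the implied ordering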

Original Data

Rotate (Classical Solution)

Keep 1 Dimension

Rug plot (classical solution)

Sammon Mapping

Discussion

  • Classical MDS cannot account for non-linearity.
  • The dark blue and yellow points are represented as close to one another.
  • The Sammon mapping does account for non-linearity.
  • The blue and yellow points are represented as far apart.
  • Although they are not so far apart in the original space, these observations are downweighted relative to the local structure.

Kruskal algorithm

  • Kruskal's algorithm minimises a slightly different criterion.
  • This is still often called stress, which is admittedly confusing.
  • Kruskal's algorithm is implemented in R using the isoMDS function from the MASS package.

Monotone transformations

  • Kruskal's algorithm is invariant to monotone transformations of the distances.
  • By monotone transformation we mean any function of the distance that is either always increasing or always decreasing.
    • The exponential function is monotone.
    • The sine function is not monotone.
  • By invariant we mean that the solution provided by Kruskal's algorithm does not change if we transform the input distances.

Example

library(MASS)
isoMDS(delta)->kBeer

## initial value 9.127089
## iter 5 value 5.688460
## final value 5.611143
## converged

Make plot

kBeer$points%>%
  as_tibble()%>%
  ggplot(aes(x=V1,y=V2))+
  geom_point(size=10)

Squared distances

isoMDS(delta^2)->kBeer2

## initial value 11.274285
## iter 5 value 6.447929
## iter 10 value 5.697285
## final value 5.603035
## converged

Solution

Comparison

  • Squaring the distances provides the same solution with two caveats:
    • The stress is slightly different. Numerical optimisation can vary a little depending on starting values.
    • The points in one plot are slightly rotated compared to the other.
  • Why is the invariance to monotone transformations important?

Non-metric MDS

  • In some cases, the distances themselves are not metric but ordinal.
  • Suppose we only know

$$\delta_{\text{Bri.,Cam.}} < \delta_{\text{Roq.,Cam.}} < \delta_{\text{Roq.,Bri.}}$$

  • Brie and Roquefort are more different compared to Brie and Camembert.
  • We do not know how big the distance between Brie and Roquefort is compared to the distance between Brie and Camembert.

Non-metric MDS

  • In this case we minimise stress subject to constraints, e.g.

$$\hat{\delta}_{\text{Bri.,Cam.}} < \hat{\delta}_{\text{Roq.,Cam.}} < \hat{\delta}_{\text{Roq.,Bri.}}$$

Non-metric MDS

  • Taking the ranks is an example of a monotone transformation.
  • Therefore the solution of isoMDS only requires the ranks of the distances and not the distances themselves.
  • This is a very useful algorithm for marketing, since survey participants cannot easily and reliably assign numbers to the difference between products.

Modern MDS

  • Methods for finding a low-dimensional representation of high-dimensional data continue to be used today.
  • These mostly go by the name of manifold learning methods.
  • They are not only used for visualisation.
  • The low-dimensional co-ordinates can also be used as features in classification and regression.

Examples

  • Local Linear Embedding (LLE)
  • IsoMap
  • Laplacian Eigenmap
  • t-SNE
  • Kohonen Map
  • ... and others.

Properties

  • For most of the modern methods two characteristics are common.
    • The idea that local structure should be preserved. The first step of many algorithms is to find the nearest neighbours of each point.
    • In many algorithms an eigenvalue decomposition forms part of the solution, as is the case in classical MDS.
