class: center, middle, inverse, title-slide .title[ # Dimension ReductionIsomap ] .author[ ### Anastasios Panagiotelis ] .institute[ ### University of Sydney ] --- #Outline - Geodesics -- + Approximation using NN graph -- - Isomap algorithm -- - Applications -- - Initially proposed in J. B. Tenenbaum, V. de Silva, J. C. Langford, (2000), "A Global Geometric Framework for Nonlinear Dimensionality Reduction", *Science* **290**, pp. 2319–2323. --- class: center, middle, inverse # Isomap --- # Idea - Recall MDS takes distances as inputs. -- - These are typically Euclidean -- - If non-Euclidean distances are used as inputs, then the output still represents these distances quite accurately. -- - ISOMAP uses geodesic distances along the manifold as inputs. --- # Toy example Consider the following data <img src="07Isomap_files/figure-html/unnamed-chunk-1-1.png" height="450" style="display: block; margin: auto;" /> --- # Manifold Lying on a manifold. <img src="07Isomap_files/figure-html/unnamed-chunk-2-1.png" height="450" style="display: block; margin: auto;" /> --- # Euclidean distance Blue points close in ambient space. <img src="07Isomap_files/figure-html/unnamed-chunk-3-1.png" height="450" style="display: block; margin: auto;" /> --- # Geodesic Distance But not on manifold. <img src="07Isomap_files/figure-html/unnamed-chunk-4-1.png" height="450" style="display: block; margin: auto;" /> --- # Input distances - Classical MDS would use the distance in blue as an input -- - The idea behind Isomap is to use the distance in green as an input. -- - However to compute the geodesic (green) distance we need to know the manifold. -- - Instead we approximate the geodesic distance --- # Geodesic Distance Geodesic can be approximated. <img src="07Isomap_files/figure-html/unnamed-chunk-5-1.png" height="450" style="display: block; margin: auto;" /> --- # Geodesic Distance Try zooming.
--- class: center, middle, inverse # Algorithm --- # Step 1: Nearest Neighbours - Find the neighbourhood graph -- + Neighbourhood within a ball `\(\epsilon\)`-Isomap + Neearest neighbours `\(K\)`-Isomap -- - As edge weights use the Euclidean distance between points --- # Example <img src="07Isomap_files/figure-html/unnamed-chunk-7-1.png" height="450" style="display: block; margin: auto;" /> --- # Graph <img src="07Isomap_files/figure-html/unnamed-chunk-8-1.png" height="450" style="display: block; margin: auto;" /> --- # Step 2: Shortest Path - Find shortest path on graph, between every pair of points. -- - Algorithms to do this -- + Dijkstra's method + Floyd- Warshall algorithm + and more... -- - Let's look at a simple example for one pair --- # Path from 8 to 10 <img src="07Isomap_files/figure-html/unnamed-chunk-9-1.png" height="450" style="display: block; margin: auto;" /> --- # Shortest path <img src="07Isomap_files/figure-html/unnamed-chunk-10-1.png" height="450" style="display: block; margin: auto;" /> --- # On original plot <img src="07Isomap_files/figure-html/unnamed-chunk-11-1.png" height="450" style="display: block; margin: auto;" /> --- # On original plot <img src="07Isomap_files/figure-html/unnamed-chunk-12-1.png" height="450" style="display: block; margin: auto;" /> --- # Approximation - The approximate geodesic distance is given by the sum of the edge weights along the shortest path. -- - The quality of this approximation depends on + Size of `\(\epsilon\)` or `\(K\)` + The density of points `\(\alpha\)` + Curvature of manifold + Dimension of ambient space + Connectedness of graph --- # More precisely For arbitrarily small `\(\lambda_1\)`, `\(\lambda_2\)` and `\(\mu\)`, with probability `\(1-\mu\)` `$$\delta_\calM(\bx_i,\bx_j)(1-\lambda_1)\leq\hat{\delta}_\calM(\bx_i,\bx_j)\\\leq\delta_\calM(\bx_i,\bx_j)(1+\lambda_2)$$` if some conditions hold for `\(\epsilon\)` (or `\(K\)`) and the density `\(\alpha\)` of points on the manifold. --- # Step 3: MDS - Carry out classical MDS however... -- - Use the approximate geodesic distances as inputs -- - Notes on MDS for more details. --- class: center, middle, inverse # Application --- # S Curve ```r dat <- loadDataSet("3D S Curve") plot(dat,type='3varsrgl') ```
--- # MDS ```r mdsout <- embed(dat, "MDS") plot(mdsout, type = "2vars") ``` <img src="07Isomap_files/figure-html/unnamed-chunk-15-1.png" height="450" style="display: block; margin: auto;" /> --- # Isomap (K=20) ```r isoout <- embed(dat, "Isomap", knn = 20) plot(isoout, type = "2vars") ``` <img src="07Isomap_files/figure-html/unnamed-chunk-16-1.png" height="450" style="display: block; margin: auto;" /> --- # Isomap (K=4) ```r isoout <- embed(dat, "Isomap", knn = 4) plot(isoout, type = "2vars") ``` <img src="07Isomap_files/figure-html/unnamed-chunk-17-1.png" height="450" style="display: block; margin: auto;" /> --- # Isomap (K=500) ```r isoout <- embed(dat, "Isomap", knn = 500) plot(isoout, type = "2vars") ``` <img src="07Isomap_files/figure-html/unnamed-chunk-18-1.png" height="450" style="display: block; margin: auto;" /> --- # Pros and Cons - Advantages + Exploits geometry + Global in the sense that all pairwise geodesic distances are used. -- - Disadvantages + Floyd Warshall algorithm is `\(O(n^3)\)` + Eigendecomposition for MDS is `\(O(n^3)\)` + Not clear how to pick `\(K\)` or `\(\epsilon\)` --- class: center, middle, inverse #Questions?