class: center, middle, inverse, title-slide

.title[
# Dimension Reduction: Laplacian Eigenmaps
]
.author[
### Anastasios Panagiotelis
]
.institute[
### University of Sydney
]

---

# Outline

- Laplace-Beltrami Operator

--

  + Heat Equation

--

- Graph Approximation

--

  + Algorithm

--

- Applications

--

- Reference is Belkin, M., & Niyogi, P. (2001). "Laplacian eigenmaps and spectral techniques for embedding and clustering", *NeurIPS*, **14**.

---

class: center, middle, inverse

# Some Physics

---

# Motivation

- In manifold learning we find functions `\(f:\calM\rightarrow\bbR^d\)`

--

- For simplicity consider `\(d=1\)` (for now).

--

- As a useful analogy, think of the output dimension as representing "heat".

--

- "Hot" points on the manifold should be close to other "hot" points, and "cold" points should be close to other "cold" points.

---

# A simplification

- Suppose the manifold is just `\([0,1]\)`.

--

- The physical interpretation is an idealised 1-D rod.

--

- The function `\(f\)` indicates which parts of the rod are hotter.

---

# A plot

<img src="09LapEig_files/figure-html/unnamed-chunk-1-1.png" height="450" style="display: block; margin: auto;" />

---

# Heat diffusion

- For a "good" function, how are the following quantities relevant?

--

  - The value of the first derivative.
  - The value of the second derivative.

--

- On average, the first derivative should be small in magnitude.

--

- The value of the second derivative is connected to the idea of a 'neighbourhood average' (a small numerical check appears at the end of this section).

---

# Extension

- Consider `\(m\)`-dimensional Euclidean space as a manifold.

--

- In this case an infinitesimal change in the function is given by

--

`$$df=\frac{\partial f}{\partial x_1}dx_1+\frac{\partial f}{\partial x_2}dx_2+\dots+\frac{\partial f}{\partial x_m}dx_m$$`

--

- The vector of first partial derivatives should be "small", i.e. have a small norm on average.

---

# Second derivative

- Also of interest is the Laplacian

--

`$$\frac{\partial^2 f}{\partial x^2_1}+\frac{\partial^2 f}{\partial x^2_2}+\dots+\frac{\partial^2 f}{\partial x^2_m}$$`

--

- A local average of the function around any point `\(x\)` can be approximated in a way that depends on the Laplacian.

--

- The Laplacian indicates whether points in the neighbourhood of `\(x\)` are "hotter" or "colder" than `\(x\)`.

---

# Manifolds

- Instead of Euclidean space we have a manifold

--

  + Imagine heat on a curved piece of metal with hot and cold regions.

--

- Must define analogues to

--

  + The gradient
  + The Laplacian

--

- Need to abstract from a specific coordinate system.

---

# Gradient

The gradient of `\(f\)`, denoted `\(\nabla f\)` where `\(f:\calM\rightarrow\bbR\)`, is a vector field such that

--

`$$df=g_\bx(\nabla f,d\bv)$$`

where `\(g_\bx(.,.)\)` is the Riemannian metric (an inner product on the tangent space at `\(x\)`) and `\(d\bv\)` is an infinitesimal tangent vector.

---

# Laplace Beltrami Operator

- The Laplace Beltrami operator is defined as

--

`$$\Delta f=\nabla \cdot \nabla f$$`

--

- It is a linear operator (maps functions to functions).

--

- It is the Laplacian generalised to manifolds.

--

- The local average of points around `\(x\)` depends on the value of `\(\Delta f\)` at `\(x\)`.

---

# Criterion

- Why is this relevant for manifold learning?

--

- The term `\(\int\limits_\calM ||\nabla f||^2\)` measures the average wiggliness of `\(f\)` on the manifold.

--

- If nearby points in the input space are mapped to nearby points in the output space, this will be small.

--

- This is a good criterion for manifold learning.
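---

# Aside: a discrete check

Not part of the original derivation: a minimal numerical sketch of the claim on the Heat diffusion slide that the second derivative compares a point to its neighbourhood average. The grid, step size and "heat" profile below are arbitrary choices for illustration.

```r
x <- seq(0, 1, length.out = 101)  # the idealised rod [0, 1]
h <- x[2] - x[1]                  # grid spacing
f <- sin(2 * pi * x)              # an arbitrary "heat" profile
i <- 2:100                        # interior grid points
lap <- (f[i + 1] - 2 * f[i] + f[i - 1]) / h^2  # discrete second derivative
avg <- (f[i + 1] + f[i - 1]) / 2               # neighbourhood average
max(abs(lap - 2 * (avg - f[i]) / h^2))         # identical up to rounding
```

A positive value at a point means its neighbours are on average "hotter", which is exactly the reading used above.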
---

# Laplace Beltrami Operator

- Why are second derivatives needed?

--

- By Stokes' theorem

`$$-\int\limits_\calM \Delta (f)f=\int\limits_\calM ||\nabla f||^2$$`

--

- There is a bit of hand-waving here, but the idea is to use a generalisation of integration by parts (and the product rule) on manifolds.

---

class: inverse, middle, center

# Discretisation

---

# Solution

- To solve this we need to discretise the problem

--

  + Functions become vectors (the function evaluated at each data point)
  + Integrals become sums (over data points)
  + Operators become matrices

--

- Laplacian Eigenmaps uses a discrete approximation of the Laplace Beltrami operator

---

# Another Eigenvalue problem

- Let `\(\bff\)` be an `\(n\)`-vector of the function evaluated at each input point, `\(\bff=\left(f(\bx_1),\dots,f(\bx_n)\right)\)`

--

- Let `\(\bD^{-1}\bL\)` be an `\(n\times n\)` matrix that approximates the Laplace Beltrami operator

--

- Discrete approximation

`$$-\int\limits_\calM \Delta (f)f\approx\bff'\bD^{-1}\bL\bff$$`

--

- Minimised by solving `\(\bD^{-1}\bL\bff=\lambda\bff\)`

---

# Roadmap

- How to find `\(\bD\)` and `\(\bL\)`?

--

- Exploit the connection between the Laplace Beltrami operator and the heat equation.

--

- Do lots of algebra.

--

- Do a few approximations along the way.

---

# The heat equation

- Let `\(u(\bx_i,t)\)` be the heat at a point `\(\bx_i\in\calM\)` at time `\(t\)`. Also let `\(f\)` denote the initial heat distribution, `\(f(\bx_i)=u(\bx_i,0)\)`.

--

- Heat diffusion is described by the following differential equation

--

`$$\Delta u = \frac{\partial}{\partial t}u$$`

---

# Fundamental solution

- An approximate solution is given by

--

`$$u(\bx_i,t)\approx(4\pi t)^{-\frac{d}{2}}\int\limits_{\calM}e^{-\frac{||\bx_i-\bz||^2}{4t}}f(\bz)d\bz$$`

- Putting this back into the heat equation at `\(t=0\)` gives

--

`$$\Delta f(\bx_i)\approx\frac{\partial}{\partial t}\left[(4\pi t)^{-\frac{d}{2}}\int\limits_{\calM}e^{-\frac{||\bx_i-\bz||^2}{4t}}f(\bz)d\bz\right]_{t=0}$$`

---

# Continuing

- By the definition of a derivative

--

`$$\Delta f(\bx_i)\approx\underset{t\rightarrow 0}{\lim}\frac{A(t)-A(0)}{t}$$`

--

- Where

`$$A(t)=(4\pi t)^{-\frac{d}{2}}\int\limits_{\calM}e^{-\frac{||\bx_i-\bz||^2}{4t}}f(\bz)d\bz$$`

--

- Note `\(A(0)=f(\bx_i)\)`

---

# And more

- Rearranging

`$$-\Delta f(\bx_i)\approx\underset{t\rightarrow 0}{\lim}\frac{1}{t}\left[f(\bx_i)-A(t)\right]$$`

where

`$$A(t)=(4\pi t)^{-\frac{d}{2}}\int\limits_{\calM}e^{-\frac{||\bx_i-\bz||^2}{4t}}f(\bz)d\bz$$`

---

# Integral

- Can use the approximation

`$$A(t)=\alpha\sum\limits_{\bx_j\in N(\bx_i)}w_{ij}f(\bx_j)$$`

- where `\(w_{ij}=\begin{cases}e^{-\frac{||\bx_i-\bx_j||^2}{4t}}\text{ if }\bx_i \text{ and }\bx_j\text{ are neighbours}\\0\text{ otherwise}\end{cases}\)`

--

- and `\(\alpha=k^{-1}(4\pi t)^{-\frac{d}{2}}\)`

--

- Think of this like a Monte Carlo estimate.

---

# Optimisation

- Putting this together

`$$-\Delta f(\bx_i)\overset{\propto}{\sim}f(\bx_i)-\alpha\sum\limits_{\bx_j\in N(\bx_i)}w_{ij}f(\bx_j)$$`

--

- The aim is to minimise the above expression, so multiplying by a constant can be ignored.

--

- How can this be written in terms of matrices and vectors?

---

# Matrices

- The adjacency matrix `\(\bA\)` has element `\(w_{ij}\)` on row `\(i\)` and column `\(j\)`.

--

- The degree matrix `\(\bD\)` is diagonal with `\(\sum_jw_{ij}\)` on row `\(i\)` and column `\(i\)`.

--

- The graph Laplacian for a weighted graph is given by `\(\bL=\bD-\bA\)`.

--

- The matrix to be considered is `\(\bD^{-1}\bL\)`.

--

Why?
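---

# Sketch: building the matrices

Before unpacking why, here is a toy construction of `\(\bA\)`, `\(\bD\)` and `\(\bL\)` in R. Everything here (the points, `k`, the bandwidth) is made up for illustration; the last two lines verify numerically the identity derived on the next two slides.

```r
set.seed(1)
n <- 10; k <- 3; tt <- 1                # hypothetical sizes and bandwidth
X <- matrix(rnorm(2 * n), ncol = 2)     # toy points in the plane
d2 <- as.matrix(dist(X))^2              # squared Euclidean distances
A <- exp(-d2 / (4 * tt))                # heat kernel weights w_ij
for (i in 1:n) A[i, order(d2[i, ])[-(1:(k + 1))]] <- 0  # keep k nearest
A <- pmax(A, t(A)); diag(A) <- 0        # symmetrise, no self-neighbours
D <- diag(rowSums(A))                   # degree matrix
L <- D - A                              # graph Laplacian L = D - A
f <- X[, 1]                             # any function on the points
c((solve(D) %*% L %*% f)[1],            # row 1 of D^{-1} L f ...
  f[1] - sum(A[1, ] * f) / sum(A[1, ])) # ... = f(x_1) minus weighted average
```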
---

# Row `\(i\)`

- Consider row `\(i\)` of `\(\bL\bff\)`

--

`$$\begin{bmatrix}\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\\vdots&-w_{ij_1}&\dots&\sum_jw_{ij}&\dots&-w_{ij_2}&\vdots\\\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\\end{bmatrix}\begin{bmatrix}\vdots\\f(\bx_{j_1})\\\vdots\\f(\bx_i)\\\vdots\\\vdots\\f(\bx_{j_2})\end{bmatrix}$$`

---

# In full

- Row `\(i\)` evaluates to

`$$\left[\sum_k w_{ik}\right]f(\bx_i)-\sum_j w_{ij}f(\bx_j)$$`

--

- Premultiplying by `\(\bD^{-1}\)` is the same as dividing through by `\(\sum_k w_{ik}\)`, yielding

--

`$$f(\bx_i)-\frac{1}{\sum_k w_{ik}}\sum_j w_{ij}f(\bx_j)$$`

---

# One last trick

- From earlier

--

`$$-\Delta f(\bx_i)\overset{\propto}{\sim}f(\bx_i)-\alpha\sum_j w_{ij}f(\bx_j)$$`

--

- Need to show `\(\alpha=\frac{1}{\sum_k w_{ik}}\)`

--

- Suppose `\(f\)` is a constant function, `\(f(\bx)=c\)` for all `\(\bx\)`

---

# Constant Function

- If `\(f\)` is a constant function, the Laplace Beltrami operator should be zero.

--

Why?

--

- Therefore

--

`$$-\Delta f(\bx_i)=c-\alpha\sum_j w_{ij}c=0$$`

--

- Solving gives `\(\alpha=\frac{1}{\sum_j w_{ij}}\)`

---

# The End

- Putting everything together, row `\(i\)` of `\(\bD^{-1}\bL\bff\)` is `\(f(\bx_i)-\alpha\sum_j w_{ij}f(\bx_j)\overset{\propto}{\sim} -\Delta f(\bx_i)\)`

- Minimising `\(-\int_\calM\Delta (f) f\)` is approximately the same as minimising `\(\bff'\bD^{-1}\bL\bff\)`.

---

# The algorithm

- Step 1: Find the nearest neighbours graph.

--

- Step 2: Find the weighted adjacency, degree and graph Laplacian matrices.

--

- Step 3: Solve the generalised eigenvalue problem `\(\bL\bff=\lambda\bD\bff\)`.

--

- Step 4: Discard the eigenvector corresponding to the smallest eigenvalue (this will be a constant function).

--

- Step 5: Keep the eigenvectors corresponding to the next `\(d\)` smallest eigenvalues as output.

--

- A from-scratch sketch of these steps follows the Swiss Roll example below.

---

# Simplification

- Limiting case as `\(t\rightarrow\infty\)`

`$$w_{ij}=\begin{cases}1\text{ if }\bx_i \text{ and }\bx_j\text{ are neighbours}\\0\text{ otherwise}\end{cases}$$`

- Lose the interpretation in terms of the Laplace Beltrami operator

--

- Still minimise (subject to `\(\bff'\bD\bff=\sum_i f(\bx_i)^2\sum_j w_{ij}=1\)`)

`$$\sum_i\sum_j (f(\bx_i)-f(\bx_j))^2w_{ij}$$`

---

class: middle, inverse, center

# Implementation

---

# Swiss Roll

```r
library(dimRed)
datsr <- loadDataSet("Swiss Roll")
plot(datsr, type = '3varsrgl')
```
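---

# The algorithm by hand

A from-scratch sketch of Steps 1-5, not the `dimRed` implementation: it uses the simplified 0/1 weights (`\(t\rightarrow\infty\)`), a dense eigendecomposition (so small `n` only), toy swiss-roll-style data generated here, and solves `\(\bL\bff=\lambda\bD\bff\)` via the equivalent symmetric problem `\(\bD^{-1/2}\bL\bD^{-1/2}\)`.

```r
set.seed(1)
n <- 500
theta <- runif(n, 1.5 * pi, 4.5 * pi)   # toy swiss-roll-style coordinates
X <- cbind(theta * cos(theta), runif(n, 0, 10), theta * sin(theta))

k <- 10
d2 <- as.matrix(dist(X))^2
A <- matrix(0, n, n)                    # Step 1: kNN graph, 0/1 weights
for (i in 1:n) A[i, order(d2[i, ])[2:(k + 1)]] <- 1
A <- pmax(A, t(A))                      # symmetrise
deg <- rowSums(A)                       # Step 2: degree matrix and Laplacian
L <- diag(deg) - A
S <- diag(1 / sqrt(deg))                # Step 3: L f = lambda D f via the
ev <- eigen(S %*% L %*% S, symmetric = TRUE)  # equivalent symmetric problem
f <- S %*% ev$vectors                   # back-transform the eigenvectors
Y <- f[, (n - 1):(n - 2)]               # Steps 4-5: drop constant, keep d = 2
plot(Y, col = rainbow(n)[rank(theta)], pch = 19)  # colour along the roll
```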
---

# Laplacian Eigenmaps (k=50)

```r
leoutsr <- embed(datsr, "LaplacianEigenmaps")  # knn defaults to 50
plot(leoutsr, type = "2vars")
```

<img src="09LapEig_files/figure-html/unnamed-chunk-4-1.png" height="450" style="display: block; margin: auto;" />

---

# Laplacian Eigenmaps (k=30)

```r
leoutsr2 <- embed(datsr, "LaplacianEigenmaps", knn = 30)
plot(leoutsr2, type = "2vars")
```

<img src="09LapEig_files/figure-html/unnamed-chunk-5-1.png" height="450" style="display: block; margin: auto;" />

---

# Pros and Cons

- Advantages

  + Fast due to the exploitation of sparse eigenvalue algorithms
  + Elegant theoretical foundation

--

- Disadvantages

  + Only captures local geometry
  + There are up to two tuning parameters to select (the number of neighbours `\(k\)` and the heat kernel bandwidth `\(t\)`).
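---

# Setting both tuning parameters

A hedged sketch of how the two tuning parameters map onto the `dimRed` call: `knn` controls the neighbourhood graph and `t` the heat kernel bandwidth, with `t = Inf` giving the 0/1 weights of the simplification slide. That `t` is exposed as a parameter (and its default) is an assumption about the installed `dimRed` version; check `?LaplacianEigenmaps`.

```r
# Assumed parameters: knn (neighbours) and t (heat kernel bandwidth)
leoutsr3 <- embed(datsr, "LaplacianEigenmaps", knn = 30, t = 5)
plot(leoutsr3, type = "2vars")
```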