class: center, middle, inverse, title-slide

.title[
# Dimension Reduction: Determining Dimension
]
.author[
### Anastasios Panagiotelis
]
.institute[
### University of Sydney
]

---

# Outline

- Spheres

--

- Applications

--

- Reference is Facco, E., d’Errico, M., Rodriguez, A., & Laio, A. (2017). Estimating the intrinsic dimension of datasets by a minimal neighborhood information. *Scientific Reports*, **7(1)**, 1-8.

---

class: center, middle, inverse

# What is dimension?

---

# Lengths and volumes

- Consider a 1-D 'ball'.

--

- This is just a line.

--

- What is the measure of a 1-D ball?

--

  + The measure is length and is `\(2r\)` where `\(r\)` is the radius.

--

- What happens to the measure if we double the radius of the 1-D ball?

---

# In 2-D

- Consider a 2-D 'ball'.

--

- What is the measure of a 2-D ball?

--

  + The measure is area and is `\(\pi r^2\)` where `\(r\)` is the radius.

--

- What happens to the measure if we double the radius of the 2-D ball?

--

  + Measure scaled up by `\(2^2=4\)`.

---

# In 3-D

- Consider a 3-D 'ball'.

--

- What is the measure of a 3-D ball?

--

  + The measure is volume and is `\(\frac{4}{3}\pi r^3\)` where `\(r\)` is the radius.

--

- What happens to the measure if we double the radius of the 3-D ball?

--

  + Measure scaled up by `\(2^3=8\)`.

---

# Why does this matter?

- Consider the nearest neighbour and second nearest neighbour of a point.

--

- Consider a ball, centred on the point, whose radius is the distance to the second nearest neighbour.

<img src="11Dimension_files/figure-html/unnamed-chunk-1-1.png" height="450" style="display: block; margin: auto;" />

---

# Why does this matter?

- Consider the nearest neighbour and second nearest neighbour of a point.
- Consider a ball, centred on the point, whose radius is the distance to the second nearest neighbour.

<img src="11Dimension_files/figure-html/unnamed-chunk-2-1.png" height="450" style="display: block; margin: auto;" />

---

# Questions

- In the previous example, the ratio of the distance to the second nearest neighbour to the distance to the nearest neighbour is 2.
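--

- What is the probability that this ratio is greater than 2?

--

- To answer, calculate the area of the inner circle and divide by the area of the outer circle.

--

- It is `\((1/2)^2 = 0.25\)`.

--

- How would your answer change if it were a 3-D ball?

---

# Determine dimension

- For each point, find the ratio of the distance to the second nearest neighbour to the distance to the nearest neighbour (denoted `\(\mu\)`).
- These ratios follow a Pareto(1,d) distribution, where `\(d\)` is the intrinsic dimension. The MLE of `\(d\)`, with `\(n-1\)` in place of `\(n\)` to correct its bias, is given by

`$$\hat{d}=\frac{n-1}{\sum\limits_{i=1}^{n} \log \mu_i}$$`

---

# Code

```r
library(dimRed)                          # provides loadDataSet()
data <- loadDataSet("Swiss Roll")
dd <- as.matrix(dist(data@data))         # pairwise Euclidean distances
dr <- apply(dd, 2, sort)                 # sort each column; row 1 is the zero self-distance
mu <- dr[3, ] / dr[2, ]                  # second NN distance over nearest NN distance
dhat <- (length(mu) - 1) / sum(log(mu))  # estimator from the previous slide
dhat
```

```
## [1] 2.134324
```

---

# Helix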
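The `x`, `y` and `z` used on the next slide are not defined anywhere in these slides. As a minimal sketch, assuming a circular helix traced out by a single angle `theta`, they could be generated as below. The sample size, number of turns and pitch are hypothetical choices for illustration, not the author's settings.

```r
# Hypothetical helix: every setting here is an assumption for illustration.
set.seed(1)                      # reproducible draws
n <- 1000                        # assumed sample size
theta <- runif(n, 0, 6 * pi)     # the single free parameter: angle over three turns
x <- cos(theta)                  # unit circle in the (x, y) plane
y <- sin(theta)
z <- theta / (2 * pi)            # height rises by 1 per full turn
```

Only `theta` varies, so the points lie on a curve in 3-D space and the intrinsic dimension is 1.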

---

# Helix

```r
data <- cbind(x, y, z)
dd <- as.matrix(dist(data))
dr <- apply(dd, 2, sort)
mu <- dr[3, ] / dr[2, ]
dhat <- (length(mu) - 1) / sum(log(mu))
dhat
```

```
## [1] 1.064575
```

---

# Noisy Helix
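A minimal sketch of how noise might be added to a helix, assuming independent Gaussian noise on each coordinate. The helper name `noisy_helix`, the sample size and the noise level `sd` are hypothetical, and this is not necessarily how the data on the next slide was produced.

```r
# Hypothetical helper: helix plus independent Gaussian noise on each coordinate.
noisy_helix <- function(n = 1000, sd = 0.1) {
  theta <- runif(n, 0, 6 * pi)                       # angle along the helix
  clean <- cbind(x = cos(theta), y = sin(theta), z = theta / (2 * pi))
  clean + matrix(rnorm(3 * n, sd = sd), ncol = 3)    # perturb all three directions
}

set.seed(1)
noisy <- noisy_helix(sd = 0.1)   # 1000 by 3 matrix of noisy points
```

Once the noise is large relative to the spacing between neighbouring points, the nearest neighbours of a point can lie in any direction, so locally the data no longer looks like a curve.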

---

# Code

```r
dd <- as.matrix(dist(data@data))
dr <- apply(dd, 2, sort)
mu <- dr[3, ] / dr[2, ]
dhat <- (length(mu) - 1) / sum(log(mu))
dhat
```

```
## [1] 3.012847
```

---

# Conclusions

- The intrinsic dimension can be estimated using nothing more than nearest neighbour information.

--

- The method can be generalised to use more than two nearest neighbours.

--

- It is a good idea to trim large values of the nearest neighbour distance ratio.

--

- The method is not robust when too much noise is added.
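
---

# Sketch: trimming the ratios

The steps from the Code slides can be wrapped in a function that also trims large ratios. This is a minimal sketch rather than the code behind these slides: the function name `two_nn_dim` and its `trim` argument are hypothetical. For a Pareto(1,d) distribution `\(F(\mu)=1-\mu^{-d}\)`, so `\(-\log(1-F(\mu))=d\log\mu\)`. Following the approach described in Facco et al. (2017), the sketch discards the largest ratios and fits a line through the origin to the empirical version of this relationship; the slope estimates `\(d\)`. It assumes there are no duplicated points, so every nearest neighbour distance is positive.

```r
# Hypothetical helper: trimmed two nearest neighbour estimate of dimension.
two_nn_dim <- function(X, trim = 0.1) {
  stopifnot(trim > 0, trim < 1)
  dd <- as.matrix(dist(X))                  # pairwise Euclidean distances
  dr <- apply(dd, 2, sort)                  # sort each column; row 1 is the zero self-distance
  mu <- sort(dr[3, ] / dr[2, ])             # ratios in ascending order
  n <- length(mu)
  Femp <- seq_len(n) / n                    # empirical distribution function at each ratio
  keep <- seq_len(floor((1 - trim) * n))    # drop the largest ratios
  lx <- log(mu[keep])
  ly <- -log(1 - Femp[keep])
  sum(lx * ly) / sum(lx^2)                  # least squares slope through the origin
}

set.seed(1)
square <- matrix(runif(2000), ncol = 2)     # 1000 points, uniform on the unit square
two_nn_dim(square)                          # should be close to 2
```

Trimming must be paired with a fit of this kind: simply dropping large ratios and reusing the formula for `\(\hat{d}\)` would shrink the denominator and push the estimate upward.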