Detecting anomalies 
 in smart meter data 
 via manifold learningAnastasios PanagiotelisUniversity of Sydney1

Smart Meter

Smart meters measure electricity usage in a building.
Need to detect anomalous electricity usage.
Scalable to millions of buildings.

Smart Meter data

Three housholds in an Irish dataset

Smart Meter data

Interested in distribution

Visualisation

InterpretationEach dot represents the probability distribution of electricity usage for a single household.
6

InterpretationEach dot represents the probability distribution of electricity usage for a single household.
The colors depend on the value of a kernel density estimate of the points.
6

InterpretationEach dot represents the probability distribution of electricity usage for a single household.
The colors depend on the value of a kernel density estimate of the points.
The 'typical' household comes from the middle of the distribution.
6

InterpretationEach dot represents the probability distribution of electricity usage for a single household.
The colors depend on the value of a kernel density estimate of the points.
The 'typical' household comes from the middle of the distribution.
The 'anomalous' households comes from edges of the distribution.
6

InterpretationEach dot represents the probability distribution of electricity usage for a single household.
The colors depend on the value of a kernel density estimate of the points.
The 'typical' household comes from the middle of the distribution.
The 'anomalous' households comes from edges of the distribution.
But how do we get that plot?
6

OutlineManifold Learning
7

OutlineManifold LearningMultidimensional Scaling

7

OutlineManifold LearningMultidimensional Scaling
Isomap algorithm

7

OutlineManifold LearningMultidimensional Scaling
Isomap algorithm

Statistical manifolds
7

OutlineManifold LearningMultidimensional Scaling
Isomap algorithm

Statistical manifoldsHellinger distance estimator

7

OutlineManifold LearningMultidimensional Scaling
Isomap algorithm

Statistical manifoldsHellinger distance estimator
Why it is fast

7

OutlineManifold LearningMultidimensional Scaling
Isomap algorithm

Statistical manifoldsHellinger distance estimator
Why it is fast

Application
7

OutlineManifold LearningMultidimensional Scaling
Isomap algorithm

Statistical manifoldsHellinger distance estimator
Why it is fast

Application  One household

7

OutlineManifold LearningMultidimensional Scaling
Isomap algorithm

Statistical manifoldsHellinger distance estimator
Why it is fast

Application  One household
All households

7

Dimension reduction8

IdeaData are multivariate, i.e. xi∈Rpxi∈Rp where pp is large and i=1,2,…,ni=1,2,…,n.
9

IdeaData are multivariate, i.e. xi∈Rpxi∈Rp where pp is large and i=1,2,…,ni=1,2,…,n.Consider nn firms with pp financial indicators

9

IdeaData are multivariate, i.e. xi∈Rpxi∈Rp where pp is large and i=1,2,…,ni=1,2,…,n.Consider nn firms with pp financial indicators

Construct new data yi∈Rdyi∈Rd where d<<pd<<p.
9

IdeaData are multivariate, i.e. xi∈Rpxi∈Rp where pp is large and i=1,2,…,ni=1,2,…,n.Consider nn firms with pp financial indicators

Construct new data yi∈Rdyi∈Rd where d<<pd<<p.
How might we do this?
9

DistancesClose observations in input space should be close in output space.
10

DistancesClose observations in input space should be close in output space.
Euclidean distance is a measure of similarity.
10

DistancesClose observations in input space should be close in output space.
Euclidean distance is a measure of similarity.
Inputs: δij=√(xi−xj)′(xi−xj)δij=(xi−xj)′(xi−xj)
10

DistancesClose observations in input space should be close in output space.
Euclidean distance is a measure of similarity.
Inputs: δij=√(xi−xj)′(xi−xj)δij=(xi−xj)′(xi−xj)
Outputs: dij=√(yi−yj)′(yi−yj)dij=(yi−yj)′(yi−yj)
10

DistancesClose observations in input space should be close in output space.
Euclidean distance is a measure of similarity.
Inputs: δij=√(xi−xj)′(xi−xj)δij=(xi−xj)′(xi−xj)
Outputs: dij=√(yi−yj)′(yi−yj)dij=(yi−yj)′(yi−yj)
Construct yy so that dijdij approximate δijδij
10

Criterion

Minimise:

$\sum_{i, j} (δ_{i j}^{2} - d_{i j}^{2})$

Solved by

Criterion

Minimise:

$\sum_{i, j} (δ_{i j}^{2} - d_{i j}^{2})$

Solved by

Putting $δ_{i j}$ in a matrix $Δ$
Double centering $Δ$
Finding eigenvectors.

Non-EuclideanWhat if input distances δijδij are not Euclidean?
12

Non-EuclideanWhat if input distances δijδij are not Euclidean?
Can put them in ΔΔ anyway
12

Non-EuclideanWhat if input distances δijδij are not Euclidean?
Can put them in ΔΔ anyway
Optimise a slightly different objective, but still get a good representation.
12

Non-EuclideanWhat if input distances δijδij are not Euclidean?
Can put them in ΔΔ anyway
Optimise a slightly different objective, but still get a good representation.
Output distances dijdij still approximate input distances δijδij
12

Manifold learningWhat if data lie on some non-linear dd-dimensional surface in pp-dimensional space.
13

Manifold learningWhat if data lie on some non-linear dd-dimensional surface in pp-dimensional space.
Think of the surface of a sphere (2-dimensional) in space (3-dimensional).
13

Manifold learningWhat if data lie on some non-linear dd-dimensional surface in pp-dimensional space.
Think of the surface of a sphere (2-dimensional) in space (3-dimensional).
Euclidean distance is no longer appropriate
13

Manifold learningWhat if data lie on some non-linear dd-dimensional surface in pp-dimensional space.
Think of the surface of a sphere (2-dimensional) in space (3-dimensional).
Euclidean distance is no longer appropriate
Should instead use the geodesic distance.
13

Manifold learningWhat if data lie on some non-linear dd-dimensional surface in pp-dimensional space.
Think of the surface of a sphere (2-dimensional) in space (3-dimensional).
Euclidean distance is no longer appropriate
Should instead use the geodesic distance.
This is the aim of ISOMAP.
13

Toy example

Consider the following data

Manifold

Lying on a manifold.

Euclidean distance

Blue points close in ambient space.

Geodesic Distance

But not on manifold.

Input distancesClassical MDS would use the distance in blue as an input
18

Input distancesClassical MDS would use the distance in blue as an input
The idea behind Isomap is to use the distance in green as an input.
18

Input distancesClassical MDS would use the distance in blue as an input
The idea behind Isomap is to use the distance in green as an input.
However to compute the geodesic (green) distance we need to know the manifold.
18

Input distancesClassical MDS would use the distance in blue as an input
The idea behind Isomap is to use the distance in green as an input.
However to compute the geodesic (green) distance we need to know the manifold.
Instead we approximate the geodesic distance
18

Geodesic Distance

Geodesic can be approximated.

Geodesic Distance

Try zooming.

Approximate GeodesicFind the graph of nearest neighbors
21

Approximate GeodesicFind the graph of nearest neighborsNeighbourhood within a ϵϵ-ball
KK-Nearest neighbours

21

Approximate GeodesicFind the graph of nearest neighborsNeighbourhood within a ϵϵ-ball
KK-Nearest neighbours

As edge weights use the distance between neighbors
21

Approximate GeodesicFind the graph of nearest neighborsNeighbourhood within a ϵϵ-ball
KK-Nearest neighbours

As edge weights use the distance between neighbors
Approximate geodesic distance by finding shortest path
21

ExampleISOMAP is just one manifold learning algorithm.
22

ExampleISOMAP is just one manifold learning algorithm.
Others include:LLE
Laplacian Eigenmaps
t-SNE
...

22

ExampleISOMAP is just one manifold learning algorithm.
Others include:LLE
Laplacian Eigenmaps
t-SNE
...

These algorithms (and many others) use the nearest neighbor graph.
22

Statistical Manifolds23

Statistical manifoldsA statistical manifold is a manifold with elements that are probability distributions
24

Statistical manifoldsA statistical manifold is a manifold with elements that are probability distributions
Can no longer think of the inputs as being pp-dimensional vectors
24

Statistical manifoldsA statistical manifold is a manifold with elements that are probability distributions
Can no longer think of the inputs as being pp-dimensional vectors
Can no longer think of Euclidean distance as being a distance between input points.
24

Statistical manifoldsA statistical manifold is a manifold with elements that are probability distributions
Can no longer think of the inputs as being pp-dimensional vectors
Can no longer think of Euclidean distance as being a distance between input points.
We can use other measures of distance.
24

Hellinger Distance

Distance between two distributions given by

$H_{i, j}^{2} = \frac{1}{2} \int {(\sqrt{p_{i} (z)} - \sqrt{p_{j} (z)})}^{2} d z$

Hellinger Distance

Distance between two distributions given by

$H_{i, j}^{2} = \frac{1}{2} \int {(\sqrt{p_{i} (z)} - \sqrt{p_{j} (z)})}^{2} d z$

Where $p_{i} (x)$ and $p_{j} (x)$ are densities corresponding to observation $i$ and $j$ respectively.

Hellinger Distance

Distance between two distributions given by

$H_{i, j}^{2} = \frac{1}{2} \int {(\sqrt{p_{i} (z)} - \sqrt{p_{j} (z)})}^{2} d z$

Where $p_{i} (x)$ and $p_{j} (x)$ are densities corresponding to observation $i$ and $j$ respectively.
Need to estimate using data $z_{i, 1}, z_{i, 2}, \dots, z_{i, T_{i}} \sim p_{i}$ and $z_{j, 1}, z_{j, 2}, \dots, z_{j, T_{j}} \sim p_{j}$ .

Our estimator

Pool all the data $z_{i, 1}, z_{i, 2}, \dots z_{i, T_{i}} \forall i = 1, 2, \dots, n$ together.
Find $L$ equal probability partitions of the pooled data $I_{l} = (γ_{l - 1}, γ_{l - 1}]$ and let

$π_{i, l} = \sqrt{\frac{1}{2 T_{i}} \sum_{t} I {z_{i, t} \in I_{l}}}$

Our estimator

Pool all the data $z_{i, 1}, z_{i, 2}, \dots z_{i, T_{i}} \forall i = 1, 2, \dots, n$ together.
Find $L$ equal probability partitions of the pooled data $I_{l} = (γ_{l - 1}, γ_{l - 1}]$ and let

$π_{i, l} = \sqrt{\frac{1}{2 T_{i}} \sum_{t} I {z_{i, t} \in I_{l}}}$

Estimator is

${\hat{H}}_{i, j} = \sqrt{\sum_{l} {(π_{i, l} - π_{j, l})}^{2}}$

EstimatorWe prove this estimator is consistent as the number of observations TiTi and number of partitions LL goes to infinity.
27

EstimatorWe prove this estimator is consistent as the number of observations TiTi and number of partitions LL goes to infinity.
This is not the only estimator of Hellinger distance.
27

EstimatorWe prove this estimator is consistent as the number of observations TiTi and number of partitions LL goes to infinity.
This is not the only estimator of Hellinger distance.
What makes it special is that it 'looks like' a Euclidean distance.
27

EstimatorWe prove this estimator is consistent as the number of observations TiTi and number of partitions LL goes to infinity.
This is not the only estimator of Hellinger distance.
What makes it special is that it 'looks like' a Euclidean distance.
This is important for computational reasons.
27

Nearest neighbour graphRecall that manifold learning algorithms require finding a nearest neighbor graph.
28

Nearest neighbour graphRecall that manifold learning algorithms require finding a nearest neighbor graph.
A naive way to do this would be to compute all N2N2 pairwise distances.
28

Nearest neighbour graphRecall that manifold learning algorithms require finding a nearest neighbor graph.
A naive way to do this would be to compute all N2N2 pairwise distances.
A smarter way is to use an algorithm such as k-d trees.
28

Nearest neighbour graphRecall that manifold learning algorithms require finding a nearest neighbor graph.
A naive way to do this would be to compute all N2N2 pairwise distances.
A smarter way is to use an algorithm such as k-d trees.
This only requires computing O(Nlog(N))O(Nlog(N)) pairwise distances.
28

Nearest neighbour graphRecall that manifold learning algorithms require finding a nearest neighbor graph.
A naive way to do this would be to compute all N2N2 pairwise distances.
A smarter way is to use an algorithm such as k-d trees.
This only requires computing O(Nlog(N))O(Nlog(N)) pairwise distances.
But it only works for a few distance metrics (one of which is Euclidean).
28

Approximate nearest neighboursFurther speed up is possible via approximate nearest neighbors.
29

Approximate nearest neighboursFurther speed up is possible via approximate nearest neighbors.
May find an 'incorrect' nearest neighbor, but this is still guaranteed to be within 1+ϵ1+ϵ of the true nearest neighbor.
29

Approximate nearest neighboursFurther speed up is possible via approximate nearest neighbors.
May find an 'incorrect' nearest neighbor, but this is still guaranteed to be within 1+ϵ1+ϵ of the true nearest neighbor.
We test for robustness of results to using ANN.
29

Approximate nearest neighboursFurther speed up is possible via approximate nearest neighbors.
May find an 'incorrect' nearest neighbor, but this is still guaranteed to be within 1+ϵ1+ϵ of the true nearest neighbor.
We test for robustness of results to using ANN.
We also use a more recent algorithm known as ANNOY
29

Application30

Single householdConsider a single household.
Each 'observation' is the distribution corresponding to an hour of week, over many weeks.
Here N=336N=336 and T≈535T≈535
Use ISOMAP with exact nearest neighbors, ANN and ANNOY
31

Results

Typical and anomalous hours

Speed
 
      
    Isomap 
    LLE 
    Laplacian Eigenmaps 
    Hessian LLE 
    t-SNE 
    UMAP 
  


    Exact NN with brute-force 
    2.760 
    5.134 
    2.944 
    2.087 
    1.918 
    2.367 
  

    Exact NN with k-d trees 
    2.360 
    2.609 
    2.295 
    1.669 
    1.596 
    2.089 
  

    ANN k-d trees 
    2.351 
    2.986 
    2.302 
    1.663 
    1.601 
    2.048 
  

    ANN Annoy 
    1.942 
    3.093 
    2.378 
    1.695 
    1.647 
    2.125 
  

35

	Isomap	LLE	Laplacian Eigenmaps	Hessian LLE	t-SNE	UMAP
Exact NN with brute-force	2.760	5.134	2.944	2.087	1.918	2.367
Exact NN with k-d trees	2.360	2.609	2.295	1.669	1.596	2.089
ANN k-d trees	2.351	2.986	2.302	1.663	1.601	2.048
ANN Annoy	1.942	3.093	2.378	1.695	1.647	2.125

Accuracy
 
      
    Isomap 
    LLE 
    Laplacian Eigenmaps 
    Hessian LLE 
    t-SNE 
    UMAP 
  


    Exact NN 
    0.943 
    0.709 
    0.885 
    0.882 
    0.935 
    0.924 
  

    ANN k-d trees 
    0.942 
    0.718 
    0.891 
    0.885 
    0.902 
    0.933 
  

    ANN Annoy 
    0.938 
    0.746 
    0.886 
    0.873 
    0.939 
    0.929 
  

36

	Isomap	LLE	Laplacian Eigenmaps	Hessian LLE	t-SNE	UMAP
Exact NN	0.943	0.709	0.885	0.882	0.935	0.924
ANN k-d trees	0.942	0.718	0.891	0.885	0.902	0.933
ANN Annoy	0.938	0.746	0.886	0.873	0.939	0.929

All householdsConsider all household.
Each 'observation' is the distribution corresponding to an household.
Here N=3639N=3639 and T≈179760T≈179760
Use ISOMAP with exact nearest neighbors, ANN and ANNOY
37

Results

Speed
 
      
    Isomap 
    LLE 
    Laplacian Eigenmaps 
    Hessian LLE 
    t-SNE 
    UMAP 
  


    Exact NN with brute-force 
    4773.4 
    4967.4 
    4725.2 
    5930.4 
    4739.6 
    748.6 
  

    Exact NN with k-d trees 
    755.8 
    1038.2 
    749.6 
    1204.1 
    747.7 
    744.3 
  

    ANN k-d trees 
    755.7 
    1041.0 
    751.5 
    1208.7 
    747.8 
    741.7 
  

    ANN Annoy 
    1670.2 
    1947.3 
    1656.1 
    2363.7 
    1557.1 
    1675.3 
  

39

	Isomap	LLE	Laplacian Eigenmaps	Hessian LLE	t-SNE	UMAP
Exact NN with brute-force	4773.4	4967.4	4725.2	5930.4	4739.6	748.6
Exact NN with k-d trees	755.8	1038.2	749.6	1204.1	747.7	744.3
ANN k-d trees	755.7	1041.0	751.5	1208.7	747.8	741.7
ANN Annoy	1670.2	1947.3	1656.1	2363.7	1557.1	1675.3

Accuracy
 
    Method 
    Isomap 
    LLE 
    Laplacian Eigenmaps 
    Hessian LLE 
    t-SNE 
    UMAP 
  


    Exact NN 
    0.926 
    0.911 
    0.880 
    0.579 
    0.914 
    0.943 
  

    ANN k-d trees 
    0.926 
    0.911 
    0.880 
    0.579 
    0.919 
    0.943 
  

    ANN Annoy 
    0.926 
    0.911 
    0.880 
    0.578 
    0.907 
    0.943 
  

40

Method	Isomap	LLE	Laplacian Eigenmaps	Hessian LLE	t-SNE	UMAP
Exact NN	0.926	0.911	0.880	0.579	0.914	0.943
ANN k-d trees	0.926	0.911	0.880	0.579	0.919	0.943
ANN Annoy	0.926	0.911	0.880	0.578	0.907	0.943

ConclusionsWe have a way of visualising and finding anomalies for statistical manifolds
41

ConclusionsWe have a way of visualising and finding anomalies for statistical manifolds
Relies on a consistent estimator of Hellinger distance
41

ConclusionsWe have a way of visualising and finding anomalies for statistical manifolds
Relies on a consistent estimator of Hellinger distance
Has computational advantages that exploit Nearest Neighbor algorithms
41

ConclusionsWe have a way of visualising and finding anomalies for statistical manifolds
Relies on a consistent estimator of Hellinger distance
Has computational advantages that exploit Nearest Neighbor algorithms
Further computational improvements possible with ANN and there is little tradeoff on accuracy.
41

Questions?42

↑, ←, Pg Up, k	Go to previous slide
↓, →, Pg Dn, Space, j	Go to next slide
Home	Go to first slide
End	Go to last slide
Number + Return	Go to specific slide
b / m / f	Toggle blackout / mirrored / fullscreen mode
c	Clone slideshow
p	Toggle presenter mode
t	Restart the presentation timer
?, h	Toggle this help

Detecting anomalies in smart meter data via manifold learning

Anastasios Panagiotelis

University of Sydney

Smart Meter

Smart Meter data

Smart Meter data

Visualisation

Interpretation

Interpretation

Interpretation

Interpretation

Interpretation

Outline

Outline

Outline

Outline

Outline

Outline

Outline

Outline

Outline

Dimension reduction

Idea

Idea

Idea

Idea

Distances

Distances

Distances

Distances

Distances

Criterion

Criterion

Non-Euclidean

Non-Euclidean

Non-Euclidean

Non-Euclidean

Manifold learning

Manifold learning

Manifold learning

Manifold learning

Manifold learning

Toy example

Manifold

Euclidean distance

Geodesic Distance

Input distances

Input distances

Input distances

Input distances

Geodesic Distance

Geodesic Distance

Approximate Geodesic

Approximate Geodesic

Approximate Geodesic

Approximate Geodesic

Example

Example

Example

Statistical Manifolds

Statistical manifolds

Statistical manifolds

Statistical manifolds

Statistical manifolds

Hellinger Distance

Hellinger Distance

Hellinger Distance

Our estimator

Our estimator

Estimator

Estimator

Estimator

Estimator

Nearest neighbour graph

Nearest neighbour graph

Nearest neighbour graph

Nearest neighbour graph

Nearest neighbour graph

Approximate nearest neighbours

Approximate nearest neighbours

Detecting anomalies
in smart meter data
via manifold learning