Probabilistic Forecast Reconciliation: Properties, Evaluation and Score Optimisation

Anastasios Panagiotelis
September 23, 2020

Joint work with

Puwasala Gamakumara

George Athanasopoulos

Rob Hyndman

Motivation

Hierarchical Time Series

Predictions of multiple variables needed.
Variables follow linear constraints.
Forecast store level sales and aggregates.

Electricity Example

Total daily electricity in Australian NEM
- Renewable
- Non-renewable
Renewable can be broken down
- Solar
- Wind
- etc.
Solar can be broken down into
- Solar rooftop
- Solar utility
Data sourced from Open NEM.

Electricity Data (link)

Main takeaways

Data have different characteristics regarding
- Trends
- Seasonality
- Spikes
- Signal to noise ratio
Hard to come up with a single multivariate model.
Even harder to do so while accounting for constraints.

Why multivariate?

Electricity cannot be economically stored at large scale (yet).
Day-ahead load forecasting is a crucial input to dispatch and operational decisions.
Increased use of renewables results in uncertainty in supply.
Decisions of generators and market regulators may depend on total generation but also on generation from individual sources.
Coherent forecasts avoid non-aligned decisions.

Why probabilistic?

Prices are heavily right skewed.
- Prices in NSW reached $14700 on January 4th, 2020.
- This compares to $30-$40 on a normal day.
Price peaks coincide with the use of fuels with higher marginal cost.
Another important consideration is grid reliability.
Tails are important!

What is reconciliation?

Traditional approaches

Single level approaches
- Bottom Up (Schwarzkopf, Tersine, and Morris, 1988).
- Top Down (Gross and Sohl, 1990).
- Middle Out
Top down does not exploit information at bottom levels.
Bottom up can suffer from the noisiness of bottom level series.
These approaches do not work for more general constraints.

Reconciliation

Forecast every variable, stack in an $n$-vector $\hat{y}$.
Call these base forecasts.
Base forecasts are generally not coherent.
Forecasts need to be reconciled via a mapping $\tilde{y} = ψ (\hat{y})$.

Some math

Linear reconciliation generally takes the form $\tilde{b} = d + G \hat{y}$ where

- $G$ is an $m \times n$ matrix of reconciliation weights
- $d$ is an $m \times 1$ translation vector

The full hierarchy is $\tilde{y} = S \tilde{b}$ where $S$ is an $n \times m$ matrix that encodes the constraints.

The summing matrix

$S = (\begin{matrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{matrix})$
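
As a minimal sketch (Python with NumPy; the variable names and the bottom-level values are illustrative assumptions, not taken from the talk), the summing matrix above maps $m = 4$ bottom-level series to the full $n = 7$ hierarchy. The row labels follow the Tot, A, B, AA, AB, BA, BB naming used later in the residual matrix.

```python
import numpy as np

# Summing matrix for the 7-variable hierarchy.
# Rows: Tot, A, B, AA, AB, BA, BB. Columns: bottom series AA, AB, BA, BB.
S = np.array([
    [1, 1, 1, 1],  # Tot = AA + AB + BA + BB
    [1, 1, 0, 0],  # A   = AA + AB
    [0, 0, 1, 1],  # B   = BA + BB
    [1, 0, 0, 0],  # AA
    [0, 1, 0, 0],  # AB
    [0, 0, 1, 0],  # BA
    [0, 0, 0, 1],  # BB
], dtype=float)

b = np.array([3.0, 1.0, 2.0, 4.0])  # hypothetical bottom-level values
y = S @ b                           # coherent vector for the whole hierarchy
```

Any vector of the form `S @ b` satisfies the aggregation constraints by construction, which is why reconciled forecasts are written as $S \tilde{b}$.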

Specific reconciliation methods

OLS: $\tilde{y} = S (S' S)^{-1} S' \hat{y}$ Hyndman, Ahmed, Athanasopoulos, and Shang (2011)

WLS: $\tilde{y} = S (S' W S)^{-1} S' W \hat{y}$ Athanasopoulos, Hyndman, Kourentzes, and Petropoulos (2017)

MinT: $\tilde{y} = S (S' Σ^{-1} S)^{-1} S' Σ^{-1} \hat{y}$ Wickramasuriya, Athanasopoulos, and Hyndman (2019)
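
The three formulas share one template, $G = (S' M S)^{-1} S' M$ with $d = 0$, and differ only in the weight matrix $M$. The sketch below (Python with NumPy) illustrates this under assumptions: the helper name `weights`, the base forecasts `y_hat`, the diagonal `W` and the covariance `Sigma` are all hypothetical values chosen for the example.

```python
import numpy as np

S = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)


def weights(S, M=None):
    """Return G = (S' M S)^{-1} S' M.

    M = I gives OLS, M = W gives WLS, M = Sigma^{-1} gives MinT.
    """
    if M is None:
        M = np.eye(S.shape[0])
    return np.linalg.solve(S.T @ M @ S, S.T @ M)


# Hypothetical incoherent base forecasts (Tot, A, B, AA, AB, BA, BB).
y_hat = np.array([11.0, 4.5, 5.5, 3.2, 1.1, 2.1, 3.9])

W = np.diag(1.0 / np.array([4.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0]))  # hypothetical WLS weights
Sigma = 0.5 * np.eye(7) + 0.5                                     # hypothetical error covariance

G_ols = weights(S)
G_wls = weights(S, W)
G_mint = weights(S, np.linalg.inv(Sigma))

# Reconciled forecasts y_tilde = S G y_hat for each method.
y_ols, y_wls, y_mint = (S @ G @ y_hat for G in (G_ols, G_wls, G_mint))
```

In each case $G S = I$, so $S G$ is a projection onto the column space of $S$ and the reconciled forecasts add up exactly.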

Probabilistic forecasts

Existing approaches

Stack quantiles of each series in a vector and reconcile
- Shang and Hyndman (2017) for prediction intervals.
- Jeon, Panagiotelis, and Petropoulos (2019) (hereafter JPP) for a full distribution.
Reconcile mean otherwise bottom up
- Ben Taieb, Taylor, and Hyndman (2020) (hereafter BTTH)

Formal Definition: Coherence

Let $(R^{m}, F_{R^{m}}, μ)$ be a probability triple.
Let $s : R^{m} \to \mathfrak{s}$ where $s (.)$ is premultiplication by $S$ and $\mathfrak{s} \subset R^{n}$ is the coherent subspace spanned by the columns of $S$.
A coherent probabilistic forecast can be characterised by a probability triple $(\mathfrak{s}, F_{\mathfrak{s}}, ν)$ where

$ν (s (B)) = μ (B) \forall B \in F_{R^{m}}$ and $s (B)$ is the image of $B$ under $s (.)$.

In a picture

Formal Definition: Reconciliation

Let $(R^{n}, F_{R^{n}}, \hat{ν})$ be a probability triple corresponding to a base forecast.
The reconciled forecast is characterised by

$\tilde{ν} (A) = \hat{ν} (ψ^{- 1} (A)) \forall A \in F_{\mathfrak{s}}$, where $\mathfrak{s}$ is the coherent subspace and $ψ^{- 1} (A)$ is the pre-image of $A$ under $ψ (.)$.

In a picture

In practice

If ${\hat{y}}^{[1]}, \dots, {\hat{y}}^{[L]}$ is a sample from some base probabilistic forecast, then ${\tilde{y}}^{[1]}, \dots, {\tilde{y}}^{[L]}$ is a sample from the reconciled forecast where

${\tilde{y}}^{[l]} = ψ ({\hat{y}}^{[l]}) \forall l = 1, \dots, L$

Reconciling a sample from the base distribution gives a sample from the reconciled distribution.

Proof in paper.
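
The sampling result can be sketched as follows (Python with NumPy). The base distribution, the choice of OLS weights for $G$ and $d = 0$ are assumptions made for the example. Each draw ${\hat{y}}^{[l]}$ is pushed through the linear map $ψ (\hat{y}) = S (d + G \hat{y})$.

```python
import numpy as np

rng = np.random.default_rng(0)

S = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)

# One possible choice of reconciliation parameters (OLS weights, no translation).
G = np.linalg.solve(S.T @ S, S.T)
d = np.zeros(4)

# Hypothetical base sample: L draws (rows) of the 7 series, drawn independently,
# so individual draws do not satisfy the aggregation constraints.
L = 1000
base_mean = np.array([11.0, 4.5, 5.5, 3.2, 1.1, 2.1, 3.9])
y_hat = rng.normal(loc=base_mean, scale=1.0, size=(L, 7))

# Apply psi to every draw at once: y_tilde[l] = S (d + G y_hat[l]).
y_tilde = (d + y_hat @ G.T) @ S.T
```

Every row of `y_tilde` is coherent, so the reconciled sample lives entirely on the coherent subspace.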

In practice

If the base forecast is elliptical, then the reconciled forecast will be elliptical.
If the base forecast is $N (\hat{μ}, \hat{Σ})$
If reconciliation is linear $ψ (\hat{y}) := S (d + G \hat{y})$
The reconciled forecast is $N (S (d + G \hat{μ}), S G \hat{Σ} G' S')$
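
For the Gaussian case the reconciled mean and covariance can be computed directly. This is a sketch in Python with NumPy; `mu_hat`, `Sigma_hat`, the OLS choice of $G$ and $d = 0$ are hypothetical inputs, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

S = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
G = np.linalg.solve(S.T @ S, S.T)  # assumed reconciliation weights (OLS)
d = np.zeros(4)                    # assumed translation

# Hypothetical Gaussian base forecast N(mu_hat, Sigma_hat).
mu_hat = np.array([11.0, 4.5, 5.5, 3.2, 1.1, 2.1, 3.9])
A = rng.normal(size=(7, 7))
Sigma_hat = A @ A.T + np.eye(7)    # positive definite by construction

# Reconciled forecast N(S(d + G mu_hat), S G Sigma_hat G' S').
SG = S @ G
mu_tilde = S @ (d + G @ mu_hat)
Sigma_tilde = SG @ Sigma_hat @ SG.T
```

`Sigma_tilde` is a singular $7 \times 7$ matrix of rank $4$, reflecting the fact that the reconciled distribution is concentrated on the coherent subspace.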

Perfect Reconciliation

Using the result on the previous slide we prove that for an arbitrary $\hat{μ}$ and $\hat{Σ}$ there exists a $d$ and $G$ that recovers the true predictive distribution.
No matter how bad the base forecast is, reconciliation can get you to the truth.
It is not feasible in practice, since this $d$ and $G$ depend on the unknown true predictive distribution.
The resulting $S G$ will not always be a projection.

Score Optimisation

Scoring Rules

Propose reconciliation by optimising with respect to a scoring rule (Gneiting and Raftery, 2007).
A scoring rule $K (q, ω)$ takes a probabilistic forecast $q$ and a realisation $ω$ and returns a real number.
The rule is proper if

$E_{p} [K (p, ω)] \leq E_{p} [K (q, ω)]$

for all $p \neq q$ where $ω \sim p$

Multivariate scoring rules

Log score
- Improper in context of reconciliation.
- Result proven in paper.
Energy score
- Multivariate generalisation of continuous ranked probability score.
Variogram score (Scheuerer and Hamill, 2015)

If $d$ and $G$ known

Use training data $t = 1, \dots, T$ to train models and make base forecasts.

Then reconcile.

Training $d$ and $G$

[Figure: forecast setup. Rolling training windows (1st, 2nd, 3rd, ..., $R$th), each followed by a score, are used to train the reconciliation weights. For forecasting, train on $t = R + 1$ to $t = T + R$ and forecast $T + R + h$.]

In math

Optimise $E (γ) = \sum_{t = T}^{T + R - 1} K ({\tilde{f}}_{t + h | t}^{γ}, y_{t + h})$ where ${\tilde{f}}_{t + h | t}^{γ}$ is reconciled with respect to $γ := (d, \mathrm{vec} (G))$.

Energy and variogram score approximated via Monte Carlo.

Energy Score

$E (γ) \approx \sum_{t = T}^{T + R - 1} \frac{1}{Q} \sum_{q = 1}^{Q} ( \| {\tilde{y}}_{t + h | t}^{[q]} - y_{t + h} \| - \frac{1}{2} \| {\tilde{y}}_{t + h | t}^{[q]} - {\tilde{y}}_{t + h | t}^{* [q]} \| )$

where ${\tilde{y}}_{t + h | t}^{[q]} = S (d + G {\hat{y}}_{t + h | t}^{[q]})$, ${\tilde{y}}_{t + h | t}^{* [q]} = S (d + G {\hat{y}}_{t + h | t}^{* [q]})$ and ${\hat{y}}_{t + h | t}^{[q]}, {\hat{y}}_{t + h | t}^{* [q]} \overset{iid}{\sim} {\hat{f}}_{t + h | t}$ for $q = 1, \dots, Q$.
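
For a single forecast origin, the Monte Carlo approximation above can be sketched as below (Python with NumPy). The function name `energy_score`, the Gaussian base forecasts and the OLS choice of $G$ are assumptions for the example. It compares a well-centred base forecast with a biased one against the same realisation.

```python
import numpy as np

rng = np.random.default_rng(0)

S = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
G = np.linalg.solve(S.T @ S, S.T)  # assumed reconciliation weights (OLS)
d = np.zeros(4)                    # assumed translation


def energy_score(y_tilde, y_tilde_star, y_obs):
    """Monte Carlo estimate: mean over q of ||y~[q] - y|| - 0.5 ||y~[q] - y~*[q]||."""
    term1 = np.linalg.norm(y_tilde - y_obs, axis=1)
    term2 = np.linalg.norm(y_tilde - y_tilde_star, axis=1)
    return np.mean(term1 - 0.5 * term2)


def reconciled_score(base_mean, y_obs, Q=2000):
    """Draw two iid base samples, reconcile both, and score them against y_obs."""
    y_hat = rng.normal(loc=base_mean, size=(Q, 7))
    y_hat_star = rng.normal(loc=base_mean, size=(Q, 7))
    y_tilde = (d + y_hat @ G.T) @ S.T
    y_tilde_star = (d + y_hat_star @ G.T) @ S.T
    return energy_score(y_tilde, y_tilde_star, y_obs)


y_obs = S @ np.array([3.0, 1.0, 2.0, 4.0])       # hypothetical coherent realisation
score_good = reconciled_score(y_obs, y_obs)      # base forecast centred on the realisation
score_bad = reconciled_score(y_obs + 5.0, y_obs) # base forecast with a large bias
```

The score is negatively oriented, so the biased forecast should receive the larger value.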

Stochastic Gradient Descent

When the objective function is only known up to approximation, Stochastic Gradient Descent can be used for optimisation.
This requires an estimate of the gradient, which is found by Automatic Differentiation.
The specific variant of SGD used is Adam (Kingma and Ba, 2014).
An implementation is available in the ProbReco package.
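
The idea can be illustrated with a small self-contained sketch in Python with NumPy. It is not the ProbReco implementation. To avoid extra dependencies it swaps automatic differentiation for a hand-derived gradient of the Monte Carlo energy score. The simulated data, the Gaussian base forecast, the step size and the number of iterations are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

S = np.array([[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
n, m = S.shape

# Hypothetical evaluation data: R coherent realisations and a biased base forecast mean.
R = 20
b_true = np.array([3.0, 1.0, 2.0, 4.0])
y_obs = (b_true + 0.5 * rng.normal(size=(R, m))) @ S.T  # shape (R, n)
mu_hat = S @ b_true + 2.0                               # biased and incoherent


def draw_base(Q):
    """Draw Q samples from the assumed base forecast N(mu_hat, I)."""
    return mu_hat + rng.normal(size=(Q, n))


def score_and_grad(d, G, Q):
    """Monte Carlo energy score summed over periods, and its gradient in d and G."""
    score, g_d, g_G = 0.0, np.zeros(m), np.zeros((m, n))
    for y in y_obs:
        y_hat, y_star = draw_base(Q), draw_base(Q)
        dif = y_hat - y_star
        r1 = (d + y_hat @ G.T) @ S.T - y   # y_tilde - y
        r2 = (dif @ G.T) @ S.T             # y_tilde - y_tilde_star
        n1 = np.linalg.norm(r1, axis=1, keepdims=True)
        n2 = np.linalg.norm(r2, axis=1, keepdims=True)
        score += np.mean(n1 - 0.5 * n2)
        u1, u2 = (r1 / n1) @ S, (r2 / n2) @ S  # rows hold S' r / ||r||
        g_d += u1.mean(axis=0)
        g_G += (u1.T @ y_hat - 0.5 * u2.T @ dif) / Q
    return score, g_d, g_G


def adam(d, G, steps=300, Q=20, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam updates on the stacked parameter vector (d, flattened G)."""
    theta = np.concatenate([d, G.ravel()])
    m1, m2 = np.zeros_like(theta), np.zeros_like(theta)
    for k in range(1, steps + 1):
        _, g_d, g_G = score_and_grad(theta[:m], theta[m:].reshape(m, n), Q)
        g = np.concatenate([g_d, g_G.ravel()])
        m1 = beta1 * m1 + (1 - beta1) * g       # first moment estimate
        m2 = beta2 * m2 + (1 - beta2) * g**2    # second moment estimate
        step = (m1 / (1 - beta1**k)) / (np.sqrt(m2 / (1 - beta2**k)) + eps)
        theta = theta - lr * step
    return theta[:m], theta[m:].reshape(m, n)


G0 = np.linalg.solve(S.T @ S, S.T)  # start from OLS weights
d0 = np.zeros(m)
before, _, _ = score_and_grad(d0, G0, Q=2000)
d_opt, G_opt = adam(d0, G0)
after, _, _ = score_and_grad(d_opt, G_opt, Q=2000)
```

Each iteration draws fresh base samples, so the gradient is a noisy estimate of the true gradient, which is the setting Adam is designed for. In this toy setup the trained $d$ and $G$ mainly learn to undo the bias in the base forecasts.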

Simulation

Scenarios

Simulate from 7-variable hierarchy with bottom levels given by ARIMA models

Stationary Gaussian
Stationary non-Gaussian
Non-stationary Gaussian
Non-stationary non-Gaussian

Base forecasts

Obtain point forecasts from ARIMA or ETS models. To sample from base probabilistic forecasts add noise that is

Independent Gaussian noise
Multivariate Gaussian noise
Bootstrapped residuals
Jointly bootstrapped residuals

Residual Matrix

$(\begin{matrix} e_{T o t, 1} & \dots & e_{T o t, t} & \dots & e_{T o t, T} \\ e_{A, 1} & \dots & e_{A, t} & \dots & e_{A, T} \\ e_{B, 1} & \dots & e_{B, t} & \dots & e_{B, T} \\ e_{A A, 1} & \dots & e_{A A, t} & \dots & e_{A A, T} \\ e_{A B, 1} & \dots & e_{A B, t} & \dots & e_{A B, T} \\ e_{B A, 1} & \dots & e_{B A, t} & \dots & e_{B A, T} \\ e_{B B, 1} & \dots & e_{B B, t} & \dots & e_{B B, T} \end{matrix})$

Independently Bootstrapped

Jointly Bootstrapped
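
One reading of the two schemes, sketched in Python with NumPy: independent bootstrapping resamples time indices separately for each row of the residual matrix, while joint bootstrapping resamples whole columns and so keeps the cross-series dependence. The residual matrix `E` below is simulated for illustration and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical residual matrix (7 series by T periods) with strong cross-correlation.
n, T = 7, 200
common = rng.normal(size=T)
E = common + 0.3 * rng.normal(size=(n, T))


def bootstrap_independent(E, Q, rng):
    """Each series draws its own time indices, which breaks cross-series dependence."""
    idx = rng.integers(0, E.shape[1], size=(E.shape[0], Q))
    return np.take_along_axis(E, idx, axis=1).T  # shape (Q, n)


def bootstrap_joint(E, Q, rng):
    """All series share the same time indices, so whole columns are resampled."""
    idx = rng.integers(0, E.shape[1], size=Q)
    return E[:, idx].T                           # shape (Q, n)


Q = 5000
noise_ind = bootstrap_independent(E, Q, rng)
noise_joint = bootstrap_joint(E, Q, rng)

# Correlation between the first two series under each scheme.
corr_ind = np.corrcoef(noise_ind[:, 0], noise_ind[:, 1])[0, 1]
corr_joint = np.corrcoef(noise_joint[:, 0], noise_joint[:, 1])[0, 1]
```

Adding `noise_ind` or `noise_joint` to the point forecasts gives a sample from the corresponding base probabilistic forecast. Only the joint version reproduces the correlation present in the residuals.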

Reconciliation methods

JPP: Reconcile quantiles.
BTTH: Reconcile mean, otherwise BU.
BU: Bottom up.
OLS: Orthogonal projection.
MinT: With shrinkage estimator.
ScoreOptE: Optimise w.r.t. Energy Score.
ScoreOptV: Optimise w.r.t. Variogram Score.

Simulation Results (link)

Energy application

Base forecasts