Probabilistic Forecast Reconciliation

Properties, Evaluation and Score Optimisation

Anastasios Panagiotelis

September 23, 2020

1

Joint work with

Puwasala Gamakumara

George Athanasopoulos

Rob Hyndman

2

Motivation

3

Hierarchical Time Series

  • Predictions of multiple variables are needed.
  • The variables satisfy linear constraints.
  • Example: forecast store-level sales as well as their aggregates.
4

Electricity Example

  • Total daily electricity generation in the Australian NEM
    • Renewable
    • Non-renewable
  • Renewable can be broken down
    • Solar
    • Wind
    • etc.
  • Solar can be broken down into
    • Solar rooftop
    • Solar utility
  • Data sourced from Open NEM.
5

Electricity Data (link)

6

Main takeaways

  • Data have different characteristics regarding
    • Trends
    • Seasonality
    • Spikes
    • Signal-to-noise ratio
  • Hard to come up with a single multivariate model.
  • Even harder to do so while accounting for constraints.
7

Why multivariate?

  • Electricity cannot be economically stored at large scale (yet).
  • Day-ahead load forecasting is a crucial input to dispatch and operational decisions.
  • Increased use of renewables results in uncertainty in supply.
  • Decisions of generators and market regulators may depend not only on total generation but also on generation from individual sources.
  • Avoid non-aligned decisions.
8

Why probabilistic?

  • Prices are heavily right skewed.
    • Prices in NSW reached $14,700 on January 4th, 2020.
    • This compares to $30-$40 on a normal day.
  • Price peaks coincide with the use of fuels with higher marginal cost.
  • Another important consideration is grid reliability.
  • Tails are important!
9

What is reconciliation?

10

Traditional approaches

  • Single level approaches
    • Bottom Up (Schwarzkopf, Tersine, and Morris, 1988).
    • Top Down (Gross and Sohl, 1990).
    • Middle Out
  • Top-down approaches do not exploit information at the bottom levels.
  • Bottom-up can suffer from the noisiness of the bottom-level series.
  • These approaches do not work for more general constraints.
11

Reconciliation

  • Forecast every variable and stack the forecasts in an $n$-vector $\hat{y}$.
  • Call these base forecasts.
  • The base forecasts are not coherent in general.
  • They need to be reconciled via a mapping $\tilde{y} = \psi(\hat{y})$.
12

Some math

Linear reconciliation generally takes the form $\tilde{b} = d + G\hat{y}$, where

  • $G$ is an $m \times n$ matrix of reconciliation weights
  • $d$ is an $m \times 1$ translation vector

The full hierarchy is then $\tilde{y} = S\tilde{b}$, where $S$ is an $n \times m$ matrix that encodes the constraints.

13

The summing matrix

$$S = \begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}$$

15
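Purely as an illustration (not part of the original slides), a minimal numpy sketch of this summing matrix, using the series labels (Tot, A, B, AA, AB, BA, BB) that appear later in the residual-matrix slides:

```python
import numpy as np

# Summing matrix for the example hierarchy: Tot = A + B, A = AA + AB, B = BA + BB.
# Full vector ordering: (Tot, A, B, AA, AB, BA, BB); bottom level: (AA, AB, BA, BB).
S = np.array([[1, 1, 1, 1],   # Tot
              [1, 1, 0, 0],   # A
              [0, 0, 1, 1],   # B
              [1, 0, 0, 0],   # AA
              [0, 1, 0, 0],   # AB
              [0, 0, 1, 0],   # BA
              [0, 0, 0, 1]])  # BB

b = np.array([2.0, 1.0, 3.0, 4.0])  # arbitrary bottom-level values
y = S @ b                           # coherent 7-vector: y[0] == y[1] + y[2] == b.sum()
```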

Specific reconciliation methods

  • OLS: $\tilde{y} = S(S'S)^{-1}S'\hat{y}$
    • Hyndman, Ahmed, Athanasopoulos, and Shang (2011)
  • WLS: $\tilde{y} = S(S'WS)^{-1}S'W\hat{y}$
    • Athanasopoulos, Hyndman, Kourentzes, and Petropoulos (2017)
  • MinT: $\tilde{y} = S(S'\Sigma^{-1}S)^{-1}S'\Sigma^{-1}\hat{y}$
    • Wickramasuriya, Athanasopoulos, and Hyndman (2019)
16
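As a hedged illustration of how these three estimators differ only in their weighting matrix, here is a small numpy sketch; the function name `reconcile_point` and the argument `P` are my own, not from the paper or any package:

```python
import numpy as np

def reconcile_point(y_hat, S, P=None):
    """Linear reconciliation y_tilde = S (S' P S)^{-1} S' P y_hat.

    P = I                                   -> OLS
    P diagonal                              -> WLS
    P = inverse base-forecast error covariance -> MinT
    """
    P = np.eye(S.shape[0]) if P is None else P
    G = np.linalg.solve(S.T @ P @ S, S.T @ P)  # m x n matrix of reconciliation weights
    return S @ G @ y_hat                       # coherent n-vector
```

Using a shrinkage estimate of the inverse error covariance for `P` corresponds to the MinT (shrinkage) variant used later in the simulations.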

Probabilistic forecasts

17

Existing approaches

  • Stack quantiles of each series in a vector and reconcile
    • Shang and Hyndman (2017) for prediction intervals.
    • Jeon, Panagiotelis, and Petropoulos (2019) (hereafter JPP) for a full distribution.
  • Reconcile the mean, otherwise bottom up
    • Ben Taieb, Taylor, and Hyndman (2020) (hereafter BTTH)
18

Formal Definition: Coherence

  • Let $(\mathbb{R}^m, \mathcal{F}_{\mathbb{R}^m}, \mu)$ be a probability triple.
  • Let $s: \mathbb{R}^m \to \mathfrak{s}$, where $s(\cdot)$ is premultiplication by $S$ and $\mathfrak{s}$ is the coherent subspace.
  • A coherent probabilistic forecast can be characterised by a probability triple $(\mathfrak{s}, \mathcal{F}_{\mathfrak{s}}, \nu)$ where

$\nu(s(B)) = \mu(B) \quad \forall B \in \mathcal{F}_{\mathbb{R}^m}$, and $s(B)$ is the image of $B$ under $s(\cdot)$.

19

In a picture

20

Formal Definition: Reconciliation

Let $(\mathbb{R}^n, \mathcal{F}_{\mathbb{R}^n}, \hat{\nu})$ be a probability triple corresponding to a base forecast.
The reconciled forecast is characterised by

$$\tilde{\nu}(A) = \hat{\nu}\big(\psi^{-1}(A)\big) \quad \forall A \in \mathcal{F}_{\mathfrak{s}},$$

where $\psi^{-1}(A)$ is the pre-image of $A$ under $\psi(\cdot)$.

21

In a picture

22

In practice

If $\hat{y}^{[1]}, \ldots, \hat{y}^{[L]}$ is a sample from some base probabilistic forecast, then $\tilde{y}^{[1]}, \ldots, \tilde{y}^{[L]}$ is a sample from the reconciled forecast, where

$$\tilde{y}^{[l]} = \psi\big(\hat{y}^{[l]}\big), \quad l = 1, \ldots, L.$$

Reconciling a sample from the base distribution gives a sample from the reconciled distribution.

Proof in paper.

23
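For concreteness, a minimal numpy sketch of this result for the linear map $\psi(\hat{y}) = S(d + G\hat{y})$ used later; the helper name `reconcile_sample` is hypothetical:

```python
import numpy as np

def reconcile_sample(Y_hat, S, G, d):
    """Push an n x L matrix of base-forecast draws (one draw per column) through
    the linear map psi(y) = S(d + G y); the columns of the result are then a
    sample from the reconciled predictive distribution."""
    return S @ (d[:, None] + G @ Y_hat)
```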

In practice

  • If the base forecast is elliptical, then the reconciled forecast will be elliptical.
  • If the base forecast is $N(\hat{\mu}, \hat{\Sigma})$,
  • and reconciliation is linear, $\psi(\hat{y}) := S(d + G\hat{y})$,
  • then the reconciled forecast is $N\big(S(d + G\hat{\mu}),\, SG\hat{\Sigma}G'S'\big)$.
24
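A one-function numpy sketch of this Gaussian case (the helper name is hypothetical, assuming $S$, $G$ and $d$ are given):

```python
import numpy as np

def reconcile_gaussian(mu_hat, Sigma_hat, S, G, d):
    """Reconciled Gaussian: N(S(d + G mu_hat), S G Sigma_hat G' S')."""
    mu_tilde = S @ (d + G @ mu_hat)
    Sigma_tilde = S @ G @ Sigma_hat @ G.T @ S.T
    return mu_tilde, Sigma_tilde
```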

Perfect Reconciliation

  • Using the result on the previous slide, we prove that for arbitrary $\hat{\mu}$ and $\hat{\Sigma}$ there exist a $d$ and $G$ that recover the true predictive distribution.
  • No matter how bad the base forecast is, reconciliation can get you to the truth.
  • However, this is not feasible in practice.
  • The required mapping will not always be a projection.
25

Score Optimisation

26

Scoring Rules

  • Propose reconciliation by optimising with respect to a scoring rule (Gneiting and Raftery, 2007).
  • A scoring rule $K(q, \omega)$ takes a probabilistic forecast $q$ and a realisation $\omega$ and returns a real number.
  • The rule is proper if

$$E_p[K(p, \omega)] \leq E_p[K(q, \omega)]$$

for all $q \neq p$, where $\omega \sim p$.

27

Multivariate scoring rules

  • Log score
    • Improper in the context of reconciliation.
    • Result proven in paper.
  • Energy score
    • Multivariate generalisation of continuous ranked probability score.
  • Variogram score (Scheuerer and Hamill, 2015)
28

If d and G known

Use training data $t = 1, \ldots, T$ to train models and make base forecasts.

[Figure: forecast setup, training on $t = 1$ to $t = T$ and forecasting $T + h$.]

Then reconcile.

29

Training d and G

[Figure: rolling windows for learning the reconciliation weights. R training windows each produce forecasts and a score; the final model is trained on $t = R+1$ to $t = T+R$ and forecasts $T+R+h$.]
30

In math

Optimise $\mathcal{E}(\gamma) = \sum_{t=T}^{T+R-1} K\big(\tilde{f}^{\,\gamma}_{t+h|t},\, y_{t+h}\big)$, where $\tilde{f}^{\,\gamma}_{t+h|t}$ is the forecast reconciled with respect to $\gamma := (d, \mathrm{vec}(G))$.

  • The energy and variogram scores are approximated via Monte Carlo.
31

Energy Score

$$\mathcal{E}(\gamma) \approx \sum_{t=T}^{T+R-1} \frac{1}{Q} \sum_{q=1}^{Q} \left( \big\|\tilde{y}^{[q]}_{t+h|t} - y_{t+h}\big\| - \tfrac{1}{2}\big\|\tilde{y}^{[q]}_{t+h|t} - \tilde{y}^{*[q]}_{t+h|t}\big\| \right)$$

where $\tilde{y}^{[q]}_{t+h|t} = S\big(d + G\hat{y}^{[q]}_{t+h|t}\big)$, $\tilde{y}^{*[q]}_{t+h|t} = S\big(d + G\hat{y}^{*[q]}_{t+h|t}\big)$, and $\hat{y}^{[q]}_{t+h|t}, \hat{y}^{*[q]}_{t+h|t} \overset{iid}{\sim} \hat{f}_{t+h|t}$ for $q = 1, \ldots, Q$.

32
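A minimal numpy version of this Monte Carlo approximation for a single forecast origin (illustrative only; draws are assumed to be stored column-wise):

```python
import numpy as np

def energy_score_mc(Y_tilde, Y_tilde_star, y_obs):
    """Monte Carlo energy score for one forecast origin.

    Y_tilde, Y_tilde_star: n x Q matrices holding two independent sets of
    reconciled draws (one draw per column); y_obs: realised n-vector y_{t+h}.
    """
    term1 = np.linalg.norm(Y_tilde - y_obs[:, None], axis=0).mean()
    term2 = np.linalg.norm(Y_tilde - Y_tilde_star, axis=0).mean()
    return term1 - 0.5 * term2
```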

Stochastic Gradient Descent

  • When the objective function is only known up to approximation, stochastic gradient descent (SGD) can be used for optimisation.
  • It requires an estimate of the gradient, obtained by automatic differentiation.
  • The specific variant of SGD used is Adam (Kingma and Ba, 2014).
  • An implementation is available in the ProbReco package.
33
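The paper's implementation is the ProbReco package; the following is only a hedged PyTorch sketch of the same idea, with a toy hierarchy and an assumed container `windows` of base-forecast draws and realisations, showing Adam plus automatic differentiation applied to the Monte Carlo energy score:

```python
import torch

# Toy dimensions: n = 7 series, m = 4 bottom-level series, Q Monte Carlo draws.
n, m, Q = 7, 4, 50

# Summing matrix for the example hierarchy (Tot, A, B, AA, AB, BA, BB).
S = torch.zeros(n, m)
S[0, :] = 1.0            # Tot = AA + AB + BA + BB
S[1, :2] = 1.0           # A   = AA + AB
S[2, 2:] = 1.0           # B   = BA + BB
S[3:, :] = torch.eye(m)  # bottom level

# gamma = (d, vec(G)), initialised at bottom up.
d = torch.zeros(m, requires_grad=True)
G = torch.cat([torch.zeros(m, n - m), torch.eye(m)], dim=1).requires_grad_(True)

opt = torch.optim.Adam([d, G], lr=0.01)

def energy_score(Yt, Yt_star, y_obs):
    # Monte Carlo energy score; columns of Yt, Yt_star are reconciled draws.
    t1 = (Yt - y_obs[:, None]).norm(dim=0).mean()
    t2 = (Yt - Yt_star).norm(dim=0).mean()
    return t1 - 0.5 * t2

# `windows` is an assumed list of (Y_hat_draws, y_obs) pairs, one per training
# window, where Y_hat_draws is an n x 2Q tensor of base-forecast draws.
for step in range(500):
    opt.zero_grad()
    loss = torch.zeros(())
    for Y_hat_draws, y_obs in windows:
        Yt = S @ (d[:, None] + G @ Y_hat_draws[:, :Q])
        Yt_star = S @ (d[:, None] + G @ Y_hat_draws[:, Q:])
        loss = loss + energy_score(Yt, Yt_star, y_obs)
    loss.backward()   # gradient of the approximate score via automatic differentiation
    opt.step()        # Adam update of d and G
```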

Simulation

34

Scenarios

Simulate from a 7-variable hierarchy with bottom-level series given by ARIMA models.

  • Stationary Gaussian
  • Stationary non-Gaussian
  • Non-stationary Gaussian
  • Non-stationary non-Gaussian
35

Base forecasts

Obtain point forecasts from ARIMA or ETS models. To sample from the base probabilistic forecasts, add:

  • Independent Gaussian noise
  • Multivariate Gaussian noise
  • Bootstrapped residuals
  • Jointly bootstrapped residuals
36

Residual Matrix

$$\begin{pmatrix}
e_{Tot,1} & \cdots & e_{Tot,t} & \cdots & e_{Tot,T} \\
e_{A,1} & \cdots & e_{A,t} & \cdots & e_{A,T} \\
e_{B,1} & \cdots & e_{B,t} & \cdots & e_{B,T} \\
e_{AA,1} & \cdots & e_{AA,t} & \cdots & e_{AA,T} \\
e_{AB,1} & \cdots & e_{AB,t} & \cdots & e_{AB,T} \\
e_{BA,1} & \cdots & e_{BA,t} & \cdots & e_{BA,T} \\
e_{BB,1} & \cdots & e_{BB,t} & \cdots & e_{BB,T}
\end{pmatrix}$$

37

Independently Bootstrapped

$$\begin{pmatrix}
e_{Tot,1} & \cdots & e_{Tot,t} & \cdots & e_{Tot,T} \\
e_{A,1} & \cdots & e_{A,t} & \cdots & e_{A,T} \\
e_{B,1} & \cdots & e_{B,t} & \cdots & e_{B,T} \\
e_{AA,1} & \cdots & e_{AA,t} & \cdots & e_{AA,T} \\
e_{AB,1} & \cdots & e_{AB,t} & \cdots & e_{AB,T} \\
e_{BA,1} & \cdots & e_{BA,t} & \cdots & e_{BA,T} \\
e_{BB,1} & \cdots & e_{BB,t} & \cdots & e_{BB,T}
\end{pmatrix}$$

(Each series resamples its own residuals over time separately.)

38

Jointly Bootstrapped

$$\begin{pmatrix}
e_{Tot,1} & \cdots & e_{Tot,t} & \cdots & e_{Tot,T} \\
e_{A,1} & \cdots & e_{A,t} & \cdots & e_{A,T} \\
e_{B,1} & \cdots & e_{B,t} & \cdots & e_{B,T} \\
e_{AA,1} & \cdots & e_{AA,t} & \cdots & e_{AA,T} \\
e_{AB,1} & \cdots & e_{AB,t} & \cdots & e_{AB,T} \\
e_{BA,1} & \cdots & e_{BA,t} & \cdots & e_{BA,T} \\
e_{BB,1} & \cdots & e_{BB,t} & \cdots & e_{BB,T}
\end{pmatrix}$$

(Whole columns are resampled, preserving the dependence across series.)

39
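A small numpy sketch of the difference between the two schemes, assuming `E` is the $7 \times T$ residual matrix above (the helper name is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_residuals(E, L, joint=True):
    """Draw L bootstrap residual vectors from the n x T in-sample residual matrix E.

    joint=True : resample whole columns, preserving dependence across series.
    joint=False: each series resamples its own residuals independently.
    """
    n, T = E.shape
    if joint:
        return E[:, rng.integers(T, size=L)]
    return np.stack([E[i, rng.integers(T, size=L)] for i in range(n)])
```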

Reconciliation methods

  • JPP: Reconcile quantiles.
  • BTTH: Reconcile mean, otherwise BU.
  • BU: Bottom up.
  • OLS: Orthogonal projection.
  • MinT: With shrinkage estimator.
  • ScoreOptE: Optimise w.r.t. the Energy Score.
  • ScoreOptV: Optimise w.r.t. the Variogram Score.
40

Simulation Results (link)

41

Energy application

42

Base forecasts

Use a feed-forward neural network with up to 28 lags of daily data. For probabilistic forecasts, add:

  • Independent Gaussian noise
  • Multivariate Gaussian noise
  • Bootstrapped residuals
  • Jointly bootstrapped residuals

Consider day-ahead forecasts.

43

Model Fit (link)

44

Residual Dependence

45

Reconciliation Results

46

Independent Gaussian

47

Joint Bootstrap

48

Main takeaway

  • Score optimisation dominates the alternatives when the base forecasts are badly mis-specified.
  • If the base models are good, then OLS dominates.
  • Existing approaches in the literature do not work well and are even dominated by the base forecasts.
  • A bias-variance tradeoff may be at play; future work may look at regularised approaches to score optimisation.
49

References

Athanasopoulos, G. et al. (2017). "Forecasting with temporal hierarchies". In: European Journal of Operational Research 262.1, pp. 60-74.

Ben Taieb, S. et al. (2020). "Hierarchical Probabilistic Forecasting of Electricity Demand With Smart Meter Data". In: Journal of the American Statistical Association. in press.

Gneiting, T. et al. (2007). "Strictly Proper Scoring Rules, Prediction, and Estimation". In: Journal of the American Statistical Association 102.477, pp. 359-378.

Gross, C. W. et al. (1990). "Disaggregation methods to expedite product line forecasting". In: Journal of Forecasting 9.3, pp. 233-254.

Hyndman, R. J. et al. (2011). "Optimal combination forecasts for hierarchical time series". In: Computational Statistics and Data Analysis 55.9, pp. 2579-2589.

Jeon, J. et al. (2019). "Probabilistic forecast reconciliation with applications to wind power and electric load". In: European Journal of Operational Research 279.2, pp. 364-379.

Kingma, D. P. et al. (2014). "Adam: A method for stochastic optimization". arXiv preprint. http://arxiv.org/abs/1412.6980.

Scheuerer, M. et al. (2015). "Variogram-Based Proper Scoring Rules for Probabilistic Forecasts of Multivariate Quantities". In: Monthly Weather Review 143.4, pp. 1321-1334.

Schwarzkopf, A. B. et al. (1988). "Top-down versus bottom-up forecasting strategies". In: International Journal of Production Research 26 (11), pp. 1833-1843.

Shang, H. L. et al. (2017). "Grouped Functional Time Series Forecasting: An Application to Age-Specific Mortality Rates". In: Journal of Computational and Graphical Statistics 26.2, pp. 330-343.

Wickramasuriya, S. L. et al. (2019). "Optimal Forecast Reconciliation for Hierarchical and Grouped Time Series Through Trace Minimization". In: Journal of the American Statistical Association 114.526, pp. 804-819.

50
