class: center, middle, inverse, title-slide .title[ # Reconciliation
with Immutable
Forecasts ] .author[ ### Anastasios Panagiotelis ] .date[ ### December 1, 2022 ] --- class:left # Joint work with Bohan Zhang <img src="img/bohan.jpeg" width="120"/> -- Yanfei Kang <img src="img/yanfei.jpeg" width="120"/> -- Feng Li <img src="img/feng.jpeg" width="120"/> --- class: inverse, center, middle # Reconciliation background --- # Hierarchical Time Series - Multivariate forecasting problem. -- - Variables follow linear constraints. -- - Forecast store level sales and aggregates.
--- # Problem context - Have .emphcol[base] forecasts at all nodes. How can we make these .emphcol[coherent]? -- - Bottom Up (Schwarzkopf, Tersine, and Morris, 1988) ignores top nodes. -- - Noise in bottom level transferred to top level. -- - Top Down (Gross and Sohl, 1990) ignores bottom nodes. -- - Discards information. - Can bias even unbiased base forecasts. --- # Reconciliation - Base forecasts stacked in an `\(n\)`-vector `\(\hat{\mathbf{y}}\)`. -- - .emphcol[Reconciled] forecasts `\(\tilde{\mathbf{y}}\)` given by `$$\tilde{\mathbf{y}}=\mathbf{S}\left(\mathbf{S}'\mathbf{W}\mathbf{S}\right)^{-1}\mathbf{S}'\mathbf{W}\hat{\mathbf{y}}$$` -- - Different choices of `\(\pmb{W}\)` give rise to different methods. -- - The original is `\(\pmb{W}=\pmb{I}\)` (Hyndman, Ahmed, Athanasopoulos, and Shang, 2011). --- # What is `\(\pmb{S}\)`? .pull-left[
`$$\pmb{y}=\pmb{S}\pmb{\color{orange}{b}}$$` ] --- # What is `\(\pmb{S}\)`? .pull-left[
`$$\pmb{y}=\pmb{S}\pmb{\color{orange}{b}}$$` ] .pull-right[ `\(\mathbf{S}=\begin{pmatrix}1&1&1&1\\1&1&0&0\\0&0&1&1\\1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\\\end{pmatrix}\)` ] --- # Why optimal? Minimises expected loss MinT method (Wickramasuriya, Athanasopoulos, and Hyndman, 2019) minimises `$$E\left[(\pmb{y}-\tilde{\pmb{y}})'(\pmb{y}-\tilde{\pmb{y}})\right]$$` if `\(\pmb{W}\)` is set to the inverse forecast error covariance of the base forecasts. -- - Subject to linearity and constraint that unbiasedness of base forecasts is preserved. --- # Why optimal? Always improves loss .pull-left[ - Reconciliation also minimises `$$(\tilde{\pmb{y}}-\hat{\pmb{y}})'\pmb{W}(\tilde{\pmb{y}}-\hat{\pmb{y}})$$` - Subject to `\(\tilde{\pmb{y}}\)` being coherent. - Guarantee `\((\pmb{y}-\hat{\pmb{y}})'\pmb{W}(\pmb{y}-\hat{\pmb{y}})\geq(\pmb{y}-\tilde{\pmb{y}})'\pmb{W}(\pmb{y}-\tilde{\pmb{y}})\)` ] .pull-right[![](img/geo.png)] --- # With non negativity - For data that cannot be negative can minimise `$$(\tilde{\pmb{y}}-\hat{\pmb{y}})'\pmb{W}(\tilde{\pmb{y}}-\hat{\pmb{y}})$$` - Subject to the coherence constraint. - Subject to the constraint that `\(\tilde{y}_i\ge 0\)` for all `\(i\)`. -- - Solved by QP using a block pivoting algorithm (Wickramasuriya, Turlach, and Hyndman, 2020). -- - Theory does not follow through, but useful in practice. --- class: inverse, center, middle # Motivation --- # Joint loss function - Everything we know about why reconciliation works applies to a .emphcol[joint] loss function. -- - Forecast accuracy for individual variables may become worse after reconciling. --
--- # Unbiased top down - For 3-variable hierarchy `\(y_T=y_A+y_B\)`, consider `$$\begin{pmatrix} \hat{y}_T\\\hat{y}_A\\\hat{y}_T-\hat{y}_A\end{pmatrix}\mbox{ or }\begin{pmatrix}\hat{y}_T\\\hat{y}_T-\hat{y}_B\\\hat{y}_B\end{pmatrix}$$` or an average of these (Hollyman, Petropoulos, and Tipping, 2021). - Top level remains unchanged (or .emphcol[immutable]) and unbiasedness is preserved --- # Our contribution - Suppose we want to keep forecasts of a subset of variables .emphcol[immutable]. -- - When is this feasible? - How should we choose immutable series? -- - We provide a conclusive answer to the first question and some suggestions (with an empirical example) for the second. --- class: inverse, center, middle # Is it feasible? --- # Feasbility check - Cannot keep Region 1, Store 1A and Store 1B immutable
--- # Basis series - Choose any representation `\(\pmb{y}=\pmb{S}\pmb{b}\)`. -- - Let `\(j_1,j_2,\dots,j_m\)` be the indices of a set of proposed immutable series -- - Construct a matrix `\(\pmb{S}_{\{j\}}\)` using the rows of `\(\pmb{S}\)` corresponding to the indices `\(j_1,j_2,\dots,j_m\)`. -- - If `\(\pmb{S}_{\{j\}}\)` has full (row) rank, then `\(y_{j_1},y_{j_2},\dots,y_{j_m}\)` can be kept immutable. --- # Example .pull-left[
] .pull-right[ `\(\mathbf{S}=\begin{pmatrix}1&1&1&1\\ \color{orange}1&\color{orange}1&\color{orange}0&\color{orange}0\\0&0&1&1\\\color{orange}1&\color{orange}0&\color{orange}0&\color{orange}0\\\color{orange}0&\color{orange}1&\color{orange}0&\color{orange}0\\0&0&1&0\\0&0&0&1\\\end{pmatrix}\)` ] --- # Example .pull-left[
] .pull-right[ `\(\mathbf{S}=\begin{pmatrix}\color{orange}1&\color{orange}1&\color{orange}1&\color{orange}1\\ 1&1&0&0\\0&0&1&1\\\color{orange}1&\color{orange}0&\color{orange}0&\color{orange}0\\\color{orange}0&\color{orange}1&\color{orange}0&\color{orange}0\\0&0&1&0\\0&0&0&1\\\end{pmatrix}\)` ] --- # Reconciliation Find `\(\tilde{\pmb{y}}\)` to minimise `$$(\tilde{\pmb{y}}-\hat{\pmb{y}})'\pmb{W}(\tilde{\pmb{y}}-\hat{\pmb{y}})$$` subject to -- - Coherence constraints - Immutability constraints - Non negativity constraints --- class: inverse, center, middle # Choosing immutable series --- # Application - Sales data from Chinese online retailer -- - Consider "Food" sales as top level. - There are 40 "middle" level categories. - There are 1905 "bottom" level categories. -- - Many series are intermittent. - Promotions are very important --- # Top level ![](img/total_sale-1.png) --- # Setup - Base models -- - Intermittent Series (more than 60% zeroes) use simple exponential smoothing. - Regression (predictors are strength of promotion) with ARIMA errors and Box Cox transformation for other series. -- - Immutable series -- - Top level - Series with longer histories (more than 1 year) --- # Results <table class=" lightable-minimal" style='font-family: "Trebuchet MS", verdana, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> Level </th> <th style="text-align:right;"> Base </th> <th style="text-align:right;"> C </th> <th style="text-align:right;"> CN </th> <th style="text-align:right;"> CI </th> <th style="text-align:right;"> CIN </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Top (I) </td> <td style="text-align:right;"> 2.94 </td> <td style="text-align:right;"> 2.75 </td> <td style="text-align:right;"> 2.77 </td> <td style="text-align:right;"> 2.94 </td> <td style="text-align:right;"> 2.94 </td> </tr> <tr> <td style="text-align:left;"> Middle (M) </td> <td style="text-align:right;"> 2.66 </td> <td style="text-align:right;"> 2.39 </td> <td style="text-align:right;"> 2.40 </td> <td style="text-align:right;"> 2.43 </td> <td style="text-align:right;"> 2.47 </td> </tr> <tr> <td style="text-align:left;"> Bottom (M) </td> <td style="text-align:right;"> 2.04 </td> <td style="text-align:right;"> 1.86 </td> <td style="text-align:right;"> 1.83 </td> <td style="text-align:right;"> 1.97 </td> <td style="text-align:right;"> 1.88 </td> </tr> <tr> <td style="text-align:left;"> Intermittent (I) </td> <td style="text-align:right;"> 0.11 </td> <td style="text-align:right;"> 1.52 </td> <td style="text-align:right;"> 1.52 </td> <td style="text-align:right;"> 0.11 </td> <td style="text-align:right;"> 0.11 </td> </tr> <tr> <td style="text-align:left;"> Older (I) </td> <td style="text-align:right;"> 1.08 </td> <td style="text-align:right;"> 1.58 </td> <td style="text-align:right;"> 1.19 </td> <td style="text-align:right;"> 1.08 </td> <td style="text-align:right;"> 1.08 </td> </tr> </tbody> </table> --- #Conclusions - Imposing immutability constraints still leads to improvements over base forecasts. -- - Immutability stabilises forecasting performance of intermittent series in particular. -- - Imposing immutability does not lead to better accuracy in all series. -- - Imposing non-negativity constraints generally improves performance. --- # Open Questions - Can we choose immutability constraints in a more principled way? -- - Is this another version of a shrinkage v sparsity question? -- - When is it possible reconcile in a way that yields Pareto improvements in forecast accuracy? --- # References .small[ Gross, C. W. et al. (1990). "Disaggregation methods to expedite product line forecasting". In: _Journal of Forecasting_ 9.3, pp. 233-254. Hollyman, R. et al. (2021). "Understanding Forecast Reconciliation". En. In: _European Journal of Operational Research_ 294.1, pp. 149-160. ISSN: 0377-2217. DOI: [10.1016/j.ejor.2021.01.017](https://doi.org/10.1016%2Fj.ejor.2021.01.017). Hyndman, R. J. et al. (2011). "Optimal Combination Forecasts for Hierarchical Time Series". En. In: _Computational Statistics Data Analysis_ 55.9, pp. 2579-2589. ISSN: 0167-9473. DOI: [10.1016/j.csda.2011.03.006](https://doi.org/10.1016%2Fj.csda.2011.03.006). Schwarzkopf, A. B. et al. (1988). "Top-down versus bottom-up forecasting strategies". In: _International Journal of Production Research_ 26 (11), pp. 1833-1843. Wickramasuriya, S. L. et al. (2019). "Optimal Forecast Reconciliation for Hierarchical and Grouped Time Series Through Trace Minimization". In: _Journal of the American Statistical Association_ 114.526, pp. 804-819. ISSN: 0162-1459. DOI: [10.1080/01621459.2018.1448825](https://doi.org/10.1080%2F01621459.2018.1448825). Wickramasuriya, S. L. et al. (2020). "Optimal Non-Negative Forecast Reconciliation". En. In: _Statistics and Computing_ 30.5, pp. 1167-1182. ISSN: 1573-1375. DOI: [10.1007/s11222-020-09930-0](https://doi.org/10.1007%2Fs11222-020-09930-0). ]