class: center, middle, inverse, title-slide .title[ # Boosted mortality
models with age
and spatial shrinkage ] .author[ ### Anastasios Panagiotelis ] .date[ ### November, 2023 ] --- class:center # Joint work with Li Li `\(\quad\quad\)` Han Li ![](img/LiLi.jpeg) ![](img/HanLi.jpeg) --- # The big story - Statistics v Machine Learning - Boosting or domain-specific statistical models? - Humans v Machines - Does domain knowledge about data structure matter? - Can't we just get along... -- ![](img/t2.jpeg) --- class: inverse, center, middle # The data --- # Mortality data (all U.S.) ![Image Title](img/national_male_rainbow.svg) --- # Mortality data (selected states) ![Image Title](img/states_rainbow.svg) --- # Challenges - Mortality generally trends downward -- - But not for all ages... - and not for all states -- - Age effects generally smooth over time -- - Data are noisier for age/states with lower populations and fewer deaths -- - Different states, different patterns -- - Neighboring states can have similar patterns --- class: inverse, center, middle # The model --- # Lee Carter model - A popular mortality model is the Lee and Carter (1992) model. `$$y_{x,t}=a_x+b_x\kappa_t+\epsilon_{x,t}$$` -- - The `\(y_{x,t}\)` are log mortality rates for age `\(x\)` in year `\(t\)`. - The `\(a_{x}\)` are estimated as age-wise means - The `\(b_{x}\)` are age weights, the `\(\kappa_t\)` is a time trend - Both estimated using SVD - The `\(\epsilon_{x,t}\)` are errors. --- # Gradient Boosting - Consider a prediction `\(\mathbf{f}_{t}=(f_{0,t},\dots,f_{85+,t})'\)` with squared loss function -- `$$L_t(\mathbf{y}_t,\mathbf{f}_{t})=(\mathbf{y}_t-\mathbf{f}_{t})'(\mathbf{y}_t-\mathbf{f}_{t})$$` -- - Gradient proportional to residual `\((\mathbf{y}_t-\mathbf{f}_{t})\)` -- - The idea of gradient boosting is to -- fit a *weak learner*, -- compute residuals, -- refit the weak learner -- recompute residuals, -- etc. -- - Prediction is ensemble of the *weak learners* --- # What is a weak learner? - Typically trees. But can also be -- - Logistic regression (Friedman, Hastie, and Tibshirani, 2000) - Generalized additive models (Tutz and Binder, 2006) - Copulas (Brant and Haff, 2022) -- - **Novel idea 1:** Do gradient boosting with the Lee-Carter model. --- # Exploiting structure - We know certain things about mortality data. - Mortality rates of similar ages are similar - Mortality rates of neighboring regions are similar due to local effects -- - **Novel idea 2:** Shrink forecasts together to borrow strength across nearby age groups and/or geographical regions --- # How to do shrinkage? - Add a penalty term to the objective function `$$L_t(\mathbf{y}_t,\mathbf{f}_{t})=(\mathbf{y}_t-\mathbf{f}_{t})'(\mathbf{y}_t-\mathbf{f}_{t})+\color{blue}{\lambda\mathbf{f}_{t}'\mathbf{W}\mathbf{f}_{t}}$$` - When Lee Carter model is refit, do not fit residuals, but residuals plus a term involving `\(\mathbf{W}\mathbf{f}_{t}\)` -- - What is `\(\mathbf{W}\)`? -- - The Graph Laplacian. -- - Why does it make sense? -- - Need to think about neighborhoods. --- # Graph Laplacian Neighborhood structure for age <img src="SpatBoosting_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" /> Neighborhood structure for states <img src="SpatBoosting_files/figure-html/unnamed-chunk-2-1.png" width="50%" style="display: block; margin: auto;" /> --- # Graph Laplacian - Off diagonal elements are -1 if two nodes share an edge, 0 otherwise. - Diagonal elements are equal to number of neighbors. .pull-left[ <img src="SpatBoosting_files/figure-html/unnamed-chunk-3-1.png" width="80%" style="display: block; margin: auto;" /> ] .pull-right[![](img/GL.png)] --- # Effect on penalty - Residual for Maryland (MD) adds penalty `$$(f_{MD,t}-f_{DE,t})+(f_{MD,t}-f_{PA,t})+(f_{MD,t}-f_{WV,t})$$` -- - For age 50 add penalty `$$(f_{50,t}-f_{49,t})+(f_{50,t}-f_{51,t})$$` -- - When we fit the next Lee Carter model it will force forecasts to be closer to its neighbors forecasts. --- # Putting everything together - We can shrink across age and states using the following neighborhood structure -- - Same age, neighboring states are neighbors - Same state, neighboring age are neighbors -- - Mathematically we find the Cartesian product of the two graphs. -- - The graph Laplacian is found by a simple expression involving Kronecker products, see paper for details. --- class: inverse, center, middle # Results --- # Empirical setup - Expanding window from `\(1960-1997,...,2009\)`. -- - Make `\(h=1,\dots,10\)` step ahead forecasts. -- - Use Lee Carter (LC) and three other benchmarks - Hyndman and Ullah (2007) (H-U) - Hyndman, Booth, and Yasmeen (2013) (H-B-Y) - LightGBM (a popular tree based boosting method). --- # Empirical setup - Consider four new approaches: - Gradient boosted Lee Carter (GBLC) - with age shrinkage (GBLC-age) - with state shrinkage (GBLC-state) - with age and state shrinkage (GBLC-age-state) --- #Summary of results - At a national level, GBLC and GBLC-age significantly outperform everything. - At horizons 1 and 4, GBLC-age is significantly better than GBLC. - At a state level, LightGBM, GBLC-age and GBLC-age-state significantly outperform everything. - For longer horizons, GBLC-age-state outperforms even LightGBM an GBLC-age. - These results are based on model confidence sets --- # National level results For presentation purposes take averages across 20 year age bands. Figures show improvement in MASE ![Image Title](img/national_results.png) --- # State level results ![Image Title](img/state_results.png) --- # State level results Improvement from GBLC to GBLC-state ![Image Title](img/male_improvement_map.svg) --- # Conclusions - If you have a forecasting model that works in a specific domain, try boosting. - If you have structure in what you are trying to forecast, consider shrinkage. -- ![](img/thumbs.jpeg) --- # References .small[ Brant, S. B. et al. (2022). _Copulaboost: additive modeling with copula-based model components_. arXiv: 2208.04669 [stat.ME]. Friedman, J. et al. (2000). "Additive logistic regression: A statistical view of boosting (With discussion and a rejoinder by the authors)". In: _The Annals of Statistics_ 28.2, pp. 337 - 407. Hyndman, R. J. et al. (2013). "Coherent mortality forecasting: the product-ratio method with functional time series models". In: _Demography_ 50.1, pp. 261-283. Hyndman, R. J. et al. (2007). "Robust forecasting of mortality and fertility rates: A functional data approach". In: _Computational Statistics & Data Analysis_ 51.10, pp. 4942-4956. Lee, R. D. et al. (1992). "Modeling and forecasting US mortality". In: _Journal of the American Statistical Association_ 87.419, pp. 659-671. Tutz, G. et al. (2006). "Generalized Additive Modeling with Implicit Variable Selection by Likelihood-Based Boosting". In: _Biometrics_ 62.4, pp. 961-971. ]