class: center, middle, inverse, title-slide

# Model Comparison
## PSYC 573
### University of Southern California
### April 14, 2022

---

# Guiding Questions

- What is *overfitting* and why is it problematic?

- How can we measure the *closeness* of a model to the true model?
    * What do *information criteria* do?

---
class: clear

## In-Sample and Out-of-Sample Prediction

- Randomly sample 10 states

<img src="model_comparison_files/figure-html/plot-wd-sub-1.png" width="70%" style="display: block; margin: auto;" />

---

# Underfitting and Overfitting

- Complex models require more data
    * Too little data for a complex model: **overfitting**
    * A model that is too simple: **underfitting**

<img src="model_comparison_files/figure-html/overfit-data-1.png" width="85%" style="display: block; margin: auto;" />

---

# Prediction of Future Observations

- The more a model captures the noise in the original data, the less likely it is to predict future observations well

<img src="model_comparison_files/figure-html/overfit-generalize-1.png" width="85%" style="display: block; margin: auto;" />

---

# What Is a Good Model?
- Closeness of the proposed model (`\(M_1\)`) to a "true" model (`\(M_0\)`)
    * *Kullback-Leibler Divergence*: `\(D_\textrm{KL}(M_0 \Vert M_1) = E_{M_0}[\log P_{M_0}(\tilde{\mathbf{y}})] - E_{M_0}[\log P_{M_1}(\tilde{\mathbf{y}})]\)` = `\(-\text{Entropy of }M_0 - \text{elpd of }M_1\)`
    * elpd: expected log predictive density: `\(E_{M_0}[\log P_{M_1}(\tilde{\mathbf{y}})]\)`

--

- Choose the model with the *smallest `\(D_\textrm{KL}\)`*
    * When `\(M_0 = M_1\)`, `\(D_\textrm{KL} = 0\)`
    * The entropy of `\(M_0\)` is the same for every candidate model, so `\(\Rightarrow\)` choose the model with the largest elpd

---
exclude: TRUE
class: clear

### Example

- True model of data, `\(M_0\)`: `\(y \sim N(3, 2)\)`
- `\(M_1\)`: `\(y \sim N(3.5, 2.5)\)`
- `\(M_2\)`: `\(y \sim \mathrm{Cauchy}(3, 2)\)`

.pull-left[
<img src="model_comparison_files/figure-html/divergence-1.png" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
.font70[
Entropy of `\(M_0\)` = 2.112

- Theoretical elpd of `\(M_1\)`: -2.175
- `\(D_\textrm{KL}(M_0 \Vert M_1)\)` = 0.063
- Theoretical elpd of `\(M_2\)`: -2.371
- `\(D_\textrm{KL}(M_0 \Vert M_2)\)` = 0.259
]
]

---
class: clear

### Expected log *pointwise* predictive density

`$$\text{elpd} = \sum_i E_{M_0}[\log P_{M_1}(\tilde y_i)]$$`

Note: elpd is a sum over observations, so its magnitude grows with sample size

--

- Problem: elpd depends on `\(M_0\)`, which is unknown
    * Estimating elpd with the current sample, `\(\sum_i \log P_{M_1}(y_i)\)`, overestimates elpd `\(\rightarrow\)` underestimates the discrepancy
    * Need to estimate elpd using an *independent sample*

---

# Overfitting

Training set: 25 states; Test set: the remaining 25 states

<img src="model_comparison_files/figure-html/elpd_df-plot-1.png" width="70%" style="display: block; margin: auto;" />

--

- More complex models show a larger discrepancy between in-sample and out-of-sample elpd

---

# Information Criteria (IC)

Approximate the discrepancy between in-sample and out-of-sample elpd with a penalty term

IC = -2 `\(\times\)` (in-sample elpd - `\(p\)`)

`\(p\)` = penalty for model complexity
- A function of the number of parameters

--

Choose the model with the **smaller** IC

--

Bayesian ICs: DIC, WAIC, etc.

---

# Cross-Validation

- Split the sample into `\(K\)` parts
- Fit the model to `\(K - 1\)` parts, and obtain the elpd for the held-out part
- Repeat so that each part is held out once, and sum the elpds

--

Leave-one-out (LOO): `\(K = N\)`
- Exact LOO refits the model `\(N\)` times: very computationally intensive
- `loo` package: approximation using Pareto smoothed importance sampling

---
class: clear

```r
loo(m1)
```

```
># 
># Computed from 8000 by 50 log-likelihood matrix
># 
>#          Estimate  SE
># elpd_loo     15.1 4.9
># p_loo         3.3 1.0
># looic       -30.2 9.9
># ------
># Monte Carlo SE of elpd_loo is 0.0.
># 
># All Pareto k estimates are good (k < 0.5).
># See help('pareto-k-diagnostic') for details.
```

---

# Comparing Models

`$$\texttt{Divorce}_i \sim N(\mu_i, \sigma)$$`

- M1: `Marriage`
- M2: `Marriage`, `South`, `Marriage` `\(\times\)` `South`
- M3: `South`, smoothing spline of `Marriage` by `South`
- M4: `Marriage`, `South`, `MedianAgeMarriage`, `Marriage` `\(\times\)` `South`, `Marriage` `\(\times\)` `MedianAgeMarriage`, `South` `\(\times\)` `MedianAgeMarriage`, `Marriage` `\(\times\)` `South` `\(\times\)` `MedianAgeMarriage`

---
class: clear

.font50[
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> M1 </th> <th style="text-align:center;"> M2 </th> <th style="text-align:center;"> M3 </th> <th style="text-align:center;"> M4 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> b_Intercept </td> <td style="text-align:center;"> 0.61 </td> <td style="text-align:center;"> 0.67 </td> <td style="text-align:center;"> 0.94 </td> <td style="text-align:center;"> 5.53 </td> </tr> <tr> <td style="text-align:left;"> b_Marriage </td> <td style="text-align:center;"> 0.18 </td> <td style="text-align:center;"> 0.13 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> −1.21 </td> </tr> <tr> <td style="text-align:left;"> b_Southsouth </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> −0.62 </td> <td style="text-align:center;"> 0.10 </td> <td style="text-align:center;"> 0.32 </td> </tr> <tr> <td style="text-align:left;"> b_Marriage × Southsouth
</td> <td style="text-align:center;"> </td> <td style="text-align:center;"> 0.36 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> 0.52 </td> </tr> <tr> <td style="text-align:left;"> bs_sMarriage × SouthnonMsouth_1 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> −0.55 </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:left;"> bs_sMarriage × Southsouth_1 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> 1.27 </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:left;"> sds_sMarriageSouthnonMsouth_1 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> 0.91 </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:left;"> sds_sMarriageSouthsouth_1 </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> 0.48 </td> <td style="text-align:center;"> </td> </tr> <tr> <td style="text-align:left;"> b_MedianAgeMarriage </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> −1.73 </td> </tr> <tr> <td style="text-align:left;"> b_Marriage × MedianAgeMarriage </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> 0.45 </td> </tr> <tr> <td style="text-align:left;"> b_MedianAgeMarriage × Southsouth </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> −0.36 </td> </tr> <tr> <td style="text-align:left;box-shadow: 0px 1px"> b_Marriage × MedianAgeMarriage × Southsouth </td> <td style="text-align:center;box-shadow: 0px 1px"> </td> <td 
style="text-align:center;box-shadow: 0px 1px"> </td> <td style="text-align:center;box-shadow: 0px 1px"> </td> <td style="text-align:center;box-shadow: 0px 1px"> −0.08 </td> </tr> <tr> <td style="text-align:left;"> ELPD </td> <td style="text-align:center;"> 15.1 </td> <td style="text-align:center;"> 18.3 </td> <td style="text-align:center;"> 17.7 </td> <td style="text-align:center;"> 23.8 </td> </tr> <tr> <td style="text-align:left;"> ELPD s.e. </td> <td style="text-align:center;"> 4.9 </td> <td style="text-align:center;"> 5.5 </td> <td style="text-align:center;"> 5.8 </td> <td style="text-align:center;"> 6.1 </td> </tr> <tr> <td style="text-align:left;"> LOOIC </td> <td style="text-align:center;"> −30.2 </td> <td style="text-align:center;"> −36.6 </td> <td style="text-align:center;"> −35.3 </td> <td style="text-align:center;"> −47.5 </td> </tr> <tr> <td style="text-align:left;"> LOOIC s.e. </td> <td style="text-align:center;"> 9.9 </td> <td style="text-align:center;"> 11.0 </td> <td style="text-align:center;"> 11.7 </td> <td style="text-align:center;"> 12.1 </td> </tr> <tr> <td style="text-align:left;"> WAIC </td> <td style="text-align:center;"> −30.3 </td> <td style="text-align:center;"> −36.9 </td> <td style="text-align:center;"> −37.1 </td> <td style="text-align:center;"> −48.1 </td> </tr> <tr> <td style="text-align:left;"> RMSE </td> <td style="text-align:center;"> 0.17 </td> <td style="text-align:center;"> 0.15 </td> <td style="text-align:center;"> 0.14 </td> <td style="text-align:center;"> 0.13 </td> </tr> </tbody> </table>
]

---

# Notes for Using ICs

- Same outcome variable and the same transformation of it
- Same data (same observations, hence the same sample size)
- Cannot compare discrete and continuous models
    * E.g., Poisson vs. normal: probability masses and probability densities are not on the same scale
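---
class: clear

### Computing elpd and `\(D_\textrm{KL}\)` numerically

A minimal base-R sketch of the `\(D_\textrm{KL}\)` and elpd definitions, offered as an illustration. It assumes a true model `\(M_0\)`: `\(y \sim N(3, 2)\)` (mean 3, SD 2) and two candidates, `\(M_1\)`: `\(N(3.5, 2.5)\)` and `\(M_2\)`: `\(\mathrm{Cauchy}(3, 2)\)`. The helper `elpd_theory()` is a hypothetical name. It computes the theoretical elpd for one new observation, `\(E_{M_0}[\log P(\tilde y)]\)`, by numerical integration with `stats::integrate()`.

```r
# Assumed true model M0: y ~ N(mean = 3, sd = 2)
log_p0 <- function(y) dnorm(y, mean = 3, sd = 2, log = TRUE)
# Assumed candidate models M1 and M2
log_p1 <- function(y) dnorm(y, mean = 3.5, sd = 2.5, log = TRUE)
log_p2 <- function(y) dcauchy(y, location = 3, scale = 2, log = TRUE)

# Hypothetical helper: theoretical elpd E_{M0}[log p(y~)] for one new
# observation, averaging log_p over the true density of M0
elpd_theory <- function(log_p) {
  integrand <- function(y) dnorm(y, mean = 3, sd = 2) * log_p(y)
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

entropy_m0 <- -elpd_theory(log_p0)  # entropy of M0 (a positive number here)
elpd_m1 <- elpd_theory(log_p1)
elpd_m2 <- elpd_theory(log_p2)

# D_KL(M0 || M) = E_{M0}[log p0] - elpd of M = -entropy of M0 - elpd of M
kl_m1 <- -entropy_m0 - elpd_m1
kl_m2 <- -entropy_m0 - elpd_m2

round(c(entropy_m0 = entropy_m0, elpd_m1 = elpd_m1, kl_m1 = kl_m1,
        elpd_m2 = elpd_m2, kl_m2 = kl_m2), 3)
```

The entropy of `\(M_0\)` is the same for both candidates. Ranking them by `\(D_\textrm{KL}\)` therefore gives the same order as ranking them by elpd, and under these assumptions `\(M_1\)` should have the smaller `\(D_\textrm{KL}\)`.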