class: center, middle, inverse, title-slide

# Generalized Linear Model

## PSYC 573

### University of Southern California

### March 22, 2022

---

# Regression for Prediction

One outcome `\(Y\)`, one or more predictors `\(X_1\)`, `\(X_2\)`, `\(\ldots\)`

E.g.,

- What will a student's college GPA be given an SAT score of `\(x\)`?
- How long will a person live if the person adopts diet `\(x\)`?
- What will the earth's global temperature be if the carbon emission level is `\(x\)`?

---

# Keep These in Mind

1. The likelihood function is defined for the outcome `\(Y\)`
2. Prediction is probabilistic (i.e., uncertain) and contains error

---

class: center, middle, inverse

# Generalized Linear Models (GLM)

---

# GLM

Three components:

- Conditional distribution of `\(Y\)`
- Link function
- Linear predictor

---

# Some Examples

| Outcome type | Support | Distributions | Link |
|:------------:|:----------------:|:-------------:|:----:|
| continuous | (`\(-\infty\)`, `\(\infty\)`) | Normal | Identity |
| count (fixed duration) | {0, 1, `\(\ldots\)`} | Poisson | Log |
| count (known # of trials) | {0, 1, `\(\ldots\)`, `\(N\)`} | Binomial | Logit |
| binary | {0, 1} | Bernoulli | Logit |
| ordinal | {0, 1, `\(\ldots\)`, `\(K\)`} | categorical | Logit |
| nominal | `\(K\)`-vector of {0, 1} | categorical | Logit |
| multinomial | `\(K\)`-vector of {0, 1, `\(\ldots\)`, `\(N\)`} | multinomial | Logit |

---

# Mathematical Form (One Predictor)

`\begin{align}
Y_i & \sim \mathrm{Dist}(\mu_i, \tau) \\
g(\mu_i) & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 X_{i}
\end{align}`

- `\(\mathrm{Dist}\)`: conditional distribution of `\(Y \mid X\)` (e.g., normal, Bernoulli, `\(\ldots\)`)
    * I.e., distribution of **prediction error**; not the marginal distribution of `\(Y\)`
- `\(\mu_i\)`: mean parameter for the `\(i\)`th observation
- `\(\eta_i\)`: linear predictor
- `\(g(\cdot)\)`: link function
- (`\(\tau\)`: dispersion parameter)

---

# Illustration

The next few slides contain example GLMs, all with the same
predictor `\(X\)`

```r
num_obs <- 100
x <- runif(num_obs, min = 1, max = 5)  # uniform x
beta0 <- 0.2; beta1 <- 0.5
```

---

# Normal, Identity Link

aka linear regression

.panelset[
.panel[.panel-name[Model]

`\begin{align}
Y_i & \sim N(\mu_i, \sigma) \\
\mu_i & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 X_{i}
\end{align}`

]
.panel[.panel-name[Simulation]

.font70[

```r
eta <- beta0 + beta1 * x
mu <- eta
y <- rnorm(num_obs, mean = mu, sd = 0.3)
```

]

<img src="glm_files/figure-html/plot-lm-1.png" width="50%" style="display: block; margin: auto;" />

]
]

---

# Poisson, Log Link

aka Poisson regression

.panelset[
.panel[.panel-name[Model]

`\begin{align}
Y_i & \sim \mathrm{Pois}(\mu_i) \\
\log(\mu_i) & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 X_{i}
\end{align}`

]
.panel[.panel-name[Simulation]

.font70[

```r
eta <- beta0 + beta1 * x
mu <- exp(eta)  # inverse link
y <- rpois(num_obs, lambda = mu)
```

]

<img src="glm_files/figure-html/plot-pois-1.png" width="50%" style="display: block; margin: auto;" />

]
]

---

# Bernoulli, Logit Link

aka binary logistic regression

.panelset[
.panel[.panel-name[Model]

`\begin{align}
Y_i & \sim \mathrm{Bern}(\mu_i) \\
\log\left(\frac{\mu_i}{1 - \mu_i}\right) & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 X_{i}
\end{align}`

]
.panel[.panel-name[Simulation]

.font70[

```r
eta <- beta0 + beta1 * x
mu <- plogis(eta)  # inverse link is logistic
y <- rbinom(num_obs, size = 1, prob = mu)
```

]

<img src="glm_files/figure-html/plot-bern-logit-1.png" width="50%" style="display: block; margin: auto;" />

]
]

---

# Binomial, Logit Link

aka binomial logistic regression

.panelset[
.panel[.panel-name[Model]

`\begin{align}
Y_i & \sim \mathrm{Bin}(N, \mu_i) \\
\log\left(\frac{\mu_i}{1 - \mu_i}\right) & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 X_{i}
\end{align}`

]
.panel[.panel-name[Simulation]

.font70[

```r
num_trials <- 10
eta <- beta0 + beta1 * x
mu <- plogis(eta)  # inverse link is logistic
y <- rbinom(num_obs, size = num_trials, prob = mu)
```

]

<img
src="glm_files/figure-html/plot-bin-logit-1.png" width="50%" style="display: block; margin: auto;" />

]
]

---

# Remarks

Different link functions can be used

- E.g., identity link or probit link for Bernoulli variables

Linearity is a strong assumption

- GLM can allow `\(\eta\)` and `\(X\)` to be nonlinearly related, as long as it's linear in the coefficients
    * E.g., `\(\eta_i = \beta_0 + \beta_1 \log(X_{i})\)`
    * E.g., `\(\eta_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2\)`
    * But not something like `\(\eta_i = \beta_0 \log(\beta_1 + x_i)\)`

---

class: inverse, middle, center

# Linear Regression

---

class: clear

Many relations can be approximated as linear

But many relations cannot be approximated as linear

---

class: clear

### Example: "Bread and peace" model

<img src="glm_files/figure-html/unnamed-chunk-4-1.png" width="70%" style="display: block; margin: auto;" />

---

# Linear Regression Model

Model:

`\begin{align}
\text{vote}_i & \sim N(\mu_i, \sigma) \\
\mu_i & = \beta_0 + \beta_1 \text{growth}_i
\end{align}`

`\(\sigma\)`: SD (margin) of prediction error

Prior:

`\begin{align}
\beta_0 & \sim N(45, 10) \\
\beta_1 & \sim N(0, 10) \\
\sigma & \sim t^+_4(0, 5)
\end{align}`

---

class: clear

.panelset[
.panel[.panel-name[Stan]

.font60[

```stan
data {
  int<lower=0> N;  // number of observations
  vector[N] y;     // outcome
  vector[N] x;     // predictor
}
parameters {
  real beta0;           // regression intercept
  real beta1;           // regression coefficient
  real<lower=0> sigma;  // SD of prediction error
}
model {
  // model
  y ~ normal(beta0 + beta1 * x, sigma);
  // prior
  beta0 ~ normal(45, 10);
  beta1 ~ normal(0, 10);
  sigma ~ student_t(4, 0, 5);
}
generated quantities {
  vector[N] y_rep;  // placeholder
  for (n in 1:N)
    y_rep[n] = normal_rng(beta0 + beta1 * x[n], sigma);
}
```

]
]
.panel[.panel-name[`brms`]

.font70[

```r
library(brms)
m1_brm <- brm(vote ~ growth,
              data = hibbs,
              family = gaussian(link = "identity"),
              prior = c(
                  prior(normal(45, 10), class = "Intercept"),
                  prior(normal(0, 10), class = "b"),
                  prior(student_t(4, 0, 5), class = "sigma")
              ),
              iter = 4000, seed = 31143)
```

```
># 
># SAMPLING FOR MODEL '11c459bfe9958a66b303e7ee3a15738a' NOW (CHAIN 1).
># Chain 1: 
># Chain 1: Gradient evaluation took 1e-05 seconds
># Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
># Chain 1: Adjust your expectations accordingly!
># Chain 1: 
># Chain 1: 
># Chain 1: Iteration:    1 / 4000 [  0%]  (Warmup)
># Chain 1: Iteration:  400 / 4000 [ 10%]  (Warmup)
># Chain 1: Iteration:  800 / 4000 [ 20%]  (Warmup)
># Chain 1: Iteration: 1200 / 4000 [ 30%]  (Warmup)
># Chain 1: Iteration: 1600 / 4000 [ 40%]  (Warmup)
># Chain 1: Iteration: 2000 / 4000 [ 50%]  (Warmup)
># Chain 1: Iteration: 2001 / 4000 [ 50%]  (Sampling)
># Chain 1: Iteration: 2400 / 4000 [ 60%]  (Sampling)
># Chain 1: Iteration: 2800 / 4000 [ 70%]  (Sampling)
># Chain 1: Iteration: 3200 / 4000 [ 80%]  (Sampling)
># Chain 1: Iteration: 3600 / 4000 [ 90%]  (Sampling)
># Chain 1: Iteration: 4000 / 4000 [100%]  (Sampling)
># Chain 1: 
># Chain 1:  Elapsed Time: 0.029007 seconds (Warm-up)
># Chain 1:                0.028928 seconds (Sampling)
># Chain 1:                0.057935 seconds (Total)
># Chain 1: 
```

]
]
.panel[.panel-name[`brms` results]

.font70[

```
>#  Family: gaussian 
>#   Links: mu = identity; sigma = identity 
># Formula: vote ~ growth 
>#    Data: hibbs (Number of observations: 16) 
>#   Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
>#          total post-warmup draws = 8000
># 
># Population-Level Effects: 
>#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
># Intercept    46.21      1.78    42.75    49.75 1.00     6338     4772
># growth        3.05      0.76     1.52     4.55 1.00     6380     4762
># 
># Family Specific Parameters: 
>#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
># sigma     3.99      0.81     2.77     5.94 1.00     5229     4889
># 
># Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
># and Tail_ESS are effective sample size measures, and Rhat is the potential
># scale reduction factor on split chains (at convergence, Rhat = 1).
```

]
]
]

---

class: clear

.pull-left[

<img src="glm_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" />

]

.pull-right[

<img src="glm_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />

]

---

# Meaning of Coefficients

When growth = 0, `\(\text{vote} \sim N(\beta_0, \sigma)\)`

When growth = 1, `\(\text{vote} \sim N(\beta_0 + \beta_1, \sigma)\)`

<img src="glm_files/figure-html/unnamed-chunk-8-1.png" width="70%" style="display: block; margin: auto;" />

---

# Posterior Predictive Check

<img src="glm_files/figure-html/unnamed-chunk-9-1.png" width="70%" style="display: block; margin: auto;" />

The model fits most of the data, but not every election. The biggest discrepancy is 1952.

---

# Prediction

.font70[

Predicted vote share when growth = 2: `\(\tilde y \mid y \sim N(\beta_0 + \beta_1 \times 2, \sigma)\)`

```r
pp_growth_eq_2 <- posterior_predict(m1_brm,
    newdata = list(growth = 2)
)
```

]

<img src="glm_files/figure-html/unnamed-chunk-11-1.png" width="50%" style="display: block; margin: auto;" />

Probability of incumbent's vote share > 50% = 0.713

---

# Table

.font70[

```r
library(modelsummary)
msummary(m1_brm,
         estimate = "{estimate} [{conf.low}, {conf.high}]",
         statistic = NULL, fmt = 2)
```

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Model 1 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> b_Intercept </td> <td style="text-align:center;"> 46.20 [42.76, 49.75] </td> </tr> <tr> <td style="text-align:left;"> b_growth </td> <td style="text-align:center;"> 3.03 [1.56, 4.56] </td> </tr> <tr> <td style="text-align:left;box-shadow: 0px 1px"> sigma </td> <td style="text-align:center;box-shadow: 0px 1px"> 3.88 [2.56, 5.51] </td> </tr> <tr> <td style="text-align:left;"> Num.Obs.
</td> <td style="text-align:center;"> 16 </td> </tr> <tr> <td style="text-align:left;"> ELPD </td> <td style="text-align:center;"> −46.1 </td> </tr> <tr> <td style="text-align:left;"> ELPD s.e. </td> <td style="text-align:center;"> 3.6 </td> </tr> <tr> <td style="text-align:left;"> LOOIC </td> <td style="text-align:center;"> 92.3 </td> </tr> <tr> <td style="text-align:left;"> LOOIC s.e. </td> <td style="text-align:center;"> 7.2 </td> </tr> <tr> <td style="text-align:left;"> WAIC </td> <td style="text-align:center;"> 92.1 </td> </tr> <tr> <td style="text-align:left;"> RMSE </td> <td style="text-align:center;"> 24.97 </td> </tr> </tbody> </table>

]

---

# Diagnostics

Linearity (functional form)

.pull-left[

<img src="glm_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" />

]

--

.pull-right[

.font70[

```r
pp_check(m_lin, type = "intervals", x = "x") +
    geom_smooth(aes(x = x, y = y), se = FALSE)
```

<img src="glm_files/figure-html/unnamed-chunk-15-1.png" width="90%" style="display: block; margin: auto;" />

]
]

---

# Residual Plots

<img src="glm_files/figure-html/unnamed-chunk-16-1.png" width="45%" style="display: block; margin: auto;" />

--

.font70[

.pull-left[

```r
pp_check(m_lin_norm, ndraws = 100, bw = "SJ")
```

<img src="glm_files/figure-html/unnamed-chunk-18-1.png" width="90%" style="display: block; margin: auto;" />

]

.pull-right[

```r
pp_check(m_lin_norm, type = "error_scatter_avg_vs_x", x = "x")
```

<img src="glm_files/figure-html/unnamed-chunk-19-1.png" width="90%" style="display: block; margin: auto;" />

]
]

---

# Prediction vs. Explanation

Is personal income growth a reason a candidate/party got more vote share?

If so, what is the mechanism?

If not, what is responsible for the association?
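
---

class: clear

### Sketch: computing a predictive probability

The probability on the Prediction slide comes from averaging posterior predictive draws. A minimal self-contained sketch: here, for illustration only, independent normal draws matching the posterior summaries reported earlier stand in for actual draws from `m1_brm`.

```r
set.seed(31143)
n_draws <- 8000
# hypothetical draws approximating the reported posterior summaries
beta0 <- rnorm(n_draws, mean = 46.2, sd = 1.8)
beta1 <- rnorm(n_draws, mean = 3.05, sd = 0.76)
sigma <- abs(rnorm(n_draws, mean = 3.99, sd = 0.81))
# posterior predictive draws of vote share when growth = 2
y_tilde <- rnorm(n_draws, mean = beta0 + beta1 * 2, sd = sigma)
mean(y_tilde > 50)  # roughly 0.7
```

With the real draws, `mean(posterior_predict(m1_brm, newdata = list(growth = 2)) > 50)` computes the same quantity.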
---

# Additional Notes

Outlier: use `\(Y_i \sim t_\nu(\mu_i, \sigma)\)`

Nonconstant `\(\sigma\)`

- One option is `\(\log(\sigma_i) = \beta^{s}_0 + \beta^{s}_1 X_i\)`

Check whether linearity holds

- Other options: splines, quadratic, log transform (i.e., lognormal model), etc.

---

class: inverse, middle, center

# Poisson Regression

---

class: clear

.font70[

- `count`: The seizure count between two visits
- `Trt`: Either 0 or 1, indicating whether the patient received anticonvulsant therapy

]

`\begin{align}
\text{count}_i & \sim \mathrm{Pois}(\mu_i) \\
\log(\mu_i) & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 \text{Trt}_{i}
\end{align}`

<img src="glm_files/figure-html/unnamed-chunk-20-1.png" width="70%" style="display: block; margin: auto;" />

---

class: clear

### Poisson with log link

Predicted seizure rate = `\(\exp(\beta_0 + \beta_1) = \exp(\beta_0) \exp(\beta_1)\)` for Trt = 1; `\(\exp(\beta_0)\)` for Trt = 0

`\(\beta_1\)` = mean difference in **log** rate of seizure; `\(\exp(\beta_1)\)` = ratio in rate of seizure

---

class: clear

.font70[

```r
m2 <- brm(count ~ Trt, data = epilepsy4,
          family = poisson(link = "log"))
```

```
>#  Family: poisson 
>#   Links: mu = log 
># Formula: count ~ Trt 
>#    Data: epilepsy4 (Number of observations: 59) 
>#   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
>#          total post-warmup draws = 4000
># 
># Population-Level Effects: 
>#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
># Intercept     2.07      0.07     1.95     2.20 1.00     3759     2833
># Trt1         -0.17      0.10    -0.36     0.01 1.00     3082     2363
># 
># Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
># and Tail_ESS are effective sample size measures, and Rhat is the potential
># scale reduction factor on split chains (at convergence, Rhat = 1).
```

]

---

class: clear

### Poisson with identity link

In this case, with one binary predictor, the link does not matter to the fit

`\begin{align}
\text{count}_i & \sim \mathrm{Pois}(\mu_i) \\
\mu_i & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 \text{Trt}_{i}
\end{align}`

`\(\beta_1\)` = mean difference in the rate of seizure in two weeks

```r
m3 <- brm(count ~ Trt, data = epilepsy4,
          family = poisson(link = "identity"))
```

---

exclude: TRUE
class: clear

.pull-left[

Prediction With Log Link

<img src="glm_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" />

]

.pull-right[

Prediction With Identity Link

<img src="glm_files/figure-html/unnamed-chunk-25-1.png" width="100%" style="display: block; margin: auto;" />

]

---

class: clear

.font70[

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> log link </th> <th style="text-align:center;"> identity link </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> b_Intercept </td> <td style="text-align:center;"> 2.07 </td> <td style="text-align:center;"> 7.97 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> [1.95, 2.20] </td> <td style="text-align:center;"> [6.94, 8.96] </td> </tr> <tr> <td style="text-align:left;"> b_Trt1 </td> <td style="text-align:center;"> −0.17 </td> <td style="text-align:center;"> −1.25 </td> </tr> <tr> <td style="text-align:left;box-shadow: 0px 1px"> </td> <td style="text-align:center;box-shadow: 0px 1px"> [−0.35, 0.02] </td> <td style="text-align:center;box-shadow: 0px 1px"> [−2.58, 0.16] </td> </tr> <tr> <td style="text-align:left;"> Num.Obs.
</td> <td style="text-align:center;"> 59 </td> <td style="text-align:center;"> 59 </td> </tr> <tr> <td style="text-align:left;"> ELPD </td> <td style="text-align:center;"> −343.1 </td> <td style="text-align:center;"> −345.1 </td> </tr> <tr> <td style="text-align:left;"> ELPD s.e. </td> <td style="text-align:center;"> 93.8 </td> <td style="text-align:center;"> 95.7 </td> </tr> <tr> <td style="text-align:left;"> LOOIC </td> <td style="text-align:center;"> 686.2 </td> <td style="text-align:center;"> 690.2 </td> </tr> <tr> <td style="text-align:left;"> LOOIC s.e. </td> <td style="text-align:center;"> 187.7 </td> <td style="text-align:center;"> 191.3 </td> </tr> <tr> <td style="text-align:left;"> WAIC </td> <td style="text-align:center;"> 688.5 </td> <td style="text-align:center;"> 687.8 </td> </tr> <tr> <td style="text-align:left;"> RMSE </td> <td style="text-align:center;"> 10.50 </td> <td style="text-align:center;"> 10.53 </td> </tr> </tbody> </table> ]
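
---

class: clear

### Sketch: rate ratio under the log link

As a numeric illustration of the earlier statement that `\(\exp(\beta_1)\)` is the ratio of seizure rates, plug in the point estimates from the log-link column above (a back-of-the-envelope check, not model code):

```r
beta0 <- 2.07   # log-link intercept estimate
beta1 <- -0.17  # log-link Trt1 estimate
rate_trt0 <- exp(beta0)          # predicted rate for Trt = 0
rate_trt1 <- exp(beta0 + beta1)  # predicted rate for Trt = 1
rate_trt1 / rate_trt0            # equals exp(beta1)
```

The two rates (about 7.9 and 6.7) line up with the identity-link estimates, `\(\beta_0 \approx 7.97\)` and `\(\beta_0 + \beta_1 \approx 6.72\)`, as expected with a single binary predictor.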