class: center, middle, inverse, title-slide

# Generalized Linear Model

## PSYC 573

### University of Southern California

### March 22, 2022

---

# Regression for Prediction

One outcome `\(Y\)`, one or more predictors `\(X_1\)`, `\(X_2\)`, `\(\ldots\)`

E.g.,

- What will a student's college GPA be given an SAT score of `\(x\)`?
- How long will a person live if the person adopts diet `\(x\)`?
- What will the earth's global temperature be if the carbon emission level is `\(x\)`?

---

# Keep These in Mind

1. The likelihood function is defined for the outcome `\(Y\)`
2. Prediction is probabilistic (i.e., uncertain) and contains error

---

class: center, middle, inverse

# Generalized Linear Models (GLM)

---

# GLM

Three components:

- Conditional distribution of `\(Y\)`
- Link function
- Linear predictor

---

# Some Examples

| Outcome type | Support | Distributions | Link |
|:------------:|:----------------:|:-------------:|:----:|
| continuous | (`\(-\infty\)`, `\(\infty\)`) | Normal | Identity |
| count (fixed duration) | {0, 1, `\(\ldots\)`} | Poisson | Log |
| count (known # of trials) | {0, 1, `\(\ldots\)`, `\(N\)`} | Binomial | Logit |
| binary | {0, 1} | Bernoulli | Logit |
| ordinal | {0, 1, `\(\ldots\)`, `\(K\)`} | categorical | Logit |
| nominal | `\(K\)`-vector of {0, 1} | categorical | Logit |
| multinomial | `\(K\)`-vector of {0, 1, `\(\ldots\)`, `\(N\)`} | multinomial | Logit |

---

# Mathematical Form (One Predictor)

`\begin{align}
Y_i & \sim \mathrm{Dist}(\mu_i, \tau) \\
g(\mu_i) & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 X_{i}
\end{align}`

- `\(\mathrm{Dist}\)`: conditional distribution of `\(Y \mid X\)` (e.g., normal, Bernoulli, `\(\ldots\)`)
    * I.e., distribution of **prediction error**; not the marginal distribution of `\(Y\)`
- `\(\mu_i\)`: mean parameter for the `\(i\)`th observation
- `\(\eta_i\)`: linear predictor
- `\(g(\cdot)\)`: link function
- (`\(\tau\)`: dispersion parameter)

---

# Illustration

The next few slides contain example GLMs, all with the same
predictor `\(X\)`

```r
num_obs <- 100
x <- runif(num_obs, min = 1, max = 5)  # uniform x
beta0 <- 0.2; beta1 <- 0.5
```

---

# Normal, Identity Link

aka linear regression

.panelset[
.panel[.panel-name[Model]

`\begin{align}
Y_i & \sim N(\mu_i, \sigma) \\
\mu_i & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 X_{i}
\end{align}`

]
.panel[.panel-name[Simulation]

.font70[

```r
eta <- beta0 + beta1 * x
mu <- eta
y <- rnorm(num_obs, mean = mu, sd = 0.3)
```

]

<img src="glm_files/figure-html/plot-lm-1.png" width="50%" style="display: block; margin: auto;" />

]
]

---

# Poisson, Log Link

aka Poisson regression

.panelset[
.panel[.panel-name[Model]

`\begin{align}
Y_i & \sim \mathrm{Pois}(\mu_i) \\
\log(\mu_i) & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 X_{i}
\end{align}`

]
.panel[.panel-name[Simulation]

.font70[

```r
eta <- beta0 + beta1 * x
mu <- exp(eta)  # inverse link
y <- rpois(num_obs, lambda = mu)
```

]

<img src="glm_files/figure-html/plot-pois-1.png" width="50%" style="display: block; margin: auto;" />

]
]

---

# Bernoulli, Logit Link

aka binary logistic regression

.panelset[
.panel[.panel-name[Model]

`\begin{align}
Y_i & \sim \mathrm{Bern}(\mu_i) \\
\log\left(\frac{\mu_i}{1 - \mu_i}\right) & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 X_{i}
\end{align}`

]
.panel[.panel-name[Simulation]

.font70[

```r
eta <- beta0 + beta1 * x
mu <- plogis(eta)  # inverse link is logistic
y <- rbinom(num_obs, size = 1, prob = mu)
```

]

<img src="glm_files/figure-html/plot-bern-logit-1.png" width="50%" style="display: block; margin: auto;" />

]
]

---

# Binomial, Logit Link

aka binomial logistic regression

.panelset[
.panel[.panel-name[Model]

`\begin{align}
Y_i & \sim \mathrm{Bin}(N, \mu_i) \\
\log\left(\frac{\mu_i}{1 - \mu_i}\right) & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 X_{i}
\end{align}`

]
.panel[.panel-name[Simulation]

.font70[

```r
num_trials <- 10
eta <- beta0 + beta1 * x
mu <- plogis(eta)  # inverse link is logistic
y <- rbinom(num_obs, size = num_trials, prob = mu)
```

]

<img
src="glm_files/figure-html/plot-bin-logit-1.png" width="50%" style="display: block; margin: auto;" />

]
]

---

# Remarks

Different link functions can be used

- E.g., identity link or probit link for Bernoulli variables

Linearity is a strong assumption

- GLM can allow `\(\eta\)` and `\(X\)` to be nonlinearly related, as long as it's linear in the coefficients
    * E.g., `\(\eta_i = \beta_0 + \beta_1 \log(X_{i})\)`
    * E.g., `\(\eta_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2\)`
    * But not something like `\(\eta_i = \beta_0 \log(\beta_1 + x_i)\)`

---

class: inverse, middle, center

# Linear Regression

---

class: clear

Many relations can be approximated as linear

But many relations cannot be approximated as linear

---

class: clear

### Example: "Bread and peace" model

<img src="glm_files/figure-html/unnamed-chunk-4-1.png" width="70%" style="display: block; margin: auto;" />

---

# Linear Regression Model

Model:

`\begin{align}
\text{vote}_i & \sim N(\mu_i, \sigma) \\
\mu_i & = \beta_0 + \beta_1 \text{growth}_i
\end{align}`

`\(\sigma\)`: SD (margin) of prediction error

Prior:

`\begin{align}
\beta_0 & \sim N(45, 10) \\
\beta_1 & \sim N(0, 10) \\
\sigma & \sim t^+_4(0, 5)
\end{align}`

---

class: clear

.panelset[
.panel[.panel-name[Stan]

.font60[

```stan
data {
  int<lower=0> N;  // number of observations
  vector[N] y;     // outcome
  vector[N] x;     // predictor
}
parameters {
  real beta0;           // regression intercept
  real beta1;           // regression coefficient
  real<lower=0> sigma;  // SD of prediction error
}
model {
  // model
  y ~ normal(beta0 + beta1 * x, sigma);
  // prior
  beta0 ~ normal(45, 10);
  beta1 ~ normal(0, 10);
  sigma ~ student_t(4, 0, 5);
}
generated quantities {
  vector[N] y_rep;  // placeholder
  for (n in 1:N)
    y_rep[n] = normal_rng(beta0 + beta1 * x[n], sigma);
}
```

]
]
.panel[.panel-name[`brms`]

.font70[

```r
library(brms)
m1_brm <- brm(vote ~ growth,
              data = hibbs,
              family = gaussian(link = "identity"),
              prior = c(
                  prior(normal(45, 10), class = "Intercept"),
                  prior(normal(0, 10), class = "b"),
                  prior(student_t(4, 0, 5), class = "sigma")
              ),
              iter = 4000, seed = 31143)
```

```
># 
># SAMPLING FOR MODEL '11c459bfe9958a66b303e7ee3a15738a' NOW (CHAIN 1).
># Chain 1: 
># Chain 1: Gradient evaluation took 1e-05 seconds
># Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
># Chain 1: Adjust your expectations accordingly!
># Chain 1: 
># Chain 1: 
># Chain 1: Iteration:    1 / 4000 [  0%]  (Warmup)
># Chain 1: Iteration:  400 / 4000 [ 10%]  (Warmup)
># Chain 1: Iteration:  800 / 4000 [ 20%]  (Warmup)
># Chain 1: Iteration: 1200 / 4000 [ 30%]  (Warmup)
># Chain 1: Iteration: 1600 / 4000 [ 40%]  (Warmup)
># Chain 1: Iteration: 2000 / 4000 [ 50%]  (Warmup)
># Chain 1: Iteration: 2001 / 4000 [ 50%]  (Sampling)
># Chain 1: Iteration: 2400 / 4000 [ 60%]  (Sampling)
># Chain 1: Iteration: 2800 / 4000 [ 70%]  (Sampling)
># Chain 1: Iteration: 3200 / 4000 [ 80%]  (Sampling)
># Chain 1: Iteration: 3600 / 4000 [ 90%]  (Sampling)
># Chain 1: Iteration: 4000 / 4000 [100%]  (Sampling)
># Chain 1: 
># Chain 1:  Elapsed Time: 0.029007 seconds (Warm-up)
># Chain 1:                0.028928 seconds (Sampling)
># Chain 1:                0.057935 seconds (Total)
># Chain 1: 
```

]
]
.panel[.panel-name[`brms` results]

.font70[

```
>#  Family: gaussian 
>#   Links: mu = identity; sigma = identity 
># Formula: vote ~ growth 
>#    Data: hibbs (Number of observations: 16) 
>#   Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
>#          total post-warmup draws = 8000
># 
># Population-Level Effects: 
>#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
># Intercept    46.21      1.78    42.75    49.75 1.00     6338     4772
># growth        3.05      0.76     1.52     4.55 1.00     6380     4762
># 
># Family Specific Parameters: 
>#       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
># sigma     3.99      0.81     2.77     5.94 1.00     5229     4889
># 
># Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
># and Tail_ESS are effective sample size measures, and Rhat is the potential
># scale reduction factor on split chains (at convergence, Rhat = 1).
```

]
]
]

---

class: clear

.pull-left[

<img src="glm_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" />

]

.pull-right[

<img src="glm_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />

]

---

# Meaning of Coefficients

When growth = 0, `\(\text{vote} \sim N(\beta_0, \sigma)\)`

When growth = 1, `\(\text{vote} \sim N(\beta_0 + \beta_1, \sigma)\)`

<img src="glm_files/figure-html/unnamed-chunk-8-1.png" width="70%" style="display: block; margin: auto;" />

---

# Posterior Predictive Check

<img src="glm_files/figure-html/unnamed-chunk-9-1.png" width="70%" style="display: block; margin: auto;" />

The model fits most of the data, but not every election. The biggest discrepancy is 1952.

---

# Prediction

.font70[

Predicted vote share when growth = 2: `\(\tilde y \mid y \sim N(\beta_0 + \beta_1 \times 2, \sigma)\)`

```r
pp_growth_eq_2 <- posterior_predict(m1_brm,
    newdata = list(growth = 2)
)
```

]

<img src="glm_files/figure-html/unnamed-chunk-11-1.png" width="50%" style="display: block; margin: auto;" />

Probability of incumbent's vote share > 50% = 0.713

---

# Table

.font70[

```r
library(modelsummary)
msummary(m1_brm,
         estimate = "{estimate} [{conf.low}, {conf.high}]",
         statistic = NULL, fmt = 2)
```

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> Model 1 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> b_Intercept </td> <td style="text-align:center;"> 46.20 [42.76, 49.75] </td> </tr> <tr> <td style="text-align:left;"> b_growth </td> <td style="text-align:center;"> 3.03 [1.56, 4.56] </td> </tr> <tr> <td style="text-align:left;box-shadow: 0px 1px"> sigma </td> <td style="text-align:center;box-shadow: 0px 1px"> 3.88 [2.56, 5.51] </td> </tr> <tr> <td style="text-align:left;"> Num.Obs.
</td> <td style="text-align:center;"> 16 </td> </tr> <tr> <td style="text-align:left;"> ELPD </td> <td style="text-align:center;"> −46.1 </td> </tr> <tr> <td style="text-align:left;"> ELPD s.e. </td> <td style="text-align:center;"> 3.6 </td> </tr> <tr> <td style="text-align:left;"> LOOIC </td> <td style="text-align:center;"> 92.3 </td> </tr> <tr> <td style="text-align:left;"> LOOIC s.e. </td> <td style="text-align:center;"> 7.2 </td> </tr> <tr> <td style="text-align:left;"> WAIC </td> <td style="text-align:center;"> 92.1 </td> </tr> <tr> <td style="text-align:left;"> RMSE </td> <td style="text-align:center;"> 24.97 </td> </tr> </tbody> </table>

]

---

# Diagnostics

Linearity (functional form)

.pull-left[

<img src="glm_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" />

]

--

.pull-right[

.font70[

```r
pp_check(m_lin, type = "intervals", x = "x") +
    geom_smooth(aes(x = x, y = y), se = FALSE)
```

<img src="glm_files/figure-html/unnamed-chunk-15-1.png" width="90%" style="display: block; margin: auto;" />

]
]

---

# Residual Plots

<img src="glm_files/figure-html/unnamed-chunk-16-1.png" width="45%" style="display: block; margin: auto;" />

--

.font70[

.pull-left[

```r
pp_check(m_lin_norm, ndraws = 100, bw = "SJ")
```

<img src="glm_files/figure-html/unnamed-chunk-18-1.png" width="90%" style="display: block; margin: auto;" />

]

.pull-right[

```r
pp_check(m_lin_norm, type = "error_scatter_avg_vs_x", x = "x")
```

<img src="glm_files/figure-html/unnamed-chunk-19-1.png" width="90%" style="display: block; margin: auto;" />

]
]

---

# Prediction vs. Explanation

Is personal income growth a reason a candidate/party got more vote share?

If so, what is the mechanism?

If not, what is responsible for the association?
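
---

class: clear

### Sketch: computing a predictive probability

The probability on the Prediction slide comes from averaging posterior predictive draws. A minimal self-contained sketch: here, for illustration only, independent normal draws matching the posterior summaries reported earlier stand in for actual draws from `m1_brm`.

```r
set.seed(31143)
n_draws <- 8000
# hypothetical draws approximating the reported posterior summaries
beta0 <- rnorm(n_draws, mean = 46.2, sd = 1.8)
beta1 <- rnorm(n_draws, mean = 3.05, sd = 0.76)
sigma <- abs(rnorm(n_draws, mean = 3.99, sd = 0.81))
# posterior predictive draws of vote share when growth = 2
y_tilde <- rnorm(n_draws, mean = beta0 + beta1 * 2, sd = sigma)
mean(y_tilde > 50)  # roughly 0.7
```

With the real draws, `mean(posterior_predict(m1_brm, newdata = list(growth = 2)) > 50)` computes the same quantity.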
---

# Additional Notes

Outlier: use `\(Y_i \sim t_\nu(\mu_i, \sigma)\)`

Nonconstant `\(\sigma\)`

- One option is `\(\log(\sigma_i) = \beta^{s}_0 + \beta^{s}_1 X_i\)`

Check whether linearity holds

- Other options: splines, quadratic, log transform (i.e., lognormal model), etc.

---

class: inverse, middle, center

# Poisson Regression

---

class: clear

.font70[

- `count`: The seizure count between two visits
- `Trt`: Either 0 or 1, indicating whether the patient received anticonvulsant therapy

]

`\begin{align}
\text{count}_i & \sim \mathrm{Pois}(\mu_i) \\
\log(\mu_i) & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 \text{Trt}_{i}
\end{align}`

<img src="glm_files/figure-html/unnamed-chunk-20-1.png" width="70%" style="display: block; margin: auto;" />

---

class: clear

### Poisson with log link

Predicted seizure rate = `\(\exp(\beta_0 + \beta_1) = \exp(\beta_0) \exp(\beta_1)\)` for Trt = 1; `\(\exp(\beta_0)\)` for Trt = 0

`\(\beta_1\)` = mean difference in **log** rate of seizure; `\(\exp(\beta_1)\)` = ratio in rate of seizure

---

class: clear

.font70[

```r
m2 <- brm(count ~ Trt, data = epilepsy4,
          family = poisson(link = "log"))
```

```
>#  Family: poisson 
>#   Links: mu = log 
># Formula: count ~ Trt 
>#    Data: epilepsy4 (Number of observations: 59) 
>#   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
>#          total post-warmup draws = 4000
># 
># Population-Level Effects: 
>#           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
># Intercept     2.07      0.07     1.95     2.20 1.00     3759     2833
># Trt1         -0.17      0.10    -0.36     0.01 1.00     3082     2363
># 
># Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
># and Tail_ESS are effective sample size measures, and Rhat is the potential
># scale reduction factor on split chains (at convergence, Rhat = 1).
```

]

---

class: clear

### Poisson with identity link

In this case, with one binary predictor, the link does not matter to the fit

`\begin{align}
\text{count}_i & \sim \mathrm{Pois}(\mu_i) \\
\mu_i & = \eta_i \\
\eta_i & = \beta_0 + \beta_1 \text{Trt}_{i}
\end{align}`

`\(\beta_1\)` = mean difference in the rate of seizure in two weeks

```r
m3 <- brm(count ~ Trt, data = epilepsy4,
          family = poisson(link = "identity"))
```

---

exclude: TRUE
class: clear

.pull-left[

Prediction With Log Link

<img src="glm_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" />

]

.pull-right[

Prediction With Identity Link

<img src="glm_files/figure-html/unnamed-chunk-25-1.png" width="100%" style="display: block; margin: auto;" />

]

---

class: clear

.font70[

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> log link </th> <th style="text-align:center;"> identity link </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> b_Intercept </td> <td style="text-align:center;"> 2.07 </td> <td style="text-align:center;"> 7.97 </td> </tr> <tr> <td style="text-align:left;"> </td> <td style="text-align:center;"> [1.95, 2.20] </td> <td style="text-align:center;"> [6.94, 8.96] </td> </tr> <tr> <td style="text-align:left;"> b_Trt1 </td> <td style="text-align:center;"> −0.17 </td> <td style="text-align:center;"> −1.25 </td> </tr> <tr> <td style="text-align:left;box-shadow: 0px 1px"> </td> <td style="text-align:center;box-shadow: 0px 1px"> [−0.35, 0.02] </td> <td style="text-align:center;box-shadow: 0px 1px"> [−2.58, 0.16] </td> </tr> <tr> <td style="text-align:left;"> Num.Obs.
</td> <td style="text-align:center;"> 59 </td> <td style="text-align:center;"> 59 </td> </tr> <tr> <td style="text-align:left;"> ELPD </td> <td style="text-align:center;"> −343.1 </td> <td style="text-align:center;"> −345.1 </td> </tr> <tr> <td style="text-align:left;"> ELPD s.e. </td> <td style="text-align:center;"> 93.8 </td> <td style="text-align:center;"> 95.7 </td> </tr> <tr> <td style="text-align:left;"> LOOIC </td> <td style="text-align:center;"> 686.2 </td> <td style="text-align:center;"> 690.2 </td> </tr> <tr> <td style="text-align:left;"> LOOIC s.e. </td> <td style="text-align:center;"> 187.7 </td> <td style="text-align:center;"> 191.3 </td> </tr> <tr> <td style="text-align:left;"> WAIC </td> <td style="text-align:center;"> 688.5 </td> <td style="text-align:center;"> 687.8 </td> </tr> <tr> <td style="text-align:left;"> RMSE </td> <td style="text-align:center;"> 10.50 </td> <td style="text-align:center;"> 10.53 </td> </tr> </tbody> </table> ]
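
---

class: clear

### Sketch: rate ratio under the log link

As a numeric illustration of the earlier statement that `\(\exp(\beta_1)\)` is the ratio of seizure rates, plug in the point estimates from the log-link column above (a back-of-the-envelope check, not model code):

```r
beta0 <- 2.07   # log-link intercept estimate
beta1 <- -0.17  # log-link Trt1 estimate
rate_trt0 <- exp(beta0)          # predicted rate for Trt = 0
rate_trt1 <- exp(beta0 + beta1)  # predicted rate for Trt = 1
rate_trt1 / rate_trt0            # equals exp(beta1)
```

The two rates (about 7.9 and 6.7) line up with the identity-link estimates, `\(\beta_0 \approx 7.97\)` and `\(\beta_0 + \beta_1 \approx 6.72\)`, as expected with a single binary predictor.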