library(tidyverse)
library(here) # for easier management of file paths
library(rstan)
rstan_options(auto_write = TRUE) # save compiled Stan object
library(brms) # simplify fitting Stan GLM models
library(posterior) # for summarizing draws
library(bayesplot) # for plotting
library(modelsummary) # table for brms
theme_set(theme_classic() +
theme(panel.grid.major.y = element_line(color = "grey92")))
We will use an example from this paper (https://journals.sagepub.com/doi/abs/10.1177/0956797616645672), which examines how common it was for researchers to report marginal \(p\) values (i.e., .05 < \(p\) \(\leq\) .10). The outcome is whether a study reported one or more marginal \(p\) values (1 = Yes, 0 = No). The researchers also categorized the studies into three subfields (Cognitive Psychology, Developmental Psychology, Social Psychology), and there were a total of 1,535 studies examined from 1970 to 2010.
# readxl package can be used to import excel files
datfile <- here("data_files", "marginalp.xlsx")
if (!file.exists(datfile)) {
dir.create(here("data_files"))
download.file("https://osf.io/r4njf/download",
destfile = datfile
)
}
marginalp <- readxl::read_excel(datfile)
# Recode `Field` into a factor
marginalp <- marginalp %>%
# Filter out studies without any experiments
filter(`Number of Experiments` >= 1) %>%
mutate(Field = factor(Field,
labels = c(
"Cognitive Psychology",
"Developmental Psychology",
"Social Psychology"
)
)) %>%
# Rename the outcome
rename(marginal_p = `Marginals Yes/No`)
# Proportion of marginal p in each subfield across Years
marginalp %>%
ggplot(aes(x = Year, y = marginal_p)) +
stat_summary(aes(fill = Field), geom = "ribbon", alpha = 0.3) +
stat_summary(aes(col = Field), geom = "line") +
stat_summary(aes(col = Field), geom = "point") +
coord_cartesian(ylim = c(0, 0.7)) +
facet_wrap(~ Field) +
theme(legend.position = "top")
The issue of centering applies to most regression models, including GLMs. Because the predictor Year has values 1970, 1980, etc., if we used this variable as a predictor as-is, the intercept \(\beta_0\) would be the predicted value of \(\eta\) when Year = 0, which is about 2,000 years before the data were collected. Instead, we usually want 0 to be a meaningful and/or possible value in the data. So we will center the variable by making 0 correspond to the year 1970. In addition, we will divide the values by 10 so that one unit in the new predictor means 10 years.
In other words, we will use the transformed predictor Year10 = (Year - 1970) / 10. So when Year = 1970, Year10 = 0; when Year = 1990, Year10 = 2.
marginalp <- marginalp %>%
mutate(Year10 = (Year - 1970) / 10)
# Check the recode
distinct(marginalp, Year, Year10)
#> # A tibble: 5 × 2
#> Year Year10
#> <dbl> <dbl>
#> 1 1980 1
#> 2 1970 0
#> 3 2000 3
#> 4 1990 2
#> 5 2010 4
For this example, I’ll look at the subset for developmental psychology. You may want to try with the other subsets of the data for practice.
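One way to get that subset (the object name marginalp_dev and the exact code are illustrative; the printed rows below show part of this subset):
# Keep only the developmental psychology studies
marginalp_dev <- filter(
    marginalp,
    Field == "Developmental Psychology"
)
head(marginalp_dev)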
#> # A tibble: 6 × 9
#> Field `Paper Title` Authors Year `Number of Exp…` marginal_p
#> <fct> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Development… A New Look a… Christ… 2010 1 1
#> 2 Development… A Tale to Tw… daniel… 2010 1 1
#> 3 Development… Associations… Rachel… 2010 1 1
#> 4 Development… Attachment-b… kathyr… 2000 1 1
#> 5 Development… Effects of A… Timoth… 1990 1 1
#> 6 Development… Effortful co… grazyn… 2000 1 1
#> # … with 3 more variables: `First or Only Marginal` <dbl>,
#> # `Conventional Threshold` <dbl>, Year10 <dbl>
As the name “logistic regression” suggests, the logistic function is used somewhere in the model. The logistic function, or the inverse logit function, is \[\mu = g^{-1}(\eta) = \frac{\exp(\eta)}{1 + \exp(\eta)},\] which is a transformation from \(\eta\) to \(\mu\). The link function, which is a transformation from \(\mu\) to \(\eta\), is the logit function, \[\eta = g(\mu) = \log\frac{\mu}{1 - \mu},\] which converts \(\mu\), which is in the probability metric, to \(\eta\), which is in the log-odds metric (odds = probability / [1 - probability]).
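In R, these two transformations are available as plogis() (logistic) and qlogis() (logit); a quick check:
# The logistic (inverse link) and logit (link) functions in base R
eta <- 1
mu <- exp(eta) / (1 + exp(eta))  # logistic: eta -> mu
log(mu / (1 - mu))               # logit: mu -> back to eta = 1
plogis(eta)                      # same as the logistic transformation above
qlogis(mu)                       # same as the logit transformation above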
Model:
\[\begin{align} \text{marginal_p}_i & \sim \mathrm{Bern}(\mu_i) \\ \mathrm{logit}(\mu_i) & = \eta_i \\ \eta_i & = \beta_0 + \beta_1 \text{Year10}_{i} \end{align}\]
Priors:
\[\begin{align} \beta_0 & \sim t_4(0, 2.5) \\ \beta_1 & \sim t_4(0, 1) \end{align}\]
There has been previous literature on what priors on the \(\beta\) coefficients in logistic regression would be appropriate (see this paper). \(\beta\) coefficients in logistic regression can be relatively large, unlike in normal regression, so it has been pointed out that heavy-tailed distributions, like the Cauchy and \(t\), are more appropriate. Recent discussions have settled on \(t\) distributions with a small number of degrees of freedom as a good balance between heavy tails and efficiency for MCMC sampling. I use \(t_4(0, 2.5)\) for \(\beta_0\), which puts most of the density in [-2.5, 2.5] on the log-odds scale. This means that, in the year 1970, our prior belief is that the probability of a marginally significant result is somewhere between .076 and .924, which covers a reasonable range. For \(\beta_1\), I use \(t_4(0, 1)\), which says that a 10-year difference likely corresponds to a difference in log odds between -1 and 1. While differences in log odds are hard to interpret, a quick rule of thumb is to divide \(\beta_1\) by 4, which gives roughly the maximum difference in probability for a one-unit difference in the predictor. So if \(\beta_1\) = 1, the maximum difference in probability is no more than 25 percentage points. My prior thus says it is unlikely that the probability of reporting marginal \(p\) values increases or decreases by more than 25 percentage points every 10 years, which is still a weak statement. If you're uncertain, you can consider something even weaker.
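A quick check of what the prior scale for \(\beta_0\) implies on the probability scale:
# Probabilities implied by log odds of -2.5 and 2.5
plogis(c(-2.5, 2.5))  # roughly .076 and .924, as noted above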
The priors are chosen to be weakly informative.
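Here is a sketch of how m4 could be fit with brms, using the marginalp_dev subset and the priors stated above (additional settings such as the number of iterations are left at their defaults here):
# Sketch: Bernoulli logistic regression of marginal_p on Year10
m4 <- brm(
    marginal_p ~ Year10,
    data = marginalp_dev,
    family = bernoulli(link = "logit"),
    prior = prior(student_t(4, 0, 2.5), class = "Intercept") +
        prior(student_t(4, 0, 1), class = "b")
)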
Note that in brms we used family = bernoulli(). Other R functions, such as glm(), do not distinguish between the Bernoulli and the binomial distributions and only recognize family = binomial(), as a Bernoulli variable is a binomial variable with \(n = 1\).
Any non-linear relationship will involve more work in interpretation, and the coefficients in logistic regression are no exception. A good way to start is to plot the model-implied association on the original unit of the data. The conditional_effects() function in brms comes in very handy, and I recommend you always start with it:
plot(
conditional_effects(m4, prob = .90),
points = TRUE,
point_args = list(height = 0.01, width = 0.05, alpha = 0.05)
)
As you can see, the logistic model implies a somewhat non-linear association between Year10 and the outcome. From the graph, when Year10 = 2, which is the year 1990, the predicted probability of a marginally significant result is about 25%, with a 90% credible interval of roughly 21% to 28%. We can get that using
posterior_epred(m4, newdata = list(Year10 = 2)) %>%
summarise_draws()
#> # A tibble: 1 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 ...1 0.246 0.245 0.0193 0.0190 0.214 0.278 1.00 5651.
#> # … with 1 more variable: ess_tail <dbl>
From the equation, when all predictors are zero, we have \[\mathrm{logit}(\mu_i) = \beta_0.\]
Therefore, the intercept is the log odds that a study reported a marginally significant \(p\) value when Year10 = 0 (i.e., in 1970), which was estimated to be -1.81, 90% CI [-2.14, -1.51]. As log odds are not as intuitive as probabilities, it is common to instead interpret \(\hat{\mu} = \mathrm{logistic}(\beta_0)\), the conditional probability that marginal_p = 1 in 1970. For Bayesians, that means obtaining the posterior distribution of \(\mathrm{logistic}(\beta_0)\), which can be done with
m4_draws <- as_draws_array(m4)
m4_draws <- m4_draws %>%
mutate_variables(logistic_beta0 = plogis(b_Intercept))
summarize_draws(m4_draws)
#> # A tibble: 5 × 10
#> variable mean median sd mad q5 q95 rhat
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 b_Intercept -1.81 -1.81 0.191 0.194 -2.14 -1.51 1.00
#> 2 b_Year10 0.344 0.343 0.0687 0.0689 0.231 0.458 1.00
#> 3 lprior -3.08 -3.08 0.0410 0.0406 -3.15 -3.02 1.00
#> 4 lp__ -295. -295. 1.01 0.726 -297. -294. 1.00
#> 5 logistic_be… 0.142 0.141 0.0229 0.0234 0.106 0.181 1.00
#> # … with 2 more variables: ess_bulk <dbl>, ess_tail <dbl>
The bayesplot package allows you to plot transformed parameters quickly:
mcmc_areas(m4, pars = "b_Intercept",
transformations = list("b_Intercept" = "plogis"),
bw = "SJ")
The slope, \(\beta_1\), represents the difference in predicted log odds between two observations that differ by one unit on the predictor. For example, for two studies that are 1 unit apart on Year10 (i.e., 10 years), we have \[\mathrm{logit}(\mu_{\text{Year10} = x + 1}) - \mathrm{logit}(\mu_{\text{Year10} = x}) = \beta_1.\] Again, differences in log odds are hard to interpret, so we exponentiate to get \[\frac{\mathrm{odds}_{\text{Year10} = x + 1}}{\mathrm{odds}_{\text{Year10} = x}} = \exp(\beta_1).\]
The fraction on the left-hand side is the odds ratio of reporting a marginal \(p\) value associated with a one-unit difference in Year10 (i.e., 10 years). An odds of 1.0 means that success and failure are equally likely; an odds > 1 means success is more likely than failure; and an odds < 1 means success is less likely than failure. Again, for Bayesians, we can obtain the posterior distribution of \(\exp(\beta_1)\) with
m4_draws <- m4_draws %>%
mutate_variables(exp_beta1 = exp(b_Year10))
summarize_draws(m4_draws)
#> # A tibble: 6 × 10
#> variable mean median sd mad q5 q95 rhat
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 b_Intercept -1.81 -1.81 0.191 0.194 -2.14 -1.51 1.00
#> 2 b_Year10 0.344 0.343 0.0687 0.0689 0.231 0.458 1.00
#> 3 lprior -3.08 -3.08 0.0410 0.0406 -3.15 -3.02 1.00
#> 4 lp__ -295. -295. 1.01 0.726 -297. -294. 1.00
#> 5 logistic_be… 0.142 0.141 0.0229 0.0234 0.106 0.181 1.00
#> 6 exp_beta1 1.41 1.41 0.0974 0.0970 1.26 1.58 1.00
#> # … with 2 more variables: ess_bulk <dbl>, ess_tail <dbl>
Using the posterior mean, we predict that the odds of reporting a marginal \(p\) value for a study that is 10 years later are multiplied by 1.41, 90% CI [1.26, 1.58]; in other words, every 10 years, the odds increase by about 41%. Here’s the posterior density plot for the odds ratio:
mcmc_areas(m4, pars = "b_Year10",
transformations = list("b_Year10" = "exp"),
bw = "SJ")
The odds ratio (OR) is popular because the multiplicative effect is constant, which makes interpretation easier. Also, in medical research and some other areas, the OR can be an excellent approximation of the relative risk, the ratio of the probabilities in two groups, for rare diseases or events. However, odds and odds ratios are not intuitive metrics, and in many situations a large odds ratio can be misleading because it may correspond to only a very small effect on the probability scale. Therefore, in general I recommend interpreting coefficients in the probability metric, even though that means more work.
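As a quick illustration of the last point, an odds ratio of 2 applied to a rare outcome changes the probability very little:
# OR = 2 with a baseline probability of .001
p0 <- 0.001
odds1 <- 2 * p0 / (1 - p0)  # double the baseline odds
p1 <- odds1 / (1 + odds1)   # convert back to a probability
p1 - p0                     # about .001, i.e., a tenth of a percentage point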
Another way to interpret a logistic regression coefficient is to examine the change in probability. Because the predicted probability is a non-linear function of the predictors, a one-unit difference in the predictor means different things depending on the values of \(X\) you choose for interpretation. You can see that in the conditional_effects() plot earlier.
Consider the predicted probability of reporting a marginal \(p\) value at Year10 = 0 (1970), Year10 = 1 (1980), and so on up to Year10 = 4 (2010):
m4_pred <- posterior_epred(m4, newdata = list(Year10 = 0:4))
colnames(m4_pred) <- paste0("Year10=", 0:4)
summarise_draws(m4_pred)
#> # A tibble: 5 × 10
#> variable mean median sd mad q5 q95 rhat ess_bulk
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Year10=0 0.142 0.141 0.0229 0.0234 0.106 0.181 1.00 4503.
#> 2 Year10=1 0.188 0.188 0.0208 0.0209 0.155 0.223 1.00 4674.
#> 3 Year10=2 0.246 0.245 0.0193 0.0190 0.214 0.278 1.00 5651.
#> 4 Year10=3 0.314 0.314 0.0240 0.0236 0.276 0.354 1.00 7305.
#> 5 Year10=4 0.393 0.393 0.0363 0.0363 0.334 0.453 1.00 6868.
#> # … with 1 more variable: ess_tail <dbl>
As you can see, the predicted difference in probability is smaller when comparing Year10 = 0 and 1, and larger when comparing Year10 = 3 and 4.
A quick approximation is to divide the coefficient by 4 to get an upper bound on the change in probability associated with a one-unit change in the predictor. In our example, this is 0.344 / 4 = 0.086, which is very close to the predicted difference in probability from Year10 = 3 to Year10 = 4.
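If you want the posterior distribution of those consecutive differences directly, one option is to work with the m4_pred draws from above:
# Change in predicted probability for each consecutive 10-year step
m4_diff <- m4_pred[, 2:5] - m4_pred[, 1:4]
colnames(m4_diff) <- paste0("Year10=", 1:4, " - Year10=", 0:3)
summarise_draws(m4_diff)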
pp_check(
m4,
type = "error_binned"
)
The linear model is not the best fit.
Use bs() to specify a linear spline (degree = 1) with one turning point (knots = 3).
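The fitting code for m5 is not shown here; a minimal sketch consistent with the coefficient names in the comparison table below (again using the marginalp_dev subset) would be:
library(splines)  # for bs()
# Sketch: linear spline (degree 1) with a knot at Year10 = 3
m5 <- brm(
    marginal_p ~ bs(Year10, degree = 1, knots = 3),
    data = marginalp_dev,
    family = bernoulli(link = "logit"),
    prior = prior(student_t(4, 0, 1), class = "b")
)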
plot(
conditional_effects(m5),
points = TRUE,
point_args = list(height = 0.01, width = 0.05, alpha = 0.05)
)
pp_check(
m5,
type = "error_binned",
x = "Year10"
)
msummary(list(linear = m4, `linear spline` = m5),
estimate = "{estimate} [{conf.low}, {conf.high}]",
statistic = NULL, fmt = 2)
|  | linear | linear spline |
|---|---|---|
| b_Intercept | −1.81 [−2.19, −1.45] | −1.95 [−2.36, −1.52] |
| b_Year10 | 0.34 [0.21, 0.48] |  |
| b_bsYear10degreeEQ1knotsEQ31 |  | 1.57 [0.95, 2.27] |
| b_bsYear10degreeEQ1knotsEQ32 |  | 1.22 [0.69, 1.79] |
| Num.Obs. | 535 | 535 |
| ELPD | −292.9 | −290.1 |
| ELPD s.e. | 11.2 | 11.1 |
| LOOIC | 585.9 | 580.3 |
| LOOIC s.e. | 22.4 | 22.2 |
| WAIC | 585.9 | 580.3 |
| RMSE | 0.43 | 0.42 |
The linear spline is better according to information criteria.
The two formulations (Bernoulli on the individual studies and binomial on the aggregated counts) are equivalent. Model:
\[\begin{align} \text{marginal_p}_j & \sim \mathrm{Bin}(N_j, \mu_j) \\ \mathrm{logit}(\mu_j) & = \eta_j \\ \eta_j & = \beta_0 + \beta_1 \text{Year10}_{j} \end{align}\]
Priors:
\[\begin{align} \beta_0 & \sim t_3(0, 2.5) \\ \beta_1 & \sim t_4(0, 1) \end{align}\]
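The code for the aggregated (binomial) fit is likewise not shown; a minimal sketch, in which the aggregation step and the object names marginalp_dev_agg and m5_bin are my own, could be:
# Aggregate to one row per value of Year10: number of studies (n) and
# number reporting a marginal p value
marginalp_dev_agg <- marginalp_dev %>%
    group_by(Year10) %>%
    summarise(n = n(), marginal_p = sum(marginal_p))
# Sketch: binomial logistic regression on the aggregated counts
m5_bin <- brm(
    marginal_p | trials(n) ~ Year10,  # or the spline term used in m5
    data = marginalp_dev_agg,
    family = binomial(link = "logit"),
    prior = prior(student_t(4, 0, 1), class = "b")
)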
pp_check(m5_bin, type = "intervals", x = "Year10")
Check out this paper: https://journals.sagepub.com/doi/full/10.1177/2515245918823199
stemcell %>%
ggplot(aes(x = rating)) +
geom_bar() +
facet_wrap(~ belief)
The data are from the 2006 General Social Survey (codebook: https://www.thearda.com/archive/files/Codebooks/GSS2006_CB.asp). The outcome is attitude towards stem cell research, and the predictor is religious belief. The survey question was:
Recently, there has been controversy over whether the government should provide any funds at all for scientific research that uses stem cells taken from human embryos. Would you say the government . . .
\[\begin{align} \text{rating}_i & \sim \mathrm{Categorical}(\pi^1_i, \pi^2_i, \pi^3_i, \pi^4_i) \\ \pi^1_{i} & = \mathrm{logit}^{-1}(\tau^1 - \eta_i) \\ \pi^2_{i} & = \mathrm{logit}^{-1}(\tau^2 - \eta_i) - \mathrm{logit}^{-1}(\tau^1 - \eta_i) \\ \pi^3_{i} & = \mathrm{logit}^{-1}(\tau^3 - \eta_i) - \mathrm{logit}^{-1}(\tau^2 - \eta_i) \\ \pi^4_{i} & = 1 - \mathrm{logit}^{-1}(\tau^3 - \eta_i) \\ \eta_i & = \beta_1 \text{fundamentalist}_{i} + \beta_2 \text{liberal}_{i} \end{align}\]
Priors:
\[\begin{align} \tau^1, \tau^2, \tau^3 & \sim t_3(0, 2.5) \\ \beta_1, \beta_2 & \sim N(0, 1) \end{align}\]
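Before looking at the brms code, here is a small numeric illustration of how the cumulative logit model turns thresholds and \(\eta\) into category probabilities (the \(\tau\) and \(\eta\) values are made up):
# Made-up thresholds and linear predictor, for intuition only
tau <- c(-0.9, 1.0, 2.1)
eta <- 0.5
pi1 <- plogis(tau[1] - eta)
pi2 <- plogis(tau[2] - eta) - plogis(tau[1] - eta)
pi3 <- plogis(tau[3] - eta) - plogis(tau[2] - eta)
pi4 <- 1 - plogis(tau[3] - eta)
c(pi1, pi2, pi3, pi4)  # four category probabilities; they sum to 1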
m6 <- brm(
rating ~ belief,
data = stemcell,
family = cumulative(link = "logit"),
prior = prior(std_normal(), class = "b")
)
pp_check(m6, type = "bars_grouped", group = "belief",
ndraws = 100)
The fit was reasonable.
conditional_effects(m6, categorical = TRUE)
Ordinal regression is a special case of nominal regression with the proportional odds assumption.
\[\begin{align} \text{rating}_i & \sim \mathrm{Categorical}(\pi^1_{i}, \pi^2_{i}, \pi^3_{i}, \pi^4_{i}) \\ \pi^1_{i} & = \frac{1}{\exp(\eta^2_{i}) + \exp(\eta^3_{i}) + \exp(\eta^4_{i}) + 1} \\ \pi^2_{i} & = \frac{\exp(\eta^2_{i})}{\exp(\eta^2_{i}) + \exp(\eta^3_{i}) + \exp(\eta^4_{i}) + 1} \\ \pi^3_{i} & = \frac{\exp(\eta^3_{i})}{\exp(\eta^2_{i}) + \exp(\eta^3_{i}) + \exp(\eta^4_{i}) + 1} \\ \pi^4_{i} & = \frac{\exp(\eta^4_{i})}{\exp(\eta^2_{i}) + \exp(\eta^3_{i}) + \exp(\eta^4_{i}) + 1} \\ \eta^2_{i} & = \beta^2_{0} + \beta^2_{1} \text{fundamentalist}_{i} + \beta^2_{2} \text{liberal}_{i} \\ \eta^3_{i} & = \beta^3_{0} + \beta^3_{1} \text{fundamentalist}_{i} + \beta^3_{2} \text{liberal}_{i} \\ \eta^4_{i} & = \beta^4_{0} + \beta^4_{1} \text{fundamentalist}_{i} + \beta^4_{2} \text{liberal}_{i} \end{align}\]
As you can see, it has two additional parameters for each predictor column.
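For intuition, here is the same kind of numeric illustration for the nominal (softmax) model, with made-up \(\eta\) values and category 1 as the reference:
# Made-up linear predictors for categories 2 to 4; category 1 is the reference
eta <- c(0.6, -0.5, -1.0)
denom <- 1 + sum(exp(eta))
c(1, exp(eta)) / denom  # pi1 to pi4; they sum to 1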
m7 <- brm(
rating ~ belief,
data = stemcell,
family = categorical(link = "logit"),
prior = prior(std_normal(), class = "b", dpar = "mu2") +
prior(std_normal(), class = "b", dpar = "mu3") +
prior(std_normal(), class = "b", dpar = "mu4")
)
msummary(list(`ordinal (proportional odds)` = m6, nominal = m7),
         estimate = "{estimate} [{conf.low}, {conf.high}]",
         statistic = NULL, fmt = 2)
|  | ordinal (proportional odds) | nominal |
|---|---|---|
| b_Intercept[1] | −0.94 [−1.17, −0.71] |  |
| b_Intercept[2] | 1.04 [0.80, 1.27] |  |
| b_Intercept[3] | 2.14 [1.86, 2.43] |  |
| b_belieffundamentalist | 0.41 [0.08, 0.70] |  |
| b_beliefliberal | −0.55 [−0.85, −0.24] |  |
| b_mu2_Intercept |  | 0.63 [0.35, 0.90] |
| b_mu3_Intercept |  | −0.57 [−0.94, −0.22] |
| b_mu4_Intercept |  | −1.06 [−1.47, −0.61] |
| b_mu2_belieffundamentalist |  | 0.12 [−0.30, 0.53] |
| b_mu2_beliefliberal |  | −0.69 [−1.07, −0.33] |
| b_mu3_belieffundamentalist |  | 0.52 [0.00, 1.01] |
| b_mu3_beliefliberal |  | −0.76 [−1.31, −0.22] |
| b_mu4_belieffundamentalist |  | 0.70 [0.16, 1.29] |
| b_mu4_beliefliberal |  | −0.58 [−1.15, 0.00] |
| Num.Obs. | 829 | 829 |
| ELPD | −1019.3 | −1020.9 |
| ELPD s.e. | 15.5 | 15.6 |
| LOOIC | 2038.6 | 2041.9 |
| LOOIC s.e. | 31.0 | 31.2 |
| WAIC | 2038.6 | 2041.8 |