class: center, middle, inverse, title-slide # Causal Inference ## PSYC 573 ### University of Southern California ### March 29, 2022 --- class: inverse, center, middle # Causation > Data are profoundly dumb about causal relationships .right[ --- Pearl & Mackenzie (2018) ] ??? Outline: - DAG - confounding - `\(d\)`-separation - mediation --- class: clear Materials based on chapters 5 and 6 of McElreath (2020) --- exclude: TRUE # Thought Experiment You have a group of 20 friends. You found out 10 have taken a "smart pill," and the others have not. When comparing the stat exam performance of the two groups, the "smart pill" group, on average, is better, with 90% CI [1, 5]. > Do you think the "smart pill" causes half of your friends to do better in stat? --- exclude: TRUE # Thought Experiment (cont'd) A researcher conducts an experiment with 20 students. Ten are randomly assigned to take a "smart pill," and the other ten a placebo. When comparing the stat exam performance of the two groups, the "smart pill" group, on average, is better, with 90% CI [1, 5]. > Do you think the "smart pill" causes half of the students to do better in stat? --- exclude: TRUE class: clear Is there any difference in the **statistical results** between the two scenarios? Is there any difference in **causal implications** between the two scenarios? --- exclude: TRUE # Thought Experiment (cont'd) A researcher conducts a study with 20 students. Ten volunteers took a "smart pill," and then the researcher compared their stat exam performance with 10 other students who had a similar stat background to the "smart pill" group but did not take the pill. The "smart pill" group, on average, is better, with 90% CI [1, 5]. > Do you think the "smart pill" causes the first 10 students to do better in stat? --- # Causal Inference Obtaining an estimate of the causal effect of one variable on another -- > an hour more exercise per day causes an increase in happiness by 0.1 to 0.2 points -- - Intervention: if I exercise one hour more, my happiness will increase by 0.1 to 0.2 points - Counterfactual: had I exercised one less hour, my happiness would have been 0.1 to 0.2 points less --- class: inverse, middle, center # Directed Acyclic Graph <img src="causal_inference_files/figure-html/unnamed-chunk-3-1.png" width="70%" style="display: block; margin: auto;" /> --- class: clear Data from the 2009 American Community Survey (ACS) <img src="causal_inference_files/figure-html/unnamed-chunk-4-1.png" width="70%" style="display: block; margin: auto;" /> -- Does marriage **cause** divorce? (pay attention to the unit of analysis) --- class: clear Age at marriage?
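A scatterplot like the one below could be produced along these lines (a minimal sketch, not the code behind the figure; it assumes the `waffle_divorce` data frame used in the model a few slides later, with `Divorce` and `MedianAgeMarriage` columns):

```r
# Sketch: divorce rate against median age at marriage (illustrative only)
library(ggplot2)
ggplot(waffle_divorce, aes(x = MedianAgeMarriage, y = Divorce)) +
  geom_point() +
  labs(x = "Median age at marriage", y = "Divorce rate")
```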
<img src="causal_inference_files/figure-html/unnamed-chunk-5-1.png" width="70%" style="display: block; margin: auto;" /> --- # Directed Acyclic Graph (DAG) Allows researchers to encode **causal assumptions** of the data - Based on knowledge of the *data* and the *variables* -- <img src="causal_inference_files/figure-html/unnamed-chunk-6-1.png" width="70%" style="display: block; margin: auto;" /> --- class: clear <img src="causal_inference_files/figure-html/unnamed-chunk-7-1.png" width="30%" style="display: block; margin: auto;" /> -- .pull-left[ "Weak" assumptions - A *may* directly influence M - A *may* directly influence D - M *may* directly influence D ] -- .pull-right[ "Strong" assumptions: things not shown in the graph - E.g., M does not directly influence A - E.g., A is the only relevant variable in the causal pathway M → D ] --- # Basic Types of Junctions **Fork**: A ← B → C **Chain/Pipe**: A → B → C **Collider**: A → B ← C --- # Fork aka Classic confounding - *Confound*: something that misleads us about a causal influence M ← <span style="color:red">A</span> → D -- Assuming the DAG is correct, - the causal effect of M → D can be obtained by holding constant A * stratifying by A; "controlling" for A --- class: clear .panelset[ .panel[.panel-name[Model] `\begin{align} D_i & \sim N(\mu_i, \sigma) \\ \mu_i & = \beta_0 + \beta_1 A_i + \beta_2 M_i \\ \beta_0 & \sim N(0, 5) \\ \beta_1 & \sim N(0, 1) \\ \beta_2 & \sim N(0, 1) \\ \sigma & \sim t^+_4(0, 3) \\ \end{align}` ] .panel[.panel-name[brms] ```r library(brms) m1 <- brm(Divorce ~ MedianAgeMarriage + Marriage, data = waffle_divorce, prior = prior(std_normal(), class = "b") + prior(normal(0, 5), class = "Intercept") + prior(student_t(4, 0, 3), class = "sigma"), seed = 941, iter = 4000 ) ``` ] .panel[.panel-name[Results] .font70[ ``` ># Family: gaussian ># Links: mu = identity; sigma = identity ># Formula: Divorce ~ MedianAgeMarriage + Marriage ># Data: waffle_divorce (Number of observations: 50) ># Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1; ># total post-warmup draws = 8000 ># ># Population-Level Effects: ># Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ># Intercept 3.49 0.77 1.96 4.99 1.00 5179 5008 ># MedianAgeMarriage -0.94 0.25 -1.42 -0.44 1.00 5605 5608 ># Marriage -0.04 0.08 -0.20 0.12 1.00 5198 4900 ># ># Family Specific Parameters: ># Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ># sigma 0.15 0.02 0.12 0.19 1.00 6071 5326 ># ># Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS ># and Tail_ESS are effective sample size measures, and Rhat is the potential ># scale reduction factor on split chains (at convergence, Rhat = 1). ``` ] ] ] --- class: clear ### Posterior predictive checks .pull-left[ <img src="causal_inference_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="causal_inference_files/figure-html/unnamed-chunk-9-1.png" width="100%" style="display: block; margin: auto;" /><img src="causal_inference_files/figure-html/unnamed-chunk-9-2.png" width="100%" style="display: block; margin: auto;" /> ] --- # Pedicting an Intervention > What would happen to the divorce rate if we encourage more people to get married, so that marriage rate increases by 1 per 10 adults? 
-- Based on our DAG, this should not change the median marriage age -- | Marriage| MedianAgeMarriage| Estimate| Est.Error| Q2.5| Q97.5| |--------:|-----------------:|--------:|---------:|-----:|-----:| | 2| 2.5| 1.07| 0.034| 0.999| 1.14| | 3| 2.5| 1.03| 0.068| 0.894| 1.16| --- class: inverse, middle, center # Randomization --- exclude: TRUE class: clear <img src="images/Brader_etal_2008.png" width="90%" style="display: block; margin: auto;" /> --- # Framing Experiment - X: exposure to a negatively framed news story about immigrants - Y: anti-immigration political action -- .pull-left[ No Randomization <img src="causal_inference_files/figure-html/unnamed-chunk-12-1.png" width="70%" style="display: block; margin: auto;" /> ] ??? Potential confound: - Location - Usual outlet/source to acquire information -- .pull-right[ Randomization <img src="causal_inference_files/figure-html/unnamed-chunk-13-1.png" width="70%" style="display: block; margin: auto;" /> ] --- # Back-Door Criterion <img src="causal_inference_files/figure-html/unnamed-chunk-14-1.png" width="50%" style="display: block; margin: auto;" /> The causal effect of X → Y can be obtained by conditioning on a set of variables that blocks all the back-door paths from X to Y and contains no descendants of X -- - Randomization: (when done successfully) eliminates all paths entering X - Conditioning (holding constant) --- # Dagitty ```r library(dagitty) dag4 <- dagitty("dag{ X -> Y; W1 -> X; U -> W2; W2 -> X; W1 -> Y; U -> Y }") latents(dag4) <- "U" adjustmentSets(dag4, exposure = "X", outcome = "Y", effect = "direct") ``` ``` ># { W1, W2 } ``` ```r impliedConditionalIndependencies(dag4) ``` ``` ># W1 _||_ W2 ``` --- exclude: TRUE # Exercise <img src="images/McElreath_2020_ch6_ex.jpg" width="70%" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Post-Treatment Bias --- class: clear ## Data for Framing Experiment - `cong_mesg`: binary variable indicating whether or not the participant agreed to send a letter about immigration policy to his or her member of Congress - `emo`: post-test anxiety about increased immigration (0-9) - `tone`: framing of news story (0 = positive, 1 = negative) --- class: clear ## Results <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:center;"> No adjustment </th> <th style="text-align:center;"> Adjusting for feeling </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> b_Intercept </td> <td style="text-align:center;"> −0.81 [−1.18, −0.45] </td> <td style="text-align:center;"> −2.01 [−2.60, −1.40] </td> </tr> <tr> <td style="text-align:left;"> b_tone </td> <td style="text-align:center;"> 0.22 [−0.29, 0.74] </td> <td style="text-align:center;"> −0.14 [−0.71, 0.42] </td> </tr> <tr> <td style="text-align:left;"> b_emo </td> <td style="text-align:center;"> </td> <td style="text-align:center;"> 0.32 [0.21, 0.43] </td> </tr> </tbody> </table> ??? Negative framing: emphasizing costs Positive framing: emphasizing benefits -- Which one estimates the causal effect?
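The two columns above could come from models along these lines (a sketch only; the priors and sampler settings here are illustrative and may differ from those actually used):

```r
# Sketch of the two models compared in the table
# No adjustment: total effect of the framing manipulation
m_noadj <- brm(cong_mesg ~ tone,
               data = framing, family = bernoulli("logit"),
               seed = 1338, iter = 4000)
# Adjusting for the post-treatment variable emo: a different estimand,
# and a potential source of post-treatment bias for the total effect
m_adj <- brm(cong_mesg ~ tone + emo,
             data = framing, family = bernoulli("logit"),
             seed = 1338, iter = 4000)
```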
--- # Mediation <img src="causal_inference_files/figure-html/unnamed-chunk-20-1.png" width="35%" style="display: block; margin: auto;" /> In the DAG, E is a post-treatment variable potentially influenced by T - E is a potential **mediator** -- > A mediator is very different from a confounder --- # Mediation Analysis .panelset[ .panel[.panel-name[Model] `\begin{align} \text{emo}_i & \sim N(\mu^\text{e}_i, \sigma) \\ \mu^\text{e}_i & = \beta^\text{e}_0 + \beta_1 \text{tone}_i \\ \text{cong_mesg}_i & \sim \mathrm{Bern}(\mu^\text{c}_i) \\ \mathrm{logit}(\mu^\text{c}_i) & = \eta_i \\ \eta_i & = \beta^\text{c}_0 + \beta_2 \text{tone}_i + \beta_3 \text{emo}_i \\ \beta^\text{e}_0, \beta^\text{c}_0 & \sim N(0, 5) \\ \beta_1, \beta_2, \beta_3 & \sim N(0, 1) \\ \sigma & \sim t^+_4(0, 3) \\ \end{align}` ] .panel[.panel-name[Code] ```r m_med <- brm( # Two equations for two outcomes bf(cong_mesg ~ tone + emo) + bf(emo ~ tone) + set_rescor(FALSE), data = framing, seed = 1338, iter = 4000, family = list(bernoulli("logit"), gaussian("identity")) ) ``` ] .panel[.panel-name[Output] .font60[ ``` ># Family: MV(bernoulli, gaussian) ># Links: mu = logit ># mu = identity; sigma = identity ># Formula: cong_mesg ~ tone + emo ># emo ~ tone ># Data: framing (Number of observations: 265) ># Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1; ># total post-warmup draws = 8000 ># ># Population-Level Effects: ># Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ># congmesg_Intercept -2.01 0.30 -2.60 -1.42 1.00 9742 6632 ># emo_Intercept 3.40 0.24 2.93 3.86 1.00 10684 6756 ># congmesg_tone -0.15 0.29 -0.73 0.41 1.00 9449 6097 ># congmesg_emo 0.32 0.06 0.21 0.43 1.00 9514 6710 ># emo_tone 1.14 0.33 0.47 1.79 1.00 10417 5856 ># ># Family Specific Parameters: ># Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS ># sigma_emo 2.73 0.12 2.51 2.98 1.00 10496 6553 ># ># Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS ># and Tail_ESS are effective sample size measures, and Rhat is the potential ># scale reduction factor on split chains (at convergence, Rhat = 1).
``` ] ] ] --- # Direct Effect The causal effect of the treatment when holding the mediator at a specific level .font70[ ```r cond_df <- data.frame(tone = c(0, 1, 0, 1), emo = c(0, 0, 9, 9)) cond_df %>% bind_cols( fitted(m_med, newdata = cond_df)[ , , "congmesg"] ) %>% knitr::kable() ``` <table> <thead> <tr> <th style="text-align:right;"> tone </th> <th style="text-align:right;"> emo </th> <th style="text-align:right;"> Estimate </th> <th style="text-align:right;"> Est.Error </th> <th style="text-align:right;"> Q2.5 </th> <th style="text-align:right;"> Q97.5 </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0.122 </td> <td style="text-align:right;"> 0.032 </td> <td style="text-align:right;"> 0.069 </td> <td style="text-align:right;"> 0.195 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0.108 </td> <td style="text-align:right;"> 0.033 </td> <td style="text-align:right;"> 0.054 </td> <td style="text-align:right;"> 0.183 </td> </tr> <tr> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 0.699 </td> <td style="text-align:right;"> 0.071 </td> <td style="text-align:right;"> 0.549 </td> <td style="text-align:right;"> 0.826 </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 9 </td> <td style="text-align:right;"> 0.669 </td> <td style="text-align:right;"> 0.063 </td> <td style="text-align:right;"> 0.539 </td> <td style="text-align:right;"> 0.786 </td> </tr> </tbody> </table> ] --- # Indirect Effect The change in `\(Y\)` for the control group if their mediator level were set to what they *would have obtained* under treatment -- A quick demo using posterior means<sup>1</sup> - T = 0, E(M) = 3.39 - T = 1, E(M) = 3.39 + 1.14 = 4.53 .footnote[ [1]: Fully Bayesian analyses in the note ] -- .font70[ <table> <thead> <tr> <th style="text-align:right;"> tone </th> <th style="text-align:right;"> emo </th> <th style="text-align:right;"> Estimate </th> <th style="text-align:right;"> Est.Error </th> <th style="text-align:right;"> Q2.5 </th> <th style="text-align:right;"> Q97.5 </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 3.39 </td> <td style="text-align:right;"> 0.286 </td> <td style="text-align:right;"> 0.042 </td> <td style="text-align:right;"> 0.208 </td> <td style="text-align:right;"> 0.372 </td> </tr> <tr> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 4.53 </td> <td style="text-align:right;"> 0.365 </td> <td style="text-align:right;"> 0.048 </td> <td style="text-align:right;"> 0.275 </td> <td style="text-align:right;"> 0.462 </td> </tr> </tbody> </table> ] --- # Potential Confounding .pull-left[ <img src="causal_inference_files/figure-html/unnamed-chunk-23-1.png" width="100%" style="display: block; margin: auto;" /> ] -- .pull-right[ Maybe age is related to both `emo` and `cong_mesg`?
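One way to see what such an assumption would imply (a sketch with dagitty; this DAG is an assumption for illustration, not something established by the data):

```r
# Sketch: add age as a common cause of emo and cong_mesg, then ask what
# must be adjusted for when estimating the emo -> cong_mesg path
library(dagitty)
dag_med <- dagitty("dag{ tone -> emo; tone -> cong_mesg; emo -> cong_mesg;
                         age -> emo; age -> cong_mesg }")
adjustmentSets(dag_med, exposure = "emo", outcome = "cong_mesg")
# should return the set containing both age and tone
```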
.font70[ ```r m_med2 <- brm( # Two equations for two outcomes bf(cong_mesg ~ tone + emo + age) + bf(emo ~ tone + age) + set_rescor(FALSE), data = framing, seed = 1338, iter = 4000, family = list(bernoulli("logit"), gaussian("identity")) ) ``` ] ] --- # Unobserved Confounding Can be incorporated by assigning priors to the paths involving the unobserved confounder --- class: inverse, middle, center # Collider Bias --- class: clear .pull-left[ <img src="causal_inference_files/figure-html/unnamed-chunk-24-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ E.g., Is the most newsworthy research the least trustworthy? <img src="causal_inference_files/figure-html/code-6-1-1.png" width="100%" style="display: block; margin: auto;" /> ] --- class: clear ### Conditioning on a collider creates spurious associations - nice person → date ← good-looking person -- - impulsivity → high-risk youth ← delinquency -- - healthcare worker → COVID-19 testing ← COVID-19 severity<sup>2</sup> .footnote[ [2]: See https://www.nature.com/articles/s41467-020-19478-2 ] -- - standardized test → admission ← research skills -- - maternal smoking → birth weight ← birth defect → mortality --- class: inverse, middle, center # Final Example --- exclude: TRUE class: clear <img src="images/Bickel_etal_1975_science.png" width="70%" style="display: block; margin: auto;" /> --- # Student Admissions at UC Berkeley (1973) .font70[ <table> <thead> <tr> <th style="text-align:left;"> Dept </th> <th style="text-align:right;"> App_Male </th> <th style="text-align:right;"> Admit_Male </th> <th style="text-align:right;"> Percent_Male </th> <th style="text-align:right;"> App_Female </th> <th style="text-align:right;"> Admit_Female </th> <th style="text-align:right;"> Percent_Female </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> A </td> <td style="text-align:right;"> 825 </td> <td style="text-align:right;"> 512 </td> <td style="text-align:right;"> 62.1 </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> 89 </td> <td style="text-align:right;"> 82.41 </td> </tr> <tr> <td style="text-align:left;"> B </td> <td style="text-align:right;"> 560 </td> <td style="text-align:right;"> 353 </td> <td style="text-align:right;"> 63.0 </td> <td style="text-align:right;"> 25 </td> <td style="text-align:right;"> 17 </td> <td style="text-align:right;"> 68.00 </td> </tr> <tr> <td style="text-align:left;"> C </td> <td style="text-align:right;"> 325 </td> <td style="text-align:right;"> 120 </td> <td style="text-align:right;"> 36.9 </td> <td style="text-align:right;"> 593 </td> <td style="text-align:right;"> 202 </td> <td style="text-align:right;"> 34.06 </td> </tr> <tr> <td style="text-align:left;"> D </td> <td style="text-align:right;"> 417 </td> <td style="text-align:right;"> 138 </td> <td style="text-align:right;"> 33.1 </td> <td style="text-align:right;"> 375 </td> <td style="text-align:right;"> 131 </td> <td style="text-align:right;"> 34.93 </td> </tr> <tr> <td style="text-align:left;"> E </td> <td style="text-align:right;"> 191 </td> <td style="text-align:right;"> 53 </td> <td style="text-align:right;"> 27.7 </td> <td style="text-align:right;"> 393 </td> <td style="text-align:right;"> 94 </td> <td style="text-align:right;"> 23.92 </td> </tr> <tr> <td style="text-align:left;"> F </td> <td style="text-align:right;"> 373 </td> <td style="text-align:right;"> 22 </td> <td style="text-align:right;"> 5.9 </td> <td style="text-align:right;"> 341 </td> <td style="text-align:right;"> 24 </td> <td style="text-align:right;"> 7.04 </td> </tr> <tr> <td style="text-align:left;"> Total </td> <td style="text-align:right;"> 2691 </td> <td style="text-align:right;"> 1198 </td> <td style="text-align:right;"> 44.5 </td> <td style="text-align:right;"> 1835 </td> <td style="text-align:right;"> 557 </td> <td style="text-align:right;"> 30.35 </td> </tr> </tbody> </table> ] --- # Causal Thinking <img src="causal_inference_files/figure-html/unnamed-chunk-28-1.png" width="70%" style="display: block; margin: auto;" /> What do we mean by the causal effect of gender? What do we mean by gender bias? --- # Instrumental Variables <img src="causal_inference_files/figure-html/dag9-1.png" width="70%" style="display: block; margin: auto;" /> See more in the note (a brief model sketch follows on the last slide) --- # Remarks - Causal inference requires **causal assumptions** * You need a DAG -- - Blindly adjusting for covariates does not give better results * post-treatment bias, collider bias, etc. -- - Think carefully about what causal quantity is of interest * E.g., direct, indirect, total -- - Causal inferences are possible with both experimental and non-experimental data