class: center, middle, inverse, title-slide

# Markov Chain Monte Carlo III

## PSYC 573

### University of Southern California

### February 24, 2022

---
class: inverse, middle, center

# Hamiltonian Monte Carlo (HMC)

---
exclude: TRUE
class: clear

.pull-left[

Metropolis

![](https://upload.wikimedia.org/wikipedia/commons/2/24/Camino_de_Costa_Roca_Coffee_plantation_hike.jpg)

]

.pull-right[

Hamiltonian

![](https://upload.wikimedia.org/wikipedia/commons/8/84/Ski_Famille_-_Family_Ski_Holidays.jpg)

]

---
class: clear

From Hamiltonian mechanics

- Use *gradients* to generate better proposal values

Results:

- Higher acceptance rate
- Less autocorrelation/higher effective sample size (ESS)
- Better suited for high-dimensional problems

---
exclude: TRUE

# Joint Density
---
exclude: TRUE

# Gradients of Log Density

---

# Gradients of Log Density

Consider just `\(\sigma^2\)`

Potential energy = `\(-\log P(\theta)\)`

> Which one has a higher **potential energy**?

<img src="hmc_stan_files/figure-html/lp_sigma-1.png" width="100%" style="display: block; margin: auto;" />
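---
class: clear

To make potential energy concrete, here is a minimal R sketch (not part of the original slides; a standard normal density stands in for the posterior, and the names `U` and `grad_U` are made up for illustration): positions far from the mode have higher potential energy.

```r
# Potential energy: U(theta) = -log P(theta); standard normal for illustration
U <- function(theta) -dnorm(theta, log = TRUE)
U(0)  # at the mode: lowest potential energy
U(2)  # away from the mode: higher potential energy
# For this density, the gradient of U is simply theta
grad_U <- function(theta) theta
```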
---

# HMC Algorithm

Imagine a marble on a surface like the log posterior

1. Simulate a random *momentum* (usually from a normal distribution)
2. Apply the momentum to the marble to let it roll on the surface
3. Treat the position of the marble after some time as the *proposed value*
4. Accept the new position based on the Metropolis step
    * i.e., probabilistically using the posterior density ratio

(A minimal R sketch of one such transition follows the leapfrog slides)

---

# Leapfrog Integrator

Location and velocity constantly change

<img src="images/hmc1.gif" width="70%" style="display: block; margin: auto;" />

---
class: clear

### Leapfrog integrator

- Solve for the new location using `\(L\)` leapfrog steps
- Larger `\(L\)`, more accurate location
- Higher curvature requires a larger `\(L\)` and a smaller *step size*

<img src="hmc_stan_files/figure-html/leapfrog-steps-1.png" width="70%" style="display: block; margin: auto;" />

*Divergent transition*: when the leapfrog approximation deviates substantially from where it should be
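---
class: clear

Putting the pieces together, here is a minimal R sketch (not from the original slides) of one HMC transition for a standard normal target, with the same `U()` and `grad_U()` as in the earlier sketch (repeated so the block is self-contained). Stan's actual implementation is far more sophisticated.

```r
U <- function(theta) -dnorm(theta, log = TRUE)  # potential energy
grad_U <- function(theta) theta                 # its gradient (standard normal)

hmc_step <- function(theta, L = 20, eps = 0.1) {
  m <- rnorm(1)                            # 1. simulate a random momentum
  theta_new <- theta
  m_new <- m - eps / 2 * grad_U(theta_new)       # half step for momentum
  for (l in seq_len(L)) {                  # 2.-3. L leapfrog steps
    theta_new <- theta_new + eps * m_new         # full step for position
    if (l < L) m_new <- m_new - eps * grad_U(theta_new)
  }
  m_new <- m_new - eps / 2 * grad_U(theta_new)   # final half step
  # 4. Metropolis step on the joint (position, momentum) energy
  log_ratio <- (U(theta) + m^2 / 2) - (U(theta_new) + m_new^2 / 2)
  if (log(runif(1)) < log_ratio) theta_new else theta
}

# Usage: a chain of 1,000 draws starting from 0
draws <- numeric(1000)
theta <- 0
for (i in seq_len(1000)) draws[i] <- theta <- hmc_step(theta)
```

Note how `\(L\)` and the step size `eps` must be chosen by hand here; this is the tuning problem that NUTS (next slide) automates.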
---

# No-U-Turn Sampler (NUTS)

The default algorithm in Stan

Two problems of HMC

- `\(L\)` and the **step size** need manual fine-tuning
- Wasted steps when the marble makes a U-turn

--

NUTS builds a binary tree of leapfrog steps to determine `\(L\)` on the fly (the **step size** is tuned separately during warmup)

- The **maximum treedepth** determines how far it searches

See a demo here: https://chi-feng.github.io/mcmc-demo/app.html

---
class: inverse, middle, center

# Stan

---

# Stan

A language for doing MCMC sampling (and other related methods, such as maximum likelihood estimation)

Current version (2.29): mainly uses NUTS

--

It supports a wide range of distributions and prior distributions

Written in C++ (faster than R)

---
class: clear

Consider the example

.pull-left[

Model:

`$$\text{wc_laptop}_i \sim N(\mu, \sigma)$$`

Prior:

`\begin{align}
\mu & \sim N(5, 10) \\
\sigma & \sim t_4^+(0, 3)
\end{align}`

]

.pull-right[

.font70[

`\(t_4^+(0, 3)\)` is a half-`\(t\)` distribution with df = 4 and scale = 3

]

<img src="hmc_stan_files/figure-html/t-plus-1.png" width="100%" style="display: block; margin: auto;" />

]

---
class: clear

### An example Stan model

.font70[

```stan
data {
  int<lower=0> N;  // number of observations
  vector[N] y;     // data vector y
}
parameters {
  real mu;              // mean parameter
  real<lower=0> sigma;  // non-negative SD parameter
}
model {
  // likelihood
  y ~ normal(mu, sigma);  // use vectorization
  // priors
  mu ~ normal(5, 10);
  sigma ~ student_t(4, 0, 3);
}
generated quantities {
  vector[N] y_rep;  // placeholder for simulated data
  for (n in 1:N)
    y_rep[n] = normal_rng(mu, sigma);
}
```

]

---

# Components of a Stan Model

- `data`: usually a list of different types
    * `int`, `real`, `matrix`, `vector`, `array`; lower/upper bounds can be set
- `parameters`
- `transformed parameters`: optional variables that are transformations of the model parameters
- `model`: definition of the **priors** and the **likelihood**
- `generated quantities`: new quantities computed from the model (e.g., simulated data)

---

# RStan

https://mc-stan.org/users/interfaces/rstan

An interface to call Stan from R and import results from Stan into R

---

# Call `rstan`

.panelset[
.panel[.panel-name[R code]
.font70[

```r
library(rstan)
library(here)  # for here(), used below to locate the Stan file
rstan_options(auto_write = TRUE)  # save the compiled Stan object
nt_dat <- haven::read_sav("https://osf.io/qrs5y/download")
wc_laptop <- nt_dat$wordcount[nt_dat$condition == 0] / 100
# Data: a list with names matching the Stan program
nt_list <- list(
  N = length(wc_laptop),  # number of observations
  y = wc_laptop           # outcome variable
)
# Call Stan
norm_prior <- stan(
  file = here("stan", "normal_model.stan"),
  data = nt_list,
  seed = 1234  # for reproducibility
)
```

]
]
.panel[.panel-name[Output]
.font50[

```
># 
># SAMPLING FOR MODEL 'normal_model' NOW (CHAIN 1).
># Chain 1: 
># Chain 1: Gradient evaluation took 7e-06 seconds
># Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
># Chain 1: Adjust your expectations accordingly!
># Chain 1: 
># Chain 1: 
># Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
># Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
># Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
># Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
># Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
># Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
># Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
># Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
># Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
># Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
># Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
># Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
># Chain 1: 
># Chain 1:  Elapsed Time: 0.018408 seconds (Warm-up)
># Chain 1:                0.016197 seconds (Sampling)
># Chain 1:                0.034605 seconds (Total)
># Chain 1: 
># 
># SAMPLING FOR MODEL 'normal_model' NOW (CHAIN 2).
># Chain 2: 
># Chain 2: Gradient evaluation took 5e-06 seconds
># Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.05 seconds.
># Chain 2: Adjust your expectations accordingly!
># Chain 2: 
># Chain 2: 
># Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
># Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
># Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
># Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
># Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
># Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
># Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
># Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
># Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
># Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
># Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
># Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
># Chain 2: 
># Chain 2:  Elapsed Time: 0.020357 seconds (Warm-up)
># Chain 2:                0.015692 seconds (Sampling)
># Chain 2:                0.036049 seconds (Total)
># Chain 2: 
># 
># SAMPLING FOR MODEL 'normal_model' NOW (CHAIN 3).
># Chain 3: 
># Chain 3: Gradient evaluation took 4e-06 seconds
># Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
># Chain 3: Adjust your expectations accordingly!
># Chain 3: 
># Chain 3: 
># Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
># Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
># Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
># Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
># Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
># Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
># Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
># Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
># Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
># Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
># Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
># Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
># Chain 3: 
># Chain 3:  Elapsed Time: 0.018081 seconds (Warm-up)
># Chain 3:                0.017181 seconds (Sampling)
># Chain 3:                0.035262 seconds (Total)
># Chain 3: 
># 
># SAMPLING FOR MODEL 'normal_model' NOW (CHAIN 4).
># Chain 4: 
># Chain 4: Gradient evaluation took 4e-06 seconds
># Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
># Chain 4: Adjust your expectations accordingly!
># Chain 4: 
># Chain 4: 
># Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
># Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
># Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
># Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
># Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
># Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
># Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
># Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
># Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
># Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
># Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
># Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
># Chain 4: 
># Chain 4:  Elapsed Time: 0.018265 seconds (Warm-up)
># Chain 4:                0.018438 seconds (Sampling)
># Chain 4:                0.036703 seconds (Total)
># Chain 4: 
```

]
]
]
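---
class: clear

### After sampling

Once sampling finishes, the posterior draws can be summarized in R. A minimal sketch, assuming the `norm_prior` object created above:

```r
# Posterior means, SDs, quantiles, ESS, and Rhat for the parameters
print(norm_prior, pars = c("mu", "sigma"))
# Extract the posterior draws as a named list of arrays
post_draws <- rstan::extract(norm_prior, pars = c("mu", "sigma"))
hist(post_draws$mu)  # quick look at the posterior of mu
```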