> > #Entering data (p. 213) > adverts<-c(5,4,4,6,8) > packets<-c(8,9,10,13,15) > advertData<-data.frame(adverts, packets) > > > ###################### regular, non Bayesian analyses of the data ###################### > > > # conventional Pearson correlation & NHST for two variables > cor.test(advertData$adverts, advertData$packets) Pearson's product-moment correlation data: advertData$adverts and advertData$packets t = 3.0732, df = 3, p-value = 0.05443 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: -0.0479747 0.9914236 sample estimates: cor 0.8711651 > > # standardizing data for Bayesian correlation analysis (not in Field text) > advertData = data.frame(scale(advertData)) > > # conventional multiple linear regression > fit = lm(adverts ~ packets, data = advertData) > summary(fit) Call: lm(formula = adverts ~ packets, data = advertData) Residuals: 1 2 3 4 5 0.6574 -0.2390 -0.5379 -0.2390 0.3586 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -8.727e-17 2.535e-01 0.000 1.0000 packets 8.712e-01 2.835e-01 3.073 0.0544 . --- Signif. codes: 0 Ô***Ő 0.001 Ô**Ő 0.01 Ô*Ő 0.05 Ô.Ő 0.1 Ô Ő 1 Residual standard error: 0.5669 on 3 degrees of freedom Multiple R-squared: 0.7589, Adjusted R-squared: 0.6786 F-statistic: 9.444 on 1 and 3 DF, p-value: 0.05443 > > > ###################### Bayesian correlation through MCMCglmm ###################### > > model1 = MCMCglmm(adverts ~ packets, data = advertData, + nitt=100000, burnin=3000, thin=10, verbose=F) > # nitt was set to a high number due to the small N for this data (N = 5) > > summary(model1) Iterations = 3001:99991 Thinning interval = 10 Sample size = 9700 DIC: 10.8143 R-structure: ~units post.mean l-95% CI u-95% CI eff.samp units 0.9386 0.05646 2.732 9700 Location effects: adverts ~ packets post.mean l-95% CI u-95% CI eff.samp pMCMC (Intercept) -0.002840 -0.787619 0.773290 9700 0.9909 packets 0.865180 0.008751 1.733223 9700 0.0507 . --- Signif. codes: 0 Ô***Ő 0.001 Ô**Ő 0.01 Ô*Ő 0.05 Ô.Ő 0.1 Ô Ő 1 > posterior.mode(model1$Sol[,2]) var1 0.9233415 > autocorr(model1$VCV) , , units units Lag 0 1.0000000000 Lag 10 -0.0027914831 Lag 50 -0.0042898907 Lag 100 -0.0036246338 Lag 500 -0.0002723812 > plot(model1) Hit to see next plot: > > # running the same analyses using different priors, & then plotting the various posteriors for "packets" > > estimateNum = 2 # the (row) number of regression estimate to be used for the analyses > # e.g., the 1st (intercept), or 2nd, etc > > Ndatasets = 5 # the number of additional analyses using different priors > > > # plot data for the first Bayes model > densdat <- density(model1$Sol[,estimateNum]) > plotdatx <- densdat$x > plotdaty <- densdat$y > > # for MCMCglmm priors, B is for fixed effects. B is a list containing the expected value (mu) > # and a (co)varcv bgiance matrix (V) representing the strength of belief: the defaults are B$mu=0 > # and B$V=I*1e+10, where where I is an identity matrix of appropriate dimension > > # the different priors will be derived from the lm, non-Bayesian regression coefficients > # the mu values will be random values from a 2 SE range around the lm estimates > # the V values will be set at (4 * the SEs)**2 > > fit.list <- summary.lm(fit) > coeffs <- fit.list$coefficients > coefnames <- rownames(coeffs) > > # ranges around the lm estimates > MUranges <- cbind( (coeffs[,1] - (4 * coeffs[,2])), (coeffs[,1] + (4 * coeffs[,2])) ) > > # random values from the MUranges for each coefficient > MUs <- matrix(-9999,nrow(coeffs),Ndatasets) > for (lupe in 1:nrow(coeffs)) { MUs[lupe,] <- runif(Ndatasets, min=MUranges[lupe,1], max=MUranges[lupe,2]) } > dimnames(MUs) <-list(rep("", dim(MUs)[1])) > colnames(MUs) <- seq(1:Ndatasets) > cat('\n\nPrior regression coefficient estimates for each additional analysis:') Prior regression coefficient estimates for each additional analysis: > print(round(MUs,2)) 1 2 3 4 5 0.92 0.27 -0.79 1.01 0.96 1.18 0.20 0.67 1.84 1.79 > > # (co)variance matrix (V) representing the strength of belief - which will be (4 * the lm SEs)**2 > Vlmx2 <- diag((4 * coeffs[,2])**2) > colnames(Vlmx2) <- seq(1:nrow(MUs)) > cat('\n\nPrior variance (strength of belief) value used for each regression estimate in the additional analyses:\n', + round(diag(Vlmx2),5), sep="\n") Prior variance (strength of belief) value used for each regression estimate in the additional analyses: 1.02857 1.28571 > > > for (lupeSets in 1:Ndatasets) { + prior.list <- list(B = list(mu = MUs[,lupeSets], V = Vlmx2)) + model2 = MCMCglmm(adverts ~ packets, data = advertData, prior = prior.list, + nitt=50000, burnin=3000, thin=10, verbose=F) + + densdat <- density(model2$Sol[,estimateNum]) + plotdatx <- cbind( plotdatx, densdat$x) + plotdaty <- cbind( plotdaty, densdat$y) + } > > matplot( plotdatx, plotdaty, main="", xlab=paste('Estimate for', coefnames[estimateNum]), ylab='Density', + font.lab=1.8, type='l', cex.lab=1.2, col=c(2,rep(1,ncol(plotdatx))), lty=1, lwd=1 ) > title(main=paste('Density Plots of the Posteriors for', coefnames[estimateNum], '\nProduced Using Differing Priors')) > legend("topright", c("broad priors","random priors"), bty="n", lty=c(1,1), + lwd=2, cex=1.2, text.col=c(2,1), col=c(2,1) ) > library(bayesplot) This is bayesplot version 1.6.0 - Online documentation and vignettes at mc-stan.org/bayesplot - bayesplot theme set to bayesplot::theme_default() * Does _not_ affect other ggplot2 plots * See ?bayesplot_theme_set for details on theme setting > > # from the bayesplot package documentation: > # The bayesplot package provides various plotting functions for graphical posterior predictive checking, > # that is, creating graphical displays comparing observed data to simulated data from the posterior > # predictive distribution. The idea behind posterior predictive checking is simple: if a model is a > # good fit then we should be able to use it to generate data that looks a lot like the data we observed. > > y <- advertData$adverts # the DV that was used for the MCMCglmm model > > # generating the simulated data using the simulate.MCMCglmm function from the MCMCglmm package > yrep <- simulate.MCMCglmm(model1, 25) > > yrep <- t( yrep ) # transposing yrep for the bayesplot functions > > > # ppc_stat: A histogram of the distribution of a test statistic computed by applying stat to each > # dataset (row) in yrep. The value of the statistic in the observed data, stat(y), is > # overlaid as a vertical line. > ppc_stat(y, yrep, binwidth = 1) > > > # ppc_dens_overlay: Kernel density or empirical CDF estimates of each dataset (row) in yrep are > # overlaid, with the distribution of y itself on top (and in a darker shade). > ppc_dens_overlay(y, yrep[1:25, ]) > > > # ppc_scatter_avg: A scatterplot of y against the average values of yrep, i.e., the > # points (mean(yrep[, n]), y[n]), where each yrep[, n] is a vector of length equal > # to the number of posterior draws. > ppc_scatter_avg(y, yrep) > > > # ppc_hist: A separate histogram, shaded frequency polygon, smoothed kernel density estimate, > # or box and whiskers plot is displayed for y and each dataset (row) in yrep. > # For these plots yrep should therefore contain only a small number of rows. > ppc_hist(y, yrep[1:8, ], binwidth = .3) > > > > ###################### conduct the same Bayesian analyses using the rstanarm package ###################### > > > model10 <- stan_glm(adverts ~ packets,data = advertData, + warmup = 1000, iter = 50000, sparse = FALSE, seed = 123) SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1). Gradient evaluation took 0.000635 seconds 1000 transitions using 10 leapfrog steps per transition would take 6.35 seconds. Adjust your expectations accordingly! Iteration: 1 / 50000 [ 0%] (Warmup) Iteration: 1001 / 50000 [ 2%] (Sampling) Iteration: 6000 / 50000 [ 12%] (Sampling) Iteration: 11000 / 50000 [ 22%] (Sampling) Iteration: 16000 / 50000 [ 32%] (Sampling) Iteration: 21000 / 50000 [ 42%] (Sampling) Iteration: 26000 / 50000 [ 52%] (Sampling) Iteration: 31000 / 50000 [ 62%] (Sampling) Iteration: 36000 / 50000 [ 72%] (Sampling) Iteration: 41000 / 50000 [ 82%] (Sampling) Iteration: 46000 / 50000 [ 92%] (Sampling) Iteration: 50000 / 50000 [100%] (Sampling) Elapsed Time: 0.028573 seconds (Warm-up) 1.83949 seconds (Sampling) 1.86806 seconds (Total) SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2). Gradient evaluation took 1.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds. Adjust your expectations accordingly! Iteration: 1 / 50000 [ 0%] (Warmup) Iteration: 1001 / 50000 [ 2%] (Sampling) Iteration: 6000 / 50000 [ 12%] (Sampling) Iteration: 11000 / 50000 [ 22%] (Sampling) Iteration: 16000 / 50000 [ 32%] (Sampling) Iteration: 21000 / 50000 [ 42%] (Sampling) Iteration: 26000 / 50000 [ 52%] (Sampling) Iteration: 31000 / 50000 [ 62%] (Sampling) Iteration: 36000 / 50000 [ 72%] (Sampling) Iteration: 41000 / 50000 [ 82%] (Sampling) Iteration: 46000 / 50000 [ 92%] (Sampling) Iteration: 50000 / 50000 [100%] (Sampling) Elapsed Time: 0.029688 seconds (Warm-up) 2.12988 seconds (Sampling) 2.15957 seconds (Total) SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3). Gradient evaluation took 1.2e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds. Adjust your expectations accordingly! Iteration: 1 / 50000 [ 0%] (Warmup) Iteration: 1001 / 50000 [ 2%] (Sampling) Iteration: 6000 / 50000 [ 12%] (Sampling) Iteration: 11000 / 50000 [ 22%] (Sampling) Iteration: 16000 / 50000 [ 32%] (Sampling) Iteration: 21000 / 50000 [ 42%] (Sampling) Iteration: 26000 / 50000 [ 52%] (Sampling) Iteration: 31000 / 50000 [ 62%] (Sampling) Iteration: 36000 / 50000 [ 72%] (Sampling) Iteration: 41000 / 50000 [ 82%] (Sampling) Iteration: 46000 / 50000 [ 92%] (Sampling) Iteration: 50000 / 50000 [100%] (Sampling) Elapsed Time: 0.050052 seconds (Warm-up) 2.07093 seconds (Sampling) 2.12099 seconds (Total) SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4). Gradient evaluation took 1.3e-05 seconds 1000 transitions using 10 leapfrog steps per transition would take 0.13 seconds. Adjust your expectations accordingly! Iteration: 1 / 50000 [ 0%] (Warmup) Iteration: 1001 / 50000 [ 2%] (Sampling) Iteration: 6000 / 50000 [ 12%] (Sampling) Iteration: 11000 / 50000 [ 22%] (Sampling) Iteration: 16000 / 50000 [ 32%] (Sampling) Iteration: 21000 / 50000 [ 42%] (Sampling) Iteration: 26000 / 50000 [ 52%] (Sampling) Iteration: 31000 / 50000 [ 62%] (Sampling) Iteration: 36000 / 50000 [ 72%] (Sampling) Iteration: 41000 / 50000 [ 82%] (Sampling) Iteration: 46000 / 50000 [ 92%] (Sampling) Iteration: 50000 / 50000 [100%] (Sampling) Elapsed Time: 0.030795 seconds (Warm-up) 1.38197 seconds (Sampling) 1.41276 seconds (Total) > > summary(model10, digits = 2) Model Info: function: stan_glm family: gaussian [identity] formula: adverts ~ packets algorithm: sampling priors: see help('prior_summary') sample: 196000 (posterior sample size) observations: 5 predictors: 2 Estimates: mean sd 2.5% 25% 50% 75% 97.5% (Intercept) 0.00 0.38 -0.77 -0.20 0.00 0.20 0.77 packets 0.85 0.41 -0.01 0.64 0.85 1.07 1.66 sigma 0.76 0.38 0.33 0.51 0.66 0.90 1.77 mean_PPD 0.00 0.54 -1.08 -0.28 0.00 0.28 1.09 log-posterior -10.46 1.62 -14.63 -11.22 -10.05 -9.26 -8.58 Diagnostics: mcse Rhat n_eff (Intercept) 0.00 1.00 81230 packets 0.00 1.00 82355 sigma 0.00 1.00 47027 mean_PPD 0.00 1.00 114849 log-posterior 0.01 1.00 38306 For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1). > plot(model10) > >