> 
> # load or enter data
> 
> #album2<-read.delim("Album Sales 2.dat", header = TRUE)
> #albumNames<- c("adverts", "sales", "airplay", "attract")
> 
> album2 <- '
+ 10.256 330 43 10
+ 985.685 120 28 7
+ 1445.563 360 35 7
+ 1188.193 270 33 7
+ 574.513 220 44 5
+ 568.954 170 19 5
+ 471.814 70 20 1
+ 537.352 210 22 9
+ 514.068 200 21 7
+ 174.093 300 40 7
+ 1720.806 290 32 7
+ 611.479 70 20 2
+ 251.192 150 24 8
+ 97.972 190 38 6
+ 406.814 240 24 7
+ 265.398 100 25 5
+ 1323.287 250 35 5
+ 196.650 210 36 8
+ 1326.598 280 27 8
+ 1380.689 230 33 8
+ 792.345 210 33 7
+ 957.167 230 28 6
+ 1789.659 320 30 9
+ 656.137 210 34 7
+ 613.697 230 49 7
+ 313.362 250 40 8
+ 336.510 60 20 4
+ 1544.899 330 42 7
+ 68.954 150 35 8
+ 785.692 150 8 6
+ 125.628 180 49 7
+ 377.925 80 19 8
+ 217.994 180 42 6
+ 759.862 130 6 7
+ 1163.444 320 36 6
+ 842.957 280 32 7
+ 125.179 200 28 6
+ 236.598 130 25 8
+ 669.811 190 34 8
+ 612.234 150 21 6
+ 922.019 230 34 7
+ 50.000 310 63 7
+ 2000.000 340 31 7
+ 1054.027 240 25 7
+ 385.045 180 42 7
+ 1507.972 220 37 7
+ 102.568 40 25 8
+ 204.568 190 26 7
+ 1170.918 290 39 7
+ 689.547 340 46 7
+ 784.220 250 36 6
+ 405.913 190 12 4
+ 179.778 120 2 8
+ 607.258 230 29 8
+ 1542.329 190 33 8
+ 1112.470 210 28 7
+ 856.985 170 10 6
+ 836.331 310 38 7
+ 236.908 90 19 4
+ 1077.855 140 13 6
+ 579.321 300 30 7
+ 1500.000 340 38 8
+ 731.364 170 22 8
+ 25.689 100 23 6
+ 391.749 200 22 9
+ 233.999 80 20 7
+ 275.700 100 18 6
+ 56.895 70 37 7
+ 255.117 50 16 8
+ 566.501 240 32 8
+ 102.568 160 26 5
+ 250.568 290 53 9
+ 68.594 140 28 7
+ 642.786 210 32 7
+ 1500.000 300 24 7
+ 102.563 230 37 6
+ 756.984 280 30 8
+ 51.229 160 19 7
+ 644.151 200 47 6
+ 15.313 110 22 5
+ 243.237 110 10 8
+ 256.894 70 1 4
+ 22.464 100 1 6
+ 45.689 190 39 6
+ 724.938 70 8 5
+ 1126.461 360 38 7
+ 1985.119 360 35 5
+ 1837.516 300 40 5
+ 135.986 120 22 7
+ 237.703 150 27 8
+ 976.641 220 31 6
+ 1452.689 280 19 7
+ 1600.000 300 24 9
+ 268.598 140 1 7
+ 900.889 290 38 8
+ 982.063 180 26 6
+ 201.356 140 11 6
+ 746.024 210 34 6
+ 1132.877 250 55 7
+ 1000.000 250 5 7
+ 75.896 120 34 6
+ 1351.254 290 37 9
+ 202.705 60 13 8
+ 365.985 140 23 6
+ 305.268 290 54 6
+ 263.268 160 18 7
+ 513.694 100 2 7
+ 152.609 160 11 6
+ 35.987 150 30 8
+ 102.568 140 22 7
+ 215.368 230 36 6
+ 426.784 230 37 8
+ 507.772 30 9 3
+ 233.291 80 2 7
+ 1035.433 190 12 8
+ 102.642 90 5 9
+ 526.142 120 14 7
+ 624.538 150 20 5
+ 912.349 230 57 6
+ 215.994 150 19 8
+ 561.963 210 35 7
+ 474.760 180 22 5
+ 231.523 140 16 7
+ 678.596 360 53 7
+ 70.922 10 4 6
+ 1567.548 240 29 6
+ 263.598 270 43 7
+ 1423.568 290 26 7
+ 715.678 220 28 7
+ 777.237 230 37 8
+ 509.430 220 32 5
+ 964.110 240 34 7
+ 583.627 260 30 7
+ 923.373 170 15 7
+ 344.392 130 23 7
+ 1095.578 270 31 8
+ 100.025 140 21 5
+ 30.425 60 28 1
+ 1080.342 210 18 7
+ 799.899 210 28 7
+ 1071.752 240 37 8
+ 893.355 210 26 6
+ 283.161 200 30 8
+ 917.017 140 10 7
+ 234.568 90 21 7
+ 456.897 120 18 9
+ 206.973 100 14 7
+ 1294.099 360 38 7
+ 826.859 180 36 6
+ 564.158 150 32 7
+ 192.607 110 9 5
+ 10.652 90 39 5
+ 45.689 160 24 7
+ 42.568 230 45 7
+ 20.456 40 13 8
+ 635.192 60 17 6
+ 1002.273 230 32 7
+ 1177.047 230 23 6
+ 507.638 120 0 6
+ 215.689 150 35 5
+ 526.480 120 26 6
+ 26.895 60 19 6
+ 883.877 280 26 7
+ 9.104 120 53 8
+ 103.568 230 29 8
+ 169.583 230 28 7
+ 429.504 40 17 6
+ 223.639 140 26 8
+ 145.585 360 42 8
+ 985.968 210 17 6
+ 500.922 260 36 8
+ 226.652 250 45 7
+ 1051.168 200 20 7
+ 68.093 150 15 7
+ 1547.159 250 28 8
+ 393.774 100 27 6
+ 804.282 260 17 8
+ 801.577 210 32 8
+ 450.562 290 46 9
+ 26.598 220 47 8
+ 179.061 70 19 1
+ 345.687 110 22 8
+ 295.840 250 55 9
+ 2271.860 320 31 5
+ 1134.575 300 39 8
+ 601.434 180 21 6
+ 45.298 180 36 6
+ 759.518 200 21 7
+ 832.869 320 44 7
+ 56.894 140 27 7
+ 709.399 100 16 6
+ 56.895 120 33 6
+ 767.134 230 33 8
+ 503.172 150 21 7
+ 700.929 250 35 9
+ 910.851 190 26 7
+ 888.569 240 14 6
+ 800.615 250 34 6
+ 1500.000 230 11 8
+ 785.694 110 20 9'
> 
> albumNames<- c("adverts", "sales", "airplay", "attract")
> 
> album2 <- data.frame( read.table(text=album2, fill=TRUE, col.names=albumNames) )
> 
> 
> ###################### regular, non-Bayesian analyses of the data ######################
> 
> # conventional multiple linear regression with NHST (p. 279)
> 
> fit = lm(sales ~ adverts + airplay + attract, data = album2)
> summary(fit)

Call:
lm(formula = sales ~ adverts + airplay + attract, data = album2)

Residuals:
     Min       1Q   Median       3Q      Max 
-121.324  -28.336   -0.451   28.967  144.132 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -26.612958  17.350001  -1.534    0.127    
adverts       0.084885   0.006923  12.261  < 2e-16 ***
airplay       3.367425   0.277771  12.123  < 2e-16 ***
attract      11.086335   2.437849   4.548 9.49e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 47.09 on 196 degrees of freedom
Multiple R-squared:  0.6647,	Adjusted R-squared:  0.6595 
F-statistic: 129.5 on 3 and 196 DF,  p-value: < 2.2e-16

> 
> 
> ###################### Bayesian Regression through MCMCglmm ######################
> 
> model1 = MCMCglmm(sales ~ adverts + airplay + attract, data = album2,
+                   nitt=50000, burnin=3000, thin=10, verbose=F)
> 
> summary(model1)

 Iterations = 3001:49991
 Thinning interval = 10
 Sample size = 4700 

 DIC: 2114.398 

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units      2240     1825     2703     4700

 Location effects: sales ~ adverts + airplay + attract 

            post.mean  l-95% CI  u-95% CI eff.samp  pMCMC    
(Intercept) -26.45144 -58.63401   8.76438     4700  0.124    
adverts       0.08490   0.07111   0.09844     4700 <2e-04 ***
airplay       3.36391   2.79723   3.90889     4320 <2e-04 ***
attract      11.08396   6.22036  15.79193     4700 <2e-04 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> autocorr(model1$VCV)
, , units

               units
Lag 0    1.000000000
Lag 10  -0.007938273
Lag 50   0.007551095
Lag 100  0.012793699
Lag 500 -0.005771883

> plot(model1)
Hit <Return> to see next plot: 
> 
> # running the same analyses using different priors, & then plotting the various posteriors for "airplay"
> 
> estimateNum = 3   # the number of the regression estimate to be used for the analyses, e.g., the 1st (intercept), the 2nd, etc.
> 
> Ndatasets = 5   # the number of additional analyses using different priors
> 
> # plot data for the first Bayes model
> densdat <- density(model1$Sol[,estimateNum])
> plotdatx <- densdat$x
> plotdaty <- densdat$y
> 
> # for MCMCglmm priors, B is for fixed effects. B is a list containing the expected value (mu)
> # and a (co)variance matrix (V) representing the strength of belief: the defaults are B$mu=0
> # and B$V=I*1e+10, where I is an identity matrix of appropriate dimension
> 
> # the different priors will be derived from the lm, non-Bayesian regression coefficients
> # the mu values will be random values from a 4 SE range around the lm estimates
> # the V values will be set at (4 * the SEs)**2
> 
> fit.list <- summary.lm(fit)
> coeffs <- fit.list$coefficients
> coefnames <- rownames(coeffs)
> 
> # ranges around the lm estimates
> MUranges <- cbind( (coeffs[,1] - (4 * coeffs[,2])), (coeffs[,1] + (4 * coeffs[,2])) )
> 
> # random values from the MUranges for each coefficient
> MUs <- matrix(-9999,nrow(coeffs),Ndatasets)
> for (lupe in 1:nrow(coeffs)) { MUs[lupe,] <- runif(Ndatasets, min=MUranges[lupe,1], max=MUranges[lupe,2]) }
> dimnames(MUs) <- list(rep("", dim(MUs)[1]))
> colnames(MUs) <- seq(1:Ndatasets)
> cat('\n\nPrior regression coefficient estimates for each additional analysis:')


Prior regression coefficient estimates for each additional analysis:
> print(round(MUs,2))
       1      2     3      4      5
  -94.14 -26.35 23.01 -58.21 -25.08
    0.09   0.09  0.07   0.10   0.08
    3.44   4.24  3.67   3.46   4.32
   14.83  14.80 11.86   6.30   2.05
> 
> # (co)variance matrix (V) representing the strength of belief - which will be (4 * the lm SEs)**2
> Vlmx2 <- diag((4 * coeffs[,2])**2)
> colnames(Vlmx2) <- seq(1:nrow(MUs))
> cat('\n\nPrior variance (strength of belief) value used for each regression estimate in the additional analyses:\n',
+     round(diag(Vlmx2),5), sep="\n")


Prior variance (strength of belief) value used for each regression estimate in the additional analyses:

4816.36
0.00077
1.23451
95.08974
> 
> 
> for (lupeSets in 1:Ndatasets) {
+   prior.list <- list(B = list(mu = MUs[,lupeSets], V = Vlmx2))
+   model2 = MCMCglmm(sales ~ adverts + airplay + attract, data = album2, prior = prior.list,
+                     nitt=50000, burnin=3000, thin=10, verbose=F)
+ 
+   densdat <- density(model2$Sol[,estimateNum])
+   plotdatx <- cbind( plotdatx, densdat$x)
+   plotdaty <- cbind( plotdaty, densdat$y)
+ }
> 
> matplot( plotdatx, plotdaty, main="", xlab=paste('Estimate for', coefnames[estimateNum]), ylab='Density',
+          font.lab=1.8, type='l', cex.lab=1.2, col=c(2,rep(1,ncol(plotdatx))), lty=1, lwd=1 )
> title(main=paste('Density Plots of the Posteriors for', coefnames[estimateNum], '\nProduced Using Differing Priors'))
> legend("topright", c("broad priors","random priors"), bty="n", lty=c(1,1),
+        lwd=2, cex=1.2, text.col=c(2,1), col=c(2,1) )
> 
> 
> ###################### posterior predictive checks using the bayesplot package ######################
> 
> # from the bayesplot package documentation:
> # The bayesplot package provides various plotting functions for graphical posterior predictive checking,
> # that is, creating graphical displays comparing observed data to simulated data from the posterior
> # predictive distribution. The idea behind posterior predictive checking is simple: if a model is a
> # good fit then we should be able to use it to generate data that looks a lot like the data we observed.
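The graphical checks can also be backed up with a single number. Below is a minimal sketch (an addition, not part of the original session; it assumes `model1` and `album2` exist from the code above): draw simulated datasets with `simulate.MCMCglmm` and compute a simple posterior predictive p-value for the mean of sales.

```r
# posterior predictive p-value for the mean of sales (a sketch; assumes
# model1 and album2 exist from the session above)
# after transposing, each row of yrepPP is one simulated dataset of sales values
yrepPP <- t( simulate.MCMCglmm(model1, 100) )

# proportion of simulated dataset means that exceed the observed mean;
# values near 0 or 1 flag misfit, values near .5 are reassuring
mean( apply(yrepPP, 1, mean) > mean(album2$sales) )
```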
> 
> y <- album2$sales   # the DV that was used for the MCMCglmm model
> 
> # generating the simulated data using the simulate.MCMCglmm function from the MCMCglmm package
> yrep <- simulate.MCMCglmm(model1, 25)
> 
> yrep <- t( yrep )   # transposing yrep for the bayesplot functions
> 
> 
> # ppc_stat: A histogram of the distribution of a test statistic computed by applying stat to each
> # dataset (row) in yrep. The value of the statistic in the observed data, stat(y), is
> # overlaid as a vertical line.
> ppc_stat(y, yrep, binwidth = 1)
> 
> 
> # ppc_dens_overlay: Kernel density or empirical CDF estimates of each dataset (row) in yrep are
> # overlaid, with the distribution of y itself on top (and in a darker shade).
> ppc_dens_overlay(y, yrep[1:25, ])
> 
> 
> # ppc_scatter_avg: A scatterplot of y against the average values of yrep, i.e., the
> # points (mean(yrep[, n]), y[n]), where each yrep[, n] is a vector of length equal
> # to the number of posterior draws.
> ppc_scatter_avg(y, yrep)
> 
> 
> # ppc_hist: A separate histogram, shaded frequency polygon, smoothed kernel density estimate,
> # or box and whiskers plot is displayed for y and each dataset (row) in yrep.
> # For these plots yrep should therefore contain only a small number of rows.
> ppc_hist(y, yrep[1:8, ], binwidth = .3)
> 
> 
> 
> ###################### now conduct the Bayesian analyses using the rstanarm package ######################
> 
> model10 <- stan_glm(sales ~ adverts + airplay + attract, data = album2,
+                     warmup = 1000, iter = 2000, sparse = FALSE, seed = 123)

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).

Gradient evaluation took 5e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.5 seconds.
Adjust your expectations accordingly!

Iteration:    1 / 2000 [  0%]  (Warmup)
Iteration:  200 / 2000 [ 10%]  (Warmup)
Iteration:  400 / 2000 [ 20%]  (Warmup)
Iteration:  600 / 2000 [ 30%]  (Warmup)
Iteration:  800 / 2000 [ 40%]  (Warmup)
Iteration: 1000 / 2000 [ 50%]  (Warmup)
Iteration: 1001 / 2000 [ 50%]  (Sampling)
Iteration: 1200 / 2000 [ 60%]  (Sampling)
Iteration: 1400 / 2000 [ 70%]  (Sampling)
Iteration: 1600 / 2000 [ 80%]  (Sampling)
Iteration: 1800 / 2000 [ 90%]  (Sampling)
Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 0.132469 seconds (Warm-up)
               0.130465 seconds (Sampling)
               0.262934 seconds (Total)


SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).

Gradient evaluation took 1.7e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
Adjust your expectations accordingly!

Iteration:    1 / 2000 [  0%]  (Warmup)
Iteration:  200 / 2000 [ 10%]  (Warmup)
Iteration:  400 / 2000 [ 20%]  (Warmup)
Iteration:  600 / 2000 [ 30%]  (Warmup)
Iteration:  800 / 2000 [ 40%]  (Warmup)
Iteration: 1000 / 2000 [ 50%]  (Warmup)
Iteration: 1001 / 2000 [ 50%]  (Sampling)
Iteration: 1200 / 2000 [ 60%]  (Sampling)
Iteration: 1400 / 2000 [ 70%]  (Sampling)
Iteration: 1600 / 2000 [ 80%]  (Sampling)
Iteration: 1800 / 2000 [ 90%]  (Sampling)
Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 0.140624 seconds (Warm-up)
               0.110341 seconds (Sampling)
               0.250965 seconds (Total)


SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).

Gradient evaluation took 1.7e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
Adjust your expectations accordingly!
Iteration:    1 / 2000 [  0%]  (Warmup)
Iteration:  200 / 2000 [ 10%]  (Warmup)
Iteration:  400 / 2000 [ 20%]  (Warmup)
Iteration:  600 / 2000 [ 30%]  (Warmup)
Iteration:  800 / 2000 [ 40%]  (Warmup)
Iteration: 1000 / 2000 [ 50%]  (Warmup)
Iteration: 1001 / 2000 [ 50%]  (Sampling)
Iteration: 1200 / 2000 [ 60%]  (Sampling)
Iteration: 1400 / 2000 [ 70%]  (Sampling)
Iteration: 1600 / 2000 [ 80%]  (Sampling)
Iteration: 1800 / 2000 [ 90%]  (Sampling)
Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 0.168552 seconds (Warm-up)
               0.098467 seconds (Sampling)
               0.267019 seconds (Total)


SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).

Gradient evaluation took 3.1e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.31 seconds.
Adjust your expectations accordingly!

Iteration:    1 / 2000 [  0%]  (Warmup)
Iteration:  200 / 2000 [ 10%]  (Warmup)
Iteration:  400 / 2000 [ 20%]  (Warmup)
Iteration:  600 / 2000 [ 30%]  (Warmup)
Iteration:  800 / 2000 [ 40%]  (Warmup)
Iteration: 1000 / 2000 [ 50%]  (Warmup)
Iteration: 1001 / 2000 [ 50%]  (Sampling)
Iteration: 1200 / 2000 [ 60%]  (Sampling)
Iteration: 1400 / 2000 [ 70%]  (Sampling)
Iteration: 1600 / 2000 [ 80%]  (Sampling)
Iteration: 1800 / 2000 [ 90%]  (Sampling)
Iteration: 2000 / 2000 [100%]  (Sampling)

 Elapsed Time: 0.237165 seconds (Warm-up)
               0.094195 seconds (Sampling)
               0.33136 seconds (Total)

> 
> summary(model10, digits = 3)

Model Info:

 function:     stan_glm
 family:       gaussian [identity]
 formula:      sales ~ adverts + airplay + attract
 algorithm:    sampling
 priors:       see help('prior_summary')
 sample:       4000 (posterior sample size)
 observations: 200
 predictors:   4

Estimates:
                mean      sd        2.5%      25%       50%       75%       97.5%  
(Intercept)     -26.579    17.412   -60.378   -38.376   -26.764   -14.959     8.623
adverts           0.085     0.007     0.072     0.080     0.085     0.089     0.098
airplay           3.367     0.288     2.814     3.175     3.363     3.564     3.921
attract          11.091     2.446     6.397     9.433    11.164    12.675    15.993
sigma            47.258     2.427    42.723    45.582    47.167    48.804    52.443
mean_PPD        193.048     4.767   183.635   189.831   192.990   196.393   202.211
log-posterior -1066.252     1.594 -1070.379 -1067.056 -1065.918 -1065.080 -1064.180

Diagnostics:
              mcse  Rhat  n_eff
(Intercept)   0.275 1.000 4000 
adverts       0.000 1.000 4000 
airplay       0.005 1.001 4000 
attract       0.039 1.000 4000 
sigma         0.038 1.000 4000 
mean_PPD      0.075 1.000 4000 
log-posterior 0.039 1.002 1692 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective
sample size, and Rhat is the potential scale reduction factor on split chains (at convergence
Rhat=1).

> plot(model10)
> 
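As a closing check, the frequentist and Bayesian fits can be compared directly. A minimal sketch (an addition, not part of the original session; it assumes `fit` and `model10` exist from the code above): under rstanarm's weakly informative default priors, the 95% posterior credible intervals from `stan_glm` should land very close to the 95% confidence intervals from `lm`, as the matching point estimates in the two summaries already suggest.

```r
# compare 95% intervals from the two approaches (a sketch; assumes fit and
# model10 exist from the session above)
confint(fit, level = 0.95)                # lm() confidence intervals
posterior_interval(model10, prob = 0.95)  # stan_glm() posterior credible intervals
```

The numerical agreement here is expected with n = 200 and diffuse priors; the interpretations differ, with the credible interval making a direct probability statement about the parameter.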