Bayes Factors for t tests and one way Analysis of Variance in R
Link to the last RSS article here: Sharpening Occam’s Razor: Using Bayesian Model Averaging in R to Separate the Wheat from the Chaff -- Ed.
By Dr. Jon Starkweather, Research and Statistical Support Consultant
It may seem like small potatoes, but the Bayesian approach offers advantages even when the analysis to be run is not complex. For instance, a traditional frequentist approach to a t test or one way Analysis of Variance (ANOVA; two or more group design with one outcome variable) would result in a p value, which is properly interpreted as the probability of obtaining data (a result) at least as extreme as those observed, assuming the null hypothesis is true. Often, the p value’s interpretation is abbreviated and it is treated as indicating empirical support for or against a null hypothesis. Of course, an effect size measure such as Cohen’s d (Cohen, 1988) would also be computed to offer insight as to the magnitude of the effect. Wetzels, Matzke, Lee, Rouder, Iverson, and Wagenmakers (submitted) have advocated use of the Bayesian perspective for simple two or more group designs with the use of Bayes Factors. The advantage is that Bayes Factors “incorporate inferences about both the presence of effects, as well as their magnitude…” (Wetzels, et al., p. 1). Kass and Raftery (1995) define the Bayes factor as “a summary of the evidence provided by the data in favor of one scientific theory, represented by a statistical model, as opposed to another” (p. 777). Simply stated, the Bayes factor is a number: the ratio of the probability of the data under one model (e.g. the null hypothesis) to the probability of the data under another model (e.g. the alternative hypothesis). Equivalently, it is the factor by which the data shift the odds of one model relative to the other. Because it is the result of a simple division, its interpretation is very straightforward.
For instance, a Bayes factor of 1.00 represents equal support for either model (the null and alternative hypotheses), a Bayes factor greater than 1.00 represents evidence for the model in the numerator (e.g. the null hypothesis), and a Bayes factor less than 1.00 represents evidence for the model in the denominator (e.g. the alternative hypothesis). The interpretation of magnitude for a Bayes factor, like traditional effect size estimates, involves some flexible categories (suggested by Jeffreys, 1961). For instance, a Bayes factor between (roughly) 1.00 and 3.00 (or between 1.00 and 0.33) represents scarce evidence, a Bayes factor between (roughly) 3.00 and 10.00 (or between 0.33 and 0.10) represents substantial evidence, a Bayes factor between (roughly) 10.00 and 30.00 (or between 0.10 and 0.03) represents strong evidence, and a Bayes factor between (roughly) 30.00 and 100.00 (or between 0.03 and 0.01) represents very strong evidence (Jeffreys). It is important to note that, theoretically, there is no limit to the magnitude of a Bayes factor; Jeffreys suggested that a Bayes factor greater than 100.00 (or less than 0.01) would represent decisive evidence. So, the benefits of taking a Bayesian perspective (beyond the general reasons for choosing a Bayesian perspective over a frequentist perspective) are that in these simple situations, a Bayes factor is one number which is easily interpreted for both identifying an effect and measuring the magnitude of the effect. By contrast, the frequentist p value is easily misinterpreted, is controversial, and requires another statistic to express the magnitude of effect (i.e. an effect size; e.g. Cohen’s d).
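The categories just described can be expressed as a small helper. This is only a sketch in base R: the function name ‘classify_bf’ is hypothetical (it is not part of any package), and it assumes the convention used in this article, with the null model in the numerator of the Bayes factor.

```r
# Label a Bayes factor with the rough Jeffreys (1961) categories.
# Assumption: the Bayes factor is null / alternative, so values above 1
# favor the null and values below 1 favor the alternative.
classify_bf <- function(bf) {
  stopifnot(is.numeric(bf), length(bf) == 1, bf > 0)
  favored <- if (bf > 1) "null" else if (bf < 1) "alternative" else "neither"
  # Invert values below 1 so that one set of cut points serves both directions.
  strength <- max(bf, 1 / bf)
  label <- cut(strength,
               breaks = c(1, 3, 10, 30, 100, Inf),
               labels = c("scarce", "substantial", "strong",
                          "very strong", "decisive"),
               right = FALSE, include.lowest = TRUE)
  list(favored = favored, evidence = as.character(label))
}

classify_bf(3.23)   # favors the null, 'substantial'
classify_bf(0.05)   # favors the alternative, 'strong'
```

Because the cut points are only rough guides, a value sitting near a boundary should not be read too literally.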
Computing Bayes factors is very easy when working in R. The package ‘BayesFactorPCL’ (Morey & Rouder, 2010) provides functions for the computation of Bayes factors for one sample or two sample t tests, as well as for one way ANOVA. The package is relatively new and is still being developed, so it is only available (for now) from R-Forge. However, the functions for t tests and one way ANOVA are stable. The package authors are working on implementing a function for applying Bayes factors to regression, and that is likely why the package has not yet been released to CRAN.
To explore some examples of Bayes factors analysis using the functions in the ‘BayesFactorPCL’ package, begin by importing some data from the web and naming it ‘example.1’. In R, load the ‘foreign’ library (necessary to import SPSS .sav files, which this example uses), then import the data, and then get a summary of the data if desired.
Next, load the ‘Rcmdr’ and ‘abind’ packages using: library(Rcmdr) and library(abind) in the console. We will need these packages for the ‘numSummary’ function (used below), which displays descriptive statistics in a cross-tabs manner.
Finally, we can load the ‘BayesFactorPCL’ library.
If you would like more information about the ‘BayesFactorPCL’ library, simply consult the help documentation, by typing: help(BayesFactorPCL) in the console.
t test example.
In order to see what we are using as an example, we can use the ‘numSummary’ function from the Rcmdr and abind packages. Our dependent variable is number of words recalled (Recall1) and our independent variable is type of candy given to participants (Candy) where some participants were given Skittles (Skittles) and some participants were given no candy (none).
Next, we can use the ‘tapply’ function to calculate the variances (var) of each group and use the ‘leveneTest’ function to test the assumption of homogeneity of variances.
The output (above) shows that the variances (Skittles = 10.25, None = 17.68) are not significantly different (p = 0.077). A box and whisker plot displays fairly clearly how the groups differ.
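The variance check described above can be sketched in base R without extra packages. The data here are simulated stand-ins, not the article's data set, so the numbers will not match those reported; the group sizes of 27 are an assumption, and the object names (‘example.sim’, ‘levene.fit’) are hypothetical. The test is written out as a one way ANOVA on absolute deviations from each group's median, which to my understanding is what ‘leveneTest’ computes by default.

```r
# Simulated stand-in data (NOT the article's data); n = 27 per group is assumed.
set.seed(1)
example.sim <- data.frame(
  Candy   = factor(rep(c("Skittles", "None"), each = 27)),
  Recall1 = c(rnorm(27, mean = 9.41, sd = 3.20),
              rnorm(27, mean = 17.30, sd = 4.20))
)

# Variance of the dependent variable within each group.
group.vars <- tapply(example.sim$Recall1, example.sim$Candy, var)

# Median-centered Levene's test: ANOVA on absolute deviations
# of each score from its own group's median.
example.sim$abs.dev <- abs(example.sim$Recall1 -
  ave(example.sim$Recall1, example.sim$Candy, FUN = median))
levene.fit <- anova(lm(abs.dev ~ Candy, data = example.sim))
levene.F <- levene.fit["Candy", "F value"]
levene.p <- levene.fit["Candy", "Pr(>F)"]

print(group.vars)
print(c(F = levene.F, p = levene.p))
```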
Next, we can conduct a traditional t test for comparison with the Bayes factor. We also need the actual t value from this test to calculate the Bayes factor later.
Here, we see that participants in the Skittles group (M = 9.41, SD = 3.20) recalled significantly fewer words than participants in the group which received no candy (M = 17.30, SD = 4.20), t(52) = -7.7566, p < .001. Of course, we would want to take a look at the effect size, Cohen’s d (1988), also called the standardized mean difference (SMD). To calculate the SMD in R, we first need to split the dependent variable scores by group.
Next, load the ‘MBESS’ library which contains the ‘smd’ function. Then we can apply that function to our groups. Here (below) we see a substantially large effect size (d = -2.11).
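As a cross-check on the values reported above, the t statistic and Cohen's d can be recomputed from the summary statistics alone. This sketch uses only base R; the group sizes of 27 each are an assumption inferred from the reported 52 degrees of freedom, not a figure stated in the article.

```r
# Reported summary statistics (means and variances from the text above).
m1 <- 9.41;  v1 <- 10.25   # Skittles
m2 <- 17.30; v2 <- 17.68   # None
n1 <- 27; n2 <- 27         # ASSUMED group sizes (df = n1 + n2 - 2 = 52)

df        <- n1 + n2 - 2
sd.pooled <- sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / df)

# Cohen's d: mean difference in pooled standard deviation units.
d <- (m1 - m2) / sd.pooled

# Equal-variance two sample t statistic, written in terms of d.
t.stat  <- d * sqrt(n1 * n2 / (n1 + n2))
p.value <- 2 * pt(-abs(t.stat), df)

round(c(d = d, t = t.stat, df = df), 2)
```

Under these assumptions d comes out at about -2.11 and t at about -7.76, in line with the reported values. With the raw data available, the same t test would normally be run with t.test() and var.equal = TRUE.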
Next, we can conduct a Bayesian version of Levene’s test for homogeneity of variances using the ‘BayesFactorPCL’ library function ‘eqVariance.Gibbs’, which requires a matrix of data with each group as a column and each case as a row.
The output for the ‘eqVariance.Gibbs’ function is quite long, but the key element is "$BF", the Bayes factor; a value greater than one favors equal group variances.
Long section of output omitted to save space.
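The matrix layout mentioned above (one column per group, one row per case) can be built from long-format data in base R. This is a sketch on simulated stand-in data with hypothetical object names; it assumes equal group sizes, since unequal groups cannot be bound into a rectangular matrix without padding.

```r
# Simulated stand-in data in long format (NOT the article's data).
set.seed(2)
sim.long <- data.frame(
  Candy   = factor(rep(c("Skittles", "None"), each = 27)),
  Recall1 = c(rnorm(27, mean = 9.41, sd = 3.20),
              rnorm(27, mean = 17.30, sd = 4.20))
)

# Split the scores by group, then bind the groups side by side as columns.
group.matrix <- do.call(cbind, split(sim.long$Recall1, sim.long$Candy))

dim(group.matrix)        # rows = cases per group, columns = groups
colnames(group.matrix)   # column names are the factor levels
```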
Now, we can take the information from the traditional t test and conduct the Bayes factor analysis using the ‘ttest.Quad’ function from the ‘BayesFactorPCL’ library.
Notice the script above specifies the default for the prior, Cauchy, which is preferred (Rouder, Speckman, Sun, & Morey, 2009). When that argument is specified as FALSE, a normal prior distribution is applied instead. The normal prior applied to these (example) data changes the result very little (see below).
Use of either type of prior results in a very, very small Bayes factor, indicating decisive evidence for the alternative hypothesis (i.e. there is a difference between the groups in the number of words recalled and the magnitude of effect is ‘decisive’). For more information on the ‘ttest.Quad’ function, simply type help(ttest.Quad) in the R console to bring up the function documentation.
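To show what such a computation involves, the Bayes factor can be sketched directly from the t value with base R's integrate(). This follows my reading of the formulas in Rouder, Speckman, Sun, and Morey (2009) and is not the package's own code; the function name ‘jzs_ttest_bf01’ and its arguments are hypothetical, and the group sizes of 27 are an assumption inferred from the reported degrees of freedom.

```r
# Bayes factor (null / alternative) from a t statistic.
# cauchy = TRUE : Cauchy prior on the standardized effect size.
# cauchy = FALSE: standard normal (unit information) prior.
jzs_ttest_bf01 <- function(t, n1, n2 = 0, cauchy = TRUE) {
  two.sample <- n2 > 0
  nu    <- if (two.sample) n1 + n2 - 2 else n1 - 1
  n.eff <- if (two.sample) n1 * n2 / (n1 + n2) else n1

  # Likelihood of t under the null (constants shared with the alternative drop out).
  like.null <- (1 + t^2 / nu)^(-(nu + 1) / 2)

  # Likelihood of t under the alternative, given prior variance g of the effect size.
  like.given.g <- function(g) {
    (1 + n.eff * g)^(-1 / 2) *
      (1 + t^2 / ((1 + n.eff * g) * nu))^(-(nu + 1) / 2)
  }

  if (cauchy) {
    # A Cauchy prior is a normal prior whose variance g is inverse-gamma(1/2, 1/2).
    # The density is built on the log scale to avoid Inf * 0 near g = 0.
    g.density <- function(g) exp(-0.5 * log(2 * pi) - 1.5 * log(g) - 1 / (2 * g))
    like.alt <- integrate(function(g) like.given.g(g) * g.density(g),
                          lower = 0, upper = Inf)$value
  } else {
    like.alt <- like.given.g(1)
  }
  like.null / like.alt
}

# Reported t value; group sizes of 27 each are ASSUMED.
jzs_ttest_bf01(-7.7566, 27, 27, cauchy = TRUE)
jzs_ttest_bf01(-7.7566, 27, 27, cauchy = FALSE)
```

Both calls return values far below 0.01, which is consistent with the ‘decisive’ evidence for the alternative described above.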
One way ANOVA example.
First, take a look at the variables of interest. Here we are comparing the number of words recalled (Recall1) across three distraction groups (No Distraction, Cell Phone Ring, & Light Bulb Failure).
Next, as was done above, take a look at the variances of each group and evaluate the homogeneity of variance assumption.
A box-and-whisker plot shows how the groups’ number of words recalled were distributed.
Next, we can conduct the traditional ANOVA. We see (below), there does not appear to be a significant difference in the number of words recalled among the groups.
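The original output is not reproduced here. As a stand-in, the sketch below runs a traditional one way ANOVA in base R on simulated data and pulls out the F statistic and design sizes. The data are not the article's (all three groups are drawn from the same population, so a nonsignificant F is typical but not guaranteed), and the group size of 18 and the object names are hypothetical. That a Bayes factor function would need these particular quantities is an assumption by analogy with the t test example.

```r
# Simulated stand-in data (NOT the article's data); 18 per group is made up.
set.seed(3)
sim.aov <- data.frame(
  Distraction = factor(rep(c("No Distraction", "Cell Phone Ring",
                             "Light Bulb Failure"), each = 18)),
  Recall1     = rnorm(54, mean = 13, sd = 4)
)

# Traditional one way ANOVA table.
aov.table <- anova(lm(Recall1 ~ Distraction, data = sim.aov))
print(aov.table)

# Quantities that summarize the test and the design.
F.stat      <- aov.table["Distraction", "F value"]
p.value     <- aov.table["Distraction", "Pr(>F)"]
J           <- nlevels(sim.aov$Distraction)   # number of groups
N.per.group <- nrow(sim.aov) / J              # cases per group (balanced design)
```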
Now, we can compute the Bayes factor using the ‘oneWayAOV.Quad’ function from the ‘BayesFactorPCL’ library. Notice the ‘rscale = 1’ argument, which specifies a non-informative Jeffreys, standard multivariate prior; this is the default and is recommended (Morey & Rouder, 2010; Rouder, Speckman, Sun, & Morey, 2009).
The Bayes factor of 3.23 indicates that these data provide ‘substantial’ evidence for the null model or null hypothesis (i.e. consistent with the results of the traditional approach, there is no significant difference among the groups).
For more information on the ‘oneWayAOV.Quad’ function, simply type help(oneWayAOV.Quad) in the R console.
The ‘LearnBayes’ package (Albert, 2010), a companion to the book Bayesian Computation with R (Albert, 2007), also contains functions for computing Bayes Factors.
An Adobe.pdf version of this article can be found here.
Until next time, “now all the criminals in their coats and their ties; are free to drink martinis and watch the sun rise…” (Dylan & Levy, 1975)
References / Resources
Albert, J. (2007). Bayesian computation with R. New York: Springer Science+Business Media, LLC.
Albert, J. (2010). Package ‘LearnBayes’. Available at CRAN: http://cran.r-project.org/web/packages/LearnBayes/index.html
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
Dylan, B., & Levy, J. (1975). Hurricane. Album: Desire.
Jeffreys, H. (1961). Theory of probability. Oxford, UK: Oxford University Press.
Geyer, C. J. (2010). Bayes factors via serial tempering. Available at: http://cran.r-project.org/web/packages/mcmc/vignettes/bfst.pdf
Kass, R. E., & Raftery, A. E. (1995). Bayes Factors. Journal of the American Statistical Association, 90, 773-795. Available at: http://www.stat.cmu.edu/~kass/papers/bayesfactors.pdf
Morey, R. D., & Rouder, J. N. (2010). Package ‘BayesFactorPCL’. Available at: https://r-forge.r-project.org/projects/bayesfactorpcl/
Rouder, J. N., Speckman, P. L., Sun, D., & Morey, R. D. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225-237. Available at: http://pcl.missouri.edu/node/32
Wetzels, R., Matzke, D., Lee, M., Rouder, J., Iverson, G., & Wagenmakers, E. (submitted). Statistical evidence in experimental psychology: An empirical comparison using 855 t tests. Available at: http://www.socsci.uci.edu/~mdlee/WetzelsEtAl2010.pdf