The Bayes Factor is a quantitative measure of how much evidence there is for hypothesis A relative to hypothesis B, given the data. But what does that even mean? If you're familiar with the likelihood ratio test from traditional statistics, the Bayes Factor is essentially the Bayesian equivalent with a little more rigor. This fantastic blog post (link) explains it really well.
In a traditional setting, if we have some data and fit two statistical models, we can compare the two and gauge which one fits the data better with the likelihood ratio test. Commonly the log-likelihood is used to avoid working with very small numbers. Under this formulation the log of the likelihood ratio is given by

$$D = 2\left(\ell_b - \ell_a\right)$$

where $\ell_a$ and $\ell_b$ are the maximised log-likelihoods of models a and b. If $D > c$ the hypothesis corresponding to model a is rejected, and if $D < c$ the hypothesis corresponding to model b is rejected, where $c$ is often chosen to be 0. When one model is nested within the other, as in the example below, the larger model always achieves the higher likelihood, so rather than applying this rule blindly the statistic is compared to a $\chi^2$ distribution with degrees of freedom equal to the difference in the number of parameters.
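To make that concrete, here's a minimal helper (my own sketch, not code from the post) that computes the statistic and its $\chi^2$ p-value for two nested models:

# likelihood ratio test for two nested models (sketch)
# m_small must be nested within m_big
lr_test <- function(m_small, m_big) {
  D <- 2 * (as.numeric(logLik(m_big)) - as.numeric(logLik(m_small)))
  df <- attr(logLik(m_big), "df") - attr(logLik(m_small), "df")
  c(D = D, df = df, p.value = pchisq(D, df = df, lower.tail = FALSE))
}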
Example
Using the familiar Boston housing dataset we'll see this in action. Two models will be fit: one with all covariates, and one after backwards selection has been run to find the best fitting model. I won't be too rigorous with the modelling; the point is just to demonstrate the theory.
suppressPackageStartupMessages(library(MASS))  # the Boston dataset lives in MASS
suppressPackageStartupMessages(library(MCMCpack))
suppressPackageStartupMessages(library(lmtest))
data(Boston)
Boston$chas <- as.factor(Boston$chas)
# simple linear model
# full model
glm.a <- glm(log(medv) ~ ., data = Boston)
# backwards selection
back_sel <- step(glm(log(medv) ~ ., data = Boston), direction = "backward")
## Start:  AIC=-229.23
## log(medv) ~ crim + zn + indus + chas + nox + rm + age + dis + 
##     rad + tax + ptratio + black + lstat
## 
##           Df Deviance      AIC
## - age      1   17.755 -231.065
## - indus    1   17.786 -230.197
## <none>         17.749 -229.228
## - zn       1   17.914 -226.567
## - chas     1   18.058 -222.502
## - black    1   18.283 -216.234
## - tax      1   18.373 -213.758
## - nox      1   18.684 -205.249
## - rad      1   18.791 -202.381
## - rm       1   18.813 -201.778
## - dis      1   19.113 -193.768
## - ptratio  1   19.676 -179.075
## - crim     1   19.949 -172.115
## - lstat    1   25.130  -55.281
## 
## Step:  AIC=-231.07
## log(medv) ~ crim + zn + indus + chas + nox + rm + dis + rad + 
##     tax + ptratio + black + lstat
## 
##           Df Deviance     AIC
## - indus    1   17.791 -232.03
## <none>         17.755 -231.06
## - zn       1   17.914 -228.55
## - chas     1   18.069 -224.20
## - black    1   18.298 -217.82
## - tax      1   18.376 -215.68
## - nox      1   18.720 -206.30
## - rad      1   18.791 -204.38
## - rm       1   18.900 -201.44
## - dis      1   19.302 -190.79
## - ptratio  1   19.677 -181.05
## - crim     1   19.954 -173.99
## - lstat    1   25.950  -41.04
## 
## Step:  AIC=-232.03
## log(medv) ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio + 
##     black + lstat
## 
##           Df Deviance      AIC
## <none>         17.791 -232.033
## - zn       1   17.936 -229.924
## - chas     1   18.131 -224.458
## - black    1   18.326 -219.057
## - tax      1   18.405 -216.868
## - nox      1   18.726 -208.117
## - rad      1   18.800 -206.125
## - rm       1   18.909 -203.219
## - dis      1   19.530 -186.857
## - ptratio  1   19.678 -183.045
## - crim     1   20.014 -174.461
## - lstat    1   25.952  -43.004
glm.b <- glm(back_sel$formula, data = Boston)
# log-likelihood ratio test
D <- 2*(logLik(glm.b) - logLik(glm.a))
D
## 'log Lik.' -1.195587 (df=13)
In this case $D = -1.196$: the full model has the higher likelihood, as the larger of two nested models always will, but only by a whisker. Comparing $2(\ell_a - \ell_b) = 1.196$ to a $\chi^2$ distribution with 2 degrees of freedom (one for each dropped covariate, age and indus) gives a p-value of about 0.55, so the extra covariates don't significantly improve the fit and we retain the second, simpler model, which is what we expect given we ran backwards selection to find a better fitting model.
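Since lmtest is loaded above, the same conclusion can be checked with lrtest(), which runs exactly this test for nested models; a quick check (output omitted):

# formal likelihood ratio test of the reduced model against the full model
lrtest(glm.b, glm.a)
# or by hand: compare 2*(l_a - l_b) to a chi-squared with 2 df
pchisq(-as.numeric(D), df = 2, lower.tail = FALSE)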
The Bayes Factor
Under a Bayesian framework, instead of looking at the ratio of likelihoods between two models, we look at the ratio of their marginal likelihoods, which takes the prior information into account. The Bayes Factor for model a relative to model b is

$$BF_{ab} = \frac{p(y \mid M_a)}{p(y \mid M_b)}$$

where, for each model, $p(y \mid M)$ is the probability of observing the given data under that model. This requires the calculation of the marginal probabilities

$$p(y \mid M) = \int p(y \mid \theta, M) \, p(\theta \mid M) \, d\theta.$$
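To make the integral concrete, here is a naive Monte Carlo sketch (my own toy example, not how MCMCpack does it) that approximates a marginal likelihood by averaging the likelihood over draws from the prior, for a one-covariate model with known variance:

# naive Monte Carlo approximation of a marginal likelihood:
# p(y | M) ~ mean over prior draws of p(y | theta, M)
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)  # toy data

n_draws <- 20000
beta0 <- rnorm(n_draws, 0, 3)  # N(0, 3^2) priors on both coefficients
beta1 <- rnorm(n_draws, 0, 3)
loglik <- vapply(seq_len(n_draws), function(i) {
  sum(dnorm(y, beta0[i] + beta1[i] * x, sd = 1, log = TRUE))
}, numeric(1))
# average on the log scale, using log-sum-exp for numerical stability
log_marg <- max(loglik) + log(mean(exp(loglik - max(loglik))))
log_marg

This brute-force approach breaks down quickly as the number of parameters grows, which is why methods like Chib's (used below) exist.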
Fortunately this is all taken care of by the MCMCpack package.
Example
Using the Boston data again, we'll fit the same models under a Bayesian framework using MCMCregress(). To do so, priors need to be specified. To make things easy, the coefficients of the models fit above will be rounded and used as the prior means, making them (very) good approximations. The prior precisions B0 will be set to 1, equivalent to a prior standard deviation of 1 on each coefficient, much larger than the actual standard errors, meaning we've essentially chosen flat priors. The other inputs $c_0$ and $d_0$ are the shape and scale parameters of the inverse gamma prior on the model variance. Again, these values describe a wide distribution, meaning a non-informative prior, but they are necessary for the calculation.
# using the estimates above, let's hack some priors together
prior_b0_a <- unname(round(coef(glm.a)))
prior_b0_b <- unname(round(coef(glm.b)))
# Bayesian models with MCMCpack
mod.a <- MCMCregress(glm.a$formula, data = Boston,
b0 = prior_b0_a, B0 = rep(1, length(prior_b0_a)),
c0 = 10, d0 = 10^5,
marginal.likelihood = "Chib95")
mod.b <- MCMCregress(glm.b$formula, data = Boston,
b0 = prior_b0_b, B0 = rep(1, length(prior_b0_b)),
c0 = 10, d0 = 10^5,
marginal.likelihood = "Chib95")
# calculate the Bayes Factor
BayesFactor(mod.a, mod.b)
## The matrix of Bayes Factors is:
##        mod.a mod.b
## mod.a      1 0.006
## mod.b    167 1.000
## 
## The matrix of the natural log Bayes Factors is:
##       mod.a mod.b
## mod.a  0.00 -5.12
## mod.b  5.12  0.00
## 
##    mod.a :
##       call = 
##    MCMCregress(formula = glm.a$formula, data = Boston, b0 = prior_b0_a, 
##        B0 = rep(1, length(prior_b0_a)), c0 = 10, d0 = 10^5, 
##        marginal.likelihood = "Chib95")
## 
##       log marginal likelihood =  -2063.353
## 
## 
##    mod.b :
##       call = 
##    MCMCregress(formula = glm.b$formula, data = Boston, b0 = prior_b0_b, 
##        B0 = rep(1, length(prior_b0_b)), c0 = 10, d0 = 10^5, 
##        marginal.likelihood = "Chib95")
## 
##       log marginal likelihood =  -2058.238
The final thing to do is interpret the output. The natural log of the Bayes factor in favor of model b is 5.12, a Bayes factor of about 167, indicating very strong support for model b over model a.
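As a quick sanity check, the entries of the Bayes factor matrix are just exponentiated differences of the reported log marginal likelihoods:

# log Bayes factor for mod.b over mod.a, from the printed values
-2058.238 - (-2063.353)       # 5.115, the 5.12 reported above
exp(-2058.238 - (-2063.353))  # about 167, matching the Bayes factor matrix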
Interpretation of the Bayes factor can be a hot topic, because there are efforts to avoid a situation similar to p-hacking, i.e. where $p < 0.05$ is declared significant and therefore the null hypothesis is rejected. There are tables out there which break down the scale of the Bayes factor, so I'll leave it there, but check out this post (link) for more.