What is the Bayes Factor anyway?

The Bayes Factor is a quantitative measure of how much evidence there is for hypothesis A (H_a) relative to hypothesis B (H_b), given the data. But what does that even mean? If you’re familiar with the likelihood ratio test from traditional statistics, the Bayes Factor is essentially its Bayesian equivalent with a little more rigour. This fantastic blog post (link) explains it really well.

Under a traditional setting, if we have some data and we fit two statistical models, we can compare them and gauge which one fits the data better using the likelihood ratio test.

    \[ \begin{aligned} H_a:& \text{ } \mathbf{\beta} = \mathbf{\beta}_a \\ H_b:& \text{ } \mathbf{\beta} = \mathbf{\beta}_b \\ \end{aligned} \]

Commonly the log-likelihood is used to avoid working with very small numbers. Under this formulation, twice the log of the likelihood ratio is given by

    \[ \text{D} = 2 \left( \ell(\mathbf{\beta}_b|\mathbf{y}) - \ell(\mathbf{\beta}_a|\mathbf{y}) \right) \]

where \ell(\mathbf{\beta}|\mathbf{y}) is the log-likelihood. If \text{D} > c the hypothesis H_a is rejected in favour of H_b, and if \text{D} < c the hypothesis H_b is rejected, where c is often chosen to be 0.


Using the familiar Boston dataset we’ll see this in action. Two models will be fit: one with all covariates, and one after backwards selection has been run to find the best-fitting model. I won’t be too rigorous with the modelling; I just want to demonstrate the theory.
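A minimal sketch of this comparison in R (assuming the MASS package for the Boston data; the variable names are my own):

```r
library(MASS)  # provides the Boston housing data

# Model a: all covariates; model b: backwards selection from the full model
model_a <- lm(medv ~ ., data = Boston)
model_b <- step(model_a, direction = "backward", trace = 0)

# Twice the difference in log-likelihoods
D <- as.numeric(2 * (logLik(model_b) - logLik(model_a)))
D
```

Because model b is nested within model a, its log-likelihood can be no higher than the full model’s, so D here can never be positive.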


In this case D < 0, but only slightly: the reduced model can never have a higher likelihood than the full model it is nested in, so a small negative D simply tells us the dropped covariates were contributing little to the fit. We retain the second, simpler model, which is what we expect given we ran backwards selection to find a better-fitting (by AIC) model.

The Bayes Factor

Under a Bayesian framework, instead of looking at the ratio of likelihoods between two models we look at the ratio of their posterior probabilities, which takes the prior information into account.

    \[ K=\frac{P(H_b|\mathbf{y})}{P(H_a|\mathbf{y})} = \frac{P(H_b)}{P(H_a)} \frac{P(\mathbf{y}|H_b)}{P(\mathbf{y}|H_a)} \]
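The second factor on the right-hand side is the Bayes factor proper; the first is the prior odds. When the two hypotheses are given equal prior probability, the prior odds cancel and K reduces to the Bayes factor itself:

    \[ K = \frac{P(\mathbf{y}|H_b)}{P(\mathbf{y}|H_a)} \quad \text{when } P(H_a) = P(H_b) \]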

For each model there is a probability of observing the given data. This requires calculating the marginal likelihood, e.g. for H_a:

    \[ P(\mathbf{y}|H_a) = \int{P(\mathbf{\beta}_a)P(\mathbf{y}|\mathbf{\beta}_a)}d\mathbf{\beta}_a \]

Fortunately this is all taken care of by the MCMCpack package.


Using the Boston data again, we’ll fit the same two models under a Bayesian framework using MCMCregress(). To do so, priors need to be specified. To make things easy, the coefficients of the first model will be rounded and used as (very) good approximations of the prior means. The standard deviations will be set to 1, larger than the actual standard errors, meaning we’ve essentially chosen flat priors. The other inputs c0 and d0 are the shape and scale parameters of the inverse-gamma prior on the model variance. Again these values describe a wide distribution, meaning a non-informative prior, but they are necessary for the calculation.
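A sketch of what this looks like with MCMCpack. Note some assumptions on my part: the prior means are left at 0 below as placeholders for the rounded coefficients described above, the reduced formula dropping indus and age is illustrative of what backwards selection typically removes here, and MCMCregress() takes B0 as a prior *precision*, so a standard deviation of 1 corresponds to B0 = 1.

```r
library(MCMCpack)
library(MASS)  # Boston data

# b0 = prior means (the rounded coefficients from the classical fit would go
# here; 0 is a placeholder), B0 = prior precision (sd of 1 => precision of 1),
# c0/d0 = vague inverse-gamma prior on the error variance.
# marginal.likelihood = "Chib95" requests the marginal likelihood needed for
# the Bayes factor.
bayes_a <- MCMCregress(medv ~ ., data = Boston,
                       b0 = 0, B0 = 1, c0 = 0.001, d0 = 0.001,
                       marginal.likelihood = "Chib95")
bayes_b <- MCMCregress(medv ~ . - indus - age, data = Boston,  # illustrative
                       b0 = 0, B0 = 1, c0 = 0.001, d0 = 0.001,
                       marginal.likelihood = "Chib95")

# Ratio of marginal likelihoods between the fitted models
BayesFactor(bayes_b, bayes_a)
```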


The final thing to do is interpret the output. The log of the Bayes factor is 5.12, indicating very strong support for model b.

Interpretation of the Bayes factor can be a contentious topic, because there are efforts to avoid a situation similar to p-hacking, i.e. where P < 0.05 is declared significant and therefore the null hypothesis is rejected. There are tables which break down the scale of the Bayes factor, so I’ll leave it there, but check out this post (link) for more.
