Do confessionals give away the winner of Survivor?

Confessional counts are often great for indicating who the key players are in a season of Survivor. But do confessionals really signpost who the winner is going to be for a given season and spoil the show?

In this post I answer the following 4 questions:

  1. Do those with the most confessionals at the final 5 typically win the season?
  2. Do more confessionals imply a greater chance of winning?
  3. Do winners tend to receive proportionally more confessionals i.e. more confessionals than expected?
  4. Do confessionals give away the winner of Survivor?

I’ll answer these questions with a Bayesian data analysis in R using the {survivoR} package. The focus is on the US version, but I’ll touch on some stats from the Australian and South African versions.

TL;DR

In summary:

  1. Yes – this has occurred 17 times (39%). We would expect only 8-9 times.
  2. Yes – more confessionals imply a higher chance of winning.
  3. Yes – the winner tends to receive proportionally more confessionals, ~10-14% more than expected on average.
  4. No – confessionals are far from deterministic but are a useful predictor.

Let’s take a closer look.

1. Do those with the most confessionals at the final 5 typically win the season?

The data will be reduced to those in the final 5 for each season as this is typically either the final episode or at least near the very end.

Out of the 43 seasons, there have been 17 instances where the castaway with the most confessionals at the final 5 won the season.

Version   Number of seasons   Winners with the most confessionals at final 5   Percentage   Expected (%)
US        43                  17                                                39%          20%
AU        7                   2                                                 29%          20%
SA        9                   2                                                 22%          20%

If confessionals had no association with the winner, each of the final 5 would be equally likely to win, so we would expect the castaway with the most confessionals to win only 8-9 times out of 43 (20%). The question is whether the observed 39% is statistically different from 20% or within the range of random chance.

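p0 <- 0.2    # probability of winning under random chance (1 in 5)
p <- 17/43   # observed proportion
se <- sqrt(p0*(1-p0)/43)   # standard error under random chance
c(p0 - 1.96*se, p0 + 1.96*se)

# 95% interval expected under random chance
[1] 0.08044112 0.31955888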

The observed p = 0.39 is well outside the 95% interval expected under random chance (0.08, 0.32), showing that those with the highest number of confessionals at the final 5 are more likely to win than chance alone would suggest.
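As a cross-check on the normal approximation above, an exact binomial test can be run with base R’s binom.test(). This is a minimal sketch and not part of the original analysis; it assumes each of the 43 US seasons is an independent trial with a 1-in-5 chance under random selection.

```r
# Exact one-sided binomial test: is 17 wins out of 43 more than we would
# expect if the top-confessional castaway won 20% of the time?
n_seasons <- 43
n_hits <- 17
p0 <- 0.2

# Expected number of hits under random chance (the "8-9 times" quoted above)
expected_hits <- n_seasons * p0

bt <- binom.test(n_hits, n_seasons, p = p0, alternative = "greater")
bt$estimate   # observed proportion, 17/43
bt$p.value    # a small value means 17 hits is unlikely under random chance
```

The exact test avoids relying on the normal approximation, which can be a little rough with only 43 seasons.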

The confessional rank chart shows who had the most confessionals and who ended up winning. The top castaways are the 17 winners that also had the highest number of confessionals.

2. Do more confessionals imply a greater chance of winning?

I’ll expand on the above analysis by fitting a model to estimate the probability of winning using confessionals as a predictor.

The confessionals have been standardised to an index estimating the proportion of confessionals above or below the expectation. This accounts for seasons that have more cast, more episodes, longer episodes, tribe composition, challenge wins and different editing styles. For example, an index of 1 means the castaway has received the amount that we would expect, an index of 1.5 means they have received 50% more than expected, and an index of 0.8 means they received 20% less than expected. The index is calculated using the make_conf_index() function at the bottom of the post.
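To make the index concrete, here is a toy calculation in base R. The data are made up for illustration (not taken from {survivoR}), and the expectation is simplified to the per-episode mean; the real make_conf_index() also splits by tribe.

```r
# Toy data (hypothetical): three castaways on one tribe across two episodes
conf <- data.frame(
  episode  = c(1, 1, 1, 2, 2, 2),
  castaway = c("A", "B", "C", "A", "B", "C"),
  count    = c(6, 3, 3, 5, 4, 0)
)

# Expected count for each castaway in an episode = that episode's mean count
conf$ep_mean <- ave(conf$count, conf$episode, FUN = mean)

# Index = total observed confessionals / total expected confessionals
observed <- tapply(conf$count, conf$castaway, sum)
expected <- tapply(conf$ep_mean, conf$castaway, sum)
index <- observed / expected
round(index, 2)
```

Castaway B lands exactly on expectation (index of 1), A is above it and C is below it.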

I’ll fit a Bayesian GLM with the log confessional index as the predictor and the winner as the response. The index is bounded below by 0, unbounded above, and can have a long right tail, so it’s appropriate to work on the log scale. I’ll use a basic N(0, 1) prior on the coefficient of the predictor.
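A quick illustration of why the log scale is convenient, using a few arbitrary index values chosen for this sketch: ratios above and below expectation become symmetric around 0.

```r
# Arbitrary example index values (not from the data)
index <- c(0.5, 0.8, 1, 1.25, 2)
log_index <- log(index)

# Halving and doubling are equally far from 0 on the log scale,
# as are 20% fewer (0.8) and 25% more (1.25)
data.frame(index, log_index = round(log_index, 2))
```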

mod_brm <- brm(
  winner ~ log_index,
  data = df,
  family = bernoulli(),
  prior = prior(normal(0, 1), class = "b", coef = "log_index")
  )

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.36      0.18    -1.72    -1.02 1.00     3270     2680
log_index     1.18      0.40     0.42     2.00 1.00     2771     2315

Index   Probability of winning   Uplift (increase from random chance)
0.5     11%                      -45%
1       21%                      +4%
1.5     30%                      +48%
2       37%                      +86%
2.5     43%                      +117%

There is definitely a relationship between confessionals and the chances of winning. The coefficient is well above 0 on the log scale indicating a strong positive relationship. The higher the relative number of confessionals the higher the probability of winning the season.
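The probabilities in the table can be roughly reproduced by plugging the posterior mean coefficients into the inverse logit. This is a sketch under a simplifying assumption: it ignores posterior uncertainty, so the values will differ slightly from ones averaged over the posterior draws.

```r
# Posterior mean estimates copied from the model summary above
b0 <- -1.36   # intercept
b1 <-  1.18   # coefficient on log_index

index <- c(0.5, 1, 1.5, 2, 2.5)

# Inverse logit of the linear predictor gives the probability of winning
p_win <- plogis(b0 + b1 * log(index))

# Uplift relative to the 1-in-5 chance under random selection
uplift <- p_win / 0.2 - 1

data.frame(index, p_win = round(p_win, 2), uplift = round(uplift, 2))
```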

While there is a relationship, some perspective is needed. A castaway with a relatively high number of confessionals and an index around 2-2.5 has roughly a 40% chance of winning, which is less than a flip of a coin. A castaway with an index of 1 is right on expectation and has roughly a 20% chance of winning, which is essentially random chance. So more confessionals don’t signpost a castaway for the win, but they do indicate the castaway is more likely to win.

As the index increases, so do the chances of winning, but so does the variation in the estimated probability. This is due to small sample sizes but also to cases such as Russell Hantz, who dominated the counts in his seasons but never won. There are obviously a lot of complexities in the game that confessional counts are never going to capture, for example challenge success, social aspects, gameplay, and jury pitches. Here they show up only as variation in the model. More complex models with more predictors to refine the probability are absolutely possible but not done here.

To be precise, for the model assumptions to hold, each observation should be independent. But since there can only be one winner per season, independence doesn’t hold. For the purposes of testing if confessional counts increase the probability of winning it should be fine. This property actually makes it challenging to fit appropriate models.

In summary, the higher the index, the higher the chances of winning.

3. Does the winner tend to receive a more favourable edit?

Or in other words, does the winner receive proportionally more confessionals on average?

We’ll look at this in two ways:

  1. The distribution of the confessional index at the final 5 and
  2. The distribution of the confessional index at the conclusion of the season.

If the mean of the distribution is statistically greater than 0 (again on the log scale, which corresponds to an index above 1) we can say that on average this is true. I’ll fit the simplest of all models with {brms} for both 1 and 2. It doesn’t need to be done this way, but I like how easily I can extract the draws from the posterior and summarise the output.

# final 5
mod_f5 <- brm(log_index ~ 1, data = df_x)

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.09      0.06    -0.04     0.21 1.00     3093     2616

# end-of-season
mod_win <- brm(log_index ~ 1, data = df_x)

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.13      0.06     0.02     0.24 1.00     2951     2492

Model           Index          y 95% CI       mu 95% CI
Final 5         1.10 (+10%)    (0.47, 2.5)    (0.96, 1.23)
End-of-season   1.14 (+14%)    (0.55, 2.38)   (1.02, 1.27)

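The estimate of the mean for the end-of-season model is above 1 (0 on the log scale) and its 95% credible interval excludes 1, showing that winners receive more confessionals than expected. The mean for the final 5 model is also above 1, but 1 sits just inside the credible interval. It’s quite close though, with a tail probability (p-val) of about 0.07 for the mean being at or below 1, and that is enough for me to call it.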

Winners tend to receive a more favourable edit at the final 5 and more so by the end of the season.
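The log-scale estimates above can be mapped back to the index scale with exp(). The sketch below also uses a normal approximation to the posterior to gauge how much mass sits at or below 0; that approximation is an assumption made here for illustration, and in practice the posterior draws from {brms} would be used directly.

```r
# Intercepts (posterior mean, SD, 95% CI) copied from the two summaries
est <- data.frame(
  model = c("Final 5", "End-of-season"),
  mean  = c(0.09, 0.13),
  sd    = c(0.06, 0.06),
  lower = c(-0.04, 0.02),
  upper = c(0.21, 0.24)
)

# Back-transform from the log scale to the index scale
est$index       <- exp(est$mean)
est$index_lower <- exp(est$lower)
est$index_upper <- exp(est$upper)

# Normal approximation to the posterior probability that the mean
# log index is at or below 0 (i.e. the winner's index is <= 1)
est$p_below_0 <- pnorm(0, mean = est$mean, sd = est$sd)

est
```

The back-transformed intervals line up with the mu 95% CI column, and the final 5 tail probability comes out close to the 0.07 quoted above. Small differences in the point estimates are down to rounding of the printed coefficients.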

4. Do confessionals give away the winner of Survivor?

No. This is best summarised in sections 1 and 2. Even at an index of 2.5, i.e. when the castaway has 2.5x the confessionals expected for the season, the estimated probability of winning is only about 43%, still short of 50%. Much better than random chance alone, but hardly a sure bet.
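A rough way to locate the 50% point is to invert the fitted logit using the posterior mean coefficients from section 2. This is a plug-in sketch that ignores posterior uncertainty, so treat the result as approximate.

```r
# Posterior mean estimates from the section 2 model summary
b0 <- -1.36   # intercept
b1 <-  1.18   # coefficient on log_index

# plogis(b0 + b1 * log(index)) equals 0.5 when the linear predictor is 0,
# so solve b0 + b1 * log(index) = 0 for the index
index_50 <- exp(-b0 / b1)
index_50

# For comparison, the plug-in probability at an index of 2.5
plogis(b0 + b1 * log(2.5))
```

On these estimates a castaway would need a little over three times the expected confessionals before the model gives them better than even odds.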

There could be a number of reasons why a heavily featured castaway doesn’t win, e.g.

  • They are too much of a threat, are voted out, and don’t make it to the final 2/3
  • They cooked their social game e.g. Russell H.
  • They tanked their jury pitch.

While confessionals are a good predictor, they are far from deterministic. So my advice: use them as a guide but don’t read too much into them.

Based on confessionals alone, who had the best chance of winning S43?

Let’s put it to the test. Would the confessional counts predict the winner of season 43? Using the model from section 2, the winning probabilities are…

x <- posterior_epred(mod_brm, newdata = filter(df, version_season == "US43"))

df |>
  filter(version_season == "US43") |>
  add_castaway() |>
  mutate(p = colMeans(x)) |>
  select(castaway, n, index, log_index, p) |>
  arrange(desc(p))

Castaway   Confessional count   Index   log(index)   Probability of winning
Jesse      49                   1.25    0.22         25%
Karla      42                   1.17    0.16         24%
Owen       37                   1       0            21%
Gabler     35                   0.94    -0.06        20%
Cassidy    30                   0.84    -0.18        18%

Two things come to mind:

  1. Not particularly convincing based on the counts alone. I think everyone would have predicted a Jesse win if he made it to the final 3. But that’s due to more than simply the counts.
  2. Gabler actually had a higher probability than Cassidy.
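One side effect of the independence caveat from section 2 is visible in these numbers: the five probabilities do not sum to 1, even though exactly one of the five wins. The sketch below rescales them within the season. The rescaling is a simple assumed fix for illustration, not something done in the original analysis; a fuller treatment would build the one-winner constraint into the model itself.

```r
# Probabilities copied from the S43 table above
p <- c(Jesse = 0.25, Karla = 0.24, Owen = 0.21, Gabler = 0.20, Cassidy = 0.18)

# The model scores each castaway independently, so the total is not 1
sum(p)

# Rescale so the probabilities within the season sum to 1
p_norm <- p / sum(p)
round(p_norm, 3)
```

The ranking is unchanged by the rescaling; only the absolute values shrink slightly.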

The model could be vastly improved by using other predictors such as:

  • Votes received
  • Tribals attended
  • Successful boots
  • Challenge wins
  • Alliance strength

to name a few. But this was just to test confessionals.

Final thoughts

It definitely seems to be the case that more confessionals mean a higher chance of winning the season. That’s not too surprising since the edit needs to build a narrative for the winner. There are, however, definitely cases where the narrative for the season revolved around a particular person who didn’t make the final tribal, e.g. Jesse in S43.

While the chances of winning are higher, the probabilities are still fairly low. Above random chance but far from being deterministic.

What I haven’t looked at is the differences between genders. Women tend to receive fewer confessionals than men but my expectation is it’s still in the same direction – the more confessionals the better. This is for a later post.

Code bits

All code is available on GitHub. Most is shown below. The code to produce the plots and extra bits is in the link.

Functions

# packages used throughout the code below
library(tidyverse)
library(brms)
library(survivoR)

cast_in_final_n <- function(.n) {
  survivoR::boot_mapping |>
    filter(final_n == .n) |>
    group_by(version_season) |>
    slice_max(episode) |>
    ungroup()
}

add_castaway <- function(df) {
  df |>
    left_join(
      survivoR::castaway_details |>
        select(castaway_id, castaway),
      by = "castaway_id"
    ) |>
    select(castaway_id, castaway, everything())
}

add_tribe <- function(df, .tribe_status = NULL) {

  if(is.null(.tribe_status)) {
    out <- df |>
      left_join(
        survivoR::tribe_mapping |>
          distinct(version_season, episode, castaway_id, tribe),
        by = c("version_season", "episode", "castaway_id")
      )
  } else {
    out <- df |>
      left_join(
        survivoR::tribe_mapping |>
          filter(tribe_status == .tribe_status) |>
          distinct(version_season, castaway_id, tribe),
        by = c("version_season", "castaway_id")
      )
  }
  out
}

make_conf_index <- function(
    .vs,
    .final = NULL,
    .ep = NULL
) {

  # set pars if NULL
  if(is.null(.final) && is.null(.ep)) {
    .ep <- 99
  } else if(!is.null(.final)){
    df_final_n <- survivoR::castaways |>
      filter(version_season == .vs) |>
      slice_max(order, n = .final)

    .ep <- min(df_final_n$episode)
  } else if(!is.null(.ep)) {
    .ep <- .ep + 1
  }

  # filter confessionals to episode
  df_conf_ep <- survivoR::confessionals |>
    filter(
      version_season == .vs,
      episode < .ep
    ) |>
    add_tribe() |>
    group_by(season, episode, tribe) |>
    summarise(
      total = sum(confessional_count),
      ep_mean = mean(confessional_count),
      n_cast = n(),
      .groups = "drop"
    )

  # who's still alive
  alive <- survivoR::castaways |>
    filter(version_season == .vs) |>
    mutate(episode = replace_na(episode, 99)) |>
    filter(episode >= .ep)

  # pull it together
  survivoR::confessionals |>
    filter(
      version_season == .vs,
      episode < .ep
    ) |>
    add_tribe() |>
    left_join(df_conf_ep, by = c("season", "episode", "tribe")) |>
    group_by(season, castaway_id) |>
    summarise(
      total = sum(confessional_count),
      n_conf = sum(confessional_count),
      n_eps = n_distinct(season, episode),
      mean_conf = round(mean(confessional_count), 1),
      exp_conf = sum(ep_mean)
    ) |>
    group_by(castaway_id) |>
    summarise(
      version_season = .vs,
      total = sum(total),
      n_eps = sum(n_eps),
      n_conf = sum(n_conf),
      exp_conf = sum(exp_conf),
    ) |>
    ungroup() |>
    add_castaway() |>
    mutate(
      index = n_conf/exp_conf,
      edit_pct = round(n_conf/exp_conf - 1, 2)*100,
      alive = castaway_id %in% alive$castaway_id
    ) |>
    arrange(desc(index)) |>
    select(version_season, everything())
}

1. Max confessionals

.v <- "US"

f5 <- cast_in_final_n(5) |>
  filter(
    !version_season %in% c("US44", "AU08"),
    version %in% .v
  ) |>
  group_by(version_season) |>
  distinct(version_season, f5_ep = episode)

winner <- survivoR::castaways |>
  filter(
    !version_season %in% c("US44", "AU08"),
    version %in% .v
    ) |>
  filter(str_detect(result, "Sole")) |>
  distinct(version_season, castaway_id) |>
  mutate(winner = 1)

df_vs <- f5 |>
  mutate(
    season = as.numeric(str_extract(version_season, "[:digit:]+")),
    version = str_sub(version_season, 1, 2)
  )

df_index <- NULL
for(k in 1:nrow(df_vs)) {
  df_index <- df_index |>
    bind_rows(
      make_conf_index(
        .vs = df_vs$version_season[k],
        .final = 5
      ) |>
        mutate(
          season = df_vs$season[k],
          version = df_vs$version[k]
        )
    )
}
df_index <- df_index |>
  mutate(log_index = log(index))

# main data frame
df <- survivoR::confessionals |>
  left_join(f5, by = "version_season") |>
  filter(episode < f5_ep) |>
  semi_join(cast_in_final_n(5), by = c("version_season", "castaway_id")) |>
  group_by(version, version_season, castaway_id) |>
  summarise(n = sum(confessional_count)) |>
  group_by(version_season) |>
  mutate(is_max = as.numeric(n == max(n))) |>
  left_join(winner, by = c("version_season", "castaway_id")) |>
  mutate(winner = replace_na(winner, 0)) |>
  group_by(version_season) |>
  mutate(mean = mean(n)) |>
  ungroup() |>
  mutate(
    sd = sd(n),
    n_scaled = (n - mean)/sd
    ) |>
  left_join(
    df_index |>
      select(version_season, castaway_id, index, log_index),
    by = c("version_season", "castaway_id")
  )

2. Probability of winning

mod_brm <- brm(
  winner ~ log_index,
  data = df,
  family = bernoulli(),
  prior = prior(normal(0, 1), class = "b", coef = "log_index")
  )

mod_brm
 Family: bernoulli 
  Links: mu = logit 
Formula: winner ~ log_index 
   Data: df (Number of observations: 215) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.36      0.18    -1.72    -1.02 1.00     3270     2680
log_index     1.18      0.40     0.42     2.00 1.00     2771     2315

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

3. Final 5 model

df_x <- df_index |>
  semi_join(winner, by = c("version_season", "castaway_id"))

mod_f5 <- brm(log_index ~ 1, data = df_x)

mod_f5
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log_index ~ 1 
   Data: x (Number of observations: 43) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.09      0.06    -0.04     0.21 1.00     3093     2616

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.42      0.05     0.34     0.53 1.00     3070     2374

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

# posterior check
pp_check(mod_f5)

3. End-of-season model

# make_edit_index() is not defined in this post, so make_conf_index() is
# used here instead (an assumption): with no .final or .ep it covers the
# whole season
df_index_win <- map_dfr(df_vs$version_season, ~{
  make_conf_index(.vs = .x) |>
    mutate(version = str_sub(.x, 1, 2))
  }) |>
  mutate(log_index = log(index))

df_x <- df_index_win |>
  semi_join(winner, by = c("version_season", "castaway_id"))

mod_win <- brm(log_index ~ 1, data = df_x)
mod_win
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log_index ~ 1 
   Data: df_x (Number of observations: 43) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.13      0.06     0.02     0.24 1.00     2951     2492

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.36      0.04     0.29     0.45 1.00     2969     2307

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).