Racial Bias in Survivor: Are People of Colour Disproportionately Voted Out First?

After episode 9 of season 42 of Survivor, there was a lot of discussion about implicit racial bias in the show. Race is a very complex issue, and I’m not going to comment more generally on race here. It’s not my place.

Discussions on the issue have made many claims to support the idea. For example, one thread asked “why are first boots so often POC, and specifically WOC?” I have heard this a lot, but to be fair, I haven’t seen any evidence to support it beyond conjecture.

While I’m not going to comment on race specifically, I am going to check if the claims made are factual. In particular, I’m going to check if:

  • People of colour (PoC) are voted out of the game first, more often than white players.
  • Women of colour (WoC) are voted out of the game first, more often than other demographics.

I’ll be taking a purely statistical view rather than addressing individual motives. A statistical view is needed here to see whether the claims stack up and to put them into perspective.

The first boot

The discussions I’ve seen arguing for racial bias in Survivor have commonly compared the first boots to the overall proportion of PoC in the game. This misses a key component: the makeup of the first tribe that goes to Tribal Council.

The number of PoC in the tribe at the first Tribal is key to understanding whether we should expect a PoC to be voted out first. For example, in season 13, Cook Islands, the first tribe to go to Tribal Council was the African-American Manihiki tribe, so there was a 100% chance that a Black person would be the first voted out. In season 11, Guatemala, there were no PoC on the Nakum tribe that went to Tribal first, so there was a 100% chance that a White person would be voted out first.

Assume the first tribe to go to Tribal Council was made up of 5 White players and 1 Black player. If the boot were left entirely to chance, say by drawing rocks, there would be a 1/6 chance that the Black player goes home: the same chance as rolling a 6 on a die. The discussions, however, suggest there is a bias toward voting out the Black player, in which case the probability would effectively be greater than 1/6.
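A minimal base R sketch of this point, using a hypothetical cast rather than real Survivor data: 18 castaways, 6 of them Black, with the first tribe at Tribal made up of 5 White players and 1 Black player. It compares the cast-wide share of Black players with the chance of a Black player going home when the tribe draws rocks, estimated by simulating the draw with `sample()`.

```r
# Hypothetical cast of 18: 6 Black and 12 White castaways (illustrative only)
cast <- c(rep("Black", 6), rep("White", 12))

# Hypothetical first tribe at Tribal: 5 White players and 1 Black player
tribe <- c(rep("White", 5), "Black")

# Share of Black players in the whole cast vs. in the tribe that actually votes
p_cast  <- mean(cast == "Black")   # 6/18 = 1/3
p_tribe <- mean(tribe == "Black")  # 1/6

# Drawing rocks: one tribe member is picked uniformly at random, 100,000 times
set.seed(42)
draws <- sample(tribe, size = 100000, replace = TRUE)
p_sim <- mean(draws == "Black")

round(c(cast_share = p_cast, tribe_share = p_tribe, simulated = p_sim), 3)
```

The simulated share settles near 1/6, the tribe's composition, rather than the cast-wide 1/3, which is why the comparison has to start from the tribe at the first Tribal.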

To check whether this is a fair claim, I want to look at the makeup of each tribe that first attended Tribal Council across the 42 seasons. Historically, these tribes have been approximately 25 to 40% PoC. There are spikes in some seasons, and there seems to be an increasing trend as casts have become more diverse in the later seasons.

Figure: The percentage of PoC attending the first Tribal Council across 42 seasons

What is the expected number of PoC in the first boot?

Let’s assume that we could replay all of the first Tribal Councils. This time there are no votes; instead, the tribe goes straight to rocks, so the outcome is completely random. What is the expected value? That is, how many PoC should we expect to be voted out first when it is left entirely to chance?
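Because each replayed Tribal produces exactly one boot, the expected number of PoC first boots is the sum, over seasons, of the proportion of PoC in the tribe at that first Tribal Council (linearity of expectation). This matches the `exp_poc = sum(p_poc)` step in the code at the end of the post. The sketch below uses made-up tribe compositions for four hypothetical seasons, not real data.

```r
# Hypothetical first-Tribal tribes for four seasons (illustrative only)
first_tribes <- data.frame(
  season = 1:4,
  n      = c(8, 8, 9, 10),  # players on the tribe at its first Tribal
  n_poc  = c(2, 0, 3, 5)    # PoC players on that tribe
)

# Chance that a PoC player draws the rock in each season
first_tribes$p_poc <- first_tribes$n_poc / first_tribes$n

# Expected number of PoC boots = sum of the per-season probabilities
expected_poc_boots <- sum(first_tribes$p_poc)
expected_poc_boots
```

Tribe size does not need special handling: each season simply contributes its own probability to the total.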

Demographic | Actual number of boots | Expected number of boots (random chance)
--- | --- | ---
Male | 15 | 21
Female | 27 | 21
White | 28 | 29
PoC | 14 | 13
WoC | 10 | 8
Black | 7 | 8

The actual number observed over the 42 seasons and the expected value of first boots for each selected demographic group

Over the 42 seasons, 14 PoC have been voted out first. If it were left entirely to chance, we would expect 13. That is only one more person than expected, too small a difference to suggest anything more than random chance. If everyone drew rocks, the probability of 14 or more PoC being voted out first would be higher than the probability of flipping a coin twice and getting two heads in a row (1 in 4).
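The coin comparison can be made precise. If every season draws rocks independently, the number of PoC first boots is a sum of independent yes/no outcomes with different probabilities (a Poisson binomial distribution), and its exact distribution can be built up one season at a time. This is a sketch of an alternative to the simulation used for the charts in this post; the per-season probabilities below are hypothetical, chosen so they sum to roughly 13.

```r
# Exact distribution of the number of boots from a group when every season
# draws rocks independently. p: per-season probabilities that the boot is from the group.
poisson_binomial_pmf <- function(p) {
  pmf <- 1  # with zero seasons, P(0 boots) = 1
  for (ps in p) {
    # one more season either leaves the count unchanged (prob 1 - ps)
    # or adds one boot from the group (prob ps)
    pmf <- c(pmf * (1 - ps), 0) + c(0, pmf * ps)
  }
  names(pmf) <- seq_along(pmf) - 1
  pmf
}

# Probability of observing k or more boots purely by chance
upper_tail <- function(p, k) {
  if (k > length(p)) return(0)
  pmf <- poisson_binomial_pmf(p)
  sum(pmf[(k + 1):length(pmf)])
}

# Two heads in a row, as a sanity check
upper_tail(c(0.5, 0.5), 2)  # 0.25

# Hypothetical per-season PoC probabilities for 42 seasons (sum is about 13)
p_poc <- rep(c(0.25, 0.3, 0.375), 14)
sum(p_poc)
upper_tail(p_poc, 14)
```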

For Black players, we would expect 8 to have been voted out first; we have observed 7 over the 42 seasons. This isn’t statistically significant either and is in line with expectations.

The most important result is that only 15 men have been voted out first, compared with 27 women. The expected value for both men and women is 21. This is a big difference and suggests a clear inclination to vote women out first.

Returning to the claim that first boots are often PoC, and specifically WoC: the expected number of WoC first boots is 8, but we have seen 10. The chance of this happening at random is about the same as rolling a 6 on a die. But given the gender result above, this difference is more likely due to gender than race.

This scenario was simulated 20,000 times. The chart shows the probability distribution of the number of people voted out at the first Tribal for each demographic group we are testing. The grey bars show the simulated probability distribution, and the teal bar is the actual number of boots.
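A minimal base R sketch of that simulation, assuming a vector of per-season probabilities (here a hypothetical one, not real data). Each replay draws one Bernoulli outcome per season with `rbinom()` and counts how many boots come from the group; repeating this 20,000 times gives the grey histogram. The code at the end of the post uses `purrr::rbernoulli()`, which recent purrr releases deprecate; `rbinom(n, 1, p)` draws the same outcomes.

```r
# Hypothetical per-season probabilities that the first boot is from the group
# (illustrative values for 42 seasons, not real Survivor data)
p <- rep(c(0.25, 0.3, 0.375), 14)

set.seed(2022)
n_sims <- 20000

# One replay = one rock draw per season; count the boots that come from the group
sims <- replicate(n_sims, sum(rbinom(length(p), size = 1, prob = p)))

# Distribution of simulated counts (what the grey bars show)
sim_dist <- table(sims) / n_sims
sim_dist

# The simulated mean should sit close to the expected value sum(p)
c(simulated_mean = mean(sims), expected = sum(p))
```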

Figure: The number of boots for the first Tribal Council. Grey bars describe the probability distribution. The teal bar is the actual number observed in 42 seasons. The closer the teal bar is to the center of the distribution and the taller it is, the more consistent the result is with random chance.

The chart shows a big difference between men and women. Their actual numbers of boots sit in the tails of the distribution, meaning they are unlikely to occur by chance alone. PoC is one step to the right of the center of the curve and well within the realm of random chance. The same is true for the Black cohort, which sits just left of center.
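One standard way to put a number on “in the tails” is a Monte Carlo p-value: the share of simulated replays that land at least as far from the expected value as the observed count. The sketch below uses hypothetical per-season probabilities and a hypothetical observed count, not the figures from this analysis.

```r
# Hypothetical per-season probabilities and a hypothetical observed count
p <- rep(c(0.25, 0.3, 0.375), 14)
observed <- 20

set.seed(7)
sims <- replicate(20000, sum(rbinom(length(p), size = 1, prob = p)))
expected <- sum(p)

# Two-sided: share of replays at least as far from the expected value
# as the observed count, in either direction
p_two_sided <- mean(abs(sims - expected) >= abs(observed - expected))

# One-sided: share of replays at or above the observed count
p_upper <- mean(sims >= observed)

c(two_sided = p_two_sided, upper = p_upper)
```

A small value means the observed count rarely comes up when everyone draws rocks; a value near the middle of the range means the count is unremarkable.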

What about including every tribe’s first Tribal Council?

I’ll extend the analysis to include each starting tribe’s first Tribal Council. Now we are not talking specifically about the first boot of the game, but rather the first boot from each tribe. For this, I’ll exclude tribes that don’t attend Tribal until after the tribe swap, because the swap creates dynamics that are difficult to control for.
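The selection step can be sketched in base R on a toy vote table. The columns `season`, `tribe`, `order` and `voted_out` and all values here are hypothetical, loosely mirroring the `slice_min(order)` step in the code at the end of the post (which also filters to `tribe_status == "Original"` so post-swap Tribals are dropped). For each season and tribe, only the earliest Tribal is kept; grouping by season alone instead gives the first Tribal of the game.

```r
# Toy vote history: one row per vote cast (hypothetical data)
votes <- data.frame(
  season    = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  tribe     = c("A", "A", "B", "B", "A", "A", "C", "C", "D", "D"),
  order     = c(1, 1, 2, 2, 3, 3, 1, 1, 4, 4),  # Tribal Council sequence in the season
  voted_out = c("p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4", "p5", "p5")
)

# Earliest Tribal for each season-tribe pair
first_order <- ave(votes$order, votes$season, votes$tribe, FUN = min)
first_per_tribe <- unique(votes[votes$order == first_order, c("season", "tribe", "voted_out")])
first_per_tribe

# Earliest Tribal of the game (group by season only)
first_of_game <- ave(votes$order, votes$season, FUN = min)
first_of_game_boots <- unique(votes[votes$order == first_of_game, c("season", "voted_out")])
first_of_game_boots
```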

Demographic | Actual number of boots | Expected number of boots (random chance)
--- | --- | ---
Male | 31 | 40
Female | 50 | 41
White | 55 | 57
PoC | 26 | 24
WoC | 18 | 14
Black | 11 | 12

The actual number observed over the 42 seasons and the expected value of first boots from each tribe for each selected demographic group

This reinforces the insight from above that gender is by far the most significant factor. Again, the observed number of PoC boots is just right of center but very much in the middle of the curve, and therefore consistent with random chance. WoC sits at 18 boots, 4 more than we would expect, but again this is driven more by gender than race.

Figure: The number of boots at the first Tribal Council for each original tribe. Grey bars describe the probability distribution. The teal bar is the actual number observed in 42 seasons. The closer the teal bar is to the center of the distribution and the taller it is, the more consistent the result is with random chance.

What about the first Tribal after the merge?

This is a departure from the claim but worth looking into. At the merge, the game is effectively turned on its head and strategies change. The data shows that men, rather than women, now become the targets. However, the actual value is only 2 people more than the expected value, so while there has been a dramatic shift, the first boot after the merge is closer to random chance than the first boot of the game. Still, it’s clear that the focus has shifted from women to men. PoC, WoC, and Black players are now all below expectation.

Demographic | Actual number of boots | Expected number of boots (random chance)
--- | --- | ---
Male | 25 | 23
Female | 17 | 19
White | 34 | 32
PoC | 8 | 11
WoC | 2 | 5
Black | 4 | 5

The actual number observed over the 42 seasons and the expected value of the first boot after the merge for each selected demographic group
Figure: The number of boots at the first Tribal Council after the merge. Grey bars describe the probability distribution. The teal bar is the actual number observed in 42 seasons. The closer the teal bar is to the center of the distribution and the taller it is, the more consistent the result is with random chance.

Closing thoughts

Are people of colour disproportionately voted out first in Survivor? The data suggests they are not.

The claim that ‘PoC are voted out first, and specifically WoC’ doesn’t really stack up. If all of the first Tribal Councils were replayed and everyone drew rocks, the result would be very similar to what we have seen with respect to race.

There does appear to be a clear difference between men and women: women are targeted more frequently before the merge, and men more frequently after it.

Any difference observed for WoC is more likely due to gender than race. The point on gender is a genuinely interesting finding and worth investigating further.

I have to stress that I am not saying, based on these results, that there is no racial bias in Survivor. But I am saying that this specific claim doesn’t hold up and probably shouldn’t be used as evidence for racial bias in Survivor, any more than it should be used as evidence against it.

I hope I speak for everyone when saying that it is great seeing more diversity in the new era of Survivor and I hope it continues.

Code

All of my data and code is open source and free to use, so here it is. If you use it and conduct your own analysis, please share it with me; I’m interested to see what you find.

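# install from GitHub (requires the remotes package: install.packages("remotes"))
remotes::install_github("doehm/survivoR")

# load libraries
library(survivoR)
library(tidyverse)
library(glue)  # glue() builds the output file names in plot_sims()

col_actual <- "#35b0ab"  # teal used for the observed value
dark <- "grey20"         # strip background colour; assumed value, the original code does not define it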

expected_values <- function(stage_of_game, include_tribe) {

  tm <- survivoR::tribe_mapping |>
    distinct(season, tribe_status, tribe, castaway_id) |>
    filter(tribe_status == stage_of_game)

  if(include_tribe) {
    gby <- function(data) group_by(data, season, tribe)
  } else {
    gby <- function(data) group_by(data, season)
  }

  prob_df <- survivoR::vote_history |>
    left_join(
      survivoR::castaway_details |>
        select(castaway_id, poc, race, gender),
      by = "castaway_id"
    ) |>
    left_join(
      survivoR::castaway_details |>
        select(castaway_id, poc_voted_out = poc, race_voted_out = race, gender_voted_out = gender),
      by = c("voted_out_id" = "castaway_id")
    ) |>
    mutate(black_voted_out = replace_na(race_voted_out, "White")) |>
    filter(tribe_status == stage_of_game) |>
    left_join(tm, by = c("castaway_id", "season", "tribe_status")) |>
    gby() |>
    slice_min(order) |>
    distinct(castaway_id, poc, race, gender, poc_voted_out, black_voted_out, gender_voted_out) |>
    summarise(
      n_poc = sum(poc == "POC", na.rm = TRUE),
      n_black = sum(race == "Black", na.rm = TRUE),
      n_woc = sum(poc == "POC" & gender == "Female", na.rm = TRUE),
      n_female = sum(gender == "Female", na.rm = TRUE),
      n_male = sum(gender == "Male", na.rm = TRUE),
      n_white = sum(poc == "White", na.rm = TRUE),
      # unique() has no na.rm argument, so none is passed here
      n_poc_voted_out = unique(poc_voted_out),
      n_black_voted_out = unique(black_voted_out),
      n_gender_voted_out = unique(gender_voted_out),
      n = n_distinct(castaway_id)
    ) |>
    mutate(
      p_poc = n_poc/n,
      p_black = n_black/n,
      p_woc = n_woc/n,
      p_female = n_female/n,
      p_male = n_male/n,
      p_white = n_white/n
    ) |>
    ungroup()

  results <- prob_df |>
    summarise(
      exp_poc = sum(p_poc),
      n_poc = sum(n_poc_voted_out == "POC", na.rm = TRUE),
      exp_black = sum(p_black),
      n_black = sum(n_black_voted_out == "Black", na.rm = TRUE),
      exp_woc = sum(p_woc),
      n_woc = sum(n_gender_voted_out == "Female" & n_poc_voted_out == "POC", na.rm = TRUE),
      exp_female = sum(p_female),
      n_female = sum(n_gender_voted_out == "Female", na.rm = TRUE),
      exp_male = sum(p_male),
      n_male = sum(n_gender_voted_out == "Male", na.rm = TRUE),
      exp_white = sum(p_white),
      n_white = sum(n_poc_voted_out == "White", na.rm = TRUE)
    )

  # arrange() sorts the columns alphabetically (black, female, male, poc, white, woc),
  # which matches the order of the labels assigned below
  out <- results |>
    select(contains("n_")) |>
    pivot_longer(everything(), names_to = "group", values_to = "observed") |>
    arrange(group) |>
    mutate(group = c("Black", "Female", "Male", "POC", "White", "WOC")) |>
    left_join(
      results |>
        select(contains("exp_")) |>
        pivot_longer(everything(), names_to = "group", values_to = "expected") |>
        arrange(group) |>
        mutate(group = c("Black", "Female", "Male", "POC", "White", "WOC")),
      by = "group"
    )

  list(
    prob_df = prob_df,
    results = out,
    stage_of_game = stage_of_game,
    include_tribe = include_tribe
  )

}

# charts ------------------------------------------------------------------

plot_sims <- function(data, title = NULL, subtitle = NULL) {
  df_results <- data$results

  vars <- c("p_male", "p_female", "p_white", "p_poc", "p_woc", "p_black")
  names <- c("Male", "Female", "White", "POC", "WOC", "Black")
  tbl_ls <- map(vars, ~{
    p <- data$prob_df[[.x]]
    # 20,000 replays: in each season the boot comes from the group with probability p
    # (rbinom(n, 1, p) replaces purrr::rbernoulli(), which is deprecated)
    sim <- map_dbl(1:20000, ~{
      sum(rbinom(length(p), 1, p))
    })
    counts <- table(sim)
    tibble(
      x = as.numeric(names(counts)),
      n = as.integer(counts)
    )
  }) |>
    set_names(names)

  df <- tbl_ls |>
    bind_rows(.id = "group") |>
    left_join(df_results, by = "group") |>
    mutate(group = factor(group, levels = names))

  plt <- df |>
    ggplot(aes(x, n, fill = x == observed)) +
    geom_col(position = position_dodge2(width = 0.9, preserve = "single")) +
    geom_text(aes(x = x, y = n + 50, label = x), filter(df, x == observed), vjust = 0, size = 12) +
    facet_wrap(~group, nrow = length(tbl_ls), strip.position = "left") +
    scale_x_continuous(breaks = seq(5, 80, 5), labels = seq(5, 80, 5)) +
    scale_fill_manual(values = c("grey80", col_actual), breaks = c(FALSE, TRUE), labels = c("Random\ndraws", "Actual\nvalue")) +
    coord_cartesian(clip = "off") +
    labs(
      x = "Number voted out in cohort",
      title = title,
      subtitle = subtitle
    ) +
    theme_minimal() +
    theme(
      text = element_text(size = 40),
      plot.title = element_text(face = "bold", size = 48),
      plot.subtitle = element_text(lineheight = 0.3),
      axis.text.y = element_blank(),
      axis.title.y = element_blank(),
      axis.text.x = element_text(margin = margin(t = 5, b = 5)),
      strip.text = element_text(colour = "white", face = "bold"),
      strip.background = element_rect(fill = dark),
      legend.position = "right",
      legend.title = element_blank(),
      legend.text = element_text(lineheight = 0.25, margin = margin(b = 5))
    )

  # ggsave() is not a plot layer; call it on the finished plot instead of adding it with `+`
  out_file <- glue("images/random/race/hist_{data$stage_of_game}_{data$include_tribe}.png")
  dir.create(dirname(out_file), recursive = TRUE, showWarnings = FALSE)
  ggsave(out_file, plot = plt, height = 10, width = 8)

  plt
}

# run sims ----------------------------------------------------------------
# first boot
dat <- expected_values("Original", FALSE)
plot_sims(dat, title = "The number of boots for the first tribal council")

# first boot from each tribe
dat <- expected_values("Original", TRUE)
plot_sims(dat, title = "The number of boots at the first tribal council for each original tribe")

# first boot after merge
dat <- expected_values("Merged", FALSE)
plot_sims(dat, title = "The number of boots at the first tribal council after the merge")