A while back, I looked into whether BIPOC players are disproportionately voted out first. I didn’t find a lot of evidence to support this claim, despite how it may seem.

However, I did find that female players were disproportionately voted out first. BIPOC women were as well, although that is likely due more to gender than race/ethnicity. At the merge it flips, and men are more likely to be voted out.

I refreshed the analysis after Season 46 made the merge to see if there had been a change. I’ve expanded it to test the following points:

- Are BIPOC players disproportionately voted out of their original tribe first?
- Are women disproportionately voted out of their original tribe first?
- Are BIPOC women disproportionately voted out of their original tribe first?
- Are white women disproportionately voted out of their original tribe first?

I take a very statistical view here but I think that’s needed to cut through perceptions and confirmation bias.

## TL;DR

The number of first boots is over expectation for both BIPOC and Female cohorts, but far more so for women.

Cohort | Expected | Actual | Difference |
---|---|---|---|
BIPOC | 29 | 33 | 4 |
Female | 45 | 55 | 10 |
Male, BIPOC | 12 | 9 | -3 |
Male, White | 31 | 24 | -7 |
Female, White | 28 | 31 | 3 |
Female, BIPOC | 17 | 24 | 7 |

#### Model results

The model estimates whether there is an increase in the probability of being voted out for a certain cohort. When there are equal numbers of BIPOC and other players in the tribe at the first Tribal Council, above 50% means a positive bias. For the splits, e.g. BIPOC women, I have assumed a baseline of 25%. The bands indicate statistical variation and represent the 50%, 80%, and 95% credible intervals.

Both BIPOC and female cohorts have a positive bias. The 80% CI for the BIPOC cohort fairly comfortably contains 50%, so maybe there is some weak evidence there, but it’s not particularly strong.

The bias estimate for women is clearly higher than 50% so it’s pretty obvious there is a bias here. Women are more likely to be voted out of their tribe first.

#### Are BIPOC players disproportionately voted out of their original tribe first?

Not really.

If there was no bias and the vote was perfectly random, out of 82 Tribal Councils that have at least 1 BIPOC player we would expect 29 BIPOC players to be voted out first. We have observed 33.

The bias is estimated to be a +6% chance of being voted out first, which is not a huge amount, and the 95% CI is (-6%, 17%). The claim that BIPOC players are disproportionately voted out first is not supported by the data.

To put this into perspective a little more, let’s say there are 8 people on the tribe, 4 of whom are BIPOC players (50%). The average bias is +6% so the probability a BIPOC player will be voted out is 56%. Therefore each BIPOC player has a 14% (56%/4) chance of being voted out where equal chance is 12.5%, just a +1.5% increase per person.
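The arithmetic in that example can be written out as a short sketch. The variable names are only for illustration, and the numbers are the ones quoted above.

```r
# Worked version of the 8-person tribe example (numbers from the text above)
tribe_size <- 8
n_bipoc    <- 4
bias       <- 0.06                      # estimated increase on the probability scale

p_equal      <- n_bipoc / tribe_size    # chance the boot is BIPOC with no bias
p_cohort     <- p_equal + bias          # chance the boot is BIPOC with the bias
p_per_player <- p_cohort / n_bipoc      # chance for each individual BIPOC player
p_baseline   <- 1 / tribe_size          # chance for any individual with no bias

round(c(cohort = p_cohort, per_player = p_per_player, baseline = p_baseline), 3)
```

The per-player difference, `p_per_player - p_baseline`, is the +1.5% mentioned above.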

#### Are women disproportionately voted out of their original tribe first?

Yes.

If there was no bias we would expect 45 first boots to be female but we have seen 55. If it were completely random, the probability of seeing 55 or more female first boots is about 1%.

The bias is estimated to be a +13% chance of being voted out first and is significant in this case. The 95% CI is (3%, 24%).

#### Are BIPOC women disproportionately voted out of their original tribe first?

Yes, but likely more due to gender than race/ethnicity.

If there was no bias we would expect to see 17 but there have been 24. The probability of seeing 24 is about 2%. The bias is estimated to be a +11% chance of being voted out first with a 95% CI of (0%, 24%). The interval is wide due to the lower numbers, but from the above, we can say that it’s primarily due to gender, not race.

#### Are white women disproportionately voted out of their original tribe first?

Actually, no.

We would expect to see 28 but there have been 31. The bias is estimated to be a +4% chance of being voted out first with a 95% CI of (-5%, 13%). This is pretty interesting. Gender is clearly the strongest factor (out of gender and race/ethnicity), but particularly from S42 the first boots have been primarily BIPOC women. This suggests that BIPOC women are contributing more than their fair share to first boots.

## Summary

These results are similar to what I found last time. The number of BIPOC first boots is above expectation, but only by 4. That may seem like a considerable increase, but it could easily happen by chance. After many more seasons something may emerge, but at this stage claiming that BIPOC players are voted out first isn’t supported by the data. That said, I don’t think it’s as clear-cut as that.

Gender bias is present and more of a factor than race/ethnicity. The data shows that women are more likely to be voted out of the tribe first with a +13% increase in probability, an estimate that sits comfortably above equal probability.

What’s important here is that while women are disproportionately voted out, BIPOC women make up most of that imbalance. When modeling white women and BIPOC women independently, white women have a smaller bias that is, to be fair, within reasonable variation. The bias for BIPOC women is about 2.5x higher.

There could be other points for consideration. There was a post comparing votes for BIPOC players and other players. I’ll look into that next to see if it holds water (I have a lot to say about that post, to be honest). In a later post, I’ll also consider age as a contributing factor.

Again, I’m not saying that (subconscious) bias doesn’t exist towards BIPOC players in Survivor, just that it’s not measurable in the data point of who is voted out first.

## Analysis

How I arrived at this position is important, so we get into the weeds a bit (by a bit I mean a lot) here about how I conducted the analysis. It’s important because I consider a few things that are often overlooked.

I looked at it in 3 ways:

- Bayesian model
- Simulation model
- Regression model (there are issues with this though. I’ll explain.)

All code and results are contained in the post so you can reproduce the analysis.

### Data setup and considerations

There are quite a few things to consider to set up the data correctly:

- Who is considered BIPOC? In the survivoR package, I record anyone as BIPOC if they are listed as African-American/Canadian, Asian-American/Canadian, Latin-American, or Native American on the Survivor Wiki. I don’t make assumptions about someone’s identity.
- I only consider the first time an original tribe goes to Tribal Council and votes. Some players didn’t attend a Tribal Council with their original tribe before a swap or the merge. They are removed from the analysis because I want to control for any other sources of variation. This leaves 88 Tribal Councils, still a good amount for this analysis.
- I only consider the first true vote out. That means if a player quits at the first Tribal, e.g. Hannah in S45, or asks to be voted out, as Jonny Fairplay did in S16, I don’t count that as the first Tribal. Similarly, I remove medically evacuated players. I want to make sure I only include the first true *target* for each tribe.
- To calculate the probability of someone being voted out I consider only those eligible to be voted out. For example, if someone has individual immunity for the first Tribal, or safety without power, they are removed from the analysis.
- I ensure the makeup of the tribe is accounted for. This is the single most important consideration and what other analyses overlook. More below in the analysis.
- I have included players who have played multiple times, even though in seasons that mix returning players and newbies the returning players already have a target on their back. This is mostly because excluding them would remove too many Tribals.

The tricky thing with this problem is that every tribe has a different mix of BIPOC and other players at its first Tribal. The proportion ranges from 0 to 1 across the 88 tribes that attend Tribal Council over the 46 seasons. On average 25% of the castaways in a tribe are BIPOC. If it was the same at every Tribal there wouldn’t be an issue, but this is an important point that can’t be ignored. More below.
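To illustrate why the makeup matters when working out the expected number of first boots, here is a minimal sketch with made-up tribes (the counts below are hypothetical, not taken from the survivoR data). Under equal chance, the expected number of BIPOC first boots is the sum of each tribe’s own BIPOC proportion, which is generally not the same as applying one pooled proportion to every Tribal.

```r
# Hypothetical tribes at their first Tribal (made-up counts for illustration)
tribes <- data.frame(
  n_bipoc = c(1, 3, 0, 5),   # eligible BIPOC players in each tribe
  n       = c(10, 8, 6, 6)   # eligible players in each tribe
)

# Per-tribe chance that the boot is BIPOC if everyone is equally likely to go
tribes$p_bipoc <- tribes$n_bipoc / tribes$n

# Expected BIPOC first boots: one boot per tribe, so sum the per-tribe chances
expected_by_tribe <- sum(tribes$p_bipoc)

# Naive alternative: one pooled proportion applied to every Tribal
expected_pooled <- nrow(tribes) * sum(tribes$n_bipoc) / sum(tribes$n)

c(by_tribe = expected_by_tribe, pooled = expected_pooled)
```

The two numbers differ because small tribes with many BIPOC players carry the same weight (one boot) as large tribes with few.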

### Bayesian model

I want to directly estimate the bias with this model. If bias is present and BIPOC players are more likely to be voted out, the probability should be some factor above equal probability.

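The way I’ve formulated it is as follows:

$$
\begin{aligned}
y_i &\sim \text{Bernoulli}(\kappa_i) \\
\kappa_i &= \text{logit}^{-1}\left(\text{logit}(p_i) + \beta\right) \\
\beta &\sim \text{Normal}(\mu_0, 1.5)
\end{aligned}
$$

where $p_i$ is the probability of a BIPOC player being voted out first at Tribal Council $i$ under equal chance, i.e. the proportion of eligible players in the tribe who are BIPOC, and $y_i$ is 1 if the player voted out was BIPOC. $\kappa_i$ takes the log odds of that probability and adds $\beta$, the bias term. $\beta$ is a single term shared across all the seasons, whereas the other quantities are different for each Tribal Council.

Under this model, we can let $\beta$ be normally distributed and unconstrained to estimate the bias. I’ve used a prior of $\beta \sim \text{Normal}(0.5, 1.5)$. On the log-odds scale it’s hard to interpret, but essentially if $\beta = 0.5$ this would equate to a bias of +0.12 at a 50% baseline, which I think is fair.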
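As a quick check of how a bias term on the log-odds scale maps to a shift in probability, here is a small sketch using base R’s `qlogis()` (log odds) and `plogis()` (inverse log odds). The value 0.5 is the prior mean (`mu0`) used for the BIPOC and female cohorts in the code below; the baselines of 50% and 25% are the ones assumed in the model results.

```r
# Shift in probability implied by a bias term of 0.5 on the log-odds scale
beta <- 0.5

shift_at_50 <- plogis(qlogis(0.50) + beta) - 0.50   # baseline of 50%
shift_at_25 <- plogis(qlogis(0.25) + beta) - 0.25   # baseline of 25%

round(c(at_50 = shift_at_50, at_25 = shift_at_25), 2)
```

The same bias term gives a slightly smaller shift at a 25% baseline than at 50%, which is why the bias is estimated on the log-odds scale and converted afterwards.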

Code: Bayesian data analysis

```
# libraries ---------------------------------------------------------------
library(tidyverse)  # dplyr, tidyr, stringr, purrr, tibble
library(glue)       # used for the percentage labels

# set up ------------------------------------------------------------------

# keep only castaways whose result is a vote out (plus the Jonny Fairplay exclusion)
no_quitters <- survivoR::castaways |>
  filter(
    version == "US",
    str_detect(result, "voted"),
    !(castaway == "Jonny Fairplay" & version_season == "US16")
  ) |>
  distinct(version_season, castaway, castaway_id)

demogs <- survivoR::castaway_details |>
  select(castaway_id, gender, bipoc, race, ethnicity)

tribe_size <- survivoR::boot_mapping |>
  filter(order == 0) |>
  count(version_season, tribe)

# probability -> log odds
log_odds <- function(x) {
  log(x/(1-x))
}

# log odds -> probability
log_odds_inv <- function(x) {
  1/(1+exp(-x))
}

# shift a probability by a bias term on the log-odds scale
p_adj <- function(p, bias) {
  log_odds_inv(log_odds(p)+bias)
}

# quantiles for the 50%, 80%, and 95% intervals, plus mean and sd
summarise_quantiles <- function(df, x) {
  x <- enquo(x)
  df |>
    summarise(
      q2.5 = quantile(!!x, 0.025),
      q10 = quantile(!!x, 0.1),
      q25 = quantile(!!x, 0.25),
      q50 = quantile(!!x, 0.5),
      q75 = quantile(!!x, 0.75),
      q90 = quantile(!!x, 0.90),
      q97.5 = quantile(!!x, 0.975),
      mean = mean(!!x),
      sd = sd(!!x)
    )
}

levels <- c("bipoc", "not_bipoc", "female", "male", "female_bipoc", "male_bipoc",
            "female_not_bipoc", "male_not_bipoc")

df_labs <- tribble(
  ~var, ~lab_text,
  "bipoc", "BIPOC",
  "female", "Female",
  "female_bipoc", "Female, BIPOC",
  "female_not_bipoc", "Female, White",
  "male_bipoc", "Male, BIPOC",
  "male_not_bipoc", "Male, White"
) |>
  mutate(var = factor(var, levels = levels))

# first boots -------------------------------------------------------------

# voted out data frame: the first person voted out of each original tribe
df_voted_out <- survivoR::vote_history |>
  filter(
    version == "US",
    tribe_status == "Original"
  ) |>
  distinct(version_season, voted_out, voted_out_id, order, tribe, tribe_status) |>
  semi_join(no_quitters, by = c("version_season", "voted_out_id" = "castaway_id")) |>
  group_by(version_season, tribe) |>
  slice_min(order) |>
  left_join(demogs, by = c("voted_out_id" = "castaway_id")) |>
  mutate(
    bipoc = replace_na(bipoc, FALSE),
    not_bipoc = !bipoc,
    female = gender == "Female",
    male = gender == "Male",
    female_bipoc = gender == "Female" & bipoc,
    male_bipoc = gender == "Male" & bipoc,
    female_not_bipoc = gender == "Female" & !bipoc,
    male_not_bipoc = gender == "Male" & !bipoc
  ) |>
  ungroup()

# expected data frame: makeup of each tribe at its first Tribal,
# restricted to castaways eligible to be voted out
df_expected <- survivoR::vote_history |>
  filter(
    version == "US",
    tribe_status == "Original",
    is.na(immunity) | immunity == "Hidden"
  ) |>
  distinct(version_season, castaway, castaway_id, order, tribe, tribe_status) |>
  group_by(version_season, tribe) |>
  slice_min(order) |>
  left_join(demogs, by = "castaway_id") |>
  mutate(
    bipoc = replace_na(bipoc, FALSE),
    female = gender == "Female",
    male = gender == "Male",
    female_bipoc = gender == "Female" & bipoc,
    male_bipoc = gender == "Male" & bipoc,
    female_not_bipoc = gender == "Female" & !bipoc,
    male_not_bipoc = gender == "Male" & !bipoc
  ) |>
  group_by(version_season, order, tribe) |>
  summarise(
    n = n(),
    n_bipoc = sum(bipoc),
    n_not_bipoc = sum(!bipoc),
    n_female = sum(female),
    n_male = sum(male),
    n_female_bipoc = sum(female_bipoc),
    n_male_bipoc = sum(male_bipoc),
    n_female_not_bipoc = sum(female_not_bipoc),
    n_male_not_bipoc = sum(male_not_bipoc),
    .groups = "drop"
  ) |>
  mutate(
    p_bipoc = n_bipoc/n,
    p_not_bipoc = n_not_bipoc/n,
    p_female = n_female/n,
    p_male = n_male/n,
    p_female_bipoc = n_female_bipoc/n,
    p_male_bipoc = n_male_bipoc/n,
    p_female_not_bipoc = n_female_not_bipoc/n,
    p_male_not_bipoc = n_male_not_bipoc/n
  ) |>
  ungroup()

# summary -----------------------------------------------------------------

# observed vs expected first boots per cohort
df_obs <- df_voted_out |>
  summarise(
    bipoc = sum(bipoc),
    not_bipoc = sum(not_bipoc),
    female = sum(female),
    male = sum(male),
    female_bipoc = sum(female_bipoc),
    male_bipoc = sum(male_bipoc),
    female_not_bipoc = sum(female_not_bipoc),
    male_not_bipoc = sum(male_not_bipoc)
  ) |>
  pivot_longer(everything(), names_to = "var", values_to = "observed") |>
  left_join(
    df_expected |>
      select(starts_with("p")) |>
      summarise_all(~round(sum(.x))) |>
      pivot_longer(everything(), names_to = "var", values_to = "expected") |>
      mutate(var = str_remove(var, "p_")),
    by = "var"
  ) |>
  mutate(
    var = factor(var, levels = levels),
    res = observed-expected
  )

# bayes model -------------------------------------------------------------
library(rstan)
library(tidybayes)

# one row per Tribal Council: outcome (y_*) and equal-chance probability (p_*)
stan_dat <- df_voted_out |>
  left_join(
    df_expected |>
      select(version_season, tribe, p_bipoc, p_not_bipoc, p_female, p_male, p_female_bipoc,
             p_male_bipoc, p_female_not_bipoc, p_male_not_bipoc),
    by = c("version_season", "tribe")
  ) |>
  transmute(
    version_season,
    tribe,
    y_bipoc = as.numeric(bipoc),
    y_not_bipoc = as.numeric(!bipoc),
    y_female = as.numeric(gender == "Female"),
    y_male = as.numeric(gender == "Male"),
    y_female_bipoc = as.numeric(female_bipoc),
    y_male_bipoc = as.numeric(male_bipoc),
    y_female_not_bipoc = as.numeric(female_not_bipoc),
    y_male_not_bipoc = as.numeric(male_not_bipoc),
    p_bipoc,
    p_not_bipoc,
    p_female,
    p_male,
    p_female_bipoc,
    p_male_bipoc,
    p_female_not_bipoc,
    p_male_not_bipoc
  )

# reshape to long format: one row per Tribal Council and cohort
stan_dat <- stan_dat |>
  select(-starts_with("p")) |>
  pivot_longer(starts_with("y"), names_to = "var", values_to = "y") |>
  mutate(var = str_remove(var, "y_")) |>
  left_join(
    stan_dat |>
      select(-starts_with("y")) |>
      pivot_longer(starts_with("p"), names_to = "var", values_to = "p") |>
      mutate(var = str_remove(var, "p_")),
    by = c("version_season", "tribe", "var")
  ) |>
  mutate(
    log_odds = log(p/(1-p)),
    var = factor(var, levels = levels),
    # prior mean for the bias term, by cohort
    mu0 = case_when(
      var %in% c("bipoc", "female", "female_not_bipoc", "female_bipoc") ~ 0.5,
      var %in% c("not_bipoc", "male", "male_not_bipoc", "male_bipoc") ~ -0.5,
      TRUE ~ 0
    )
  )

stan_code <- "data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
  array[N] real<lower=0, upper=1> p;
  array[N] real log_odds;
  real mu0;
}
parameters {
  real beta;
}
transformed parameters {
  array[N] real<lower=0, upper=1> kappa;
  for(k in 1:N) {
    kappa[k] = 1/(1+exp(-(log_odds[k] + beta)));
  }
}
model {
  beta ~ normal(mu0, 1.5);
  y ~ bernoulli(kappa);
}"

# compile once for faster fitting; the compiled model is reused below
dat <- stan_dat |>
  filter(
    var == "female",
    p > 0,
    p < 1
  ) |>
  as.list()
dat$mu0 <- unique(dat$mu0)
dat$N <- length(dat$y)

mod_stan <- stan(
  model_code = stan_code,
  data = dat
)

# fit one model per cohort
df_bias <- map_dfr(levels, ~{
  dat <- stan_dat |>
    filter(
      var == .x,
      p > 0,
      p < 1
    ) |>
    as.list()
  dat$mu0 <- unique(dat$mu0)
  dat$N <- length(dat$y)

  # fit = mod_stan reuses the compiled model instead of recompiling
  mod_fit <- stan(
    fit = mod_stan,
    data = dat
  )

  tibble(
    var = .x,
    bias = rstan::extract(mod_fit, "beta")$beta
  )
}) |>
  mutate(var = factor(var, levels = levels)) |>
  left_join(
    stan_dat |>
      group_by(var) |>
      summarise(median = median(p)),
    by = "var"
  )

# posterior summary of the bias term on the log-odds scale
df_bias_summary <- df_bias |>
  group_by(var) |>
  summarise_quantiles(bias) |>
  mutate(
    # label from the row's own cohort rather than relying on row order
    lab = snakecase::to_title_case(as.character(var)) |>
      str_replace("Bipoc", "BIPOC")
  )

# posterior summary on the probability scale, at a 50% or 25% baseline
df_bias_summary_p <- df_bias |>
  mutate(
    p0 = ifelse(var %in% c("female_bipoc", "male_bipoc", "female_not_bipoc", "male_not_bipoc"), 0.25, 0.5),
    p = log_odds_inv(log_odds(p0)+bias)
  ) |>
  group_by(var, p0) |>
  summarise_quantiles(p) |>
  mutate(
    pct = glue("{ifelse(q50<p0, '', '+')}{100*round(q50-p0, 2)}%")
  )
```

```
> df_bias_summary
# A tibble: 8 × 11
var q2.5 q10 q25 q50 q75 q90 q97.5 mean sd lab
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 bipoc -0.245 -0.0708 0.0768 0.236 0.390 0.533 0.692 0.234 0.238
2 not_bipoc -0.687 -0.532 -0.384 -0.222 -0.0525 0.0908 0.268 -0.218 0.244
3 female 0.0987 0.247 0.369 0.525 0.691 0.832 0.997 0.533 0.232
4 male -0.967 -0.817 -0.678 -0.525 -0.374 -0.237 -0.0831 -0.526 0.226
5 female_bipoc 0.00560 0.178 0.342 0.513 0.699 0.861 1.04 0.517 0.263
6 male_bipoc -1.21 -0.929 -0.698 -0.443 -0.204 -0.00400 0.211 -0.458 0.366
7 female_not_bipoc -0.288 -0.127 0.0177 0.183 0.341 0.482 0.627 0.179 0.238
8 male_not_bipoc -0.932 -0.755 -0.588 -0.413 -0.240 -0.0944 0.0718 -0.416 0.258
```

For BIPOC players, the bias term CI includes 0 fairly comfortably, so I’d say that they are not voted out first any more than other players, or at least there is not enough evidence to confirm that they are. The estimate is also lower than my expectations.

Women are voted out first more often than male players. The 95% CI doesn’t include 0, so the bias is clearly different from zero. The result for BIPOC women is similar. From this it should be clear that it’s more due to gender than race/ethnicity.

### Simulation

The Bayesian analysis showed us what we need to know, but I wanted to look at this another way as well. I’ve also fit a simulation model and looked at the probability distribution. If it was completely random how many BIPOC players can we expect to be voted out first?

I took 4,000 random draws from each of the first Tribal Councils and counted how many times a BIPOC, female, or female BIPOC castaway was voted out. Below are the probability distributions under the assumption of perfect randomness. Each bar represents the likelihood of observing that many boots out of the 88 Tribal Councils.

For example, there have been 33 BIPOC players booted from the first Tribal Council, and under perfect randomness, we would expect 29. But the distribution shows we can reasonably expect somewhere between 22-37, so 33 is on the upper end but isn’t particularly unusual.

We have seen 55 female castaways booted first where we would expect 45. This is right in the tail of the distribution. There’s only a 1% chance of seeing 55 or more female first boots, which means there’s probably something here: a preference to vote women out first.

Code: Simulation

```
# number of sims
n_sims <- 4000
levels <- c("bipoc", "female", "female_bipoc", "female_not_bipoc")

# one copy of the tribe makeups per simulation, then draw each first boot
# at random using the tribe's own proportions
# note: purrr::rbernoulli() is deprecated as of purrr 1.0.0 but still works
df_sim0 <- map_dfr(1:n_sims, ~{
  df_expected |>
    mutate(sim = .x)
}) |>
  mutate(
    bipoc = rbernoulli(n(), p_bipoc),
    female = rbernoulli(n(), p_female),
    female_bipoc = rbernoulli(n(), p_female_bipoc),
    female_not_bipoc = rbernoulli(n(), p_female_not_bipoc),
    male_bipoc = rbernoulli(n(), p_male_bipoc),
    male_not_bipoc = rbernoulli(n(), p_male_not_bipoc)
  ) |>
  group_by(sim) |>
  summarise(
    bipoc = sum(bipoc),
    female = sum(female),
    female_bipoc = sum(female_bipoc),
    female_not_bipoc = sum(female_not_bipoc),
    male_bipoc = sum(male_bipoc),
    male_not_bipoc = sum(male_not_bipoc)
  ) |>
  pivot_longer(-sim, names_to = "var", values_to = "y") |>
  filter(var %in% levels)

# intervals of the simulated counts per cohort
df_ci <- df_sim0 |>
  group_by(var) |>
  summarise_quantiles(y)

# frequency of each simulated count, for the bar charts
df_sim <- df_sim0 |>
  count(var, y)
```

I’ve only included those with a positive bias in the chart.

### Regression Model

The final way I’ll look at this is by fitting a basic regression model. To be honest, this is a bad model choice, for reasons I’ll explain.

For this model to work the model data frame needs to be at the person level. The response is 1 if the person was voted out and 0 otherwise. The predictors are BIPOC (yes, no) and gender (male, female).

The issue with this model is that each observation is assumed to be independent, meaning that whether or not the person is voted out depends only on that person’s characteristics and is independent of all other people. But that doesn’t hold. There is only one person eliminated per Tribal Council, so once someone is voted out of the tribe first, none of the others can be. Independence only holds between Tribal Councils.

That’s really important to understand because if you were to predict who will be voted out the model may spit out multiple people going home which is dumb. It’s also important to consider what that means for the coefficients of the model. What’s going to happen is that the dependent relationship is going to change the variance depending on the proportions within the tribe and the effect is going to be averaged across the seasons. This could be misleading.
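To make the “multiple people going home” problem concrete, here is a small sketch under an assumed setup: a hypothetical tribe of 8 where a person-level model gives every player the same 1/8 chance and treats them as independent. The number of predicted boots is then binomial, so the chance of getting exactly one boot can be computed directly.

```r
# Hypothetical tribe of 8, each player treated as an independent 1/8 coin flip
tribe_size <- 8
p_each     <- 1 / tribe_size

p_no_boot    <- dbinom(0, tribe_size, p_each)   # nobody goes home
p_one_boot   <- dbinom(1, tribe_size, p_each)   # exactly one person goes home
p_multi_boot <- 1 - p_no_boot - p_one_boot      # two or more go home

round(c(none = p_no_boot, one = p_one_boot, multiple = p_multi_boot), 2)
```

Under independence, the only outcome that can actually happen at Tribal Council (exactly one boot) comes up well under half the time, which is the mismatch described above.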

You need to be really careful when interpreting the output under these conditions. I’m doing this anyway because I’m mainly interested in if it’s drastically different from the above.

The convenient thing about the regression model is comparing the coefficients of gender and BIPOC status. Even with the issues of dependence, we can compare the magnitude of both to see which has the strongest influence. You still have to be careful though.

Given the analysis above, gender should be the stronger predictor. If that’s true, I rest my case.

Code: Regression

```
# regression --------------------------------------------------------------

# tribes with at least one BIPOC and one non-BIPOC eligible player
diverse_tribes <- df_expected |>
  filter(
    p_bipoc > 0,
    p_bipoc < 1
  ) |>
  distinct(version_season, tribe)

# person-level data frame: one row per eligible castaway at the first Tribal
df_mod <- survivoR::vote_history |>
  semi_join(diverse_tribes, by = c("version_season", "tribe")) |>
  filter(
    version == "US",
    tribe_status == "Original",
    is.na(immunity) | immunity == "Hidden"
  ) |>
  distinct(version_season, castaway, castaway_id, voted_out, voted_out_id, order, tribe, tribe_status) |>
  semi_join(no_quitters, by = c("version_season", "voted_out_id" = "castaway_id")) |>
  mutate(voted_out = as.numeric(voted_out == castaway)) |>
  group_by(version_season, tribe) |>
  slice_min(order) |>
  left_join(demogs, by = "castaway_id") |>
  mutate(bipoc = replace_na(bipoc, FALSE)) |>
  filter(gender != "Non-binary")

mod <- glm(voted_out ~ gender + bipoc, data = df_mod, family = binomial(link='logit'))
summary(mod)
```

```
> summary(mod)
Call:
glm(formula = voted_out ~ gender + bipoc, family = binomial(link = "logit"),
data = df_mod)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.7650 0.1816 -9.721 <2e-16 ***
genderMale -0.5869 0.2454 -2.392 0.0168 *
bipocTRUE 0.2285 0.2473 0.924 0.3555
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 494.06 on 654 degrees of freedom
Residual deviance: 486.77 on 652 degrees of freedom
AIC: 492.77
Number of Fisher Scoring iterations: 5
```

I rest my case.

Gender is clearly more influential than BIPOC status. If I was a frequentist, I’d be removing BIPOC from the model as it’s not significant.

## Not all research agrees

I didn’t want to talk about this, but here we go. The paper titled ‘Surviving Racism and Sexism: What Votes in the Television Program Survivor Reveal about Discrimination’ came out after my original post. The analysis looks at whether BIPOC and female contestants are disproportionately voted out, as well as trends at other stages of the game.

It shows that women are disproportionately voted out first, as I’ve shown above. However, it claims that BIPOC players are also more likely to be the target and disproportionately voted out: *“Compared to White contestants, BIPOC contestants had 51% higher odds of being voted out of their tribe first, χ²(1, N=731)=4.59, p=.032, OR=1.51, 95% CI [1.03–2.19]”*. This is counter to what I have shown in the analysis.

They have used a logistic regression model at the person level for all 731 castaways in seasons 1-40. As I’ve shown above, this is not a good model for the problem. Even with the issues of fitting a regression model to this data I don’t see anything close to 51% higher odds.
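For a like-for-like comparison with the paper’s odds ratio, the `bipocTRUE` coefficient from the regression output above can be put on the odds-ratio scale. This is a rough sketch using the printed estimate and standard error with a Wald approximation; `confint(mod)` on the fitted model would give a profile-likelihood interval that may differ slightly.

```r
# bipocTRUE row from the regression output above
est <- 0.2285
se  <- 0.2473

odds_ratio <- exp(est)                                   # odds ratio for BIPOC vs white
wald_ci    <- exp(est + c(-1, 1) * qnorm(0.975) * se)    # approximate 95% interval

round(c(or = odds_ratio, lower = wald_ci[1], upper = wald_ci[2]), 2)
```

That is roughly 26% higher odds with an interval that spans 1, compared with the 51% reported in the paper.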

I suspect there are also differences in how the data was set up. There’s not a lot of discussion about the data considerations before modeling as I’ve done, e.g. only using the original tribes, and removing ineligible castaways.

They have used the Survivor Wiki for labeling race/ethnicity so that should be consistent, but there could be differences in which races/ethnicities are included.

I’ve curated the data to only those eligible to be voted out and the first true vote for a tribe using 46 seasons. Some tribes/castaways are removed because they didn’t go to Tribal Council before a swap. This leaves 655 castaways over 46 seasons.

They have only kept diverse tribes (although N=731 from above so I’m not so sure about that). A tribe that consists entirely of BIPOC players, like the Manihiki tribe in Cook Islands, only has one choice so is removed. This is an important consideration and such tribes should be removed, but the same logic should extend to all tribe makeups. The probability of voting out a BIPOC player when there are 5/6 in a tribe is much higher than if there is only 1/10.

This imbalance alters the model outcome and, I believe, is the heart of the issue and why the paper probably made some incorrect conclusions. I’ll explain.

## Damned statistics

To demonstrate why this is important, I’ll make up a toy example.

Let’s assume 50 tribes went to Tribal Council. Each tribe has 1 BIPOC and 9 white players. In total, there are 500 players – 50 BIPOC and 450 white.

Let’s also assume there is no bias and everyone has a 1/10 chance of being voted out. Then we expect to see 5 BIPOC players and 45 white players voted out first. We’ll put that into a 2×2.

```
> x1 <- matrix(c(405, 45, 45, 5), nrow = 2, dimnames = list(c("White", "BIPOC"), c("No", "Yes")))
> x1
       No Yes
White 405  45
BIPOC  45   5
```

I’ll fit a Chi-squared test to see if there is an association between race and being voted out first.

```
> chisq.test(x1)
Pearson's Chi-squared test
data: x1
X-squared = 0, df = 1, p-value = 1
```

The p-value is 1 because we’ve assumed equal probability of being voted out first. Makes perfect sense.

Let’s choose another example, 40 Tribals, and each tribe has 3 BIPOC players and 1 White player. That’s 120 BIPOC players and 40 White players in total. Again, let’s assume no bias and equal probability of being voted out of 1/4. Then we expect 30 BIPOC players and 10 white players voted out. I’ll put that into a 2×2 and run a Chi-squared test.

```
> x2 <- matrix(c(30, 90, 10, 30), nrow = 2, dimnames = list(c("White", "BIPOC"), c("No", "Yes")))
> x2
      No Yes
White 30  10
BIPOC 90  30
> chisq.test(x2)
Pearson's Chi-squared test
data: x2
X-squared = 0, df = 1, p-value = 1
```

No surprises to anyone, there’s no association.

Now what if we joined them together, i.e. `x = x1 + x2`? There would be 90 Tribal Councils, 490 White players, and 170 BIPOC players, with 55 and 35 voted out respectively. Again, I’ll run a Chi-squared test for an association.

```
> x <- x1 + x2
> x
       No Yes
White 435  55
BIPOC 135  35
> chisq.test(x)
Pearson's Chi-squared test with Yates' continuity correction
data: x
X-squared = 8.6183, df = 1, p-value = 0.003328
```

Now the test is highly significant! We would confirm without a doubt that there IS an association between race and being voted out first.

But we know this isn’t correct because we specifically assumed equal probability of being voted out. We set the data up with this exact property and the individual tests produced a p-value of 1.

So, why is combining them suddenly showing an association when there is not? It’s because each observation is assumed to be iid – independent and identically distributed. But, the observations are not independent. In a single Tribal Council if one person gets voted out it means the others can’t be voted out. There is a dependency within each Tribal Council so the makeup of the tribe matters. The model doesn’t understand this though.

Each Tribal Council IS independent since there is no interaction between the votes in one Tribal and the votes in another. That’s what the observation should be, the Tribal Council, not the player.

This is accounted for in the Bayesian model and the simulation, but not the regression model since it is at the person level.
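One way to check that the tribe makeup is what drives the pooled result is to keep the two makeups as separate strata instead of adding the tables together. The sketch below uses a Cochran-Mantel-Haenszel test (`mantelhaen.test()` in base R), which is not a method used in this post; it is swapped in here only to illustrate the toy example. The stratum labels are made up for readability.

```r
# The two toy tables from above
x1 <- matrix(c(405, 45, 45, 5), nrow = 2,
             dimnames = list(c("White", "BIPOC"), c("No", "Yes")))
x2 <- matrix(c(30, 90, 10, 30), nrow = 2,
             dimnames = list(c("White", "BIPOC"), c("No", "Yes")))

# Stack them into a 2 x 2 x 2 array with one layer per tribe makeup
strata <- array(
  c(x1, x2),
  dim = c(2, 2, 2),
  dimnames = list(
    race      = c("White", "BIPOC"),
    voted_out = c("No", "Yes"),
    makeup    = c("1 BIPOC of 10", "3 BIPOC of 4")   # hypothetical labels
  )
)

pooled     <- chisq.test(x1 + x2)       # ignores the tribe makeup
stratified <- mantelhaen.test(strata)   # tests the association within each makeup

c(pooled = pooled$p.value, stratified = stratified$p.value)
```

The pooled test is significant while the stratified test is nowhere near it, which matches the setup: within each makeup everyone has the same chance of being voted out.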

I hope this makes sense because by ignoring this property you could be making conclusions about an association when there is none, which is what I think may have happened.

## What can we do about the bias though?

To call a spade a spade, a group of people go to Tribal Council to vote someone out. Bias enters the game when humans group together and make decisions about who to vote out, which is the very essence of the game. Maybe, that’s what needs to change? To make the game fairer perhaps more elements of chance need to be introduced. Perhaps they need to remove players’ votes, force people to rely on their social game and make true, meaningful connections.

I can’t imagine the fandom getting behind any major change though. They are a pretty conservative bunch and resist almost any change made to the game – final 4 fire, the 3 tribe set up, 26 days, new advantages, more than one hidden immunity idol, moral dilemmas, Summit journeys, the rice negotiation, beware advantages, fake idol kits, shot in the dark…. pretty much everything. Now in the new era, players can lose their vote and they fucking hate it.

So, it’s pretty funny reading unhinged posts like ‘Man who manipulates Survivor’s game cannot imagine adjusting to make it fair’ because a) maybe production is adjusting it? And b) I can’t imagine any change aimed at making it ‘fairer’ that would receive unanimous approval, particularly when that would probably mean removing human decisions or introducing a new mechanic. The sentiment tends to be ‘if it ain’t broke, don’t fix it, except when the person I liked gets voted out’.
