Alone R package: Datasets from the survival TV series

I have been watching the survival TV series ‘Alone,’ where 10 survivalists are dropped in an extremely remote area and must fend for themselves. I am super impressed by their skills, endurance, and mental fortitude. Lasting 100 days in the Arctic winter while living off the land is a remarkable feat.

True to form, I’ve collected the data and I am sharing it here in the {alone} R package.

It is a collection of datasets about the TV series in a tidy format. The package includes 4 datasets:

  • survivalists
  • loadouts
  • episodes
  • seasons

For non-Rstats users, here is the link to the Google Sheets doc.

Installation

Install from CRAN:

install.packages("alone")

Install from GitHub:

devtools::install_github("doehm/alone")

Datasets

survivalists

A data frame of survivalists across all 9 seasons detailing name and demographics, location and profession, result, days lasted, reasons for tapping out (detailed and categorised), and page URL.

Dataset features:

  • season: The season number
  • name: Name of the survivalist
  • age: Age of survivalist
  • gender: Gender
  • city: City
  • state: State
  • country: Country
  • result: Place the survivalist finished in the season
  • days_lasted: The number of days lasted in the game before tapping out or winning
  • medically_evacuated: Logical. Whether the survivalist was medically evacuated from the game
  • reason_tapped_out: The reason the survivalist tapped out of the game. NA means they were the winner, since technically a winner never tapped out.
  • reason_category: A simplified category of the reason for tapping out
  • team: The team they were associated with (only for season 4)
  • day_linked_up: Day the team members linked up (only for season 4)
  • profession: Profession
  • url: URL of the cast page on the History Channel website. Prefix the value with https://www.history.com/shows/alone/cast
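
Since the url column needs the prefix above to become a working link, here is a minimal sketch of a helper that adds it. The function name cast_url() is hypothetical and not part of the package. The post does not say whether the stored values start with a slash, so the helper accepts both forms.

```r
# Hypothetical helper (not part of {alone}): builds a full cast-page URL
# from a value in the `url` column, using the prefix given above.
cast_url <- function(slug) {
  prefix <- "https://www.history.com/shows/alone/cast"
  # Drop any leading slashes so exactly one separates prefix and slug
  slug <- sub("^/+", "", slug)
  # Keep missing values missing instead of pasting the string "NA"
  ifelse(is.na(slug), NA_character_, paste0(prefix, "/", slug))
}

# Usage, assuming the package is installed:
# head(cast_url(alone::survivalists$url))
```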

As an example, use this dataset to compare the number of days survived for both men and women.

library(tidyverse)
library(alone)

df <- expand_grid(
  days_lasted = 0:max(survivalists$days_lasted),
  gender = unique(survivalists$gender)
) |> 
  left_join(
    survivalists |> 
      count(days_lasted, gender),
    by = c("days_lasted", "gender")
  ) |> 
  left_join(
    survivalists |> 
      count(gender, name = "N"),
    by = "gender"
  ) |> 
  group_by(gender) |> 
  mutate(
    n = replace_na(n, 0),
    n_lasted = N-cumsum(n),
    p = n_lasted/N
  ) 

# Kaplan-Meier-style survival curves (winners are not treated as censored here)
# code is simplified, so the plot won't match the published one
df |> 
  ggplot(aes(days_lasted, p, colour = gender)) +
  geom_line() 

# boxplots (geom_jitter needs both x and y, so gender is mapped to x)
survivalists |> 
  ggplot(aes(gender, days_lasted, fill = gender)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.2, pch = 1, size = 3) +
  theme_minimal()

While there is yet to be a female winner, there is some evidence to suggest that women, on average, last longer than men. This needs further investigation, though: the first season had a lot of early taps and no women, and the winners should be treated as censored data.

The full code to reproduce the above plot is found here.
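
The point about censoring can be sketched with the survival package. It is not used in the original post, so treat this as one possible approach rather than the author's analysis. It assumes, as the column description states, that reason_tapped_out is NA only for winners, and treats those rows as censored at their final day. The function name km_by_gender() is made up for illustration.

```r
library(survival)

# Hypothetical helper: Kaplan-Meier fit by gender with winners censored.
# Assumes `reason_tapped_out` is NA only for winners, as documented above.
km_by_gender <- function(data) {
  data$tapped_out <- !is.na(data$reason_tapped_out)  # TRUE = tap-out observed
  survfit(Surv(days_lasted, tapped_out) ~ gender, data = data)
}

# Runs only if the {alone} package is installed
if (requireNamespace("alone", quietly = TRUE)) {
  fit <- km_by_gender(alone::survivalists)
  print(fit)  # number of events and median days lasted per gender
  plot(fit, col = seq_along(fit$strata),
       xlab = "Days lasted", ylab = "Proportion still in the game")
  legend("topright", legend = names(fit$strata),
         col = seq_along(fit$strata), lty = 1)
}
```

Unlike the simplified curves above, this keeps winners in the risk set up to their last day without counting them as having tapped out.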

loadouts

The rules allow each survivalist to take 10 items with them. This dataset includes information on each survivalist’s loadout. It has detailed item descriptions and a simplified version for easier aggregation and analysis.

Dataset features:

  • version: Country code for the version of the show
  • season: The season number
  • name: Name of the survivalist
  • item_number: Item number
  • item_detailed: Detailed loadout item description
  • item: Loadout item. Simplified for aggregation
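
As an example, count how often each simplified item appears across all loadouts.

library(tidyverse)  # loads dplyr, ggplot2 and forcats
library(alone)

ft <- "sans"     # placeholder font family; the post does not show the original value
txt <- "grey20"  # placeholder text colour; also an assumption

loadouts |>
  count(item) |>
  mutate(item = fct_reorder(item, n, max)) |>
  ggplot(aes(item, n)) +
  geom_col() +
  geom_text(aes(item, n + 3, label = n), family = ft, size = 12, colour = txt) +
  coord_flip()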
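
Because the rules allow 10 items each, a quick sanity check is to count the items recorded per survivalist. The sketch below is an illustration only: items_per_survivalist() is a hypothetical name, and whether every loadout really has exactly 10 rows (for example in the team season) is not something the post confirms.

```r
library(dplyr)

# Hypothetical helper: number of loadout items recorded per survivalist
items_per_survivalist <- function(loadouts) {
  loadouts |>
    count(version, season, name, name = "n_items")
}

# Runs only if the {alone} package is installed
if (requireNamespace("alone", quietly = TRUE)) {
  # List any survivalists whose recorded loadout is not exactly 10 items
  items_per_survivalist(alone::loadouts) |>
    filter(n_items != 10) |>
    print()
}
```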

episodes

This dataset contains details of each episode including the title, number of viewers, beginning quote, and IMDb rating. Episodes from future seasons will be added once each season ends.

Dataset features:

  • version: Country code for the version of the show
  • season: The season number
  • episode_number_overall: Episode number across seasons
  • episode: Episode number
  • title: Episode title
  • air_date: Date the episode originally aired
  • viewers: Number of viewers in the US (millions)
  • quote: The beginning quote
  • author: Author of the beginning quote
  • imdb_rating: IMDb rating of the episode
  • n_ratings: Number of ratings given for the episode
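
As a rough illustration of how these columns can be used, the sketch below averages the IMDb rating and viewer numbers per season. The helper name season_ratings() is hypothetical, and missing values are simply dropped, which is an assumption about how gaps should be handled.

```r
library(dplyr)

# Hypothetical helper: per-season episode count, mean IMDb rating and
# mean US viewers (millions), ignoring missing values
season_ratings <- function(episodes) {
  episodes |>
    group_by(version, season) |>
    summarise(
      n_episodes = n(),
      mean_rating = mean(imdb_rating, na.rm = TRUE),
      mean_viewers = mean(viewers, na.rm = TRUE),
      .groups = "drop"
    )
}

# Runs only if the {alone} package is installed
if (requireNamespace("alone", quietly = TRUE)) {
  print(season_ratings(alone::episodes))
}
```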

seasons

The season summary dataset includes location, latitude and longitude, and other season-level information. It includes the date of drop-off where the information exists.

Dataset features:

  • version: Country code for the version of the show
  • season: The season number
  • location: Location
  • country: Country
  • n_survivors: Number of survivalists in the season. In season 4 there were 7 teams of 2.
  • lat: Latitude
  • lon: Longitude
  • date_drop_off: The date the survivalists were dropped off
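
The season column links this dataset to the others. Here is a minimal sketch that attaches the location to a per-season summary of days lasted. The helper name season_summary() is hypothetical, and joining on season alone assumes the data covers a single version of the show; with several versions, filter seasons to one version first.

```r
library(dplyr)

# Hypothetical helper: longest and median days lasted per season,
# with the season's location details attached
season_summary <- function(survivalists, seasons) {
  survivalists |>
    group_by(season) |>
    summarise(
      max_days = max(days_lasted),
      median_days = median(days_lasted),
      .groups = "drop"
    ) |>
    left_join(
      select(seasons, season, location, country, lat, lon),
      by = "season"
    )
}

# Runs only if the {alone} package is installed
if (requireNamespace("alone", quietly = TRUE)) {
  print(season_summary(alone::survivalists, alone::seasons))
}
```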

References

If there is any data you would like to include please get in touch.

  1. History: https://www.history.com/shows/alone/cast
  2. Wikipedia: https://en.wikipedia.org/wiki/Alone_(TV_series)
  3. Wikipedia (episodes): https://en.wikipedia.org/wiki/List_of_Alone_episodes#Season_1_(2015)_-_Vancouver_Island