I’m happy to announce that survivoR v1.0 is now on CRAN. It contains all the features intended for the first major release. A big thank you to Carly Levitz for helping collate and test the data.
This post details the major updates since v0.9.12. For a complete list of the package's tables and features, please visit the GitHub page.
To jump right into it, you can install the package from CRAN with `install.packages("survivoR")`.
Alternatively, the development version can be installed from GitHub.
If you find any issues, please raise them on GitHub and I’ll correct them as soon as possible. For updates, feel free to follow Carly and me on Twitter.
This release features new datasets, additional fields on existing tables, and the removal of unused or redundant features.
New datasets:

- `advantage_details` – Details of each advantage found and used across all seasons.
- `advantage_movement` – Details the movement of each advantage, including hidden immunity idols, and when it is played.
- `boot_mapping` – A mapping table for the stage of the game, indexed by the number of boots there have been.

New fields on existing tables:

- `tribe` – The name of the tribe that attended Tribal Council.
- `vote_event` – Identifies other events that can occur at Tribal Council, e.g. a castaway playing the Shot-in-the-Dark.
- `split_vote` – If a split vote was orchestrated to flush an idol, this identifies who the votes were split across and who was involved in the strategy.
- `tie` – A logical field identifying whether the vote resulted in a tie.
- `order` – The boot order, i.e. how many boots there have been in the game so far. This is used to map to the `boot_mapping` table.
- `imdb_rating` – The IMDb rating for the episode. As these are user ratings, they may change over time. With each new release the ratings will be updated, although only minor changes are expected for the most recent season.

Other changes:

- `confessionals` – Double episodes have been collapsed so that episodes align with all other tables. This affects calculations of the mean number of confessionals per episode, but gives a more consistent and convenient structure. Recap episodes are also accounted for.
All advantages and hidden immunity idols found across all seasons are captured in two tables, `advantage_details` and `advantage_movement`. The tables map to each other by `advantage_id` and together detail the life of each advantage in tidy format.
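The join on `advantage_id` can be sketched as follows. This is a minimal sketch assuming dplyr is installed; so that it runs without the package data, it uses small stand-in tibbles (named `details` and `movement` for illustration only) copied from the season 41 rows printed in this post, keeping only a few columns. With survivoR loaded you would join the real `advantage_movement` and `advantage_details` tables the same way.

```r
library(dplyr)

# Stand-in for advantage_details: three of the season 41 rows printed in this post
details <- tibble(
  advantage_id   = c("USEV4101", "USEV4102", "USVS4101"),
  advantage_type = c("Extra vote", "Extra vote", "Steal a vote")
)

# Stand-in for advantage_movement: the USEV4102 events printed in this post
movement <- tibble(
  castaway     = c("JD", "Shan", "Ricard", "Shan", "Shan"),
  advantage_id = "USEV4102",
  sequence_id  = 1:5,
  event        = c("Found", "Received", "Received", "Received", "Played")
)

# Attach the advantage type to every event in the advantage's life
movement_typed <- movement |>
  left_join(details, by = "advantage_id")

print(movement_typed)
```

Because `advantage_id` is unique in the details table, the join is many-to-one and keeps exactly one row per movement event.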
The `advantage_details` table lists the hidden immunity idols and advantages in the game for all seasons. It details whether there was a clue to each advantage, where it was found, and any other conditions attached to it. It maps to the `advantage_movement` table via `advantage_id`.
```
> advantage_details |>
+   filter(season == 41)
# A tibble: 9 x 9
  version version_season season_name  season advantage_id advantage_type  clue_details  location_found conditions
  <chr>   <chr>          <chr>         <dbl> <chr>        <chr>           <chr>         <chr>          <chr>
1 US      US41           Survivor: 41     41 USEV4101     Extra vote      No clue exis~ Shipwheel Isl~ NA
2 US      US41           Survivor: 41     41 USEV4102     Extra vote      No clue exis~ Shipwheel Isl~ NA
3 US      US41           Survivor: 41     41 USEV4103     Extra vote      No clue exis~ Shipwheel Isl~ NA
4 US      US41           Survivor: 41     41 USHI4101     Hidden immunit~ Found withou~ Found around ~ Beware advantage: all~
5 US      US41           Survivor: 41     41 USHI4102     Hidden immunit~ Found withou~ Found around ~ Beware advantage: all~
6 US      US41           Survivor: 41     41 USHI4103     Hidden immunit~ Found withou~ Found around ~ Beware advantage: all~
7 US      US41           Survivor: 41     41 USHI4104     Hidden immunit~ Found withou~ Found around ~ Beware advantage: all~
8 US      US41           Survivor: 41     41 USKP4101     Knowledge is p~ No clue exis~ Found around ~ NA
9 US      US41           Survivor: 41     41 USVS4101     Steal a vote    No clue exis~ Shipwheel Isl~ NA
```
The `advantage_movement` table tracks who found each advantage, who they may have handed it to, and who they played it for. Each step is considered an event, and `sequence_id` tracks each logical step in the life of the advantage. For example, in season 41, JD found an extra vote advantage. JD gave it to Shan in good faith, and Shan then voted him out while keeping the extra vote. Shan gave it to Ricard in good faith, who eventually gave it back before Shan played it for Naseer. Each of these movements is recorded in the table.
The table also records who the advantage was eventually played for, and whether the play was successful or not needed. In the unfortunate situations where someone is blindsided and voted out while holding the advantage, that is recorded here too.
```
> advantage_movement |>
+   filter(advantage_id == "USEV4102")
# A tibble: 5 x 15
  version version_season season_name  season castaway castaway_id advantage_id sequence_id   day episode event    played_for played_for_id success votes_nullified
  <chr>   <chr>          <chr>         <dbl> <chr>    <chr>       <chr>              <dbl> <dbl>   <dbl> <chr>    <chr>      <chr>         <chr>             <dbl>
1 US      US41           Survivor: 41     41 JD       US0603      USEV4102               1     2       1 Found    NA         NA            NA                   NA
2 US      US41           Survivor: 41     41 Shan     US0606      USEV4102               2     9       4 Received NA         NA            NA                   NA
3 US      US41           Survivor: 41     41 Ricard   US0596      USEV4102               3     9       4 Received NA         NA            NA                   NA
4 US      US41           Survivor: 41     41 Shan     US0606      USEV4102               4    11       5 Received NA         NA            NA                   NA
5 US      US41           Survivor: 41     41 Shan     US0606      USEV4102               5    17       9 Played   Naseer     US0600        Yes                  NA
```
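Because each row is one event, the history of an advantage can be read back in order. A minimal sketch, assuming dplyr, on a stand-in tibble copied from the USEV4102 rows printed above (a subset of the columns, so it runs without the package): sorting by `sequence_id` within each `advantage_id` recovers who held it along the way, and the row with the largest `sequence_id` gives its final outcome. The object names `movement`, `path` and `final_event` are illustrative only; with survivoR loaded you would start from `advantage_movement` instead.

```r
library(dplyr)

# Stand-in copy of the USEV4102 rows printed above (subset of columns)
movement <- tibble(
  castaway     = c("JD", "Shan", "Ricard", "Shan", "Shan"),
  advantage_id = "USEV4102",
  sequence_id  = c(1, 2, 3, 4, 5),
  day          = c(2, 9, 9, 11, 17),
  event        = c("Found", "Received", "Received", "Received", "Played"),
  played_for   = c(NA, NA, NA, NA, "Naseer"),
  success      = c(NA, NA, NA, NA, "Yes")
)

# Path of each advantage: the castaway on every event, in sequence order
path <- movement |>
  arrange(advantage_id, sequence_id) |>
  group_by(advantage_id) |>
  summarise(path = paste(castaway, collapse = " -> "), .groups = "drop")

# Final outcome: the last recorded event for each advantage
final_event <- movement |>
  group_by(advantage_id) |>
  slice_max(sequence_id, n = 1) |>
  ungroup() |>
  select(advantage_id, castaway, event, played_for, success)

print(path)
print(final_event)
```

Taking the last event rather than filtering on `event == "Played"` also keeps advantages that were never played, since their final row is whatever last happened to them.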
The `boot_mapping` table makes it easy to filter to the set of castaways still in the game after a specified number of boots. It differs from the tribe mapping in that it is focused on the boot rather than the episode, which is often more useful: the number of boots and who is left in the game are usually a better indicator of the stage of the game than the episode or day. When someone quits the game or is medically evacuated, it is counted as a boot, and the table tracks multiple boots per episode.
In the case of double Tribal Councils, there is an order in which castaways have their torches snuffed. This is also captured, even though it means a set of players remains in the game for literally minutes before the next one leaves.
To find out who is left in the game in season 41 after 12 boots (i.e. 12 people have either been voted off or left the game), you can use the following code:
```
> boot_mapping |>
+   filter(
+     season == 41,
+     order == 12
+   )
# A tibble: 6 x 11
  version version_season season_name  season episode order castaway castaway_id tribe    tribe_status in_the_game
  <chr>   <chr>          <chr>         <dbl>   <dbl> <dbl> <chr>    <chr>       <chr>    <chr>        <lgl>
1 US      US41           Survivor: 41     41      12    12 Heather  US0593      Via Kana Merged       TRUE
2 US      US41           Survivor: 41     41      12    12 Erika    US0594      Via Kana Merged       TRUE
3 US      US41           Survivor: 41     41      12    12 Ricard   US0596      Via Kana Merged       TRUE
4 US      US41           Survivor: 41     41      12    12 Xander   US0597      Via Kana Merged       TRUE
5 US      US41           Survivor: 41     41      12    12 Danny    US0599      Via Kana Merged       TRUE
6 US      US41           Survivor: 41     41      12    12 Deshawn  US0601      Via Kana Merged       TRUE
```
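If you run this query often, the filter can be wrapped in a small helper. The sketch below assumes dplyr; `castaways_left()` is a hypothetical helper written for illustration, not a survivoR function, and to stay self-contained it runs on a stand-in tibble copied from the season 41, order 12 rows printed above (a subset of the columns). With survivoR loaded you would pass `boot_mapping` itself.

```r
library(dplyr)

# Stand-in copy of the season 41, order 12 rows printed above (subset of columns)
boot_mapping_s41 <- tibble(
  season      = 41,
  episode     = 12,
  order       = 12,
  castaway    = c("Heather", "Erika", "Ricard", "Xander", "Danny", "Deshawn"),
  castaway_id = c("US0593", "US0594", "US0596", "US0597", "US0599", "US0601"),
  in_the_game = TRUE
)

# Hypothetical helper: names of the castaways still in the game after `boots` boots
castaways_left <- function(mapping, season_no, boots) {
  mapping |>
    filter(season == season_no, order == boots) |>
    pull(castaway)
}

left_after_12 <- castaways_left(boot_mapping_s41, season_no = 41, boots = 12)
print(left_after_12)
```

The arguments are named `season_no` and `boots` rather than `season` and `order` on purpose: because of dplyr's data masking, `filter(season == season)` would compare the column with itself and keep every row.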
As an example, once joined to `challenge_results`, the `boot_mapping` table can be used to calculate how many people participated in certain challenges, and who they were.
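```r
library(survivoR)
library(tidyverse)

# Count the winners of each challenge type in season 41 at boot order 4
df_challenges <- challenge_results |>
  unnest(winners) |>
  filter(
    season == 41,
    order == 4,
    outcome_status == "Winner"
  ) |>
  count(season, episode, order, challenge_type, name = "n_winners")

# Count the castaways still in the game at boot order 4,
# then attach the number of winners
boot_mapping |>
  filter(
    season == 41,
    order == 4
  ) |>
  count(season, episode, order, name = "n_challengers") |>
  left_join(df_challenges, by = c("season", "episode", "order"))
```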
This table comes in handy for many types of analysis. Please see the documentation for detailed descriptions of the fields.
As a reminder, the package also includes ggplot fill and colour scales based on each season's logo and tribe colours. The season 42 logo and tribe colour palettes have been added. To use the colours from a particular season, simply use `scale_*_survivor(<season number>)` or `scale_*_tribes(<season number>)`.
```r
library(survivoR)
library(tidyverse)

# Classify each castaway by how far they got in their season
# (partial matches, e.g. "unner" catches runner-up placings)
df_results <- castaways |>
  mutate(
    result = case_when(
      str_detect(result, "Sole") ~ "Sole Survivor",
      str_detect(result, "unner") ~ "Finalist",
      str_detect(jury_status, "jury") ~ "Jury",
      TRUE ~ "Other"
    ),
    result = factor(result, levels = c("Sole Survivor", "Finalist", "Jury", "Other"))
  ) |>
  distinct(version_season, castaway_id, result)

# Votes received at each boot order, filled by the result of the castaway voted for
vote_history |>
  filter(!is.na(vote_id)) |>
  left_join(df_results, by = c("version_season", "vote_id" = "castaway_id")) |>
  count(order, result) |>
  ggplot(aes(order, n, fill = result)) +
  geom_col() +
  scale_x_continuous(breaks = 1:20, labels = 1:20) +
  labs(
    title = "Total number of votes across 42 seasons",
    subtitle = "Distribution of votes by boot order and result",
    x = "Boot order",
    y = "Number of votes received",
    fill = "Result"
  ) +
  scale_fill_survivor(42) +
  theme_minimal()
```
```r
# Confessionals per episode, filled by result (uses df_results from the previous block)
confessionals |>
  left_join(df_results, by = c("version_season", "castaway_id")) |>
  group_by(episode, result) |>
  summarise(n = sum(confessional_count), .groups = "drop") |>
  ggplot(aes(episode, n, fill = result)) +
  geom_col() +
  scale_x_continuous(breaks = 1:16, labels = 1:16) +
  labs(
    title = "Total number of confessionals across 42 seasons",
    subtitle = "Distribution of confessionals by episode and result",
    x = "Episode",
    y = "Number of confessionals",
    fill = "Result"
  ) +
  scale_fill_tribes(16, reverse = TRUE) +
  theme_minimal()
```