Tidy Tuesday: Paralympic Medals
By Katie Press in TidyTuesday
August 4, 2021
The Paralympic Games
Today’s post covers circular barplots in ggplot2 and interactivity using the ggiraph package. This week’s Tidy Tuesday dataset contains information from the Paralympic Games scraped from the web. I noticed that the dataset didn’t include the winter games, which is something I wanted to look at. So I decided to re-scrape the data myself. The datset can be found on Kaggle, and the original code for scraping is in my Github (linked in the Kaggle description). The linked dataset includes all years for the summer and winter games, and I tried to clean the athlete names a bit more since scraping data isn’t always clean and pretty.
I thought I would put the final plots up here at the top of the post for anyone who just wants to play with the reactivity. Continue on to the rest of the post to see the code.
Summer Games
Winter Games
Data Import
Packages used in this post:
knitr::opts_chunk$set(echo = TRUE, message=FALSE,warning=FALSE)
library(tidyverse)
library(janitor)
library(tidytuesdayR)
library(countrycode)
library(ggflags)
library(ggiraph)
olympic.pal2 <- c("#1c6798", "#88bcd0", "#cc9432","#f7d577", "#373735",
"#918c87", "#007552", "#93a556", "#ae2128", "#dc9693")
Read in the data - this is the same data I linked to on Kaggle.
medal_standings <- readRDS("~/Desktop/Rproj/2021-07-26_olympic_medals/medal_standings.RDS")
medal_athlete <- readRDS("~/Desktop/Rproj/2021-07-26_olympic_medals/medal_athlete.RDS")
Add the country code for ggflag.
medal_standings$team_country <-
countrycode(medal_standings$npc_name, "country.name.en", "iso2c")
medal_standings <- medal_standings %>%
mutate(team_country = str_to_lower(team_country))
medal_athlete$team_country <-
countrycode(medal_athlete$npc_name, "country.name.en", "iso2c")
medal_athlete <- medal_athlete %>%
mutate(team_country = str_to_lower(team_country))
Get the all time medals, by type and total.
medal_count <- medal_standings %>%
filter(rank_type == "Sport",!is.na(npc_name)) %>%
mutate(across(
one_of("npc_gold", "npc_silver", "npc_bronze"),
~ as.numeric(.)
)) %>%
group_by(sport, games_season, npc_name, team_country) %>%
summarise(
total_gold = sum(npc_gold),
total_silver = sum(npc_silver),
total_bronze = sum(npc_bronze)
) %>%
mutate(total_medals = sum(total_gold + total_silver + total_bronze))
Now get the number of athletes per country/sport from the other file and create a variable for the proportion of medals to athletes.
temp <- medal_athlete %>%
ungroup() %>%
filter(!is.na(npc_name)) %>%
count(sport, games_season, team_country, npc_name, athlete_name) %>%
count(sport, games_season, team_country, npc_name) %>%
rename("total_athlete" = n)
medal_count <- medal_count %>%
left_join(temp) %>%
mutate(medal_prop = round(total_medals / total_athlete, 2))
What are the top 10 most participated summer Paralympics? There are quite a few sports and not all of them will fit on the chart and still be readable.
top_sports <- medal_athlete %>%
ungroup() %>%
select(games_code:athletes) %>%
distinct() %>%
group_by(sport, games_season) %>%
summarise(total_npcs = sum(npcs)) %>%
arrange(games_season, desc(total_npcs)) %>%
group_by(games_season) %>%
slice(1:10)
Prep the data for charts
First I will chart the summer Paralympics. I’m going to make a circular bar chart, so I need to prep the data to be in that format. More information on circular bar charts can be found here.
plot_data <- medal_count %>%
filter(games_season == "Summer", sport %in% top_sports$sport) %>%
mutate(sport = factor(sport)) %>%
arrange(sport, desc(medal_prop)) %>%
group_by(sport) %>%
slice(1:10)
plot_data <- plot_data %>%
bind_rows(tibble(sport = factor(levels(plot_data$sport))))
plot_data <- plot_data %>%
ungroup() %>%
arrange(sport, desc(medal_prop)) %>%
mutate(id = row_number())
Prep a dataset to label each section by sport. I’m also adding an angle column so that I can angle the text and put it inside the circle. I tried to find an “official” way to do this, but it doesn’t seem possible at this point to have sections of curved text, although the example I linked above explains how to add curved lines with geom_segment. So this might not be perfect, but it’s passable.
base_data <- plot_data %>%
group_by(sport) %>%
summarize(start = min(id), end = max(id) - 1) %>%
rowwise() %>%
mutate(title = mean(c(start, end)))
base_data <- base_data %>%
mutate(sport = fct_recode(sport, `Power Lifting` = "Powerlifting")) %>%
mutate(sport2 = paste0(word(sport, 1), "\n", word(sport, 2))) %>%
mutate(sport2 = ifelse(str_detect(sport2, "NA"), NA, sport2)) %>%
mutate(sport2 = coalesce(sport2, sport))
#for text angles
base_data <- base_data %>%
ungroup() %>%
mutate(angle = c(-15,-55,-85, 55, 25,-15,-50, 90, 55, 25))
Create the tooltip column. This is what will be used with ggiraph to provide the information for each bar when the mouse hovers over it.
plot_data <- plot_data %>% #changing this because it's a bit long compared to the others
mutate(npc_name = ifelse(str_detect(npc_name, "Hong Kong"), "Hong Kong",
ifelse(str_detect(npc_name, "Emirates"), "UAE", npc_name)))
plot_data <- plot_data %>%
mutate(tooltip = paste0("Country: ", npc_name, "\n",
"Total Medals: ", total_medals, "\n",
"Total Gold: ", total_gold, "\n",
"Total Silver: ", total_silver, "\n",
"Total Bronze: ", total_bronze, "\n",
"Medals Per Athlete: ", medal_prop, "\n"))
Create the Summer Paralympics Plot
Create a theme for the plot for visual specifications. I’m using the same theme from my last blog post, which was inspired by this image from Sports Illustrated:
my_theme <- theme(
legend.position = "none",
plot.title = element_text(
family = "Hiragino Sans W3",
size = 11,
vjust = -15,
hjust = .5,
color = "#645c54"
),
plot.subtitle = element_text(
family = "Hiragino Sans W3",
size = 9,
vjust = -17,
hjust = .5,
color = "#645c54"
),
plot.caption = element_text(
family = "Hiragino Sans W3",
size = 7,
vjust = 22,
hjust = .95,
color = "#645c54"
),
plot.background = element_rect(fill = "#fcf2e6", color = "#fceee2"),
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank(),
plot.margin = unit(c(-1, 0,-1, 0), "cm")
)
Create the plot using ggplot, using geom_bar_interactive instead of just regular geom_bar, which will allow it to link to the tooltip with ggiraph. The geom_flag is for the flags of course, and the geom_text is for the sport labels on the inside of the circle.
summer_plot <-
ggplot(plot_data,
aes(
x = factor(id),
y = medal_prop,
fill = sport,
data_id = tooltip
)) +
geom_bar_interactive(aes(tooltip = tooltip, data_id = factor(id)),
stat = "identity",
alpha = 0.7) +
scale_fill_manual(values = olympic.pal2) +
ylim(-5, 5) +
coord_polar() +
geom_flag(data = plot_data,
aes(
x = id,
y = medal_prop + .5,
country = team_country,
size = medal_prop
)) +
geom_text(
data = base_data,
aes(x = title, y = -.75, label = sport2),
colour = olympic.pal2,
#alpha = 0.8,
size = 2.5,
fontface = "bold",
inherit.aes = FALSE,
angle = base_data$angle
) +
labs(title = "Summer Paralympic Games Cumulative Medal Winnings",
subtitle = "Top 10 Countries per Sport Ranked by Ratio of Medals to Athletes",
caption = "@katie_press | kpress.dev | Data: paralympics.org")+
theme_minimal() +
my_theme
Create the girafe object and format. This will allow me to print out the rmarkdown document with interactivity in this plot. The opts_hover_inv will make all the bars lighter except for the one that is currently selected. The opts_hover is how I finally got the hover to be the same color as the bar, actually I’m not sure if that’s what it’s doing but it works like this. If I left that out, the hover color would be bright yellow-ish. Otherwise I could specify a color in that css fill, but if I used the color name I could only pick one color.
tooltip_css <- "border-radius: 5px;font-family:Hiragino Sans W3;color:white"
girafe(ggobj = summer_plot,
options = list(
opts_tooltip(
use_fill = TRUE,
use_stroke = TRUE,
css = tooltip_css
),
opts_hover_inv(css = "opacity:0.5;"),
opts_hover(css = "fill:plot_data$sport;")
))
Prep the Winter Games Data
This will basically use the same method as above, however there are only seven winter sports so the plot will be smaller. After I ran through this the first time, I realized I needed to exclude the wheelchair curling and para ice hockey as well, because the medal to athlete proportions were so much smaller than the other sports you couldn’t really see them on the chart. I think it might have something to do with them being team events, and I used the overall ranking tables so some of them might be slightly under-counted with this method.
plot_data2 <- medal_count %>%
filter(games_season == "Winter",
!sport %in% c("Wheelchair Curling", "Para Ice Hockey")) %>%
mutate(sport = factor(sport)) %>%
#mutate(gold_prop = total_gold/total_athlete) %>%
arrange(sport, desc(medal_prop)) %>%
group_by(sport) %>%
slice(1:10)
plot_data2 <- plot_data2 %>%
bind_rows(tibble(sport = factor(levels(plot_data2$sport))))
plot_data2 <- plot_data2 %>%
ungroup() %>%
arrange(sport, desc(medal_prop)) %>%
mutate(id = row_number())
Prep a dataset for group/sport titles. I had to split up the really long sport name to be on two lines.
base_data2 <- plot_data2 %>%
group_by(sport) %>%
summarize(start = min(id), end = max(id) - 1) %>%
rowwise() %>%
mutate(title = mean(c(start, end)),
sport = as.character(sport))
base_data2 <- base_data2 %>%
mutate(sport = replace(
sport,
sport == "Ice Sledge Speed Skating",
"\n Ice Sledge \n Speed Skating"
))
#for text angles
base_data2 <- base_data2 %>%
ungroup() %>%
mutate(angle = c(90, 90, 90, 90, 90))
Create the tooltip column. This time I decided halfway through that I wanted to order the sports to kind of order them from highest to lowest proportion (but still grouped by sport). I didn’t want to redo everything in case I didn’t end up liking it, so I just made sport a factor, rearranged the data, and created a second id variable to plot with.
plot_data2 <- plot_data2 %>%
mutate(tooltip = paste0("Country: ", npc_name, "\n",
"Total Medals: ", total_medals, "\n",
"Total Gold: ", total_gold, "\n",
"Total Silver: ", total_silver, "\n",
"Total Bronze: ", total_bronze, "\n",
"Medals Per Athlete: ", medal_prop, "\n"))
plot_data2 <- plot_data2 %>%
mutate(sport2 = factor(
sport,
levels = c(
"Ice Sledge Speed Skating",
"Alpine Skiing",
"Cross-Country",
"Biathlon",
"Snowboard"
)
))
plot_data2 <- arrange(plot_data2, sport, desc(medal_prop))
plot_data2 <- plot_data2 %>%
mutate(id2 = row_number())
base_data2 <- base_data2 %>%
ungroup() %>%
mutate(angle = c(-30, 65,-5, 105, 40))
Create the Winter Paralympics Plot
This time I need to create a different theme. I was inspired by a photo from the Sochi Games opening ceremonies. I just used an online color picker tool, fed in the linked image, and copied some of the colors that came up.
my_theme2 <- theme(
legend.position = "none",
plot.title = element_text(
family = "Hiragino Sans W3",
size = 11,
vjust = -13,
hjust = .5,
color = "#3b5e7e"
),
plot.subtitle = element_text(
family = "Hiragino Sans W3",
size = 9,
vjust = -15.5,
hjust = .5,
color = "#3b5e7e"
),
plot.caption = element_text(
family = "Hiragino Sans W3",
size = 7,
hjust = .95,
vjust = 40,
color = "#3b5e7e"
),
plot.background = element_rect(fill = "#e4e9f3", color = "#e4e9f3"),
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank(),
#plot.margin = unit(c(-1,-1,-2, -1), "cm")
plot.margin = unit(c(-1, 0,-2, 0), "cm")
)
olympic.pal3 <- c("#c6a4e8", #light purple
"#dc8cb9", #light pink
"#adad65", #light green
"#5a7aae", #steel blue
"#734f8a", #dark purple
"#a55097" #medium purple
)
Now create the plot.
winter_plot <- ggplot(plot_data2,
aes(
x = factor(id2),
y = medal_prop,
fill = sport,
data_id = tooltip
)) +
geom_bar_interactive(aes(tooltip = tooltip, data_id = factor(id)),
stat = "identity",
alpha = 0.7) +
scale_fill_manual(values = olympic.pal3) +
ylim(-6.5, 6.5) +
coord_polar() +
geom_flag(data = plot_data2,
aes(
x = id2,
y = medal_prop + .5,
country = team_country,
size = 2
)) +
geom_text(
data = base_data2,
aes(x = title, y = -.75, label = sport),
colour = olympic.pal3[1:5],
#alpha = 0.8,
size = 2.5,
fontface = "bold",
inherit.aes = FALSE,
angle = base_data2$angle
) +
labs(title = "Winter Paralympic Games Cumulative Medal Winnings",
subtitle = "Top 10 Countries per Sport Ranked by Ratio of Medals to Athletes",
caption = "@katie_press | kpress.dev | Data Source: paralympics.org")+
theme_minimal() +
my_theme2
Create the girafe object and format.
girafe(ggobj = winter_plot,
options = list(
opts_tooltip(
use_fill = TRUE,
use_stroke = TRUE,
css = tooltip_css
),
opts_hover_inv(css = "opacity:0.5;"),
opts_hover(css = "fill:plot_data$sport;")
))