Tidy Tuesday: Nuclear Explosions

Hi, all! This week’s Tidy Tuesday post is about nuclear explosions. Our data come from the Stockholm International Peace Research Institute (SIPRI).

One of my goals for this week is to explore some of ggplot’s advanced features, most of which are discussed in the “Graphics for Communication” chapter of R for Data Science. So, let’s try to create a visually appealing and informative plot!

Data Wrangling

Let’s read in the data and get started.

library(readr)
library(dplyr)
library(ggplot2)
library(forcats)
library(purrr)
library(stringr)

nuclear_explosions <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-08-20/nuclear_explosions.csv")

First, let’s make some of the country names look better. The raw data store several of them in all caps, and Pakistan is abbreviated as “PAKIST”.

nuclear_explosions <-
  mutate(nuclear_explosions,
         # fct_recode(new = "old"); countries not listed (USA, USSR, UK) keep their names
         country = fct_recode(country,
                              France = "FRANCE",
                              China = "CHINA",
                              India = "INDIA",
                              Pakistan = "PAKIST"))
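To see what fct_recode() does, here is a minimal sketch on a small, made-up vector of country codes (not the real data). Each argument has the form new_name = "old_name"; values that aren’t mentioned keep their original labels, and a character input is converted to a factor.

```r
library(forcats)

# Made-up codes in the same style as the raw data
raw <- c("USA", "FRANCE", "PAKIST", "USSR", "FRANCE")

# new label = "old label"; unlisted values such as "USA" are left unchanged
tidy <- fct_recode(raw, France = "FRANCE", Pakistan = "PAKIST")

as.character(tidy)  # "USA" "France" "Pakistan" "USSR" "France"
```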

Now, to put nice labels on our plot, we need a way to choose an appropriate height for each one. Labels should sit high enough that they don’t cover the bars. However, to save space, I want to put the legend inside the plot, so we can’t place the labels too high either. Each label is right-aligned at its year, so it extends to the left over the preceding years. To solve this problem, we will count the number of explosions in each year and write a function that, for any input year, finds the largest yearly count over that year and the ten years before it, then adds a little padding.

(Thanks to Twitter user @msubbaiah1 for providing a short explanation of the gap in testing in 1958, which saved us some Googling.)
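Before applying this to the real data, here is a self-contained sketch of the windowing idea on a tiny, made-up set of event years. getHeightFrom() is a hypothetical variant of the getHeight() function below that takes the counts table as an argument instead of reading the global yearlyExplosions. The sketch also checks that count(year) gives the same per-year counts as group_by(year) %>% summarise(n = n()).

```r
library(dplyr)

# Made-up event years (not the real data): 8 in 1950, 1 in 1955, 3 in 1958
events <- tibble(year = c(rep(1950, 8), 1955, rep(1958, 3)))

# Two equivalent ways to count events per year
counts1 <- events %>% group_by(year) %>% summarise(n = n())
counts2 <- events %>% count(year)

# Hypothetical helper: tallest yearly count in [year - window, year], plus padding
getHeightFrom <- function(counts, year, window = 10, pad = 5) {
  checkYears <- year + seq(-window, 0)
  explosions <- counts %>%
    filter(year %in% checkYears) %>%  # `year` here is the column, not the argument
    pull(n)
  max(explosions, na.rm = TRUE) + pad
}

getHeightFrom(counts2, 1959)  # 13: window 1949-1959 includes 1950 (n = 8)
getHeightFrom(counts2, 1965)  # 8: window 1955-1965 only reaches 1958 (n = 3)
```

One caveat of this approach: if no year in the window has any explosions, max() on an empty vector returns -Inf with a warning, so it assumes every labelled year has data nearby.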

yearlyExplosions <-
  nuclear_explosions %>%
  group_by(year) %>%
  summarise(n = n())

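getHeight <- function(year){
  # the input year plus the 10 years before it, since each label extends leftward
  checkYears <- year + seq(-10, 0)
  # inside filter(), `year` refers to the column of yearlyExplosions
  explosions <- filter(yearlyExplosions, year %in% checkYears) %>% pull(n)
  # tallest bar in that window, plus a margin of 5
  height <- max(explosions, na.rm = TRUE) + 5
  return(height)
}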

dates <- tibble(year = c(1959, 1996),
                text = map_chr(c("US, UK, and USSR form moratorium on nuclear testing from Nov '58 to Aug '61",
                                 "Comprehensive Nuclear-Test-Ban Treaty signed in Sept '96"), 
                           str_wrap, width = 20)) %>%
  mutate(height = map_dbl(year, getHeight))
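str_wrap() keeps the labels narrow by inserting line breaks ("\n") so that each line is roughly at most width characters. Because str_wrap() already accepts a whole character vector, calling it directly should give the same result as the map_chr() version. A quick sketch with made-up strings:

```r
library(purrr)
library(stringr)

# Made-up label text (not from the data)
notes <- c("A short made-up note that is long enough to wrap",
           "Short label")

wrapped_map <- map_chr(notes, str_wrap, width = 20)  # one call per element
wrapped_vec <- str_wrap(notes, width = 20)           # one vectorized call

identical(wrapped_map, wrapped_vec)
cat(wrapped_vec[1])  # the first note printed over several lines
```

Using map_chr() here is still a nice way to practice purrr; the vectorized call is just a shorter alternative.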

Data Visualization

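ggplot(nuclear_explosions, aes(x = year, fill = fct_rev(fct_infreq(country)))) +
  # one bar per year, stacked by country
  geom_bar(width = 1, color = "black") +
  # vertical line from the x-axis up to each label
  geom_segment(aes(yend = height, x = year, xend = year),
               y = 0, data = dates, inherit.aes = FALSE) +
  # square-cornered labels whose bottom-right corner sits at (year, height)
  geom_label(aes(x = year, y = height, label = text),
             data = dates, inherit.aes = FALSE, size = 2.5,
             label.r = unit(0, "lines"), vjust = "bottom", hjust = "right") +
  labs(title = "A History of Nuclear Explosions",
       x = "Year", y = "Number of Nuclear Explosions",
       caption = "Visualization: jackmwolf.rbind.io\nData: Stockholm International Peace Research Institute") +
  theme_bw() +
  # anchor the legend's top-right corner at the panel's top-right corner
  # (on ggplot2 >= 3.5.0, use legend.position = "inside" with legend.position.inside = c(1, 1))
  theme(legend.justification = c(1, 1), legend.position = c(1, 1),
        legend.background = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank()) +
  # list the most frequent country first in the legend
  guides(fill = guide_legend(nrow = 2, reverse = TRUE)) +
  scale_fill_brewer(palette = "Accent", name = "") +
  # no padding below zero so bars sit on the axis; 5% headroom above
  # (expansion() replaces expand_scale(), which was deprecated in ggplot2 3.3.0)
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  scale_x_continuous(breaks = seq(1945, 2000, by = 5))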
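The fill mapping relies on factor level order. fct_infreq() orders levels from most to least frequent, and fct_rev() reverses that. Since ggplot2 stacks bars with the first level on top, reversing puts the most common country at the bottom of each bar, and guide_legend(reverse = TRUE) flips the legend back so it lists the most common country first. A tiny sketch of the level order, using made-up labels:

```r
library(forcats)

# Made-up labels: USA x3, USSR x2, UK x1
x <- c("UK", "USA", "USSR", "USA", "USSR", "USA")

levels(fct_infreq(x))           # "USA" "USSR" "UK"  (most frequent first)
levels(fct_rev(fct_infreq(x)))  # "UK" "USSR" "USA"  (most frequent last)
```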

Reflections

I used a lot of new functions this week (str_wrap(), geom_segment(), geom_label(), and several others)! This was also my first time using purrr instead of the apply family of functions. I love how easy it is to use, and how intuitive it feels—I will definitely start to use it more in my work. It was fun to put so much effort into one plot, and I enjoyed exploring the wide range of options that ggplot offers.

Thanks for reading! I’ll see you all next week.

Jack M. Wolf
Biostatistician and Educator

I’m a biostatistics PhD student at the University of Minnesota interested in causal inference, clinical trial design, and statistics and data science education.