Tidy Tuesday: Roman Emperors

Hello, all! One of my hopes for this blog is to participate in the R for Data Science community’s weekly Tidy Tuesday projects. Without further ado, let’s dig into this week’s data!

This week’s data comes from Wikipedia’s “List of Roman Emperors” and was compiled and shared by Georgios Karamanis.

Let’s load in some packages and read in the data.

library(readr)
library(dplyr)
library(forcats)
library(ggplot2)

emperors <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-08-13/emperors.csv")

The data presents a lot to work with. A Reddit post on /r/dataisbeautiful presented a beautiful visualization of the time of each emperor’s reign. Let’s attempt another visualization to show this timeline and look into how each reign ended.

Data Wrangling

One difficulty is Emperor Augustus, who rose to power in 27 BCE. The documentation for this data notes how BCE years were handled. Because I couldn’t find a great way of dealing with BCE dates in R, I decided to convert the Date variables reign_start and reign_end to numeric variables giving the approximate number of years since 0000-01-01 (the start of year 0 as R’s Date class sees it), and then to flip the sign of Augustus’s start date by hand. We’ll also create the variable reign_length, the length of each reign in years, for later.

emperors <- 
  mutate(emperors,
         ## Dates are stored as day counts; convert to years since 0000-01-01
         reign_startDbl = (as.integer(reign_start) - as.integer(as.Date("0000-01-01"))) / 365.25,
         reign_endDbl   = (as.integer(reign_end)   - as.integer(as.Date("0000-01-01"))) / 365.25,
         ## Fix Augustus by hand: his reign began BCE, so flip the sign
         ## (approximate, since negating also mirrors the within-year fraction)
         reign_startDbl = ifelse(name == "Augustus", 
                                 -1 * reign_startDbl + 1,  
                                 reign_startDbl),
         ## Length of reign in years
         reign_length = reign_endDbl - reign_startDbl
  )
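
To make the date arithmetic above easier to check, here is a minimal sketch of the same conversion as a standalone function. The name date_to_years is my own placeholder, not something from the data or any package, and 365.25 is only an approximation of the average year length.

```r
# Hypothetical helper (not part of the original analysis): convert a Date to
# fractional years elapsed since 0000-01-01, mirroring the mutate() call above.
date_to_years <- function(date, origin = as.Date("0000-01-01")) {
  # Dates are stored as day counts, so the difference is a number of days
  days_elapsed <- as.numeric(date) - as.numeric(origin)
  # 365.25 approximates the average length of a year, leap days included
  days_elapsed / 365.25
}

# Augustus died on 19 August 14 AD, so this should land a bit past 14.5
date_to_years(as.Date("0014-08-19"))
```

Because the result is a plain number, it can go straight onto a continuous axis without any special handling for years before 1 AD.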

Data Visualization

ggplot(emperors, aes(x = fct_rev(fct_reorder(name, index)),
                     ymin = reign_startDbl, 
                     ymax = reign_endDbl,
                     color = fct_rev(fct_reorder(cause, reign_length))
                     )) +
  geom_linerange(linewidth = 2) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  coord_flip() +
  theme_minimal() +
  labs(y = "Year (AD)", x = "", 
       title = "A Timeline of Roman Emperors",
       caption = "Visualization: jackmwolf.rbind.io\nData: @geokaramanis and Wikipedia") +
  scale_color_discrete(name = "Cause of Death")

Let’s explore the relationship between how long an emperor ruled and how they died in greater detail.

ggplot(emperors, aes(x = fct_reorder(cause, reign_length), 
                     y = reign_length, 
                     fill = fct_rev(fct_reorder(cause, reign_length))
                     )) +
  geom_boxplot(alpha = 0.5, outlier.alpha = 0) +
  geom_jitter(shape = 21, color = "black", size = 3, alpha = 0.5, width = 0.3) +
  coord_flip() +
  labs(y = "Length of Reign (Years)",
       x = "Cause of Death",
       title = "Length of Roman Emperors' Reigns",
       caption = "Visualization: jackmwolf.rbind.io\nData: @geokaramanis and Wikipedia") +
  theme_minimal() +
  theme(legend.position = "none")
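
For a numeric companion to the boxplot, one option is to tabulate the median reign length for each cause of death. This is only a sketch: summarise_reigns is a hypothetical helper name, and it assumes the data frame has the cause column along with the reign_length column created during wrangling.

```r
library(dplyr)

# Hypothetical helper (not from the original analysis): summarise reign
# length by cause of death. Assumes `data` has columns `cause` and
# `reign_length`.
summarise_reigns <- function(data) {
  data %>%
    group_by(cause) %>%
    summarise(
      n_emperors    = n(),                                # emperors per cause
      median_length = median(reign_length, na.rm = TRUE), # median reign in years
      .groups = "drop"
    ) %>%
    arrange(desc(median_length))                          # longest reigns first
}

# Usage, once `emperors` has the reign_length column:
# summarise_reigns(emperors)
```

Medians are used here rather than means because a handful of very long reigns would otherwise dominate the small groups.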

Reflection

This dataset challenged me to (re)learn about variables of class Date and to find some workarounds to use dates both BCE and AD.

Thanks for reading! I’ll hopefully be back next week for another Tidy Tuesday post.

Jack M. Wolf
Biostatistician and Educator

I’m a biostatistics PhD student at the University of Minnesota interested in causal inference, clinical trial design, and statistics and data science education.