Here is an interesting correlation: the number of people who drowned by falling into a pool each year and the number of films Nicolas Cage appeared in.
library(spuriouscorrelations)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
[1] Suicides by hanging, strangulation and suffocation
[2] Number of people who drowned by falling into a pool
[3] Number of people who died by becoming tangled in their bedsheets
[4] Murders by steam, hot vapours and hot objects
[5] Computer science doctorates awarded in the US
[6] Sociology doctorates awarded in the US
[7] Civil engineering doctorates awarded in the US
[8] People who drowned after falling out of a fishing boat
[9] Drivers killed in collision with railway train
[10] Total US crude oil imports
[11] Number of people who drowned while in a swimming-pool
[12] Suicides by crashing of motor vehicle
[13] Number of people killed by venomous spiders
[14] Mathematics doctorates awarded
14 Levels: Civil engineering doctorates awarded in the US ...
drownings <- spurious_correlations %>%
  filter(var1 == "Number of people who drowned by falling into a pool") %>%
  select(year, var1, var2, var1_value, var2_value)

cor(drownings$var1_value, drownings$var2_value)
[1] 0.6660043
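By default, `cor()` returns Pearson's correlation coefficient. As a rough sketch of what that number measures, the same calculation can be written out by hand. The vectors `x` and `y` and the helper `pearson_r()` below are made up for illustration and are not part of the package or its data.

```r
# Toy vectors for illustration only; not taken from spurious_correlations
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Pearson's r: sum of cross-deviations from the means, divided by the
# square root of the product of the summed squared deviations
pearson_r <- function(a, b) {
  da <- a - mean(a)
  db <- b - mean(b)
  sum(da * db) / sqrt(sum(da^2) * sum(db^2))
}

r_manual <- pearson_r(x, y)
r_builtin <- cor(x, y)

# The hand-written version agrees with cor() up to floating-point error
all.equal(r_manual, r_builtin)
```

Nothing in this formula knows whether the two series are causally related; it only measures how closely they move together.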
Now let’s plot the data.
library(ggplot2)

# compute a scale factor so that max(var2_value * ratio) equals max(var1_value)
max1 <- max(drownings$var1_value)
max2 <- max(drownings$var2_value)
ratio <- max1 / max2

ggplot(drownings, aes(x = year)) +
  geom_line(aes(y = var1_value, color = "Drownings")) +
  geom_line(aes(y = var2_value * ratio, color = "Films")) +
  scale_y_continuous(
    name = "Number of drownings",
    # the secondary axis undoes the scaling, so its labels are film counts
    sec.axis = sec_axis(~ . / ratio, name = "Number of films"),
    limits = c(0, NA)
  ) +
  scale_color_manual(
    name = "",
    values = c("Drownings" = "blue", "Films" = "red")
  ) +
  theme_minimal() +
  labs(
    title = "Number of people who drowned by falling into a pool vs.\nNumber of films Nicolas Cage appeared in",
    caption = "Source: Spurious Correlations (Vigen 2015)"
  )
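The rescaling used for the second axis can be checked in isolation. The sketch below uses hypothetical numbers (`drownings_max` and `films` are invented, not values from the data set): multiplying by `ratio` places the second series on the primary axis's scale, and dividing by `ratio`, as the `sec_axis()` transform does, maps it back to the original units.

```r
# Hypothetical values for illustration only
drownings_max <- 120
films <- c(1, 2, 4, 3)

ratio <- drownings_max / max(films)

scaled <- films * ratio      # what is drawn against the primary axis
recovered <- scaled / ratio  # what the secondary axis labels represent

# The scaled series peaks at the primary-axis maximum
max(scaled) == drownings_max

# The inverse transform recovers the original film counts
all.equal(recovered, films)
```

Because the two axes are tied together by an arbitrary factor, how closely the lines appear to track each other depends partly on that choice of scale.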
Interested? You can install the package from GitHub