One Weird Trick to Obtain the Maximum (or Minimum)

R
Efficient use of vectorized functions
Author

Mauricio “Pachá” Vargas S.

Published

June 10, 2022

Motivation

I was tagged on Twitter about the next question (translated):

How do I choose a maximum date per row? I have a data frame with 3 columns with dates, and I want to create a 4th column that has the maximum of those 3 for each row.

Dataset

First, we create a data frame with dates, let’s assume this contains the last three days where people used the bus instead of the subway.

library(dplyr)
library(stringr)

set.seed(1234)

d <- tibble(
  name = factor(c("George", "John", "Paul", "Ringo")),
  day1 = paste0("2022-05-", str_pad(sample(1:10, 4), 2, "left", "0")),
  day2 = paste0("2022-05-", sample(11:20, 4)),
  day3 = paste0("2022-05-", sample(21:30, 4))
) %>% 
  mutate_if(is.character, as.Date)

d
# A tibble: 4 × 4
  name   day1       day2       day3      
  <fct>  <date>     <date>     <date>    
1 George 2022-05-10 2022-05-19 2022-05-22
2 John   2022-05-06 2022-05-15 2022-05-27
3 Paul   2022-05-05 2022-05-16 2022-05-26
4 Ringo  2022-05-04 2022-05-14 2022-05-30

Here the column for “day 3” is the maximum date per row, but the example still applies for the general case.

Without reshaping

Here we use pmax() (or pmin() for the minimum).

d %>% 
  mutate(
    max_day = pmax(day1, day2, day3)
  )
# A tibble: 4 × 5
  name   day1       day2       day3       max_day   
  <fct>  <date>     <date>     <date>     <date>    
1 George 2022-05-10 2022-05-19 2022-05-22 2022-05-22
2 John   2022-05-06 2022-05-15 2022-05-27 2022-05-27
3 Paul   2022-05-05 2022-05-16 2022-05-26 2022-05-26
4 Ringo  2022-05-04 2022-05-14 2022-05-30 2022-05-30

Reshaping

What if we had more columns, for example, if we had 10 or more columns with dates. While it’s not the case for this example, it’s useful to know what to do in those situations. Here I print the result and then compute the operation for clarity, then use max() (or min() for the minimum).

library(tidyr)

d2 <- d %>% 
  gather("day", "value", -name)

d2
# A tibble: 12 × 3
   name   day   value     
   <fct>  <chr> <date>    
 1 George day1  2022-05-10
 2 John   day1  2022-05-06
 3 Paul   day1  2022-05-05
 4 Ringo  day1  2022-05-04
 5 George day2  2022-05-19
 6 John   day2  2022-05-15
 7 Paul   day2  2022-05-16
 8 Ringo  day2  2022-05-14
 9 George day3  2022-05-22
10 John   day3  2022-05-27
11 Paul   day3  2022-05-26
12 Ringo  day3  2022-05-30
d2 %>% 
  group_by(name) %>% 
  summarise(max_day = max(value))
# A tibble: 4 × 2
  name   max_day   
  <fct>  <date>    
1 George 2022-05-22
2 John   2022-05-27
3 Paul   2022-05-26
4 Ringo  2022-05-30