One Weird Trick to Obtain the Maximum (or Minimum)
R
Efficient use of vectorized functions
Author
Mauricio “Pachá” Vargas S.
Published
June 10, 2022
Motivation
I was tagged on Twitter about the next question (translated):
How do I choose a maximum date per row? I have a data frame with 3 columns with dates, and I want to create a 4th column that has the maximum of those 3 for each row.
Dataset
First, we create a data frame with dates, let’s assume this contains the last three days where people used the bus instead of the subway.
# A tibble: 4 × 4
name day1 day2 day3
<fct> <date> <date> <date>
1 George 2022-05-10 2022-05-19 2022-05-22
2 John 2022-05-06 2022-05-15 2022-05-27
3 Paul 2022-05-05 2022-05-16 2022-05-26
4 Ringo 2022-05-04 2022-05-14 2022-05-30
Here the column for “day 3” is the maximum date per row, but the example still applies for the general case.
Without reshaping
Here we use pmax() (or pmin() for the minimum).
d %>%mutate(max_day =pmax(day1, day2, day3) )
# A tibble: 4 × 5
name day1 day2 day3 max_day
<fct> <date> <date> <date> <date>
1 George 2022-05-10 2022-05-19 2022-05-22 2022-05-22
2 John 2022-05-06 2022-05-15 2022-05-27 2022-05-27
3 Paul 2022-05-05 2022-05-16 2022-05-26 2022-05-26
4 Ringo 2022-05-04 2022-05-14 2022-05-30 2022-05-30
Reshaping
What if we had more columns, for example, if we had 10 or more columns with dates. While it’s not the case for this example, it’s useful to know what to do in those situations. Here I print the result and then compute the operation for clarity, then use max() (or min() for the minimum).
library(tidyr)d2 <- d %>%gather("day", "value", -name)d2
# A tibble: 12 × 3
name day value
<fct> <chr> <date>
1 George day1 2022-05-10
2 John day1 2022-05-06
3 Paul day1 2022-05-05
4 Ringo day1 2022-05-04
5 George day2 2022-05-19
6 John day2 2022-05-15
7 Paul day2 2022-05-16
8 Ringo day2 2022-05-14
9 George day3 2022-05-22
10 John day3 2022-05-27
11 Paul day3 2022-05-26
12 Ringo day3 2022-05-30