Using googlesheets and mailR packages in R to automate reporting

R
Google Sheets
Using Excel for reporting has its drawbacks, here I share a way to solve some of those drawbacks.
Author

Mauricio “Pachá” Vargas S.

Published

December 18, 2016

Updated 2017-08-02

Most companies make extensive use of spreadsheets to store, work and share data internally.

For years, Excel has been the main software that allowed people to share internal data. But using Excel for reporting has its drawbacks:

  1. First manually updating data into spreadsheets is time-consuming and takes a large amount of analysts’ time. Before automating much of the reporting I am responsible for, I was often spending well over 30% of my time pulling data from our database, inputting the updated dataset in a spreadsheet and wait until all computations were done.
  2. Manual reporting can lead to errors. Shifting the inputted data by one cell will yield wrong results and will trigger angry emails from decision-makers!
  3. Excel files are shared via emails as attached files. It is not uncommon for multiple versions of the same files to be shared in the organization. This can lead to confusions, miscommunication and loss in efficiency.

With its Google docs suite, Google has solved point 3) by giving anyone the ability to create online collaborative spreadsheets for teams to work on simultaneously, for colleagues to give comments and communicate with one another in one place without having to email different mark-ups back and forth.

This post is about providing solutions to points 1) and 2) to help you reduce the amount of time you spend on reporting by showing a general procedure with OfficeSupplies dataset. This tutorial is based on what Pauline Glikman posted on Medium.com. Her tutorial is very good but I changed some parts that in my opinion could be improved.

R allows for easy automation of the reporting process. This means that the analyst’s task of pulling data, making some computations, uploading it in a nice spreadsheets and emailing it to relevant business stakeholders can be completely automated in a single script. And because with the same script and the same dataset, you will confidently obtain the same reproducible results you will not have to worry about making errors anymore. So whether you are currently running a daily reports showing sales or service level, this post can help you improve your productivity.

Installing the needed R packages

Before we get started, and assuming you have already installed R and Rstudio on your computer, let’s install everything we need:

  • googlesheets package
  • rJava package
  • mailR package
  • dplyr package
# Install/load the libraries to use
install.packages(c("googlesheets","rJava", "mailR", "dplyr"))
library(googlesheets)
library(mailR)
library(dplyr)

Building your report with googlesheets package

For the purpose of this post, we will use this spreadsheet. Make a copy of it to your drive, we’ll use the url to your spreadsheet copy.

To work with and authorize googlesheets to view and manage your files, you need to type:

# Google's session authentication 
gs_auth()

You will be directed to a web browser, asked to sign in to your Google account, and to grant googlesheets permission to operate on your behalf with Google Sheets and Google Drive.

For the purpose of this post we will be using the OfficeSupplies dataset (in the spreadsheet). Now to define the spreadsheet and sheet to work with you need to type:

# List your worksheets
gs_ls()

# Read spreadsheet
office_supplies <- gs_read(ss = gs_title("OfficeSupplies"), ws = "OfficeSupplies", range = "A3:F46")

As an example suppose I want to report the sold units by region, so I’ll need this table:

sold_units_by_region <- office_supplies %>% 
  select(Region,Units) %>%
  group_by(Region) %>% 
  summarise(Sold_Units = sum(Units))

sold_units_by_region

Done! You can now run your script every single day!

mailR: Updating your stakeholder with an automated email

Once your report is updated, you might want to send an email to the people looking at your dashboard. If you are running this script daily, opening your gmail and drafting an email seems like a waste of time.

Don’t worry, mailR has your back!

First follow this Google’s instructions to add a specific app password. This will generate a random password that you can just use for mailR, allowing you to save it in a script without revealing your actual credentials. You can also revoke this password at any time from your Google accounts settings page. Be careful, however – if this password is compromised, it can be used to fully access your account from anywhere! Consequently, make sure you don’t share a script with your password in it with anyone else.

Add this line to your .Rprofile for not exposing your app password:

options(GoogleDocsPassword = c(login = '16 characters password'))

And then add these lines at the end of your script and observe the magic:

# Write the content of your email
msg <- paste("Hi",
             "<br>",
             "This is my last report with date", as.character(date()),
             "<br>",
             "Please see the sales by region",
             "<br>",
             htmlTable::htmlTable(sold_units_by_region),
             "<br>",
             "Regards",
             "<br>",
             "John Doe")

# Define who the sender is
sender <- "johndoe@gmail.com"

# Define who should get your email
recipients <- c("johndoe@gmail.com")

# Send your email with the send.mail function
send.mail(from = sender,
          to = recipients,
          subject = "My report",
          body = msg,
          smtp = list(host.name = "smtp.gmail.com", port = 587,
                      user.name = "johndoe@gmail.com",
                      passwd = getOption('GoogleDocsPassword')[[1]], ssl = TRUE),
          authenticate = TRUE,
          send = TRUE,
          html = TRUE)

Appendix: One more thing on googlesheets

There is one major limitation with the googlesheets package is that it relies on a Google API which is apparently slow. As a result, uploading large datasets to your spreadsheet will not work in one chunk. One fix I have found and been using is to batch my data and load it in the spreadsheet in chunks. You can use this code to chunk your data:

# Chunk your data in batches of N rows to ease the 
N <- 5 # you can change this according to what you need
data_in_chunks <- list()
batches <- round(nrow(data)/N)
for(i in 1:batches){
  data_in_chunks[[i]] <-  data[(((i-1)*N)+1):(i*N),]
}

# Then upload your batches with a for loop
for(i in 1:(length(data_in_chunks)-2)){
  row <- paste("A",as.character(2+(i*N)), sep = '')
  top_10_cars <- top_10_cars %>% gs_edit_cells(ws = tab, input = data_in_chunks[[i+1]], anchor = row, col_names = FALSE, verbose = TRUE)
}