A step by step guide to write an R package that uses C++ code (Ubuntu)

Using cpp11 (R package) on Ubuntu.
Author

Mauricio “Pachá” Vargas S.

Published

May 21, 2023

Updated 2023-05-23: In a later post about a more advanced C++ option called vendoring, I adopted the convention that functions with names ending in an underscore are “development” functions and those without it are “end-user” functions. I modified this post and the repository to follow the same convention. Also, including “cpp11.hpp” makes separate includes of “doubles.hpp” and “matrix.hpp” unnecessary, so I simplified the example here to include only “cpp11.hpp”. In the vendoring post I narrowed the includes to the specific headers for doubles (i.e., numeric vectors) and matrices, leaving out lists, strings and others.

Updated (2nd time) 2023-05-23: I added a note about removing the “-g” flags in the “Makevars” file once everything works correctly. As Mark Padgham correctly mentioned, it is just for debugging purposes and makes compilation slower.

Updated (3rd time) 2023-09-30: I simplified all the C++ setup because at the University of Toronto we use g++ instead of clang.

Motivation

A large part of my research requires estimating computationally intensive models, such as the General Equilibrium Poisson Pseudo Maximum Likelihood (GEPPML) estimator derived from the equilibrium conditions introduced by Anderson and Van Wincoop (2004) for estimation and inference.

The GEPPML estimator requires solving a system of non-linear equations, and for this task we might be better off using a compiled language such as C++. The good news is that we can use C++ code within R and Python, and this blog post is about using C++ functions from R.

I do not claim to be an expert on C++, nor do I want to debate whether R is better than Python. I use both from Visual Studio Code. What I do want is to share my experience using C++ code within R.

Honest disclaimer

This blog post is a summary, written for my future self, of what worked after hours of failed attempts. I hope it helps you too.

I am a Statistician and Political Scientist, not a Computer Scientist!

Setup

Ubuntu and its derived distributions (I use Linux Mint) ship GCC as the default compiler. I will invoke it as g++ for C++ code, which is consistent with what is used at the University of Toronto.

According to Ubuntu documentation: “When you compile C++ programs, you should invoke GCC as g++ instead.”

I installed the R packages cpp11 and usethis:

install.packages(c("cpp11", "usethis"))

I created a file ~/.Rprofile containing the following lines:

library(devtools)
library(usethis)
library(cpp11)

With this in place there is no need to type devtools::install(). After restarting RStudio (or VSCode), you can call install() directly, and the same applies to the usethis::use_*() and cpp11::cpp_*() functions.

Up to this point I still had the following error messages when compiling C++ code:

fatal error: 'cstdio' file not found
fatal error: 'vector' file not found
cannot find -lc++abi: No such file or directory

I had to install additional packages. This took me a few hours searching on the Internet until I figured it out. Install the following packages:

sudo apt install g++-11 libc++-11-dev libc++abi-11-dev

To be sure that the install() function in R uses g++, I created the ~/.R/Makevars file. The contents of the file are the following:

CC = gcc
CXX = g++
CXX98 = g++
CXX11 = g++
CXX14 = g++
CXX17 = g++
CXX20 = g++
CXXCPP = g++
OBJC = gcc
OBJCXX = g++
SHLIB_CXXLD = g++

# USE -O0 for debugging
# USE -O3 for production code
CXXFLAGS=-Wall -O3 -pedantic
CXX11FLAGS=-Wall -O3 -pedantic
CXX14FLAGS=-Wall -O3 -pedantic
CXX17FLAGS=-Wall -O3 -pedantic
CXX20FLAGS=-Wall -O3 -pedantic

A more flexible approach is to edit src/Makevars within the package folder, and add the following lines:

PKG_CXXFLAGS = -Wall -O0 -pedantic

In PKG_CXXFLAGS I use -O0 to disable optimization, which is useful for debugging. After the code is working, I can change it to -O3 to optimize the compiled code.

If later on I need to fall back to R’s default compiler settings, I can open ~/.R/Makevars, comment out all the lines, restart RStudio or VSCode, and run install() again.

If you close RStudio (or VSCode) and open it again, you can check that the changes were implemented by running this code:

cpp11::cpp_source(
  code = "
    #include <cpp11.hpp>

    using namespace cpp11;

    [[cpp11::register]] int plusone(int x)
    {
        return x + 1;
    }",
  quiet = FALSE
)
using C++ compiler: ‘g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0’
using C++11
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I'/home/pacha/R/x86_64-pc-linux-gnu-library/4.3/cpp11/include'      -fpic  -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2  -c /tmp/RtmppcZp5n/file2c332188727d/src/code_2c3341d2bc49.cpp -o /tmp/RtmppcZp5n/file2c332188727d/src/code_2c3341d2bc49.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I'/home/pacha/R/x86_64-pc-linux-gnu-library/4.3/cpp11/include'      -fpic  -g -O2 -ffile-prefix-map=/build/r-base-H0vbME/r-base-4.3.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2  -c /tmp/RtmppcZp5n/file2c332188727d/src/cpp11.cpp -o /tmp/RtmppcZp5n/file2c332188727d/src/cpp11.o
g++ -std=gnu++11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -o /tmp/RtmppcZp5n/file2c332188727d/src/code_2c3341d2bc49.so /tmp/RtmppcZp5n/file2c332188727d/src/code_2c3341d2bc49.o /tmp/RtmppcZp5n/file2c332188727d/src/cpp11.o -L/usr/lib/R/lib -lR

The output should start with:

using C++ compiler: ‘g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0’

Instead of:

using C compiler: ‘gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0’
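The same check can be approached from inside the compiled code. The following is a minimal standalone sketch, not part of the original package: the function names compiler_name(), cpp_standard() and is_optimized() are hypothetical, and the code relies only on predefined macros (__clang__, __GNUC__, __cplusplus and __OPTIMIZE__, the last one being specific to GCC and Clang).

```cpp
#include <string>

// Name of the compiler that built this file, based on predefined macros.
// Clang also defines __GNUC__, so __clang__ is checked first.
std::string compiler_name() {
#if defined(__clang__)
    return "clang++";
#elif defined(__GNUC__)
    return "g++";
#else
    return "unknown";
#endif
}

// C++ standard in effect, as reported by the __cplusplus macro
// (for example 201103L for C++11 and 201703L for C++17).
long cpp_standard() {
    return __cplusplus;
}

// True when the file was compiled with optimization enabled.
// __OPTIMIZE__ is a GCC/Clang macro; it is not defined under -O0.
bool is_optimized() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}
```

Assuming the setup described above, calling these functions from a small test program (or wrapping them in registered cpp11 functions) should report g++, and is_optimized() should flip between false and true when switching from -O0 to -O3.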

Creating a dummy package

From RStudio (or VSCode) we can create a new package by running create_package("~/cpp11dummypackage"). This creates a new folder named cpp11dummypackage. Then I ran use_cpp11() to add the files required to use C++ code within R.

Next I ran use_r("cpp11dummypackage-package") to create a new R script named cpp11dummypackage-package.R within the R folder, and added the following code to it:

#' @useDynLib cpp11dummypackage, .registration = TRUE
NULL

The usethis skeleton also created the file src/code.cpp for us. I added a simple function to transpose a matrix by replacing the file contents with the following lines:

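#include <cpp11.hpp>

using namespace cpp11;

// Transpose a numeric matrix: R(i, j) = X(j, i)
[[cpp11::register]] doubles_matrix<> transpose_(doubles_matrix<> X)
{
    int NX = X.nrow(); // number of rows of X
    int MX = X.ncol(); // number of columns of X

    // The result has the dimensions of X swapped
    writable::doubles_matrix<> R(MX, NX);

    for (int i = 0; i < MX; i++)
    {
        for (int j = 0; j < NX; j++)
        {
            R(i, j) = X(j, i);
        }
    }

    return R;
}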
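To see what the double loop does to the underlying data, here is a standalone sketch that does not use cpp11 at all. It assumes the matrix is stored as a flat vector in column-major order (the layout R uses for matrices), and the function name transpose_colmajor() is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Transpose an nrow x ncol matrix stored in column-major order,
// i.e. element (i, j) lives at position i + j * nrow of the flat vector.
std::vector<double> transpose_colmajor(const std::vector<double>& x,
                                       std::size_t nrow, std::size_t ncol) {
    std::vector<double> r(x.size());
    for (std::size_t i = 0; i < ncol; i++) {
        for (std::size_t j = 0; j < nrow; j++) {
            // r is ncol x nrow, so its element (i, j) sits at i + j * ncol;
            // it receives element (j, i) of x, which sits at j + i * nrow
            r[i + j * ncol] = x[j + i * nrow];
        }
    }
    return r;
}
```

For example, the 2 x 3 matrix with columns (1, 2), (3, 4) and (5, 6) is stored as {1, 2, 3, 4, 5, 6}, and its transpose comes back as {1, 3, 5, 2, 4, 6}, that is, the 3 x 2 matrix with columns (1, 3, 5) and (2, 4, 6).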

In order to export the function, I added the following lines to cpp11dummypackage-package.R:

#' Transpose a matrix
#' @export
#' @param X numeric matrix
#' @return numeric matrix
#' @examples
#' set.seed(1234)
#' X <- matrix(rnorm(4), nrow = 2, ncol = 2)
#' X
#' transpose(X)
transpose <- function(X) {
  transpose_(X)
}

I tested the functions after running cpp_register() and load_all():

> set.seed(1234)

> X <- matrix(rnorm(4), nrow = 2, ncol = 2)

> X
           [,1]      [,2]
[1,] -1.2070657  1.084441
[2,]  0.2774292 -2.345698

> transpose(X)
          [,1]       [,2]
[1,] -1.207066  0.2774292
[2,]  1.084441 -2.3456977

If I had passed 1:4 instead of rnorm(4) to matrix(), I would have obtained the following error message:

> transpose(X)
Error: Invalid input type, expected 'double' actual 'integer'

This is because I declared the function to accept a doubles_matrix<> as input, and not an integers_matrix<>.

To install the newly created package, I ran the following lines in the R console:

clean_dll()
cpp_register()
document()
install()

Debugging the package

In order to access debugging symbols, I created a new Makevars file within the src folder, and added the following lines:

CXX_STD = CXX11
PKG_CPPFLAGS = -UDEBUG -g

Then I reinstalled the package compiled with debugging symbols, and in bash I ran R -d lldb-11. From there I could follow this excellent guide to debug R and C++ code.

I generally shouldn’t leave the -g flag in a Makevars file: it inserts debugging symbols into the compiled binary, which increases compilation times (often by a large margin) and produces larger binaries. Once the package compiles and I am sure that it works properly, I remove the PKG_CPPFLAGS = -UDEBUG -g line.

A more complex example

I created a package containing a set of simple functions to obtain the Ordinary Least Squares (OLS) estimator by calling a C++ function that calls other C++ functions. My approach was to create one function per step: one function to obtain \(X^tX\), another for \((X^tX)^{-1}\) that implements the Gauss-Jordan method to invert a matrix, and another for \(X^tY\). A final function calls each of those to obtain \(\hat{\beta} = (X^tX)^{-1}(X^tY)\).

This implementation is extremely naive, but it is enough to show how to use C++ code within R. You can find it on my GitHub profile.
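The one-function-per-step idea can be sketched in plain C++ without cpp11. This is an illustration under my own assumptions and not the code from the repository: the names Matrix, crossprod(), invert(), multiply() and ols() are hypothetical, matrices are stored as vectors of rows for readability, and the singularity tolerance of 1e-12 is an arbitrary choice.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // vector of rows

// A^t B for A (n x p) and B (n x q)
Matrix crossprod(const Matrix& A, const Matrix& B) {
    std::size_t n = A.size(), p = A[0].size(), q = B[0].size();
    Matrix C(p, std::vector<double>(q, 0.0));
    for (std::size_t k = 0; k < n; k++)
        for (std::size_t i = 0; i < p; i++)
            for (std::size_t j = 0; j < q; j++)
                C[i][j] += A[k][i] * B[k][j];
    return C;
}

// Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting
Matrix invert(Matrix A) {
    std::size_t p = A.size();
    Matrix I(p, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < p; i++) I[i][i] = 1.0;

    for (std::size_t c = 0; c < p; c++) {
        // Pick the row with the largest absolute value in column c
        std::size_t piv = c;
        for (std::size_t r = c + 1; r < p; r++)
            if (std::fabs(A[r][c]) > std::fabs(A[piv][c])) piv = r;
        if (std::fabs(A[piv][c]) < 1e-12)  // assumed tolerance
            throw std::runtime_error("matrix is numerically singular");
        std::swap(A[c], A[piv]);
        std::swap(I[c], I[piv]);

        // Scale the pivot row so that the pivot becomes 1
        double d = A[c][c];
        for (std::size_t j = 0; j < p; j++) {
            A[c][j] /= d;
            I[c][j] /= d;
        }

        // Eliminate column c from every other row
        for (std::size_t r = 0; r < p; r++) {
            if (r == c) continue;
            double f = A[r][c];
            for (std::size_t j = 0; j < p; j++) {
                A[r][j] -= f * A[c][j];
                I[r][j] -= f * I[c][j];
            }
        }
    }
    return I;  // the identity has been transformed into the inverse
}

// Ordinary matrix product A B
Matrix multiply(const Matrix& A, const Matrix& B) {
    std::size_t n = A.size(), m = B.size(), q = B[0].size();
    Matrix C(n, std::vector<double>(q, 0.0));
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t k = 0; k < m; k++)
            for (std::size_t j = 0; j < q; j++)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// beta = (X^t X)^{-1} (X^t Y), with Y passed as an n x 1 matrix
Matrix ols(const Matrix& X, const Matrix& Y) {
    Matrix XtX = crossprod(X, X);
    Matrix XtY = crossprod(X, Y);
    return multiply(invert(XtX), XtY);
}
```

As a sanity check, with an intercept column and x = 0, 1, 2, 3 and y = 1, 3, 5, 7 (so that y = 1 + 2x exactly), ols() should return coefficients close to 1 and 2.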

A good challenge would be to implement the QR decomposition used by the lm() function in R and use it to obtain the OLS estimator in C++. This would require some effort, but here you can find a good starting point.
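As a possible first step towards that challenge, here is a standalone sketch of least squares through a QR factorization. To keep it short it uses modified Gram-Schmidt, which is a simpler technique that I am swapping in for illustration and not the routine behind lm(). The names Matrix and ols_qr() are hypothetical, and the rank tolerance of 1e-12 is an arbitrary assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // vector of rows

// Least squares via QR: factor X = QR with modified Gram-Schmidt,
// then solve R beta = Q^t y by back substitution.
std::vector<double> ols_qr(const Matrix& X, const std::vector<double>& y) {
    std::size_t n = X.size(), p = X[0].size();
    Matrix Q = X;  // columns are orthonormalized in place
    Matrix R(p, std::vector<double>(p, 0.0));

    for (std::size_t j = 0; j < p; j++) {
        // Normalize column j
        double norm = 0.0;
        for (std::size_t i = 0; i < n; i++) norm += Q[i][j] * Q[i][j];
        norm = std::sqrt(norm);
        if (norm < 1e-12)  // assumed tolerance
            throw std::runtime_error("X is numerically rank deficient");
        R[j][j] = norm;
        for (std::size_t i = 0; i < n; i++) Q[i][j] /= norm;

        // Remove the component along column j from the remaining columns
        for (std::size_t k = j + 1; k < p; k++) {
            double dot = 0.0;
            for (std::size_t i = 0; i < n; i++) dot += Q[i][j] * Q[i][k];
            R[j][k] = dot;
            for (std::size_t i = 0; i < n; i++) Q[i][k] -= dot * Q[i][j];
        }
    }

    // qty = Q^t y
    std::vector<double> qty(p, 0.0);
    for (std::size_t j = 0; j < p; j++)
        for (std::size_t i = 0; i < n; i++) qty[j] += Q[i][j] * y[i];

    // Back substitution on the upper triangular system R beta = qty
    std::vector<double> beta(p, 0.0);
    for (std::size_t j = p; j-- > 0;) {
        double s = qty[j];
        for (std::size_t k = j + 1; k < p; k++) s -= R[j][k] * beta[k];
        beta[j] = s / R[j][j];
    }
    return beta;
}
```

The appeal of this route is that it never forms or inverts \(X^tX\): the coefficients come from a triangular system. With an intercept column and x = 0, 1, 2, 3 and y = 1, 3, 5, 7, the result should again be close to 1 and 2.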

In any case, it would be extremely hard to beat the performance of the lm() function in R, which has some internals written in C, and its numerical robustness is another feature that is hard to match.

References