Introducing cpp11armadillo: R and Armadillo integration using the header-only cpp11 R package
The goal of cpp11armadillo is to provide a novel approach to using the Armadillo C++ library through the header-only cpp11 R package, and to simplify things for the end user.
The idea is to pass matrices/vectors from R to C++, write pure C++/Armadillo code for the computation, and then export the result back to R with the proper data structures.
This follows the same goals as cpp11:
- Enforcing copy-on-write semantics.
- Improving the safety of using the R API from C++ code.
- Using UTF-8 strings everywhere.
- Applying newer C++11 features.
- Having a more straightforward, simpler implementation.
- Faster compilation time with lower memory requirements.
- Growing vectors more efficiently.
Installation
You can install the development version of cpp11armadillo like so:
remotes::install_github("pachadotdev/cpp11armadillo")
Minimal example
I have provided a package template for RStudio that also works with VS Code.
The idea of this package is to be naive and simple (like me).
From RStudio/VS Code, create a new project and run:
cpp11armadillo::pkg_template()
Then follow the instructions from the README.
Here is a commented example from the package template:
#include <armadillo.hpp>
#include <cpp11.hpp>
#include <cpp11armadillo.hpp>

using namespace arma;
using namespace cpp11;
using namespace std;

[[cpp11::register]] doubles_matrix<> ols_mat(const doubles_matrix<>& y,
                                             const doubles_matrix<>& x) {
  // convert the R matrices to Armadillo matrices
  Mat<double> Y = as_Mat(y);
  Mat<double> X = as_Mat(x);

  // OLS estimator: beta = (X'X)^{-1} X'Y
  Mat<double> XtX = X.t() * X;
  Mat<double> XtX_inv = inv(XtX);
  Mat<double> beta = XtX_inv * X.t() * Y;

  // convert the result back to an R matrix
  return as_doubles_matrix(beta);
}
This code:
- Includes the Armadillo, cpp11 and cpp11armadillo libraries, which allow interfacing C++ with R (i.e., the #include <XYZ.hpp> lines).
- Loads the corresponding namespaces (i.e., the using namespace XYZ lines) in order to simplify the notation (i.e., using Mat instead of arma::Mat).
- Declares a function ols_mat() that takes inputs from R, does the computation on the C++ side, and can be called from R scripts. The use of const and & is specific to C++: it passes the inputs by constant reference, which avoids copying the data and therefore saves time and memory.
- as_Mat() is a C++ template (i.e., a "diplomat" function) that puts R and C++ data types in conversation and facilitates communication between the two. The templates for doubles/integers matrices are provided by cpp11armadillo.
- XtX = X.t() * X calculates the product of the transpose of X and X.
- inv(XtX) calculates the inverse of XtX.
- XtX_inv * X.t() * Y calculates the OLS estimator.
- as_doubles_matrix() is another template that takes beta, expressed as a C++ data structure, and converts it to a data structure that cpp11 and R understand.
Of course, the goal here is to illustrate the use of linear algebra. This is a very simple example, and for this particular case you are better off using the lm() function from R.
For other tasks, you are better off computing on the C++ side, because C++ can handle:
- Loops that cannot be easily vectorised because subsequent iterations depend on previous ones.
- Recursive functions, or problems that involve calling functions thousands or millions of times.
  - The overhead of calling a function in C++ is much lower than in R (and Python).
- Problems that require advanced data structures and algorithms that R does not provide.
  - Through the Standard Template Library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues.