The development of cpp11armadillo emerges from the desire to follow a simplified approach towards R and C++ integration by building on top of cpp11, a ground up rewrite of C++ bindings to R with different design trade-offs and features. cpp11armadillo aims at providing an additional layer to put the end-user focus on the computation instead of configuration (Vaughan, Hester, and François 2023).
Armadillo is a linear algebra library for the C++ language, aiming towards a good balance between speed and ease of use. It is justified in the fact that C++, in its current form, is very valuable to address bottlenecks that we find with interpreted languages such as R and Python but it does not provide data structures nor functions for linear algebra (Sanderson and Curtin 2016).
RcppArmadillo was first published to CRAN in 2010, and it allows to use Armadillo via Rcpp, a widely extended R package to call C++ functions from R (Eddelbuettel and Sanderson 2014).
The design choices in cpp11armadillo are:
These ideas reflect a comprehensive effort to provide an efficient interface for integrating C++ and R that aligns with the Tidy philosophy (Wickham et al. 2019), addressing both the technical and community-driven aspects that influence software evolution.
These choices have advantages and disadvantages. A disadvantage is that cpp11armadillo will not convert data types automatically, the user must be explicit about data types, especially when passing data from R to C++ and then exporting the final computation back to R. An advantage is that cpp11armadillo codes, including its internal templates, can be adapted to work with Python via pybind11 (Jakob, Rhinelander, and Moldovan 2016).
cpp11armadillo uses Hansen (2022) notation, meaning that matrices are column-major and vectors are expressed as column vectors (i.e., \(N\times1\) matrices).
Convention: input R matrices are denoted by x
, y
, z
, and output or intermediate C++ matrices are denoted by X
, Y
, Z
. The example functions can be called from R scripts and should have proper headers as in the following code:
#include <cpp11armadillo.hpp>
#include <cpp11.hpp>
#include <cpp11armadillo.hpp>
using namespace arma;
using namespace cpp11;
cpp11::register]] // allows using the function in R
[[
doubles_matrix<> solve_mat(doubles_matrix<> x) {double> Y = as_Mat(x); // convert from R to C++
Mat<double> Yinv = inv(Y); // Y^(-1)
Mat<return as_doubles_matrix(Yinv); // convert from C++ to R
}
This example includes the Armadillo, cpp11 and cpp11armadillo libraries, and allows interfacing C++ with R (i.e., the #include <cpp11.hpp>
). It also loads the corresponding namespaces (i.e., the using namespace cpp11
) in order to simplify the notation (i.e., using Mat
instead of arma::Mat
).
The as_Mat()
function is provided by cpp11armadillo to pass a matrix
object from R to C++ and that Armadillo can read.
The as_doubles_matrix()
function is also provided by cpp11armadillo to pass a Mat<double>
or Mat<int>
object from C++ to R.
Given a design matrix \(X\) and and outcome vector \(y\), one function to obtain the OLS estimator \(\hat{\beta} = (X^tX)^{-1}(X^tY)\) as a matrix (i.e., column vector) is:
double> ols_(const doubles_matrix<>& y, const doubles_matrix<>& x) {
Mat<double> Y = as_Mat(y); // Col<double> Y = as_Col(y); also works
Mat<double> X = as_Mat(x);
Mat<
double> XtX = X.t() * X; // X'X
Mat<double> XtX_inv = inv(XtX); // (X'X)^(-1)
Mat<double> beta = XtX_inv * X.t() * Y; // (X'X)^(-1)(X'Y)
Mat<
return beta;
}
cpp11::register]] doubles_matrix<> ols_mat(const doubles_matrix<>& y,
[[const doubles_matrix<>& x) {
double> beta = ols_(y, x);
Mat<return as_doubles_matrix(beta);
}
cpp11::register]] doubles ols_dbl(const doubles_matrix<>& y,
[[const doubles_matrix<>& x) {
double> beta = ols_(y, x);
Mat<return as_doubles(beta);
}
The ols_mat()
function receives inputs from R and calls ols_()
to do the computation on C++ side. The use of const
and &
are specific to the C++ language and allow to pass data from R to C++ while avoiding copying the data, therefore saving time and memory.
The ols_dbl()
function does the same but returns a vector instead of a matrix.
A proper benchmark is to compute eigenvalues for large matrices. Both cpp11armadillo and RcppArmadillo use Armadillo as a backend, and the marginal observed differences are because of how cpp11 and Rcpp pass data from R to C++ and viceversa. The computation times are identical.
Input | Median time cpp11armadillo | Median time RcppArmadillo |
---|---|---|
500x500 | 35.07ms | 36.4ms |
1000x1000 | 260.28ms | 263.21ms |
1500x1500 | 874.62ms | 857.31ms |
2000x2000 | 2.21s | 2.21s |
Input | Memory allocation cpp11armadillo | Memory allocation RcppArmadillo |
---|---|---|
500x500 | 17.1KB | 4.62MB |
1000x1000 | 21KB | 4.62MB |
1500x1500 | 24.9KB | 4.63MB |
2000x2000 | 28.8KB | 4.63MB |
The cpp11armadillo computation was obtained with the eigen_sym_mat()
function already shown.
The RcppArmadillo computation was obtained with the following function:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
const arma::mat& x) {
arma::mat eigen_sym_mat(
arma::mat y = eig_sym(x);return y;
}
In order to get the RcppArmadillo function to work, we had to dedicate time to search online about the error function 'enterRNGScope' not provided by package 'Rcpp'
, which required to include // [[Rcpp::depends(RcppArmadillo)]]
for the function to work.
The package repository includes the directory cpp11armadillotest
, which contains an R package that uses Armadillo, and that provides additional examples for eigenvalues, Cholesky and QR decomposition, and linear models.
RcppArmadillo has been and will continue to be widely successful. cpp11armadillo is a alternative templated implementation with different design trade-offs and features. Both packages can co-exist and continue to enrich the R community.
Eddelbuettel, Dirk, and Conrad Sanderson. 2014. “RcppArmadillo: Accelerating R with High-Performance C++ Linear Algebra.” Computational Statistics & Data Analysis 71 (March): 1054–63. https://doi.org/10.1016/j.csda.2013.02.005.
Hansen, Bruce. 2022. Econometrics. Princeton University Press.
Jakob, Wenzel, Jason Rhinelander, and Dean Moldovan. 2016. Pybind11 — Seamless Operability Between C++11 and Python. https://github.com/pybind/pybind11.
Sanderson, Conrad, and Ryan Curtin. 2016. “Armadillo: A Template-Based C++ Library for Linear Algebra.” Journal of Open Source Software 1 (2): 26. https://doi.org/10.21105/joss.00026.
Vaughan, Davis, Jim Hester, and Romain François. 2023. Cpp11: A C++11 Interface for R’s c Interface. https://CRAN.R-project.org/package=cpp11.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.