I need a simple wrapper to serialize arbitrary R objects from within Rcpp code. Below is a simplified version of my code:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
Rcpp::RawVector cpp_serialize(RObject x) {
  Rcpp::Function serialize = Rcpp::Environment::namespace_env("base")["serialize"];
  return serialize(x, R_NilValue);
}
This works great; however, I found that for objects of class call, the call gets evaluated before being serialized. How can I prevent this from happening? I just want to mimic serialize() in R.
# Works as intended
identical(serialize(iris, NULL), cpp_serialize(iris))
# Does not work: call is evaluated
call_object <- call("rnorm", 1000)
identical(serialize(call_object, NULL), cpp_serialize(call_object))
Update: I have a workaround in place (see below) but I am still very interested in a proper solution.
// [[Rcpp::export]]
Rcpp::RawVector cpp_serialize(RObject x) {
  Rcpp::Environment env;
  env["MY_R_OBJECT"] = x;
  Rcpp::ExpressionVector expr("serialize(MY_R_OBJECT, NULL)");
  Rcpp::RawVector buf = Rcpp::Rcpp_eval(expr, env);
  return buf;
}
I think you've found an unexpected behavior in the Rcpp::Function class. An MRE:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
RObject cpp_identity(RObject x) {
Rcpp::Function identity("identity");
return identity(x);
}
/*** R
quoted <- quote(print(1));
identity(quoted)
cpp_identity(quoted)
*/
gives
> quoted <- quote(print(1));
> identity(quoted)
print(1)
> cpp_identity(quoted)
[1] 1
[1] 1
This happens because Rcpp effectively performs this evaluation behind the scenes:
Rcpp_eval(Rf_lang2(Rf_install("identity"), x))
which is basically like
eval(call("identity", quoted))
but the call object is not 'protected' from evaluation.
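If you want to stay at the C level, one way to get the expected behaviour (a sketch of my own, not taken from the explanation above) is to build the serialize() call by hand and shield the argument behind quote(), so that evaluating the outer call yields the language object itself rather than its result:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::RawVector cpp_serialize_quoted(SEXP x) {
  // Builds and evaluates: serialize(quote(<x>), NULL)
  SEXP quoted = PROTECT(Rf_lang2(Rf_install("quote"), x));
  SEXP call   = PROTECT(Rf_lang3(Rf_install("serialize"), quoted, R_NilValue));
  SEXP res    = PROTECT(Rf_eval(call, R_BaseEnv));
  Rcpp::RawVector out(res);
  UNPROTECT(3);
  return out;
}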
tl;dr: The question was: how does one serialize to raw vectors from C? The (compiled C) function serializeToRaw() in the RApiSerialize package provides R's own serialization code. As the benchmark below shows, it is about three times faster than what was suggested above.
Longer answer: I would not recommend mucking around with Rcpp::Function() for this. We do in fact provide a proper package which gives R access to serialization: RApiSerialize. It does not do much, but it exports exactly two functions to serialize and deserialize from and to RAW vectors, which the RcppRedis package needs and uses.
So we can do the same here. I just called Rcpp.package.skeleton() to create a package 'jeroen', added the LinkingTo: and Imports: entries to DESCRIPTION and the import() to NAMESPACE, and then this works:
#include <Rcpp.h>
#include <RApiSerializeAPI.h> // provides C API with serialization
// [[Rcpp::export]]
Rcpp::RawVector cpp_serialize(SEXP s) {
Rcpp::RawVector x = serializeToRaw(s); // from RApiSerialize
return x;
}
It is basically a simpler version of what you have above.
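For reference, the package-level plumbing amounts to roughly the following (a sketch of plausible DESCRIPTION and NAMESPACE entries, not the files as actually generated):
## DESCRIPTION (relevant lines only)
Imports: Rcpp, RApiSerialize
LinkingTo: Rcpp, RApiSerialize
## NAMESPACE
useDynLib(jeroen)
import(RApiSerialize)
importFrom(Rcpp, evalCpp)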
And we can call that as you do:
testJeroen <- function() {
## Works as intended
res <- identical(serialize(iris, NULL), cpp_serialize(iris))
## Didn't work above, works now
call_object <- call("rnorm", 1000)
res <- res &&
identical(serialize(call_object, NULL), cpp_serialize(call_object))
res
}
and lo and behold, it works:
R> library(jeroen)
Loading required package: RApiSerialize
R> testJeroen()
[1] TRUE
R>
So in short: if you don't want to muck with R, don't work with Rcpp::Function() objects.
Benchmark: Using a simple
library(jeroen) # package containing both functions from here
library(microbenchmark)
microbenchmark(cpp=cpp_serialize(iris), # my suggestion
env=env_serialize(iris)) # OP's suggestion, renamed
we get
edd@max:/tmp/jeroen$ Rscript tests/quick.R
Loading required package: RApiSerialize
Unit: microseconds
expr min lq mean median uq max neval cld
cpp 17.471 22.1225 28.0987 24.4975 26.4795 420.001 100 a
env 85.028 91.0055 94.8772 92.9465 94.9635 236.710 100 b
edd@max:/tmp/jeroen$
showing that the answer by OP is nearly three times slower.
Related
I was wondering whether it is possible to make an infix function, e.g. A %o% B, with Rcpp.
I know that this is possible using the inline package, but I have not yet been able to find a method for doing this when using sourceCpp().
I have made the following infix implementation of %o% / outer(), for the case where the arguments are known to be vectors, using RcppEigen and inline:
`%op%` <- cxxfunction(signature(v1="NumericVector",
v2="NumericVector"),
plugin = "RcppEigen",
body = c("
NumericVector xx(v1);
NumericVector yy(v2);
const Eigen::Map<Eigen::VectorXd> x(as<Eigen::Map<Eigen::VectorXd> >(xx));
const Eigen::Map<Eigen::VectorXd> y(as<Eigen::Map<Eigen::VectorXd> >(yy));
Eigen::MatrixXd op = x * y.transpose();
return Rcpp::wrap(op);
"))
This can easily be rewritten so that it can be imported using sourceCpp(), however not as an infix function.
My current attempt is as follows:
#include <Rcpp.h>
using namespace Rcpp;
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::export]]
NumericMatrix outerProd(NumericVector v1, NumericVector v2) {
NumericVector xx(v1);
NumericVector yy(v2);
const Eigen::Map<Eigen::VectorXd> x(as<Eigen::Map<Eigen::VectorXd> >(xx));
const Eigen::Map<Eigen::VectorXd> y(as<Eigen::Map<Eigen::VectorXd> >(yy));
Eigen::MatrixXd op = x * y.transpose();
return Rcpp::wrap(op);
}
So, to summarize my question: is it possible to make an infix function available through sourceCpp()?
Is it possible to make an infix function available through sourceCpp?
Yes.
As always, one should read the Rcpp vignettes!
In particular here, if you look in Section 1.6 of the Rcpp attributes vignette, you'd see you can modify the name of a function using the name parameter for Rcpp::export.
For example, we could do:
#include <Rcpp.h>
// [[Rcpp::export(name = `%+%`)]]
Rcpp::NumericVector add(Rcpp::NumericVector x, Rcpp::NumericVector y) {
return x + y;
}
/*** R
1:3 %+% 4:6
*/
Then we'd get:
Rcpp::sourceCpp("~/infix-test.cpp")
> 1:3 %+% 4:6
[1] 5 7 9
So, you still have to give C++ functions valid C++ names in the code, but you can export them to R through the name parameter of Rcpp::export without having to do anything further on the R side.
John Chambers states three principles on page four of the (highly recommended) "Extending R" book:
Everything that exists in R is an object.
Everything that happens in R is a function call.
Interfaces to other software are part of R.
So per point two, you can of course use sourceCpp() to create your compiled function and hang it on any odd infix operator you like.
Code Example
library(Rcpp)
cppFunction("std::string cc(std::string a, std::string b) { return a+b; }")
`%+%` <- function(a,b) cc(a,b)
cc("Hello", "World")
"hello" %+% "world"
Output
R> library(Rcpp)
R> cppFunction("std::string cc(std::string a, std::string b) { return a+b; }")
R> `%+%` <- function(a,b) cc(a,b)
R>
R> cc("Hello", "World")
[1] "HelloWorld"
R>
R> "hello" %+% "world"
[1] "helloworld"
R>
Summary
Rcpp is really just one cog in the machinery.
Edit
It also works with your initial function, with some minor simplification. For
`%op%` <- cppFunction("Eigen::MatrixXd op(Eigen::VectorXd x, Eigen::VectorXd y) { Eigen::MatrixXd op = x * y.transpose(); return op; }", depends="RcppEigen")
as.numeric(1:3) %op% as.numeric(3:1)
we get
R> `%op%` <- cppFunction("Eigen::MatrixXd op(Eigen::VectorXd x, Eigen::VectorXd y) { Eigen::MatrixXd op = x * y.transpose(); return op; }", depends="RcppEigen")
R> as.numeric(1:3) %op% as.numeric(3:1)
[,1] [,2] [,3]
[1,] 3 2 1
[2,] 6 4 2
[3,] 9 6 3
R>
(modulo some line noise from the compiler).
I've written a function in Rcpp and compiled it with inline. Now, I want to run it in parallel on different cores, but I'm getting a strange error. Here's a minimal example, where the function funCPP1 can be compiled and runs well by itself, but cannot be called by snow's clusterCall function. The function runs well as a single process, but gives the following error when run in parallel:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
2 nodes produced errors; first error: NULL value passed as symbol address
And here is some code:
## Load and compile
library(inline)
library(Rcpp)
library(snow)
src1 <- '
Rcpp::NumericMatrix xbem(xbe);
int nrows = xbem.nrow();
Rcpp::NumericVector gv(g);
for (int i = 1; i < nrows; i++) {
xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
}
return xbem;
'
funCPP1 <- cxxfunction(signature(xbe = "numeric", g="numeric"),body = src1, plugin="Rcpp")
## Single process
A <- matrix(rnorm(400), 20,20)
funCPP1(A, 0.5)
## Parallel
cl <- makeCluster(2, type = "SOCK")
clusterExport(cl, 'funCPP1')
clusterCall(cl, funCPP1, A, 0.5)
Think it through -- what does inline do? It creates a C/C++ function for you, then compiles and links it into a dynamically-loadable shared library. Where does that one sit? In R's temp directory.
So you tried the right thing by shipping the R frontend calling that shared library to the other process (which has another temp directory !!), but that does not get the dll / so file there.
Hence the advice is to create a local package, install it and have both snow processes load and call it.
(And as always: better quality answers may be had on the rcpp-devel list, which is read by more Rcpp contributors than SO is.)
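A sketch of what that package-based setup could look like on the R side (the package name 'myfuns' is hypothetical; it assumes funCPP1 has been moved into an installed package):
library(snow)
cl <- makeCluster(2, type = "SOCK")
## load the installed package on every node so its shared library is available there
clusterEvalQ(cl, library(myfuns))
## now the compiled function can be called on the workers
A <- matrix(rnorm(400), 20, 20)
clusterCall(cl, function(A, g) myfuns::funCPP1(A, g), A, 0.5)
stopCluster(cl)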
Old question, but I stumbled across it while looking through the top Rcpp tags, so maybe this answer will still be of use.
I think Dirk's answer is proper when the code you've written is fully debugged and does what you want, but it can be a hassle to write a new package for such a small piece of code as in the example. What you can do instead is export the code block and a "helper" function that compiles the source code, then run the helper. That makes the compiled function available on each node; a second helper function then retrieves and calls it. For instance:
# snow must still be installed, but this functionality is now in "parallel", which ships with base R.
library(parallel)
# Keep your source as an object
src1 <- '
Rcpp::NumericMatrix xbem(xbe);
int nrows = xbem.nrow();
Rcpp::NumericVector gv(g);
for (int i = 1; i < nrows; i++) {
xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
}
return xbem;
'
# Save the signature
sig <- signature(xbe = "numeric", g="numeric")
# make a function that compiles the source, then assigns the compiled function
# to the global environment
c.inline <- function(name, sig, src){
library(Rcpp)
funCXX <- inline::cxxfunction(sig = sig, body = src, plugin="Rcpp")
assign(name, funCXX, envir=.GlobalEnv)
}
# and the function which retrieves and calls this newly-compiled function
c.namecall <- function(name,...){
funCXX <- get(name)
funCXX(...)
}
# Keep your example matrix
A <- matrix(rnorm(400), 20,20)
# What are we calling the compiled function?
fxname <- "TestCXX"
## Parallel
cl <- makeCluster(2, type = "PSOCK")
# Export all the pieces
clusterExport(cl, c("src1","c.inline","A","fxname"))
# Call the compiler function
clusterCall(cl, c.inline, name=fxname, sig=sig, src=src1)
# Notice how the function now named "TestCXX" is available in the environment
# of every node?
clusterCall(cl, ls, envir=.GlobalEnv)
# Call the function through our wrapper
clusterCall(cl, c.namecall, name=fxname, A, 0.5)
# Works with my testing
I've written a package, ctools (shameless self-promotion), which wraps up a lot of the functionality in the parallel and Rhpc packages for cluster computing, both with PSOCK and MPI. I already have a function called c.sourceCpp which calls Rcpp::sourceCpp on every node in much the same way as above. I'm going to add a c.inlineCpp which does the above, now that I see the usefulness of it.
Edit:
In light of Coatless's comments, Rcpp::cppFunction() in fact negates the need for the c.inline helper here, though c.namecall is still needed.
src2 <- '
NumericMatrix TestCpp(NumericMatrix xbe, double g){
  // g is the scalar multiplier (a double, so 0.5 is not truncated)
  NumericMatrix xbem(xbe);
  int nrows = xbem.nrow();
  for (int i = 1; i < nrows; i++) {
    xbem(i,_) = xbem(i-1,_) * g + xbem(i,_);
  }
  return xbem;
}
'
clusterCall(cl, Rcpp::cppFunction, code=src2, env=.GlobalEnv)
# Call the function through our wrapper
clusterCall(cl, c.namecall, name="TestCpp", A, 0.5)
I resolved it by sourcing, on each cluster node, an R file with the wanted C inline function:
clusterEvalQ(cl,
{
library(inline)
invisible(source("your_C_func.R"))
})
And your file your_C_func.R should contain the C function definition:
c_func <- cfunction(...)
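For instance (a sketch only; it simply reuses the src1 body from the question and swaps cfunction() for inline::cxxfunction(), since that body is C++):
## your_C_func.R -- one possible version
library(inline)
library(Rcpp)
c_func <- cxxfunction(signature(xbe = "numeric", g = "numeric"), body = '
  Rcpp::NumericMatrix xbem(xbe);
  int nrows = xbem.nrow();
  Rcpp::NumericVector gv(g);
  for (int i = 1; i < nrows; i++) {
    xbem(i,_) = xbem(i-1,_) * gv[0] + xbem(i,_);
  }
  return xbem;
', plugin = "Rcpp")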
I am wondering if I can apply lgamma on all entries of a large matrix using Rcpp. I tried using a vector:
// lgammaRcpp.cpp
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector lgammaRcpp(NumericVector v){
NumericVector out;
out = lgamma(v);
return(out);
}
I did a simple microbenchmarking:
library("microbenchmark")
x <- round(runif(100000)+50000);
microbenchmark(
lgammaRcpp(x),
lgamma(x)
)
and the Rcpp is slightly faster:
Unit: milliseconds
expr min lq mean median uq max neval
lgammaRcpp(x) 5.405556 5.416283 5.810254 5.436139 5.511993 8.650419 100
lgamma(x) 5.613717 5.628769 6.114942 5.644215 6.872677 9.947497 100
When I try using a "NumericMatrix", however:
// [[Rcpp::export]]
NumericMatrix lgammaRcpp(NumericMatrix v){
NumericMatrix out;
out = lgamma(v);
return(out);
}
there are errors that I don't understand, e.g.
/home/canghel/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp/vector /Matrix.h:83:13: note: Rcpp::Matrix<RTYPE, StoragePolicy>& Rcpp::Matrix<RTYPE, StoragePolicy>::operator=(const Rcpp::Matrix<RTYPE, StoragePolicy>&) [with int RTYPE = 14; StoragePolicy = Rcpp::PreserveStorage]
Matrix& operator=(const Matrix& other) {
My questions are: 1) Is there a way to modify my function to apply lgamma over all entries of a matrix? And 2) is it worth it, or is the underlying library called for the lgamma function the same in C++ and R?
It seems better (i.e. faster) to apply functions like lgamma/digamma to a matrix using the Rfast package.
library("microbenchmark");
library("RcppArmadillo");
library("Rfast");
sourceCpp("lgammaRcpp.cpp");
x <- matrix(round(runif(100000)+50000), 100, 1000);
microbenchmark(
lgammaRcppArma(x),
lgamma(x),
Rfast::Lgamma(x)
)
Unit: milliseconds
expr min lq mean median uq max neval
lgammaRcppArma(x) 4.654526 4.919831 5.577843 5.413790 5.888895 9.258325 100
lgamma(x) 5.572671 5.840268 6.582007 6.131651 7.280895 8.779301 100
Rfast::Lgamma(x) 4.450824 4.588596 5.128323 4.791287 5.608678 6.865331 100
where I had:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat lgammaRcppArma(arma::mat m) {
arma::mat out = lgamma(m);
return(out);
}
Rcpp sugar tends to return Vectors unless otherwise specified. Thus, in this case you will always get back a Vector of type Numeric, e.g. a NumericVector. See my notes on different sugar functions here: https://github.com/coatless/rcpp-api
The following allows for a compilation under the above note:
#include <Rcpp.h>
// [[Rcpp::export]]
NumericVector lgammaRcpp(NumericMatrix v) {
NumericVector out;
out = lgamma(v);
return(out);
}
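If you do want a matrix back rather than a flat vector, one option (my sketch, not part of the answer above) is to let sugar do the work and then reattach the dimensions:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix lgammaRcppMat(NumericMatrix v) {
  NumericVector out = lgamma(v);                    // sugar returns a plain vector
  out.attr("dim") = Dimension(v.nrow(), v.ncol());  // restore the matrix shape
  return as<NumericMatrix>(out);
}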
It is highly unlikely you will see a large speed-up, as the functions being used are the same. This is partially indicated by your benchmarks above and can be verified by looking at the Rcpp Math defines. Now, this isn't to say a benefit is not available. In particular, the main benefit comes when you encapsulate a routine completely in C++: in that case, your routine will be significantly quicker using sugar functions than calling an R function from C++.
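To illustrate that last point, a quick sketch (the function names here are mine and purely illustrative):
#include <Rcpp.h>
using namespace Rcpp;

// Stays entirely in compiled code: sugar lgamma() plus sugar sum().
// [[Rcpp::export]]
double sumLgammaSugar(NumericVector v) {
  return sum(lgamma(v));
}

// Round-trips through the R evaluator when calling f(v).
// [[Rcpp::export]]
double sumLgammaRCall(NumericVector v) {
  Function f("lgamma");
  NumericVector out = f(v);
  return sum(out);
}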
In his "Advanced R" book, Hadley Wickham says "noNA(x) asserts that the vector x does not contain any missing values." However I still don't know how to use it. I can't do
if (noNA(x))
do this
so how am I supposed to use it?
http://adv-r.had.co.nz/Rcpp.html#rcpp-sugar
Many of the Rcpp sugar expressions are implemented through template classes which have specializations for cases when the input object is known to be free of missing values, thereby allowing the underlying algorithm to avoid having to perform the extra work of dealing with NA values (e.g. calls to is_na). This is only possible because the VectorBase class has a boolean parameter indicating whether the underlying object can (can, not that it necessarily does) have NA values, or not.
noNA returns (when called on a VectorBase object) an instance of the Nona template class. Note that Nona itself derives from
Rcpp::VectorBase<RTYPE, false, Nona<RTYPE,NA,VECTOR>>
// ^^^^^
meaning that the returned object gets encoded with information that essentially says "you can assume that this data is free of NA values".
As an example, Rcpp::sum is implemented via the Sum class in the Rcpp::sugar namespace. In the default case, we see that there is extra work to manage the possibility of missing values:
STORAGE get() const {
STORAGE result = 0 ;
R_xlen_t n = object.size() ;
STORAGE current ;
for( R_xlen_t i=0; i<n; i++){
current = object[i] ;
if( Rcpp::traits::is_na<RTYPE>(current) ) // here
return Rcpp::traits::get_na<RTYPE>() ; // here
result += current ;
}
return result ;
}
On the other hand, there is also a specialization for cases when the input does not have missing values, in which the algorithm does less work:
STORAGE get() const {
STORAGE result = 0 ;
R_xlen_t n = object.size() ;
for( R_xlen_t i=0; i<n; i++){
result += object[i] ;
}
return result ;
}
To answer your question of "how do I apply this in practice?", here is an example:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
int Sum(IntegerVector x) {
return sum(x);
}
// [[Rcpp::export]]
int SumNoNA(IntegerVector x) {
return sum(noNA(x));
}
Benchmarking these two functions,
set.seed(123)
x <- as.integer(rpois(1e6, 25))
all.equal(Sum(x), SumNoNA(x))
# [1] TRUE
microbenchmark::microbenchmark(
Sum(x),
SumNoNA(x),
times = 500L
)
# Unit: microseconds
# expr min lq mean median uq max neval
# Sum(x) 577.386 664.620 701.2422 677.1640 731.7090 1214.447 500
# SumNoNA(x) 454.990 517.709 556.5783 535.1935 582.7065 1138.426 500
the noNA version is indeed faster.
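One caveat worth adding (my own note, not from the text above): noNA() is an assertion, not a check. If the data actually does contain missing values, the specialized code path simply skips the is_na test, so the result is silently wrong rather than NA:
y <- c(x, NA_integer_)
Sum(y)      # NA, because the default sugar path detects the missing value
SumNoNA(y)  # not NA -- the NA sentinel is summed like any other integer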