I'm trying to run something like
R
my_r_function <- function(input_a) {return(input_a**3)}
RunFunction(c(1,2,3), my_r_function)
CPP
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector RunFunction(NumericVector a, Function func)
{
NumericVector b = NumericVector(a.size());
for(int i=0; i<a.size(); i++)
b[i] = func(a[i]);
return b;
}
How would I make "Function func" actually work in Rcpp?
P.S. I understand there are ways to do this without Rcpp (apply comes to mind for this example) but I'm just using this as an example to demonstrate what I'm looking for.
You should be able to use the example in the link I provided above to get your code working; but you should also take note of Dirk's warning,
Calling a function is simple and tempting. It is also slow as there
are overheads involved. And calling it repeatedly from inside your C++
code, possibly buried within several loops, is outright silly.
which can be demonstrated by modifying your above code slightly and benchmarking the two versions:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericVector RunFunction(Rcpp::NumericVector a, Rcpp::Function func)
{
Rcpp::NumericVector b = func(a);
return b;
}
// [[Rcpp::export]]
Rcpp::NumericVector RunFunction2(Rcpp::NumericVector a, Rcpp::Function func)
{
Rcpp::NumericVector b(a.size());
for(int i = 0; i < a.size(); i++){
b[i] = Rcpp::as<double>(func(a[i]));
}
return b;
}
/*** R
my_r_function <- function(input_a) {return(input_a**3)}
x <- 1:10
##
RunFunction(x,my_r_function)
RunFunction2(x,my_r_function)
##
library(microbenchmark)
microbenchmark(
RunFunction(rep(1:10,10),my_r_function),
RunFunction2(rep(1:10,10),my_r_function))
*/
Unit: microseconds
                                       expr     min       lq       mean   median       uq      max neval
  RunFunction(rep(1:10, 10), my_r_function)  21.390  22.9985   25.74988  24.0840   26.464   43.722   100
 RunFunction2(rep(1:10, 10), my_r_function) 843.864 903.0025 1048.13175 951.2405 1057.899 2387.550   100
Notice that RunFunction is ~40x faster than RunFunction2 (median of about 24 vs. 951 microseconds): in the former we incur the overhead of calling func from inside the C++ code only once, whereas in the latter we pay it for each element of the input vector. On even longer vectors, I would expect RunFunction2 to perform substantially worse relative to RunFunction. So, if you are going to call R functions from inside your C++ code, try to take advantage of R's native vectorization where possible rather than repeatedly calling the R function in a loop, at least for reasonably simple calculations like x**3.
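To make the "one crossing versus one crossing per element" point concrete without needing R, here is a minimal plain C++ sketch. CountingCallback, run_vectorized and run_elementwise are hypothetical names of my own, and the counter merely stands in for the cost of entering R; it does not measure real Rcpp overhead.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an expensive boundary crossing (e.g. C++ -> R).
// The counter records how many times the "foreign" function was entered.
struct CountingCallback {
    std::size_t calls = 0;

    // Vectorized entry point: one crossing handles the whole input.
    std::vector<double> cube_all(const std::vector<double>& xs) {
        ++calls;
        std::vector<double> out;
        out.reserve(xs.size());
        for (double x : xs) out.push_back(x * x * x);
        return out;
    }

    // Scalar entry point: one crossing per element.
    double cube_one(double x) {
        ++calls;
        return x * x * x;
    }
};

// One crossing in total, analogous to RunFunction.
std::vector<double> run_vectorized(CountingCallback& cb, const std::vector<double>& xs) {
    return cb.cube_all(xs);
}

// xs.size() crossings, analogous to RunFunction2.
std::vector<double> run_elementwise(CountingCallback& cb, const std::vector<double>& xs) {
    std::vector<double> out(xs.size());
    for (std::size_t i = 0; i < xs.size(); ++i) out[i] = cb.cube_one(xs[i]);
    return out;
}
```

Both functions return the same values; only the number of crossings differs, and that number is what grows with the length of the input in the element-wise version.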
Also, if you were wondering why your code wasn't compiling, it was because of this line:
b[i] = func(a[i]);
You were presumably getting the error
cannot convert ‘SEXP’ to ‘Rcpp::traits::storage_type<14>::type {aka
double}’ in assignment
which I resolved by wrapping the return value of func(a[i]) in Rcpp::as<double>() above. However, this clearly isn't worth the trouble because you end up with a much slower function overall anyhow.
You can use std::transform() and avoid writing the loop yourself. Try the following code:
#include <Rcpp.h>
#include <algorithm>
using namespace Rcpp;
// [[Rcpp::export]]
List RunFunction(List input, Function f) {
List output(input.size());
std::transform(input.begin(), input.end(), output.begin(), f);
output.names() = input.names();
return output;
}
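One thing worth keeping in mind: std::transform removes the explicit loop from your source, but the callable is still invoked once per element. A minimal sketch in plain C++ (apply_each is a hypothetical helper, and std::function stands in for Rcpp::Function here):

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Applies f to every element of input via std::transform.
// f is still invoked once per element; only the loop syntax disappears.
std::vector<double> apply_each(const std::vector<double>& input,
                               const std::function<double(double)>& f) {
    std::vector<double> output(input.size());
    std::transform(input.begin(), input.end(), output.begin(), f);
    return output;
}
```

So if the callable is expensive to enter, as an R function called from C++ is, this form should not be expected to be faster than a hand-written loop.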
Related
I commonly work with a short Rcpp function that takes as input a matrix where each row contains K probabilities that sum to 1. The function then randomly samples for each row an integer between 1 and K corresponding to the provided probabilities. This is the function:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
int n = x.nrow();
IntegerVector result(n);
for ( int i = 0; i < n; ++i ) {
result[i] = RcppArmadillo::sample(choice_set, 1, false, x(i, _))[0];
}
return result;
}
I recently updated R and all packages. Now I cannot compile this function anymore. The reason is not clear to me. Running
library(Rcpp)
library(RcppArmadillo)
Rcpp::sourceCpp("sample_matrix.cpp")
throws the following error:
error: call of overloaded 'sample(Rcpp::IntegerVector&, int, bool, Rcpp::Matrix<14>::Row)' is ambiguous
This basically tells me that my call to RcppArmadillo::sample() is ambiguous. Can anyone enlighten me as to why this is the case?
There are two things happening here, and two parts to your problem and hence the answer.
The first is "meta": why now? Well, we had a bug left in the sample() code / setup, which Christian kindly fixed for the most recent RcppArmadillo release (and it is all documented there). In short, the interface for the very probability argument giving you trouble here was changed, as it was not safe for re-use / repeated use. It is now.
Second, the error message. You didn't say which compiler or version you use, but mine (currently g++-9.3) is actually pretty helpful with the error. It is still C++, so some interpretative dance is needed, but in essence it is clearly stating that you called sample() with Rcpp::Matrix<14>::Row and that no interface is provided for that type. Which is correct. sample() offers a few interfaces, but none for a Row object. So the fix is, once again, simple: add a line to aid the compiler by making the row a NumericVector, and all is good.
Fixed code
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
int n = x.nrow();
IntegerVector result(n);
for ( int i = 0; i < n; ++i ) {
Rcpp::NumericVector z(x(i, _));
result[i] = RcppArmadillo::sample(choice_set, 1, false, z)[0];
}
return result;
}
Example
R> Rcpp::sourceCpp("answer.cpp") # no need for library(Rcpp)
R>
I have a C function from a down-stream library that I call in C like this
result = cfunction(input_function)
input_function is a callback that needs to have the following structure
double input_function(const double &x)
{
return(x*x);
}
Where x*x is a user-defined computation that is usually much more complicated. I'd like to wrap cfunction using Rcpp so that the R user could call it on arbitrary R functions.
NumericVector rfunction(Function F){
NumericVector result(1);
// MAGIC THAT I DON'T KNOW HOW TO DO
// SOMEHOW TURN F INTO COMPATIBLE input_function
result[0] = cfunction(input_function);
return(result);
}
The R user then might do rfunction(function(x) {x*x}) and get the right result.
I am aware that calling R functions within cfunction will kill the speed, but I expect I can work out how to pass compiled functions later on. I'd just like to get this part working.
The closest thing I can find that does what I need is this https://sites.google.com/site/andrassali/computing/user-supplied-functions-in-rcppgsl which wraps a function that uses callback that has an oh-so-useful second parameter within which I could stuff the R function.
Advice would be gratefully received.
One possible solution would be saving the R-function into a global variable and defining a function that uses that global variable. Example implementation where I use an anonymous namespace to make the variable known only within the compilation unit:
#include <Rcpp.h>
#include <memory>  // std::unique_ptr, std::make_unique (C++14)
extern "C" {
double cfunction(double (*input_function)(const double&)) {
return input_function(42);
}
}
namespace {
std::unique_ptr<Rcpp::Function> func;
}
double input_function(const double &x) {
Rcpp::NumericVector result = (*func)(x);
return result(0);
}
// [[Rcpp::export]]
double rfunction(Rcpp::Function F){
func = std::make_unique<Rcpp::Function>(F);
return cfunction(input_function);
}
/*** R
rfunction(sqrt)
rfunction(log)
*/
Output:
> Rcpp::sourceCpp('57137507/code.cpp')
> rfunction(sqrt)
[1] 6.480741
> rfunction(log)
[1] 3.73767
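The same global-slot idea can be sketched in plain C++, which may help to see the pattern on its own. Here std::function stands in for Rcpp::Function, cfunction is a dummy like the one above, and trampoline and call_with are hypothetical names. Unlike the version above, this sketch clears the slot after the call so the callable does not stay alive longer than needed.

```cpp
#include <functional>
#include <utility>

// Hypothetical C-style API: accepts only a plain function pointer, no user data.
double cfunction(double (*input_function)(const double&)) {
    return input_function(42.0);
}

namespace {
// File-local slot holding the user's callable for the duration of one call.
std::function<double(double)> current_callback;

// Plain function with the signature cfunction expects; forwards to the slot.
double trampoline(const double& x) {
    return current_callback(x);
}
}  // namespace

// Stores f, runs cfunction through the trampoline, then clears the slot.
// Not reentrant and not thread-safe, since the slot is shared global state.
// If f throws, the slot is left set; a scope guard would be needed to cover that.
double call_with(std::function<double(double)> f) {
    current_callback = std::move(f);
    double result = cfunction(trampoline);
    current_callback = nullptr;
    return result;
}
```

Calling call_with with a lambda that adds 1 to its argument would return 43 with this dummy cfunction. The design trade-off is the usual one for callbacks without a user-data parameter: the state has to live somewhere global, so nested or concurrent calls need extra care.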
I am translating my R code, which includes some prepared functions, to RcppArmadillo. I want to use some of these functions directly in my Rcpp code instead of translating them. For example, I want to call the sigma2 function:
sigma2<- function(xi.vec,w.vec,log10lambda,n,q){
lambda <- 10^log10lambda
(1/(n-q))*sum((lambda*xi.vec*(w.vec^2))/(lambda*xi.vec+1))
}
A typical Rcpp code is as below:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
SEXP myS(){
Rcpp::Environment myEnv = Rcpp::Environment::global_env();
Rcpp::Function myS = myEnv["sigma2"];
arma::vec xvec = myEnv["xi.vec"];
arma::vec wvec = myEnv["w.vec"];
double l = myEnv["log10lambda"];
int n = myEnv["n"];
int q = myEnv["q"];
return myS(Rcpp::Named("xi.vec",xvec),
Rcpp::Named("w.vec",wvec),
Rcpp::Named("l",l),
Rcpp::Named("n",n),
Rcpp::Named("q",q));
}
Of course it works. But my problem is that, in my case, the arguments of the sigma2 function are produced beforehand as the output of another function (say func1) in RcppArmadillo, and they have Armadillo data types. For instance, xi.vec and w.vec have type vec. How can I modify this code to call sigma2? Do I need to change my environment?
First, just say no to embedding R functions and environments into C++ routines. There is no speedup in this case, only a considerable slowdown. Furthermore, there is a greater potential for things to go wrong if the variables cannot be retrieved from the global environment.
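In your case, the local Rcpp::Function variable has the same name, myS, as the C++ function that contains it. Inside the body the local variable hides the function name, so the call is dispatched to sigma2 rather than back into myS(), but reusing the name makes the code very easy to misread as unbounded recursion.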
e.g.
SEXP myS(){
Rcpp::Function myS = myEnv["sigma2"];
return myS(Rcpp::Named("xi.vec",xvec),
Rcpp::Named("w.vec",wvec),
Rcpp::Named("l",l),
Rcpp::Named("n",n),
Rcpp::Named("q",q));
}
Rename them so the two are distinct, for example myS_R for the R function handle and myS_cpp for the C++ function.
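It may help to see how C++ name lookup treats a local variable that reuses the name of the enclosing function. The following is a minimal plain C++ sketch with hypothetical names (twice, compute); a function pointer stands in for the Rcpp::Function handle.

```cpp
// Stand-in for the callable fetched at run time (hypothetical helper).
int twice(int x) { return 2 * x; }

int compute() {
    // This local variable hides the enclosing function's name from here on.
    int (*compute)(int) = twice;
    // Resolves to the local function pointer, so this calls twice(21).
    return compute(21);
}
```

The call inside the body goes to the local pointer, not to the enclosing function. It compiles and terminates, yet it reads like a recursive call, which is a good reason to pick two different names.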
Regarding environment hijacking: you need to pass the values down to C++ as arguments. You cannot reach into an R function to obtain the values passed to it before it is called.
e.g.
SEXP myS_cpp(arma::vec xvec, arma::vec wvec, double l, int n, int q){
// code here
}
I have R code with a bunch of user-defined R functions. I'm trying to make the code run faster, and of course the best option is to use Rcpp. My code involves functions that call each other. Therefore, if I write some functions in C++, I should be able to call and run some of my R functions from my C++ code. As a simple example, consider the code below in R:
mySum <- function(x, y){
return(2*x + 3*y)
}
x <<- 1
y <<- 1
Now consider the C++ code in which I'm trying to access the function above:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
int mySuminC(){
Environment myEnv = Environment::global_env();
Function mySum = myEnv["mySum"];
int x = myEnv["x"];
int y = myEnv["y"];
return wrap(mySum(Rcpp::Named("x", x), Rcpp::Named("y", y)));
}
When I source the file in R with sourceCpp(), I get the error:
"invalid conversion from 'SEXPREC*' to int"
Could anyone help me on debugging the code? Is my code efficient? Can it be summarized? Is there any better idea to use mySum function than what I did in my code?
Thanks very much for your help.
You declare that the function should return an int, but use wrap which indicates the object returned should be a SEXP. Moreover, calling an R function from Rcpp (through Function) also returns a SEXP.
You want something like:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
SEXP mySuminC(){
Environment myEnv = Environment::global_env();
Function mySum = myEnv["mySum"];
int x = myEnv["x"];
int y = myEnv["y"];
return mySum(Rcpp::Named("x", x), Rcpp::Named("y", y));
}
(or, leave function return as int and use as<int> in place of wrap).
That said, this is kind of non-idiomatic Rcpp code. Remember that calling R functions from C++ is still going to be slow.
I want to pass a large matrix to a RcppArmadillo function (about 30,000*30,000) and have the feeling that this passing alone eats up all the performance gains. The question was also raised here, with the suggested solution being to use advanced constructors with the copy_aux_mem = false argument. This seems to be a good solution, also because I only need to read rows from the matrix without changing anything. I am having problems implementing the solution correctly, though. This is probably just a simple syntax question.
Here is my current set-up of the function call (simplified, of course):
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec test(arma::mat M) {
return M.row(0).t();  // transpose so the row fits the arma::vec (column) return type
}
This is pretty slow with a large matrix M (e.g. M=matrix(rnorm(30000*30000), nrow=30000, ncol=30000)). So I would like to use an advanced constructor as documented here. The syntax is mat(aux_mem*, n_rows, n_cols, copy_aux_mem = true, strict = true) and copy_aux_mem should be set to false to 'pass-by-reference'. I'm just not sure about the syntax in the function definition. How do I use this in arma::vec test(arma::mat M) {?
This has been discussed extensively in the Rcpp mailing list. See this thread. The solution that has been implemented in RcppArmadillo is to pass the arma::mat by reference. Internally this will call the advanced constructor for you.
So with this version, you would do something like this:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec test(const arma::mat& M) {
// do whatever with M
...
}
And the data from the R matrix is not copied but rather borrowed. More details in the thread.
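The general C++ cost model behind "copied" versus "borrowed" can be sketched without Armadillo. Tracked is a hypothetical matrix-like type that counts its own copies. In RcppArmadillo the copy happens when the R matrix is converted to arma::mat rather than through a C++ copy constructor, so take this as an analogy, not as a description of the library internals.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical matrix-like type that counts how often it is copied.
struct Tracked {
    static int copies;
    std::vector<double> data;

    explicit Tracked(std::size_t n) : data(n, 1.0) {}
    Tracked(const Tracked& other) : data(other.data) { ++copies; }
};
int Tracked::copies = 0;

// Pass by value: the argument is copied on every call.
double first_by_value(Tracked m) { return m.data[0]; }

// Pass by const reference: the callee borrows the caller's object.
double first_by_const_ref(const Tracked& m) { return m.data[0]; }
```

Calling first_by_const_ref leaves Tracked::copies unchanged, while each call to first_by_value with an existing object increments it by one; for a large matrix, that copy is where the time goes.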
Here are some benchmarks comparing the time it takes to copy or pass by reference:
                   expr      min        lq    median        uq      max neval
     arma_test_value(m) 3540.369 3554.4105 3572.3305 3592.5795 4168.671   100
       arma_test_ref(m)    4.046    4.3205    4.7770   15.5855   16.671   100
 arma_test_const_ref(m)    3.994    4.3660    5.5125   15.7355   34.874   100
With these functions:
#include <RcppArmadillo.h>
using namespace Rcpp ;
// [[Rcpp::depends("RcppArmadillo")]]
// [[Rcpp::export]]
void arma_test_value( arma::mat x){}
// [[Rcpp::export]]
void arma_test_ref( arma::mat& x){}
// [[Rcpp::export]]
void arma_test_const_ref( const arma::mat& x){}
With the CRAN version of RcppArmadillo, you would use that sort of syntax:
void foo( NumericMatrix x_ ){
arma::mat M( x_.begin(), x_.nrow(), x_.ncol(), false ) ;
// do whatever with M
}
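What passing false for copy_aux_mem buys you can be illustrated with a small non-owning view in plain C++. MatrixView is a hypothetical type of my own, not part of Armadillo; it assumes column-major storage, which is how R lays out matrices.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-owning view over a column-major buffer (R's matrix layout).
// It stores only a pointer and the dimensions; no element is copied.
struct MatrixView {
    const double* data;
    std::size_t n_rows;
    std::size_t n_cols;

    // Element (i, j) in column-major storage.
    double at(std::size_t i, std::size_t j) const { return data[i + j * n_rows]; }

    // Copies out a single row: n_cols elements, not the whole matrix.
    std::vector<double> row(std::size_t i) const {
        std::vector<double> out(n_cols);
        for (std::size_t j = 0; j < n_cols; ++j) out[j] = at(i, j);
        return out;
    }
};
```

Constructing such a view costs the same regardless of the matrix size, and reading one row touches only n_cols elements. The usual caveat applies: the view is valid only as long as the underlying buffer is alive and not reallocated.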
This has been used in many places, including several articles in the Rcpp gallery.