Rcpp function with mmap argument? - r

I want to do calculations on elements of a vector using Rcpp, but the vector has grown so large (~60 GB) that I am memory-mapping it with the mmap package. The mapped object is now the wrong type for my Rcpp function. Can this be overcome?
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double testRcpp(NumericVector input, int index) {
  return input(index);
}
/*** R
writeBin(seq(0,1,1e-6),"test.bin")
bigvector1 <- seq(0,1,1e-6)
bigvector2 <- mmap("test.bin",mode=double())
testRcpp(bigvector1,3)
testRcpp(bigvector2,3) #"Not compatible with requested type: [type=environment; target=double]"
*/

Since the mmap function in R returns an object of type environment, pass bigvector2[] instead of bigvector2 to extract its elements as an ordinary numeric vector. Basically, replace
testRcpp(bigvector2,3)
with
testRcpp(bigvector2[],3)
If you want to try using mmap in the C++ part of Rcpp on Windows, you can use my repo from https://github.com/CoderRC/libmingw32_extended .
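For the C++ route, a minimal sketch of reading one element of such a file through a memory map could look like the following. It assumes a POSIX system (open, fstat, mmap, munmap) and a file of raw native-endian doubles such as the one writeBin produces above; the helper name mapped_element is made up for illustration, and nothing here is tied to Rcpp or to the repo mentioned above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: return the double at zero-based `index` from a file
// of raw doubles, using a read-only memory map instead of loading the file.
double mapped_element(const std::string& path, std::size_t index) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd == -1) throw std::runtime_error("cannot open " + path);

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        throw std::runtime_error("cannot stat " + path);
    }

    std::size_t bytes = static_cast<std::size_t>(st.st_size);
    std::size_t count = bytes / sizeof(double);
    if (index >= count) {
        close(fd);
        throw std::out_of_range("index past end of file");
    }

    // Map the whole file; pages are only read from disk when touched.
    void* addr = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) throw std::runtime_error("mmap failed");

    double value = static_cast<const double*>(addr)[index];
    munmap(addr, bytes);
    return value;
}
```

Mapping and unmapping on every call is only for brevity; for repeated access you would keep the mapping open and index into it.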

Related

How do I resolve compile error in RcppParallel function which points to an RcppParallel header file

I'm trying to speed up numeric computation in my R code using RcppParallel and am attempting to edit an example that uses the Cpp sqrt() function to take the square root of each element of a matrix. My edited code replaces matrices with vectors and multiplies the sqrt() by a constant. (In actual use I will have 3 constants and my own operator function.)
The example comes from
https://gallery.rcpp.org/articles/parallel-matrix-transform/
The compiler identifies the error as in the 'algorithm' file on a comment line:
Line 7 no matching function for call to object of type 'SquareRootPlus::sqrtWrapper'
This is my initial attempt to use RcppParallel and I've not used Cpp for several years.
Edit: running macOS Ventura on apple silicon,
Rcpp ver 1.0.10,
RcppParallel ver 5.1.6,
and R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
It would be called like this (if it compiled), where c is a numerical constant (a double) and res is a numeric vector:
res <- parallelMatrixSqrt(someNumericalVector, c)
My testing code is:
#include <Rcpp.h>
#include <RcppParallel.h>
using namespace RcppParallel;
using namespace Rcpp;
struct SquareRootPlus : public Worker
{
  // source vector etc
  const RVector<double> input;
  const double constParam;
  // destination vector
  RVector<double> output;
  // initialize with source and destination
  // get the data type correctly unless auto promoted/cast
  SquareRootPlus(const Rcpp::NumericVector input, const double constParam,
                 Rcpp::NumericVector output)
    : input(input), constParam(constParam), output(output) {}
  struct sqrt_wrapper { // describe worker function
    public: double operator()(double a, double cp) {
      return ::sqrt(a) * cp;
    }
  };
  // take the square root of the range of elements requested
  // (and multiply it by the constant)
  void operator()(std::size_t begin, std::size_t end) {
    std::transform(input.begin() + begin,
                   input.begin() + end,
                   output.begin() + begin,
                   sqrt_wrapper());
  }
};
// public called routine
// [[Rcpp::export]]
Rcpp::NumericVector paralleVectorSqrt(Rcpp::NumericVector x, double c) {
  // allocate the output matrix
  Rcpp::NumericVector output(x.length());
  // SquareRoot functor (pass input and output matrixes)
  SquareRootPlus squareRoot(x, c, output);
  // call parallelFor to do the work
  parallelFor(0, x.length(), squareRoot);
  // return the output matrix
  return output;
}
That still works fine for me (Ubuntu 22.10, g++-12) -- modulo the same warnings we often get from libraries like Boost, and here now from the included TBB library (and the repo should have a newer one so you can try that).
I just did (straight from the Rcpp Gallery source directory):
> library(Rcpp)
> sourceCpp("2014-06-29-parallel-matrix-transform.cpp")
In file included from /usr/local/lib/R/site-library/RcppParallel/include/tbb/tbb.h:41,
from /usr/local/lib/R/site-library/RcppParallel/include/RcppParallel/TBB.h:10,
from /usr/local/lib/R/site-library/RcppParallel/include/RcppParallel.h:21,
from 2014-06-29-parallel-matrix-transform.cpp:59:
/usr/local/lib/R/site-library/RcppParallel/include/tbb/concurrent_hash_map.h:343:23: warning: ‘template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator’ is deprecated [-Wdeprecated-declarations]
[... more like this omitted for brevity ...]
> # allocate a matrix
> m <- matrix(as.numeric(c(1:1000000)), nrow = 1000, ncol = 1000)
> # ensure that serial and parallel versions give the same result
> stopifnot(identical(matrixSqrt(m), parallelMatrixSqrt(m)))
> # compare performance of serial and parallel
> library(rbenchmark)
> res <- benchmark(matrixSqrt(m),
+ parallelMatrixSqrt(m),
+ order="relative")
> res[,1:4]
                   test replications elapsed relative
2 parallelMatrixSqrt(m)          100   0.496    1.000
1         matrixSqrt(m)          100   0.565    1.139
>
and as you can see it not only builds but also runs the example call from R.
You would have to give us more detail about how you call it and what OS and package versions you use. And I won't have time now to dig into your code and do a code review for you but given that (still relatively simple) reference example works maybe you can reduce your currently-not-working approach down to something simpler that works.
Edit: Your example appears to have switched from a unary function to one with two arguments in the signature. Sadly it ain't that easy. The fuller error message is (on my side with g++-12)
/usr/include/c++/12/bits/stl_algo.h:4263:31: error: no match for call to ‘(SquareRootPlus::sqrt_wrapper) (const double&)’
4263 | *__result = __unary_op(*__first);
| ~~~~~~~~~~^~~~~~~~~~
question.cpp:25:20: note: candidate: ‘double SquareRootPlus::sqrt_wrapper::operator()(double, double)’
25 | public: double operator()(double a, double cp) {
| ^~~~~~~~
question.cpp:25:20: note: candidate expects 2 arguments, 1 provided
So you need to rework / extend the example framework for this.
Edit 2: The gory details about std::transform() and its unary function are available, e.g., at cppreference.com.
Edit 3: Building on the previous comment, when you step back a bit and look at what is happening here you may see that RcppParallel excels at parceling up a large data structure, submitting all the pieces in parallel, and finally reassembling the result. That still works. You simply cannot apply a function with a 'richer signature' via std::transform(). No more, no less. You need to rework the guts of the worker, i.e. the part which applies your function to the chunk it sees. Check the other RcppParallel examples for inspiration.
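One possible way to do that, sketched in plain C++ without the RcppParallel types so that it stands alone: store the constant inside the functor, so its operator() stays unary, which is what this overload of std::transform() expects. The names SqrtTimes and sqrt_times_range are made up for illustration, and std::vector stands in for RVector<double>; treat it as a sketch under those assumptions, not as the Gallery's code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Unary functor: the constant lives in a member, so operator()
// still takes exactly one argument per element.
struct SqrtTimes {
    double cp;
    explicit SqrtTimes(double cp) : cp(cp) {}
    double operator()(double a) const { return std::sqrt(a) * cp; }
};

// Hypothetical stand-in for the Worker's operator()(begin, end):
// transforms only the half-open chunk [begin, end) of the input.
void sqrt_times_range(const std::vector<double>& input,
                      std::vector<double>& output,
                      double cp,
                      std::size_t begin, std::size_t end) {
    std::transform(input.begin() + begin,
                   input.begin() + end,
                   output.begin() + begin,
                   SqrtTimes(cp));
}
```

In the worker from the question, the equivalent change would presumably be to give sqrt_wrapper a constructor taking constParam and to pass sqrt_wrapper(constParam) to std::transform(); a lambda capturing the constant should work just as well.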

call r script from c++ gives cannot convert Rcpp::CharacterVector to const char*

Is there a way to call an R script from C++?
I have an R script, for example:
myScript.R:
runif(100)
I want to execute this script from C++ and pass the result.
I tried:
#include <Rcpp.h>
#include <iostream>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector loadFile(CharacterVector inFile){
  NumericVector one = system(inFile);
  return one;
}
inFile : "C:/Program Files/R/R-3.4.2/bin/x64/Rscript C:/Rscripts/myScript.R"
but it gives me :
cannot convert Rcpp::CharacterVector {aka Rcpp::Vector<16>} to const char* for argument 1 to int system(const char*)
The usual convention is to use Rcpp the other way round: to run expensive C++ code from R.
If you would like to invoke an R script from within C++ and then work with the result, one approach would be to make use of the popen function.
To see how to convert an element of an Rcpp character vector to std::string, see Converting element of 'const Rcpp::CharacterVector&' to 'std::string'
Example:
std::string r_execute = "C:/Program Files/R/R-3.4.2/bin/x64/Rscript C:/Rscripts/myScript.R";
FILE *fp = popen(r_execute.c_str(), "r");
You should be able to read the result of the operation from the file stream.
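As a rough sketch of that reading step, the helper below runs a command and collects everything it prints to standard output. The name run_command is made up for illustration, and the code assumes POSIX popen/pclose (on Windows the corresponding calls are _popen/_pclose); parsing the captured text into numbers is left out.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: run `command` through the shell and return
// whatever it wrote to standard output as a single string.
std::string run_command(const std::string& command) {
    FILE* fp = popen(command.c_str(), "r");
    if (fp == nullptr) throw std::runtime_error("popen failed");

    std::string output;
    char buffer[256];
    // fgets reads at most sizeof(buffer) - 1 characters per call.
    while (std::fgets(buffer, sizeof(buffer), fp) != nullptr) {
        output += buffer;
    }
    pclose(fp);
    return output;
}
```

With the command line from the question passed in, the returned string would hold the text Rscript printed, which still has to be parsed before it can be returned as a NumericVector.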

Rcpp Eigen accessing a row from a SparseMatrix

I am using a sparse matrix from Eigen as a data structure for some of my functions. My function will iterate over the rows of a sparse matrix and perform some operations. I am trying to access the row of a sparse matrix as a SparseVector, but for some reason that causes R to crash.
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
using Eigen::Map;
using Eigen::MatrixXd;
using Eigen::VectorXd;
using Eigen::SparseMatrix;
using Eigen::MappedSparseMatrix;
using Eigen::SparseVector;
using Eigen::Triplet;
using namespace Rcpp;
using namespace Eigen;
// [[Rcpp::export]]
Eigen::SparseVector<double> getSparseVec(const Eigen::MappedSparseMatrix<double>& X) {
  return X.row(1);
}
The return type of .row() seems to be a ConstRowXpr, but I don't know how to get that into the form of a SparseVector, and I've been unable to find the doc page for ConstRowXpr

calling function from package in rcpp code [duplicate]

This question already has an answer here:
calling a user-defined R function from C++ using Rcpp
(1 answer)
Closed 6 years ago.
I have a package X in R. The package has a function foo(). I want to call the function foo() in a cpp file (using Rcpp). Is it possible?
#include <Rcpp.h>
void function01() {
  // call foo() from package X ??
}
This is sort of a duplicate. Though, the majority of cases do not involve calling from a user defined package.
As a result, the mold to use is:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
void function01(){
  // Obtain environment containing function
  Rcpp::Environment package_env("package:package_name_here");
  // Make function callable from C++
  Rcpp::Function rfunction = package_env["function_name"];
  // Call the function and receive output (might not be list)
  Rcpp::List test_out = rfunction(....);
}

calling a user-defined R function from C++ using Rcpp

I have R code with a bunch of user-defined R functions. I'm trying to make the code run faster, and of course the best option is to use Rcpp. My code involves functions that call each other. Therefore, if I write some functions in C++, I should be able to call and run some of my R functions from my C++ code. As a simple example, consider the code below in R:
mySum <- function(x, y){
  return(2*x + 3*y)
}
x <<- 1
y <<- 1
Now consider the C++ code in which I'm trying to access the function above:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
int mySuminC(){
  Environment myEnv = Environment::global_env();
  Function mySum = myEnv["mySum"];
  int x = myEnv["x"];
  int y = myEnv["y"];
  return wrap(mySum(Rcpp::Named("x", x), Rcpp::Named("y", y)));
}
When I source the file in R with the function sourceCpp(), I get the error:
"invalid conversion from 'SEXPREC*' to int"
Could anyone help me on debugging the code? Is my code efficient? Can it be summarized? Is there any better idea to use mySum function than what I did in my code?
Thanks very much for your help.
You declare that the function should return an int, but use wrap which indicates the object returned should be a SEXP. Moreover, calling an R function from Rcpp (through Function) also returns a SEXP.
You want something like:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
SEXP mySuminC(){
  Environment myEnv = Environment::global_env();
  Function mySum = myEnv["mySum"];
  int x = myEnv["x"];
  int y = myEnv["y"];
  return mySum(Rcpp::Named("x", x), Rcpp::Named("y", y));
}
(or, leave function return as int and use as<int> in place of wrap).
That said, this is kind of non-idiomatic Rcpp code. Remember that calling R functions from C++ is still going to be slow.

Resources