Rcpp template class for custom input and output - r

I'm trying to create C++ function which will apply any funtion on R vector of any type. I've been reading and searching for an answer but my knowledge is still too chaotic and I can't put everything together. I was inspired by sapply example and some from Rcpp gallery but it's to advanced so far.
What I've already done is simple class which kinda-works, but I'm having problem even with this one. Error happens when I'm trying to call function which returns something else than numeric. However I don't know how to extend function to work with custom output type.
At this point, I don't know how:
Obtain type of R Function return value and use this type do define out - same size as x
Alternatively - use std::string type argument in apply_fun which could switch OUTTYPE
Pass any x to the class and use in f - I think I've managed this correctly with <int XTYPE>
Perhaps answer to this question might be to complex so I appreciate all hints. Below I present current progress. Thanks!
Rcpp class
#include <Rcpp.h>
namespace apply {
template <int XTYPE>
class SomeClass {
private:
Rcpp::Vector<XTYPE> x;
Rcpp::Function f;
public:
Rcpp::Vector<XTYPE> run() {
typedef typename Rcpp::traits::storage_type<XTYPE>::type STORAGE;
int n = x.size();
Rcpp::Vector<XTYPE> out(n);
for (unsigned int i{0}; i < n; i++) {
out(i) = Rcpp::as<STORAGE>(f(x(i)));
// Rcpp::Rcout << out(i) << std::endl;
}
return out;
}
SomeClass (Rcpp::Vector<XTYPE> x, Rcpp::Function f)
: x{x}, f{f} {
std::cout << "Initialized SomeClass" << std::endl;
}
};
}
Exported Rcpp function
//' #export
//[[Rcpp::export]]
Rcpp::RObject apply_fun(Rcpp::RObject x,
Rcpp::Function f) {
if (TYPEOF(x) == INTSXP) {
apply::SomeClass<13> r{Rcpp::as<Rcpp::IntegerVector>(x), f};
return r.run();
} else if (TYPEOF(x) == REALSXP) {
apply::SomeClass<14> r{Rcpp::as<Rcpp::NumericVector>(x), f};
return r.run();
} else if (TYPEOF(x) == STRSXP) {
apply::SomeClass<16> r{Rcpp::as<Rcpp::CharacterVector>(x), f};
return r.run();
} else if (TYPEOF(x) == LGLSXP) {
apply::SomeClass<10> r{Rcpp::as<Rcpp::LogicalVector>(x), f};
return r.run();
} else if (TYPEOF(x) == CPLXSXP) {
apply::SomeClass<15> r{Rcpp::as<Rcpp::ComplexVector>(x), f};
return r.run();
} else {
Rcpp::stop("Invalid data type - only integer, numeric, character, factor, date, logical, complex vectors are possible.");
}
return R_NilValue;
}
R calls
apply_fun(c(1.5, 2.5, 3.5), f = function(x) { x + 10})
# 11.5 12.5 13.5
apply_fun(letters[1:3], f = function(x) paste(x, "-"))
# Error in apply_run(letters[1:3], f = function(x) x) :
# Evaluation error: unimplemented type 'char' in 'eval'

Related

Error in .Call(tXPMCpp, x) : first argument must be a string (of length 1) or native symbol reference

I've following R Code
tXPM <- function(x) {
.Call(tXPMCpp, x)
}
and following RCPP code
SEXP tXPMCpp (SEXP xSEXP){
arma::mat GeneExp = Rcpp::as<arma::mat>(xSEXP);
arma::rowvec ColumnSums = sum(GeneExp, 0);
int_fast32_t i=0, n=0;
arma::mat::iterator it_end = GeneExp.end();
//One pass linear regression with one pass variance, skewness
for (arma::mat::iterator it = GeneExp.begin(); it != it_end; ++it) {
//std::cout << (*it) << std::endl;
*it = *it/ColumnSums.at(i);
n++;
if (n == int(GeneExp.n_rows)) {
n=0;
i++;
}
}
return Rcpp::wrap(trans(GeneExp));}
When i call this function by passing it datamatrix it shows following error:
Error in .Call(tXPMCpp, x) : first argument must be a string (of length 1) or native symbol reference
How can i call this function?

QtConcurrent::map segmentation fault

When I have been trying to implement "parallel for" using QtConcurrent::map:
QFuture<void> parForAsync(size_t n, std::function<void (size_t)> Op)
{
size_t nThreads =
static_cast<size_t>(QThreadPool::globalInstance()->maxThreadCount());
size_t nn = n/nThreads + 1;
using Sequence = QVector<std::function<void()>>;
Sequence vFuns;
for(size_t i = 0; i < n; i+=nn)
{
size_t firstIdx = i,
lastIdx = i + nn > n ? n : i + nn;
vFuns.push_back([=]()->void
{
for(size_t i = firstIdx; i < lastIdx; ++i)
{
Op(i);
}
});
}
return QtConcurrent::map<Sequence> //<-Segmentation fault!
(vFuns, [](std::function<void()> f)
{
f();
});
}
I've got segmentation fault in this place:
template<typename _Res, typename... _ArgTypes>
function<_Res(_ArgTypes...)>::
function(const function& __x)
: _Function_base()
{
if (static_cast<bool>(__x))
{
__x._M_manager(_M_functor, __x._M_functor, __clone_functor); //<-Segmentation fault!
_M_invoker = __x._M_invoker;
_M_manager = __x._M_manager;
}
}
Why is this happening? It seems that std::function had passed checking. How can I make this code working?
Thanks in advance!
I cannot reproduce your case but I can give you some example to illustrate issue
QFuture<void> test ()
{
QVector<int> v; // LOCAL VARIABLE IN SCOPE OF test FUNCTION
// preparing v vector
QFuture<void> f = QtConcurrent::map(v,someFunction); // returns immediately
return f;
}
[1] QtConcurrent::map takes v by reference NOT BY COPY.
[2] QtConcurrent::map returns immediately.
[3] So when test function ends, parallel operations started by map use v vector which was deleted because it is local variable in test function.
You can use waitForFinished for QFuture but then your function doesn't make sense because it blocks until parallel task ends.

Rcpp memory management

I am trying to convert some character data to numeric as below. The data will come with special caracters so I have to get them out. I convert the data to std:string to search for the special caracters. Dos it creates a new variable in memory? I want to know if there is a better way to do it.
NumericVector converter_ra_(Rcpp::RObject x){
if(x.sexp_type() == STRSXP){
CharacterVector y(x);
NumericVector resultado(y.size());
for(unsigned int i = 0; i < y.size(); i++){
std::string ra_string = Rcpp::as<std::string>(y[i]);
//std::cout << ra_string << std::endl;
double t = 0;
int base = 0;
for(int j = (int)ra_string.size(); j >= 0; j--){
if(ra_string[j] >= 48 && ra_string[j] <= 57){
t += ((ra_string[j] - '0') * base_m[base]);
base++;
}
}
//std::cout << t << std::endl;
resultado[i] = t;
}
return resultado;
}else if(x.sexp_type() == REALSXP){
return NumericVector(x);
}
return NumericVector();
}
Does it creates a new variable in memory?
If the input object actually is a numeric vector (REALSXP) and you are simply returning, e.g. as<NumericVector>(input), then no additional variables are created. In any other case new memory will, of course, need to be allocated for the returned object. For example,
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector demo(RObject x) {
if (x.sexp_type() == REALSXP) {
return as<NumericVector>(x);
}
return NumericVector::create();
}
/*** R
y <- rnorm(3)
z <- letters[1:3]
data.table::address(y)
# [1] "0x6828398"
data.table::address(demo(y))
# [1] "0x6828398"
data.table::address(z)
# [1] "0x68286f8"
data.table::address(demo(z))
# [1] "0x5c7eea0"
*/
I want to know if there is a better way to do it.
First you need to define "better":
Faster?
Uses less memory?
Fewer lines of code?
More idiomatic?
Personally, I would start with the last definition since it often entails one or more of the others. For example, in this approach we
Define a function object Predicate that relies on the standard library function isdigit rather than trying to implement this locally
Define another function object that uses the erase-remove idiom to eliminate characters as determined by Predicate; and if necessary, uses std::atoi to convert what remains into a double (again, instead of trying to implement this ourselves)
Uses an Rcpp idiom -- the as converter -- to convert the STRSXP to a std::vector<std::string>
Calls std::transform to convert this into the result vector
#include <Rcpp.h>
using namespace Rcpp;
struct Predicate {
bool operator()(char c) const
{ return !(c == '.' || std::isdigit(c)); }
};
struct Converter {
double operator()(std::string s) const {
s.erase(
std::remove_if(s.begin(), s.end(), Predicate()),
s.end()
);
return s.empty() ? NA_REAL : std::atof(s.c_str());
}
};
// [[Rcpp::export]]
NumericVector convert(RObject obj) {
if (obj.sexp_type() == REALSXP) {
return as<NumericVector>(obj);
}
if (obj.sexp_type() != STRSXP) {
return NumericVector::create();
}
std::vector<std::string> x = as<std::vector<std::string> >(obj);
NumericVector res(x.size(), NA_REAL);
std::transform(x.begin(), x.end(), res.begin(), Converter());
return res;
}
Testing this for minimal functionality,
x <- c("123 4", "abc 1567.35 def", "abcdef", "")
convert(x)
# [1] 1234.00 1567.35 NA NA
(y <- rnorm(3))
# [1] 1.04201552 -0.08965042 -0.88236960
convert(y)
# [1] 1.04201552 -0.08965042 -0.88236960
convert(list())
# numeric(0)
Will this be as performant as something hand-written by a seasoned C or C++ programmer? Almost certainly not. However, since we used library functions and common idioms, it is reasonably concise, likely to be bug-free, and the intention is fairly evident even at a quick glance. If you need something faster then there are probably a handful of optimizations to be made, but there's no need to begin on that premise without benchmarking and profiling first.

Should SEXP function args be PROTECTed when put inside an Rcpp::Xptr?

Look at the (oversimplified) Rcpp + R code below :
test.cpp :
#include <Rcpp.h>
using namespace Rcpp;
class VecWrap{
public:
SEXP vector;
int type;
VecWrap(SEXP vector)
{
this->vector = vector;
this->type = TYPEOF(vector);
if(this->type != INTSXP && this->type != REALSXP)
stop("invalid type");
}
bool contains(double val){
if(type == INTSXP){
IntegerVector v = vector;
for(int i = 0; i < v.size(); i++)
if(v[i] == val)
return true;
}else if(type == REALSXP){
NumericVector v = vector;
for(int i = 0; i < v.size(); i++)
if(v[i] == val)
return true;
}
return false;
}
};
// [[Rcpp::export]]
SEXP createVecWrap(SEXP x) {
VecWrap* w = new VecWrap(x);
return XPtr< VecWrap >(w);
}
// [[Rcpp::export]]
SEXP vecWrapContains(XPtr< VecWrap > w, double val){
return wrap(w->contains(val));
}
test.R :
library(Rcpp)
sourceCpp(file='test.cpp')
v <- 1:10e7
w <- createVecWrap(v)
vecWrapContains(w, 10000) # it works
# remove v and call the garbage collector
rm(v)
gc()
vecWrapContains(w, 10000) # R crashes (but it works with small vector "v")
Basically I put inside the custom class VecWrap the SEXP vector received as argument of createVecWrap function, in order to use it later.
But, as explained by the comments in the code, if I remove the vector v from the R-side and call the garbage collector, the R process crashes when I try to access the vector.
Should the vector be protected by the GC in someway ? If so, how? (Rcpp-style if possible)
Generally speaking you should try to stick to the C++ type system / Rcpp classes as much as possible (re: avoid handling SEXP directly if possible). However, the RObject class will provide your SEXP with protection from the garbage collector, and seems to work in this case:
#include <Rcpp.h>
class VecWrap {
public:
Rcpp::RObject vector;
int type;
VecWrap(SEXP vector_)
: vector(vector_)
{
type = vector.sexp_type();
if (type != INTSXP && type != REALSXP) {
Rcpp::stop("invalid type");
}
}
bool contains(double val) {
if (type == INTSXP){
Rcpp::IntegerVector v = Rcpp::as<Rcpp::IntegerVector>(vector);
for (int i = 0; i < v.size(); i++) {
if (v[i] == val) return true;
}
} else if (type == REALSXP) {
Rcpp::NumericVector v = Rcpp::as<Rcpp::NumericVector>(vector);
for (int i = 0; i < v.size(); i++) {
if (v[i] == val) return true;
}
}
return false;
}
};
// [[Rcpp::export]]
Rcpp::XPtr<VecWrap> createVecWrap(SEXP x) {
return Rcpp::XPtr<VecWrap>(new VecWrap(x));
}
// [[Rcpp::export]]
bool vecWrapContains(Rcpp::XPtr<VecWrap> w, double val) {
return w->contains(val);
}
v <- 1:10e7
w <- createVecWrap(v)
vecWrapContains(w, 10000)
# [1] TRUE
rm(v)
gc()
# used (Mb) gc trigger (Mb) max used (Mb)
# Ncells 366583 19.6 750400 40.1 460000 24.6
# Vcells 100559876 767.3 145208685 1107.9 100560540 767.3
vecWrapContains(w, 10000)
# [1] TRUE
Unrelated: consider using { } for your control flow structures, and don't get carried away with this->; both of those will improve the readability of your code IMO.

Rcpp_eval causes segfault when passing unevaluated arguments

I want to evaluate variable inside Rcpp function
SEXP foo(SEXP arg) {
SEXP x = NULL;
try {
x = Rcpp_eval(arg, Environment::global_env());
} catch(...) {
printf("Error\n");
}
return x;
}
If arg is initialized in .GlobalEnv it seems fine.
x <- 1
foo(substitute(x))
But if arg is not initialized in .GlobalEnv segfault occurs
foo(substitute(y))
What am I doing wrong? Or it's a problem in Rcpp?
Set x to R_NilValue to return R's NULL on error, rather than a NULL pointer. Presumably you don't want to use printf() to handle the error. I guess you meant
x = Rcpp_eval(arg, Environment::global_env());
(arg rather than mode).
SEXP foo(SEXP arg) {
SEXP x = R_NilValue;
try {
x = Rcpp_eval(arg, Environment::global_env());
} catch(...) {
printf("Error\n");
}
return x;
}

Resources