Moving element from a list of lists to StringVector - r

I am trying to move character elements from list to a character vector using Rcpp.
Rcpp::cppFunction('
Rcpp::StringVector foo(Rcpp::List lst) {
Rcpp::StringVector res(2);
res(0) = "aha";
Rcpp::List tmp = lst(1);
res(1) = tmp(1); // fails to compile, tried different things including tmp(1).str()
return res;
}
')
# Excpected output
films = list(list("a"), list("B", "bingo"))
foo(films)
# [1] "aha" "bingo"

Related

Setting colnames in R's cpp11

Provided that cpp11 does not provide any "sugar", we need to use attributes.
I am trying to set colnames in a C++ function, as in the next MWE
#include "cpp11.hpp"
using namespace cpp11;
// THIS WORKS
[[cpp11::register]]
doubles cpp_names(writable::doubles X) {
X.attr("names") = {"a","b"};
return X;
}
// THIS WON'T
[[cpp11::register]]
doubles_matrix<> cpp_colnames(writable::doubles_matrix<> X) {
X.attr("dimnames") = list(NULL, {"A","B"});
return X;
}
How can I pass a list with two vectors, one NULL and the other ("a","b"), that can be correctly converted to a SEXP?
In Rcpp, one would do colnames(X) = {"a","b"}.
My approach can be wrong.

Convert RcppArmadillo vector to Rcpp vector

I am trying to convert RcppArmadillo vector (e.g. arma::colvec) to a Rcpp vector (NumericVector). I know I can first convert arma::colvec to SEXP and then convert SEXP to NumericVector (e.g. as<NumericVector>(wrap(temp)), assuming temp is an arma::colvec object). But what is a good way to do that?
I want to do that simply because I am unsure if it is okay to pass arma::colvec object as a parameter to an Rcpp::Function object.
I was trying to Evaluate a Rcpp::Function with argument arma::vec, it seems that it takes the argument in four forms without compilation errors. That is, if f is a Rcpp::Function and a is a arma::vec, then
f(a)
f(wrap(a))
f(as<NumericVector>(wrap(a)))
f(NumericVector(a.begin(),a.end()))
produce no compilation and runtime errors, at least apparently.
For this reason, I have conducted a little test for the four versions of arguments. Since I suspect that somethings will go wrong in garbage collection, I test them again gctorture.
gctorture(on=FALSE)
Rcpp::sourceCpp(code = '
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
double foo1(arma::vec a, arma::vec b, Function f){
double sum = 0.0;
for(int i=0;i<100;i++){
sum += as<double>(f(a, b));
}
return sum;
}
// [[Rcpp::export]]
double foo2(arma::vec a, arma::vec b, Function f){
double sum = 0.0;
for(int i=0;i<100;i++){
sum += as<double>(f(wrap(a),wrap(b)));
}
return sum;
}
// [[Rcpp::export]]
double foo3(arma::vec a, arma::vec b, Function f){
double sum = 0.0;
for(int i=0;i<100;i++){
sum += as<double>(f(as<NumericVector>(wrap(a)),as<NumericVector>(wrap(b))));
}
return sum;
}
// [[Rcpp::export]]
double foo4(arma::vec a, arma::vec b, Function f){
double sum = 0.0;
for(int i=0;i<100;i++){
sum += as<double>(f(NumericVector(a.begin(),a.end()),NumericVector(b.begin(),b.end())));
}
return sum;
}
')
# note that when gctorture is on, the program will be very slow as it
# tries to perfrom GC for every allocation.
# gctorture(on=TRUE)
f = function(x,y) {
mean(x) + mean(y)
}
# all three functions should return 700
foo1(c(1,2,3), c(4,5,6), f) # error
foo2(c(1,2,3), c(4,5,6), f) # wrong answer (occasionally)!
foo3(c(1,2,3), c(4,5,6), f) # correct answer
foo4(c(1,2,3), c(4,5,6), f) # correct answer
As a result, the first method produces an error, the second method produces a wrong answer and only the third and the fourth method return the correct answer.
> # they should return 700
> foo1(c(1,2,3), c(4,5,6), f) # error
Error: invalid multibyte string at '<80><a1><e2>'
> foo2(c(1,2,3), c(4,5,6), f) # wrong answer (occasionally)!
[1] 712
> foo3(c(1,2,3), c(4,5,6), f) # correct answer
[1] 700
> foo4(c(1,2,3), c(4,5,6), f) # correct answer
[1] 700
Note that, if gctorture is set FALSE, then all functions will return a correct result.
> foo1(c(1,2,3), c(4,5,6), f) # error
[1] 700
> foo2(c(1,2,3), c(4,5,6), f) # wrong answer (occasionally)!
[1] 700
> foo3(c(1,2,3), c(4,5,6), f) # correct answer
[1] 700
> foo4(c(1,2,3), c(4,5,6), f) # correct answer
[1] 700
It means that method 1 and method 2 are subjected to break when garbage is collected during runtime and we don't know when it happens. Thus, it is dangerous to not wrap the parameter properly.
Edit: as of 2017-12-05, all four conversions produce the correct result.
f(a)
f(wrap(a))
f(as<NumericVector>(wrap(a)))
f(NumericVector(a.begin(),a.end()))
and this is the benchmark
> microbenchmark(foo1(c(1,2,3), c(4,5,6), f), foo2(c(1,2,3), c(4,5,6), f), foo
3(c(1,2,3), c(4,5,6), f), foo4(c(1,2,3), c(4,5,6), f))
Unit: milliseconds
expr min lq mean median uq
foo1(c(1, 2, 3), c(4, 5, 6), f) 2.575459 2.694297 2.905398 2.734009 2.921552
foo2(c(1, 2, 3), c(4, 5, 6), f) 2.574565 2.677380 2.880511 2.731615 2.847573
foo3(c(1, 2, 3), c(4, 5, 6), f) 2.582574 2.701779 2.862598 2.753256 2.875745
foo4(c(1, 2, 3), c(4, 5, 6), f) 2.378309 2.469361 2.675188 2.538140 2.695720
max neval
4.186352 100
5.336418 100
4.611379 100
3.734019 100
And f(NumericVector(a.begin(),a.end())) is marginally faster than other methods.
This should works with arma::vec, arma::rowvec and arma::colvec:
template <typename T>
Rcpp::NumericVector arma2vec(const T& x) {
return Rcpp::NumericVector(x.begin(), x.end());
}
I had the same question. I used wrap to do the conversion at the core of several layers of for loops and it was very slow. I think the wrap function is to blame for dragging the speed down so I wish to know if there is an elegant way to do this.
As for Raymond's question, you might want to try including the namespace like: Rcpp::as<Rcpp::NumericVector>(wrap(A)) instead or include a line using namespace Rcpp; at the beginning of your code.

Passing a `data.table` to c++ functions using `Rcpp` and/or `RcppArmadillo`

Is there a way to pass a data.table objects to c++ functions using Rcpp and/or RcppArmadillo without manually transforming to data.table to a data.frame? In the example below test_rcpp(X2) and test_arma(X2) both fail with c++ exception (unknown reason).
R code
X=data.frame(c(1:100),c(1:100))
X2=data.table(X)
test_rcpp(X)
test_rcpp(X2)
test_arma(X)
test_arma(X2)
c++ functions
NumericMatrix test_rcpp(NumericMatrix X) {
return(X);
}
mat test_arma(mat X) {
return(X);
}
Building on top of other answers, here is some example code:
#include <Rcpp.h>
using namespace Rcpp ;
// [[Rcpp::export]]
double do_stuff_with_a_data_table(DataFrame df){
CharacterVector x = df["x"] ;
NumericVector y = df["y"] ;
IntegerVector z = df["v"] ;
/* do whatever with x, y, v */
double res = sum(y) ;
return res ;
}
So, as Matthew says, this treats the data.table as a data.frame (aka a Rcpp::DataFrame in Rcpp).
require(data.table)
DT <- data.table(
x=rep(c("a","b","c"),each=3),
y=c(1,3,6),
v=1:9)
do_stuff_with_a_data_table( DT )
# [1] 30
This completely ignores the internals of the data.table.
Try passing the data.table as a DataFrame rather than NumericMatrix. It is a data.frame anyway, with the same structure, so you shouldn't need to convert it.
Rcpp sits on top of native R types encoded as SEXP. This includes eg data.frame or matrix.
data.table is not native, it is an add-on. So someone who wants this (you?) has to write a converter, or provide funding for someone else to write one.
For reference, I think the good thing is to output a list from rcpp as data.table allow update via lists.
Here is a dummy example:
cCode <-
'
DataFrame DT(DTi);
NumericVector x = DT["x"];
int N = x.size();
LogicalVector b(N);
NumericVector d(N);
for(int i=0; i<N; i++){
b[i] = x[i]<=4;
d[i] = x[i]+1.;
}
return Rcpp::List::create(Rcpp::Named("b") = b, Rcpp::Named("d") = d);
';
require("data.table");
require("rcpp");
require("inline");
DT <- data.table(x=1:9,y=sample(letters,9)) #declare a data.table
modDataTable <- cxxfunction(signature(DTi="data.frame"), plugin="Rcpp", body=cCode)
DT_add <- modDataTable(DT) #here we get the list
DT[, names(DT_add):=DT_add] #here we update by reference the data.table

Efficient subsetting in Rcpp (equivalent of the R "which" command)

In Rcpp, there are various "Rcpp sugar" commands that permit nice vectorised operations in the code. In the code below I move across a data frame, break it into vectors, then use the "ifelse" and "sum" sugar commands to compute the mean of v over the rows where x equals either y or y+1. All seems to work correctly.
Just wondering if there is a neater way than this - e.g. an equivalent of the "which" command that gives index points satisfying a particular condition? There seems to be a facility available as "find" in Armadillo but that means using incompatible object types (you can't use "find" and "ifelse" together).
On the same topic, is it possible to get "ifelse" to accept a compound logical condition? In the example below, for instance, the definition of indic is formed of two "ifelse" commands, and it would obviously be cleaner as one. Any thoughts would be much appreciated.
Look forward to hearing your responses :)
require(Rcpp)
require(inline)
set.seed(42)
df = data.frame(x = rpois(1000,3), y = rpois(1000,3), v = rnorm(1000),
stringsAsFactors=FALSE)
myfunc1 = cxxfunction(
signature(DF = "data.frame"),
plugin = "Rcpp",
body = '
using namespace Rcpp;
DataFrame df(DF);
IntegerVector x = df["x"];
IntegerVector y = df["y"];
NumericVector v = df["v"];
LogicalVector indic = ifelse(x==y,true,ifelse(x==y+1,true,false));
double subsum = sum(ifelse(indic,v,0));
int subsize = sum(indic);
double mn = ((subsize>0) ? subsum/subsize : 0.0);
return(Rcpp::List::create(_["subsize"] = subsize,
_["submean"] = mn
));
'
)
myfunc1(df)
### OUTPUT:
#
# $subsize
# [1] 300
#
# $submean
# [1] 0.1091555
#
Rcpp (>= 0.10.0) implements the | operator between two logical sugar expressions. So you can do:
require( Rcpp )
cppFunction( code = '
List subsum( IntegerVector x, IntegerVector y, NumericVector v){
using namespace Rcpp ;
LogicalVector indic = (x==y) | (x==y+1) ;
int subsize = sum(indic) ;
double submean = subsize == 0 ? 0.0 : sum(ifelse(indic,v,0)) / subsize ;
return List::create( _["subsize"] = subsize, _["submean"] = submean ) ;
}
' )
subsum( rpois(1000,3), rpois(1000,3), rnorm(1000) )
# $subsize
# [1] 320
#
# $submean
# [1] -0.05708866

How to test Rcpp::CharacterVector elements for equality?

I am trying to write some simple Rcpp code examples. This is remarkably easy with the Rcpp and inline packages.
But I am stumped on how to test whether two character elements for equality. The following example compares the first elements of two character vectors. But I can't get it to compile.
What is the trick?
library(Rcpp)
library(inline)
cCode <- '
Rcpp::CharacterVector cx(x);
Rcpp::CharacterVector cy(y);
Rcpp::LogicalVector r(1);
r[0] = (cx[0] == cy[0]);
return(r);
'
cCharCompare <- cxxfunction(signature(x="character", y="character"),
plugin="Rcpp", body=cCode)
cCharCompare("a", "b")
--
The comparison using == works perfectly fine if one of the two elements is a constant. The following code compiles and gives expected results:
cCode <- '
Rcpp::CharacterVector cx(x);
Rcpp::LogicalVector r(1);
r[0] = (cx[0] == "a");
return(r);
'
cCharCompareA <- cxxfunction(signature(x="character"), plugin="Rcpp", body=cCode)
cCharCompareA("a")
[1] TRUE
cCharCompareA("b")
[1] FALSE
The equality operator has been introduced in Rcpp 0.10.4. The implementation looks like this in the string_proxy class:
bool operator==( const string_proxy& other){
return strcmp( begin(), other.begin() ) == 0 ;
}
So now we can write:
#include <Rcpp.h>
using namespace Rcpp ;
// [[Rcpp::export]]
LogicalVector test( CharacterVector x, CharacterVector y){
Rcpp::LogicalVector r(x.size());
for( int i=0; i<x.size(); i++){
r[i] = (x[i] == y[i]);
}
return(r);
}
And something similar is used on our unit tests:
> test(letters, letters)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Try this:
// r[0] = (cx[0] == cy[0]);
// r[0] = ((char*)cx[0] == (char*)cy[0]); <- this is wrong
r[0] = (*(char*)cx[0] == *(char*)cy[0]); // this is correct.
It is not easy to explain, but
CharacterVector is not char[].
operator [] returns StringProxy.
StringProxy is not a type of char.
StringProxy has a member operator function char* that convert StringProxy to char*.
So, maybe (char*)cx[0] is a pointer.
Now I forget many things about C++ syntax...
The reason hy the compile fails is the failure of type inference in operator overload == for StringProxy.
Very nice (technical) answer by #kohske, but here is something more C++-ish: just compare strings!
library(inline) ## implies library(Rcpp) when we use the plugin
cCode <- '
std::string cx = Rcpp::as<std::string>(x);
std::string cy = Rcpp::as<std::string>(y);
bool res = (cx == cy);
return(Rcpp::wrap(res));
'
cCharCompare <- cxxfunction(signature(x="character", y="character"),
plugin="Rcpp", body=cCode)
cCharCompare("a", "b")
If you really want to compare just the first character of the strings, then you can go from x to x.c_str() and either index its initial element, or just dereference the pointer to the first char.
A more R-ish answer could maybe sweep over actual vectors of strings...

Resources