How to test Rcpp::CharacterVector elements for equality? - r

I am trying to write some simple Rcpp code examples. This is remarkably easy with the Rcpp and inline packages.
But I am stumped on how to test two character elements for equality. The following example compares the first elements of two character vectors, but I can't get it to compile.
What is the trick?
library(Rcpp)
library(inline)
cCode <- '
Rcpp::CharacterVector cx(x);
Rcpp::CharacterVector cy(y);
Rcpp::LogicalVector r(1);
r[0] = (cx[0] == cy[0]);
return(r);
'
cCharCompare <- cxxfunction(signature(x="character", y="character"),
plugin="Rcpp", body=cCode)
cCharCompare("a", "b")
--
The comparison using == works perfectly fine if one of the two elements is a constant. The following code compiles and gives expected results:
cCode <- '
Rcpp::CharacterVector cx(x);
Rcpp::LogicalVector r(1);
r[0] = (cx[0] == "a");
return(r);
'
cCharCompareA <- cxxfunction(signature(x="character"), plugin="Rcpp", body=cCode)
cCharCompareA("a")
[1] TRUE
cCharCompareA("b")
[1] FALSE

The equality operator was introduced in Rcpp 0.10.4. Its implementation in the string_proxy class looks like this:
bool operator==( const string_proxy& other){
return strcmp( begin(), other.begin() ) == 0 ;
}
So now we can write:
#include <Rcpp.h>
using namespace Rcpp ;
// [[Rcpp::export]]
LogicalVector test( CharacterVector x, CharacterVector y){
Rcpp::LogicalVector r(x.size());
for( int i=0; i<x.size(); i++){
r[i] = (x[i] == y[i]);
}
return(r);
}
Something similar is used in our unit tests:
> test(letters, letters)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
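The idea behind that operator can be sketched in plain C++ without Rcpp. In this sketch `StrProxy` is a hypothetical stand-in for Rcpp's string_proxy (it only wraps a `const char*`); its `operator==` compares contents with `strcmp`, and the hypothetical `compare_all` mirrors the loop in `test()` but also checks that both inputs have the same length, which the Rcpp version assumes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for Rcpp's string_proxy: it only wraps a C string.
struct StrProxy {
    const char* s;
    const char* begin() const { return s; }
    // Compare the characters, not the pointer values.
    bool operator==(const StrProxy& other) const {
        return std::strcmp(begin(), other.begin()) == 0;
    }
};

// Element-wise equality of two string vectors, like test() above.
std::vector<bool> compare_all(const std::vector<StrProxy>& x,
                              const std::vector<StrProxy>& y) {
    if (x.size() != y.size())
        throw std::invalid_argument("x and y must have the same length");
    std::vector<bool> r(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        r[i] = (x[i] == y[i]);
    return r;
}
```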

Try this:
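// r[0] = (cx[0] == cy[0]);                      <- does not compile
// r[0] = ((char*)cx[0] == (char*)cy[0]);        <- wrong: compares pointers (addresses), not the strings
r[0] = (*(char*)cx[0] == *(char*)cy[0]);         // compiles, but compares only the first character of each string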
It is not easy to explain, but
CharacterVector is not char[].
operator [] returns StringProxy.
StringProxy is not a type of char.
StringProxy has a member conversion operator, operator char*, that converts a StringProxy to char*.
So (char*)cx[0] is a pointer, and comparing two pointers compares addresses, not the characters they point to.
Now I forget many things about C++ syntax...
The reason why the compile fails is that overload resolution for operator== between two StringProxy objects fails.
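The three comparisons discussed above can be reproduced with ordinary C strings. This is a sketch assuming `(char*)cx[0]` behaves like any other `char*`: comparing the pointers compares addresses, dereferencing compares only the first character, and `strcmp` compares the full contents. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Compares addresses: true only if both pointers refer to the same buffer.
bool same_pointer(const char* a, const char* b) { return a == b; }

// Compares only the first character of each string (what *(char*)cx[0] does).
bool same_first_char(const char* a, const char* b) { return *a == *b; }

// Compares the whole strings character by character.
bool same_contents(const char* a, const char* b) { return std::strcmp(a, b) == 0; }
```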

Very nice (technical) answer by @kohske, but here is something more C++-ish: just compare strings!
library(inline) ## implies library(Rcpp) when we use the plugin
cCode <- '
std::string cx = Rcpp::as<std::string>(x);
std::string cy = Rcpp::as<std::string>(y);
bool res = (cx == cy);
return(Rcpp::wrap(res));
'
cCharCompare <- cxxfunction(signature(x="character", y="character"),
plugin="Rcpp", body=cCode)
cCharCompare("a", "b")
If you really want to compare just the first character of the strings, then you can go from cx to cx.c_str() and either index its initial element or just dereference the pointer to the first char.
A more R-ish answer could maybe sweep over actual vectors of strings...
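Both suggestions can be sketched with the standard library alone (no Rcpp; `std::vector<std::string>` stands in for a character vector): the hypothetical `first_char_equal` indexes into `c_str()` as described, and the hypothetical `sweep_equal` compares whole strings element by element.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Compare only the initial character, via c_str(); an empty string yields '\0'.
bool first_char_equal(const std::string& a, const std::string& b) {
    return a.c_str()[0] == b.c_str()[0];
}

// Compare two vectors of strings element by element, up to the shorter length.
std::vector<bool> sweep_equal(const std::vector<std::string>& x,
                              const std::vector<std::string>& y) {
    std::vector<bool> out;
    for (std::size_t i = 0; i < x.size() && i < y.size(); ++i)
        out.push_back(x[i] == y[i]);  // std::string::operator== compares contents
    return out;
}
```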

Related

Is there any way in which to make an Infix function using sourceCpp()

I was wondering whether it is possible to make an infix function, e.g. A %o% B with Rcpp.
I know that this is possible using the inline package, but have not yet been able to find a method for doing this when using sourceCpp().
I have made the following infix implementation of %o% / outer() when arguments are sure to be vectors using RcppEigen and inline:
`%op%` <- cxxfunction(signature(v1="NumericVector",
v2="NumericVector"),
plugin = "RcppEigen",
body = c("
NumericVector xx(v1);
NumericVector yy(v2);
const Eigen::Map<Eigen::VectorXd> x(as<Eigen::Map<Eigen::VectorXd> >(xx));
const Eigen::Map<Eigen::VectorXd> y(as<Eigen::Map<Eigen::VectorXd> >(yy));
Eigen::MatrixXd op = x * y.transpose();
return Rcpp::wrap(op);
"))
This can easily be implemented as a regular function imported using sourceCpp(), but not as an infix function.
My current attempt is as follows:
#include <Rcpp.h>
using namespace Rcpp;
#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::export]]
NumericMatrix outerProd(NumericVector v1, NumericVector v2) {
NumericVector xx(v1);
NumericVector yy(v2);
const Eigen::Map<Eigen::VectorXd> x(as<Eigen::Map<Eigen::VectorXd> >(xx));
const Eigen::Map<Eigen::VectorXd> y(as<Eigen::Map<Eigen::VectorXd> >(yy));
Eigen::MatrixXd op = x * y.transpose();
return Rcpp::wrap(op);
}
So, to summarize my question: is it possible to make an infix function available through sourceCpp?
Is it possible to make an infix function available through sourceCpp?
Yes.
As always, one should read the Rcpp vignettes!
In particular here, if you look in Section 1.6 of the Rcpp attributes vignette, you'd see you can modify the name of a function using the name parameter for Rcpp::export.
For example, we could do:
#include <Rcpp.h>
// [[Rcpp::export(name = `%+%`)]]
Rcpp::NumericVector add(Rcpp::NumericVector x, Rcpp::NumericVector y) {
return x + y;
}
/*** R
1:3 %+% 4:6
*/
Then we'd get:
Rcpp::sourceCpp("~/infix-test.cpp")
> 1:3 %+% 4:6
[1] 5 7 9
So you still have to give C++ functions valid C++ names in the code, but you can export them to R under a different name through the name parameter of Rcpp::export, without having to do anything further on the R side.
John Chambers states three principles on page four of the (highly recommended) "Extending R" book:
Everything that exists in R is an object.
Everything that happens in R is a function call.
Interfaces to other software are part of R.
So, per point two, you can of course use sourceCpp() to create a compiled function and bind it to any infix operator you like.
Code Example
library(Rcpp)
cppFunction("std::string cc(std::string a, std::string b) { return a+b; }")
`%+%` <- function(a,b) cc(a,b)
cc("Hello", "World")
"hello" %+% "world"
Output
R> library(Rcpp)
R> cppFunction("std::string cc(std::string a, std::string b) { return a+b; }")
R> `%+%` <- function(a,b) cc(a,b)
R>
R> cc("Hello", "World")
[1] "HelloWorld"
R>
R> "hello" %+% "world"
[1] "helloworld"
R>
Summary
Rcpp is really just one cog in the machinery.
Edit
It also works with your initial function, with some minor simplification. For
`%op%` <- cppFunction("Eigen::MatrixXd op(Eigen::VectorXd x, Eigen::VectorXd y) { Eigen::MatrixXd op = x * y.transpose(); return op; }", depends="RcppEigen")
as.numeric(1:3) %op% as.numeric(3:1)
we get
R> `%op%` <- cppFunction("Eigen::MatrixXd op(Eigen::VectorXd x, Eigen::VectorXd y) { Eigen::MatrixXd op = x * y.transpose(); return op; }", depends="RcppEigen")
R> as.numeric(1:3) %op% as.numeric(3:1)
[,1] [,2] [,3]
[1,] 3 2 1
[2,] 6 4 2
[3,] 9 6 3
R>
(modulo some line noise from the compiler).
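What `x * y.transpose()` computes can be written out as a plain double loop. This is a sketch without Eigen that stores the result row by row in a `std::vector<std::vector<double>>`; the name `outer` is hypothetical. With the inputs from the example (1:3 and 3:1) it reproduces the matrix shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Outer product: result(i, j) = x[i] * y[j], like Eigen's x * y.transpose().
std::vector<std::vector<double>> outer(const std::vector<double>& x,
                                       const std::vector<double>& y) {
    std::vector<std::vector<double>> m(x.size(), std::vector<double>(y.size()));
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < y.size(); ++j)
            m[i][j] = x[i] * y[j];
    return m;
}
```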

Boolean AND and OR selected rows/columns in R, without creation of a temporary copy?

I have an extremely large matrix full of boolean TRUEs and FALSEs. I need to check certain column combinations to find rows where either all of the specified columns are true, or (in some cases) any of the specified columns are true.
I can do it using apply() and all():
> toymat <- matrix(sample(c(F,T),50,rep=T),5,10)
> toymat[,c(1,5,6)]
[,1] [,2] [,3]
[1,] TRUE FALSE FALSE
[2,] FALSE FALSE TRUE
[3,] TRUE FALSE FALSE
[4,] TRUE TRUE FALSE
[5,] FALSE FALSE TRUE
> apply(toymat[, c(1,5,6)],1,all)
[1] FALSE FALSE FALSE FALSE FALSE
But if I invoke apply with a function that would change a value, it seems to be passing by value, not passing by reference. In other words it's creating a temporary copy of "toymat[, c(1,5,6)]" to run apply on (which would not be desirable, because the actual matrix is huge and the code will be doing this many times).
Is there a way I can AND or OR together an arbitrary number of selected columns or selected rows without a temporary copy being created?
This is a perfect use case for Rcpp. Just use:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector rowsums_bool(const LogicalMatrix& x,
const IntegerVector& ind_col) {
int i, j, j2, n = x.nrow(), m = ind_col.size();
IntegerVector res(n);
for (j = 0; j < m; j++) {
j2 = ind_col[j] - 1;
for (i = 0; i < n; i++) {
if (x(i, j2)) res[i]++;
}
}
return res;
}
/*** R
toymat <- matrix(sample(c(F,T),50,rep=T),5,10)
toymat[,c(1,5,6)]
(tmp <- rowsums_bool(toymat, c(1,5,6)))
tmp == 3 ## ALL
tmp != 0 ## ANY
*/
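The counting idea in `rowsums_bool` does not depend on Rcpp. Here is a standard-C++ sketch assuming the matrix is stored column-major in one `std::vector<int>` (as R stores a logical matrix) and that column indices are 1-based as in R; `rows_all` and `rows_any` are hypothetical helpers that turn the counts into the ALL and ANY answers. Only the count vector is allocated; the selected columns are read in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count, for each row, how many of the selected columns are non-zero (TRUE).
// x is an n-row matrix stored column-major; cols holds 1-based column indices.
std::vector<int> rowsums_bool(const std::vector<int>& x, std::size_t n,
                              const std::vector<int>& cols) {
    std::vector<int> res(n, 0);
    for (int c : cols) {
        std::size_t offset = static_cast<std::size_t>(c - 1) * n;  // start of column c
        for (std::size_t i = 0; i < n; ++i)
            if (x[offset + i]) ++res[i];
    }
    return res;
}

// ALL: every selected column is TRUE in that row.
std::vector<bool> rows_all(const std::vector<int>& counts, std::size_t ncols) {
    std::vector<bool> out;
    for (int k : counts) out.push_back(static_cast<std::size_t>(k) == ncols);
    return out;
}

// ANY: at least one selected column is TRUE in that row.
std::vector<bool> rows_any(const std::vector<int>& counts) {
    std::vector<bool> out;
    for (int k : counts) out.push_back(k != 0);
    return out;
}
```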

rcpp: how to apply gamma function to a scalar?

I guess the function gamma only works on a vector input. Is there a way to apply it to a scalar, say, gamma(3)?
Actually, I get the correct output if I include gamma(3) as part of my code, but there is a warning message.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List fool(NumericVector vec){
double res = 0;
res = sum(gamma(vec)) + gamma(3);
List result;result["num"] = res;
return result;
}
Here is the warning message:
exp.cpp:7:27: warning: 'gamma' is deprecated: first deprecated in OS X 10.9 [-Wdeprecated-declarations]
res = sum(gamma(vec)) + gamma(3);
^
/usr/include/math.h:720:15: note: 'gamma' has been explicitly marked deprecated here
extern double gamma(double) __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_0, __MAC_10_9, __IPHONE_NA, __IPHONE_NA);
^
1 warning generated.
Thanks for posting code. You fell victim to being careless with namespaces. There is a (vectorized) gamma() in the Rcpp namespace, which is what you want for your first argument, and there is (or was) a scalar gamma() (preferred: tgamma()) in the C math library. It is better to be explicit.
Corrected code below:
#include <Rcpp.h>
// [[Rcpp::export]]
double fool(Rcpp::NumericVector vec){
double res =
Rcpp::sum(Rcpp::gamma(vec)) + // Rcpp sugar sum() and gamma() on vector
::tgamma(3.0); // math library tgamma of double
return res;
}
/*** R
v <- 1:5
fool(v)
sum(gamma(v)) + gamma(3)
*/
Output
R> sourceCpp("/tmp/shuang.cpp")
R> v <- 1:5
R> fool(v)
[1] 36
R> sum(gamma(v)) + gamma(3)
[1] 36
R>
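The same split between a vectorized and a scalar gamma can be sketched in standard C++ without Rcpp: `std::tgamma` from `<cmath>` is the scalar function, and the element-wise loop plays the role of the sugar `gamma()` plus `sum()`. The name `fool_plain` is hypothetical. For v = 1:5 it gives 1 + 1 + 2 + 6 + 24 + 2 = 36, matching the output above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sum of gamma over a vector plus the scalar gamma(3), using std::tgamma throughout.
double fool_plain(const std::vector<double>& vec) {
    double res = 0.0;
    for (double v : vec) res += std::tgamma(v);  // element-wise, like Rcpp::sum(Rcpp::gamma(vec))
    return res + std::tgamma(3.0);                // scalar gamma(3) = 2! = 2
}
```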

Cumany function applied on NA values

I have the following vector:
x <- c(FALSE,FALSE,NA,TRUE,FALSE)
I use the cumany() function to see if there is at least one TRUE value within a window of the first element up to each element in the vector or in other words in the window [1, 1:length(x)].
library(dplyr)
cumany(x)
[1] FALSE FALSE NA NA NA
The output surprises me. I would expect the cumany function to work as follows:
for(i in 1:length(x)){
print(any(x[1:i]))
}
Therefore I would expect the following output:
[1] FALSE FALSE NA TRUE TRUE
How is the cumany() function defined when it comes to NA values?
Update:
This was a bug in previous dplyr versions and has been corrected. Just update the package if you have the same problem.
To answer how cumany() is implemented, we need to look at its C++ source.
As you can see below, the output vector is initialized with NAs and each FALSE is carried forward with
out[i] = current || out[i - 1];
The loop, however, stops at the first NA, so in this version every element after an NA stays NA even when a TRUE follows; that is exactly the output in the question.
There is a brief discussion about expected behaviour on GitHub.
If your result is different from what you expected, then there is a high chance that you need to update the dplyr package.
For more implementation details, see the code below:
LogicalVector cumany(LogicalVector x) {
int n = x.length();
LogicalVector out(n, NA_LOGICAL);
int current = out[0] = x[0];
if (current == NA_LOGICAL) return out;
if (current == TRUE) {
std::fill(out.begin(), out.end(), TRUE);
return out;
}
for (int i = 1; i < n; i++) {
current = x[i];
if (current == NA_LOGICAL) break;
if (current == TRUE) {
std::fill(out.begin() + i, out.end(), TRUE);
break;
}
out[i] = current || out[i - 1];
}
return out;
}
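For comparison with the loop above, which stops at the first NA, here is a plain C++ sketch of the accumulation the question expects: once a TRUE is seen every later element is TRUE, an NA before any TRUE makes the running result NA, and otherwise it stays FALSE (three-valued OR, as R's any() does). It assumes logicals are stored as `int` with `INT_MIN` as the NA marker, which is how R represents NA_LOGICAL; `cumany_expected` and `LGL_NA` are hypothetical names.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

const int LGL_NA = INT_MIN;  // assumption: mirrors R's NA_LOGICAL

// Cumulative three-valued OR: TRUE dominates, then NA, then FALSE.
std::vector<int> cumany_expected(const std::vector<int>& x) {
    std::vector<int> out(x.size());
    int acc = 0;  // running result, starts FALSE
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (acc != 1) {
            if (x[i] == LGL_NA) acc = LGL_NA;  // NA, unless a TRUE was already seen
            else if (x[i] != 0) acc = 1;       // any TRUE fixes the result to TRUE
        }
        out[i] = acc;
    }
    return out;
}
```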
One option is to replace the NA with FALSE, apply cumany(), and then OR the result with x (using |) to restore the NA at its original position:
cumany(replace(x, is.na(x), FALSE))|x
#[1] FALSE FALSE NA TRUE TRUE
To do the same in base R only:
Reduce(any, x, accumulate = TRUE) | x
#> [1] FALSE FALSE NA TRUE TRUE

Efficient subsetting in Rcpp (equivalent of the R "which" command)

In Rcpp, there are various "Rcpp sugar" commands that permit nice vectorised operations in the code. In the code below I move across a data frame, break it into vectors, then use the "ifelse" and "sum" sugar commands to compute the mean of v over the rows where x equals either y or y+1. All seems to work correctly.
Just wondering if there is a neater way than this - e.g. an equivalent of the "which" command that gives index points satisfying a particular condition? There seems to be a facility available as "find" in Armadillo but that means using incompatible object types (you can't use "find" and "ifelse" together).
On the same topic, is it possible to get "ifelse" to accept a compound logical condition? In the example below, for instance, the definition of indic is formed of two "ifelse" commands, and it would obviously be cleaner as one. Any thoughts would be much appreciated.
Look forward to hearing your responses :)
require(Rcpp)
require(inline)
set.seed(42)
df = data.frame(x = rpois(1000,3), y = rpois(1000,3), v = rnorm(1000),
stringsAsFactors=FALSE)
myfunc1 = cxxfunction(
signature(DF = "data.frame"),
plugin = "Rcpp",
body = '
using namespace Rcpp;
DataFrame df(DF);
IntegerVector x = df["x"];
IntegerVector y = df["y"];
NumericVector v = df["v"];
LogicalVector indic = ifelse(x==y,true,ifelse(x==y+1,true,false));
double subsum = sum(ifelse(indic,v,0));
int subsize = sum(indic);
double mn = ((subsize>0) ? subsum/subsize : 0.0);
return(Rcpp::List::create(_["subsize"] = subsize,
_["submean"] = mn
));
'
)
myfunc1(df)
### OUTPUT:
#
# $subsize
# [1] 300
#
# $submean
# [1] 0.1091555
#
Rcpp (>= 0.10.0) implements the | operator between two logical sugar expressions. So you can do:
require( Rcpp )
cppFunction( code = '
List subsum( IntegerVector x, IntegerVector y, NumericVector v){
using namespace Rcpp ;
LogicalVector indic = (x==y) | (x==y+1) ;
int subsize = sum(indic) ;
double submean = subsize == 0 ? 0.0 : sum(ifelse(indic,v,0)) / subsize ;
return List::create( _["subsize"] = subsize, _["submean"] = submean ) ;
}
' )
subsum( rpois(1000,3), rpois(1000,3), rnorm(1000) )
# $subsize
# [1] 320
#
# $submean
# [1] -0.05708866
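On the which() part of the question: the indices where a condition holds can be collected directly in a loop. Here is a standard-C++ sketch (no Rcpp; `which_match` and `subset_mean` are hypothetical names) that gathers 0-based indices where `x == y` or `x == y + 1`, then averages `v` over just those positions, returning 0.0 when nothing matches as the question's code does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Indices i (0-based) where x[i] == y[i] or x[i] == y[i] + 1, like R's which().
std::vector<std::size_t> which_match(const std::vector<int>& x, const std::vector<int>& y) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < x.size() && i < y.size(); ++i)
        if (x[i] == y[i] || x[i] == y[i] + 1) idx.push_back(i);
    return idx;
}

// Mean of v over the given indices; 0.0 when there are none.
double subset_mean(const std::vector<double>& v, const std::vector<std::size_t>& idx) {
    if (idx.empty()) return 0.0;
    double s = 0.0;
    for (std::size_t i : idx) s += v[i];
    return s / static_cast<double>(idx.size());
}
```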

Resources