Retrieve object from a list Rcpp - r

I'm new to Rcpp and I'm struggling with it. I have a function that returns a list with 2 objects: max and argmax from a vector. I would like to retrieve only max or only argmax from that list in another function. How can I do that?
Below an example:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List max_argmax_cpp(NumericVector x){
double max = x[0];
int argmax = 0 + 1;
for(int i = 1; i < x.length(); i++){
if(x[i]>x[i-1]){
max = x[i];
argmax = i+1;
}
}
List Output;
Output["Max"] = max;
Output["Argmax"] = argmax;
return(Output);
}
// [[Rcpp::export]]
int max_only(NumericVector x){
int max = **only max from max_argmax_cpp(x)**;
return(max);
}

In your second example you can simply call your original function and assign the result to a List, whose elements can then be retrieved by name (or position):
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List max_argmax_cpp(NumericVector x){
double max = x[0];
int argmax = 0 + 1;
for(int i = 1; i < x.length(); i++){
if(x[i]>x[i-1]){
max = x[i];
argmax = i+1;
}
}
List Output;
Output["Max"] = max;
Output["Argmax"] = argmax;
return(Output);
}
// [[Rcpp::export]]
double max_only(NumericVector x){
List l = max_argmax_cpp(x);
double max = l["Max"];
return(max);
}
/*** R
set.seed(42)
x <- runif(100)
max_argmax_cpp(x)
max_only(x)
*/
Output:
> set.seed(42)
> x <- runif(100)
> max_argmax_cpp(x)
$Max
[1] 0.7439746
$Argmax
[1] 99
> max_only(x)
[1] 0.7439746
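One thing to note about the output above: the loop compares each element with its predecessor (x[i] > x[i-1]) rather than with the running maximum, so it reports the last position at which the vector increases, which is not necessarily the overall maximum. Below is a minimal standalone sketch of the intended comparison. It uses std::vector in place of NumericVector so that it compiles without R, and the MaxResult struct and function name are made up for illustration:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type standing in for the named List.
struct MaxResult {
    double max;
    int argmax;  // 1-based, matching the R convention used above
};

// Assumes x is non-empty, as the original code does.
MaxResult max_argmax(const std::vector<double>& x) {
    MaxResult out{x[0], 1};
    for (std::size_t i = 1; i < x.size(); ++i) {
        if (x[i] > out.max) {  // compare with the running maximum
            out.max = x[i];
            out.argmax = static_cast<int>(i) + 1;
        }
    }
    return out;
}
```

In the Rcpp version the equivalent change is to replace x[i]>x[i-1] with x[i]>max.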

Related

Allocate Rcpp List of n NumericMatrix

Is there a way to allocate an Rcpp List of length n, where each element of the List will be filled with a NumericMatrix, but the size of each NumericMatrix can change?
I have an idea for doing this using std::list and push_back(), but the size of the list may be quite large and I want to avoid the overhead of creating an extra copy of the list when I return from the function.
The below R code gives an idea of what I hope to do:
myvec = function(n) {
x = vector("list", n)
for (i in seq_len(n)) {
nc = sample(1:3, 1)
nr = sample(1:3, 1)
x[[i]] = matrix(rbinom(nc * nr, size = 1, prob = 0.5),
nrow = nr, ncol = nc)
}
x
}
This could result in something like:
> myvec(2)
[[1]]
     [,1]
[1,]    0
[2,]    1

[[2]]
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    0    1    1
Update: based on the comments of @Dirk and @Ralf, I created functions based on Rcpp::List and std::list with a wrap at the end. Speed comparisons don't seem to favor one version over the other, but perhaps there's an inefficiency I'm not aware of.
src = '
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::List myvec(int n) {
Rcpp::RNGScope rngScope;
Rcpp::List x(n);
// Rcpp::IntegerVector choices = {1, 2 ,3};
Rcpp::IntegerVector choices = Rcpp::seq_len(50);
for (int i = 0; i < n; ++i) {
int nc = Rcpp::sample(choices, 1).at(0);
int nr = Rcpp::sample(choices, 1).at(0);
Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
x(i) = Rcpp::NumericMatrix(nc, nr, entries.begin());
}
return x;
}
// [[Rcpp::export]]
Rcpp::List myvec2(int n) {
Rcpp::RNGScope scope;
std::list< Rcpp::NumericMatrix > x;
// Rcpp::IntegerVector choices = {1, 2 ,3};
Rcpp::IntegerVector choices = Rcpp::seq_len(50);
for (int i = 0; i < n; ++i) {
int nc = Rcpp::sample(choices, 1).at(0);
int nr = Rcpp::sample(choices, 1).at(0);
Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
x.push_back( Rcpp::NumericMatrix(nc, nr, entries.begin()));
}
return Rcpp::wrap(x);
}
'
sourceCpp(code = src)
Resulting benchmarks on my computer are:
> library(microbenchmark)
> rcpp_list = function() {
+ set.seed(10);myvec(105)
+ }
> std_list = function() {
+ set.seed(10);myvec2(105)
+ }
> microbenchmark(rcpp_list(), std_list(), times = 1000)
Unit: milliseconds
        expr    min      lq     mean  median      uq    max neval cld
 rcpp_list() 1.8901 1.92535 2.205286 1.96640 2.22380 7.1569  1000   a
  std_list() 1.9164 1.95570 2.224941 2.00555 2.32315 7.1194  1000   a
The fundamental issue is that Rcpp objects are R objects governed by R's memory management, where resizing is expensive because it makes full copies.
So when I have tasks similar to yours where sizes may change, or are unknown, I often work with different data structures -- the STL gives us plenty -- and only convert to R(cpp) at the return step at the end.
The devil is in the details here (as always). Profile, experiment, ...
Edit: And in the narrower sense of "can we return a List of NumericMatrix objects with varying sizes" the answer is of course we can because that is what List objects do. You can also insert other types.
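A rough sketch of that pattern in plain C++, with a hypothetical Matrix struct standing in for NumericMatrix: grow an STL container while the sizes are unknown, and convert once at the end. In Rcpp that last step would be the Rcpp::wrap call, which is left out here so that the sketch compiles without R.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a matrix; values stored column-major as in R.
struct Matrix {
    std::size_t nrow;
    std::size_t ncol;
    std::vector<double> values;
};

// Builds matrices whose sizes are only known inside the loop.
// The (i % 3) + 1 sizes are arbitrary placeholders for the random draws.
std::vector<Matrix> build_matrices(std::size_t n) {
    std::vector<Matrix> out;
    out.reserve(n);  // n is known up front, so no reallocation is needed
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t nr = (i % 3) + 1;
        std::size_t nc = ((i + 1) % 3) + 1;
        Matrix m{nr, nc, std::vector<double>(nr * nc, 0.0)};
        out.push_back(std::move(m));  // move avoids copying the buffer
    }
    return out;
}
```

When the final length is not known in advance, the reserve call can simply be dropped; push_back then grows the container as needed.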
As Dirk said, it is of course possible to create a list with matrices of different sizes. To make it a bit more concrete, here is a translation of your R function:
#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
Rcpp::List myvec(int n) {
Rcpp::List x(n);
Rcpp::IntegerVector choices = {1, 2 ,3};
for (int i = 0; i < n; ++i) {
int nc = Rcpp::sample(choices, 1).at(0);
int nr = Rcpp::sample(choices, 1).at(0);
Rcpp::NumericVector entries = Rcpp::rbinom(nc * nr, 1, 0.5);
// the constructor takes (rows, cols, iterator), matching nrow = nr, ncol = nc in the R code
x(i) = Rcpp::NumericMatrix(nr, nc, entries.begin());
}
return x;
}
/***R
myvec(2)
*/
The main difference from the R code is the explicitly named vectors choices and entries, which are only implicit in the R code.

Convert individual Rcpp::IntegerVector element to a character

I have to convert individual elements of Rcpp::IntegerVector into their string form so I can add another string to them. My code looks like this:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
Rcpp::String int_to_char_single_fun(int x){
// Obtain environment containing function
Rcpp::Environment base("package:base");
// Make function callable from C++
Rcpp::Function int_to_string = base["as.character"];
// Call the function and receive its list output
Rcpp::String res = int_to_string(Rcpp::_["x"] = x); // example of original param
// Return test object in list structure
return (res);
}
//[[Rcpp::export]]
Rcpp::CharacterVector add_chars_to_int(Rcpp::IntegerVector x){
int n = x.size();
Rcpp::CharacterVector BASEL_SEG(n);
for(int i = 0; i < n; i++){
BASEL_SEG[i] = "B0" + int_to_char_single_fun(x[i]);
}
return BASEL_SEG;
}
/*** R
int_vec <- as.integer(c(1,2,3,4,5))
BASEL_SEG_char <- add_chars_to_int(int_vec)
*/
I get the following error:
no match for 'operator+'(operand types are 'const char[3]' and 'Rcpp::String')
I cannot import any C++ libraries like Boost and can only use Rcpp functionality. How do I prepend a string to an integer here in Rcpp?
We basically covered this over at the Rcpp Gallery when we covered Boost in an example for lexical_cast (though that one went the other way). So rewriting it quickly yields this:
Code
// We can now use the BH package
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/lexical_cast.hpp>
using namespace Rcpp;
using boost::lexical_cast;
using boost::bad_lexical_cast;
// [[Rcpp::export]]
std::vector<std::string> lexicalCast(std::vector<int> v) {
std::vector<std::string> res(v.size());
for (unsigned int i=0; i<v.size(); i++) {
try {
res[i] = lexical_cast<std::string>(v[i]);
} catch(bad_lexical_cast &) {
res[i] = "(failed)";
}
}
return res;
}
/*** R
lexicalCast(c(42L, 101L))
*/
Output
R> Rcpp::sourceCpp("/tmp/lexcast.cpp")
R> lexicalCast(c(42L, 101L))
[1] "42" "101"
R>
Alternatives
Because converting numbers to strings is as old as computing itself you could also use:
itoa()
snprintf()
streams
and probably a few more I keep forgetting.
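To make two of those alternatives concrete, here is a small sketch that needs only the standard library. The function names and the "B0" default prefix are illustrative. In an Rcpp function the returned std::string can be assigned to a CharacterVector element.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// snprintf variant: format the integer into a fixed buffer.
std::string prefix_snprintf(int x, const char* prefix = "B0") {
    char buf[64];
    // snprintf never writes more than sizeof(buf) bytes, including the NUL.
    std::snprintf(buf, sizeof(buf), "%s%d", prefix, x);
    return std::string(buf);
}

// Stream variant: operator<< does the integer-to-text conversion.
std::string prefix_stream(int x, const std::string& prefix = "B0") {
    std::ostringstream oss;
    oss << prefix << x;
    return oss.str();
}
```

For example, prefix_snprintf(5) and prefix_stream(5) both yield "B05".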
As others have pointed out, there are several ways to do this. Here are two very straightforward approaches.
1. std::to_string
// [[Rcpp::export]]
Rcpp::CharacterVector add_chars_to_int1(Rcpp::IntegerVector x){
int n = x.size();
Rcpp::CharacterVector BASEL_SEG(n);
for(int i = 0; i < n; i++){
BASEL_SEG[i] = "B0" + std::to_string(x[i]);
}
return BASEL_SEG;
}
2. Creating a new Rcpp::CharacterVector
// [[Rcpp::export]]
Rcpp::CharacterVector add_chars_to_int2(Rcpp::IntegerVector x){
int n = x.size();
Rcpp::CharacterVector BASEL_SEG(n);
Rcpp::CharacterVector myIntToStr(x.begin(), x.end());
for(int i = 0; i < n; i++){
BASEL_SEG[i] = "B0" + myIntToStr[i];
}
return BASEL_SEG;
}
Calling them:
add_chars_to_int1(int_vec) ## using std::to_string
[1] "B01" "B02" "B03" "B04" "B05"
add_chars_to_int2(int_vec) ## converting to CharacterVector
[1] "B01" "B02" "B03" "B04" "B05"

Segmentation fault when using Rcpp/Armadillo and OpenMP parallel with user-defined function

I was trying to use Rcpp/Armadillo with OpenMP to speed up a loop in R. The loop takes as input a matrix in which each row contains indices into a location vector (or matrix, for 2D locations), along with other matrices and vectors. Inside the loop, I extract each row of the index matrix, find the corresponding locations, calculate the distance matrix and the covariance matrix, do a Cholesky decomposition and backsolve, and save the backsolve results to a new matrix. Here is the Rcpp code:
`#include <iostream>
#include <RcppArmadillo.h>
#include <omp.h>
#include <Rcpp.h>
// [[Rcpp::plugins(openmp)]]
using namespace Rcpp;
using namespace arma;
using namespace std;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
mat NZentries_new2 (int m, int nnp, const mat& locs, const umat& revNNarray, const mat& revCondOnLatent, const vec& nuggets, const vec covparms){
// initialized the output matrix
mat Lentries=zeros(nnp,m+1);
// initialized objects in parallel part
int n0; //number of !is_na elements
uvec inds;//
vec revCon_row;//
uvec inds00;//
vec nug;//
mat covmat;//
vec onevec;//
vec M;//
mat dist;//
int k;//
omp_set_num_threads(2);// selects the number of cores to use.
#pragma omp parallel for shared(locs,revNNarray,revCondOnLatent,nuggets,nnp,m,Lentries) private(k,M,dist,onevec,covmat,nug,n0,inds,revCon_row,inds00) default(none) schedule(static)
for (k = 0; k < nnp; k++) {
// extract a row to work with
inds=revNNarray.row(k).t();
revCon_row=revCondOnLatent.row(k).t();
if (k < m){
n0=k+1;
} else {
n0=m+1;
}
// extract locations
inds00=inds(span(m+1-n0,m))-ones<uvec>(n0);
nug=nuggets.elem(inds00) % (ones(n0)-revCon_row(span(m+1-n0,m))); // vec is vec, cannot convert to mat
dist=calcPWD2(locs.rows(inds00));
#pragma omp critical
{
//calculate covariance matrix
covmat= MaternFun(dist,covparms) + diagmat(nug) ; // summation from arma
}
// get last row of inverse Cholesky
onevec = zeros(n0);
onevec[n0-1] = 1;
M=solve(chol(covmat,"upper"),onevec);
// save the entries to matrix
Lentries(k,span(0,n0-1)) = M.t();
}
return Lentries;
}`
The current version works but is slow (almost the same as the non-parallel version). If I take the line out of the omp critical block, it causes a segmentation fault and R crashes. MaternFun is a function I defined, shown below with several other small functions. So my question is: why does MaternFun have to stay in the critical section?
// [[Rcpp::export]]
mat MaternFun( mat distmat, vec covparms ){
int d1 = distmat.n_rows;
int d2 = distmat.n_cols;
int j1;
int j2;
mat covmat(d1,d2);
double scaledist;
double normcon = covparms(0)/(pow(2.0,covparms(2)-1)*Rf_gammafn(covparms(2)));
for (j1 = 0; j1 < d1; j1++){
for (j2 = 0; j2 < d2; j2++){
if ( distmat(j1,j2) == 0 ){
covmat(j1,j2) = covparms(0);
} else {
scaledist = distmat(j1,j2)/covparms(1);
covmat(j1,j2) = normcon*pow( scaledist, covparms(2) )*
Rf_bessel_k(scaledist,covparms(2),1.0);
}
}
}
return covmat;
}
// [[Rcpp::export]]
double dist2(double lat1,double long1,double lat2,double long2) {
double dist = sqrt(pow(lat1 - lat2, 2) + pow(long1 - long2, 2)) ;
return (dist) ;
}
// [[Rcpp::export]]
mat calcPWD2( mat x) {//Rcpp::NumericMatrix
int outrows = x.n_rows ;
int outcols = x.n_rows ;
mat out(outrows, outcols) ;
for (int arow = 0 ; arow < outrows ; arow++) {
for (int acol = 0 ; acol < outcols ; acol++) {
out(arow, acol) = dist2(x(arow, 0),x(arow, 1),
x(acol, 0),x(acol, 1)) ; //extract element from mat
}
}
return (out) ;
}
Here are some sample inputs for testing MaternFun in R:
library(fields)
distmat=rdist(1:5) # distance matrix
covparms=c(1,0.2,1.5)
The issue is that MaternFun makes two calls to R math functions (Rf_bessel_k and Rf_gammafn), and these must be called from a single thread rather than in parallel.
To get around this, let's add a dependency on Boost via the BH package to obtain the cyl_bessel_k and tgamma functions. Alternatively, there is always the option of reimplementing R's besselK and gamma in C++ so that the code does not rely on the single-threaded R variants.
This gives:
#include <Rcpp.h>
#include <boost/math/special_functions/bessel.hpp>
#include <boost/math/special_functions/gamma.hpp>
// [[Rcpp::depends(BH)]]
// [[Rcpp::export]]
double besselK_boost(double x, double v) {
return boost::math::cyl_bessel_k(v, x);
}
// [[Rcpp::export]]
double gamma_fn_boost(double x) {
return boost::math::tgamma(x);
}
Test Code
x0 = 9.536743e-07
nu = -10
all.equal(besselK(x0, nu), besselK_boost(x0, nu))
# [1] TRUE
x = 2
all.equal(gamma(x), gamma_fn_boost(x))
# [1] TRUE
Note: The order of parameters for boost's variant differs from R's:
cyl_bessel_k(v, x)
Rf_bessel_k(x, v, expon.scaled = FALSE)
From here, we can modify MaternFun. Unfortunately, because calcPWD2 is missing, the furthest we can go is switching to Boost and adding the OpenMP include guards.
#include <RcppArmadillo.h>
#include <boost/math/special_functions/bessel.hpp>
#include <boost/math/special_functions/gamma.hpp>
#ifdef _OPENMP
#include <omp.h>
#endif
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(BH)]]
// [[Rcpp::plugins(openmp)]]
// [[Rcpp::export]]
arma::mat MaternFun(arma::mat distmat, arma::vec covparms) {
int d1 = distmat.n_rows;
int d2 = distmat.n_cols;
int j1;
int j2;
arma::mat covmat(d1,d2);
double scaledist;
double normcon = covparms(0) /
(pow(2.0, covparms(2) - 1) * boost::math::tgamma(covparms(2)));
for (j1 = 0; j1 < d1; ++j1){
for (j2 = 0; j2 < d2; ++j2){
if ( distmat(j1, j2) == 0 ){
covmat(j1, j2) = covparms(0);
} else {
scaledist = distmat(j1, j2)/covparms(1);
covmat(j1, j2) = normcon * pow( scaledist, covparms(2) ) *
boost::math::cyl_bessel_k(covparms(2), scaledist);
}
}
}
return covmat;
}
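Separately from the Boost switch, the long private(...) clause in the question can be avoided by declaring temporaries inside the loop body, since variables declared there are private to each iteration. The sketch below is an illustration under stated assumptions, not a rewrite of NZentries_new2: it uses std::vector instead of Armadillo and only computes the Matérn normalising constant with std::tgamma, which is assumed here to be safe to call from several threads. The function name is hypothetical. Without -fopenmp the pragma is ignored and the loop simply runs serially.

```cpp
#include <cmath>
#include <vector>

// Computes sigma2 / (2^(nu-1) * Gamma(nu)) for each smoothness value nu.
// Hypothetical helper, loosely modelled on normcon in MaternFun.
std::vector<double> matern_normcon(double sigma2, const std::vector<double>& nus) {
    const int n = static_cast<int>(nus.size());
    std::vector<double> out(nus.size());
    #pragma omp parallel for schedule(static)
    for (int k = 0; k < n; ++k) {
        // Declared inside the loop, so each thread has its own copy.
        const double nu = nus[k];
        const double denom = std::pow(2.0, nu - 1.0) * std::tgamma(nu);
        out[k] = sigma2 / denom;  // each iteration writes a distinct element
    }
    return out;
}
```

The same layout applies to the Armadillo temporaries (inds, nug, covmat and so on): declared inside the for body, they need no private clause.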

Warning when downcasting in Rcpp?

I have an Rcpp function, toInt, that should take an IntegerVector as input. I want to use it on vectors of integers, but also on vectors of doubles whose values are whole numbers (e.g. 1:4 is of type integer but 1:4 + 1 is of type double).
Yet, when it is used on genuinely fractional numbers (e.g. 1.5), I would like it to raise a warning or an error instead of silently truncating all values to make them integers.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector toInt(RObject x) {
return as<IntegerVector>(x);
}
> toInt(c(1.5, 2.4)) # I would like a warning
[1] 1 2
> toInt(1:2 + 1) # No need of warning
[1] 2 3
Rcpp sugar has all you need. Here is one possible implementation:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector fprive(const RObject & x) {
NumericVector nv(x);
IntegerVector iv(x);
if (is_true(any(nv != NumericVector(iv)))) warning("Uh-oh");
return(iv);
}
/*** R
fprive(c(1.5, 2))
fprive(c(1L, 2L))
*/
Its output is as follows:
R> Rcpp::sourceCpp('/tmp/fprive.cpp')
R> fprive(c(1.5, 2))
[1] 1 2
R> fprive(c(1L, 2L))
[1] 1 2
Warning message:
In fprive(c(1.5, 2)) : Uh-oh
R>
Because it is a warning object, you can control via options("warn") whether you want to abort, print immediately, print at end, ignore, ...
The first solution I thought of was:
// [[Rcpp::export]]
IntegerVector toInt2(const NumericVector& x) {
for (int i = 0; i < x.size(); i++) {
if (x[i] != (int)x[i]) {
warning("Uh-oh");
break;
}
}
return as<IntegerVector>(x);
}
but I wondered if there wasn't an unnecessary copy when x was an IntegerVector, so I made this other solution:
// [[Rcpp::export]]
IntegerVector toInt3(const RObject& x) {
NumericVector nv(x);
for (int i = 0; i < nv.size(); i++) {
if (nv[i] != (int)nv[i]) {
warning("Uh-oh");
break;
}
}
return as<IntegerVector>(x);
}
But maybe the best solution would be to test whether the RObject is already of type int, and otherwise to fill the resulting vector while checking the values:
// [[Rcpp::export]]
SEXP toInt4(const RObject& x) {
if (TYPEOF(x) == INTSXP) return x;
NumericVector nv(x);
int i, n = nv.size();
IntegerVector res(n);
for (i = 0; i < n; i++) {
res[i] = nv[i];
if (nv[i] != res[i]) {
warning("Uh-oh");
break;
}
}
for (; i < n; i++) res[i] = nv[i];
return res;
}
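One caveat with the double-to-int conversions above: converting a double that is NaN or outside the int range is undefined behaviour in C++, so it can help to test the value before casting. A minimal sketch of such a check follows. The helper name is made up, and it does not treat R's NA values specially.

```cpp
#include <climits>
#include <cmath>

// True when v has no fractional part and fits in an int.
// NaN and infinities fail the range test and return false.
bool is_int_valued(double v) {
    if (!(v >= INT_MIN && v <= INT_MAX)) return false;  // also rejects NaN
    return std::trunc(v) == v;
}
```

A loop such as the one in toInt4 could call this before assigning to res[i] and warn when it returns false.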
Some benchmarking:
x <- seq_len(1e7)
x2 <- x; x2[1] <- 1.5
x3 <- x; x3[length(x3)] <- 1.5
microbenchmark::microbenchmark(
fprive(x), toInt2(x), toInt3(x), toInt4(x),
fprive(x2), toInt2(x2), toInt3(x2), toInt4(x2),
fprive(x3), toInt2(x3), toInt3(x3), toInt4(x3),
times = 20
)
Unit: microseconds
expr min lq mean median uq max neval
fprive(x) 229865.629 233539.952 236049.68870 235623.390 238500.4335 244608.276 20
toInt2(x) 98249.764 99520.233 102026.44305 100468.627 103480.8695 114144.022 20
toInt3(x) 50631.512 50838.560 52307.34400 51417.296 52524.0260 58311.909 20
toInt4(x) 1.165 6.955 46.63055 10.068 11.0755 766.022 20
fprive(x2) 63134.534 64026.846 66004.90820 65079.292 66674.4835 74907.065 20
toInt2(x2) 43073.288 43435.478 44068.28935 43990.455 44528.1800 45745.834 20
toInt3(x2) 42968.743 43461.838 44268.58785 43682.224 44235.6860 51906.093 20
toInt4(x2) 19379.401 19640.198 20091.04150 19918.388 20232.4565 21756.032 20
fprive(x3) 254034.049 256154.851 258329.10340 258676.363 259549.3530 264550.346 20
toInt2(x3) 77983.539 79162.807 79901.65230 79424.011 80030.3425 87906.977 20
toInt3(x3) 73521.565 74329.410 76050.63095 75128.253 75867.9620 88240.937 20
toInt4(x3) 22109.970 22529.713 23759.99890 23072.738 23688.5365 30905.478 20
So, toInt4 seems the best solution.

Rcpp returns large negative number when 2 large positives are multiplied

I am creating a function that calculates area under the curve and when I take the 2 partials and multiply them for the numerator I exceed 2^31 and then a value like -2013386137 is used in the calculation.
Here are the cpp chunks
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector sort_rcpp(NumericVector x) {
std::vector<double> tmp = Rcpp::as< std::vector<double> > (x);
std::sort(tmp.begin(), tmp.end());
return wrap(tmp);
}
// [[Rcpp::export]]
IntegerVector rank(NumericVector x) {
return match(x, sort_rcpp(x));
}
// [[Rcpp::export]]
double auc_(NumericVector actual, NumericVector predicted) {
double n = actual.size();
IntegerVector Ranks = rank(predicted);
int NPos = sum(actual == 1);
int NNeg = (actual.size() - NPos);
int sumranks = 0;
for(int i = 0; i < n; ++i) {
if (actual[i] == 1){
sumranks = sumranks + Ranks[i];
}
}
double p1 = (sumranks - NPos*( NPos + 1 ) / 2);
long double p2 = NPos*NNeg;
double auc = p1 / p2;
return auc ;
}
and then the test example that has the issue
N = 100000
Actual = as.numeric(runif(N) > .65)
Predicted = as.numeric(runif(N))
actual = Actual
predicted = Predicted
auc_(Actual, Predicted)
I am also putting this in an R package
devtools::install_github("JackStat/ModelMetrics")
N = 100000
Actual = as.numeric(runif(N) > .65)
Predicted = as.numeric(runif(N))
actual = Actual
predicted = Predicted
ModelMetrics::auc(Actual, Predicted)
You use int internally in your function which leads to overflow. Use a double and things look sunnier:
R> sourceCpp("/tmp/jackstat.cpp")
R> N <- 100000
R> Actual <- as.numeric(runif(N) > .65)
R> Predicted <- as.numeric(runif(N))
R> auc1(Actual, Predicted) # your function
[1] -0.558932
R> auc2(Actual, Predicted) # my variant using double
[1] 0.499922
R>
The complete corrected file is below:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector sort_rcpp(NumericVector x) {
std::vector<double> tmp = Rcpp::as< std::vector<double> > (x);
std::sort(tmp.begin(), tmp.end());
return wrap(tmp);
}
// [[Rcpp::export]]
IntegerVector rank(NumericVector x) {
return match(x, sort_rcpp(x));
}
// [[Rcpp::export]]
double auc1(NumericVector actual, NumericVector predicted) {
double n = actual.size();
IntegerVector Ranks = rank(predicted);
int NPos = sum(actual == 1);
int NNeg = (actual.size() - NPos);
int sumranks = 0;
for(int i = 0; i < n; ++i) {
if (actual[i] == 1){
sumranks = sumranks + Ranks[i];
}
}
double p1 = (sumranks - NPos*( NPos + 1 ) / 2);
long double p2 = NPos*NNeg;
double auc = p1 / p2;
return auc ;
}
// [[Rcpp::export]]
double auc2(NumericVector actual, NumericVector predicted) {
double n = actual.size();
IntegerVector Ranks = rank(predicted);
double NPos = sum(actual == 1);
double NNeg = (actual.size() - NPos);
double sumranks = 0;
for(int i = 0; i < n; ++i) {
if (actual[i] == 1){
sumranks = sumranks + Ranks[i];
}
}
double p1 = (sumranks - NPos*( NPos + 1 ) / 2);
long double p2 = NPos*NNeg;
double auc = p1 / p2;
return auc ;
}
/*** R
N <- 100000
Actual <- as.numeric(runif(N) > .65)
Predicted <- as.numeric(runif(N))
auc1(Actual, Predicted)
auc2(Actual, Predicted)
*/
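The overflow itself comes from products such as NPos*NNeg being evaluated in int before the result is assigned, so declaring only the left-hand side as double or long double does not help. Here is a small standalone sketch with made-up counts of similar size to the example (35,000 positives and 65,000 negatives). The function names are illustrative.

```cpp
#include <cstdint>
#include <limits>

// Multiplies two counts in 64-bit arithmetic; widening one operand is enough.
std::int64_t product_wide(int a, int b) {
    return static_cast<std::int64_t>(a) * b;
}

// Same idea with double, as in auc2.
double product_double(int a, int b) {
    return static_cast<double>(a) * b;
}

// True when a * b would not fit in a 32-bit int.
bool overflows_int32(int a, int b) {
    const std::int64_t p = product_wide(a, b);
    return p > std::numeric_limits<std::int32_t>::max() ||
           p < std::numeric_limits<std::int32_t>::min();
}
```

With these counts the product is 2,275,000,000, which is above the 32-bit limit of 2,147,483,647, so overflows_int32(35000, 65000) is true.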
