Rcpp memory management - r

I am trying to convert some character data to numeric, as below. The data will come with special characters, so I have to strip them out. I convert the data to std::string to search for the special characters. Does this create a new variable in memory? I want to know if there is a better way to do it.
#include <Rcpp.h>
#include <string>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector converter_ra_(Rcpp::RObject x){
if(x.sexp_type() == STRSXP){
CharacterVector y(x);
NumericVector resultado(y.size());
for(R_xlen_t i = 0; i < y.size(); i++){
std::string ra_string = Rcpp::as<std::string>(y[i]);
//std::cout << ra_string << std::endl;
double t = 0;
// base_m is not shown in the question; assuming it holds powers of ten,
// a running multiplier (1, 10, 100, ...) is used instead
double mult = 1;
// walk from the last character to the first, keeping only the digits 0-9
for(int j = (int)ra_string.size() - 1; j >= 0; j--){
if(ra_string[j] >= '0' && ra_string[j] <= '9'){
t += (ra_string[j] - '0') * mult;
mult *= 10;
}
}
//std::cout << t << std::endl;
resultado[i] = t;
}
return resultado;
}else if(x.sexp_type() == REALSXP){
return NumericVector(x);
}
return NumericVector();
}

Does it create a new variable in memory?
If the input object actually is a numeric vector (REALSXP) and you simply return it, e.g. as<NumericVector>(input), then no copy is made: the returned object refers to the same memory as the input. In any other case, new memory must of course be allocated for the returned object. For example,
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector demo(RObject x) {
if (x.sexp_type() == REALSXP) {
return as<NumericVector>(x);
}
return NumericVector::create();
}
/*** R
y <- rnorm(3)
z <- letters[1:3]
data.table::address(y)
# [1] "0x6828398"
data.table::address(demo(y))
# [1] "0x6828398"
data.table::address(z)
# [1] "0x68286f8"
data.table::address(demo(z))
# [1] "0x5c7eea0"
*/
I want to know if there is a better way to do it.
First you need to define "better":
Faster?
Uses less memory?
Fewer lines of code?
More idiomatic?
Personally, I would start with the last definition since it often entails one or more of the others. For example, in this approach we
Define a function object Predicate that relies on the standard library function isdigit (together with a check for '.') rather than trying to implement this locally
Define another function object, Converter, that uses the erase-remove idiom to eliminate the characters flagged by Predicate and, if anything remains, uses std::atof to convert it into a double (again, instead of trying to implement this ourselves)
Use an Rcpp idiom -- the as converter -- to convert the STRSXP to a std::vector<std::string>
Call std::transform to convert this into the result vector
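#include <Rcpp.h>
#include <algorithm> // std::remove_if, std::transform
#include <cctype>    // std::isdigit
#include <cstdlib>   // std::atof
#include <string>
#include <vector>
using namespace Rcpp;
// returns true for characters to drop: anything other than a digit or '.'
struct Predicate {
bool operator()(char c) const
{ return !(c == '.' || std::isdigit(static_cast<unsigned char>(c))); }
};
// strips the unwanted characters, then parses what is left (NA if nothing is left)
struct Converter {
double operator()(std::string s) const {
s.erase(
std::remove_if(s.begin(), s.end(), Predicate()),
s.end()
);
return s.empty() ? NA_REAL : std::atof(s.c_str());
}
};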
// [[Rcpp::export]]
NumericVector convert(RObject obj) {
if (obj.sexp_type() == REALSXP) {
return as<NumericVector>(obj);
}
if (obj.sexp_type() != STRSXP) {
return NumericVector::create();
}
std::vector<std::string> x = as<std::vector<std::string> >(obj);
NumericVector res(x.size(), NA_REAL);
std::transform(x.begin(), x.end(), res.begin(), Converter());
return res;
}
Testing this for minimal functionality,
x <- c("123 4", "abc 1567.35 def", "abcdef", "")
convert(x)
# [1] 1234.00 1567.35 NA NA
(y <- rnorm(3))
# [1] 1.04201552 -0.08965042 -0.88236960
convert(y)
# [1] 1.04201552 -0.08965042 -0.88236960
convert(list())
# numeric(0)
Will this be as performant as something hand-written by a seasoned C or C++ programmer? Almost certainly not. However, since we used library functions and common idioms, it is reasonably concise, less likely to contain bugs, and its intention is evident even at a quick glance. If you need something faster, there are probably a handful of optimizations to be made, but there's no need to start from that premise without benchmarking and profiling first.

Related

Rcpp template class for custom input and output

I'm trying to create a C++ function which will apply any function to an R vector of any type. I've been reading and searching for an answer, but my knowledge is still too chaotic and I can't put everything together. I was inspired by the sapply example and some from the Rcpp gallery, but they are too advanced for me so far.
What I've already done is a simple class which kind of works, but I'm having problems even with this one. An error happens when I try to call a function which returns something other than numeric. However, I don't know how to extend the function to work with a custom output type.
At this point, I don't know how to:
obtain the type of the R Function's return value and use this type to define out (same size as x);
alternatively, use a std::string type argument in apply_fun which could switch OUTTYPE;
pass any x to the class and use it in f (I think I've managed this correctly with <int XTYPE>).
Perhaps the answer to this question is too complex, so I appreciate all hints. Below I present my current progress. Thanks!
Rcpp class
#include <Rcpp.h>
namespace apply {
template <int XTYPE>
class SomeClass {
private:
Rcpp::Vector<XTYPE> x;
Rcpp::Function f;
public:
Rcpp::Vector<XTYPE> run() {
typedef typename Rcpp::traits::storage_type<XTYPE>::type STORAGE;
int n = x.size();
Rcpp::Vector<XTYPE> out(n);
for (unsigned int i{0}; i < n; i++) {
out(i) = Rcpp::as<STORAGE>(f(x(i)));
// Rcpp::Rcout << out(i) << std::endl;
}
return out;
}
SomeClass (Rcpp::Vector<XTYPE> x, Rcpp::Function f)
: x{x}, f{f} {
std::cout << "Initialized SomeClass" << std::endl;
}
};
}
Exported Rcpp function
//' @export
//[[Rcpp::export]]
Rcpp::RObject apply_fun(Rcpp::RObject x,
Rcpp::Function f) {
if (TYPEOF(x) == INTSXP) {
apply::SomeClass<INTSXP> r{Rcpp::as<Rcpp::IntegerVector>(x), f};
return r.run();
} else if (TYPEOF(x) == REALSXP) {
apply::SomeClass<REALSXP> r{Rcpp::as<Rcpp::NumericVector>(x), f};
return r.run();
} else if (TYPEOF(x) == STRSXP) {
apply::SomeClass<STRSXP> r{Rcpp::as<Rcpp::CharacterVector>(x), f};
return r.run();
} else if (TYPEOF(x) == LGLSXP) {
apply::SomeClass<LGLSXP> r{Rcpp::as<Rcpp::LogicalVector>(x), f};
return r.run();
} else if (TYPEOF(x) == CPLXSXP) {
apply::SomeClass<CPLXSXP> r{Rcpp::as<Rcpp::ComplexVector>(x), f};
return r.run();
} else {
Rcpp::stop("Invalid data type - only integer, numeric, character, factor, date, logical, complex vectors are possible.");
}
return R_NilValue;
}
R calls
apply_fun(c(1.5, 2.5, 3.5), f = function(x) { x + 10})
# 11.5 12.5 13.5
apply_fun(letters[1:3], f = function(x) paste(x, "-"))
# Error in apply_run(letters[1:3], f = function(x) x) :
# Evaluation error: unimplemented type 'char' in 'eval'

efficiently sample from modified arma::vec object

I am using Rcpp to speed up some R code. However, I'm really struggling with types - since these are foreign in R. Here's a simplified version of what I'm trying to do:
#include <RcppArmadillo.h>
#include <algorithm>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
NumericVector fun(SEXP Pk, int k, int i, const vec& a, const mat& D) {
// this is dummy version of my actual function - with actual arguments.;
// I'm guessing SEXP is going to need to be replaced with something else when it's called from C++ not R.;
return D.col(i);
}
// [[Rcpp::export]]
NumericVector f(const arma::vec& assignment, char k, int B, const mat& D) {
uvec k_ind = find(assignment == k);
NumericVector output(assignment.size()); // for dummy output.
uvec::iterator k_itr = k_ind.begin();
for(; k_itr != k_ind.end(); ++k_itr) {
// this is R code, as I don't know the best way to do this in C++;
k_rep = sample(c(assignment[assignment != k], -1), size = B, replace = TRUE);
output = fun(k_rep, k, *k_itr, assignment, D);
// do something with output;
}
// compile result, ultimately return a List (after I figure out how to do that. For right now, I'll cheat and return the last output);
return output;
}
The part I'm struggling with is the random sampling of assignment. I know that sample has been implemented in RcppArmadillo. However, I can see two approaches to this, and I'm not sure which is more efficient/doable.
Approach 1:
Make a table of the assignment values. Replace assignment == k with -1 and set its "count" equal to 1.
sample from the table "names" with probability proportional to the count.
Approach 2:
Copy the relevant subset of the assignment vector into a new vector with an extra spot for -1.
Sample from the copied vector with equal probabilities.
I want to say that approach 1 would be more efficient, except that assignment is currently of type arma::vec, and I'm not sure how to make the table from that - or how much of a cost there is to converting it to a more-compatible format. I think I could implement Approach 2, but I'm hoping to avoid the expensive copy.
Thanks for any insights you can provide.
Many of the variable declarations are not consistent with how you use them; for example, assignment == k compares a vector of doubles with a char. As the question is not well specified, I took the liberty of changing the variable types. This compiles:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::export]]
arma::vec fun(const Rcpp::NumericVector& Pk, int k, unsigned int i, const arma::ivec& a, const arma::mat& D)
{
return D.col(i);
}
// [[Rcpp::export]]
Rcpp::NumericMatrix f(const arma::ivec& assignment, int k, unsigned int B, const arma::mat& D)
{
arma::uvec k_ind = find(assignment == k);
arma::ivec KK = assignment(find(assignment != k));
//the next 2 lines append -1, giving KK = c(assignment[assignment != k], -1)
//I don't know what this -1 is for; maybe you don't need it.
KK.insert_rows(KK.n_rows, 1);
KK(KK.n_rows - 1) = -1;
arma::uvec k_ind_not = find(assignment != k);
Rcpp::NumericVector k_rep(B);
arma::mat output(D.n_rows,k_ind.n_rows); // for dummy output.
for(unsigned int i =0; i < k_ind.n_rows ; i++)
{
k_rep = Rcpp::RcppArmadillo::sample(KK, B, true);
output(arma::span::all, i) = fun(k_rep, k, k_ind(i), assignment, D); // k_ind(i) plays the role of *k_itr in the question
// do something with output;
}
// compile result, ultimately return a List (after I figure out how to do that. For right now, I'll cheat and return the last output);
return Rcpp::wrap(output);
}
This is not optimized (the question is ill-posed) and not particularly well written, because I think R is fast enough at finding the indices of a vector: do that part in R and implement only fun in Rcpp. It is not worth spending time here; other problems genuinely need a solver implemented in Rcpp, not this index search.
But this is not a very useful question, as you are asking for an algorithm rather than, for example, a function signature.

Index element from list in Rcpp

Suppose I have a List in Rcpp, here called x, containing matrices. I can extract one of the elements using x[0] or something similar. However, how do I extract a specific element of that matrix? My first thought was x[0](0,0), but that does not seem to work. I tried using * signs, but that doesn't work either.
Here is some example code that prints the matrix (shows matrix can easily be extracted):
library("Rcpp")
cppFunction(
includes = '
NumericMatrix RandMat(int nrow, int ncol)
{
int N = nrow * ncol;
NumericMatrix Res(nrow,ncol);
NumericVector Rands = runif(N);
for (int i = 0; i < N; i++)
{
Res[i] = Rands[i];
}
return(Res);
}',
code = '
void foo()
{
List x(1); // one slot, so that element 0 exists
x[0] = RandMat(3,3);
Rf_PrintValue(wrap( x[0] )); // Prints first matrix in list.
}
')
foo()
How could I change the line Rf_PrintValue(wrap( x[0] )); here to print the element in the first row and column? In the code I want to use it for, I need to extract this element to do computations.
Quick ones:
Compound expressions in C++ can bite at times; the template magic gets in the way. So just assign the List element to an object of whatever type the element is, e.g. a NumericMatrix.
Then pick from the NumericMatrix as you see fit. We have row, col, element, ... access.
Printing is easier with Rcpp::Rcout << anElement; note that we currently cannot print entire matrices or vectors this way, but int and double values are fine.
Edit:
Here is a sample implementation.
#include <Rcpp.h>
// [[Rcpp::export]]
double sacha(Rcpp::List L) {
double sum = 0;
for (int i=0; i<L.size(); i++) {
Rcpp::NumericMatrix M = L[i];
double topleft = M(0,0);
sum += topleft;
Rcpp::Rcout << "Element is " << topleft << std::endl;
}
return sum;
}
/*** R
set.seed(42)
L <- list(matrix(rnorm(9),3), matrix(1:9,3), matrix(sqrt(1:4),2))
sacha(L)
*/
And its result:
R> Rcpp::sourceCpp('/tmp/sacha.cpp')
R> set.seed(42)
R> L <- list(matrix(rnorm(9),3), matrix(1:9,3), matrix(sqrt(1:4),2))
R> sacha(L)
Element is 1.37096
Element is 1
Element is 1
[1] 3.37096
R>
You have to be explicit at some point. The List class has no idea about the types of elements it contains, it does not know it is a list of matrices.
Dirk has shown you what we usually do, fetch the element as a NumericMatrix and process the matrix.
Here is an alternative that assumes all elements of your list have the same structure, using a new class template, ListOf, with enough glue to make the user code seamless. This just moves the explicitness to a different place.
#include <Rcpp.h>
using namespace Rcpp ;
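// Note: recent Rcpp versions ship their own Rcpp::ListOf; together with
// using namespace Rcpp this name may be ambiguous, so rename this class if needed.
template <typename WHAT>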
class ListOf : public List {
public:
template <typename T>
ListOf( const T& x) : List(x){}
WHAT operator[](int i){ return as<WHAT>( ( (List*)this)->operator[]( i) ) ; }
} ;
// [[Rcpp::export]]
double sacha( ListOf<NumericMatrix> x){
double sum = 0.0 ;
for( int i=0; i<x.size(); i++){
sum += x[i](0,0) ;
}
return sum ;
}
/*** R
L <- list(matrix(rnorm(9),3), matrix(1:9,3), matrix(sqrt(1:4),2))
sacha( L )
*/
When I sourceCpp this file, I get:
> L <- list(matrix(rnorm(9), 3), matrix(1:9, 3), matrix(sqrt(1:4), 2))
> sacha(L)
[1] 1.087057

How to access elements of a vector in a Rcpp::List

I am puzzled.
The following compile and work fine:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List test(){
List l;
IntegerVector v(5, NA_INTEGER);
l.push_back(v);
return l;
}
In R:
R) test()
[[1]]
[1] NA NA NA NA NA
But when I try to set the IntegerVector in the list:
// [[Rcpp::export]]
List test(){
List l;
IntegerVector v(5, NA_INTEGER);
l.push_back(v);
l[0][1] = 1;
return l;
}
It does not compile:
test.cpp:121:8: error: invalid use of incomplete type 'struct SEXPREC'
C:/PROGRA~1/R/R-30~1.0/include/Rinternals.h:393:16: error: forward declaration of 'struct SEXPREC'
It is because of this line:
l[0][1] = 1;
The compiler has no idea that l is a list of integer vectors. In essence, l[0] gives you a SEXP (the generic type for all R objects), and SEXP is an opaque pointer to SEXPREC, whose definition we don't have access to (hence opaque). So when you apply [1], you attempt pointer arithmetic to reach a second SEXPREC; the opacity makes that impossible, and it is not what you wanted anyway.
You have to be specific that you are extracting an IntegerVector, so you can do something like this:
as<IntegerVector>(l[0])[1] = 1;
or
v[1] = 1 ;
or
IntegerVector x = l[0] ; x[1] = 1 ;
All of these options work on the same underlying data structure.
Alternatively, if you really wanted the syntax l[0][1] you could define your own data structure expressing "list of integer vectors". Here is a sketch:
// Newer Rcpp versions provide Rcpp::ListOf; rename this class if the names clash.
template <class T>
class ListOf {
public:
ListOf( List data_) : data(data_){}
T operator[](int i){
return as<T>( data[i] ) ;
}
operator List(){ return data ; }
private:
List data ;
} ;
Which you can use, e.g. like this:
// [[Rcpp::export]]
List test2(){
ListOf<IntegerVector> l = List::create( IntegerVector(5, NA_INTEGER) ) ;
l[0][1] = 1 ;
return l;
}
Also note that using .push_back on Rcpp vectors (including lists) requires a complete copy of the list data, which can slow you down. Only use resizing functions when you have no other choice.

is it possible to return two vectors from a function?

I'm trying to do a merge sort in C++ on a vector called x, which contains x coordinates. As the merge sort sorts the x coordinates, it's supposed to move the corresponding elements in a vector called y, containing the y coordinates. The only problem is that I don't know how to (or if I can) return both resulting vectors from the merge function.
Alternatively, if it's easier to implement, I could use a slower sort method.
No, you cannot return 2 results from a method like in this example.
vector<int>, vector<int> merge_sort();
What you can do is pass the 2 vectors by reference to a function, so that sorting modifies both vectors in place, e.g.
void merge_sort(vector<int>& x, vector<int>& y);
Alternatively, you can do what @JoshD mentioned: create a struct called Point and merge sort a vector of Point structs instead.
Try something like this:
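struct Point {
int x;
int y;
// order points by their x coordinate
bool operator<(const Point &rhs) const {return x < rhs.x;}
};
std::vector<Point> my_points;
mergesort(my_points); // your merge sort, now comparing Points with operator<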
Or if you want to sort Points with equal x value by the y coordinate:
bool operator<(const Point &rhs) const {return x < rhs.x || (x == rhs.x && y < rhs.y);}
Also, I thought I'd add: if you really ever need to, you can always return a std::pair. A better choice is usually to return through the function parameters.
Yes, you can return a tuple, then use structured binding (since C++17).
Here's a full example:
#include <cstdlib>
#include <iostream>
#include <numeric>
#include <string>
#include <tuple>
#include <vector>
using namespace std::string_literals;
auto twoVectors() -> std::tuple<std::vector<int>, std::vector<int>>
{
const std::vector<int> a = { 1, 2, 3 };
const std::vector<int> b = { 4, 5, 6 };
return { a, b };
}
auto main() -> int
{
auto [a, b] = twoVectors();
auto const sum = std::accumulate(a.begin(), a.end(), std::accumulate(b.begin(), b.end(), 0));
std::cout << "sum: "s << sum << std::endl;
return EXIT_SUCCESS;
}
You can have a vector of vectors:
std::vector<std::vector<int>> points = {{a, b}, {c, d}};
Now you can return points.
Returning vectors may not be what you want if they get copied for this purpose (which is slow), although since C++11 a returned local vector is normally moved, or the copy is elided entirely. Have a look at this implementation, for example.
