Setting colnames in R's cpp11 - r

Provided that cpp11 does not provide any "sugar", we need to use attributes.
I am trying to set colnames in a C++ function, as in the next MWE
#include "cpp11.hpp"
using namespace cpp11;
// THIS WORKS
[[cpp11::register]]
doubles cpp_names(writable::doubles X) {
X.attr("names") = {"a","b"};
return X;
}
// THIS WON'T
[[cpp11::register]]
doubles_matrix<> cpp_colnames(writable::doubles_matrix<> X) {
X.attr("dimnames") = list(NULL, {"A","B"});
return X;
}
How can I pass a list with two vectors, one NULL and the other ("a","b"), that can be correctly converted to a SEXP?
In Rcpp, one would do colnames(X) = {"a","b"}.
My approach can be wrong.

Related

An Rcpp function argument being a List filled with default values

It's my first post here! Woohoo!
I would like to have an Rcpp function that would have an argument being a list with some specified default values of its entries. I would like this function to work similarly to the following R function:
> nicetry_R = function(sv = list( a = rep(0, 2)) ) { sv$a }
> nicetry_R()
[1] 0 0
> nicetry_R(list(a=rep(1,2)))
[1] 1 1
In this function argument sv is a list specified by default to contain vector a set to rep(0,2).
What I tried in RcppArmadillo is a solution that is working by refering to R_NilValue in the argument List specification. Here's the content of my nicetry.cpp file:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector nicetry (Nullable<List> sv = R_NilValue) {
NumericVector aux_a(2);
if (sv.isNotNull()){
List SV(sv);
aux_a = SV["a"];
}
return aux_a;
}
/*** R
nicetry()
nicetry(list(a=rep(1,2)))
*/
It does what I want it to do in the sense that it produces the desired outcome:
> nicetry()
[1] 0 0
> nicetry(list(a=rep(1,2)))
[1] 1 1
But I would prefer to have the default values of the list specified in the argument line. The example below is just one of my failed attempts:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector nicetry (
List sv = List::create(Named("a",NumericVector::create(0,0)))
) {
NumericVector aux_a = sv["a"];
return aux_a;
}
/*** R
nicetry()
*/
In this example, the argument declaration contains the list. It does not compile though. When I copy the declaration of the List from the argument declaration section leaving this section empty, and paste the line in the body of the function: List sv = List::create(Named("a",NumericVector::create(0,0))); then the list is created and the outcome provided. But that's not what I want.
Do you think it's possible to declare such a list as an argument at all?
Thanks so much! Greetings, T

memory efficient method to calculate distance matrix [duplicate]

I have an object of class big.matrix in R with dimension 778844 x 2. The values are all integers (kilometres). My objective is to calculate the Euclidean distance matrix using the big.matrix and have as a result an object of class big.matrix. I would like to know if there is an optimal way of doing that.
The reason for my choice of using the class big.matrix is memory limitation. I could transform my big.matrix to an object of class matrix and calculate the Euclidean distance matrix using dist(). However, dist() would return an object of size that would not be allocated in the memory.
Edit
The following answer was given by John W. Emerson, author and maintainer of the bigmemory package:
You could use big algebra I expect, but this would also be a very nice use case for Rcpp via sourceCpp(), and very short and easy. But in short, we don't even attempt to provide high-level features (other than the basics which we implemented as proof-of-concept). No single algorithm could cover all use cases once you start talking out-of-memory big.
Here is a way using RcppArmadillo. Much of this is very similar to the RcppGallery example. This will return a big.matrix with the associated pairwise (by row) euclidean distances. I like to wrap my big.matrix functions in a wrapper function to create a cleaner syntax (i.e. avoid the #address and other initializations.
Note - as we are using bigmemory (and therefore concerned with RAM usage) I have this example returned the N-1 x N-1 matrix of only lower triangular elements. You could modify this but this is what I threw together.
euc_dist.cpp
// To enable the functionality provided by Armadillo's various macros,
// simply include them before you include the RcppArmadillo headers.
#define ARMA_NO_DEBUG
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo, BH, bigmemory)]]
using namespace Rcpp;
using namespace arma;
// The following header file provides the definitions for the BigMatrix
// object
#include <bigmemory/BigMatrix.h>
// C++11 plugin
// [[Rcpp::plugins(cpp11)]]
template <typename T>
void BigArmaEuclidean(const Mat<T>& inBigMat, Mat<T> outBigMat) {
int W = inBigMat.n_rows;
for(int i = 0; i < W - 1; i++){
for(int j=i+1; j < W; j++){
outBigMat(j-1,i) = sqrt(sum(pow((inBigMat.row(i) - inBigMat.row(j)),2)));
}
}
}
// [[Rcpp::export]]
void BigArmaEuc(SEXP pInBigMat, SEXP pOutBigMat) {
// First we tell Rcpp that the object we've been given is an external
// pointer.
XPtr<BigMatrix> xpMat(pInBigMat);
XPtr<BigMatrix> xpOutMat(pOutBigMat);
int type = xpMat->matrix_type();
switch(type) {
case 1:
BigArmaEuclidean(
arma::Mat<char>((char *)xpMat->matrix(), xpMat->nrow(), xpMat->ncol(), false),
arma::Mat<char>((char *)xpOutMat->matrix(), xpOutMat->nrow(), xpOutMat->ncol(), false)
);
return;
case 2:
BigArmaEuclidean(
arma::Mat<short>((short *)xpMat->matrix(), xpMat->nrow(), xpMat->ncol(), false),
arma::Mat<short>((short *)xpOutMat->matrix(), xpOutMat->nrow(), xpOutMat->ncol(), false)
);
return;
case 4:
BigArmaEuclidean(
arma::Mat<int>((int *)xpMat->matrix(), xpMat->nrow(), xpMat->ncol(), false),
arma::Mat<int>((int *)xpOutMat->matrix(), xpOutMat->nrow(), xpOutMat->ncol(), false)
);
return;
case 8:
BigArmaEuclidean(
arma::Mat<double>((double *)xpMat->matrix(), xpMat->nrow(), xpMat->ncol(), false),
arma::Mat<double>((double *)xpOutMat->matrix(), xpOutMat->nrow(), xpOutMat->ncol(), false)
);
return;
default:
// We should never get here, but it resolves compiler warnings.
throw Rcpp::exception("Undefined type for provided big.matrix");
}
}
My little wrapper
bigMatrixEuc <- function(bigMat){
zeros <- big.matrix(nrow = nrow(bigMat)-1,
ncol = nrow(bigMat)-1,
init = 0,
type = typeof(bigMat))
BigArmaEuc(bigMat#address, zeros#address)
return(zeros)
}
The test
library(Rcpp)
sourceCpp("euc_dist.cpp")
library(bigmemory)
set.seed(123)
mat <- matrix(rnorm(16), 4)
bm <- as.big.matrix(mat)
# Call new euclidean function
bm_out <- bigMatrixEuc(bm)[]
# pull out the matrix elements for out purposes
distMat <- as.matrix(dist(mat))
distMat[upper.tri(distMat, diag=TRUE)] <- 0
distMat <- distMat[2:4, 1:3]
# check if identical
all.equal(bm_out, distMat, check.attributes = FALSE)
[1] TRUE

Converting String Versions of "Infinity" to Numeric in Rcpp

I have some JSON response that encodes Inf/-Inf/NaN as strings, so the JSON array it returns will look like [1.0, "Infinity", 2.0]. I parse this using a JSON library and end up with a list that looks like list(1.0, "Infinity", 2.0) and I want to convert it to be list(1.0, Inf, 2.0), for performance reasons I need this to use Rcpp. Here is the code I tried doing but I can't seem to get Rcpp to not yell at me about
library(Rcpp)
cppFunction('
NumericVector convertThings(List data) {
const size_t num_rows = data.size();
NumericVector rv(num_rows);
for (size_t i = 0; i < num_rows; ++i) {
if (as<String>(data[i]) == "Infinity") {
rv[i] = R_PosInf;
} else {
rv[i] = as<double>(data[i]);
}
}
return rv;
}
')
convertThings(list('Infinity', 1.0))
# expected output c(Inf, 1.0)
The error I am seeing is Error: not compatible with requested type. Help is much appreciated!
That is a basic C++ problem: how to convert text to numbers reliably.
One possibly answer is provided by the Boost.Lexical_Cast library and illustrated in this Rcpp Gallery post. Just using the first example:
R> library(Rcpp)
R> sourceCpp("/tmp/boostLexicalCastExample.cpp") # from post
R> lexcicalCast(c("Inf", "inf", "Infinity", "NA", 42))
[1] Inf Inf Inf NA 42
R>
As you can see, it matches at least three different ways of spelling infinity in text.

Order of methods in R reference class and multiple files

There is one thing I really don't like about R reference class: the order you write the methods matters. Suppose your class goes like this:
myclass = setRefClass("myclass",
fields = list(
x = "numeric",
y = "numeric"
))
myclass$methods(
afunc = function(i) {
message("In afunc, I just call bfunc...")
bfunc(i)
}
)
myclass$methods(
bfunc = function(i) {
message("In bfunc, I just call cfunc...")
cfunc(i)
}
)
myclass$methods(
cfunc = function(i) {
message("In cfunc, I print out the sum of i, x and y...")
message(paste("i + x + y = ", i+x+y))
}
)
myclass$methods(
initialize = function(x, y) {
x <<- x
y <<- y
}
)
And then you start an instance, and call a method:
x = myclass(5, 6)
x$afunc(1)
You will get an error:
Error in x$afunc(1) : could not find function "bfunc"
I am interested in two things:
Is there a way to work around this nuisance?
Does this mean I can never split a really long class file into multiple files? (e.g. one file for each method.)
Calling bfunc(i) isn't going to invoke the method since it doesn't know what object it is operating on!
In your method definitions, .self is the object being methodded on (?). So change your code to:
myclass$methods(
afunc = function(i) {
message("In afunc, I just call bfunc...")
.self$bfunc(i)
}
)
(and similarly for bfunc). Are you coming from C++ or some language where functions within methods are automatically invoked within the object's context?
Some languages make this more explicit, for example in Python a method with one argument like yours actually has two arguments when defined, and would be:
def afunc(self, i):
[code]
but called like:
x.afunc(1)
then within the afunc there is the self variable which referes to x (although calling it self is a universal convention, it could be called anything).
In R, the .self is a little bit of magic sprinkled over reference classes. I don't think you could change it to .this even if you wanted.

Passing a `data.table` to c++ functions using `Rcpp` and/or `RcppArmadillo`

Is there a way to pass a data.table objects to c++ functions using Rcpp and/or RcppArmadillo without manually transforming to data.table to a data.frame? In the example below test_rcpp(X2) and test_arma(X2) both fail with c++ exception (unknown reason).
R code
X=data.frame(c(1:100),c(1:100))
X2=data.table(X)
test_rcpp(X)
test_rcpp(X2)
test_arma(X)
test_arma(X2)
c++ functions
NumericMatrix test_rcpp(NumericMatrix X) {
return(X);
}
mat test_arma(mat X) {
return(X);
}
Building on top of other answers, here is some example code:
#include <Rcpp.h>
using namespace Rcpp ;
// [[Rcpp::export]]
double do_stuff_with_a_data_table(DataFrame df){
CharacterVector x = df["x"] ;
NumericVector y = df["y"] ;
IntegerVector z = df["v"] ;
/* do whatever with x, y, v */
double res = sum(y) ;
return res ;
}
So, as Matthew says, this treats the data.table as a data.frame (aka a Rcpp::DataFrame in Rcpp).
require(data.table)
DT <- data.table(
x=rep(c("a","b","c"),each=3),
y=c(1,3,6),
v=1:9)
do_stuff_with_a_data_table( DT )
# [1] 30
This completely ignores the internals of the data.table.
Try passing the data.table as a DataFrame rather than NumericMatrix. It is a data.frame anyway, with the same structure, so you shouldn't need to convert it.
Rcpp sits on top of native R types encoded as SEXP. This includes eg data.frame or matrix.
data.table is not native, it is an add-on. So someone who wants this (you?) has to write a converter, or provide funding for someone else to write one.
For reference, I think the good thing is to output a list from rcpp as data.table allow update via lists.
Here is a dummy example:
cCode <-
'
DataFrame DT(DTi);
NumericVector x = DT["x"];
int N = x.size();
LogicalVector b(N);
NumericVector d(N);
for(int i=0; i<N; i++){
b[i] = x[i]<=4;
d[i] = x[i]+1.;
}
return Rcpp::List::create(Rcpp::Named("b") = b, Rcpp::Named("d") = d);
';
require("data.table");
require("rcpp");
require("inline");
DT <- data.table(x=1:9,y=sample(letters,9)) #declare a data.table
modDataTable <- cxxfunction(signature(DTi="data.frame"), plugin="Rcpp", body=cCode)
DT_add <- modDataTable(DT) #here we get the list
DT[, names(DT_add):=DT_add] #here we update by reference the data.table

Resources