In a very first attempt at creating a C++ function which can be called from R using Rcpp, I have a simple function to compute a minimum spanning tree from a distance matrix using Prim's algorithm. This function has been converted into C++ from a former version in ANSI C (which works fine).
Here it is:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
DataFrame primlm(const int n, NumericMatrix d)
{
double const din = 9999999.e0;
long int i1, nc, nc1;
double dlarge, dtot;
NumericVector is, l, lp, dist;
l(1) = 1;
is(1) = 1;
for (int i=2; i <= n; i++) {
is(i) = 0;
}
for (int i=2; i <= n; i++) {
dlarge = din;
i1 = i - 1;
for (int j=1; j <= i1; j++) {
for (int k=1; k <= n; k++) {
if (l(j) == k)
continue;
if (d[l(j), k] > dlarge)
continue;
if (is(k) == 1)
continue;
nc = k;
nc1 = l(j);
dlarge = d(nc1, nc);
}
}
is(nc) = 1;
l(i) = nc;
lp(i) = nc1;
dist(i) = dlarge;
}
dtot = 0.e0;
for (int i=2; i <= n; i++){
dtot += dist(i);
}
return DataFrame::create(Named("l") = l,
Named("lp") = lp,
Named("dist") = dist,
Named("dtot") = dtot);
}
When I compile this function using Rcpp under RStudio, I get two warnings complaining that variables 'nc' and 'nc1' have not been initialized. Frankly, I do not understand that, as it seems to me that both variables are initialized inside the third loop. Also, why is there no similar complaint about variable 'i1'?
Perhaps it comes as no surprise that, when I attempt to call this function from R using the code below, the R session crashes!
# Read test data
df <- read.csv("zygo.csv", header=TRUE)
lonlat <- data.frame(df$Longitude, df$Latitude)
colnames(lonlat) <- c("lon", "lat")
# Compute distance matrix using geosphere library
library(geosphere)
d <- distm(lonlat, lonlat, fun=distVincentyEllipsoid)
# Calls Prim minimum spanning tree routine via Rcpp
library(Rcpp)
sourceCpp("Prim.cpp")
n <- nrow(df)
p <- primlm(n, d)
Here is the dataset I use for testing purposes:
"Scientific name",Locality,Longitude,Latitude
Zygodontmys,Bush Bush Forest,-61.05,10.4
Zygodontmys,Cerro Azul,-79.4333333333,9.15
Zygodontmys,Dividive,-70.6666666667,9.53333333333
Zygodontmys,Hato El Frio,-63.1166666667,7.91666666667
Zygodontmys,Finca Vuelta Larga,-63.1166666667,10.55
Zygodontmys,Isla Cebaco,-81.1833333333,7.51666666667
Zygodontmys,Kayserberg Airstrip,-56.4833333333,3.1
Zygodontmys,Limao,-60.5,3.93333333333
Zygodontmys,Montijo Bay,-81.0166666667,7.66666666667
Zygodontmys,Parcela 200,-67.4333333333,8.93333333333
Zygodontmys,Rio Chico,-65.9666666667,10.3166666667
Zygodontmys,San Miguel Island,-78.9333333333,8.38333333333
Zygodontmys,Tukuko,-72.8666666667,9.83333333333
Zygodontmys,Urama,-68.4,10.6166666667
Zygodontmys,Valledup,-72.9833333333,10.6166666667
Could anyone give me a hint?
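The assignments to nc and nc1 are never reached if one of the three if statements is true on every pass through the inner loop. It might be that this cannot happen with your data, but the compiler has no way of knowing that. By contrast, i1 is assigned unconditionally at the top of the loop body, so there is no warning for it.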
However, this is not the reason for the crash. When I run your code I get:
Index out of bounds: [index=1; extent=0].
This comes from here:
NumericVector is, l, lp, dist;
l(1) = 1;
is(1) = 1;
When declaring a NumericVector you have to tell the required size if you want to assign values by index. In your case
NumericVector is(n), l(n), lp(n), dist(n);
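might work. Note, however, that Rcpp vectors and matrices are indexed from 0, so loops that run from 1 to n, as yours do, need either n + 1 elements or shifted indices. You have to analyze the C code carefully w.r.t. memory allocation and array boundaries.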
Alternatively you could use the C code as is and use Rcpp to build a wrapper function, e.g.
#include <Rcpp.h>
using namespace Rcpp;
// One possibility for the function signature ...
double prim(const int n, double *d, double *l, double *lp, double *dist) {
....
}
// [[Rcpp::export]]
List primlm(NumericMatrix d) {
    int n = d.nrow();
    // Output buffers owned by R; the size is only known at run time,
    // so std::array cannot be used here. Adjust sizes as needed!
    NumericVector l(n), lp(n), dist(n);
    double dtot = prim(n, d.begin(), l.begin(), lp.begin(), dist.begin());
    return List::create(Named("l") = l,
                        Named("lp") = lp,
                        Named("dist") = dist,
                        Named("dtot") = dtot);
}
Notes:
I am returning a List instead of a DataFrame since dtot is a scalar value.
The above code is meant to illustrate the idea. Most likely it will not work without adjustments!
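To make the points about allocation and array bounds concrete, here is a minimal zero-based sketch of the same search in plain C++, without Rcpp. It assumes d is a symmetric n x n distance matrix stored row-major in a std::vector<double>; the names MstResult and prim_mst are made up for this illustration and are not part of the original C code. All buffers are sized up front, and nc and nc1 get a sentinel value on every pass, which also removes the compiler warning.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical result type for this sketch (not part of Rcpp or the original C code).
struct MstResult {
    std::vector<int> node;      // vertex added at each step (0-based)
    std::vector<int> parent;    // tree vertex it was attached to (-1 for the root)
    std::vector<double> dist;   // length of the connecting edge (0 for the root)
    double total = 0.0;         // sum of all edge lengths
};

// d holds an n x n symmetric distance matrix in row-major order.
MstResult prim_mst(int n, const std::vector<double>& d) {
    assert(n > 0 && d.size() == static_cast<std::size_t>(n) * n);

    MstResult r;
    r.node.assign(n, -1);       // every buffer is sized before it is indexed
    r.parent.assign(n, -1);
    r.dist.assign(n, 0.0);
    std::vector<bool> in_tree(n, false);

    r.node[0] = 0;              // start the tree at vertex 0
    in_tree[0] = true;

    for (int i = 1; i < n; ++i) {
        // Initialised on every pass, so nothing can be read uninitialised.
        double best = std::numeric_limits<double>::infinity();
        int nc = -1, nc1 = -1;

        for (int j = 0; j < i; ++j) {          // vertices already in the tree
            const int from = r.node[j];
            for (int k = 0; k < n; ++k) {      // candidates outside the tree
                if (in_tree[k]) continue;
                const double w = d[static_cast<std::size_t>(from) * n + k];
                if (w < best) { best = w; nc = k; nc1 = from; }
            }
        }
        assert(nc >= 0);        // holds when all distances are finite
        in_tree[nc] = true;
        r.node[i] = nc;
        r.parent[i] = nc1;
        r.dist[i] = best;
        r.total += best;
    }
    return r;
}
```

With Rcpp, the same loops could run directly on a NumericMatrix using d(from, k), and the vectors would be declared with their size, as in NumericVector dist(n).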
I have a list (containing another list) in R and need to import it into C using Rcpp.
# R
MainList <- list()
MainList$myint <- 2
MainList$mylist <- list(matrix(1,2,2), matrix(2,2,2))
MainList
My goal is to import the list from R (MainList$mylist in the example) and copy it into a 3D array in C.
I tried this:
// Rcpp
// [[Rcpp::export]]
List MyFunction (List MainList){
int N = as<int>(MainList["myint"]);
List mylistRcpp = as<List>(MainList["mylist"]); // Does this work? Apparently not
double*** mylistC; // assume memory has already been allocated
for (int h=0; h<N; h++){
NumericMatrix temp = mylistRcpp[h];
for (int i=0; i<N; i++){
for (int n=0; n<N; n++){
mylistC[h][i][n] = temp(i, n);
}
}
}
return List::create(Named("1") = N,
Named("2") = N);
}
Can I import the list this way? Is there an easy way to copy it without copying element by element? I need the 3D array for another function. I am not sure how to import the list from R into Rcpp.
Yes you can insert an Rcpp::List inside an Rcpp::List and recurse as deep as you want. No need for double** and other gymnastics.
Code
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::List ListExample() {
std::string abc = "def";
double tol = 0.001;
Rcpp::List l = Rcpp::List::create(Rcpp::Named("method", abc),
Rcpp::Named("tolerance", tol));
Rcpp::List ll = Rcpp::List::create(Rcpp::Named("method", abc),
Rcpp::Named("tolerance", tol),
Rcpp::Named("list", l));
return ll;
}
/*** R
ListExample()
*/
Demo
R> sourceCpp("/tmp/soQ.cpp")
R> ListExample()
$method
[1] "def"
$tolerance
[1] 0.001
$list
$list$method
[1] "def"
$list$tolerance
[1] 0.001
R>
As you can see we have a list inside a list.
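For the 3D array itself, one contiguous buffer with index arithmetic is usually easier to manage than a double***. The sketch below shows the copy step in plain C++, assuming all matrices in the list have the same dimensions. Mat, Cube and make_cube are hypothetical names for this illustration; Mat stands in for one matrix taken from the list (in Rcpp that would be NumericMatrix temp = mylistRcpp[h], as in the question). The loop bounds come from the matrix dimensions rather than from the list length N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one R matrix: values in column-major order, as R stores them.
struct Mat {
    int nrow, ncol;
    std::vector<double> v;
    double operator()(int i, int j) const {
        return v[static_cast<std::size_t>(j) * nrow + i];
    }
};

// Hypothetical 3D array: one contiguous buffer indexed as (slice, row, col).
struct Cube {
    int nslice, nrow, ncol;
    std::vector<double> v;
    double& at(int h, int i, int j) {
        return v[(static_cast<std::size_t>(h) * nrow + i) * ncol + j];
    }
};

// Copy a list of equally sized matrices into one cube.
Cube make_cube(const std::vector<Mat>& mats) {
    assert(!mats.empty());
    Cube c{static_cast<int>(mats.size()), mats[0].nrow, mats[0].ncol, {}};
    c.v.resize(static_cast<std::size_t>(c.nslice) * c.nrow * c.ncol);
    for (int h = 0; h < c.nslice; ++h) {
        assert(mats[h].nrow == c.nrow && mats[h].ncol == c.ncol);
        for (int i = 0; i < c.nrow; ++i)
            for (int j = 0; j < c.ncol; ++j)
                c.at(h, i, j) = mats[h](i, j);
    }
    return c;
}
```

If another function really needs a raw pointer, c.v.data() gives access to the same buffer without a second copy.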
I have a data frame 'tmp' on which I need to perform a calculation using the last row of another data frame 'SpreadData'. I am using the following code:
for (i in 1:ncol(tmp)) {
  for (j in 1:nrow(tmp)) {
    PNLData[j, i] = 10 * tmp[j, i] * SpreadData[nrow(SpreadData), i]
  }
}
Is there a faster method, using mapply or something else, so that I do not need a for loop?
Thanks
You can use sweep():
PNLData <- sweep(10 * tmp, 2, SpreadData[nrow(SpreadData), ], "*")
PS1: you can replace SpreadData[nrow(SpreadData), ] by tail(SpreadData, 1).
PS2: I think this makes two copies of your data. If you have a large matrix, you'd better use Rcpp.
Edit: Rcpp solution: put this in a .cpp file and source it.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix rcppFun(const NumericMatrix& x,
const NumericVector& lastCol) {
int n = x.nrow();
int m = x.ncol();
NumericMatrix res(n, m);
int i, j;
for (j = 0; j < m; j++) {
for (i = 0; i < n; i++) {
res(i, j) = 10 * x(i, j) * lastCol[j];
}
}
return res;
}
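Then, in R, do PNLData <- rcppFun(tmp, SpreadData[nrow(SpreadData), ]). If tmp and SpreadData are data frames rather than matrices, convert them first, e.g. with as.matrix(tmp) and unlist(SpreadData[nrow(SpreadData), ]).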
Deep inside an MCMC algorithm I need to multiply a user-provided list of matrices with a vector, i.e., the following piece of Rcpp and RcppArmadillo code is called multiple times per MCMC iteration:
List mat_vec1 (const List& Mats, const vec& y) {
int n_list = Mats.size();
Rcpp::List out(n_list);
for (int i = 0; i < n_list; ++i) {
out[i] = as<mat>(Mats[i]) * y;
}
return(out);
}
The user-provided list Mats remains fixed during the MCMC, vector y changes in each iteration. Efficiency is paramount and I'm trying to see if I can speed up the code by not having to convert the elements of Mats to arma::mat that many times (it needs to be done only once). I tried the following approach
List arma_Mats (const List& Mats) {
int n_list = Mats.size();
Rcpp::List res(n_list);
for (int i = 0; i < n_list; ++i) {
res[i] = as<mat>(Mats[i]);
}
return(res);
}
and then
List mat_vec2 (const List& Mats, const vec& y) {
int n_list = Mats.size();
Rcpp::List aMats = arma_Mats(Mats);
Rcpp::List out(n_list);
for (int i = 0; i < n_list; ++i) {
out[i] = aMats[i] * y;
}
return(out);
}
but this does not seem to work. Any pointers to alternative/better solutions are very welcome.
Ok, I basically wrote the answer in the comment but it then occurred to me that we already provide a working example in the stub created by RcppArmadillo.package.skeleton():
// [[Rcpp::export]]
Rcpp::List rcpparma_bothproducts(const arma::colvec & x) {
arma::mat op = x * x.t();
double ip = arma::as_scalar(x.t() * x);
return Rcpp::List::create(Rcpp::Named("outer")=op,
Rcpp::Named("inner")=ip);
}
This returns a list containing the outer product (a matrix) and the inner product (a scalar) of the given vector.
As for what is fast and what is not: I recommend not conjecturing but rather profiling and measuring as much as you can. My inclination would be to do more (standalone) C++ code in Armadillo and only return to R at the very end, minimizing conversions.
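One way to read "minimizing conversions" is to convert the fixed matrices once, keep them in a C++ container for the whole run, and do nothing but multiplications inside the loop. Storing them back into an Rcpp::List does not achieve this, because each list element is again a generic R object, so aMats[i] * y has no matrix type to work with. The sketch below shows the pattern in plain C++ under that assumption; Matrix and MatVecCache are hypothetical names, with Matrix standing in for arma::mat (with RcppArmadillo the container could be a std::vector<arma::mat>).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for arma::mat: dense matrix in row-major order.
struct Matrix {
    std::size_t nrow, ncol;
    std::vector<double> v;
};

// Holds the already converted matrices; built once, reused on every iteration.
class MatVecCache {
public:
    explicit MatVecCache(std::vector<Matrix> mats) : mats_(std::move(mats)) {}

    // Multiply every cached matrix by y; no conversion happens here.
    std::vector<std::vector<double>> multiply(const std::vector<double>& y) const {
        std::vector<std::vector<double>> out;
        out.reserve(mats_.size());
        for (const Matrix& m : mats_) {
            assert(m.ncol == y.size());
            std::vector<double> r(m.nrow, 0.0);
            for (std::size_t i = 0; i < m.nrow; ++i)
                for (std::size_t j = 0; j < m.ncol; ++j)
                    r[i] += m.v[i * m.ncol + j] * y[j];
            out.push_back(std::move(r));
        }
        return out;
    }

private:
    std::vector<Matrix> mats_;
};
```

Whether this actually beats the original mat_vec1 depends on the matrix sizes, so it is worth timing both versions before committing to either.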
I am puzzled.
The following compiles and works fine:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List test(){
List l;
IntegerVector v(5, NA_INTEGER);
l.push_back(v);
return l;
}
In R:
R) test()
[[1]]
[1] NA NA NA NA NA
But when I try to set the IntegerVector in the list:
// [[Rcpp::export]]
List test(){
List l;
IntegerVector v(5, NA_INTEGER);
l.push_back(v);
l[0][1] = 1;
return l;
}
It does not compile:
test.cpp:121:8: error: invalid use of incomplete type 'struct SEXPREC'
C:/PROGRA~1/R/R-30~1.0/include/Rinternals.h:393:16: error: forward declaration of 'struct SEXPREC'
It is because of this line:
l[0][1] = 1;
The compiler has no idea that l is a list of integer vectors. In essence, l[0] gives you a SEXP (the generic type for all R objects), and SEXP is an opaque pointer to SEXPREC, whose definition we do not have access to (hence opaque). So when you apply [1], you attempt to get the second SEXPREC; the opacity makes that impossible, and it is not what you wanted anyway.
You have to be specific that you are extracting an IntegerVector, so you can do something like this:
as<IntegerVector>(l[0])[1] = 1;
or
v[1] = 1 ;
or
IntegerVector x = l[0] ; x[1] = 1 ;
All of these options work on the same underlying data structure.
Alternatively, if you really wanted the syntax l[0][1] you could define your own data structure expressing "list of integer vectors". Here is a sketch:
template <class T>
class ListOf {
public:
ListOf( List data_) : data(data_){}
T operator[](int i){
return as<T>( data[i] ) ;
}
operator List(){ return data ; }
private:
List data ;
} ;
Which you can use, e.g. like this:
// [[Rcpp::export]]
List test2(){
ListOf<IntegerVector> l = List::create( IntegerVector(5, NA_INTEGER) ) ;
l[0][1] = 1 ;
return l;
}
Also note that using .push_back on Rcpp vectors (including lists) requires a complete copy of the list data, which can slow you down. Only use resizing functions when you don't have a choice.
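To see why that matters, here is a toy model in plain C++ of a container that reallocates and copies everything on each push_back. It is only an illustration of the cost pattern, not Rcpp's actual implementation; CopyOnGrow and copies_when_growing are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Toy model of a container whose push_back allocates a new buffer and copies all elements.
struct CopyOnGrow {
    std::vector<int> data;
    std::size_t copied = 0;     // number of element copies caused by growing

    void push_back(int x) {
        std::vector<int> bigger(data.size() + 1);   // fresh allocation, one slot larger
        for (std::size_t i = 0; i < data.size(); ++i) {
            bigger[i] = data[i];
            ++copied;
        }
        bigger[data.size()] = x;
        data.swap(bigger);
    }
};

// Total element copies needed to build a container of n elements by push_back.
std::size_t copies_when_growing(std::size_t n) {
    CopyOnGrow c;
    for (std::size_t i = 0; i < n; ++i) c.push_back(static_cast<int>(i));
    return c.copied;            // 0 + 1 + ... + (n - 1) = n(n - 1)/2
}
```

In this model the number of copies grows quadratically with n, whereas a container created at its final size and filled by index needs none.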