Is this correct in Rcpp?

I want to compare each pair of columns and return all the results of the calculation. I tried to write the code below, but the outcome was not reasonable: if there are 5 columns in a matrix, the number of results should be 5*4/2 = 10 rather than 5. I think the problem is the m in the code, but I don't know whether it is correct. Thanks.
library(Rcpp)
sourceCpp(code='
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

double KS(arma::colvec x, arma::colvec y) {
  int n = x.n_rows;
  arma::colvec w = join_cols(x, y);
  arma::uvec z = arma::sort_index(w);
  w.fill(-1); w.elem( find(z <= n-1) ).ones();
  return max(abs(cumsum(w)))/n;
}

// [[Rcpp::export]]
Rcpp::NumericVector K_S(arma::mat mt) {
  int n = mt.n_cols;
  int m = 1;
  Rcpp::NumericVector results(n);
  for (int i = 0; i < n-1; i++) {
    for (int j = i+1; j < n; j++) {
      arma::colvec x = mt.col(i);
      arma::colvec y = mt.col(j);
      results[m] = KS(x, y);
      m++;
    }
  }
  return results;
}
')
set.seed(1)
mt <- matrix(rnorm(400*5), ncol=5)
result <- K_S(t(mt))
> result
[1] 0.0000 0.1050 0.0675 0.0475 0.0650

You had a couple of small errors. While fixing it, an intermediate version I wrote simply filled an n by n matrix, which made the indexing errors obvious (a sketch of that variant follows the output below). Returning an arma::rowvec also helps catch possible out-of-bounds index errors, since Armadillo bounds-checks element access by default. But lastly, in this case you can actually just grow a std::vector instead.
Code
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Two-sample Kolmogorov-Smirnov statistic for two equal-length samples:
// +1/-1 marks in sorted order track n * (F_x - F_y); the scaled maximum is D.
double KS(arma::colvec x, arma::colvec y) {
  int n = x.n_rows;
  arma::colvec w = join_cols(x, y);
  arma::uvec z = arma::sort_index(w);
  w.fill(-1); w.elem( find(z <= n-1) ).ones();
  return max(abs(cumsum(w)))/n;
}

// Compare every pair of columns; growing a std::vector means the result
// automatically has length choose(n, 2).
// [[Rcpp::export]]
std::vector<double> K_S(arma::mat mt) {
  int n = mt.n_cols;
  std::vector<double> res;
  for (int i = 0; i < n; i++) {
    for (int j = i+1; j < n; j++) {
      arma::colvec x = mt.col(i);
      arma::colvec y = mt.col(j);
      res.push_back(KS(x, y));
    }
  }
  return res;
}

/*** R
set.seed(1)
mt <- matrix(rnorm(400*5), ncol=5)
result <- K_S(mt)
result
*/
Output
> Rcpp::sourceCpp("~/git/stackoverflow/73916783/answer.cpp")
> set.seed(1)
> mt <- matrix(rnorm(400*5), ncol=5)
> result <- K_S(mt)
> result
[1] 0.1050 0.0675 0.0475 0.0650 0.0500 0.0775 0.0575 0.0500 0.0475 0.0600
>
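For reference, here is a minimal sketch of the intermediate debugging version mentioned above (my illustration, not part of the posted answer): filling a full n by n matrix gives every (i, j) pair its own slot, so any indexing mistake shows up immediately as a value in the wrong place.

// Sketch only: store the statistic for columns (i, j) at position (i, j)
// of an n x n matrix instead of packing results into a flat vector.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

double KS(arma::colvec x, arma::colvec y) {
  int n = x.n_rows;
  arma::colvec w = join_cols(x, y);
  arma::uvec z = arma::sort_index(w);
  w.fill(-1); w.elem( find(z <= n-1) ).ones();
  return max(abs(cumsum(w)))/n;
}

// [[Rcpp::export]]
arma::mat K_S_mat(arma::mat mt) {
  int n = mt.n_cols;
  arma::mat res(n, n, arma::fill::zeros);
  for (int i = 0; i < n; i++) {
    for (int j = i + 1; j < n; j++) {
      res(i, j) = KS(mt.col(i), mt.col(j));
    }
  }
  return res;
}

With 5 columns, the upper triangle of the returned matrix holds the same 10 values that the std::vector version returns, just laid out by column pair.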

Related

How to make R code with an array to be more efficient?

I have the following R code, which is not efficient. I would like to make it efficient using Rcpp. In particular, I am not used to dealing with arrays in Rcpp. Any help would be appreciated.
myfunc <- function(n=1600,
m=400,
p = 3,
time = runif(n,min=0.05,max=4),
qi21 = rnorm(n),
s0c = rnorm(n),
zc_min_ecox_multi = array(rnorm(n*n*p),dim=c(n,n,p)),
qi=matrix(0,n,n),
qi11 = rnorm(p),
iIc_mat = matrix(rnorm(p*p),p,p)){
for (j in 1:n){
u<-time[j]
ind<-1*(u<=time)
locu<-which(time==u)
qi2<- sum(qi21*ind) /s0c[locu]
for (i in 1:n){
qi1<- qi11%*%iIc_mat%*%matrix(zc_min_ecox_multi[i,j,],p,1)
qi[i,j]<- -(qi1+qi2)/m
}
}
}
Computing time is about 7.35 secs. I need to call this function over and over again, maybe 20 times.
system.time(myfunc())
user system elapsed
7.34 0.00 7.35
The first thing to do would be to profile your code: profvis::profvis({myfunc()}).
What you can do is precompute qi11 %*% iIc_mat once.
You get (with minor improvements):
precomp <- qi11 %*% iIc_mat
for (j in 1:n) {
  u <- time[j]
  qi2 <- sum(qi21[u <= time]) / s0c[time == u]
  for (i in 1:n) {
    qi1 <- precomp %*% zc_min_ecox_multi[i, j, ]
    qi[i, j] <- -(qi1 + qi2) / m
  }
}
that is twice as fast (8 sec -> 4 sec).
Vectorizing the i loop then seems straightforward:
q1_all_i <- tcrossprod(precomp, zc_min_ecox_multi[, j, ])
qi[, j] <- -(q1_all_i + qi2) / m
(12 times as fast now)
And if you want to try it in Rcpp, you will first need a function to multiply the matrices...
#include <Rcpp.h>
#include <numeric>
// [[Rcpp::plugins("cpp11")]]

Rcpp::NumericMatrix mult(const Rcpp::NumericMatrix& lhs,
                         const Rcpp::NumericMatrix& rhs)
{
  if (lhs.ncol() != rhs.nrow())
    Rcpp::stop("Incompatible matrices");
  Rcpp::NumericMatrix out(lhs.nrow(), rhs.ncol());
  Rcpp::NumericVector rowvec, colvec;
  for (int i = 0; i < lhs.nrow(); ++i)
  {
    rowvec = lhs(i, Rcpp::_);
    for (int j = 0; j < rhs.ncol(); ++j)
    {
      colvec = rhs(Rcpp::_, j);
      out(i, j) = std::inner_product(rowvec.begin(), rowvec.end(),
                                     colvec.begin(), 0.);
    }
  }
  return out;
}
Then port your function...
// [[Rcpp::export]]
Rcpp::NumericMatrix myfunc_rcpp(int n, int m, int p,
                                const Rcpp::NumericVector& time,
                                const Rcpp::NumericVector& qi21,
                                const Rcpp::NumericVector& s0c,
                                const Rcpp::NumericVector& zc_min_ecox_multi,
                                const Rcpp::NumericMatrix& qi11,
                                const Rcpp::NumericMatrix& iIc_mat)
{
  Rcpp::NumericMatrix qi(n, n);
  Rcpp::NumericMatrix outermat = mult(qi11, iIc_mat);
  for (int j = 0; j < n; ++j)
  {
    double qi2 = 0;
    for (int k = 0; k < n; ++k)
    {
      if (time[j] <= time[k]) qi2 += qi21[k];
    }
    qi2 /= s0c[j];
    for (int i = 0; i < n; ++i)
    {
      Rcpp::NumericMatrix tmpmat(p, 1);
      for (int z = 0; z < p; ++z)
      {
        tmpmat(z, 0) = zc_min_ecox_multi[i + n*j + z*n*n];
      }
      Rcpp::NumericMatrix qi1 = mult(outermat, tmpmat);
      qi(i, j) -= (qi1(0, 0) + qi2)/m;
    }
  }
  return qi;
}
Then in R:
my_rcpp_func <- function(n = 1600,
                         m = 400,
                         p = 3,
                         time = runif(n, min = 0.05, max = 4),
                         qi21 = rnorm(n),
                         s0c = rnorm(n),
                         zc_min_ecox_multi = array(rnorm(n*n*p), dim = c(n, n, p)),
                         qi11 = rnorm(p),
                         iIc_mat = matrix(rnorm(p*p), p, p))
{
  myfunc_rcpp(n, m, p, time, qi21, s0c, as.vector(zc_min_ecox_multi),
              matrix(qi11, 1, p), iIc_mat)
}
This is certainly faster, and gives the same results as your own function, but it's no quicker than the in-R optimizations suggested by F Privé. Maybe optimizing the C++ code could get things even faster, but ultimately you are multiplying 2 reasonably large matrices together over 2.5 million times, so it's never going to be all that fast. R is optimized pretty well for this kind of calculation after all...
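If you do want to keep pushing on the C++ side, one avenue (a sketch only, with a made-up name myfunc_arma, untested and not benchmarked here) is to let RcppArmadillo hand the matrix products to BLAS instead of using the hand-rolled mult(): the 1 x p row vector qi11 %*% iIc_mat is formed once, and each column of qi then comes from a single n x p by p x 1 product.

// Sketch: same computation as myfunc_rcpp, expressed with Armadillo types
// so the matrix products go through BLAS rather than a hand-written loop.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
arma::mat myfunc_arma(int m,
                      const arma::vec& time,
                      const arma::vec& qi21,
                      const arma::vec& s0c,
                      const arma::cube& zc,      // n x n x p array
                      const arma::rowvec& qi11,  // length p
                      const arma::mat& iIc_mat)  // p x p
{
  int n = zc.n_rows, p = zc.n_slices;
  arma::rowvec precomp = qi11 * iIc_mat;         // 1 x p, computed once
  arma::mat qi(n, n, arma::fill::zeros);
  for (int j = 0; j < n; j++) {
    // sum of qi21 over the indices with time[j] <= time, scaled by s0c[j]
    double qi2 = arma::sum(qi21.elem(arma::find(time >= time(j)))) / s0c(j);
    arma::mat Zj(n, p);                          // row i holds zc[i, j, ]
    for (int k = 0; k < p; k++) Zj.col(k) = zc.slice(k).col(j);
    qi.col(j) = -(Zj * precomp.t() + qi2) / m;   // all i at once
  }
  return qi;
}

Calling it would look like myfunc_arma(m, time, qi21, s0c, zc_min_ecox_multi, qi11, iIc_mat), since RcppArmadillo converts a three-dimensional numeric array to an arma::cube; again, treat this as an untested sketch rather than a drop-in replacement.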

How to make Rcpp code efficient with multiple for loops?

I am trying to implement the following Rcpp code and call it from R. The computing time is extremely slow; there are lots of for loops involved.
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat qpart(
const int& n,
const int& p,
const int& m,
arma::vec& G,
arma::vec& ftime,
arma::vec& cause,
arma::mat& covs,
arma::mat& S1byS0hat,
arma::vec& S0hat,
arma::vec& expz){
arma::mat q(n,p);
q.zeros();
for(int u=0;u<n;++u){
arma::mat q1(1,p);
q1.zeros();
for(int iprime=0;iprime<n;++iprime){
for(int i=0;i<n;++i){
if(cause(iprime)==1 & cause(i)>1 & (ftime(i) < ftime(u)) & (ftime(u) <= ftime(iprime))){
q1 += (covs.row(i) - S1byS0hat.row(iprime))*G(iprime)/G(i)*expz(i)/S0hat(iprime);
}
}
}
q.row(u) = q1/(m*m);
}
return q;
}
Following is an example in R.
#### In R ########
n = 2000
m = 500
p=3
G = runif(n)
ftime = runif(n,0.01,5)
cause = c(rep(0,600),rep(1,1000),rep(2,400))
covs = matrix(rnorm(n*p),n,p)
S1byS0hat = matrix(rnorm(n*p),p,n)
S0hat = rnorm(n)
expz = rnorm(n)
system.time( qpart(n,p,m,G,ftime,cause,covs,t(S1byS0hat),S0hat,expz))
user system elapsed
21.5 0.0 21.5
As we can see, the computing time is very high.
The same code implemented in R is also very slow.
q = matrix(0,n,p)
for(u in 1 : n){
q1 <- matrix(0,p,1)
for(iprime in 1 : n){
for(i in 1 : n){
if(cause[iprime]==1 & cause[i]>1 & (time[i]<time[u]) & (time[u] <= time[iprime])){
q1 = q1 + (covs[i,] - S1byS0hat[,iprime])*G[iprime]/G[i]*expz[i]/S0hat[iprime]
}
}
}
q[u,] = q1/(m*m)
}
Following is the formula that I am trying to implement.
Some conditions depend only on u and iprime, so you can check them much earlier.
You can also precompute some things. This gives:
arma::mat qpart2(
    double m,
    arma::vec& ftime,
    arma::vec& cause,
    arma::mat& covs,
    arma::mat& S1byS0hat,
    arma::vec& G_div_S0hat,
    arma::vec& expz_div_G) {

  double m2 = m * m;
  int n = covs.n_rows;
  int p = covs.n_cols;
  arma::mat q(n, p, arma::fill::zeros);

  for (int u = 0; u < n; u++) {
    double ftime_u = ftime(u);
    for (int iprime = 0; iprime < n; iprime++) {
      if (cause(iprime) == 1 && ftime_u <= ftime(iprime)) {
        for (int i = 0; i < n; i++) {
          if (cause(i) > 1 && ftime(i) < ftime_u) {
            double coef = G_div_S0hat(iprime) * expz_div_G(i);
            for (int j = 0; j < p; j++) {
              q(u, j) += (covs(i, j) - S1byS0hat(iprime, j)) * coef;
            }
          }
        }
      }
    }
    for (int j = 0; j < p; j++) q(u, j) /= m2;
  }

  return q;
}
Using qpart2(m, ftime, cause, covs, t(S1byS0hat), G / S0hat, expz / G) takes 3.7 sec (vs 32 sec for your code).
Small remarks:
Is there a reason why you are using arma structures instead of Rcpp ones?
You should access matrices by columns, not by rows; it should be a bit faster because they are stored column-wise (see the sketch below).
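To illustrate the column-wise point, here is a tiny sketch (my example, not from the answer): both functions compute the same sum, but the first walks memory contiguously because arma::mat stores its elements column by column.

// Sketch: column-major traversal vs row-major traversal of an arma::mat.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
double sum_by_col(const arma::mat& A) {
  double s = 0;
  for (arma::uword j = 0; j < A.n_cols; j++)      // outer loop over columns ...
    for (arma::uword i = 0; i < A.n_rows; i++)    // ... inner loop walks one column contiguously
      s += A(i, j);
  return s;
}

// [[Rcpp::export]]
double sum_by_row(const arma::mat& A) {
  double s = 0;
  for (arma::uword i = 0; i < A.n_rows; i++)      // row-first order jumps n_rows elements
    for (arma::uword j = 0; j < A.n_cols; j++)    // between consecutive reads
      s += A(i, j);
  return s;
}

/*** R
A <- matrix(rnorm(4e6), 2000)
system.time(for (k in 1:50) sum_by_col(A))
system.time(for (k in 1:50) sum_by_row(A))
*/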

Unexpected behaviour in Rcpp

Please note that this error was taken from a bigger context, which I obviously cannot report here in its entirety.
I have the following functions in the file fun.cpp
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
arma::vec colMeans(arma::mat data){
int n_0 = data.n_rows;
arma::vec xbar(data.n_cols);
for(int i = 0; i < data.n_rows; i++){
for(int j = 0; j < data.n_cols; j++){
xbar[j] += data(i,j) /n_0;
}
}
return xbar;
}
// [[Rcpp::export]]
List PosteriorNIW(arma::mat data, arma::vec mu0, double lambda0,
double df0, arma::mat V){
// Compute posterior
int n = data.n_rows;
arma::vec xbar = colMeans(data);
double lambdan = lambda0 + n;
arma::vec mun = (lambda0 * mu0 + n * xbar) / lambdan;
arma::mat S;
S.zeros(data.n_cols, data.n_cols);
for(int i = 0; i < n; i++){
S += (arma::conv_to<arma::vec>::from(data.row(i)) - xbar) * arma::trans(arma::conv_to<arma::vec>::from(data.row(i)) - xbar);
}
arma::mat Vn = V + S + ((lambda0*n)/(lambda0 + n)) * (xbar - mu0) * arma::trans(xbar - mu0);
return List::create(_["mun"] = mun,
_["Vn"] = Vn,
_["lambdan"] = lambdan);
}
Calling now:
library(Rcpp); library(RcppArmadillo)
mu0 <- c(3,3)
V0 <- matrix(c(2.5,0.0,0.0,2.5), nrow = 2)
sourceCpp("fun.cpp")
data <- cbind(rep(5,15),rep(0,15))
PosteriorNIW(data, mu0, 1, 1, V0)
gives the expected result.
$mun
[,1]
[1,] 4.8750
[2,] 0.1875
$Vn
[,1] [,2]
[1,] 6.250 -5.6250
[2,] -5.625 10.9375
$lambdan
[1] 16
Now, if I add the following functions to the file fun.cpp (again, these are taken from a bigger context, so don't bother trying to understand them; just paste them), strange things happen:
// [[Rcpp::export]]
NumericMatrix myFun(arma::mat t_dish, arma::cube data){
int l = 0;
for(int j = 0; j < data.n_rows; j++){
l++;
}
NumericMatrix Dk(l, 2);
return Dk;
}
// [[Rcpp::export]]
int myFun2(arma::cube n_cust){
arma::mat temp = n_cust.subcube(arma::span(0), arma::span(), arma::span());
int i;
for(i = 0; i < n_cust.n_cols; i++){
arma::rowvec temp2 = temp.row(i);
}
return i + 1;
}
// [[Rcpp::export]]
arma::vec myFun3(arma::mat k_tables){
arma::vec temp(k_tables.n_cols * k_tables.n_rows);
int l = 0;
if(!R_IsNA(k_tables(0,0))){
l++;
}
arma::vec temp2(l);
arma::vec tmp3 = sort(temp2);
return tmp3;
}
double myFun4(arma::vec x, double nu, arma::vec mu, arma::mat Sigma){
arma::vec product = (arma::trans(x - mu) * arma::inv(Sigma) * (x - mu));
double num = pow(1 + (1 / nu) * product[0], - ( nu + 2 ) / 2);
double den = pow(sqrt(M_PI * nu),2) * sqrt(arma::det(Sigma));
return num / den;
}
bool myFun5(NumericVector X, double z) {
return std::find(X.begin(), X.end(), z)!=X.end();
}
calling PosteriorNIW(data, mu0, 1, 1, V0) repeatedly starts giving different results every time. Note that there is no randomness in the functions, and that the added functions obviously should have no impact, as they are never called by the original function.
I have tried on a different machine to make sure it was not a problem with my compiler, but the error keeps happening.
I know that removing those functions (even just one of them) fixes the problem, but clearly this is not a feasible solution when I am working with more functions.
I would like to know if other users are able to replicate this behavior and if yes if there is a fix for it.
Thank you in advance
EDIT:
The version of R is 3.3.2 and Rtools is 3.4. Both Rcpp and RcppArmadillo are up to date.
You're not zeroing xbar in your colMeans function. If I do that:
arma::vec colMeans(arma::mat data){
  int n_0 = data.n_rows;
  arma::vec xbar;
  xbar.zeros(data.n_cols);
  for(int i = 0; i < data.n_rows; i++){
    for(int j = 0; j < data.n_cols; j++){
      xbar[j] += data(i,j) / n_0;
    }
  }
  return xbar;
}
I get this every time:
> PosteriorNIW(data, mu0, 1, 1.1, V0)
$mun
[,1]
[1,] 4.8750
[2,] 0.1875
$Vn
[,1] [,2]
[1,] 6.250 -5.6250
[2,] -5.625 10.9375
$lambdan
[1] 16
Even when I do add your extra block of code.
I don't know if these vectors are documented to be initialised to zero by their constructor (in which case this might be a bug there) or not, in which case it's your bug!
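For completeness, here is a small sketch of two ways to sidestep the question entirely (my addition, not part of the answer): ask Armadillo for zero-initialisation explicitly at construction, or let arma::mean() compute the column means for you.

// Sketch: two alternatives to the hand-zeroed accumulator.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// Explicit zero-fill at construction, so no element is ever read uninitialised.
arma::vec colMeansExplicit(const arma::mat& data) {
  arma::vec xbar(data.n_cols, arma::fill::zeros);
  for (arma::uword i = 0; i < data.n_rows; i++)
    for (arma::uword j = 0; j < data.n_cols; j++)
      xbar(j) += data(i, j) / data.n_rows;
  return xbar;
}

// Or skip the loop entirely: arma::mean(X, 0) returns the column means as a rowvec.
arma::vec colMeansArma(const arma::mat& data) {
  return arma::mean(data, 0).t();
}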

Rcpp returns large negative number when 2 large positives are multiplied

I am creating a function that calculates the area under the curve. When I take the two partials and multiply them, the product exceeds 2^31, and then a value like -2013386137 is used in the calculation.
Here are the cpp chunks
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector sort_rcpp(NumericVector x) {
std::vector<double> tmp = Rcpp::as< std::vector<double> > (x);
std::sort(tmp.begin(), tmp.end());
return wrap(tmp);
}
// [[Rcpp::export]]
IntegerVector rank(NumericVector x) {
return match(x, sort_rcpp(x));
}
// [[Rcpp::export]]
double auc_(NumericVector actual, NumericVector predicted) {
double n = actual.size();
IntegerVector Ranks = rank(predicted);
int NPos = sum(actual == 1);
int NNeg = (actual.size() - NPos);
int sumranks = 0;
for(int i = 0; i < n; ++i) {
if (actual[i] == 1){
sumranks = sumranks + Ranks[i];
}
}
double p1 = (sumranks - NPos*( NPos + 1 ) / 2);
long double p2 = NPos*NNeg;
double auc = p1 / p2;
return auc ;
}
and then the test example that has the issue
N = 100000
Actual = as.numeric(runif(N) > .65)
Predicted = as.numeric(runif(N))
actual = Actual
predicted = Predicted
auc_(Actual, Predicted)
I am also putting this in an R package
devtools::install_github("JackStat/ModelMetrics")
N = 100000
Actual = as.numeric(runif(N) > .65)
Predicted = as.numeric(runif(N))
actual = Actual
predicted = Predicted
ModelMetrics::auc(Actual, Predicted)
You use int internally in your function which leads to overflow. Use a double and things look sunnier (a short sketch of the overflow arithmetic follows the corrected file below):
R> sourceCpp("/tmp/jackstat.cpp")
R> N <- 100000
R> Actual <- as.numeric(runif(N) > .65)
R> Predicted <- as.numeric(runif(N))
R> auc1(Actual, Predicted) # your function
[1] -0.558932
R> auc2(Actual, Predicted) # my variant using double
[1] 0.499922
R>
The complete corrected file is below:
#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector sort_rcpp(NumericVector x) {
  std::vector<double> tmp = Rcpp::as< std::vector<double> >(x);
  std::sort(tmp.begin(), tmp.end());
  return wrap(tmp);
}

// [[Rcpp::export]]
IntegerVector rank(NumericVector x) {
  return match(x, sort_rcpp(x));
}

// [[Rcpp::export]]
double auc1(NumericVector actual, NumericVector predicted) {
  double n = actual.size();
  IntegerVector Ranks = rank(predicted);
  int NPos = sum(actual == 1);
  int NNeg = (actual.size() - NPos);
  int sumranks = 0;
  for (int i = 0; i < n; ++i) {
    if (actual[i] == 1) {
      sumranks = sumranks + Ranks[i];
    }
  }
  double p1 = (sumranks - NPos*( NPos + 1 ) / 2);
  long double p2 = NPos*NNeg;
  double auc = p1 / p2;
  return auc;
}

// [[Rcpp::export]]
double auc2(NumericVector actual, NumericVector predicted) {
  double n = actual.size();
  IntegerVector Ranks = rank(predicted);
  double NPos = sum(actual == 1);
  double NNeg = (actual.size() - NPos);
  double sumranks = 0;
  for (int i = 0; i < n; ++i) {
    if (actual[i] == 1) {
      sumranks = sumranks + Ranks[i];
    }
  }
  double p1 = (sumranks - NPos*( NPos + 1 ) / 2);
  long double p2 = NPos*NNeg;
  double auc = p1 / p2;
  return auc;
}

/*** R
N <- 100000
Actual <- as.numeric(runif(N) > .65)
Predicted <- as.numeric(runif(N))
auc1(Actual, Predicted)
auc2(Actual, Predicted)
*/
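To spell out the arithmetic behind the overflow, here is a small standalone sketch (my illustration; the counts are rough, not taken from the run above): with N = 100000 and a 0.65 threshold there are roughly 35000 positives and 65000 negatives, and 35000 * 65000 = 2.275e9 does not fit in a signed 32-bit int (maximum 2147483647).

// Sketch: why NPos * NNeg can come out as a large negative number.
#include <cstdio>

int main() {
  int NPos = 35000, NNeg = 65000;       // rough counts for N = 100000, threshold 0.65
  int wrapped = NPos * NNeg;            // exceeds 2^31 - 1; signed overflow is technically
                                        // undefined behaviour, but typically wraps negative
  long long exact = 1LL * NPos * NNeg;  // widen before multiplying to keep the true value
  std::printf("int: %d   long long: %lld\n", wrapped, exact);
  return 0;
}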

RcppArmadillo on several cpu cores

I have the following RcppArmadillo function that runs fine if I execute it on one CPU core. But if I use several cores, then R will crash. All the other Rcpp functions I have created so far run fine on several cores (with foreach); only the RcppArmadillo one seems to be problematic. Any ideas how to fix that?
cppFunction('double augmentedDickeyFullerCpp(NumericVector a, NumericVector b, double gamma, double mu, int lags) {
if (gamma < 0) {
return 0;
}
int n = a.size()-1;
int lags2 = lags + 1;
// first rows, then columns
NumericMatrix x(n-lags2,lags2);
NumericMatrix zdifflag(n-lags2+1,lags2);
NumericVector diff(n);
NumericVector zdiff(n-lags2+1);
NumericVector residuals(n+1);
residuals[0] = a[0] - gamma * b[0] - mu;
// residuals a is y and b is x
for(int i = 1; i < n+1; i++) {
residuals[i] = a[i] - gamma * b[i] - mu;
diff[i-1] = residuals[i] - residuals[i-1];
}
for(int i = 0; i < n-lags2+1; i++) {
zdifflag[0,i] = residuals[i+lags2-1];
}
for(int j = 0; j < n-lags2+1; j++) {
for(int i = 0; i < lags2; i++) {
x(j,i) = diff[j+lags2-1-i];
if (i > 0) {
zdifflag(j,i) = x(j,i);
}
}
zdiff[j] = x(j,0);
}
int length = zdifflag.nrow(), k = zdifflag.ncol();
arma::mat X(zdifflag.begin(), length, k, false); // reuses memory and avoids extra copy
arma::colvec y(zdiff.begin(), zdiff.size(), false);
arma::colvec coef = arma::solve(X, y); // fit model y ~ X
arma::colvec res = y - X*coef; // residuals
// std.errors of coefficients
//arma::colvec res = y - X*coef[0];
// sqrt(sum(residuals^2)/(length - k))
double s2 = std::inner_product(res.begin(), res.end(), res.begin(), 0.0)/(length - k);
arma::colvec std_err = arma::sqrt(s2 * arma::diagvec(arma::pinv(arma::trans(X)*X)));
return coef[0]/std_err[0];
}',depends = "RcppArmadillo", includes="#include <RcppArmadillo.h>")
I generally recommend putting the code into a small package, and having each parallel worker load the package. That is known to work, both in serial and parallel, whereas relying on cppFunction() for an ad-hoc function may be too fragile for parallel execution.
