I have the following function declared in Rcpp:
#include <Rcpp.h>
// [[Rcpp::depends(RcppArmadillo)]]
#include <Rmath.h>
using namespace Rcpp;
// [[Rcpp::export]]
double loglikZeta(double zold, double zstar, NumericVector y, int K, double p){
NumericVector num = Rcpp::dbinom(y,K,p*zstar);
NumericVector den = Rcpp::dbinom(y,K,p*zold);
return (num[0]/den[0]);
}
// [[Rcpp::export]]
double singleZetaSampler(NumericVector z, NumericVector y,
double p, int K, int i, double zstar){
return loglikZeta(z[i-1],zstar,y[i-1],K,p);
}
Now declare (after having loaded package and file):
z <- y <- c(rep(1,20),rep(0,20))
n <- length(y)
K <- 3
p <- 0.5
i <- 30
zstar <- 1
The unexpected behaviour is that every time I call the function I get a different result, even though nothing in it is random:
singleZetaSampler(z,y,p,K,i,zstar)
[1] 1.000051
singleZetaSampler(z,y,p,K,i,zstar)
[1] 0.1887447
singleZetaSampler(z,y,p,K,i,zstar)
[1] 0.9999998
Am I making an obvious mistake here, or are these results genuinely unexpected?
EDIT:
Sorry if the function doesn't make sense used as it is. This was the original function:
// [[Rcpp::export]]
NumericVector zetaSampler(int n, NumericVector z, NumericVector y,
double p, int K){
NumericVector xx(n);
for(int i = 0; i < n; i++){
xx(i) = loglikZeta(z[i],1,y[i],K,p);
}
return xx;
}
and calling:
zetaSampler(length(z),z,y,p,K)
as before gives different results every time.
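Two things: one actual error, one sort-of-stylistic.
The stylistic issue is that you include Rmath.h and depend on RcppArmadillo when you need neither. The real error is in how y is passed along: loglikZeta() expects a NumericVector y, but singleZetaSampler() hands it the scalar y[i-1]. A scalar converted to a NumericVector is taken as a length, so with i = 30, where y[i-1] is 0, you get an empty vector, and num[0] and den[0] then read past its end. That is where the random-looking values come from.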
Here is what I just ran, and it gets three times the same result.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double loglikZeta(double zold, double zstar, NumericVector y, int K, double p){
NumericVector num = Rcpp::dbinom(y,K,p*zstar);
NumericVector den = Rcpp::dbinom(y,K,p*zold);
return (num[0]/den[0]);
}
// [[Rcpp::export]]
double singleZetaSampler(NumericVector z, NumericVector y,
double p, int K, int i, double zstar){
return loglikZeta(z[i-1],zstar,y[i-1],K,p);
}
/*** R
z <- y <- c(rep(1,20),rep(0,20))
n <- length(y)
K <- 3
p <- 0.5
i <- 20 # not 30
zstar <- 1
singleZetaSampler(z,y,p,K,i,zstar)
singleZetaSampler(z,y,p,K,i,zstar)
singleZetaSampler(z,y,p,K,i,zstar)
*/
Output:
R> sourceCpp("/tmp/foo.cpp")
R> z <- y <- c(rep(1,20),rep(0,20))
R> n <- length(y)
R> K <- 3
R> p <- 0.5
R> i <- 20 # not 30
R> zstar <- 1
R> singleZetaSampler(z,y,p,K,i,zstar)
[1] 1
R> singleZetaSampler(z,y,p,K,i,zstar)
[1] 1
R> singleZetaSampler(z,y,p,K,i,zstar)
[1] 1
R>
Edit: A repaired version that takes scalar arguments in loglikZeta() and calls the scalar R::dbinom() appears to work better:
// [[Rcpp::export]]
double loglikZeta(double zold, double zstar, double y, int K, double p){
double num = R::dbinom(y, K, p*zstar, false);
double den = R::dbinom(y, K, p*zold, false);
return (num/den);
}
Note that Rcpp::dbinom() has a signature of Rcpp::dbinom(Rcpp::NumericVector, int, double, bool=false).
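To see what the scalar version should return for the inputs in the question, here is a minimal sketch in plain C++ with no Rcpp dependency. The names binom_pmf() and lik_ratio() are hypothetical, and binom_pmf() is only a simple stand-in for R::dbinom(y, size, prob, false), not R's implementation. For i = 30 the inputs are zold = 0 and y = 0, so the ratio should be dbinom(0, 3, 0.5) / dbinom(0, 3, 0) = 0.125 / 1 = 0.125.

```cpp
#include <cassert>
#include <cmath>

// Binomial probability mass P(Y = y) for Y ~ Binomial(size, prob).
// Hypothetical stand-in for R::dbinom(y, size, prob, false).
double binom_pmf(int y, int size, double prob) {
    assert(size >= 0 && prob >= 0.0 && prob <= 1.0);
    if (y < 0 || y > size) return 0.0;
    double coef = 1.0;  // choose(size, y), built up multiplicatively
    for (int k = 1; k <= y; ++k) coef *= static_cast<double>(size - y + k) / k;
    // std::pow(0.0, 0) is 1.0, so prob == 0 and prob == 1 need no special case.
    return coef * std::pow(prob, y) * std::pow(1.0 - prob, size - y);
}

// Ratio of the two binomial densities, with every argument a scalar.
double lik_ratio(double zold, double zstar, int y, int K, double p) {
    return binom_pmf(y, K, p * zstar) / binom_pmf(y, K, p * zold);
}
```

Calling lik_ratio(0.0, 1.0, 0, 3, 0.5) corresponds to the i = 30 case, and lik_ratio(1.0, 1.0, 1, 3, 0.5) to the i = 20 case, where numerator and denominator are identical.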
Is there any way to efficiently translate the outer() function for multiplication of two vectors from R base to RcppArmadillo? I attempted to do so but it is not efficient at all.
Take the following example:
library(Rcpp)
library(RcppArmadillo)
library(microbenchmark)
#Outer attempt
cppFunction(depends = "RcppArmadillo",
'
arma::mat outer_rcpp(arma::vec x, arma::vec y) {
int x_length = x.n_elem;
int y_length = y.n_elem;
arma::mat final(x_length, y_length);
// And use loops instead of outer
for(int i = 0; i < x_length; i++) {
final.col(i) = x[i] * y;
}
return(final);
}
'
)
#Test for equal results
a <- rnorm(5)
base <- base::outer(a, a)
rcpp <- outer_rcpp(a, a)
all.equal(base, rcpp)
#Test for speed
b <- rnorm(5000)
microbenchmark(base = base::outer(b, b),
rcpp = outer_rcpp(b, b), times = 10)
The Rcpp version is about 2 times slower than base R. I am sure this can be done through matrix multiplication; any idea how?
As @thelatemail pointed out in the comments, outer() already relies on a heavily optimized C routine:
src/library/base/R/outer.R: tcrossprod usage.
src/main/array.c: underlying C routine powering the tcrossprod computation.
Armadillo has its own optimized path for matrix multiplication, built on the BLAS routines dgemm and dgemv:
armadillo_bits/mul_gemm.hpp: C := alpha*op(A)*op(B) + beta*C,
armadillo_bits/mul_gemv.hpp: y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y,
Playing around with the outer product calculation suggests a few variants to try. Mainly, we move the outer product into Armadillo operations instead of loops:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat outer_rcpp(const arma::vec& x, const arma::vec& y) {
int x_length = x.n_elem;
int y_length = y.n_elem;
arma::mat final(x_length, y_length);
// Use a loop instead of outer: column j of the result is x * y[j]
for(int j = 0; j < y_length; j++) {
final.col(j) = x * y[j];
}
return final;
}
// [[Rcpp::export]]
arma::mat outer_with_armadillo(const arma::vec& x, const arma::vec& y) {
arma::mat final = x*y.t();
return final;
}
// [[Rcpp::export]]
arma::mat outer_with_armadillo_transposed(const arma::vec& x, const arma::rowvec& y) {
arma::mat final = x*y;
return final;
}
Revisiting the benchmarking code, we have:
b = rnorm(5000)
b_transposed = t(b)
bench_results = microbenchmark::microbenchmark(base = base::outer(b, b),
outer_armadillo_loop = outer_rcpp(b, b),
outer_armadillo_optimized = outer_with_armadillo(b, b),
outer_armadillo_optimized_transposed = outer_with_armadillo_transposed(b, b_transposed), times = 10)
bench_results
expr                                      min       lq     mean   median       uq      max neval
base                                 132.8601 141.3532 156.9979 146.7993 154.8954 234.2619    10
outer_armadillo_loop                 278.4115 279.9204 317.7907 288.4212 329.0769 451.6872    10
outer_armadillo_optimized            272.4348 283.3380 347.7913 304.1181 339.3282 728.2264    10
outer_armadillo_optimized_transposed 269.7855 270.7108 297.9580 279.8099 312.3488 386.4270    10
From the results, the fastest Armadillo variant I could get is the one that receives b already transposed from column-vector into row-vector form, (n x 1) * (1 x m), although base::outer is still faster than all three.
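For reference, the (n x 1) * (1 x m) product itself can be sketched in plain C++ without Armadillo. This is only an illustration of the memory layout, not a claim about how Armadillo or R implement it: the function names outer_colmajor() and at() are hypothetical, and the result is stored column-major, as R and Armadillo store matrices, so that column j is simply x scaled by y[j].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Outer product of x (length n) and y (length m), stored column-major:
// element (i, j) lives at out[i + j * n].
std::vector<double> outer_colmajor(const std::vector<double>& x,
                                   const std::vector<double>& y) {
    const std::size_t n = x.size(), m = y.size();
    std::vector<double> out(n * m);
    for (std::size_t j = 0; j < m; ++j) {       // one column per element of y
        for (std::size_t i = 0; i < n; ++i) {   // contiguous writes down the column
            out[i + j * n] = x[i] * y[j];
        }
    }
    return out;
}

// Read element (i, j) of a column-major matrix with n_rows rows.
double at(const std::vector<double>& mat, std::size_t n_rows,
          std::size_t i, std::size_t j) {
    assert(i < n_rows && i + j * n_rows < mat.size());
    return mat[i + j * n_rows];
}
```

With x of length 3 and y of length 2 the result has 3 rows and 2 columns, which also shows why a loop over columns has to run over the length of y rather than the length of x when the two differ.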
I'm still wrapping my head around Rcpp logic coming from an R context, so please be patient with me! From the following code:
Cjplus <- c(0,0)
Kseq <- c(1,2)
cand <- c(0,1)
cppFunction("NumericVector test(NumericVector Cjplus, NumericVector Kseq,
NumericVector cand, int i) {
NumericVector A = as<NumericVector>(Cjplus[Kseq-1]);
int B = cand[i-2];
as<NumericVector>(Cjplus[Kseq-1]) = A + B;
return Cjplus[Kseq-1];
}")
test(Cjplus, Kseq, cand, 3)
I expect to get [1] 1 1 as my output, but instead I get [1] 0 0. Am I indexing incorrectly here?
I don't understand why you are trying to use subset-assignment here; a plain assignment is enough:
Cjplus <- c(0,0)
Kseq <- c(1,2)
cand <- c(0,1)
cppFunction("NumericVector test(NumericVector Cjplus, NumericVector Kseq,
NumericVector cand, int i) {
NumericVector A = Cjplus[Kseq-1];
double B = cand[i-2];
A = A + B;
return A;
}")
test(Cjplus, Kseq, cand, 3)
#[1] 1 1
Edit:
Here is a version that does subset-assignment.
library(Rcpp)
cppFunction("NumericVector test(const NumericVector Cjplus, NumericVector Kseq,
NumericVector cand, int i) {
NumericVector C = clone(Cjplus);
NumericVector A = C[Kseq-1];
double B = cand[i-2];
A = A + B;
C[Kseq-1] = A;
return C;
}")
test(Cjplus, Kseq, cand, 3)
#[1] 1 1
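The difference between the two versions comes down to copy versus in-place update. In the question, as<NumericVector>(Cjplus[Kseq-1]) appears to produce a temporary copy, so assigning to it leaves Cjplus untouched. A minimal sketch of the same distinction in plain C++ follows; it uses std::vector rather than Rcpp types, and the names subset() and add_at() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy out the elements of v at the given 1-based positions (R-style indices).
std::vector<double> subset(const std::vector<double>& v, const std::vector<int>& pos) {
    std::vector<double> out;
    for (int p : pos) {
        assert(p >= 1 && static_cast<std::size_t>(p) <= v.size());
        out.push_back(v[p - 1]);  // shift to 0-based indexing
    }
    return out;
}

// Add b to the elements of v at the given 1-based positions, writing into v itself.
void add_at(std::vector<double>& v, const std::vector<int>& pos, double b) {
    for (int p : pos) {
        assert(p >= 1 && static_cast<std::size_t>(p) <= v.size());
        v[p - 1] += b;
    }
}
```

Modifying the vector returned by subset() changes only the copy, while add_at() changes the original, which mirrors the C[Kseq-1] = A step in the subset-assignment version.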
I'm very new to Rcpp. I'm trying to write a coordinate descent algorithm for the lasso in Rcpp as a self-study exercise. The code returns an error:
Mat::init(): requested size is not compatible with column vector layout
I also ran into some problems while writing the code:
(1) no matching function for call to 'sign'
(2) no matching function for call to 'ifelse'
(3) pow(X.col(j),2) : no viable conversion
As workarounds, I wrote
(S1>0)-(S1<0) for (1), to compute the sign of S1,
an if(){}else{} statement for (2), and
X.col(j)%X.col(j) for (3).
Any suggestions, please?
Here is the code.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
mat betamat(NumericMatrix Xr, NumericVector yr, NumericVector lambdar, double tol=0.0000001) {
int N = Xr.nrow(), p = Xr.ncol(), l = lambdar.size();
mat X(Xr.begin(), N, p, false);
colvec y(yr.begin(), yr.size(), false);
vec lambda(lambdar.begin(), lambdar.size(),false);
colvec ols = solve(X,y);
mat betas = zeros<mat>(p,l);
//
bool converged = false;
for (int i = 0; i < l; ++i) {
colvec b = zeros<vec>(p);
colvec r = y-X*b;
while(converged == false){
colvec beta_old = betas;
for(int j = 0; j < p; ++j){
r = r + X.col(j)*b(j);
double xr = dot(X.col(j),r);
double S1 = xr/N;
double xx = sum(X.col(j)%X.col(j))/N;
b(j) =((S1>0)-(S1<0))*(abs(S1)-lambda(i))/xx;
if(b(j)>0){
b(j)=b(j);
}else{
b(j)=0;
}
r = r - X.col(j)*b(j);
}
converged = (sum(abs(betas - beta_old)) < tol);
}
betas.col(i) = b;
}
return betas;
}
In R, I'm calling this with
library(Rcpp)
sourceCpp("filename.cpp")
set.seed(1)
X <- matrix(rnorm(100*3),100)
y <- rnorm(100)
coefficients <- betamat(X,y,seq(0,1,0.0005))
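The three workarounds described in the question (sign via comparisons, a conditional in place of ifelse, and an element-wise product in place of pow) can be sketched in plain C++ as scalar helpers. This is only an illustration of those three idioms with std::vector standing in for an Armadillo column; the function names are hypothetical, and it does not address the Mat::init() error.

```cpp
#include <cassert>
#include <vector>

// Sign of s as -1, 0 or 1, using the comparison trick from the question.
int sign_of(double s) {
    return (s > 0) - (s < 0);
}

// Scalar stand-in for R's ifelse(b > 0, b, 0).
double keep_if_positive(double b) {
    return b > 0 ? b : 0.0;
}

// Mean of squared entries via an element-wise product,
// as in sum(X.col(j) % X.col(j)) / N.
double mean_square(const std::vector<double>& col) {
    assert(!col.empty());  // avoid dividing by zero
    double total = 0.0;
    for (double v : col) total += v * v;
    return total / static_cast<double>(col.size());
}
```

For example, sign_of(-2.5) is -1, keep_if_positive(-1.0) is 0, and mean_square applied to the column (3, 4) is (9 + 16) / 2 = 12.5.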
I have an Rcpp function that should take an IntegerVector as input (like toInt below). I want to use it on vectors of integers, but also on vectors of doubles whose values are all whole numbers (e.g. 1:4 is of type integer but 1:4 + 1 is of type double).
Yet, when it is used on genuinely fractional numbers (e.g. 1.5), I would like it to raise a warning or an error instead of silently truncating all values to make them integers.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector toInt(RObject x) {
return as<IntegerVector>(x);
}
> toInt(c(1.5, 2.4)) # I would like a warning
[1] 1 2
> toInt(1:2 + 1) # No need of warning
[1] 2 3
Rcpp sugar has all you need. Here is one possible implementation:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector fprive(const RObject & x) {
NumericVector nv(x);
IntegerVector iv(x);
if (is_true(any(nv != NumericVector(iv)))) warning("Uh-oh");
return(iv);
}
/*** R
fprive(c(1.5, 2))
fprive(c(1L, 2L))
*/
Its output is as follows:
R> Rcpp::sourceCpp('/tmp/fprive.cpp')
R> fprive(c(1.5, 2))
[1] 1 2
R> fprive(c(1L, 2L))
[1] 1 2
Warning message:
In fprive(c(1.5, 2)) : Uh-oh
R>
Because it is a warning object, you can control via options("warn") whether you want to abort, print immediately, print at end, ignore, ...
The first solution I thought of was:
// [[Rcpp::export]]
IntegerVector toInt2(const NumericVector& x) {
for (int i = 0; i < x.size(); i++) {
if (x[i] != (int)x[i]) {
warning("Uh-oh");
break;
}
}
return as<IntegerVector>(x);
}
However, I wondered whether this makes an unnecessary copy when x is already an IntegerVector, so I wrote this other solution:
// [[Rcpp::export]]
IntegerVector toInt3(const RObject& x) {
NumericVector nv(x);
for (int i = 0; i < nv.size(); i++) {
if (nv[i] != (int)nv[i]) {
warning("Uh-oh");
break;
}
}
return as<IntegerVector>(x);
}
But maybe the best solution is to test whether the RObject is already of integer type and, if not, to fill the resulting vector while checking the values in the same pass:
// [[Rcpp::export]]
SEXP toInt4(const RObject& x) {
if (TYPEOF(x) == INTSXP) return x;
NumericVector nv(x);
int i, n = nv.size();
IntegerVector res(n);
for (i = 0; i < n; i++) {
res[i] = nv[i];
if (nv[i] != res[i]) {
warning("Uh-oh");
break;
}
}
for (; i < n; i++) res[i] = nv[i];
return res;
}
Some benchmarking:
x <- seq_len(1e7)
x2 <- x; x2[1] <- 1.5
x3 <- x; x3[length(x3)] <- 1.5
microbenchmark::microbenchmark(
fprive(x), toInt2(x), toInt3(x), toInt4(x),
fprive(x2), toInt2(x2), toInt3(x2), toInt4(x2),
fprive(x3), toInt2(x3), toInt3(x3), toInt4(x3),
times = 20
)
Unit: microseconds
expr min lq mean median uq max neval
fprive(x) 229865.629 233539.952 236049.68870 235623.390 238500.4335 244608.276 20
toInt2(x) 98249.764 99520.233 102026.44305 100468.627 103480.8695 114144.022 20
toInt3(x) 50631.512 50838.560 52307.34400 51417.296 52524.0260 58311.909 20
toInt4(x) 1.165 6.955 46.63055 10.068 11.0755 766.022 20
fprive(x2) 63134.534 64026.846 66004.90820 65079.292 66674.4835 74907.065 20
toInt2(x2) 43073.288 43435.478 44068.28935 43990.455 44528.1800 45745.834 20
toInt3(x2) 42968.743 43461.838 44268.58785 43682.224 44235.6860 51906.093 20
toInt4(x2) 19379.401 19640.198 20091.04150 19918.388 20232.4565 21756.032 20
fprive(x3) 254034.049 256154.851 258329.10340 258676.363 259549.3530 264550.346 20
toInt2(x3) 77983.539 79162.807 79901.65230 79424.011 80030.3425 87906.977 20
toInt3(x3) 73521.565 74329.410 76050.63095 75128.253 75867.9620 88240.937 20
toInt4(x3) 22109.970 22529.713 23759.99890 23072.738 23688.5365 30905.478 20
So, toInt4 seems the best solution.
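One caveat with the (int) casts above is that they are only well defined when the value fits in an int. Here is a hedged sketch of the same "convert and flag" idea in plain C++ that also guards against out-of-range and non-finite inputs. It uses std::vector and a boolean flag instead of Rcpp types and warning(), and the names IntConversion, is_exact_int() and to_int_checked() are hypothetical; storing 0 for unrepresentable values is an arbitrary choice made for this sketch.

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <cstddef>
#include <vector>

// Result of a checked conversion: the integer values plus a flag saying whether
// any input was changed (fractional part dropped, or not representable).
struct IntConversion {
    std::vector<int> values;
    bool lossy;
};

// True when x is finite, has no fractional part, and fits in an int.
bool is_exact_int(double x) {
    return std::isfinite(x) && std::trunc(x) == x && x >= INT_MIN && x <= INT_MAX;
}

IntConversion to_int_checked(const std::vector<double>& x) {
    IntConversion out{std::vector<int>(x.size()), false};
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (is_exact_int(x[i])) {
            out.values[i] = static_cast<int>(x[i]);
        } else {
            out.lossy = true;
            // Truncate toward zero when in range; otherwise store 0 as a placeholder.
            const double t = std::trunc(x[i]);
            out.values[i] = (std::isfinite(t) && t >= INT_MIN && t <= INT_MAX)
                                ? static_cast<int>(t) : 0;
        }
    }
    return out;
}
```

In an Rcpp wrapper, the lossy flag would be the natural place to call warning() once after the loop.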
I'm trying to convert an R function into Rcpp to speed things up, since it involves a for loop. Along the way I need to calculate the mean of the entries of a vector, which in R would be as simple as mean(x), but it does not seem to work in Rcpp: I get 0 0 as the result every time.
My code looks like this:
cppFunction(
"NumericVector fun(int n, double lambda, ...) {
...
NumericVector y = rpois(n, lambda);
NumericVector w = dpois(y, lambda);
NumericVector x = w*y;
double z = mean(x);
return z;
}")
Edit: I thought my error was due to what was mentioned above; returning the single double z was just my attempt to isolate the issue. However, the following code still does not work:
cppFunction(
"NumericVector zstat(int n, double lambda, double lambda0, int m) {
NumericVector z(m);
for (int i=1; i<m; ++i){
NumericVector y = rpois(n, lambda0);
NumericVector w = dpois(y, lambda)/dpois(y,lambda0);
double x = mean(w*y);
z[i] = (x-2)/(sqrt(2/n));
}
return z;
}")
The return type of your function is NumericVector, but Rcpp::mean returns a scalar value convertible to double. Fixing this will correct the issue:
library(Rcpp)
cppFunction(
"double fun(int n, double lambda) {
NumericVector y = rpois(n, lambda);
NumericVector w = dpois(y, lambda);
NumericVector x = w*y;
double z = mean(x);
return z;
}")
set.seed(123)
fun(50, 1.5)
# [1] 0.2992908
What happens in your code is that, because NumericVector was specified as the return type, this constructor is called:
template <typename T>
Vector(T size,
typename Rcpp::traits::enable_if<traits::is_arithmetic<T>::value, void>::type* = 0) {
Storage::set__( Rf_allocVector( RTYPE, size) ) ;
init() ;
}
which casts the double to an integral type and creates a NumericVector with length equal to the truncated value of the double. To demonstrate,
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector from_double(double x) {
return x;
}
/*** R
sapply(0.5:4.5, from_double)
# [[1]]
# numeric(0)
#
# [[2]]
# [1] 0
#
# [[3]]
# [1] 0 0
#
# [[4]]
# [1] 0 0 0
#
# [[5]]
# [1] 0 0 0 0
*/
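The same "a number is taken as a length" behaviour can be reproduced with std::vector, which may make it easier to recognise. This is an analogy in plain C++, not Rcpp code, and the names vector_of_length() and vector_holding() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Analogy only: treat a double as a *length*, the way a size constructor does.
// The fractional part is discarded and the new vector is zero-filled.
std::vector<double> vector_of_length(double x) {
    assert(x >= 0.0);  // a negative length makes no sense
    return std::vector<double>(static_cast<std::size_t>(x));
}

// What the caller most likely wanted: a length-one vector holding the value.
std::vector<double> vector_holding(double x) {
    return std::vector<double>{x};
}
```

As in the sapply() demonstration, 0.5 gives an empty vector and 2.5 gives two zeros, whereas vector_holding(2.5) keeps the value itself.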
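Edit: Regarding your second question, you are dividing by sqrt(2 / n), where 2 and n are both integers. Integer division makes 2 / n equal to 0 for any n > 2, so you end up dividing by zero in most cases, hence all of the Inf values in the result vector. You can fix this by using 2.0 instead of 2. Note also that the loop starts at i = 1, so z[0] is never assigned and stays at its initial 0, which is why the first element of the output below is exactly 0.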
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector zstat(int n, double lambda, double lambda0, int m) {
NumericVector z(m);
for (int i=1; i<m; ++i){
NumericVector y = rpois(n, lambda0);
NumericVector w = dpois(y, lambda)/dpois(y,lambda0);
double x = mean(w * y);
// z[i] = (x - 2) / sqrt(2 / n);
// ^^^^^
z[i] = (x - 2) / sqrt(2.0 / n);
// ^^^^^^^
}
return z;
}
/*** R
set.seed(123)
zstat(25, 2, 3, 10)
# [1] 0.0000000 -0.4427721 0.3199805 0.1016661 0.4078687 0.4054078
# [7] -0.1591861 0.9717596 0.6325110 0.1269779
*/
C++ is not R -- you need to be more careful about the types of your variables.
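The integer-division pitfall is easy to check in isolation. A minimal sketch in plain C++, with hypothetical function names, comparing the two forms of the denominator:

```cpp
#include <cassert>
#include <cmath>

// Both operands are int, so 2 / n is integer division: 0 for every n > 2.
double scale_int_division(int n) {
    assert(n > 0);
    return std::sqrt(static_cast<double>(2 / n));
}

// Making one operand a double forces floating-point division.
double scale_double_division(int n) {
    assert(n > 0);
    return std::sqrt(2.0 / n);
}
```

With n = 25 the first version returns 0, so anything divided by it becomes infinite, while the second returns sqrt(0.08), roughly 0.283.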