Removing an element from a list in Rcpp

Suppose I have the following list:
x <- list(a = c(1, 2), b = c("a", "c"), c = 1:10)
In R, I can remove the first element using the following two methods:
x[-1]
x[1] <- NULL
I'm trying to do the same thing in Rcpp, but I can't figure it out. The following code just assigns NULL to the first element instead of removing it.
// [[Rcpp::export]]
Rcpp::List removeElement(Rcpp::List x)
{
x[0] = R_NilValue;
return(x);
}
Any ideas?

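What about building an index vector and keeping every position except j:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
Rcpp::List removeElement(Rcpp::List x, int j)
{
// 1-based positions, so j = 1 removes the first element, as in R
IntegerVector idx = seq_len(x.length());
return(x[idx != j]);
}
Or, if you want the indices to start from 0, use
IntegerVector idx = seq_len(x.length()) - 1;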
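The idea behind this answer does not depend on Rcpp: the element is not deleted in place; instead a new container is built that skips the unwanted position. The following is a minimal standard C++ sketch of that idea, not the Rcpp implementation. `remove_at` is a hypothetical helper name, and `std::vector<std::string>` stands in for `Rcpp::List` so that the code compiles without R.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return a copy of `x` without the element at
// zero-based position `j`. The input is left untouched.
std::vector<std::string> remove_at(const std::vector<std::string>& x, std::size_t j) {
    assert(j < x.size());  // precondition: j must be a valid index
    std::vector<std::string> out;
    out.reserve(x.size() - 1);
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (i != j) out.push_back(x[i]);  // keep everything except position j
    }
    return out;
}
```

Calling `remove_at(x, 0)` here plays the role of `x[-1]` in R, with the index shifted to start at 0.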

Related

Rcpp: how to combine the R function and Rcpp function together to make a package

Suppose I have the following c++ code in a file named test.cpp
#include <Rcpp.h>
//[[Rcpp::export]]
Rcpp::NumericMatrix MyAbar (const Rcpp::NumericMatrix & x, int T){
unsigned int outrows = x.nrow(), i = 0, j = 0;
double d;
Rcpp::NumericMatrix out(outrows,outrows);
// Rcpp::LogicalVector comp;
for (i = 0; i < outrows - 1; i++){
Rcpp::NumericVector v1 = x.row(i);
Rcpp::NumericVector ans(outrows);
for (j = i + 1; j < outrows ; j ++){
d = mean(Rcpp::runif( T ) < x(i,j));
out(j,i)=d;
out(i,j)=d;
}
}
return out;
}
I know that I can build a package from this file with the following command:
Rcpp.package.skeleton("test",cpp_files = "~/Desktop/test.cpp")
However, what if I also want to include in the package the following R function, which calls the Rcpp function?
random = function(A, T){
if (!is.matrix(A)){
A = Reduce("+",A)/T
}
# global constant and threshold
n = nrow(A)
B_0 = 3
w = min(sqrt(n),sqrt(T * log(n)))
q = B_0 * log(n) / (sqrt(n) * w)
A2 = MyAbar(A, T)
diag(A2) <- NA
K = A2 <= rowQuantiles(A2, probs=q, na.rm =TRUE)
diag(K) = FALSE
P = K %*% A * ( 1/(rowSums(K) + 1e-10))
return( (P + t(P))*0.5 )
}
How can I do that?
So you are asking how to make an R package? There are many good tutorials.
To a first approximation:
copy your R file into the package, say as R/random.R
write a help file for your function, either manually as man/random.Rd or by learning the roxygen2 package
make sure you know what NAMESPACE is for and that DESCRIPTION is right

Rcpp: Storing elements to subset of vector

I'm still wrapping my head around Rcpp logic coming from an R context, so please be patient with me! From the following code:
Cjplus <- c(0,0)
Kseq <- c(1,2)
cand <- c(0,1)
cppFunction("NumericVector test(NumericVector Cjplus, NumericVector Kseq,
NumericVector cand, int i) {
NumericVector A = as<NumericVector>(Cjplus[Kseq-1]);
int B = cand[i-2];
as<NumericVector>(Cjplus[Kseq-1]) = A + B;
return Cjplus[Kseq-1];
}")
test(Cjplus, Kseq, cand, 3)
I expect to get [1] 1 1 as my output, but instead I get [1] 0 0. Am I indexing incorrectly here?
I don't understand why you try to use subset-assignment.
Cjplus <- c(0,0)
Kseq <- c(1,2)
cand <- c(0,1)
cppFunction("NumericVector test(NumericVector Cjplus, NumericVector Kseq,
NumericVector cand, int i) {
NumericVector A = Cjplus[Kseq-1];
double B = cand[i-2];
A = A + B;
return A;
}")
test(Cjplus, Kseq, cand, 3)
#[1] 1 1
Edit:
Here is a version that does subset-assignment.
library(Rcpp)
cppFunction("NumericVector test(const NumericVector Cjplus, NumericVector Kseq,
NumericVector cand, int i) {
NumericVector C = clone(Cjplus);
NumericVector A = C[Kseq-1];
double B = cand[i-2];
A = A + B;
C[Kseq-1] = A;
return C;
}")
test(Cjplus, Kseq, cand, 3)
#[1] 1 1
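The pattern in the edited answer is gather, modify, scatter: copy the selected elements out, change the copy, then write the results back into the same positions. Changing the copy alone leaves the original vector unchanged, which would be consistent with the `0 0` result in the question. A minimal standard C++ sketch of the three steps follows. `add_at` is a hypothetical helper name, and `std::vector` with zero-based indices stands in for `NumericVector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: add `b` to the elements of `values` selected by the
// zero-based positions in `idx`, and return the updated copy.
// `values` is taken by value, so the caller's data is not modified.
std::vector<double> add_at(std::vector<double> values,
                           const std::vector<std::size_t>& idx,
                           double b) {
    // Step 1: gather. `a` is a copy of the selected elements, so changing
    // `a` on its own would not change `values`.
    std::vector<double> a;
    a.reserve(idx.size());
    for (std::size_t k : idx) {
        assert(k < values.size());  // precondition: every index is valid
        a.push_back(values[k]);
    }
    // Step 2: modify the copy.
    for (double& v : a) v += b;
    // Step 3: scatter the results back into the selected positions.
    for (std::size_t n = 0; n < idx.size(); ++n) values[idx[n]] = a[n];
    return values;
}
```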

Rewriting R's cummin() function using Rcpp and allowing for NAs

I'm learning Rcpp. In this example, I'm attempting to roll my own cummin() function like base R's cummin(), except that I'd like my version to have an na.rm argument. This is my attempt:
cummin.cpp
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector cummin_cpp(NumericVector x, bool narm = false){
// Given a numeric vector x, returns a vector of the
// same length representing the cumulative minimum value
// if narm = true, NAs will be ignored (The result may
// contain NAs if the first values of x are NA.)
// if narm = false, the resulting vector will return the
// cumulative min until the 1st NA value is encountered
// at which point all subsequent entries will be NA
if(narm){
// Ignore NAs
for(int i = 1; i < x.size(); i++){
if(NumericVector::is_na(x[i]) | (x[i-1] < x[i])) x[i] = x[i-1];
}
} else{
// Don't ignore NAs
for(int i = 1; i < x.size(); i++){
if(NumericVector::is_na(x[i-1]) | NumericVector::is_na(x[i])){
x[i] = NA_REAL;
} else if(x[i-1] < x[i]){
x[i] = x[i-1];
}
}
}
return x;
}
foo.R
library(Rcpp)
sourceCpp("cummin.cpp")
x <- c(3L, 1L, 2L)
cummin(x) # 3 1 1
cummin_cpp(x) # 3 1 1
class(cummin(x)) # integer
class(cummin_cpp(x)) # numeric
I have a few questions:
R's standard variable name is na.rm, not narm as I've done. However, it seems I can't use a dot in the c++ variable name. Is there a way around this so I can be consistent with R's convention?
I don't know ahead of time if the user's input is going to be a numeric vector or an integer vector, so I've used Rcpp's NumericVector type. Unfortunately, if the input is integer, the output is cast to numeric unlike base R's cummin() behavior. How do people usually deal with this issue?
The line if(NumericVector::is_na(x[i]) | (x[i-1] < x[i])) x[i] = x[i-1]; seems silly, but I don't know a better way to do this. Suggestions here?
I would use this:
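#include <Rcpp.h>
using namespace Rcpp;
template<typename T, int RTYPE>
Vector<RTYPE> cummin_cpp2(Vector<RTYPE> x, bool narm){
// Work on a copy so that the caller's vector is not modified
Vector<RTYPE> res = clone(x);
int i = 1, n = res.size();
T na;
// Vector<RTYPE>::is_na() is used instead of ISNAN() so that NA_INTEGER is detected too
if(narm){
// Ignore NAs: an NA takes the previous value, and a previous NA is never treated as a minimum
for(; i < n; i++){
if(Vector<RTYPE>::is_na(res[i]) || (!Vector<RTYPE>::is_na(res[i-1]) && res[i-1] < res[i])) res[i] = res[i-1];
}
} else{
// Do not ignore NAs: stop at the first NA and propagate it
for(; i < n; i++){
if(Vector<RTYPE>::is_na(res[i-1])) {
na = res[i-1];
break;
} else if(res[i-1] < res[i]){
res[i] = res[i-1];
}
}
for(; i < n; i++){
res[i] = na;
}
}
return res;
}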
// [[Rcpp::export]]
SEXP cummin_cpp2(SEXP x, bool narm = false) {
switch (TYPEOF(x)) {
case INTSXP: return cummin_cpp2<int, INTSXP>(x, narm);
case REALSXP: return cummin_cpp2<double, REALSXP>(x, narm);
default: Rcpp::stop("SEXP Type Not Supported.");
}
}
Try this:
x <- c(NA, 7, 5, 4, NA, 2, 4)
x2 <- as.integer(x)
cummin_cpp(x, narm = TRUE)
x
cummin_cpp(x2)
x2
x <- c(NA, 7, 5, 4, NA, 2, 4)
x2 <- as.integer(x)
x3 <- replace(x, is.na(x), NaN)
cummin_cpp2(x, narm = TRUE)
x
cummin_cpp2(x2)
x2
cummin_cpp2(x3)
x3
Explanation:
Joran's advice is good, just wrap that in an R function
I use a dispatcher as Joseph Wood suggested
Beware that x is passed by reference and is modified in place if its type matches the type you declared (see these 2 slides)
You need to handle NA as well as NaN
You can use || instead of | to evaluate only the first condition if it is true.
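The points above can be illustrated without Rcpp. The following is a minimal standard C++ sketch, assuming NaN is the only missing-value marker (R's integer NA has no counterpart here). `cummin_sketch` is a hypothetical name. It takes its input by const reference and fills a separate result vector, so the caller's data is never modified, and it tracks the running minimum explicitly instead of comparing neighbouring elements.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: cumulative minimum of `x`, with NaN as the missing marker.
// na_rm = true : missing values are skipped (leading missing values stay missing).
// na_rm = false: from the first missing value onward, every entry is missing.
std::vector<double> cummin_sketch(const std::vector<double>& x, bool na_rm) {
    const double na = std::numeric_limits<double>::quiet_NaN();
    std::vector<double> res(x.size());
    double running = na;    // smallest non-missing value seen so far
    bool poisoned = false;  // set once a missing value is met and na_rm is false
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i])) {
            if (!na_rm) poisoned = true;
        } else if (std::isnan(running) || x[i] < running) {
            running = x[i];
        }
        res[i] = poisoned ? na : running;
    }
    return res;
}
```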

Lasso solution with Rcpp: A self study

I'm very new to Rcpp. I'm trying to write a coordinate descent algorithm for the lasso in Rcpp as a self-study exercise. The code returns an error:
Mat::init(): requested size is not compatible with column vector layout
I also ran into some problems while writing the code:
(1) no matching function for call to 'sign'
(2) no matching function for call to 'ifelse'
(3) pow(X.col(j),2) : no viable conversion
As workarounds, I wrote
(S1>0)-(S1<0) for (1) to compute the sign of S1,
an if(){}else{} statement for (2), and
X.col(j)%X.col(j) for (3).
Any suggestions, please?
Here is the code.
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
mat betamat(NumericMatrix Xr, NumericVector yr, NumericVector lambdar, double tol=0.0000001) {
int N = Xr.nrow(), p = Xr.ncol(), l = lambdar.size();
mat X(Xr.begin(), N, p, false);
colvec y(yr.begin(), yr.size(), false);
vec lambda(lambdar.begin(), lambdar.size(),false);
colvec ols = solve(X,y);
mat betas = zeros<mat>(p,l);
//
bool converged = false;
for (int i = 0; i < l; ++i) {
colvec b = zeros<vec>(p);
colvec r = y-X*b;
while(converged == false){
colvec beta_old = betas;
for(int j = 0; j < p; ++j){
r = r + X.col(j)*b(j);
double xr = dot(X.col(j),r);
double S1 = xr/N;
double xx = sum(X.col(j)%X.col(j))/N;
b(j) =((S1>0)-(S1<0))*(abs(S1)-lambda(i))/xx;
if(b(j)>0){
b(j)=b(j);
}else{
b(j)=0;
}
r = r - X.col(j)*b(j);
}
converged = (sum(abs(betas - beta_old)) < tol);
}
betas.col(i) = b;
}
return betas;
}
In R, I'm calling this with
library(Rcpp)
sourceCpp("filename.cpp")
set.seed(1)
X <- matrix(rnorm(100*3),100)
y <- rnorm(100)
coefficients <- betamat(X,y,seq(0,1,0.0005))
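Workarounds (1) and (2) in the question look like pieces of the soft-thresholding operator that lasso coordinate descent commonly uses, S(z, lambda) = sign(z) * max(|z| - lambda, 0). Assuming that is the intended update, a minimal scalar sketch in standard C++ could look as follows. `sign_of` and `soft_threshold` are hypothetical helper names, and this sketch does not address the Mat::init() error.

```cpp
#include <cassert>
#include <cmath>

// Sign of x as -1, 0, or +1 (the same arithmetic trick as in the question).
inline int sign_of(double x) {
    return (x > 0.0) - (x < 0.0);
}

// Hypothetical helper: soft-thresholding operator
// S(z, lambda) = sign(z) * max(|z| - lambda, 0).
inline double soft_threshold(double z, double lambda) {
    assert(lambda >= 0.0);  // the penalty is assumed to be non-negative
    const double shrunk = std::fabs(z) - lambda;
    return shrunk > 0.0 ? sign_of(z) * shrunk : 0.0;
}
```

Note that the threshold is applied to the magnitude |z| - lambda before the sign is restored, so a negative coefficient survives when its magnitude exceeds lambda. `std::fabs` is used to make the floating-point absolute value explicit.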

Vectorized Method instead of using for loop

I have a data frame 'tmp' on which I need to perform a calculation using the last row of another data frame, 'SpreadData'. I am using the following code:
for(i in 1:ncol(tmp)){for(j in 1:nrow(tmp)){PNLData[j,i] = 10*tmp[j,i]*SpreadData[nrow(SpreadData),i]}}
Is there a faster method, using mapply or something else, so that I don't need a for loop?
Thanks
You can use sweep():
PNLData <- sweep(10 * tmp, 2, SpreadData[nrow(SpreadData), ], "*")
PS1: you can replace SpreadData[nrow(SpreadData), ] by tail(SpreadData, 1).
PS2: I think this makes two copies of your data. If you have a large matrix, you'd better use Rcpp.
Edit: Rcpp solution: put this in a .cpp file and source it.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix rcppFun(const NumericMatrix& x,
const NumericVector& lastCol) {
int n = x.nrow();
int m = x.ncol();
NumericMatrix res(n, m);
int i, j;
for (j = 0; j < m; j++) {
for (i = 0; i < n; i++) {
res(i, j) = 10 * x(i, j) * lastCol[j];
}
}
return res;
}
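Then, in R, call PNLData <- rcppFun(as.matrix(tmp), unlist(SpreadData[nrow(SpreadData), ])). The as.matrix() and unlist() conversions are needed when tmp and SpreadData are data frames, because the function expects a numeric matrix and a numeric vector.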
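For readers who want to see the loop without the Rcpp types, here is a minimal standard C++ sketch of the same column-wise scaling. It assumes the matrix is stored column by column in a flat `std::vector`, which is how R lays out matrices. `scale_columns` and its parameters are hypothetical names, with `factor` playing the role of the constant 10.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiply column j of an n-by-m column-major matrix
// by factor * scale[j], returning a new matrix in the same layout.
std::vector<double> scale_columns(const std::vector<double>& x,
                                  std::size_t n, std::size_t m,
                                  const std::vector<double>& scale,
                                  double factor) {
    assert(x.size() == n * m);   // precondition: dimensions match the storage
    assert(scale.size() == m);   // precondition: one multiplier per column
    std::vector<double> res(x.size());
    for (std::size_t j = 0; j < m; ++j) {
        const double s = factor * scale[j];  // constant within a column
        for (std::size_t i = 0; i < n; ++i) {
            res[i + j * n] = x[i + j * n] * s;  // element (i, j) in column-major storage
        }
    }
    return res;
}
```

Putting the column loop on the outside keeps the inner loop walking over contiguous memory, which matches the loop order in the Rcpp version above.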
