Using Rcpp for faster extraction and summary of lm output

I'm trying to speed up extracting values from R's summary.lm objects inside an extremely large loop. The following is my attempt at an Rcpp version.
Toy-Data
synthetic_LMsummary<-lapply(1:100000L,function(UU){summary(lm(rnorm(500)~replicate(2,{rnorm(500)})))})
R Version
tidy.train <- function(s,SelectedRow=3) {
out<-data.frame(
estimate=s$coefficients[, "Estimate"][SelectedRow],
std.error=s$coefficients[, "Std. Error"][SelectedRow],
statistic=s$coefficients[, "t value"][SelectedRow],
p.value=s$coefficients[, "Pr(>|t|)"][SelectedRow],
rsquared=s$r.squared
)
row.names(out) <- NULL
out
}
synthetic_LMsummary %>% purrr::map_dfr(~tidy.train(.,SelectedRow=3))
Attempted Rcpp Version
Rcpp::sourceCpp(code='
#include <Rcpp.h>
using namespace Rcpp;
// The export attribute must sit directly above the function it exports.
// [[Rcpp::export]]
NumericMatrix RcppTidy(List Summary_List, int mat_cols, int select) {
int mat_rows = Summary_List.size();
NumericMatrix Output_Mat(mat_rows, mat_cols);
for (int i = 0; i < mat_rows; ++i) {
List SubSet = Summary_List[i];
NumericMatrix CoefDF = SubSet["coefficients"];
// select is a 1-based R row index; C++ rows are 0-based.
NumericVector Coef = CoefDF.row(select - 1);
// r.squared, to match the R version above.
double rsquared = SubSet["r.squared"];
// Fill row i: the coefficient columns first, then R squared.
for (int j = 0; j < Coef.size(); ++j) Output_Mat(i, j) = Coef[j];
Output_Mat(i, Coef.size()) = rsquared;
}
return Output_Mat;
}
')
Tidy_synLM<-RcppTidy(synthetic_LMsummary,5L,3L)

Related

Why do I get the error for using "pnorm" in Rcpp

I need to use arma:: variables in my Rcpp code, but I ran into a problem when trying to use the sugar function pnorm. Here is a demo:
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
double pget(NumericVector x, NumericVector beta) {
arma::colvec xx = Rcpp::as<arma::colvec>(x) ;
arma::colvec bb = Rcpp::as<arma::colvec>(beta) ;
double tt = as_scalar( arma::trans(xx) * bb);
double temp = Rcpp::pnorm(tt);
return temp;
}
Then I got an error: no matching function for call to 'pnorm5'
Does that mean I cannot use Rcpp::pnorm???
The Rcpp sugar functions are meant for vector type arguments like Rcpp::NumericVector. For scalar arguments you can use the functions in the R namespace:
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
double pget(NumericVector x, NumericVector beta) {
arma::colvec xx = Rcpp::as<arma::colvec>(x) ;
arma::colvec bb = Rcpp::as<arma::colvec>(beta) ;
double tt = as_scalar( arma::trans(xx) * bb);
double temp = R::pnorm(tt, 0.0, 1.0, 1, 0);
return temp;
}
/*** R
x <- rnorm(5)
beta <- rnorm(5)
pget(x, beta)
*/
BTW, here are two variants. The first variant takes arma vectors instead of Rcpp vectors as arguments. Since these are const references, no data is copied. In addition, arma::dot is used:
// [[Rcpp::export]]
double pget2(const arma::colvec& xx, const arma::colvec& bb) {
double tt = arma::dot(xx, bb);
return R::pnorm(tt, 0.0, 1.0, 1, 0);
}
The second variant calculates the scalar product without resorting to Armadillo:
// [[Rcpp::export]]
double pget3(NumericVector x, NumericVector beta) {
double tt = Rcpp::sum(x * beta);
return R::pnorm(tt, 0.0, 1.0, 1, 0);
}
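For reference, the five arguments in R::pnorm(tt, 0.0, 1.0, 1, 0) are the quantile, the mean, the standard deviation, the lower-tail flag and the log flag, so the call returns the standard normal CDF at tt. Below is a minimal plain-C++ sketch of the same quantity, assuming only the standard library. The names dot, std_normal_cdf and pget_plain are made up for illustration and are not part of Rcpp or Armadillo.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Scalar product of two equally long vectors (hypothetical helper).
double dot(const std::vector<double>& x, const std::vector<double>& beta) {
    assert(x.size() == beta.size());
    // The 0.0 initial value keeps the accumulator a double.
    return std::inner_product(x.begin(), x.end(), beta.begin(), 0.0);
}

// Standard normal CDF (lower tail, natural scale).
// Uses the identity Phi(z) = erfc(-z / sqrt(2)) / 2.
double std_normal_cdf(double z) {
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}

// Plain-C++ counterpart of pget(): Phi(x . beta).
double pget_plain(const std::vector<double>& x, const std::vector<double>& beta) {
    return std_normal_cdf(dot(x, beta));
}
```

Inside an Rcpp function, R::pnorm remains the more natural choice; the sketch only spells out what that scalar call computes.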
I'm much less of an expert at Rcpp than @RalfStubner, so I had to hack around (with help from StackOverflow and the Rcpp cheat sheet) to get the following code. Instead of using the R-namespace versions on scalars, I converted back to a NumericVector. This can almost certainly be done more efficiently, skipping a few steps, by someone who actually knows what they're doing; e.g. it's possible that the arma-to-NumericVector conversion could be done directly without going through as_scalar.
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
#include <Rcpp.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
NumericVector pget(NumericVector x, NumericVector beta) {
colvec xx = as<colvec>(x) ;
colvec bb = as<colvec>(beta) ;
double tt = as_scalar(trans(xx) * bb);
NumericVector tt2 = NumericVector::create( tt );
NumericVector temp = Rcpp::pnorm(tt2);
return temp;
}

Rcpp sugar commands in armadillo

I'm trying to use the ifelse() function of Rcpp sugar with arma::vec. The code fails with the error
'ifelse' was not declared in this scope
I could not find a solution. A simple example that reproduces the error is below.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec f(arma::vec x, arma::vec y) {
arma::vec res1 = Rcpp::ifelse(x < y, x, y);
arma::vec res = trans(res1)*y;
return res;
}
/*** R
f(c(1,2,3),c(3,2,1))
*/
Using Armadillo's advanced constructors you can have Rcpp::NumericVector and arma::vec that refer to the same memory location. Then you can use both Rcpp functions and arma functions by using the correct front-end object for that piece of memory:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec f(Rcpp::NumericVector xr, Rcpp::NumericVector yr) {
arma::vec x(xr.begin(), xr.size(), false, true);
arma::vec y(yr.begin(), yr.size(), false, true);
Rcpp::NumericVector res1r(xr.size());
arma::vec res1(res1r.begin(), res1r.size(), false, true);
res1r = Rcpp::ifelse(xr < yr, xr, yr);
arma::vec res = trans(res1)*y;
return res;
}
/*** R
f(c(1,2,3),c(3,2,1))
*/
I am not 100% sure that this does not have any unwanted side-effects.
This is the solution that I found; I hope it works for you.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec f(arma::vec x, arma::vec y) {
int n = x.size();
arma::vec res(n);
for(int i = 0; i < n; i++){
if (x[i] < y[i]){res[i] = x[i];} else{res[i] = y[i];}
}
return trans(res)*y;
}
The output is
/*** R
f(c(1,2,3),c(3,2,1))
*/
[,1]
[1,] 8
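The loop above boils down to an element-wise minimum followed by a scalar product. Here is a minimal sketch of that computation with standard algorithms only, assuming plain std::vector inputs instead of arma::vec. The name min_dot is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Sum over i of min(x[i], y[i]) * y[i] (hypothetical helper name).
double min_dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    std::vector<double> res(x.size());
    // Element-wise minimum, the role ifelse(x < y, x, y) plays in the question.
    std::transform(x.begin(), x.end(), y.begin(), res.begin(),
                   [](double a, double b) { return a < b ? a : b; });
    // Scalar product of the minima with y; 0.0 keeps the accumulator a double.
    return std::inner_product(res.begin(), res.end(), y.begin(), 0.0);
}
```

For the inputs used in the question, min_dot({1, 2, 3}, {3, 2, 1}) evaluates 1*3 + 2*2 + 1*1, which matches the 8 shown in the output.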

Convert individual Rcpp::IntegerVector element to a character

I have to convert individual elements of Rcpp::IntegerVector into their string form so I can add another string to them. My code looks like this:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
Rcpp::String int_to_char_single_fun(int x){
// Obtain environment containing function
Rcpp::Environment base("package:base");
// Make function callable from C++
Rcpp::Function int_to_string = base["as.character"];
// Call the function and receive its list output
Rcpp::String res = int_to_string(Rcpp::_["x"] = x); // example of original param
// Return test object in list structure
return (res);
}
//[[Rcpp::export]]
Rcpp::CharacterVector add_chars_to_int(Rcpp::IntegerVector x){
int n = x.size();
Rcpp::CharacterVector BASEL_SEG(n);
for(int i = 0; i < n; i++){
BASEL_SEG[i] = "B0" + int_to_char_single_fun(x[i]);
}
return BASEL_SEG;
}
/*** R
int_vec <- as.integer(c(1,2,3,4,5))
BASEL_SEG_char <- add_chars_to_int(int_vec)
*/
I get the following error:
no match for 'operator+'(operand types are 'const char[3]' and 'Rcpp::String')
I cannot import any C++ libraries like Boost and can only use Rcpp functionality for this. How do I prepend a string to an integer here in Rcpp?
We basically covered this over at the Rcpp Gallery when we covered Boost in an example for lexical_cast (though that one went the other way). So rewriting it quickly yields this:
Code
// We can now use the BH package
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/lexical_cast.hpp>
using namespace Rcpp;
using boost::lexical_cast;
using boost::bad_lexical_cast;
// [[Rcpp::export]]
std::vector<std::string> lexicalCast(std::vector<int> v) {
std::vector<std::string> res(v.size());
for (unsigned int i=0; i<v.size(); i++) {
try {
res[i] = lexical_cast<std::string>(v[i]);
} catch(bad_lexical_cast &) {
res[i] = "(failed)";
}
}
return res;
}
/*** R
lexicalCast(c(42L, 101L))
*/
Output
R> Rcpp::sourceCpp("/tmp/lexcast.cpp")
R> lexicalCast(c(42L, 101L))
[1] "42" "101"
R>
Alternatives
Because converting numbers to strings is as old as computing itself you could also use:
itoa()
snprintf()
streams
and probably a few more I keep forgetting.
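Two of the alternatives listed above, snprintf() and streams, can be sketched as follows. This is a minimal illustration using only the standard library; the helper names prefix_snprintf and prefix_stream are made up here, and the "B0" prefix is taken from the question.

```cpp
#include <cstdio>
#include <sstream>
#include <string>

// Variant 1: snprintf into a fixed buffer (hypothetical helper name).
std::string prefix_snprintf(int x) {
    char buf[32];  // ample room for "B0" plus any 32-bit int
    std::snprintf(buf, sizeof buf, "B0%d", x);
    return std::string(buf);
}

// Variant 2: a string stream (hypothetical helper name).
std::string prefix_stream(int x) {
    std::ostringstream out;
    out << "B0" << x;
    return out.str();
}
```

Inside an Rcpp function, the resulting std::string can be assigned to an element of an Rcpp::CharacterVector, so either helper could stand in for the call to as.character.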
As others have pointed out, there are several ways to do this. Here are two very straightforward approaches.
1. std::to_string
Rcpp::CharacterVector add_chars_to_int1(Rcpp::IntegerVector x){
int n = x.size();
Rcpp::CharacterVector BASEL_SEG(n);
for(int i = 0; i < n; i++){
BASEL_SEG[i] = "B0" + std::to_string(x[i]);
}
return BASEL_SEG;
}
2. Creating a new Rcpp::CharacterVector
Rcpp::CharacterVector add_chars_to_int2(Rcpp::IntegerVector x){
int n = x.size();
Rcpp::CharacterVector BASEL_SEG(n);
Rcpp::CharacterVector myIntToStr(x.begin(), x.end());
for(int i = 0; i < n; i++){
BASEL_SEG[i] = "B0" + myIntToStr[i];
}
return BASEL_SEG;
}
Calling them:
add_chars_to_int1(int_vec) ## using std::to_string
[1] "B01" "B02" "B03" "B04" "B05"
add_chars_to_int2(int_vec) ## converting to CharacterVector
[1] "B01" "B02" "B03" "B04" "B05"

Inconsistent results between Rcpp and R code

UPDATE
The previous example was complicated, so please allow me to use a simpler one, shown below.
Here is the Rcpp code:
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
#include <Rmath.h>
#include <Rcpp.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp ;
using namespace arma;
using namespace std;
// [[Rcpp::export]]
double chooseC(double n, double k) {
return Rf_choose(n, k);
}
// [[Rcpp::export]]
double function3(double n, double m, double beta) {
double prob;
NumericVector k(m);
NumericVector k_vec(m);
if(n<m){prob=0;}
else{
if(chooseC(n,m)==R_PosInf){
k=seq_len(m)-1;
k_vec= (n-k)/(m-k)*std::pow((1-beta),(n-m)/m)*beta;
prob=std::accumulate(k_vec.begin(),k_vec.end(), 1, std::multiplies<double>())*beta;
}
else{
prob = beta * chooseC(n,m) * std::pow(beta,m) * std::pow((1-beta),(n-m));
}
}
return(prob);
}
Here is the R code:
function4 <- function ( n , m , beta )
{
if ( n < m )
{
prob <- 0.0
}
else
{
if (is.infinite(choose(n,m))){
k<-0:(m-1)
prob <- beta *prod((n-k)/(m-k)*(1-beta)^((n-m)/m)*beta)
}
else{
prob <- beta * choose(n,m) * beta^m * (1-beta)^(n-m)
}
}
prob
}
Comparison:
input<-619
beta<-0.09187495
x<-seq(0, (input+1)/beta*3)
yy<-sapply(x,function(n)function3(n,input, beta=beta))
yy2<-sapply(x,function(n)function4(n,input, beta=beta))
sum(yy)   # 0
sum(yy2)  # 1
However, with other input:
input<-1
beta<-0.08214248
Both results are the same, sum(yy)=sum(yy2)=0.9865887.
I used double in the Rcpp code, so I don't know what else could cause the inconsistent precision between the Rcpp and R code.
Thanks a lot!
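As a side note on the overflow branch: the is.infinite(choose(n,m)) check can be avoided altogether by evaluating the whole expression in log space. This is a different technique from the one used in the question, sketched here under the assumptions that 0 < beta < 1 and that n and m are integer-valued. The names log_choose and prob_logspace are hypothetical.

```cpp
#include <cmath>

// log(choose(n, m)) via log-gamma; stays finite where choose(n, m) overflows.
double log_choose(double n, double m) {
    return std::lgamma(n + 1.0) - std::lgamma(m + 1.0) - std::lgamma(n - m + 1.0);
}

// beta * choose(n, m) * beta^m * (1 - beta)^(n - m), evaluated in log space.
// Assumes 0 < beta < 1 and integer-valued n, m (hypothetical helper name).
double prob_logspace(double n, double m, double beta) {
    if (n < m) return 0.0;
    double log_p = (m + 1.0) * std::log(beta)
                 + log_choose(n, m)
                 + (n - m) * std::log1p(-beta);
    return std::exp(log_p);
}
```

For small arguments this agrees with the direct formula, e.g. n = 5, m = 2, beta = 0.3 gives 0.3 * 10 * 0.09 * 0.343 = 0.09261. In R the same idea is available through lchoose().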
I think I fixed the Rcpp code: both the Rcpp and R code now produce the same results, even when the results are very small values. The solution is shown below:
#include <RcppArmadillo.h>
#include <RcppArmadilloExtensions/sample.h>
#include <Rmath.h>
#include <Rcpp.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp ;
using namespace arma;
using namespace std;
// [[Rcpp::export]]
double chooseC(double n, double k) {
return Rf_choose(n, k);
}
// [[Rcpp::export]]
double function3(double n, double m, double beta) {
double prob;
arma::vec k = arma::linspace<vec>(0, m-1, m);
arma::vec k_vec;
if(n<m){prob=0;}
else{
if(chooseC(n,m)==R_PosInf){
k_vec= (n-k)/(m-k)*pow((1-beta),(n-m)/m)*beta;
prob=arma::prod(k_vec)*beta;
}
else{
prob = beta * chooseC(n,m) * pow(beta,m) * pow((1-beta),(n-m));
}
}
return(prob);
}
However, I still do not understand why writing the code this way fixes the precision inconsistency. Rcpp and RcppArmadillo still look like black boxes to me.
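A likely explanation, not confirmed in the thread, is the initial value passed to std::accumulate in the first version. The call std::accumulate(k_vec.begin(), k_vec.end(), 1, std::multiplies<double>()) uses the integer literal 1, and std::accumulate takes its accumulator type from that argument. Every partial product is therefore converted back to int, which drops the fractional part. Switching to arma::prod sidesteps the issue, and writing 1.0 would presumably have done the same. A minimal sketch of the pitfall with made-up helper names:

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Pitfall: the literal 1 is an int, so std::accumulate keeps an int
// accumulator and every partial product is truncated toward zero.
int product_int_init(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 1, std::multiplies<double>());
}

// Fix: a double initial value keeps the accumulator in floating point.
double product_double_init(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 1.0, std::multiplies<double>());
}
```

With the factors 0.5, 0.5, 0.5 the first function returns 0 because the first partial product 0.5 is truncated to 0, while the second returns 0.125. This also fits the observation that the two implementations agree for input<-1: in that case choose(n,m) stays finite and the std::accumulate branch is never reached.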

Rcpp cannot convert ‘SEXP {aka SEXPREC*}’ to ‘double’ in initialization

I am trying to duplicate R's vectorised sum in Rcpp.
I first tried the following code, which works without trouble:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double call(NumericVector x){
return sum(x);
}
Type call(Time)
> call(Time)
[1] 1919853
Then an environment version, which also works well:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double call(){
Environment env = Environment::global_env();
NumericVector Time = env["Time"];
return sum(Time);
}
Type call()
> call()
[1] 1919853
Now I am trying something weirder, as follows:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double call(){
Environment base("package:base");
Function sumc = base["sum"];
Environment env = Environment::global_env();
NumericVector Time = env["Time"];
double res = sumc(Time);
return res;
}
This time I got an error message:
trycpp.cpp:10:25: error: cannot convert ‘SEXP {aka SEXPREC*}’ to ‘double’ in initialization
double res = sumc(Time);
Any idea what's going wrong?
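The call to the R function, sumc(Time), returns a generic SEXP, and the compiler will not convert that to double implicitly. Wrapping the call as Rcpp::as<double>(sumc(Time)) makes the conversion explicit, but it is simpler to skip the R function altogether. Do this: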
// [[Rcpp::export]]
double mycall(){
Environment base("package:base");
Function sumc = base["sum"];
Environment env = Environment::global_env();
NumericVector Time = env["Time"];
double res = sum(Time);
return res;
}
Here sum() is the Rcpp sugar function.

Resources