Input stream from AWS S3 into Rcpp

I have a .txt file named test_file.txt in my AWS S3 bucket, stored at
s3://path/to/csv/test_file.txt, which contains a sample sentence: this is for testing
I can read this file directly from the S3 bucket using a combination of readLines and pipe in R:
df <- readLines(pipe("aws s3 cp s3://path/to/csv/test_file.txt -"))
df
#[1] "this is for testing"
When I try to replicate the same in Rcpp, I write a file rcoo_input_string.cpp as follows
#include <Rcpp.h>
#include <string.h>
// [[Rcpp::export]]
Rcpp::String rcpp_input_test() {
Rcpp::Environment base = Rcpp::Environment("package:base");
Rcpp::Function readline = base["readline"];
Rcpp::Function as_string = base["as.character"];
std::string input_string = Rcpp::as<std::string> (as_string(readline("> ")));
Rcpp::Rcout << input_string << std::endl;
return input_string;
}
/*** R
library(magrittr)
rcpp_input_test()
*/
Note: I got the idea that std::cin won't work in R or Rcpp, so I took inspiration from this post: Getting user input from R console: Rcpp and std::cin
I write another file rcoo_input_string.R as follows
library(Rcpp)
sourceCpp("./rcoo_input_string.cpp")
and execute the rcoo_input_string.R file from the AWS CLI as
aws s3 cp s3://path/to/csv/test_file.txt - | Rscript rcoo_input_string.R
but I get the following output on the CLI
> library(magrittr)
> rcpp_input_test()
>
[1] ""
which means that the Rcpp function did not read the test_file.txt.
Please note that running
sourceCpp("./rcoo_input_string.cpp")
runs perfectly well and I can give the input and the Rcpp function returns the string value as well (like this)
> sourceCpp("./S3_to_R_data_import/rcoo_input_string.cpp")
> library(magrittr)
> rcpp_input_test()
> this is for testing
this is for testing
[1] "this is for testing"
Can someone guide me on how to create an input stream for this function in R?
I say input stream, and please correct me if I am wrong, because I wrote similar code in a standalone C++ file and it executed perfectly.
Note: In trying to solve this problem, I tried cppFunction too, but that did not solve the problem. Example of rcoo_input_string_2.R:
library(Rcpp)
#sourceCpp("/mnt/legoland/S3_to_R_data_import/rcoo_input_string.cpp")
cppFunction(' String rcpp_input_test() {
Environment base = Environment("package:base");
Function readline = base["readline"];
Function as_string = base["as.character"];
std::string input_string = as<std::string> (as_string(readline("> ")));
Rcout << input_string << std::endl;
return input_string;
}
')
print(rcpp_input_test())
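One likely explanation for the empty string: R's readline() is documented to return "" immediately in non-interactive use, which is what Rscript is, so the piped data is never consulted. On the R side, file("stdin") refers to the process's C-level standard input, so readLines(file("stdin")) is the usual way to read a pipe. The sketch below shows the stream-based equivalent in plain C++ (no Rcpp), as in the standalone C++ program mentioned above. The function names read_all_lines and read_all_lines_from_text are hypothetical, chosen only for this illustration.

```cpp
#include <iostream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read every line from an arbitrary input stream until end of file.
// Passing std::cin makes this consume whatever is piped into the program.
std::vector<std::string> read_all_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Convenience wrapper (hypothetical) that reads from a string instead of a
// pipe, which is handy for trying the function without any piped input.
std::vector<std::string> read_all_lines_from_text(const std::string& text) {
    std::istringstream in(text);
    return read_all_lines(in);
}
```

In a standalone program, calling read_all_lines(std::cin) from main would pick up the output of aws s3 cp s3://path/to/csv/test_file.txt - when it is piped in. Inside an Rcpp function it is probably safer to let R do the reading and pass the resulting character vector to C++.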

Related

How do I resolve compile error in RcppParallel function which points to an RcppParallel header file

I'm trying to speed up numeric computation in my R code using RcppParallel and am attempting to edit an example that uses the Cpp sqrt() function to take the square root of each element of a matrix. My edited code replaces matrices with vectors and multiplies the sqrt() by a constant. (In actual use I will have 3 constants and my own operator function.)
The example comes from
https://gallery.rcpp.org/articles/parallel-matrix-transform/
The compiler identifies the error as in the 'algorithm' file on a comment line:
Line 7 no matching function for call to object of type 'SquareRootPlus::sqrtWrapper'
This is my initial attempt to use RcppParallel and I've not used Cpp for several years.
Edit: running macOS Ventura on apple silicon,
Rcpp ver 1.0.10,
RcppParallel ver 5.1.6,
and R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
It would be called like this (if it compiled), where c is a numerical constant (a double) and res is a numerical vector:
res <- parallelMatrixSqrt(someNumericalVector, c)
My testing code is:
#include <Rcpp.h>
#include <RcppParallel.h>
using namespace RcppParallel;
using namespace Rcpp;
struct SquareRootPlus : public Worker
{
// source vector etc
const RVector<double> input;
const double constParam;
// destination vector
RVector<double> output;
// initialize with source and destination
// get the data type correctly unless auto promoted/cast
SquareRootPlus(const Rcpp::NumericVector input, const double constParam,
Rcpp::NumericVector output)
: input(input), constParam(constParam), output(output) {}
struct sqrt_wrapper { // describe worker function
public: double operator()(double a, double cp) {
return ::sqrt(a) * cp;
}
};
// take the square root of the range of elements requested
// (and multiply it by the constant)
void operator()(std::size_t begin, std::size_t end) {
std::transform(input.begin() + begin,
input.begin() + end,
output.begin() + begin,
sqrt_wrapper());
}
};
// public called routine
// [[Rcpp::export]]
Rcpp::NumericVector paralleVectorSqrt(Rcpp::NumericVector x, double c) {
// allocate the output matrix
Rcpp::NumericVector output(x.length());
// SquareRoot functor (pass input and output matrixes)
SquareRootPlus squareRoot(x, c, output);
// call parallelFor to do the work
parallelFor(0, x.length(), squareRoot);
// return the output matrix
return output;
}
That still works fine for me (Ubuntu 22.10, g++-12) -- modulo the same kind of warnings we often get from libraries like Boost, and here now from the included TBB library (and the repo should have a newer one so you can try that).
I just did (straight from the Rcpp Gallery source directory):
> library(Rcpp)
> sourceCpp("2014-06-29-parallel-matrix-transform.cpp")
In file included from /usr/local/lib/R/site-library/RcppParallel/include/tbb/tbb.h:41,
from /usr/local/lib/R/site-library/RcppParallel/include/RcppParallel/TBB.h:10,
from /usr/local/lib/R/site-library/RcppParallel/include/RcppParallel.h:21,
from 2014-06-29-parallel-matrix-transform.cpp:59:
/usr/local/lib/R/site-library/RcppParallel/include/tbb/concurrent_hash_map.h:343:23: warning: ‘template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator’ is deprecated [-Wdeprecated-declarations]
[... more like this omitted for brevity ...]
> # allocate a matrix
> m <- matrix(as.numeric(c(1:1000000)), nrow = 1000, ncol = 1000)
> # ensure that serial and parallel versions give the same result
> stopifnot(identical(matrixSqrt(m), parallelMatrixSqrt(m)))
> # compare performance of serial and parallel
> library(rbenchmark)
> res <- benchmark(matrixSqrt(m),
+ parallelMatrixSqrt(m),
+ order="relative")
> res[,1:4]
test replications elapsed relative
2 parallelMatrixSqrt(m) 100 0.496 1.000
1 matrixSqrt(m) 100 0.565 1.139
>
and as you can see it not only builds but also runs the example call from R.
You would have to give us more detail about how you call it and what OS and package versions you use. And I won't have time now to dig into your code and do a code review for you but given that (still relatively simple) reference example works maybe you can reduce your currently-not-working approach down to something simpler that works.
Edit: Your example appears to have switched from a unary function to one with two arguments in the signature. Sadly it ain't that easy. The fuller error message is (on my side with g++-12)
/usr/include/c++/12/bits/stl_algo.h:4263:31: error: no match for call to ‘(SquareRootPlus::sqrt_wrapper) (const double&)’
4263 | *__result = __unary_op(*__first);
| ~~~~~~~~~~^~~~~~~~~~
question.cpp:25:20: note: candidate: ‘double SquareRootPlus::sqrt_wrapper::operator()(double, double)’
25 | public: double operator()(double a, double cp) {
| ^~~~~~~~
question.cpp:25:20: note: candidate expects 2 arguments, 1 provided
So you need to rework / extend the example framework for this.
Edit 2: The gory details about std::transform() and its unary function are e.g. here at cppreference.com.
Edit 3: Building on the previous comment, when you step back a bit and look at what is happening here you may see that RcppParallel excels at parceling up a large data structure, submitting all the pieces in parallel, and finally reassembling the result. That still works. You simply cannot apply a function with a richer signature via std::transform(). No more, no less. You need to rework the guts of the worker, which applies your function to the chunk it sees. Check the other RcppParallel examples for inspiration.
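One way to keep std::transform() while still using the constant is to hand it a unary callable that carries the constant along, for example a lambda that captures it. The sketch below is plain C++ without RcppParallel: std::vector stands in for RVector<double>, and the struct name SqrtTimesConstant is hypothetical. Only the chunk-processing operator() is meant to mirror the worker from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the worker: holds input, constant and output.
struct SqrtTimesConstant {
    const std::vector<double>& input;
    const double constParam;
    std::vector<double>& output;

    SqrtTimesConstant(const std::vector<double>& in, double cp,
                      std::vector<double>& out)
        : input(in), constParam(cp), output(out) {}

    // Process the half-open chunk [begin, end), as a parallel worker would.
    void operator()(std::size_t begin, std::size_t end) {
        assert(begin <= end && end <= input.size() &&
               output.size() >= input.size());
        const double cp = constParam;  // copied into the lambda below
        const auto first = static_cast<std::ptrdiff_t>(begin);
        const auto last = static_cast<std::ptrdiff_t>(end);
        // The lambda takes one argument, so it satisfies the unary
        // operation that this overload of std::transform expects.
        std::transform(input.begin() + first, input.begin() + last,
                       output.begin() + first,
                       [cp](double a) { return std::sqrt(a) * cp; });
    }
};
```

With three constants, as mentioned in the question, the same idea should carry over by capturing all three in the lambda or by storing them as members of a functor with a one-argument operator().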

How to make a portable file with Rcpp::plugins(openmp) which can be Rcpp::sourceCpp'ed

As a simple, likely memory bound, example, suppose I am looking at this C++ file which I want to Rcpp::sourceCpp:
#include <Rcpp.h>
// [[Rcpp::plugins(openmp)]]
// [[Rcpp::export(rng = false)]]
double sum_par(Rcpp::NumericVector x, int n_threads){
double out(0.);
double *xp = &x[0];
int const nx = x.size();
#ifdef _OPENMP
#pragma omp parallel for num_threads(n_threads) reduction(+:out)
#endif
for(int i = 0; i < nx; ++i)
out += xp[i];
return out;
}
/*** R
set.seed(1)
x <- rnorm(100)
sum (x)
#R> [1] 10.88874
sum_par(x, 1L)
#R> [1] 10.88874
sum_par(x, 2L)
#R> [1] 10.88874
*/
This file will not compile when the compiler does not support OpenMP. The question is how to create a portable example which users can Rcpp::sourceCpp in a manner that will work for users with and without OpenMP support.
The Concrete Example
I recently published the psqn package on CRAN. There is an example in the package which shows how to use the shared headers in the package. However, this yields an error when I Rcpp::sourceCpp the file in this unit test on CRAN's checks with macOS.
I have tried to wrap the Rcpp::sourceCpp call in try and then, on an error, use a version of the C++ file which does not include the [[Rcpp::plugins(openmp)]] line, but for some reason try does not work with Rcpp::sourceCpp. I gather I can make a configuration file like in RcppArmadillo but I have never (successfully) done this before with an R package.

Passing R functions to C routines using rcpp

I have a C function from a down-stream library that I call in C like this
result = cfunction(input_function)
input_function is a callback that needs to have the following structure
double input_function(const double &x)
{
return(x*x);
}
Where x*x is a user-defined computation that is usually much more complicated. I'd like to wrap cfunction using Rcpp so that the R user could call it on arbitrary R functions.
NumericVector rfunction(Function F){
NumericVector result(1);
// MAGIC THAT I DON'T KNOW HOW TO DO
// SOMEHOW TURN F INTO COMPATIBLE input_function
result[0] = cfunction(input_function);
return(result);
}
The R user then might do rfunction(function(x) {x*x}) and get the right result.
I am aware that calling R functions within cfunction will kill the speed but I figure that I can figure out how to pass compiled functions later on. I'd just like to get this part working.
The closest thing I can find that does what I need is this https://sites.google.com/site/andrassali/computing/user-supplied-functions-in-rcppgsl which wraps a function that uses callback that has an oh-so-useful second parameter within which I could stuff the R function.
Advice would be gratefully received.
One possible solution would be saving the R-function into a global variable and defining a function that uses that global variable. Example implementation where I use an anonymous namespace to make the variable known only within the compilation unit:
#include <Rcpp.h>
extern "C" {
double cfunction(double (*input_function)(const double&)) {
return input_function(42);
}
}
namespace {
std::unique_ptr<Rcpp::Function> func;
}
double input_function(const double &x) {
Rcpp::NumericVector result = (*func)(x);
return result(0);
}
// [[Rcpp::export]]
double rfunction(Rcpp::Function F){
func = std::make_unique<Rcpp::Function>(F);
return cfunction(input_function);
}
/*** R
rfunction(sqrt)
rfunction(log)
*/
Output:
> Rcpp::sourceCpp('57137507/code.cpp')
> rfunction(sqrt)
[1] 6.480741
> rfunction(log)
[1] 3.73767
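The same trampoline pattern can be tried without R, which may help when debugging the C side in isolation. The following is a minimal sketch in plain C++, assuming a cfunction with the callback signature from the question. Here std::function replaces Rcpp::Function, and the names current_callback, trampoline and call_with are hypothetical.

```cpp
#include <functional>
#include <utility>

// Stand-in for the downstream routine: it only accepts a plain function
// pointer and offers no extra "user data" parameter.
double cfunction(double (*input_function)(const double&)) {
    return input_function(42);
}

namespace {
// Visible only in this translation unit; holds the callable currently in use.
std::function<double(double)> current_callback;

// Plain function with the required signature that forwards to the callable.
double trampoline(const double& x) {
    return current_callback(x);
}
}  // namespace

// Store the callable, run the routine, then drop the stored callable again.
double call_with(std::function<double(double)> f) {
    current_callback = std::move(f);
    const double result = cfunction(trampoline);
    current_callback = nullptr;
    return result;
}
```

Because the callable lives in a single global slot, this approach is neither thread-safe nor re-entrant; that limitation applies to the Rcpp version above as well.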

call r script from c++ gives cannot convert Rcpp::CharacterVector to const char*

Is there a way to call an R script from C++?
I have an R script, for example:
myScript.R:
runif(100)
I want to execute this script from C++ and pass the result.
I tried:
#include <Rcpp.h>
#include <iostream>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector loadFile(CharacterVector inFile){
NumericVector one = system(inFile);
return one;
}
inFile : "C:/Program Files/R/R-3.4.2/bin/x64/Rscript C:/Rscripts/myScript.R"
but it gives me :
cannot convert Rcpp::CharacterVector {aka Rcpp::Vector<16>} to const char* for argument 1 to int system(const char*)
The usual convention is to use Rcpp to call expensive C++ code from R, not the other way around.
If you would like to invoke an R script from within C++ and then work with the result, one approach would be to make use of the popen function.
To see how to convert a rcpp character to std::string see Converting element of 'const Rcpp::CharacterVector&' to 'std::string'
Example:
std::string r_execute = "C:/Program Files/R/R-3.4.2/bin/x64/Rscript C:/Rscripts/myScript.R";
// popen expects a const char*, so pass the underlying C string
FILE *fp = popen(r_execute.c_str(), "r");
You should be able to read the result of the operation from the file stream.
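Reading the result back could look like the sketch below. It assumes a POSIX system, where popen and pclose are declared in <stdio.h>; on Windows the corresponding calls are _popen and _pclose. The helper name run_and_capture is hypothetical, and the sketch returns the command's raw text output, so turning the printed numbers into a NumericVector would still require parsing.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Run a shell command and return everything it wrote to standard output.
std::string run_and_capture(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == nullptr) {
        throw std::runtime_error("popen failed for: " + command);
    }
    std::string output;
    char buffer[256];
    // fgets returns nullptr at end of stream, which ends the loop.
    while (std::fgets(buffer, sizeof buffer, pipe) != nullptr) {
        output += buffer;
    }
    pclose(pipe);
    return output;
}
```

A call such as run_and_capture("Rscript myScript.R") would then hold the printed output of the script, assuming Rscript is on the PATH. A path containing spaces, like the Program Files location above, most likely needs to be quoted inside the command string.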

How to print an R object to stderr in Rcpp?

I implemented a Python-style dictionary for R, but did not find a good way to raise an error when a given key does not have a value in the dictionary. Calling stop is easy enough, but I would like to tell the user which key has not been found by printing the R object. Right now I have:
Rcpp::Rcout << "Key not found: ";
Rcpp::print(key); // <-- how can I get this on stderr?
Rcpp::stop("Key error!");
This prints the message to stdout, but I'd rather have it on stderr. Probably I'm just missing a function that Rcpp provides?
Here's a MWE:
library(Rcpp)
sourceCpp(code='
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
void test(SEXP key) {
Rcpp::print(key);
Rcpp::Rcerr << "This does not work: " << key << std::endl;
}
/*** R
test("x")
test(c(1,2,3))
*/
')
This works just fine:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
std::string test(std::string key) {
Rcpp::Rcerr << "Key not found: "<< key << std::endl;
Rcpp::stop("Key error!");
return key;
}
/*** R
test("x")
*/
Output:
Key not found: x
Error in eval(expr, envir, enclos) : Key error!
Edit:
OK, so you pass a SEXP that can be a single value or vector. I would suggest to cast that to a character vector:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
void test(SEXP key) {
CharacterVector key1 = as<CharacterVector>(key);
Rcpp::Rcerr << "This does not work: " << key1 << std::endl;
}
/*** R
test(c("x", "y"))
test(1:3)
*/
Output:
> Rcpp::sourceCpp('E:/temp/ttt.cpp')
> test(c("x", "y"))
This does not work: "x" "y"
> test(1:3)
This does not work: "1" "2" "3"
At the moment, it seems that this hack is the only way to go. It's not very efficient, as we go back from C++ to R to get the value as a nice string.
library(Rcpp)
sourceCpp(code='
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
void test(SEXP key, Function generate_error) {
std::string s = as<std::string>(generate_error(key));
stop(s);
}
/*** R
generate_error <- function(key) {
paste("Key not found:", capture.output(print(key)))
}
try( test("x", generate_error) )
try( test(c(1,2,3), generate_error) )
*/
')
Rcpp calls Rf_PrintValue internally. I've glanced at R source and it seems like this function is in turn implemented using printfs.
So, the problem is how to redirect external printf calls to stderr. Depending on your platform you have multiple options like dup/freopen/CreatePipe etc. Arguably, redirecting stdout back and forth is a hack.
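For the dup route, a minimal POSIX-only sketch could look like the following. It assumes the output in question really reaches file descriptor 1 through C stdio, which need not hold in GUI front ends that install their own console. The class name StdoutToStderr is hypothetical.

```cpp
#include <cstdio>
#include <unistd.h>

// While an instance is alive, anything written to stdout ends up on stderr.
class StdoutToStderr {
public:
    StdoutToStderr() {
        std::fflush(stdout);                    // flush what belongs to the old target
        saved_ = dup(STDOUT_FILENO);            // remember the original stdout
        if (saved_ >= 0) {
            dup2(STDERR_FILENO, STDOUT_FILENO); // fd 1 now refers to stderr's target
        }
    }
    ~StdoutToStderr() {
        std::fflush(stdout);                    // push buffered text to stderr's target
        if (saved_ >= 0) {
            dup2(saved_, STDOUT_FILENO);        // restore the original stdout
            close(saved_);
        }
    }
    StdoutToStderr(const StdoutToStderr&) = delete;
    StdoutToStderr& operator=(const StdoutToStderr&) = delete;

private:
    int saved_ = -1;
};
```

Wrapping the printing call in a scope that holds such a guard would send its output to stderr and restore stdout afterwards. Since this swaps a process-wide descriptor, it affects every thread for the duration of the scope, which is part of why it remains a hack.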

Resources