integer64 and Rcpp compatibility - r

I will need 64-bit integers in my package in the near future. I'm studying the feasibility based on the bit64 package. Basically, I plan to have one or more columns in a data.table with an integer64 S3 class, and I plan to pass this table to C++ functions using Rcpp.
The following nanotime example from the Rcpp Gallery explains clearly how a vector of 64-bit integers is built upon a vector of doubles and how to create an integer64 object in C++ and return it to R.
I'm now wondering how to pass an integer64 from R to C++. I guess I can invert the principle.
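#include <Rcpp.h>
#include <cstdint>
#include <cstring>
#include <vector>
using namespace Rcpp;

void useInt64(NumericVector v)
{
    // element count as an integer type, not a double
    const std::size_t len = v.size();
    std::vector<int64_t> n(len);
    // transfers values 'keeping bits' but changing type
    // using reinterpret_cast would get us a warning
    if (len > 0)
        std::memcpy(n.data(), &(v[0]), len * sizeof(double));
    // use n in further computations
}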
Is that correct? Is there another way to do that? Can we use a wrapper as<std::vector<int64_t>>(v)? For this last question I guess the conversion is not based on a bit to bit copy.
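To make the "bit to bit copy" idea concrete, here is a minimal sketch in plain C++ without Rcpp. The helper names doubles_to_int64 and int64_to_doubles are hypothetical, and std::vector<double> stands in for the NumericVector payload; the sketch assumes double and int64_t are both 8 bytes wide.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(std::int64_t),
              "this sketch needs 64-bit doubles");

// Reinterpret the bit patterns of the doubles as 64-bit integers.
// No numeric conversion happens: the bytes are copied unchanged.
std::vector<std::int64_t> doubles_to_int64(const std::vector<double>& v) {
    std::vector<std::int64_t> n(v.size());
    if (!v.empty())
        std::memcpy(n.data(), v.data(), v.size() * sizeof(double));
    return n;
}

// Inverse direction: store 64-bit integers in the payload of doubles.
std::vector<double> int64_to_doubles(const std::vector<std::int64_t>& n) {
    std::vector<double> v(n.size());
    if (!n.empty())
        std::memcpy(v.data(), n.data(), n.size() * sizeof(std::int64_t));
    return v;
}
```

A value conversion such as static_cast<int64_t> on the carrier double would read its numeric value instead of its bits, so it would not recover the stored integer.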

Related

Rf_allocVector only allocates and does not zero out memory

The original motivation is that I have a dynamically sized array of floats that I want to pass to R through Rcpp without incurring either the cost of zeroing it out or the cost of a deep copy.
Originally I thought there might be some way to take a heap-allocated array, make R's gc system aware of it, and then wrap it with other data to create an "Rcpp::NumericVector", but it seems that this is not possible, or at least not doable with my current knowledge.
However, and correct me if I'm wrong, it looks like simply constructing a NumericVector with a size N and then using it as an N-sized allocation will call R.h's Rf_allocVector, which does not zero out the allocated array. I tested it with a small C program that gets dyn.loaded into R, and the contents look like garbage values. I also took a peek at the assembly and there doesn't seem to be any zeroing out.
Can anyone confirm this or offer an alternative solution?
Welcome to StackOverflow.
You marked this rcpp, but that is a function from the C API of R -- whereas the Rcpp API offers you its constructors, which do in fact set the memory to zero:
> Rcpp::cppFunction("NumericVector goodVec(int n) { return NumericVector(n); }")
> sum(goodVec(1e7))
[1] 0
>
This creates a dynamically allocated vector using R's memory functions. The vector is indistinguishable from R's own, and it has its memory set to zero,
as we use R_Calloc, which is documented in Writing R Extensions as setting the memory to zero. (We may also use memset() explicitly; you can check the sources.)
So in short, you just have yourself confused over what the C API of R, as well as Rcpp offer, and what is easiest to use when. Keep reading documentation, running and writing examples, and studying existing code. It's all out there!

How to create a RAWSXP vector from C char* ptr without reallocation

Is there a way of creating a RAWSXP vector that is backed by an existing C char* ptr.
Below I show my current working version which needs to reallocate and copy the bytes,
and a second imagined version that doesn't exist.
// My current slow solution that uses lots of memory
SEXP getData() {
// has size, and data
Response resp = expensive_call();
//COPY OVER BYTE BY BYTE
SEXP respVec = Rf_allocVector(RAWSXP, resp.size);
Rbyte* ptr = RAW(respVec);
memcpy(ptr, resp.data, resp.size);
// free the memory
free(resp.data);
return respVec;
}
// My imagined solution
SEXP getDataFast() {
// has size, and data
Response resp = expensive_call();
// reuse the ptr
SEXP respVec = Rf_allocVectorViaPtr(RAWSXP, resp.data, resp.size);
return respVec;
}
I also noticed Rf_allocVector3 which seems to give control over memory allocations of the vector, but I couldn't get this to work. This is my first time writing an R extension, so I imagine I must be doing something stupid. I'm trying to avoid the copy as the data will be around a GB (very large, sparse though, matrices).
Copying 1 GB takes less than a second. If your call is expensive, the copy might be a marginal cost; profile to see whether it's really a bottleneck.
The way you are trying to do things is probably not possible, because how would R know how to garbage collect the data?
But assuming you are using STL containers, one neat trick I've recently seen is to use the second template argument of STL containers -- the allocator.
template<
class T,
class Allocator = std::allocator<T>
> class vector;
The general outline of the strategy is like this:
Create a custom allocator using R memory that meets all the requirements (essentially you just need allocate and deallocate)
Every time you need to return data to R from an STL container, make sure you initialize it with your custom allocator
On returning the data, pull out the underlying R data created by your R-memory allocator -- no copy
This approach gives you all the flexibility of STL containers while using only memory R is aware of.
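As a rough illustration of the allocator interface only, here is a minimal sketch. SketchAllocator is a hypothetical name, and it uses malloc/free as a stand-in so the code runs without R; an R-backed version would have to obtain and release R-managed memory in allocate and deallocate, which is not shown here.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-in allocator: malloc/free replace the R memory calls.
template <class T>
struct SketchAllocator {
    using value_type = T;

    SketchAllocator() = default;
    template <class U>
    SketchAllocator(const SketchAllocator<U>&) noexcept {}

    // Called by the container whenever it needs storage for n elements.
    T* allocate(std::size_t n) {
        void* p = std::malloc(n * sizeof(T));
        if (!p && n != 0) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    // Called by the container when it gives the storage back.
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless allocators always compare equal.
template <class T, class U>
bool operator==(const SketchAllocator<T>&, const SketchAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const SketchAllocator<T>&, const SketchAllocator<U>&) { return false; }

// The allocator is passed as the second template argument of the container.
using SketchVector = std::vector<double, SketchAllocator<double>>;
```

A SketchVector behaves like an ordinary std::vector<double>; only the source of its memory changes.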

Armadillo C++ linear algebra library : How to create vector of boolean

Recently I started using the Armadillo C++ library. Given that my C++ coding skills are not that great, I found it very friendly for linear algebra. I am also using it along with MATLAB to speed things up for many reconstruction algorithms.
I need to create a vector of booleans and I would prefer using this library rather than . However, I could not figure out how to do it. I tried using uvec, but the documentation seems to indicate that it cannot be used with booleans.
Any help would be appreciated.
Regards,
Dushyant
Consider using a uchar_mat matrix, which is a typedef for Mat<unsigned char>; it should consume the same amount of memory as a matrix of boolean values.
The Armadillo documentation of version 7.8 states that a matrix Mat<type>, can be of the following types:
float, double, std::complex<float>, std::complex<double>, short, int, long, and unsigned versions of short, int, and long. The code on GitHub however contains typedef Mat <unsigned char> uchar_mat; in the file include/armadillo_bits/typedef_mat.hpp so you should also be able to use uchar_mat.
You will not save any memory by creating a matrix of bool values compared to a matrix of unsigned char values (a bool typically occupies one byte, i.e. 8 bits). This is because in C++ every data type must be addressable; it must be at least 1 byte long so that it is possible to create a pointer pointing to it.
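The size argument can be checked in plain C++ without Armadillo. The sketch below assumes flags are stored as 0 or 1 in unsigned char elements, as one might do in a uchar_mat; count_true is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// unsigned char is exactly one byte by definition; bool is at least one byte
// because every object must be addressable.
static_assert(sizeof(unsigned char) == 1, "unsigned char is one byte");
static_assert(sizeof(bool) >= sizeof(unsigned char), "bool cannot be smaller");

// Count the set flags in a mask stored as unsigned char (nonzero means true).
std::size_t count_true(const std::vector<unsigned char>& mask) {
    std::size_t count = 0;
    for (unsigned char flag : mask)
        if (flag != 0) ++count;
    return count;
}
```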

Performing operations on CUDA matrices while reading from a global Point

Hey there,
I have a multidimensional mathematical function, which means that I pass an index to the C++ function to select which single mathematical function I want to evaluate. E.g. let's say I have a mathematical function like this:
f = Vector(x^2*y^2 / y^2 / x^2*z^2)
I would implement it like that:
double myFunc(int function_index)
{
    switch(function_index)
    {
    case 1:  // x^2 * y^2
        return PNT[0]*PNT[0]*PNT[1]*PNT[1];
    case 2:  // y^2
        return PNT[1]*PNT[1];
    case 3:  // x^2 * z^2
        return PNT[0]*PNT[0]*PNT[2]*PNT[2];
    default: // unknown index
        return 0.0;
    }
}
whereas PNT is defined globally like this: double PNT[ NUM_COORDINATES ]. Now I want to implement the derivatives of each function for each coordinate, thus generating the derivative matrix (columns = coordinates; rows = single functions). I have already written my kernel, which works so far and which calls myFunc().
The Problem is: For calculating the derivative of the mathematical sub-function i concerning coordinate j, I would use in sequential mode (on CPUs e.g.) the following code (whereas this is simplified because usually you would decrease h until you reach a certain precision of your derivative):
f0 = myFunc(i);
PNT[ j ] += h;
derivative = (myFunc(i)-f0)/h;
PNT[ j ] -= h;
Now, as I want to do this on the GPU in parallel, a problem comes up: what to do with PNT? As I have to increase certain coordinates by h, calculate the value and then decrease it again, how can I do this without 'disturbing' the other threads? I can't modify PNT because other threads need the 'original' point to modify their own coordinate.
The second idea I had was to save one modified point for each thread but I discarded this idea quite fast because when using some thousand threads in parallel, this is a quite bad and probably slow (perhaps not realizable at all because of memory limits) idea.
'FINAL' SOLUTION
So how I do it currently is the following, which adds the value 'add' on runtime (without storing it somewhere) via preprocessor macro to the coordinate identified by coord_index.
#define X(n) ((coordinate_index == n) ? (PNT[n]+add) : PNT[n])
__device__ double myFunc(int function_index, int coordinate_index, double add)
{
//*// Example: f[i] = x[i]^3
return (X(function_index)*X(function_index)*X(function_index));
// */
}
That works quite nicely and fast. When using a derivative matrix with 10000 functions and 10000 coordinates, it takes only about 0.5 seconds. PNT is defined either globally or in constant memory, like __constant__ double PNT[ NUM_COORDINATES ];, depending on the preprocessor variable USE_CONST.
The line return (X(function_index)*X(function_index)*X(function_index)); is just an example where every sub-function follows the same scheme, mathematically speaking:
f = Vector(x0^3 / x1^3 / ... / xN^3)
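For reference, the same "perturb on read" idea can be sketched as host-side C++ rather than a CUDA kernel. The helper coord replaces the X(n) macro, the point values are made up for illustration, and derivative is a hypothetical wrapper that shows how the forward difference is formed without ever writing to PNT.

```cpp
constexpr int NUM_COORDINATES = 3;
// Illustrative point; the values are an assumption for this sketch.
const double PNT[NUM_COORDINATES] = {1.0, 2.0, 3.0};

// Read coordinate n, adding `add` only when n is the perturbed coordinate.
inline double coord(int n, int coordinate_index, double add) {
    return (coordinate_index == n) ? PNT[n] + add : PNT[n];
}

// Example sub-function f[i] = x[i]^3.
double myFunc(int function_index, int coordinate_index, double add) {
    const double x = coord(function_index, coordinate_index, add);
    return x * x * x;
}

// Forward difference d f_i / d x_j; PNT itself is never modified,
// so concurrent readers would all see the original point.
double derivative(int i, int j, double h) {
    const double f0 = myFunc(i, -1, 0.0);  // -1 matches no coordinate
    return (myFunc(i, j, h) - f0) / h;
}
```

With this point, derivative(1, 1, h) approximates 3 * 2^2 = 12 for small h, while derivative(1, 0, h) is zero because f[1] does not depend on x[0].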
NOW THE BIG PROBLEM ARISES:
myFunc is a mathematical function which the user should be able to implement as he likes. E.g. he could also implement the following mathematical function:
f = Vector(x0^2*x1^2*...*xN^2 / x0^2*x1^2*...*xN^2 / ... / x0^2*x1^2*...*xN^2)
so that every function looks the same. You as a programmer should only have to code once, independently of the implemented mathematical function. When the above function is implemented in C++, it looks like the following:
__device__ double myFunc(int function_index, int coordinate_index, double add)
{
double ret = 1.0;
for(int i = 0; i < NUM_COORDINATES; i++)
ret *= X(i)*X(i);
return ret;
}
And now the memory accesses are very 'weird' and bad for performance, because each thread needs to access each element of PNT twice. Surely, in such a case where each function looks the same, I could rewrite the complete algorithm which surrounds the calls to myFunc, but as I stated already: I don't want my code to depend on the user-implemented function myFunc...
Could anybody come up with an idea how to solve this problem??
Thanks!
Rewinding back to the beginning and starting with a clean sheet, it seems you want to be able to do two things:
compute an arbitrary scalar valued function over an input array
approximate the partial derivative of an arbitrary scalar valued function over the input array using first order accurate finite differencing
While the function is scalar valued and arbitrary, it seems that there are, in fact, two clear forms which this function can take:
A scalar valued function with scalar arguments
A scalar valued function with vector arguments
You appear to have started with the first type of function and have put together code to deal with computing both the function and the approximate derivative, and are now wrestling with the problem of how to deal with the second case using the same code.
If this is a reasonable summary of the problem, then please indicate so in a comment and I will continue to expand it with some code samples and concepts. If it isn't, I will delete it in a few days.
In comments, I have been trying to suggest that conflating the first type of function with the second is not a good approach. The requirements for correctness in parallel execution, and the best way of extracting parallelism and performance on the GPU are very different. You would be better served by treating both types of functions separately in two different code frameworks with different usage models. When a given mathematical expression needs to be implemented, the "user" should make a basic classification as to whether that expression is like the model of the first type of function, or the second. The act of classification is what drives algorithmic selection in your code. This type of "classification by algorithm" is almost universal in well designed libraries - you can find it in C++ template libraries like Boost and the STL, and you can find it in legacy Fortran codes like the BLAS.

Where can I learn how to write C code to speed up slow R functions? [closed]

What's the best resource for learning how to write C code for use with R? I know about the system and foreign language interfaces section of R extensions, but I find it pretty hard going. What are good resources (both online and offline) for writing C code for use with R?
To clarify, I don't want to learn how to write C code, I want to learn how to better integrate R and C. For example, how do I convert from a C integer vector to a R integer vector (or vice versa) or from a C scalar to an R vector?
Well there is the good old Use the source, Luke! --- R itself has plenty of (very efficient) C code one can study, and CRAN has hundreds of packages, some from authors you trust. That provides real, tested examples to study and adapt.
But as Josh suspected, I lean more towards C++ and hence Rcpp. It also has plenty of examples.
Edit: There were two books I found helpful:
The first one is Venables and Ripley's "S Programming" even though it is getting long in the tooth (and there have been rumours of a 2nd edition for years). At the time there was simply nothing else.
The second is Chambers' "Software for Data Analysis" which is much more recent and has a much nicer R-centric feel -- and two chapters on extending R. Both C and C++ get mentioned. Plus, John shreds me for what I did with digest so that alone is worth the price of admission.
That said, John is growing fond of Rcpp (and contributing) as he finds the match between R objects and C++ objects (via Rcpp) to be very natural -- and ReferenceClasses help there.
Edit 2: With Hadley's refocussed question, I very strongly urge you to consider C++. There is so much boilerplate nonsense you have to do with C---very tedious and very avoidable. Have a look at the Rcpp-introduction vignette. Another simple example is this blog post where I show that instead of worrying about 10% differences (in one of the Radford Neal examples) we can get eightyfold increases with C++ (on what is of course a contrived example).
Edit 3: There is complexity in that you may run into C++ errors that are, to put it mildly, hard to grok. But to just use Rcpp rather than to extend it, you should hardly ever need it. And while this cost is undeniable, it is far eclipsed by the benefit of simpler code, less boilerplate, no PROTECT/UNPROTECT, no memory management etc pp. Doug Bates just yesterday stated that he finds C++ and Rcpp to be much more like writing R than writing C++. YMMV and all that.
Hadley,
You can definitely write C++ code that is similar to C code.
I understand what you say about C++ being more complicated than C. That is true if you want to master everything: objects, templates, STL, template meta programming, etc ... Most people don't need these things and can just rely on others to do it. The implementation of Rcpp is very complicated, but just because you don't know how your fridge works, it does not mean you cannot open the door and grab fresh milk ...
From your many contributions to R, what strikes me is that you find R somewhat tedious (data manipulation, graphics, string manipulation, etc ...). Well, get prepared for many more surprises with the internal C API of R. This is very tedious.
From time to time, I read the R-exts or R-ints manuals. This helps. But most of the time, when I really want to find out about something, I go into the R source, and also in the source of packages written by e.g. Simon (there is usually lots to learn there).
Rcpp is designed to make these tedious aspects of the API go away.
You can judge for yourself what you find more complicated, obfuscated, etc ... based on a few examples. This function creates a character vector using the C API:
SEXP foobar(){
    SEXP ab;
    PROTECT(ab = allocVector(STRSXP, 2));
    SET_STRING_ELT( ab, 0, mkChar("foo") );
    SET_STRING_ELT( ab, 1, mkChar("bar") );
    UNPROTECT(1);
    return ab;
}
Using Rcpp, you can write the same function as:
SEXP foobar(){
return Rcpp::CharacterVector::create( "foo", "bar" ) ;
}
or:
SEXP foobar(){
Rcpp::CharacterVector res(2) ;
res[0] = "foo" ;
res[1] = "bar" ;
return res ;
}
As Dirk said, there are other examples on the several vignettes. We also usually point people towards our unit tests because each of them test a very specific part of the code and are somewhat self explanatory.
I'm obviously biased here, but I would recommend getting familiar with Rcpp instead of learning the C API of R, and then coming to the mailing list if something is unclear or does not seem doable with Rcpp.
Anyway, end of the sales pitch.
I guess it all depends what sort of code you want to write eventually.
Romain
@hadley: unfortunately, I don't have specific resources in mind to help you get started with C++. I picked it up from Scott Meyers's books (Effective C++, More Effective C++, etc ...) but these are not really what one could call introductory.
We almost exclusively use the .Call interface to call C++ code. The rule is easy enough :
The C++ function must return an R object. All R objects are SEXP.
The C++ function takes between 0 and 65 R objects as input (again SEXP)
it must (not really, but we can save this for later) be declared with C linkage, either with extern "C" or the RcppExport alias that Rcpp defines.
So a .Call function gets declared like this in some header file:
#include <Rcpp.h>
RcppExport SEXP foo( SEXP x1, SEXP x2 ) ;
and implemented like this in a .cpp file :
SEXP foo( SEXP x1, SEXP x2 ){
...
}
There is not much more to know about the R API to be using Rcpp.
Most people only want to deal with numeric vectors in Rcpp. You do this with the NumericVector class. There are several ways to create a numeric vector :
From an existing object that you pass down from R:
SEXP foo( SEXP x_) {
Rcpp::NumericVector x( x_ ) ;
...
}
With given values using the ::create static function:
Rcpp::NumericVector x = Rcpp::NumericVector::create( 1.0, 2.0, 3.0 ) ;
Rcpp::NumericVector x = Rcpp::NumericVector::create(
_["a"] = 1.0,
_["b"] = 2.0,
_["c"] = 3
) ;
Of a given size:
Rcpp::NumericVector x( 10 ) ; // filled with 0.0
Rcpp::NumericVector x( 10, 2.0 ) ; // filled with 2.0
Then once you have a vector, the most useful thing is to extract one element from it. This is done with the operator[], with 0-based indexing, so for example summing values of a numeric vector goes something like this:
SEXP sum( SEXP x_ ){
Rcpp::NumericVector x(x_) ;
double res = 0.0 ;
for( int i=0; i<x.size(); i++){
res += x[i] ;
}
return Rcpp::wrap( res ) ;
}
But with Rcpp sugar we can do this much more nicely now:
using namespace Rcpp ;
SEXP sum( SEXP x_ ){
NumericVector x(x_) ;
double res = sum( x ) ;
return wrap( res ) ;
}
As I said before, it all depends on what sort of code you want to write. Look into what people do in packages that rely on Rcpp, check the vignettes, the unit tests, come back to us on the mailing list. We are always happy to help.
@jbremnant: That's right. Rcpp classes implement something close to the RAII pattern. When an Rcpp object is created, the constructor takes appropriate measures to ensure the underlying R object (SEXP) is protected from the garbage collector. The destructor withdraws the protection. This is explained in the Rcpp-introduction vignette. The underlying implementation relies on the R API functions R_PreserveObject and R_ReleaseObject.
There is indeed a performance penalty due to C++ encapsulation. We try to keep this at a minimum with inlining, etc ... The penalty is small, and when you take into account the gain in terms of the time it takes to write and maintain code, it is not that relevant.
Calling R functions from the Rcpp class Function is slower than directly calling eval with the C API. This is because we take precautions and wrap the function call into a tryCatch block so that we capture R errors and promote them to C++ exceptions so that they can be dealt with using the standard try/catch in C++.
Most people want to use vectors (especially NumericVector), and the penalty is very small with this class. The examples/ConvolveBenchmarks directory contains several variants of the notorious convolution function from R-exts and the vignette has benchmark results. It turns out that Rcpp makes it faster than the benchmark code that uses the R API.
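The RAII idea can be illustrated without R at all. In the sketch below, preserve_stub and release_stub are hypothetical stand-ins for R_PreserveObject and R_ReleaseObject that only count calls, and ProtectedHandle is an invented class name; this is not how Rcpp itself is written, just the general pattern.

```cpp
// Hypothetical stand-ins for R_PreserveObject / R_ReleaseObject:
// they only track how many objects are currently "protected".
static int preserved = 0;
void preserve_stub(void*) { ++preserved; }
void release_stub(void*) { --preserved; }

// RAII wrapper: protection starts in the constructor and is withdrawn
// in the destructor, so no manual PROTECT/UNPROTECT bookkeeping is needed.
class ProtectedHandle {
public:
    explicit ProtectedHandle(void* obj) : obj_(obj) { preserve_stub(obj_); }
    ~ProtectedHandle() { release_stub(obj_); }

    // Copying is disabled to keep the protect/release calls balanced.
    ProtectedHandle(const ProtectedHandle&) = delete;
    ProtectedHandle& operator=(const ProtectedHandle&) = delete;

private:
    void* obj_;
};

// Returns the number of protected objects while a handle is alive.
int live_during_scope() {
    int dummy = 0;
    ProtectedHandle handle(&dummy);
    return preserved;
}
```

Once live_during_scope returns, the handle has been destroyed and the counter is back at its previous value.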

Resources