Embedding R in C++: access elements of a VECSXP - r

So I'm calling some R code from C++, and I'm calling directly into R.dll for that (I know about Rcpp and I have the book here on my desk, but I can't use it for this specific thing). I have an R script that sets a variable and I want to access the contents of that variable.
The variable is a list of strings; in C++, when I do TYPEOF(my_sexp), I get VECSXP; so far so good. Now I want to read the contents of that list into a std::vector. I have tried many permutations of the following idea:
SEXP vector_exp = VECTOR_ELT(my_sexp, 0);
int element_count = TRUELENGTH(vector_exp);
for (int i = 0 ; i < element_count ; i++) {
SEXP elem_sexp = VECTOR_ELT(my_sexp, i);
std::string element_string = R_CHAR(elem_sexp);
}
My problems:
- Using TRUELENGTH, I get an access violation. Using LENGTH, I get a wrong value.
- Accessing elements of the list using VECTOR_ELT() causes an access violation.
- I've tried manually inspecting the memory layout of my_sexp according to the struct definitions in the R headers, but I can't seem to get the casting right to get a meaningful value from it.
So, is there anyone who can tell me roughly how to access the elements of a list; or point me to an example; or point me to the location in Rcpp where a conversion like this is done? I tried finding the last point myself, but didn't get far - it seems like Rcpp's wrap() handles it 'magically' (as in, generically) somehow.
Thanks in advance.

I haven't spent very much time working with R's C interface, as I usually stick to Rcpp, but the following seems to work:
#include <Rcpp.h>
// [[Rcpp::export]]
void print_list_as_vector(SEXP lst) {
PROTECT(lst);
R_xlen_t n_list = XLENGTH(lst);
R_xlen_t n_elem = 0;
for (R_xlen_t i = 0; i < n_list; i++) {
n_elem += XLENGTH(VECTOR_ELT(lst, i));
}
std::vector<std::string> vs;
vs.reserve(n_elem);
for (R_xlen_t i = 0; i < n_list; i++) {
R_xlen_t nj = XLENGTH(VECTOR_ELT(lst, i));
for (R_xlen_t j = 0; j < nj; j++) {
vs.push_back(CHAR(STRING_ELT(VECTOR_ELT(lst, i), j)));
}
}
for (std::size_t i = 0; i < vs.size(); i++) {
Rcpp::Rcout << vs[i] << std::endl;
}
UNPROTECT(1);
}
/*** R
clist <- list(c("a", "b", "c"), c("l", "m", "n", "o", "p"), c("xyz1", "xyz2"))
print_list_as_vector(clist)
#a
#b
#c
#l
#m
#n
#o
#p
#xyz1
#xyz2
unlist(clist)
# [1] "a" "b" "c" "l" "m" "n" "o" "p" "xyz1" "xyz2"
*/
This was tested within R / using Rcpp attributes as you can see, but in the actual meat of code I tried to stick to the C interface to replicate your situation.
But to address your questions (as best I can) -
I have always used XLENGTH to get the length of SEXPs, and although I can't really speak to the differences between XLENGTH, LENGTH and TRUELENGTH, the first method always seems to produce the expected result.
I'm using VECTOR_ELT(lst, i) to access the ith element in the VECSXP lst.
Given the context of the data / function, I know that VECTOR_ELT(lst, i) is returning a STRSXP - i.e. a character vector. The jth element of this STRSXP is accessed with STRING_ELT(..., j), and since this returns a CHARSXP, we wrap it in CHAR to get a const char*, which is added to the std::vector<std::string>.
Unfortunately there doesn't seem to be that much documentation for R's C internals, but Hadley has a useful reference page here, and if all else fails, you can dig through the source itself.

Related

Why does thrust::device_vector not seem to have a chance to hold raw pointers to other device_vectors?

I have a question that I found many threads in, but none did explicitly answer my question.
I am trying to have a multidimensional array inside the kernel of the GPU using thrust. Flattening would be difficult, as all the dimensions are non-homogeneous and I go up to 4D. Now I know I cannot have device_vectors of device_vectors, for whichever underlying reason (explanation would be welcome), so I tried going the way over raw-pointers.
My reasoning is: a raw pointer points to memory on the GPU, otherwise why would I be able to access it from within the kernel? So I should technically be able to have a device_vector which holds raw pointers, all of which should be accessible from within the GPU. This way I constructed the following code:
thrust::device_vector<Vector3r*> d_fluidmodelParticlePositions(nModels);
thrust::device_vector<unsigned int***> d_allFluidNeighborParticles(nModels);
thrust::device_vector<unsigned int**> d_nFluidNeighborsCrossFluids(nModels);
for(unsigned int fluidModelIndex = 0; fluidModelIndex < nModels; fluidModelIndex++)
{
FluidModel *model = sim->getFluidModelFromPointSet(fluidModelIndex);
const unsigned int numParticles = model->numActiveParticles();
thrust::device_vector<Vector3r> d_neighborPositions(model->getPositions().begin(), model->getPositions().end());
d_fluidmodelParticlePositions[fluidModelIndex] = CudaHelper::GetPointer(d_neighborPositions);
thrust::device_vector<unsigned int**> d_fluidNeighborIndexes(nModels);
thrust::device_vector<unsigned int*> d_nNeighborsFluid(nModels);
for(unsigned int pid = 0; pid < nModels; pid++)
{
FluidModel *fm_neighbor = sim->getFluidModelFromPointSet(pid);
thrust::device_vector<unsigned int> d_nNeighbors(numParticles);
thrust::device_vector<unsigned int*> d_neighborIndexesArray(numParticles);
for(unsigned int i = 0; i < numParticles; i++)
{
const unsigned int nNeighbors = sim->numberOfNeighbors(fluidModelIndex, pid, i);
d_nNeighbors[i] = nNeighbors;
thrust::device_vector<unsigned int> d_neighborIndexes(nNeighbors);
for(unsigned int j = 0; j < nNeighbors; j++)
{
d_neighborIndexes[j] = sim->getNeighbor(fluidModelIndex, pid, i, j);
}
d_neighborIndexesArray[i] = CudaHelper::GetPointer(d_neighborIndexes);
}
d_fluidNeighborIndexes[pid] = CudaHelper::GetPointer(d_neighborIndexesArray);
d_nNeighborsFluid[pid] = CudaHelper::GetPointer(d_nNeighbors);
}
d_allFluidNeighborParticles[fluidModelIndex] = CudaHelper::GetPointer(d_fluidNeighborIndexes);
d_nFluidNeighborsCrossFluids[fluidModelIndex] = CudaHelper::GetPointer(d_nNeighborsFluid);
}
Now the compiler doesn't complain, and accessing, for example, d_nFluidNeighborsCrossFluids from within the kernel works, but returns wrong values. I access it like this (again, from within a kernel):
d_nFluidNeighborsCrossFluids[iterator1][iterator2][iterator3];
// Note: out of bounds indexing guaranteed to not happen, indexing is definitely right
The question is, why does it return wrong values? The logic behind it should work in my opinion, since my indexing is correct and the pointers should be valid addresses from within the kernel.
Thank you already for your time and have a great day.
EDIT:
Here is a minimal reproducible example. For some reason the values appear right despite having the same structure as my code, but cuda-memcheck reveals some errors. Uncommenting the two commented lines leads me to the main problem I am trying to solve. What is cuda-memcheck telling me here?
/* Part of this example has been taken from code of Robert Crovella
in a comment below */
#include <thrust/device_vector.h>
#include <stdio.h>
template<typename T>
static T* GetPointer(thrust::device_vector<T> &vector)
{
return thrust::raw_pointer_cast(vector.data());
}
__global__
void k(unsigned int ***nFluidNeighborsCrossFluids, unsigned int ****allFluidNeighborParticles){
const unsigned int i = blockIdx.x*blockDim.x + threadIdx.x;
if(i > 49)
return;
printf("i: %d nNeighbors: %d\n", i, nFluidNeighborsCrossFluids[0][0][i]);
//for(int j = 0; j < nFluidNeighborsCrossFluids[0][0][i]; j++)
// printf("i: %d j: %d neighbors: %d\n", i, j, allFluidNeighborParticles[0][0][i][j]);
}
int main(){
const unsigned int nModels = 2;
const int numParticles = 50;
thrust::device_vector<unsigned int**> d_nFluidNeighborsCrossFluids(nModels);
thrust::device_vector<unsigned int***> d_allFluidNeighborParticles(nModels);
for(unsigned int fluidModelIndex = 0; fluidModelIndex < nModels; fluidModelIndex++)
{
thrust::device_vector<unsigned int*> d_nNeighborsFluid(nModels);
thrust::device_vector<unsigned int**> d_fluidNeighborIndexes(nModels);
for(unsigned int pid = 0; pid < nModels; pid++)
{
thrust::device_vector<unsigned int> d_nNeighbors(numParticles);
thrust::device_vector<unsigned int*> d_neighborIndexesArray(numParticles);
for(unsigned int i = 0; i < numParticles; i++)
{
const unsigned int nNeighbors = i;
d_nNeighbors[i] = nNeighbors;
thrust::device_vector<unsigned int> d_neighborIndexes(nNeighbors);
for(unsigned int j = 0; j < nNeighbors; j++)
{
d_neighborIndexes[j] = i + j;
}
d_neighborIndexesArray[i] = GetPointer(d_neighborIndexes);
}
d_nNeighborsFluid[pid] = GetPointer(d_nNeighbors);
d_fluidNeighborIndexes[pid] = GetPointer(d_neighborIndexesArray);
}
d_nFluidNeighborsCrossFluids[fluidModelIndex] = GetPointer(d_nNeighborsFluid);
d_allFluidNeighborParticles[fluidModelIndex] = GetPointer(d_fluidNeighborIndexes);
}
k<<<256, 256>>>(GetPointer(d_nFluidNeighborsCrossFluids), GetPointer(d_allFluidNeighborParticles));
cudaError_t err = cudaGetLastError(); // read once: the call also resets the error state
if (err != cudaSuccess)
printf("Sync kernel error: %s\n", cudaGetErrorString(err));
cudaDeviceSynchronize();
}
A device_vector is a class definition. That class has various methods and operators associated with it. The thing that allows you to do this:
d_nFluidNeighborsCrossFluids[...]...;
is a square-bracket operator. That operator is a host operator (only). It is not usable in device code. Issues like this give rise to the general statements that "thrust::device_vector is not usable in device code." The device_vector object itself is generally not usable. However the data it contains is usable in device code, if you attempt to access it via a raw pointer.
Here is an example of a thrust device vector that contains an array of pointers to the data contained in other device vectors. That data is usable in device code, as long as you don't attempt to make use of the thrust::device_vector object itself:
$ cat t1509.cu
#include <thrust/device_vector.h>
#include <stdio.h>
template <typename T>
__global__ void k(T **data){
printf("the first element of vector 1 is: %d\n", (int)(data[0][0]));
printf("the first element of vector 2 is: %d\n", (int)(data[1][0]));
printf("the first element of vector 3 is: %d\n", (int)(data[2][0]));
}
int main(){
thrust::device_vector<int> vector_1(1,1);
thrust::device_vector<int> vector_2(1,2);
thrust::device_vector<int> vector_3(1,3);
thrust::device_vector<int *> pointer_vector(3);
pointer_vector[0] = thrust::raw_pointer_cast(vector_1.data());
pointer_vector[1] = thrust::raw_pointer_cast(vector_2.data());
pointer_vector[2] = thrust::raw_pointer_cast(vector_3.data());
k<<<1,1>>>(thrust::raw_pointer_cast(pointer_vector.data()));
cudaDeviceSynchronize();
}
$ nvcc -o t1509 t1509.cu
$ cuda-memcheck ./t1509
========= CUDA-MEMCHECK
the first element of vector 1 is: 1
the first element of vector 2 is: 2
the first element of vector 3 is: 3
========= ERROR SUMMARY: 0 errors
$
EDIT: In the MCVE you have now posted, you point out that an ordinary run of the code appears to give correct results, but when you use cuda-memcheck, errors are reported. You have a general design problem that will cause this.
In C++, when an object is defined within a curly-braces region:
{
{
Object A;
// object A is in-scope here
}
// object A is out-of-scope here
}
// object A is out of scope here
k<<<...>>>(anything that points to something in object A); // is illegal
and you exit that region, the object defined within the region is now out of scope. For objects with constructors/destructors, this usually means the destructor of the object will be called when it goes out-of-scope. For a thrust::device_vector (or std::vector) this will deallocate any underlying storage associated with that vector. That does not necessarily "erase" any data, but attempts to use that data are illegal and would be considered UB (undefined behavior) in C++.
When you establish pointers to such data inside an in-scope region, and then go out-of-scope, those pointers no longer point to anything that would be legal to access, so attempts to dereference the pointer would be illegal/UB. Your code is doing this. Yes, it does appear to give the correct answer, because nothing is actually erased on deallocation, but the code design is illegal, and cuda-memcheck will highlight that.
I suppose one fix would be to pull all this stuff out of the inner curly-braces, and put it at main scope, just like the d_nFluidNeighborsCrossFluids device_vector is. But you might also want to rethink your general data organization strategy and flatten your data.
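The lifetime rule is the same for std::vector, as noted above, so the fix can be sketched on the host without CUDA. This is a minimal sketch under that assumption: all inner storage lives in one owning container declared at the same scope as the pointer table, so the raw pointers stay valid for as long as that container does. `NeighborStorage`, `build_storage`, and `build_pointer_table` are hypothetical names, not part of thrust or of the question's code.

```cpp
#include <cassert>
#include <vector>

// Owns every inner neighbor list. Declare it at the same scope as the
// pointer table (e.g. in main) so it outlives every use of the pointers.
using NeighborStorage = std::vector<std::vector<unsigned int>>;

// Fill the storage: list i holds i entries (i, i+1, ...), mirroring the MCVE.
NeighborStorage build_storage(unsigned int numParticles) {
    NeighborStorage storage(numParticles);
    for (unsigned int i = 0; i < numParticles; ++i)
        for (unsigned int j = 0; j < i; ++j)
            storage[i].push_back(i + j);
    return storage;
}

// Collect raw pointers into the storage. They remain valid only while
// `storage` is alive and none of its inner vectors is resized.
std::vector<unsigned int*> build_pointer_table(NeighborStorage& storage) {
    std::vector<unsigned int*> table;
    table.reserve(storage.size());
    for (auto& inner : storage)
        table.push_back(inner.data());
    return table;
}
```

An untested adaptation to thrust would keep a `std::vector<thrust::device_vector<unsigned int>>` next to `d_nFluidNeighborsCrossFluids` in main and store `thrust::raw_pointer_cast(v.data())` for each element in the pointer table, instead of creating the inner device_vectors inside the loops.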
You should really provide a minimal, complete, verifiable/reproducible example; yours is neither minimal, nor complete, nor verifiable.
I will, however, answer your side-question:
I know I cannot have device_vectors of device_vectors, for whichever underlying reason (explanation would be welcome)
While a device_vector refers to a bunch of data on the GPU, it's a host-side data structure; otherwise you would not have been able to use it in host-side code. On the host side, what it holds should be something like: the capacity, the size in elements, the device-side pointer to the actual data, and maybe more information. This is similar to how a std::vector variable may refer to data that's on the heap, while, if you create the variable locally, the fields I mentioned above will exist on the stack.
Now, those fields of the device vector that are located in host memory are not generally accessible from the device-side. In device-side code you would typically use the raw pointer to the device-side data the device_vector manages.
Also, note that if you have a thrust::device_vector<T> v, each use of operator[] means a bunch of separate CUDA calls to copy data to or from the device (unless there's some caching going on under the hood). So you really want to avoid using square brackets with this structure.
Finally, remember that pointer-chasing can be a performance killer, especially on a GPU. You might want to consider massaging your data structure somewhat in order to make it amenable to flattening.
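One common way to flatten jagged neighbor lists like these is a CSR-style layout (compressed sparse row; a standard technique, not something prescribed above): one array holding all neighbor indices back to back, plus an offsets array in which list i occupies [offsets[i], offsets[i+1]). Both arrays could then live in two device_vectors, and a kernel would need only two raw pointers and no pointer-chasing. A host-side sketch; `FlatNeighbors`, `flatten`, and `count_of` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CSR-style flattening of jagged lists: list i is
// indices[offsets[i]] .. indices[offsets[i + 1] - 1].
struct FlatNeighbors {
    std::vector<unsigned int> offsets;  // number of lists + 1 entries
    std::vector<unsigned int> indices;  // all entries back to back
};

FlatNeighbors flatten(const std::vector<std::vector<unsigned int>>& lists) {
    FlatNeighbors flat;
    flat.offsets.reserve(lists.size() + 1);
    flat.offsets.push_back(0);
    for (const auto& list : lists) {
        flat.indices.insert(flat.indices.end(), list.begin(), list.end());
        flat.offsets.push_back(static_cast<unsigned int>(flat.indices.size()));
    }
    return flat;
}

// Number of entries in list i, computed from the offsets as a kernel would.
unsigned int count_of(const FlatNeighbors& flat, std::size_t i) {
    return flat.offsets[i + 1] - flat.offsets[i];
}
```

Neighbor j of list i is then `indices[offsets[i] + j]`. The outer dimensions (fluid model, neighbor model, particle) can be folded into the list number, e.g. `(model * nModels + pid) * numParticles + i`.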

Rcpp Error Null value passed as symbol address

I am new to Rcpp.
I created an Rcpp function which takes a data frame with 2 columns and a vector as input, and returns a vector.
My data are as below
set.seed(10)
min= sort(rnorm(1000,800,sd=0.1))
max= min+0.02
k=data.frame(min,max)
explist= sort(rnorm(100,800,sd=0.2))
Then I call the function from cfilter.cpp:
k$output <- cfilter(k,explist)
where cfilter.cpp is:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector cfilter(DataFrame k, NumericVector explist) {
NumericVector col1 = k["min"];
NumericVector col2 = k["max"];
NumericVector exp = explist ;
int n = col1.size();
int j = 0;
CharacterVector out(n);
for (int i=0; i<n ; i++){
out[i]=NA_STRING;
while(exp[j]<= col2[i]){
if( exp[j]>= col1[i] && exp[j]<= col2[i] ){
out[i]="Y";
break;
}
else if(exp[j]>col2[i]){
break;
}
else {
j++ ;
}
}
}
return out;
}
It ran perfectly fine the first 16171 times I called it. Then suddenly, on call 16172, it stopped with this error:
> myfile$output<- cfilter(k,explist2)
Error in .Call(<pointer: (nil)>, k, explist) :
NULL value passed as symbol address
I checked k and explist for NA values but there aren't any, there is no problem whatsoever with the input.
I have no clue how to fix this and what causes this error.
Thanks in advance for any response
I came across the same problem. I'm not an expert in Rcpp, C++, or backend coding.
I have circumvented this problem by re-sourcing my cpp file every time I want to make a call of the function. So, for example if following is your for loop:
for(i in 1:SampleSize){
out[[i]]<-cfilter(k,explist)
}
Do something like:
for(i in 1:SampleSize){
sourceCpp("cfilter.cpp")
out[[i]]<-cfilter(k,explist)
}
Again, I don't know exactly why this worked for me, but it worked. Based on my shallow knowledge of C++, it might be related to memory allocation and that every time you source, memory is released and hence there is no mis-allocation. But I think this is a very wild guess.
Best
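Separately from the re-sourcing workaround: the `while` loop in `cfilter` never checks `j` against the length of `explist`. If every remaining value of `explist` is below `col1[i]`, `j` keeps increasing and `exp[j]` reads past the end of the vector, which is undefined behavior. Whether that is what triggers the error here is not established, but it is worth ruling out. Below is a plain C++ sketch of the same sweep with the bound added; `std::vector` stands in for the Rcpp types and an empty string stands in for `NA_STRING`, both assumptions made so the logic can be checked without R. `cfilter_sketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Same sweep as cfilter, over sorted inputs: interval i is [mins[i], maxs[i]];
// mark it "Y" if some value of explist falls inside it.
// The extra `j < explist.size()` test keeps the index in bounds.
std::vector<std::string> cfilter_sketch(const std::vector<double>& mins,
                                        const std::vector<double>& maxs,
                                        const std::vector<double>& explist) {
    std::vector<std::string> out(mins.size());  // "" plays the role of NA
    std::size_t j = 0;
    for (std::size_t i = 0; i < mins.size(); ++i) {
        while (j < explist.size() && explist[j] <= maxs[i]) {
            if (explist[j] >= mins[i]) {
                out[i] = "Y";
                break;
            }
            ++j;  // explist[j] lies below this interval; try the next value
        }
    }
    return out;
}
```

In the Rcpp version, the corresponding change would be `while (j < exp.size() && exp[j] <= col2[i])`.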

Passing values between variables of host and kernel Code in a loop in OpenCL

I am having trouble passing values between host code and kernel code because of some vector data types. The following code/explanation is just to illustrate my problem; my actual code is much bigger and more complicated. With this small example, hopefully, I will be able to explain where I am having a problem. If anything more is needed, please let me know.
std::vector<vector<double>> output;
for (int i = 0;i<2; i++)
{
auto& out = output[i];
sum =0;
for (int l =0;l<3;l++)
{
for (int j=0;j<4; j++)
{
if (some condition is true)
{ out[j+l] = 0.;}
sum+= .....some addition...
}
out[j+l] = sum
}
}
Now I want to parallelize this code, from the second loop. This is what I have done in host code:
cl::buffer out = (context,CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, output.size(), &output, NULL)
Then, I have set the arguments
cl::SetKernelArg(0, out);
Then the loop,
for (int i = 0,i<2, i++)
{
auto& out = output[i];
// sending some more arguments(which are changing accrding to loop) for sum operations
queue.enqueueNDRangeKernel(.......)
queue.enqueuereadbuffer(.....,&out,...)
}
In Kernel Code:
__kernel void sumout(__global double* out, ....)
{
int l = get_global_id(0);
int j = get_global_id(1);
if (some condition is true)
{ out[j+l] = 0.; // Here it goes out of the loop then
return}
sum+= .....some addition...
}
out[j+l] = sum
}
So now, in the if condition, out[j+l] is set to 0 inside the loop, so the value of out changes regularly. In the serial code, out is a reference to a vector. I am not able to read the values of output from out between my kernel and host code. I want to read the values in output[i] for every out[j+l], but I am confused by this buffer and vector.
Just for more clarification: output is a vector of vectors, and out is a reference to one of its inner vectors (output[i]). I need to update the values in output for every change in out. Since these are vectors, I passed out as a cl buffer. I hope it is clear.
Please let me know, if the code is required, I will try to provide as much as I can.
You are passing OpenCL a pointer to a vector of vectors. The outer vector's elements are contiguous, but each inner vector owns a separate heap allocation, so the data as a whole is not contiguous in memory. OpenCL cannot follow host pointers into device memory, and there is no command for that in the API.
You could use a vector of fixed-size arrays (std::array, available since C++11) or plain arrays, so that all the data sits in one contiguous block.
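A sketch of the contiguous-block route: copy the inner vectors into one `std::vector<double>` in row-major order, create the buffer from that, and copy the results back after reading the buffer. It assumes every inner vector has the same length (`rowLength`, a hypothetical name); `flatten_rows` and `unflatten_rows` are illustrative helpers, not part of the OpenCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy equally sized rows into one contiguous row-major block.
std::vector<double> flatten_rows(const std::vector<std::vector<double>>& rows,
                                 std::size_t rowLength) {
    std::vector<double> flat;
    flat.reserve(rows.size() * rowLength);
    for (const auto& row : rows) {
        assert(row.size() == rowLength);  // the layout assumes uniform rows
        flat.insert(flat.end(), row.begin(), row.end());
    }
    return flat;
}

// Copy the block back: element (i, k) lives at flat[i * rowLength + k].
void unflatten_rows(const std::vector<double>& flat, std::size_t rowLength,
                    std::vector<std::vector<double>>& rows) {
    for (std::size_t i = 0; i < rows.size(); ++i)
        rows[i].assign(flat.begin() + i * rowLength,
                       flat.begin() + (i + 1) * rowLength);
}
```

The buffer would then be created with `flat.data()` as the host pointer and `flat.size() * sizeof(double)` as the size, since the size argument is in bytes (the question passes `output.size()`, the element count of the outer vector, and `&output`, the address of the vector object itself). Because the kernel writes to `out`, the buffer also needs `CL_MEM_READ_WRITE` rather than `CL_MEM_READ_ONLY`. Inside the kernel, row `i` starts at `out + i * rowLength`.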

Getting a part of a QMap as a QVector

I have some elements in a QMap<double, double> named a. Now I want to get a vector of some of the values of a. The easiest approach (for me) would be:
int length = x1-x0;
QVector<double> retVec;
for(int i = x0; i < x1; i++)
{
retVec.push_back(a.values(i));
}
with x1 and x0 as the stop- and start-positions of the elements to be copied. But is there a faster way instead of using this for-loop?
Edit: By "faster" I mean both faster to type and (not possible, as has been pointed out) faster to execute. As has also been pointed out, values(i) does not work as expected, so I will leave it here as pseudo-code until I find a better-working replacement.
Maybe this works:
QVector<double>::fromList(a.values().mid(x0, length));
The idea is to get all the values as a list of doubles, extract the sublist you are interested in, then create a vector from that list using the existing static method QVector::fromList.
EDIT
As suggested in the comments and in the updated question, here is a solution that is slower to type but faster to run:
QVector<double> v;
v.reserve(length); // reserve capacity only; v{length} would not give an empty vector
auto it = a.cbegin()+x0;
for(auto last = it+length; it != last; it++) {
v.push_back(it.value());
}
I assume that x0 and length already respect the actual size of the map, so a.cbegin()+x0 is valid and it isn't worth adding the guard it != a.cend() as well.
Try this; it should work, but I haven't tested it:
int length = x1-x0;
QVector<double> retVec;
retVec.reserve(length); // reserve to avoid reallocations
QMap<double, double>::const_iterator i = a.constBegin();
i += x0; // increment to range start
while (length--) retVec << i++.value(); // add value to vector and advance iterator
This assumes the map actually has enough elements, so the iterator is not checked before use.
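The same iterator-range idea can be sketched with the standard library, using std::map in place of QMap (a substitution, since the answers above use Qt). std::map iterators are bidirectional, so std::next walks x0 steps and the copy stays linear in x0 + length. `values_in_range` is a hypothetical helper; unlike the answers above, it clamps the range to the map's size instead of assuming there are enough elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <utility>
#include <vector>

// Copy the values at positions [x0, x0 + length) of a key-ordered map.
// The range is clamped to the map's size instead of being assumed valid.
std::vector<double> values_in_range(const std::map<double, double>& a,
                                    std::size_t x0, std::size_t length) {
    std::vector<double> out;
    if (x0 >= a.size())
        return out;
    length = std::min(length, a.size() - x0);
    out.reserve(length);
    auto first = std::next(a.begin(), static_cast<std::ptrdiff_t>(x0));
    auto last = std::next(first, static_cast<std::ptrdiff_t>(length));
    std::transform(first, last, std::back_inserter(out),
                   [](const std::pair<const double, double>& kv) { return kv.second; });
    return out;
}
```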

using stars when declaring an array

I have two questions regarding the multidimensional arrays. I declared a 3D array using two stars but when I try to access the elements I get a used-without-initializing error.
unsigned **(test[10]);
**(test[0]) = 5;
How come I get that error, while when I use the following code I don't get an error? What's the difference?
unsigned test3[10][10][10];
**(test3[0]) = 5;
My second question is this: I'm trying to port a piece of code that was written for Unix to Windows. One of the lines is this:
unsigned **(precomputedHashesOfULSHs[nnStruct->nHFTuples]);
nHFTuples is of type int but it's not a constant, and this is the error I'm getting:
error C2057: expected constant expression
Is it possible that I'm getting this error because I'm running it on Windows not Unix? - and how would I solve this problem? I can't make nHFTuples a constant because the user will need to provide the value for it!
In the first one, you didn't declare a 3D array; you declared an array of 10 pointers to pointers to unsigned ints. When you dereference it, you're dereferencing a garbage pointer.
In the second one, you declared the array correctly, and **(test3[0]) only works because the inner arrays decay to pointers: it names test3[0][0][0]. Arrays are not pointers, though, so index them instead of dereferencing them.
Do this:
unsigned test3[10][10][10];
test3[0][0][0] = 5;
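The difference between the two declarations can be checked at compile time. A small sketch (the namespace `demo` and the function `write_first_element` are illustrative additions that keep the question's names out of the global scope): `test` is an array of ten `unsigned **`, while `test3` really holds 1000 `unsigned` values, and `**(test3[0])` names the same element as `test3[0][0][0]`.

```cpp
#include <cassert>
#include <type_traits>

namespace demo {
// The question's declaration: an array of 10 pointers to pointers to
// unsigned. Only the 10 pointers exist; nothing they point to is allocated.
unsigned **(test[10]);
static_assert(std::is_same<decltype(test), unsigned **[10]>::value,
              "test is an array of 10 unsigned**");

// A genuine 10x10x10 array: storage for 1000 unsigned values.
unsigned test3[10][10][10];
static_assert(sizeof(test3) == 1000 * sizeof(unsigned),
              "test3 holds 1000 unsigned values");

// Write through the pointer-decay form, read back with plain indexing.
unsigned write_first_element() {
    **(test3[0]) = 5;       // test3[0] decays to unsigned (*)[10]
    return test3[0][0][0];  // the same element, written the usual way
}
}  // namespace demo
```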
To answer your second question: in C89 and in C++, the size of an array must be known at compile time. C99 introduced variable-length arrays, and GCC also accepts them as an extension, but they are not portable and the Visual Studio C compiler does not support them. To fix it, you'll have to use malloc and free:
#include <stdlib.h>
int i, j;
/* nHFTuples x secondDimensionLength x thirdDimensionLength;
   the last two sizes are placeholders for your own dimensions */
unsigned ***precomputedHashOfULSHs = malloc(nnStruct->nHFTuples * sizeof(unsigned **));
for (i = 0; i < nnStruct->nHFTuples; ++i) {
    precomputedHashOfULSHs[i] = malloc(secondDimensionLength * sizeof(unsigned *));
    for (j = 0; j < secondDimensionLength; ++j)
        precomputedHashOfULSHs[i][j] = malloc(thirdDimensionLength * sizeof(unsigned));
}
// then when you're done...
for (i = 0; i < nnStruct->nHFTuples; ++i) {
    for (j = 0; j < secondDimensionLength; ++j)
        free(precomputedHashOfULSHs[i][j]);
    free(precomputedHashOfULSHs[i]);
}
free(precomputedHashOfULSHs);
(Pardon me if that allocation/deallocation code is wrong, it's late :))
Although you don't specify it, I think you're using a compiler on Unix that supports C99 (such as GCC), whereas the compiler you use on Windows does not support it (Visual Studio's C compiler supports only C89 here).
You have three options:
You can hard-code a suitable maximum array size.
You could allocate the array yourself using malloc or calloc. Don't forget to free it when you're done.
Port the program to C++, and use std::vector.
If you choose option 3, then for a single-dimension vector you'll want something like:
std::vector<unsigned int> precomputedHashOfULSHs;
or, for a two-dimensional vector:
std::vector<std::vector<unsigned int> > precomputedHashOfULSHs;
Do note that vectors default to being empty, of zero length, so you will need to add each element from the original set.
In the case of test3 as an example, you'll want:
std::vector<std::vector<std::vector<unsigned int> > > precomputedHashOfULSHs;
precomputedHashOfULSHs.resize(10);
for(int i = 0; i < 10; i++) {
precomputedHashOfULSHs[i].resize(10);
for(int ii=0; ii<10; ii++) {
precomputedHashOfULSHs[i][ii].resize(10);
}
}
I haven't tested this code, but it should be right. C++ will manage the memory of that vector for you automatically.
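Since nHFTuples comes from the user, the sizes in option 3 don't have to be constants. A sketch that builds the same nested vector in a single expression with runtime sizes; `make_3d` and `Vec3D` are hypothetical names, and the first size would be nnStruct->nHFTuples in the ported code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3D = std::vector<std::vector<std::vector<unsigned int>>>;

// Build a d0 x d1 x d2 nested vector with every element set to 0.
// The sizes can come from user input; no compile-time constant is needed.
Vec3D make_3d(std::size_t d0, std::size_t d1, std::size_t d2) {
    return Vec3D(d0, std::vector<std::vector<unsigned int>>(
                         d1, std::vector<unsigned int>(d2, 0u)));
}
```

This replaces the resize loops with the fill constructor; the memory is still released automatically when the vector goes out of scope.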
