CUDA GPU vector [duplicate]

This question already has an answer here:
How to pass an array of vectors to cuda kernel?
(1 answer)
Closed 4 years ago.
Recently, while trying CUDA programming, I wanted to send a vector to GPU memory. Someone told me I could use thrust::device_vector and thrust::host_vector. I also read the documentation, but I still don't know how to pass such a vector into a kernel function.
My code is as follows:
thrust::device_vector<int> dev_firetime[1000];
__global__ void computeCurrent(thrust::device_vector<int> d_ftime)
{
    int idx = blockDim.x*blockIdx.x + threadIdx.x;
    printf("ftime = %d\n", d_ftime[idx]);
}
In fact, I don't know how to pass the vector to a kernel function. If you know, please tell me, and is there a better way to accomplish the same thing?
Thanks very much!

Thrust device vectors cannot be passed directly to CUDA kernels. You need to pass a pointer to the underlying device memory to the kernel. This can be done like this:
__global__ void computeCurrent(int* d_ftime)
{
    int idx = blockDim.x*blockIdx.x + threadIdx.x;
    printf("ftime = %d\n", d_ftime[idx]);
}

thrust::device_vector<int> dev_firetime(1000);
// raw_pointer_cast extracts the underlying device pointer
int* d_ftime = thrust::raw_pointer_cast(dev_firetime.data());
computeCurrent<<<....>>>(d_ftime);
If you have an array of vectors, you need to do something like what is described here.

Related

How to create a RAWSXP vector from C char* ptr without reallocation

Is there a way to create a RAWSXP vector that is backed by an existing C char* ptr?
Below I show my current working version, which needs to reallocate and copy the bytes,
and a second, imagined version that doesn't exist.
// My current slow solution that uses lots of memory
SEXP getData() {
    // has size and data
    Response resp = expensive_call();
    // copy over byte by byte
    SEXP respVec = Rf_allocVector(RAWSXP, resp.size);
    Rbyte* ptr = RAW(respVec);
    memcpy(ptr, resp.data, resp.size);
    // free the original memory
    free(resp.data);
    return respVec;
}
// My imagined solution
SEXP getDataFast() {
    // has size and data
    Response resp = expensive_call();
    // reuse the ptr
    SEXP respVec = Rf_allocVectorViaPtr(RAWSXP, resp.data, resp.size);
    return respVec;
}
I also noticed Rf_allocVector3, which seems to give control over the vector's memory allocation, but I couldn't get it to work. This is my first time writing an R extension, so I imagine I must be doing something stupid. I'm trying to avoid the copy because the data will be around a GB (very large, though sparse, matrices).
Copying 1 GB takes less than a second. If your call is expensive, the copy may be a marginal cost; profile to see whether it is really a bottleneck.
The way you are trying to do things is probably not possible, because how would R know how to garbage-collect the data?
But assuming you are using STL containers, one neat trick I've recently seen is to use the second template argument of STL containers -- the allocator.
template<
    class T,
    class Allocator = std::allocator<T>
> class vector;
The general outline of the strategy is like this:
Create a custom allocator using R-memory that meets all the requirements (essentially you just need allocate and deallocate)
Every time you need to return data to R from an STL container, make sure you initialize the container with your custom allocator
On returning the data, pull out the underlying R data created by your R-memory allocator -- no copy
This approach gives you all the flexibility of STL containers while using only memory R is aware of.
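As a concrete illustration of that outline, here is a minimal sketch of such an allocator. The names are hypothetical, and malloc/free stand in for calls into R's memory manager (e.g. via the R_allocator_t hook of Rf_allocVector3), which cannot run outside an R session:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical sketch of the custom-allocator strategy. In a real R
// extension, allocate/deallocate would obtain and release memory through
// R's allocator so that R's GC knows about it; malloc/free stand in here
// so the example is self-contained.
template <class T>
struct RMemAllocator {
    using value_type = T;

    RMemAllocator() = default;
    template <class U>
    RMemAllocator(const RMemAllocator<U>&) {}

    T* allocate(std::size_t n) {
        // Real version: request memory through R's allocator.
        void* p = std::malloc(n * sizeof(T));
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) {
        // Real version: let R's GC reclaim the memory instead.
        std::free(p);
    }
};

template <class T, class U>
bool operator==(const RMemAllocator<T>&, const RMemAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const RMemAllocator<T>&, const RMemAllocator<U>&) { return false; }
```

A container instantiated as, say, std::vector<int, RMemAllocator<int>> then keeps its elements in memory obtained from your allocator, and its data() pointer is what you would hand back to R without copying.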

integer64 and Rcpp compatibility

I will need 64-bit integers in my package in the near future. I'm studying the feasibility based on the bit64 package. Basically, I plan to have one or more columns in a data.table with an integer64 S3 class, and I plan to pass this table to C++ functions using Rcpp.
The following nanotime example from the Rcpp gallery explains clearly how a vector of 64-bit ints is built upon a vector of doubles, and how to create an integer64 object from C++ to R.
I'm now wondering how to deal with an integer64 from R to C++. I guess I can invert the principle.
void useInt64(NumericVector v)
{
    size_t len = v.size();
    std::vector<int64_t> n(len);
    // transfers values 'keeping bits' but changing type;
    // using reinterpret_cast would get us a warning
    std::memcpy(&(n[0]), &(v[0]), len * sizeof(double));
    // use n in further computations
}
Is that correct? Is there another way to do that? Can we use a wrapper as<std::vector<int64_t>>(v)? For this last question I guess the conversion is not based on a bit to bit copy.
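To make the round trip concrete, here is a small plain-C++ sketch of the bit-preserving copy in both directions (no Rcpp, so it runs anywhere; the function names are made up for the example):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// integer64 values travel through R as doubles that carry the same 64 bits,
// so both directions are a raw memcpy, never a numeric conversion.
std::vector<int64_t> doublesToInt64(const std::vector<double>& v) {
    std::vector<int64_t> n(v.size());
    if (!v.empty())
        std::memcpy(n.data(), v.data(), v.size() * sizeof(double));
    return n;
}

std::vector<double> int64ToDoubles(const std::vector<int64_t>& n) {
    std::vector<double> v(n.size());
    if (!n.empty())
        std::memcpy(v.data(), n.data(), n.size() * sizeof(int64_t));
    return v;
}
```

Round-tripping this way preserves every value exactly, which matches the question's guess: a generic conversion such as as<std::vector<int64_t>>(v) would convert each double by value rather than copying bits.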

What is the Arduino equivalent of C#'s Mathf.pingpong?

In C#, more specifically in Unity, there is a method called Mathf.PingPong. What is the equivalent for Arduino?
I haven't used Unity, but Mathf.PingPong(t, length) bounces t back and forth between 0 and length, so plain t % length gives a sawtooth (it jumps back to 0) rather than a ping-pong. You can still build it from the modulo operator by reflecting the descending half of each period:
int pingpong(int t, int length)
{
    int m = t % (2 * length);                  // wrap onto (-2*length, 2*length)
    if (m < 0) m += 2 * length;                // handle negative t
    return m <= length ? m : 2 * length - m;   // reflect the descending half
}
You can use fmod in the same way if you need floating-point numbers.
Edit: I assume you mean C/C++ when you are talking about Arduino.
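A floating-point version of the bounce using fmod might look like this (a sketch; pingpongf is a name made up here, mirroring Unity's Mathf.PingPong(t, length), which oscillates between 0 and length):

```cpp
#include <cassert>
#include <cmath>

// Oscillates t between 0 and length instead of wrapping around to 0.
float pingpongf(float t, float length) {
    float m = std::fmod(t, 2.0f * length);        // wrap onto (-2L, 2L)
    if (m < 0.0f) m += 2.0f * length;             // shift negatives into [0, 2L)
    return m <= length ? m : 2.0f * length - m;   // reflect the descending half
}
```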

Sending 2D data from host to device in openCL [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
Can I send 2-dimensional data to the device? If yes, how? That is, how should the buffer memory be declared, and how can I fetch and use those values in the kernel function on the device?
Sure.
A cl_mem object cannot contain other cl_mem objects, so it is not possible to use "2D" data like this in OpenCL. (In CUDA this is possible, because the "buffers" there are just pointers to device memory.)
Usually, you can pack your data into one large cl_mem object and access it appropriately in the kernel:
__kernel void compute(__global float *data2D, int sizeX, int sizeY)
{
    int ix = get_global_id(0);
    int iy = get_global_id(1);
    int index = ix + iy * sizeX;
    float element = data2D[index];
    ....
}
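On the host side, the packing that feeds such a kernel can be sketched in plain C++ like this (no OpenCL calls, just the same row-major index the kernel computes; flatten2D is a name made up for the example):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flatten a 2D grid into one contiguous buffer, suitable for uploading as a
// single cl_mem object; element (ix, iy) lands at index ix + iy * sizeX.
std::vector<float> flatten2D(const std::vector<std::vector<float>>& grid) {
    const int sizeY = static_cast<int>(grid.size());
    const int sizeX = sizeY ? static_cast<int>(grid[0].size()) : 0;
    std::vector<float> flat(static_cast<std::size_t>(sizeX) * sizeY);
    for (int iy = 0; iy < sizeY; ++iy)
        for (int ix = 0; ix < sizeX; ++ix)
            flat[ix + iy * sizeX] = grid[iy][ix];  // same index as the kernel
    return flat;
}
```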
Let's say you have a 2D buffer on the C++ side, declared on the stack:
float a[2048][2048];
Then you get the address of that buffer with
float *address = &a[0][0];
and use that address for your cl_mem object. You can allocate on the heap too:
float (*a)[2048] = new float[2048][2048];
....
....
float *address = &a[0][0];
Your OpenCL-side access to this area must match the C++ layout exactly. If the data comes from somewhere other than C++, you need to know whether your matrices are row-major or column-major, and whether they are arrays of arrays or arrays of objects (as in Java), before working with them. If a matrix is not contiguous in memory, this approach can fail.
There are functions to write your host buffers to OpenCL buffers and to read them back; their exact signatures and wrappers can change from version to version (and with the language being used).

Rust - Vector with size defined at runtime

How do you create an array in Rust whose size is defined at run time?
Basically, how do you convert the following C++ code to Rust:
std::vector<int> f(int n) { return std::vector<int>(n); }
?
This is not possible in Rust, because the size of an array must be known at compile time:
let n = 15;
let board: [i32; n]; // error: n is not a constant
Note: I saw here that it is impossible to do this in a simple manner, but I refuse to accept that such a simple thing is impossible :p
Thanks a lot!
Never mind, I found the way:
let n = 15;   // number of items
let val = 17; // value to replicate
let v = std::vec::from_elem(val, n); // note: pre-1.0 Rust API
The proper way in modern Rust is vec![value; size].
Values are cloned, which is quite a relief compared to other languages that casually hand back a vector of references to the same object. E.g. vec![vec![]; 2] creates a vector where both elements are independent vectors, 3 vectors in total. Python's [[]] * 2 creates a vector of length 2 where both elements are (references to) the same nested vector.
