If I do the following:
this->bufferParams = cl::Buffer(context, CL_MEM_READ_ONLY, sizeof(Params), &params, NULL);
My buffer doesn't seem to get populated with my params object. However, if I do this:
this->queue.enqueueWriteBuffer(this->bufferParams, CL_TRUE, 0, sizeof(Params), &params, NULL);
Then it seems to work. Is there any way, with the cl::Buffer constructor, to initialize the buffer from the params object directly rather than issuing the enqueue command?
Just do this:
this->bufferParams = cl::Buffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                sizeof(Params), &params, NULL);
If you don't set the flag to copy from the host pointer, nothing is copied.
The host pointer can be used for other things (for example, with CL_MEM_USE_HOST_PTR the buffer uses that host memory as its storage), so you need to set the flag accordingly.
EXTRA: Also, for very small structure objects, like your Params probably is, pass them directly with clSetKernelArg() (kernel.setArg() in the C++ wrapper), declaring the kernel parameter as a Params value rather than a pointer. There's no need to create a buffer if you are just setting some constant values that are never written. It also goes through a more optimized memory path.
I have a large array of floats called source_array with around 50,000 elements. I am currently trying to apply a collection of modifications to the array and evaluate each result. Basically, in pseudocode:
__kernel void doSomething(__global float *source_array, __global int *res, __global int *mod_value) {
// res is int because OpenCL C has no 'boolean' type
// Modify values of source_array with mod_value;
// Evaluate the modified array.
}
So in the process I would need a variable to hold the modified array, because source_array should stay constant for all work items. If I modify it directly, one work item might interfere with another (not sure if I am right here).
The problem is that the array is too big for private memory, so I can't declare it inside the kernel code. What should I do in this case?
I considered adding another parameter to the kernel to serve as a placeholder for the modified array, but again it would interfere with other work items.
Private "memory" on GPUs literally consists of registers, which generally are in short supply. So the __private address space in OpenCL is not suitable for this as I'm sure you've found.
Victor's answer is correct - if you really need temporary memory for each work item, you will need to create a (global) buffer object. If all work items need to independently mutate it, it will need a size of <WORK-ITEMS> * <BYTES-PER-ITEM> and each work-item will need to use its own slice of the buffer. If it's only temporary, you never need to copy it back to host memory.
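The sketch below is a serial C++ model of that layout. It uses the question's placeholders and assumes the per-item "modification" adds mod_value[gid] to every element and the "evaluation" is a sum. Both are hypothetical. A plain loop over gid stands in for the work-items. In a kernel, gid would come from get_global_id(0) and scratch would be a __global buffer. The point is only the indexing: work-item gid owns scratch[gid * n] through scratch[gid * n + n - 1] and never writes the source array.

```cpp
#include <cstddef>
#include <vector>

// One "work-item": copies source into its own slice of scratch, modifies the
// copy, and evaluates it. scratch must hold at least (gid + 1) * n floats.
// The modification (add mod_value[gid]) and evaluation (sum) are placeholders.
float work_item(std::size_t gid, const std::vector<float>& source,
                const std::vector<int>& mod_value, std::vector<float>& scratch)
{
    const std::size_t n = source.size();
    float* mine = scratch.data() + gid * n;   // this work-item's private slice

    // Copy-and-modify into the slice; source is only ever read.
    for (std::size_t i = 0; i < n; ++i)
        mine[i] = source[i] + static_cast<float>(mod_value[gid]);

    float sum = 0.0f;                         // placeholder "evaluation"
    for (std::size_t i = 0; i < n; ++i)
        sum += mine[i];
    return sum;
}

// Runs every "work-item" in turn; the scratch buffer is sized
// WORK-ITEMS * elements-per-item, as described above.
std::vector<float> run_all(const std::vector<float>& source,
                           const std::vector<int>& mod_value)
{
    const std::size_t n_items = mod_value.size();
    std::vector<float> scratch(n_items * source.size());
    std::vector<float> results(n_items);
    for (std::size_t gid = 0; gid < n_items; ++gid)
        results[gid] = work_item(gid, source, mod_value, scratch);
    return results;
}
```

With the question's 50,000 floats, each slice is 200,000 bytes. The scratch buffer therefore grows by that much per work-item, which is part of why the next paragraph suggests decomposing the problem differently.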
However, this sounds like an access pattern that will work very inefficiently on GPUs. You will do much better if you decompose your problem differently. For example, you may be able to have whole work-groups coordinate on one subrange of the array at a time. Copy the subrange into local (group-shared) memory and divide the work among the work-items in the group. Then write the results back to global memory, read the next subrange into local memory, and so on. Coordinating between work-items in a group is much more efficient than having each work-item access a huge range of global memory. We can only help you with this algorithmic approach if you are more specific about the computation you are trying to perform.
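The actual computation isn't specified, so the sketch below assumes a sum as a stand-in. It models the tiling pattern serially in C++. TILE plays the role of the work-group size (a hypothetical 8, assumed to be a power of two), and local_tile stands in for __local memory. Each inner loop over lid represents the group's work-items running in parallel. Comments mark where barrier(CLK_LOCAL_MEM_FENCE) calls would go in a real kernel.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t TILE = 8;  // hypothetical work-group size (power of two)

// Serial model of work-groups reducing one tile of data at a time.
float tiled_sum(const std::vector<float>& data)
{
    float total = 0.0f;
    std::array<float, TILE> local_tile{};  // stands in for __local memory

    for (std::size_t base = 0; base < data.size(); base += TILE) {
        // Load phase: work-item lid copies one element (0 past the end).
        for (std::size_t lid = 0; lid < TILE; ++lid)
            local_tile[lid] = (base + lid < data.size()) ? data[base + lid] : 0.0f;
        // barrier(CLK_LOCAL_MEM_FENCE) would go here in a kernel.

        // Tree reduction: each step halves the number of active work-items.
        for (std::size_t stride = TILE / 2; stride > 0; stride /= 2) {
            for (std::size_t lid = 0; lid < stride; ++lid)
                local_tile[lid] += local_tile[lid + stride];
            // ...and another barrier after every step.
        }

        // Work-item 0 holds the group's result. A kernel would store it in a
        // global partial-results array instead of accumulating here.
        total += local_tile[0];
    }
    return total;
}
```

Only one pass over global memory is made per element. All the cooperative work (the reduction steps) happens in the small local tile, which is the property the paragraph above is after.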
Why not initialize this array in an OpenCL buffer backed by host memory? For example:
const size_t buffer_size = 50000 * sizeof(float);
/* malloc(), new float[50000], or a static array = {0.1f, 0.2f, ...} */
float *host_array_ptr = (float*)malloc(buffer_size);
/*
put your data into host_array_ptr here
*/
cl_int err_code;
/* with CL_MEM_USE_HOST_PTR, host_array_ptr must stay valid until my_array is released */
cl_mem my_array = clCreateBuffer(my_cl_context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, buffer_size, host_array_ptr, &err_code);
Then you can pass my_array to your OpenCL kernel as a __global argument (for example, with clSetKernelArg()).
clBuildProgram allows one to give a list of devices to build the program for. That's the reason for the num_devices and device_list parameters in the declaration:
cl_int clBuildProgram(cl_program program, cl_uint num_devices, const cl_device_id *device_list, const char *options, void (CL_CALLBACK *pfn_notify)(cl_program program, void *user_data), void *user_data)
Now what happens if we use it like this?
cl_int err = clBuildProgram(program, 0, NULL, ...
1. Does it build for all devices in the PC?
2. Does it build only for the devices I created the context for? (I mean the context I used when I created the program with clCreateProgramWithSource.)
The documentation says:
device_list: A pointer to a list of devices associated with program. If device_list is NULL value, the program executable is built for all devices associated with program for which a source or binary has been loaded. If device_list is a non-NULL value, the program executable is built for devices specified in this list for which a source or binary has been loaded.
I think the phrasing is a bit complicated here, but from that, I guess number 2. Is that right?
I am asking because in the case of number 1, I would need to pass a device list to this function in order to avoid superfluous compilation for all devices.
2) is correct. Compilation is constrained to only the devices associated with the program's context. This cannot be every single device in the system unless the context was created using every single device.
I have some MPI processes that should write to the same file after they finish their task. The problem is that the length of each result is variable, so I cannot assume that each process will write at a fixed offset.
A possible approach would be to open the file in every process, write the output at the end, and then close the file. But this way a race condition could occur.
How can I open and write to that file so that the result would be the expected one?
You might think you want the shared file pointer or ordered mode routines. But these routines get little use and so are not well optimized (so they get little use... quite the cycle...)
I hope you intend to do this collectively. Then you can use MPI_SCAN to compute the offsets and call MPI_FILE_WRITE_AT_ALL to have the MPI library optimize the I/O for you.
(If you are doing this independently, then you will have to do something like... master slave? passing a token? fall back to the shared file pointer routines even though I hate them?)
Here's an approach for a good collective method:
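MPI_Offset offset = 0, incr, new_offset;
MPI_Status status;
int datatype_size;

MPI_Type_size(datatype, &datatype_size);
incr = (MPI_Offset)count * datatype_size;
/* you can skip this call and assume 'offset' is zero if you don't care
   about the contents of the file; offsets are in bytes here, which
   assumes the default file view (etype MPI_BYTE) */
MPI_File_get_position(mpi_fh, &offset);
/* inclusive prefix sum of every rank's byte count (MPI_OFFSET: MPI-2.2+) */
MPI_Scan(&incr, &new_offset, 1, MPI_OFFSET, MPI_SUM, MPI_COMM_WORLD);
new_offset -= incr;   /* exclude this rank's own bytes */
new_offset += offset;
ret = MPI_File_write_at_all(mpi_fh, new_offset, buf, count,
                            datatype, &status);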
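The offset arithmetic in the snippet above can be checked without MPI. In this serial C++ sketch, incr[r] is a hypothetical byte count for rank r. The running total after adding incr[r] is what MPI_Scan with MPI_SUM returns on that rank. Subtracting incr[r] leaves the bytes written by all lower ranks. base stands in for the value from MPI_File_get_position.

```cpp
#include <cstddef>
#include <vector>

// Computes each rank's write offset from the per-rank byte counts.
std::vector<long long> compute_offsets(const std::vector<long long>& incr,
                                       long long base)
{
    std::vector<long long> offsets(incr.size());
    long long running = 0;
    for (std::size_t r = 0; r < incr.size(); ++r) {
        running += incr[r];                     // inclusive scan, like MPI_Scan
        offsets[r] = running - incr[r] + base;  // new_offset -= incr; += offset
    }
    return offsets;
}
```

With incr = {5, 0, 12, 3} and base = 100 this gives {100, 105, 105, 117}. Each rank starts where the previous ones end, and a rank with nothing to write still gets a valid offset. MPI_Exscan computes the exclusive sum directly, but its result on rank 0 is undefined, which is one reason the MPI_Scan-then-subtract form is convenient.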
Hopefully this isn't too stupid but I want to make sure I'm doing this right.
Some Qt functions return Qt objects as values, but we may want to store them in a pointer somewhere. For example, in QDomDocument, the function documentElement returns a QDomElement, not a pointer to it. Now, as a member of my class I have:
QDomElement *listRootElement;
In a function that sets things up I am using this:
listRootElement = new QDomElement;
*listRootElement = mainIndex->documentElement();
(mainIndex is a QDomDocument.)
This seems to work, but I just want to make sure I'm doing it right and that nothing will come back to bite me.
It would be very similar for some of the image functions where a QPixmap is returned, and I want to keep pointers to QPixmaps.
Thanks for any comments!
Assuming that you want to store a pointer to a QDomElement for some reason, and assuming that you are aware of the potential pitfalls with pointers (for example, two pointers might point to the same object):
The only thing to keep in mind is that the popular 'parent takes care of deleting children' system that Qt uses is only available for QObject (sub)classes. So when new'ing a QString, a QDomElement, or something like that, keep in mind that you have to delete it yourself, too.
I'm guessing, but I think this:
listRootElement = new QDomElement(mainIndex->documentElement());
...may allow the compiler to optimise better (see this question for some reasoning).
You're overwriting the initially allocated object:
QDomElement *listRootElement; // undefined ptr value, I'd prefer null or new right away
listRootElement = new QDomElement;
*listRootElement = mainIndex->documentElement();
You're essentially doing:
int *x = new int(42);
*x = 47;
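This works because both QDomElement and int support the assignment operator (=).
Note that there's no need to delete the value returned by documentElement(): the returned temporary is copied into your newly allocated object and then destroyed automatically. You still have to delete listRootElement itself when you're done with it.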
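The two forms discussed above can be compared in plain C++: allocate and then assign, or construct directly from the returned value. In this sketch std::string is a stand-in for QDomElement. document_element() is a hypothetical function, not Qt API, that returns by value like QDomDocument::documentElement().

```cpp
#include <string>

// Hypothetical accessor that returns by value, like documentElement().
std::string document_element() { return "root"; }

// Two-step form from the question: default-construct on the heap,
// then assign the returned temporary into that object.
std::string* store_two_step()
{
    std::string* p = new std::string;
    *p = document_element();
    return p;  // caller owns p and must delete it
}

// One-step form: construct the heap object directly from the returned value.
std::string* store_one_step()
{
    return new std::string(document_element());  // caller must delete
}
```

Both functions produce an equal heap object. In either case the temporary returned by document_element() is destroyed automatically, and only the pointer handed back to the caller has to be deleted.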
I thought these two methods were equivalent (memory-allocation-wise). However, I was seeing "out of scope" and "NSCFString" in the debugger when I used what I thought was the convenience method (commented out below), and when I switched to the more explicit method my code stopped crashing! Note that I am getting the string that is stored in my container from an sqlite3 query.
p = (char*) sqlite3_column_text (queryStmt, 1);
// GUID = (NSString*) [NSString stringWithUTF8String: (p!=NULL) ? p : ""];
GUID = [[NSString alloc] initWithCString:(p!=NULL) ? p : "" encoding:NSUTF8StringEncoding];
Also note that if I looked at the values in the debugger and printed them with NSLog, they looked correct. However, I don't think new memory was allocated and the value copied. Instead, the pointer was stored, went out of scope, was referenced later, and the app crashed!
If you need to keep a reference to an object around after a method returns, then you need to take ownership of the object. So, if your variable GUID is an instance variable or some kind of global, you will need to take ownership of the object. If you use the alloc/init method, you have ownership of the object returned since you used alloc. You could just as easily use the stringWithUTF8String: method, but you will need to take ownership explicitly by sending a retain message. So, assuming GUID is some kind of non-method-scoped variable:
GUID = [[NSString stringWithUTF8String:"Some UTF-8 string"] copy];
(either copy or retain can be used here to take ownership, but copy is more common when dealing with strings).
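A similar ownership issue arises in C++ when code stores a pointer into someone else's buffer instead of a copy. This is an analogy, not the Objective-C mechanism itself (which is autorelease plus retain/copy). fetch_text() is a hypothetical stand-in for a row accessor whose returned buffer is reused for the next row. Similarly, the pointer from sqlite3_column_text() is only valid until the statement is stepped, reset, or finalized.

```cpp
#include <cstring>
#include <string>

// Hypothetical row accessor: returns a pointer into a buffer the caller does
// not own; the buffer is overwritten every time a new row is fetched.
static char row_buffer[64];
const char* fetch_text(const char* row_value)
{
    std::strncpy(row_buffer, row_value, sizeof row_buffer - 1);
    row_buffer[sizeof row_buffer - 1] = '\0';
    return row_buffer;
}

struct Record {
    const char* borrowed;  // only the pointer: changes when the buffer is reused
    std::string owned;     // an independent copy owned by this object
};

Record read_first_then_second()
{
    Record r{};
    const char* p = fetch_text("first-guid");
    r.borrowed = p;
    r.owned = (p != nullptr) ? std::string(p) : std::string();  // take ownership by copying
    fetch_text("second-guid");  // "next row" overwrites the shared buffer
    return r;
}
```

After the second fetch, borrowed shows the new row's text while owned still holds the first value. Copying makes the stored value independent of the buffer's lifetime.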
Also, your code may be a little easier to read if you did something like:
GUID = p ? [[NSString stringWithUTF8String:p] copy] : @"";