OpenCL synchronization


I am new to OpenCL and there seems to be something about the barrier function I don't understand. This is the code for my kernel. It is a standard matrix-vector calculation with the output in *w. There is 1 work-group with 64 work items, the same as the dimension of the vector.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void fmin_stuff(__global double *h, __global double *g,
                         __global double *w, int n, __global int *gid)
{
    // Get the index of the current element
    int i = get_global_id(0);
    int j;
    gid[i] = get_local_id(0);
    w[i] = -g[i];
    barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
    for (j = 0; j < n; j++)
    {
        if (j < i)
            w[i] -= h[i + j*n] * w[j];
        barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
    }
}
The problem is that the code fails at random. The output is correct up to a point. Here are the leading values of w from several runs:
-0.148351 -0.309007 0.133204 -1.39589 2.88335 -2.72261 2.80155
-0.148351 -0.309007 0.133204 -1.39589 2.88335 -2.72261 2.80155
-0.148351 -0.309007 0.133204 -1.39589 2.88335 -2.72261 2.80155
-0.148351 -0.309007 0.133204 -1.39589 2.88335 -2.72261 2.80155
-0.148351 -0.309007 0.133204 -1.39589 2.88335 -2.72261 2.80155
-0.148351 -0.309007 0.133204 -1.39589 2.88335 -2.72261 2.80155
-0.148351 -0.309007 0.133204 -1.39589 2.88335 -2.34999 2.51524
-0.148351 -0.309007 0.133204 -1.39589 2.88335 -2.72261 2.80155
-0.148351 -0.309007 0.133204 -1.39589 2.88335 -2.72261 2.80155
-0.148351 -0.309007 0.133204 -1.39589 2.88335 -2.72261 2.10141
-0.148351 -0.309007 0.133204 -1.39589 2.88335 -2.72261 2.80155
-0.148351 -0.309007 0.133204 -1.39589 2.88335 -2.68636 2.77369
The program reports that the kernel executed successfully in each case. For all runs, the values in the vector w are eventually incorrect. Any advice would be greatly appreciated.
There was some confusion over whether this is a simple matrix multiplication. It is not. This is what the code is trying to accomplish, showing only the first 5 terms of w:
w(1)=-g(1);
w(2)=-g(2);
w(3)=-g(3);
w(4)=-g(4);
w(5)=-g(5);
w(2)-=h(2)*w(1);
w(3)-=h(3)*w(1);
w(4)-=h(4)*w(1);
w(5)-=h(5)*w(1);
w(3)-=h(3+N)*w(2);
w(4)-=h(4+N)*w(2);
w(5)-=h(5+N)*w(2);
w(4)-=h(4+2*N)*w(3);
w(5)-=h(5+2*N)*w(3);
w(5)-=h(5+3*N)*w(4);
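Written as an ordinary sequential loop, the update list above follows a forward-substitution pattern: once column j has been applied, w[j] never changes again, so every later row can safely use it. Below is a plain C++ reference (not the OpenCL kernel) whose output can be compared with what the kernel returns. It assumes 0-based indices and h stored column-major, so that h[i + j*n] is the element the list calls h(i + j*N); fmin_stuff_reference is a hypothetical name.

```cpp
#include <cassert>

// Sequential reference for the update list above, with 0-based indices.
// h is n x n and stored column-major, so h[i + j*n] is row i, column j
// (the element the list calls h(i + j*N)). Only entries with i > j are read.
void fmin_stuff_reference(const double *h, const double *g, double *w, int n)
{
    for (int i = 0; i < n; ++i)
        w[i] = -g[i];                        // w(i) = -g(i)

    for (int j = 0; j < n; ++j)              // w[j] is final once all earlier columns are applied
        for (int i = j + 1; i < n; ++i)      // only rows below j are updated
            w[i] -= h[i + j * n] * w[j];     // w(i) -= h(i + j*N) * w(j)
}
```

For example, with ones below the diagonal of a 3 x 3 h and g = {1, 2, 3}, this produces w = {-1, -1, -1}.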
Also, the kernel is only called once per program run. The random behaviour comes from running the program multiple times.
The comment led me to see what I was doing wrong. I had the work groups and items configured as
size_t global_item_size[3] = {N, 1, 1}; // Process the entire lists
size_t local_item_size[3] = {1, 1, 1};  // One work item per work-group (wrong)
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL,
                             global_item_size, local_item_size, 0, NULL, NULL);
when it should have been:
size_t global_item_size[3] = {N, 1, 1}; // Process the entire lists
size_t local_item_size[3] = {N, 1, 1};  // One work-group of N = 64 items
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL,
                             global_item_size, local_item_size, 0, NULL, NULL);
barrier() only synchronizes work items within the same work-group, so with a local size of 1 every item ran in its own group and the barriers did not order it against the others.
Thanks for the help. This is great for me but probably not of much interest to others.

Can you please explain why you need both the global_id and the local_id inside your kernel?
If you have only one work-group, then the local_id should be enough.
Also, why do you copy the data from g into w?
Are you trying to achieve more than simply: w=h*g, where h is the matrix and g the vector?
Finally, if you are not re-launching your application multiple times but are launching the kernel multiple times within a single application, the most likely explanation is that you are corrupting memory somewhere, i.e. overwriting the input data.
Can you check whether the input data passed to the kernel is the same on every launch within a run?

First of all you do not need to use CLK_LOCAL_MEM_FENCE in your case.
However, I would recommend this pattern:
copy global -> local
work with local data
copy local -> global
In this case you will need CLK_LOCAL_MEM_FENCE.
Now back to your problem.
From what I see, a problem can occur if different items in the work-group execute this line:
w[i] -= h[i + j*n] * w[j];
at different times. Imagine one work item has already computed its new value of w[i], and then another work item reads w[j]. If the second work item's j is the same as the first one's i, the second item will use, on its first iteration, a value that the first item has already updated.
What you should do instead (in case you still want to use global memory) is the following.
I also assume n <= N (your work-group size); otherwise no synchronization is possible, because the work items span several work-groups.
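for (j = 0; j < n; j++)
{
    double wj;
    if (j < i)
        wj = w[j];                   // read phase: take a private copy of w[j]
    barrier(CLK_GLOBAL_MEM_FENCE);   // every item finishes reading before any item writes
    if (j < i)
        w[i] -= h[i + j*n] * wj;     // write phase
    barrier(CLK_GLOBAL_MEM_FENCE);   // every item finishes writing before the next reads
}
A mem_fence alone (read_mem_fence or write_mem_fence) only orders one work item's own memory accesses; it does not make the other work items wait, so barrier() is needed here.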
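To see what the two barriers buy, here is a sequential C++ emulation of one work-group running the loop above (it is not OpenCL code). Each inner loop over i stands for all work items executing the code between two barriers; because a barrier guarantees that every item in the work-group has finished one phase before any item starts the next, running the phases one after another reproduces the ordering the barriers enforce. It assumes the local size equals n, with work item i owning w[i]; emulate_work_group is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Sequential emulation of ONE work-group executing the two-barrier loop.
// ASSUMPTION: local size == n, and work item i owns w[i].
void emulate_work_group(const double *h, const double *g, double *w, int n)
{
    for (int i = 0; i < n; ++i)            // every item: w[i] = -g[i]; then barrier
        w[i] = -g[i];

    std::vector<double> wj(n);             // each item's private copy of w[j]
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i)        // read phase (all items)
            if (j < i)
                wj[i] = w[j];
        // barrier: all reads of w[j] are finished before any write below
        for (int i = 0; i < n; ++i)        // write phase (all items)
            if (j < i)
                w[i] -= h[i + j * n] * wj[i];
        // barrier: all writes are finished before the next iteration reads
    }
}
```

Comparing this emulation with the kernel's output on the same inputs is a quick way to tell a synchronization problem from an indexing problem.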
Hope this helps

Related

How do I set the device-copy parameters when initializing a graphics device?

I'm working on a graphics device in an R package, and need to set some graphical parameters for text / labels on the devices as they are initialized or reset.
This feature is described in R Internals:
The three copies of the GPar structure are used to store the current parameters (accessed via gpptr),
the ‘device copy’ (accessed via dpptr) and space for a saved copy of the ‘device copy’ parameters.
The current parameters are, clearly, those currently in use and are copied from the ‘device copy’
whenever plot.new() is called (whether or not that advances to the next ‘page’). The saved copy
keeps the state when the device was last completely cleared (e.g. when plot.new() was called
with par(new=TRUE)), and is used to replay the display list.
How do I, from a package, actually access and initialize the "device copy"?
All I've been able to find is a comment quoting an older, copy-pasted comment in GraphicsDevice.h:
* 2. I found this comment in the doc for dev_Open -- looks nasty
* Any known instances of such a thing happening? Should be
* replaced by a function to query the device for preferred gpars
* settings? (to be called when the device is initialised)
*
* NOTE that it is perfectly acceptable for this
* function to set generic graphics parameters too
* (i.e., override the generic parameter settings
* which GInit sets up) all at the author's own risk
* of course :)

I don't know if I fully understand what you're trying to do, but I think you may find this guide to be useful. Some key excerpts:
To create a running graphics device with our own functions, we call
the graphicsDevice() function. While there are several methods for
this, essentially we give it a list of named functions that specify
the implementation of some or all of the 21 graphical primitive
operations. We might give this as a list or as an instance of
RDevDescMethods or of a sub-class that we define for a particular type
of device. So we focus on writing these functions.
Then:
Each of the methods is passed an object of class DevDescPtr. This is
also the type of the value returned by the top-level function
graphicsDevice(). This is a reference to the C-level data structure that
represents the graphics device. We can use this to query the settings
of the graphics device.
Some of these fields in the device are used
when initializing the device rather than within the functions (e.g.
those whose names are prefixed with "start"). Other fields are
structural information about the rendering of different aspects of the
device. For example, we can find the dimensions of the drawing area,
The DevDescPtr class is essentially an opaque data type in R
(containing an external pointer to the C-level data structure) and is
intended to be used as if it were an R-level list. We can use the $
operator to access individual fields and we can find the names of
these fields with names().
and finally:
Under some rare circumstances, it is convenient to convert the
reference to an R object. We can do this by coercing it to the
corresponding R class named DevDesc (i.e. with the "Ptr" removed), i.e.
as(dev, "DevDesc"). This copies each of the fields in the C-level
structure to the corresponding slot in the R class.
For example, the circle method of the device has this signature:
circle ( numeric, numeric, numeric, R_GE_gcontextPtr, DevDescPtr )
R_GE_gcontextPtr is:
...another reference to an instance of a C-level data type. This is the
information about the "current" settings of the device. This gives us
information about the current pen/foreground color, the background
color, the setting for the gamma level, the line width, style, join,
the character point size and expansion/magnification, and the font
information. The available fields are
names(new("R_GE_gcontextPtr"))
[1] "col" "fill" "gamma" "lwd" "lty"
[6] "lend" "ljoin" "lmitre" "cex" "ps"
[11] "lineheight" "fontface" "fontfamily"

Caveat
I will present a solution here that predominantly uses C++ code. To make it more reproducible so that it can be run from within an R console I have done this using Rcpp::cppFunction. However, this is clearly not the method you would use while building a package. The resultant functions work by accessing raw pointers to R graphics devices that the user must specify, and if you call them using a non-existent device number your R session will crash.
Solution
The three copies of the GPar structure that these comments are describing are kept together in another structure called baseSystemState, which is defined here.
Each graphics device has a pointer to a baseSystemState, and we can access the graphics device using C or C++ code if we include the header file include/R_ext/GraphicsEngine.h in our own code.
However, there's a snag. Although we can get a pointer to the baseSystemState struct, our code has no idea what it actually is, since the definitions of baseSystemState and GPar are not part of the public API.
So in order to read the baseSystemState and the GPars it contains, we have to redefine these structures in our own code (as Dirk suggested in his comment). Some of the members of GPar are also of types or enums that need to be defined first.
We can take these definitions, compactify them into a single string and use them as includes in an Rcpp::cppFunction call. Here is a wrapper function that does that, and therefore allows you to write C++ functions that have access to the existing graphics devices' parameters:
cppFunction_graphics <- function(s)
{
include <- paste0("#include \"", R.home("include/R_ext/GraphicsEngine.h\""))
Rcpp::cppFunction(s, includes = c(include,
"typedef enum {DEVICE= 0, NDC= 1, INCHES = 13,
NIC = 6, OMA1= 2, OMA2= 3, OMA3 = 4,OMA4= 5,NFC = 7, NPC= 16,USER= 12, MAR1 = 8,
MAR2= 9, MAR3= 10,MAR4= 11, LINES = 14, CHARS =15 } GUnit; typedef struct {
double ax; double bx; double ay; double by;} GTrans; typedef struct {int state;
Rboolean valid; double adj; Rboolean ann; rcolor bg; char bty; double cex;
double lheight; rcolor col; double crt; double din[2]; int err; rcolor fg;
char family[201]; int font; double gamma; int lab[3]; int las; int lty;
double lwd; R_GE_lineend lend; R_GE_linejoin ljoin; double lmitre; double mgp[3];
double mkh; int pch; double ps; int smo; double srt; double tck; double tcl;
double xaxp[3]; char xaxs; char xaxt; Rboolean xlog; int xpd; int oldxpd;
double yaxp[3]; char yaxs; char yaxt; Rboolean ylog; double cexbase;
double cexmain; double cexlab; double cexsub; double cexaxis; int fontmain;
int fontlab; int fontsub; int fontaxis; rcolor colmain; rcolor collab;
rcolor colsub; rcolor colaxis; Rboolean layout; int numrows; int numcols;
int currentFigure; int lastFigure; double heights[200]; double widths[200];
int cmHeights[200]; int cmWidths[200]; unsigned short order[10007]; int rspct;
unsigned char respect[10007]; int mfind; double fig[4]; double fin[2];
GUnit fUnits; double plt[4]; double pin[2]; GUnit pUnits; Rboolean defaultFigure;
Rboolean defaultPlot; double mar[4]; double mai[4]; GUnit mUnits; double mex;
double oma[4]; double omi[4]; double omd[4]; GUnit oUnits; char pty;
double usr[4]; double logusr[4]; Rboolean new_one; int devmode;
double xNDCPerChar; double yNDCPerChar; double xNDCPerLine; double yNDCPerLine;
double xNDCPerInch; double yNDCPerInch; GTrans fig2dev; GTrans inner2dev;
GTrans ndc2dev; GTrans win2fig; double scale;} GPar; typedef struct {GPar dp;
GPar gp; GPar dpSaved; Rboolean baseDevice;} baseSystemState;"),
env = parent.frame(2))
}
So now we can write a function that reads (or writes) whichever of the device's starting graphics parameters we choose. Here, our function returns a list of various colour parameters, but you can return whatever parameters you like from GPar; most of them are self-explanatory from the GPar struct definition.
cppFunction_graphics("
Rcpp::List get_default_GPar(int devnum)
{
pGEDevDesc dd = GEgetDevice(devnum);
baseSystemState *bss = (baseSystemState*) dd->gesd[0]->systemSpecific;
GPar GP = bss->dp;
auto get_colour = [](rcolor rcol){
return Rcpp::NumericVector::create(
Rcpp::Named(\"red\") = rcol & 0xff,
Rcpp::Named(\"green\") = (rcol >> 8) & 0xff,
Rcpp::Named(\"blue\") = (rcol >> 16) & 0xff);
};
return Rcpp::List::create(Rcpp::Named(\"fg\") = get_colour(GP.fg),
Rcpp::Named(\"bg\") = get_colour(GP.bg),
Rcpp::Named(\"col\") = get_colour(GP.col),
Rcpp::Named(\"colmain\") = get_colour(GP.colmain),
Rcpp::Named(\"collab\") = get_colour(GP.collab),
Rcpp::Named(\"colaxis\") = get_colour(GP.colaxis));
}
")
So now in R I can ensure I have a device operating by doing:
plot(1:10)
And to access the current device's default graphics parameters I can do:
get_default_GPar(dev.cur() - 1)
#> $fg
#> red green blue
#> 0 0 0
#>
#> $bg
#> red green blue
#> 255 255 255
#>
#> $col
#> red green blue
#> 0 0 0
#>
#> $colmain
#> red green blue
#> 0 0 0
#>
#> $collab
#> red green blue
#> 0 0 0
#>
#> $colaxis
#> red green blue
#> 0 0 0
Which gives me the correct values for the default device parameters.
Now I can also write to the default device parameters if I define another function. Suppose I want to be able to change the default colour of the device's labels:
cppFunction_graphics("
void set_col(int dn, int red, int green, int blue, int alpha)
{
rcolor new_col = (rcolor) red | ((rcolor) green << 8) | ((rcolor) blue << 16) | ((rcolor) alpha << 24);
pGEDevDesc dd = GEgetDevice(dn);
baseSystemState *bss = (baseSystemState*) dd->gesd[0]->systemSpecific;
bss->dp.collab = new_col;
}
")
Now I have a function in R that can overwrite the default label colours of the device. Let's make the default labels red:
set_col(dev.cur() - 1, 255, 0, 0, 255)
So now when I make a new plot on the same device, the labels will automatically be red:
plot(1:10)
So, as desired, you can change the device's gpars without interfering directly with par.
As for accessing the saved GPars and current GPars, this is just a case of changing the line GPar GP = bss->dp; to GPar GP = bss->gp; or GPar GP = bss->dpSaved;.
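Both functions above rely on R's packed colour layout: red in the lowest byte, then green, blue, and alpha in the highest byte, as the shifts in get_colour and set_col show. This small standalone C++ sketch packs and unpacks colours in that layout so the bit arithmetic can be checked without an R session. rcolor is redeclared as unsigned int to keep the block self-contained (I believe this matches R's own typedef), and pack_rgba, red_of, green_of, blue_of and alpha_of are hypothetical helper names; as far as I know, R_ext/GraphicsEngine.h provides R_RGBA and R_RED-style macros with the same layout.

```cpp
#include <cassert>

typedef unsigned int rcolor;   // assumption: same as R's typedef for packed colours

// Pack 0-255 channel values the way set_col does: red in the lowest byte,
// alpha in the highest. Unsigned arithmetic avoids shifting into a sign bit.
rcolor pack_rgba(unsigned red, unsigned green, unsigned blue, unsigned alpha)
{
    return red | (green << 8) | (blue << 16) | (alpha << 24);
}

// Unpack single channels with the same shifts and masks as get_colour.
unsigned red_of(rcolor c)   { return c & 0xff; }
unsigned green_of(rcolor c) { return (c >> 8) & 0xff; }
unsigned blue_of(rcolor c)  { return (c >> 16) & 0xff; }
unsigned alpha_of(rcolor c) { return (c >> 24) & 0xff; }
```

For instance, the opaque red used with set_col above packs to 0xFF0000FF.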

Need to understand how char *strcpy(char *cad1, const char *cad2) works in C

I can't understand how a function with this header, char *strcpy(char *cad1, const char *cad2), works in C in this sample:
char *strcpy(char *cad1, const char *cad2){
    char *aux = cad1;
    for ( ; *cad1++ = *cad2++; );
    return cad1;
}

Start from the function signature (prototype); it tells a lot about how the function works: there are two parameters, each with its type, and a return type. All of them are pointers to char, better known as char pointers, which is how C represents strings of characters. The second parameter is declared const char *, which means the function must not modify the characters it points to: the source string MUST keep its original value.
Strings in C have some peculiarities. A pointer to a string points to its first character (index 0), the same as char *v = &var[0], and it can be incremented to move to the next character, as in v++. Another peculiarity of C is that every string stored in a char array ends with a 0 character (the null character '\0', ASCII 0).
This strcpy relies on those concepts: it uses a for loop to copy each character from *cad2 to *cad1. The destination buffer MUST be allocated, statically or dynamically (malloc), and be large enough before calling the function. The standard strcpy returns a pointer to the start of the destination. Your version returns cad1 after the loop, when it already points one past the copied terminator, so it looks wrong: you saved a pointer to the first element of the destination in aux but never used it. It should return aux.
One good point to observe is the for loop:
for( ; *cad1++ = *cad2++; );
How it works is tricky. The first interesting point is that a for loop header has three parts, and in C all of them are optional: the first is the initialization, the second is the condition for continuing to iterate, and the last one is the increment or decrement. Here only the condition is present.
The next tricky point: is *cad1++ = *cad2++ a boolean expression? Effectively, yes. In C an assignment expression has the value that was assigned, and 0 (zero) counts as false while anything else is true. Remember that strings in C always end with a 0 (zero). *cad2 gives the value the pointer points to (the leading star does that), so when the loop reaches the end of the string, the value copied is 0, the condition is false, and the loop finishes.
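The terse loop can be expanded into explicit steps, which makes the order visible: read a character, copy it, advance both pointers, and stop only after the terminating 0 has been copied. The sketch below does that and returns the saved start pointer, as the standard strcpy does. It is written so it compiles as C++ (and, apart from the headers, as C); copy_string is a hypothetical name chosen so it does not clash with the library's strcpy.

```cpp
#include <cassert>
#include <cstring>   // std::strcmp, handy for checking the result

// Same behaviour as: for ( ; *cad1++ = *cad2++; ); but one step at a time.
char *copy_string(char *cad1, const char *cad2)
{
    char *aux = cad1;          // remember where the destination starts
    for (;;) {
        char c = *cad2;        // read the current source character
        *cad1 = c;             // copy it (this includes the final '\0')
        cad1++;                // advance both pointers, like the postfix ++
        cad2++;
        if (c == '\0')         // the assigned value was 0: the condition is false
            break;
    }
    return aux;                // start of the destination, not cad1
}
```

The caller still has to supply a destination buffer large enough for the source string plus its terminator.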
Note the order here: the character is copied first, and the value of that assignment (the character just copied) is what the loop tests. The postfix ++ operators advance both pointers after their current values have been used, so the terminating 0 is copied before the loop stops.
C is like this: a small amount of code can carry a lot of meaning. I hope you have understood the explanation. For further information, have a look at C pointers at: https://www.tutorialspoint.com/cprogramming/c_pointers.htm.

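char *strcpy(char *cad1, const char *cad2){
    char *aux = cad1;               // remember the start of the buffer
    for ( ; (*cad1++ = *cad2++); )  // copy up to and including the '\0'
        ;
    return aux;                     // return the start, as the standard strcpy does
}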
On the calling side it can be used in two ways, but both need a buffer to write into, so the usage is similar:
char arr[255];
memset(arr,0,sizeof(char) * 255); // clear the garbage initialized array;
strcpy(arr, "this is the text to copy that is 254 characters long or shorter.");
puts(arr);
or
char arr[255];
memset(arr,0,sizeof(char) * 255);
puts(strcpy(arr,"hello C!"));
Since the function returns the pointer to the buffer, this works as well.

std::bad_alloc being thrown when I create a new char**

I am creating an array of c strings from a vector of strings. I want the resulting array to skip the first element of the vector. The function I am using for this is as follows:
char** vectortoarray(vector<string> &thestrings)
{
//create a dynamic array of c strings
char** temp = new char*[thestrings.size()-2];
for(int i = 1; i < thestrings.size(); i++)
temp[i-1] = (char*)thestrings[i].c_str();
return temp;
}
I know that this code works, as I tested it in a smaller program without error. However, when it is run inside a marginally larger program, I get the error terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc.
How do I keep this from happening?

I cannot say for certain, but your code CAN throw bad_alloc when the size you pass to new underflows. size() returns an unsigned std::size_t, so if you pass your function a vector with fewer than two elements (an empty vector, for example), thestrings.size()-2 wraps around to an enormous value and you are effectively calling
char** temp = new char*[(size_t)0 - 2]; // i.e. SIZE_MAX - 1 pointers
so you should check the size before calling new. From a logical perspective this -2 makes little sense anyway. I would also suggest reading the question and answer "Why new[-1] generates segfault, while new[-2] throws bad_alloc?"

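That -2 definitely should not be there: since you skip only the first element, the array needs thestrings.size() - 1 entries, and with -2 the loop writes one pointer past the end of the allocation, which can corrupt the heap. Observe also that you have only allocated an array of pointers into each string's character data. If the array must outlive the vector, or if the strings will be modified through it (the cast discards const), you also need to allocate memory for the character arrays themselves.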
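A minimal sketch of a revised vectortoarray along the lines of the answers above: it checks the size first, allocates exactly size() - 1 pointers, and gives each entry its own copy of the characters. Returning nullptr when nothing is left after skipping element 0 is a design choice of this sketch, and free_string_array is a hypothetical helper for releasing the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Skips thestrings[0]; the caller owns the result (see free_string_array).
char **vectortoarray(const std::vector<std::string> &thestrings)
{
    if (thestrings.size() < 2)              // avoid size() - 1 == 0 and unsigned underflow
        return nullptr;
    const std::size_t count = thestrings.size() - 1;
    char **temp = new char *[count];
    for (std::size_t i = 1; i < thestrings.size(); ++i) {
        const std::string &s = thestrings[i];
        temp[i - 1] = new char[s.size() + 1];                // +1 for the '\0'
        std::memcpy(temp[i - 1], s.c_str(), s.size() + 1);   // copy chars and terminator
    }
    return temp;
}

// Hypothetical helper: releases everything vectortoarray allocated.
void free_string_array(char **arr, std::size_t count)
{
    if (arr == nullptr)
        return;
    for (std::size_t i = 0; i < count; ++i)
        delete[] arr[i];
    delete[] arr;
}
```

Because each entry is an independent copy, the array stays valid after the vector is destroyed, at the cost of one allocation per string.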

C++ pointer is initialized to null by the compiler

So I've been stuck on a memory problem for days now.
I have a multi-threaded program written in C++. I initialize a double* pointer.
From what I've read and previous programming experience, a pointer gets initialized to garbage. It will be null if you initialize it to 0 or if you allocate more memory than is available to the program. For me, my pointer initialization, without allocation, gives me a null pointer.
A parser function I wrote is supposed to return a pointer to the array of parsed information. When I call the function,
double* data;
data = Parser.ReadCoordinates(&storageFilename[0]);
Now the returned pointer to the array should be stored in data. Then I try to print something out from the array, and I get memory corruption errors. I've run gdb and it gives me a memory corruption error:
*** glibc detected *** /home/user/kinect/openni/Platform/Linux/Bin/x64-Debug/Sample-NiHandTracker: free(): corrupted unsorted chunks: 0x0000000001387f90 ***
*** glibc detected *** /home/user/kinect/openni/Platform/Linux/Bin/x64-Debug/Sample-NiHandTracker: malloc(): memory corruption: 0x0000000001392670 ***
Can someone explain to me what is going on? I've tried initializing the pointer as a global but that doesn't work either. I've tried to allocate memory but I still get a memory corruption error. The parser works. I've tested it out with a simple program. So I don't understand why it won't work in my other program. What am I doing wrong? I can also provide more info if needed.
Parser code
double* csvParser::ReadCoordinates(char* filename){
int x; //counter
int size=0; //
char* data;
int i = 0; //counter
FILE *fp=fopen(filename, "r");
if (fp == NULL){
perror ("Error opening file");
}
while (( x = fgetc(fp)) != EOF ) { //Returns the character currently pointed by the internal file position indicator
size++; //Number of characters in the csv file
}
rewind(fp); //Sets the position indicator to the beginning of the file
printf("size is %d.\n", size); //print
data = new char[23]; //Each line is 23 bytes (characters) long
size = (size/23) * 2; //number of x, y coordinates
coord = new double[size]; //allocate memory for an array of coordinates, need to be freed somewhere
num_coord = size; //num_coord is public
//fgets (data, size, fp);
//printf("data is %c.\n", *data);
for(x=0; x<size; x++){
fgets (data, size, fp);
coord[i] = atof(&data[0]); //convert string to double
coord[i+1] = atof(&data[11]); //convert string to double
i = i+2;
}
delete[] data;
fclose (fp);
return coord;
}

Memory corruption occurs when you write outside the bounds of an array or vector.
It's called heap underrun or overrun (depending on which side of the block it happens).
The heap's allocation bookkeeping gets corrupted, so the symptom you see is an error inside later free() or malloc()/new calls.
You usually don't get an access violation, because that memory is allocated and belongs to your process; it is just being used by the heap's own logic.
Find the place where you might be writing outside the bounds of an array.
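Applying that to the parser in the question, two writes look out of bounds: fgets(data, size, fp) may write up to size bytes into the 23-byte data buffer (and a full 23-character line plus its '\0' already needs 24), and the loop runs size times while storing two doubles per iteration into coord, which has only size elements. Below is a minimal bounds-safe sketch, not the original class. It assumes each line holds an x and a y value separated by a comma (the original read y from a fixed column, offset 11), and read_coordinates is a hypothetical free function that takes an open FILE* so it can be tested with tmpfile().

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// ASSUMPTION: every useful line looks like "x,y"; malformed lines are skipped.
std::vector<double> read_coordinates(std::FILE *fp)
{
    std::vector<double> coords;               // grows as needed, no manual size arithmetic
    char line[256];                           // fgets never writes more than sizeof line bytes
    while (std::fgets(line, static_cast<int>(sizeof line), fp) != nullptr) {
        char *after_x = nullptr;
        double x = std::strtod(line, &after_x);
        if (after_x == line || *after_x != ',')   // no number, or no comma after it
            continue;
        char *after_y = nullptr;
        double y = std::strtod(after_x + 1, &after_y);
        if (after_y == after_x + 1)               // no number after the comma
            continue;
        coords.push_back(x);
        coords.push_back(y);
    }
    return coords;
}
```

Returning a std::vector also removes the question of who must delete[] the array, which the original comment left open.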

clGetProgramInfo CL_PROGRAM_BINARY_SIZES Incorrect Results?

I am trying to cache a program in a file so that it does not need to compile to assembly. Consequently, I am trying to dump the binaries. I am getting an issue where the binary program returned alternately has garbage data at the end.
Error checking omitted for clarity (no errors occur, though):
clGetProgramInfo(kernel->program, CL_PROGRAM_BINARY_SIZES, 0,NULL, &n);
n /= sizeof(size_t);
size_t* sizes = new size_t[n];
clGetProgramInfo(kernel->program, CL_PROGRAM_BINARY_SIZES, n*sizeof(size_t),sizes, NULL);
I have confirmed that kernel->program is identical between runs. In the above code, "n" is invariably 1, but sizes[0] alternates between 2296 and 2312 on successive runs.
The problem is that the 2296 number appears to be more accurate--after the final closing brace in the output, there are three newlines and then three spaces.
For the 2312 number, after the final closing brace in the output, there are the three newlines, a line of garbage data, and then the three spaces.
Naturally, the line of garbage data is problematic. I'm not sure how to get rid of it, and I'm pretty sure it's not an error on my part.
NVIDIA GeForce GTX 580M, with driver 305.60 on Windows 7.
Update: I have changed the code to the following:
//Get how many devices there are
size_t n;
clGetProgramInfo(kernel->program, CL_PROGRAM_NUM_DEVICES, 0,NULL, &n);
//Get the list of binary sizes
size_t* sizes = new size_t[n];
clGetProgramInfo(kernel->program, CL_PROGRAM_BINARY_SIZES, n*sizeof(size_t),sizes, NULL);
//Get the binaries
unsigned char** binaries = new unsigned char*[n];
for (int i=0;i<(int)n;++i) {
binaries[i] = new unsigned char[sizes[i]];
}
clGetProgramInfo(kernel->program, CL_PROGRAM_BINARIES, n*sizeof(unsigned char*),binaries, NULL);
Now, the code has n = 4, but only sizes[0] contains meaningful information (so the alloc of sizes[1] fails in the loop). Thoughts?

I get the number of devices with the following line:
clGetProgramInfo(kernel->program, CL_PROGRAM_NUM_DEVICES, sizeof(cl_uint), &n, NULL);

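clGetProgramInfo(kernel->program, CL_PROGRAM_NUM_DEVICES, 0,NULL, &n);
needs to be:
clGetProgramInfo(kernel->program, CL_PROGRAM_NUM_DEVICES, sizeof(cl_uint), &n, NULL);
with n declared as cl_uint, which is the type CL_PROGRAM_NUM_DEVICES returns. Called with 0, NULL, &n, the function stores the size of the value in bytes (4 for a cl_uint) in n rather than the number of devices, which is why n comes out as 4.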

clGetProgramInfo with CL_PROGRAM_BINARY_SIZES and CL_PROGRAM_BINARIES needs a pointer to an array, not just to a single variable, because it returns one size and one binary for each device associated with the program when it was built. That is why the first line returns nothing useful. n for the second example should be the number of devices.
Not sure why the second example is different for each run... are you sure you are building for the same device each time?