Strange behaviour with For Loops in OpenCL - opencl

I have a problem with loops in OpenCL: when I use the loop counters to generate values, the counter values seem to remain constant across iterations. My OpenCL function is the following:
void my_loop(__global unsigned int* tab1, unsigned int* tab2){
    uint i, j, sidx;
    uint global_id_y = get_global_id(1);
    uint global_size_y = get_global_size(1);
    uint loop = global_size_y - global_id_y - 1;
    uint length = get_global_size(0);
    for(i = 0; i < loop; i++){
        sidx = i * length;
        for(j = 0; j < length; j++){
            tab2[sidx + j] = i*10 + j;
        }
    }
}
The code that calls the clEnqueueNDRangeKernel function:
global_work_size[0] = ncols; //ncols = 5
global_work_size[1] = nbfrequents; //nbfrequents = 10
local_work_size[0] = ncols;
local_work_size[1] = nbfrequents;
clEnqueueNDRangeKernel(command_queue, kernel, 2, NULL, global_work_size,
local_work_size, 0, NULL, &event);
After execution, the results I obtain look something like this:
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
...
0 1 2 3 4
Is this normal? How can I use the different values of my counters?

Related

How to store 4, 2 by 2 arrays into a larger 4 by 4 one?

I have a list of four 2-by-2 arrays. I want them to be stored together in a larger 4-by-4 array. The first two arrays make up the "first row", and the last two arrays make up the "second row".
Code
static void Test(){
int[,] arr1 = {
{0,1},
{2,3}
};
int[,] arr2 = {
{4,5},
{6,7}
};
int[,] arr3 = {
{8,9},
{10,11}
};
int[,] arr4 = {
{12,13},
{14,15}
};
List<int[,]> arrList = new List<int[,]>();
int[,] result = new int[4,4];
arrList.Add(arr1);
arrList.Add(arr2);
arrList.Add(arr3);
arrList.Add(arr4);
int v = 0;
foreach(int[,] x in arrList){
for(int i = 0; i < 2; i++){
for(int j = 0; j < 2; j++){
result[v*i-1,v*j-1] = x[i,j]; //This needs to change
}
}
v += 1;
}
}
The end goal based on this example should be
0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15
Eventually I used the following function. What this function does is it loops over 2 by 2 or 3 by 3 arrays, and combines them to a larger n by n array.
private static string[,] Combine(List<string[,]> arrList){
int elements = arrList.Count * arrList[0].Length;
int n = (int) Math.Pow((double) elements, 0.5);
string[,] result = new string[n,n];
int k = 0;
int r = 0;
int by = 0;
int amt = 0;
if(arrList[0].Length % 2 == 0){
by = (int) Math.Pow(arrList[0].Length, 0.5);
amt = n/by;
}else if(arrList[0].Length % 3 == 0){
by = (int) Math.Pow(arrList[0].Length, 0.5);
amt = n/by;
}
for(int v = 0; v < arrList.Count; v++){
if(v%amt == 0 && v != 0){
k=0;
r +=1;
}
for(int i = 0; i < by; i++){
for(int j = 0; j < by; j++){
result[r*by+i, k*by+j] = arrList[v][i,j];
}
}
k+=1;
}
return result;
}
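The placement rule used by Combine can be sketched more compactly. The following Python version is only an illustration under stated assumptions, not the author's code: the name combine is hypothetical, every block is assumed to be square and of the same size, and the number of blocks is assumed to be a perfect square. Block number v lands in block-row v // per_row and block-column v % per_row.

```python
from math import isqrt

def combine(blocks):
    """Tile equally sized square blocks, row by row, into one square grid."""
    by = len(blocks[0])            # side length of each block
    per_row = isqrt(len(blocks))   # number of blocks per row of the result
    if per_row * per_row != len(blocks):
        raise ValueError("number of blocks must be a perfect square")
    n = by * per_row
    result = [[None] * n for _ in range(n)]
    for v, block in enumerate(blocks):
        r, k = divmod(v, per_row)  # block-row and block-column of block v
        for i in range(by):
            for j in range(by):
                result[r * by + i][k * by + j] = block[i][j]
    return result

blocks = [
    [[0, 1], [2, 3]],
    [[4, 5], [6, 7]],
    [[8, 9], [10, 11]],
    [[12, 13], [14, 15]],
]
for row in combine(blocks):
    print(*row)
```

Deriving the block position with divmod removes the need for the separate k and r counters and for the branches on the block size.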

What is an algorithm to generate the following sequence?

I want to find out the function to generate the sequence with the following pattern.
1 2 3 1 1 2 2 3 3 1 1 1 1 2 2 2 2 3 3 3 3 ....
The lower bound is 1 and the upper bound is 3. The numbers start from 1 each time, and each number repeats 2^n times, with n starting at 0.
Here it goes, I hope it will help.
#include <iostream>
#include <math.h>
int main(){
for(int n = 0; n < 5;n++){
for(int i = 1; i < 4;i++){
for(int j = 0;j < pow(2,n) ;j++){
std::cout << i;
}
}
}
return 0;
}
Here is some code in C++:
#include <iostream>
#include <cmath>
int main()
{
    // These are the loop control variables
    int n, m, i, j, k;
    // Read the limit
    std::cin >> n;
    // Outermost loop to execute the pattern {1..., 2..., 3...} n times
    for (i = 0; i < n; ++i)
    {
        // This loop generates the required numbers 1, 2, and 3
        for (j = 1; j <= 3; ++j)
        {
            // Display the generated number 2^i times
            m = pow(2, i);
            for (k = 0; k < m; ++k)
            {
                std::cout << j << ' ';
            }
        }
    }
}
You can use the same logic in any language you choose to implement it.
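As one illustration of carrying the same logic over, here is a minimal Python sketch. The generator name repeated_run_sequence and its low/high parameters are hypothetical, chosen only for this example. It yields the sequence lazily, so it uses integer powers of two and needs no upper limit on n.

```python
from itertools import islice

def repeated_run_sequence(low=1, high=3):
    """Yield low..high with each value repeated 2**n times, for n = 0, 1, 2, ..."""
    n = 0
    while True:
        for value in range(low, high + 1):
            for _ in range(2 ** n):
                yield value
        n += 1

# First three rounds: 3 + 6 + 12 = 21 terms
first_terms = list(islice(repeated_run_sequence(), 21))
print(*first_terms)
```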

OpenCl wrong values when reading from multiple GPU

I have a kernel function that only writes a number to a __global int* c.
To be specific, it looks like this:
__kernel void Add1(__global int* c)
{
*c = 3;
}
and in the host code I allocate memory for the c value:
cl_mem bufferC[deviceNumber]; // deviceNumber = 8
for(int i = 0; i< deviceNumber; i++){
bufferC[i] = clCreateBuffer(context[i], CL_MEM_WRITE_ONLY, sizeof(cl_int) * global_size, NULL, &error);
}
for(int i = 0; i< deviceNumber; i++){
error = clSetKernelArg(kernel[i], 0, sizeof(cl_mem), (void*)&bufferC[i]);
}
for(int i = 0; i< deviceNumber; i++){
error = clEnqueueReadBuffer(commandQueue[i], bufferC[i], CL_TRUE, 0, sizeof(cl_int) * global_size, &c[i], 0, NULL, NULL);
}
and I print it like:
for (size_t i = 0; i < deviceNumber; ++i)
{
std::cout<< "delta = " << c[i] << std::endl;
}
and output:
delta = 3
delta = 11165
delta = -1329524360
delta = 11165
delta = 0
delta = 0
delta = -1329520352
delta = 11165
So the first value is OK and the rest is garbage. Do you know what mistake I made?
Of course this is only partial code, but I think I pasted all the lines involving the 'c' value. The global size is set to 1.
Well, my mistake was that I created several contexts but passed a single device as the argument instead of an array of devices. I found it by printing the error codes in the program, so try doing that if you run into problems! Cheers

Shuffle an array in Arduino software

I have a problem shuffling this array in the Arduino software:
int questionNumberArray[10]={0,1,2,3,4,5,6,7,8,9};
Does anyone know a built-in function or a way to shuffle the values in the array without repeating any of them?
The simplest way would be this little for loop:
int questionNumberArray[] = {0,1,2,3,4,5,6,7,8,9};
const size_t n = sizeof(questionNumberArray) / sizeof(questionNumberArray[0]);
for (size_t i = 0; i < n - 1; i++)
{
size_t j = random(0, n - i);
int t = questionNumberArray[i];
questionNumberArray[i] = questionNumberArray[j];
questionNumberArray[j] = t;
}
Let's break it down line by line, shall we?
int questionNumberArray[] = {0,1,2,3,4,5,6,7,8,9};
You don't need to specify the number of cells if you initialize an array like that. Just leave the brackets empty like I did.
const size_t n = sizeof(questionNumberArray) / sizeof(questionNumberArray[0]);
I decided to store the number of cells in the constant n. The sizeof operator gives the number of bytes taken by the whole array and the number of bytes taken by one cell. Divide the first by the second and you have the number of elements in the array.
for (size_t i = 0; i < n - 1; i++)
Please note that the loop condition is i < n - 1, so i never takes the value of the last index.
size_t j = random(0, n - i);
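We declare a variable j that holds a random index, obtained with Arduino's random function: https://www.arduino.cc/en/Reference/Random. Note that random(min, max) excludes max, so random(0, n - i) returns a value between 0 and n - i - 1. It never goes out of bounds, but it is also not restricted to indices at or after i. For a standard Fisher-Yates shuffle, in which every permutation is equally likely, use random(i, n) instead so that j always lies between i and n - 1. Stopping at i = n - 2 is still fine in that version, because the last element has nothing left to swap with.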
int t = questionNumberArray[i];
questionNumberArray[i] = questionNumberArray[j];
questionNumberArray[j] = t;
Simple swap of two values. It's possible to do it without temporary t variable, but the code is less readable then.
In my case the result was as follows:
questionNumberArray[0] = 0
questionNumberArray[1] = 9
questionNumberArray[2] = 7
questionNumberArray[3] = 4
questionNumberArray[4] = 6
questionNumberArray[5] = 5
questionNumberArray[6] = 1
questionNumberArray[7] = 8
questionNumberArray[8] = 2
questionNumberArray[9] = 3
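For comparison, here is a minimal Python sketch of the textbook Fisher-Yates shuffle, in which the swap partner is always drawn from the not-yet-fixed part of the array (indices i to n - 1). It is an illustration rather than Arduino code: the function name fisher_yates_shuffle and the rng parameter are hypothetical, and random.randrange(i, n) plays the role that random(i, n) would play on the Arduino.

```python
import random

def fisher_yates_shuffle(values, rng=random):
    """Shuffle the list in place and return it."""
    n = len(values)
    for i in range(n - 1):
        # Pick a partner among positions i..n-1 (upper bound is exclusive)
        j = rng.randrange(i, n)
        values[i], values[j] = values[j], values[i]
    return values

question_numbers = fisher_yates_shuffle(list(range(10)))
print(question_numbers)
```

Because elements are only ever swapped, every value appears exactly once in the result, which is the "no repeats" property the question asks for.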

recover index in triangular for loops

Is there a simple way to recover an index in nested for loops? For example, in for loops which construct Pascals triangle
int index = 0;
for (int i = 0; i < N; ++i)
for (int j = 0; j < N-i; ++j)
index++;
is there a way to recover i and j given only index?
I am adding this as a second answer since it is in a different language (now C) and has a more direct approach. I am keeping the original answer since the following code is almost inexplicable without it. I combined my two functions into a single one to cut down on function call overhead. Also, to be 100% sure that it answers the original question, I used the loops from that question verbatim. In the driver function I show explicitly that the output is correct for N = 4 and then stress-test it for N = 10000. The triangular loops make N(N+1)/2 = 50,005,000 passes through the inner loop; the final message prints N*N, which overstates that count. I don't have any formal timing code, but it takes about 1 second on my machine to run through and test all of those cases. My code assumes a 32-bit int. Change to long if needed:
#include <stdio.h>
#include <math.h>
void from_index(int n, int index, int *i, int *j);
int main(void){
int N;
int ri,rj; //recovered i,j
N = 4;
int index = 0;
for (int i = 0; i < N; ++i)
for (int j = 0; j < N-i; ++j){
from_index(N,index,&ri,&rj);
printf("i = %d, j = %d, index = %d, ",i,j,index);
printf("recovered i = %d, recovered j = %d\n",ri,rj);
index++;
}
//stress test:
N = 10000;
index = 0;
for (int i = 0; i < N; ++i)
for (int j = 0; j < N-i; ++j){
from_index(N,index,&ri,&rj);
if(i != ri || j != rj){
printf("Don't post buggy code to Stack Overflow!\n");
printf("(i,j) = (%d,%d) but recovered indices are (%d,%d)\n",i,j,ri,rj);
return 0;
}
index++;
}
printf("\nAll %d tests passed!\n",N*N);
return 0;
}
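void from_index(int n, int index, int *i, int *j){
    double d;
    /* d = 1 + 8*m, where m = n*(n+1)/2 - 1 - index is the flipped index */
    d = 4*n*(n+1) - 7 - 8 * index;
    /* row of m in the triangle 0 / 1 2 / 3 4 5 / ... */
    *i = floor((-1 + sqrt(d))/2);
    /* column of m in that triangle: m minus the triangular number of its row */
    *j = *i * (*i + 1)/2;
    *j = n*(n+1)/2 - 1 - index - *j;
    /* map row and column back to the original (reflected) triangle */
    *j = *i - *j;
    *i = n - *i - 1;
}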
Output:
i = 0, j = 0, index = 0, recovered i = 0, recovered j = 0
i = 0, j = 1, index = 1, recovered i = 0, recovered j = 1
i = 0, j = 2, index = 2, recovered i = 0, recovered j = 2
i = 0, j = 3, index = 3, recovered i = 0, recovered j = 3
i = 1, j = 0, index = 4, recovered i = 1, recovered j = 0
i = 1, j = 1, index = 5, recovered i = 1, recovered j = 1
i = 1, j = 2, index = 6, recovered i = 1, recovered j = 2
i = 2, j = 0, index = 7, recovered i = 2, recovered j = 0
i = 2, j = 1, index = 8, recovered i = 2, recovered j = 1
i = 3, j = 0, index = 9, recovered i = 3, recovered j = 0
All 100000000 tests passed!
In this particular case we have
index = N+(N-1)+...+(N-i+1) + (j+1) = i(2N-i+1)/2 + (j+1) = -i^2/2 + (2N+1)i/2 + (j+1)
with j in the interval [1,N-i].
We neglect j and regard this as a quadratic equation in i. Thus we solve
-i^2/2 + (2N+1)i/2 + (1-index) = 0.
We approximate i to be the greatest out of the two resulting solutions (or the ceil of this value, since neglecting j has the effect of lowering the value of i).
We then come back to the complete version of the equation and substitute the approximation of the value of i. If j is outside the interval [1,N-i] we increase/decrease the value of i and re-substitute until we get a value of j in this interval. This loop will probably repeat for a maximum constant number of steps (I suspect a maximum of three steps, but not in the mood to prove it). So this should be doable in a constant number of steps.
As an alternative, we could approximate j to be N/3, instead of zero. This is approximately the expected value of j (over all possible cases), thus the method will probably converge 'faster' at the local search step.
In the general case, you do something very similar, i.e. you solve a fake equation and you perform a local search around the solution.
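The "solve an approximate equation, then search locally" idea can be sketched as follows. This is a hedged illustration, not the answerer's code: it uses 0-based i, j and index to match the loops in the question, the names row_offset and from_index_search are hypothetical, and it takes the smaller root of the quadratic because that is the one that falls inside the range 0 to N - 1.

```python
from math import sqrt

def row_offset(n, i):
    """Number of (i, j) pairs the triangular loops visit before row i."""
    return i * (2 * n - i + 1) // 2

def from_index_search(n, index):
    """Recover (i, j) from a 0-based index: estimate i, then correct it."""
    # Estimate: solve row_offset(n, i) = index for i, i.e. neglect j
    disc = (2 * n + 1) ** 2 - 8 * index
    i = int(((2 * n + 1) - sqrt(disc)) / 2)
    i = max(0, min(i, n - 1))
    # Local search: step i until row_offset(i) <= index < row_offset(i + 1)
    while i > 0 and row_offset(n, i) > index:
        i -= 1
    while i < n - 1 and row_offset(n, i + 1) <= index:
        i += 1
    return i, index - row_offset(n, i)

print(from_index_search(4, 6))
```

The two while loops only absorb rounding error from the floating-point square root, so in practice they run at most a step or two.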
I found it easier to find i,j from the index in the following number pattern:
0
1 2
3 4 5
6 7 8 9
Since the indices going down the left are the triangular numbers of the form k*(k+1)/2. By solving an appropriate quadratic equation I was able to recover the row and the column from the index. But -- your loops give something like this:
0 1 2 3
4 5 6
7 8
9
which is trickier. It might be possible to solve this problem directly, but note that if you subtract each of these numbers from 9 you get
9 8 7 6
5 4 3
2 1
0
this is the original triangle turned upside down and reflected horizontally. Thus -- I can reduce the problem of your triangle to my triangle. The following Python code shows how it works (the only thing not quite obvious is that in Python 3 // is integer division). The function fromIndexHelper is my solution to my original triangle problem and fromIndex is how I shift it to your triangle. To test it I first printed the index pattern for n = 4 and then the corresponding indices recovered by my function fromIndex:
from math import floor, sqrt
def fromIndexHelper(n,index):
    i = floor((-1+sqrt(1+8*index))/2)
    j = index - i*(i+1)//2
    return i,j
def fromIndex(n,index):
    shift = n*(n+1)//2 - 1
    i,j = fromIndexHelper(n,shift-index)
    return n-i-1,i - j
#test
index = 0
for i in range(4):
    for j in range(4-i):
        print(index,end = ' ')
        index +=1
    print('')
print(' ')
index = 0
for i in range(4):
    for j in range(4-i):
        print(fromIndex(4,index),end = ' ')
        index +=1
    print('')
Output:
0 1 2 3
4 5 6
7 8
9
(0, 0) (0, 1) (0, 2) (0, 3)
(1, 0) (1, 1) (1, 2)
(2, 0) (2, 1)
(3, 0)
