I'm trying to figure out why dereferencing my pointer always prints 0. I put in other print statements to make sure random() is working correctly, and it does.
int * first = (int *) malloc(sizeof(int) * N);
while( i < N)
first[i++] = random();
printf("%d", first[i]);
}
I even assigned the values of first to another array and those values matched the ones returned by random(). Why does my print statement in this while loop always print 0?
first[i++] = random();
printf("%d", first[i]);
Assuming i is 0, with these two lines you are assigning a value to the first element of the array:
first[0] = random();
incrementing the index:
i++
then printing the value in the second element of the array:
printf("%d", first[1]);
If you make the incrementing of the index explicit it should be clearer:
while (i < N)
{
first[i] = random();
printf("%d", first[i]);
i++;
}
(You also appear to have missed the opening brace ({) of the loop, but that could be a typo in the question.)
My function adds all elements of an array together and takes the "start" pointer and the "end" pointer (I know there are easier ways to get the sum). My problem is that my for-loop is skipped, yet the condition works when I test it separately. Does that have anything to do with the order of execution of the for-loop?
My example:
int arr[] = {3, 2, 1, 1};
int *start = &arr[0];
int *end = &arr[3];
printf("%d\n", (&start[0] == end)); // The result is 0 (false)
printf("%d\n", (&start[3] == end)); // The result is 1 (true)
for (int i = 0; (&start[i] == end); i++) // The for-loop doesn't get executed.
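A for-loop tests its condition before every iteration, including the first. When i == 0 the condition &start[0] == end is false, so the loop body never runs.
If you want to go through the array, the loop has to keep running while the condition holds, so write &start[i] != end instead. Note that end points at the last element here, so this condition stops before that element is processed.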
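A minimal sketch of a sum over a pointer range, assuming (as in the question) that the end pointer refers to the last element rather than one past it. The function name sum_inclusive and its parameter names are made up for illustration. Because the end is inclusive, the loop bound is last + 1:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums the elements from *first up to and including *last. */
static int sum_inclusive(const int *first, const int *last)
{
    assert(first != NULL && last != NULL && first <= last);
    int sum = 0;
    /* Keep going while p has not moved past the last element. */
    for (const int *p = first; p != last + 1; p++)
        sum += *p;
    return sum;
}
```

Called as sum_inclusive(&arr[0], &arr[3]) on the array from the question, this adds all four elements. Passing a one-past-the-end pointer and comparing with p != end is the more common C convention and would work just as well.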
I am learning OpenCL for the first time and am currently modifying a shortest-path algorithm. I know that OpenCL usually solves problems through parallel computation, so I wonder whether I can apply the same idea to finding the minimum value in an array and its position.
This is my previous attempt. I assumed that as long as the smallest value wins, the result would be correct whether or not the update is locked. Unfortunately, when I inspect the variables with printf, the result is wrong even though valid nodes are being evaluated.
__kernel void findWay(__global int* A, __global int* B, __global int* minNode, __global int* minDis, __global int* isFinish)
{
//A: weightMatrix , B: usedNode
//dijkstra algorithm , src node is 0
size_t dst = get_global_id(1);
size_t src = get_global_id(0);
size_t vCount = get_global_size(0);
int index = dst * vCount + src;
while(isFinish[0] != vCount){
if((src == minNode[0])&&(B[dst] == 0)&&(A[index] != INT_MAX)){
A[dst*vCount] = min(A[dst*vCount + 0],A[minNode[0]*vCount + 0] + A[index]);
}
minDis[0] = INT_MAX;
barrier(CLK_GLOBAL_MEM_FENCE);
//here is the bug
if((src == 0) &&(B[dst] == 0)){
if(minDis[0] > A[index]){
minDis[0] = A[index];
minNode[0] = dst;
}
}
//=========
barrier(CLK_GLOBAL_MEM_FENCE);
B[minNode[0]] = 1;
if(index == 0){
isFinish[0]++;
}
}
}
In the end, I could only get this step to work with an ordinary serial loop:
if((src == 0) && (dst == 0)){
    for(int i = 0; i < vCount; i++){
        if(B[i] == 0 && minDis[0] > A[i*vCount]){
            minDis[0] = A[i*vCount];
            minNode[0] = i;
        }
    }
}
My question is about this search step: can the loop be avoided?
Horizontal operations on the parallelized array are difficult. The general approach to them is binary-tree-like kernel passes. Start with the original array, make each GPU thread load 2 neighboring elements and choose the smaller one, write that in the same array to position of the first of the two elements. Next kernel loads two elements from the list of every second element, compares the two, writes the smaller one in the first position of the two. Repeat until there is only one element left.
I will illustrate it below. I mark values that are not touched by the kernel anymore with *.
original array: 5|2|1|6|9|3|4|8
after 1st kernel pass: 2 *|1 *|3 *|4 *
after 2nd kernel pass: 1 * * *|3 * * *
after 3rd kernel pass: 1 * * * * * * *
smallest element is 1.
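The passes above can be sketched on the host in plain C. This is only a serial simulation under stated assumptions, not an OpenCL kernel: each iteration of the inner loop stands in for one work-item, and each iteration of the outer loop stands in for one kernel pass. The helper name min_reduce and the extra pos array, which carries the original position of each surviving value so that the index of the minimum is available at the end, are additions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reduces vals[0..n) in place.
 * Afterwards vals[0] holds the minimum and pos[0] its original index.
 * pos is caller-provided scratch space of n entries. */
static void min_reduce(int *vals, size_t *pos, size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        pos[i] = i;

    /* One outer iteration corresponds to one kernel pass. */
    for (size_t stride = 1; stride < n; stride *= 2) {
        /* The inner iterations are independent of each other,
         * so on a GPU each could be a separate work-item. */
        for (size_t i = 0; i + stride < n; i += 2 * stride) {
            if (vals[i + stride] < vals[i]) {
                vals[i] = vals[i + stride];
                pos[i] = pos[i + stride];
            }
        }
    }
}
```

For the array 5|2|1|6|9|3|4|8 this leaves the value 1 in vals[0] and the index 2 in pos[0]. The strict comparison keeps the lower index when two values tie.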
I'm trying to write a trimmed mean kernel that takes as input a set of frames (~100). I'm thinking of using an insertion sort (of size ~8). This means that I'll need to read one float/ uint/ushort at a time from the input images and compare it against an 8-wide vector, shifting the elements up and inserting the new value at the correct spot (if necessary), with the largest value added to the mean.
I'm having difficulties finding a portable way of shifting the elements in the vector and inserting the new one at the correct spot. I know that AMD GPUs have ds_permute for example, but those are not portable, and I can't figure out a clever way of using arithmetic and relational operators to do it (since those operate only on their lane and AFAIK unaligned vector accesses are UB in OpenCL).
If you only have 8 items in your list then you could add some indirection and have an index table uchar[8]. You assign the pre-sorted elements values 0-7. As you perform the sort you don't rearrange those items, instead you insert their indices into the table.
To get the speedup you then need to store each index using 4 bits so that all 8 fit into a 32-bit word. Honestly, I don't think this will be faster in your case though.
float elements[8];
uint index_table = 0;   // eight 4-bit indices, smallest element in the low nibble
uint sorted_size = 0;

// insert elements[i] into the sorted index table
void insert(uint i)
{
    uint temp = index_table;
    for (uint j = 0; j < sorted_size; ++j)
    {
        if (elements[i] < elements[temp & 0xf])
        {
            // Insert i at position j: keep the low j nibbles,
            // shift the remaining nibbles up by one slot
            uint low_mask = (1u << (4 * j)) - 1;
            temp = (temp << 4) | i;
            index_table = (index_table & low_mask) | (temp << (4 * j));
            return;
        }
        temp >>= 4;
    }
    // Insert at end
    index_table |= i << (4 * sorted_size);
}

void insertion_sort()
{
    // We can skip the first iteration since the 1st element is always inserted at the start
    for (sorted_size = 1; sorted_size < 8; ++sorted_size)
    {
        insert(sorted_size);
    }
}

float ith_smallest(uint i)
{
    return elements[(index_table >> (4 * i)) & 0xf];
}
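To check the bit manipulation outside of a kernel, the same idea can be ported to plain C. The following is a host-side sketch under assumptions: the struct PackedSort and the function names packed_insert and packed_sort are made up for illustration, uint32_t stands in for the OpenCL uint type, and the mask that keeps the low j nibbles is written as (1 << 4j) - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side port of the packed index table. */
typedef struct {
    float elements[8];
    uint32_t index_table; /* eight 4-bit indices, smallest element in the low nibble */
    uint32_t sorted_size; /* number of indices already placed in the table */
} PackedSort;

static void packed_insert(PackedSort *s, uint32_t i)
{
    uint32_t temp = s->index_table;
    for (uint32_t j = 0; j < s->sorted_size; ++j) {
        if (s->elements[i] < s->elements[temp & 0xf]) {
            /* Keep nibbles below position j, shift the rest up one slot. */
            uint32_t low_mask = (UINT32_C(1) << (4 * j)) - 1;
            temp = (temp << 4) | i;
            s->index_table = (s->index_table & low_mask) | (temp << (4 * j));
            return;
        }
        temp >>= 4;
    }
    /* Larger than or equal to everything placed so far: append at the end. */
    s->index_table |= i << (4 * s->sorted_size);
}

static void packed_sort(PackedSort *s)
{
    /* Index 0 starts out as the only entry of the table. */
    s->index_table = 0;
    for (s->sorted_size = 1; s->sorted_size < 8; ++s->sorted_size)
        packed_insert(s, s->sorted_size);
}

static float ith_smallest(const PackedSort *s, uint32_t i)
{
    assert(i < 8);
    return s->elements[(s->index_table >> (4 * i)) & 0xf];
}
```

After packed_sort, ith_smallest(&s, 0) returns the minimum and ith_smallest(&s, 7) the maximum, while the elements array itself is never rearranged.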
I am just trying to understand how the recursion works in this example, and would appreciate it if somebody could break this down for me. I have the following algorithm which basically returns the maximum element in an array:
int MaximumElement(int array[], int index, int n)
{
int maxval1, maxval2;
if ( n==1 ) return array[index];
maxval1 = MaximumElement(array, index, n/2);
maxval2 = MaximumElement(array, index+(n/2), n-(n/2));
if (maxval1 > maxval2)
return maxval1;
else
return maxval2;
}
I am not able to understand how the recursive calls work here. Does the first recursive call always get executed when the second call is being made? I really appreciate it if someone could please explain this to me. Many thanks!
Code comments embedded:
// the names index and n are misleading, it would be better if we named it:
// startIndex and rangeToCheck
int MaximumElement(int array[], int startIndex, int rangeToCheck)
{
int maxval1, maxval2;
// when the range to check is only one cell - return it as the maximum
// that's the base-case of the recursion
if ( rangeToCheck==1 ) return array[startIndex];
// "divide" by checking the range between the index and the first "half" of the range
System.out.println("index = "+startIndex+"; rangeToCheck/2 = " + rangeToCheck/2);
maxval1 = MaximumElement(array, startIndex, rangeToCheck/2);
// check the second "half" of the range
System.out.println("index = "+startIndex+"; rangeToCheck-(rangeToCheck/2 = " + (rangeToCheck-(rangeToCheck/2)));
maxval2 = MaximumElement(array, startIndex+(rangeToCheck/2), rangeToCheck-(rangeToCheck/2));
// and now "Conquer" - compare the 2 "local maximums" that we got from the last step
// and return the bigger one
if (maxval1 > maxval2)
return maxval1;
else
return maxval2;
}
Example of usage:
int[] arr = {5,3,4,8,7,2};
int big = MaximumElement(arr,0,arr.length-1); // note: arr.length-1 covers only the first 5 elements; pass arr.length to search the whole array
System.out.println("big = " + big);
OUTPUT:
index = 0; rangeToCheck/2 = 2
index = 0; rangeToCheck/2 = 1
index = 0; rangeToCheck-(rangeToCheck/2 = 1
index = 0; rangeToCheck-(rangeToCheck/2 = 3
index = 2; rangeToCheck/2 = 1
index = 2; rangeToCheck-(rangeToCheck/2 = 2
index = 3; rangeToCheck/2 = 1
index = 3; rangeToCheck-(rangeToCheck/2 = 1
big = 8
What is happening here is that both recursive calls are being made, one after another. The first one searches half of the array and returns its max, the second searches the other half and returns its max. Then the two maxes are compared and the bigger one is returned.
Yes. What you have guessed is right. Of the two recursive calls MaximumElement(array, index, n/2) and MaximumElement(array, index+(n/2), n-(n/2)), the first keeps recursing until it is made with a single element of the array, which it returns. Only after the first call has returned does the second call run on the other half of the range. The two results are then compared and the larger is returned, and this comparison repeats at each level on the way back up until the largest element of the whole range is returned.
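The order of the calls can be made visible with a small C sketch of the same function. The depth parameter and the printf tracing are additions for illustration and are not part of the original code; each call prints its range, indented by its recursion depth.

```c
#include <assert.h>
#include <stdio.h>

/* Same divide-and-conquer maximum as in the question, plus tracing.
 * depth is a hypothetical extra parameter used only for indentation. */
static int max_element(const int array[], int index, int n, int depth)
{
    assert(n >= 1);
    printf("%*scall(index=%d, n=%d)\n", depth * 2, "", index, n);

    /* Base case: a range of one element is its own maximum. */
    if (n == 1)
        return array[index];

    /* The first call finishes completely, including all of its own
     * nested calls, before the second call starts. */
    int left = max_element(array, index, n / 2, depth + 1);
    int right = max_element(array, index + n / 2, n - n / 2, depth + 1);

    return left > right ? left : right;
}
```

Calling max_element(arr, 0, 6, 0) on {5, 3, 4, 8, 7, 2} returns 8. In the printed trace, every line belonging to the left half appears before the first line of the right half, which is the depth-first order described above. A range of n elements results in 2n - 1 calls in total.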
I am trying to implement a "coupling to the past" algorithm in Rcpp. For this I need to store a matrix of random numbers, and if the algorithm did not converge create a new matrix of random numbers and store that as well. This might have to be done 10+ times or something until convergence.
I was hoping I could use a List and dynamically update it, similar as I would in R. I was actually very surprised it worked a bit but I got errors whenever the list size becomes large. This seems to make sense as I did not allocate the needed memory for the additional list elements, although I am not that familiar with C++ and not sure if that is the problem.
Here is an example of what I tried. However, be aware that this will probably crash your R session:
library("Rcpp")
cppFunction(
includes = '
NumericMatrix RandMat(int nrow, int ncol)
{
int N = nrow * ncol;
NumericMatrix Res(nrow,ncol);
NumericVector Rands = runif(N);
for (int i = 0; i < N; i++)
{
Res[i] = Rands[i];
}
return(Res);
}',
code = '
void foo()
{
// This is the relevant part, I create a list then update it and print the results:
List x;
for (int i=0; i<10; i++)
{
x[i] = RandMat(100,10);
Rf_PrintValue(wrap(x[i]));
}
}
')
foo()
Does anyone know a way to do this without crashing R? I guess I could initialize the list with a fixed number of elements here, but in my application the number of elements is random.
You have to "allocate" enough space for your list. Maybe you can use something like a resize function:
List resize( const List& x, int n ){
int oldsize = x.size() ;
List y(n) ;
for( int i=0; i<oldsize; i++) y[i] = x[i] ;
return y ;
}
and whenever you want your list to be bigger than it is now, you can do:
x = resize( x, n ) ;
Your initial list is of size 0, so x[i] is already out of bounds at the first iteration of your loop, and unpredictable behavior is to be expected.