Why is the shuffle example in the OpenCL documentation not valid?


There is an example of shuffle in the OpenCL documentation:
//Examples that are not valid are:
uint8 mask;
short16 a;
short8 b;
b = shuffle(a, mask); // invalid
But I cannot understand why. I tested this on Android with Android Studio, and the build failed with: build program failed:BC-src-code:9:9:{9:9-9:16}: error: no matching builtin function for call to 'shuffle'. Then I changed short to int, like this:
uint8 mask;
int16 a;
int8 b;
b = shuffle(a, mask);
and it works. I cannot find the reason in the documentation. Can anybody help me?
Thanks!

I think the critical part of the description in the spec is this:
The size of each element in the mask must match the size of each element in the result.
I take that to mean that if you want to shuffle a vector of shorts, your mask must be a vector of ushort; a mask of uint8 would only be valid for shuffling vectors with elements of 4 bytes - in other words, int, uint, and float.
So the following should be valid again:
ushort8 mask; // <-- changed
short16 a;
short8 b;
b = shuffle(a, mask); // now valid
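To make the element-size rule concrete, here is a rough plain-C model of what shuffle(short16, ushort8) does. It is only an illustration, not OpenCL code: emulate_shuffle_short16_to_8 is a made-up name, and the masking of the index to its low 4 bits reflects my reading of the spec that only the low-order bits of each mask element are used.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain-C model of shuffle(short16, ushort8) -> short8.
 * Hypothetical helper for illustration; not part of any OpenCL API.
 * Note that the mask elements are 16 bits wide, like the data elements. */
void emulate_shuffle_short16_to_8(const int16_t src[16],
                                  const uint16_t mask[8],
                                  int16_t dst[8])
{
    for (size_t i = 0; i < 8; ++i) {
        /* A 16-element source needs 4 index bits; higher bits are ignored. */
        dst[i] = src[mask[i] & 0x0Fu];
    }
}
```

Each result element is picked from the source by the matching mask element, which is why the mask has as many elements as the result and elements of the same width.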

Related

Will an array of pointers be equal to an array of chars?

I have got this code:
import std.stdio;
import std.string;
void main()
{
char [] str = "aaa".dup;
char [] *str_ptr;
writeln(str_ptr);
str_ptr = &str;
*(str_ptr[0].ptr) = 'f';
writeln(*str_ptr);
writeln(str_ptr[0][1]);
}
I thought I was creating an array of pointers with char [] *str_ptr, so that every pointer would point to a single char. But it looks like str_ptr points to the start of the string str. I came to this conclusion because when I try to access, for example, writeln(str_ptr[1]); I get a lot of output on the console, which means I am accessing an element outside the bounds.
Could anybody explain whether this is an array of pointers and, if so, how an array of pointers works in this case?

What you're trying to achieve is far more easily done: just index the char array itself. No need to go through explicit pointers.
import std.stdio;
import std.string;
void main()
{
char [] str = "aaa".dup;
str[0] = 'f';
writeln(str[0]); // str[x] points to individual char
writeln(str); // faa
}
An array in D already is a pointer on the inside - it consists of a pointer to its elements plus a length, and indexing it gets you to those individual elements. str[1] leads to the second char (remember, it starts at zero), exactly the same as *(str.ptr + 1). Indeed, the compiler generates that very code (though with bounds checking in D by default, so it aborts instead of giving you gibberish). The only restriction is that the array's elements must be sequential in memory. This is T[] in D.
An array of pointers might be used if the pointers all go to various places that are not necessarily in sequence. Maybe you want the first pointer to go to the last element, and the second pointer to go to the first element. Or perhaps they are all separately allocated elements, like pointers to objects. The correct syntax for this in D is T*[] - read from right to left, "an array of pointers to T".
A pointer to an array, T[]*, is pretty rare in D, but you might use it when you need to update the length of an array held by another function. For example:
int[] arr;
int[]* ptr = &arr;
(*ptr) ~= 1;
assert(arr.length == 1);
If ptr wasn't a pointer, the arr length would not be updated:
int[] arr;
int[] ptr = arr;
ptr ~= 1;
assert(arr.length == 1); // NOPE! fails, arr is still empty
But pointers to arrays are about modifying the length of the array, or maybe pointing it to something entirely new and updating the original. You don't need one just to share the individual elements inside it.
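If it helps, the same idea can be sketched in C as a struct holding a pointer and a length. This is only a mental model under my own assumptions, not D's actual implementation, and CharSlice, slice_set and slice_drop_last are made-up names.

```c
#include <stddef.h>

/* Hypothetical C model of a D slice: a pointer plus a length. */
typedef struct {
    char  *ptr;
    size_t length;
} CharSlice;

/* Indexing is pointer arithmetic plus a bounds check.
 * The slice is passed by value, yet the elements are still shared. */
int slice_set(CharSlice s, size_t i, char value)
{
    if (i >= s.length) {
        return 0;           /* out of range: refuse instead of writing */
    }
    *(s.ptr + i) = value;   /* same as s.ptr[i] */
    return 1;
}

/* Changing the caller's length requires a pointer to the slice itself. */
void slice_drop_last(CharSlice *s)
{
    if (s->length > 0) {
        s->length -= 1;
    }
}
```

Writing through a copy of the slice changes the shared elements, but only the version taking a pointer can change the length the caller sees.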

Return number of elements

If I write a function that allocates an array through a pointer like this:
int* c=new int[16];
and return it:
return c;
how can I determine the size of c (16) in my main()? I can't use sizeof because c isn't an array...

Since c is a pointer to int (that's what int* c means), what you get from sizeof(c) is exactly the size of the pointer to int. That is why sizeof(c)/sizeof(int*) gives you 1.
If you define c as an array rather than a pointer:
int c[16];
you'll get its size.

You can't get the number of elements of a dynamically allocated array from the pointer alone. It would work in this case:
int c[16];
int num_elements=sizeof(c)/sizeof(int);
In your case sizeof(c) is the size of a pointer, probably 4 or 8 depending on the platform.
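Since the pointer does not carry the count, one common workaround is to hand the count back together with the pointer. A minimal C sketch, where make_ints and its out-parameter are names I made up for illustration (in C++ you would more likely return a std::vector<int> and ask it for size()):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates `count` zeroed ints and reports the count
 * through an out-parameter, because the pointer alone does not carry it. */
int *make_ints(size_t count, size_t *out_count)
{
    int *p = calloc(count, sizeof *p);
    *out_count = (p != NULL) ? count : 0;  /* 0 if the allocation failed */
    return p;
}
```

The caller then keeps the count next to the pointer and must free() the block when done.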

OpenCL void pointer arithmetic - strange behavior

I have written an OpenCL kernel that uses OpenCL-OpenGL interoperability to read vertices and indices, but this is probably not even important because I am just doing simple pointer addition in order to get a specific vertex by index.
uint pos = (index + base)*stride;
Here I am calculating the absolute position in bytes. In my example pos is 28,643,328 with a stride of 28, index = 0 and base = 1,022,976. Well, that seems correct.
Unfortunately, I can't use vload3 directly because the offset parameter isn't treated as an absolute address in bytes. So I just add pos to the pointer void* vertices_gl:
void* new_addr = vertices_gl+pos;
In my example new_addr is 0x2f90000, and this is where the strange part begins:
vertices_gl = 0x303f000
The result (new_addr) should be 0x4B90000 (0x303f000 + 28,643,328).
I don't understand why the address vertices_gl is decreased by 716,800 (0xAF000).
I'm targeting the GPU: AMD Radeon HD5830
Ps: for those wondering, I am using a printf to get these values :) ( couldn't get CodeXL working)

There is no pointer arithmetic for void* pointers. Use char* pointers to perform byte-wise pointer computations.
Or, much better: use the real type the pointer points to, and don't multiply offsets yourself. Simply write vertex[index+base], assuming vertex points to your type containing 28 bytes of data.
Performance consideration: Align your vertex attributes to a power of two for coalesced memory access. This means, add 4 bytes of padding after each vertex entry. To automatically do this, use float8 as the vertex type if your attributes are all floating point values. I assume you work with position and normal data or something similar, so it might be a good idea to write a custom struct which encapsulates both vectors in a convenient and self-explaining way:
// Defining a type for the vertex data. This is 32 bytes large.
// You can share this code in a header for inclusion in both OpenCL and C / C++!
typedef struct {
float4 pos;
float4 normal;
} VertexData;
// Example kernel
__kernel void computeNormalKernel(__global VertexData *vertex, uint base) {
uint index = get_global_id(0);
VertexData thisVertex = vertex[index+base]; // It can't be simpler!
thisVertex.normal = computeNormal(...); // Like you'd do it in C / C++!
vertex[index+base] = thisVertex; // Of course, the same works when writing
}
Note: This code doesn't work with your stride of 28 if you just change one of the float4s to a float3, since float3 also consumes 4 floats of memory. But you can write it like this, which will not add padding (but note that this will penalize memory access bandwidth):
typedef struct {
float pos[4];
float normal[3]; // Assuming you want 3 floats here
} VertexData;
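For the byte-wise variant mentioned at the top of this answer, here is a small host-side C sketch of stepping through a buffer by a stride in bytes. The helpers vertex_at and load_position are hypothetical names of mine, and this is plain C for illustration rather than kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: address of vertex number `index` in a buffer whose
 * vertices are `stride` bytes apart. */
const unsigned char *vertex_at(const void *base, size_t index, size_t stride)
{
    /* Cast to a byte pointer first; arithmetic on void* is not standard C. */
    return (const unsigned char *)base + index * stride;
}

/* Copies the first three floats (the position) of a vertex into out[3]. */
void load_position(const void *base, size_t index, size_t stride, float out[3])
{
    memcpy(out, vertex_at(base, index, stride), 3 * sizeof(float));
}
```

With a stride of 7 floats (28 bytes when float is 4 bytes), vertex 1 starts exactly 28 bytes after the base address.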

How to set values of bit-field variables in a structure?

I have written the code below in Qt; when I enter values, program.exe stops working.
struct aim
{
int i : 1;
int j : 1;
};
int main()
{
aim missed;
printf("Enter value of i :: ");
scanf("%u",missed.i);
printf("Enter value of j :: ");
scanf("%u",missed.j);
}
Can anyone help me out with this problem?

There are a few problems with your code:
1. A 1-bit signed integer isn't very useful, it can only hold the values -1 and 0.
2. You can't have a pointer to a bit-field, that's not what pointers mean.
3. Also, there's nothing in the %d specifier that tells the scanf() function that the target value is a bit field (nor is there any other % specifier that can do this, see 2).
The solution is to scanf() to a temporary variable, range-check the received value, then store it in the bit field.
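A minimal sketch of that approach, assuming the fields are made unsigned so they can hold 0 or 1. The helper set_i_from_text is a made-up name, and it parses from a string with sscanf() only so it is easy to try out; the same pattern applies to scanf() on standard input.

```c
#include <stdio.h>

struct aim {
    unsigned int i : 1;   /* unsigned, so the field holds 0 or 1 */
    unsigned int j : 1;
};

/* Hypothetical helper: parses one number from `text` into a temporary,
 * range-checks it, and only then stores it in the bit field.
 * Returns 1 on success, 0 if the input is not a number or does not fit. */
int set_i_from_text(struct aim *a, const char *text)
{
    unsigned int tmp;
    if (sscanf(text, "%u", &tmp) != 1 || tmp > 1u) {
        return 0;
    }
    a->i = tmp;
    return 1;
}
```

Note that the address handed to sscanf() is that of the ordinary variable tmp, never of the bit field itself.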

Because the C and C++ standards do not allow taking the address of a bit-field member, and you have to pass scanf a pointer.

Which is the most efficient operation to split an integer to two characters in an Arduino?

Which of the following two approaches is more efficient on an ATmega328P?
unsigned int value;
unsigned char char_high, char_low;
char_high = value>>8;
value = value<<8;
char_low = value>>8;
OR
unsigned int value;
unsigned char char_high, char_low;
char_high = value>>8;
char_low = value & 0xff;

You really should measure. I won't answer your question (since you'd benefit more from measuring than I would), but I'll give you a third option:
struct {
union {
uint16_t big;
uint8_t small[2];
};
} nums;
(be aware of the difference between big endian and little endian here)

One option would be to measure it (as has already been said).
Or, compile both and see what the assembly language output looks like.
But actually, the 2nd code you have won't work: if you take value << 8 and assign it to a char, all you get is zero in the char. The subsequent >> 8 will still leave you with zero.
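For reference, the shift-and-mask variant from the question can be written with fixed-width types so the result does not depend on the width of int or on endianness. This is just a sketch with helper names of my own (high_byte, low_byte); it says nothing about which version is faster, so measuring or reading the generated assembly still applies.

```c
#include <stdint.h>

/* High byte: shift the upper 8 bits down into the low position. */
uint8_t high_byte(uint16_t value)
{
    return (uint8_t)(value >> 8);
}

/* Low byte: mask off everything above bit 7. */
uint8_t low_byte(uint16_t value)
{
    return (uint8_t)(value & 0xFFu);
}
```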
