I need help understanding data alignment in OpenCL's buffers

Given the following structure
typedef struct
{
float3 position;
float8 position1;
} MyStruct;
I'm creating a buffer with this layout on the host and passing it to the kernel as a pointer to the struct.
I understand that I have to add 4 bytes of padding after writing the three floats of position, to round the float3 up to the next power of two (16 bytes). But I don't understand why I have to add another 16 bytes of padding before writing the bytes of position1; otherwise I get wrong values in position1.
Can someone explain why?

A float8 is a vector of 8 floats, each float being 4 bytes, which makes a size of 32 bytes. Per section 6.1.5 of the OpenCL 1.2 specification, "Alignment of Types", a type is always aligned to its own size, so the float8 must be 32-byte aligned. The same section also tells us that a float3 is sized and aligned like a four-component vector (4 words, 16 bytes), so position occupies bytes 0 through 15. Since the next 32-byte boundary after offset 16 is offset 32, you need 16 extra bytes of padding before position1, which is exactly what you observed. Also, since the sizeof of a struct is arranged to allow arrays of the struct, reordering these particular fields won't shrink it; in more complex structs you can save space by keeping the smaller fields together.
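To make the arithmetic concrete, here is a minimal host-side C sketch (the mirror struct and its padding fields are illustrative, not part of any OpenCL API) that reproduces the device layout described above:
#include <stdio.h>
#include <stddef.h>

/* Host-side mirror of the kernel struct with the padding written out:
   position fills bytes 0-11 plus 4 bytes to complete the float3's
   16-byte slot; 16 more bytes of padding bring position1 to the
   32-byte boundary its alignment requires. */
typedef struct
{
    float position[3];  /* 12 bytes of data                */
    float pad0;         /* 4 bytes: the float3 slot is 16  */
    float pad1[4];      /* 16 bytes: align the next field  */
    float position1[8]; /* 32 bytes, starting at offset 32 */
} MyStructHost;

int main(void)
{
    printf("offset of position1: %zu\n", offsetof(MyStructHost, position1)); /* 32 */
    printf("sizeof(MyStructHost): %zu\n", sizeof(MyStructHost));             /* 64 */
    return 0;
}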

Related

How to reflect hlsl struct member real size in directx11?

I'm trying to finish a variable-location parser for HLSL, but I can't find a way to get a structure member's size. In the question How to reflect information about hlsl struct members? it was recommended to use the offset, but that won't give the actual size of a member because of 16-byte packing: if a structure consists of a float2 and a float4, its total size is 32 bytes and the offset of the second member is 16 bytes, but the real size of the first member is 8 bytes, not 16. I understand that in terms of the memory passed there are 16 bytes reserved for that variable and only 8 are used. However, I have a "safety check" that whenever a value is put in the stream at a variable's location, it is of the same size as the variable: if I only check that the value is at least the variable's size, placing it in the stream can overwrite the data of the next variables, and if I check that it is the variable's size or less, it can lead to hard-to-track bugs if I forget to pass enough data for the variable. So, is there a way to get the real size of a structure member?
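For illustration, here is a host-side C mirror of the float2 + float4 layout described above (the struct and field names are made up), with HLSL's 16-byte register padding written out:
#include <stdio.h>
#include <stddef.h>

/* Mirrors an HLSL cbuffer { float2 a; float4 b; } under 16-byte packing. */
typedef struct
{
    float a[2];   /* real size: 8 bytes, offset 0            */
    float pad[2]; /* 8 bytes of padding to the next register */
    float b[4];   /* real size: 16 bytes, offset 16          */
} CBufferMirror;

int main(void)
{
    /* Reflection reports the offsets (0 and 16); the real size of a
       (8 bytes) cannot be recovered from the offsets alone. */
    printf("offset of b: %zu, total size: %zu\n",
           offsetof(CBufferMirror, b), sizeof(CBufferMirror)); /* 16, 32 */
    return 0;
}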

How does dlmalloc coalesce chunks?

Here is a detailed description of the dlmalloc algorithm: http://g.oswego.edu/dl/html/malloc.html
A dlmalloc chunk is bookended by some metadata, which includes information about the amount of space in the chunk. Two contiguous free chunks might look like
[metadata | X bytes free space | metadata][metadata | X bytes free space | metadata]
                 Block A                                   Block B
In that case we want to coalesce block B into block A. Now how many bytes of free space should block A report?
I think it should be 2X + 2 * sizeof(metadata) bytes, since the coalesced block now looks like:
[metadata | X bytes free space + metadata + metadata + X bytes free space | metadata]
But I'm wondering if this is correct, because I have a textbook that says the metadata will report 2X bytes without including the extra space we get from being able to write over the metadata.
You can see the answer yourself by looking at the source. Begin with line 1876 to verify your diagram. The metadata is just two size_t unsigned integers, accessed by aliasing a struct malloc_chunk (line 1847). Field prev_size is the size of the previous chunk, and size is the size of this one. Both include the size of the struct malloc_chunk itself. This will be 8 or 16 bytes on nearly all machines depending on whether the code is compiled for 32- or 64-bit addressing.
The "normal case" coalescing code starts at line 3766. You can see that the size variable it's using to track coalescing is chunk size.
So - yeah - in the code blocks marked /* consolidate backward */ and /* consolidate forward */, when he adds the size of the preceding and succeeding chunks, he's implicitly adding the size of the struct malloc_chunk as you suspected.
This shows that your interpretation is correct. My expectation is that the textbook author just got sloppy about the difference between chunk size (which includes metadata) and the size of the memory block allocated to the user. Incidentally, malloc takes care of this difference at line 3397.
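As a sketch of that arithmetic (a simplification modeled on malloc.c, keeping only the two size fields; this is not the actual dlmalloc code):
#include <stddef.h>

/* Simplified boundary tags: each chunk begins with the size of the
   previous chunk and its own size, both including the metadata. */
struct chunk {
    size_t prev_size; /* size of the chunk just before this one */
    size_t size;      /* size of this chunk, metadata included  */
};

/* Backward coalescing: fold chunk p into the free chunk before it.
   Summing the two chunk sizes yields 2X plus the metadata of both
   chunks, exactly as the question suspected. */
struct chunk *coalesce_backward(struct chunk *p)
{
    struct chunk *prev = (struct chunk *)((char *)p - p->prev_size);
    prev->size += p->size;
    return prev;
}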
Perhaps the bigger lesson here is that - when you're trying to learn anything - you should never skip an opportunity to go straight to the first-hand source and figure stuff out for yourself.

Explain concept of size of integer,character and float pointer in GCC

In GCC (Ubuntu 12.04), the following is the program I need to understand regarding the sizes of integer, character, and float pointers.
#include <stdio.h>

int main(void)
{
    int i = 20, *p = &i;
    char ch = 'a', *cp = &ch;
    float f = 22.3f, *fp = &f;
    /* sizeof yields a size_t, so %zu is the right conversion */
    printf("%zu %zu %zu\n", sizeof(p), sizeof(cp), sizeof(fp));
    printf("%zu %zu %zu\n", sizeof(*p), sizeof(*cp), sizeof(*fp));
    return 0;
}
Here is the output I get when I run the above code on Ubuntu 12.04:
Output:
8 8 8
4 1 4
According to the material I'm reading, "irrespective of the data type, the size of a pointer holding an address is 4 bytes by default."
Then what is the reason for getting sizeof(p) == 8 when it should be sizeof(p) == 4?
Please explain.
sizeof(x) will return the size of x. A pointer is like any other variable, except that it holds an address. On your 64 bit machine, the pointer takes 64 bits or 8 bytes, and that is what sizeof will return. All pointers on your machine will be 8 bytes long, regardless of what data they point to.
The data they point to may be of a different length.
int x = 5;    // x is a 32-bit int and takes up 4 bytes
int *y = &x;  // y holds the address of x; an address is 8 bytes here
float *z;     // z holds the address of a float, and an address is still 8 bytes long
You're probably getting confused because you previously did this on a 32-bit computer. The "32 bit" / "64 bit" refers to the size of a machine address: on a 32-bit computer, a pointer holds an address that is at most 32 bits, or four bytes, long. Your current machine must be a 64-bit machine, which is why a pointer needs to be 8 bytes long.
And it's not just the address length. The sizes of other data types are also platform- and implementation-dependent. For example, an int may be 16 bits on one platform and 32 bits on another, and a third implementation might go as far as 128-bit ints. The only guarantee in the spec is that an int is at least 16 bits long. When in doubt, always check; the Wikipedia page on C data types is helpful.
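For example, a quick way to check on your own machine (a minimal C sketch):
#include <stdio.h>

int main(void)
{
    /* All of these are implementation-defined; only minimums are guaranteed. */
    printf("int:    %zu bytes\n", sizeof(int));
    printf("long:   %zu bytes\n", sizeof(long));
    printf("void *: %zu bytes\n", sizeof(void *));
    return 0;
}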
sizeof(p) returns the size of an address, and you are most likely running on a 64-bit machine, so your addresses are 64 bits (8 bytes) long.
The value p points to is a 32-bit integer (4 bytes).
You can verify this by checking that:
all pointers have a sizeof of 8;
your char value has size 1 (typical of most C implementations).
Print p and *p (for all the variables) and you will see the actual address length that way.
I'm not sure which documentation you're using, but my guess is that it's talking about pointers on 32-bit systems.
On 64-bit systems the size of a pointer becomes 8 bytes.

Passing 3 Component Vector to openCL (java)

I am trying out OpenCL and I was wondering how one goes about passing a 3-component vector (float3) to an OpenCL kernel? This is probably really simple, but I cannot get it to work...
Thanks
A float3 is always stored in 16 bytes, not 12. You should align all your float3 buffers to 16 bytes, or simply use float4. On the host, cl_float3 is equivalent to cl_float4.
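The question uses Java bindings, but the 16-byte rule is easiest to see on the host side; here is a minimal C sketch (assuming the standard OpenCL headers, where cl_float3 is a typedef for cl_float4):
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    /* cl_float3 has the same size and alignment as cl_float4. */
    printf("sizeof(cl_float3) = %zu\n", sizeof(cl_float3)); /* 16, not 12 */
    printf("sizeof(cl_float4) = %zu\n", sizeof(cl_float4)); /* 16 */
    return 0;
}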

Is the endianness of a QImage created from a uchar[] platform-dependent?

QImage has a constructor QImage (uchar *data, int width, int height, int bytesPerLine, Format format) that creates a QImage from an existing memory buffer.
Is the order of the bytes (uchars) platform-dependent? If I store the values for alpha, red, green, and blue at increasing indices, alpha is swapped with blue and red is swapped with green. This indicates an endianness problem.
I now wonder whether the endianness is platform-dependent or not. The Qt documentation does not say anything about this.
If it is NOT platform-dependent, I would just change the order of storing the values:
texture[ startIndex + 0 ] = pixelColor.blue();
texture[ startIndex + 1 ] = pixelColor.green();
texture[ startIndex + 2 ] = pixelColor.red();
texture[ startIndex + 3 ] = pixelColor.alpha();
If it is platform-dependent, I would create an array of uint32, store values computed as alpha << 24 | red << 16 | green << 8 | blue, and reinterpret_cast the array before passing it to the QImage() constructor.
It depends on the format. Formats whose name states the total number of bits in a pixel are endian-dependent. For example, Format_ARGB32 denotes a 32-bit integer whose highest 8 bits are alpha; on a little-endian machine those same 8 bits end up as the last byte in the byte sequence.
Formats whose name spells out the byte sequence, like Format_RGB888, are not endian-dependent: Format_RGB888 says the bytes are arranged in memory in R, G, B order regardless of endianness.
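To see this concretely, here is a small C sketch (plain C, no Qt required) that stores one ARGB32-style pixel as a 32-bit integer and dumps its bytes:
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* One Format_ARGB32-style pixel: a single 32-bit 0xAARRGGBB value. */
    uint32_t pixel = (0xAAu << 24) | (0x11u << 16) | (0x22u << 8) | 0x33u;
    const unsigned char *b = (const unsigned char *)&pixel;
    /* Little-endian machines print "33 22 11 aa" (B, G, R, A);
       big-endian machines print "aa 11 22 33" (A, R, G, B). */
    printf("%02x %02x %02x %02x\n", b[0], b[1], b[2], b[3]);
    return 0;
}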
To access bytes in the buffer, I would use the Q_BYTE_ORDER macro to conditionally compile the corresponding byte-access code, instead of using shifts.
I personally use Format_RGB888 since I don't deal with alpha directly in the image. That saves me the problem of dealing with endianness differences.
From the Qt docs:
Warning: If you are accessing 32-bpp image data, cast the returned pointer to QRgb* (QRgb has a 32-bit size) and use it to read/write the pixel value. You cannot use the uchar* pointer directly, because the pixel format depends on the byte order on the underlying platform. Use qRed(), qGreen(), qBlue(), and qAlpha() to access the pixels.
