Self-Referencing Pointer in OpenCL

I have OpenCL C++ code running on the Intel platform. I understand that pointers are not accepted inside a structure on the kernel side. However, I have a class that contains a self-referencing pointer. On the host side I can replicate the class as a structure, but I cannot do the same on the device side.
For example:
class Classname {
    Classname *SameClass_Selfreferencingpointer;
};
On the host side I have done the same with a structure:
struct Structurename {
    Structurename *SameStructure_Selfreferencingpointer;
};
Could someone suggest an alternative implementation for the device side?
Thank you for any help in advance.

Since there is no malloc on an OpenCL device, and structs are passed in buffers as an array of structs, you could give each struct an index so it knows where it sits in the array. You can allocate a big buffer before launching the kernel, then use atomic functions to increment a fake malloc pointer, as if allocating from the buffer, but simply returning an integer that points to the last "allocated" struct index. The host side would then use the index instead of a pointer.
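The index-plus-atomic-counter idea can be sketched in plain C as follows. This is a minimal host-side illustration, not device code: the names Node, NodePool, NODE_NONE, pool_alloc, list_push and list_sum are hypothetical, and C11 atomic_fetch_add stands in for the atomic built-in (for example atomic_inc on a __global int) that a kernel would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NODE_NONE (-1)   /* hypothetical sentinel meaning "no node" */

/* Index-linked node: `next` is a position in the pool array, not an address. */
typedef struct {
    int value;
    int next;
} Node;

/* Pre-allocated pool plus the counter that acts as the "fake malloc pointer". */
typedef struct {
    Node *nodes;
    int capacity;
    atomic_int next_free;
} NodePool;

void pool_init(NodePool *pool, Node *storage, int capacity) {
    assert(pool != NULL && storage != NULL && capacity > 0);
    pool->nodes = storage;
    pool->capacity = capacity;
    atomic_init(&pool->next_free, 0);
}

/* Returns the index of a fresh node, or NODE_NONE when the pool is exhausted. */
int pool_alloc(NodePool *pool, int value) {
    int idx = atomic_fetch_add(&pool->next_free, 1);
    if (idx >= pool->capacity) return NODE_NONE;
    pool->nodes[idx].value = value;
    pool->nodes[idx].next = NODE_NONE;
    return idx;
}

/* Pushes a new node in front of `head`; returns the new head index
   (or the old head if the pool is full). */
int list_push(NodePool *pool, int head, int value) {
    int idx = pool_alloc(pool, value);
    if (idx == NODE_NONE) return head;
    pool->nodes[idx].next = head;
    return idx;
}

/* Walks the list by index, exactly as the host would after reading the buffer back. */
int list_sum(const NodePool *pool, int head) {
    int sum = 0;
    for (int i = head; i != NODE_NONE; i = pool->nodes[i].next)
        sum += pool->nodes[i].value;
    return sum;
}
```

Because the links are plain integers, the same array of structs means the same thing on host and device, regardless of where each side maps the buffer in its address space.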
If struct alignment becomes an issue between host and device, you can index the fields too: for example, the starting byte of field A, the starting byte of field B, and so on, all packed into a single 4-byte integer for a struct with 4 used fields (not counting the indexes).
Maybe you can add a preprocessing stage:
host writes an artificial number such as 3.1415 to a field
device checks the floating-point values in the struct at every byte offset until it finds 3.1415
device puts the found byte offset into an array and sends it to the host
host then writes the float fields of the struct starting from that byte offset
so host and device become alignment compatible and use the same offset in all kernels that receive a struct from the host
Maybe the opposite is better:
device puts 3.14 in a field of the struct
device writes the struct to an array of structs
host gets the buffer
host checks for 3.14 and finds the byte offset
host writes an fp number starting from that offset for future work
which would need both your class and its replicated struct on the host and device sides.
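The offset-discovery step in the lists above could look roughly like this in C. It is only a sketch: the Sample struct and find_float_offset are hypothetical names, and it assumes host and device share the same endianness and float representation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct standing in for the one shared between host and device. */
typedef struct {
    char tag;
    float marker;   /* field whose byte offset we want to discover */
    int count;
} Sample;

/* Scans `size` raw bytes for the bit pattern of `sentinel`.
   Returns the byte offset of the first match, or -1 if it is absent. */
long find_float_offset(const void *bytes, size_t size, float sentinel) {
    assert(bytes != NULL);
    const unsigned char *p = bytes;
    for (size_t off = 0; off + sizeof(float) <= size; ++off) {
        if (memcmp(p + off, &sentinel, sizeof(float)) == 0)
            return (long)off;
    }
    return -1;
}
```

The side that receives the buffer would call find_float_offset on the first struct and reuse the result for every later transfer. Zeroing the struct before writing the sentinel keeps stray padding bytes from producing a false match.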
You should also look into the SYCL API.

Related

Is every variable and register name just a pointer in NASM Assembly?

There are [] operations which are similar to dereferencing in high-level languages. So does that mean that every variable and register name is just a pointer or are pointers a high-level languages idea and have no use in Assembly?
Pointers are a useful concept in asm, but I wouldn't say that symbol names are truly pointers. They're addresses, but they aren't pointers because there's no storage holding them (except metadata, and embedded into the machine code), and thus you can't modify them. A pointer can be incremented.
Variables are a high-level concept that doesn't truly exist in assembly. Asm has labels which you can put at any byte position in any section, including .data, .text, or whatever. Along with directives like dd, you can reserve space for a global variable and attach a symbol to it. (But a variable's value can temporarily be in a register, or for its whole lifetime if it's a local variable. The high-level concept of a variable doesn't have to map to static storage with a label.)
Types like integer vs. pointer also don't really exist in assembly; everything is just bytes that you can load into an integer, XMM, or even x87 FP register. (And you don't really have to think of that as type-punning to integer and back if you use eax to copy the bytes of a float from one memory location to another, you're just loading and storing to copy bytes around.)
But on the other hand, a pointer is a low-enough level concept to still be highly relevant in assembly. We have the stack pointer (RSP) which usually holds a valid address, pointing to some memory we're using as stack space. (You can use the RSP register to hold values that aren't valid addresses, though. In that case you're not using it as a pointer. But at any time you could execute a push instruction, or mov eax, [rsp], and cause an exception from the invalid address.)
A pointer is an object that holds the address of another object. (I'm using "object" in C terms here: any byte[s] of storage that you can access, including something like an int. Not objects as in object-oriented programming.) So a pointer is basically a type of integer data, especially in assembly for a flat memory model where there aren't multiple components to it. For a segmented memory model, a seg:off far pointer is a pair of integers.
So any valid address stored anywhere in register or memory can usefully be thought of as a pointer.
But no, a symbol defined by a label is not a pointer. Conceptually, I think it's important to think of it as just a label. A pointer is itself an object (some bytes of storage), and thus can be modified. e.g. increment a pointer. But a symbol is just a way to reference some fixed position.
In C terms, a symbol is like a char symbol[], not a char *symbol = NULL; If you use bare symbol, you get the address. Like mov edi, symbol in NASM syntax. (Or mov edi, OFFSET symbol in GNU .intel_syntax or MASM. See also How to load address of function or label into register for practical considerations like using RIP-relative LEA if 32-bit absolute addresses don't work.)
You can deref any symbol in asm to access the bytes there, whether that's mov eax, [main] to load the first 4 bytes of machine code of that function, or mov eax, [global_arr + rdi*8] to index into an array, or any other x86 addressing mode. (Caveat: 32-bit absolute addresses no longer allowed in x86-64 Linux? for that last example).
But you can't do arr++; that makes no sense. There is no storage anywhere holding that address. It's embedded into the machine code of your program at each site that uses it. It's not a pointer. (Note that C arr[4] = 1 compiles to different asm depending on char *arr; vs. char arr[], but in asm you have to manually load the pointer value from wherever it's stored, and then deref it with some offset.)
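The char symbol[] versus char *symbol distinction can be seen in a few lines of C. This is a small sketch with hypothetical names (symbol_like, pointer_like): the array name behaves like an asm label, while the pointer is an object whose stored address can change.

```c
#include <assert.h>
#include <stddef.h>

char symbol_like[8] = "abcdefg";    /* like an asm label: a fixed address, nothing stores it */
char *pointer_like = symbol_like;   /* a real object that stores an address */

/* Advances the pointer object by one byte and returns the byte it now points at. */
char advance_pointer(void) {
    assert(pointer_like != NULL);
    ++pointer_like;          /* legal: the stored address changes */
    /* ++symbol_like; */     /* would not compile: an array name is not a modifiable lvalue */
    return *pointer_like;
}

/* Distance in bytes between the pointer's current value and the fixed symbol. */
size_t bytes_from_start(void) {
    return (size_t)(pointer_like - symbol_like);
}
```

Note also that sizeof symbol_like is the size of the storage (8 bytes), whereas sizeof pointer_like is the size of an address.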
If you have a label in asm whose address you want to use in C, but that isn't attached to some bytes of storage, you usually want to declare it as extern const char end_of_data_section[]; or whatever.
So for example you can do size_t data_size = data_end - data_start; and get the size in bytes as a link-time constant, if you arranged for those symbols to be at the end/start of your .data section. With a linker script or with global data_end / data_end: in your NASM source. Probably at the same address as some other symbol, for the start.
Assembly language doesn't have variables, pointers or type checking. It has addresses and registers (and no type checking).
A variable (e.g. in C) is a higher level thing - a way to abstract the location of a piece of data so that you don't have to care if the compiler felt like putting it in a register or in memory. A variable also has a type; which is used to detect some kinds of bugs, and used to allow the compiler to automatically convert data to a different type for you.
A pointer (e.g. in C) is a variable (see above). The main difference between a pointer and a "not pointer" is the type of data it contains - for a pointer the variable typically contains an address, but a pointer is not just an address, it's the address of something with a type. This is important - if it was just an address then the compiler wouldn't know how big it is, couldn't detect some kinds of bugs, and couldn't automatically convert data (e.g. consider what would happen for int foo = *pointer_to_char; if the compiler didn't know what type of data the pointer points to).
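As a rough illustration of why the pointed-to type matters, the sketch below reads the same address in two ways. The function names are hypothetical. Through a char pointer the compiler loads one byte and widens it to int; an int-sized access copies sizeof(int) bytes.

```c
#include <assert.h>
#include <string.h>

/* Reads through a char pointer: one byte is loaded, then converted to int. */
int load_via_char(const void *addr) {
    assert(addr != NULL);
    const signed char *pointer_to_char = addr;
    int foo = *pointer_to_char;   /* the conversion is chosen from the pointee type */
    return foo;
}

/* Reads an int-sized object: sizeof(int) bytes are copied from the same address. */
int load_via_int(const void *addr) {
    assert(addr != NULL);
    int foo;
    memcpy(&foo, addr, sizeof foo);   /* memcpy sidesteps alignment and aliasing issues */
    return foo;
}
```

In assembly the programmer makes this choice by hand each time, by picking a byte load with sign extension or a full-width load.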

OpenCL vector data type usage

I'm using a GPU driver that is optimized to work with 16-element vector data type.
However, I'm not sure how to use it properly.
Should I declare it as, for example, cl_float16 on the host, with 16 times fewer elements than the original array?
What is the better way to access this type on the OpenCL kernel?
Thanks in advance.
In host code you can use cl_float16 host type. Access it like an array (e.g., value.s[5]). Pass as kernel argument. In kernel, access like value.s5.
How you declare it on the host is pretty much irrelevant. What matters is how you allocate it, and even that only if you plan on creating the buffer with CL_MEM_USE_HOST_PTR and your GPU uses system memory. This is because your memory needs to be properly aligned for GPU zero-copy, otherwise the driver will create a background copy. If your GPU doesn't use system memory for buffers, or you don't use CL_MEM_USE_HOST_PTR, then it doesn't matter: the driver will allocate a proper buffer on the GPU.
Your bigger issue is that your GPU needs to work with 16-element vectors. You will have to vectorize every kernel you want to run on it. In other words, every part of your algorithms needs to work with float16 types. If you just use simple floats, or you declare the buffer as global float16* X but then use element access (X.s0, X.w and such) and work with those, the performance will be the same as if you declared the buffer global float* X: very likely poor.

Why is a buffer used in Win32 API syscall cast to [1<<20]<type> array?

I'm writing a golang application which interacts with Windows Services using the windows/svc package.
Looking at how the package source code makes its syscalls, I see an interesting cast construct:
name := syscall.UTF16ToString((*[1 << 20]uint16)(unsafe.Pointer(s.ServiceName))[:])
Extracted from mgr.go
This is a common pattern when dealing with the Win32 API, where one needs to pass a pre-allocated buffer to receive a value from a Win32 API function, usually an array or a structure.
I understand that the Win32 API returns a Unicode string represented by its pointer, and that it is passed to the syscall.UTF16ToString(s []uint16) function to convert it to a Go string in this case.
I'm confused by the part where an unsafe pointer is cast to a pointer to a 1M-element array, *[1<<20]uint16.
Why is the size 1M, [1<<20]?
The buffer for a value is allocated dynamically, not with a fixed size of 1M.
You need to choose a static size for the array type, so 1<<20 is chosen to be large enough to allow for any reasonable buffer returned by the call.
There is nothing special about this size. Sometimes you'll see 1<<31-1 since it's the largest array for 32-bit platforms, or 1<<30 since it looks nicer. It really doesn't matter as long as the type can contain the returned data.

opencl constant memory or value arguments

In OpenCL you can pass a buffer to a kernel via clSetKernelArg and mark that buffer as __constant in the kernel. Alternatively, you can also use clSetKernelArg to pass a value type.
My question is, where does the value type live? Does the API create a constant buffer behind the scenes? Does the API generate a special shader with those values as constant literals?
I'm just curious because I come from a direct3d/opengl background, and constants always had to be passed through constant buffers. So I'm wondering how passing a type by value as an argument works under the hood.

MPI_Aint in MPI_(I)NEIGHBOR_ALLTOALLW() vs int in MPI_(I)ALLTOALLW()

With MPI 3.0, neighborhood collective communications were introduced.
In two of them (MPI_NEIGHBOR_ALLTOALLW and MPI_INEIGHBOR_ALLTOALLW), the displacements (sdispls and rdispls) are arrays of const MPI_Aint. This contrasts with the corresponding ordinary collective functions (MPI_ALLTOALLW and MPI_IALLTOALLW), where they are arrays of const int.
Also considering what the MPI Standard v3.0 says about MPI_Aint (page 16):
2.5.6 Addresses
Some MPI procedures use address arguments that represent an absolute address in the
calling program. The datatype of such an argument is MPI_Aint in C and
INTEGER (KIND=MPI_ADDRESS_KIND) in Fortran. These types must have the same width
and encode address values in the same manner such that address values in one language
may be passed directly to another language without conversion. There is the MPI constant
MPI_BOTTOM to indicate the start of the address range.
I still don't see the point of, or the difference (if there is one) between, int and MPI_Aint, other than that MPI_Aint can't be negative!
MPI_Aint is a portable C data type that can hold memory addresses, and it can be larger than the usual int. The policy of the MPI Forum is not to change the signature of existing MPI calls (as it could break existing applications - see here). Rather, new calls are introduced that supersede the old ones. The rationale is that int worked well before LP64 64-bit architectures became popular, at which point int could no longer be used to address the whole virtual address space of a single process. After this realisation, some MPI calls got new versions in later versions of the standard that use MPI_Aint or MPI_Count (a large integer type) instead of int. For example, MPI_Get_elements_x is the large-count counterpart of MPI_Get_elements and uses MPI_Count instead of int.
In this respect MPI_Alltoallw is an old call (it comes from MPI-2.0) and it retains its signature of using int offsets while MPI_(I)Neighbor_alltoallw is a new one (it comes with MPI-3.0) and it uses the address type in order to be able to work with data located (almost) anywhere in memory.
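The practical consequence can be sketched without MPI at all. In the snippet below, aint_t is a hypothetical stand-in for MPI_Aint (assumed here to be a signed, address-sized integer), and the helper names are invented for illustration. A byte displacement a little above 2 GiB no longer fits in a 32-bit int, but it does fit in an address-sized type on a 64-bit platform.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for MPI_Aint (assumption): a signed integer wide enough to hold an address. */
typedef intptr_t aint_t;

/* Byte displacement of element `index` in an array of `elem_size`-byte elements. */
int64_t byte_displacement(int64_t index, int64_t elem_size) {
    assert(index >= 0 && elem_size > 0);
    return index * elem_size;
}

/* Non-zero when the displacement can be passed as a plain int argument. */
int fits_in_int(int64_t disp) {
    return disp >= INT_MIN && disp <= INT_MAX;
}

/* Non-zero when the displacement can be stored in the address-sized type. */
int fits_in_aint(int64_t disp) {
    return disp >= INTPTR_MIN && disp <= INTPTR_MAX;
}
```

For example, the 400,000,000th double in a buffer sits 3.2e9 bytes from the start, which overflows an int-typed displacement array but is representable in the address-sized one on an LP64 system.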
The same applies to the Fortran bindings.

Resources