Universal binaries for OpenCL

I have two computers, one with a Radeon R9 290X and the other with a Radeon R7 250. The following discussion concerns AMD graphics cards only. Both machines have the same driver installed. I wrote an OpenCL kernel, compiled it into a binary, and load it with clCreateProgramWithBinary. However, I ran into the following problems:
The compiled binaries for the two devices differ: the R7 binary is about 500 KB and the R9 binary is about 1.5 MB.
Using a binary on the device it was compiled for works fine, and everything happens instantly. But if I load the R7 binary on the R9, clBuildProgram takes a very long time (about 1 minute), and in the opposite case (R9 binary loaded on the R7) clBuildProgram causes an access violation.
I need a single binary that runs on every AMD graphics card that supports OpenCL. How should I compile an OpenCL kernel so that it works properly on all devices?
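The thread gives no answer, so the following is only a sketch of one common workaround, not a confirmed solution: since the binaries differ per device, keep one cached binary per device and driver, and rebuild from source when no matching file exists. The helper names (sanitize, binary_cache_name) and the example strings "Hawaii", "Oland" and "1800.8" are hypothetical; in a real program the device and driver strings would presumably come from clGetDeviceInfo with CL_DEVICE_NAME and CL_DRIVER_VERSION.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: replace every non-alphanumeric character with '_'
// so the result is safe to use inside a file name.
std::string sanitize(const std::string& s) {
    std::string out;
    for (unsigned char c : s)
        out += std::isalnum(c) ? static_cast<char>(c) : '_';
    return out;
}

// Hypothetical helper: derive a cache file name that is unique per kernel,
// device and driver version, so a binary built for one card is never
// handed to clCreateProgramWithBinary on a different card.
std::string binary_cache_name(const std::string& kernel,
                              const std::string& device_name,
                              const std::string& driver_version) {
    return sanitize(kernel) + "-" + sanitize(device_name) + "-" +
           sanitize(driver_version) + ".bin";
}

// Small self-check with made-up device and driver strings.
void self_check() {
    assert(binary_cache_name("blur", "Hawaii", "1800.8") == "blur-Hawaii-1800_8.bin");
    assert(binary_cache_name("blur", "Hawaii", "1800.8") !=
           binary_cache_name("blur", "Oland", "1800.8"));
}
```

If the file named by binary_cache_name is missing, the program would fall back to clCreateProgramWithSource and clBuildProgram, then store the fresh binary under that name for the next run.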

Related

OpenCL for Intel CPU and Nvidia GPU simultaneously

I am trying to get started with some OpenCL coding.
I've installed the NVidia CUDA OpenCL on my computer and have managed to build a simple "Hello World!" application using Visual Studio 2017.
I have also installed the Intel OpenCL SDK (installation warned me that I needed to update my OpenCL drivers but the Intel update manager was telling me that everything was up to date, so I'm not sure whether this could be an issue).
Now whenever I query the OpenCL platforms on my PC like so:
std::vector< cl::Platform > platformList;
cl::Platform::get(&platformList);
I only get back my NVIDIA OpenCL platform, with my GPU as the only device. I am not getting anything back for my CPU.
Can anyone help? Is it possible to perform both CPU and GPU OpenCL computations in the same project (in different OpenCL contexts)? How would I go about doing this?

It seems that the Intel GPU driver was not installed properly. You can install the CPU-only runtime package instead:
https://software.intel.com/en-us/articles/opencl-drivers#latest_CPU_runtime

Is there a utility toolkit for OpenCL?

Writing simple OpenCL kernels involves repeating the following steps:
1. Put the kernel code in a string
2. call clCreateProgramWithSource
3. call clBuildProgram
4. call clCreateKernel
5. call clSetKernelArg (once per kernel argument)
6. call clEnqueueNDRangeKernel
Is there a utility library that can make this process less painful, even at the cost of reduced flexibility? I am looking for something that does for OpenCL what GLUT does for OpenGL.

Check out Intel(R) SDK for OpenCL Applications https://software.intel.com/en-us/intel-opencl - it has tools to simplify OpenCL development quite a bit.

Beignet does not find CPU

I am using Beignet to try out OpenCL on my notebook, which has a 4th-generation i7 with integrated graphics and runs Ubuntu 16.04.
When I run clinfo I find only 1 platform and 1 device, which is the integrated GPU.
Shouldn't I also find the CPU itself? I have read that OpenCL allows the host to be used as a normal device and to run kernels on it.

Beignet does not include an ICD for Intel CPUs, it's only for the integrated GPU:
Beignet is an open source implementation of the OpenCL specification - a generic compute oriented API. This code base contains the code to run OpenCL programs on Intel GPUs which basically defines and implements the OpenCL host functions required to initialize the device, create the command queues, the kernels and the programs and run them on the GPU.
(from the official beignet webpage)
You need to install the Intel ICD, since there does not appear to be an open source OpenCL implementation for Intel CPUs.

Use 64bit timestamp on a 32bit machine

We all know the Y2K problem, and a similar problem will arrive in 2038. Every solution I have read says "use a 64-bit OS", so I have a question: if my program was compiled for a 64-bit platform, is it possible to run it on a machine that only works with 32 bits, for example an old Pentium CPU? I have read some resources saying that a 64-bit integer can be represented on a 32-bit machine by using two 32-bit integers.

The simple answer is:
A program compiled for a 64-bit platform cannot run on a 32-bit processor.
Timestamp handling is mostly application-specific: your programming language and the operating system take care of it. A 32-bit program can still use a 64-bit integer type, because the compiler implements it with pairs of 32-bit operations.
You need to understand two things.
Compiling on a 64-bit platform does not mean your application is 64-bit. You can compile a 32-bit application on a 64-bit platform.
Under the hood, every program you run on your computer is executed by the processor, so every program is converted to object code (machine code) before the processor executes it. The "bits" refer to the size of the registers (a kind of memory inside the processor) that your processor has. When a program in a high-level language such as C++ is converted to machine code, the compiler follows these steps (in simple terms):
Your Source Code ---> Convert to System specific Assembly ---> Binary Code
Then this binary code is executed by the processor.
Note that assembly language is machine/processor specific. It is different for ARM processors, x86 processors, PowerPC processors, etc. (these are processor architectures).
In your case, let's consider how a 32-bit Intel processor (x86) would run a 64-bit program.
Suppose you wrote a program that adds two numbers, 10 and 20, and compiled the binary to run directly on a processor (for the time being, forget about the OS and its libraries).
Now let's see "Your Source Code" converted to both a 32-bit assembly program and a 64-bit assembly program.
Assembly Output (32 Bit or x86 Code)
mov eax, 10
add eax, 20 ;OUTPUT IS SAVED TO "EAX" REGISTER WHICH IS A 32 BITS REGISTER
Assembly Output (64 Bit or x86_64 Code)
mov rax, 10
add rax, 20 ;OUTPUT IS SAVED TO "RAX" REGISTER WHICH IS A 64 BIT REGISTER
This assembly program is then converted line by line to binary code.
When the binary output of the 64-bit code meets a 32-bit processor, the processor does not know what "RAX" is. In practice a 32-bit operating system refuses to load the executable; if the bytes were executed anyway, they would be decoded as different instructions or raise an invalid-opcode exception (an error from the processor).
The figure shows the processor registers (x86_64 specific).
A processor has many registers; for simplicity I am using only the EAX and RAX registers of an x86_64 processor. An x86_64 (64-bit) processor knows what both "RAX" and "EAX" mean, but a 32-bit (x86) processor does not understand "RAX". That is the main reason why 32-bit programs can run on a 64-bit machine while the opposite is not possible.
Note: it is not just registers; there can also be processor-specific instructions. Instructions supported by a newer processor may not be supported by an older one.
Example: the "imul reg, reg" (integer multiplication) instruction is only available on the Intel 80386 and above.
I know it is a bit confusing, and it is a complicated topic to explain. Still, I hope this solves your issue.
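The question also mentions representing a 64-bit integer with two 32-bit integers. A minimal sketch of that idea follows; the names U64Pair, split, join, add and second_after_2038_limit are made up for illustration. It shows an addition done purely with 32-bit arithmetic plus a carry, which is roughly what a compiler generates for 64-bit integers on a 32-bit target, and a timestamp one second past the signed 32-bit limit held in a 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value kept as two 32-bit words (illustrative type).
struct U64Pair {
    std::uint32_t lo;  // low 32 bits
    std::uint32_t hi;  // high 32 bits
};

// Split a 64-bit value into its low and high 32-bit words.
U64Pair split(std::uint64_t v) {
    return { static_cast<std::uint32_t>(v & 0xFFFFFFFFu),
             static_cast<std::uint32_t>(v >> 32) };
}

// Reassemble the 64-bit value from its two words.
std::uint64_t join(U64Pair p) {
    return (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
}

// Add two pairs using only 32-bit arithmetic: add the low words,
// detect the wrap-around, and carry it into the high words.
U64Pair add(U64Pair a, U64Pair b) {
    U64Pair r;
    r.lo = a.lo + b.lo;                           // wraps modulo 2^32
    std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;
    r.hi = a.hi + b.hi + carry;
    return r;
}

// One second past the largest value a signed 32-bit timestamp can hold.
std::int64_t second_after_2038_limit() {
    std::int64_t t = INT32_MAX;  // 2147483647
    return t + 1;                // fits easily in 64 bits
}

// Small self-check of the helpers above.
void self_check() {
    assert(join(split(0x123456789ABCDEF0ull)) == 0x123456789ABCDEF0ull);
    assert(join(add(split(0xFFFFFFFFull), split(1ull))) == 0x100000000ull);
    assert(second_after_2038_limit() == 2147483648LL);
}
```

The same source compiles for a 32-bit or a 64-bit target; only the generated machine code differs.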

Qt/MinGW32 memory usage limitation?

I wrote an application with Qt 4.8.1 and MinGW32 (Nokia Qt SDK). I try to load a large file with this app, but the app always crashes when memory usage reaches 1,868 MB. If I reduce the size of the input file, the app works fine. Are there any memory limitations in Qt apps or MinGW32? What should I do if I really want my app to use more memory? My Windows is 64-bit.
P.S. Adding "QMAKE_LFLAGS_WINDOWS += -Wl,--stack,32000000" to the .pro file does not help.
Thanks very much!
P.P.S. I have seen many programs that are capable of using 10+ GB, e.g. Matlab. How can a Qt app do that?

Your copy of Windows may be 64-bit, but MinGW32 is a 32-bit compiler, so any app built with that compiler has all the standard limits inherent to 32-bit Windows. Effectively, you won't be able to get more than around 2 GB of memory for your app to use.
There's a method to get that up to 3 GB, but beyond that you need a 64-bit compiler.

The 2 GB limit applies per process only.
You can spread your application across N 32-bit processes to allocate N x 2 GB in total. The operating system must still be 64-bit.
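The arithmetic behind both answers can be sketched as follows. The function names are hypothetical, and the 2 GiB figure is the default user-mode half of a 32-bit address space mentioned in the answers, not something the code measures. In practice the usable amount tends to be somewhat lower because of fragmentation, which would be consistent with the crash at 1,868 MB.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t GiB = std::uint64_t{1} << 30;

// Pointer width of the current build in bits
// (32 for a program built with a 32-bit compiler such as MinGW32).
constexpr unsigned pointer_bits() {
    return static_cast<unsigned>(sizeof(void*) * 8);
}

// Size of the address space reachable with pointers of the given width.
// Only meaningful for bits < 64 in this sketch.
constexpr std::uint64_t address_space_bytes(unsigned bits) {
    return std::uint64_t{1} << bits;
}

// Assumption: by default a 32-bit process gets half of its 4 GiB
// address space for its own use.
constexpr std::uint64_t user_bytes_32bit() {
    return address_space_bytes(32) / 2;
}

// Number of 32-bit processes needed to hold `total` bytes at 2 GiB each
// (rounded up).
constexpr std::uint64_t processes_needed(std::uint64_t total) {
    return (total + user_bytes_32bit() - 1) / user_bytes_32bit();
}

// Small self-check of the arithmetic.
void self_check() {
    assert(address_space_bytes(32) == 4 * GiB);
    assert(user_bytes_32bit() == 2 * GiB);
    assert(processes_needed(10 * GiB) == 5);
}
```

So a 10 GB working set, as in the Matlab example from the question, would need at least five 32-bit processes, or a single process built with a 64-bit compiler.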
