OpenCL SPIR binary build is very slow on Intel Skylake GPU

I have a number of OpenCL kernels that are compiled into SPIR binaries. I understand that:
Additional time is required to build a program from a SPIR binary as opposed to building a program from an Intermediate Binary, since additional translation and optimization steps are involved.
See: Using SPIR for fun and profit with Intel® OpenCL™ Code Builder
However, building the SPIR kernels on an Intel Skylake i7 GPU takes almost a minute!
Is this a SPIR specific issue? Would SPIR-V be any faster?
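For reference, here is a minimal sketch (not the asker's actual code) of how that build step can be isolated and timed, assuming the SPIR blob has already been read into memory and that the driver accepts the commonly documented "-x spir -spir-std=1.2" build options:

    #include <CL/cl.h>
    #include <stdio.h>
    #include <time.h>

    /* Hypothetical helper: times clBuildProgram for a program created from a
     * SPIR binary. spir_bytes/spir_size are assumed to hold the .spir file. */
    double time_spir_build(cl_context ctx, cl_device_id dev,
                           const unsigned char *spir_bytes, size_t spir_size)
    {
        cl_int err, bin_status;
        cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &spir_size,
                                                    &spir_bytes, &bin_status, &err);
        clock_t t0 = clock();
        /* "-x spir" tells the driver the blob is SPIR, not a device binary. */
        err = clBuildProgram(prog, 1, &dev, "-x spir -spir-std=1.2", NULL, NULL);
        clock_t t1 = clock();
        if (err != CL_SUCCESS)
            fprintf(stderr, "clBuildProgram failed: %d\n", err);
        clReleaseProgram(prog);
        return (double)(t1 - t0) / CLOCKS_PER_SEC;
    }

With SPIR-V on an OpenCL 2.1+ runtime, the only change would be creating the program with clCreateProgramWithIL instead; the clBuildProgram call is timed the same way, so the SPIR vs. SPIR-V comparison is straightforward to measure.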

Related

OpenCL installation on Ubuntu GNOME 16.10

I'm trying to build some software from source, and OpenCL is a required package. But I'm really confused as to what it is and how to install it.
OpenCL is a framework for parallel computing on GPUs and multi-core CPUs. To use it, you need to install a platform, which may depend on what kind of hardware you have, and what type of device your application wants to use. Here is a list of the main platforms:
AMD APP: for AMD GPUs and Intel/AMD CPUs http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/
NVIDIA CUDA: for NVIDIA GPUs only https://developer.nvidia.com/cuda-downloads
Intel: for Intel CPUs https://software.intel.com/en-us/intel-opencl
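Once a platform is installed, a quick way to confirm that the ICD loader can see it is to run the clinfo utility, or enumerate the platforms yourself; a minimal sketch in C:

    #include <CL/cl.h>
    #include <stdio.h>

    int main(void)
    {
        cl_platform_id platforms[8];
        cl_uint n = 0;
        /* Ask the ICD loader for every installed platform (AMD, NVIDIA, Intel, ...). */
        if (clGetPlatformIDs(8, platforms, &n) != CL_SUCCESS || n == 0) {
            fprintf(stderr, "No OpenCL platforms found - is a vendor driver installed?\n");
            return 1;
        }
        for (cl_uint i = 0; i < n; ++i) {
            char name[256] = "", vendor[256] = "";
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof name, name, NULL);
            clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR, sizeof vendor, vendor, NULL);
            printf("Platform %u: %s (%s)\n", i, name, vendor);
        }
        return 0;
    }

On Linux this typically compiles with something like gcc check_cl.c -lOpenCL once the headers and the ICD loader package are installed.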

What is an OpenCL kernel compiler?

I have a general question. I'm working on a framework and need to know about the OpenCL kernel compiler. What is an OpenCL kernel compiler? Is there any source for detailed study?
It's the compiler that converts OpenCL kernels to binaries that the platform running them can understand.
For example, if you're running OpenCL on Intel HD graphics with the Intel OpenCL SDK, that SDK includes a compiler that compiles the kernels down to binaries using that GPU's instruction set.
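To make that concrete, here is a rough sketch of where the vendor's kernel compiler gets invoked from the host API and how the resulting device binary can be read back (error handling omitted; the kernel source is just a placeholder):

    #include <CL/cl.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const char *src = "__kernel void add1(__global int *x) { x[get_global_id(0)] += 1; }";
        cl_platform_id plat; cl_device_id dev; cl_int err;
        clGetPlatformIDs(1, &plat, NULL);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);

        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
        /* This is where the vendor's kernel compiler runs. */
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);

        /* Fetch the device-specific binary (e.g. Gen ISA on Intel HD Graphics). */
        size_t bin_size = 0;
        clGetProgramInfo(prog, CL_PROGRAM_BINARY_SIZES, sizeof bin_size, &bin_size, NULL);
        unsigned char *bin = malloc(bin_size);
        clGetProgramInfo(prog, CL_PROGRAM_BINARIES, sizeof bin, &bin, NULL);
        printf("Compiled binary is %zu bytes\n", bin_size);

        free(bin);
        clReleaseProgram(prog);
        clReleaseContext(ctx);
        return 0;
    }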

Intel OpenCL Vs. Khronos OpenCL

What is the difference between the Intel, AMD and Khronos OpenCLs? I am totally new to OpenCL and want to start with it. I don't know which one is better to install on my operating system.
OpenCL is an "extension" to C and C++ languages that enables parallelization of software on your compute devices: CPU, GPU, etc.
OpenCL is defined by a standard (created by the Khronos Group) and implemented by hardware vendors: Intel, AMD, NVIDIA, etc. So each OpenCL implementation requires a vendor-specific OpenCL driver that enables the use of that vendor's hardware.
So to conclude, if you have an Intel-based system, use the Intel OpenCL, because only then will you be able to use all the compute devices in your machine. The same goes if you have an AMD system. Also, take note that there is no Khronos OpenCL implementation.
Of course, you can have a platform with OpenCL-enabled devices from multiple vendors (e.g. an Intel CPU+GPU and an NVIDIA discrete card). In this case the OpenCL runtime contains a generic layer (a dynamically loaded library). This layer is an interface which calls the implementation provided by each device driver, depending on the selected OpenCL platform.
OpenCL is a standard defined by Khronos. They distribute header files that you have to give to your compiler. They do not distribute binaries to link against. For that, you must get an ICD (Installable Client Driver); on Windows this comes in the form of a DLL file. You will get it by installing one or more of...
Nvidia drivers (if you have an Nvidia GPU)
AMD drivers (if you have an AMD GPU or an AMD CPU)
Intel drivers (if you have an Intel CPU; also, some Intel CPUs have built-in GPUs).
Do not worry about compiling against one vendor's files and it not working with another's; OpenCL has been carefully designed to work around this. Compile against any version you have, and it will work with any other version that is the same or newer, regardless of who made it.
Be aware that the AMD OpenCL driver will operate as an OpenCL driver for Intel CPUs. If, for example, you have an AMD GPU and an Intel CPU, and have installed both the Intel OpenCL driver and the AMD OpenCL driver, the AMD driver will report that it can provide both a GPU device and a CPU device (your CPU), and the Intel driver will report a CPU device (also your CPU) and most likely a GPU device as well (the GPU on the Intel CPU die; on an i7-3770, for example, this will be an HD 4000). If you blindly ask OpenCL for all available CPU devices, you will get the AMD driver and the Intel driver each offering you the same CPU. Your code will not run very well in this case. A sketch of how to see this duplication follows below.
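The duplication described above is easy to observe by walking each platform and asking it for its CPU devices; a short sketch:

    #include <CL/cl.h>
    #include <stdio.h>

    int main(void)
    {
        cl_platform_id plats[8];
        cl_uint np = 0;
        clGetPlatformIDs(8, plats, &np);
        for (cl_uint p = 0; p < np; ++p) {
            char pname[128] = "";
            clGetPlatformInfo(plats[p], CL_PLATFORM_NAME, sizeof pname, pname, NULL);
            cl_device_id devs[8];
            cl_uint nd = 0;
            /* With both the AMD and Intel runtimes installed, the same physical
             * CPU will show up under each platform. */
            if (clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_CPU, 8, devs, &nd) != CL_SUCCESS)
                continue;
            for (cl_uint d = 0; d < nd; ++d) {
                char dname[128] = "";
                clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof dname, dname, NULL);
                printf("%s -> CPU device: %s\n", pname, dname);
            }
        }
        return 0;
    }

Picking exactly one platform (and one device within it) before creating a context avoids running the same physical CPU under two drivers at once.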
On Windows it is expected that you will download the header files yourself, and then either create an import library from the DLL (MSVC) or link directly against the DLL (MinGW & Clang default behavior).
On Linux, your package manager will likely have a library to link against; consult your distribution's documentation. On Ubuntu and Debian this command will work...
sudo apt-get install ocl-icd-opencl-dev
On Mac, there is nothing to install, and trying to install something will likely damage your system. Just install Xcode, and use the framework "OpenCL".
There are other platforms, for example Android. Some FPGA vendors offer OpenCL libraries. Consult your vendor's documentation.
Khronos defines the OpenCL standard. Each vendor (or open-source project) implements that standard.
Khronos also defines a set of conformance tests which must pass before a vendor can claim that its OpenCL implementation conforms to the standard.

How do I program an INTEL GPU

I am quite new to the world of GPU computing, so I would really like someone to explain the very basics to me. I have two Intel chipsets with the following GPUs:
GMA4500
HD graphics
I am interested in running algebraic and bitwise functions over huge data sets, like the transpose of an array or a bitwise shift of the rows of an array, on a GPU. The goal is, of course, to gain more performance.
My main question is: how can I program such things on these GPUs? In the past I have used CUDA to program on an NVIDIA video card. I understand from previous topics that I can't use CUDA for Intel GPUs. Thanks in advance!!
Update 1
I found out that Intel supports OpenCL for HD Graphics. More precisely, the Intel SDK for OpenCL Applications provides a comprehensive development environment for OpenCL applications on Intel® platforms, including compatible drivers, code samples, and development tools such as the Code Builder, an optimization guide, and support for optimization tools.
The SDK supports OpenCL 1.2 on 3rd and 4th generation Intel® Core™ processors with Intel® HD Graphics and Intel® Iris™ Graphics Family, Intel® Atom™ Processors with Intel HD Graphics, Intel® Xeon® processors, and Intel® Xeon Phi™ coprocessors.
OpenCL is the standard, cross-vendor API for GPGPU programming, roughly analogous to nVidia's proprietary CUDA.
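To give a feel for what the asker's workloads look like in OpenCL C (the kernel language is vendor-neutral, so the same code runs on Intel HD Graphics or an NVIDIA card), a naive transpose kernel might read:

    /* Naive out-of-place transpose of a rows x cols matrix; each work-item
     * moves one element. A tuned version would stage tiles in local memory. */
    __kernel void transpose(__global const float *in,
                            __global float *out,
                            const uint rows,
                            const uint cols)
    {
        uint r = get_global_id(0);
        uint c = get_global_id(1);
        if (r < rows && c < cols)
            out[c * rows + r] = in[r * cols + c];
    }

The host would enqueue this kernel over a 2-D NDRange of rows x cols work-items.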

AMD APP OpenCL SDK on Intel

I have seen that the AMD APP SDK samples work on a machine that has only an Intel CPU.
How can this happen? How does the compiler target a different machine architecture?
Do I not need Intel's set of compilers for running the code on the intel CPU?
I think that if we have to run an OpenCL application on specific hardware, I have to (re)compile it using the device vendor's specific compiler.
Where is my understanding wrong?
Firstly, OpenCL is built to work on CPUs and GPUs. You can compile and run the same source code on either type of device. However, it's very likely that CPU code will be sub-optimal for a GPU and vice versa.
AMD hardware accounts for roughly 7%-14% of total x86/x64 CPUs, so AMD must develop compilers for both AMD and Intel chips to stay relevant. AMD has a history of developing compilers for both sets of chips. Conversely, Intel has developed compilers that either don't work on AMD chips or don't work that well. That's no surprise.
With OpenCL, the AMD APP SDK is the most flexible: it will work well on AMD and Intel CPUs and on AMD GPUs. Intel's OpenCL SDK doesn't even install on AMD x86 hardware.
If you compile an OpenCL program to a binary, you can save and reuse it as long as it matches the OpenCL platform and device that created it. So, if you compile for one device and use the binary on another, you are very likely to get an error.
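A hedged sketch of that save-and-reuse pattern, with a fallback to source when the cached binary was built for a different platform or device:

    #include <CL/cl.h>

    /* Hypothetical helper: try to reuse a previously saved program binary for
     * this device; fall back to recompiling from source if it doesn't match. */
    cl_program load_or_build(cl_context ctx, cl_device_id dev,
                             const unsigned char *bin, size_t bin_size,
                             const char *source)
    {
        cl_int err, bin_status;
        cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &bin_size,
                                                    &bin, &bin_status, &err);
        if (err == CL_SUCCESS &&
            clBuildProgram(prog, 1, &dev, NULL, NULL, NULL) == CL_SUCCESS)
            return prog;                 /* cached binary matched this device */

        /* Binary was produced for a different platform/device: rebuild from source. */
        if (err == CL_SUCCESS)
            clReleaseProgram(prog);
        prog = clCreateProgramWithSource(ctx, 1, &source, NULL, &err);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
        return prog;
    }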
The power of OpenCL is in abstracting the underlying hardware and offering massive, parallel and heterogeneous computing power.
Some SDKs and platforms offer specific features to "optimize" the code; I honestly think such features are mostly marketing, and they introduce boilerplate code that makes the application less portable.
There are also some pseudo-new technologies that are just wrappers around OpenCL, or are really similar in concept, like Intel Quick Sync.
About Intel, I should say that at first they supported all the Core i-series generations and even some Core 2 Duos; now the new SDK only supports 3rd-generation Core processors. I honestly don't get their strategy. Intel is probably the last option if you want to adopt OpenCL while targeting the biggest possible audience, and their SDK doesn't seem to be very good at all.
Stick with the standard and you will avoid both possible legal and performance issues, and your code will also be more portable.
The bottom line is that the AMD SDK includes a compiler for targeting x86 CPUs for OpenCL. That means that even though you are running an Intel CPU the generated code will run on it. It's the same concept as compiling a C program to run on an x86 CPU: it works on Intel and AMD CPUs (or any that implement the x86 instruction set).
The vendor's compiler might have specific optimizations, like user827992 mentions, but in my experience the performance of AMD's CPU compiler isn't that bad when running on an Intel CPU. I haven't tried Intel's OpenCL implementation.
It is true that for some (maybe most in the future) hardware, only the vendor's compiler will support it. AMD's SDK won't build code that will run on an NVIDIA card, and vice-versa. CPUs happen to be a bit of a special case in that the basic instruction set is so widely deployed that the CPU compiler will work on most machines you're likely to come in contact with.
