Does OpenMP 4 use OpenCL? - opencl

The new OpenMP 4 specification adds support for accelerators such as GPGPUs.
Is OpenMP 4 implemented on top of OpenCL for this kind of task?

In some cases, yes. See TI's OpenMP Accelerator Model runtime implementation as an example. This is due to the fact that OpenCL is a mature standard with significant infrastructure. OpenMP accelerator support is new in version 4.0, so vendors are scrambling to add support.
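For illustration, here is a minimal sketch of what OpenMP 4.0 accelerator code looks like from the programmer's side. The function name `saxpy_offload` is hypothetical and not taken from any vendor runtime. The point is that the source only contains `target` directives; whether the compiler and runtime implement them on top of OpenCL, CUDA, or something else is an implementation detail that is not visible here. If the compiler has no OpenMP support, the pragma is ignored and the loop simply runs serially on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): computes y = a * x + y.
// With an offload-capable OpenMP 4.0 compiler the loop may run on an accelerator;
// otherwise it falls back to the host.
void saxpy_offload(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const float* xp = x.data();
    float* yp = y.data();

    // map(to: ...) copies the input to the device;
    // map(tofrom: ...) copies y to the device and back to the host afterwards.
    #pragma omp target teams distribute parallel for map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Calling `saxpy_offload(2.0f, x, y)` with two equally sized vectors updates `y` in place. With GCC or Clang you would typically build with `-fopenmp` plus whatever offload flags your toolchain needs; those flags depend on the vendor and are not shown here.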

Related

What hardware acceleration is supported by Google's ML Kit?

I would appreciate clarity around hardware acceleration supported for ML Kit. Some applications make an explicit mention that models are run on CPU, implying that there can be other modes of acceleration. Using GPU via something like OpenCL seems like a natural way of doing so.
I would like to know whether Google is able and willing to master OpenCL for machine learning applications.
Currently, all ML Kit features run on the CPU to be compatible with all devices. We are adding GPU / NNAPI support for features, and will update them in future releases.

How does OpenMP differ from OpenCL when it comes to GPGPU?

When a program is run on a GPGPU, how would its execution differ if implemented with OpenMP vs. OpenCL?
Does OpenMP utilize GPGPUs through OpenCL?
If not, what's the common GPGPU API underneath them that I can use directly (without any OpenMP/OpenCL built on top of it)?
P.S. On Linux, OpenMP just uses pthreads to manage threads. I couldn't find any GPGPU API other than OpenCL and CUDA, so it seems obvious (though painful to admit) that OpenMP, when it comes to GPGPU, uses OpenCL (or CUDA if the GPGPU is from NVIDIA and OpenMP is that smart).
As far as I know, OpenMP is a set of compiler directives that provide parallelism on shared-memory architectures, and a GPGPU is generally NOT such an architecture.
You can use both together to achieve better performance, or you can use OpenACC, OpenHMPP or C++ AMP, which can more or less substitute for them. You can also use libraries such as AMD Bolt or ArrayFire, which let you utilize a GPGPU without much effort.

Is There A Way To Upgrade to OpenCL 2.0?

There is a feature in OpenCL 2.0 that I would like to use.
I dual boot Ubuntu and Mac OS (Graphics: GTX 670 + HD Graphics 4600). Is it possible to install OpenCL 2.0?
This may be a dumb question - from what I have read, it seems like 2.0-compatible drivers may not be written yet? And also possibly my hardware will not support the new spec?
Basically, when will OpenCL 2.0 be easy/available?
You mentioned you have an NVidia GTX 670; you should note that NVidia's drivers only support OpenCL 1.1, not 1.2.
NVidia have(*) refrained over the past several years from updating their drivers to support the newer OpenCL standard, even though the hardware obviously supports it and CUDA has all the relevant functionality... so don't expect 2.0 to happen so soon on your hardware.
(*) - Due to being Evil™.
There are no OpenCL 2.0 drivers yet. The specification just became final yesterday. I don't believe any public statements have been made yet about when drivers will become available, and for what hardware. You'll have to wait for whatever fun new feature you wanted. Better yet, let your favorite vendors know that you'd like them to support OpenCL 2.0!
There is some support for OpenCL in Clang 3.0 and from the LLVM organisation.
See the Clang 3.0 release notes
http://llvm.org/releases/3.0/docs/ClangReleaseNotes.html
Here's an LLVM presentation on OpenCL
http://llvm.org/devmtg/2009-10/OpenCLWithLLVM.pdf
Here's another Stack Overflow answer on Clang 3.0 for OpenCL
How to use clang to compile OpenCL to ptx code?
So there are some good folks working on an open source version of OpenCL that compiles to PTX for NVIDIA cards. Not having used it and not being familiar with these efforts, I can't say whether there are plans to support the OpenCL 2.0 spec, or when they might get to it.

AMD APP OpenCL SDK on Intel

I have seen that the AMD APP SDK samples work on a machine that has only an Intel CPU.
How can this happen? How does the compiler target a different machine architecture?
Don't I need Intel's set of compilers to run the code on the Intel CPU?
I thought that to run an OpenCL application on specific hardware, I have to (re)compile it using the device vendor's own compiler.
Where is my understanding wrong?
Firstly, OpenCL is built to work on CPUs and GPUs. You can compile and run the same source code on either type of device. However, it's very likely that CPU code will be sub-optimal for a GPU and vice versa.
AMD hardware accounts for 7%-14% of all x86/x64 CPUs, so AMD must develop compilers for both AMD and Intel chips to stay relevant. AMD has a history of developing compilers for both sets of chips. Conversely, Intel has developed compilers that either don't work on AMD chips or don't work that well. That's no surprise.
With OpenCL, the AMD APP SDK is the most flexible: it works well on AMD and Intel CPUs and on AMD GPUs. Intel's OpenCL SDK doesn't even install on AMD x86 hardware.
If you compile an OpenCL program to binary, you can save and reuse it as long as it matches the OpenCL Platform and Device that created it. So, if you compile for one device and use on another you are very likely to get an error.
The power of OpenCL lies in abstracting the underlying hardware and offering massive, parallel and heterogeneous computing power.
Some SDKs and platforms offer specific features to "optimize" the code. I honestly think that such features are just marketing, and they introduce boilerplate code that makes the application less portable.
There are also some pseudo-new technologies that are just wrappers around OpenCL, or are really similar in concept, like Intel Quick Sync.
About Intel, I should say that at first they supported every Core i generation and even some Core 2 Duo chips; now the new SDK only supports the 3rd Core i generation. I honestly don't get their strategy. Intel is probably the last option if you want to adopt OpenCL and target the largest possible audience, and their SDK doesn't seem to be very good either.
Stick with the standard and you will avoid both possible legal and performance issues and your code will also be more portable.
The bottom line is that the AMD SDK includes a compiler for targeting x86 CPUs for OpenCL. That means that even though you are running an Intel CPU the generated code will run on it. It's the same concept as compiling a C program to run on an x86 CPU: it works on Intel and AMD CPUs (or any that implement the x86 instruction set).
The vendor's compiler might have specific optimizations, like user827992 mentions, but in my experience the performance of AMD's CPU compiler isn't that bad when running on an Intel CPU. I haven't tried Intel's OpenCL implementation.
It is true that for some (maybe most in the future) hardware, only the vendor's compiler will support it. AMD's SDK won't build code that will run on an NVIDIA card, and vice-versa. CPUs happen to be a bit of a special case in that the basic instruction set is so widely deployed that the CPU compiler will work on most machines you're likely to come in contact with.

Thread pool in Qt 4.3

Is there some way to use a thread pool with Qt 4.3? I know it has now been implemented in Qt 4.5, but is it somehow available in Qt 4.3 as well?
Get the first version of QtConcurrent from the Qt Labs project. This version of QtConcurrent is compatible with Qt 4.2, but 4.3 is recommended.
From Qt Labs ...
Qt Concurrent
Platforms: Windows, Linux, Mac
Qt version: 4.2 required, 4.3 recommended.
Qt Concurrent is a C++ template library for writing multi-threaded applications. Qt Concurrent provides high-level APIs that make it possible to write multi-threaded programs without using low-level threading primitives such as critical sections, mutexes or wait conditions. Programs written with Qt Concurrent automatically adjust the number of threads used according to the number of processor cores available. This means that applications written today will continue to scale when deployed on multi-core systems in the future. The library includes functional programming style APIs for parallel list processing, a MapReduce implementation for shared-memory (non-distributed) systems, and classes for managing asynchronous computations in GUI applications.
The code can be checked out with subversion:
svn checkout svn://labs.trolltech.com/svn/threads/qtconcurrent qtconcurrent
If you don't have svn, you can download a package instead.
You could get the 4.5 source code and rip it out from there. If they use their own API, it should be easy.
You can always use the plain pthreads API in C/C++ with Qt and implement your own thread pool.
Although you are probably looking for a solution that involves less work.
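To give an idea of how much work "your own thread pool" is, here is a rough sketch of a fixed-size pool. Note that it swaps in the C++11 standard library (`std::thread`, `std::mutex`, `std::condition_variable`) for the raw pthreads calls mentioned above, which assumes a much newer compiler than the ones typically used with Qt 4.3. The class name `SimpleThreadPool` and its `submit` method are made up for this example and are not part of Qt or pthreads.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool (illustration only, not a Qt API).
class SimpleThreadPool {
public:
    explicit SimpleThreadPool(std::size_t thread_count) {
        assert(thread_count > 0);
        for (std::size_t i = 0; i < thread_count; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    // The destructor lets queued tasks finish, then joins all workers.
    ~SimpleThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& worker : workers_) {
            worker.join();
        }
    }

    SimpleThreadPool(const SimpleThreadPool&) = delete;
    SimpleThreadPool& operator=(const SimpleThreadPool&) = delete;

    // Queue a task; one idle worker is woken to run it.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) {
                    return;  // stopping was requested and the queue is drained
                }
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool stopping_ = false;
};
```

Typical use is to create a pool with a few threads, call `submit` with lambdas or function objects, and let the pool go out of scope once all work has been queued. On Qt 4.3 itself, the same structure could presumably be written with `QThread`, `QMutex` and `QWaitCondition` in place of the standard library types.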
