Libm optimised for ARM? - math

Is there a libm (libmath) that is optimised for ARM(v6) processors?
I was looking at the GNU implementation and it doesn't seem to be optimised for ARM (although it does have x86-specific code). It seems that most implementations of libm do not have ARM-specific optimisations.

The closest option I found was the libm for ARM in the AOSP (Android) repositories, or in the manufacturer's repositories if they provide a version of Android optimized for your CPU.

ARM's developer site lists many math libraries. Which one fits depends on your application.
https://developer.arm.com/solutions/hpc/hpc-software/categories/math-libraries

Related

Can Julia run on SPARC Solaris?

I like Linux but I have spare capacity on an enterprise class SPARC Solaris platform. I'm just wondering if anyone has tried running Julia there before as it doesn't seem to be a supported OS.
No, Julia does not run on SPARC Solaris. Supported platforms are x86 (Linux, Windows, Mac, FreeBSD), ARM and Power8-LE. A SPARC port would not be too difficult, but it would need to be done by someone who cares about that platform and has access to the relevant hardware. Unfortunately, that does not describe most of the current developers and contributors.
Not yet, but future interest might also come from wanting to use Julia on FPGAs in combination with open softcores such as Leon, a SPARC architecture already supported by LLVM.

How OpenMP differs from OpenCL when it comes to GPGPU?

When a program is run on a GPGPU, how would its execution differ if implemented with OpenMP vs OpenCL?
Does OpenMP utilize GPGPUs through OpenCL?
If not, what is the common GPGPU API underneath them that I can use directly (without any OpenMP/OpenCL built on top of it)?
P.S. On Linux, OpenMP uses just pthread to manage threads. I couldn't find any other GPGPU API besides OpenCL and CUDA, so it seems obvious (though pretty painful to admit) that OpenMP, when it comes to GPGPU, uses OpenCL (or CUDA, if the GPGPU is from NVIDIA and OpenMP is that smart).
As far as I know, OpenMP is a set of compiler directives that provide parallelism on shared-memory architectures, and a GPGPU is generally NOT one of those.
You can use both together to achieve better performance, or you can use OpenACC, OpenHMPP or C++ AMP, which can more or less substitute for them. You can also use libraries such as AMD Bolt or ArrayFire, which let you use a GPGPU without much effort.

OpenCL for custom systems on SoC prototyping board

Is it possible to run OpenCL on a system designed by a user on a SoC prototyping board? To be more specific, I have a ZedBoard (Xilinx Zynq) that has Dual ARM cores and a Programmable Logic (PL) Area. If I design a simple system of my own that has a video processing accelerator implemented in the logic area, an ARM core and an AXI interconnect, what do I have to do to provide OpenCL support for this simple system? (In this simple system, the ARM core could be the "Host" and the video processing accelerator could be the "device").
I am a student and I have only some basic knowledge of OpenCL. I have researched my question and have only ended up confusing myself. What has to be done to provide OpenCL support for a SoC? I understand that this may be a big project, but I need a guideline on where to start and how to proceed.
"What do I have to do to provide OpenCL support for this simple system?"
Implement an OpenCL platform that makes use of either your ARM CPU or the FPGA (or both). I'd say that is pretty much impossible for you: ARM would surely offer one for the CPU if it were easy (and they definitely have the financial means to employ capable engineers and computer scientists), and implementing accelerators on an FPGA requires in-depth knowledge of FPGA development, as well as compiler theory and experience in systems design. I don't want to sound mean, but you seem to have none of these three.
You asked where to get started; I recommend just writing a first accelerator that e.g. adds up a vector of numbers; as soon as you have that, you will have a clearer idea of your task.
If you want to have a look at a reference: the Ettus USRP E310 is a Zynq-based SDR device. Ettus has a technology called RFNoC, which allows users to write their own blocks to push data through. Notice that this took quite a few engineers and quite some time to get started. Notice further that it's much easier than implementing something that converts OpenCL to FPGA implementations.
If you have access to the Xilinx tools: Vivado HLS 15.1 System Edition should compile OpenCL kernels. This will also be included in the SDAccel tool suite.
Source: UG973: Vivado Design Suite User Guide: Release Notes, Installation, and Licensing
An alternative might be switching to Altera. They provide some good examples for the Altera Cyclone V SoC, which is comparable to the Xilinx Zynq devices (it also includes an ARM Cortex-A9):
Altera SDK for OpenCL
I am also a student, and my current project is going in a similar direction. I have successfully installed an OpenCL implementation called pocl on the ZedBoard, and it detects the ZedBoard's ARM CPU. To install pocl you need LLVM and a horde of other things, but the basic steps to get pocl running on the ZedBoard are given below.
Installing pocl:
http://www.hosseinabady.com/install-pocl-opencl
Running an example:
http://www.hosseinabady.com/embedded-system-by-examples/opencl_embedded_system/opencl-vector-addition
There are many dependencies, but most can be resolved easily.
For LLVM, make sure you install version 3.4 for pocl 0.9.
Steps to install LLVM:
https://github.com/pacs-course/pacs/wiki/Instructions-to-install-clang-3.1-on-ubuntu-12.04.1-and-12.10
pocl 0.9 is working for me. During the installation you will run into many other missing dependencies, such as hwloc, the Mesa libraries, the OpenGL/OpenCL headers and ICD loaders. I hope you can resolve them yourself, as the list is too long to post on Stack Overflow.
Getting your FPGA detected as an OpenCL device is not going to be trivial. You can refer to this question I posted on GitHub:
https://github.com/pocl/pocl/issues/285
and also to a research paper by Hosseinabady, found on the publications page of the pocl website:
http://pocl.sourceforge.net/publications.html
Hope this helps.
Try the ARM OpenCL SDK. The ZedBoard has an ARM Cortex-A9 CPU, which should have a NEON SIMD vector unit (http://www.arm.com/products/processors/technologies/neon.php) that can run OpenCL. See http://www.arm.com/products/multimedia/mali-technologies/opencl-for-neon.php.
The Zedboard isn't listed as an OpenCL conformant platform https://www.khronos.org/conformance/adopters/conformant-products#opencl.
So there is a chance the ARM driver will not work.
Good luck!
If still relevant, try this paper: OpenCL on ZYNQ [PDF]
Also note that the Zynq-7000 is listed on https://www.khronos.org/conformance/adopters/conformant-products#opencl (OpenCL_1_0), hence the compatibility.

How to let OpenCl see intel and nvidia devices?

I wonder how I can have OpenCL "see" my K20, Xeon, and Xeon Phi at the same time.
I'm especially confused about the use of two libraries here (from NVIDIA and Intel).
How can this be done, if it is possible at all?
The OpenCL Installable Client Driver (ICD) loader takes care of this for you. The loader is the same regardless of whose implementation you have installed, and it exposes all implementations as separate OpenCL "Platforms".
When you call clGetPlatformIDs it will tell you how many platforms you have installed. There could be one for AMD, one for NVIDIA, and one for Intel, for example.
Then within each platform you call clGetDeviceIDs which will return the number of devices within that platform. On your NVIDIA platform you'll find your K20, and within your Intel platform you'll find your Xeon CPU and Xeon Phi co-processor.
If you build or download the clInfo utility you'll see a nice dump of all the installed platforms and devices and the capabilities of each.
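As a rough illustration of the two calls described above, here is a minimal sketch that drives the ICD loader from Python through ctypes. It assumes the loader can be found with ctypes.util.find_library("OpenCL"), and the helper names (list_opencl_devices and the underscore-prefixed functions) are made up for this example. If no loader is installed it simply returns an empty list.

```python
import ctypes
import ctypes.util

# Constants from the OpenCL headers (CL/cl.h).
CL_SUCCESS = 0
CL_PLATFORM_NAME = 0x0902
CL_DEVICE_NAME = 0x102B
CL_DEVICE_TYPE_ALL = 0xFFFFFFFF


def _load_opencl():
    """Load the ICD loader library, or return None if it is not available."""
    name = ctypes.util.find_library("OpenCL")  # assumption: loader is discoverable
    if name is None:
        return None
    try:
        cl = ctypes.CDLL(name)
        handles = ctypes.POINTER(ctypes.c_void_p)
        count = ctypes.POINTER(ctypes.c_uint32)
        cl.clGetPlatformIDs.argtypes = [ctypes.c_uint32, handles, count]
        cl.clGetDeviceIDs.argtypes = [ctypes.c_void_p, ctypes.c_uint64,
                                      ctypes.c_uint32, handles, count]
        for query in (cl.clGetPlatformInfo, cl.clGetDeviceInfo):
            query.argtypes = [ctypes.c_void_p, ctypes.c_uint32, ctypes.c_size_t,
                              ctypes.c_void_p, ctypes.POINTER(ctypes.c_size_t)]
    except (OSError, AttributeError):
        return None
    return cl


def _get_handles(call):
    """Two-step query: ask for the count first, then fetch that many handles."""
    count = ctypes.c_uint32(0)
    if call(0, None, ctypes.byref(count)) != CL_SUCCESS or count.value == 0:
        return []
    handles = (ctypes.c_void_p * count.value)()
    if call(count.value, handles, None) != CL_SUCCESS:
        return []
    return list(handles)


def _get_name(query, handle, param):
    """Read a string property through clGetPlatformInfo or clGetDeviceInfo."""
    size = ctypes.c_size_t(0)
    if query(handle, param, 0, None, ctypes.byref(size)) != CL_SUCCESS:
        return "<unknown>"
    buf = ctypes.create_string_buffer(size.value)
    if query(handle, param, size.value, buf, None) != CL_SUCCESS:
        return "<unknown>"
    return buf.value.decode(errors="replace")


def list_opencl_devices():
    """Return [(platform_name, [device_name, ...]), ...] for every platform."""
    cl = _load_opencl()
    if cl is None:
        return []
    result = []
    platforms = _get_handles(lambda n, buf, cnt: cl.clGetPlatformIDs(n, buf, cnt))
    for platform in platforms:
        platform_name = _get_name(cl.clGetPlatformInfo, platform, CL_PLATFORM_NAME)
        devices = _get_handles(
            lambda n, buf, cnt: cl.clGetDeviceIDs(
                platform, CL_DEVICE_TYPE_ALL, n, buf, cnt))
        names = [_get_name(cl.clGetDeviceInfo, d, CL_DEVICE_NAME) for d in devices]
        result.append((platform_name, names))
    return result


if __name__ == "__main__":
    for platform_name, device_names in list_opencl_devices():
        print(platform_name, device_names)
```

On a machine set up like the one in the question, you would expect one entry for the NVIDIA platform and one for the Intel platform, each with its own device list. A real C or C++ host program makes the same two calls directly.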
The problem is solved.
Looking at the key directory:
/etc/OpenCL/vendors/*.icd
I noticed that for NVIDIA the library in use was a link that was duplicated in different places, with the copies pointing to two different releases.
I just replaced the older one with the most recent one, the one I had installed recently, and it worked.
I guess OpenCL did not know which one to use.
It looks like the installation location changed between the two NVIDIA versions.
I was supposed to have removed the old version before reinstalling, but that had not actually happened.
Thank you all for your help.
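For anyone hitting the same problem, the manual check of the vendors directory can be sketched in a few lines of Python. This assumes the usual Linux convention that each .icd file is a small text file holding the name or absolute path of the vendor library. The function names read_icd_entries and dangling_entries are invented for this example, and bare library names (which the dynamic linker resolves) are only reported, not resolved.

```python
import glob
import os


def read_icd_entries(vendor_dir="/etc/OpenCL/vendors"):
    """Return (icd_file, library, resolved_path) for each *.icd file.

    resolved_path is the symlink-free target when the entry is an absolute
    path that exists, and None otherwise.
    """
    entries = []
    for icd_path in sorted(glob.glob(os.path.join(vendor_dir, "*.icd"))):
        with open(icd_path, encoding="utf-8", errors="replace") as handle:
            library = handle.read().strip()
        resolved = None
        if os.path.isabs(library):
            # Follow links so that two links to the same release compare equal.
            real = os.path.realpath(library)
            if os.path.exists(real):
                resolved = real
        entries.append((os.path.basename(icd_path), library, resolved))
    return entries


def dangling_entries(entries):
    """Names of .icd files whose absolute library path does not exist."""
    return [name for name, library, resolved in entries
            if os.path.isabs(library) and resolved is None]


if __name__ == "__main__":
    found = read_icd_entries()
    for name, library, resolved in found:
        print(f"{name}: {library} -> {resolved}")
    print("dangling:", dangling_entries(found))
```

Two .icd files from the same vendor that resolve to different releases, or an entry that resolves to nothing, are the kind of leftovers described above.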

Opencl FFT libraries? Anything new or under the radar out there?

I googled this topic and didn't find anything new. I am aware of Nvidia's FFT implementation which is great, but for CUDA only. AMD just released their implementation, but it doesn't work on Nvidia cards. Apple has an older and slower implementation. Are there any other good FFT libraries out there? It would be nice if there was an implementation that was meant to work on Nvidia and AMD cards and other possible platforms and is being actively maintained.
The AMD clAmdFft library should work on NVidia GPUs.
I was involved in the development and I know that was the intention. The code was written to the OpenCL standard and doesn't use any proprietary tricks. Of course, AMD didn't do QA testing on NVidia hardware. It could be that NVidia's OpenCL implementation isn't quite 100% compliant to the standard yet. Or it could be something about your particular video card.
Please post more details here as to exactly what is happening. You should also post that information in the AMD developer forums as a bug.
AMD recently released an OpenCL SDK for their CPUs as well as GPUs. Included in it are FFT and BLAS libraries. You can go to the bottom of the page to find out about the supported devices.
But I am not really sure about the performance.
Not yet, but there is a project to port the GSL (GNU Scientific Library) to OpenCL:
http://gsl-cl.sourceforge.net/
I know Apple has released an OpenCL FFT package, but I don't know much about it. I've heard that they make the source available.
