I want to use the Raspberry Pi for signal processing, so I asked myself how to get the FFT computed very fast.
My idea is to use the GPU for the FFT. It's an old idea, but as I understand it, a working one.
My question is:
Does anyone know whether the new Raspberry Pi 3 supports OpenCL and/or other libraries for using the GPU?
Edit:
Maybe there's a "vendor" for ARM GPUs, like the open-source project for Intel GPU support on Linux (Beignet).
I don't think there is a way of running OpenCL on the RPi. It is definitely possible to use GPU intrinsics for FFT, though.
Here is an example of that.
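The linked example isn't reproduced here. To show why the FFT suits a GPU at all, here is an illustrative plain-Python sketch of an iterative radix-2 Cooley-Tukey FFT (not the linked code, and it runs on the CPU): each of the log2(N) stages consists of N/2 independent butterflies, which is the work a GPU FFT spreads across its cores.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Reorder the input into bit-reversed index order so butterflies work in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) stages. Inside one stage the n/2 butterflies touch disjoint pairs
    # of elements, so a GPU could run them all in parallel (one thread each).
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
        size *= 2
    return a
```

On the GPU, the same stage structure is kept; only the inner loops become parallel threads, with a synchronization point between stages.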
Is it possible to enable OpenCL on an A10-7800 without using it for the X server? I have a Linux box that I use for GPGPU programming. A discrete GeForce 740 card is used both for the X server and for running the OpenCL and CUDA programs I develop. I would also like the option of running OpenCL code on the APU's integrated GPU cores.
Everything I've read so far implies that if I want to use the APU for OpenCL, I have to install Catalyst and, AFAIK, that means using it for the X server. Is this true? Would there be an advantage to using the APU for my X server and using the GeForce solely for GPGPU code?
I had a similar goal, so I built a system with an AMD APU (4 regular CPU cores + 6 GPU cores) and an Nvidia discrete graphics board. Unfortunately it wasn't easy to make it work: I asked a question on the Ask Ubuntu forum, didn't get any answers, experimented a lot with the hardware and software setup, and finally posted my own answer to my question.
I'll describe my setup again here; who knows what might happen to my self-answered question on Ask Ubuntu?
First, I had to enable the integrated graphics hardware via a BIOS flag. On my motherboard (ASUS A88X-PRO) this flag is called IGFX Multi-Monitor.
The second step was to find the right mix of a low-level graphics driver and a high-level OpenCL implementation. The low-level driver for AMD processors is called AMD Catalyst, and its file name is fglrx. I didn't install this driver from the Ubuntu Software Center; instead I used version 15.302, downloaded directly from the AMD site. I had to install a significant number of prerequisites for this driver. The most important finding was that I had to skip running the aticonfig command after the fglrx installation: this command configures the X server to use this driver for graphics output, and I didn't want that.
Then I installed the AMD APP SDK version 3.0 (release 130.136; earlier releases didn't work with my fglrx), which is AMD's OpenCL implementation. The clinfo command now reports both CPU and GPU devices with the correct number of cores.
So I now have a hybrid AMD processor supported by OpenCL, with all graphics output handled by a discrete graphics card with an Nvidia processor.
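clinfo can only report platforms that the OpenCL ICD loader finds. On Linux, the common ICD loaders look in /etc/OpenCL/vendors for *.icd files, each naming one vendor's OpenCL library, so that directory is a quick place to check what got registered after installing drivers and SDKs. A minimal sketch, assuming that default directory and the one-library-per-file convention:

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_name} for every *.icd file in vendors_dir.

    Assumes the usual ICD-loader convention: each .icd file contains the file
    name (or full path) of one vendor's OpenCL library.
    """
    d = Path(vendors_dir)
    if not d.is_dir():
        return {}
    icds = {}
    for f in sorted(d.glob("*.icd")):
        lines = f.read_text().strip().splitlines()
        icds[f.name] = lines[0].strip() if lines else ""
    return icds


if __name__ == "__main__":
    for name, lib in list_opencl_icds().items():
        print(f"{name}: {lib}")
```

If the AMD entry is missing here, clinfo won't show the APU no matter how the X server is configured.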
Good luck!
I maintain a Linux server (OpenSUSE, but the distribution shouldn't matter) containing both an NVIDIA and a (discrete) AMD GPU. It's headless, so technically I do not know whether the X server will create additional problems, but I don't think so. You can always configure xorg.conf to use exactly the driver you want. Alternatively, install Catalyst but delete the X server driver file itself, which is not the part you need for OpenCL.
There is one problem with a mixed-vendor system that I noticed, however: AMD's OpenCL driver (ICD) will go spelunking for a libGL.so library, I guess in order to do OpenCL/OpenGL interop. If it finds any of the NVIDIA-supplied libGL.so's, it will get confused and hang, at least on my machine. I "solved" this by deleting all libGL.so's (I do not need them on a headless compute server), but that might not be an acceptable solution for you. Maybe you can arrange things so that the AMD-supplied libGL.so's take precedence, possibly by installing the AMD driver last.
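If you try the "let the AMD libGL.so win" route, it helps to see which libGL.so files sit where. The sketch below only lists libGL.so* files across a given directory order, which roughly approximates the loader's search; the real dynamic linker also consults RPATH/RUNPATH, LD_LIBRARY_PATH and the ldconfig cache, and the default directory list here is an assumption to adjust for your distribution.

```python
import os
from pathlib import Path


def find_libgl(search_dirs):
    """List libGL.so* files found in search_dirs, in the order given.

    Only an approximation of dynamic-loader resolution (no RPATH/RUNPATH,
    no ld.so.cache), meant for spotting which vendor's copy comes first.
    """
    hits = []
    for d in search_dirs:
        p = Path(d)
        if p.is_dir():
            hits.extend(sorted(str(f) for f in p.glob("libGL.so*")))
    return hits


if __name__ == "__main__":
    env_dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
    # Assumed default library directories; adjust for your distribution.
    defaults = ["/usr/lib/x86_64-linux-gnu", "/usr/lib64", "/usr/lib"]
    for path in find_libgl(env_dirs + defaults):
        print(path)
```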
Is it possible to run OpenCL on a system designed by a user on a SoC prototyping board? To be more specific, I have a ZedBoard (Xilinx Zynq) that has dual ARM cores and a Programmable Logic (PL) area. If I design a simple system of my own that has a video processing accelerator implemented in the logic area, an ARM core and an AXI interconnect, what do I have to do to provide OpenCL support for this simple system? (In this simple system, the ARM core could be the "host" and the video processing accelerator could be the "device".)
I am a student and I have only some basic knowledge of OpenCL. I have researched my question and only ended up confusing myself. What has to be done to provide OpenCL support for an SoC? I understand that this may be a big project, but I need a guideline on where to start and how to proceed.
what do I have to do to provide OpenCL support for this simple system?
Implement an OpenCL platform that makes use of either your ARM CPU or the FPGA (or both). I'd say that is pretty much impossible for you: ARM would surely offer one for the CPU if it were easy (and they definitely have the financial means to employ capable engineers and computer scientists), and implementing accelerators on an FPGA requires in-depth knowledge of FPGA development, as well as compiler theory and experience in systems design. I don't want to sound mean, but you seem to have none of these three.
You asked where to get started; I recommend just writing a first accelerator that e.g. adds up a vector of numbers; as soon as you have that, you will have a clearer idea of your task.
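Before touching the FPGA, it helps to have a software reference model to check the accelerator's output against. Below is a minimal sketch of summing a vector the way an OpenCL-style device typically would: each work-group reduces its slice with a pairwise tree in "local memory", and the host adds the per-group partial sums. The function name and the default work-group size are hypothetical.

```python
def reduce_sum_model(values, local_size=8):
    """Software model of a two-level parallel sum.

    Stage 1: each work-group loads local_size elements (zero-padded at the end)
    and reduces them with a pairwise tree in log2(local_size) steps.
    Stage 2: the host adds up the per-group partial sums.
    local_size must be a power of two.
    """
    if local_size <= 0 or local_size & (local_size - 1):
        raise ValueError("local_size must be a power of two")
    n = len(values)
    num_groups = (n + local_size - 1) // local_size  # round up
    partials = []
    for group_id in range(num_groups):
        base = group_id * local_size
        local = [values[base + i] if base + i < n else 0 for i in range(local_size)]
        stride = local_size // 2
        while stride > 0:
            # All work-items with local_id < stride could do this step in parallel.
            for local_id in range(stride):
                local[local_id] += local[local_id + stride]
            stride //= 2
        partials.append(local[0])
    return sum(partials)
```

The same structure (slice, tree-reduce, combine) is what you would then implement in hardware, and this model gives you expected results to compare against.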
If you want to look at a reference: the Ettus USRP E310 is a Zynq-based SDR device. Ettus has a technology called RFNoC, which allows users to write their own blocks to push data through. Note that this took quite a few engineers and quite some time to get going. Note further that it's much easier than implementing something that converts OpenCL to FPGA implementations.
If you have access to the Xilinx tools: Vivado HLS 15.1 System Edition should compile OpenCL kernels. This will also be included in the SDAccel tool suite.
Source: UG973: Vivado Design Suite User Guide: Release Notes, Installation, and Licensing
An alternative might be switching to Altera. They provide some good examples for the Altera Cyclone V SoC, which is comparable to Xilinx Zynq devices (it also includes an ARM Cortex-A9):
Altera SDK for OpenCL
I am also a student, and my current project is heading in a similar direction. I have successfully installed pocl, an OpenCL implementation, on the ZedBoard, and it detects the ZedBoard's ARM CPU. To install pocl you need LLVM and a horde of other things as well, but the basic steps to get pocl up on the ZedBoard are given below:
Installing pocl:
http://www.hosseinabady.com/install-pocl-opencl
Running an example:
http://www.hosseinabady.com/embedded-system-by-examples/opencl_embedded_system/opencl-vector-addition
There are lots of dependencies, but they can be resolved easily. For LLVM, make sure you install version 3.4 for pocl 0.9.
Steps to install LLVM:
https://github.com/pacs-course/pacs/wiki/Instructions-to-install-clang-3.1-on-ubuntu-12.04.1-and-12.10
pocl 0.9 is working successfully for me. During the installation you will run into many other missing dependencies, such as hwloc, Mesa libraries, OpenGL/OpenCL headers and ICD loaders; I hope you can resolve them, as the list is too long to post on Stack Overflow.
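Since a mismatched LLVM is the easiest way to lose an afternoon here, a small check before building may help. This is a minimal sketch that parses the output of `llvm-config --version` and compares it with the 3.4 requirement mentioned above for pocl 0.9; the helper names are hypothetical.

```python
import re
import subprocess


def parse_llvm_version(text):
    """Extract (major, minor) from `llvm-config --version` output, e.g. '3.4.2'."""
    m = re.match(r"\s*(\d+)\.(\d+)", text)
    if not m:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(m.group(1)), int(m.group(2))


def llvm_ok_for_pocl_0_9(text):
    """True if the version string is an LLVM 3.4.x release."""
    return parse_llvm_version(text) == (3, 4)


def installed_llvm_version(llvm_config="llvm-config"):
    """Run `llvm-config --version` (must be on PATH) and return (major, minor)."""
    out = subprocess.run([llvm_config, "--version"],
                         capture_output=True, text=True, check=True).stdout
    return parse_llvm_version(out)
```

On the board, `installed_llvm_version() == (3, 4)` should hold before you start building pocl 0.9.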
Getting your FPGA detected as an OpenCL device is not going to be trivial; you can refer to this issue I posted on GitHub:
https://github.com/pocl/pocl/issues/285
and also to a research paper by Hosseinabady, listed on the publications page of the pocl website:
http://pocl.sourceforge.net/publications.html
Hope this helps.
Try the ARM OpenCL SDK. The ZedBoard has an ARM Cortex-A9 CPU, which should have a NEON SIMD vector unit (http://www.arm.com/products/processors/technologies/neon.php) that can run OpenCL. See http://www.arm.com/products/multimedia/mali-technologies/opencl-for-neon.php.
The Zedboard isn't listed as an OpenCL conformant platform https://www.khronos.org/conformance/adopters/conformant-products#opencl.
So there is a chance the ARM driver will not work.
Good luck!
If it's still relevant, try this paper: OpenCL on ZYNQ [PDF]
Also note that Zynq-7000 is listed on https://www.khronos.org/conformance/adopters/conformant-products#opencl (OpenCL_1_0), hence the compatibility.
The peak GFLOPS of the cores of the desktop i7-4770K @ 4 GHz is 4 GHz * 8 (AVX single-precision lanes) * 4 (two FMA units, 2 FLOPs each) * 4 cores = 512 GFLOPS. But the latest Intel IGP (Iris 5100 / Iris Pro 5200) has a peak of over 800 GFLOPS. Some algorithms will therefore run even faster on the IGP, and combining the cores with the IGP would be better still. Additionally, the IGP keeps taking up more of the silicon: the Iris graphics now take up over 30% of it. It seems clear which direction Intel desktop processors are headed.
As far as I have seen, however, the Intel IGP is mostly ignored by programmers, with the exception of OpenCL/OpenGL. I'm curious how one can program the Intel HD Graphics hardware for compute (e.g. SGEMM) without OpenCL.
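The peak-throughput arithmetic above, spelled out as a small sketch. The "4" FMA factor is read here as 2 FMA units per Haswell core * 2 FLOPs per FMA; the IGP figures (40 execution units, 16 single-precision FLOPs per EU per cycle, 1.3 GHz) are assumptions for an Iris Pro 5200, not numbers taken from the question.

```python
def cpu_peak_gflops(clock_ghz, cores, simd_lanes, fma_units, flops_per_fma=2):
    """Peak = clock * cores * SIMD lanes * FMA units per core * FLOPs per FMA."""
    return clock_ghz * cores * simd_lanes * fma_units * flops_per_fma


def igp_peak_gflops(clock_ghz, execution_units, flops_per_eu_per_cycle):
    """Peak = clock * number of EUs * FLOPs each EU retires per cycle."""
    return clock_ghz * execution_units * flops_per_eu_per_cycle


# i7-4770K at 4 GHz: 8 single-precision lanes per 256-bit AVX register,
# 2 FMA units per core, 2 FLOPs per FMA.
cpu = cpu_peak_gflops(4.0, cores=4, simd_lanes=8, fma_units=2)  # 512.0
# Assumed Iris Pro 5200 figures: 40 EUs, 16 FLOPs/EU/cycle, 1.3 GHz.
igp = igp_peak_gflops(1.3, execution_units=40, flops_per_eu_per_cycle=16)
print(f"CPU peak: {cpu:.0f} GFLOPS, IGP peak: {igp:.0f} GFLOPS")
```

These are theoretical peaks; memory bandwidth usually decides how close a real SGEMM gets.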
Added comment:
There is no Intel support for HD Graphics and OpenCL on Linux. I found Beignet, which is an open-source attempt to add Linux support, at least for Ivy Bridge HD Graphics. I have not tried it. Presumably the people developing Beignet know how to program the HD Graphics hardware without OpenCL, then.
Keep in mind that there is a performance hit to copy the data to the video card and back, so this must be taken into account. AMD is close to releasing APU chips that have unified memory for the CPU and GPU on the same die, which will go a long way towards alleviating this problem.
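A back-of-the-envelope way to account for that copy cost: offloading only pays off if the transfer time plus the GPU compute time beats the CPU time. The sketch below uses hypothetical bandwidth and timing figures; on an APU with unified memory, the transfer term shrinks or disappears.

```python
def offload_time_s(n_bytes_in, n_bytes_out, bus_gb_per_s, gpu_compute_s):
    """Time to copy inputs to the GPU, compute there, and copy results back."""
    transfer_s = (n_bytes_in + n_bytes_out) / (bus_gb_per_s * 1e9)
    return transfer_s + gpu_compute_s


def worth_offloading(cpu_compute_s, n_bytes_in, n_bytes_out, bus_gb_per_s, gpu_compute_s):
    """True if copy + GPU compute is faster than just computing on the CPU."""
    return offload_time_s(n_bytes_in, n_bytes_out, bus_gb_per_s, gpu_compute_s) < cpu_compute_s


# Hypothetical job: 256 MiB in, 256 MiB out, over an assumed 8 GB/s bus.
mib_256 = 256 * 2**20
print(worth_offloading(0.05, mib_256, mib_256, 8.0, 0.01))  # copies dominate
print(worth_offloading(1.00, mib_256, mib_256, 8.0, 0.01))  # compute dominates
```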
Before CUDA and OpenCL, the way to use the GPU was to represent the memory to be operated on as a texture using DirectX or OpenGL. Thank goodness we don't have to do that anymore!
AMD is really pushing the APU / OpenCL model, so more programs should take advantage of the GPU via OpenCL - if the performance trade off is there. Currently, GPU computing is a bit of a niche market relegated to high performance computing or number crunching that just isn't needed for web browsing and word processing.
It doesn't make sense any more for vendors to let you program using low-level ISA.
It's very hard and most programmers won't use it.
It keeps them from adjusting the ISA in future revisions.
So programmers use a language (like C99 in OpenCL) and the runtime does ISA-specific optimizations right on the user's machine.
An example of what this enables: AMD switched from VLIW vector machines to scalar machines and existing kernels still ran (most ran faster). You couldn't do this if you wrote ISA directly.
Programming a coprocessor like Iris without OpenCL is rather like driving a car without the steering wheel.
OpenCL is designed to expose the parallelism that Iris needs to achieve its theoretical performance. You can't just spawn hundreds of threads or processes on it and expect performance. Having blocks of threads doing the same thing, at the same time, on nearby memory addresses is the whole crux of the matter.
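To make "nearby memory addresses" concrete: GPUs generally serve a group of threads with as few wide memory transactions as possible, so contiguous access is cheap and strided access is not. The sketch counts how many aligned segments one group of work-items touches; the 64-byte segment size and the 16-work-item group are assumptions, and the exact rules differ between GPUs.

```python
def segments_touched(addresses, segment_bytes=64):
    """Number of distinct aligned memory segments hit by a group of accesses."""
    return len({addr // segment_bytes for addr in addresses})


def access_addresses(num_work_items, stride_elems, elem_bytes=4, base=0):
    """Byte address read by each work-item: base + id * stride * elem_bytes."""
    return [base + i * stride_elems * elem_bytes for i in range(num_work_items)]


# 16 work-items reading consecutive floats: all fall in one 64-byte segment.
contiguous = segments_touched(access_addresses(16, stride_elems=1))
# The same 16 work-items reading every 16th float: 16 separate segments.
strided = segments_touched(access_addresses(16, stride_elems=16))
print(contiguous, strided)
```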
Maybe you can think of a better paradigm than OpenCL for achieving that goal, but until you do, I suggest you try learning some OpenCL. If you are into Python, PyOpenCL is a great place to start.
I'm learning OpenCL and I have a compatible x86 CPU, but my GPU doesn't support OpenCL at all.
So when I call the clGetDeviceIDs API, it returns nothing.
Since I'm just learning this framework and I'm not looking for optimization or higher performance, is it necessary to get a new system to run OpenCL programs on my platform?
Thanks in advance :)
http://www.acooke.org/cute/Developing0.html describes how I worked with a CPU (only) a few years ago. Basically, the AMD OpenCL driver worked with my Intel CPU.
I googled this topic and didn't find anything new. I am aware of Nvidia's FFT implementation which is great, but for CUDA only. AMD just released their implementation, but it doesn't work on Nvidia cards. Apple has an older and slower implementation. Are there any other good FFT libraries out there? It would be nice if there was an implementation that was meant to work on Nvidia and AMD cards and other possible platforms and is being actively maintained.
The AMD clAmdFft library should work on NVidia GPUs.
I was involved in the development and I know that was the intention. The code was written to the OpenCL standard and doesn't use any proprietary tricks. Of course, AMD didn't do QA testing on NVidia hardware. It could be that NVidia's OpenCL implementation isn't quite 100% compliant with the standard yet. Or it could be something about your particular video card.
Please post more details here about exactly what is happening. You should also report it as a bug in the AMD developer forums.
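When reporting whether an FFT library produces correct results on a given card, a useful detail is how far its output is from a slow reference DFT on small inputs. A minimal sketch of such a check; how you obtain the library's output is left out, and the tolerance you accept is your call.

```python
import cmath


def naive_dft(x):
    """O(N^2) reference DFT: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def max_relative_error(result, reference):
    """Largest |result - reference|, relative to the largest reference magnitude."""
    if len(result) != len(reference):
        raise ValueError("length mismatch")
    scale = max((abs(v) for v in reference), default=0.0) or 1.0
    return max((abs(r - e) for r, e in zip(result, reference)), default=0.0) / scale
```

For single-precision GPU output, errors on the order of 1e-6 relative are normal; errors near 1 point to a real bug rather than rounding.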
AMD recently released an OpenCL SDK for their CPUs as well as GPUs. Included in it are FFT and BLAS libraries. You can go to the bottom of the page to find out about the supported devices.
But I am not really sure about the performance.
Not yet, but there is a project to port GSL (the GNU Scientific Library) to OpenCL:
http://gsl-cl.sourceforge.net/
I know Apple has released an OpenCL FFT package, but I don't know much about it. I've heard that they make the source available.