Big difference in performance using an OpenVINO model on CPU vs Intel Neural Compute Stick 2

I own an Intel Neural Compute Stick 2 that I intend to use for running object detection networks.
After installing OpenVINO on my machine (Ubuntu 18.04), I tried running the object detection Python demo on a video. On the Intel stick I get around 7.5 frames per second, while on my laptop's Intel CPU it is much faster, at 44 frames per second.
Even though my laptop is a decent gaming laptop, I was surprised that processing on the Intel stick is so much slower. I plan to use the stick on another device, not my laptop, but I would like to understand why there is such a big difference in performance. Has anyone had a similar experience?

That is the expected performance of the Intel® Neural Compute Stick 2.
Check out the following discussions regarding the performance of Intel® Neural Compute Stick 2.
Raspberry Pi and Movidius NCS Face Recognition
Intel Neural Compute Stick 2 related tests
Battle of Edge AI — Nvidia vs Google vs Intel
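For a direct comparison, the same IR model can be loaded on either target and timed. Below is a minimal sketch using the classic OpenVINO `IECore` API; the model path is a placeholder, and this assumes a working OpenVINO install with the MYRIAD plugin for the stick:

```python
def load_on_device(model_xml, device="CPU"):
    """Load an OpenVINO IR network on the given device ("CPU" or "MYRIAD" for the NCS2)."""
    from openvino.inference_engine import IECore  # requires an OpenVINO installation
    ie = IECore()
    net = ie.read_network(model=model_xml)  # expects the .bin weights alongside the .xml
    return ie.load_network(network=net, device_name=device)
```

Timing the same inference loop against the executable network returned for `"CPU"` and for `"MYRIAD"` reproduces the gap you measured: the stick is a low-power accelerator, not a desktop-class processor.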

Related

GPU mxnetR windows10

I started GPU computing with mxnetR on Windows 10.
My simple question is whether mx.mlp with mx.gpu uses multiple cores on the GPU. It seems not to...
As a test, I also wrote a simple mx.mlp program with doParallel, but it does not seem to run on multiple cores; only one core's worth of GPU usage increased.
Please give me your ideas on how to use multiple GPU cores to get the most value out of GPU computing with mx.mlp and mx.gpu.
When running mxnet with GPU, mxnet will use many cores simultaneously by determining which math operations can be run in parallel.
A simple metric to reassure yourself that you're getting value-for-money from the GPU is to use the nvidia-smi command to watch GPU utilization.
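If you want to log utilization programmatically rather than watching the `nvidia-smi` terminal output, a small sketch like the following polls it via its standard query flags (it returns `None` when no NVIDIA driver/tool is present, so it degrades gracefully on machines without a GPU):

```python
import shutil
import subprocess

def gpu_utilization():
    """Return a list of GPU utilization percentages via nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA tooling on this machine
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.split() if line]
```

Calling this in a loop while training runs gives you a cheap utilization trace per GPU.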

Can't use more than 2 cores with microsoft R Open

I recently installed Microsoft R Open but this message appears at startup of R:
"Multithreaded BLAS/LAPACK libraries detected. Using 2 cores for math algorithms."
On a Mac it's supposed to start using all 4 cores without any additional setup.
How can I change this to 3 or 4 cores?
Thank you
A very common way to set up multicore processing in RRO is to use the method setMKLthreads() from the Intel Math Kernel Library (MKL). However, to the best of my knowledge, there is no OSX-compatible MKL version yet (see here for more information).
Another way to achieve multicore processing on OSX is mclapply() from the parallel package, which works similarly to the base-R lapply() (see the package's documentation here).
However, before you dig into this matter, I suggest checking whether you really have a CPU with more than 2 physical cores. For instance, there are Intel i5 processors with both 2 and 4 physical cores depending on the model, and a CPU with only 2 physical cores can present a higher number of virtual cores via hyper-threading. Since such i5 CPUs are frequently built into laptops, I think this could be the case if you are using a MacBook.
See also this SO question for further information: Virtual core vs Physical core
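To illustrate the distinction: the count an operating system reports is usually the *logical* (hyper-threaded) core count, which can be double the physical count. A quick check, shown here in Python for portability:

```python
import os

# Logical cores: what the OS schedules on; includes hyper-threads.
logical = os.cpu_count()
print(f"Logical cores: {logical}")

# The physical core count requires a platform-specific query; the
# third-party psutil package exposes it as:
#   import psutil; psutil.cpu_count(logical=False)
```

On a dual-core MacBook with hyper-threading this prints `Logical cores: 4` even though only 2 physical cores exist, which is exactly the confusion described above.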

I need an MPI Simulator

I want to know the performance of a C application using MPI on a computing cluster, but I only have one server, whose CPUs are Intel Xeon E5-2692 v2 with a Xeon Phi accelerator card. Are there any tools that can simulate this? I know one called MPI-SIM, but unfortunately it was written for the IBM SP2 and is a little old.

Programming Intel IGP (e.g. Iris Pro 5200) hardware without OpenCL

The peak GFLOPS of the cores for the desktop i7-4770K @ 4 GHz is 4 GHz × 8 (AVX floats) × 4 (FLOPs per cycle from the two FMA units) × 4 cores = 512 GFLOPS. But the latest Intel IGP (Iris Pro 5100/5200) has a peak of over 800 GFLOPS. Some algorithms will therefore run even faster on the IGP, and combining the cores with the IGP would be better still. Additionally, the IGP keeps eating up more silicon; the Iris Pro 5100 now takes up over 30% of the die. It seems clear which direction Intel desktop processors are headed.
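The peak-throughput arithmetic above can be checked in a couple of lines (the 4 GHz clock is the overclocked figure assumed in the question):

```python
clock_ghz = 4.0   # assumed overclocked i7-4770K
avx_width = 8     # 32-bit floats per 256-bit AVX register
fma_flops = 4     # 2 FMA units x 2 FLOPs (multiply + add) per FMA
cores = 4

peak_gflops = clock_ghz * avx_width * fma_flops * cores
print(peak_gflops)  # 512.0
```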
As far as I have seen the Intel IGP, however, is mostly ignored by programmers with the exception of OpenCL/OpenGL. I'm curious to know how one can program the Intel HD Graphics hardware for compute (e.g. SGEMM) without OpenCL?
Added comment:
There is no Intel support for HD Graphics OpenCL on Linux. I found Beignet, which is an open-source attempt to add support on Linux, at least for Ivy Bridge HD Graphics. I have not tried it. The people developing Beignet probably know how to program the HD Graphics hardware without OpenCL, then.
Keep in mind that there is a performance hit to copy the data to the video card and back, so this must be taken into account. AMD is close to releasing APU chips that have unified memory for the CPU and GPU on the same die, which will go a long way towards alleviating this problem.
The way the GPU used to be utilized before CUDA and OpenCL was to represent the memory to be operated on as a texture, using DirectX or OpenGL. Thank goodness we don't have to do that anymore!
AMD is really pushing the APU / OpenCL model, so more programs should take advantage of the GPU via OpenCL - if the performance trade off is there. Currently, GPU computing is a bit of a niche market relegated to high performance computing or number crunching that just isn't needed for web browsing and word processing.
It doesn't make sense any more for vendors to let you program using low-level ISA.
It's very hard and most programmers won't use it.
It keeps them from adjusting the ISA in future revisions.
So programmers use a language (like C99 in OpenCL) and the runtime does ISA-specific optimizations right on the user's machine.
An example of what this enables: AMD switched from VLIW vector machines to scalar machines and existing kernels still ran (most ran faster). You couldn't do this if you wrote ISA directly.
Programming a coprocessor like Iris without OpenCL is rather like driving a car without the steering wheel.
OpenCL is designed to expose the requisite parallelism that Iris needs to achieve its theoretical performance. You can't just spawn hundreds of threads or processes on it and expect performance. Having blocks of threads doing the same thing, at the same time, on similar memory addresses, is the whole crux of the matter.
Maybe you can think of a better paradigm than OpenCL for achieving that goal; but until you do, I suggest you try learning some OpenCL. If you are into Python, pyopencl is a great place to start.
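To make that concrete, here is a hedged pyopencl sketch of the "blocks of threads doing the same thing on adjacent addresses" model: a vector add where each work-item handles one index. It requires pyopencl, numpy, and a working OpenCL driver, so the GPU path is kept inside a function:

```python
# OpenCL C kernel: one work-item per element, identical work on adjacent addresses.
KERNEL = """
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out) {
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""

def run_vec_add(a, b):
    """Add two float arrays on the first available OpenCL device.

    Requires pyopencl + numpy and an OpenCL driver; raises ImportError
    or a pyopencl error otherwise.
    """
    import numpy as np
    import pyopencl as cl

    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    buf_a = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    buf_b = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    buf_out = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)
    prog = cl.Program(ctx, KERNEL).build()  # runtime compiles for this device's ISA
    prog.vec_add(queue, a.shape, None, buf_a, buf_b, buf_out)
    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, buf_out)
    return out
```

Note that the kernel is compiled by the vendor runtime at `build()` time, which is exactly the ISA-portability point made earlier: the same source runs on Iris, AMD, or NVIDIA hardware.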

AMD CPU versus Intel CPU openCL

Some friends and I want to use OpenCL. For this we are looking to buy a new computer, but we asked ourselves which is better between AMD and Intel for OpenCL use. The graphics card will be an Nvidia one, and we have no choice on the graphics card, so we initially wanted to buy an Intel CPU; but after some research we found that AMD CPUs may be better with OpenCL. We could not find benchmarks comparing the two.
So here is our questions:
Is AMD better than Intel with openCL?
Is it a problem for OpenCL performance to pair an Nvidia card with an AMD CPU?
Thank you,
GrWEn
You shouldn't care as much about which CPU you use as about which GPU you use. You would need to choose between an AMD/ATI GPU or an nVidia GPU.
I would personally recommend an nVidia GPU as, in addition to OpenCL support, you can experiment with their more proprietary CUDA technology which offers a far richer development experience than OpenCL does today. While you're at it take a look at the new AMP technology that was just announced by Microsoft for C++ which aims to bring language extensions akin to nVidia's CUDA. nVidia also has offerings for the enterprise with their Tesla GPUs with several vendors offering GPU clusters and you can even get a GPU compute cluster on Amazon EC2 now which is all based on nVidia hardware.
You want to buy a new computer with your friends? What kind of project do you plan to do? The question about the hardware is answered with the needs you have. If you give some more information, we can provide better suggestions.
As written before, the CPU is not the important point unless you want to buy a multiprocessor system, such as one with four quad-core processors. The difference in performance comes mostly from the GPU used, and there you can find cards for all needs, from a cheap GPU up to the nVidia Tesla cards.
It is definitely not a problem to run an nVidia board on an AMD system; I do it here. You can also use the OpenCL devices from the AMD multicore CPU and the nVidia GPU in parallel.
You should pay attention: if you plan to buy a powerful system to run your software (like a web server), every developer of OpenCL software still needs a system for testing. So every developer needs at least a modern multicore CPU with an OpenCL SDK. Where the OpenCL kernels are developed does not matter; OpenCL is platform independent.
Both Intel and AMD have good OpenCL support for their CPUs, so currently it does not really matter which you choose. If you want to use the embedded GPU on AMD Fusion or Intel Sandy Bridge, then I suggest you go for Fusion, since Intel does not have a driver for their GPUs (yet). Depending on what you are going to use OpenCL for, I could suggest a GPU - sometimes NVidia is faster, sometimes AMD.
AMP, CUDA, RenderScript and the many, many others all work nicely, but they don't run on all hardware the way OpenCL does. CUDA certainly has advantages, but in the time it takes you to learn OpenCL, I can assure you the tools around OpenCL will have caught up.
The CPU has no influence on GPU OpenCL performance.
You might also want to try running the OpenCL kernels on the CPU. Check out the Intel OpenCL compiler beta. You can even run kernels on both CPU and GPU.
