How to get the Intel integrated-graphics brand string/model number if clGetDeviceInfo() fails? - opencl

It is known that the OpenCL API
clGetDeviceInfo(..., CL_DEVICE_NAME, ...)
can return the GPU brand string, for instance "Intel Iris Graphics 5100".
It works in most cases, but on a Mac mini (late 2014, macOS 10.12.5/10.13) it returns just "Intel Iris", without the model number. I also tried IOKit:
CFDictionaryGetValue(..., @"model")
returns the same string, without the model number.
Q: Is there any cross-platform way to get the Intel GPU brand string/model number? We know that the CPUID instruction can return the CPU brand string; is there a similar mechanism for GPUs? Thanks!
The output of clinfo:
Number of platforms 1
Platform Name Apple
Platform Vendor Apple
Platform Version OpenCL 1.2 (Apr 4 2017 19:07:42)
Platform Profile FULL_PROFILE
Platform Name Apple
Number of devices 2
Device Name Iris
Device Vendor Intel
Device Vendor ID 0x1024500
Device Version OpenCL 1.2
Driver Version 1.2(Apr 22 2017 16:00:44)
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Available Yes
Device Profile FULL_PROFILE
Max compute units 40
Max clock frequency 1200MHz
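
For reference, a minimal sketch of the standard query path (plain C against the OpenCL 1.2 API; error handling omitted). On Apple's platform this is exactly the call that returns the truncated "Iris" string, so it only establishes the baseline, but CL_DEVICE_VENDOR_ID queried the same way at least gives a numeric ID to cross-check against:

#include <stdio.h>
#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    char name[256] = {0};
    cl_uint vendor_id = 0;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* May come back truncated, e.g. "Iris" instead of the full brand string */
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
    /* Numeric vendor ID, useful as a cross-check */
    clGetDeviceInfo(device, CL_DEVICE_VENDOR_ID, sizeof(vendor_id), &vendor_id, NULL);

    printf("name=%s vendor_id=0x%x\n", name, vendor_id);
    return 0;
}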

Related

Jupyter notebook crashes after rendering pybullet-gym environment

I am using pybullet-gym to evaluate my policy model and visualize its interactions; however, when it renders the environment using the following sample code (taken from its own repo), the Jupyter notebook crashes and restarts its kernel:
import gym
import pybulletgym  # registers the PyBullet envs with gym

env = gym.make('HumanoidPyBulletEnv-v0')
env.render()  # called before reset() so the GUI window is created
env.reset()
Here is the message in the shell:
startThreads creating 1 threads.
starting thread 0
started thread 0
argc=2
argv[0] = --unused
argv[1] = --start_demo_name=Physics Server
ExampleBrowserThreadFunc started
X11 functions dynamically loaded using dlopen/dlsym OK!
X11 functions dynamically loaded using dlopen/dlsym OK!
Creating context
Created GL 3.3 context
Direct GLX rendering context obtained
Making context current
GL_VENDOR=NVIDIA Corporation
GL_RENDERER=GeForce MX150/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 450.102.04
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
pthread_getconcurrency()=0
Version = 3.3.0 NVIDIA 450.102.04
Vendor = NVIDIA Corporation
Renderer = GeForce MX150/PCIe/SSE2
b3Printf: Selected demo: Physics Server
startThreads creating 1 threads.
starting thread 0
started thread 0
MotionThreadFunc thread started
ven = NVIDIA Corporation
ven = NVIDIA Corporation
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":1"
after 21540 requests (21540 known processed) with 0 events remaining.

Why is MKL in parallel not faster than serial in R 3.6?

I am trying to use Intel's MKL with R and adjust the number of threads using the MKL_NUM_THREADS variable.
It loads correctly, and I can see it using 3200% CPU in htop. However, it isn't actually faster than using only one thread.
I've been adapting Dirk Eddelbuettel's guide for CentOS, but I may have missed some flag or config somewhere.
Here is a simplified version of how I am testing how the number of threads relates to job time. I do get the expected results when using OpenBLAS.
require(callr)
#> Loading required package: callr
f <- function(i) r(function() crossprod(matrix(1:1e9, ncol = 1000))[1],
                   env = c(rcmd_safe_env(),
                           R_LD_LIBRARY_PATH = MKL_R_LD_LIBRARY_PATH,
                           MKL_NUM_THREADS = as.character(i),
                           OMP_NUM_THREADS = "1"))
system.time(f(1))
#> user system elapsed
#> 14.675 2.945 17.789
system.time(f(4))
#> user system elapsed
#> 54.528 2.920 19.598
system.time(f(8))
#> user system elapsed
#> 115.628 3.181 20.364
system.time(f(32))
#> user system elapsed
#> 787.188 7.249 36.388
Created on 2020-05-13 by the reprex package (v0.3.0)
EDIT 5/18
Per the suggestion to try MKL_VERBOSE=1, I now see the following on stdout, which shows it properly calling LAPACK:
MKL_VERBOSE Intel(R) MKL 2020.0 Product build 20191122 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Vector Neural Network Instructions enabled processors, Lnx 2.50GHz lp64 intel_thread
MKL_VERBOSE DSYRK(U,T,1000,1000000,0x7fff436222c0,0x7f71024ef040,1000000,0x7fff436222d0,0x7f7101d4d040,1000) 10.64s CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:1
For f(8), the output shows NThr:8:
MKL_VERBOSE Intel(R) MKL 2020.0 Product build 20191122 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) with support of Vector Neural Network Instructions enabled processors, Lnx 2.50GHz lp64 intel_thread
MKL_VERBOSE DSYRK(U,T,1000,1000000,0x7ffe6b39ab40,0x7f4bb52eb040,1000000,0x7ffe6b39ab50,0x7f4bb4b49040,1000) 11.98s CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:8
I am still not getting the expected performance increase from the extra cores.
EDIT 2
I am able to get the expected results using Microsoft's distribution of MKL, but not with Intel's official distribution as in the walkthrough. It appears that MS is using a GNU threading library; could the problem be in the threading library rather than in BLAS/LAPACK itself?
Only seeing this now: did you check the obvious one, i.e. whether R on CentOS actually picks up the MKL?
As I recall, R on CentOS is built in a more, ahem, "restricted" mode with the reference BLAS that ships with R. If that is the case, you simply cannot switch in another BLAS, as we have done within Debian and Ubuntu for 20+ years, because that requires a different choice when R is initially compiled.
Edit: Per subsequent discussions (see comments below) we all re-realized that it is important to have the threading libraries/models aligned. MKL is an Intel product and defaults to using Intel's threading library; on Linux the GNU compiler is closer to the system and has its own, and it is the latter that needs to be selected. In my writeup/script for the MKL on .deb systems I use
echo "MKL_THREADING_LAYER=GNU" >> /etc/environment
to set this system-wide on the machine; one can also add it just to the R environment files.
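For a per-user setup, the equivalent (a sketch, assuming the default ~/.Renviron file that R reads at startup) would be:

echo "MKL_THREADING_LAYER=GNU" >> ~/.Renviron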
I am not sure exactly how R calls MKL, but if the crossprod function calls MKL's gemm underneath, then we should see very good scalability with inputs of this size. What are the input problem sizes?
MKL supports a verbose mode, which prints a lot of useful runtime information while dgemm is running. Could you try exporting MKL_VERBOSE=1 and looking at the log?
Though I am not entirely sure whether R suppresses that output.

OpenACC on Intel built-in graphics cards (Intel Iris Plus Graphics 655)

I would like to find out whether built-in Intel graphics cards (e.g. Intel Iris Plus Graphics 655) support OpenACC directives. Could anyone direct me to any relevant information?
The PGI C compiler does not support Intel as a target architecture (the target architecture is specified with the -ta option):
pgcc -I../common -acc -ta=nvidia,time -Minfo=accel -o laplace2d_acc laplace2d.c
The compiler issues the following warning:
pgcc-Warning-OpenACC for GPUs no longer supported on macOS, enabling multicore CPU code generation. Use -ta=multicore to avoid this warning
That means no GPUs are supported on macOS, but it is still possible to compile code with OpenACC directives for execution on multiple CPU cores using -ta=multicore:
pgcc -I../common -acc -ta=multicore,time -Minfo=accel -o laplace2d_acc laplace2d.c
The GNU C compiler supports OpenACC starting from version 7 (versions 7 and 8 support OpenACC 2.0a; version 9 supports OpenACC 2.5). The acc directives are enabled with the -fopenacc option:
gcc -I../common -fopenacc -o laplace2d_acc laplace2d.c
However, I wasn't able to find compiler flags to target the Intel Iris card specifically.
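
For context, here is the kind of loop such a build compiles; a minimal generic sketch (not taken from the linked laplace2d.c) that builds with either of the command lines above:

#include <stdio.h>
#define N 1000000

int main(void) {
    static float a[N], b[N];
    for (int i = 0; i < N; ++i) a[i] = (float)i;

    /* With -ta=multicore (PGI) or -fopenacc (GCC) this loop is
       parallelized across CPU cores; with a GPU target it is offloaded.
       The data clauses manage host/device copies. */
    #pragma acc parallel loop copyin(a) copyout(b)
    for (int i = 0; i < N; ++i)
        b[i] = 2.0f * a[i];

    printf("b[42] = %f\n", b[42]);
    return 0;
}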

clCreateFromGLTexture() returns CL_INVALID_CONTEXT on certain platforms only

After successfully creating a shared OpenGL/OpenCL context using the following:
cl_context_properties cps[] = {
    CL_GL_CONTEXT_KHR,   (cl_context_properties) glXGetCurrentContext(),
    CL_GLX_DISPLAY_KHR,  (cl_context_properties) glXGetCurrentDisplay(),
    CL_CONTEXT_PLATFORM, (cl_context_properties) platform_id,
    0
};
// Create an OpenCL context
m_contextCL = clCreateContext(cps, 1, &device_id, NULL, NULL, &err);
I try to create a shared texture:
cl_mem mem = clCreateFromGLTexture(m_contextCL,
                                   CL_MEM_READ_ONLY,
                                   GL_TEXTURE_2D,
                                   0,                  // miplevel
                                   qt_fbo->texture(),
                                   &err);
The call succeeds only on Xubuntu 16.04 with an NVIDIA Quadro K620, using proprietary driver version 387.26 and the OpenCL implementation shipped with the CUDA package.
However, on a Toshiba laptop with Intel HD Graphics 520 running Manjaro and the Beignet OpenCL implementation, clCreateFromGLTexture(...) fails, returning CL_INVALID_CONTEXT.
Additionally, I tried another platform with Ubuntu 16.04 and an Intel Iris IGP (Integrated Graphics Processor), using both the Intel SDK and Beignet OpenCL. It fails at the same point of shared-texture creation.
I created a minimum working example comparing the two GPU techniques (OpenGL and OpenCL) and their interoperability with Qt:
https://github.com/pietrzakmat/opengl-opencl-qt-interop.
All the steps are derived from two tutorials:
1. https://www.codeproject.com/Articles/685281/OpenGL-OpenCL-Interoperability-A-Case-Study-Using
2. https://software.intel.com/en-us/articles/opencl-and-opengl-interoperability-tutorial
Could anyone point out what I am doing wrong, and why the creation of the shared texture fails on platforms with Intel integrated graphics? Is this a problem with the drivers or with the OpenCL implementations? I managed to build and run the samples included in Beignet and intel_ocl_examples, so I think the installation is correct.
1) Is the cl_khr_gl_sharing extension supported? Did you try this code on Windows or macOS with an Intel GPU?
2) Did you try to use a texture that is not attached to an FBO?
Anyway, I think this is a problem in the OpenCL implementation on the Linux platform.
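
A quick way to test the first point is to query the device's extension string before attempting any sharing. A minimal sketch (plain OpenCL C API; device_id is assumed to be the same device passed to clCreateContext above):

#include <string.h>
#include <CL/cl.h>

/* Returns 1 if the device advertises cl_khr_gl_sharing, 0 otherwise. */
int supports_gl_sharing(cl_device_id device_id) {
    char extensions[4096] = {0};
    clGetDeviceInfo(device_id, CL_DEVICE_EXTENSIONS,
                    sizeof(extensions), extensions, NULL);
    return strstr(extensions, "cl_khr_gl_sharing") != NULL;
}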

Difference between the terms Open MPI and MPI API in the output of ompi_info

When I type ompi_info in my terminal, I get a huge output in my terminal buffer, part of which looks like:
Package: Open MPI buildd#lgw01-57 Distribution
Open MPI: 1.10.2
Open MPI repo revision: v1.10.1-145-g799148f
Open MPI release date: Jan 21, 2016
Open RTE: 1.10.2
Open RTE repo revision: v1.10.1-145-g799148f
Open RTE release date: Jan 21, 2016
OPAL: 1.10.2
OPAL repo revision: v1.10.1-145-g799148f
OPAL release date: Jan 21, 2016
MPI API: 3.0.0
Ident string: 1.10.2
Prefix: /usr
Configured architecture: x86_64-pc-linux-gnu
Configure host: lgw01-57
Configured by: buildd
Ignoring the info on release dates, I am curious specifically about the meaning of the second line, Open MPI: 1.10.2, and line twelve, MPI API: 3.0.0. Does it mean that the functions from MPI standard version 3.0.0 are available in Open MPI version 1.10.2?
Open MPI is an implementation (i.e. code) of the MPI standard (i.e. a PDF document).
These are two distinct things, each with its own independent version.
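The distinction is easy to see from code: MPI_Get_version() reports the version of the MPI standard that the library implements, while the Open MPI specific macros OMPI_MAJOR_VERSION etc. report the implementation's own version. A minimal sketch:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int major, minor;
    MPI_Init(&argc, &argv);
    /* Version of the MPI *standard* implemented, e.g. 3.0 */
    MPI_Get_version(&major, &minor);
    printf("MPI standard: %d.%d\n", major, minor);
#ifdef OMPI_MAJOR_VERSION
    /* Version of the Open MPI *implementation*, e.g. 1.10.2 */
    printf("Open MPI: %d.%d.%d\n",
           OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION);
#endif
    MPI_Finalize();
    return 0;
}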
Answering my own question: it seems yes, the stable 1.10 release of Open MPI supports most of the new features introduced in MPI 3. This page for Open MPI 1.10.1 shows the list of all available MPI APIs, which includes the API for one-sided communication (introduced in MPI 2.0) and MPI 3.0 features such as non-blocking collective operations like MPI_Ibcast and matching probes like MPI_Mprobe and MPI_Mrecv.
However, this list does not contain the MPI_T tool interface and many other features that are available in the current stable release of Open MPI 3.0.
