Intel MPSS - clGetProgramBuildInfo returns CL_BUILD_NONE - opencl

We have an OpenCL program that works fine on my OS X machine. We just set up a machine with a Xeon Phi and Intel MPSS. However, even when not using the Phi but the Xeon CPU, the CL_PROGRAM_BUILD_STATUS we get is CL_BUILD_NONE.
Unfortunately, we cannot find any documentation on what might cause CL_BUILD_NONE. Do you have any suggestion on how to debug this?
Thank you in advance!
Versions:
[#memphis:~] $ cat /etc/SuSE-release
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2
[#memphis:~] $ uname -a
Linux memphis 3.0.13-0.27-default #1 SMP Wed Feb 15 13:33:49 UTC 2012 (d73692b) x86_64 x86_64 x86_64 GNU/Linux
[#memphis:~] 1 $ rpm -qa |grep intel
intel-mic-2.1.6720-15.suse
intel-mic-mpm-2.1.6720-15.suse
opencl-1.2-intel-mic-3.0.67279-1
intel-mic-sysmgmt-2.1.6720-15.suse
intel-mic-kmod-2.1.6720-15.3.0.13.0.suse
intel-mic-gdb-2.1.6720-15.suse
intel-mic-flash-2.1.386-3.suse
intel-mic-cdt-2.1.6720-15.suse
opencl-1.2-intel-devel-3.0.67279-1
intel-mic-micmgmt-2.1.6720-15.3.0.13.0.suse
opencl-1.2-intel-cpu-3.0.67279-1
intel-mic-gpl-2.1.6720-15.suse
intel-mic-crashmgr-2.1.6720-15.suse

The documentation for clGetProgramBuildInfo seems pretty straightforward:
CL_BUILD_NONE. The build status returned if no clBuildProgram, clCompileProgram or clLinkProgram has been performed on the specified program object for device.
You mention that your program worked on other platforms, but maybe you ended up with a slightly different flow between platforms which led to those methods not being properly invoked in the new flow? I'd suggest carefully verifying the return value from the earlier invoked functions to see you get what you expect to get.

Found it. I am not sure why I had &ret (cl_int return value) as the last parameter instead of having it as the return value of clBuildProgram. Moving it and setting the last parameter to NULL fixes the problem:
wrong:
clBuildProgram(*program, 1, &device_id, opts.str().c_str(), NULL, &ret);
correct:
ret = clBuildProgram(*program, 1, &device_id, opts.str().c_str(), NULL, NULL);
I do understand why this problem occured - apparently the compiler / the OpenCL libraries understood that I wanted to use pfn_notify and asynchronously build my kernel. I am, however, not sure if this behavior is fully conformant to the OpenCL documentation:
If pfn_notify is NULL, clBuildProgram does not return until the build has completed.
In my code, the pfn_notify argument was actually NULL, however user_data was (erroneously) not. While my code didn't make any sense, I believe that user_data should be ignored when pfn_notify is NULL.
I posted this on the Intel forums to see if they agree with my interpretation of the documentation.

Related

Load SPIR binary with clBuildProgram on Windows

I am trying to load a SPIR binary i created with clang+llvm 6.0.1.
Created a few different files with :
clang -target spir-unknown-unknown -cl-std=CL1.2 -c -emit-llvm -Xclang -finclude-default-header OCLkernel.cl
clang -target amdgcn-amd-amdhsa -cl-std=CL1.2 -c -emit-llvm -Xclang -finclude-default-header OCLkernel.cl
clang -cc1 -emit-llvm-bc -triple spir-unknown-unknown -cl-std=CL1.2 -include "include\opencl-c.h" OCLkernel.cl
This is all happening on windows, installed AMD APP SDK 3 and Adrenalin 18.6.1 drivers.
After this i try to create a program from the binary :
clCreateProgramWithBinary(context, 1, &device, &programSrcSize, (const unsigned char**)&programSrc, 0 , &status)
This all goes OK, i don't get any errors here, but i do when trying to build it afterwards :
clBuildProgram(program, 1, &device, " –x spir -spir-std=1.2", NULL, NULL);
The error i get is :
Error CL_INVALID_COMPILER_OPTIONS when calling clBuildProgram
I tried without the "-x spir..." stuff too, but then i just get a :
error: Invalid value (Producer: 'LLVM6.0.1' Reader: 'LLVM 3.9.0svn')
EDIT:
CL_DEVICE_NAME: gfx900
CL_DEVICE_VERSION: OpenCL 2.0 AMD-APP (2580.6)
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 2.0
CL_DRIVER_VERSION: 2580.6 (PAL,HSAIL)
CL_DEVICE_SPIR_VERSIONS: 1.2
After running clCreateProgramWithBinary i query the device with clGetProgramBuildInfo and get :
CL_PROGRAM_BINARY_TYPE = [CL_PROGRAM_BINARY_TYPE_INTERMEDIATE]
So that should mean the binary is being recognised, else i guess it would return CL_PROGRAM_BINARY_TYPE_NONE
EDIT2:
I think clang isn't creating a 'good' binary, but how to create it then?
Appreciate your help!
Unfortunately the support for SPIR was silently removed from AMD drivers, see dipak answers in this thread of AMD community forum:
https://community.amd.com/thread/232093
Regarding your second question: general clang+LLVM (not the secret version tuned by AMD and included in their proprietary drivers) still cannot produce binaries compatible with general-purpose Windows AMD drivers, however it is possible for Linux: all new AMD’s ROCm, AMD PAL and Mesa 3D runtime are covered.
It is a mystery for me why LLVM AMDGPU backend developers do not prioritize the task to produce binaries for Windows drivers, as there is a couple of GCN assembler projects that provide such a functionality through Windows OpenCL interface, to name a few: CLRadeonExtender, ASM4GCN, HepPas, etc. Moreover I know an undocumented fork of clang+LLVM that (as its author states) produce such OpenCL binaries! "There are more things in heaven and earth, Horatio, Than are dreamt of in your philosophy."

"A request was made to bind to that would result in binding more processes than cpus on a resource" mpirun command (for mpi4py)

I am running OpenAI baselines, specifically the Hindsight Experience Replay code. (However, I think this question is independent of the code and is an MPI-related one, hence why I'm posting on StackOverflow.)
You can see the README there but the point is, the command to run is:
python -m baselines.her.experiment.train --num_cpu 20
where the number of CPUs can vary and is for MPI.
I am successfully running the HER training script with 1-4 CPUs (i.e., --num_cpu x for x=1,2,3,4) on a single machine with:
Ubuntu 16.04
Python 3.5.2
TensorFlow 1.5.0
One TitanX GPU
The number of CPUs seems to be 8 as I have a quad-core i7 Intel processor with hyperthreading, and Python confirms that it sees 8 CPUs.
(py3-tensorflow) daniel#titan:~/baselines$ ipython
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import os, multiprocessing
In [2]: os.cpu_count()
Out[2]: 8
In [3]: multiprocessing.cpu_count()
Out[3]: 8
Unfortunately, when I run with 5 or more CPUs, I get this message blocking the code from running:
(py3-tensorflow) daniel#titan:~/baselines$ python -m baselines.her.experiment.train --num_cpu 5
--------------------------------------------------------------------------
A request was made to bind to that would result in binding more
processes than cpus on a resource:
Bind to: CORE
Node: titan
#processes: 2
#cpus: 1
You can override this protection by adding the "overload-allowed"
option to your binding directive.
--------------------------------------------------------------------------
And here's where I got lost. There's no error message or line of code that I need to fix. I am therefore unsure about where I even add overload-allowed in the code?
The way this code works at a high level is that it takes in this argument and uses the python subprocess module to run an mpirun command. However, checking mpirun --help on the command line doesn't reveal overload-allowed as a valid argument.
Googling this error message leads to questions in the openmpi repository, for instance:
https://github.com/open-mpi/ompi/issues/626 (seems to have died out without resolving issue)
https://github.com/open-mpi/ompi/issues/2158 (not sure how this relates to my issue, didn't get clear resolution)
But I'm not sure if it's an OpenMPI thing or an mpi4py thing?
Here's pip list in my virtual environment if it helps:
(py3.5-mpi-practice) daniel#titan:~$ pip list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
decorator (4.2.1)
ipython (6.2.1)
ipython-genutils (0.2.0)
jedi (0.11.1)
line-profiler (2.1.2)
mpi4py (3.0.0)
numpy (1.14.1)
parso (0.1.1)
pexpect (4.4.0)
pickleshare (0.7.4)
pip (9.0.1)
pkg-resources (0.0.0)
pprintpp (0.3.0)
prompt-toolkit (1.0.15)
ptyprocess (0.5.2)
Pygments (2.2.0)
setuptools (20.7.0)
simplegeneric (0.8.1)
six (1.11.0)
traitlets (4.3.2)
wcwidth (0.1.7)
So, TL;DR:
How do I fix this error in my code?
If I add the "overload-allowed" thing, what happens? Is it safe?
Thanks!
overload-allowed is a qualifier that is passed to --bind-to parameter of mpirun (source).
mpirun ... --bind-to core:overload-allowed
Beware that hyperthreading thing is more about marketing than about performance bonuses.
Your i7 can actually have four silicon cores and four "logical" ones. The logical ones basically try to use resources of the silicon cores that are currently unused. The problem is that a good HPC program will use 100% of the CPU hardware, and hyperthreading won't have resources to successfully operate.
So, it is safe to "overload" "cores", but it's not a performance boost candidate #1.
Regarding the advice that the paper authors give about reproducing the results. In the best case less cpus just means slow learning. However, if learning doesn't converge to an expected value no matter how hyperparams are tweaked, then it is a reason to look closer at the proposed algorithm.
While IEEE754 computations do differ if done in different order, this difference should not play the crucial role.
The error message suggests that mpi4py is built on top of Open MPI.
By default, a slot is a core, but if you want a slot to be an hyperthread, then you should
mpirun --use-hwthread-cpus ...

Linking issue with (Intel SDK) OpenCL and Code::Blocks

I was trying to get my problem solved for hours, but I did not find any usefull hints. Hopefully you guys can help me out:
Some usefull data:
OS: Windows 8 Basic 64bit
Library: Intel OpenCL SDK
Compiler: MinGW(-gcc) (latest version)
IDE: Code::Blocks (latest version)
Minimal not working Code:
#include <stdlib.h>
#include <CL/cl.h>
int main(void)
{
cl_uint available;
cl_platform_id* platforms = (cl_platform_id*)malloc(sizeof(cl_platform_id));
cl_int result = clGetPlatformIDs(1, platforms, &available);
free(platforms);
if(result == CL_SUCCESS)
return 0;
return -1;
}
Code::Blocks Global Compiler Settings:
Linker Settings: Added path to Intel's OpenCL.lib ([...]\Intel\OpenCL SDK\3.0\lib\x64\OpenCL.lib) (tried -lOpenCL as Other Options as well)
Search-Directories for Compiler: Path to Intels OpenCL-SDK include directory ([...]\Intel\OpenCL SDK\3.0\include)
Search-Directories for Linker: Path to Intels OpenCL-Lib directory ([...]\Intel\OpenCL SDK\3.0\lib\x64)
Build-Log:
mingw32-g++.exe -L"[...]\Intel\OpenCL SDK\3.0\lib\x64" -o bin\Release\openCLTest.exe obj\Release\main.o -s "[...]\Intel\OpenCL SDK\3.0\lib\x64\OpenCL.lib"
obj\Release\main.o:main.c:(.text.startup+0x39): undefined reference to `clGetPlatformIDs#12'
collect2.exe: error: ld returned 1 exit status
Process terminated with status 1 (0 minutes, 0 seconds)
1 errors, 0 warnings (0 minutes, 0 seconds)
I do not know why he does not link properly.
The [...] in the text is modified by me to shorten the path, normally it would be "C:\Program Files (x86)...".
Hopefully you guys can help me! It is really frustrating! :(
Do you need more information?
EDIT:
Okay... one additional hour and I solved my own problem.
Hope this hint can help some other ppl:
I had to link additionally against the x86-library (seems that some functions are not implemented in X64).
Good to know -.-'''
I got the same problem and I tried hard to figure out the solution and finally I did :)
First of all my hardware are Intel Processor Intel(R) Core(TM) i5-2500 CPU # 3.30GHz and Intel(R) HD Graphics then I installed Intel OpenCL SDK 1.2 after updating the drivers. After that I configure the code::blocks to the new paths for include folder and lib folder as mentioned on the following link: http://www.obellianne.fr/alexandre/tutorials/OpenCL/tuto_opencl_codeblocks.php
Then I tried to compile the examples and I got linking problem as follows:
opencl.o(.text+0x6f):opencl.c: undefined reference to `clGetPlatformIDs#12'
opencl.o(.text+0xa7):opencl.c: undefined reference to `clGetDeviceIDs#24'
opencl.o(.text+0x142):opencl.c: undefined reference to `clGetDeviceInfo#20'
opencl.o(.text+0x263):opencl.c: undefined reference to `clGetDeviceInfo#20'
collect2: ld returned 1 exit status
Process terminated with status 1 (0 minutes, 0 seconds)
4 errors, 0 warnings (0 minutes, 0 seconds)
I tried to use command line and I got the same error then I tried to uninstall Intel sdk and replace it with AMD sdk 2.8 which is support X86 CPU with SSE (Streaming SIMD Extension which is designed by Intel ) 2.x orlater
http://developer.amd.com/tools-and-sdks/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/system-requirements-driver-compatibility/
Finally it works :)
I hope you find this comment useful.
According to an external source i stumbled upon along my own path of enlightenment to this problem i found out something is actually wrong with the mingw-w64 linker. mingw-w64's ld.exe does not want to link with the standard libopencl.a.. whether this is intel SDK specific or not im not sure but here is the link to the solution.
http://sourceforge.net/p/mingw-w64/support-requests/46/
you just have to link to the supplied libopencl.a instead of the default one.
still dont know exactly why the linker gives a problem but i have verified that the solution does (some how) solve the problem.

Is it possible to debug core file generated by a executable compiled without gdb flag?

Is it possible to debug core file generated by a executable compiled without gdb flag ?
If yes, any pointers or tutorials on it ?
Yes you can. It will not be easy though. I will give you an example.
Lets say that I have the following program called foo.c:
main()
{
*((char *) 0) = '\0';
}
I'll compile it and make sure that there is no symbols:
$ cc foo.c
$ strip a.out
$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, stripped
Ok, time to run it:
$ ./a.out
Segmentation fault (core dumped)
Oops. There seems to be a bug. Let's start a debugger:
$ gdb ./a.out core
[..]
Reading symbols from /tmp/a.out...(no debugging symbols found)...done.
[..]
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
#0 0x0804839c in ?? ()
(gdb) bt
#0 0x0804839c in ?? ()
#1 0xb7724e37 in __libc_start_main () from /lib/i386-linux-gnu/libc.so.6
#2 0x08048301 in ?? ()
Hmm, looks bad. No symbols. Can we figure out what happened?
(gdb) x/i $eip
=> 0x804839c: movb $0x0,(%eax)
Looks like it tried to store a byte with a value of zero to the memory location pointed by the EAX register. Why did it fail?
(gdb) p $eax
$1 = 0
(gdb)
It failed because the EAX register is pointing to a memory address zero and it tried to store a byte at that address. Oops!
Unfortunately I do not have pointers to any good tutorials. Searching for "gdb reverse engineering" gives some links which have potentially helpful bits and pieces.
Update:
I noticed the comment that this is about debugging a core dump at a customer. When you ship stripped binaries to a customer, you should always keep a debug version of that binary.
I would recommend not stripping and even giving the source code though. All code that I write goes to a customer with the source code. I have been on the customer side too many times facing an incompetent vendor which has shipped a broken piece of software but does not know how to fix it. It sucks.
This seems to be actually a duplicate of this question:
Debug core file with no symbols
There is some additional info there.
Yes, you can,
this is what people who i.e. write cracks are doing,
unfortunately i don't have the slides and documents of a course i followed at university anymore, but googling for reverse engineering or disassembly tutorials will give you some starting points. Also knowing your way around in assembly code is essential.
Our class was based on a book mainly chapter 1 & 3 but there is a new edition out now
Computer Systems: A programmer's perspective by R.E. Bryant and D.R. O'Hallaron
which explains the basics behind computer systems and also gives you good knowledge of the working of programs in systems.
Also when learning this be aware that 64bit cpus have different assembly code than 32bit cpu's, just in case.
If the program is compiled without -g flag,you cannot debug core file.
Otherwise you can do so as:
gdb executable corefile
More you can find at:
http://wwwpub.zih.tu-dresden.de/~mlieber/practical_debugging/04_gdb.pdf

LLVM ERROR: Cannot yet select: error

Hello i am getting the following error when I am running my app in the simulator.
LLVM ERROR: Cannot yet select: ...
It seems that other have reported similar issues for the same combo:
* New sandy bridge MBP
* Iphone 4.3 Simulator
* opengl
Anyone have some clue?
Here is a short excerpt from the log:
LLVM ERROR: Cannot yet select: 0xa0237d8: v16i8 = bit_convert 0xa02aa48 [ORD=259] [ID=170]
0xa02aa48: v8i16 = X86ISD::PSHUFLW 0xa02a828, 0xa02a608 [ID=166]
0xa02a828: v8i16 = X86ISD::PSHUFHW 0xa0235b8, 0xa02a608 [ID=162]
0xa0235b8: v8i16 = llvm.x86.sse2.packssdw.128 0xa023530, 0xa0234a8, 0xa023420 [ORD=256] [ID=158]
0xa023530: i32 = Constant<647> [ORD=256] [ID=21]
0xa0234a8: v4i32 = bit_convert 0xa023310 [ORD=255] [ID=139]
0xa023310: v4f32 = llvm.x86.sse.cmp.ps 0xa023200, 0xa028d70, 0xb03c4e8, 0xa023288 [ORD=252] [ID=130]
0xa023200: i32 = Constant<784> [ORD=252] [ID=19]
I'm getting this same error. I just got the new sandy bridge MBP today, and on my previous computer, I do not have this problem.
Changing the target to iPad 4.2 instead of iPad 4.3 resolves the issue.
Here's how to change the target in the new version of Xcode:
http://developer.apple.com/library/mac/#documentation/IDEs/Conceptual/Xcode4TransitionGuide/Orientation/Orientation.html
I had the same Error on my MacBook Pro Intel Core i7 in the 4.3 simulator. I updated to Xcode 4.0.2 and now its working again.
This means that LLVM cannot do the instruction selection for some code. Usually this happens when you request some target-specific stuff in the code and disable the features via cmdline.
For example, if you'll use sse2 gcc intrinsics, but will compile for, say, i486, the same sort of message might occur (if not caught earlier by a frontend).
It's hard to say anything more definite without the full error line.
I had the same situation. It looks like a bug of LLVM 2.8 for the new sandy bridge. The work around is to use 4.2 simulator as NoEvilPeople said.
OpenGL apps exit in 4.3 Sim but work...
MacRuby build issues with LLVM
Attempt to force LLVM to treat sandy bridge as core2
In case this helps anyone, I was having the same problem too, but don't have the older SDK for the other fix here. Kazuki posted a link to a discussion over at Apple, and it looks like its a bug that a few people have reported, but that it has something to do with the simulator. That being said, the app I was having a problem with runs fine on-device for me, so that's another potential workaround while this is looked at more.

Resources