clBuildProgram crash on NVidia cards - opencl

I have an OpenCL application that runs fine when using an AMD GPU.
When using an NVidia card, the clBuildProgram call crashes the application (does not even return a failure value, just a crash). When debugging, the crash yields:
read access violation in the nvopencl.dll module. code 0xc0000005. The debugger indicates the clGetExportTable function (inside nvopencl.dll) as source of the violation.
By commenting out parts of the kernels one at a time, I have narrowed it down to this point:
In the code fragment:
if (something){
//some stuff
float3 gradient = (float3)(0,1,0);
gradient = normalize(gradient);
return;
}
Deleting the "gradient = normalize(gradient);" line stops clBuildProgram from crashing; leaving it in crashes the program. The gradient variable is not even used anywhere else in the kernel, so it is not related to any other part of it. And the normalize function by itself should not be the source of the problem, because it is used in other parts of the code.
I think it may be related to a driver bug, because installing the latest CUDA version (6.5) makes the OpenCL Volume Rendering sample binaries distributed by NVidia misbehave, while with a CUDA 6 installation the Volume Rendering sample works properly.
My code is related to volume rendering techniques, that is why I think that it may be related, but my problem appears with both CUDA 6.5 and CUDA 6 installations.
Have you experienced something similar? What could be the cause of the problem, and how can I handle it?
Thank you.

After further analysis, the problem seems to be a bug in the drivers, as Xapa mentioned.

Related

Can I check OpenCL kernel syntax at compilation time?

I'm working on some OpenCL code within a larger project. The code only gets compiled at run-time - but I don't want to deploy a version and start it up just for that. Is there some way for me to have the syntax of those kernels checked (even without consider), or even compile them, at least under some restrictions, to make it easier to catch errors earlier?
I will be targeting AMD and/or NVIDIA GPUs.
The type of program you are looking for is an "offline compiler" for OpenCL kernels - knowing this will hopefully help with your search. They exist for many OpenCL implementations, you should check availability for the specific implementation you are using; otherwise, a quick web search suggests there are some generic open source ones which may or may not fit the bill for you.
If your build machine is also your deployment machine (i.e. your target OpenCL implementation is available on your build machine), you can of course also put together a very basic offline compiler yourself by simply wrapping clBuildProgram() and friends in a basic command line utility.

CL_PLATFORM_NOT_FOUND_KHR in opencl

This is a very strange situation. Why do I get the error
CL_PLATFORM_NOT_FOUND_KHR
when I'm calling this function:
clGetPlatformIDs(0, NULL, &platformCount);
This error did not occur before. I have installed the drivers and SDKs from Intel and Nvidia. Are there any suggestions?
The error can occur for the following reason. clGetPlatformIDs returns CL_SUCCESS if the function is executed successfully and there is a non-zero number of platforms available. Otherwise it can return CL_PLATFORM_NOT_FOUND_KHR if the cl_khr_icd extension is enabled and no platforms are found.
You are in luck. Well sort of... Seeing this is 3 years later.
Disclaimer: I HAVE NO CLUE WHY THIS WORKS:
Machine: x64 windows 10.
Graphics Card: Geforce GTX 960
Total Failure To Load Library : LoadLibraryA( "OpenCL64.dll" )
WRONG (but loads) : LoadLibraryA( "C:/Program Files/NVIDIA Corporation/OpenCL/OpenCL64.dll" )
WRONG (but loads) : LoadLibraryA( "C:/Program Files/NVIDIA Corporation/OpenCL/OpenCL.dll" )
CORRECT: LoadLibraryA( "OpenCL.dll" )
Here is the really insidious thing: Both of my "WRONG" answers will let you
grab function pointers, but when you call clGetPlatformIDs the return status
will be 0xFFFFFC17 ( CL_PLATFORM_NOT_FOUND_KHR ).
Then you'll be examining your function call correctness.
Maybe you'll even look at the calling convention. Maybe you'll check
the header files and make sure there are not any typos there. And yet,
you are looking in all the wrong places because the original problem happened
more steps back than you think.
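As an aside for anyone decoding that status: OpenCL status codes are signed 32-bit values, so 0xFFFFFC17 is just how a debugger prints -1001, which is the value the Khronos cl_khr_icd extension assigns to CL_PLATFORM_NOT_FOUND_KHR. A minimal sketch of that conversion follows. The helper names status_from_hex and status_name are hypothetical, and the constant is repeated locally so the snippet builds without any OpenCL headers.

```c
#include <stdint.h>
#include <string.h>

/* Value assigned by the cl_khr_icd extension; repeated here (an assumption
   for this sketch) so it compiles without an OpenCL SDK installed. */
#define CL_PLATFORM_NOT_FOUND_KHR (-1001)

/* Reinterpret a status shown as unsigned 32-bit hex as the signed
   32-bit value that the OpenCL call actually returned. */
int32_t status_from_hex(uint32_t raw)
{
    int32_t status;
    memcpy(&status, &raw, sizeof status); /* bit-for-bit copy, no range issues */
    return status;
}

/* Hypothetical helper: name the two statuses discussed in this thread. */
const char *status_name(int32_t status)
{
    switch (status) {
    case 0:
        return "CL_SUCCESS";
    case CL_PLATFORM_NOT_FOUND_KHR:
        return "CL_PLATFORM_NOT_FOUND_KHR";
    default:
        return "other OpenCL status";
    }
}
```

With these helpers, status_name(status_from_hex(0xFFFFFC17u)) gives "CL_PLATFORM_NOT_FOUND_KHR".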
Because of this problem, I build into my programs code that reads a file:
"OPEN_CL_SEARCH_PATHS.TXT" so the user of the software can change what DLL file
the program attempts to load.
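The search-path idea above could look roughly like the sketch below. It is not the original author's code: load_first_candidate, load_fn and demo_loader are made-up names, the one-path-per-line file format is an assumption, and the loader is passed in as a callback so the logic can be tried on any platform. On Windows the callback would presumably wrap LoadLibraryA.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical loader callback: returns a non-NULL handle on success. */
typedef void *(*load_fn)(const char *path);

/* Try each non-empty line of `list` as a library path, in file order.
   On success, copy the winning path into `chosen` and return the handle.
   Return NULL if no candidate loads. Lines longer than the buffer are
   read in pieces, so keep paths under 511 characters. */
void *load_first_candidate(FILE *list, load_fn load,
                           char *chosen, size_t chosen_size)
{
    char line[512];

    while (fgets(line, sizeof line, list) != NULL) {
        line[strcspn(line, "\r\n")] = '\0'; /* strip the line ending */
        if (line[0] == '\0')
            continue;                       /* skip blank lines */

        void *handle = load(line);
        if (handle != NULL) {
            snprintf(chosen, chosen_size, "%s", line);
            return handle;
        }
    }
    return NULL;
}

/* Stand-in loader for experimenting: pretends only "OpenCL.dll" loads. */
void *demo_loader(const char *path)
{
    static int fake_handle;
    return strcmp(path, "OpenCL.dll") == 0 ? (void *)&fake_handle : NULL;
}
```

Passing the loader as a callback keeps the file parsing separate from the platform-specific loading call, so the list handling can be checked on its own.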
While I am here, I would also like to add that there seems to be a bug with the
driver that makes it so OpenCL <==> OpenGL sharing is NOT a zero-copy share and
is incredibly laggy. Now I've got to go figure out Vulkan to make my fractal
rendering engine even though OpenCL's abstraction better suits the problem.
It is probably important to note that I am NOT using an SDK or any
validation layers. In fact, I am not even using
windows.h.
I wrote assembly code to grab the address of GetProcAddress and LoadLibrary by navigating the PEB file. I am also not using cl.h or cl_platform.h.
I reconstruct the structs I need from the documentation. I am also not
bothering with prototypes for function signatures either. For example,
I call "clGetPlatformIDs" by casting it to type "F_03" and then
calling it that way.
typedef void* (*F_03)( void*, void*, void* );
My machine doesn't have a GPU, so I had to use hashcat with OpenCL on the CPU alone. It has an Intel Core i3, so I downloaded the OpenCL software from the Intel website, installed it manually, and the error went away.
Source: https://youtu.be/AieYqNQ6ADM

OpenCL clGetDeviceIDs seg fault (i.MX6 (Freescale) with OpenCL on a Linux SUSE distribution (ARMv7))

I'm developing an application with OpenCL on an i.MX6Q (Freescale, Vivante GC2000 with OpenCL EP) running a Linux SUSE 13.1 distribution adapted for ARMv7.
I'm following this tutorial: https://community.freescale.com/docs/DOC-93984#comment-12585. I installed the following package: gpu-viv-bin-mx6q.
When I try the example code, it works on a laptop, but on the i.MX6 it gives me a segmentation fault when calling the function clGetDeviceIDs.
The program compiles correctly but does not work at run time.
I tried passing different NULL arguments to the function. I'm not sure if it's due to memory allocation (as the same code works on my laptop, I suppose this is not the problem). When I launch it in debug mode, the program seems not to find the file "gc_hal_user_query.c" (HAL stands for Hardware Abstraction Layer).
I can't find sufficient documentation on the web, and I'm quite new to Linux and OpenCL, so any help would be appreciated. Thanks in advance.
I guess the issue is that, when you call
clGetPlatformIDs(1, &cpPlatform, NULL);
cpPlatform receives 0 if no platform is detected. This leads to a segmentation fault during the next call to
clGetDeviceIDs(cpPlatform, CL_DEVICE_TYPE_GPU, 1, &cdDevice, NULL);
I unfortunately can't help further, I have the same problem.
You are running with insufficient permissions. Try running as root.

intel_iommu , what is it?

One of my customers had a problem with a Xeon E5 machine: one GPU (I believe it was an NVIDIA one) kept hanging, and they solved it by adding
intel_iommu=igfx_off
to the kernel command line in the GRUB loader.
What is this value and what does it do? I read around but couldn't figure it out in simple terms.
Quoting from the "Intel-IOMMU.txt" file included in the Linux kernel documentation:
"If you encounter issues with graphics devices, you can try adding option intel_iommu=igfx_off to turn off the integrated graphics engine. If this fixes anything, please ensure you file a bug reporting the problem."
Apparently the GPU in this case was not working properly with the DMAR (DMA Remapping) feature provided by the Intel chipset. Using the "igfx_off" parameter allows the GPU to access the physical memory directly without going through the DMAR.
The purpose of the DMAR feature is to enable things like direct assignment of hardware to virtualized guests. If you have to use the "igfx_off" parameter then you probably won't be able to use this GPU in such a direct-assigned virtualization scenario.

Calling functions in Qt from third-party DLL works in debug mode, crashes in release

I use a third-party DLL (FTD2xx) to communicate with an external device. Using Qt4, in debug mode everything works fine, but the release crashes silently after successfully completing a called function. It seems to crash at return, but if I write something to the console (with qDebug) at the end of the function, sometimes it does not crash there, but a few, or few dozen lines later.
I suspect an improperly cleaned-up stack, which the debug build can survive but the release build chokes on. Has anyone encountered a similar problem? The DLL itself cannot be changed, as the source is not available.
Reducing the optimization level seems to be the only workaround. The DLL itself might have problems, as a program that does nothing but call a single function from that DLL crashes the same way if optimization is turned on.
Fortunately, the size and speed lost by the change in optimization level is negligible.
Edit: for anyone with similar problems on Qt 5.0 or higher: If you change the optimization level (for example, to QMAKE_CXXFLAGS_RELEASE = -O0), it's usually not enough to just rebuild the application. A full "clean all" is required.
Be warned - the EPANET library is not thread safe, it contains a lot of global variables.
Are you calling two methods of that library from different threads?
