Use NVidia discrete via Nouveau on Optimus laptop - xorg

I've been reading documentation and testing things out, but I can't figure out how to make the NVidia discrete GPU always render everything and output it through the Intel GPU on my Optimus laptop, using Nouveau (on Xorg).
1) Is this possible?
2) How is this done?
From what I've read, several docs/sites seem to imply that this is impossible on a muxless laptop.

If your laptop has all of its display outputs connected to the Intel GPU, and ONLY to the Intel GPU, then yes, everything has to be passed through the Intel card. If, however, some of your outputs are connected to the Nvidia card, then you could set the Nvidia card as your primary GPU and pass only the data for the Intel-wired displays through the Intel card. Good documentation on how to do that can be found here.
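To find out how your outputs are wired, you can list the RandR 1.4 providers. Below is a minimal C sketch of my own (not from the linked docs) that does the programmatic equivalent of xrandr --listproviders; the reading that zero outputs on the nouveau provider means everything is Intel-wired is an assumption:

```c
/* Sketch: list RandR 1.4 providers and how many outputs each one owns
 * (the programmatic version of `xrandr --listproviders`). If the nouveau
 * provider reports 0 outputs, the panel/ports hang off the Intel only.
 * Build: gcc providers.c -lX11 -lXrandr */
#include <stdio.h>
#include <X11/Xlib.h>
#include <X11/extensions/Xrandr.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) { fprintf(stderr, "no X display\n"); return 1; }
    Window root = DefaultRootWindow(dpy);

    XRRScreenResources *res = XRRGetScreenResources(dpy, root);
    XRRProviderResources *pr = XRRGetProviderResources(dpy, root);
    for (int i = 0; i < pr->nproviders; ++i) {
        XRRProviderInfo *info = XRRGetProviderInfo(dpy, res, pr->providers[i]);
        printf("provider %d: %s, %d output(s)\n", i, info->name, info->noutputs);
        XRRFreeProviderInfo(info);
    }
    XRRFreeProviderResources(pr);
    XRRFreeScreenResources(res);
    XCloseDisplay(dpy);
    return 0;
}
```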

Related

Stop OpenCL support for a GPU

I have two GPUs installed on my machine. I am working with a library that uses OpenCL acceleration but supports only one GPU, and it is not configurable: I cannot tell it which one I want. For some reason this library chooses the GPU I do not want.
How can I delete/stop/deactivate this GPU from being supported as an OpenCL device?
I want to do this so that only one GPU is supported and the library is forced to use it.
Note: options that involve changing or editing the library are also available to me.
P.S. I am on Windows 10 with an Intel processor and an Intel GPU + NVidia GPU.
On Windows, the OpenCL ICD system uses Registry entries to find all of the installed OpenCL platforms.
Solution: using RegEdit you can back up and then remove the entry for the GPU you do not want to use. The Registry location is HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\OpenCL\Vendors.
Reference: https://www.khronos.org/registry/cl/extensions/khr/cl_khr_icd.txt
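If you want to double-check what the ICD loader sees after the edit, here is a minimal sketch (my own illustration, not from the original answer; it assumes an OpenCL SDK supplies CL/cl.h and OpenCL.lib to link against) that prints the platforms still found:

```c
/* Sketch: print the OpenCL platforms the ICD loader finds via the
 * Registry. After removing the unwanted vendor entry, only the platform
 * you kept should appear. Build: cl verify.c OpenCL.lib */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdio.h>

int main(void)
{
    cl_platform_id platforms[16];
    cl_uint count = 0;
    if (clGetPlatformIDs(16, platforms, &count) != CL_SUCCESS || count == 0) {
        fprintf(stderr, "no OpenCL platforms found\n");
        return 1;
    }
    for (cl_uint i = 0; i < count; ++i) {
        char name[256] = {0};
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                          sizeof(name), name, NULL);
        printf("platform %u: %s\n", i, name);
    }
    return 0;
}
```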

Use OpenCL on AMD APU but use discrete GPU for the X server

Is it possible to enable OpenCL on an A10-7800 without using it for the X server? I have a Linux box that I use for GPGPU programming. A discrete GeForce 740 card is used both for the X server and for running the OpenCL & Cuda programs I develop. I would also like the option of running OpenCL code on the APU's integrated GPU cores.
Everything I've read so far implies that if I want to use the APU for OpenCL, I have to install Catalyst, and, AFAIK, that means using it for the X server. Is this true? Would there be an advantage to using the APU for my X server and the GeForce solely for GPGPU code?
I had a similar goal, so I built a system with an AMD APU (4 CPU cores + 6 GPU cores) and an Nvidia discrete graphics board. Sorry to say, it wasn't easy to make it work: I asked a question on the Ask Ubuntu forum, didn't get any answers, experimented a lot with the hardware and software setup, and finally posted my own answer to my question.
I'll describe my setup again here - who knows what might happen to my self-answered question on Ask Ubuntu?
First, I had to enable the integrated graphics hardware via a BIOS flag. This flag is called IGFX Multi-Monitor on my motherboard (ASUS A88X-PRO).
The second step was to find the right mix of low-level graphics driver and high-level OpenCL implementation. The low-level driver for AMD processors is called AMD Catalyst, and its kernel module is named fglrx. I didn't install this driver from the Ubuntu software center - instead I used version 15.302, downloaded directly from the AMD site. I had to install a significant number of prerequisites for this driver. The most important finding was that I had to skip running the aticonfig command after the fglrx installation - that command configures the X server to use this driver for graphics output, and I didn't want that.
Then I installed the AMD SDK version 3.0 (release 130.136; earlier releases didn't work with my fglrx) - AMD's OpenCL implementation. The clinfo command now reports both the CPU and GPU devices with the correct number of cores.
So I have a hybrid AMD processor supported by OpenCL, with all graphics output handled by the discrete Nvidia card.
Good luck!
I maintain a Linux server (OpenSUSE, but the distribution shouldn't matter) containing both an NVIDIA and a (discrete) AMD GPU. It's headless, so technically I don't know whether the X server would create additional problems, but I don't think so. You can always configure xorg.conf to use exactly the driver you want. Or, for that matter, install Catalyst but delete the X server driver file itself, which is not the same file you need for OpenCL.
There is one problem with a mixed-vendor system that I noticed, however: AMD's OpenCL driver (ICD) will go spelunking for a libGL.so library, I guess in order to do OpenCL/OpenGL interop. If it finds any of the NVIDIA-supplied libGL.so files, it gets confused and hangs - at least on my machine. I "solved" this by deleting all the libGL.so files (I don't need them on a headless compute server), but that might not be an acceptable solution for you. Maybe you can arrange things so that the AMD-supplied libGL.so takes precedence, possibly by installing the AMD driver last.

How to let OpenCL see Intel and NVidia devices?

I wonder how I can have OpenCL "see" my K20, Xeon, and Xeon Phi at the same time.
In particular, I'm confused about the use of two libraries here (one from NVidia and one from Intel).
How is this done, if it is possible at all?
The OpenCL Installable Client Driver (ICD) takes care of this for you. It is the same regardless of whose implementation you have installed, and exposes all implementations as separate OpenCL "Platforms".
When you call clGetPlatformIDs it will tell you how many platforms you have installed. There could be one for AMD, one for NVIDIA, and one for Intel, for example.
Then within each platform you call clGetDeviceIDs which will return the number of devices within that platform. On your NVIDIA platform you'll find your K20, and within your Intel platform you'll find your Xeon CPU and Xeon Phi co-processor.
If you build or download the clInfo utility you'll see a nice dump of all the installed platforms and devices and the capabilities of each.
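For illustration, here is a stripped-down C sketch of that enumeration (roughly what clInfo does, minus the capability dump; error handling mostly omitted):

```c
/* Sketch: enumerate every installed OpenCL platform, then every device
 * inside it. Build: gcc list.c -lOpenCL */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdio.h>

int main(void)
{
    cl_platform_id plats[8];
    cl_uint np = 0;
    clGetPlatformIDs(8, plats, &np);          /* one entry per vendor runtime */
    for (cl_uint p = 0; p < np; ++p) {
        char pname[256] = {0};
        clGetPlatformInfo(plats[p], CL_PLATFORM_NAME,
                          sizeof(pname), pname, NULL);
        printf("Platform %u: %s\n", p, pname);

        cl_device_id devs[8];
        cl_uint nd = 0;
        if (clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_ALL, 8, devs, &nd)
                != CL_SUCCESS)
            continue;
        for (cl_uint d = 0; d < nd; ++d) {
            char dname[256] = {0};
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME,
                            sizeof(dname), dname, NULL);
            printf("  Device %u: %s\n", d, dname);  /* e.g. K20, Xeon, Xeon Phi */
        }
    }
    return 0;
}
```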
The problem is solved.
Looking at the key directory:
/etc/OpenCL/vendors/*.icd
I noticed that for Nvidia the library in use was a link, duplicated in different places and pointing to two different releases. (Each .icd file is just a short text file naming the vendor's OpenCL library, e.g. libnvidia-opencl.so.1.)
I replaced the older one with the most recent one, the one I had installed recently, and there we go.
OpenCL did not know which one to use, I guess. It looks like the installation location changed between the two Nvidia versions, so when I believed I had removed the old version before reinstalling, that was actually not true.
Thank you all for your help.

Programming Intel IGP (e.g. Iris Pro 5200) hardware without OpenCL

The peak GFLOPS of the CPU cores of the desktop i7-4770K @ 4 GHz is 4 GHz * 8 (AVX lanes) * 4 (two FMA units * 2 flops each) * 4 cores = 512 GFLOPS. But the latest Intel IGP (Iris Pro 5100/5200) has a peak of over 800 GFLOPS. Some algorithms will therefore run even faster on the IGP, and combining the cores with the IGP would be better still. Additionally, the IGP keeps eating up more silicon; the Iris Pro 5100 takes up over 30% of the die now. It seems clear which direction Intel desktop processors are headed.
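Written out, with the IGP side based on my understanding of the Iris Pro 5200 (40 EUs, 16 single-precision flops per EU per cycle, ~1.3 GHz max clock - treat those numbers as assumptions):
CPU (i7-4770K): 4 GHz * 4 cores * (2 FMA units * 8 SP lanes * 2 flops) = 512 GFLOPS
IGP (Iris Pro 5200): 1.3 GHz * 40 EUs * 16 flops/cycle ≈ 832 GFLOPS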
As far as I have seen the Intel IGP, however, is mostly ignored by programmers with the exception of OpenCL/OpenGL. I'm curious to know how one can program the Intel HD Graphics hardware for compute (e.g. SGEMM) without OpenCL?
Added comment:
There is no Intel support for OpenCL on HD Graphics under Linux. I found Beignet, an open-source attempt to add such support, at least for Ivy Bridge HD Graphics. I have not tried it. The people developing Beignet probably know how to program the HD Graphics hardware without OpenCL, then.
Keep in mind that there is a performance hit when copying the data to the video card and back, so this must be taken into account. AMD is close to releasing APU chips that have unified memory for the CPU and GPU on the same die, which will go a long way towards alleviating this problem.
The way the GPU used to be utilized before CUDA and OpenCL was to represent the memory to be operated on as a texture, using DirectX or OpenGL. Thank goodness we don't have to do that anymore!
AMD is really pushing the APU / OpenCL model, so more programs should take advantage of the GPU via OpenCL - if the performance trade-off is there. Currently, GPU computing is a bit of a niche market, relegated to high-performance computing or number crunching that just isn't needed for web browsing and word processing.
It doesn't make sense any more for vendors to let you program using the low-level ISA:
1. It's very hard, and most programmers won't use it.
2. It keeps them from adjusting the ISA in future revisions.
So programmers use a language (like C99 in OpenCL) and the runtime does ISA-specific optimizations right on the user's machine.
An example of what this enables: AMD switched from VLIW vector machines to scalar machines and existing kernels still ran (most ran faster). You couldn't do this if you wrote ISA directly.
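To make that concrete, here is a minimal hedged sketch (my example, not from the answer) of the model being described: the kernel ships as C99-style source, and the vendor runtime compiles it to the device's current ISA on the user's machine:

```c
/* Sketch: the kernel travels as source text and is compiled by the vendor
 * runtime at run time, so the GPU ISA can change between hardware
 * generations without breaking the program. Build: gcc jit.c -lOpenCL */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdio.h>

static const char *src =
    "__kernel void scale(__global float *x, float a) {\n"
    "    size_t i = get_global_id(0);\n"
    "    x[i] *= a;\n"
    "}\n";

int main(void)
{
    cl_platform_id plat; cl_device_id dev; cl_int err;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);

    /* Compilation to the device's native ISA happens here, at run time. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    err = clBuildProgram(prog, 1, &dev, "", NULL, NULL);
    printf("build %s\n", err == CL_SUCCESS ? "succeeded" : "failed");

    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}
```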
Programming a coprocessor like Iris without OpenCL is rather like driving a car without a steering wheel.
OpenCL is designed to expose the requisite parallelism that Iris needs to achieve its theoretical performance. You can't just spawn hundreds of threads or processes on it and expect performance. Having blocks of threads doing the same thing, at the same time, on similar memory addresses, is the whole crux of the matter.
Maybe you can think of a better paradigm than OpenCL for achieving that goal; but until you do, I suggest you try learning some OpenCL. If you are into Python, pyopencl is a great place to start.
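For a taste of that paradigm, here is a small illustrative OpenCL C kernel (my sketch): every work-item in a work-group executes the same code at the same time on adjacent addresses, cooperating through local memory:

```c
/* OpenCL C kernel sketch: a work-group of threads runs the same
 * instructions on adjacent addresses, cooperating through local memory
 * and barriers -- the "blocks of threads" the answer describes. */
__kernel void block_sum(__global const float *in,
                        __global float *partial,
                        __local float *scratch)
{
    size_t gid = get_global_id(0);
    size_t lid = get_local_id(0);

    scratch[lid] = in[gid];            /* adjacent work-items, adjacent loads */
    barrier(CLK_LOCAL_MEM_FENCE);

    /* tree reduction within the work-group */
    for (size_t s = get_local_size(0) / 2; s > 0; s >>= 1) {
        if (lid < s)
            scratch[lid] += scratch[lid + s];
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    if (lid == 0)
        partial[get_group_id(0)] = scratch[0];
}
```

The host sizes the __local buffer with clSetKernelArg(kernel, 2, local_size * sizeof(float), NULL).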

How to use 2 OpenCL runtimes

I want to use 2 OpenCL runtimes in one system together (in my case AMD and Nvidia, but the question is pretty generic).
I know that I can compile my program with either SDK. But when running the program, I need to provide libOpenCL.so. How can I provide the libraries of both runtimes so that I see all 3 devices (AMD CPU, AMD GPU, Nvidia GPU) in my OpenCL program?
I know it must be possible somehow, but I haven't yet found a description of how to do it on Linux.
Thanks a lot,
Tomas
You're not thinking of it right. SDKs are not provided by the application and are not needed to run a compiled program. OpenCL runtimes are provided by the client system, and that's what gives your program platforms and devices to use in clGetPlatformIDs and clGetDeviceIDs.
If the user does not have an Nvidia graphics card, you are simply not going to be able to use an Nvidia platform and device on his system, because he doesn't have the Nvidia OpenCL runtime or hardware.
All the different OpenCL SDKs give you are vendor-specific extensions, which are then understood by the vendor runtime.
The Khronos OpenCL working group defined an ICD (installable client driver) layer that allows multiple vendor drivers to be installed on the system. The application accesses the vendor drivers through the ICD layer. For more details see cl_khr_icd.txt.
The Smith and Thomas answers are correct; this just expands on that information: when you enumerate the OpenCL platforms, you'll get one for each installed driver. Within each platform you enumerate the devices. The AMD and Intel drivers also expose CPU devices. So on a fully populated machine, you might see an AMD platform (with CPU and GPU devices), an NVIDIA platform (with a GPU device), and an Intel platform (with CPU and GPU devices). Your code creates a context on whichever devices you want to use, and one or more command queues to feed them work. You can keep them all busy working on things, but you can only share data buffers between devices from the same platform. To share data across platforms, it must hit CPU memory in between.
As for running on multiple OpenCL devices at the same time: if you want to run on multiple devices, create a separate context for each device/vendor and run each one in a separate thread. For example, I have a GTX 590, which shows up as two GTX 590 devices, and an Intel i7 processor. I create three contexts - two for the 590 devices and one for the CPU - and run each context/device in three threads using SDL_CreateThread (pthreads works well too). You have to weight the number of jobs for each device in proportion to its "speed" if you want good results - for example 45% for each GTX 590 and 10% for the CPU. The best weights to use depend on the application.
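A condensed C sketch of that setup (my illustration - worker threads and kernels omitted, error checks dropped; the last part shows why cross-platform sharing must stage through host memory):

```c
/* Sketch: one context + command queue per device, across all platforms.
 * Each ctx/q pair can then be driven from its own thread. Buffers cannot
 * be shared across platforms, so sharing round-trips through host memory.
 * Build: gcc multi.c -lOpenCL */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdio.h>

#define MAX_DEV 16

int main(void)
{
    cl_platform_id plats[8];
    cl_uint np = 0;
    clGetPlatformIDs(8, plats, &np);

    cl_context ctx[MAX_DEV];
    cl_command_queue q[MAX_DEV];
    cl_uint total = 0;

    for (cl_uint p = 0; p < np && total < MAX_DEV; ++p) {
        cl_device_id devs[MAX_DEV];
        cl_uint nd = 0;
        if (clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_ALL,
                           MAX_DEV - total, devs, &nd) != CL_SUCCESS)
            continue;
        for (cl_uint d = 0; d < nd && total < MAX_DEV; ++d, ++total) {
            cl_int err;
            ctx[total] = clCreateContext(NULL, 1, &devs[d], NULL, NULL, &err);
            q[total] = clCreateCommandQueue(ctx[total], devs[d], 0, &err);
        }
    }
    printf("%u device(s), one context/queue each\n", total);

    /* Cross-platform "sharing": read device 0's buffer into host memory,
     * then write it into device 1's buffer (contents here are undefined;
     * this only shows the required round trip). */
    if (total >= 2) {
        float host[1024];
        cl_int err;
        cl_mem a = clCreateBuffer(ctx[0], CL_MEM_READ_WRITE,
                                  sizeof(host), NULL, &err);
        cl_mem b = clCreateBuffer(ctx[1], CL_MEM_READ_WRITE,
                                  sizeof(host), NULL, &err);
        clEnqueueReadBuffer(q[0], a, CL_TRUE, 0, sizeof(host), host,
                            0, NULL, NULL);
        clEnqueueWriteBuffer(q[1], b, CL_TRUE, 0, sizeof(host), host,
                             0, NULL, NULL);
        clReleaseMemObject(a);
        clReleaseMemObject(b);
    }
    return 0;
}
```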
