intel_iommu, what is it?

One of my customers had a problem with a Xeon E5 machine: one GPU (I believe it was an NVIDIA one) was hanging, and they solved it by adding
intel_iommu=igfx_off
to the kernel command line in the GRUB boot loader configuration.
What is this parameter and what does it do? I have read around but couldn't figure it out in simple terms.

Quoting from the "Intel-IOMMU.txt" file included in the Linux kernel documentation:
"If you encounter issues with graphics devices, you can try adding option intel_iommu=igfx_off to turn off the integrated graphics engine. If this fixes anything, please ensure you file a bug reporting the problem."
Apparently the GPU in this case was not working properly with the DMAR (DMA Remapping) feature provided by the Intel chipset. Using the "igfx_off" parameter allows the GPU to access the physical memory directly without going through the DMAR.
The purpose of the DMAR feature is to enable things like direct assignment of hardware to virtualized guests. If you have to use the "igfx_off" parameter then you probably won't be able to use this GPU in such a direct-assigned virtualization scenario.
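To confirm which intel_iommu options a kernel was actually booted with, the kernel command line can be inspected. The following is a minimal sketch in Python, assuming a Linux system where the running command line is readable at /proc/cmdline; the function names are made up for this illustration. Note that the parameter has to be written without spaces around "=", otherwise the kernel sees three unrelated tokens.

```python
def intel_iommu_options(cmdline):
    """Return the options passed via intel_iommu= on a kernel command line.

    Options are comma-separated, e.g. "intel_iommu=on,igfx_off".
    """
    options = []
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        # Only tokens of the exact form intel_iommu=<value> count.
        if name == "intel_iommu" and sep:
            options.extend(opt for opt in value.split(",") if opt)
    return options


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()
```

For example, intel_iommu_options(read_running_cmdline()) returns a list containing "igfx_off" when the workaround is active, and an empty list when no intel_iommu parameter was passed.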

Related

Can I check OpenCL kernel syntax at compilation time?

I'm working on some OpenCL code within a larger project. The code only gets compiled at run-time - but I don't want to deploy a version and start it up just for that. Is there some way for me to have the syntax of those kernels checked (even without consider), or even compile them, at least under some restrictions, to make it easier to catch errors earlier?
I will be targeting AMD and/or NVIDIA GPUs.
The type of program you are looking for is an "offline compiler" for OpenCL kernels; knowing this term will hopefully help with your search. Offline compilers exist for many OpenCL implementations, so you should check availability for the specific implementation you are using. Otherwise, a quick web search suggests there are some generic open source ones which may or may not fit the bill for you.
If your build machine is also your deployment machine (i.e. your target OpenCL implementation is available on your build machine), you can of course also put together a very basic offline compiler yourself by simply wrapping clBuildProgram() and friends in a basic command line utility.
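As a rough illustration of such a wrapper, here is a sketch in Python rather than C. It assumes the third-party PyOpenCL package is installed and that an OpenCL platform is available on the build machine; PyOpenCL's Program.build() is used in place of calling clBuildProgram() directly. The script name and function names are hypothetical.

```python
import sys


def build_kernel_source(source, options=()):
    """Try to build OpenCL C source; return (ok, error_text)."""
    # Assumption: PyOpenCL is installed. Imported lazily so that the
    # usage message below still works on machines without it.
    import pyopencl as cl

    ctx = cl.create_some_context(interactive=False)
    try:
        cl.Program(ctx, source).build(options=list(options))
    except cl.Error as exc:
        # Return the error text reported by PyOpenCL for the failed build.
        return False, str(exc)
    return True, ""


def main(argv):
    """Build every kernel file named in argv; return a process exit code."""
    if not argv:
        print("usage: check_kernels.py KERNEL.cl [KERNEL.cl ...]", file=sys.stderr)
        return 2
    failures = 0
    for path in argv:
        with open(path) as f:
            ok, error_text = build_kernel_source(f.read())
        print(("OK   " if ok else "FAIL ") + path)
        if not ok:
            failures += 1
            print(error_text, file=sys.stderr)
    return 1 if failures else 0
```

Calling sys.exit(main(sys.argv[1:])) from a script entry point gives a non-zero exit status when any kernel fails to build, so the check can be hooked into a regular build step. The result only reflects the OpenCL implementation present on the build machine, not necessarily the one on the deployment target.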

clBuildProgram crash on NVidia cards

I have an OpenCL application that runs fine when using an AMD GPU.
When using an NVIDIA card, the clBuildProgram call crashes the application (it does not even return a failure value, it just crashes). When debugging, the crash yields:
a read access violation (code 0xc0000005) in the nvopencl.dll module. The debugger indicates the clGetExportTable function (inside nvopencl.dll) as the source of the violation.
By commenting out parts of the kernels at random, I narrowed the problem down to the following code fragment:
if (something) {
    // some stuff
    float3 gradient = (float3)(0, 1, 0);
    gradient = normalize(gradient);
    return;
}
If I delete the "gradient = normalize(gradient);" line, clBuildProgram does not crash, but if I leave it in, the program crashes. The gradient variable is not even used inside the kernel, so it is not related to any other part of it. And the normalize function by itself should not be the source of the problem, because it is used in other parts of the code.
I think it may be related to some driver bug, because installing the latest CUDA version (6.5) makes the OpenCL Volume Rendering sample binaries distributed by NVIDIA misbehave, while with a CUDA 6 installation the Volume Rendering sample works properly.
My code is related to volume rendering techniques, which is why I think the two may be related, but my problem appears with both the CUDA 6.5 and CUDA 6 installations.
Have you experienced something similar? What could be the cause of the problem, and how can I handle it?
Thank you.
After further analysis, the problem seems to be a bug in the drivers, as Xapa mentioned.

How to profile an OpenMP code natively on Intel MIC?

I have an OpenMP code written in C. I executed the code on Intel MIC on Stampede. I want to profile it to find the hotspots so that I can optimize it further. I tried to use the profiler gprof, but I read somewhere that gprof cannot be used on MIC directly. I then tried perf by following a tutorial. I got up to the perf annotate step, but when I execute it, it gives me the error ")" unexpected. So I don't know how to proceed with profiling my code. Can anybody please help?
This is the perf tutorial I followed: sandsoftwaresound.net/perf/perf-tutorial-hot-spots/ .
80% of optimization for the Xeon Phi is the same as for the host (Xeon). Use gprof, printf, compiler options, and the rest of your toolkit and carry your optimization as far as you can executing your code on the host only. After you can do no more, then focus on specific Xeon Phi optimizations.
As you are on Stampede, I assume you are using the Intel compiler. The compiler has a lot of diagnostic capabilities to profile your code and even provide suggestions. I'd provide you with more specific URLs but am on vacation with limited bandwidth.
Though this isn't specific to your question, here are some other suggestions. If you aren't using the Intel compiler, you'll most likely get a substantial boost by switching to it. Intel compilers are danged good at optimizations, especially on Intel architectures. Also, you should use Intel MKL where possible. All of MKL's routines are optimized for the different IA architectures, and the ones most relevant to HPC are optimized specifically for MIC.
You have a few options.
The heavyweight approach is to use Intel VTune. First, add -g to your compiler flags.
I use VTune from the host command line quite a bit; here is the command I use to profile an application on the MIC. (This is executed on the host machine; VTune on the host uses ssh to launch the application on the MIC.)
amplxe-cl -collect knc-hotspots -source-search-dir=/mysrc/dir -search-dir=/mybin/dir -- ssh mic0 /home/me/myapp
This assumes the app on the MIC is at /home/me/myapp, and that the source search dir (/mysrc/dir) and the binary search dir (/mybin/dir) are on the host. (With VTune update 15 at least, I need to specify both of these separately in order to get the VTune GUI to show me symbol info.)
Once your app has finished, run the Vtune GUI on the host with amplxe-gui and open your result set.
There are also some simplified open source profiling tools developed by Intel that support the MIC: Speedometer and Overhead.
Hopefully this is enough info to get you started.

DirectShow DMO Color Converter

I am having an issue connecting the Color Converter DMO object in GraphEdit (and in GraphStudio, and in code). It works on one machine and turns green in GraphEdit; however, on the machine I have to demo the program on, it will not connect! I've looked at SDKs and installs, and the machine should mimic my machine.
I also noticed that I can re-register the filter on my machine with regsvr32; however, it fails on the other machine.
Any ideas as to what the culprit could be?
Why would you want to re-register it? It is either a core OS component or not available at all.
It is not a filter; it is a dual-interface DMO/DSP, and while it is available within DirectShow through the DMO Wrapper Filter, this usage scenario is not guaranteed to work smoothly and you may have to work around issues.
Since you have wrapped it through the DirectShow.NET library, the number of issues might increase, so you will have to gather and provide more details about the errors along the way (HRESULTs etc.).

How to programmatically change the output mode of an Intel GMA450 graphics card to clone

I would like to change the output mode of an Intel GMA450 based graphics chip to "cloned" mode.
Since the environment is Windows Embedded Standard and only one of the connected monitors might be visible to the end user, I would like to either permanently set the output mode to cloned or reset it continuously to cloned mode in case the actual mode differs (e.g. after a reboot, a disconnect/reconnect of the second monitor, or by other means).
Is there a way (registry key, API for the Intel driver, Win API) to change the display mode to cloned / dual output programmatically?
Update:
I found the SDK for the IEGD driver; it seems that with it I might be able to programmatically set the resolution, clone mode, etc.
However, I can't find the SDK or any information for the driver I am currently using: Intel® Graphics Media Accelerator Driver for Windows* XP, version 14.32.4.4926.
This isn't a good answer, but it might get you headed in a direction to figure it out.
My last laptop had an external monitor connected, and the Intel drivers would often be confused about the orientation of the secondary after a reconnect or a reboot. I got tired of dealing with that and tried to fix it programmatically because the clicks were too many in the GUI. Select this monitor, select rotation, select other monitor, select rotation, apply, arrange, apply, wait...
I spent about a day on it (ahh, the days of being an employee vs. self-employed!) and the solution I found was to use a program to compare the registry (regshot perhaps?) to discover which keys were involved in the correction (what they were before versus what they were after). There was also an Intel-provided exe that forced the driver to reset based on the registry; the exe was essentially like pressing the "Apply" button in the GUI. I was running XP and, if I recall, the GUI management was for configuration of the Intel Graphics Media Accelerator Driver for Windows XP as well. So the final solution became a cmd file on my desktop that would apply a REG file without confirmation and then run an exe with some parameters.
Now, I don't have that laptop (they didn't let me walk out the door with it when I quit!) and I do not remember the specifics on the exe that was required to do the reset. Just changing registry keys didn't spontaneously cause it to take effect-- there was an api call involved, which I just handled with their exe. I know that isn't a lot to go on, but something tells me the file was in the driver package, or somewhere on the drive already, and I just found it. Running it at the command line gave options. Like /reset.
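The before/after registry comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it diffs the text of two registry exports (.reg files saved before and after making the change in the GUI), it uses a simplified parser that does not handle "=" inside a quoted value name, and the function names are made up. It does not cover the Intel-provided exe, since its name is not known here.

```python
def parse_reg_export(text):
    """Parse .reg export text into {(key_path, value_name): raw_value}."""
    values = {}
    current_key = None
    pending = ""  # holds a value continued onto the next line with a trailing backslash
    for raw_line in text.splitlines():
        line = pending + raw_line.strip()
        pending = ""
        if line.endswith("\\"):
            pending = line[:-1]
            continue
        if not line or line.startswith(";"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current_key = line[1:-1]
        elif current_key is not None and "=" in line:
            name, _, value = line.partition("=")
            values[(current_key, name.strip('"'))] = value
    return values


def diff_reg_exports(before_text, after_text):
    """Return sorted (key_path, value_name, old_value, new_value) tuples that differ."""
    before = parse_reg_export(before_text)
    after = parse_reg_export(after_text)
    changes = []
    for key_path, value_name in sorted(set(before) | set(after)):
        old = before.get((key_path, value_name))
        new = after.get((key_path, value_name))
        if old != new:
            changes.append((key_path, value_name, old, new))
    return changes
```

Exports written by regedit are usually UTF-16, so the files would be read with open(path, encoding="utf-16") before being passed to diff_reg_exports(). A value that exists in only one of the two exports shows up with None on the other side.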
I hope that helps you a little. Be sure to post back if you figure it out.
Also post back if I'm completely mistaken and it didn't happen like this at all. But that's the way I remember it. :)
