I'm working on some OpenCL code within a larger project. The code only gets compiled at run-time - but I don't want to deploy a version and start it up just for that. Is there some way for me to have the syntax of those kernels checked (even without consider), or even compile them, at least under some restrictions, to make it easier to catch errors earlier?
I will be targeting AMD and/or NVIDIA GPUs.
The type of program you are looking for is an "offline compiler" for OpenCL kernels; knowing this term will hopefully help with your search. They exist for many OpenCL implementations, so you should check availability for the specific implementation you are using. Otherwise, a quick web search suggests there are some generic open source ones which may or may not fit the bill for you.
If your build machine is also your deployment machine (i.e. your target OpenCL implementation is available on your build machine), you can of course also put together a very basic offline compiler yourself by simply wrapping clBuildProgram() and friends in a basic command line utility.
I had this question on my exam. In the diagrams I saw, we have: hardware, kernel, the system call interface to the kernel, then (compilers, shells, system libraries), and some applications on top. Does the scope of the OS include only the kernel, with everything else being additional functionality we choose to install, or does a Unix OS include everything in the list I gave above?
"OS" has more or less two definitions:
academic: the OS is the software that provides an abstraction layer between hardware and software
pragmatic: the OS is the software that comes with the hardware when we buy it.
Compilers and shells do not fit definition 1. They can fit definition 2.
And usually, users who are interested in a compiler or a shell prefer to think of the OS as an abstraction layer (the academic definition).
Simple answer: no. They are not an internal part of Unix but additional functionality that helps make the operating system more usable.
The OS scope applies primarily to the kernel only.
Whilst you need a compiler to build the kernel, you don't necessarily require one for the general day-to-day use of the system. Most operating systems don't ship a compiler by default; instead, the kernel and applications are built on one machine, and the resulting binaries are packaged and distributed either with the computer directly (Windows/Unix) or via the internet for others to download and install (Linux/BSD).
Likewise with the shell. Although all operating systems ship with a default one (sh/bash/dash on Linux/Unix systems, Command Prompt/PowerShell on Windows), most general users can go their entire lives without using it.
Having said that, if you were to delete the shell, you would almost certainly find that your system won't boot. This is because a lot of core start-up scripts rely on the shell to stop and start the services that present interfaces between the user and the kernel.
In summary:
You need a compiler to build the kernel and applications, but not to run the OS.
You need a shell to execute applications (which also includes the compiler).
I have OpenMP code written in C, which I ran on the Intel MIC on Stampede. I want to profile it to find the hotspots so that I can optimize it further. I tried to use gprof, but I read somewhere that gprof cannot be used on the MIC directly. I then tried perf by following a tutorial. I got as far as the perf annotate step, but when I run it, I get the error ")" unexpected. I don't know how to proceed with profiling my code. Can anybody help?
This is the perf tutorial I followed: sandsoftwaresound.net/perf/perf-tutorial-hot-spots/ .
80% of optimization for the Xeon Phi is the same as for the host (Xeon). Use gprof, printf, compiler options, and the rest of your toolkit and carry your optimization as far as you can executing your code on the host only. After you can do no more, then focus on specific Xeon Phi optimizations.
As you are on Stampede, I assume you are using the Intel compiler. The compiler has a lot of diagnostic capabilities to profile your code and even provide suggestions. I'd provide you with more specific URLs but am on vacation with limited bandwidth.
Though this isn't specific to your question, here are some other suggestions. If you aren't using the Intel compiler, you'll most likely get a substantial boost from switching to it. Intel compilers are danged good at optimizations, especially on Intel architectures. Also, you should use Intel MKL where possible. All of MKL's routines are optimized for the different IA architectures, and those most relevant to HPC are optimized specifically for MIC.
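For the printf approach mentioned at the start of this answer, a minimal sketch of manual region timing might look like the following. It assumes a C11 compiler (for timespec_get); the helper names wall_seconds and report_region and the busy_work loop are hypothetical, made up for illustration and not part of any Intel tool. It measures wall-clock time because, on POSIX systems, clock() reports processor time for the whole process, which is misleading for multi-threaded OpenMP code.

```c
#include <stdio.h>
#include <time.h>

/* Wall-clock time in seconds, using C11 timespec_get. */
static inline double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: print and return the time elapsed since t_start. */
static inline double report_region(const char *name, double t_start)
{
    double elapsed = wall_seconds() - t_start;
    fprintf(stderr, "[timing] %-20s %.6f s\n", name, elapsed);
    return elapsed;
}

/* Stand-in for a real compute loop (hypothetical workload):
   sums the first n terms of the harmonic series. */
static inline double busy_work(long n)
{
    double acc = 0.0;
    for (long i = 1; i <= n; ++i)
        acc += 1.0 / (double)i;
    return acc;
}
```

To use it, take double t0 = wall_seconds(); just before a suspect loop and call report_region("main loop", t0); just after it. This is coarse compared with a real profiler, but it works the same way on the host and on the coprocessor.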
You have a few options.
The heavyweight approach is to use Intel VTune. First, add -g to your compiler flags.
I use VTune from the host command line quite a bit; here is the command I use to profile an application on the MIC. (This is executed on the host machine; VTune on the host uses ssh to launch the application on the MIC.)
amplxe-cl -collect knc-hotspots -source-search-dir=/mysrc/dir -search-dir=/mybin/dir -- ssh mic0 /home/me/myapp
This assumes the app on the MIC is at /home/me/myapp, and that the source directory (-source-search-dir) and the binary directory (-search-dir) are on the host. (With VTune update 15 at least, I need to specify both of these separately in order to get the VTune GUI to show me symbol info.)
Once your app has finished, run the VTune GUI on the host with amplxe-gui and open your result set.
There are also some simplified open source profiling tools developed by Intel that support the MIC: Speedometer and Overhead. You can find information about them here
Hopefully this is enough info to get you started.
One of my customers had a problem with a Xeon E5 machine: one of their GPUs (I believe it was an NVIDIA one) kept hanging, and they solved it by adding
intel_iommu=igfx_off
in the GRUB loader.
What is this value and what does it do? I read around but couldn't figure it out in simple terms.
Quoting from the "Intel-IOMMU.txt" file included in the Linux kernel documentation:
"If you encounter issues with graphics devices, you can try adding option intel_iommu=igfx_off to turn off the integrated graphics engine. If this fixes anything, please ensure you file a bug reporting the problem."
Apparently the GPU in this case was not working properly with the DMAR (DMA Remapping) feature provided by the Intel chipset. Using the "igfx_off" parameter allows the GPU to access the physical memory directly without going through the DMAR.
The purpose of the DMAR feature is to enable things like direct assignment of hardware to virtualized guests. If you have to use the "igfx_off" parameter then you probably won't be able to use this GPU in such a direct-assigned virtualization scenario.
I have C source code with MPI calls.
Can I get a sequential program from this source by linking it with some MPI stub library? Where can I get such a library?
Most correctly-written MPI programs should not depend on the number of processes they use to get a correct answer -- eg, if you run them on one process (mpirun -np 1 ./a.out) they should still work. So you oughtn't need a stub library - just use MPI. (If for some reason you just don't want extraneous libraries kicking around, it's certainly possible to write stubs and link against them -- I did this back in the day when setting up MPI on my laptop was a huge PITA, you could use this as a starting point and add any functionality you need. But these days, fiddling with the stub library is probably going to be more work than just using an existing MPI implementation.)
If your MPI program does not currently work correctly on one processor, a stub library probably won't help; you'll need to find the special cases that it's not handling and fix them.
I don't think this is possible. Contrary to OpenMP, programs using MPI don't necessarily run or produce the same result when you simply take away the MPI part.
PETSc contains a stub MPI library that works for one process (ie serial) execution:
http://www.mcs.anl.gov/petsc/petsc-3.2/include/mpiuni/mpi.h
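To illustrate what a single-process stub of the kind discussed in the answers above could look like, here is a minimal sketch. It is not PETSc's mpiuni and not a complete MPI replacement: it covers only a handful of calls, the function names and signatures mirror the real MPI API so that existing source compiles unchanged, and the constant values are arbitrary assumptions chosen for this example. Point-to-point calls such as MPI_Send and MPI_Recv are deliberately left out, since a program that relies on them between different ranks will not work with one process anyway.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical single-process stand-in for <mpi.h>.
   Constant values are arbitrary, not those of any real implementation. */
typedef int MPI_Comm;
typedef int MPI_Datatype;
typedef int MPI_Op;

#define MPI_SUCCESS     0
#define MPI_ERR_TYPE    3
#define MPI_COMM_WORLD  0
#define MPI_INT         1
#define MPI_DOUBLE      2
#define MPI_SUM         1

static inline int MPI_Init(int *argc, char ***argv)
{
    (void)argc; (void)argv;
    return MPI_SUCCESS;
}

static inline int MPI_Finalize(void) { return MPI_SUCCESS; }

/* With one process, the rank is always 0 and the size is always 1. */
static inline int MPI_Comm_rank(MPI_Comm comm, int *rank)
{
    (void)comm; *rank = 0;
    return MPI_SUCCESS;
}

static inline int MPI_Comm_size(MPI_Comm comm, int *size)
{
    (void)comm; *size = 1;
    return MPI_SUCCESS;
}

/* Nothing to wait for when there is only one process. */
static inline int MPI_Barrier(MPI_Comm comm) { (void)comm; return MPI_SUCCESS; }

/* Size in bytes of the datatypes this stub knows about, 0 otherwise. */
static inline size_t stub_type_size(MPI_Datatype t)
{
    switch (t) {
    case MPI_INT:    return sizeof(int);
    case MPI_DOUBLE: return sizeof(double);
    default:         return 0;
    }
}

/* A reduction over a single process is just a copy of the local data. */
static inline int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count,
                                MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
{
    size_t sz = stub_type_size(datatype);
    (void)op; (void)comm;
    if (sz == 0)
        return MPI_ERR_TYPE;
    memcpy(recvbuf, sendbuf, (size_t)count * sz);
    return MPI_SUCCESS;
}
```

Saved as a header named mpi.h on the include path, a file like this would let code that only uses these calls build without an MPI installation. Any call the program uses beyond these would need its own stub, which is why running a real MPI implementation with one process is usually less work.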
I use a third-party DLL (FTD2xx) to communicate with an external device. Using Qt4, everything works fine in debug mode, but the release build crashes silently after successfully completing a called function. It seems to crash at return, but if I write something to the console (with qDebug) at the end of the function, sometimes it does not crash there, but a few, or a few dozen, lines later.
I suspect the stack is not being cleaned up properly, which the debug build can survive but the release build chokes on. Has anyone encountered a similar problem? The DLL itself cannot be changed, as the source is not available.
It seems that reducing the optimization level was the only way around this. The DLL itself might have problems, as a program which does nothing but call a single function from that DLL crashes the same way if optimization is turned on.
Fortunately, the size and speed lost by the change in optimization level are negligible.
Edit: for anyone with similar problems on Qt 5.0 or higher: If you change the optimization level (for example, to QMAKE_CXXFLAGS_RELEASE = -O0), it's usually not enough to just rebuild the application. A full "clean all" is required.
Be warned: the EPANET library is not thread safe, as it contains a lot of global variables.
Are you calling two methods of that library from different threads?