Are the Intel compilers worth it?

Pretty straightforward: are the Intel compilers worth getting? I do mostly systems-level and desktop work, so I figure I might benefit. Can anyone with some more experience shed some light?

If you are on Windows, they do provide a nice speed boost over other compilers on Intel processors. There is a known behavior where they pick a very slow code path on non-Intel processors (AMD, VIA), reportedly by dispatching on the CPU vendor string rather than on the features the CPU actually supports, and there have been antitrust probes surrounding the issue (a sketch of that kind of check is at the end of this answer).
If you use Threading Building Blocks or other Intel-specific features, you also risk tying your code to the Intel compiler long term, as the functionality doesn't exist elsewhere.
GCC 4.5 on Linux is nearly on-par with the Intel compiler. There is no clear winner on that platform.
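To make the dispatching point concrete, here is a hedged sketch of reading the CPUID vendor string with GCC/Clang's <cpuid.h> (x86 only); a dispatcher that branches on this string instead of on the actual feature bits is the behavior being criticized.

    // Hypothetical sketch of a vendor-string check (GCC/Clang <cpuid.h>, x86 only).
    #include <cpuid.h>
    #include <cstdio>
    #include <cstring>

    int main() {
        unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
        char vendor[13] = {0};
        if (__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
            std::memcpy(vendor + 0, &ebx, 4);   // CPUID leaf 0 returns the vendor
            std::memcpy(vendor + 4, &edx, 4);   // string in EBX, EDX, ECX order,
            std::memcpy(vendor + 8, &ecx, 4);   // e.g. "GenuineIntel"
        }
        std::printf("CPU vendor: %s\n", vendor);
        return 0;
    }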

In the limited experience I've had with Intel compilers (C only), I would say they are vastly superior. Specifically, their OpenMP library was much, much faster than the open-source version. "Worth it" depends on your situation, though; they are expensive, but they are better IMO.
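For context, a minimal sketch of the kind of loop such comparisons tend to be run on; the sizes are arbitrary, and the point is only that the same source can be built against Intel's OpenMP runtime (e.g. icc -qopenmp) or GCC's libgomp (g++ -fopenmp) and timed.

    // Minimal OpenMP sketch; thread startup, scheduling and the reduction are
    // handled by whichever OpenMP runtime the compiler links in.
    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 10000000;
        std::vector<double> a(n, 1.0), b(n, 2.0);
        double sum = 0.0;

        #pragma omp parallel for reduction(+ : sum)
        for (int i = 0; i < n; ++i)
            sum += a[i] * b[i];

        std::printf("sum = %.1f (threads: %d)\n", sum, omp_get_max_threads());
        return 0;
    }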

From the benchmarks I've seen, it does look like the Intel-specific compilers provide some performance/multithreading benefit over their open-source alternatives.

If floating-point precision is important to you, then use the Visual Studio compiler and not the Intel compiler.
32-bit vs. 64-bit builds can give you different calculation results with the Intel compiler (checked).
The Visual Studio compiler's results will be the same for 32-bit and 64-bit builds.

If you're comparing the numerical behavior of ICL vs. MSVC++ you must take into account the different behavior of the /fp: settings.
ICL /fp:source (less aggressive than default) is equivalent to MSVC /fp:fast (more aggressive than default).
Microsoft doesn't perform any of the optimizations which are enabled by the ICL defaults. These include SIMD reductions (which usually improve accuracy, but by an unpredictable margin). ICL also, by default, does not honor the standard's rules on parentheses. There still seems to be controversy about whether to fix that by better-performing means than /fp:source.
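To make the reassociation point concrete, here is a small, hedged illustration with made-up values (not tied to any particular compiler): an aggressive floating-point model is allowed to rewrite the first expression into something like the second, which changes the rounded result.

    #include <cstdio>

    int main() {
        const double big = 1e16, tiny = 1.0;

        // Strict left-to-right evaluation: the tiny terms are absorbed.
        double strict = ((big + tiny) + tiny) - big;    // 0.0 under IEEE order

        // Reassociated evaluation, as a vectorized reduction might compute it.
        // An aggressive model may rewrite the expression above into this form.
        double reassoc = (tiny + tiny) + (big - big);   // 2.0

        std::printf("strict = %g, reassociated = %g\n", strict, reassoc);
        return 0;
    }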

Related

Why doesn't Intel design its SIMD ISAs in a more compatible or universal way?

Intel has several SIMD ISAs, such as SSE, AVX, AVX2, AVX-512 and IMCI on Xeon Phi. These ISAs are supported on different processors. For example, AVX-512 BW, AVX-512 DQ and AVX-512 VL are only supported on Skylake, not on Xeon Phi, while AVX-512 ERI and AVX-512 PFI are only on Xeon Phi; AVX-512 F and AVX-512 CDI are supported on both Skylake and Xeon Phi.
Why doesn't Intel design a more universal SIMD ISA that can run on all of its advanced processors?
Also, Intel removes some intrinsics and adds new ones when developing ISAs. A lot of intrinsics have many flavours. For example, some work on packed 8-bit while some work on packed 64-bit. Some flavours are not widely supported. For example, Xeon Phi is not going to have the capability to process packed 8-bit values. Skylake, however, will have this.
Why does Intel alter its SIMD intrinsics in such an inconsistent way?
If the SIMD ISAs were more compatible with each other, existing AVX code could be ported to AVX-512 with much less effort.
I see the reasons as threefold.
(1) When they originally designed MMX they had very little area to work with so made it as simple as possible. They also did it in such a way that was fully compatible with the existing x86 ISA (precise interrupts + some state saving on context switches). They hadn't anticipated that they would continually enlarge the SIMD register widths and add so many instructions. Every generation when they added wider SIMD registers and more sophisticated instructions they had to maintain the old ISA for compatibility.
(2) This weird thing you're seeing with AVX-512 comes from the fact that they are trying to unify two disparate product lines. Skylake is from Intel's PC/server line, so its path can be seen as MMX -> SSE/2/3/4 -> AVX -> AVX2 -> AVX-512. The Xeon Phi was based on an x86-compatible graphics card called Larrabee that used the LRBni instruction set. This is more or less the same as AVX-512, but with fewer instructions and not officially compatible with MMX/SSE/AVX/etc...
(3) They have different products for different demographics. For example, (as far as I know) the AVX-512 CD instructions won't be available in the regular Skylake processors for PCs, just in the Skylake Xeon processors used for servers, in addition to the Xeon Phi used for HPC. I can understand this to an extent, since the CD extensions are targeted at things like parallel histogram generation; this is more likely to be a critical hotspot in servers/HPC than in general-purpose PCs.
I do agree it's a bit of a mess. Intel is beginning to see the light and is planning better for additional expansions; AVX-512 is supposedly ready to scale to 1024 bits in a future generation. Unfortunately it's still not really good enough, and Agner Fog discusses this on the Intel forums.
For my part, I would have liked to see a model that can be upgraded without the user having to recompile their code each time. For example, instead of defining the AVX register as 512 bits in the ISA, this should be a parameter stored in the microarchitecture and retrievable by the programmer at runtime. The user asks "what is the maximum SIMD width available on this machine?", the architecture returns XYZ, and the user has generic control flow to cope with whatever that XYZ is. This would be much cleaner and more scalable than the current technique, which uses several versions of the same function for every possible SIMD version. :-/
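As a rough sketch of that runtime query, here is roughly what it can look like today with GCC/Clang's __builtin_cpu_supports; the function name and the fallback widths are illustrative, not a proposal for a real ISA mechanism.

    #include <cstdio>

    int widest_simd_bytes() {
        __builtin_cpu_init();
        if (__builtin_cpu_supports("avx512f")) return 64;   // 512-bit registers
        if (__builtin_cpu_supports("avx2"))    return 32;   // 256-bit registers
        if (__builtin_cpu_supports("sse2"))    return 16;   // 128-bit registers
        return 8;                                           // scalar fallback
    }

    int main() {
        // Generic code would size its strides/blocking from this value rather
        // than being compiled against one fixed register width.
        std::printf("widest SIMD on this machine: %d bytes\n", widest_simd_bytes());
        return 0;
    }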
There is SIMD ISA convergence between Xeon and Xeon Phi, and ultimately they may become identical. I doubt you will ever get the same SIMD ISA across the whole Intel CPU line; bear in mind that it stretches from a tiny Quark SoC to Xeon Phi. It will be a long time, possibly forever, before AVX-1024 migrates from Xeon Phi to a Quark or a low-end Atom CPU.
In order to get better portability between different CPU families, including future ones, I advise you to use higher-level concepts than bare SIMD instructions or intrinsics. Use OpenCL, OpenMP, Cilk Plus, C++ AMP, or an autovectorizing compiler. Quite often, they will do a good job of generating platform-specific SIMD instructions for you.
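As a hedged example of that advice, the loop below has no intrinsics at all; with OpenMP's simd pragma (or plain autovectorization at -O2/-O3) the compiler emits SSE, AVX2 or AVX-512 code depending on the -march/-x flags. The saxpy function is just an illustration.

    #include <cstddef>

    void saxpy(float a, const float* x, float* y, std::size_t n) {
        // 'omp simd' (with -fopenmp-simd or -fopenmp), or plain
        // autovectorization, lets the compiler pick the vector
        // instructions the target supports.
        #pragma omp simd
        for (std::size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }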

Optimize mathematical library (libm)

Has anyone tried to compile glibc with -march=corei7 to see if there's any performance improvement over the version that comes by default with a Linux x86_64 distribution? GCC is compiled with -march=i686. I think (though I'm not sure) that the mathematical library is also compiled the same way. Can anybody confirm this?
Most Linux distributions for x86 compile using only i686 instructions, but ask the compiler to schedule them for later processors. I haven't really followed later developments.
A long while back, shipping different versions of system libraries for different processor lines was common, but the performance differences were soon deemed too small to be worth the cost, and machines have become more uniform in performance since then.
One thing that always has to be remembered is that today's machines are memory-bound: a memory access takes a few hundred times longer than an instruction, and the gap is growing. Not to mention that this machine (an oldish laptop, top-of-the-line some two years back) has 4 cores (8 threads), all battling to get data/instructions from memory. Making the code run a tiny bit faster, so the CPU can wait longer for RAM, isn't very productive.
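If someone wants to check empirically anyway, a rough, hedged micro-benchmark sketch (the loop body and iteration count are made up) is to time a libm-heavy loop and compare it across builds or libm versions:

    #include <chrono>
    #include <cmath>
    #include <cstdio>

    int main() {
        const int n = 10000000;
        volatile double sink = 0.0;   // keeps the calls from being optimized away

        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < n; ++i)
            sink = sink + std::exp(std::sin(i * 1e-6));
        auto t1 = std::chrono::steady_clock::now();

        std::printf("%.3f s (sink = %g)\n",
                    std::chrono::duration<double>(t1 - t0).count(),
                    static_cast<double>(sink));
        return 0;
    }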

From a programming point of view, what does it mean when a program is 32 or 64 bit?

I'm a beginner programmer in my first year of Computer Science.
I'm curious about 32-bit and 64-bit systems, and how they affect software development.
When I download software I need to choose between the two, while other software only has a 32-bit version.
Are there different ways of programming for a 64 bit system?
Is it compiled in the same way?
What are the main benefits of a separate 64 bit app?
Cheers
Are there different ways of programming for a 64 bit system?
Yes and no. No, in the sense that most of the time you should be able to write platform-independent code, even if you are coding in a language like C. Yes, in the sense that having knowledge of the underlying architecture (not just the word size!) helps to speed up critical parts of your program. For instance, you may be able to use special instructions where they are available.
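As a hedged illustration of that split, the sketch below keeps the bulk of the code portable and adds one clearly marked fast path that is only compiled when the compiler reports SSE2; the add_one function itself is made up.

    #include <cstddef>

    #if defined(__SSE2__)
    #include <emmintrin.h>
    #endif

    void add_one(float* data, std::size_t n) {
        std::size_t i = 0;
    #if defined(__SSE2__)
        const __m128 ones = _mm_set1_ps(1.0f);
        for (; i + 4 <= n; i += 4)           // 4 floats per 128-bit register
            _mm_storeu_ps(data + i, _mm_add_ps(_mm_loadu_ps(data + i), ones));
    #endif
        for (; i < n; ++i)                   // portable fallback and remainder
            data[i] += 1.0f;
    }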
Is it compiled in the same way?
Again, yes and no. Compilers for systems languages work in similar ways for all architectures, but of course, the details differ a bit. For instance, the compiler will use knowledge about your architecture to generate as efficient code as possible for it, but also has to take care of differences between architectures and other details, like calling conventions.
What are the main benefits of a separate 64 bit app?
I assume you are asking about the usual desktop CPUs, i.e. x86 architecture, but note that there are other architectures with word sizes ranging from 8-bit to 128-bit. Typically, people would compile a program targeting a single architecture (i.e. for a given machine), and that's about it.
However, x86 is a bit special, in that the CPU can operate in different modes, each with a different word size: 16-bit, 32-bit and 64-bit (among other differences). Effectively, they implement several ISAs (Instruction Set Architectures) in a single CPU.
This was done to preserve backwards compatibility, and it is key to their commercial success. Consider that, when people bought the first 64-bit capable CPUs, it was most likely that they were still using 32-bit operating systems and software, so they really needed the compatibility. The other options are emulating it (poor performance) or making sure all the popular customer software has been ported (hard to achieve in ecosystems like Windows with many independent, proprietary vendors).
There are several benefits of 64-bit x86 over 32-bit x86: more addressable memory, more integer registers, twice as many XMM registers, a better calling convention, guaranteed SSE2... The only downside is using 64-bit pointers, which implies more memory and cache usage. In practice, many programs can expect to be slightly faster in x64 (e.g. 10%), but pointer-heavy programs may even see a decrease in performance.
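A tiny sketch of that pointer-size cost (the Node struct is made up): the same type simply gets bigger in a 64-bit build, which is what hurts pointer-heavy programs.

    #include <cstdio>

    struct Node {
        int   value;
        Node* next;
    };

    int main() {
        // Typically 8 bytes in a 32-bit x86 build, 16 bytes in an x86-64 build
        // (4-byte int + padding + 8-byte pointer).
        std::printf("sizeof(Node) = %zu bytes\n", sizeof(Node));
        return 0;
    }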
Generally speaking, the main benefit of a 64-bit application is that it has access to more memory; with a 32-bit pointer you can address only 4 GB of memory.
Most modern compilers have an option to compile either 32-bit or 64-bit code.
Coding for 32-bit and 64-bit is the same unless you are dealing with huge in-memory objects, where you would need 64-bit specifically.
An interesting fact/example is that Unix time is stored as a single number, calculated as the number of seconds passed since January 1st, 1970. This number will overflow a signed 32-bit integer in 2038, so eventually we will have to upgrade all of our systems to 64-bit so they can hold such a large number.
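A quick back-of-the-envelope check of that rollover (this is the well-known Year 2038 problem):

    #include <cstdint>
    #include <cstdio>

    int main() {
        const std::int32_t max_seconds = INT32_MAX;            // 2147483647
        const double years = max_seconds / (365.25 * 24 * 3600.0);
        std::printf("signed 32-bit time_t lasts ~%.1f years -> year %.0f\n",
                    years, 1970 + years);
        return 0;
    }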

Java JDK 32 bits vs 64 bits

I am creating a quite simple application which reads and displays text files and searches through them.
I am asking myself whether there is any point in offering both a 32-bit and a 64-bit version to the user.
Is the difference only in having access to a larger heap with the 64-bit version, or is there some other benefit?
Will a program compiled for 32-bit work on a 64-bit JVM? (I assume yes.)
The only differences between 32-bit and 64-bit builds of any program are the size of the machine word, the amount of addressable memory, and the operating system ABI in use. With Java, the language specification means that the differences in machine word size and OS ABI should not matter at all unless you're using native code as well. (Native code must be built for the same word size as the JVM that will load it; you can't mix 32-bit and 64-bit builds in the same process without very exotic coding indeed, and you shouldn't be doing that with Java anyway.)
The only thing that has swung it for me is when there have been native libraries involved that have pushed it one way or the other. If you're just in Java land then realistically, unless you need >4GB of heap, there's very little difference.
EDIT: The differences include things like the 64-bit JVM using slightly more memory than the 32-bit one, significantly more if you're using a version before 6u23 and aren't using -XX:+UseCompressedOops. There may also be a slight performance difference between the two, but again nothing huge.
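As a purely conceptual sketch of what compressed oops buy you (written in C++ rather than JVM internals, and simplified; the real JVM also shifts the offset to cover larger heaps): references are stored as 32-bit offsets from a heap base instead of full 64-bit pointers.

    #include <cstdint>
    #include <cstdio>

    static char heap[1 << 20];                       // stand-in managed heap

    std::uint32_t compress(void* p) {
        return static_cast<std::uint32_t>(static_cast<char*>(p) - heap);
    }

    void* decompress(std::uint32_t ref) {
        return heap + ref;
    }

    int main() {
        void* obj = heap + 1234;
        std::uint32_t ref = compress(obj);           // 4 bytes instead of 8
        std::printf("round-trip ok: %s\n", decompress(ref) == obj ? "yes" : "no");
        return 0;
    }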

Is the SPARC architecture still relevant as a JIT compiler target on high-end servers?

x86 and AMD64 are the most important architectures for many computing environments (desktop, servers, and supercomputers). Obviously a JIT compiler should support both of them to gain acceptance.
Until recently, the SPARC architecture was the logical next step for a compiler, especially in the high-end server market. But now that Sun is gone, things are not so clear.
Oracle doesn't seem to be really interested in it, and some big projects are dropping support for the architecture (Ubuntu, for example). On the other hand, the OpenSPARC initiative, intended to open-source recent processor designs, is quite promising, meaning that a lot of manufacturers could implement and use SPARC for free in the near future.
So, is SPARC still a good choice as the next target architecture for a JIT compiler? Or is it better to choose another one (POWER, ARM, MIPS, ...)?
I don't know any more than you about SPARC's future. I hope it has one; it's been tragic how many good architectures have died out while x86 has kept going.
But I would suggest you look at ARM as a target. It isn't present in big server hardware, but it's huge in the mobile market, and it powers all sorts of interesting little boxes, like my NAS, my ADSL router, and so on.
Your next target architecture should definitely be ARM: power consumption in large datacenters is a huge issue, and the next big thing will be trying to reduce it by using low-power CPUs; see Facebook's first attempt at this.
