Minimum SSE/AVX version required to compare 2 64-bit integers, atomically? - intel

Besides the title... is there an easy way to find this information myself? Preferably in a tabular format.

Easy way to find it yourself:
Intel Intrinsics Guide
Don't be confused by the title; the Intrinsics Guide is actually very convenient for finding which ISA extension a given instruction belongs to.

pcmpeqq was introduced with SSE4.1 and pcmpgtq with SSE4.2, if that's what you're looking for.
x64 with its REX.W CMP has been around for longer though.
See also
Intel's manuals
AMD's developer guides
ref.x86asm.net

When to use eigen and when to use Blas

I did some basic reading on Eigen and BLAS. Both libraries support matrix-matrix and matrix-vector multiplication. I don't understand which one I should use in which case. To me it seems both have almost the same performance. It would be nice if someone could give me a resource, or just tell me what advantages one library has over the other, or how the two differ for matrix and vector manipulation. Thanks in advance.
Use Eigen; it's more complete and much easier to use. Then, if you wonder whether another fully optimized BLAS implementation could give you higher performance, just recompile your code with -DEIGEN_USE_BLAS, link to your favorite BLAS, and see for yourself.
Also, when using Eigen, don't forget to enable compiler optimizations, e.g. -O3, and the instruction sets your hardware supports, e.g. -mavx -mfma with a recent Eigen.
The answer to this question is in the Eigen FAQ:
http://eigen.tuxfamily.org/index.php?title=FAQ#How_does_Eigen_compare_to_BLAS.2FLAPACK.3F
More or less, I use Eigen mostly because it has a comfortable interface. If you need speed and multicore parallelism, or have only a little but time-consuming linear algebra in your code, go for GotoBLAS2. It is usually the fastest on Intel machines.

CGAL tools: is there an interface to CGAL, or equivalent toolset in R?

I'm starting to learn about dealing with complex vs simple polygons, determining whether points are inside/outside polygons, etc. (e.g. http://geomalgorithms.com/a09-_intersect-3.html and related pages). I was hoping to find an R package that provides implementations of the Bentley-Ottmann algorithm, winding number, edge-crossing, and so on.
Alternatively, is there an R interface to the CGAL library or similar toolset? Is Rcpp the best (or only) way to go about this?
The nearest thing is probably package rgeos. It is meant for geospatial applications: polygon overlay, buffering, intersections, etc.
A wrap of CGAL would be very interesting. However, I have a vague feeling there may be licensing issues... It's partly LGPL and partly GPL, but if you don't want to comply with those licenses you can buy a commercial license.
There are CGAL SWIG bindings: http://code.google.com/p/cgal-bindings/ and SWIG supports R, so it should work, but I don't know if it has been tried.

CPLEX -linear-optimization-program for Unix?

The linear-optimization course 2.3140 requires CPLEX, but it is a pain to use: it is poorly documented, and it is hard to get any information when you hit a brick wall like here and here, let alone without having the software locally.
Does there exist some linear-optimization tool with which I could program the way I do with CPLEX? Since I haven't used this tool for a year, I have forgotten a lot of trivial things. I am now trying to find some tool that I could run on my Debian or Apple machine. Does any such tool or library exist?
Trial 1: trying to find a GUI tool to execute code like this
I am trying to understand how CPLEX works through the IBM Academic Initiative. At uni I have some sort of Eclipse CPLEX thing, but I found only this -- where can I get the GUI for some Unix? Image here.
There is a ton of documentation available from IBM. If you want the software on your local machine and are a student, you can get it through the Academic Initiative. If you want to try something different and are a student, you can get Gurobi, which has a Python interface you might like.
I would recommend looking at the COIN-OR website here:
http://www.coin-or.org/
They provide well-documented libraries and solvers (I use CPLEX mostly, so I don't use those much, but it is well documented and looks really good).
CPLEX alone does a lot of things, but for a linear programming course you will probably only need a tool to solve linear programs, and maybe mixed-integer problems (MIP).
Have a look at CMPL from COIN-OR; it may be enough for you. If you need to write "real" programs, you will have to use a (C or C++) library. They provide CoinMP for MIPs, and Clp for linear programs (simplex and barrier algorithms).
I have also used GLPK (from the GNU project) for linear programs, but it performs poorly for MIP (the default branch-and-bound procedure is very simple), although it may be enough for your course:
http://www.gnu.org/software/glpk/
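For a quick start without CPLEX, GLPK's command-line solver glpsol can read models written in the CPLEX LP file format, so you can practice the same modeling style locally. A tiny example (hypothetical numbers), solved with `glpsol --lp model.lp -o model.sol`:

```
\ model.lp -- maximize 3x + 2y subject to two constraints
Maximize
 obj: 3 x + 2 y
Subject To
 c1: x +   y <= 4
 c2: x + 3 y <= 6
Bounds
 0 <= x
 0 <= y
End
```

GLPK also has its own modeling language (GMPL, a subset of AMPL) if you outgrow the plain LP format.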
However, I don't really agree with you that the CPLEX documentation is poor.
Python
I haven't tested CVXOPT, but my teacher mocked it; it is apparently a bit buggy. More here.

General sparse iterative solver libraries

What are some of the better libraries for large sparse iterative (conjugate gradient, MINRES, GMRES, etc.) linear algebra system solving? I've often coded my own routines, but I'm interested to know which "off-the-shelf" packages people prefer. I've heard of PETSc, TAUCS, IML++, and a few others. I'm wondering how these stack up, and what else is out there. My preference is for ease of use, and freely available software.
Victor Eijkhout's Overview of Iterative Linear System Solver Packages would probably be a good place to start.
You may also wish to look at Trilinos
http://trilinos.sandia.gov/
It is designed by some great software craftsmen, using modern design techniques.
Moreover, from within Trilinos, you can call PETSc if you desire.
NIST has some sparse linear algebra software you can download
here: http://math.nist.gov/sparselib++/ and here: http://math.nist.gov/spblas/
I haven't used those packages myself, but I've heard good things about them.
http://www.cise.ufl.edu/research/sparse/umfpack/
UMFPACK is a set of routines for solving unsymmetric sparse linear systems, Ax=b, using the Unsymmetric MultiFrontal method. It is written in ANSI/ISO C and appears as a built-in routine (for lu, backslash, and forward slash) in MATLAB. It includes a MATLAB (Version 6.0 and later) interface, a C-callable interface, and a Fortran-callable interface. Note that "UMFPACK" is pronounced in two syllables, "Umph Pack", not "You Em Ef Pack".
I'm using it for FEM code.
I would check out Microsoft's Solver Foundation. It's free or cheap even for pretty big problems. The unlimited version is industrial strength and based on Gurobi, and of course isn't cheap.
http://code.msdn.microsoft.com/solverfoundation

inverse FFT in shader language?

Does anyone know of an implementation of the inverse FFT in HLSL/GLSL/Cg...?
It would save me much work.
Best,
heinrich
Do you already have an FFT implementation? You may already be aware, but the inverse can be computed by circularly reversing the order of the N inputs (element 0 stays in place; element k moves to (N-k) mod N), taking the forward FFT over those, and dividing the result by N.
DirectX 11 comes with an FFT example for compute shaders (see the DX11 August SDK release notes). As PereAllenWebb points out, this can also be used for the inverse FFT.
Edit: If you just want a fast FFT, you could try CUFFT, which runs on the GPU; it's part of the CUDA SDK. The ACML from AMD also has an FFT, which is currently not GPU accelerated, but that will likely be added soon.
I implemented a 1D FFT on 7800GTX hardware back in 2005. This was before CUDA etc so I had to resort to using Cg and manually implementing the FFT.
I have two FFT implementations. One is a radix-2 decimation-in-time FFT and the other a Stockham autosort FFT. The Stockham one performed around 2-4x faster than a CPU (at the time, a 3 GHz single-core P4) for larger sizes (> 8192), but for smaller sizes the CPU was faster, as it doesn't have to shift data to/from the GPU.
If you're interested in the shader code feel free to contact me and I'll send it over by email. It was from a personal project so not covered by any commercial copyright. I would imagine that CUDA (and similar) implementations would massively outperform my implementation, however from a learning perspective you can't get better than to write or study the code yourself!
Maybe you could take a look at OpenCL, which is a standard for general-purpose computing on graphics (and other) hardware.
The Wikipedia article contains an OpenCL example for a standard FFT:
http://en.wikipedia.org/wiki/OpenCL#Example
If you are on a Mac with OS X 10.6, you just need to install the developer tools to get started with OpenCL development.
I also heard that hardware vendors already provide basic OpenCL driver support on Windows.
