I am trying to parallelize an object localization algorithm on a GPU during my internship, but the limited floating point precision available through OpenCL on my target hardware has become quite problematic.
The reference algorithm is implemented entirely in double precision, especially the SVM classifier and the descriptors. Mine is implemented in single precision, which introduces errors. I checked my normalized errors and, at several points of my program, got what I expected (around 10^-6).
However, these errors become much more significant after the classification step of the process.
Is there any way to simulate double precision values on a GPU that only supports single precision?
PS: I can use double precision on my GPU (Nvidia GTS 450), but the program will be tested on several platforms with much less power, which probably means no double precision.
This might be of interest to you: http://www.bealto.com/mp-mandelbrot_fp128-opencl.html
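The core trick in that article is "float-float" (sometimes called double-single) arithmetic: a value is carried as the unevaluated sum of two floats, which roughly doubles the usable mantissa. Below is a minimal sketch of the idea in plain C++ (the ff struct and function names are mine, not taken from the article or from any particular library); the same error-free transformations port directly to OpenCL C kernels. Note that these tricks rely on each operation being rounded exactly once, so they break under options like -ffast-math.

```cpp
#include <cstdio>

// A value is stored as the unevaluated sum hi + lo of two floats.
struct ff { float hi, lo; };

// Error-free addition of two floats (Knuth's TwoSum).
static ff two_sum(float a, float b) {
    float s   = a + b;
    float bb  = s - a;
    float err = (a - (s - bb)) + (b - bb);
    return { s, err };
}

// Veltkamp split of a float into high/low halves (needed for exact products).
static ff split(float a) {
    const float c = 4097.0f * a;   // splitter 2^12 + 1 for a 24-bit mantissa
    float hi = c - (c - a);
    return { hi, a - hi };
}

// Error-free product of two floats (Dekker's TwoProduct).
static ff two_prod(float a, float b) {
    float p = a * b;
    ff as = split(a), bs = split(b);
    float err = ((as.hi * bs.hi - p) + as.hi * bs.lo + as.lo * bs.hi) + as.lo * bs.lo;
    return { p, err };
}

// Add two float-float numbers (simple version, not fully normalized).
static ff ff_add(ff a, ff b) {
    ff s = two_sum(a.hi, b.hi);
    float lo = s.lo + a.lo + b.lo;
    return two_sum(s.hi, lo);
}

int main() {
    // 1 + 1e-8 is not representable in one float, but survives as hi + lo.
    ff x = two_sum(1.0f, 1e-8f);
    ff y = ff_add(x, ff{ -1.0f, 0.0f });
    std::printf("recovered small part: %.3e\n", double(y.hi + y.lo));

    // two_prod captures the rounding error of a float multiplication exactly.
    ff p = two_prod(1.0f / 3.0f, 3.0f);
    std::printf("product = %.9g + %.3e\n", double(p.hi), double(p.lo));
}
```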
I have my own game engine, which is written in OpenGL and C++. I also have my own math library for matrix and vector manipulation. I always had doubts about the performance of my math library, so I recently decided to look for a popular math library that is used by many game / graphics developers. I was surprised that I couldn't find anything.
People on Stack Overflow suggested the GLM and Eigen libraries in similar posts, so I ran some performance tests. I multiplied two 4x4 matrices 1,000,000 times, and here are the results:
GLM: 4.23 seconds
Eigen: 12.57 seconds
My library: 0.25 seconds
I was surprised by these results, because my implementation of matrix multiplication is taken straight from Wikipedia. I checked the code from GLM and Eigen and found that there are a lot of typedefs, assertions and other type checking, unnecessary code which decreases performance a lot.
So, my question is:
Do you know of any FAST math library with a nice API for gamedev / graphics purposes? I need functionality like: creating translation, rotation and projection matrices, matrix * matrix, inverse, look-at, matrix * vector, quaternions, etc...
I checked the code from GLM and Eigen and found that there are a lot of typedefs, assertions and other type checking, unnecessary code which decreases performance a lot.
Are you sure that you ran all these benchmarks with compiler optimizations turned on?
And not, for example, with Debug settings?
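To show what I mean, here is a hypothetical, minimal harness (the Mat4 type and the numbers are mine, not from your test) that you can build twice, e.g. with g++ -O2 -DNDEBUG and then in a Debug configuration. GLM's and Eigen's heavily templated, assert-laden code usually only collapses into tight code in the optimized build, so a Debug-mode comparison against a hand-rolled loop is not meaningful.

```cpp
#include <chrono>
#include <cstdio>

struct Mat4 { float m[16]; };

// Straightforward row-major 4x4 multiply, like the Wikipedia version.
static Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a.m[i * 4 + k] * b.m[k * 4 + j];
            r.m[i * 4 + j] = s;
        }
    return r;
}

int main() {
    Mat4 a{}, b{};
    for (int i = 0; i < 16; ++i) a.m[i] = float(i + 1) * 0.5f;
    for (int i = 0; i < 16; ++i) b.m[i] = (i % 5 == 0) ? 1.0f : 0.0f;  // identity

    auto t0 = std::chrono::steady_clock::now();
    Mat4 acc = a;
    for (int i = 0; i < 1000000; ++i)
        acc = mul(acc, b);                   // same workload as in the question
    auto t1 = std::chrono::steady_clock::now();

    // Use the result so the optimizer cannot delete the whole loop.
    std::printf("checksum %.3g, elapsed %.3f s\n", double(acc.m[0]),
                std::chrono::duration<double>(t1 - t0).count());
}
```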
Another alternative would be MathFu from Google.
http://google.github.io/mathfu/
Are there any software tools for performing arithmetic on very large numbers in parallel? What I mean by parallel is that I want to use all available cores on my computer for this.
The constraints are wide open for me. I don't mind trying any language or tech.
Please and thanks.
It seems like you are either dividing really huge numbers or using a suboptimal algorithm. Parallelizing things across a fixed number of cores will only tweak the constants, but will have no effect on the asymptotic behavior of your operation. And if you're talking about hours for a single division, asymptotic behavior is what matters most. So I suggest you first make sure your asymptotic complexity is as good as it can be, and then start looking for ways to improve the constants, perhaps by parallelizing.
Wikipedia suggests Barrett division, and GMP has a variant of that. I'm not sure whether what you've tried so far is on a similar level, but unless you are sure that it is, I'd give GMP a try.
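A minimal GMP sketch looks like the following (link with -lgmp; the operand sizes below are just placeholders). For large enough operands GMP switches internally to its subquadratic division routines, so the calling code stays the same whatever the size:

```cpp
#include <gmp.h>
#include <cstdio>

int main() {
    mpz_t n, d, q, r;
    mpz_inits(n, d, q, r, NULL);

    mpz_ui_pow_ui(n, 10, 2000000);   // n = 10^2000000, about two million digits
    mpz_add_ui(n, n, 12345);
    mpz_ui_pow_ui(d, 7, 500000);     // d = 7^500000

    mpz_tdiv_qr(q, r, n, d);         // quotient and remainder, truncating division

    std::printf("quotient has %lu decimal digits\n",
                (unsigned long) mpz_sizeinbase(q, 10));
    mpz_clears(n, d, q, r, NULL);
}
```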
See also Parallel Modular Multiplication on Multi-core Processors for recent research. Haven't read into that myself, though.
The only effort I am aware of is a CUDA library called CUMP. However, the library only provides support for addition, subtraction and multiplication. Still, you can build division out of multiplications on the GPU, for example by refining a reciprocal with Newton-Raphson iterations, and check whether the quality of the result is good enough for your particular problem.
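As an illustration of that idea (shown here in plain double precision rather than with CUMP), the Newton-Raphson recurrence x <- x * (2 - d * x) converges to 1/d using only multiplications and subtractions, roughly doubling the number of correct digits per step; with a multi-precision multiply the same recurrence gives n/d as n * (1/d):

```cpp
#include <cstdio>

int main() {
    double d = 7.0;
    double x = 0.1;                      // rough initial guess for 1/d
    for (int i = 0; i < 6; ++i) {
        x = x * (2.0 - d * x);           // only multiplications and a subtraction
        std::printf("iteration %d: 1/d ~ %.17g\n", i, x);
    }
    double n = 123456.0;
    std::printf("n/d via reciprocal: %.17g (direct: %.17g)\n", n * x, n / d);
}
```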
It's fairly well-known that ATLAS uses "blocking" or "tiling" versions of matrix computation algorithms, which substantially improve performance.
It also appears that ATLAS has some architectural defaults, which have been computed manually. And it's possible to do a search to determine other values for NB (a #define macro which I believe stands for number of blocks).
But how does it work? How are the values determined? Do the algorithms just run a bunch of times with different values, Monte Carlo style, until some kind of optimum is found?
Here's a hypothetical, too. Let's say you copied a blocked ATLAS algorithm into C++ templates and had a 128-bit rational type. Could you derive NB for the rational version of the algorithm in some way from an ATLAS-tuned NB value from the double version of the algorithm?
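For concreteness, here is a rough sketch of the kind of blocked multiply I mean (NB = 64 is just a placeholder, not a value ATLAS would actually pick for any given machine):

```cpp
#include <vector>
#include <algorithm>
#include <cstddef>
#include <cstdio>

constexpr std::size_t NB = 64;   // tile edge; the parameter ATLAS tunes

// C += A * B for row-major N x N matrices, processed NB x NB tile by tile so
// that each tile of A, B and C stays resident in cache and is reused.
void blocked_gemm(const double* A, const double* B, double* C, std::size_t N) {
    for (std::size_t ii = 0; ii < N; ii += NB)
        for (std::size_t kk = 0; kk < N; kk += NB)
            for (std::size_t jj = 0; jj < N; jj += NB)
                for (std::size_t i = ii; i < std::min(ii + NB, N); ++i)
                    for (std::size_t k = kk; k < std::min(kk + NB, N); ++k) {
                        const double a = A[i * N + k];
                        for (std::size_t j = jj; j < std::min(jj + NB, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}

int main() {
    const std::size_t N = 512;
    std::vector<double> A(N * N, 1.0), B(N * N, 2.0), C(N * N, 0.0);
    blocked_gemm(A.data(), B.data(), C.data(), N);
    std::printf("C[0][0] = %g (expected %g)\n", C[0], 2.0 * N);
}
```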
I need to program something that calculates a number to arbitrary precision...
but I need it to output the digits that are already "certain" (i.e. below some error bound) to a file, so that there are digits to work with while the program keeps running.
Also, most libraries for arbitrary precision seem to require a fixed precision, but what if I wanted dynamic precision, i.e. one that just keeps going on and on...
Most algorithms that calculate a number to extended precision require that all intermediate calculations are done to a somewhat higher precision to guarantee accurate results. You normally specify your final desired precision and that's the result that you get. If you want to output the "known" accurate digits during the calculation, you'll generally need to implement the algorithm and track the accurate digits yourself.
Without knowing what number you want to calculate, I can't offer any better suggestions.
GMP/MPIR only support very basic floating point calculations. MPFR, which requires either GMP or MPIR, provides a much broader set of floating point operations.
My advice is to use MPIR. It's a fork of GMP but with (in my opinion) a more helpful and developer-friendly crew.
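For example, a minimal MPFR program (link with -lmpfr plus -lgmp or the MPIR equivalent) that computes sqrt(2) to 200 bits and prints 50 decimal digits looks like this:

```cpp
#include <mpfr.h>

int main() {
    mpfr_t x;
    mpfr_init2(x, 200);                    // 200 bits of precision for this variable
    mpfr_set_ui(x, 2, MPFR_RNDN);          // x = 2, rounded to nearest
    mpfr_sqrt(x, x, MPFR_RNDN);            // x = sqrt(2)
    mpfr_printf("sqrt(2) = %.50Rf\n", x);  // 50 decimal digits
    mpfr_clear(x);
    mpfr_free_cache();
}
```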
Many numerical algorithms tend to run on 32/64-bit floating point.
However, what if you had access to lower-precision (and less power-hungry) co-processors? How can they then be utilized in numerical algorithms?
Does anyone know of good books/articles that address these issues?
Thanks!
Numerical analysis theory uses methods to predict the precision error of operations independently of the machine they run on. There are always cases where operations may lose accuracy, even on the most advanced processors.
Some books to read about it:
Accuracy and Stability of Numerical Algorithms by N.J. Higham
An Introduction to Numerical Analysis by E. Süli and D. Mayers
If you can't find them, or are too lazy to read them, tell me and I will try to explain some things to you. (Well, I'm no expert in this because I'm a computer scientist, but I think I can explain the basics to you.)
I hope you understand what I wrote (my English is not the best).
Most of what you are likely to find will be about doing floating-point arithmetic on computers, irrespective of the size of the representation of the numbers themselves. The basic issues surrounding f-p arithmetic apply whatever the number of bits. Off the top of my head, these basic issues will be:
range and accuracy of numbers that are represented;
careful selection of algorithms which are robust and reliable on f-p numbers rather than on real numbers;
the perils and pitfalls of iterative and lengthy calculations in which you run the risk of losing precision and accuracy.
In general, the fewer bits you have the sooner you run into problems, but just as there are algorithms which are useful in 32 bits, there are algorithms which are useful in 8 bits. Sometimes the same algorithm is useful however many bits you use.
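As a toy illustration of that last point (my own example, nothing from a textbook): summing the same series with a float accumulator, a double accumulator, and Kahan-compensated float shows how quickly the narrower format drifts. Compile it without -ffast-math, or the compensation gets optimized away.

```cpp
#include <cstdio>

int main() {
    const int N = 10000000;                 // ten million terms of 0.1
    float  naive_f = 0.0f, kahan_f = 0.0f, comp = 0.0f;
    double naive_d = 0.0;

    for (int i = 0; i < N; ++i) {
        const float term = 0.1f;
        naive_f += term;                    // plain float accumulation drifts badly
        naive_d += double(term);            // same terms, double accumulator

        float y = term - comp;              // Kahan compensated summation in float
        float t = kahan_f + y;
        comp = (t - kahan_f) - y;
        kahan_f = t;
    }

    std::printf("float, naive : %.7f\n", double(naive_f));
    std::printf("float, Kahan : %.7f\n", double(kahan_f));
    std::printf("double, naive: %.7f\n", naive_d);
}
```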
As @George suggested, you should probably start with a basic text on numerical analysis, though I think the Higham book is not a basic text.
Regards
Mark