I'm trying to assess the expected performance of calculating trigonometry functions as a function of the required precision. Obviously the wall clock time depends on the speed of the underlying arithmetic, so let's factor that out by just counting the number of operations:
Using state-of-the-art algorithms, how many arithmetic operations (add, subtract, multiply, divide) should it take to calculate sin(x), as a function of the number of bits (or decimal digits) of precision required in the output?
"... to assess the expected performance of calculating trigonometry functions as a function of the required precision."
Look at the first omitted term of the Taylor series for sine at x = π/4 as the order of the error.
Details: sin(x) usually has these phases:
Handling special cases: NaN, infinities.
Argument reduction to the primary range, say [-π/4 ... +π/4]. Really good reduction is hard because π is irrational, and the reduction code can account for about 50% of sin()'s time, much of it spent emulating the needed extended precision. (Research K.C. Ng's "ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit".)
Low-quality reduction costs much less: /, truncate, -, *.
Calculation over a limited range. This is the only part many people consider. If done with a Taylor series and 53 bits are needed, then about 10-11 terms of the Taylor series for sine are required. Yet quality code often uses a pair of crafted polynomials, each of about 4-5 terms, to form the quotient p(x)/q(x).
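As a minimal sketch of this last phase only: a Horner evaluation of an odd polynomial on the reduced range. The coefficients below are plain Taylor coefficients, used purely for illustration; production code such as fdlibm's __kernel_sin fits minimax coefficients instead to squeeze out the last bits.

/* Sketch only: sin(x) for x already reduced to [-pi/4, +pi/4], evaluated as
   x * P(x^2) with Horner's rule.  Plain Taylor coefficients for illustration;
   a real library uses minimax-fitted coefficients.
   Cost: 9 multiplies and 7 adds. */
static double sin_reduced(double x)
{
    const double c1 = -1.0 / 6.0;             /* -1/3!  */
    const double c2 =  1.0 / 120.0;           /*  1/5!  */
    const double c3 = -1.0 / 5040.0;          /* -1/7!  */
    const double c4 =  1.0 / 362880.0;        /*  1/9!  */
    const double c5 = -1.0 / 39916800.0;      /* -1/11! */
    const double c6 =  1.0 / 6227020800.0;    /*  1/13! */
    const double c7 = -1.0 / 1307674368000.0; /* -1/15! */
    double x2 = x * x;
    double p  = ((((((c7 * x2 + c6) * x2 + c5) * x2 + c4) * x2 + c3) * x2 + c2) * x2 + c1) * x2 + 1.0;
    return x * p;   /* truncation error at pi/4 is about x^17/17!, roughly 5e-17 */
}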
Of course dedicated hardware support in any of these steps greatly increases performance.
Note: code for sin() is often paired with cos() code, as extensive use of trig identities simplifies the calculation.
I'd expect a software implementation of sin() to cost on the order of 25x a plain *. This is a rough estimate.
To achieve a very low error rate in ULPs, code typically uses a tad more. A quick-and-dirty sine_crap() could get by with only a few terms. So when assessing time performance, there is a trade-off with correctness: how good a sin() do you want?
"... assess the expected performance of calculating trigonometry functions as a function of the required precision"
Using the Taylor series as a predictor of the number of ops, with worst case x = π/4 (45°) and the error in the calculation on the order of the first omitted term of the series:
For 32-bit float, on the order of 6 floating-point ops are needed.
For 64-bit double, on the order of 9 floating-point ops are needed.
So if time scales with the square of the FP width, double is predicted to take 9/6 * 2 * 2, or about 6 times as long.
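A quick way to reproduce that kind of estimate is to count Taylor terms until the first omitted term drops below the target precision. The small sketch below does exactly that; the 2^-24 and 2^-53 thresholds stand in for float and double precision, and the printed counts are ballpark figures only, not a statement about any particular library.

#include <math.h>
#include <stdio.h>

/* Count how many Taylor terms of sin(x) are needed at x = pi/4 before the next
   (omitted) term falls below a target precision.  Purely an estimate. */
int main(void)
{
    const double x = 3.14159265358979323846 / 4.0;
    const double targets[] = { 0x1p-24, 0x1p-53 };   /* ~float and ~double precision */
    for (int t = 0; t < 2; t++) {
        double term = x;   /* first term of the series: x^1 / 1! */
        int kept = 0;
        while (fabs(term) > targets[t]) {
            kept++;
            /* next odd-power term: multiply by x^2 / ((2n)(2n+1)) */
            term *= x * x / ((2.0 * kept) * (2.0 * kept + 1.0));
        }
        printf("target %g: %d terms\n", targets[t], kept);
    }
    return 0;
}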
We can calculate any trigonometric function using a simple right-angled triangle or using the Maclaurin/Taylor series, so it really depends on which one you choose to implement. If you only pass an angle as an argument and wish to calculate the sine of that particular angle, it would take about 4 to 6 steps using a unit circle.
I would like to have a function f(x) that gives good pseudo-random numbers in uniform distribution according to the value x. I am aware of linear congruential generators, however these work in iterations, i.e. I provide the initial seed and then I get a sequence of random values one by one. This is not what I want, because if I want to get, say, the 200000th number in the sequence, I have to compute numbers 1 ... 199999. I need a function that is given by one simple formula that uses basic operations such as +, *, mod, etc. I am also aware of hash functions but I didn't find any that suits these needs. I might come up with some function myself, but I'd like to use something that's been tested to give decent pseudo-random values. Is there anything like that being used?
You might consider multiplicative congruential generators. These are linear congruentials without the additive constant: X_{i+1} = (a * X_i) % c for suitable constants a and c. Expanding this out for a few iterations will convince you that X_k = (a^k * X_0) % c, where X_0 is your seed value. This can be calculated in O(log(k)) time using fast modular exponentiation. No need to calculate the first 199,999 values to get the 200,000th: you can find it in about 18 steps (log2(200,000) ≈ 17.6).
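A minimal C sketch of that idea, using the MINSTD constants a = 16807, c = 2^31 - 1 purely as an example; the unsigned __int128 used for overflow-free modular multiplication is a GCC/Clang extension.

#include <stdint.h>

/* (a * b) % m without overflow, via a 128-bit intermediate (GCC/Clang extension) */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)((unsigned __int128)a * b % m);
}

/* fast modular exponentiation: a^k % m in O(log k) multiplications */
static uint64_t powmod(uint64_t a, uint64_t k, uint64_t m)
{
    uint64_t r = 1;
    while (k > 0) {
        if (k & 1)
            r = mulmod(r, a, m);
        a = mulmod(a, a, m);
        k >>= 1;
    }
    return r;
}

/* k-th value of the MCG X_{i+1} = (a * X_i) % c, without generating X_1 .. X_{k-1} */
static uint64_t mcg_at(uint64_t seed, uint64_t k)
{
    const uint64_t a = 16807, c = 2147483647;   /* MINSTD parameters, as an example */
    return mulmod(powmod(a, k, c), seed, c);
}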
Actually, for an LCG with an additive constant it works as well. There is a paper by F. Brown, "Random Number Generation with Arbitrary Stride", Trans. Am. Nucl. Soc. (Nov. 1994). Based on this paper there is a reasonable LCG with decent quality and a log2(N) skip-ahead feature, used by the well-known Monte Carlo package MCNP5. C++ code is here: https://github.com/Iwan-Zotow/LCG-PLE63/. A further development of this idea (RNGs with logarithmic skip-ahead) is the pretty decent family of generators at http://www.pcg-random.org/
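The skip-ahead works because k steps of x -> (a*x + c) mod m is itself an affine map that can be built by repeated squaring, along the lines of Brown's paper. A sketch under the assumption of a power-of-two modulus m = 2^64, so the reduction comes for free from unsigned overflow; a and c are whatever constants your LCG uses.

#include <stdint.h>

/* Advance an LCG state x by k steps of x -> a*x + c (mod 2^64) in O(log k) time. */
static uint64_t lcg_skip(uint64_t x, uint64_t a, uint64_t c, uint64_t k)
{
    uint64_t acc_mul = 1, acc_add = 0;   /* identity map              */
    uint64_t cur_mul = a, cur_add = c;   /* map for one LCG step      */
    while (k > 0) {
        if (k & 1) {                         /* compose cur into acc      */
            acc_add = acc_add * cur_mul + cur_add;
            acc_mul = acc_mul * cur_mul;
        }
        cur_add = cur_add * (cur_mul + 1);   /* square cur: two steps at once */
        cur_mul = cur_mul * cur_mul;
        k >>= 1;
    }
    return acc_mul * x + acc_add;        /* x advanced by k steps     */
}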
You could use a simple encryption algorithm that can encrypt the numbers 1, 2, 3, ... Since encryption is reversible, each input number will have a unique output. The 200000th number in your sequence is then encrypt(key, 200000). Use DES for 64-bit numbers, AES for 128-bit numbers, and you can roll your own simple Feistel cipher for 32-bit or 16-bit numbers.
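For the 32-bit case, here is a toy sketch of the "encrypt the index" idea. The round function and keys are arbitrary illustrations, not a vetted cipher, but any keyed Feistel network is a bijection on 32-bit values, so distinct indices always give distinct outputs.

#include <stdint.h>

/* Toy 4-round Feistel permutation of a 32-bit value, split into 16-bit halves. */
static uint32_t feistel32(uint32_t x, const uint16_t key[4])
{
    uint16_t l = (uint16_t)(x >> 16);
    uint16_t r = (uint16_t)x;
    for (int i = 0; i < 4; i++) {
        /* toy round function: any keyed mixing of the right half will do */
        uint16_t f = (uint16_t)(((uint32_t)(r ^ key[i]) * 0x9E37u) >> 3);
        uint16_t tmp = r;
        r = l ^ f;
        l = tmp;
    }
    return ((uint32_t)l << 16) | r;
}

/* The 200000th number of the sequence is then simply feistel32(200000u, key). */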
I used AMD's two-stage reduction example to compute the sum of all numbers from 0 to 65536 in single-precision floating point. Unfortunately, the result is not correct. However, when I modify my code so that I compute the sum of 65536 smaller numbers (for example 1), the result is correct.
I couldn't find any error in the code. Is it possible that I am getting wrong results because of the float type? If this is the case, what is the best approach to solve the issue?
This is a "side effect" of summing floating point numbers using finite precision CPU's or GPU's. The accuracy depends the algorithm and the order the values are summed. The theory and practice behind is explained in Nicholas J, Higham's paper
The Accuracy of Floating Point Summation
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=7AECC0D6458288CD6E4488AD63A33D5D?doi=10.1.1.43.3535&rep=rep1&type=pdf
The fix is to use a smarter algorithm like the Kahan Summation Algorithm
https://en.wikipedia.org/wiki/Kahan_summation_algorithm
And the Higham paper has some alternatives too.
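A minimal sketch of Kahan (compensated) summation in single precision; note that aggressive floating-point reassociation (e.g. compiling with -ffast-math) can optimize the compensation away.

#include <stddef.h>

/* Compensated summation: comp recovers the low-order bits that a plain
   "sum += x[i]" drops once the running sum grows large. */
static float kahan_sum(const float *x, size_t n)
{
    float sum = 0.0f, comp = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float y = x[i] - comp;   /* corrected next element                     */
        float t = sum + y;       /* big + small: low-order bits of y are lost  */
        comp = (t - sum) - y;    /* ... and recovered here                     */
        sum = t;
    }
    return sum;
}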
This problem also illustrates the nature of benchmarking: the first rule of benchmarking is to get the right answer, using realistic data!
There is probably no error in the coding of your kernel or host application. The issue is with the single-precision floating point.
The correct sum is 65537 * 32768 = 2147516416, and it takes 32 bits to represent it in binary (10000000000000001000000000000000). 32-bit floats can only hold integers accurately up to 2^24.
"Any integer with absolute value less than [2^24] can be exactly represented in the single precision format"
"Floating Point" article, wikipedia
This is why you are getting the correct sum when it is less than or equal to 2^24. If you are doing a complete sum using single-precision, you will eventually lose accuracy no matter which device you are executing the kernel on. There are a few things you can do to get the correct answer:
use double instead of float if your platform supports it
use int or unsigned int
sum a smaller set of numbers, e.g. 0+1+2+...+4095+4096 = 2^23 + 2^11
Read more about single precision here.
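A quick illustration of the point (the exact float result will depend on the summation order, but it will be off once the running sum passes 2^24):

#include <stdio.h>

/* Sum 0..65536 with three accumulator types.  int and double give the exact
   2147516416; the float accumulator drifts once the sum exceeds 2^24. */
int main(void)
{
    float fsum = 0.0f;
    double dsum = 0.0;
    unsigned int isum = 0;
    for (unsigned int i = 0; i <= 65536u; i++) {
        fsum += (float)i;
        dsum += (double)i;
        isum += i;
    }
    printf("float: %.1f  double: %.1f  int: %u\n", fsum, dsum, isum);
    return 0;
}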
How do hardware implementations of floating-point square root work? Which algorithm would they use, and can anyone provide links to Verilog/VHDL implementations?
AFAIK, it is either a digit-recurrence algorithm (low resource usage) or Newton's iteration on the reciprocal square root (which needs other operators: an adder, a multiplier, or an FMA).
Concerning Newton's iteration, the choice of the initial approximation is not obvious. See Kornerup and Muller's article Choosing starting values for certain Newton–Raphson iterations.
You get the best bang for the money by implementing an approximation for 1 / sqrt (x) in hardware, giving maybe ten or twelve bits of precision, like Intel processors do. Then you use good old Newton iteration to improve that approximation using add/subtract/multiply only, and multiply the last approximation by x.
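A software sketch of that scheme. The bit-trick initial guess below stands in for the small hardware lookup table or estimate instruction and is only good to a few bits; each Newton step roughly doubles the number of correct bits, and the result ends up close to, but not correctly rounded, single precision.

#include <stdint.h>
#include <string.h>

/* crude initial approximation of 1/sqrt(x) via exponent/mantissa manipulation */
static float rsqrt_estimate(float x)
{
    uint32_t i;
    memcpy(&i, &x, sizeof i);      /* reinterpret the float's bits   */
    i = 0x5f3759df - (i >> 1);     /* well-known magic-constant hack */
    memcpy(&x, &i, sizeof x);
    return x;
}

static float sqrt_newton(float x)
{
    float y = rsqrt_estimate(x);
    y = y * (1.5f - 0.5f * x * y * y);   /* Newton step: y <- y*(3 - x*y*y)/2 */
    y = y * (1.5f - 0.5f * x * y * y);   /* second step                       */
    y = y * (1.5f - 0.5f * x * y * y);   /* third step for good measure       */
    return x * y;                        /* sqrt(x) = x * (1/sqrt(x))         */
}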
Alternatively, consider that calculating the square root of x is the same as dividing x by the square root of x. You can implement something very similar to a division, giving one bit of precision each time, except that the number you are dividing by changes in every iteration.
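A sketch of that digit-recurrence idea on a plain 32-bit integer; a hardware unit would run the same shift-and-subtract recurrence on the mantissa, usually keeping the partial remainder in redundant/carry-save form.

#include <stdint.h>

/* Restoring, one-result-bit-per-iteration square root: returns floor(sqrt(x)). */
static uint32_t isqrt32(uint32_t x)
{
    uint32_t root = 0, rem = 0;
    for (int i = 15; i >= 0; i--) {
        rem = (rem << 2) | ((x >> (2 * i)) & 3u);  /* bring down the next bit pair           */
        uint32_t trial = (root << 2) | 1u;         /* "divisor" 4*root + 1 changes each step */
        root <<= 1;
        if (rem >= trial) {                        /* can we subtract? then this bit is 1    */
            rem -= trial;
            root |= 1u;
        }
    }
    return root;
}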
(I'm not sure whether I should post this problem on this site or on the math site. Please feel free to migrate this post if necessary.)
My problem at hand is that, given a value of k, I'd like to numerically compute a rational function of nonlinear polynomials in k which looks like the following:
f(k) = (a_0 + a_1*e^(i*u_1*k)*k + a_2*e^(i*u_2*k)*k^2 + ... + a_N*e^(i*u_N*k)*k^N) / (b_0 + b_1*e^(i*v_1*k)*k + b_2*e^(i*v_2*k)*k^2 + ... + b_N*e^(i*v_N*k)*k^N)
where {a_0, ..., a_N; b_0, ..., b_N} are complex constants, {u_0, ..., u_N, v_0, ..., v_N} are real constants and i is the imaginary unit. I learned from Numerical Recipes that there are a whole bunch of ways to compute polynomials quickly while keeping the rounding error small enough, if all the coefficients were constant. But I do not think those ideas are useful in my case, since the exponential prefactors also depend on k.
Currently I calculate it in a brute-force way in C with complex.h (this is just pseudocode):
double complex function(double k)
{
    return (a_0 + a_1*cexp(I*u_1*k)*k + a_2*cexp(I*u_2*k)*k*k + ...)
         / (b_0 + b_1*cexp(I*v_1*k)*k + b_2*cexp(I*v_2*k)*k*k + ...);
}
However, as the number of calls to function increases (because this is just a part of my real calculation), it becomes very slow and inaccurate (only 6 valid digits). I appreciate any comments and/or suggestions.
I trust that this isn't a homework assignment!
Normally the trick is to use a loop: add the next coefficient to the running sum, then multiply by k (Horner's rule). However, in your case, I think the "e" term in the coefficient is going to overwhelm any savings from factoring out k. You can still do it, but the savings will probably be small.
Is u_i a constant? Depending on how many times you need to run this formula, maybe you could premultiply u_i * k (unless k changes each run). It's been so many decades since I took a Numerical Analysis course that I have only vague recollections of the tricks of the trade. Let's see... is e^(i*u_i*k) the same as (e^(i*u_i))^k? I don't remember the rules on imaginary numbers, or whether you'll save anything since you've got a real^real (assuming k is real) anyway (internally done using e^power).
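For what it's worth, here is a sketch of the loop idea with a running power of k; plain Horner does not apply directly because each coefficient carries its own cexp factor. N and the coefficient/frequency arrays are assumed to be supplied by the caller, and setting u[0] = v[0] = 0 reproduces the constant a_0 and b_0 terms of the pseudocode above.

#include <complex.h>

/* One pass over the terms, keeping a running power of k so no pow() calls are needed. */
static double complex eval_ratio(double k, int N,
                                 const double complex a[], const double u[],
                                 const double complex b[], const double v[])
{
    double complex num = 0.0, den = 0.0;
    double kp = 1.0;                          /* k^j, updated with one multiply per term */
    for (int j = 0; j <= N; j++) {
        num += a[j] * cexp(I * u[j] * k) * kp;
        den += b[j] * cexp(I * v[j] * k) * kp;
        kp *= k;
    }
    return num / den;
}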
If you're getting only 6 digits, that suggests your math, and maybe your library, is working in single-precision (32-bit) reals. Check your library and your declarations to make sure you are using at least double-precision (64-bit) reals everywhere.
It's clear that one shouldn't use floating point when working with, say, monetary amounts, since the limited binary precision leads to inaccuracies when doing calculations with those amounts.
That said, what are use cases when that is acceptable? And, what are the general principles one should have in mind when deciding?
Floating point numbers should be used for what they were designed for: computations where what you want is a fixed precision, and you only care that your answer is accurate to within a certain tolerance. If you need an exact answer in all cases, you're best using something else.
Here are three domains where you might use floating point:
Scientific Simulations
Science apps require a lot of number crunching, and often use sophisticated numerical methods to solve systems of differential equations. You're typically talking double-precision floating point here.
Games
Think of games as a simulation where it's ok to cheat. If the physics is "good enough" to seem real then it's ok for games, and you can make up in user experience what you're missing in terms of accuracy. Games usually use single-precision floating point.
Stats
Like science apps, statistical methods need a lot of floating point. A lot of the numerical methods are the same; the application domain is just different. You find a lot of statistics and Monte Carlo simulations in financial applications and in any field where you're analyzing a lot of survey data.
Floating point isn't trivial, and for most business applications you really don't need to know all these subtleties. You're fine just knowing that you can't represent some decimal numbers exactly in floating point, and that you should be sure to use some decimal type for prices and things like that.
If you really want to get into the details and understand all the tradeoffs and pitfalls, check out the classic What Every Computer Scientist Should Know About Floating-Point Arithmetic, or pick up a book on Numerical Analysis or Applied Numerical Linear Algebra if you're really adventurous.
I'm guessing you mean "floating point" here. The answer is, basically, any time the quantities involved are approximate, measured, rather than precise; any time the quantities involved are larger than can be conveniently represented precisely on the underlying machine; any time the need for computational speed overwhelms exact precision; and any time the appropriate precision can be maintained without other complexities.
For more details of this, you really need to read a numerical analysis book.
Short story is that if you need exact calculations, DO NOT USE floating point.
Don't use floating point numbers as loop indices. Don't get caught doing:
for ( d = 0.1; d < 1.0; d+=0.1)
{ /* Some Code... */ }
You will be surprised.
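Here is what the surprise looks like, as a small standalone demo: 0.1 has no exact binary representation, so the accumulated value drifts.

#include <stdio.h>

/* The loop runs 10 times, not the 9 a decimal-minded reader expects, because
   ten accumulated 0.1s come out just below 1.0 in binary floating point. */
int main(void)
{
    int iterations = 0;
    for (double d = 0.1; d < 1.0; d += 0.1) {
        printf("%.17f\n", d);
        iterations++;
    }
    printf("iterations: %d\n", iterations);
    return 0;
}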
Don't use floating point numbers as keys to any sort of map because you can never count on equality behaving like you may expect.
Most real-world quantities are inexact, and typically we know their numeric properties with a lot less precision than a typical floating-point value. In almost all cases, the C types float and double are good enough.
It is necessary to know some of the pitfalls. For example, testing two floating-point numbers for equality is usually not what you want, since all it takes is a single bit of inaccuracy to make the comparison non-equal. tgamblin has provided some good references.
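The classic example of that pitfall, with a tolerance-based comparison as the usual workaround (the 1e-9 tolerance is an arbitrary choice for illustration; pick one that matches your data):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double a = 0.1 + 0.2;
    printf("%d\n", a == 0.3);               /* prints 0: the two differ in the last bit */
    printf("%d\n", fabs(a - 0.3) < 1e-9);   /* prints 1: compare within a tolerance     */
    return 0;
}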
The usual exception is money, which is calculated exactly according to certain conventions that don't translate well to binary representations. Part of this is the constants used: you'll never see a pi% interest rate, or a 22/7% interest rate, but you might well see a 3.14% interest rate. In other words, the numbers used are typically expressed in exact decimal fractions, not all of which are exact binary fractions. Further, the rounding in calculations is governed by conventions that also don't translate well into binary. This makes it extremely difficult to precisely duplicate financial calculations with standard floating point, and therefore people use other methods for them.
It's appropriate to use floating point types when dealing with scientific or statistical calculations. These will invariably only have, say, 3-8 significant digits of accuracy.
As to whether to use single or double precision floating point types, this depends on your need for accuracy and how many significant digits you need. Typically though people just end up using doubles unless they have a good reason not to.
For example if you measure distance or weight or any physical quantity like that the number you come up with isn't exact: it has a certain number of significant digits based on the accuracy of your instruments and your measurements.
For calculations involving anything like this, floating point numbers are appropriate.
Also, if you're dealing with irrational numbers, floating point types are appropriate (and really your only choice), e.g. in linear algebra, where you deal with square roots a lot.
Money is different because you typically need to be exact and every digit is significant.
I think you should ask the other way around: when should you not use floating point? For most numerical tasks, floating point is the preferred data type, as you can (almost) forget about overflow and other kinds of problems typically encountered with integer types.
One way to look at the floating point data type is that the precision is independent of the magnitude: whether the number is very small or very big (within an acceptable range, of course), the number of meaningful digits is approximately the same.
One drawback is that floating point numbers have some surprising properties: x == x can be false (if x is NaN), and they do not follow the usual mathematical rules exactly (e.g. distributivity, x*(y + z) != x*y + x*z). Depending on the values of x, y, and z, this can matter.
From Wikipedia:
"Floating-point arithmetic is at its best when it is simply being used to measure real-world quantities over a wide range of scales (such as the orbital period of Io or the mass of the proton), and at its worst when it is expected to model the interactions of quantities expressed as decimal strings that are expected to be exact."
Floating point is fast but inexact. If that is an acceptable trade off, use floating point.