What is the difference between 'precision' and 'accuracy'?

What is the difference between 'accurate' and 'precise'?
If there is a difference, can you give an example of:
a number that is accurate but not precise
a number that is precise but not accurate
a number that is both accurate and precise
Thanks!

Precision refers to how much information is conveyed by a number (in terms of its number of digits), whereas accuracy is a measure of "correctness", i.e. how close the number is to the true value.
Let's take the π approximation 22/7, which for our purposes is 3.142857143.
For your specific questions:
a number that is accurate but not precise: 3.14. That's certainly accurate in terms of closeness, given the precision available. There is no other number with three significant digits that is closer to the target (both 3.13 and 3.15 are further away from the real value).
a number that is precise but not accurate: 99999.12345678901234567890. That's much more precise since it conveys more information. Unfortunately its accuracy is way off since it's nowhere near the target value.
a number that is both accurate and precise: 3.142857143. You can get more precise (by tacking zeros on the end) but no more accurate.
Of course, that's if the target number is actually 3.142857143. If it's 22/7, then you can get more accurate and precise, since 3.142857143 * 7 = 22.000000001. The actual decimal number for that fraction is an infinitely repeating one (in base 10):
3 . 142857 142857 142857 142857 142857 ...
and so on, so you can keep adding precision and accuracy in that representation by continuing to repeat that group of six digits. Or, you can maximise both by just using 22/7.
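To make the distinction concrete, here is a minimal C sketch (my illustration, using π itself as the reference value rather than 22/7): it prints the three kinds of approximation above together with their absolute errors, so the digit count stands in for precision and the error for accuracy.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        const double pi = 3.14159265358979323846;   /* reference value */

        /* Accurate but not precise: few digits, small error. */
        printf("3.14                  error = %.12f\n", fabs(3.14 - pi));

        /* Precise but not accurate: many digits, huge error. */
        printf("99999.12345678901235  error = %.12f\n", fabs(99999.12345678901235 - pi));

        /* Accurate and precise: 22/7 carried to many digits. */
        printf("22/7 = %.9f  error = %.12f\n", 22.0 / 7.0, fabs(22.0 / 7.0 - pi));
        return 0;
    }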

One way to think of it is this:
A number that is "precise" has a lot of digits. But it might not be very correct.
A number that is "accurate" is correct, but may not have a lot of digits.
Examples:
3.14 is an "accurate" approximation to Pi. But it is not very precise.
3.13198408654198 is a very "precise" approximation to Pi, but it is not accurate.
3.14159265358979 is both accurate and precise.
So precision gives a lot of information, but says nothing about how correct that information is.
Accuracy says how correct the information is, but says nothing about how much information there is.

Assume the exact time right now is 13:01:03.1234
Accurate but not precise - it's 13:00 +/- 0:05
Precise but not accurate - it's 13:15:01.1425
Accurate and precise - it's 13:01:03.1234

The standard example I always heard involved a dart board:
accurate but not precise: lots of darts scattered evenly all over the dart board
precise but not accurate: lots of darts concentrated in one spot of the dart board, that is not the bull's eye
both: lots of darts concentrated in the bull's eye
Accuracy is about getting the right answer. Precision is about repeatedly getting the same answer.

Precision and accuracy can also be framed in terms of significant digits: accuracy is indicated by the number of significant digits, while precision is indicated by the position of the last significant digit. For instance, the number 1234 is more accurate than 0.123 because 1234 has more significant digits, while 0.123 is more precise because its last significant figure (the 3) is in the thousandths place. Both notions typically only matter when the numbers are the results of a measurement. You can have an exact decimal number such as 0.123 defined as 123/1000, in which case the discussion of precision has no real meaning because 0.123 was simply given or defined; however, if you were to measure something and come up with that value, then 0.123 indicates the precision of the tool used to measure it.
The real confusion occurs when combining these numbers by adding, subtracting, multiplying and dividing. For example, when adding two numbers that are the results of measurements, the answer can only be as precise as the least precise number: 12.1 + 0.543 should be reported as 12.6, because 12.1 is only known to the tenths place. Think of it as a chain that is only as strong as its weakest link.

Accuracy is very often confused with precision, but they are quite different.
Accuracy is the degree to which the measured value agrees with the true value.
Example: our objective is to make a rod of 25 mm, and we are able to make it 25 mm, so it is accurate.
Precision is the repeatability of the measuring process.
Example: our objective is to make 10 rods of 25 mm and we make all of them 24 mm. We are precise, since all the rods come out the same size, but not accurate, since the true value is 25 mm.

Related

Optimize dataset for floating point add/sub/mul/div

Suppose we have a data set of numbers, with which we want to do some calculations using addition/subtraction/multiplication/division using a computer.
The coverage of the real numbers by the floating point representation varies a lot, depending on the number being represented:
In terms of absolute precision in the real->FP mapping, the "holes" grow towards the bigger numbers, with a weird hole around 0, depending on the architecture. Because of this, the absolute precision of add/sub drops towards the bigger numbers.
If we divide two consecutive numbers that are representable in our floating point format, the result of the division gets bigger both as we go towards bigger numbers and as we go towards smaller and smaller fractions.
So, my question is:
Is there a "sweet interval" for floats on an ordinary PC today, where the results for the arithmetics with the said operators (add/sub/mul/div) are just more precise?
If I have a data set of many-significant-digit numbers like "123123123123123", "134534513412351151", etc., with which I want to do some arithmetics, which floating point interval should it be converted to, to have the best precision for the result?
Since floating points are something like 1.xxx*10^yyy, 2.xxx*10^yyy, ..., 9.xxx*10^yyy, I would assume, converting my numbers into the [1, 9] interval would give the best results for the memory consumed, but I may be terribly wrong...
Suppose I use C, can such conversion even be made? Is there a best-practice to do that? Before an operation, C will convert the operands to the same format, so I guess I would have to use a string representation, inject a "." somewhere and parse that as float.
Please note:
This is a theoretical question; I don't have an actual data set at hand that would decide what is best. On the same note, the mention of C was random: I am also interested in responses like "forget C, I would use this and this, BECAUSE it supports this and this".
Please spare me from answers like "this cannot be answered, because it depends on the actual operations, since the results may be in another magnitude range than the original data, etc.". Let's suppose that the results of the calculation are more or less in the same interval as the operands. Sure, when dividing "more-or-less the same magnitude" operands, the result will be somewhere between 1-10, maybe 0.1-100, ..., but that is probably exactly the best interval they can be in.
Of course, if the answer includes some explanation, other than a brush-off, I will be happy to read it!
The absolute precision of floating-point numbers changes with the magnitude of the numbers because the exponent changes. The relative precision does not change, except for numbers near the bottom of the exponent range, where underflow occurs. If you multiply binary floating-point numbers by a power of two, perform arithmetic (suitably adjusted for the scaling), and reverse the scaling, the results will be identical to doing the arithmetic without scaling, barring effects from overflow and underflow. If your arithmetic does involve underflow or overflow, then scaling could help avoid that. For example, if your precision is suffering because your numbers are so small that some intermediate results are below the normal range of the floating-point format, then scaling by a power of two can avoid the loss of precision from underflow.
If you scale by something other than a power of two, the results can be different, due to changes in the significands. The effects will generally be tiny, and whether the results are better or worse will effectively be random chance, except in carefully engineered special situations.
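As a quick illustration of that point, here is a hedged C sketch (the particular values and the 2^-10 scale factor are just picked for the demo): scaling the operands by a power of two, adding, and unscaling reproduces the unscaled sum exactly, as long as nothing overflows or underflows, whereas a non-power-of-two scale factor may change the result.

    #include <stdio.h>

    int main(void) {
        double a = 123123123123123.0, b = 134534513412351151.0;

        double plain = a + b;

        /* Power-of-two scale: only the exponents shift, significands are untouched. */
        double s2 = 1.0 / 1024.0;                     /* 2^-10 */
        double scaled2 = ((a * s2) + (b * s2)) / s2;

        /* Non-power-of-two scale: the significands get rounded, so the result may differ. */
        double s10 = 1.0 / 1000.0;
        double scaled10 = ((a * s10) + (b * s10)) / s10;

        printf("plain       = %.17g\n", plain);
        printf("scale 2^-10 = %.17g  (identical: %d)\n", scaled2, scaled2 == plain);
        printf("scale 10^-3 = %.17g  (identical: %d)\n", scaled10, scaled10 == plain);
        return 0;
    }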

Largest number of bits in error that is guaranteed to be detected in CRC

I have a few questions about CRC:
How can I tell, for a given CRC polynomial and n-bit data, what is the largest number of bits in error that is guaranteed to be detected?
Is it ALWAYS true that the bigger the polynomial degree, the more errors can be detected by that CRC?
Thanks!
I will assume that "How many errors can it detect" is the largest number of bits in error that is always guaranteed to be detected.
You would need to search for the minimum weight codeword of length n for that polynomial, also referred to as its Hamming distance. The number of bit errors that that CRC is guaranteed to detect is one less than that minimum weight. There is no alternative to what essentially amounts to a brute-force search. See Table 1 in this paper by Koopman for some results. As an example, for messages 2974 or fewer bits in length, the standard CRC-32 has a Hamming distance of no less than 5, so any set of 4 bit errors would be detected.
Not necessarily. Polynomials can vary widely in quality. Given the performance of a polynomial on a particular message length, you can always find a longer polynomial that has worse performance. However, a well-chosen longer polynomial should in general perform better than a well-chosen shorter polynomial. Even then, for long message lengths you may find that both have a Hamming distance of two.
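To make the brute-force search above concrete, here is a small C sketch (mine, using a hypothetical CRC-8 with polynomial 0x07 and a 64-bit data length; a real evaluation like Koopman's also covers errors in the CRC field, higher weights and much longer messages): it enumerates every 1- and 2-bit error pattern in the data bits and counts how many would slip through with a zero remainder.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define DATA_BITS 64   /* hypothetical short data length for the demo */

    /* Bitwise CRC-8, polynomial 0x07 (x^8 + x^2 + x + 1), zero init and
       zero xor-out, run directly over an array of single bits. */
    static uint8_t crc8_bits(const uint8_t *bits, int nbits) {
        uint8_t crc = 0;
        for (int i = 0; i < nbits; i++) {
            uint8_t top = (uint8_t)((crc >> 7) ^ bits[i]);
            crc = (uint8_t)(crc << 1);
            if (top)
                crc ^= 0x07;
        }
        return crc;
    }

    int main(void) {
        uint8_t err[DATA_BITS];
        int undetected = 0;

        /* A CRC is linear over GF(2), so an error pattern in the data bits is
           undetected exactly when the CRC of the pattern alone is zero. */
        for (int i = 0; i < DATA_BITS; i++) {
            for (int j = i; j < DATA_BITS; j++) {   /* j == i is a 1-bit error */
                memset(err, 0, sizeof err);
                err[i] = 1;
                err[j] = 1;
                if (crc8_bits(err, DATA_BITS) == 0)
                    undetected++;
            }
        }
        printf("undetected 1- and 2-bit error patterns: %d\n", undetected);
        return 0;
    }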
It's not a question of 'how many'. It's a question of what proportion and what kind. 'How many' depends on those things and on the length of the input.
Yes.

OpenCL reduction result wrong with large floats

I used AMD's two-stage reduction example to compute the sum of all numbers from 0 to 65 536 using floating point precision. Unfortunately, the result is not correct. However, when I modify my code, so that I compute the sum of 65 536 smaller numbers (for example 1), the result is correct.
I couldn't find any error in the code. Is it possible that I am getting wrong results, because of the float type? If this is the case, what is the best approach to solve the issue?
This is a "side effect" of summing floating point numbers using finite precision CPU's or GPU's. The accuracy depends the algorithm and the order the values are summed. The theory and practice behind is explained in Nicholas J, Higham's paper
The Accuracy of Floating Point Summation
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=7AECC0D6458288CD6E4488AD63A33D5D?doi=10.1.1.43.3535&rep=rep1&type=pdf
The fix is to use a smarter algorithm like the Kahan Summation Algorithm
https://en.wikipedia.org/wiki/Kahan_summation_algorithm
And the Higham paper has some alternatives too.
This problem illustrates the nature of benchmarking, the first rule of the benchmark is to get the
right answer, using realistic data!
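For reference, here is a minimal sketch of Kahan summation as a plain C host-side loop (nothing OpenCL-specific, and the 0..65536 input just mirrors the question): the compensation variable carries the low-order bits that a naive float accumulation would throw away.

    #include <stdio.h>

    /* Kahan (compensated) summation: capture the rounding error of each
       addition in c and feed it back into the next step.
       Compile without -ffast-math, or the compensation may be optimized away. */
    static float kahan_sum(const float *x, int n) {
        float sum = 0.0f, c = 0.0f;
        for (int i = 0; i < n; i++) {
            float y = x[i] - c;       /* corrected next term */
            float t = sum + y;        /* low-order bits of y are lost here... */
            c = (t - sum) - y;        /* ...and recovered here */
            sum = t;
        }
        return sum;
    }

    int main(void) {
        enum { N = 65537 };           /* the values 0, 1, ..., 65536 */
        static float x[N];
        float naive = 0.0f;
        for (int i = 0; i < N; i++) {
            x[i] = (float)i;
            naive += x[i];
        }
        printf("naive: %.1f  kahan: %.1f  exact: %.1f\n",
               naive, kahan_sum(x, N), 65537.0f * 32768.0f);
        return 0;
    }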
There is probably no error in the coding of your kernel or host application. The issue is with the single-precision floating point.
The correct sum is 65537 * 32768 = 2147516416, and it takes 32 bits to represent it in binary (10000000000000001000000000000000). 32-bit floats can only hold integers accurately up to 2^24.
"Any integer with absolute value less than [2^24] can be exactly represented in the single precision format"
"Floating Point" article, wikipedia
This is why you are getting the correct sum when it is less than or equal to 2^24. If you are doing a complete sum using single-precision, you will eventually lose accuracy no matter which device you are executing the kernel on. There are a few things you can do to get the correct answer:
use double instead of float if your platform supports it
use int or unsigned int
sum a smaller set of numbers, e.g. 0+1+2+...+4095+4096 = (2^23 + 2^11)
Read more about single precision here.
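As a quick check of the suggestions above (assuming a 32-bit unsigned int), here is a small C sketch that runs the same 0..65536 sum with three different accumulator types:

    #include <stdio.h>

    int main(void) {
        float        f = 0.0f;
        double       d = 0.0;
        unsigned int u = 0;     /* assumed 32-bit; the sum 2147516416 fits */

        for (unsigned int i = 0; i <= 65536u; i++) {
            f += (float)i;      /* loses accuracy once the running sum passes 2^24 */
            d += (double)i;     /* 53-bit significand: exact for this range */
            u += i;             /* exact integer arithmetic */
        }
        printf("float:  %.1f\n", f);
        printf("double: %.1f\n", d);
        printf("uint:   %u\n", u);
        return 0;
    }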

Understanding Floating point precision analysis for Parallel Reduction

I am trying to analyze how (parallel) reduction can be used to add a large array of floating point numbers, and the precision loss involved in it. Reduction will definitely help in getting more precision compared to serial addition. I'll be really thankful if you can direct me to some detailed source or provide some insight for this analysis. Thanks.
Every primitive floating point operation will have a rounding error; if the result is x then the rounding error is <= c * abs(x) for some rather small constant c > 0.
If you add 1000 numbers, that takes 999 additions. Each addition has a result and a rounding error. The rounding error is small when the result is small. So you want to adjust the order of additions so that the average absolute value of the result is as small as possible. A binary tree is one method. Sorting the values, then adding the smallest two numbers and putting the result back into the sorted list is also quite reasonable. Both methods keep the average result small, and therefore keep the rounding error small.
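A minimal C sketch of the binary-tree idea (the input values here are arbitrary): pairwise summation splits the array in half recursively, so each value takes part in only about log2(n) additions and the partial results stay comparatively small.

    #include <stdio.h>

    /* Pairwise (tree) summation: the worst-case rounding error grows like
       O(log n) instead of O(n) for the naive left-to-right loop. */
    static float pairwise_sum(const float *x, int n) {
        if (n == 1) return x[0];
        if (n == 2) return x[0] + x[1];
        int half = n / 2;
        return pairwise_sum(x, half) + pairwise_sum(x + half, n - half);
    }

    int main(void) {
        enum { N = 1000 };
        static float x[N];
        float serial = 0.0f;
        double ref = 0.0;
        for (int i = 0; i < N; i++) {
            x[i] = 1.0f / (float)(i + 1);   /* terms of varying magnitude */
            serial += x[i];
            ref += (double)x[i];            /* higher-precision reference */
        }
        printf("serial:    %.7f\n", serial);
        printf("pairwise:  %.7f\n", pairwise_sum(x, N));
        printf("reference: %.7f\n", ref);
        return 0;
    }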

How do you correctly calculate a running average of large sets of numbers with 32-bit floating point?

I'm writing a path tracer and have to collect an average over a large number of samples per pixel. I get significant visual differences between a 1024-samples run and a 16384-samples run; the 16384-samples run is darker. I conjecture that this is because the 16384-samples image runs into floating-point precision errors. I average the color values by dividing each value by 16384, then adding them together.
Is there a way to average a large, difficult to compute set of numbers with a known magnitude, while minimizing rounding error? Preferably without requiring non-constant memory, and definitely without discarding any samples?
You probably want the Kahan summation algorithm. This is a simple and effective way of minimising cumulative rounding errors when summing a large number of points with finite precision floating point arithmetic.
Since you're dividing by a power of 2 and your numbers are not ultra-small, this step shouldn't have any influence on the accuracy of the result. You're just subtracting 14 from the exponent.
What counts is the number of bits in your samples.
Floats give you 24 bits of precision. If you have 2^14 = 16384 samples, then as you add them all up, you'll gradually lose precision until at some point anything after the 24-14 = 10th bit of each sample gets lost. In other words: at that point you're only keeping about 3 decimal digits of each sample.
Would it be possible to use an int as an accumulator, or even a uint? That way you'll keep 8 extra bits, twice as much as the difference between 1024 and 16384 samples.
There is a second, entirely different option. I don't know what the range of your samples is, but if they are around the same size, you can subtract an approximate average from each value, average the differences, and add the approximate average back on at the end.
How much you gain by this method depends on how good your initial approximation of the average is, and how close the values are to the average. So I'd say it's probably less reliable in your situation.
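A hedged C sketch of that second idea (the sample values and the 0.75f guess are made up for the demo): subtract a rough estimate of the average from every sample, accumulate the much smaller differences, and add the estimate back at the end.

    #include <stdio.h>

    /* Average n samples by accumulating differences from an approximate
       mean; the running sum stays small, so fewer low-order bits are lost
       than when accumulating the raw samples. */
    static float shifted_mean(const float *x, int n, float guess) {
        float acc = 0.0f;
        for (int i = 0; i < n; i++)
            acc += x[i] - guess;
        return guess + acc / (float)n;
    }

    int main(void) {
        enum { N = 16384 };
        static float samples[N];
        for (int i = 0; i < N; i++)
            samples[i] = 0.75f + 0.001f * (float)(i % 7);   /* clustered values */

        printf("shifted mean: %.7f\n", shifted_mean(samples, N, 0.75f));
        return 0;
    }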

Resources