Sum row that does not give exactly zero, when it should [duplicate] - r

This question already has answers here:
Why are these numbers not equal?
(6 answers)
Closed 5 years ago.
I am doing a simple row sum and two columns give me 0 (which is the number it should give), but the last one gives an epsilon, but not zero per se.
# generate the row values that their sumation should give zero.
d<-0.8
c<-1-d
a<-0.5
b<-0.5
e<-0.2
f<-1-e
Perc<-c(-1, a,b,c,-1,d,e,f,-1)
# Put them in a 3x3 matrix
div<-matrix(ncol = 3, byrow = TRUE,Perc)
# Do the row sum
rowSums(div)
# RESULT
[1] 0.000000e+00 0.000000e+00 5.551115e-17
rowSums(div)[3]==0
[1] FALSE
I am using this version of R: version 3.4.1 (2017-06-30) -- "Single Candle"
Any idea why ? and how i can fix this?

This happens because the machines can't store decimal numbers exactly. There can be a small error for some numbers.
The fix here is to use the all.equal function. It takes the tolerance level of the machine into account when comparing two numbers.
all.equal(sum(div[3, ]), 0)
TRUE

Related

R factorial() is not yielding the correct result [duplicate]

This question already has answers here:
Why are these numbers not equal?
(6 answers)
Closed 3 days ago.
I am trying to calculate the factorial of 52 in R. Astonishingly, I am getting contradicting results.
aaa<-1*2*3*4*5*6*7*8*9*10*11*12*13*14*15*16*17*18*19*20*21*22*23*24*25*26*27*28*29*30*31*32*33*34*35*36*37*38*39*40*41*42*43*44*45*46*47*48*49*50*51*52
bbb<-factorial(52)
aaa
[1] 80658175170943876845634591553351679477960544579306048386139594686464
bbb
[1] 80658175170944942408940349866698506766127860028660283290685487972352
aaa==bbb #False
What am I doing wrong?
This is a well known problem in computing with large numbers; R uses double-precision floating-point, the precision of which may vary by machine. Thats why you are getting multiple results across methods (including the online calculator in your comments). If you want to change your precision (in bits), one option is to use the Rmpfr package:
Rmpfr::mpfr(factorial(52),
6) # six bits
#1 'mpfr' number of precision 6 bits
#[1] 8.09e+67
Rmpfr::mpfr(factorial(52),
8) # eight bits
#1 'mpfr' number of precision 8 bits
#[1] 8.046e+67
This will allow you to obtain a result with the same value:
x <-Rmpfr::mpfr(1*2*3*4*5*6*7*8*9*10*11*12*13*14*15*16*17*18*19*20*21*22*23*24*25*26*27*28*29*30*31*32*33*34*35*36*37*38*39*40*41*42*43*44*45*46*47*48*49*50*51*52,
8)
y <- Rmpfr::mpfr(factorial(52),
8)
x == y
#[1] TRUE

R (and python) rounding up starting at 5 or 6 [duplicate]

This question already has answers here:
Round up from .5
(7 answers)
Closed 4 years ago.
I saw already a question with very large number of decimal digits R rounding explanation.
round(62.495, digits=2)
gives me 62.49. I would expect already 62.5, but it seems, R (3.4.3, 3.5.0) rounds up only starting at 6, e.g.,
round(62.485, 2) == 62.48
round(62.486, 2) == 62.49.
For other reasons, I am using the option
options(digits.secs=6)
From what I have learnt, one rounds up starting at 5. I tested also with Python and Matlab. Matlab rounds up, Python 3.5.4 down.
How can I change the behaviour or is this definition different, e.g. between Europe and US?
This is a floating point representation issue, 62.495 is actually represented by a slightly smaller number which then gets rounded downwards.
print(62.495,digits=22)
[1] 62.49499999999999744205
R's rounding is statistical rounding, or round half to even. It should round halves up or down to an even number, eg
round(0.5) # rounds the half down to 0
[1] 0
round(1.5) # rounds the half up to 2
[1] 2

Logical Operator on Matrix Entries - R [duplicate]

This question already has answers here:
Why are these numbers not equal?
(6 answers)
Closed 5 years ago.
I have a 1 column matrix with labels and a numeric vector.
I want to extract the labels in the matrix which are equal to one of the entries in that vector, more specifically:
> mat
[,1]
intercept 20.86636535
crim -0.23802478
zn 0.03822050
indus 0.05135584
chas 2.43504780
> vec
[1] -0.23802478 0.05135584
> mat[2, 1] == vec[1]
crim
FALSE
Currently I'm stuck with the first step. I have no idea why it returns FALSE while they hold the same numeric values.
I'd use round(as.numeric(mat[,2, drop=T]), 5) %in% round(vec, 5)
, as there may well be floating point issues.
Doing so yields:
[1] FALSE TRUE FALSE TRUE FALSE
Basically, you need to turn the second column into a vector (using drop=T) and then turn it from a character to a numeric. The rounding (in this case, to 5 decimal places) then bridges the floating point problem that I mentioned before (along with David Arenburg).
I hope that helps you.

Logic regarding summation of decimals [duplicate]

This question already has answers here:
Why are these numbers not equal?
(6 answers)
Closed 8 years ago.
Does the last statement in this series of statements make logical sense to anybody else? R seems to give similar results for a small subset of possible sums of decimals under 1. I cannot recall any basic mathematical principles that would make this true, but it seems to be unlikely to be an error.
> 0.4+0.6
[1] 1
> 0.4+0.6==1.0
[1] TRUE
> 0.3+0.6
[1] 0.9
> 0.3+0.6==0.9
[1] FALSE
Try typing 0.3+0.6-0.9, on my system the result is -1.110223e-16 this is because the computer doesn't actually sum them as decimal numbers, it stores binary approximations, and sums those. And none of those numbers can be exactly represented in binary, so there is a small amount of error present in the calculations, and apparently it's small enough not to matter in the first one, but not the second.
Floating point arithmetic is not exact, but the == operator is. Use all.equal to compare two floating point values in R.
isTRUE(all.equal(0.3+0.6, 0.9))
You can also define a tolerance when calling all.equals.
isTRUE(all.equal(0.3+0.6, 0.9, tolerance = 0.001))

Why did I obtain wrong answer when using "=="? [duplicate]

This question already has answers here:
Why are these numbers not equal?
(6 answers)
Closed 9 years ago.
If I type:
x<-seq(0,20,.05)
x[30]
x[30]==1.45
Why do I obtain a False from the last line of code? What did I do wrong here?
This question has been asked a million times, albeit in different forms. This is due to floating point inaccuracy. Also here's another link on floating point errors you may want to catch up on!
Try this to first see what's going on:
x <- seq(0, 20, 0.5)
sprintf("%.20f", x[30]) # convert value to string with 20 decimal places
# [1] "14.50000000000000000000"
x[30] == 14.5
# [1] TRUE
All is well so far. Now, try this:
x <- seq(0, 20, 0.05)
sprintf("%.20f", x[30]) # convert value to string with 20 decimal places
# [1] "1.45000000000000017764"
x[30] == 1.45
# [1] FALSE
You can see that the machine is able to accurately represent this number only up to certain digits. Here, up to 15 digits or so. So, by directly comparing the results, you get of course a FALSE. Instead what you could do is to use all.equal which has a parameter for tolerance which equals .Machine$double.eps ^ 0.5. On my machine this evaluates to 1.490116e-08. This means if the absolute difference between the numbers x[30] and 1.45... is < this threshold, then all.equal evaluates this to TRUE.
all.equal(x[30], 1.45)
[1] TRUE
Another way of doing this is to explicitly check with a specific threshold (as #eddi's answer shows it).
This has to do with the fact that these are double's, and the correct way of comparing double's in any language is to do something like:
abs(x[30] - 1.45) < 1e-8 # or whatever precision you think is appropriate

Resources