Error in either `/` or round functions from base R [duplicate] - r

This question already has answers here:
Why are these numbers not equal?
(6 answers)
Closed 3 years ago.
I have been trying to make a contribution to the data.table package by adding the round function to the ITime class, when I came across a rather odd discrepancy produced by the round function. Behind the scenes, an object of class ITime is just an integer vector (the number of seconds since midnight) with pretty formatting, and thus unclass(object) provides an integer vector.
Rounding this integer vector to the nearest minute can thus be done like this:
x <- as.ITime(seq(as.POSIXct("2020-01-01 07:00:00"), as.POSIXct("2020-01-01 07:10:00"), "30 sec"))
round(unclass(x) / 60L) * 60L
# or
round(as.integer(x) / 60L) * 60L
Here is where the problem arises...
When I do this operation, I would expect any instance of unclass(x) / 60 that ends with .5 to be rounded up. However, that is not the case!
I have tried the example on both Windows and Mac on two different computers with the same result. Does anyone have an idea as to why this would happen?
FYI: I know that this particular problem can be solved differently, e.g. unclass(x) %/% 60L, but my interest is in why the round function does not work as expected.

?round:
‘round’ rounds the values in its first argument to the specified
number of decimal places (default 0). See ‘Details’ about “round
to even” when rounding off a 5.
[...]
Note that for rounding off a 5, the IEC 60559 standard (see also
‘IEEE 754’) is expected to be used, ‘_go to the even digit_’.
Therefore ‘round(0.5)’ is ‘0’ and ‘round(-1.5)’ is ‘-2’. However,
this is dependent on OS services and on representation error
(since e.g. ‘0.15’ is not represented exactly, the rounding rule
applies to the represented number and not to the printed number,
and so ‘round(0.15, 1)’ could be either ‘0.1’ or ‘0.2’).

Related

Round numbers in R correctly [duplicate]

This question already has answers here:
Round up from .5
(7 answers)
Closed 1 year ago.
There are a number of threads about this question. None seems to answer the simple question: why does R round incorrectly, and how can I make it round correctly?
Conventional rounding to the i-th decimal looks at the (i+1)-th decimal: if it is 5 or larger, the i-th digit is rounded up; if it is 4 or smaller, x is left as is. For example, 1.45 rounded to the first decimal is 1.5, and 1.44 is rounded to 1.4. However, in R
> round(1.45,1)
[1] 1.4
But
> round(1.46,1)
[1] 1.5
So it changes the convention to "if the (i+1)-th decimal is 6 or larger, then round up". Why? And how can I change this to the convention I am familiar with?
Most decimal fractions are not exactly representable in binary double precision
Learned here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/Round.html
Section "Warnings":
Rounding to decimal digits in binary arithmetic is non-trivial (when digits != 0) and may be surprising. Be aware that most decimal fractions are not exactly representable in binary double precision. In R 4.0.0, the algorithm for round(x, d), for d > 0, has been improved to measure and round “to nearest even”, contrary to earlier versions of R (or also to sprintf() or format() based rounding).
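Two things follow from this. First, 1.45 cannot be stored exactly, so the rounding rule is applied to the stored value, not to the printed one. Second, if you really want the school-book "round half away from zero" behaviour, you have to implement it yourself; the helper below is a commonly used sketch (the name round_half_up is just illustrative, and the result is still approximate because the scaling by 10^digits also happens in binary floating point):
sprintf("%.20f", 1.45)
# [1] "1.44999999999999995559"   slightly below 1.45, hence the rounding down
round_half_up <- function(x, digits = 0) {
  scale <- 10^digits
  sign(x) * floor(abs(x) * scale + 0.5) / scale
}
round_half_up(1.45, 1)
# [1] 1.5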

R not calculating large cubes correctly?

There's been some news lately about the discovery of three cubes that sum to 42. Namely, Andrew Sutherland and Andrew Booker discovered that (-80538738812075974)^3 + 80435758145817515^3 + 12602123297335631^3=42
(https://math.mit.edu/~drew/)
I was tinkering with this a bit, and I'm not getting 42 in R.
I do get it in other places (WolframAlpha), but R gives me this:
> (-80538738812075974)^3 + 80435758145817515^3 + 12602123297335631^3
[1] 1.992544e+35
Any idea what I'm doing wrong? Is it a limitation with large numbers in R? Or am I (very probably) just doing something dumb?
UPDATE
As pointed out by @MrFlick, this is a well-known floating point arithmetic issue. R stores the large numbers in your example as double precision numbers.
Also, be aware of integer overflow. See a related discussion here. Note that base R will not throw an error (just a warning) when integer overflow occurs. The bit64 package may help sometimes, but won't do the job in your case, as the precision it provides is still not enough.
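For instance, overflowing integer arithmetic in base R just gives NA plus a warning:
.Machine$integer.max
# [1] 2147483647
2147483647L + 1L
# [1] NA
# Warning message:
# In 2147483647L + 1L : NAs produced by integer overflow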
If you want to calculate with large (integer) numbers in R, you can use the gmp package, like this:
library(gmp)
as.bigz("-80538738812075974")^3 + as.bigz("80435758145817515")^3 + as.bigz("12602123297335631")^3
#[1] 42
# or, even better, also use an integer exponent:
as.bigz("-80538738812075974")^3L + as.bigz("80435758145817515")^3L + as.bigz("12602123297335631")^3L
As @MrFlick already pointed out, you are using numeric (double) values, so the calculated result is only approximately right. When you try to use integer literals, you get warnings and R converts them to numeric:
(-80538738812075974L)^3L + 80435758145817515L^3L + 12602123297335631L^3L
#[1] 1.992544e+35
#Warning messages:
#1: non-integer value 80538738812075974L qualified with L; using numeric value
#2: non-integer value 80435758145817515L qualified with L; using numeric value
#3: non-integer value 12602123297335631L qualified with L; using numeric value
Note that you have to pass a string to as.bigz. If you write the number directly, R treats it as a double (or, as above, converts a too-large integer literal to double) and may lose precision.
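A quick check of that point (the numeric version goes through a double first, so for a 17-digit number it will generally not round-trip exactly):
library(gmp)
as.bigz("80538738812075974")                                # exact
as.bigz(80538738812075974)                                  # converted from a double
as.bigz("80538738812075974") == as.bigz(80538738812075974)
# [1] FALSE   (on a standard IEEE 754 double-precision platform)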

What is the difference between 1:5 and c(1,2,3,4,5)? [duplicate]

This question already has answers here:
What's the difference between `1L` and `1`?
(4 answers)
Closed 7 years ago.
I'm very confused by the data structure concepts in R (they are much easier to understand in SAS).
Is there any difference between x <- 1:5 and x <- c(1,2,3,4,5)?
From the environment window, I know that one is int and the other is num.
x and y below are not quite identical because they have different storage modes, as you discovered by using str(x) and str(y). In my experience, this distinction is unimportant 99% of the time; R uses fairly loose typing, and integers will automatically be promoted to double (i.e. double-precision floating point) when necessary. Integers and floating point values below the maximum integer value (.Machine$integer.max) can be converted back and forth without loss of information. (Integers do take slightly less space to store, and can be slightly faster to compute with, as @RichardScriven comments above.)
If you want to create an integer vector, append L as below ... or use as.integer() as suggested in comments above.
x <- 1:5
y <- c(1,2,3,4,5)
z <- c(1L,2L,3L,4L,5L)
all.equal(x,y) ## test for _practical_ equivalence: TRUE
identical(x,y) ## FALSE
identical(x,z) ## TRUE
storage.mode() and class() may also be useful, as well as is.integer(), is.double(), as.integer(), as.double(), is.numeric() ...
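A small illustration of the automatic promotion mentioned above:
x <- 1:5
typeof(x)       # "integer"
typeof(x / 2)   # "double": division always returns a double, even for integer input
x[2] <- 2.5     # assigning a double silently promotes the whole vector
typeof(x)       # "double"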

Logic regarding summation of decimals [duplicate]

This question already has answers here:
Why are these numbers not equal?
(6 answers)
Closed 8 years ago.
Does the last statement in this series of statements make logical sense to anybody else? R seems to give similar results for a small subset of possible sums of decimals under 1. I cannot recall any basic mathematical principle that would make this true, but it seems unlikely to be an error.
> 0.4+0.6
[1] 1
> 0.4+0.6==1.0
[1] TRUE
> 0.3+0.6
[1] 0.9
> 0.3+0.6==0.9
[1] FALSE
Try typing 0.3+0.6-0.9; on my system the result is -1.110223e-16. This is because the computer doesn't actually sum them as decimal numbers; it stores binary approximations and sums those. None of these numbers can be exactly represented in binary, so there is a small amount of error in the calculations, and apparently it's small enough not to matter in the first example, but not in the second.
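You can make the hidden digits visible by printing with more precision (the digits shown are what I would expect from standard IEEE 754 doubles):
print(0.3 + 0.6, digits = 17)
# [1] 0.89999999999999991
print(0.9, digits = 17)
# [1] 0.90000000000000002
(0.3 + 0.6) - 0.9
# [1] -1.110223e-16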
Floating point arithmetic is not exact, but the == operator is. Use all.equal to compare two floating point values in R.
isTRUE(all.equal(0.3+0.6, 0.9))
You can also define a tolerance when calling all.equal.
isTRUE(all.equal(0.3+0.6, 0.9, tolerance = 0.001))
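If you prefer to see the comparison spelled out, the same idea written by hand (the tolerance here is arbitrary):
abs((0.3 + 0.6) - 0.9) < 1e-8
# [1] TRUE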

acos(1) returns NaN for some values, not others

I have a list of latitude and longitude values, and I'm trying to find the distance between them. Using a standard great circle method, I need to find:
acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1))
And multiply this by the radius of earth, in the units I am using. This is valid as long as the values we take the acos of are in the range [-1,1]. If they are even slightly outside of this range, it will return NaN, even if the difference is due to rounding.
The issue I have is that sometimes, when two lat/long values are identical, this gives me an NaN error. Not always, even for the same pair of numbers, but always the same ones in a list. For instance, I have a person stopped on a road in the desert:
Time   | lat      | long
1:00PM | 35.08646 | -117.5023
1:01PM | 35.08646 | -117.5023
1:02PM | 35.08646 | -117.5023
1:03PM | 35.08646 | -117.5023
1:04PM | 35.08646 | -117.5023
When I calculate the distance between the consecutive points, the third value, for instance, will always be NaN, even though the others are not. This seems to be a weird bug with R rounding.
Can't tell exactly without seeing your data (try dput()), but this is most likely a consequence of R FAQ 7.31.
(x1 <- 1)
## [1] 1
(x2 <- 1+1e-16)
## [1] 1
(x3 <- 1+1e-8)
## [1] 1
acos(x1)
## [1] 0
acos(x2)
## [1] 0
acos(x3)
## [1] NaN
That is, even if your values are so similar that their printed representations are the same, they may still differ: some will be within .Machine$double.eps and others won't ...
One way to make sure the input values are bounded by [-1,1] is to use pmax and pmin: acos(pmin(pmax(x,-1.0),1.0))
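As a concrete sketch with the coordinates from the question (whether the intermediate value actually overshoots 1 depends on the exact inputs, which is why only some rows turn into NaN):
lat  <- 35.08646  * pi / 180    # convert to radians
long <- -117.5023 * pi / 180
# identical points, so cos(long - long) is exactly 1
arg <- sin(lat) * sin(lat) + cos(lat) * cos(lat) * cos(long - long)
arg - 1                          # may be a tiny positive number
acos(arg)                        # NaN whenever arg > 1
acos(pmin(pmax(arg, -1), 1))     # clamped: 0, as expected for identical points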
A simple workaround is to use pmin(), like this:
acos(pmin(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1),1))
It now ensures that the precision loss leads to a value no higher than exactly 1.
This doesn't explain what is happening, however.
(Edit: Matthew Lundberg pointed out that I need to use pmin to get it to work with vectorized inputs. This fixes the problem with getting it to work, but I'm still not sure why it is rounding incorrectly.)
I just encountered this. It is caused by an input larger than 1. Due to computational error, my inner product of unit vectors came out slightly larger than 1 (like 1 + 0.00001), and acos() is only defined on [-1, 1]. So we can clamp the upper bound to exactly 1 to solve the problem.
For numpy: np.clip(your_input, -1, 1)
For Pytorch: torch.clamp(your_input, -1, 1)

Resources