Store a numeric variable with more decimals [duplicate]

When I create a data frame from numeric vectors, R seems to round values below the precision that I require in my analysis:
data.frame(x=0.99999996)
returns 1 (*but see Update 1).
I am stuck when fitting spline(x, y): two of the x values are set to 1 due to rounding while the corresponding y values differ. I could hack around this, but I would prefer a standard solution if one is available.
Example
Here is an example data set
d <- data.frame(x = c(0.668732936336141, 0.95351462456867,
0.994620622127435, 0.999602102672081, 0.999987126195509, 0.999999955814133,
0.999999999999966), y = c(38.3026509783688, 11.5895099585560,
10.0443344234229, 9.86152339768516, 9.84461434575695, 9.81648333804257,
9.83306725758297))
The following solution works, but I would prefer something that is less subjective:
plot(d$x, d$y, ylim=c(0,50))
lines(spline(d$x, d$y),col='grey') #bad fit
lines(spline(d[-c(4:6),]$x, d[-c(4:6),]$y),col='red') #reasonable fit
Update 1
*Since posting this question, I have realized that R will print 1 even though the data frame still contains the original value, e.g.
> dput(data.frame(x=0.99999999996))
returns
structure(list(x = 0.99999999996), .Names = "x", row.names = c(NA,
-1L), class = "data.frame")
Update 2
After using dput to post this example data set, and some pointers from Dirk, I can see that the problem is not truncation of the x values but the limits of the numerical errors in the model that I used to calculate y. This justifies dropping a few of the effectively equivalent data points (as in the example's red line).
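For completeness, here is a less hand-picked way to drop the effectively equivalent points; the rounding tolerance (6 digits here) is my own assumption and still a judgment call:
keep <- !duplicated(round(d$x, 6))  # drop rows whose x collides after rounding
lines(spline(d$x[keep], d$y[keep]), col='blue')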

If you really want to set up R to print its results with utterly unreasonable precision, then use: options(digits=16).
Note that this does nothing for the accuracy of functions using these results. It merely changes how values appear when they are printed to the console. There is no rounding of the values as they are stored or accessed, unless you supply more significant digits than a double can hold. The 'digits' option has no effect on the maximal precision of floating point numbers.
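For example (a minimal illustration; only the display changes, never the stored value):
> print(0.99999996)  # default options(digits = 7)
[1] 1
> options(digits = 16)
> print(0.99999996)
[1] 0.99999996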

Please re-read R FAQ 7.31 and the reference cited therein -- a really famous paper on what everybody should know about floating-point representation on computers.
The closing quote from Kernighan and Plauger is also wonderful:
10.0 times 0.1 is hardly ever 1.0.
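A quick check of that point at the console (a minimal sketch):
> 0.1 + 0.2 == 0.3  # neither 0.1 nor 0.2 has an exact binary representation
[1] FALSE
> print(0.1 + 0.2, digits = 17)
[1] 0.30000000000000004
> all.equal(0.1 + 0.2, 0.3)  # tolerance-based comparison, the recommended test
[1] TRUE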
And besides the numerical precision issue, there is of course also the fact that R prints with fewer decimals than it uses internally:
> for (d in 4:8) print(0.99999996, digits=d)
[1] 1
[1] 1
[1] 1
[1] 1
[1] 0.99999996
>

Related

how many digits does R carry in a numeric calculation (how to increase the number of digits in R numeric) [duplicate]

There is an option in R to get control over digit display. For example:
options(digits=10)
is supposed to give the calculation results to 10 digits until the end of the R session. In R's help file, the definition of the digits parameter is as follows:
digits: controls the number of digits to print when printing numeric values. It is a suggestion only. Valid values are 1...22 with default 7.
So it says this is only a suggestion. What if I would like to always display 10 digits, no more, no less?
My second question is: what if I would like to display more than 22 digits, say 100 digits for more precise calculations? Is this possible with base R, or do I need an additional package/function?
Edit: Thanks to jmoy's suggestion, I tried sprintf("%.100f",pi) and it gave
[1] "3.1415926535897931159979634685441851615905761718750000000000000000000000000000000000000000000000000000"
which has 48 decimals. Is this the maximum limit R can handle?
The reason it is only a suggestion is that you could quite easily write a print function that ignores the options value. The built-in printing and formatting functions do use the options value as a default.
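To illustrate, here is a toy print method (the class name fixed10 is hypothetical) that ignores options(digits=) entirely:
print.fixed10 <- function(x, ...) {
  cat(sprintf("%.10f\n", unclass(x)))  # always ten decimal places, regardless of options
  invisible(x)
}
x <- structure(0.25, class = "fixed10")
options(digits = 4)  # no effect on the method above
x
## 0.2500000000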
As to the second question: since R uses finite-precision arithmetic, your answers aren't accurate beyond 15 or 16 decimal places, so in general more aren't required. The gmp and rcdd packages deal with multiple-precision arithmetic (via an interface to the gmp library), but this is mostly related to big integers rather than more decimal places for your doubles.
Mathematica or Maple will allow you to give as many decimal places as your heart desires.
EDIT:
It might be useful to think about the difference between decimal places and significant figures. If you are doing statistical tests that rely on differences beyond the 15th significant figure, then your analysis is almost certainly junk.
On the other hand, if you are just dealing with very small numbers, that is less of a problem, since R can handle numbers as small as .Machine$double.xmin (usually around 2e-308).
Compare these two analyses.
x1 <- rnorm(50, 1, 1e-15)
y1 <- rnorm(50, 1 + 1e-15, 1e-15)
t.test(x1, y1) #Should throw an error
x2 <- rnorm(50, 0, 1e-15)
y2 <- rnorm(50, 1e-15, 1e-15)
t.test(x2, y2) #ok
In the first case, differences between numbers only appear after many significant figures, so the data are "nearly constant". In the second case, although the sizes of the differences are the same, relative to the magnitude of the numbers themselves they are large.
As mentioned by e3bo, you can use multiple-precision floating point numbers using the Rmpfr package.
mpfr("3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825")
These are slower and more memory intensive to use than regular (double precision) numeric vectors, but can be useful if you have a poorly conditioned problem or unstable algorithm.
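Here is a minimal usage sketch, assuming the Rmpfr package is installed:
library(Rmpfr)
x <- mpfr(2, precBits = 120)  # about 36 significant decimal digits
sqrt(x)                       # arithmetic and math functions keep the extra precision
Const("pi", prec = 200)       # built-in constants at arbitrary precision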
If you are producing the entire output yourself, you can use sprintf(), e.g.
> sprintf("%.10f",0.25)
[1] "0.2500000000"
specifies that you want to format a floating-point number with ten decimal places (in %.10f, the f is for float and the .10 specifies ten decimal places).
I don't know of any way of forcing R's higher level functions to print an exact number of digits.
Displaying 100 digits does not make sense if you are printing R's usual numbers, since the best accuracy you can get using 64-bit doubles is around 16 decimal digits (look at .Machine$double.eps on your system). The remaining digits will just be junk.
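For reference, the two machine constants mentioned in these answers (standard values for IEEE 64-bit doubles):
> .Machine$double.eps   # spacing of doubles near 1; roughly 16 significant digits
[1] 2.220446e-16
> .Machine$double.xmin  # smallest positive normalized double
[1] 2.225074e-308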
One more way to control how many decimal digits are printed, based on what you need (if you don't want to print redundant trailing zeros):
For example, suppose you have a vector elements and would like its sum:
elements <- c(-1e-05, -2e-04, -3e-03, -4e-02, -5e-01, -6e+00, -7e+01, -8e+02)
sum(elements)
## -876.5432
Apparently the final digit, 1, has been truncated from the printed result; the ideal output is -876.54321. But a fixed decimal count, e.g. sprintf("%.10f", sum(elements)), generates redundant trailing zeros: -876.5432100000.
Following the tutorial here: printing decimal numbers, if you can identify how many decimal digits a given number has (here -876.54321 has 5 decimal digits to print), you can pass that as a parameter to formatC as below:
decimal_length <- 5
formatC(sum(elements), format = "f", digits = decimal_length)
## -876.54321
We can change decimal_length for each query, so it can satisfy different decimal-printing requirements.
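If you don't want to count the decimals by eye, here is a hypothetical helper (my own sketch, not from the linked tutorial) that recovers the count from the printed representation at 15 significant digits:
n_decimals <- function(x) {
  s <- format(x, digits = 15, scientific = FALSE)
  if (!grepl(".", s, fixed = TRUE)) return(0L)
  nchar(sub(".*\\.", "", s))  # count the digits after the decimal point
}
n_decimals(sum(elements))
## 5
formatC(sum(elements), format = "f", digits = n_decimals(sum(elements)))
## "-876.54321"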
If you work primarily with tibbles, there is a function that enforces a fixed number of display digits: num().
Here is an example:
library(tidyverse)
data <- tribble(
~ weight, ~ weight_selfreport,
81.5,81.66969147005445,
72.6,72.59528130671505,
92.9,93.01270417422867,
79.4,79.4010889292196,
94.6,96.64246823956442,
80.2,79.4010889292196,
116.2,113.43012704174228,
95.4,95.73502722323049,
99.5,99.8185117967332
)
data <-
data %>%
mutate(across(where(is.numeric), ~ num(., digits = 3)))
data
#> # A tibble: 9 × 2
#> weight weight_selfreport
#> <num:.3!> <num:.3!>
#> 1 81.500 81.670
#> 2 72.600 72.595
#> 3 92.900 93.013
#> 4 79.400 79.401
#> 5 94.600 96.642
#> 6 80.200 79.401
#> 7 116.200 113.430
#> 8 95.400 95.735
#> 9 99.500 99.819
Thus you can even choose different rounding options depending on your needs. I find it very helpful and a rather quick solution for printing data frames.

Is there a way to handle calculations involving exponentials of big values in R?

I have looked around online and on this site, but I did not find a solution. My problem is relatively simple, so if you can point me to a possible solution, much appreciated.
test_vec <- c(2,8,709,600)
mean(exp(test_vec))
test_vec_bis <- c(2,8,710,600)
mean(exp(test_vec_bis))
exp(709)
exp(710)
# The numerical limit of R is at exp(709)
How can I calculate the mean of my vector and deal with the Inf values, knowing that R could probably handle the mean value itself but not all of the values in the numerator of the mean calculation?
There is an edge case where you can solve your problem by simply restating it mathematically, but that would require your vector to be extremely long and/or your large exponents to be close to the numeric limit:
Since the mean sum(x)/n can be written as sum(x/n), and since exp(x)/exp(y) = exp(x-y), you can calculate sum(exp(x - log(n))), which buys you headroom of log(n).
mean(exp(test_vec))
[1] 2.054602e+307
sum(exp(test_vec - log(length(test_vec))))
[1] 2.054602e+307
sum(exp(test_vec_bis - log(length(test_vec_bis))))
[1] 5.584987e+307
While this works for your example, it most likely won't work for your real vector.
In that case, you will have to consult packages like Rmpfr, as suggested by @fra.
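A more general workaround worth knowing (my addition, not part of the original answer) is the log-sum-exp trick: subtract max(x) so that no exponential exceeds exp(0) = 1, and work with the logarithm of the mean:
log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))  # no term can overflow
}
log_mean_exp(test_vec_bis)       # finite even though mean(exp(test_vec_bis)) is Inf
## 708.6137
exp(log_mean_exp(test_vec))      # matches mean(exp(test_vec)) where that is finite
## 2.054602e+307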
Here's one way, where you filter so that only the elements of test_vec whose exponential is finite (< Inf) are kept:
mean(exp(test_vec)[which(exp(test_vec) < Inf)])
[1] 1.257673e+260
t2 <- c(2,8,600)
mean(exp(t2))
[1] 1.257673e+260
This assumes you were looking to exclude values that result in Inf, of course.
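Equivalently, you can index with is.finite(); either way, note that the overflowing elements are silently dropped, so this computes the mean of a different, smaller set:
ev <- exp(test_vec)
mean(ev[is.finite(ev)])  # same result as above
## 1.257673e+260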

acos(1) returns NaN for some values, not others

I have a list of latitude and longitude values, and I'm trying to find the distance between them. Using a standard great circle method, I need to find:
acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1))
And multiply this by the radius of earth, in the units I am using. This is valid as long as the values we take the acos of are in the range [-1,1]. If they are even slightly outside of this range, it will return NaN, even if the difference is due to rounding.
The issue I have is that sometimes, when two lat/long values are identical, this gives me an NaN error. Not always, even for the same pair of numbers, but always the same ones in a list. For instance, I have a person stopped on a road in the desert:
Time   | lat      | long
1:00PM | 35.08646 | -117.5023
1:01PM | 35.08646 | -117.5023
1:02PM | 35.08646 | -117.5023
1:03PM | 35.08646 | -117.5023
1:04PM | 35.08646 | -117.5023
When I calculate the distance between the consecutive points, the third value, for instance, will always be NaN, even though the others are not. This seems to be a weird bug with R rounding.
Can't tell exactly without seeing your data (try dput), but this is most likely a consequence of FAQ 7.31.
(x1 <- 1)
## [1] 1
(x2 <- 1+1e-16)
## [1] 1
(x3 <- 1+1e-8)
## [1] 1
acos(x1)
## [1] 0
acos(x2)
## [1] 0
acos(x3)
## [1] NaN
That is, even if your values are so similar that their printed representations are the same, they may still differ: some will be within .Machine$double.eps and others won't ...
One way to make sure the input values are bounded by [-1,1] is to use pmax and pmin: acos(pmin(pmax(x,-1.0),1.0))
A simple workaround is to use pmin(), like this:
acos(pmin(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1),1))
It now ensures that the precision loss leads to a value no higher than exactly 1.
This doesn't explain what is happening, however.
(Edit: Matthew Lundberg pointed out that I need to use pmin to get it to work with vectorized inputs. This fixes the problem, but I'm still not sure why the rounding goes wrong in the first place.)
I just encountered this. It is caused by input larger than 1. Due to computational error, my inner product between unit vectors becomes a bit larger than 1 (like 1 + 0.00001), and acos() can only deal with [-1, 1]. So we can clamp the upper bound to exactly 1 to solve the problem.
For numpy: np.clip(your_input, -1, 1)
For Pytorch: torch.clamp(your_input, -1, 1)
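Putting the pieces together in R, here is a hedged sketch (the helper name gc_dist and the kilometre Earth radius are my own choices; inputs are in degrees):
gc_dist <- function(lat1, lon1, lat2, lon2, R = 6371) {
  to_rad <- pi / 180
  lat1 <- lat1 * to_rad; lat2 <- lat2 * to_rad
  lon1 <- lon1 * to_rad; lon2 <- lon2 * to_rad
  cosang <- sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(lon2 - lon1)
  R * acos(pmin(pmax(cosang, -1), 1))  # clamp before acos, so the result is never NaN
}
gc_dist(35.08646, -117.5023, 35.08646, -117.5023)
## 0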
