R: gmp: inaccurate output from mod.bigz - r

I downloaded the gmp package in order to calculate the modular exponentiation of very large numbers. But one of its functions, mod.bigz, seems to fail beyond a certain number of digits. For example, the answer to 100...00 mod 3 should be 1 since 99...99 is divisible by 3. But the answer I get is sometimes 0 or 2. Is there any way to fix this or is gmp just not accurate for very large numbers?
https://cran.r-project.org/web/packages/gmp/index.html
#install.packages('gmp')
library(gmp)
mod.bigz(100000000000000000000000000000000000000000000000000,3)
# 2
mod.bigz(10000000000000000000000000000000000000000000000000000000,3)
# 0

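My overall advice is to avoid falling back to base R numerics at any point when numbers this large appear in your code. A bare literal such as 100000000000000000000000000000000000000000000000000 is parsed by R as a double, which holds only about 15 to 17 significant digits, so it has already been rounded before mod.bigz ever sees it. The same happens with the ordinary numeric types of most programming languages.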
For the original example you could wrap the inner number in pow.bigz:
mod.bigz(pow.bigz(10,50), 3)
# 1
mod.bigz(pow.bigz(10,55),3)
# 1
For the more complicated example discussed in the comments, 693487563928456923569873549873658638579865348726988458, the real solution is to keep the number out of R's numeric type altogether by passing it as a character string:
mod.bigz("693487563928456923569873549873658638579865348726988458",3) # should be 0
# 0
mod.bigz("100000000000000000000000000000000000000000000000000",3) # should be 1
# 1
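A minimal sketch of why the bare literal fails, assuming a typical IEEE-754 platform. The variable names are made up for illustration, and the gmp call is guarded so the snippet still runs when the package is not installed:

```r
# Doubles have a 53-bit significand, so not every integer above 2^53 is representable.
big  <- 2^53
same <- (big + 1) == big            # the +1 is lost to rounding

# A long numeric literal is rounded to the nearest double before any function sees it.
exact <- paste0("1", strrep("0", 50))   # the digits we actually meant
shown <- sprintf("%.0f", 1e50)          # the digits R actually stored
literal_is_exact <- identical(shown, exact)

# Passing the digits as a string keeps the double out of the picture.
if (requireNamespace("gmp", quietly = TRUE)) {
  print(gmp::mod.bigz(gmp::as.bigz(exact), 3))
}
```

Comparing `shown` with `exact` makes the rounding visible without needing gmp at all.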

Related

Why does dput have more precision than the original? [duplicate]

There is an option in R to get control over digit display. For example:
options(digits=10)
is supposed to give the calculation results in 10 digits till the end of R session. In the help file of R, the definition for digits parameter is as follows:
digits: controls the number of digits
to print when printing numeric values.
It is a suggestion only. Valid values
are 1...22 with default 7
So, it says this is a suggestion only. What if I like to always display 10 digits, not more or less?
My second question is, what if I like to display more than 22 digits, i.e. for more precise calculations like 100 digits? Is it possible with base R, or do I need an additional package/function for that?
Edit: Thanks to jmoy's suggestion, I tried sprintf("%.100f",pi) and it gave
[1] "3.1415926535897931159979634685441851615905761718750000000000000000000000000000000000000000000000000000"
which has 48 decimals. Is this the maximum limit R can handle?
The reason it is only a suggestion is that you could quite easily write a print function that ignored the options value. The built-in printing and formatting functions do use the options value as a default.
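As a rough illustration of how the digits setting behaves, the sketch below uses format() with explicit arguments rather than changing the global option. It assumes only base R; the variable names are arbitrary:

```r
# 'digits' counts significant digits, not decimal places.
a <- format(123.456, digits = 3)        # three significant digits
b <- format(0.000123456, digits = 3)    # still three significant digits

# 'nsmall' asks for at least that many decimals in fixed notation.
c10 <- format(0.25, nsmall = 10)

print(c(a, b, c10))
```

So the same digits value can yield very different numbers of decimals depending on the magnitude of the value being printed.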
As to the second question, since R uses finite precision arithmetic, your answers aren't accurate beyond 15 or 16 significant digits, so in general, more aren't required. The gmp and rcdd packages deal with multiple precision arithmetic (via an interface to the gmp library), but this is mostly related to big integers rather than more decimal places for your doubles.
Mathematica or Maple will allow you to give as many decimal places as your heart desires.
EDIT:
It might be useful to think about the difference between decimal places and significant figures. If you are doing statistical tests that rely on differences beyond the 15th significant figure, then your analysis is almost certainly junk.
On the other hand, if you are just dealing with very small numbers, that is less of a problem, since R can handle numbers as small as .Machine$double.xmin (usually about 2e-308).
Compare these two analyses.
x1 <- rnorm(50, 1, 1e-15)
y1 <- rnorm(50, 1 + 1e-15, 1e-15)
t.test(x1, y1) #Should throw an error
x2 <- rnorm(50, 0, 1e-15)
y2 <- rnorm(50, 1e-15, 1e-15)
t.test(x2, y2) #ok
In the first case, differences between numbers only occur after many significant figures, so the data are "nearly constant". In the second case, although the size of the differences between numbers is the same, they are large compared to the magnitude of the numbers themselves.
As mentioned by e3bo, you can use multiple-precision floating point numbers using the Rmpfr package.
mpfr("3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825")
These are slower and more memory intensive to use than regular (double precision) numeric vectors, but can be useful if you have a poorly conditioned problem or unstable algorithm.
If you are producing the entire output yourself, you can use sprintf(), e.g.
> sprintf("%.10f",0.25)
[1] "0.2500000000"
This specifies that you want to format a floating-point number with ten decimal places (in %.10f, the f is for float and the .10 specifies ten decimal places).
I don't know of any way of forcing R's higher level functions to print an exact number of digits.
Displaying 100 digits does not make sense if you are printing R's usual numbers, since the best accuracy you can get using 64-bit doubles is around 16 decimal digits (look at .Machine$double.eps on your system). The remaining digits will just be junk.
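A small sketch of that limit, assuming standard 64-bit doubles. It shows that digits past roughly the 16th reflect the binary representation rather than the value you typed:

```r
eps <- .Machine$double.eps             # spacing of doubles just above 1
sig_digits <- floor(-log10(eps))       # number of decimal digits you can rely on

x <- 0.1 + 0.2
exact_equal <- (x == 0.3)                   # exact comparison of two rounded values
near_equal  <- isTRUE(all.equal(x, 0.3))    # comparison with a tolerance

# Asking for more decimals only exposes the stored binary approximation.
print(sprintf("%.20f", x))
print(sprintf("%.20f", 0.3))
```

The two printed strings agree for the first 15 or so digits and then diverge, which is the "junk" referred to above.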
Here is one more solution that lets you control how many decimal digits are printed, for cases where you don't want redundant trailing zeros.
For example, suppose you have a vector called elements and would like to print its sum:
elements <- c(-1e-05, -2e-04, -3e-03, -4e-02, -5e-01, -6e+00, -7e+01, -8e+02)
sum(elements)
## -876.5432
The last digit, 1, has been dropped from the display; the ideal result is -876.54321. But if you fix the number of decimals, e.g. sprintf("%.10f", sum(elements)), redundant zeros appear: -876.5432100000
Following the tutorial here: printing decimal numbers, if you can identify how many decimal digits the number needs (here, -876.54321 needs 5), you can pass that count to formatC as below:
decimal_length <- 5
formatC(sum(elements), format = "f", digits = decimal_length)
## -876.54321
You can change decimal_length for each query, so it can satisfy different decimal printing requirements.
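If counting the decimals by hand is inconvenient, one alternative sketch is to over-request decimals and let formatC strip the trailing zeros with its drop0trailing argument. The choice of 10 decimals here is an arbitrary assumption; it just needs to be larger than the number of meaningful decimals and small enough to stay clear of floating-point noise:

```r
elements <- c(-1e-05, -2e-04, -3e-03, -4e-02, -5e-01, -6e+00, -7e+01, -8e+02)

# Format with generous fixed decimals, then drop the zeros that carry no information.
out <- formatC(sum(elements), format = "f", digits = 10, drop0trailing = TRUE)
print(out)
```

This avoids having to know decimal_length in advance, at the cost of picking a sensible upper bound.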
If you work primarily with tibbles, there is a function that enforces digits: num().
Here is an example:
library(tidyverse)
data <- tribble(
~ weight, ~ weight_selfreport,
81.5,81.66969147005445,
72.6,72.59528130671505,
92.9,93.01270417422867,
79.4,79.4010889292196,
94.6,96.64246823956442,
80.2,79.4010889292196,
116.2,113.43012704174228,
95.4,95.73502722323049,
99.5,99.8185117967332
)
data <-
data %>%
mutate(across(where(is.numeric), ~ num(., digits = 3)))
data
#> # A tibble: 9 × 2
#> weight weight_selfreport
#> <num:.3!> <num:.3!>
#> 1 81.500 81.670
#> 2 72.600 72.595
#> 3 92.900 93.013
#> 4 79.400 79.401
#> 5 94.600 96.642
#> 6 80.200 79.401
#> 7 116.200 113.430
#> 8 95.400 95.735
#> 9 99.500 99.819
Thus you can even decide to have different rounding options depending on what your needs are. I find it very helpful and a rather quick solution to printing dfs.

i not showing up as number in loop

so I have a loop that finds the position in the matrix where there is the largest difference in consecutive elements. For example, if thematrix[8] and thematrix[9] have the largest difference between any two consecutive elements, the number given should be 8.
I made the loop in a way that it will ignore comparisons where one of the elements is NaN (because I have some of those in my data). The loop I made looks like this.
thenumber = 0 #will store the difference
for (i in 1:nrow(thematrix) - 1) {
  if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) {
    if (abs(thematrix[i] - thematrix[i + 1]) > thenumber) {
      thenumber = i
    }
  }
}
This looks like it should work, but whenever I run it I get:
Error in if (!is.na(thematrix[i]) & !is.na(thematrix[i + 1])) { :
argument is of length zero
I tried this thing but with a random number in the brackets instead of i and it works. For some reason it only doesn't work when I use the i specified in the beginning of the for-loop. It doesn't recognize that i represents a number. Why doesn't R recognize i?
Also, if there's a better way to do this task I'd appreciate it greatly if you could explain it to me
You are pretty close, but R evaluates 1:nrow(thematrix) - 1 as (1:nrow(thematrix)) - 1, so the first value of i is 0, which is what causes this issue. I would suggest using either i in 1:nrow(thematrix) or i in 2:nrow(thematrix) - 1 so that your loop starts at i = 1. I think your approach is generally pretty intuitive, but one suggestion would be to use the print() function frequently to see how i changes over the course of your loop.
The issue is that the : operator has higher precedence than -; you just need to use parentheses around (nrow(thematrix)-1). For example,
thematrix <- matrix(1:10, nrow = 5)
##
wrong <- 1:nrow(thematrix) - 1
right <- 1:(nrow(thematrix) - 1)
##
R> wrong
#[1] 0 1 2 3 4
R> right
#[1] 1 2 3 4
The error message comes from trying to access the zeroth element of thematrix:
R> thematrix[0]
integer(0)
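Putting the pieces together, here is a sketch of a corrected loop. It assumes a single-column matrix with made-up values, uses seq_len() to sidestep the precedence issue, and tracks the largest difference and its position in separate variables (best_diff and best_pos are names chosen for this example, not from the question):

```r
thematrix <- matrix(c(3, 4, NaN, 10, 2, 8), ncol = 1)   # hypothetical data

best_diff <- -Inf          # largest absolute difference seen so far
best_pos  <- NA_integer_   # position of the first element of that pair

# seq_len(n - 1) yields 1, 2, ..., n - 1 and is empty when n is 1.
for (i in seq_len(nrow(thematrix) - 1)) {
  a <- thematrix[i]
  b <- thematrix[i + 1]
  if (!is.na(a) && !is.na(b)) {        # skip pairs that involve NaN/NA
    d <- abs(a - b)
    if (d > best_diff) {
      best_diff <- d
      best_pos  <- i
    }
  }
}

print(best_pos)
```

Keeping the running maximum separate from the index matters: comparing a difference against a stored index, as in the original loop, mixes two unrelated quantities.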
The other two answers address your question directly, but I must say this is about the worst possible way to solve this problem in R.
set.seed(1) # for reproducible example
x <- sample(1:10,10) # numbers 1:10 in random order
x
# [1] 3 4 5 7 2 8 9 6 10 1
which.max(abs(diff(x)))
# [1] 9
The diff(...) function calculates sequential differences, and which.max(...) identifies the element number of the maximum value in a vector.
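The same idea also copes with missing values, since which.max() discards NA and NaN entries. A short sketch with made-up data:

```r
x <- c(3, 4, NaN, 10, 2, 8)     # hypothetical data with one NaN

d   <- abs(diff(x))             # differences touching the NaN are NaN themselves
pos <- which.max(d)             # NaN entries are ignored when locating the maximum

print(d)
print(pos)
```

For a matrix, the input can be flattened first with as.vector(), which follows the same column-major order as single-index subsetting.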

R spline function given a fixed space

So, I need to generate a spline function to feed it into another program which only accepts a fixed spacing between consecutive points. I used the spline function in R with a given number of points to generate the spline; however, the floating-point cutoff makes the spacing between the points variable, for example:
spline(d$V1, d$V2, n=(max(d$V1)-min(d$V1))/0.0200)
> head(t.spl, 7)
x y
1 2.3000 -3.0204
2 2.3202 -3.0204
3 2.3404 -3.0204
4 2.3606 -3.0204
5 2.3807 -3.0204
6 2.4009 -3.0204
7 2.4211 -3.0204
So the spacing between the 1st and 2nd rows is 0.0202, while between the 4th and 5th it is 0.0201. Because of this, the other program that I am feeding this spline into doesn't accept it. Is there any way to make this work?
As an aside: please provide a reproducible example next time (I can't copy/paste your code in because I don't have d or t.spl)
I think you'll find that the difference between the intervals (0.0202 vs 0.0201) is an artifact of the number of digits being printed on the screen, not of the spline function.
It seems R is printing 4 digits after the decimal point for you for neatness, so it's doing the rounding only for the purposes of displaying the results to you.
You can see how many digits are displayed with options('digits')$digits, and adjust it with options(digits=new_number_of_digits) (see ?options for details).
For example:
options(digits=4)
pi
# 3.142
options(digits=10)
pi
# 3.141592654
In summary, when you feed the values into your other program, make sure you print the values with enough decimal places that the other program accepts the intervals as being "equal".
If you are writing to a file, for example, just make sure you write enough digits out. If you are copy-pasting from the R console, make sure you adjust R to print out enough digits.
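To make the display effect concrete, here is a sketch with an invented spacing h (not taken from the question's data). The points are evenly spaced by construction, yet their 4-decimal roundings are not:

```r
h <- 0.020185                  # hypothetical true spacing
x <- 2.3 + h * (0:6)           # evenly spaced points

true_gaps  <- diff(x)
shown      <- round(x, 4)                 # what a 4-decimal display would show
shown_gaps <- round(diff(shown), 4)       # gaps between the displayed values

uniform_true  <- isTRUE(all.equal(true_gaps, rep(h, 6)))
uniform_shown <- length(unique(shown_gaps)) == 1

# Writing more decimals keeps the spacing intact for the downstream program.
out <- formatC(x, format = "f", digits = 10)
print(shown_gaps)
print(out)
```

The displayed gaps mix two values even though the underlying gaps are all the same, which matches the pattern described in the question.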
MathematicalCoffee is probably right. I'm just adding an alternative for the sake of wordiness.
myspline <- splinefun(d$V1, d$V2)
# desired_x_values is your own evenly spaced grid, e.g. built with seq()
mydata.y <- myspline(desired_x_values, deriv = 0)
This guarantees the uniform x-spacings you desire.
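Another route, sketched below with made-up data in place of the question's d, is to build the grid yourself with seq() and hand it to spline() through its xout argument. Note that with n points spread over the range, the spacing is range / (n - 1), so n = range / 0.02 would not give a step of exactly 0.02 in any case:

```r
# Hypothetical stand-in for the question's data frame d
d <- data.frame(V1 = c(2.3, 2.9, 3.6, 4.4, 5.0),
                V2 = c(-3.02, -2.5, -1.1, -0.4, -0.1))

step <- 0.02
grid <- seq(min(d$V1), max(d$V1), by = step)    # fixed spacing by construction

# Evaluate the interpolating spline exactly at the grid points.
t.spl <- as.data.frame(spline(d$V1, d$V2, xout = grid))

print(head(t.spl, 7), digits = 10)
```

When writing t.spl out for the other program, the same advice about printing enough digits still applies.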

Adding numbers within a vector in r

I have a vector
v<-c(1,2,3)
I need to add the numbers in the vector in the following fashion
1,1+2,1+2+3
producing a second vector
v1<-c(1,3,6)
This is probably quite simple...but I am a bit stuck.
Use the cumulative sum function:
cumsum(v)
#[1] 1 3 6
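For what it's worth, the running total can also be written with Reduce(), which generalizes to operations other than addition. This is only an illustrative equivalent; cumsum() remains the simpler choice for sums:

```r
v <- c(1, 2, 3)

# accumulate = TRUE keeps every intermediate result of the fold.
v1 <- Reduce(`+`, v, accumulate = TRUE)

print(v1)
```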
