I want to calculate a big number.
My problem is that there is a limit.
So for example, if you run factorial(170) it returns: [1] 7.257416e+306.
But as soon as you try to calculate factorial(171) (or a bigger number) it returns [1] Inf.
That is because when you run .Machine you will see that
$double.xmax
[1] 1.797693e+308
So my question is, how can one make it bigger? For instance, can we make it to 1.797693e+500?
You can't, in base R; R can only do computations with 32-bit integers and 64-bit floating point values. You can use the Rmpfr package:
library(Rmpfr)
factorialMpfr(200)
1 'mpfr' number of precision 1246 bits
## [1] 788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000
This value is "only" about 1e374, but we can easily go larger than that, e.g.
round(log10(factorialMpfr(400)))
869
However, there are some drawbacks to this: (1) computation is much slower; (2) it can be complicated to fit these results into an existing R workflow. People often avoid the problem by doing their computations on the log scale instead (you can compute the log-factorial directly with the lfactorial() function), for example:
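A minimal base-R sketch of the log-scale idea (no extra packages assumed):
# work with log(171!) instead of 171! itself
lfactorial(171)            # natural log of 171!, about 711.7, so no overflow
lfactorial(171) / log(10)  # log10 of 171!, about 309.1, i.e. 171! is roughly 1.2e309
exp(lfactorial(170))       # back on the ordinary scale while it still fits: about 7.257416e306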
I'm struggling with issues re. floating point accuracy, and could not find a solution.
Here is a short example:
aa <- c(99.93029, 0.0697122)
aa
[1] 99.9302900 0.0697122
aa[1]
[1] 99.93029
print(aa[1], digits = 20)
[1] 99.930289999999999
It would appear that, upon storing the vector, R converted the numbers to something with a slightly different internal representation (yes, I have read circle 1 of the "R inferno" and similar material).
How can I force R to store the input values exactly "as is", with no modification?
In my case, my problem is that the values are processed in such a way that the small errors very quickly grow:
aa[2]/(100-aa[1])*100
[1] 100.0032 ## Should be 100, of course!
print(aa[2]/(100-aa[1])*100,digits=20)
[1] 100.00315593171625
So I need to find a way to get my normalization right.
Thanks
PS: There are many questions on this site and elsewhere discussing the issue of apparent loss of precision, i.e. numbers displayed incorrectly (but stored correctly). Here, for instance:
How to stop read.table from rounding numbers with different degrees of precision in R?
This is a distinct issue, as here the number is stored incorrectly (but displayed right).
(R version 3.2.1 (2015-06-18), win 7 x64)
Floating point precision has always generated lots of confusion. The crucial idea to remember is: when you work with doubles, there is no way to store every real number "as is" or "exactly right"; the best you can do is store the closest available approximation. So when you type (in R or any other language using standard doubles) something like x = 99.93029, what actually gets stored is something like 99.930289999999999.
Now when you expect a + b to be "exactly 100", you are asking for something doubles cannot promise. The best you can get is "100 up to N digits after the decimal point", and you have to hope that N is big enough. In your case it would be correct to say that 99.9302900 + 0.0697122 is 100 to 5 decimal places of accuracy. Naturally, multiplying that approximate equality by 10^k costs you another k digits of accuracy.
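A quick way to see this in the console, reusing the question's vector (base R only; the tolerance value below is just an illustrative choice):
aa <- c(99.93029, 0.0697122)
print(aa[1] + aa[2], digits = 20)               # not exactly 100; the true decimal sum is 100.0000022
aa[1] + aa[2] == 100                            # FALSE
all.equal(aa[1] + aa[2], 100, tolerance = 1e-5) # TRUE: equal "up to about 5 decimal places"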
So, there are two solutions here:
a. To get a more accurate result, provide more consistent input: the pair below actually sums to 100 (in decimal), unlike the original pair.
bb <- c(99.93029, 0.06971)
print(bb[2]/(100-bb[1])*100, digits = 20)
[1] 99.999999999999119
b. If double precision is not enough (which can happen in complex algorithms), use packages that provide extra-precision arithmetic, for instance the gmp package. A sketch follows.
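As a minimal sketch (assuming gmp is installed), here is the question's normalization redone with gmp's exact rational type bigq; the exact answer is 697122/6971, about 100.003156, which shows the discrepancy comes from the inputs not summing to exactly 100 rather than from the arithmetic:
library(gmp)
a1 <- as.bigq(9993029, 100000)    # 99.93029 as an exact rational
a2 <- as.bigq(697122, 10000000)   # 0.0697122 as an exact rational
a2 / (100 - a1) * 100             # Big Rational ('bigq'): 697122/6971, about 100.003156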
I guess you have misunderstood here. It's the same situation: R is storing the value correctly (as the closest representable double), but the value is displayed according to the digits option chosen when printing it.
For example:
# the output below will be:
> print(99.930289999999999, digits = 20)
[1] 99.930289999999999395
But
# the output of:
> print(1,digits=20)
[1] 1
Also
> print(1.1,digits=20)
[1] 1.1000000000000000888
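If you want to look at the stored double without touching the digits option, base R's sprintf will print as many decimal places as you ask for:
sprintf("%.25f", 99.93029)   # the stored approximation, written out to 25 decimal places
sprintf("%.25f", 1.1)        # same idea for 1.1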
In addition to the previous answers, I think a good read on the subject is
The R Inferno, by P. Burns:
http://www.burns-stat.com/documents/books/the-r-inferno/
I'm trying to find the square root of a big integer in R. I'm using the gmp package, which provides bigz for big integers, but it seems to be missing a function for square roots. I'm open to using another package for big integers if needed.
library(gmp)
sqrt(as.bigz("113423713055421844361000443349850346743"))
Error: 'Math.bigz' is not implemented yet
Alternatively I'm looking for a way to implement sqrt using bigz.
This type of problem is exactly what the Rmpfr package was built for.
library(Rmpfr)
a <- mpfr("113423713055421844361000443349850346743", 128) ## specify the number of bits
sqrt(a)
1 'mpfr' number of precision 128 bits
[1] 10650056950806500000.00000005163589039117
It should be noted that in order to access the power of this package, you must first create your variable as an mpfr object. Once you have done this, you can perform any arithmetic you like at whatever precision (in bits) you choose, limited only by memory. For instance:
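A small sketch (assuming Rmpfr; the 200-bit precision is an arbitrary choice) showing ordinary arithmetic carried out on the mpfr object, e.g. checking the root by squaring it:
library(Rmpfr)
a <- mpfr("113423713055421844361000443349850346743", precBits = 200)  # parse the integer from a string
r <- sqrt(a)
r * r - a          # residual of the square root; tiny at 200 bits
roundMpfr(r, 64)   # the same root re-rounded to 64 bits of precision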
Sometimes you might want to use logs to do calculations with big numbers, i.e. x^y = exp(y*log(x)). Keep in mind that log() and exp() here work in ordinary double precision, so only the leading 15-16 digits of the result below are meaningful.
library(gmp)
x <- 113423713055421844361000443349850346743
as.bigz(exp(0.5*log(as.bigz(x))))
Big Integer ('bigz') :
[1] 10650056950806493184
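If you specifically want the floor of the square root as an exact bigz (as the question asks), one option, sketched here rather than taken from gmp itself, is Newton's method for the integer square root, using only bigz operations that gmp does provide (%/%, arithmetic, comparisons); isqrt_bigz is a made-up helper name:
library(gmp)
# integer square root: returns floor(sqrt(n)) as a bigz, via Newton's method
isqrt_bigz <- function(n) {
  n <- as.bigz(n)
  if (n < 2) return(n)           # 0 and 1 are their own square roots
  x <- n                         # crude but safe starting guess
  repeat {
    y <- (x + n %/% x) %/% 2     # Newton step, with integer (floor) division
    if (y >= x) return(x)        # converged: x is floor(sqrt(n))
    x <- y
  }
}
isqrt_bigz("113423713055421844361000443349850346743")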
If I calculate factorial(100) then I get an answer of [1] 9.332622e+157 but when I try to calculate a larger factorial, say factorial(1000) I get an answer of [1] Inf
Is there a way to use arbitrary precision when calculating factorials such that I can calculate say factorial(1000000)?
For arbitrary precision you can use either gmp or Rmpfr. Specifically for factorials, gmp offers factorialZ and Rmpfr has factorialMpfr, so you can run something like the examples below:
> Rmpfr::factorialMpfr(200)
1 'mpfr' number of precision 1246 bits
[1] 788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000
> gmp::factorialZ(200)
Big Integer ('bigz') :
[1] 788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000
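For a rough sense of scale, a quick back-of-the-envelope check (assuming gmp is installed):
library(gmp)
nchar(as.character(factorialZ(1000)))   # 2568: number of decimal digits in 1000!
# factorial(1e6) has over 5.5 million decimal digits as an exact integer;
# if only the order of magnitude matters, base R's lfactorial() is instant:
lfactorial(1e6) / log(10)               # log10 of 1000000!, about 5565709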
HTH
I wrote a web scraper; @Khashaa's answer is probably faster, but I went through with it for proof of concept and to hone my nascent rvest skills:
library(rvest)
Factorial <- function(n){
  # read_html() is rvest's current name for what used to be html()
  x <- strsplit(strsplit((read_html(paste0(
    # %21 is URL speak for !
    "http://www.wolframalpha.com/input/?i=", n, "%21")) %>%
      # to understand this part, I recommend going to the site
      # and doing "Inspect Element" on the decimal representation
      html_nodes("area") %>% html_attr("href")),
    split = "[=&]")[[1]][2], split = "\\+")[[1]]
  cat(paste0(substr(x[1], 1, 8), # 8 here can be changed to the precision you'd like;
             # could also make it match printing options...
             "e+", gsub(".*E", "", x[3])))
}
> Factorial(10000)
2.846259e+35659
Another possible advantage is using Wolfram's computing power instead of your own (I don't know how efficient the package options are; I imagine they just use asymptotic approximations, so this probably isn't a concern, but I thought I'd mention it).
The problem that I would like to solve is the infinite sum over the following function:
For the sum I use an FTOL termination criterion. This whole term doesn't create any problems until z becomes very large. I expect the maximum value of z to be around 220. As you can see, the first term has its maximum around factorial(221), and therefore the sum has to run to around factorial(500) before the termination criterion is reached. After spotting this problem I didn't want to change the whole code (as it is only one small part), so I tried to use library('Rmpfr') and library('gmp'). The problem is that I do not get what I want. While multiplication normally works, subtraction fails for larger values:
This works
> factorialZ(22)-factorial(22)
Big Integer ('bigz') :
[1] 0
but this fails:
> factorialZ(50)-factorial(50)
Big Integer ('bigz') :
[1] 359073645150499628823711419759505502520867983196160
another way I tried:
> gamma(as(10,"mpfr"))-factorial(9)
1 'mpfr' number of precision 128 bits
[1] 0
> gamma(as(40,"mpfr"))-factorial(39)
1 'mpfr' number of precision 128 bits
[1] 1770811808798664813196481658880
There has to be something that I don't really understand. Does someone have an even better solution for the problem, or can someone help me out with the issue above?
I think you are misunderstanding the order of evaluation in factorialZ(x) - factorial(x). The second term, factorial(x), is calculated as an ordinary double before it's converted to a bigz to be combined with the first term.
You must create any integer outside the range doubles can represent exactly (above 2^53) using a bigz-compatible function, rather than converting a rounded double after the fact.
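A two-line illustration of that boundary (assuming gmp is loaded):
library(gmp)
2^53 == 2^53 + 1                      # TRUE: as doubles, these two integers are indistinguishable
as.bigz(2)^53 == as.bigz(2)^53 + 1    # FALSE: bigz arithmetic stays exact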
50! is between 2^214 and 2^215, so the closest representable doubles in that range are 2^(214-52) apart. factorial() in R goes through a floating-point approximation of the gamma function, whereas factorialZ() calculates the product exactly. The answers agree to within roughly machine precision:
> all.equal(as.numeric(factorialZ(50)), factorial(50))
[1] TRUE
The part that you're not understanding is floating point and its limitations. You're only getting ~15 digits of precision in floating point; factorialZ(50) has a LOT more precision than that, so you shouldn't expect the two to be identical.
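You can check the spacing argument above directly in base R:
log2(factorial(50))     # about 214.2, so 50! lies between 2^214 and 2^215
.Machine$double.eps     # 2^-52, the relative spacing of doubles
2^(214 - 52)            # absolute gap between adjacent doubles near 50!, roughly 5.8e48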
I would be interested in increasing the floating point precision used when calculating qnorm/pnorm beyond its current level, for example:
x <- pnorm(10) # 1
qnorm(x) # Inf
qnorm(.9999999999999999444) # the highest value I've found that still returns a finite (non-Inf) number
Is that possible to do (in a reasonable amount of time)? If so, how?
If the argument is way out in the upper tail, you should be able to get better precision by working with the upper-tail probability 1-p directly. Like this:
> x = pnorm(10, lower.tail=F)
> qnorm(x, lower.tail=F)
10
I would expect (though I don't know for sure) that pnorm() calls a C or Fortran routine that is stuck with whatever floating point size the hardware supports. It's probably better to rearrange your problem so that the extra precision isn't needed.
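To see why the upper-tail form helps (base R only; the tail value is approximate):
pnorm(10)                       # exactly 1 in double precision: the true value 1 - 7.6e-24 rounds to 1
pnorm(10, lower.tail = FALSE)   # about 7.62e-24, easily representable on its own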
Then, if you're dealing with really, really big z-values, you can use log.p = TRUE as well:
> qnorm(pnorm(100, lower.tail = FALSE, log.p = TRUE), lower.tail = FALSE, log.p = TRUE)
[1] 100
Sorry this isn't exactly what you're looking for, but I think it will scale better: pnorm approaches 1 so rapidly at high z-values (the tail decays roughly like e^(-x^2/2)) that even if you add more bits, they will run out fast.