R: Strange trig function behavior

As a Matlab user transitioning to R, I have run across the problem of applying trigonometric functions to degrees. In Matlab there are trig functions for both radians and degrees (e.g. cos and cosd, respectively). R seems to include functions only for radians, thus requiring me to create my own (see below):
cosd <- function(degrees) {
  # convert degrees to radians, then take the cosine
  result <- cos(degrees * pi / 180)
  return(result)
}
Unfortunately this function does not work properly all of the time. Some results are shown below.
> cosd(90)
[1] 6.123234e-17
> cosd(180)
[1] -1
> cosd(270)
[1] -1.836970e-16
> cosd(360)
[1] 1
I'd like to understand what is causing this and how to fix this. Thanks!

This is floating point arithmetic:
> all.equal(cosd(90), 0)
[1] TRUE
> all.equal(cosd(270), 0)
[1] TRUE
If that is what you meant by "does not work properly"?
This is also a FAQ: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f

Looks like it's working fine to me. The value of pi (and hence of 90*pi/180) can't be stored exactly, so you are getting a very close approximation. If you think about it, 6.123234e-17 and -1.836970e-16 are both extremely close to 0, which is what the answer should be.
Your problem lies in the fact that while 90*pi/180 equals pi/2 on paper, computers use floating-point numbers. R and Matlab both use 64-bit (double-precision) floating point, and you can only fit so much information in that limited number of bits, so you can't store every real number exactly.
You could modify your function to return exactly 0 when given 90 or 270; one way to do that is sketched below.
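A minimal sketch (it assumes base R 3.1.0 or later, which provides cospi(); cospi(x) computes cos(pi*x) and is exact for half-integer x, so multiples of 90 degrees come out exactly):
cosd <- function(degrees) {
  # cos(pi * x) via cospi() is exact when x is a multiple of 0.5,
  # i.e. when the angle is a multiple of 90 degrees
  cospi(degrees / 180)
}
> cosd(90)
[1] 0
> cosd(270)
[1] 0
> cosd(45)
[1] 0.7071068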

This is a floating point representation error. See Chapter 1 of http://lib.stat.cmu.edu/s/Spoetry/Tutor/R_inferno.pdf

The same reason that
1-(1/3)-(1/3)-(1/3)
doesn't equal 0. It has something to do with floating point numbers. I'm sure there will be more elaboration.
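For the record, evaluating that expression at an R console gives a tiny nonzero number rather than 0:
> 1 - (1/3) - (1/3) - (1/3)
[1] 1.110223e-16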

You may also be interested in the zapsmall function for another way of showing numbers that are close to 0 as 0.
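As a small illustration with the cosd() values from the question (zapsmall() rounds values that are negligibly different from 0 to exactly 0 for display):
> zapsmall(c(cosd(90), cosd(180), cosd(270), cosd(360)))
[1]  0 -1  0  1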

Related

Managing floating point accuracy

I'm struggling with issues re. floating point accuracy, and could not find a solution.
Here is a short example:
aa<-c(99.93029, 0.0697122)
aa
[1] 99.9302900 0.0697122
aa[1]
99.93029
print(aa[1],digits=20)
99.930289999999999
It would appear that, upon storing the vector, R converted the numbers to something with a slightly different internal representation (yes, I have read circle 1 of the "R inferno" and similar material).
How can I force R to store the input values exactly "as is", with no modification?
In my case, my problem is that the values are processed in such a way that the small errors very quickly grow:
aa[2]/(100-aa[1])*100
[1] 100.0032 ## Should be 100, of course !
print(aa[2]/(100-aa[1])*100,digits=20)
[1] 100.00315593171625
So I need to find a way to get my normalization right.
Thanks
PS- There are many questions on this site and elsewhere, discussing the issue of apparent loss of precision, i.e. numbers displayed incorrectly (but stored right). Here, for instance:
How to stop read.table from rounding numbers with different degrees of precision in R?
This is a distinct issue, as the number is stored incorrectly (but displayed right).
(R version 3.2.1 (2015-06-18), win 7 x64)
Floating point precision has always generated lots of confusion. The crucial idea to remember is: when you work with doubles, there is no way to store each real number "as is", or "exactly right" -- the best you can store is the closest available approximation. So when you type (in R or any other modern language) something like x = 99.93029, you'll get this number represented by 99.930289999999999.
Now when you expect a + b to be "exactly 100", you're being imprecise about what "exactly" can mean here. The best you can get is "100 up to N digits after the decimal point" and hope that N is big enough. In your case it would be correct to say 99.9302900 + 0.0697122 is 100 with 5 decimal places of accuracy. Naturally, by multiplying that equality by 10^k you'll lose an additional k digits of accuracy.
So, there are two solutions here:
a. To get more precision in the output, provide more precision in the input.
bb <- c(99.93029, 0.06971)
print(bb[2]/(100-bb[1])*100, digits = 20)
[1] 99.999999999999119
b. If double precision is not enough (which can happen in complex algorithms), use packages that provide extra-precision numeric operations. For instance, package gmp.
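As a minimal sketch of that route using the related Rmpfr package (an assumption on my part, since the answer names gmp; mpfr() parses the decimal strings directly at the requested precision):
library(Rmpfr)
# parse the inputs as 120-bit numbers instead of 53-bit doubles
aa <- mpfr(c("99.93029", "0.0697122"), precBits = 120)
aa[2] / (100 - aa[1]) * 100
# [1] 100.00315593...  -- extra precision does not change the fact that
# the two inputs themselves do not sum to exactly 100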
I think you have misunderstood here. This is the same situation: R is storing the closest representable value, and what is displayed depends on the digits option chosen when printing it.
For example:
# the output below will be:
> print(99.930289999999999,digits=20)
[1] 99.930289999999999395
But
# the output of:
> print(1,digits=20)
[1] 1
Also
> print(1.1,digits=20)
[1] 1.1000000000000000888
In addition to the previous answers, I think a good read on the subject is
The R Inferno, by P. Burns
http://www.burns-stat.com/documents/books/the-r-inferno/

Is there a package or technique available for calculating large factorials in R?

If I calculate factorial(100), I get an answer of [1] 9.332622e+157, but when I try to calculate a larger factorial, say factorial(1000), I get [1] Inf.
Is there a way to use arbitrary precision when calculating factorials such that I can calculate say factorial(1000000)?
For arbitrary precision you can use either gmp or Rmpfr. Specifically for factorials, gmp offers factorialZ and Rmpfr has factorialMpfr, so you can run something like the following:
> Rmpfr::factorialMpfr(200)
1 'mpfr' number of precision 1246 bits
[1] 788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000
> gmp::factorialZ(200)
Big Integer ('bigz') :
[1] 788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000
HTH
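As a quick check against the failing case from the question (a sketch; the digits are counted via as.character just to keep the output short):
library(gmp)
n1000 <- factorialZ(1000)    # exact, where factorial(1000) overflows to Inf
nchar(as.character(n1000))   # number of decimal digits in 1000!
# [1] 2568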
I wrote a web scraper; #Khashaa's answer is probably faster, but I went through for proof of concept and to hone my nascent rvest skills:
library(rvest)
Factorial <- function(n) {
  # %21 is URL speak for !
  x <- strsplit(strsplit((html(paste0(
         "http://www.wolframalpha.com/input/?i=", n, "%21")) %>%
         # to understand this part, I recommend going to the site
         # and doing "Inspect Element" on the decimal representation
         html_nodes("area") %>% html_attr("href")),
         split = "[=&]")[[1]][2], split = "\\+")[[1]]
  # 8 here can be changed to the precision you'd like;
  # could also make it match printing options...
  cat(paste0(substr(x[1], 1, 8), "e+", gsub(".*E", "", x[3])))
}
> Factorial(10000)
2.846259e+35659
Another possible advantage is using Wolfram's computing power instead of your own. (I don't know how efficient the package options are; I imagine they just use asymptotic approximations, so this probably isn't a concern. Just thought I'd mention it.)

Factorial(x) for x>170 using Rmpfr/gmp library

The problem that I would like to solve is the infinite sum over the following function:
For the sum I use an FTOL termination criterion. This whole term doesn't create any problems until z becomes very large. I expect the maximum value of z to be around 220. As you can see, the first term has its maximum around Factorial(221), and the sum therefore has to run to around Factorial(500) before the termination criterion is reached. After spotting this problem I didn't want to change the whole code (as it is only one small part) and tried to use library('Rmpfr') and library('gmp'). The problem is that I do not get what I want. While multiplication normally works, subtraction fails for higher values:
This works
> factorialZ(22)-factorial(22)
Big Integer ('bigz') :
[1] 0
but this fails:
> factorialZ(50)-factorial(50)
Big Integer ('bigz') :
[1] 359073645150499628823711419759505502520867983196160
another way I tried:
> gamma(as(10,"mpfr"))-factorial(9)
1 'mpfr' number of precision 128 bits
[1] 0
> gamma(as(40,"mpfr"))-factorial(39)
1 'mpfr' number of precision 128 bits
[1] 1770811808798664813196481658880
There has to be something that I don't really understand. Does someone have a even better solution for the problem or can someone help me out with the issue above?
I think you have misunderstood the order of evaluation in factorialZ(x) - factorial(x). The second term, factorial(x), is calculated in ordinary double precision before it is converted to a bigz to be combined with the first term.
You must create any integer too large to be represented exactly in a double (anything beyond 2^53 or so) using a bigz-compatible function.
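A minimal illustration of that point (both operands stay exact bigz values, so the identity 50! = 50 * 49! holds exactly):
library(gmp)
factorialZ(50) - 50 * factorialZ(49)
# Big Integer ('bigz') :
# [1] 0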
50! is between 2^214 and 2^215 so the closest representable numbers are 2^(214-52) apart. factorial in R is based on a Lanczos approximation whereas factorialZ is calculating it exactly. The answers are within the machine precision:
> all.equal(as.numeric(factorialZ(50)), factorial(50))
[1] TRUE
The part that you're not understanding is floating point and its limitations. You're only getting ~15 digits of precision in floating point. factorialZ(50) has a LOT more precision than that, so you shouldn't expect them to be the same.

R -- Finding the non-complex solution from polyroot() with Im()

I am having trouble finding the non-complex solution from the complex array provided by polyroot().
coef1 = c(-10000,157.07963267949,0,0.523598775598299)
roots=polyroot(coef1)
returns
##[1] 23.01673- 0.00000i -11.50837+26.40696i -11.50837-26.40696i
and I would like the index that doesn't have an imaginary part. In this case:
roots[1]
## [1] 23.01673-0i
I am going to apply this process in a loop and would like to use Im() to isolate the non-complex solution. However, when I try:
Im(roots)
## [1] -2.316106e-23 2.640696e+01 -2.640696e+01
and therefore cannot use something like:
which(Im(roots)==0)
which returns
##integer(0)
I am confident there is a real root given the plot from:
plot(function(x) -10000 + 157.07963267949*x + 0.523598775598299*x^3,xlim=c(0,50))
abline(0,0,col='red')
Is there some funny rounding going on? I would prefer a solution that doesn't involve ceiling() or anything similar. Any of you R experts have any ideas? Cheers, guys and gals!
You can use a certain error or threshold:
Re(roots)[abs(Im(roots)) < 1e-6]
[1] 23.01673
Graphically :
curve(-10000 + 157.07963267949*x + 0.523598775598299*x^3,xlim=c(0,50))
abline(0,0,col='red')
real_root <- Re(roots)[abs(Im(roots)) < 1e-6]
text(real_root,1,label=round(real_root,2),adj=c(1,-1),col='blue')
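Since the question mentions applying this in a loop, here is a small wrapper along the same lines (a sketch; real_roots is just a hypothetical helper name, and the 1e-6 threshold is the one used above):
real_roots <- function(coefs, tol = 1e-6) {
  r <- polyroot(coefs)
  # keep only the roots whose imaginary part is negligibly small
  Re(r)[abs(Im(r)) < tol]
}
real_roots(c(-10000, 157.07963267949, 0, 0.523598775598299))
# [1] 23.01673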
Notice the absurdly small imaginary part of the first root. It's a rounding issue, which is unfortunately common when computers do floating-point operations. Though it's a little inelegant, try
which(round(Im(roots), 12) == 0)
which rounds the imaginary parts to 12 decimal places (more precision than you probably need) before testing for zero.

Adding floating point precision to qnorm/pnorm?

I would like to increase the floating-point precision available when calculating qnorm/pnorm beyond its current level, for example:
x <- pnorm(10) # 1
qnorm(x) # Inf
qnorm(.9999999999999999444) # The highest value I've found that still returns a finite (non-Inf) number
Is that possible to do (in a reasonable amount of time)? If so, how?
If the argument is way in the upper tail, you should be able to get better precision by calculating 1-p. Like this:
> x = pnorm(10, lower.tail=F)
> qnorm(x, lower.tail=F)
10
I would expect (though I don't know for sure) that the pnorm() function is referring to a C or Fortran routine that is stuck on whatever floating point size the hardware supports. Probably better to rearrange your problem so the precision isn't needed.
Then, if you're dealing with really really big z-values, you can use log.p=T:
> qnorm(pnorm(100, low=F, log=T), low=F, log=T)
100
Sorry this isn't exactly what you're looking for. But I think it will be more scalable -- pnorm hits 1 so rapidly at high z-values (the tail decays like e^(-x^2/2), after all) that even if you add more bits they will run out fast.
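If you really do need extra precision in the tail probability itself, one option is a sketch along these lines, assuming the Rmpfr package is available; it uses the identity P(Z > z) = erfc(z / sqrt(2)) / 2 with Rmpfr's erfc():
library(Rmpfr)
z <- mpfr(10, precBits = 128)
erfc(z / sqrt(mpfr(2, precBits = 128))) / 2   # upper-tail probability P(Z > 10)
# roughly 7.62e-24, in line with pnorm(10, lower.tail = FALSE)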

Resources