Is there a way to handle calculations involving exponentials of large values in R? - r

I have looked around online and on this site but did not find a solution. My problem is relatively simple, so if you could point me to a possible solution, it would be much appreciated.
test_vec <- c(2, 8, 709, 600)
mean(exp(test_vec))      # ~2.05e+307
test_vec_bis <- c(2, 8, 710, 600)
mean(exp(test_vec_bis))  # Inf
exp(709)                 # ~8.22e+307, still finite
exp(710)                 # Inf
# The largest finite double is about 1.797e+308, so exp() overflows to Inf just above exp(709)
How can I calculate the mean of my vector and deal with the Inf values, knowing that R could probably represent the mean itself but not every term in the numerator of the mean calculation?

There is an edge case where you can solve your problem simply by restating it mathematically, but that only helps if your vector is extremely long and/or your large exponents are only just past the numeric limit:
Since the mean sum(x)/n can be written as sum(x/n), and since exp(x)/exp(y) = exp(x-y), you can calculate sum(exp(x - log(n))), which buys you log(n) of extra headroom in the exponent.
mean(exp(test_vec))
[1] 2.054602e+307
sum(exp(test_vec - log(length(test_vec))))
[1] 2.054602e+307
sum(exp(test_vec_bis - log(length(test_vec_bis))))
[1] 5.584987e+307
While this works for your example, most likely it won't work for your real vector.
In that case, you will have to consult packages like Rmpfr, as suggested by @fra.
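If shifting by log(n) is not enough, the same idea generalises to the log-sum-exp trick (not part of the original answers, but a standard technique): shift by max(x) instead, and stay on the log scale whenever the mean itself would not fit in a double. A minimal sketch:
log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))   # equals log(mean(exp(x))), computed without overflow
}
exp(log_mean_exp(test_vec))      # ~2.05e+307, matches mean(exp(test_vec))
exp(log_mean_exp(test_vec_bis))  # ~5.58e+307, finite because the true mean still fits in a double
log_mean_exp(test_vec_bis)       # keep it on the log scale if the mean itself would overflow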

Here's one way, where you subset test_vec to keep only the elements whose exponential is finite (< Inf):
mean(exp(test_vec)[which(exp(test_vec) < Inf)])
[1] 1.257673e+260
t2 <- c(2,8,600)
mean(exp(t2))
[1] 1.257673e+260
This assumes you were looking to exclude values that result in Inf, of course.
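An equivalent filter (a hedged variation, not from the original answer) uses is.finite(), which also guards against NaN:
e <- exp(test_vec)
mean(e[is.finite(e)])
## [1] 1.257673e+260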

Related

Imprecise math in R when dealing with infinite fractions

The deviations from the mean should always sum to 0.
However, when the mean has a lot of digits, maybe infinitely many like this one, which is 20/7, R fails to compute the sum exactly.
x <- c(1,2,2,3,3,4,5)
sum(x - mean(x))
[1] -4.440892e-16
I am quite a newbie and have not found any information about this so far, maybe I was not searching for the right terms.
Is it possible to calculate with infinitely long numbers in R?
I am asking this out of theoretical interest.
The problem you have described is a general one shared by all programming languages: internally, floating-point numbers follow the IEEE 754 standard. You can read more about it here.
As far as I know there is no easy way around these small errors, except for using number representations with higher precision.
EDIT: R already uses the double-precision representation of floating-point numbers. To read more about it, have a look at the R FAQ and this SO question.
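Incidentally, the residual -4.440892e-16 in the question is roughly twice the machine epsilon of a double, which you can inspect directly:
.Machine$double.eps
## [1] 2.220446e-16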
If you deal with rational numbers only, such as your example, you can use the gmp package.
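For a rational example like 20/7, a minimal sketch with gmp (assuming the exact-rational class created by as.bigq) could look like this:
library(gmp)
x <- as.bigq(c(1, 2, 2, 3, 3, 4, 5))  # exact rational numbers
m <- sum(x) / length(x)               # exactly 20/7
sum(x - m)                            # exactly 0, no rounding error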
You can use the Rmpfr package to deal with numbers with an arbitrary precision (that you have to set).
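A hedged sketch with Rmpfr, here with 200 bits of precision; because 20/7 is not exactly representable in binary at any finite precision, the residual shrinks dramatically but does not vanish:
library(Rmpfr)
x <- mpfr(c(1, 2, 2, 3, 3, 4, 5), precBits = 200)
m <- sum(x) / length(x)   # 20/7 rounded to 200 bits
sum(x - m)                # tiny (on the order of 1e-60), but not exactly 0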
Another possibility is the lazyNumbers package, freshly released on CRAN:
library(lazyNumbers)
# create a vector of lazy numbers
x <- lazyvec(c(1, 2, 2, 3, 3, 4, 5))
# compute its mean
m <- sum(x) / length(x)
# sum expected to be 0
y <- sum(x - m)
# convert it to double
as.double(y)
## 0

Finding the Proportion of a specific difference between the average of two vectors

I have a question for an assignment I'm doing.
Q:
"Set the seed at 1, then using a for-loop take a random sample of 5 mice 1,000 times. Save these averages.
What proportion of these 1,000 averages are more than 1 gram away from the average of x?"
I understand that, basically, I need to write code that answers: what percentage of the "nulls" is more than 1 gram above or below the average of "x"? I'm not really certain how to write that, given that the course hasn't yet covered how to do it but is asking us to do so anyway. Any help would be appreciated.
library(downloader)  # provides download(); base R alternative: download.file()
url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/femaleControlsPopulation.csv"
filename <- basename(url)
download(url, destfile = filename)
x <- unlist(read.csv(filename))
set.seed(1)
n <- 1000
nulls <- vector("numeric", n)
for (i in 1:n) {
  control <- sample(x, 5)
  nulls[i] <- mean(control)
  ## I know my last line for this should be something like this
  ## mean(nulls "+ or - 1") > or < mean(x)
  ## not certain if they're asking for abs() to be involved.
  ## is the question asking only for those that are 1 gram MORE than the avg of x?
}
Thanks for any help.
Z
I do think that the absolute distance is what they're after here.
Vectors in R are nice in that you can just perform arithmetic operations between a vector and a scalar and it will apply it element-wise, so computing the absolute value of nulls - mean(x) is easy. The abs function also takes vectors as arguments.
Logical operators (such as < and >) can also be used in the same way, making it equally simple to compare the result with 1. This yields a vector of booleans (TRUE/FALSE) where TRUE means the value at that index was indeed more than 1 away. Since booleans are really just numbers (1 or 0), you can sum that vector to count the TRUE elements, and dividing that count by the number of samples (or, equivalently, taking the mean of the logical vector) gives the proportion you were asked for.
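For instance, with a small made-up vector (deliberately not the assignment data), the whole pattern fits in one line:
v <- c(2.1, 2.9, 4.2, 3.0, 1.4)   # toy averages
center <- 3                        # toy reference value
abs(v - center) > 1                # element-wise comparison
## [1] FALSE FALSE  TRUE FALSE  TRUE
mean(abs(v - center) > 1)          # proportion of TRUEs
## [1] 0.4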
I don't know what programming level you are on, but I hope this helps without giving the solution away completely (since you said it's for an assignment).

Arithmetic with extremely small values in R

I am facing a big issue in computing the cumsum() of a vector. The vector has a length of ~ 10,000 elements, and from roughly element 2000 onward the values drop down to around 1e-310. To give a feeling of the distribution I am dealing with, here is a plot.
When I try to apply cumsum() I get lots of ones, which is impossible, and a minimum value around 10^-2. I am porting code that we developed in Matlab, and of course there are no problems there. For some reason, R seems to have trouble working with such small numbers, to the extent that applying standard functions returns unexpected, and wrong, results.
I searched over stack overflow and found these two posts:
R: Number precision, how to prevent rounding?
Controlling number of decimal digits in print output in R
Unfortunately, none of them helped me out.
I also tried to use the Rcpp cumsum() function, with no luck. I guess the problem comes directly from the precision of my matrix object.
I am not even sure how to reproduce this so I am happy to share my 9137 x 2 matrix. I am completely stuck with this.
Looking forward to hearing from you guys!
Thanks
Update
Here is a sample of 100 elements from my matrix:
y <- sample( BestPair, 100 )
dput( y )
c(7.74958917761984e-289, 4.19283869686715e-319, 1.52834266884531e-319,
2.22089175309335e-297, 4.93980517192742e-298, 1.37861543592719e-301,
1.47044459800611e-317, 6.49068860911021e-319, 1.83302927898675e-305,
8.39514422452147e-312, 2.88487711616485e-300, 0.000544461085044608,
0.000435738736513519, 1.35649914843994e-309, 4.30826678309556e-310,
2.60728322623343e-319, 0.000544460617547516, 5.28815204888643e-299,
0.00102710912090133, 0.00198425117943324, 1.99711912768841e-304,
8.34594499227505e-306, 7.42055412763084e-300, 5.00039717762739e-311,
1.8750204972032e-305, 1.06513324565406e-310, 5.00487443690634e-313,
3.4890421843663e-319, 7.48945537292364e-310, 1.92948452007191e-310,
1.19840058299897e-305, 0.000532438536688165, 6.53966533658245e-318,
0.000499821676385928, 2.02305525482572e-305, 5.18981575493413e-306,
8.82648276295387e-320, 7.30476057376283e-320, 1.23073492422415e-291,
4.1801705284367e-311, 5.10863383734203e-318, 1.12106998189819e-298,
9.34823978505262e-297, 0.00093615863896107, 5.3667092510628e-311,
3.85094526994501e-319, 1.3693720559483e-313, 3.96230943126873e-311,
2.03293191294298e-319, 2.38607510351427e-291, 0.000544460855322345,
1.74738584846597e-310, 1.41874408662835e-318, 5.73056861298345e-319,
3.28565325597139e-302, 3.5412805275117e-310, 1.19647007227024e-302,
1.71539915106223e-305, 2.10738303243284e-311, 6.47783846432427e-313,
5.0072402480979e-303, 7.7250380240544e-303, 9.75451890703679e-309,
0.000533945755492525, 0.00211359631486468, 1.6612179399628e-312,
0.000521804571338402, 4.12194185271951e-308, 1.12829365794294e-313,
8.89772702908418e-319, 5.092756929242e-312, 7.45208240537024e-311,
6.60385177095196e-298, 0.000544461017733648, 1.62108867188263e-318,
3.95135528339003e-309, 1.8792966379072e-292, 5.98494480819088e-295,
0.00051614492665081, 2.25198141886419e-300, 7.97467977809552e-305,
1.78098757558338e-311, 1.66946525895122e-313, 0.000778442249425894,
6.58100207570114e-312, 0.00120733768329515, 3.44432924341767e-319,
6.38151190880713e-313, 7.1129669216109e-300, 4.11319531475755e-319,
7.21747577033383e-304, 1.48709312807403e-318, 1.39519898470211e-303,
4.58585270141592e-312, 2.16309869205282e-295, 7.55248601743105e-314,
3.16365476892733e-310, 1.96961507010996e-305, 3.21125377499206e-318,
3.66277772043574e-304)
Update 2
Apparently, imposing the following:
BestPair[ BestPair < .Machine$double.eps ] <- 0
does not solve the issue; I am still getting weird results from cumsum(). Here is a plot to better explain what I am dealing with. The cumulative probability has this shape because BestPair has been sorted in decreasing order; I want the 1 from cumsum() to be at the top of my vector.
Here is a summary of the object:
> summary(CumProb)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0250 1.0000 1.0000 0.9685 1.0000 1.0000
Update 3. Results from Matlab
Here is the result as computed with Matlab. As you can see, I can get a pretty decent distribution, which I cannot achieve in R even if I truncate the original matrix.

Round numbers in output to show small near-zero quantities as zero

I would like the output of my R console to look readable. To this end, I would like R to round all my numbers to the nearest N decimal places. I have some success but it doesn't work completely:
> options(scipen=100, digits=4)
> .000000001
[1] 0.000000001
> .1
[1] 0.1
> 1.23123123123
[1] 1.231
I would like the 0.000000001 to be displayed as simply 0. How does one do this? Let me be more specific: I would like a global fix for the entire R session. I realize I can start modifying things by rounding them but it's less helpful than simply setting things for the entire session.
Look at ?options, specifically the digits and scipen options.
try
sprintf("%.4f", 0.00000001)
[1] "0.0000"
Combine what Greg Snow and Ricardo Saporta already gave you to get the right answer: options('scipen' = 20) and options('digits' = 2), combined with round(x, 4).
round(x, 4) will turn small near-zero quantities into exact zeros.
Either you round off the results of your regression once and store it:
x <- round(x, 4)
... or else, yes, you have to do that every time you display the small quantity if you don't want to store its rounded value. In your case, since you said small near-zero quantities effectively represent zero, why not just round them?
If for some reason you need to keep both the precise and the rounded versions, then do:
x.rounded <- round(x, 4)
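For example, a quick check (a small illustration, not from the original answers) that the rounded value then displays as a plain zero:
options(scipen = 100, digits = 4)
x <- 0.000000001
x            # still printed in full because of the scipen setting
## [1] 0.000000001
round(x, 4)  # rounds to exactly zero, which displays as 0
## [1] 0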

acos(1) returns NaN for some values, not others

I have a list of latitude and longitude values, and I'm trying to find the distance between them. Using a standard great circle method, I need to find:
acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1))
And multiply this by the radius of earth, in the units I am using. This is valid as long as the values we take the acos of are in the range [-1,1]. If they are even slightly outside of this range, it will return NaN, even if the difference is due to rounding.
The issue I have is that sometimes, when two lat/long values are identical, this gives me an NaN error. It does not happen every time, even for the same pair of numbers, but it is always the same entries in a given list that fail. For instance, I have a person stopped on a road in the desert:
Time |lat |long
1:00PM|35.08646|-117.5023
1:01PM|35.08646|-117.5023
1:02PM|35.08646|-117.5023
1:03PM|35.08646|-117.5023
1:04PM|35.08646|-117.5023
When I calculate the distance between the consecutive points, the third value, for instance, will always be NaN, even though the others are not. This seems to be a weird bug with R rounding.
Can't tell exactly without seeing your data (try dput), but this is most likely a consequence of R FAQ 7.31.
(x1 <- 1)
## [1] 1
(x2 <- 1+1e-16)
## [1] 1
(x3 <- 1+1e-8)
## [1] 1
acos(x1)
## [1] 0
acos(x2)
## [1] 0
acos(x3)
## [1] NaN
That is, even if your values are so similar that their printed representations are the same, they may still differ: some will be within .Machine$double.eps of 1 and others won't ...
One way to make sure the input values are bounded by [-1,1] is to use pmax and pmin: acos(pmin(pmax(x,-1.0),1.0))
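A small sketch applying this to the identical-point case from the question (lat1, long1, etc. are hypothetical names for illustration; the exact size of the error will vary):
deg2rad <- pi / 180
lat1 <- lat2 <- 35.08646 * deg2rad
long1 <- long2 <- -117.5023 * deg2rad
cosd <- sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(long2 - long1)
cosd - 1                        # may be a tiny positive number such as ~2e-16
acos(cosd)                      # NaN whenever cosd has crept above 1
acos(pmin(pmax(cosd, -1), 1))   # 0, the correct central angle for identical points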
A simple workaround is to use pmin(), like this:
acos(pmin(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1),1))
It ensures that precision loss can never push the value passed to acos() above exactly 1.
This doesn't explain what is happening, however.
(Edit: Matthew Lundberg pointed out I need to use pmin to get it to work with vectorized inputs. This fixes the problem with getting it to work, but I'm still not sure why it rounds incorrectly.)
I just encountered this. It is caused by input larger than 1: due to floating-point error, my inner product between unit vectors becomes slightly larger than 1 (like 1 + 0.00001), and acos() can only handle values in [-1, 1]. So we can clamp the upper bound to exactly 1 to solve the problem.
For numpy: np.clip(your_input, -1, 1)
For PyTorch: torch.clamp(your_input, -1, 1)
