I am facing a big issue when computing cumsum() of a vector. The vector has ~10,000 elements and, from roughly element 2000 onwards, the values drop as low as 1e-310. To give a feeling for the distribution I am dealing with, here is a plot.
When I apply cumsum() I get lots of ones, which should be impossible, and a minimum value of around 10^-2. I am porting code that we developed in Matlab, where there are no such problems. For some reason, R seems to have trouble working with such small numbers, to the point that applying standard functions returns unexpected, and wrong, results.
I searched over stack overflow and found these two posts:
R: Number precision, how to prevent rounding?
Controlling number of decimal digits in print output in R
Unfortunately, none of them helped me out.
I also tried the Rcpp cumsum() function with no luck. I guess the problem comes directly from the precision of my matrix object.
I am not even sure how to reproduce this, so I am happy to share my 9137 x 2 matrix. I am completely stuck.
Looking forward to hearing from you guys!
Thanks
Update
Here is a sample of 100 elements from my matrix:
y <- sample( BestPair, 100 )
dput( y )
c(7.74958917761984e-289, 4.19283869686715e-319, 1.52834266884531e-319,
2.22089175309335e-297, 4.93980517192742e-298, 1.37861543592719e-301,
1.47044459800611e-317, 6.49068860911021e-319, 1.83302927898675e-305,
8.39514422452147e-312, 2.88487711616485e-300, 0.000544461085044608,
0.000435738736513519, 1.35649914843994e-309, 4.30826678309556e-310,
2.60728322623343e-319, 0.000544460617547516, 5.28815204888643e-299,
0.00102710912090133, 0.00198425117943324, 1.99711912768841e-304,
8.34594499227505e-306, 7.42055412763084e-300, 5.00039717762739e-311,
1.8750204972032e-305, 1.06513324565406e-310, 5.00487443690634e-313,
3.4890421843663e-319, 7.48945537292364e-310, 1.92948452007191e-310,
1.19840058299897e-305, 0.000532438536688165, 6.53966533658245e-318,
0.000499821676385928, 2.02305525482572e-305, 5.18981575493413e-306,
8.82648276295387e-320, 7.30476057376283e-320, 1.23073492422415e-291,
4.1801705284367e-311, 5.10863383734203e-318, 1.12106998189819e-298,
9.34823978505262e-297, 0.00093615863896107, 5.3667092510628e-311,
3.85094526994501e-319, 1.3693720559483e-313, 3.96230943126873e-311,
2.03293191294298e-319, 2.38607510351427e-291, 0.000544460855322345,
1.74738584846597e-310, 1.41874408662835e-318, 5.73056861298345e-319,
3.28565325597139e-302, 3.5412805275117e-310, 1.19647007227024e-302,
1.71539915106223e-305, 2.10738303243284e-311, 6.47783846432427e-313,
5.0072402480979e-303, 7.7250380240544e-303, 9.75451890703679e-309,
0.000533945755492525, 0.00211359631486468, 1.6612179399628e-312,
0.000521804571338402, 4.12194185271951e-308, 1.12829365794294e-313,
8.89772702908418e-319, 5.092756929242e-312, 7.45208240537024e-311,
6.60385177095196e-298, 0.000544461017733648, 1.62108867188263e-318,
3.95135528339003e-309, 1.8792966379072e-292, 5.98494480819088e-295,
0.00051614492665081, 2.25198141886419e-300, 7.97467977809552e-305,
1.78098757558338e-311, 1.66946525895122e-313, 0.000778442249425894,
6.58100207570114e-312, 0.00120733768329515, 3.44432924341767e-319,
6.38151190880713e-313, 7.1129669216109e-300, 4.11319531475755e-319,
7.21747577033383e-304, 1.48709312807403e-318, 1.39519898470211e-303,
4.58585270141592e-312, 2.16309869205282e-295, 7.55248601743105e-314,
3.16365476892733e-310, 1.96961507010996e-305, 3.21125377499206e-318,
3.66277772043574e-304)
Update 2
Apparently, imposing the following:
BestPair[ BestPair < .Machine$double.eps ] <- 0
does not solve the issue. I am still getting weird results from cumsum(). Here is a plot to better explain what I am dealing with. The cumulative probability has this shape because BestPair has been sorted in decreasing order; I want cumsum() to reach 1 at the top of my vector.
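For completeness, CumProb below is obtained roughly like this (a simplified sketch; the real code works on the full 9137 x 2 matrix, and BestPair here stands for the probability values, already sorted in decreasing order):

## simplified sketch of how CumProb is built
CumProb <- cumsum(BestPair)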
Here is a summary of the object:
> summary(CumProb)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0250 1.0000 1.0000 0.9685 1.0000 1.0000
Update 3. Results from Matlab
Here is the result as computed with Matlab. As you can see, I get a pretty decent distribution, which I cannot achieve in R even after truncating the original matrix.
I'm trying to add a column to my data set in R with the within() function.
Data set name: Full_Stats
Objective: add a Minutespergoal column using within()
Formula
Full_Stats2 <- within(Full_Stats,
                      { Minutespergoal <- Minutes_played / Goal })
The code works fine, but I'd like to avoid having NaN and Inf in the result. How could I fix this?
Please let me know if you have any questions.
Thanks
NaN comes from dividing zero by zero, and Inf from dividing a non-zero number by zero. You can avoid both by making sure that your denominator Goal is never zero. Assuming you want to drop the rows where Goal is zero, you could try:
Full_Stats2 <- within(Full_Stats,
                      { Minutespergoal <- Minutes_played / Goal })
Full_Stats2 <- Full_Stats2[Full_Stats2$Goal != 0, ]
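If you would rather keep every row and just avoid the undefined ratios, a sketch using the same column names (only the ifelse() guard is new) would be:

## keep all rows; set Minutespergoal to NA wherever Goal is zero,
## so the division never produces NaN or Inf
Full_Stats2 <- within(Full_Stats, {
  Minutespergoal <- ifelse(Goal == 0, NA_real_, Minutes_played / Goal)
})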
I am running into an issue when I exponentiate floating point data. It seems like it should be an easy fix. Here is my sample code:
temp <- c(-0.005220092)
temp^1.1
[1] NaN
-0.005220092^1.1
[1] -0.003086356
Is there some obvious error I am making with this? It seems like it might be an oversight on my part with regard to exponents.
Thanks,
Alex
The reason for the NaN is that the result of the exponentiation is complex, so you have to pass a complex argument:
as.complex(temp)^1.1
[1] -0.002935299-0.000953736i
# or
(temp + 0i)^1.1
[1] -0.002935299-0.000953736i
The reason your second expression works is that unary - has lower precedence than ^, so it is equivalent to -(0.005220092^1.1). See ?Syntax.
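To make the precedence point concrete, here are the same numbers with explicit parentheses (the NaN and -0.003086356 are the values already shown above):

(-0.005220092)^1.1   # a genuinely negative base: real ^ gives NaN
[1] NaN
-(0.005220092^1.1)   # what -0.005220092^1.1 actually parses to
[1] -0.003086356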
I do not know if other R users have found the following problem.
Within R I do the following operation:
> (3/-2)^(1/3)
[1] NaN
I obtain a NaN result.
In a similar way, if I set:
> w<-(3/-2)
> g<-1/3
> w^g
[1] NaN
However, if I do:
> 3/-2
[1] -1.5
> -1.5^(1/3)
[1] -1.144714
Is there anybody who can explain this contradiction?
Where do you see a problem? -1.5^(1/3) is not the same as (-1.5)^(1/3). With a basic maths education you shouldn't expect these to be the same.
Read help("Syntax") to learn that ^ has higher precedence than - in R.
This is due to the mathematical definition of exponentiation: the continuous real power operator is not defined for a negative base.
Compute (3/2)^(1/3) first and then add the minus sign afterwards; with the real ^ operator you cannot take the cube root of a negative number directly.
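In code, that workaround looks roughly like this (real_cbrt is just an illustrative name for the general sign/abs pattern):

## take the root of the magnitude, then reapply the sign by hand
-( (3/2)^(1/3) )
[1] -1.144714
real_cbrt <- function(x) sign(x) * abs(x)^(1/3)
real_cbrt(3/-2)
[1] -1.144714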
If you really want the answer you can do the computation over the complex numbers, i.e. get the cube root of -1.5+0i:
complex(real=-1.5,im=0)^(1/3)
## [1] 0.5723571+0.9913516i
This is actually only one of three complex roots of x^3+1.5==0:
polyroot(c(1.5,0,0,1))
[1] 0.5723571+0.9913516i -1.1447142+0.0000000i 0.5723571-0.9913516i
but the other answers probably come closer to addressing your real question.
I have a list of latitude and longitude values, and I'm trying to find the distance between them. Using a standard great circle method, I need to find:
acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1))
And multiply this by the radius of the Earth, in whatever units I am using. This is valid as long as the value passed to acos is in the range [-1, 1]; if it is even slightly outside that range, acos returns NaN, even when the difference is only due to rounding.
The issue I have is that sometimes, when two lat/long values are identical, this gives me NaN. It does not happen for every repeated pair of numbers, but within a given list it always happens at the same positions. For instance, I have a person stopped on a road in the desert:
Time   | lat      | long
1:00PM | 35.08646 | -117.5023
1:01PM | 35.08646 | -117.5023
1:02PM | 35.08646 | -117.5023
1:03PM | 35.08646 | -117.5023
1:04PM | 35.08646 | -117.5023
When I calculate the distance between consecutive points, the third value, for instance, is always NaN even though the others are not. This looks like a strange rounding issue in R.
Can't tell exactly without seeing your data (try dput()), but this is most likely a consequence of FAQ 7.31.
(x1 <- 1)
## [1] 1
(x2 <- 1+1e-16)
## [1] 1
(x3 <- 1+1e-8)
## [1] 1
acos(x1)
## [1] 0
acos(x2)
## [1] 0
acos(x3)
## [1] NaN
That is, even if your values are so similar that their printed representations are the same, they may still differ: some will be within .Machine$double.eps and others won't ...
One way to make sure the input values are bounded by [-1,1] is to use pmax and pmin: acos(pmin(pmax(x,-1.0),1.0))
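For instance, applied to the distance formula from the question, the clamp could look roughly like this (gc_dist and the Earth radius default are illustrative, and the coordinates are assumed to already be in radians):

## spherical law of cosines with the acos() argument clamped to [-1, 1]
gc_dist <- function(lat1, long1, lat2, long2, R_earth = 6371) {  # R_earth in km
  x <- sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(long2 - long1)
  R_earth * acos(pmin(pmax(x, -1.0), 1.0))
}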
A simple workaround is to use pmin(), like this:
acos(pmin(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1),1))
This ensures that the precision loss never pushes the value above exactly 1.
This doesn't explain what is happening, however.
(Edit: Matthew Lundberg pointed out I need to use pmin to get it to work with vectorized inputs. This fixes the problem of getting it to work, but I'm still not sure why it is rounding incorrectly.)
I just encountered this. It is caused by an input larger than 1: due to numerical error, my inner product between unit vectors becomes slightly larger than 1 (e.g. 1 + 0.00001), and acos() only accepts values in [-1, 1]. Clamping the input to at most exactly 1 solves the problem.
For NumPy: np.clip(your_input, -1, 1)
For PyTorch: torch.clamp(your_input, -1, 1)