Ceiling function in R is inaccurate (sometimes)

I am trying to round a number x up to the nearest multiple of a number m, using the following function from a previous post on SO:
roundUP = function(x,m)
{
return(m * ceiling(x/m))
}
But, when I input x = 0.28 and m = 0.005, the function outputs 0.285 when the result should be 0.28.
When I tried ceiling(0.28/0.005) it outputs 57 when the result should be 56, since 56 is already a whole number. Can anyone explain why this is happening and whether it is an error in the ceiling function?

The issue has to do with floating point arithmetic. You can find some details about this here.
Walk through the code below and it should shed some light on what's going on.
0.28/0.005 # Console displays 56
0.28/0.005 == 56 # returns FALSE. Why?
print(0.28/0.005, digits = 18) # 56.0000000000000071
# Solution?
roundUP = function(x, m)
{
return(m * ceiling(round(x/m, 9)))
}
Also, note the Warning section within ?ceiling
The realities of computer arithmetic can cause unexpected results,
especially with floor and ceiling. For example, we ‘know’ that
floor(log(x, base = 8)) for x = 8 is 1, but 0 has been seen on an R
platform. It is normally necessary to use a tolerance.
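Going by the "use a tolerance" advice in that warning, one possible workaround is to snap x/m to the nearest integer whenever it is already within a small tolerance of one, and only then apply ceiling(). A minimal sketch (roundUP_tol is just an illustrative name, and the 1e-9 tolerance is an arbitrary choice, not something prescribed by the documentation):
roundUP_tol <- function(x, m, tol = 1e-9) {
  r <- x / m
  # if r is within tol of an integer, treat it as exactly that integer
  r <- ifelse(abs(r - round(r)) < tol, round(r), r)
  m * ceiling(r)
}
roundUP_tol(0.28, 0.005)   # 0.28
roundUP_tol(0.281, 0.005)  # 0.285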

In R,
floor(x) rounds to the nearest integer that is no larger than x.
ceiling(x) rounds to the nearest integer that is no smaller than x.
So whatever the value after the decimal point, R moves to the adjacent integer (below or above, respectively).
If you want 56 as the output here, use floor(0.28/0.005).
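A quick check on this example (the stored value of 0.28/0.005 is slightly above 56, as shown earlier, so ceiling() goes up while floor() stays put):
ceiling(0.28/0.005)  # 57
floor(0.28/0.005)    # 56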

Related

R Precision for Double - Why code returns negative why positive outcome expected?

I am testing 2 ways of calculating prod(b-a), where a and b are vectors of length n: prod(b-a) = (b_1-a_1)(b_2-a_2)(b_3-a_3)...(b_n-a_n), where b_i > a_i > 0 for all i = 1, 2, ..., n. For some special cases, another way (Method 2) of calculating prod(b-a) is more efficient. It uses the following formula, which expands the product into a signed sum over all subsets (this is what the code below implements): prod(b_i - a_i) = sum over subsets S of {1, ..., n} of (-1)^|S| * (prod of a_i for i in S) * (prod of b_i for i not in S).
Here is my question: when a_i is very close to b_i, the true outcome can be very, very close to 0, something like 10^(-16). Method 1 (subtract and multiply) always returns a positive output. Method 2, using the expansion formula, sometimes returns a negative output (about 7~8% of the time in my experiment). Mathematically, these 2 methods should return exactly the same output, but in floating-point arithmetic they apparently produce different outputs.
Here is my code to run the test. When I run the testing code 10000 times, about 7~8% of my runs for Method 2 return a negative output. According to the official documentation, an R double has a precision of "2.225074e-308", as indicated by the R parameter .Machine$double.xmin. Why is it producing negative values when the differences are between 10^(-16) and 10^(-18)? Any help that sheds light on this will be appreciated. I would also love some suggestions on how to practically increase the precision beyond the level indicated in the R documentation.
########## Testing code 1.
ftest1case <- function(a, b) {
  n <- length(a)
  if (length(b) != n) stop("--------- length a and b are not right.")
  if (any(b < a)) stop("---------- b has to be greater than a all the time.")
  out1 <- prod(b - a)   # Method 1: direct product of differences
  out2 <- 0
  N <- 2^n
  for (i in 1:N) {      # Method 2: signed sum over all 2^n subsets
    tidx <- rev(as.integer(intToBits(x = i - 1))[1:n])
    tsign <- ifelse((sum(tidx) %% 2) == 0, 1.0, -1.0)
    out2 <- out2 + tsign * prod(b[tidx == 0]) * prod(a[tidx == 1])
  }
  c(out1, out2)
}
########## Testing code 2.
ftestManyCases <- function(N, printFreq = 1000, smallNum = 10^(-20)) {
  tt <- matrix(0, nrow = N, ncol = 2)
  n <- 12
  for (i in 1:N) {
    a <- runif(n, 0, 1)
    b <- a + runif(n, 0, 1) * 0.1
    tt[i, ] <- ftest1case(a = a, b = b)
    if ((i %% printFreq) == 0) cat("----- i = ", i, "\n")
    if (tt[i, 2] < smallNum) cat("------ i = ", i, " ---- Negative summation found.\n")
  }
  # fraction of runs in which each method falls below smallNum
  tout <- apply(tt, 2, FUN = function(x) { round(sum(x < smallNum) / N, 6) })
  names(tout) <- c("PerLess0_Method1", "PerLess0_Method2")
  list(summary = tout, data = tt)
}
######## Step 1. Test for 1 case.
n<-12
a<-runif(n,0,1)
b<-a+runif(n,0,1)*0.1
ftest1case(a=a,b=b)
######## Step 2 Test Code 2 for multiple cases.
N<-300
tt<-ftestManyCases(N=N,printFreq = 100)
tt[[1]]
It's hard for me to imagine when an algorithm that consists of generating 2^n permutations and adding them up is going to be more efficient than a straightforward product of differences, but I'll take your word for it that there are some special cases where it is.
As suggested in comments, the root of your problem is the accumulation of floating-point errors when adding values of different magnitudes; see here for an R-specific question about floating point and here for the generic explanation.
First, a simplified example:
n <- 12
set.seed(1001)
a <- runif(n,0,1)
b <- a + 0.01
prod(a-b) ## 1e-24
out2 <- 0
N <- 2^n
out2v <- numeric(N)
for ( i in 1:N ) {
tidx <- rev(as.integer(intToBits(x=i-1))[1:n])
tsign <- ifelse( (sum(tidx)%%2)==0,1.0,-1.0)
j <- as.logical(tidx)
out2v[i] <- tsign*prod(b[!j])*prod(a[j])
}
sum(out2v) ## -2.011703e-21
Using extended precision (with 1000 bits of precision) to check that the simple/brute force calculation is more reliable:
library(Rmpfr)
a_m <- mpfr(a, 1000)
b_m <- mpfr(b, 1000)
prod(a_m-b_m)
## 1.00000000000000857647286522936696473705868726043995807429578968484409120647055193862325070279593735821154440625984047036486664599510856317884962563644275433171621778761377125514191564456600405460403870124263023336542598111475858881830547350667868450934867675523340703947491662460873009229537576817962228e-24
This proves the point in this case, but in general doing extended-precision arithmetic will probably kill any performance gains you would get.
Redoing the permutation-based calculation with mpfr values (using out2 <- mpfr(0, 1000), and going back to the out2 <- out2 + ... running summation rather than accumulating the values in a vector and calling sum()) gives an accurate answer (at least to the first 20 or so digits, I didn't check farther), but takes 6.5 seconds on my machine (instead of 0.03 seconds when using regular floating-point).
Why is this calculation problematic? First, note the difference between .Machine$double.xmin (approx 2e-308), which is the smallest floating-point value that the system can store, and .Machine$double.eps (approx 2e-16), which is the smallest value x such that 1 + x > 1, i.e. the smallest relative value that can be added without catastrophic cancellation (values a little bit bigger than this magnitude will experience severe, but not catastrophic, cancellation).
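A quick way to see these two limits (exact printed values may differ slightly by platform):
.Machine$double.xmin               # ~2.225074e-308, smallest positive normalized double
.Machine$double.eps                # ~2.220446e-16
1 + .Machine$double.eps > 1        # TRUE
1 + .Machine$double.eps / 2 > 1    # FALSE: the addition is lost entirely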
Now look at the distribution of the individual terms stored in out2v:
hist(out2v)
There are clusters of negative and positive numbers of similar magnitude. If our summation happens to add a bunch of values that almost cancel (so that the result is very close to 0), then add that to another value that is not nearly zero, we'll get bad cancellation.
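A toy illustration of the effect (not the actual out2v values, just the same mechanism in miniature):
(1 + 1e-16) - 1   # 0: the tiny term is swallowed before the big values cancel
(1 - 1) + 1e-16   # 1e-16: same numbers, different order, exact answer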
It's entirely possible that there's a way to rearrange this calculation so that bad cancellation doesn't happen, but I couldn't think of one easily.

Best way to correct the modulus error in R?

The core R engine has a serious flaw with the way it expresses output from the Modulus operation:
ceiling((1.99 %% 1) * 100)
Returns: 99 (correct)
ceiling((2.99 %% 1) * 100)
Returns: 100 (incorrect)
The behavior will manifest for 2.99 plus any integer N (e.g. 3.99, etc.). If this is tied to the floating-point representation, the system is not exposing the full details of the difference. This is especially disturbing because:
Both (1.99 %% 1) and (2.99 %% 1) appear to return 0.99.
Both ((1.99 %% 1) * 100) and ((2.99 %% 1) * 100) appear to return 99.
However, if you do any rounding or similar mathematical operations, the invisible residual value for 2.99 flips things in an unexpected way.
While solving this problem for my current application is trivial:
floor((2.99 - floor(2.99)) * 100)
Returns: 99 (correct)
sprintf("%.22f", floor((2.99 - floor(2.99)) * 100))
Returns: 99.0000000000000000000000 (correct)
... I wonder in how many other instances the modulus operator returns bad values without showing the underlying floating-point delta. Is there a way to expose the underlying residual value which modulus seems to attach? It's otherwise invisible.
EDIT: As per the generous example from andrew.punnett below, print(1.99, digits = 22) returns 1.99 (no float expansion), while print(1.99 %% 1, digits = 22) returns 0.98999999999999999. As per the astute eye of Aaron, this appears to be version and / or system dependent.
Thanks!
This isn't really a bug in R. It is really a property of floating-point arithmetic.
The problem arises because neither 1.99 nor 2.99 can be represented exactly as a floating-point number. The closest value to 2.99 that can be stored in a double-precision (64-bit) floating-point number is 2.99000000000000021316282072803 (try the conversion here).
Therefore the expression evaluates as:
ceiling((2.99 %% 1) * 100) = ceiling(99.000000000000021316282072803)
= 100
Contrastingly, the nearest representation of 1.99 is 1.989999999999999991118215803 which happens to give the answer you expect:
ceiling((1.99 %% 1) * 100) = ceiling(98.9999999999999991118215803)
= 99
Both results are correct with respect to IEEE 754 floating-point arithmetic, but as you have seen only one agrees with the result you would get by applying the rules of real-number arithmetic.
This problem is compounded by the fact that, by default, R rounds every floating-point number to just a few significant digits when you print() it. If you want to see more digits, you must supply a digits argument:
print(1.99, digits = 22)
However, even this doesn't give you the correct number of digits on all platforms, so a more reliable way to accurately view a floating-point number is:
cat(sprintf("%.22f\n", 1.99))
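To make the invisible residual visible you can print both remainders at full precision, and if you need ceiling() to behave as it would in real-number arithmetic, one pragmatic workaround is to round the residual away first. A small sketch (the printed digits are approximate and, as noted in the edit above, platform-dependent; the 9-digit rounding is an arbitrary tolerance, mirroring the roundUP fix earlier on this page):
cat(sprintf("%.20f\n", 1.99 %% 1))    # ~0.98999999999999999112
cat(sprintf("%.20f\n", 2.99 %% 1))    # ~0.99000000000000021316
ceiling(round((2.99 %% 1) * 100, 9))  # 99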

modelling an infinite series in R

I'm trying to write code in R to approximate the infinite Taylor series for the well function from the Theis hydrogeological equation, W(u) = -0.5772 - ln(u) + u - u^2/(2*2!) + u^3/(3*3!) - u^4/(4*4!) + ... (where 0.5772 approximates the Euler-Mascheroni constant).
I'm pretty new to functional programming, so this was a challenge! This is my attempt:
Wu <- function(u, repeats = 100) {
result <- numeric(repeats)
for (i in seq_along(result)){
result[i] <- -((-u)^i)/(i * factorial(i))
}
return(sum(result) - log(u)-0.5772)
}
I've compared the results with values from a data table available here: https://pubs.usgs.gov/wsp/wsp1536-E/pdf/wsp_1536-E_b.pdf - see below (excuse verbose code - should have made a csv, with hindsight):
Wu_QC <- data.frame(u = c(1.0*10^-15, 4.1*10^-14,9.9*10^-13, 7.0*10^-12, 3.7*10^-11,
2.3*10^-10, 6.8*10^-9, 5.7*10^-8, 8.4*10^-7, 6.3*10^-6,
3.1*10^-5, 7.4*10^-4, 5.1*10^-3, 2.9*10^-2,8.7*10^-1,
4.6,9.90),
Wu_table = c(33.9616, 30.2480, 27.0639, 25.1079, 23.4429,
21.6157, 18.2291, 16.1030, 13.4126, 11.3978,
9.8043,6.6324, 4.7064,2.9920,0.2742,
0.001841,0.000004637))
Wu_QC$rep_100 <- Wu(Wu_QC$u,100)
The good news is the formula gives identical results for repeats = 50, 100, 150 and 170 (so I've just given you the 100 version above). The bad news is that, while the function performs well for u < ~10^-3, it goes off the rails and gives negative outputs for numbers within an order of magnitude or so of 1. This doesn't happen when I just call the function on an individual number, e.g.:
> Wu(4.6)
[1] 0.001856671
Which is the correct answer to 2sf.
Can anyone spot what I've done wrong and/or suggest a better way to code this equation? I think the problem is something to do with my for loop and/or an issue with the factorials generating infinite numbers as u gets larger, but I'm not at all certain.
Thanks!
As it says on page 93 of your reference, W is also known as the exponential integral. See also here.
Then, e.g., the package expint provides a function to compute W(u):
library(expint)
expint(10^(-8))
# [1] 17.84347
expint(4.6)
# [1] 0.001841006
where the results are exactly as in your referred table.
You can write a function that takes a value u together with the number of terms l and outputs the required value:
w <- function(u, l) {
  a <- 2:l
  -0.5772 - log(u) + u + sum(u^a * rep(c(-1, 1), length = l - 1) / a / factorial(a))
}
transform(Wu_QC,new=Vectorize(w)(u,170))
u Wu_table new
1 1.0e-15 3.39616e+01 3.396158e+01
2 4.1e-14 3.02480e+01 3.024800e+01
3 9.9e-13 2.70639e+01 2.706387e+01
4 7.0e-12 2.51079e+01 2.510791e+01
5 3.7e-11 2.34429e+01 2.344290e+01
6 2.3e-10 2.16157e+01 2.161574e+01
7 6.8e-09 1.82291e+01 1.822914e+01
8 5.7e-08 1.61030e+01 1.610301e+01
9 8.4e-07 1.34126e+01 1.341266e+01
10 6.3e-06 1.13978e+01 1.139777e+01
11 3.1e-05 9.80430e+00 9.804354e+00
12 7.4e-04 6.63240e+00 6.632400e+00
13 5.1e-03 4.70640e+00 4.706408e+00
14 2.9e-02 2.99200e+00 2.992051e+00
15 8.7e-01 2.74200e-01 2.741930e-01
16 4.6e+00 1.84100e-03 1.856671e-03
17 9.9e+00 4.63700e-06 2.030179e-05
As the numbers become large the approximation is not very good; we would have to go well beyond 170 terms, but factorial() overflows to Inf shortly after that in double precision, so R cannot do it this way. Maybe you can try another platform, e.g. Python.
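A quick check of where the factorial overflow kicks in:
factorial(170)  # ~7.257416e+306, still finite
factorial(171)  # Inf: terms beyond this can't be formed as u^a / factorial(a)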
I think I may have solved this myself (though borrowing heavily from Onyambo's answer!) Here's my code:
well_func2 <- function(u, l = 100) {
  result <- numeric(length(u))
  a <- 2:l
  for (i in seq_along(u)) {
    result[i] <- -0.5772 - log(u[i]) + u[i] + sum(u[i]^a * rep(c(-1, 1), length = l - 1) / a / factorial(a))
  }
  return(result)
}
As far as I can tell so far, this matches the tabulated results well for u < 5 (as did Onyambo's code), and it also gives the same result for vector and single-value inputs.
Still needs a bit more testing, and there's probably a tidier way to code it using map() or similar instead of the for loop, but I'm happy enough for now. Thought I'd share in case anyone else has the same problem.
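For what it's worth, a loop-free variant might look like the sketch below (using base vapply rather than purrr::map; well_func3 is just an illustrative name, and it is untested beyond the single value shown):
well_func3 <- function(u, l = 100) {
  a <- 2:l
  signs <- rep(c(-1, 1), length = l - 1)
  vapply(u, function(ui) {
    -0.5772 - log(ui) + ui + sum(ui^a * signs / a / factorial(a))
  }, numeric(1))
}
well_func3(4.6)  # ~0.001856671, matching Wu(4.6) above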

Inaccurate results calculating Fibonacci numbers in R

I'm calculating Fibonacci numbers in R with this code:
fib <- function(n) {
a = 0
b = 1
for (i in 1:n) {
tmp = b
b = a
a = a + tmp
}
return (a)
}
sprintf("%.0f",fib(79))
From fib(79) onwards I'm getting inaccurate results. For instance, fib(79) = "14472334024676220", when the right result, according to this website, should be fib(79) = 14472334024676221.
Using the function fibonacci from the package numbers I get the same inaccurate results. I assume this is because of the number precision in R.
How can I bypass this limitation and get accurate Fibonacci numbers in R?
Thank you for upvoting the question; my reputation is above 10 so I can post the answer right now :). I've got a pretty simple solution using the package gmp (a library also mentioned in the link provided by Ben Bolker) to sum large integers.
require(gmp)
fib <- function(n) {
a = 0
b = 1
for (i in 1:n) {
tmp = b
b = a
a = add.bigz(a, tmp) # gmp function
}
return (a)
}
fib(79)
The result is the right Fibonacci number, fib(79): 14472334024676221.
I tested it for even larger integers fib(5000), 1045 digits, and the result seems accurate to the last digit.
You have reached the limits of double-precision floating-point arithmetic, which has about 16 decimal digits of accuracy; you need 17 here. AFAIK base R does not have a built-in numeric type with greater precision.
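You can see the limit directly: doubles represent every integer exactly only up to 2^53, and fib(79) is larger than that:
2^53                       # 9007199254740992
2^53 + 1 == 2^53           # TRUE: the +1 is silently lost
14472334024676221 > 2^53   # TRUE, so fib(79) cannot be stored exactly as a double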
To bypass this you might want to split your operands and sum the pieces separately.
A classic way around this is to convert your operands to character strings and add them digit by digit from the rightmost end, carrying as needed.
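A rough sketch of that string-based addition (illustrative only; the gmp approach above is the practical choice, and add_strings is just an assumed name):
add_strings <- function(x, y) {
  # pad with leading zeros to equal length
  len <- max(nchar(x), nchar(y))
  pad <- function(s) paste0(strrep("0", len - nchar(s)), s)
  dx <- as.integer(strsplit(pad(x), "")[[1]])
  dy <- as.integer(strsplit(pad(y), "")[[1]])
  carry <- 0
  out <- integer(len)
  # add digit by digit from the right, carrying as needed
  for (i in len:1) {
    s <- dx[i] + dy[i] + carry
    out[i] <- s %% 10
    carry <- s %/% 10
  }
  paste0(if (carry > 0) carry else "", paste(out, collapse = ""))
}
add_strings("14472334024676221", "8944394323791464")  # "23416728348467685" = fib(80)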

Fast exponentiation when only first k digits are required?

This is actually for a programming contest, but I've tried really hard and haven't got even the faintest clue how to do this.
Find the first and last k digits of n^m where n and m can be very large, ~ 10^9.
For the last k digits I implemented modular exponentiation.
For the first k I thought of using the binomial theorem up to certain powers, but that involves quite a lot of computation for factorials and I'm not sure how to find an optimal point at which n^m can be expanded as (x+y)^m.
So is there any known method to find the first k digits without performing the entire calculation?
Update: 1 <= k <= 9, and k will always be <= the number of digits in n^m.
not sure, but the identity n^m = exp10(m log10(n)) = exp(q (m log(n)/q)) where q = log(10) comes to mind, along with the fact that the first K digits of exp10(x) = the first K digits of exp10(frac(x)), where frac(x) = the fractional part of x = x - floor(x).
To be more explicit: the first K digits of n^m are the first K digits of its mantissa = exp(frac(m log(n)/q) * q), where q = log(10).
Or you could even go further in this accounting exercise, and use exp((frac(m log(n)/q)-0.5) * q) * sqrt(10), which also has the same mantissa (+ hence same first K digits) so that the argument of the exp() function is centered around 0 (and between +/- 0.5 log 10 = 1.151) for speedy convergence.
(Some examples: suppose you wanted the first 5 digits of 2^100. This equals the first 5 digits of exp((frac(100 log(2)/q)-0.5)*q)*sqrt(10) = 1.267650600228226. The actual value of 2^100 is 1.267650600228229e+030 according to MATLAB, I don't have a bignum library handy. For the mantissa of 2^1,000,000,000 I get 4.612976044195602 but I don't really have a way of checking.... There's a page on Mersenne primes where someone's already done the hard work; 2^20996011 - 1 = 125,976,895,450... and my formula gives 1.259768950493908 calculated in MATLAB which fails after the 9th digit.)
I might use Taylor series (for exp and log, not for n^m) along with their error bounds, and keep adding terms until the error bounds drop below the first K digits. (Normally I don't use Taylor series for function approximation -- their error is optimized to be most accurate around a single point, rather than over a desired interval -- but they do have the advantage that they're mathematically simple, and you can increase accuracy to arbitrary precision simply by adding additional terms.)
For logarithms I'd use whatever your favorite approximation is.
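A minimal sketch of that log-based idea in plain double precision (this only buys roughly 15-16 significant digits in m*log10(n), so it works for small k and moderate exponents but, as noted above, will eventually fail for huge m without an extended-precision logarithm; first_k_digits is just an illustrative name):
first_k_digits <- function(n, m, k) {
  x <- m * log10(n)           # n^m = 10^x
  frac <- x - floor(x)        # the fractional part determines the mantissa
  floor(10^(frac + k - 1))    # shift so the first k digits sit left of the decimal point
}
first_k_digits(2, 100, 5)  # 12676, since 2^100 = 1.2676506...e+30
first_k_digits(2, 16, 3)   # 655, since 2^16 = 65536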
Well. We want to calculate a^b and keep only its first n digits.
Calculate a^b by repeated multiplication: a, a^2, a^3, ..., a^b.
Compute each intermediate product only approximately.
The point is that the relative error of a^b is at most about b times the relative error introduced at a single step.
You want the final relative error to be less than 10^(-n).
Thus the relative error allowed at each step is 10^(-n)/b, so you only need to keep enough leading digits to stay within that bound; remove the trailing digits at each step.
For example, a = 2, b = 16, n = 1. The final relative error allowed is 10^(-n) = 0.1.
The relative error allowed at each step is 0.1/16 > 0.001.
Thus keeping 3 digits at each step is enough.
If n = 2, you must keep 4 digits.
2 (1), 4 (2), 8 (3), 16 (4), 32 (5), 64 (6), 128 (7), 256 (8), 512 (9), 1024 (10) --> 102,
204 (11), 408 (12), 816 (13), 1632 (14) -> 163, 326 (15), 652 (16).
Answer: 6.
This algorithm has a complexity of O(b), but it is easy to change it to get O(log b) (square at each step instead of multiplying by a).
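A small R sketch of that per-step truncation, reproducing the a = 2, b = 16, n = 1 walk-through above (keeping 3 significant digits per step and counting the dropped digits; the function name is just illustrative):
first_digits_by_truncation <- function(a, b, keep = 3) {
  x <- a
  dropped <- 0
  for (i in 2:b) {
    x <- x * a
    while (x >= 10^keep) {   # keep only `keep` leading digits
      x <- x %/% 10
      dropped <- dropped + 1
    }
  }
  list(leading = x, approx = x * 10^dropped)
}
first_digits_by_truncation(2, 16)
# $leading is 652 and $approx is 65200, so the first digit of 2^16 (= 65536) is 6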
Suppose you truncate at each step? Not sure how accurate this would be, but, e.g., take
n=11
m=some large number
and you want the first 2 digits.
recursively:
11 x 11 -> 121, truncate -> 12 (1 truncation or rounding)
then take truncated value and raise again
12 x 11 -> 132 truncate -> 13
repeat,
13 (i.e. 132 truncated) x 11 -> 143.
...
and finally append a number of zeros equal to the number of truncations you've done.
Have you taken a look at exponentiation by squaring? You might be able to modify one of the methods such that you only compute what's necessary.
In my last algorithms class we had to implement something similar to what you're doing and I vaguely remember that page being useful.
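For the "last k digits" half of the question, here is a square-and-multiply sketch reduced mod 10^k; gmp big integers (used elsewhere on this page) keep the intermediate products exact, since plain doubles lose precision once products exceed 2^53:
library(gmp)
last_k_digits <- function(n, m, k) {
  mod <- as.bigz(10)^k
  base <- as.bigz(n) %% mod
  result <- as.bigz(1)
  while (m > 0) {
    if (m %% 2 == 1) result <- (result * base) %% mod  # multiply in the current bit of m
    base <- (base * base) %% mod                       # square
    m <- m %/% 2
  }
  result
}
last_k_digits(2, 16, 3)  # 536, the last 3 digits of 2^16 = 65536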
