Relational operator is not producing the expected answer in R [duplicate]

I'm trying to compare two numbers in R as a part of a if-statement condition:
(a-b) >= 0.5
In this particular instance, a = 0.58 and b = 0.08... and yet (a-b) >= 0.5 is false. I'm aware of the dangers of using == for exact number comparisons, and this seems related:
(a - b) == 0.5 is false, while
all.equal((a - b), 0.5) is true.
The only solution I can think of is to have two conditions: (a-b) > 0.5 | all.equal((a-b), 0.5). This works, but is that really the only solution? Should I just swear off of the = family of comparison operators forever?
Edit for clarity: I know that this is a floating point problem. More fundamentally, what I'm asking is: what should I do about it? What's a sensible way to deal with greater-than-or-equal-to comparisons in R, since the >= can't really be trusted?

I've never been a fan of all.equal for such things. It seems to me the tolerance works in mysterious ways sometimes. Why not just check whether the difference exceeds your cutoff minus a small tolerance?
tol = 1e-5
(a-b) >= (0.5-tol)
In general, without rounding and with just conventional logic, I find a plain tolerance check clearer than all.equal.
If x == y then x - y == 0. In floating point, x - y may not be exactly 0, so for such cases I use
abs(x-y) <= tol
You have to choose a tolerance for all.equal anyway, and this is more compact and straightforward.
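For instance, applied to the numbers from the question (the value of tol here is an arbitrary illustration, not a recommendation):
a <- 0.58; b <- 0.08
tol <- 1e-8
abs((a - b) - 0.5) <= tol   # TRUE: a - b equals 0.5 up to the tolerance
(a - b) >= 0.5 - tol        # TRUE: a tolerant greater-than-or-equal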

You could create this as a separate operator or overwrite the original >= function (probably not a good idea) if you want to use this approach frequently:
# using a tolerance
epsilon <- 1e-10 # set this as a global setting
`%>=%` <- function(x, y) (x + epsilon > y)
# as a new operator with the original approach
`%>=%` <- function(x, y) (isTRUE(all.equal(x, y)) | (x > y))
# overwriting R's version (not advised)
`>=` <- function(x, y) (isTRUE(all.equal(x, y)) | (x > y))
> (a-b) >= 0.5
[1] TRUE
> c(1,3,5) >= 2:4
[1] FALSE FALSE TRUE

For completeness' sake, I'll point out that, in certain situations, you could simply round to a few decimal places (and this is kind of a lame solution by comparison to the better solution previously posted.)
round(0.58 - 0.08, 2) == 0.5

One more comment: all.equal is a generic. For numeric values it dispatches to all.equal.numeric, and an inspection of that function shows that its default tolerance is .Machine$double.eps^0.5, where .Machine$double.eps is defined as
double.eps: the smallest positive floating-point number ‘x’ such that
‘1 + x != 1’. It equals ‘double.base ^ ulp.digits’ if either
‘double.base’ is 2 or ‘double.rounding’ is 0; otherwise, it
is ‘(double.base ^ double.ulp.digits) / 2’. Normally
‘2.220446e-16’.
(.Machine manual page).
In other words, that would be an acceptable choice for your tolerance:
myeq <- function(a, b, tol = .Machine$double.eps^0.5)
  abs(a - b) <= tol
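For example, with the question's numbers (the ge() helper below is just an illustrative sketch, not part of any package):
myeq(0.58 - 0.08, 0.5)   # TRUE
ge <- function(a, b, tol = .Machine$double.eps^0.5) a > b | abs(a - b) <= tol
ge(0.58 - 0.08, 0.5)     # TRUE
ge(0.4, 0.5)             # FALSE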

Choose some tolerance level:
epsilon <- 1e-10
Then use
(a-b+epsilon) >= 0.5
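With the values from the question this gives, for example:
a <- 0.58; b <- 0.08
epsilon <- 1e-10
(a - b + epsilon) >= 0.5   # TRUE
(a - b) >= 0.5             # FALSE without the shift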

But if you're using tolerances anyway, why do you care whether the case where a - b is exactly .5 is caught? By using tolerances you are already saying you don't care about the endpoints exactly.
Here is what is true:
if( (a-b) >= .5)
if( (a-b) < .5)
Exactly one of these evaluates to TRUE for every pair of doubles. Any code that uses one implicitly defines behaviour for the other, even if that behaviour is a no-op. If you use a tolerance to pull values exactly equal to .5 into the first branch, but your problem is defined on a continuous domain, you aren't accomplishing much: values clearly above .5 already evaluate as they should, and values arbitrarily close to .5 will still land in the "wrong" branch occasionally, which doesn't matter when you are working at appropriate precision on a continuous problem.
The only time that tolerances really make sense is when you are dealing with problems of the type
if( (a-b) == c)
if( (a-b) != c)
Here no amount of "appropriate precision" can help you: the second will essentially always evaluate to TRUE unless you set the bits of a - b by hand at a very low level, when in fact you probably want the first to be true sometimes.

Numerical difficulty with <= and >= comparisons of floating-point numbers is not specific to any one language.
IsSmallerOrEqual <- function(a, b) { # To check a <= b
  # Check whether "Mean relative difference..." appears in all.equal's result;
  # if it does, all.equal returns a character, not a logical
  if (is.logical(all.equal(a, b)) && (a < b | all.equal(a, b))) {
    return(TRUE)
  } else if (a < b) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}
IsSmallerOrEqual(abs(-2-(-2.2)), 0.2) # TRUE; To check |-2-(-2.2)| <= 0.2
IsSmallerOrEqual(abs(-2-(-2.2)), 0.3) # TRUE
IsSmallerOrEqual(abs(-2-(-2.2)), 0.1) # FALSE
IsBiggerOrEqual <- function(a, b) { # To check a >= b
  # Check whether "Mean relative difference..." appears in all.equal's result;
  # if it does, all.equal returns a character, not a logical
  if (is.logical(all.equal(a, b)) && (a > b | all.equal(a, b))) {
    return(TRUE)
  } else if (a > b) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}
IsBiggerOrEqual(3,3) # TRUE
IsBiggerOrEqual(4,3) # TRUE
IsBiggerOrEqual(3,4) # FALSE
IsBiggerOrEqual(0.58 - 0.08,0.5) # TRUE
If the character return value of all.equal is not handled, we may encounter errors.
The following is not necessary but useful:
abs(-2-(-2.2)) # 0.2
sprintf("%.54f",abs(-2-(-2.2))) # "0.200000000000000177635683940025046467781066894531250000"
sprintf("%.54f",0.2) # "0.200000000000000011102230246251565404236316680908203125"
all.equal(abs(-2-(-2.2)), 0.2) # TRUE; check nearly equivalence of floating point numbers
identical(abs(-2-(-2.2)), 0.2) # FALSE; check exact equivalence of floating point numbers

Related

What is the correct/standard way to check if difference is smaller than machine precision?

I often end up in situations where it is necessary to check if the obtained difference is above machine precision. Seems like for this purpose R has a handy variable: .Machine$double.eps. However when I turn to R source code for guidelines about using this value I see multiple different patterns.
Examples
Here are a few examples from stats library:
t.test.R
if(stderr < 10 *.Machine$double.eps * abs(mx))
chisq.test.R
if(abs(sum(p)-1) > sqrt(.Machine$double.eps))
integrate.R
rel.tol < max(50*.Machine$double.eps, 0.5e-28)
lm.influence.R
e[abs(e) < 100 * .Machine$double.eps * median(abs(e))] <- 0
princomp.R
if (any(ev[neg] < - 9 * .Machine$double.eps * ev[1L]))
etc.
Questions
How can one understand the reasoning behind all those different 10 *, 100 *, 50 * and sqrt() modifiers?
Are there guidelines about using .Machine$double.eps for adjusting differences due to precision issues?
The machine precision for a double depends on its value. .Machine$double.eps gives the precision when the value is 1. You can use the C function nextafter to get the machine precision for other values.
library(Rcpp)
cppFunction("double getPrec(double x) {
return nextafter(x, std::numeric_limits<double>::infinity()) - x;}")
(pr <- getPrec(1))
#[1] 2.220446e-16
1 + pr == 1
#[1] FALSE
1 + pr/2 == 1
#[1] TRUE
1 + (pr/2 + getPrec(pr/2)) == 1
#[1] FALSE
1 + pr/2 + pr/2 == 1
#[1] TRUE
pr/2 + pr/2 + 1 == 1
#[1] FALSE
Adding a value a to a value b will not change b when a is <= half of b's machine precision. Checking whether a difference is smaller than machine precision is done with <. The various modifiers (10 *, 100 *, ...) might allow for the typical number of operations over which such losses can accumulate in each case.
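The same holds at values other than 1; for example, using getPrec() from above (the exact outcomes assume the default round-to-nearest-even mode):
8 + getPrec(8) / 2 == 8   # TRUE: half the spacing at 8 is absorbed
8 + getPrec(8)     == 8   # FALSE: a full spacing is resolvable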
In R the machine precision can be estimated with:
getPrecR <- function(x) {
  y <- log2(pmax(.Machine$double.xmin, abs(x)))
  ifelse(x < 0 & floor(y) == y, 2^(y-1), 2^floor(y)) * .Machine$double.eps
}
getPrecR(1)
#[1] 2.220446e-16
Each double value represents a range. For a simple addition, the range of the result depends on the range of each summand and also on the range of their sum.
library(Rcpp)
cppFunction("std::vector<double> getRange(double x) {return std::vector<double>{
(nextafter(x, -std::numeric_limits<double>::infinity()) - x)/2.
, (nextafter(x, std::numeric_limits<double>::infinity()) - x)/2.};}")
x <- 2^54 - 2
getRange(x)
#[1] -1 1
y <- 4.1
getRange(y)
#[1] -4.440892e-16 4.440892e-16
z <- x + y
getRange(z)
#[1] -2 2
z - x - y #Should be 0
#[1] 1.9
2^54 - 2.9 + 4.1 - (2^54 + 5.9) #Should be -4.7
#[1] 0
2^54 - 2.9 == 2^54 - 2 #Gain 0.9
2^54 - 2 + 4.1 == 2^54 + 4 #Gain 1.9
2^54 + 5.9 == 2^54 + 4 #Gain 1.9
For higher precision, Rmpfr can be used.
library(Rmpfr)
mpfr("2", 1024L)^54 - 2.9 + 4.1 - (mpfr("2", 1024L)^54 + 5.9)
#[1] -4.700000000000000621724893790087662637233734130859375
If the problem can be converted to integers, gmp can be used (which Rmpfr itself builds on).
library(gmp)
as.bigz("2")^54 * 10 - 29 + 41 - (as.bigz("2")^54 * 10 + 59)
#[1] -47
Definition of machine eps: it is the lowest value eps for which 1 + eps is not 1.
As a rule of thumb (assuming a floating-point representation with base 2):
this eps makes the difference for the range 1 .. 2;
for the range 2 .. 4 the precision is 2*eps;
and so on.
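This rule of thumb is easy to verify directly in R (the exact outcomes assume the default round-to-nearest-even mode):
eps <- .Machine$double.eps
(1 + eps)     == 1   # FALSE: eps is resolvable at 1
(1 + eps / 2) == 1   # TRUE:  half of eps is not
(2 + eps)     == 2   # TRUE:  at 2 the spacing is 2 * eps
(2 + 2 * eps) == 2   # FALSE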
Unfortunately, there is no good rule of thumb here. It's entirely determined by the needs of your program.
In R we have all.equal as a built-in way to test approximate equality. So you could use something like (x < y) | isTRUE(all.equal(x, y))
i <- 0.1
i <- i + 0.05
i
if (isTRUE(all.equal(i, .15))) { # code was getting sloppy & went to multiple lines
  cat("i equals 0.15\n")
} else {
  cat("i does not equal 0.15\n")
}
#i equals 0.15
Google mock has a number of floating point matchers for double precision comparisons, including DoubleEq and DoubleNear. You can use them in an array matcher like this:
ASSERT_THAT(vec, ElementsAre(DoubleEq(0.1), DoubleEq(0.2)));
Update:
Numerical Recipes provides a derivation showing that, for a one-sided difference quotient, the square root of machine epsilon is a good choice of step size for finite-difference approximations of derivatives.
The Wikipedia article cites Numerical Recipes, 3rd edition, Section 5.7, pages 229-230 (a limited number of page views is available at http://www.nrbook.com/empanel/).
all.equal(target, current,
tolerance = .Machine$double.eps ^ 0.5, scale = NULL,
..., check.attributes = TRUE)
This behaviour of IEEE floating-point arithmetic is a well-known limitation of computer arithmetic and is discussed in several places:
The FAQ in R has a whole question dedicated to it: R FAQ 7.31
The R Inferno by Patrick Burns dedicated the first "Circle" to this problem (page 9 onward)
Arithmetic Sum Proof Problem asked in Math Meta
David Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic," ACM Computing Surveys 23, 1 (1991-03), 5-48, doi:10.1145/103162.103163 (revision also available)
The Floating-Point Guide - What Every Programmer Should Know About Floating-Point Arithmetic
0.30000000000000004.com compares floating point arithmetic across programming languages
Canonical duplicate for "floating point is inaccurate" (a meta discussion about a canonical answer for this issue)
Several Stack Overflow questions including
Why can't decimal numbers be represented exactly in binary?
Why are floating point numbers inaccurate?
Is floating point math broken?
Math Tricks Explained by Arthur T. Benjamin
dplyr::near() is another option for testing if two vectors of floating point numbers are equal.
The function has a built-in tolerance parameter, tol = .Machine$double.eps^0.5, that can be adjusted. The default is the same as the default tolerance of all.equal().
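A quick check on the numbers from the original question (assuming dplyr is installed):
library(dplyr)
a <- 0.58; b <- 0.08
near(a - b, 0.5)                    # TRUE with the default tolerance
(a - b > 0.5) | near(a - b, 0.5)    # a tolerant version of >=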

Speeding up Julia's poorly written R examples

The Julia examples to compare performance against R seem particularly convoluted. https://github.com/JuliaLang/julia/blob/master/test/perf/perf.R
What is the fastest performance you can eke out of the two algorithms below (preferably with an explanation of what you changed to make it more R-like)?
## mandel
mandel = function(z) {
  c = z
  maxiter = 80
  for (n in 1:maxiter) {
    if (Mod(z) > 2) return(n-1)
    z = z^2+c
  }
  return(maxiter)
}
mandelperf = function() {
  re = seq(-2,0.5,.1)
  im = seq(-1,1,.1)
  M = matrix(0.0,nrow=length(re),ncol=length(im))
  count = 1
  for (r in re) {
    for (i in im) {
      M[count] = mandel(complex(real=r,imag=i))
      count = count + 1
    }
  }
  return(M)
}
assert(sum(mandelperf()) == 14791)
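Note that assert() is not a base R function; the benchmark script presumably defines its own small helper for it. A base-R equivalent of the same check would be:
stopifnot(sum(mandelperf()) == 14791)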
## quicksort ##
qsort_kernel = function(a, lo, hi) {
  i = lo
  j = hi
  while (i < hi) {
    pivot = a[floor((lo+hi)/2)]
    while (i <= j) {
      while (a[i] < pivot) i = i + 1
      while (a[j] > pivot) j = j - 1
      if (i <= j) {
        t = a[i]
        a[i] = a[j]
        a[j] = t
      }
      i = i + 1;
      j = j - 1;
    }
    if (lo < j) qsort_kernel(a, lo, j)
    lo = i
    j = hi
  }
  return(a)
}
qsort = function(a) {
  return(qsort_kernel(a, 1, length(a)))
}
sortperf = function(n) {
  v = runif(n)
  return(qsort(v))
}
sortperf(5000)
The key word in this question is "algorithm":
What is the fastest performance you can eke out of the two algorithms below (preferably with an explanation of what you changed to make it more R-like)?
As in "how fast can you make these algorithms in R?" The algorithms in question here are the standard Mandelbrot complex loop iteration algorithm and the standard recursive quicksort kernel.
There are certainly faster ways to compute the answers to the problems posed in these benchmarks – but not using the same algorithms. You can avoid recursion, avoid iteration, and avoid whatever else R isn't good at. But then you're no longer comparing the same algorithms.
If you really wanted to compute Mandelbrot sets in R or sort numbers, yes, this is not how you would write the code. You would either vectorize it as much as possible – thereby pushing all the work into predefined C kernels – or just write a custom C extension and do the computation there. Either way, the conclusion is that R isn't fast enough to get really good performance on its own – you need to have C do most of the work in order to get good performance.
And that's exactly the point of these benchmarks: in Julia you never have to rely on C code to get good performance. You can just write what you want to do in pure Julia and it will have good performance. If an iterative scalar loop algorithm is the most natural way to do what you want to do, then just do that. If recursion is the most natural way to solve the problem, then that's ok too. At no point will you be forced to rely on C for performance – whether via unnatural vectorization or writing custom C extensions. Of course, you can write vectorized code when it's natural, as it often is in linear algebra; and you can call C if you already have some library that does what you want. But you don't have to.
We do want to have the fairest possible comparison of the same algorithms across languages:
If someone does have faster versions in R that use the same algorithm, please submit patches!
I believe that the R benchmarks on the julia site are already byte-compiled, but if I'm doing it wrong and the comparison is unfair to R, please let me know and I will fix it and update the benchmarks.
Hmm, in the Mandelbrot example the matrix M has its dimensions transposed; it should be
M = matrix(0.0,nrow=length(im), ncol=length(re))
because it is filled by incrementing count in the inner loop (successive values of im). My implementation creates a vector of complex numbers in mandelperf.1 and operates on all elements, using an index and subsetting to keep track of which elements of the vector have not yet satisfied the condition Mod(z) <= 2:
mandel.1 = function(z, maxiter=80L) {
  c <- z
  result <- integer(length(z))
  i <- seq_along(z)
  n <- 0L
  while (n < maxiter && length(z)) {
    j <- Mod(z) <= 2
    if (!all(j)) {
      result[i[!j]] <- n
      i <- i[j]
      z <- z[j]
      c <- c[j]
    }
    z <- z^2 + c
    n <- n + 1L
  }
  result[i] <- maxiter
  result
}
mandelperf.1 = function() {
  re = seq(-2,0.5,.1)
  im = seq(-1,1,.1)
  mandel.1(complex(real=rep(re, each=length(im)),
                   imaginary=im))
}
for a 13-fold speed-up (the results are equal but not identical because the original returns numeric rather than integer values).
> library(rbenchmark)
> benchmark(mandelperf(), mandelperf.1(),
+ columns=c("test", "elapsed", "relative"),
+ order="relative")
test elapsed relative
2 mandelperf.1() 0.412 1.00000
1 mandelperf() 5.705 13.84709
> all.equal(sum(mandelperf()), sum(mandelperf.1()))
[1] TRUE
The quicksort example doesn't actually sort
> set.seed(123L); qsort(sample(5))
[1] 2 4 1 3 5
but my main speed-up was to vectorize the partition around the pivot
qsort_kernel.1 = function(a) {
  if (length(a) < 2L)
    return(a)
  pivot <- a[floor(length(a) / 2)]
  c(qsort_kernel.1(a[a < pivot]), a[a == pivot], qsort_kernel.1(a[a > pivot]))
}
qsort.1 = function(a) {
  qsort_kernel.1(a)
}
sortperf.1 = function(n) {
  v = runif(n)
  return(qsort.1(v))
}
for a 7-fold speedup (in comparison to the uncorrected original)
> benchmark(sortperf(5000), sortperf.1(5000),
+ columns=c("test", "elapsed", "relative"),
+ order="relative")
test elapsed relative
2 sortperf.1(5000) 6.60 1.000000
1 sortperf(5000) 47.73 7.231818
Since in the original comparison Julia is about 30 times faster than R for mandel, and 500 times faster for quicksort, the implementations above are still not really competitive.

Modulus warning in R- Lehmann Primality Test

I spent a little time hacking together an R implementation of the Lehmann primality test. I borrowed the function design from http://davidkendal.net/articles/2011/12/lehmann-primality-test
Here is my code:
primeTest <- function(n, iter){
  a <- sample(1:(n-1), 1)
  lehmannTest <- function(y, tries){
    x <- ((y^((n-1)/2)) %% n)
    if (tries == 0) {
      return(TRUE)
    } else {
      if ((x == 1) | (x == (-1 %% n))){
        lehmannTest(sample(1:(n-1), 1), (tries-1))
      } else {
        return(FALSE)
      }
    }
  }
  lehmannTest(a, iter)
}
primeTest(4, 50) # false
primeTest(3, 50) # true
primeTest(10, 50)# false
primeTest(97, 50) # gives false # SHOULD BE TRUE !!!! WTF
prime_test<-c(2,3,5,7,11,13,17 ,19,23,29,31,37)
for (i in 1:length(prime_test)) {
print(primeTest(prime_test[i], 50))
}
For small primes it works, but as soon as I get to around ~30 I get an ugly warning and the function stops working correctly:
2: In lehmannTest(a, iter) : probable complete loss of accuracy in modulus
After some investigating I believe it has to do with floating-point conversion: very large numbers are rounded, so the mod operation gives a bad result.
Now the questions.
Is this a floating point problem? or in my implementation?
Is there a purely R solution or is R just bad at this?
Thanks
Solution:
After the great feedback and an hour of reading about modular exponentiation algorithms, I have a solution. The first step is to write my own modular exponentiation function. The basic idea is that modular multiplication lets you reduce intermediate results: you can take the mod after each iteration, so you never get a giant number that exceeds the precision of R's numbers.
modexp <- function(a, b, n){
  r = 1
  for (i in 1:b){
    r = (r*a) %% n
  }
  return(r)
}
primeTest <- function(n, iter){
  a <- sample(1:(n-1), 1)
  lehmannTest <- function(y, tries){
    x <- modexp(y, (n-1)/2, n)
    if (tries == 0) {
      return(TRUE)
    } else {
      if ((x == 1) | (x == (-1 %% n))){
        lehmannTest(sample(1:(n-1), 1), (tries-1))
      } else {
        return(FALSE)
      }
    }
  }
  if( n < 2 ){
    return(FALSE)
  } else if (n == 2) {
    return(TRUE)
  } else {
    lehmannTest(a, iter)
  }
}
primeTest(4, 50) # false
primeTest(3, 50) # true
primeTest(10, 50)# false
primeTest(97, 50) # NOW IS TRUE !!!!
prime_test<-c(5,7,11,13,17 ,19,23,29,31,37,1009)
for (i in 1:length(prime_test)) {
print(primeTest(prime_test[i], 50))
}
#ALL TRUE
Of course there is a problem with representing integers. In R, integer values are represented exactly (as doubles) only up to 2^53 - 1, which is about 9e15, and the term y^((n-1)/2) will exceed that easily even for small numbers. You will have to compute (y^((n-1)/2)) %% n by continually squaring y and taking the modulus. That corresponds to the binary representation of (n-1)/2.
Even the 'real' number theory programs do it like that -- see Wikipedia's entry on "modular exponentiation". That said it should be mentioned that programs like R (or Matlab and other systems for numerical computing) may not be a proper environment for implementing number theory algorithms, probably not even as playing fields with small integers.
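A minimal base-R sketch of the square-and-multiply idea described above (for illustration only; every intermediate product stays below n^2, so the result is exact as long as n^2 fits in a double):
modexp_sq <- function(base, exp, n) {
  result <- 1
  base <- base %% n
  while (exp > 0) {
    if (exp %% 2 == 1)               # current binary digit of exp is 1
      result <- (result * base) %% n
    exp <- exp %/% 2                 # shift to the next binary digit
    base <- (base * base) %% n       # square the base and reduce immediately
  }
  result
}
modexp_sq(5, (97 - 1) / 2, 97)       # 1 or 96 (i.e. -1 mod 97), since 97 is prime
This needs only about log2(exp) multiplications instead of exp, which is the point of working from the binary representation of the exponent.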
Edit: The original package was incorrect
You could utilize the function modpower() in package 'pracma' like this:
library(pracma)
primeTest <- function(n, iter){
  a <- sample(1:(n-1), 1)
  lehmannTest <- function(y, tries){
    x <- modpower(y, (n-1)/2, n) # ((y^((n-1)/2)) %% n)
    if (tries == 0) {
      return(TRUE)
    } else {
      if ((x == 1) | (x == (-1 %% n))){
        lehmannTest(sample(1:(n-1), 1), (tries-1))
      } else {
        return(FALSE)
      }
    }
  }
  lehmannTest(a, iter)
}
The following test is successful as 1009 is the only prime in this set:
prime_test <- seq(1001, 1011, by = 2)
for (i in 1:length(prime_test)) {
print(primeTest(prime_test[i], 50))
}
# FALSE FALSE FALSE FALSE TRUE FALSE
If you are just using base R, I would pick #2b... "R is bad at this". R's integers (which you do not appear to be using) are 32-bit, and its doubles represent integers exactly only up to 2^53; above that limit you will get rounding errors. You should probably be looking at: package:gmp or package:Brobdingnag. Package:gmp has large-integer and large-rational classes.
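For example, a short sketch with gmp (assuming the package is installed), redoing the critical power computation with exact big integers:
library(gmp)
y <- as.bigz(5)
n <- 97
y^((n - 1) %/% 2) %% n   # exact 5^48 mod 97, with no floating-point rounding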

Resources