Modulus warning in R - Lehmann Primality Test

I spent a little time hacking an R implementation of the Lehmann primality test. I borrowed the function design from http://davidkendal.net/articles/2011/12/lehmann-primality-test
Here is my code:
primeTest <- function(n, iter){
  a <- sample(1:(n-1), 1)
  lehmannTest <- function(y, tries){
    x <- ((y^((n-1)/2)) %% n)
    if (tries == 0) {
      return(TRUE)
    } else {
      if ((x == 1) | (x == (-1 %% n))){
        lehmannTest(sample(1:(n-1), 1), (tries-1))
      } else {
        return(FALSE)
      }
    }
  }
  lehmannTest(a, iter)
}
primeTest(4, 50)  # FALSE
primeTest(3, 50)  # TRUE
primeTest(10, 50) # FALSE
primeTest(97, 50) # gives FALSE -- SHOULD BE TRUE !!!! WTF
prime_test <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
for (i in 1:length(prime_test)) {
  print(primeTest(prime_test[i], 50))
}
For small primes it works, but as soon as the inputs get up to around 30 I get an ugly-looking warning and the function stops working correctly:
2: In lehmannTest(a, iter) : probable complete loss of accuracy in modulus
After some investigating, I believe it has to do with floating-point conversion: very large numbers are rounded, so the mod operation gives a bad result.
Now the questions.
Is this a floating-point problem, or a problem in my implementation?
Is there a purely R solution or is R just bad at this?
Thanks
Solution:
After the great feedback and an hour of reading about modular exponentiation algorithms, I have a solution. The first step is to write my own modular exponentiation function. The basic idea is that modular multiplication lets you reduce intermediate results: you can take the mod after each multiplication, so you never build up a giant number that exceeds what R's doubles can represent exactly.
modexp <- function(a, b, n){
  r <- 1
  for (i in 1:b){
    r <- (r * a) %% n
  }
  return(r)
}
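As a quick sanity check with small numbers (small enough that the naive power still fits in a double), the new function agrees with taking the power first and reducing afterwards:

modexp(2, 10, 1000)  # 24
(2^10) %% 1000       # 24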
primeTest <- function(n, iter){
  a <- sample(1:(n-1), 1)
  lehmannTest <- function(y, tries){
    x <- modexp(y, (n-1)/2, n)
    if (tries == 0) {
      return(TRUE)
    } else {
      if ((x == 1) | (x == (-1 %% n))){
        lehmannTest(sample(1:(n-1), 1), (tries-1))
      } else {
        return(FALSE)
      }
    }
  }
  if (n < 2){
    return(FALSE)
  } else if (n == 2) {
    return(TRUE)
  } else {
    lehmannTest(a, iter)
  }
}
primeTest(4, 50)  # FALSE
primeTest(3, 50)  # TRUE
primeTest(10, 50) # FALSE
primeTest(97, 50) # NOW IT IS TRUE !!!!
prime_test <- c(5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 1009)
for (i in 1:length(prime_test)) {
  print(primeTest(prime_test[i], 50))
}
# ALL TRUE

Of course there is a problem with representing integers. In R, doubles represent integers exactly only up to 2^53 - 1, which is about 9e15, and the term y^((n-1)/2) exceeds that easily even for small inputs. You have to compute (y^((n-1)/2)) %% n by repeatedly squaring y and reducing modulo n, following the binary representation of the exponent (n-1)/2.
Even the 'real' number theory programs do it like that -- see Wikipedia's entry on "modular exponentiation". That said, it should be mentioned that programs like R (or Matlab and other systems for numerical computing) may not be a proper environment for implementing number theory algorithms, probably not even as playing fields with small integers.
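For illustration, here is a minimal sketch of that square-and-multiply idea in base R (my own version, not from any of the packages mentioned; it assumes n is small enough that n^2 still fits exactly in a double):

modexp_bin <- function(a, b, n) {
  # right-to-left binary exponentiation: walk the bits of b,
  # squaring a and reducing mod n at every step
  r <- 1
  a <- a %% n
  while (b > 0) {
    if (b %% 2 == 1) r <- (r * a) %% n  # this bit of the exponent is set
    a <- (a * a) %% n
    b <- b %/% 2
  }
  r
}

modexp_bin(5, 48, 97)  # same result as modexp(5, 48, 97), but in O(log b) multiplications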
Edit: The original package was incorrect
You could utilize the function modpower() in package 'pracma' like this:
library(pracma)

primeTest <- function(n, iter){
  a <- sample(1:(n-1), 1)
  lehmannTest <- function(y, tries){
    x <- modpower(y, (n-1)/2, n)  # ((y^((n-1)/2)) %% n)
    if (tries == 0) {
      return(TRUE)
    } else {
      if ((x == 1) | (x == (-1 %% n))){
        lehmannTest(sample(1:(n-1), 1), (tries-1))
      } else {
        return(FALSE)
      }
    }
  }
  lehmannTest(a, iter)
}
The following test is successful as 1009 is the only prime in this set:
prime_test <- seq(1001, 1011, by = 2)
for (i in 1:length(prime_test)) {
  print(primeTest(prime_test[i], 50))
}
# FALSE FALSE FALSE FALSE TRUE FALSE

If you are just using base R, I would pick #2b... "R is bad at this". R's integer type is only 32-bit, and the doubles you are actually using here represent integers exactly only up to 2^53; above that you get rounding errors. You should probably be looking at package:gmp or package:Brobdingnag. Package:gmp has large-integer and large-rational classes.
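For example, gmp works with arbitrary-precision integers, so the power-then-reduce computation that overflows doubles is exact there. A small sketch, assuming the gmp package is installed (isprime() and powm() are, as far as I recall, its primality-test and modular-exponentiation functions):

library(gmp)

powm(as.bigz(5), 48, 97)  # 5^48 mod 97, computed exactly with big integers
isprime(97)               # nonzero for (probable) primes, 0 for composites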

Related

How to use the output of an r function in another function?

I want to create a script that calculates probabilities for a role-playing game.
I'm new to programming and I'm stuck with the return values and nested functions. What I want is to use the values returned by the first function in the next one.
I have two functions, dice(k, n) and fight(a, b) (for the example, the functions are only partly written):
dice <- function (k, n) {
  if (k > 3 && n > 2){
    a <- 3
    b <- 2
    attack <- sample(1:6, a)
    deff <- sample(1:6, b)
  }
  return(c(attack, deff))
}
So I want to use the vectors attack and deff in the next function:
fight <- function(a, b){
  if (a == 3 && b == 2){
    if (sort(attack,T)[1] > sort(deff,T)[1]){
      n <- n - 1
    }
    if (sort(attack,T)[1] <= sort(deff,T)[1]) {
      k <- k - 1
    }
    if (sort(attack,T)[2] > sort(deff,T)[2]) {
      n <- n - 1
    }
    if (sort(attack,T)[2] <= sort(deff,T)[2]){
      k <- k - 1
    }
  }
  return(c(k, n))
}
But this gives me the following error:
Error in sort(attack, T) : object 'attack' not found
Any ideas? Thanks!
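One common pattern (a rough sketch of my own, not the asker's final code): the two functions never see each other's local variables, so capture what dice() returns in the caller and pass it into fight() as an argument, for example as a named list:

dice <- function(a = 3, b = 2) {
  # return both rolls, already sorted, so the caller can pick them out by name
  list(attack = sort(sample(1:6, a, replace = TRUE), decreasing = TRUE),
       deff   = sort(sample(1:6, b, replace = TRUE), decreasing = TRUE))
}

fight <- function(rolls, k, n) {
  # compare the two highest dice of each side and update the troop counts
  for (i in 1:2) {
    if (rolls$attack[i] > rolls$deff[i]) n <- n - 1 else k <- k - 1
  }
  c(k = k, n = n)
}

rolls <- dice()              # value returned by dice() ...
fight(rolls, k = 5, n = 5)   # ... passed explicitly to fight()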

Create sequence in R

Hi I was wondering if someone knows how to realize this sequence in R?
Consider a sequence with the following requirements:
a_1 = 1
a_n = a_{n-1} + 3 (if n is an even number)
a_n = 2 * a_{n-1} - 5 (if n is an odd number)
e.g. 1, 4, 3, 6, 7, 10, 15, ...
a_30 = ?
Try the following.
It will return the entire sequence, not just the last element.
seq_chih_peng <- function(n){
  a <- integer(n)
  a[1] <- 1
  for(i in seq_along(a)[-1]){
    if(i %% 2 == 0){
      a[i] <- a[i - 1] + 3
    } else {
      a[i] <- 2*a[i - 1] - 5
    }
  }
  a
}
seq_chih_peng(30)
Note that I do not include code to check for input errors such as passing n = 0 or a negative number.
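If you wanted such a guard, a single line at the top of the function would cover those cases (my own sketch of what was left out):

stopifnot(is.numeric(n), length(n) == 1, n >= 1)  # rejects n = 0, negative and non-scalar input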
If you want to do it recursively, you just have to write the equations in your function as follows:
sequence <- function(n) {
  if (n == 1) {
    return(1)
  } else if (n > 1) {
    if (n %% 2 == 1) {
      return(2 * sequence(n - 1) - 5)
    } else {
      return(sequence(n - 1) + 3)
    }
  } else {
    stop("n must be strictly positive")
  }
}
sequence(30)
# returns 32770

Can a convolution function be written in tail-recursive form?

I have a function that I want to write in tail-recursive form. The function calculates the number of ways to get a sum of k by rolling an s-sided die n times. I have seen the mathematical solution for this function on this answer; it amounts to the recursion f(n, k) = sum of f(n-1, k-j) over j = 1..s, with f(1, k) = 1 for 1 <= k <= s and 0 otherwise.
My reference recursive implementation in R is:
sum_ways <- function(n_times, k_sum, s_side) {
  if (k_sum < n_times || k_sum > n_times * s_side) {
    return(0)
  } else if (n_times == 1) {
    return(1)
  } else {
    sigma_values <- sapply(
      1:s_side,
      function(j) sum_ways(n_times - 1, k_sum - j, s_side)
    )
    return(sum(sigma_values))
  }
}
I have tried to re-write the function in continuation passing style as I have learned from this answer, but I wasn't successful. Is there a way to write this function in tail-recursive form?
EDIT
I know that R doesn't optimise tail calls. My question is not R-specific; a solution in any other language is just as welcome, even in a language that does not optimise tail calls.
sapply isn't in continuation-passing style, so you have to replace it.
Here's a translation to continuation-passing style in Python (another language that does not have proper tail calls):
def sum_ways_cps(n_times, k_sum, s_side, ctn):
    """Compute the number of ways to get the sum k by rolling an s-sided die
    n times. Then pass the answer to ctn."""
    if k_sum < n_times or k_sum > n_times * s_side:
        return ctn(0)
    elif n_times == 1:
        return ctn(1)
    else:
        f = lambda j, ctn: sum_ways_cps(n_times - 1, k_sum - j, s_side, ctn)
        return sum_cps(1, s_side, 0, f, ctn)

def sum_cps(j, j_max, total_so_far, f, ctn):
    """Compute the sum of f(x) for x=j to j_max (inclusive).
    Then pass the answer to ctn."""
    if j > j_max:
        return ctn(total_so_far)
    else:
        return f(j, lambda result: sum_cps(j + 1, j_max, total_so_far + result, f, ctn))

sum_ways_cps(2, 7, 6, print)  # 6
Try this (plain recursion; to get a genuinely tail-recursive version we would need to find a linear recurrence relation we could accumulate instead):
f <- function(n, k) {
  if (n == 1) {                  # base case
    return(ifelse(k <= 6, 1, 0))
  } else if (k > n*6 | k < n) {  # some validation
    return(0)
  } else {
    # recursive calls, f(1,j)=1, 1<=j<=6, otherwise 0
    return(sum(sapply(1:min(k-n+1, 6), function(j) f(n-1, k-j))))
  }
}
sapply(1:13, function(k) f(2, k))
# [1] 0 1 2 3 4 5 6 5 4 3 2 1 0
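For what it's worth, the accumulation a tail-recursive version would perform can also be written as a plain iterative dynamic-programming pass in R (my own sketch, not from either answer above): keep a vector of counts for the sums reachable with the dice processed so far, and fold in one more die per iteration.

sum_ways_iter <- function(n_times, k_sum, s_side) {
  counts <- rep(1, s_side)  # with one die, each sum 1..s_side can be made one way
  sums   <- 1:s_side
  if (n_times > 1) {
    for (i in 2:n_times) {
      new_sums   <- i:(i * s_side)
      new_counts <- sapply(new_sums, function(k)
        sum(counts[sums %in% (k - 1:s_side)]))  # ways to reach k after adding one more die
      counts <- new_counts
      sums   <- new_sums
    }
  }
  if (k_sum %in% sums) counts[match(k_sum, sums)] else 0
}

sum_ways_iter(2, 7, 6)  # 6, matches sum_ways(2, 7, 6)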

Get distance to next largest floating point number in R [duplicate]

Is there any implementation of this functionality in R, such that it is possible to get the next representable floating-point number after a given floating-point number? This would be similar to the nextafter function in the C standard library. Schemes such as number + .Machine$double.eps don't work in general.
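For example, .Machine$double.eps is only the spacing between doubles near 1; the spacing grows with magnitude, so adding it to a large number changes nothing:

(1 + .Machine$double.eps) > 1        # TRUE: works near 1
(1e20 + .Machine$double.eps) > 1e20  # FALSE: eps is far smaller than the spacing at 1e20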
No, but there are two ways you can get one:
Using C
If you want the exact functionality of the nextafter() function, you can write a C function that works as an interface to the function such that the following two constraints are met:
The function does not return a value. All work is accomplished as a "side effect" (changing the values of arguments).
All the arguments are pointers. Even scalars are vectors (of length one) in R.
That function should then be compiled as a shared library:
R CMD SHLIB foo.c
for UNIX-like OSs. The shared library can be loaded with dyn.load("foo.so"). You can then call the function from inside R using the .C() function
.C("foo", ...)
A more in-depth treatment of calling C from R is here.
Using R
number + .Machine$double.eps is the way to go, but you have to consider edge cases, such as x - y < .Machine$double.eps, or x == y. I would write the function like this:
nextafter <- function(x, y){
  # Appropriate type checking and bounds checking goes here
  delta = y - x
  if(x > 0){
    factor = 2^floor(log2(x)) + ifelse(x >= 4, 1, 0)
  } else if (x < 0) {
    factor = 65
  }
  if (delta > .Machine$double.eps){
    return(x + factor * .Machine$double.eps)
  } else if (delta < .Machine$double.eps){
    return(x - factor * .Machine$double.eps)
  } else {
    return(x)
  }
}
Now, unlike C, if you want to check integers, you can do so in the same function but you need to change the increment based on the type.
UPDATE
The previous code did not perform as expected for numbers larger than 2. There is a factor that needs to be multiplied by .Machine$double.eps to make it large enough to cause the numbers to differ. It is related to the nearest power of 2, plus one. You can get an idea of how this works with the code below:
n <- 100
factor <- vector('numeric', n)
for(i in 1:n){
  j <- 0
  while(TRUE){
    j <- j + 1
    if(i - j * .Machine$double.eps != i) break()
  }
  factor[i] <- j
}
If you prefer Rcpp:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double nextAfter(double x, double y) {
return nextafter(x, y);
}
Then in R:
sprintf("%.20f", 1)
#[1] "1.00000000000000000000"
sprintf("%.20f", nextAfter(1, 2))
#[1] "1.00000000000000022204"
I'm not sure if Christopher Louden's answer works for all values, but here's a pure R version of the classic approach (increments/decrements the integer bits). R does not make it easy to convert between doubles and integers, nor does it have a 64-bit integer type, so there's quite a lot of code for this.
doubleToRaw <- function(d) writeBin(d, raw());
rawToDouble <- function(r) readBin(r, numeric());
int64inc <- function(lo, hi) {
  if (lo == 0xffffffff) { hi <- hi + 1; lo <- 0; } else { lo <- lo + 1; }
  return(c(lo, hi));
}
int64dec <- function(lo, hi) {
  if (lo == 0) { hi <- hi - 1; lo <- 0xffffffff; } else { lo <- lo - 1; }
  return(c(lo, hi));
}
nextafter <- function(x, y) {
  if (is.nan(x + y))
    return(NaN);
  if (x == y)
    return(x);
  if (x == 0)
    return(sign(y) * rawToDouble(as.raw(c(0, 0, 0, 0, 0, 0, 0, 1))));
  ints <- packBits(rawToBits(doubleToRaw(x)), "integer")
  if ((y > x) == (x > 0))
    ints <- int64inc(ints[1], ints[2])
  else
    ints <- int64dec(ints[1], ints[2]);
  return(rawToDouble(packBits(intToBits(ints), "raw")))
}
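As a quick check (mine, not part of the answer): near 1 the function should step by exactly one ulp, i.e. .Machine$double.eps in the upward direction:

nextafter(1, 2) == 1 + .Machine$double.eps  # TRUE
nextafter(1, 0) < 1                         # TRUE: it also steps downward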

Speeding up Julia's poorly written R examples

The Julia examples to compare performance against R seem particularly convoluted. https://github.com/JuliaLang/julia/blob/master/test/perf/perf.R
What is the fastest performance you can eke out of the two algorithms below (preferably with an explanation of what you changed to make it more R-like)?
## mandel
mandel = function(z) {
  c = z
  maxiter = 80
  for (n in 1:maxiter) {
    if (Mod(z) > 2) return(n-1)
    z = z^2+c
  }
  return(maxiter)
}
mandelperf = function() {
  re = seq(-2,0.5,.1)
  im = seq(-1,1,.1)
  M = matrix(0.0,nrow=length(re),ncol=length(im))
  count = 1
  for (r in re) {
    for (i in im) {
      M[count] = mandel(complex(real=r,imag=i))
      count = count + 1
    }
  }
  return(M)
}
assert(sum(mandelperf()) == 14791)
## quicksort ##
qsort_kernel = function(a, lo, hi) {
  i = lo
  j = hi
  while (i < hi) {
    pivot = a[floor((lo+hi)/2)]
    while (i <= j) {
      while (a[i] < pivot) i = i + 1
      while (a[j] > pivot) j = j - 1
      if (i <= j) {
        t = a[i]
        a[i] = a[j]
        a[j] = t
      }
      i = i + 1;
      j = j - 1;
    }
    if (lo < j) qsort_kernel(a, lo, j)
    lo = i
    j = hi
  }
  return(a)
}
qsort = function(a) {
  return(qsort_kernel(a, 1, length(a)))
}
sortperf = function(n) {
  v = runif(n)
  return(qsort(v))
}
sortperf(5000)
sortperf(5000)
The key word in this question is "algorithm":
What is the fastest performance you can eke out of the two algorithms below (preferably with an explanation of what you changed to make it more R-like)?
As in "how fast can you make these algorithms in R?" The algorithms in question here are the standard Mandelbrot complex loop iteration algorithm and the standard recursive quicksort kernel.
There are certainly faster ways to compute the answers to the problems posed in these benchmarks – but not using the same algorithms. You can avoid recursion, avoid iteration, and avoid whatever else R isn't good at. But then you're no longer comparing the same algorithms.
If you really wanted to compute Mandelbrot sets in R or sort numbers, yes, this is not how you would write the code. You would either vectorize it as much as possible – thereby pushing all the work into predefined C kernels – or just write a custom C extension and do the computation there. Either way, the conclusion is that R isn't fast enough to get really good performance on its own – you need to have C do most of the work in order to get good performance.
And that's exactly the point of these benchmarks: in Julia you never have to rely on C code to get good performance. You can just write what you want to do in pure Julia and it will have good performance. If an iterative scalar loop algorithm is the most natural way to do what you want to do, then just do that. If recursion is the most natural way to solve the problem, then that's ok too. At no point will you be forced to rely on C for performance – whether via unnatural vectorization or writing custom C extensions. Of course, you can write vectorized code when it's natural, as it often is in linear algebra; and you can call C if you already have some library that does what you want. But you don't have to.
We do want to have the fairest possible comparison of the same algorithms across languages:
If someone does have faster versions in R that use the same algorithm, please submit patches!
I believe that the R benchmarks on the julia site are already byte-compiled, but if I'm doing it wrong and the comparison is unfair to R, please let me know and I will fix it and update the benchmarks.
Hmm, in the Mandelbrot example the matrix M has its dimensions transposed; it should be
M = matrix(0.0,nrow=length(im), ncol=length(re))
because it's filled by incrementing count in the inner loop (over successive values of im). My implementation, mandelperf.1, creates a vector of complex numbers and operates on all elements at once, using an index and subsetting to keep track of which elements of the vector have not yet escaped, i.e. still satisfy Mod(z) <= 2:
mandel.1 = function(z, maxiter=80L) {
  c <- z
  result <- integer(length(z))
  i <- seq_along(z)
  n <- 0L
  while (n < maxiter && length(z)) {
    j <- Mod(z) <= 2
    if (!all(j)) {
      result[i[!j]] <- n
      i <- i[j]
      z <- z[j]
      c <- c[j]
    }
    z <- z^2 + c
    n <- n + 1L
  }
  result[i] <- maxiter
  result
}
mandelperf.1 = function() {
  re = seq(-2,0.5,.1)
  im = seq(-1,1,.1)
  mandel.1(complex(real=rep(re, each=length(im)),
                   imaginary=im))
}
for a 13-fold speed-up (the results are equal but not identical because the original returns numeric rather than integer values).
> library(rbenchmark)
> benchmark(mandelperf(), mandelperf.1(),
+ columns=c("test", "elapsed", "relative"),
+ order="relative")
test elapsed relative
2 mandelperf.1() 0.412 1.00000
1 mandelperf() 5.705 13.84709
> all.equal(sum(mandelperf()), sum(mandelperf.1()))
[1] TRUE
The quicksort example doesn't actually sort
> set.seed(123L); qsort(sample(5))
[1] 2 4 1 3 5
but my main speed-up was to vectorize the partition around the pivot
qsort_kernel.1 = function(a) {
  if (length(a) < 2L)
    return(a)
  pivot <- a[floor(length(a) / 2)]
  c(qsort_kernel.1(a[a < pivot]), a[a == pivot], qsort_kernel.1(a[a > pivot]))
}
qsort.1 = function(a) {
  qsort_kernel.1(a)
}
sortperf.1 = function(n) {
  v = runif(n)
  return(qsort.1(v))
}
for a 7-fold speedup (in comparison to the uncorrected original)
> benchmark(sortperf(5000), sortperf.1(5000),
+ columns=c("test", "elapsed", "relative"),
+ order="relative")
test elapsed relative
2 sortperf.1(5000) 6.60 1.000000
1 sortperf(5000) 47.73 7.231818
Since in the original comparison Julia is about 30 times faster than R for mandel, and 500 times faster for quicksort, the implementations above are still not really competitive.
