Does anyone know how I can make my code run faster? - r

I am trying to calculate Sigma(n=0 to infinity) (−1)^n/(n + 1) as accurately as possible, but my code takes forever and I cannot see whether my answer is right. Does anyone know how I can make it faster? The sum is supposed to converge to log(2). My idea is that f(n) will eventually become so small (less than 2^-52) that R treats sum and sum + f(n) as equal, and that is when I want the loop to stop. But that doesn't seem to work: the code takes forever to run and, at least as far as I can tell, never stops.
f <- function(n)
  return(((-1)^(n))/(n+1))
s <- function(f){
  sum <- 0
  n <- 0
  while(sum != sum + f(n)) {
    sum <- sum + f(n)
    n <- n + 1
  }
  return(c(sum, n))
}
s(f)

library(Rcpp)
cppFunction("
List s(int max_iter) {
  double sum = 0;
  double sum_prec = NA_REAL;
  double n = 0;
  for (; sum != sum_prec && n < max_iter; n++) {
    sum_prec = sum;
    sum += pow(-1, n) / (n + 1);
  }
  return List::create(
    _[\"sum\"] = sum,
    _[\"iterations\"] = n,
    _[\"precision\"] = sum - sum_prec
  );
}")
test <- s(100000000)
test
When you need a huge number of sequential iterations, plain R is not well suited. However, C++ functions are very easy to use from R through Rcpp; the code above is one example. The function takes a maximum number of iterations and returns a list with the sum, the number of iterations performed, and the precision.
EDIT : By precision I only mean sum - sum_prec, the size of the last update, so this is not a true error bound.
EDIT 2 : I kept the sum != sum_prec condition for the example, but unless you have a supercomputer you should not expect to see the loop end on its own lol
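To see why the loop never seems to finish, it helps to estimate how many terms are needed before sum + f(n) rounds back to sum. The following is a rough Python sketch, not part of the original answer; the rate of 1e8 iterations per second is an assumed figure for illustration only, and math.ulp needs Python 3.9 or later.

```python
import math

def terms_until_stall(limit):
    """Approximate index n at which adding 1/(n+1) stops changing a sum near `limit`."""
    # A term of about half the spacing between adjacent doubles (or less)
    # is rounded away when it is added to the running sum.
    half_ulp = math.ulp(limit) / 2.0
    return math.ceil(1.0 / half_ulp)

n_needed = terms_until_stall(math.log(2))

# Assumed throughput, purely illustrative: 1e8 loop iterations per second.
ASSUMED_ITERATIONS_PER_SECOND = 1e8
years = n_needed / ASSUMED_ITERATIONS_PER_SECOND / (3600 * 24 * 365)

print(n_needed)         # on the order of 1e16
print(round(years, 1))  # several years at the assumed rate
```

So the stopping test is sound in principle, but it needs on the order of 2^54 iterations, which is why the C++ version takes a max_iter cap.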
EDIT 3 :
Typically, a fast R base solution would be something like :
base_sol <- function(n_iter) {
  v <- seq_len(n_iter)
  v <- (-1L)^(v-1L)/v
  list(
    sum = sum(v),
    iterations = n_iter,
    precision = v[length(v)]
  )
}
This is only about 1.5 times slower than the C++ version, which is pretty fast for an interpreted language, but it has the drawback of holding every term of the sum in RAM (then again, R is made for statistics, not for resolving things at 2^-52).
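Since the exact stopping point is out of reach, a fixed number of terms together with the alternating-series bound is a practical alternative: the error of a partial sum is at most the magnitude of the first omitted term. A small Python sketch of that idea (the function name and the choice of one million terms are assumptions made for illustration):

```python
import math

def alternating_log2(n_terms):
    """Partial sum of (-1)^n / (n + 1) for n = 0 .. n_terms - 1, with its error bound."""
    # math.fsum tracks the partial sums exactly, so rounding error does not accumulate.
    total = math.fsum((-1.0) ** n / (n + 1) for n in range(n_terms))
    bound = 1.0 / (n_terms + 1)  # magnitude of the first omitted term
    return total, bound

approx, bound = alternating_log2(1_000_000)
print(approx, bound, abs(approx - math.log(2)) <= bound)
```

The bound shrinks only like 1/n, which is another way of seeing that this series is a slow route to log(2).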

Related

Sum of combinations of numbers

I want to solve a mathematical problem in the fastest possible way.
I have the set of natural numbers from 1 to n, for example {1,2,3,4,5} for n=5, and I want to calculate a formula like this:
s = 1*2*3*4+1*2*3*5+1*2*4*5+1*3*4*5+2*3*4*5
As you can see, each term in the sum is a product of n-1 numbers from the set. For example, (1*2*3*4) excludes 5 and (1*2*3*5) excludes 4. I know some of the multiplications are repeated, for example (1*2) appears in 3 of the products. How can I solve this problem with the least number of multiplications?
Sorry for bad English.
Thanks.
Here is a way that does not "cheat" by replacing multiplication with repeated addition or by using division. The idea is to replace your expression with
1*2*3*4 + 5*(1*2*3 + 4*(1*2 + 3*(1 + 2)))
This used 9 multiplications for the numbers 1 through 5. In general I think the multiplication count would be one less than the (n-1)th triangular number, n * (n - 1) / 2 - 1. Here is Python code that stores intermediate factorial values to reduce the number of multiplications to just 6, or in general 2 * n - 4, and the addition count to the same (but half of them are just adding 1):
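def f(n):
    # Valid for n >= 2; the initial values are the n = 2 case.
    fact = 1
    term = 2
    total = 3  # 1 + 2, the answer for n = 2 (named total to avoid shadowing the builtin sum)
    for j in range(2, n):
        fact *= j               # fact is now j!
        term = (j + 1) * total  # (j + 1) times the answer for 1..j
        total = fact + term     # answer for 1..j+1
    return total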
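The loop relies on the recurrence S(j+1) = j! + (j+1)*S(j), where S(j) is the answer for the numbers 1..j. A brute-force cross-check is an easy way to gain confidence in it; `brute_force` and `recurrence` are names made up for this sketch.

```python
from itertools import combinations
from math import prod

def brute_force(n):
    """Sum of the products of every (n-1)-element subset of 1..n."""
    return sum(prod(subset) for subset in combinations(range(1, n + 1), n - 1))

def recurrence(n):
    """Same value via S(j+1) = j! + (j+1)*S(j), starting from S(1) = 1."""
    fact, total = 1, 1  # 1! and S(1)
    for j in range(1, n):
        total = fact + (j + 1) * total
        fact *= j + 1
    return total

for n in range(1, 9):
    assert brute_force(n) == recurrence(n)
print(recurrence(5))  # 274, matching 24 + 30 + 40 + 60 + 120
```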
The only way to find which algorithm is the fastest is to code all of them in one language, and run each using a timer.
The following would be the most straightforward answer.
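from functools import reduce  # reduce is no longer a builtin in Python 3

def f(n):
    result = 0
    nList = [i+1 for i in range(n)]
    for i in range(len(nList)):
        # product of every element except nList[i]; the initial value 1 covers the empty list
        result += reduce(lambda x, y: x*y, nList[:i]+nList[i+1:], 1)
    return result
Walkthrough: for each i, use reduce to multiply the n-1 remaining elements together and add that product to result.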
If you just want to minimise the number of multiplications, you can replace all the multiplications by additions, like this:
// Compute 1*2*…*n
mult_all(n):
    if n = 1
        return 1
    res = 0
    // by adding 1*2*…*(n-1) a total of n times
    for i = 1 to n do
        res += mult_all(n-1)
    return res
// Compute sum of 1*2*…*(i-1)*(i+1)*…*n
sum_of_mult_all_but_one(n):
    if n = 1
        return 1   // the only term is the empty product
    // by computing 1*2*…*(n-1) + (sum 1*2*…*(i-1)*(i+1)*…*(n-1))*n
    res = mult_all(n-1)
    for i = 1 to n do
        res += sum_of_mult_all_but_one(n-1)
    return res
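The addition-only idea can be sketched as runnable Python. This is a variation rather than a literal transcription: each sub-result is computed once and then added repeatedly, which keeps the run time reasonable. The helper name `times` is an assumption of this sketch, and the base case for n = 1 is taken to be 1 (the empty product).

```python
def times(value, count):
    """Multiply value by count using only repeated addition."""
    total = 0
    for _ in range(count):
        total += value
    return total

def mult_all(n):
    """1*2*...*n using additions only."""
    result = 1
    for k in range(2, n + 1):
        result = times(result, k)
    return result

def sum_of_mult_all_but_one(n):
    """Sum over i of the product of 1..n with i left out, using additions only."""
    if n == 1:
        return 1  # the only term is the empty product
    return mult_all(n - 1) + times(sum_of_mult_all_but_one(n - 1), n)

print(sum_of_mult_all_but_one(5))  # 274
```

No multiplication operator appears, but the number of additions grows quickly, so this minimises multiplications rather than total work.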
Here is an answer in JavaScript. It is not the fastest way because it is not optimized, but it should work if you just want to find the answer.
function combo(n){
  var mult = 1;
  var sum = 0;
  for (var i = 1; i <= n; i++){
    mult = 1;
    for (var j = 1; j <= n; j++){
      if (j != i){
        mult = mult*j;
      }
    }
    sum += mult;
  }
  return (sum);
}
var n = 5; // example value from the question
alert(combo(n));

Iteration in R, weird result with for-loop

I'm writing some code in R in order to determine an optimal estimator for ai given any tolerance. So far, I've come up with this:
iter<- function (ai, k, tolerance){
at = ai*(1-ai^2*R[k]^ai*(log(R[k]))^2/(1-R[k]^ai)^2)/
(1 - (ai^2*R[k]^ai*(log(R[k]))^2)/(1-R[k]^ai)^2 + ai*(H(k)
- 1/ai - R[k]^ai*log(R[k])/(1-R[k]^ai)))
while((at-ai) > tolerance) {
ai = at
at = ai*(1-ai^2*R[k]^ai*(log(R[k]))^2/(1-R[k]^ai)^2)/
(1 - (ai^2*R[k]^ai*(log(R[k]))^2)/(1-R[k]^ai)^2 + ai*(H(k)
- 1/ai - R[k]^ai*log(R[k])/(1-R[k]^ai)))
a0 = at
}
return(at)
}
x<- iter(ai = H(k), k, tolerance = 0.000001)
where R and H are known variables for every k and also an initial estimator for ai is known, namely H(k). This code works fine for any value of k, for example,
x<- iter(ai = H(k), 21, tolerance = 0.000001)
gives a good result. However, my problem is, that when I try to embed this in a for-loop (I actually want a vector x[k] where every iteration for k is calculated), i.e. :
for (k in seq (along = 1: (n-1))){
x<- iter(ai = H(k), 21, tolerance = 0.000001)
}
this code doesn't give me a vector, but instead it gives one value for x. That doesn't make much sense to me, as I'm trying to assign a value to x for every possible k. What am I missing here?
As always, any help would be dearly appreciated.
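Since you want a vector, x should be a vector: allocate it before the loop and assign to x[k] inside it, otherwise each pass overwrites the single value x. Also pass k rather than the fixed 21 as the second argument, so that each element comes from its own k.
x <- numeric(n-1)
for (k in seq_len(n-1)) {
  x[k] <- iter(ai = H(k), k, tolerance = 0.000001)
}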

Newton's Method in R Precision/Output

So, I'm supposed to write the code to execute Newton's Method to calculate the square root of any arbitrary number to a specified precision (tolerance).
Here is my code:
MySqrt <- function(x, eps = 1e-6, itmax = 100, verbose = TRUE) {
GUESS <- 11
myvector <- integer(0)
i <- 1
if (x < 0) {
stop("Square root of negative value")
}
else {
myvector[i] <- GUESS
while (i <= itmax) {
GUESS <- (GUESS + (x/GUESS)) * 0.5
myvector[i+1] <- GUESS
if (abs(GUESS-myvector[i]) < eps) {
break()
}
if (verbose) {
cat("Iteration: ", formatC(i, width = 1), formatC(GUESS, digits = 10, width = 12), "\n")
}
i <- i + 1
}
}
myvector[i]
}
eps is the tolerance. When I use the function to calculate the square root of, say, 21, I got this as an output:
> MySqrt(21, eps = 1e-1, verbose = TRUE)
Iteration: 1 6.454545455
Iteration: 2 4.854033291
Iteration: 3 4.59016621
I'm not sure if the function stops carrying out iterations when it is supposed to, however. Can someone verify if my code is correct? This would be greatly appreciated!
Your code is almost correct. It is iterating the correct number of times. The only bug is that you don't increment i until after the break statement, so you are not returning the most recent approximation. Instead you are returning the previous one.
In order to verify that it is stopping at the right time, you can move the tracing line up above the break. You can also add GUESS-myvector[i] to the trace, so you can watch it halt as soon as the difference gets small enough. If you do this and run the function, the fact that it is stopping at the right time, as well as the fact that it is returning the wrong value, will be obvious:
> MySqrt(21,eps=1e-1)
Iteration: 1 6.454545 -4.545455
Iteration: 2 4.854033 -1.600512
Iteration: 3 4.590166 -0.2638671
Iteration: 4 4.582582 -0.007584239
[1] 4.590166
While your code is (almost) correct, it is not written in very good R style. For example, unless you want to return the entire vector of estimates, there is no reason to keep them all around. Also, since the number of iterations is capped at itmax, a for loop makes more sense here than a while loop. Here is one possible improved version of your function:
MySqrt <- function(x, eps = 1e-6, itmax = 100, verbose = TRUE) {
  GUESS <- 11
  if (x < 0) {
    stop("Square root of negative value")
  }
  for(i in 1:itmax){
    nextGUESS <- (GUESS + (x/GUESS)) * 0.5
    if (verbose)
      cat("Iteration: ", i, nextGUESS, nextGUESS-GUESS, "\n")
    if (abs(GUESS-nextGUESS) < eps)
      break
    GUESS <- nextGUESS
  }
  nextGUESS
}
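For comparison, here is the same stopping rule as a minimal Python sketch. It returns the newest estimate together with the iteration count, so the off-by-one discussed above cannot occur. The starting guess of 11 is taken from the question; the function name and the tuple return value are assumptions of this sketch.

```python
import math

def my_sqrt(x, eps=1e-6, itmax=100, guess=11.0):
    """Newton's method for sqrt(x); returns (estimate, iterations used)."""
    if x < 0:
        raise ValueError("Square root of negative value")
    for i in range(1, itmax + 1):
        next_guess = 0.5 * (guess + x / guess)
        if abs(next_guess - guess) < eps:
            return next_guess, i  # return the newest estimate, not the previous one
        guess = next_guess
    return guess, itmax

root, iterations = my_sqrt(21, eps=1e-1)
print(root, iterations, math.sqrt(21))
```

With eps = 1e-1 this stops on the fourth iteration, in line with the trace shown above.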

Speeding up Julia's poorly written R examples

The Julia examples to compare performance against R seem particularly convoluted. https://github.com/JuliaLang/julia/blob/master/test/perf/perf.R
What is the fastest performance you can eke out of the two algorithms below (preferably with an explanation of what you changed to make it more R-like)?
## mandel
mandel = function(z) {
c = z
maxiter = 80
for (n in 1:maxiter) {
if (Mod(z) > 2) return(n-1)
z = z^2+c
}
return(maxiter)
}
mandelperf = function() {
re = seq(-2,0.5,.1)
im = seq(-1,1,.1)
M = matrix(0.0,nrow=length(re),ncol=length(im))
count = 1
for (r in re) {
for (i in im) {
M[count] = mandel(complex(real=r,imag=i))
count = count + 1
}
}
return(M)
}
assert(sum(mandelperf()) == 14791)
## quicksort ##
qsort_kernel = function(a, lo, hi) {
i = lo
j = hi
while (i < hi) {
pivot = a[floor((lo+hi)/2)]
while (i <= j) {
while (a[i] < pivot) i = i + 1
while (a[j] > pivot) j = j - 1
if (i <= j) {
t = a[i]
a[i] = a[j]
a[j] = t
}
i = i + 1;
j = j - 1;
}
if (lo < j) qsort_kernel(a, lo, j)
lo = i
j = hi
}
return(a)
}
qsort = function(a) {
return(qsort_kernel(a, 1, length(a)))
}
sortperf = function(n) {
v = runif(n)
return(qsort(v))
}
sortperf(5000)
The key word in this question is "algorithm":
What is the fastest performance you can eke out of the two algorithms below (preferably with an explanation of what you changed to make it more R-like)?
As in "how fast can you make these algorithms in R?" The algorithms in question here are the standard Mandelbrot complex loop iteration algorithm and the standard recursive quicksort kernel.
There are certainly faster ways to compute the answers to the problems posed in these benchmarks – but not using the same algorithms. You can avoid recursion, avoid iteration, and avoid whatever else R isn't good at. But then you're no longer comparing the same algorithms.
If you really wanted to compute Mandelbrot sets in R or sort numbers, yes, this is not how you would write the code. You would either vectorize it as much as possible – thereby pushing all the work into predefined C kernels – or just write a custom C extension and do the computation there. Either way, the conclusion is that R isn't fast enough to get really good performance on its own – you need to have C do most of the work in order to get good performance.
And that's exactly the point of these benchmarks: in Julia you never have to rely on C code to get good performance. You can just write what you want to do in pure Julia and it will have good performance. If an iterative scalar loop algorithm is the most natural way to do what you want to do, then just do that. If recursion is the most natural way to solve the problem, then that's ok too. At no point will you be forced to rely on C for performance – whether via unnatural vectorization or writing custom C extensions. Of course, you can write vectorized code when it's natural, as it often is in linear algebra; and you can call C if you already have some library that does what you want. But you don't have to.
We do want to have the fairest possible comparison of the same algorithms across languages:
If someone does have faster versions in R that use the same algorithm, please submit patches!
I believe that the R benchmarks on the Julia site are already byte-compiled, but if I'm doing it wrong and the comparison is unfair to R, please let me know and I will fix it and update the benchmarks.
Hmm, in the Mandelbrot example the matrix M has its dimensions transposed
M = matrix(0.0,nrow=length(im), ncol=length(re))
because it's filled by incrementing count in the inner loop (successive values of im). My implementation creates a vector of complex numbers in mandelperf.1 and operates on all elements at once, using an index and subsetting to keep track of which elements of the vector still satisfy the condition Mod(z) <= 2 (that is, have not yet escaped)
mandel.1 = function(z, maxiter=80L) {
  c <- z
  result <- integer(length(z))
  i <- seq_along(z)
  n <- 0L
  while (n < maxiter && length(z)) {
    j <- Mod(z) <= 2
    if (!all(j)) {
      result[i[!j]] <- n
      i <- i[j]
      z <- z[j]
      c <- c[j]
    }
    z <- z^2 + c
    n <- n + 1L
  }
  result[i] <- maxiter
  result
}
mandelperf.1 = function() {
  re = seq(-2,0.5,.1)
  im = seq(-1,1,.1)
  mandel.1(complex(real=rep(re, each=length(im)),
                   imaginary=im))
}
for a 13-fold speed-up (the results are equal but not identical because the original returns numeric rather than integer values).
> library(rbenchmark)
> benchmark(mandelperf(), mandelperf.1(),
+ columns=c("test", "elapsed", "relative"),
+ order="relative")
            test elapsed relative
2 mandelperf.1()   0.412  1.00000
1   mandelperf()   5.705 13.84709
> all.equal(sum(mandelperf()), sum(mandelperf.1()))
[1] TRUE
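The bookkeeping in mandel.1 (an index of surviving positions that shrinks as points escape) can be illustrated in plain Python. This sketch only shows that the working-set approach returns the same counts as the scalar loop. Plain Python lists do not provide the speed-up, which in R comes from the vectorized primitives. The function names are assumptions of this sketch.

```python
def mandel(z, maxiter=80):
    """Scalar escape-time loop, mirroring the benchmark's mandel()."""
    c = z
    for n in range(maxiter):
        if abs(z) > 2:
            return n
        z = z * z + c
    return maxiter

def mandel_batch(points, maxiter=80):
    """Escape counts for all points at once, shrinking the working set as points escape."""
    result = [maxiter] * len(points)   # points that never escape keep maxiter
    active = list(range(len(points)))  # original positions still being iterated
    z = list(points)
    c = list(points)
    n = 0
    while n < maxiter and active:
        keep = [abs(v) <= 2 for v in z]
        for idx, ok in zip(active, keep):
            if not ok:
                result[idx] = n  # escaped after n updates
        active = [idx for idx, ok in zip(active, keep) if ok]
        z = [v for v, ok in zip(z, keep) if ok]
        c = [v for v, ok in zip(c, keep) if ok]
        z = [zv * zv + cv for zv, cv in zip(z, c)]
        n += 1
    return result

# Same 26 x 21 grid as the benchmark: re in [-2, 0.5], im in [-1, 1], step 0.1
grid = [complex(-2 + 0.1 * r, -1 + 0.1 * i) for r in range(26) for i in range(21)]
assert mandel_batch(grid) == [mandel(p) for p in grid]
```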
The quicksort example doesn't actually sort
> set.seed(123L); qsort(sample(5))
[1] 2 4 1 3 5
but my main speed-up was to vectorize the partition around the pivot
qsort_kernel.1 = function(a) {
  if (length(a) < 2L)
    return(a)
  pivot <- a[floor(length(a) / 2)]
  c(qsort_kernel.1(a[a < pivot]), a[a == pivot], qsort_kernel.1(a[a > pivot]))
}
qsort.1 = function(a) {
  qsort_kernel.1(a)
}
sortperf.1 = function(n) {
  v = runif(n)
  return(qsort.1(v))
}
for a 7-fold speedup (in comparison to the uncorrected original)
> benchmark(sortperf(5000), sortperf.1(5000),
+ columns=c("test", "elapsed", "relative"),
+ order="relative")
              test elapsed relative
2 sortperf.1(5000)    6.60 1.000000
1   sortperf(5000)   47.73 7.231818
Since in the original comparison Julia is about 30 times faster than R for mandel, and 500 times faster for quicksort, the implementations above are still not really competitive.
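The partition used in qsort_kernel.1 (split into elements below, equal to, and above the pivot, then recurse on the two outer parts) can be sketched in Python as follows. It is an out-of-place variant, like the R version, and not the in-place kernel from the benchmark; the function name is an assumption of this sketch.

```python
import random

def qsort_partitioned(values):
    """Quicksort via a three-way split around the middle element (not in place)."""
    if len(values) < 2:
        return list(values)
    pivot = values[len(values) // 2]
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]
    return qsort_partitioned(smaller) + equal + qsort_partitioned(larger)

random.seed(123)
data = [random.random() for _ in range(5000)]
assert qsort_partitioned(data) == sorted(data)
```

Keeping the elements equal to the pivot in their own group guarantees that each recursive call works on a strictly smaller input.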

translating matlab script to R

I've just been working through converting some MATLAB scripts to work in R; however, having never used MATLAB in my life, and not exactly being an expert in R, I'm having some trouble.
Edit: It's a script I was given, designed to correct temperature measurements for lag generated by insulation mass effects. My understanding is that it looks at the rate of change of the temperature and attempts to adjust for errors generated by the response time of the sensor. Unfortunately there is no literature available to me that indicates what numbers I should expect from the function, and the only way to find out will be to test it experimentally at a later date.
the original script:
function [Tc, dT] = CTD_TempTimelagCorrection(T0,Tau,t)
N1 = Tau/t;
Tc = T0;
N = 3;
for j=ceil(N/2):numel(T0)-ceil(N/2)
A = nan(N,1);
# Compute weights
for k=1:N
A(k) = (1/N) + N1 * ((12*k - (6*(N+1))) / (N*(N^2 - 1)));
end
A = A./sum(A);
# Verify unity
if sum(A) ~= 1
disp('Error: Sum of weights is not unity');
end
Comp = nan(N,1);
# Compute components
for k=1:N
Comp(k) = A(k)*T0(j - (ceil(N/2)) + k);
end
Tc(j) = sum(Comp);
dT = Tc - T0;
end
where I've managed to get to:
CTD_TempTimelagCorrection <- function(temp,Tau,t){
## Define which equation to use based on duration of lag and frequency
## With ESM2 profiler sampling # 2hz: N1>tau/t = TRUE
N1 = Tau/t
Tc = temp
N = 3
for(i in ceiling(N/2):length(temp)-ceiling(N/2)){
A = matrix(nrow=N,ncol=1)
# Compute weights
for(k in 1:N){
A[k] = (1/N) + N1 * ((12*k - (6*(N+1))) / (N*(N^2 - 1)))
}
A = A/sum(A)
# Verify unity
if(sum(A) != 1){
print("Error: Sum of weights is not unity")
}
Comp = matrix(nrow=N,ncol=1)
# Compute components
for(k in 1:N){
Comp[k] = A[k]*temp[i - (ceiling(N/2)) + k]
}
Tc[i] = sum(Comp)
dT = Tc - temp
}
return(dT)
}
I think the problem is the Comp[k] line, could someone point out what I've done wrong? I'm not sure I can select the elements of the array in such a way.
by the way, Tau = 1, t = 0.5 and temp (or T0) will be a vector.
Thanks
edit: apparently my description is too brief in explaining my code samples, not really sure what more I could write that would be relevant and not just wasting peoples time. Is this enough Mr Filter?
The error is as follows:
Error in Comp[k] = A[k] * temp[i - (ceiling(N/2)) + k] :
replacement has length zero
In addition: Warning message:
In Comp[k] = A[k] * temp[i - (ceiling(N/2)) + k] :
number of items to replace is not a multiple of replacement length
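If you write print(i - (ceiling(N/2)) + k) before that line, you will see that you are using incorrect indices for temp[i - (ceiling(N/2)) + k]: the first values are -1 and 0, and temp[0] has length zero, so there is nothing to insert into Comp[k]. The cause is operator precedence, not a difference in index base (MATLAB indices also start at 1). In R, : binds more tightly than binary -, so ceiling(N/2):length(temp)-ceiling(N/2) is evaluated as (ceiling(N/2):length(temp)) - ceiling(N/2) and i starts at 0, whereas in MATLAB the colon has lower precedence than subtraction. Negative indices also behave differently in R: they drop elements instead of raising an error. Writing the loop range as ceiling(N/2):(length(temp)-ceiling(N/2)) gives the same indices as the MATLAB script.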
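To make the index bookkeeping explicit, here is a direct port of the script's logic to Python, where the 1-based to 0-based shift has to be written out anyway. It is a sketch under the assumption that the MATLAB loop bounds are what was intended. The function and parameter names are made up, and nothing here validates the correction formula itself.

```python
import math

def timelag_correction(temps, tau, dt, n_points=3):
    """Return corrected minus original temperature per sample (0.0 at the untouched edges)."""
    n1 = tau / dt
    half = math.ceil(n_points / 2)
    # The weights do not depend on the sample, so compute them once (k is 1-based as in the script).
    weights = [
        1 / n_points + n1 * (12 * k - 6 * (n_points + 1)) / (n_points * (n_points ** 2 - 1))
        for k in range(1, n_points + 1)
    ]
    weight_sum = sum(weights)
    weights = [w / weight_sum for w in weights]

    corrected = list(temps)
    # MATLAB's j = half .. numel - half (1-based) becomes half-1 .. len-half-1 (0-based).
    for j in range(half - 1, len(temps) - half):
        # The 1-based window index j - half + k maps to 0-based j - half + k for k = 1..n_points.
        corrected[j] = sum(w * temps[j - half + k] for k, w in enumerate(weights, start=1))
    return [c - t for c, t in zip(corrected, temps)]

# A linear ramp sampled every 0.5 s with tau = 1 s (the values given in the question)
print(timelag_correction([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], tau=1.0, dt=0.5))
```

For a ramp rising by 1 per sample, the interior corrections come out as tau times the rate of change (about 2.0 here), which is a convenient sanity check given that no reference values are available.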
