Evaluate polynomial function - R

I'm trying to evaluate this polynomial:
Using two different approaches: one evaluates it directly and the other uses Horner's rule. My code:
Direct way:
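directpoly1 <- function(x, coef, seqcoef = seq_along(coef) - 1) {
  # evaluate the polynomial at a single x: sum of coef[k] * x^(k - 1)
  sum(coef * x^seqcoef)
}
directpoly <- function(x, coef) {
  seqcoef <- seq_along(coef) - 1
  # apply the single-point evaluation to every element of x
  sapply(x, directpoly1, coef, seqcoef)
}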
Horner's rule:
hornerpoly <- function(x, coef) {
  n <- length(coef)
  # one accumulator per element of x, so the function is vectorised over x
  a <- rep(coef[n], length(x))
  # stop at n == 1 so that coef[0] is never touched
  while (n > 1) {
    n <- n - 1
    a <- coef[n] + a * x
  }
  a
}
I need to compare the speed of the two approaches, but I can't figure out how to do that. My initial approach is:
system.time(directpoly(x=seq(-10,10, length=5000000), c(1:39)))
system.time(hornerpoly(x=seq(-10,10, length=5000000), c(1:39)))
Any suggestions?

ptm <- proc.time()
#.... your function ...
proc.time() - ptm
Do this for the second function as well and compare the times.
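A minimal sketch of that comparison, assuming small vectorised stand-ins for the two evaluators (direct_vec and horner_vec are hypothetical names, not the functions from the question). It first checks that both give the same values, then times them with system.time() and with the proc.time() pattern shown above. The input size is kept small on purpose; timings will differ from machine to machine.

```r
# Hypothetical stand-in: direct evaluation, one sum per element of x
direct_vec <- function(x, coef) {
  pow <- seq_along(coef) - 1
  vapply(x, function(xi) sum(coef * xi^pow), numeric(1))
}

# Hypothetical stand-in: Horner's rule, vectorised over x
horner_vec <- function(x, coef) {
  acc <- rep(coef[length(coef)], length(x))
  for (k in rev(seq_len(length(coef) - 1))) {
    acc <- coef[k] + acc * x
  }
  acc
}

x <- seq(-1, 1, length.out = 2000)
coef <- 1:10

# Both approaches should agree before their speed is compared
stopifnot(isTRUE(all.equal(direct_vec(x, coef), horner_vec(x, coef))))

# Option 1: system.time() around each call
t_direct <- system.time(direct_vec(x, coef))
t_horner <- system.time(horner_vec(x, coef))
print(rbind(direct = t_direct[1:3], horner = t_horner[1:3]))

# Option 2: the proc.time() pattern
ptm <- proc.time()
invisible(horner_vec(x, coef))
print(proc.time() - ptm)
```

For very fast calls a single run is noisy, so repeating each call several times and comparing the totals is usually more informative.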

Related

Extracting repetition index from apply / map loop

I'm trying, wherever possible, to replace my for loops with apply / map functions.
However, I am stuck when I need to use the loop index as a position. This is easy to do with a for loop.
Take the following code, where I use the index i on both the left-hand and right-hand sides of the assignment:
score <- function(x) {
  n <- length(x)
  right <- x
  for (i in 1:n) {
    right[i] <- (n - x[i] + 1) / (i * n)
  }
  (1 / n) * sum(right)
}
score(c(2,1,3))
How do I rewrite the above using map or apply functions?
You could use this:
x = c(2,1,3)
n = length(x)
(1/n) * sum(sapply(1:n, function(i) (n - x[i] + 1)/(i*n) ))
We could vectorize this:
v1 <- c(2, 1, 3)
n <- length(v1)
(1/n) *sum((n - v1 + 1)/(seq_along(v1) * n))
#[1] 0.4259259
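As a rough sketch, the three variants can be wrapped as functions and checked against each other. The names score_loop, score_map and score_vec are made up for this illustration; score_map uses mapply() to pass each element together with its index, which is one common way to get at the position without an explicit loop.

```r
# Loop version, as in the question
score_loop <- function(x) {
  n <- length(x)
  right <- x
  for (i in seq_along(x)) {
    right[i] <- (n - x[i] + 1) / (i * n)
  }
  sum(right) / n
}

# mapply() walks over the values and their positions in parallel
score_map <- function(x) {
  n <- length(x)
  terms <- mapply(function(xi, i) (n - xi + 1) / (i * n), x, seq_along(x))
  sum(terms) / n
}

# Fully vectorised: seq_along(x) plays the role of the loop index
score_vec <- function(x) {
  n <- length(x)
  sum((n - x + 1) / (seq_along(x) * n)) / n
}

v <- c(2, 1, 3)
print(c(loop = score_loop(v), map = score_map(v), vec = score_vec(v)))
```

When the body is plain arithmetic like this, the vectorised form is usually the simplest choice; the mapply() form is more useful when the per-element work cannot be expressed as vector arithmetic.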

Sampling a log-concave distribution using the adaptive rejection sampling method (R)

I am not very familiar with R. I have been trying to use the implementation of the adaptive rejection sampling method in R, in order to sample from the following distribution:
Here is my R code:
library(ars)
g1 <- function(x,r){(1./r)*((1-x)^r)}
f1 <- function(x,a,k) {
add<-0
for(i in 1:k) {
add<- add+g1(x,i)
}
res <- (a* add)+(a-1)*log(x)+k*log(1-x)
return(res)
}
g2 <- function(x,r){(1-x)^(r-1)}
f1prima <- function(x,a,k) {
add<-0
for(i in 1:k) {
add<- add-g2(x,i)
}
res <- (a* add)+(a-1)/x-k/(1-x)
return(res)
}
mysample1<-ars(20,f1,f1prima,x=c(0.001,0.09),m=2,emax=128,lb=TRUE,xlb=0.0, ub=TRUE, xub=1,a=0.5,k=100)
The function is log-concave, but I get various error messages when I run ars, and fiddling with the input parameters doesn't help. Any suggestions would be appreciated.
First, as you already noticed, your log-concave function is not well defined at x = 0 and x = 1. So a useful interval would be something like 0.01 to 0.99, not 0 to 1.
Second, I don't like the idea of computing hundreds of terms in your summation.
So a good idea might be to express it in the following way, starting with the derivative.
$\sum_{i=0}^{N-1} q^i$ is a geometric series and can be replaced with
$(1-q^N)/(1-q)$, where $q = 1-x$.
This is the derivative, so to get the corresponding term in the function itself, just integrate it.
http://www.wolframalpha.com/input/?i=integrate+(1-q%5EN)%2F(1-q)+dq returns the Gauss hypergeometric function ${}_2F_1$ plus a logarithm:
$-\frac{q^{N+1}}{N+1}\,{}_2F_1(1, N+1; N+2; q) - \log(1-q)$
NB: It is the same integral as Beta before, but dealing with it was a bit more cumbersome
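A quick numerical check of the two identities above, using base R only. This is a sketch under assumptions: N = 20 and q0 = 0.8 are arbitrary illustration values (not the N = 100 used later), and integrate() stands in for the closed-form antiderivative.

```r
N  <- 20    # illustration value only
q0 <- 0.8   # illustration value only

# Geometric series: sum_{i=0}^{N-1} q^i == (1 - q^N) / (1 - q)
geom_direct <- sum(q0^(0:(N - 1)))
geom_closed <- (1 - q0^N) / (1 - q0)

# Integrating the closed form from 0 to q0 should reproduce sum_{i=1}^{N} q0^i / i
series   <- sum(q0^(1:N) / (1:N))
integral <- integrate(function(q) (1 - q^N) / (1 - q), lower = 0, upper = q0)$value

print(all.equal(geom_direct, geom_closed))
print(all.equal(series, integral, tolerance = 1e-8))
```

Both comparisons should agree up to floating-point and quadrature error, which is what justifies replacing the term-by-term sum with a closed form.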
So, code to compute those terms:
library(gsl)
library(ars)
library(ggplot2)
Gauss2F1 <- function(a, b, c, x) {
ifelse(x >= 0.0 & x < 1.0, hyperg_2F1(a, b, c, x), hyperg_2F1(c - a, b, c, 1.0 - 1.0/(1.0 - x))/(1.0 - x)^b)
}
f1sum <- function(x, N) {
q <- 1.0 - x
- q^(N+1) * Gauss2F1(1, N+1, N+2, q)/(N+1) - log(1.0 - q)
}
f1sum.1 <- function(x, N) {
  q <- 1.0 - x
  res <- rep(0.0, length.out = length(x))
  s <- rep(1.0, length.out = length(x))
  for(k in 1:N) {
    s <- s * q                      # running power q^k
    res <- res + s / as.numeric(k)  # add the term q^k / k
  }
  res
}
f1 <- function(x, a, N) {
a * f1sum(x, N) + (a - 1.0)*log(x) + N*log(1.0 - x)
}
f1.1 <- function(x, a, N) {
a * f1sum.1(x, N) + (a - 1.0)*log(x) + N*log(1.0 - x)
}
f1primesum <- function(x, N) {
  q <- 1.0 - x
  # derivative with respect to x (q = 1 - x), hence the minus sign
  -(1.0 - q^N)/(1.0 - q)
}
f1primesum.1 <- function(x, N) {
  q <- 1.0 - x  # q was used below without being defined
  res <- rep(0.0, length.out = length(x))
  s <- rep(1.0, length.out = length(x))
  for(k in 1:N) {
    res <- res + s
    s <- s * q
  }
  -res
}
f1prime <- function(x, a, N) {
a* f1primesum(x, N) + (a - 1.0)/x - N/(1.0 - x)
}
f1prime.1 <- function(x, a, N) {
a* f1primesum.1(x, N) + (a - 1.0)/x - N/(1.0 - x)
}
p <- ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
stat_function(fun = f1, args = list(0.5, 100), colour = "#4271AE") +
stat_function(fun = f1.1, args = list(0.5, 100), colour = "#1F3552") +
scale_x_continuous(name = "X", breaks = seq(0, 1, 0.2), limits=c(0.001, 0.5)) +
scale_y_continuous(name = "F") +
ggtitle("Log-concave function")
p
As you can see, I've implemented both versions: one using direct summation and another using the analytical form of the sum. Data are computed for a = 0.5, N = 100.
First, there is a small difference between the direct sum and the 2F1 form; I attribute it to precision loss in the summation.
Second, and more important: the function is NOT log-concave. No wonder ars() is failing left and right. See the graph below.
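The curvature claim can also be checked numerically without gsl or ggplot2. The sketch below is an assumption-laden stand-in, not the code above: f1_direct is a hypothetical helper that sums q^k / k term by term, and second differences on a uniform grid are used as a proxy for the sign of the second derivative (a = 0.5, N = 100 as in the plot).

```r
# Hypothetical helper: log-density with the series summed directly
f1_direct <- function(x, a, N) {
  q <- 1 - x
  k <- seq_len(N)
  series <- vapply(q, function(qi) sum(qi^k / k), numeric(1))
  a * series + (a - 1) * log(x) + N * log(1 - x)
}

x <- seq(0.005, 0.5, by = 0.005)
y <- f1_direct(x, a = 0.5, N = 100)

# On a uniform grid, second differences have the sign of the second derivative
d2 <- diff(y, differences = 2)
x_mid <- x[-c(1, length(x))]

# A concave function would have no positive second differences
print(any(d2 > 0))
print(range(x_mid[d2 > 0]))
```

If any second difference is positive, the function is convex somewhere on the grid, so it cannot be concave on the whole interval and ars() has no valid envelope to build there.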

R: function choose, efficient

I need to speed up my R-code. My bottleneck is a function that needs to use the choose function. It looks like this:
P_ni <- function(Pn,Pi,eta1,eta2,p,d=NA)
{
if(is.na(d)) d <- 1-p
if(Pn==Pi) output <- p^Pn
else
{
if(Pi==1)seq1 <- seq_len(Pn-1)
if(Pi>1)seq1 <- seq_len(Pn-1)[-seq_len(Pi-1)]
output <- sum(choose((Pn-Pi-1),c(seq1-Pi))*choose(Pn,seq1)*
(eta1/(eta1+eta2))^c(seq1-Pi)*
(eta2/(eta1+eta2))^c(Pn-seq1)*p^seq1*d^c(Pn-seq1)
)
}
return(output)
}
This function needs to be called several times with different Pn and Pi. The problem is that Pn and Pi can each take only a single number and do not work with vectors. This is caused by the choose() function.
I do this with a for-loop at the moment and it works perfectly, but it is slow.
The for-loop looks like this:
for(i in 1:nrow(n_k_matrix_p))
{
n_k_matrix_p[i,4] <- P_ni(n_k_matrix_p[i,1],n_k_matrix_p[i,2],eta1,eta2,p)
}
To make it reproducible:
eta1 <- 10
eta2 <- 5
p <- 0.4
n_k_matrix <- expand.grid(c(1:20),c(1:20))
n_k_matrix <- n_k_matrix[n_k_matrix[,1] >=n_k_matrix[,2],]
n_k_matrix <- n_k_matrix[order(n_k_matrix[,1]),]
The n_k_matrix contains my numbers for Pn and Pi.
Unfortunately the loop is still faster than using apply.
Does anyone have any idea how to speed things up?
You can regroup or precompute some computations.
P_ni2 <- function(n, eta1, eta2, p, d = 1 - p) {
res <- matrix(0, n, n)
diag(res) <- p^seq_len(n)
C1 <- eta1 / eta2 * p / d
C2 <- eta2 / (eta1 + eta2) * d
C3 <- eta1 / (eta1 + eta2)
C2_n <- C2^seq_len(n)
C3_n <- C3^seq_len(n)
precomputed <- outer(0:n, 0:n, choose)
for (j in seq_len(n)) {
for (i in seq_len(j - 1)) {
seq1 <- seq(i, j - 1)
res[i, j] <- sum(
precomputed[j-i, seq1-i+1] * precomputed[j+1, seq1+1] * C1^seq1
) * C2_n[j] / C3_n[i]
}
}
res
}
Verification:
> system.time({
+ n_k_matrix[[3]] <- sapply(1:nrow(n_k_matrix), function(i) {
+ P_ni(n_k_matrix[i,1], n_k_matrix[i,2], eta1, eta2, p)
+ })
+ })
user system elapsed
11.799 0.000 11.797
> system.time({
+ test <- P_ni2(400, eta1, eta2, p)
+ n_k_matrix[[4]] <- test[as.matrix(n_k_matrix[, 2:1])]
+ })
user system elapsed
2.328 0.003 2.341
> all.equal(n_k_matrix[[3]], n_k_matrix[[4]])
[1] TRUE
Note that I first store the results in the upper triangle of a square matrix. Then I convert it to your data frame format (which you call a matrix, by the way).
This solution is 5 times faster for n = 400. I think you could improve it further by recoding only the double loop in Rcpp.
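The two indexing tricks used above can be seen in isolation in this small sketch (n = 6 and the example pairs are arbitrary illustration values). outer(0:n, 0:n, choose) builds a lookup table in which choose(a, b) sits at row a + 1, column b + 1, and a two-column matrix of (row, column) pairs pulls out many entries at once.

```r
n <- 6  # illustration value only
precomputed <- outer(0:n, 0:n, choose)

# Single lookup: choose(5, 2) lives at row 5 + 1, column 2 + 1
print(precomputed[5 + 1, 2 + 1])
print(choose(5, 2))

# Matrix indexing: each row of idx is one (row, column) pair
a <- c(4, 6, 3)
b <- c(2, 3, 0)
idx <- cbind(a + 1, b + 1)
print(precomputed[idx])
print(choose(a, b))
```

The same matrix-indexing idea is what test[as.matrix(n_k_matrix[, 2:1])] relies on to read all (Pi, Pn) results out of the result matrix in one step.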

Unused arguments within a function in R

Below is the code I have. It works for primitive functions, such as sin. However, when using a function called gllik, it returns an error in f(y0): unused argument (y0). I'm not sure how to correct this.
newton_search2 <- function(f, h, guess, conv=0.001) {
y0 <- guess
N = 100
i <- 1; y1 <- y0
p <- numeric(N)
while (i <= N) {
make_derivative <- function(f, h) {
(f(y0 + h) - f(y0 - h)) / (2*h)
}
y1 <- (y0 - (f(y0)/make_derivative(f, h)))
p[i] <- y1
i <- i + 1
if (abs(y1 - y0) < conv) break
y0 <- y1
}
return (p[(i-1)])
}
The gllik function is as follows:
x <- rgamma(n=30, shape=4.5)
gllik <- function() {
s <- sum(log(x))
n <- length(x)
function(a) {
(a - 1) * s - n * lgamma(a)
}
}
The code I used was:
newton_search2(gllik, 0.001, mean(x), conv = 0.001)
I'm not sure how to fix the error or get the correct answer, which is supposed to be 4.5 (the maximum likelihood estimate of a).
The problem is that gllik does not take any arguments. Furthermore, it returns a function and not a value.
Perhaps what you want to do is the following?
gllik <- function(a) {
s <- sum(log(x))
n <- length(x)
return((a - 1) * s - n * lgamma(a))
}
EDIT: An alternative solution is to just use the returned function. While this type of construction is often elegant, it does seem like overkill in this case:
newton_search2(gllik(), 0.001, mean(x), conv = 0.001)
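To make the difference between gllik and gllik() concrete, here is a small sketch. The fixed seed is an assumption added for reproducibility, and optimize() is swapped in purely as an independent cross-check of where the log-likelihood peaks; it is not the Newton search from the question.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
x <- rgamma(n = 30, shape = 4.5)

# Function factory, as in the question: takes no arguments, returns a function
gllik <- function() {
  s <- sum(log(x))
  n <- length(x)
  function(a) (a - 1) * s - n * lgamma(a)
}

ll <- gllik()               # the returned closure is the actual log-likelihood
print(is.function(ll))
print(is.numeric(ll(4.5)))

# Calling the factory itself with an argument reproduces the reported error
res <- try(gllik(4.5), silent = TRUE)
print(inherits(res, "try-error"))

# Cross-check: maximise the closure directly over an assumed bracket
est <- optimize(ll, interval = c(0.1, 20), maximum = TRUE)$maximum
print(est)
```

With only 30 observations the estimate will land near, but not exactly at, the true shape of 4.5.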

Parallelize an R Script

The problem with my R script is that it takes too much time and the main solution that I consider is to parallelize it. I don't know where to start.
My code looks like this:
n<- nrow (aa)
output <- matrix (0, n, n)
akl<- function (dii){
ddi<- as.matrix (dii)
m<- rowMeans(ddi)
M<- mean(ddi)
r<- sweep (ddi, 1, m)
b<- sweep (r, 2, m)
return (b + M)
}
for (i in 1:n)
{
A<- akl(dist(aa[i,]))
dVarX <- sqrt(mean (A * A))
for (j in i:n)
{
B<- akl(dist(aa[j,]))
V <- sqrt (dVarX * (sqrt(mean(B * B))))
output[i,j] <- (sqrt(mean(A * B))) / V
}
}
I would like to parallelize on different cpus. How can I do that?
I saw the SNOW package, is it suitable for my purpose?
Thank you for suggestions,
Gab
There are two ways in which your code could be made to run faster that I could think of:
First: As @Dwin was saying (with a small twist), you could precompute akl (yes, not necessarily dist, but the whole of akl).
# a random square matrix
aa <- matrix(runif(100), ncol=10)
n <- nrow(aa)
output <- matrix (0, n, n)
akl <- function(dii) {
ddi <- as.matrix(dii)
m <- rowMeans(ddi)
M <- mean(m) # mean(ddi) == mean(m)
r <- sweep(ddi, 1, m)
b <- sweep(r, 2, m)
return(b + M)
}
# precompute akl here
require(plyr)
akl.list <- llply(1:nrow(aa), function(i) {
akl(dist(aa[i, ]))
})
# Now, apply your function, but index the list instead of computing every time
for (i in 1:n) {
A <- akl.list[[i]]
dVarX <- sqrt(mean(A * A))
for (j in i:n) {
B <- akl.list[[j]]
V <- sqrt (dVarX * (sqrt(mean(B * B))))
output[i,j] <- (sqrt(mean(A * B))) / V
}
}
This should already make your code run faster than before on larger matrices, since the original recomputes akl every time in the inner loop.
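A small self-contained check that precomputing does not change the result, written with base R only (lapply() in place of plyr). The wrapper names slow_version and fast_version, the seed and the 6 x 6 input are assumptions made for this sketch.

```r
akl <- function(dii) {
  ddi <- as.matrix(dii)
  m <- rowMeans(ddi)
  M <- mean(ddi)
  b <- sweep(sweep(ddi, 1, m), 2, m)
  b + M
}

# Original scheme: akl is recomputed inside the inner loop
slow_version <- function(aa) {
  n <- nrow(aa)
  output <- matrix(0, n, n)
  for (i in 1:n) {
    A <- akl(dist(aa[i, ]))
    dVarX <- sqrt(mean(A * A))
    for (j in i:n) {
      B <- akl(dist(aa[j, ]))
      output[i, j] <- sqrt(mean(A * B)) / sqrt(dVarX * sqrt(mean(B * B)))
    }
  }
  output
}

# Precomputed scheme: akl is evaluated once per row and then looked up
fast_version <- function(aa) {
  n <- nrow(aa)
  akl.list <- lapply(seq_len(n), function(i) akl(dist(aa[i, ])))
  output <- matrix(0, n, n)
  for (i in 1:n) {
    A <- akl.list[[i]]
    dVarX <- sqrt(mean(A * A))
    for (j in i:n) {
      B <- akl.list[[j]]
      output[i, j] <- sqrt(mean(A * B)) / sqrt(dVarX * sqrt(mean(B * B)))
    }
  }
  output
}

set.seed(42)  # assumption: fixed seed for a reproducible input
aa <- matrix(runif(36), ncol = 6)
print(all.equal(slow_version(aa), fast_version(aa)))
```

The original calls akl on the order of n^2 times, the precomputed version only n times, which is where the saving comes from.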
Second: In addition to that, you can get it faster by parallelising as follows:
# now, the parallelisation you require can be achieved as follows
# with the help of `plyr` and `doMC`.
# First step of parallelisation is to compute akl in parallel
require(plyr)
require(doMC)
registerDoMC(10) # 10 Cores/CPUs
akl.list <- llply(1:nrow(aa), function(i) {
akl(dist(aa[i, ]))
}, .parallel = TRUE)
# then, you could write your for-loop using plyr again as follows
output <- laply(1:n, function(i) {
A <- akl.list[[i]]
dVarX <- sqrt(mean(A * A))
t <- laply(i:n, function(j) {
B <- akl.list[[j]]
V <- sqrt(dVarX * (sqrt(mean(B*B))))
sqrt(mean(A * B))/V
})
c(rep(0, n-length(t)), t)
}, .parallel = TRUE)
Note that I have added .parallel = TRUE only on the outer loop. This is because you assign 10 processors to the outer loop. If you add it to both the outer and inner loops, the total number of processes requested will be 10 * 10 = 100. Please take care of this.
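For comparison, the same outer-loop-only parallelisation can be sketched with the parallel package that ships with R, instead of plyr and doMC. This is an alternative under assumptions, not the method above: mclapply() relies on forking, so the sketch falls back to one core on Windows, and the choice of two cores, the seed and the 8 x 8 input are arbitrary.

```r
library(parallel)

akl <- function(dii) {
  ddi <- as.matrix(dii)
  m <- rowMeans(ddi)
  M <- mean(ddi)
  b <- sweep(sweep(ddi, 1, m), 2, m)
  b + M
}

set.seed(7)  # assumption: fixed seed for a reproducible input
aa <- matrix(runif(64), ncol = 8)
n <- nrow(aa)

# Assumption: two workers where forking is available, one otherwise
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Step 1: precompute akl for every row, in parallel
akl.list <- mclapply(seq_len(n), function(i) akl(dist(aa[i, ])),
                     mc.cores = n_cores)

# Step 2: parallelise the outer loop only; the inner loop stays sequential
rows <- mclapply(seq_len(n), function(i) {
  A <- akl.list[[i]]
  dVarX <- sqrt(mean(A * A))
  vals <- vapply(i:n, function(j) {
    B <- akl.list[[j]]
    sqrt(mean(A * B)) / sqrt(dVarX * sqrt(mean(B * B)))
  }, numeric(1))
  c(rep(0, i - 1), vals)  # pad with zeros below the diagonal
}, mc.cores = n_cores)

output <- do.call(rbind, rows)
print(dim(output))
```

Keeping the inner loop sequential avoids the multiplication of worker counts described above.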
