I'm trying, wherever possible, to replace my for loops with apply / map functions.
However, I am stuck when I need to use the loop index as a position, which is easy to do with a for loop.
Take the following code, where I use the index i on both the left-hand and right-hand sides of the assignment:
score <- function(x) {
n <- length(x)
right <- x
for(i in 1:n) {
right[i] <- (n - x[i] + 1) / (i * n)
}
(1 / n) * sum(right)
}
score(c(2,1,3))
How do I rewrite the above using map or apply functions?
You could use this:
x = c(2,1,3)
n = length(x)
(1/n) * sum(sapply(1:n, function(i) (n - x[i] + 1)/(i*n) ))
We could vectorize this, since the arithmetic already works element-wise on the whole vector:
v1 <- c(2, 1, 3)
n <- length(v1)
(1/n) *sum((n - v1 + 1)/(seq_along(v1) * n))
#[1] 0.4259259
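If you specifically want an apply-style call that receives both the element and its index, one option is base R's mapply, which walks over the values and seq_along(x) in parallel. The sketch below is only an illustration; the function names score_mapply and score_vec are made up here and are not from the posts above.

```r
# Hypothetical helper: pass the value and its position together via mapply()
score_mapply <- function(x) {
  n <- length(x)
  terms <- mapply(function(xi, i) (n - xi + 1) / (i * n), x, seq_along(x))
  sum(terms) / n
}

# Hypothetical helper: the fully vectorized form, wrapped as a function
score_vec <- function(x) {
  n <- length(x)
  sum((n - x + 1) / (seq_along(x) * n)) / n
}

x <- c(2, 1, 3)
print(c(mapply = score_mapply(x), vectorized = score_vec(x)))
```

Both forms should agree with the original loop; the vectorized one is usually the simplest to read and the fastest.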
I have a vector named y with n numeric elements and an n*n matrix named w whose elements are numbers. I want to apply the formula below to this data:
I have written the following code in R using the functions colSums and sum:
dy<-y-mean(y)
n*(y-mean(y))*colSums(w*dy)/sum(dy^2)#=local[,1]
Now I want to rewrite the code above with for loops, without using colSums and sum.
Thank you in advance for your help.
I don't know why you want to use a for loop, since it has many disadvantages in this case, but here we go.
First we calculate the denominator:
dy_square_sum <- 0
for (i in seq_along(y)) {
dy_square_sum <- dy_square_sum + dy[i]^2
}
In the next step, we calculate the numerator and build up your desired output:
weighted_sum <- rep(0,n)
output <- rep(0, n)
for (i in seq_along(y)) {
for (j in seq_along(y)) {
weighted_sum[i] <- weighted_sum[i] + w[j,i] * dy[j]
}
output[i] <- n * dy[i] / dy_square_sum * weighted_sum[i]
}
Or, a little simpler but less efficient:
output <- rep(0, n)
for (i in seq_along(y)) {
for (j in seq_along(y)) {
output[i] <- output[i] + n * dy[i] * w[j,i] * dy[j] / dy_square_sum
}
}
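To convince yourself that the loops reproduce the colSums / sum version, you can run both on a small example and compare. The data below (y, w, and the seed) are made up purely for this check.

```r
set.seed(1)
n <- 5
y <- rnorm(n)                       # made-up example vector
w <- matrix(runif(n * n), n, n)     # made-up example weight matrix
dy <- y - mean(y)

# Vectorized version from the question
vectorised <- n * dy * colSums(w * dy) / sum(dy^2)

# Loop version: denominator first, then one weighted sum per column i
dy_square_sum <- 0
for (i in seq_along(y)) {
  dy_square_sum <- dy_square_sum + dy[i]^2
}
looped <- numeric(n)
for (i in seq_along(y)) {
  weighted_sum <- 0
  for (j in seq_along(y)) {
    weighted_sum <- weighted_sum + w[j, i] * dy[j]
  }
  looped[i] <- n * dy[i] / dy_square_sum * weighted_sum
}

print(all.equal(looped, vectorised))
```

The key detail is that w * dy recycles dy down each column, so element [j, i] is w[j, i] * dy[j], which is exactly what the inner loop accumulates.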
I'm trying to evaluate this polynomial:
using two different approaches: one evaluates it directly and the other uses Horner's rule. My code:
Direct way:
directpoly1 <- function(x, coef, seqcoef = seq(coef) - 1) {
sum(coef*x^seqcoef)
}
directpoly <- function(x, coef) {
seqcoef <- seq(coef) - 1
sapply(x, directpoly1, coef, seqcoef)
}
Horner's rule:
hornerpoly <- function(x, coef) {
n <- length(coef);
a <- rep(0, n);
a[n] <- coef[n];
while (n > 0) {
n <- n - 1;
a[n] <- coef[n] + a[n + 1] * x;
}
return(a[1]);
}
I need to compare the speed of the two different approaches, but I can't figure out how to do that. My initial approach is:
system.time(directpoly(x=seq(-10,10, length=5000000), c(1:39)))
system.time(hornerpoly(x=seq(-10,10, length=5000000), c(1:39)))
Any suggestions?
ptm <- proc.time()
#.... your function ...
proc.time() - ptm
Do this for the second function as well and compare the times.
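As a rough sketch of such a comparison, the block below times a direct evaluation against a Horner evaluation with system.time. Note that horner_vec and direct_vec are hypothetical helpers written for this illustration (horner_vec is vectorized over x, unlike the hornerpoly in the question), and the input sizes and repeat count are arbitrary assumptions.

```r
# Hypothetical Horner evaluation, vectorized over x (not the asker's function)
horner_vec <- function(x, coef) {
  result <- rep(coef[length(coef)], length(x))
  for (k in rev(seq_along(coef))[-1]) {
    result <- coef[k] + result * x   # fold in the next lower coefficient
  }
  result
}

# Hypothetical direct evaluation: sum of coef[k] * x^(k - 1) for each x
direct_vec <- function(x, coef) {
  powers <- seq_along(coef) - 1
  vapply(x, function(xi) sum(coef * xi^powers), numeric(1))
}

x <- seq(-1, 1, length.out = 1000)   # arbitrary test points
coef <- 1:10                         # arbitrary coefficients

# Check that both give the same values before timing them
stopifnot(isTRUE(all.equal(horner_vec(x, coef), direct_vec(x, coef))))

# Repeat each call so the elapsed time is large enough to measure
t_direct <- system.time(for (r in 1:20) direct_vec(x, coef))
t_horner <- system.time(for (r in 1:20) horner_vec(x, coef))
print(c(direct = t_direct[["elapsed"]], horner = t_horner[["elapsed"]]))
```

Timings from a single run are noisy, so repeating the call and checking that both functions return the same values first makes the comparison more meaningful.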
I have an exercise that compares the efficiency of loop functions.
I have the function
banana <- function(x)
{d <- length(x)
xi <- x[1:(d-1)]
xnext <- x[2:d]
sum <- sum(100*(xnext-xi^2)^2 + (xi-1)^2)
y <- sum
return(y)
}
I want to re-write the above using a for loop (or any loop). I have so far
for (i in x){
n = length(x)
y <- 100*(x[i+1]-x[i]^2)^2 +(x[i]-1)^2
}
I want the function to stop at n-1 and am having difficulty knowing where to add the break. Can someone help?
Thanks in advance,
Sean
You don't really have to add a break statement, you can just loop over all but the nth i.
[-length(x)] removes the last element from the sequence.
y <- 0
for (i in seq_along(x)[-length(x)]) {
y <- y + 100 * (x[i + 1] - x[i]^2)^2 + (x[i] - 1)^2
}
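A quick way to check a loop rewrite like this is to wrap it in a function and compare it with the vectorized original on a small input. In the sketch below, banana_loop and the test vector are made up for illustration.

```r
# Vectorized version from the question
banana <- function(x) {
  d <- length(x)
  xi <- x[1:(d - 1)]
  xnext <- x[2:d]
  sum(100 * (xnext - xi^2)^2 + (xi - 1)^2)
}

# Hypothetical loop version: iterate over every index except the last
banana_loop <- function(x) {
  y <- 0
  for (i in seq_along(x)[-length(x)]) {
    y <- y + 100 * (x[i + 1] - x[i]^2)^2 + (x[i] - 1)^2
  }
  y
}

x <- c(1.2, 0.8, 1.5, 2.0)   # arbitrary test input
print(c(vectorized = banana(x), loop = banana_loop(x)))
```

Note that the term being squared is x[i + 1] - x[i]^2, so only x[i] is squared inside the parentheses.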
I'm trying to write code to efficiently solve for all values in a matrix based on an optimization. I'm trying to find the value of x that minimizes the equation:
(x - (1 / ((1 / x) - sqrt(1 / x))))^2
I wrote some code that accomplishes this task, but it isn't pretty (nor is it fast).
mSS <- function(x)
{
#Sum of squares for X and the transformation
(x - (1 / ((1 / test_mat[rows, cols]) - sqrt(1 / x))))^2
}
n = 151
m = 50000
test_mat = matrix(rnorm(n * m, mean = 0, sd = 1), n, m)
trans_mat = matrix(data = NA, n, m)
#Go through each row/col position in array, find value that minimizes mSS function
for (cols in 1:ncol(test_mat)) {
for (rows in 1:nrow(test_mat)) {
trans_mat[rows, cols] = optimize(mSS, c(0, 3))$minimum
}
}
I'm mentally stuck trying to figure out the best approach for making this go faster. I was thinking maybe using apply with some custom functions might be the route, but I'm having difficulty figuring out a workable solution. Any pointers in the right direction would be appreciated.
Try this:
mSS<-function(x, a)
{
#Sum of squares for X and the transformation
(x-(1/((1/a)-sqrt(1/x))))^2
}
y <- as.numeric(test_mat)
ty <- sapply(y, function(x) optimize(mSS,c(0,3),a=x)$minimum)
trans_mat <- matrix(ty, nrow=nrow(test_mat))
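Before running this on the full 151 x 50000 matrix, it may be worth confirming on a tiny matrix that the apply-style version matches the nested loops. The sketch below uses a made-up 3 x 4 matrix and a hypothetical wrapper, minimise_one; vapply is used in place of sapply only to fix the return type.

```r
mSS <- function(x, a) {
  # Squared difference between x and the transformation, for a fixed cell value a
  (x - (1 / ((1 / a) - sqrt(1 / x))))^2
}

# Hypothetical wrapper: minimize mSS over (0, 3) for one matrix cell
minimise_one <- function(a) optimize(mSS, c(0, 3), a = a)$minimum

set.seed(42)
small_mat <- matrix(rnorm(12), nrow = 3)   # made-up 3 x 4 example

# Nested-loop version, passing the cell value explicitly
trans_loop <- matrix(NA_real_, nrow(small_mat), ncol(small_mat))
for (cols in seq_len(ncol(small_mat))) {
  for (rows in seq_len(nrow(small_mat))) {
    trans_loop[rows, cols] <- minimise_one(small_mat[rows, cols])
  }
}

# Apply-style version: vapply walks the matrix in column-major order
trans_vapply <- matrix(vapply(small_mat, minimise_one, numeric(1)),
                       nrow = nrow(small_mat))

print(identical(trans_loop, trans_vapply))
```

Passing the cell value as an argument, rather than reading test_mat[rows, cols] from the global environment, is what makes the function usable with sapply or vapply. Most of the run time is still spent inside optimize, so the speed-up from dropping the explicit loops is likely to be modest.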
The problem with my R script is that it takes too much time, and the main solution I am considering is to parallelize it. I don't know where to start.
My code looks like this:
n<- nrow (aa)
output <- matrix (0, n, n)
akl<- function (dii){
ddi<- as.matrix (dii)
m<- rowMeans(ddi)
M<- mean(ddi)
r<- sweep (ddi, 1, m)
b<- sweep (r, 2, m)
return (b + M)
}
for (i in 1:n)
{
A<- akl(dist(aa[i,]))
dVarX <- sqrt(mean (A * A))
for (j in i:n)
{
B<- akl(dist(aa[j,]))
V <- sqrt (dVarX * (sqrt(mean(B * B))))
output[i,j] <- (sqrt(mean(A * B))) / V
}
}
I would like to parallelize on different cpus. How can I do that?
I saw the SNOW package, is it suitable for my purpose?
Thank you for suggestions,
Gab
There are two ways I can think of to make your code run faster:
First: as @Dwin was saying (with a small twist), you could precompute akl (yes, not necessarily dist, but the whole of akl).
# a random square matrix
aa <- matrix(runif(100), ncol=10)
n <- nrow(aa)
output <- matrix (0, n, n)
akl <- function(dii) {
ddi <- as.matrix(dii)
m <- rowMeans(ddi)
M <- mean(m) # mean(ddi) == mean(m)
r <- sweep(ddi, 1, m)
b <- sweep(r, 2, m)
return(b + M)
}
# precompute akl here
require(plyr)
akl.list <- llply(1:nrow(aa), function(i) {
akl(dist(aa[i, ]))
})
# Now, apply your function, but index the list instead of computing every time
for (i in 1:n) {
A <- akl.list[[i]]
dVarX <- sqrt(mean(A * A))
for (j in i:n) {
B <- akl.list[[j]]
V <- sqrt (dVarX * (sqrt(mean(B * B))))
output[i,j] <- (sqrt(mean(A * B))) / V
}
}
This should already make your code run faster than before on larger matrices, since the original recomputes akl every time in the inner loop.
Second: In addition to that, you can get it faster by parallelising as follows:
# now, the parallelisation you require can be achieved as follows
# with the help of `plyr` and `doMC`.
# First step of parallelisation is to compute akl in parallel
require(plyr)
require(doMC)
registerDoMC(10) # 10 Cores/CPUs
akl.list <- llply(1:nrow(aa), function(i) {
akl(dist(aa[i, ]))
}, .parallel = TRUE)
# then, you could write your for-loop using plyr again as follows
output <- laply(1:n, function(i) {
A <- akl.list[[i]]
dVarX <- sqrt(mean(A * A))
t <- laply(i:n, function(j) {
B <- akl.list[[j]]
V <- sqrt(dVarX * (sqrt(mean(B*B))))
sqrt(mean(A * B))/V
})
c(rep(0, n-length(t)), t)
}, .parallel = TRUE)
Note that I have added .parallel = TRUE only on the outer loop, because the 10 processors are assigned to the outer loop. If you add it to both the outer and inner loops, the total number of processors requested will be 10 * 10 = 100, so take care with this.
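If doMC is not an option on your platform (it relies on forking), a similar precompute step can be sketched with the parallel package that ships with R. This is only an illustration under assumptions: the worker count of 2 and the example data are made up, and aa and akl are passed to the workers explicitly rather than relying on the global environment.

```r
library(parallel)

akl <- function(dii) {
  ddi <- as.matrix(dii)
  m <- rowMeans(ddi)
  sweep(sweep(ddi, 1, m), 2, m) + mean(ddi)
}

set.seed(1)
aa <- matrix(runif(100), ncol = 10)   # made-up example data
n <- nrow(aa)

cl <- makeCluster(2)                  # assumed worker count; adjust to your machine
akl_list <- parLapply(
  cl, seq_len(n),
  function(i, aa, akl) akl(stats::dist(aa[i, ])),
  aa = aa, akl = akl                  # ship the data and the function to each worker
)
stopCluster(cl)

# Sequential reference, to confirm the parallel result is the same
akl_seq <- lapply(seq_len(n), function(i) akl(stats::dist(aa[i, ])))
print(isTRUE(all.equal(akl_list, akl_seq)))
```

For a matrix this small the cluster start-up cost will outweigh any gain; the parallel version only pays off when each akl call is expensive or n is large.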