Is there some way to make this loop faster in R?
V=array(NA, dim=c(nrow(pixDF), n))
for(i in 1:n)
{
sdC<-sqrt(det(Cov[,i,]))
iC<-inv(Cov[,i,])
V[,i]<-apply(pixDF,1,function(x)(sdC*exp(-0.5*((x-Mean[i,])%*%iC%*%as.matrix((x-Mean[i,]))))))
}
where, in this case, pixDF is a matrix with 490000 rows and 4 columns filled with doubles, and n = 5. Cov is a (4,5,4) array of doubles, and Mean is a (5,4) array of doubles as well.
Before editing, this loop took about 30 min on my computer.
Right now it takes 1 min.
As Ronak notes, it is hard to help without a reproducible example. But I think that apply could be avoided. Something like this could work:
V <- array(NA, dim = c(nrow(pixDF), n))
tpixDF <- t(pixDF)
for (i in 1:n) {
x <- Cov[, i, ]
sdC <- sqrt(det(x))
iC <- solve(x)
mi <- Mean[i, ]
k <- t(tpixDF - mi)
V[, i] <- sdC*exp(-0.5*rowSums(k %*% iC * k))
}
Also, as Roland mentions, inv is probably equivalent to solve.
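The rowSums(k %*% iC * k) step replaces the per-row quadratic form that apply was computing. Here is a minimal sketch that checks this on toy data; the sizes, the seed, and the names pix, mu and A are assumptions for illustration, not the question's data. Base R's mahalanobis() computes the same quantity and is included as a cross-check.

```r
# Toy data: 20 rows, 4 columns (much smaller than the question's 490000 rows)
set.seed(1)
pix <- matrix(rnorm(20 * 4), ncol = 4)
mu  <- rnorm(4)                           # one mean vector
A   <- crossprod(matrix(rnorm(16), 4))    # a symmetric positive-definite matrix
iC  <- solve(A)                           # its inverse

# Row-by-row quadratic form, as in the original apply() call
q_apply <- apply(pix, 1, function(x) (x - mu) %*% iC %*% (x - mu))

# Vectorized: centre every row at once, then a single matrix product
k <- sweep(pix, 2, mu)                    # subtract mu from each row
q_vec <- rowSums((k %*% iC) * k)

# Base R helper that returns the same squared distance
q_mah <- mahalanobis(pix, mu, A)

all.equal(q_apply, q_vec)
all.equal(q_vec, as.numeric(q_mah))
```

Both comparisons should agree up to floating-point tolerance, so the only difference between the approaches is speed.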
Related
Why does the computation of the following code in R take so much time? It takes many minutes, so I have interrupted the calculations.
My aim is to adapt my simulated random numbers (sumzv, dim(sumzv) = 1000000 x 10) to my market model S_t (geometric brownian motion).
The vectors m and s describe the drift and the deviation of the GBM and are vectors containing 10 numbers. DEL is the variable for the time steps. S_0 is a vector containing 10 stock prices at time 0.
n <- 1000000
k <- 10
S_t <- data.frame(matrix(0, nrow = n, ncol = k))
i <- 1
j <- 1
t <- 10
for (j in 1:k) {
for (i in 1:n) {
S_t[i, j] <- S_0[j] * exp(m[j] * t * DEL + s[j] * sqrt(DEL) * sumzv[i, j])
}
}
Thank you for your help. Please keep in mind that I'm a beginner :)
Unfortunately, I couldn't find any helpful information on the internet so far. Some pages said that vectorization helps to speed up R code, but this doesn't seem helpful to me.
I tried to break down the data frames into vectors but this got very complex.
The following code with vectorized inner loop is equivalent to the posted code.
It also pre-computes some inner loop vectors, fac1 and fac2.
S_t <- data.frame(matrix(0, nrow = n, ncol = k))
fac1 <- m * t * DEL
fac2 <- s * sqrt(DEL)
for (j in 1:k) {
S_t[, j] <- S_0[j] * exp(fac1[j] + fac2[j] * sumzv[, j])
}
The fully vectorized version of the loop on j above is the one-liner below. The transposes are needed because R is column major and we are multiplying by row vectors indexed on j = 1:k.
S_t2 <- t(S_0 * exp(fac1 + fac2 * t(sumzv)))
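To see why the transposes work, here is a small sketch comparing the column loop with the one-liner. All values (sizes, S_0, m, s, DEL) are made up for illustration, and the time variable is named tt here, an assumption chosen so that it does not share a name with the transpose function t().

```r
set.seed(42)
n <- 5; k <- 3                       # tiny sizes for illustration only
tt <- 10; DEL <- 1 / 250             # hypothetical time index and step size
S_0 <- c(100, 50, 20)                # hypothetical start prices
m   <- c(0.05, 0.03, 0.01)           # hypothetical drifts
s   <- c(0.2, 0.1, 0.3)              # hypothetical deviations
sumzv <- matrix(rnorm(n * k), nrow = n)

fac1 <- m * tt * DEL
fac2 <- s * sqrt(DEL)

# Column-by-column version (inner loop vectorized)
S_loop <- matrix(0, nrow = n, ncol = k)
for (j in 1:k) {
  S_loop[, j] <- S_0[j] * exp(fac1[j] + fac2[j] * sumzv[, j])
}

# One-liner: t(sumzv) is k x n, so the length-k vectors fac1, fac2 and S_0
# are recycled down each column and line up with the row index j
S_vec <- t(S_0 * exp(fac1 + fac2 * t(sumzv)))

all.equal(S_loop, S_vec)
```

Working with a plain matrix rather than a data frame also avoids the overhead of element-wise data frame assignment; convert with data.frame() at the end if a data frame is needed.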
Below is an example of what I'm currently doing and it's rather slow. I figure that there must be something more efficient than this since it takes about 95 seconds to run.
df <- data.frame(matrix(vector(), 1000, 1000))
for (i in (1:1000)) {
for (j in (i:1000)) {
df[i, j] <- i *1000 + j
df[j, i] <- df[i, j]
}
}
Also note that the i*1000+j is just for illustration. I'm doing something else there, but it's a calculation dependent on getting all the pairs for 1:1000. Thanks.
Consider outer, followed by a re-assignment of the lower triangle of the returned matrix.
out_mat <- outer(1:1E3, 1:1E3, function(i,j) i * 1E3 + j)
out_mat[lower.tri(out_mat)] <- t(out_mat)[lower.tri(t(out_mat))]
df2 <- data.frame(out_mat)
The result is exactly equivalent to the df generated by the for loop:
identical(df, df2)
# [1] TRUE
However, the above depends on your actual calculation: beyond simple arithmetic, the outer call may fail, since outer expects a vectorized function whose result has the same length as its (expanded) inputs.
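If the real pairwise calculation is not vectorized, one option is to wrap it in Vectorize() before handing it to outer. This is a sketch with a made-up function pair_fun (an assumption, standing in for the actual calculation); Vectorize calls the function once per pair, so it is convenient rather than fast.

```r
# Hypothetical scalar-only function: if/else does not work on whole vectors
pair_fun <- function(i, j) {
  if (i <= j) i * 10 + j else j * 10 + i
}

n <- 4
# Vectorize() wraps pair_fun so that outer() can pass it full-length vectors
out_mat <- outer(seq_len(n), seq_len(n), Vectorize(pair_fun))

isSymmetric(out_mat)
out_mat
```

Because pair_fun already returns the same value for (i, j) and (j, i), no lower-triangle re-assignment is needed in this sketch.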
I am writing a for loop to calculate a numerator which is part of a larger formula. The loop is taking a lot of time to compute. What would be a better way to do this?
city is a dataframe with the following columns: pop, not.white, pct.not.white
n <- nrow(city)
numerator = 0
for(i in 1:n) {
ti <- city$pop[i]
pi<- city$pct.not.white[i]
for(j in 1:n) {
tj <- city$pop[j]
pj <- city$pct.not.white[j]
numerator = numerator + (ti * tj) * abs(pi -pj)
}
}
Use the following toy data for result validation.
set.seed(0)
city <- data.frame(pop = runif(101), pct.not.white = runif(101))
The most obvious "vectorization":
# n <- nrow(city)
titj <- tcrossprod(city$pop)
pipj <- outer(city$pct.not.white, city$pct.not.white, "-")
numerator <- sum(titj * abs(pipj))
This will probably run into memory problems if n > 5000.
A clever workaround (exploiting symmetry; more memory efficient "vectorization"):
## see https://stackoverflow.com/a/52086291/4891738 for function: tri_ind
n <- nrow(city)
ij <- tri_ind(n, lower = TRUE, diag = FALSE)
titj <- city$pop[ij$i] * city$pop[ij$j]
pipj <- abs(city$pct.not.white[ij$i] - city$pct.not.white[ij$j])
numerator <- 2 * crossprod(titj, pipj)[1]
The ultimate solution is to write a C / C++ loop, which I will not showcase.
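The approaches can be validated against the double loop on the toy data. Since tri_ind is defined in the linked answer and not reproduced here, this sketch uses which(lower.tri(...), arr.ind = TRUE) as a stand-in; note that the stand-in still allocates an n x n logical matrix, so it does not have the memory advantage of tri_ind. The last variant is a different technique, not taken from the answer: after sorting by pct.not.white, the absolute values disappear and cumulative sums give the result without forming any pairs.

```r
set.seed(0)
city <- data.frame(pop = runif(101), pct.not.white = runif(101))
n <- nrow(city)
tp <- city$pop
p  <- city$pct.not.white

# Reference: the original double loop
ref <- 0
for (i in 1:n) {
  for (j in 1:n) {
    ref <- ref + tp[i] * tp[j] * abs(p[i] - p[j])
  }
}

# Full-matrix vectorization
full <- sum(tcrossprod(tp) * abs(outer(p, p, "-")))

# Lower-triangle pairs via base R (stand-in for tri_ind)
ij  <- which(lower.tri(matrix(0, n, n)), arr.ind = TRUE)
tri <- 2 * sum(tp[ij[, 1]] * tp[ij[, 2]] * abs(p[ij[, 1]] - p[ij[, 2]]))

# Sort-based variant: with p ascending, |p_j - p_i| = p_j - p_i for i < j
o   <- order(p)
ts  <- tp[o]
ps  <- p[o]
srt <- 2 * sum(ts * ps * cumsum(ts) - ts * cumsum(ts * ps))

all.equal(ref, full)
all.equal(ref, tri)
all.equal(ref, srt)
```

The sort-based variant needs only vectors of length n, so it should remain usable for n well beyond the point where the n x n matrices no longer fit in memory.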
I am struggling to produce efficient code to compute the result vector r from an input vector v using this function.
r(i) = \sum_{j=i-N}^{i} [v(i) - v(j)] \exp(v(i) - v(j))
where i loops (from N to M) over the vector v. The size of v is M >> N.
Of course this is feasible with 2 nested for loops, but it is too slow for computational purposes, probably out of fashion and deprecated style...
A MWE:
for (i in c(N+1):length(v)){
csum <- 0
for (j in i:c(i-N)) {
csum <- csum + (v[i]-v[j])*exp(v[i]-v[j])
}
r[i] <- csum
}
In my real application M > 10^5 and the v vector is indeed several vectors.
I have been trying with nested applications of lapply and rollapply without success.
Any suggestion is welcome.
Thanks!
I don't know if it is any more efficient, but here is something you can try:
r[(N+1):M] <- sapply((N+1):M, function(i) tail(cumsum((v[i]-v[(i-N):i])*exp(v[i]-v[(i-N):i])), 1))
To check that both computations give the same results, I computed r your way and r2 my way (initializing r2 to rep(NA, M)) and compared them:
all(abs(r-r2)<1e-12, na.rm=TRUE)
# [1] TRUE
NOTE: as in @lmo's answer, tail(cumsum(...), 1) can be efficiently replaced by just using sum(...):
r[(N+1):M] <- sapply((N+1):M, function(i) sum((v[i]-v[(i-N):i])*exp(v[i]-v[(i-N):i])))
Here is a method with a single for loop.
# create new blank vector
rr <- rep(NA,M)
for(i in (N+1):length(v)) {
  rr[i] <- sum((v[i] - v[(i-N):i]) * exp(v[i] - v[(i-N):i]))
}
Check for equality:
all.equal(r, rr)
[1] TRUE
You can avoid computing the difference twice by storing it, which should give a small speed-up.
for(i in (N+1):length(v)) {
  x <- v[i] - v[(i-N):i]
  rr[i] <- sum(x * exp(x))
}
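The question's inner loop runs over the window j = i-N, ..., i, which slides along with i. A loop-free sketch for that window is possible with base R's embed(), which lays out every window as a row of a matrix. The values of M, N and v below are made up for illustration, and the approach assumes an (M-N) x (N+1) matrix fits in memory.

```r
set.seed(1)
M <- 50; N <- 5                      # toy sizes; the question has M > 10^5
v <- rnorm(M)

# Reference: the nested loops from the question
r <- rep(NA_real_, M)
for (i in (N + 1):M) {
  csum <- 0
  for (j in i:(i - N)) {
    csum <- csum + (v[i] - v[j]) * exp(v[i] - v[j])
  }
  r[i] <- csum
}

# embed(v, N + 1): row t holds v[t + N], v[t + N - 1], ..., v[t],
# so column 1 is v[i] and the other columns are the N values before it
w <- embed(v, N + 1)
d <- w[, 1] - w                      # v[i] - v[j] for every j in the window
r2 <- c(rep(NA_real_, N), rowSums(d * exp(d)))

all.equal(r, r2)
```

For several input vectors, the same two lines can be wrapped in a function and applied to each vector in turn.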
I am trying to construct a new variable, z, using two pre-existing variables - x and y. Suppose for simplicity that there are only 5 observations (corresponding to 5 time periods) and that x=c(5,7,9,10,14) and y=c(0,2,1,2,3). I’m really only using the first observation in x as the initial value, and then constructing the new variable z using depreciated values of x[1] (depreciation rate of 0.05 per annum) and each of the observations over time in the vector, y. The variable I am constructing takes the form of a new 5 by 1 vector, z, and it can be obtained using the following simple commands in R:
z=NULL
for(i in 1:length(x)){
  n=seq(1,i,by=1)
  z[i]=sum(c(0.95^(i-1)*x[1],0.95^(i-n)*y[n]))
}
The problem I am having is that I need to define this operation as a function. That is, I need to create a function f that will spit out the vector z whenever any arbitrary vectors x and y are plugged into the function, f(x,y). I’ve been going around in circles for days now and I was wondering if someone would be kind enough to provide me with a suggestion about how to proceed. Thanks in advance.
I hope the following will work for you:
x=c(5,7,9,10,14)
y=c(0,2,1,2,3)
getZ = function(x,y){
z = NULL
for(i in 1:length(x)){
n=seq(1,i,by=1)
z[i]=sum(c(0.95^(i-1)*x[1],0.95^(i-n)*y[n]))
}
return(z)
}
z = getZ(x,y)
z
# [1] 5.000000 6.750000 7.412500 9.041875 11.589781
This version lets you pass the depreciation rate, 0.05 or any other value, as r.
ConstructZ <- function(x, y, r){
  d <- 1 - r                  # share of value kept each period
  Z <- numeric(length(x))     # preallocate the result
  for(i in seq_along(x)){
    idx <- seq_len(i)
    Z[i] <- sum(c(d^(i-1)*x[1], d^(i-idx)*y[idx]))
  }
  return(Z)
}
Here is a cool (if I say so myself) way to implement this as an infix operator (since you called it an operation).
ff = function (x, y, i) {
n = seq.int(i)
sum(c(0.95 ^ (i - 1) * x[[1]],
0.95 ^ (i - n) * y[n]))
}
`%dep%` = function (x, y) sapply(seq_along(x), ff, x=x, y=y)
x %dep% y
[1] 5.000000 6.750000 7.412500 9.041875 11.589781
Doing the loop multiple times and recalculating the exponents every time may be inefficient. Here's another way to implement your calculation
getval <- function(x,y,lambda=.95) {
n <- length(y)
pp <- lambda^(1:n-1)
yy <- sapply(1:n, function(i) {
sum(y * c(pp[i:1], rep.int(0, n-i)))
})
pp*x[1] + yy
}
Testing with @vrajs5's sample data
x=c(5,7,9,10,14)
y=c(0,2,1,2,3)
getval(x,y)
# [1] 5.000000 6.750000 7.412500 9.041875 11.589781
This gives the same result, but appears to be about 10x faster when tested on larger data such as
set.seed(15)
x <- rpois(200,20)
y <- rpois(200,20)
I'm not sure of how often you will run this or on what size of data so perhaps efficiency isn't a concern for you. I guess readability is often more important long-term for maintenance.
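The same values also follow from a one-step recurrence: z[1] = x[1] + y[1] and z[i] = d * z[i-1] + y[i] with d = 0.95, because multiplying the previous sum by d raises every exponent by one. A minimal sketch using base R's Reduce; the function name dep_recursive and the default r = 0.05 are assumptions for illustration.

```r
dep_recursive <- function(x, y, r = 0.05) {
  d <- 1 - r
  # Carry the previous z forward, depreciate it once, add the new y
  Reduce(function(prev, yi) d * prev + yi, y[-1],
         init = x[1] + y[1], accumulate = TRUE)
}

x <- c(5, 7, 9, 10, 14)
y <- c(0, 2, 1, 2, 3)
z <- dep_recursive(x, y)
z
```

Each element is computed from the previous one, so the work grows linearly with the length of y instead of quadratically.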