I am programming an expectation-maximization algorithm in R. To speed up the computation, I would like to vectorize this bottleneck. I know that N is about a hundred times k.
MyLoglik = 0
for (i in c(1:N))
{
for (j in c(1:k))
{
MyLoglik = MyLoglik + MyTau[i,j]*log(MyP[j]*MyF(MyD[i,], MyMu[j,], MyS[[j]]))
}
}
There is also this list of matrices:
MyDf.list <- vector("list", k)
for(i in 1:k)
{
MyDf.list[[i]] <- matrix(0,d,d)
for (j in c(1:N))
{
MyDf.list[[i]] = MyDf.list[[i]] + MyTau[j,i]*as.numeric((MyD[j,]-MyMu[i,])) %*% t(as.numeric(MyD[j,]-MyMu[i,]))
}
MyDf.list[[i]] = MyDf.list[[i]] / MyM[i]
}
I have sped things up a bit using:
MyLoglik = 0
for (j in c(1:k))
{
MyR= apply(MyD, 1, function(x) log(MyP[j]*MyF(x, MyMu[j,], MyS[[j]])))
MyLoglik = MyLoglik + sum(MyTau[,j]*MyR)
}
and:
d = dim(MyD)[2]
MyDf.list <- vector("list", k)
for(i in 1:k)
{
MyDf.list[[i]] <- matrix(0,d,d)
MyR= apply(MyD, 1, function(x) as.numeric((x-MyMu[i,])) %*% t(as.numeric(x-MyMu[i,])))
MyDf.list[[i]] = matrix(rowSums(t(MyTau[,i]*t(MyR))) / MyM[i],d,d)
}
For the first one, I'm assuming MyF is a function you've made? If you can make sure it will take your matrices and lists as inputs and output a matrix, you could do something like:
MyLoglik = sum(MyTau%*%log(MyP)) + sum(MyTau*log(MyF(MyD, MyMu, MyS)))
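A minimal sketch of that idea, assuming MyF is a multivariate normal density (the question does not say what MyF is). The helper log_dens and the simulated data are hypothetical; mahalanobis() and determinant() are base R. Working on the log scale also avoids taking log() of a product that may underflow.

```r
set.seed(1)
N <- 200; k <- 2; d <- 3

# Simulated stand-ins for the objects in the question
MyD   <- matrix(rnorm(N * d), N, d)               # N observations in rows
MyMu  <- matrix(rnorm(k * d), k, d)               # one mean per row
MyS   <- replicate(k, diag(d), simplify = FALSE)  # list of k covariance matrices
MyP   <- c(0.4, 0.6)                              # mixing proportions
MyTau <- matrix(runif(N * k), N, k)
MyTau <- MyTau / rowSums(MyTau)                   # responsibilities, rows sum to 1

# Hypothetical stand-in for log(MyF(...)): multivariate normal log-density,
# evaluated for all rows of X at once
log_dens <- function(X, mu, S) {
  log_det <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
  -0.5 * (ncol(X) * log(2 * pi) + log_det + mahalanobis(X, mu, S))
}

# N x k matrix with entries log(MyP[j]) + log f_j(MyD[i, ])
LogF <- sapply(seq_len(k), function(j) {
  log(MyP[j]) + log_dens(MyD, MyMu[j, ], MyS[[j]])
})

# Same double sum as the nested loops, without any explicit loop over i
MyLoglik <- sum(MyTau * LogF)
```

Only the loop over the k components remains (hidden in sapply), which should be cheap when N is much larger than k.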
For the second one, I think that because you're storing the result as a list it will be more difficult to vectorize. Maybe instead of a list of matrices you could use a 3-dimensional array? Since the list holds k matrices of size d x d, MyDf.array would have dimensions k, d, d (or d, d, k).
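Independently of how the result is stored, the weighted sum of outer products in the second loop can be written as a single matrix product per component. The sketch below uses simulated data and assumes MyM[i] is the column sum of MyTau, which the question does not state.

```r
set.seed(1)
N <- 50; k <- 2; d <- 3

# Simulated stand-ins for the objects in the question
MyD   <- matrix(rnorm(N * d), N, d)
MyMu  <- matrix(rnorm(k * d), k, d)
MyTau <- matrix(runif(N * k), N, k)
MyTau <- MyTau / rowSums(MyTau)
MyM   <- colSums(MyTau)   # assumption: total weight of each component

MyDf.list <- lapply(seq_len(k), function(i) {
  # Subtract the i-th mean from every row of MyD
  Xc <- sweep(MyD, 2, MyMu[i, ])
  # t(Xc * tau) %*% Xc equals sum_j tau[j] * x_j %*% t(x_j)
  crossprod(Xc * MyTau[, i], Xc) / MyM[i]
})
```

Multiplying the N x d matrix Xc by the length-N vector MyTau[, i] scales each row by its weight, because R recycles the vector down the columns.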
I hate to even suggest this prematurely, but this is the sort of thing where building a C extension for R might make sense. For matrices with a defined (known) size (which you have here!), C extensions aren't that hard to build, I promise! The nastiest bit here would probably be passing in MyF.
My R knowledge is quite out of date, but for loops (especially like this one!) used to be brutal.
Maybe timing the code and figuring out which part is slow would help? Is it MyF? What if you change it to an identity?
You can cut down on the work done in the inner loop if things are symmetric: A[i,j] = A[j,i]
How do I populate a 10X10 empty matrix called mat.horiz, with values 1 to 100 by row (i.e. filling in values across columns, in descending rows), using two for() loops?
New to loops and am barely grasping the structure of them. Any help and explanation would be much appreciated:)
If you really want to use for loops, you can try the code below
out <- matrix(NA,10,10)
for (i in 1:10) {
for (j in 1:10) {
out[i,j] <- j + (i-1)*10
}
}
or
out <- matrix(NA,10,10)
k <- 0
for (i in 1:10) {
for (j in 1:10) {
k <- k + 1
out[i,j] <- k
}
}
A simpler way is using
out <- matrix(1:100,10,10,byrow = TRUE)
I would like to store the result of X[i,j], where
X[i,j] = alpha[i] + beta[j]
I tried writing this double for loop but could not get it to store the result. I would appreciate any help here. Thanks!
for (i in length(alpha)) {
for (j in length(beta)) {
Xij <- alpha[i] + beta[j]
matrix[i,j] <- Xij
}
}
Edit: Is there a more efficient way to do this? The for loop run is taking a long time as the dataset is huge.
If the loop performs poorly, you should try the outer function:
outer(alpha, beta, FUN = `+`)
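# The loop itself must run over every index: for (i in length(alpha)) visits only the last one.
# X is preallocated here so that the assignment has a matrix to write into.
X <- matrix(NA_real_, nrow = length(alpha), ncol = length(beta))
for (i in seq_along(alpha)) {
for (j in seq_along(beta)) {
X[i,j] <- alpha[i] + beta[j]
}
}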
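As a quick check that outer and an explicit double loop agree, here is a small sketch with made-up values for alpha and beta (the names X_loop and X_outer are only for illustration):

```r
# Made-up inputs, not from the question
alpha <- c(1, 2, 3)
beta  <- c(10, 20)

# Explicit double loop over every (i, j) pair
X_loop <- matrix(NA_real_, nrow = length(alpha), ncol = length(beta))
for (i in seq_along(alpha)) {
  for (j in seq_along(beta)) {
    X_loop[i, j] <- alpha[i] + beta[j]
  }
}

# Vectorized equivalent: rows follow alpha, columns follow beta
X_outer <- outer(alpha, beta, FUN = "+")

all.equal(X_loop, X_outer)
```

For large inputs the outer version avoids length(alpha) * length(beta) separate assignments, which is where the loop spends most of its time.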
Is there some way to make this loop faster in r?
V=array(NA, dim=c(nrow(pixDF), n))
for(i in 1:n)
{
sdC<-sqrt(det(Cov[,i,]))
iC<-inv(Cov[,i,])
V[,i]<-apply(pixDF,1,function(x)(sdC*exp(-0.5*((x-Mean[i,])%*%iC%*%as.matrix((x-Mean[i,]))))))
}
where, in this case, pixDF is a matrix with 490000 rows and 4 columns filled with doubles. n = 5. Cov is a (4,5,4) array filled with "doubles". Mean is a (5,4) array filled with doubles as well.
This loop was taking about 30min on my computer. (before editing).
Right now it's taking 1min.
As Ronak notes, it is hard to help without a reproducible example. But I think that apply can be avoided. Something like this COULD work:
V <- array(NA, dim = c(nrow(pixDF), n))
tpixDF <- t(pixDF)
for (i in 1:n) {
x <- Cov[, i, ]
sdC <- sqrt(det(x))
iC <- solve(x)
mi <- Mean[i, ]
k <- t(tpixDF - mi)
V[, i] <- sdC*exp(-0.5*rowSums(k %*% iC * k))
}
Also, as Roland mentions, inv is probably equivalent to solve.
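To see why rowSums(k %*% iC * k) replaces the apply call, here is a small check on simulated data (X, mu and S are hypothetical stand-ins for pixDF, Mean[i, ] and Cov[, i, ]). Each row of (K %*% iC) * K sums to the quadratic form for that row, which is also what base R's mahalanobis() returns.

```r
set.seed(1)
X  <- matrix(rnorm(20), 5, 4)   # 5 rows, 4 columns
mu <- colMeans(X)
S  <- cov(X) + diag(4)          # a positive-definite 4 x 4 matrix
iC <- solve(S)

# Row-by-row version, as in the question
q_apply <- apply(X, 1, function(x) (x - mu) %*% iC %*% (x - mu))

# Vectorized version: centre all rows at once, then sum row-wise
K     <- sweep(X, 2, mu)
q_vec <- rowSums((K %*% iC) * K)

# Built-in squared Mahalanobis distance
q_maha <- mahalanobis(X, mu, S)

all.equal(q_apply, q_vec)
all.equal(q_vec, q_maha)
```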
I have a vector of positive integers of unknown length. Let's call it vector a with elements a[1], a[2], ...
I want to perform calculations on vector b where for all i, 0 <= b[i] <= a[i].
The following does not work:
for(b in 0:a)
{
# calculations
}
The best I have come up with is:
probabilities <- function(a,p)
{
k <- a
k[1] <- 1
h <- rep(0,sum(a)+1)
for(i in 2:length(a))
{
k[i] <- k[i-1]*(a[i-1]+1)
}
# stop at prod(a+1)-1: i = prod(a+1) would revisit b = (0,0,...) and count it twice
for(i in 0:(prod(a+1)-1))
{
b <- a
for(j in 1:length(a))
{
b[j] <- (floor(i/k[j]) %% (a[j]+1))
}
t <- 1
for(j in 1:length(a))
{
t <- t * choose(a[j],b[j])*(p[j])^(b[j])*(1-p[j])^(a[j]-b[j])
}
h[sum(b)+1] <- h[sum(b)+1] + t
}
return(h)
}
In the middle of my function is where I create b. I start off by setting b equal to a (so that it is the same size). Then, I replace all of the elements of b with different elements that are rather tricky to calculate. This seems like an inefficient solution. It works, but it is fairly slow as the numbers get large. Any ideas for how I can cut down on process time? Essentially, what this does for b is the first time through, b is all zeros. Then, it is 1, 0,0,0,... The first element keeps incrementing until it reaches a[1], then b[2] increments and b[1] is set to 0. Then b[1] starts incrementing again.
I know the math is sound, I just do not trust that it is efficient. I studied combinatorics for a few years, but have never studied computational complexity theory, so coming up with a fast algorithm is a bit beyond my realm of knowledge. Any ideas would be helpful!
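The enumeration described above (first element varying fastest) is what expand.grid produces, so one possible sketch builds all vectors b as rows of a matrix and computes the terms column by column. The values of a and p are made up, and the term choose(a,b)*p^b*(1-p)^(a-b) is written as dbinom(b, a, p). This still creates prod(a+1) rows, so it trades memory for speed and is not a guaranteed improvement for large a.

```r
# Made-up inputs, not from the question
a <- c(2, 1, 3)
p <- c(0.2, 0.5, 0.7)

# All vectors b with 0 <= b[i] <= a[i], one per row, first column varying fastest
B <- as.matrix(expand.grid(lapply(a, function(x) 0:x)))
dimnames(B) <- NULL

# Product over j of choose(a[j], b[j]) * p[j]^b[j] * (1 - p[j])^(a[j] - b[j])
w <- rep(1, nrow(B))
for (j in seq_along(a)) {
  w <- w * dbinom(B[, j], size = a[j], prob = p[j])
}

# Accumulate the terms by sum(b); h[s + 1] holds the total for sum(b) == s
totals <- factor(rowSums(B), levels = 0:sum(a))
h <- as.numeric(tapply(w, totals, sum))
```

The remaining loop runs over the length of a only, rather than over every combination.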
I am trying to fill a two-dimensional matrix whose number of rows and columns is determined by the arguments passed to a function, so the dimensions change every time I change those values.
The function that produces the "subscript out of bounds" error is the following:
HRC <- function(n,b,c)
{
R=matrix( ,nrow = n*b, ncol = c)
R[0,]=133
for (j in 1:c)
{
r=rnorm(n*b)
for (i in 1:n*b){
R[i+1,j]=R[i,j]+3*b/r[i]
}
}
return(R)
}
HRC(10,1,3)
The error message that I get is the following:
Error in R[i + 1, j] = R[i, j] + 3 * b/r[i] : subscript out of bounds
I wonder how I can resolve this problem. Thank you so much in advance.
R's indexing starts at 1, not 0.
You also have to be careful with the operator precedence rules: the : operator has higher precedence than *. See ?Syntax.
This should work:
HRC <- function(n, b, c) {
R <- matrix(NA, nrow = n*b, ncol = c)
R[1,]=133
for (j in 1:c) {
r = rnorm(n*b)
for (i in 1:(n*b-1)){
R[i+1,j] = R[i,j] + 3*b/r[i]
}
}
return(R)
}
HRC(10,1,3)
The problem is that you loop from row b to row n*b (with stride b, due to the precedence of * and :) and then index to one greater, so you attempt to index row n*b + 1 of R, which is out of bounds.
R[0,]<- will cause incorrect results but not elicit an error from R.
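Both points are easy to check at the console. A small sketch with made-up values (n = 3, b = 2, and a 2 x 2 matrix):

```r
n <- 3; b <- 2

# ':' binds tighter than '*', so this is (1:n) * b
stride <- 1:n*b      # 2 4 6

# Parentheses give the intended index range
full <- 1:(n*b)      # 1 2 3 4 5 6

# Assigning to row 0 selects nothing: no error, and R is left unchanged
R <- matrix(NA, nrow = 2, ncol = 2)
R[0, ] <- 133
unchanged <- all(is.na(R))   # TRUE
```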
I find the code easier to read if you loop from 2 to n*b, the number of rows, and write the formula in terms of creating row i from row i-1 (rather than creating row i+1 from row i).
In addition, you can drop one loop dimension by vectorizing the operations over the rows:
HRC <- function(n, b, c) {
R <- matrix(NA, nrow = n*b, ncol = c)
R[1,] <- 133
r <- matrix(rnorm(n*b*c), ncol=c)
for (i in 2:(n*b)){
R[i,] <- R[i-1,] + 3*b/r[i-1,]
}
return(R)
}
HRC(10,1,3)
Here, the same number of random samples are taken with rnorm but they are formed as a matrix, and used in the same order as used in the question. Note that not all of the random values are actually used in the computation.
If you set a random seed and then run this function and the function in @flodel's answer, you will get identical results. His answer is also correct.
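A sketch of that comparison, with the two versions renamed HRC_loop and HRC_vec for illustration only (the seed value is arbitrary). The results match because matrix() fills column by column, so column j of r holds the same draws that the per-column rnorm call produces.

```r
# Per-column loop version
HRC_loop <- function(n, b, c) {
  R <- matrix(NA, nrow = n * b, ncol = c)
  R[1, ] <- 133
  for (j in 1:c) {
    r <- rnorm(n * b)
    for (i in 1:(n * b - 1)) {
      R[i + 1, j] <- R[i, j] + 3 * b / r[i]
    }
  }
  R
}

# Row-vectorized version: all random draws taken up front
HRC_vec <- function(n, b, c) {
  R <- matrix(NA, nrow = n * b, ncol = c)
  R[1, ] <- 133
  r <- matrix(rnorm(n * b * c), ncol = c)
  for (i in 2:(n * b)) {
    R[i, ] <- R[i - 1, ] + 3 * b / r[i - 1, ]
  }
  R
}

# Reset the seed before each call so both consume the same random stream
set.seed(100); A <- HRC_loop(10, 1, 3)
set.seed(100); B <- HRC_vec(10, 1, 3)
all.equal(A, B)
```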
I think you are making three mistakes:
First: You are messing up the row count on the index. It should be 1:(n*b) and not 1:n*b.
Second: In R, indexing starts at 1. So R[0,] should be replaced by R[1,].
Third: Even with the bounds 1:c and 1:(n*b), you are not keeping track of the indices: R[i+1,j] runs past the last row once i reaches n*b. Either stop one row short, or start at 2 and fill row i from row i-1.
Try this:
set.seed(100)
HRC <- function(n, b, c) {
R <- matrix(0, nrow = n*b, ncol = c)
R[1,] <- 133
for (j in 1:c) {
r <- rnorm(n*b)
for (i in 2:(n*b)){
R[i,j] <- R[i-1,j] + 3*b/r[i-1]
}
}
return(R)
}
HRC(10,1,3)
Lastly, I would like to warn you against using the assignment operators = and <- interchangeably. See here.