I have this code:
getSomething = function(x, y) {
return something
}
b = matrix(NA, nrow = ncol(a), ncol = ncol(a))
# Loop through the columns
for(i in 1:ncol(a)) {
# Loop through the columns for each column
for(j in 1:ncol(a)) {
b[i, j] = getSomething(as.matrix(a[i]), as.matrix(a[j]))
}
}
It works just fine, but when I try to run the code on big datasets, it takes a very long time to run.
How can I convert it to use lapply so that it runs faster?
Thank you.
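Rather than using lapply, look at outer, which handles both loops for you. Note that outer calls FUN once with whole index vectors, so a function written for a single pair of columns has to be wrapped in Vectorize:
outer(seq_len(ncol(a)), seq_len(ncol(a)),
FUN = Vectorize(function(i, j) getSomething(as.matrix(a[i]), as.matrix(a[j])))
)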
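A quick way to check that an outer call reproduces the double loop is to run both on a toy input. In the sketch below, the data frame a and the body of getSomething are made-up stand-ins (the original post does not show them). Vectorize is used because outer hands FUN whole index vectors rather than one pair at a time. Vectorize still calls the function once per pair, so this is mostly tidier code rather than a guaranteed speed-up.

```r
# Hypothetical stand-ins: neither `a` nor this `getSomething` is from the post.
a <- data.frame(u = c(1, 2, 3), v = c(4, 5, 6), w = c(7, 8, 10))
getSomething <- function(x, y) sum(x * y)

n <- ncol(a)

# Reference result from the original double loop.
b_loop <- matrix(NA_real_, nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    b_loop[i, j] <- getSomething(as.matrix(a[i]), as.matrix(a[j]))
  }
}

# outer() passes index vectors, so wrap the scalar function with Vectorize().
pair_fun <- Vectorize(function(i, j) getSomething(as.matrix(a[i]), as.matrix(a[j])))
b_outer <- outer(seq_len(n), seq_len(n), FUN = pair_fun)
```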
I am trying to convert the following for loop to foreach to take advantage of parallel processing.
dt = data.frame(t(data.frame(a=sample(1:10,10), b=sample(1:10,10), c=sample(1:10,10), d=sample(1:10,10))))
X = as.matrix(dt)
c = ncol(X)
itemnames=names(dt)
sm=matrix(0,c,c)
colnames(sm)=itemnames
row.names(sm)=itemnames
for (j in 1:c){
ind=setdiff(1:c,j)
print(ind)
print(j)
sm[j,ind]=sign(X[j]-X[ind])
print(sm[j,ind])
}
cvec = 1:c
r = foreach(d = cvec, .combine = rbind) %dopar% {
ind = setdiff(1:10,d)
sm[d,ind]=sign(X[d]-X[ind])
}
With the for loop I get a 10*10 matrix in which the sign function fills the off-diagonal elements and the diagonal elements are 0.
But with foreach I get a 10*9 matrix: the diagonal elements are missing and everything else is the same.
Please help me get the same output as the for loop. Thanks in advance.
I am not sure what you are trying to achieve here, since you are only using the first ten elements of your matrix. This can be done without any loops:
sign(outer(X[1:10], X[1:10], FUN = "-"))
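As a sanity check, the loop from the question and the one-line outer version can be compared directly. This sketch rebuilds X the same way as the question (the seed is an arbitrary choice added here for repeatability) and uses n in place of the question's variable c to avoid shadowing the base function.

```r
set.seed(1)  # arbitrary seed, not part of the original post
dt <- data.frame(t(data.frame(a = sample(1:10, 10), b = sample(1:10, 10),
                              c = sample(1:10, 10), d = sample(1:10, 10))))
X <- as.matrix(dt)
n <- ncol(X)

# Loop version from the question (without the print calls).
sm <- matrix(0, n, n)
for (j in seq_len(n)) {
  ind <- setdiff(seq_len(n), j)
  sm[j, ind] <- sign(X[j] - X[ind])
}

# Loop-free version: all pairwise differences at once, then their signs.
sm_outer <- sign(outer(X[1:n], X[1:n], FUN = "-"))
```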
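In addition, I am not sure that parallel processing will be faster for this kind of problem, even assuming that the real case is much bigger. But if you want to use foreach, you should not assign to the global sm within the loop, because that assignment does not change sm in the calling session. The value of each iteration is the value of its last expression, which in your code is the length-9 right-hand side of the assignment, hence the 10*9 result. Instead, return a full-length vector at the end: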
foreach(d = cvec, .combine = rbind) %dopar% {
ind <- setdiff(cvec,d)
res <- rep(0, 10)
res[ind] <- sign(X[d]-X[ind])
res
}
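The same "return one full row per iteration, then bind the rows" pattern can be tried without any parallel backend. The sketch below is a sequential base R stand-in for foreach with .combine = rbind, not the foreach code itself; the values in X are made up for illustration.

```r
X <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)  # hypothetical data
cvec <- seq_along(X)

# Build one complete row, leaving the diagonal position at 0.
row_for <- function(d) {
  ind <- setdiff(cvec, d)
  res <- rep(0, length(cvec))
  res[ind] <- sign(X[d] - X[ind])
  res
}

# Each iteration returns a row; the rows are bound together afterwards.
rows <- lapply(cvec, row_for)
sm <- do.call(rbind, rows)
```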
If you want to assign to a matrix in parallel, you'll need a shared matrix:
# devtools::install_github("privefl/bigstatsr")
library(bigstatsr)
sm <- FBM(c, c)
library(foreach)
cl <- parallel::makeCluster(3)
doParallel::registerDoParallel(cl)
r = foreach(d = cvec, .combine = c) %dopar% {
ind = setdiff(1:10,d)
sm[d,ind]=sign(X[d]-X[ind])
NULL
}
parallel::stopCluster(cl)
sm[]
I would like to fill in a matrix inside a parallel loop.
When I call the function, it returns an empty matrix.
Can someone help me with that?
Compute_TaskSimilarity<-function(X,...){
Task_similarity<-matrix(0,nrow=100,ncol=100)
foreach(i = 1:K, .combine = "cbind") %dopar% {
for (j in (i + 1):(ncol(Task_similarity))) {
Myvalue<- ComputeValue
if (Myvalue!=0){
TaskSimilarity[i, j] <- Myvalue
} else{
TaskSimilarity[i, j] <- 0.0
}
}
return(TaskSimilarity)
}
Maybe do something like this: create a data.frame of all combinations of indices, apply the computation over all combinations, then reshape the result into a matrix of the right size. (I used a simple multiplication as a stand-in for a more complex operation, and a smaller array of length 10):
indices <- data.frame(x=rep(1:10, each=10), y=rep(1:10, 10))
result <- foreach(i=1:nrow(indices)) %dopar% {
# just an example for a more complex calculation
indices$x[i] * indices$y[i]
}
result <- do.call(c, result)
dim(result) <- c(10, 10)
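With a symmetric operation such as multiplication the index order does not matter, but for a non-symmetric function it does. The sketch below uses a made-up function 10 * x + y and plain mapply instead of foreach to show which index ends up as row and which as column after reshaping. Because x is built with each = n, it varies slowest and becomes the column index.

```r
n <- 3
indices <- data.frame(x = rep(1:n, each = n), y = rep(1:n, n))

# Hypothetical non-symmetric function, evaluated for every (x, y) pair.
vals <- mapply(function(x, y) 10 * x + y, indices$x, indices$y)

# Reshaping fills column by column, so m[r, c] holds the value for x = c, y = r.
m <- matrix(vals, nrow = n, ncol = n)
```

If rows should correspond to x instead, transposing with t(m) or swapping the two rep calls gives that layout.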
I have a function that takes i and j as parameters and returns a single value, and I currently have a nested loop that computes a value for each entry in a square matrix. In essence, each individual value can be computed independently, so the work could run in parallel. Is there a way I can apply lapply in this situation? The resulting matrix must be N x N and the function is dependent on i and j. Thanks
for (i in 1:matrixRowLength) {
for (j in 1:matrixColLength) {
result_matrix[i, j] <- fun(i, j)
}
}
The foreach package has a nesting operator that can be useful when parallelizing nested for loops. Here's an example:
library(doSNOW)
cl <- makeSOCKcluster(3)
registerDoSNOW(cl)
matrixRowLength <- 5
matrixColLength <- 5
fun <- function(i, j) 10 * i + j
result_matrix.1 <-
foreach(j=1:matrixColLength, .combine='cbind') %:%
foreach(i=1:matrixRowLength, .combine='c') %dopar% {
fun(i, j)
}
Note that I reversed the order of the loops so that the matrix is computed column by column. This is generally preferable since matrices in R are stored in column-major order.
The nesting operator is useful if you have large tasks and at least one of the loops may have a small number of iterations. But in many cases, it's safer to only parallelize the outer loop:
result_matrix.2 <-
foreach(j=1:matrixColLength, .combine='cbind') %dopar% {
x <- double(matrixRowLength)
for (i in 1:matrixRowLength) {
x[i] <- fun(i, j)
}
x
}
Note that it can also be useful to use chunking in the outer loop to decrease the amount of post processing performed by the master process. Unfortunately, this technique is a bit more tricky:
library(itertools)
nw <- getDoParWorkers()
result_matrix.3 <-
foreach(jglobals=isplitIndices(matrixColLength, chunks=nw),
.combine='cbind') %dopar% {
localColLength <- length(jglobals)
m <- matrix(0, nrow=matrixRowLength, ncol=localColLength)
for (j in 1:localColLength) {
for (i in 1:matrixRowLength) {
m[i,j] <- fun(i, jglobals[j])
}
}
m
}
In my experience, this method often gives the best performance.
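The chunking idea can be tried without a cluster. This is a sequential sketch that assumes parallel::splitIndices as a stand-in for isplitIndices and lapply in place of foreach; the number of chunks (3) and the matrix size are arbitrary choices for illustration.

```r
fun <- function(i, j) 10 * i + j
matrixRowLength <- 5
matrixColLength <- 7

# Split the column indices into contiguous chunks, one per (hypothetical) worker.
chunks <- parallel::splitIndices(matrixColLength, 3)

# Each chunk computes a block of whole columns.
blocks <- lapply(chunks, function(jglobals) {
  m <- matrix(0, nrow = matrixRowLength, ncol = length(jglobals))
  for (j in seq_along(jglobals)) {
    for (i in seq_len(matrixRowLength)) {
      m[i, j] <- fun(i, jglobals[j])
    }
  }
  m
})

# Only a handful of blocks are combined, instead of one piece per column.
result_matrix <- do.call(cbind, blocks)
```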
Thanks for an interesting question / use case. Here's a solution using the future package (I'm the author):
First, define (*):
future_array_call <- function(dim, FUN, ..., simplify = TRUE) {
args <- list(...)
idxs <- arrayInd(seq_len(prod(dim)), .dim = dim)
idxs <- apply(idxs, MARGIN = 1L, FUN = as.list)
# future_lapply() now lives in the future.apply package
y <- future.apply::future_lapply(idxs, FUN = function(idx_list) {
do.call(FUN, args = c(idx_list, args))
})
if (simplify) y <- simplify2array(y)
dim(y) <- dim
y
}
This function does not make any assumptions on what data type your function returns, but with the default simplify = TRUE it will try to simplify the returned data type iff possible (similar to how sapply() works).
Then with your matrix dimensions (**):
matrixRowLength <- 5
matrixColLength <- 5
dim <- c(matrixRowLength, matrixColLength)
and function:
slow_fun <- function(i, j, ..., a = 1.0) {
Sys.sleep(0.1)
a * i + j
}
you can calculate slow_fun(i, j, a = 10) for all elements as:
y <- future_array_call(dim, FUN = slow_fun, a = 10)
To do it in parallel on your local machine, use:
library("future")
plan(multisession)  # multiprocess is deprecated in current versions of future
y <- future_array_call(dim, FUN = slow_fun, a = 10)
On a cluster of machines (for which you have SSH access with SSH-key authentication), use:
library("future")
plan(cluster, workers = c("machine1", "machine2"))
y <- future_array_call(dim, FUN = slow_fun, a = 10)
Footnotes:
(*) If you wonder how it works, just replace the future_lapply() call with a regular lapply().
(**) future_array_call(dim, FUN) should work for any length(dim), not just for two (= matrices).
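Following the first footnote, here is a sequential sketch of the same function with lapply swapped in. The name array_call and the small test function are made up here. It may help to see how arrayInd turns each linear position into an (i, j) pair before the function is called.

```r
# Sequential variant for illustration; `array_call` is a hypothetical name.
array_call <- function(dim, FUN, ...) {
  args <- list(...)
  # One row of indices per array element, in column-major order.
  idxs <- arrayInd(seq_len(prod(dim)), .dim = dim)
  # Turn each row into a list so it can be passed as positional arguments.
  idxs <- apply(idxs, MARGIN = 1L, FUN = as.list)
  y <- lapply(idxs, FUN = function(idx_list) {
    do.call(FUN, args = c(idx_list, args))
  })
  y <- simplify2array(y)
  dim(y) <- dim
  y
}

y <- array_call(c(2, 3), FUN = function(i, j, a = 1) a * i + j, a = 10)
```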
I want to gather all non-factors from a dataframe d3 and put them into the table m
for (i in 1:12) {
m<-as.data.frame(matrix(100,nrow = nrow(d3),ncol = ncol(d3)))
if (!is.factor(d3[,i])){
m[,i]<-d3[,i]
}
}
For specific values of i (e.g. i = 2) I get what I want, but the loop does not work. Where is the mistake in the code above?
You're recreating m each time through the loop; it should be outside.
m <- as.data.frame(matrix(100, nrow = nrow(d3), ncol = ncol(d3)))
for (i in 1:12) {
if (!is.factor(d3[, i])) {
m[, i] <- d3[, i]
}
}
But you really should just be using vectorized operations to do it all at once.
nonf <- !sapply(d3, is.factor)
m[, nonf] <- d3[, nonf]
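To see the vectorized version in action, here is a small run on a made-up d3 (the real d3 is not shown in the question). Factor columns keep the placeholder value 100, while every other column is copied over.

```r
# Hypothetical data frame with one factor column.
d3 <- data.frame(id = 1:3,
                 grp = factor(c("a", "b", "a")),
                 val = c(0.5, 1.5, 2.5))

m <- as.data.frame(matrix(100, nrow = nrow(d3), ncol = ncol(d3)))

# Logical vector marking the non-factor columns.
nonf <- !sapply(d3, is.factor)
m[, nonf] <- d3[, nonf]
```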
I have spent several hours thinking about the following problem. I am running a simulation study and I want to define functions outside the simulation so that I can call them at the end of my code.
This example illustrates the problem but is not reproducible (a reproducible example follows below). I use the "metafor" package for doing a meta-analysis.
I would like to use the following function that I define outside my final simulation code:
mat <- matrix(NA, nrow = 8, ncol = 3)
funtr.stu <- function(i) {
for (y in 1:8) {
mat[y,i] <- tr[[y]]$k0
}
return(mat)
}
"tr" is a list and consists of the results of 8 times an analysis. I want to retrieve the object "k0" from that list and store it into the matrix "mat".
In the following part of the code (in which I run the simulation), I want to call the function and fill the matrix "mat" with the correct numbers.
for (i in 1:iterations) {
tr.stu <- funtr.stu()
}
The result of this code is a filled matrix, but with the same numbers in each column. Thus, R isn't storing the numbers from every iteration, but only those from the last iteration.
How can I modify my code in such a way that R is storing the output as I want?
A very simplified example:
Mat represents just a matrix with numbers and res is an empty matrix that I want to fill.
mat <- matrix(data = c(1,2,3,4,5,6), ncol = 2, nrow = 3)
res <- matrix(NA, ncol = 2, nrow = 3)
I use the function "fun" to fill the empty matrix res.
fun <- function() {
for (i in 1:2) {
res[y,i] <- mat[y,i]
}
return(res)
}
This is what I would like to put in the end of my code (I just want to call the function and with this function I want to fill the matrix "res"). However, if I use the code below R only fills the third row and not the first and second row.
for (y in 1:3) {
test <- fun()
}
Thank you in advance!
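This should work in your case. Each call to fun starts from the unchanged global res and fills only row y in a local copy, so after the loop only the last row survives. Instead of returning the entire 'res' matrix, return one row in each iteration of the for loop and store it outside the function: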
mat <- matrix(data = c(1,2,3,4,5,6), ncol = 2, nrow = 3)
res <- matrix(NA, ncol = 2, nrow = 3)
fun <- function() {
for (i in 1:2) {
res[y,i] <- mat[y,i]
}
return(res[y,])
}
# test must exist before rows can be assigned to it
test <- matrix(NA, ncol = 2, nrow = 3)
for (y in 1:3) {
test[y,] <- fun()
}
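A variant that avoids relying on global variables inside the function is to pass the row index as an argument and do the assignment in the caller. This is only a sketch; get_row is a made-up helper name.

```r
mat <- matrix(data = c(1, 2, 3, 4, 5, 6), ncol = 2, nrow = 3)
res <- matrix(NA_real_, ncol = 2, nrow = 3)

# Hypothetical helper: returns row y of the matrix it is given.
get_row <- function(m, y) {
  m[y, ]
}

# The assignment happens in the calling environment, so res really is filled.
for (y in seq_len(nrow(mat))) {
  res[y, ] <- get_row(mat, y)
}
```

Assignments made inside a function only change a local copy, which is why the function returns a value and the caller stores it.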