Problem
Say I have a function that is currently not vectorized. The following is just an example :
FunctionNotVectorized = function(x,y,some_options) return(x[1]+y[1])
which has, say, 10 different options. I would like to
1) define a matrix of size 1e5 x 1e5 for each option.
2) then, for each matrix, assign values for their corresponding indices.
First, I defined a matrix of size 1e5 x 1e5 for each option, using a for loop:
for (k in 1:10){
assign(sprintf("res%02d", k), matrix(0,1e5,1e5))
}
which defines matrices named res01, ... res10.
Second, I tried to assign values to the corresponding indices of each matrix, but I'm stuck here.
Try
What I would like to do:
for (i in 1:1e5){
for (j in 1:1e5){
for (k in 1:10){
assign(sprintf("res%02d[i,j]", k),
FunctionNotVectorized(i,j,some_options=k))
}
}
}
but clearly, assign(sprintf("res%02d[i,j]", k) does not work. Any help will be appreciated.
Avoid explicit element-by-element loops in R where you can, because they can make the calculation far slower than vectorized code. A for/while loop is fine when the number of iterations is small (say, under 100).
Use lapply to apply the same operation to each object, then do.call to combine the results from the list. Use lists instead of assign; lapply and lists work well together.
Here is an example for matrices with sizes 15x15:
mtxs = list() #create empty list which will get filled
for(k in 1:10){ # loop over 10 matrices
mtx = do.call(c,lapply(1:15,function(x){ # outer lapply: over columns
do.call(c,lapply(1:15, # inner lapply: over rows
function(y){FunctionNotVectorized(y, x, k) } ))})) # combine results into one vector, column by column
mtxs[[k]] = matrix(mtx, 15, 15) # assigning matrices
}
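The nested lapply calls above can also be written with outer and Vectorize from base R. This is only a sketch under the same assumptions as the example (a 15 x 15 grid and the toy FunctionNotVectorized from the question); Vectorize still calls the function once per cell, so it tidies the code rather than making the function itself faster.

```r
# Toy function from the question (not vectorized by assumption)
FunctionNotVectorized <- function(x, y, some_options) x[1] + y[1]

n <- 15  # small grid for illustration; 1e5 x 1e5 would not fit in memory

# One matrix per option, kept in a list instead of res01, ..., res10
mtxs <- lapply(1:10, function(k) {
  outer(seq_len(n), seq_len(n),
        Vectorize(function(i, j) FunctionNotVectorized(i, j, some_options = k)))
})
names(mtxs) <- sprintf("res%02d", 1:10)
```

Individual matrices are then available as mtxs$res01, mtxs[["res07"]], and so on.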
Simply use a named list without need to use assign to add objects to global environment:
# BUILD LIST OF MATRICES
my_matrix_list <- setNames(replicate(10, matrix(0,1e5,1e5), simplify = FALSE),
paste0("res", 1:10, "d"))
# DYNAMICALLY ASSIGN VALUE BY OBJECT NAME
for (i in 1:1e5){
for (j in 1:1e5){
for (k in 1:10){
my_matrix_list[[paste0("res", k, "d")]][i,j] <-
FunctionNotVectorized(i,j,some_options=k)
}
}
}
# REFERENCE ITEMS IN LIST
my_matrix_list$res1d
my_matrix_list$res2d
my_matrix_list$res3d
...
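A small sketch of the indexing involved, using 2 x 2 matrices as a stand-in: single brackets on a list return a one-element list, while double brackets return the matrix itself, which is what [i,j] needs. Note also that a single 1e5 x 1e5 numeric matrix takes roughly 80 GB (1e10 doubles at 8 bytes each), so the sizes in the question are unlikely to fit in memory as written.

```r
# Three small matrices in a named list (sizes chosen only for illustration)
lst <- setNames(replicate(3, matrix(0, 2, 2), simplify = FALSE),
                paste0("res", 1:3, "d"))

is.list(lst["res1d"])      # single bracket: still a list of length 1
is.matrix(lst[["res1d"]])  # double bracket: the matrix itself

# Assign one cell of one matrix by name
k <- 2
lst[[paste0("res", k, "d")]][1, 2] <- 5
```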
Related
I am trying to multiply the values stored in a list containing 1,000 values with another list containing ages. Ultimately, I want to store 1,000 rows to a dataframe.
I wonder whether it's better to use lapply or a for loop here.
list 1
lambdaSamples1 <- lapply(
floor(runif(numSamples, min = 1, max = nrow(mcmcMatrix))),
function(x) mcmcMatrix[x, lambdas[[1]]])
The output is 1,000 different values in a list.
list 2
ager1= 14:29
What I want to do is
for (i in 1: numSamples) {
assign(paste0("newRow1_", i), 1-exp(-lambdaSamples1[[i]]*ager1))
}
Now I have 1,000 rows of values that I want to store in a predetermined dataframe, outDf_1 (nrow = 1000, ncol = length of ager1).
I tried
for (i in 1:numSamples) {
outDf_1[i,] <- newRow1_i
}
I want to store newRow1_1, ..., newRow1_1000 in the 1,000 rows of the outDf_1 dataframe.
Should I approach this a different way?
I think you're overcomplicating this a bit. Many operations in R are vectorized, so you shouldn't need lapply or for loops for this. You didn't give us any data to work with, but the code below should do what you want in a more straightforward and faster way.
lambdaSamples1 <- mcmcMatrix[sample(nrow(mcmcMatrix), numSamples, replace=T),
lambdas[[1]]]
outDF_1 <- 1 - exp(-lambdaSamples1 %*% t(ager1))
Just note that this makes outDF_1 a matrix, not a data frame.
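Since no data was provided, here is a self-contained sketch with made-up inputs: mcmcMatrix, its column names, and the contents of lambdas are all assumptions for illustration. It uses outer(), which computes the same product as %*% t(ager1), and then converts the result to a data frame.

```r
set.seed(1)

# Hypothetical stand-ins for the asker's objects
mcmcMatrix <- matrix(runif(200, 0.01, 0.2), ncol = 2,
                     dimnames = list(NULL, c("lambda1", "lambda2")))
lambdas    <- list("lambda1")
numSamples <- 1000
ager1      <- 14:29

# Draw sampled rows in one step, as in the answer above
lambdaSamples1 <- mcmcMatrix[sample(nrow(mcmcMatrix), numSamples, replace = TRUE),
                             lambdas[[1]]]

# Row i holds 1 - exp(-lambda_i * age) for every age
outMat_1 <- 1 - exp(-outer(lambdaSamples1, ager1))

# Convert to a data frame if one is really needed
outDf_1 <- as.data.frame(outMat_1)
names(outDf_1) <- paste0("age", ager1)
```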
To do this for multiple ages, you could use a loop to save your resulting matrices in a list:
outDF <- list()
x <- 5
for (i in seq_len(x)) {
lambdaSamples <- mcmcMatrix[sample(nrow(mcmcMatrix), numSamples, replace=T),
lambdas[[1]]]
outDF[[i]] <- 1 - exp(-lambdaSamples %*% t(ager[[i]]))
}
Here, ager1, ..., agerx are expected to be stored in a list (ager).
I have a vector X of length n, and a list of indices L of variable length. Let F be a function from R^m to R. I want to apply the function F to each subvector X[L[[i]]]. That is, I want to calculate F( X[ L[[i]] ] )
For example, suppose that F is the mean
set.seed(123)
X <- rnorm(100)
L <- list()
for(i in 1:10) L[[i]] <- sample(1:100,30,replace = FALSE)
By brute force I could calculate
out <- vector()
for(i in 1:10) out[i] <- mean(X[ L[[i]] ])
However, this for loop is rather slow for larger dimensions. I was wondering if there is a more direct way of calculating out? I have tried to use lapply, but it does not seem to work for the combination of a vector + a list of indices + a function.
You can simply use lapply to loop over your list and use each element to subset your vector X. Once you subset, calculate the mean, i.e.
lapply(L, function(i) mean(X[i]))
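lapply returns a list. If a plain numeric vector like out in the question is wanted, vapply is one option; the sketch below rebuilds the question's example data and declares that each result is a single number.

```r
set.seed(123)
X <- rnorm(100)
L <- lapply(1:10, function(i) sample(1:100, 30, replace = FALSE))

# One mean per index set, returned as a numeric vector
out <- vapply(L, function(idx) mean(X[idx]), numeric(1))
```

sapply(L, function(idx) mean(X[idx])) would give the same vector here; vapply just makes the expected result type explicit.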
I am trying to figure out how to have R accept my index within a for loop. As a simple example, I would like each new matrix to have a name index that is different than the one before:
for(i in 1:5){
new.matrix.i <- matrix(NA, nrow = i, ncol = i)
}
From this loop, I know it obviously doesn't work, but was wondering how I could create 5 new matrices, with the first being a one-by-one matrix of NA, and the second a two-by-two matrix of NA's, all the way to a five-by-five matrix with all NA's.
In other words, I am wondering how to have R treat
new.matrix.i
with i as a dynamic name instead of just a regular name for a matrix? Thanks!
We can use lapply to create a list of matrices
lst <- lapply(1:5, function(i) matrix(NA, nrow = i, ncol = i))
Or we proceed with for loop, initialize the new.matrix.i as a list
new.matrix.i <- vector("list", 5)
for(i in 1:5){
new.matrix.i[[i]] <- matrix(NA, nrow = i, ncol = i)
}
NOTE: It is better not to create multiple objects in the global environment. A list of matrices (or other objects) is easier and more convenient to use.
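If name-based access is still wanted, the list elements can be named. A short sketch (the names are chosen here only to mirror the question's new.matrix.i idea):

```r
lst <- lapply(1:5, function(i) matrix(NA, nrow = i, ncol = i))
names(lst) <- paste0("new.matrix.", 1:5)

# Access by name or by position
dim(lst[["new.matrix.3"]])
dim(lst[[3]])

# Apply the same operation to every matrix
sapply(lst, nrow)
```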
I'm trying to create a data.frame that takes different values depending on the value of a reference data.frame. I only know how to do this with a "for loop", but have been advised to avoid for loops in R... and my actual data have ~500,000 rows x ~200 columns.
a <- as.data.frame(matrix(rbinom(10,1,0.5),5,2,dimnames=list(c(1:5),c("a","b"))))
b <- data.frame(v1=c(2,10,12,5,11,3,4,14,2,13),v2=c("a","b","b","a","b","a","a","b","a","b"))
c <- as.data.frame(matrix(0,5,2))
for (i in 1:5){
for(j in 1:2){
if(a[i,j]==1){
c[i,j] <- mean(b$v1[b$v2==colnames(a)[j]])
} else {
c[i,j]= mean(b$v1)
}}}
c
I create data.frame "c" based on the value in each cell, and the corresponding column name, of data.frame "a".
Is there another way to do this? Indexing? Using data.table? Maybe apply functions?
Any and all help is greatly appreciated!
(a == 0) * mean(b$v1) + t(t(a) * c(tapply(b$v1, b$v2, mean)))
Run in pieces to understand what's happening. Also, note that this assumes ordered names in a (and 0's and 1's as entries in it, as per OP).
An alternative to a bunch of t's as above is using mapply (this assumes a is a data.frame or data.table and not a matrix, while the above doesn't care):
(a == 0) * mean(b$v1) + mapply(`*`, a, tapply(b$v1, b$v2, mean))
#subsetting a matrix is faster
res <- as.matrix(a)
#calculate fill-in values outside the loop
in1 <- mean(b$v1)
in2 <- sapply(colnames(a),function(i) mean(b$v1[b$v2==i]))
#loop over columns and use a vectorized approach
for (i in seq_len(ncol(res))) {
res[,i] <- ifelse(res[,i]==0, in1, in2[i])
}
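Another way to write the same idea, sketched here on the question's toy data: build a matrix of per-column fill values and pick between it and the overall mean with a single ifelse. Indexing the tapply result by colnames(a) means this variant does not rely on the column names of a being in sorted order. The object names (overall, by_col, fill, res) are illustrative only.

```r
set.seed(42)
a <- as.data.frame(matrix(rbinom(10, 1, 0.5), 5, 2,
                          dimnames = list(1:5, c("a", "b"))))
b <- data.frame(v1 = c(2, 10, 12, 5, 11, 3, 4, 14, 2, 13),
                v2 = c("a", "b", "b", "a", "b", "a", "a", "b", "a", "b"))

overall <- mean(b$v1)                                          # used where a == 0
by_col  <- as.numeric(tapply(b$v1, b$v2, mean)[colnames(a)])   # one mean per column of a

# Repeat each column's mean down its column, then choose cell by cell
fill <- matrix(by_col, nrow(a), ncol(a), byrow = TRUE)
res  <- ifelse(as.matrix(a) == 1, fill, overall)
```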
I have 1000 matrices named A1, A2, A3,...A1000.
In a for loop I would like to simply take the colMeans() of each matrix:
for (i in 1:1000){
means[i,]<-colMeans(A1)
}
I would like to do this for each matrix Ax. Is there a way to put Ai instead of A1 in the for loop?
So, one way is:
for (i in 1:1000){
means[i,]<-colMeans(get(paste('A', i, sep = '')))
}
but I think that misses the point of some of the comments, i.e., you probably had to do something like this:
csvs = lapply(list.files('.', pattern = '^A.*\\.csv$'), function(fname) { # pattern is a regular expression, not a glob
read.csv(fname)
})
Then the answer to your question is:
means = lapply(csvs, colMeans)
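A self-contained sketch of that workflow, assuming the matrices originally came from files named A1.csv, A2.csv, and so on. The files are written to a temporary directory here purely so the example runs; the directory name and file contents are made up.

```r
# Create a few demo files (hypothetical data)
demo_dir <- file.path(tempdir(), "colmeans_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(matrix(i, 2, 2), file.path(demo_dir, paste0("A", i, ".csv")),
            row.names = FALSE)
}

# Read everything straight into a list; pattern is a regular expression
files <- list.files(demo_dir, pattern = "^A[0-9]+\\.csv$", full.names = TRUE)
csvs  <- lapply(files, read.csv)

# Column means of each file
means <- lapply(csvs, colMeans)
```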
I don't completely understand, but maybe you have assigned each matrix to a different variable name? That is not the best structure, but you can recover from it:
# Simulate the awful data structure.
matrix.names<-paste0('A',1:1000)
for (name in matrix.names) assign(name,matrix(rnorm(9),ncol=3))
# Pull it into an appropriate list
list.of.matrices<-lapply(matrix.names,get)
# Calculate the column means
column.mean.by.matrix<-sapply(list.of.matrices,colMeans)
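A variation on the recovery step, offered as a sketch: mget collects several objects by name into a named list in one call, and transposing the sapply result gives one row per matrix, which matches the means[i,] layout in the question. Only five matrices are simulated to keep it short.

```r
# Simulate separately named matrices, as in the snippet above
matrix.names <- paste0("A", 1:5)
for (name in matrix.names) assign(name, matrix(rnorm(9), ncol = 3))

# mget returns a list named A1, ..., A5
list.of.matrices <- mget(matrix.names)

# sapply gives one column per matrix; transpose for one row per matrix
means <- t(sapply(list.of.matrices, colMeans))
```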
Your initial question asks for a 'for loop' solution. However, there is an easy way to get the desired
result if we use an 'apply' function.
Perhaps putting the matrices into a list, and then applying a function would prove worthwhile.
### Create matrices
A1 <- matrix(1:4, nrow = 2, ncol = 2)
A2 <- matrix(5:8, nrow = 2, ncol = 2)
A3 <- matrix(11:14, nrow = 2, ncol = 2)
### Create a vector of names
names <- paste0('A', 1:3)
### Create a list of matrices, and assign names
list <- lapply(names, get)
names(list) <- names
### Apply the function 'colMeans' to every matrix in our list
sapply(list, colMeans)
I hope this was useful!
As others wrote already, using a list is perhaps your best option. First you'll need to place your 1000 matrices in a list, most easily accomplished using a for-loop (see several posts above). Your next step is more important: using another for-loop to calculate the summary statistics (colMeans).
To apply a for-loop through an R object, in general you can do one of the two options:
Loop over by indices: for example:
for(i in 1:10){head(mat[i])} #simplistic example
Loop "directly"
for(i in mat){print(i)} #simplistic example
In the case of looping through R lists, the FIRST option will be much easier to set up. Here is the idea adapted to your example:
column_means <- c() #start empty; the column means are appended inside the loop
for (i in 1:length(list_of_matrices)){
mat <- list_of_matrices[[i]] #temporarily store individual matrices
##be sure also to use double brackets!
column_means <- c(column_means, colMeans(mat))