I am running some big simulations in which each iteration creates a matrix with an unknown number of rows. I want to combine all these matrices into one big matrix. Intuitively, the easiest way is to set up an empty matrix before the loop and then append each iteration's result with rbind. However, this is very inefficient because of repeated memory allocation, and I need a fast script!
Therefore, I decided to create a sparse results matrix filled with 0s prior to the loop and then replace its rows iteratively with the new matrices. The size of this results matrix is deliberately exaggerated to make sure there are "enough" rows, and the "unused" rows get removed after the loop. An example of my current solution is as follows (this is just dummy code, the real code is more complex):
library(Matrix)

# Create an (oversized) all-zero sparse results matrix
matrix_to_fill <- sparseMatrix(i = integer(0), j = integer(0), x = 0,
                               dims = c(100, 10))

# Run 10 iterations, each creating a new matrix of unknown size
for (i in 1:10) {
  # Find the first row whose first entry is still 0
  first0 <- which(matrix_to_fill[, 1] == 0)[1]
  # Create a new matrix with a random number of rows
  new_matrix <- matrix(rep(1, 10 * (rpois(1, 2) + 1)), ncol = 10)
  # Replace the block of matrix_to_fill starting at the first 0-row
  matrix_to_fill[first0:(first0 + nrow(new_matrix) - 1), ] <- new_matrix
}

# Drop the unused all-zero rows
matrix_to_fill <- matrix_to_fill[matrix_to_fill[, 1] != 0, ]
However, I am again running into a similar memory-allocation problem. Since I do not know the number of rows of each matrix in advance, I have to store it first, determine its row count, and only then replace the respective rows in my results matrix.
Is there a way I can replace an unknown number of rows in my results matrix with a new matrix? (I do know the starting row, and I do know that the results matrix is big enough to "fit" the new matrix.) Or do I have to solve this by creating a "results list" of known length prior to the loop instead of a "results matrix", and then fill each new matrix into the list (sketched below)? This would of course be possible, but I'm afraid it would be less efficient...
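A minimal sketch of that list-based alternative, using the same dummy setup as above: each iteration's matrix goes into a preallocated list, and rbind runs only once at the end, so the final result is allocated a single time.

# Preallocate a list of known length, fill it, then bind everything once
results <- vector("list", 10)
for (i in 1:10) {
  results[[i]] <- matrix(rep(1, 10 * (rpois(1, 2) + 1)), ncol = 10)
}
big_matrix <- do.call(rbind, results)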
Thanks!
I have a vector "perm" of length 445, which includes 260 zeros and 185 ones. I need to generate the matrix of all its possible permutations. Then I need to perform a couple of calculations efficiently. I have kind of managed to do this but not completely and not efficiently (I used a loop). I would need help in improving my code, as it would be very instructive.
First, since R does not let me create this huge matrix of permutations, I use the 'ri' package to randomly sample 100,000 permutations.
library(ri)
lalonde <- read.csv('lalonde.csv')
perms <- genperms(lalonde$treat, maxiter=100000)
Instead, I would like to be able to generate the full matrix of permutations (or a list of lists, if that works better).
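For scale, the size of the full set of assignments can be checked on the log scale in base R, which shows why enumeration is hopeless:

# Number of distinct 0/1 vectors with 185 ones among 445 positions,
# on the log10 scale (the raw count overflows double precision)
lchoose(445, 185) / log(10)
# about 130, i.e. roughly 1e130 permutations, so the full matrix cannot
# be generated and sampling (as with ri::genperms) is the practical route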
Then I merge my original dataset, lalonde, with the permutation dataset.
lalonde1 <- data.frame(perms, lalonde)
I create an empty numeric vector (numeric rather than a list, so the comparison further down works), where I store the output of the loop that follows:
diff_vec <- numeric(100000)
I create a loop that calculates the absolute conditional difference in means for each permuted vector and store the results in the empty vector. This is far from efficient, and I would very much appreciate advice on how to do it better.
for (i in 1:100000) {
  diff_vec[i] <- abs(mean(lalonde1[lalonde1[[i]] == 1, "re78"]) -
                     mean(lalonde1[lalonde1[[i]] == 0, "re78"]))
}
Finally, for each absolute difference in means, I check whether it is greater than or equal to a value I stored earlier (tau_hat), assigning 1 if it is and 0 otherwise.
p_val<-ifelse(diff_vec>=tau_hat, 1, 0)
For those who care, this is to calculate an exact p-value in a completely randomised experiment.
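A sketch of a vectorized alternative, assuming perms holds one permutation per column (which is how ri::genperms returns them) and that tau_hat is the observed difference in means:

# Compute every absolute difference in means in one pass over the columns
diff_vec <- apply(perms, 2, function(w) {
  abs(mean(lalonde$re78[w == 1]) - mean(lalonde$re78[w == 0]))
})
# The exact p-value is the proportion of permuted differences at least
# as large as the observed one
p_val <- mean(diff_vec >= tau_hat)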
I have a large data set (>100,000 rows) and would like to create a new column that sums all previous values of another column.
For a simulated data set test.data with 100,000 rows and 2 columns, I create the new vector that sums the contents of column 2 with:
sapply(1:100000, function(x) sum(test.data[1:x, 2]))
I append this vector to test.data later with cbind(). This is too slow, however. Is there a faster way to accomplish this, or a way to reference the vector that sapply is building from inside sapply, so I can just update a running cumulative sum instead of performing the whole calculation again each time?
Per my comment above, it will be faster to do a direct assignment and use cumsum instead of sapply (cumsum was built specifically for what you want to do).
This should work:
test.data$sum <- cumsum(test.data[, 2])
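A quick illustration on a tiny made-up data frame (a hypothetical stand-in for test.data):

# Column 2 holds the values to accumulate
test.data <- data.frame(id = 1:5, value = c(2, 4, 1, 7, 3))
test.data$sum <- cumsum(test.data[, 2])
test.data$sum  # 2 6 7 14 17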
I would like to know the R command that lets me sample by bootstrap. I have a TxN matrix and a Tx1 vector.
I just want to extract random rows WITH REPLACEMENT (keeping the elements within each row in the same positions) from the matrix and the vector, creating a new TxN matrix and a new Tx1 vector, but the rows drawn must be the same for the matrix as for the vector (if row 5 is drawn for the matrix, row 5 must also be drawn for the vector).
Based on your question I think this might be the answer, but it seems too simple.
I've created some fake data to work with.
# Fake data: a 4x5 matrix (TxN) and a length-4 vector (Tx1)
TN <- matrix(1:20, 4, 5)
T <- 4
# Draw T row indices with replacement, then reuse them for both objects
# (note: assigning to T masks the TRUE shorthand, hence TRUE spelled out)
ind <- sample(1:T, T, replace = TRUE)
newTN <- TN[ind, ]
T1 <- 1:T
newT1 <- T1[ind]
I get an error message when I try to fill a newly defined list from a list of matrices.
t <- list()
for (i in 1:N) {
  t[[i]] <- Z[[i]][, 2]
}
Here Z is a list of matrices of different sizes (each matrix has the same number of columns, but the number of rows differs). I am attempting to build a new list t in which each element holds the second column of the corresponding matrix in Z, but it never works.
The strangest thing is that N is 300, yet the loop stops and throws the error every time i reaches about 81. Thank you in advance!
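For reference, a minimal sketch of an equivalent lapply version, plus a check that may reveal the failing element (assuming Z is a list of matrices as described):

# Equivalent one-liner: take column 2 of every matrix in Z
t <- lapply(Z, function(m) m[, 2])

# If the loop dies partway through, inspect each element's dimensions;
# a common culprit is an element that is no longer a matrix (dim() NULL,
# e.g. after single-row subsetting dropped the dim attribute)
sapply(Z, function(m) paste(dim(m), collapse = "x"))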
Hopefully this has an easy answer I just haven't been able to find:
I am trying to write a simulation that will compare a number of statistical procedures on different subsets of rows (subjects) and columns (variables) of a large matrix.
Subsetting rows was fairly easy using a sample() of the subject ID numbers, but I am running into a little more trouble with columns.
Essentially, what I'd like to be able to do is create a random sample of column index numbers which will then be used to create a new matrix. What's got me the closest so far is:
testmat <- matrix(rnorm(1000 * 100), nrow = 1000, ncol = 100)
column.ind <- sample(3:100, 20)
teststr <- paste("testmat[,", column.ind, "]", sep = "", collapse = ",")
which gives me a string that has a testmat[,column.ind] for every sampled index number. Is there any way to easily plug that into a cbind() function to make a new matrix? Is there any other obvious way I'm missing?
I've been able to do it using a loop (i.e. cbind(matrix,newcolumn) over and over), but that's fairly slow as the matrix I'm using is quite large and I will be doing this many times. I'm hoping there's a couple-line solution that's more elegant and quicker.
Have you tried testmat[, column.ind]?
Rows and columns can be indexed in the same way, using logical vectors, a set of names, or numeric indices.
See here for an example: http://ideone.com/EtuUN.
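In case the link goes stale, a minimal sketch reusing the question's setup:

testmat <- matrix(rnorm(1000 * 100), nrow = 1000, ncol = 100)
column.ind <- sample(3:100, 20)
newmat <- testmat[, column.ind]  # all sampled columns in one step
dim(newmat)                      # 1000 x 20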