I'm new to programming in R, so I don't know if this is even feasible. I have 50 matrices (50,000 rows by 10 columns) that I'm trying to populate for a Monte Carlo simulation. I created all the matrices in a loop, and they're called mCMatrix1, mCMatrix2, etc.
I want to populate the matrices in a loop, something to this effect:
for (i in 1:50){
for (j in 1:50000){
num <- mu + tR %*% rnorm(10) # returns a 10 row, 1 column matrix
mCMatrixC"i"[]= num[,1] # basically rotates the matrix to fill in the first row
}
}
where I can somehow tell the program that it needs to populate mCMatrix1, then mCMatrix2, all the way to the 50th matrix. For STATA users: I remember you could loop through variables with something like v = forval(range of values), mCMatrix`v'. (It's been a while since I've used STATA, so the syntax probably isn't right, but it was something to that effect.)
You can build a list of matrices for easier access and access it using the following. I am not sure about the matrix operation you do in the loop so I have chosen a random matrix as an example.
> list_matrices = list()
> for (i in 1:10) { list_matrices[[i]] = matrix(rnorm(9), nrow=3)}
> list_matrices[[1]]
[,1] [,2] [,3]
[1,] -0.09855292 0.2665513 0.72873888
[2,] -0.03005994 -0.4834303 -1.12356622
[3,] 0.98443875 0.5895932 0.07072777
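Applied to the Monte Carlo setup in the question, the same list idea might look like the sketch below. The values of mu and tR are placeholders (the question does not show them), mu is assumed to be a plain numeric vector, and the sizes are shrunk so the example runs quickly. Drawing all the normals for one matrix at once replaces the inner loop.

```r
set.seed(1)
n_mat  <- 5      # 50 in the question
n_rows <- 1000   # 50,000 in the question
n_cols <- 10

# Placeholder inputs (assumptions): the question does not show mu or tR
mu <- rep(0, n_cols)   # plain numeric vector of length 10
tR <- diag(n_cols)     # 10 x 10 matrix

mc_list <- vector("list", n_mat)   # one slot per matrix
for (i in seq_len(n_mat)) {
  # Each column of z plays the role of one rnorm(10) draw
  z <- matrix(rnorm(n_cols * n_rows), nrow = n_cols)
  # mu is recycled down every column; transpose so each draw becomes a row
  mc_list[[i]] <- t(mu + tR %*% z)
}

dim(mc_list[[2]])   # rows x columns of the second matrix
```

The matrices are then reached as mc_list[[1]], mc_list[[2]], and so on, so a loop index can be used directly instead of pasting variable names together.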
If the core issue is to generate new (numbered) variable names and assign values to them, then I think you can use this approach:
for(i in 1:3)
{
n<- sprintf("matr%d",i)
print(n)
assign(x=n,value = i)
}
matr1
matr2
matr3
R is built around lists and data.frames, which is a little bit different from other languages. Your easiest method is to create a list of matrix names and iterate through the list.
Rawr's approach is the simplest and probably most effective.
Then you simply access it by mlist[[n]], n being the matrix you want.
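If the numbered matrices already exist, one way to get such a list is mget(), which collects objects by name. A small sketch, with three tiny matrices created via assign() purely for illustration and mlist used as a hypothetical name:

```r
env <- environment()   # the environment the matrices live in

# Stand-ins for the existing mCMatrix1, mCMatrix2, ... objects
for (i in 1:3) {
  assign(paste0("mCMatrix", i), matrix(i, nrow = 2, ncol = 2), envir = env)
}

# Collect them into one named list
mlist <- mget(paste0("mCMatrix", 1:3), envir = env)

names(mlist)
mlist[[2]]             # access by position
mlist[["mCMatrix2"]]   # access by name
```

Double brackets return the matrix itself, while single brackets (mlist[2]) return a list of length one that still wraps it.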
If you want a complete data frame approach, it's a little more complicated, but it gives a single data table with a run index rather than a list of matrices.
library(dplyr)
yourData <- data.frame()
for (k in 1:50) {
yourData <- yourData %>%
rbind((as.data.frame(matrix(rnorm(50000 * 10), nrow=50000, ncol=10))) %>%
mutate(Run = k))
}
That way you could access it as
yourData %>% filter(Run == n)
I am trying to create a matrix of coordinates (indexes) from which I randomly pick one using the sample function. I then use the picked coordinates to select a cell in another matrix. What is the best way to do this? The trouble is how to store these integers in the matrix so that they are easy to separate. Right now I have them stored as strings with a comma, which I then split. Someone suggested I use a pair, or a string, but I cannot seem to get these to work with a matrix. Thanks!
EDIT: What I currently have looks like this (changed a little to make sense out of context):
probs <- matrix(c(0,0,0.6,0,0,
0,0.7,1,0.7,0,
0.6,1,0,1,0.6,
0,0.7,1,0.7,0,
0,0,0.6,0,0),5,5)
cordsMat <- matrix("",5,5)
for (x in 1:5){
for (y in 1:5){
cordsMat[x,y] = paste(x,y,sep=",")
}
}
cords <- sample(cordsMat,1,,probs)
cordsVec <- unlist(strsplit(cords,split = ","))
cordX <- as.numeric(cordsVec[1])
cordY <- as.numeric(cordsVec[2])
otherMat[cordX,cordY]
It sort of works, but I would also be interested in a better way, as this will get repeated a lot.
If you want to set the probabilities, that can easily be done by passing them to sample:
# creating the matrix
matrix(sample(rep(1:6, 15:20), 25), 5) -> other.mat
# set the probs vec
probs <- c(0,0,0.6,0,0,
0,0.7,1,0.7,0,
0.6,1,0,1,0.6,
0,0.7,1,0.7,0,
0,0,0.6,0,0)
# the coordinates matrix
mat <- as.matrix(expand.grid(1:nrow(other.mat),1:ncol(other.mat)))
# sampling a row index randomly (one probability per row of mat)
sample(nrow(mat), 1, prob=probs) -> rand
# getting the value
other.mat[mat[rand,1], mat[rand,2]]
[1] 6
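Since a matrix can also be indexed by a single linear index, a variant that avoids storing coordinates at all is possible. This is only a sketch: it assumes probs is kept as a 5 x 5 matrix as in the question, and otherMat here is made-up data.

```r
set.seed(42)
probs <- matrix(c(0,   0,   0.6, 0,   0,
                  0,   0.7, 1,   0.7, 0,
                  0.6, 1,   0,   1,   0.6,
                  0,   0.7, 1,   0.7, 0,
                  0,   0,   0.6, 0,   0), 5, 5)
otherMat <- matrix(1:25, 5, 5)   # made-up data for illustration

# Draw one cell index, weighted by probs (cells are numbered column by column)
idx <- sample(length(probs), 1, prob = as.vector(probs))

# Convert to (row, column) only if the coordinates themselves are needed
rc <- arrayInd(idx, dim(probs))

otherMat[idx]   # same cell as otherMat[rc[1], rc[2]]
```

This skips the paste/strsplit round trip entirely, which matters if the draw is repeated many times.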
I am trying to create 1000 variables, which I want to name with the index number. I don't know how to create these new variables.
for(i in 1:1000) {
Ui <- rnorm(200,0,1)
}
This is a common sort of thing that people want to do, especially when they are coming from other programming languages. However, there are better ways to accomplish the same thing, and you should not follow recommendations to use assign; that is bad advice that you will likely regret later on.
The way we do this sort of thing in R is to use lists, specifically named lists:
x <- replicate(1000,rnorm(200,0,1),simplify = FALSE)
x <- setNames(x,paste0("A",seq_along(x)))
Now x is a named list of length 1000, each element is a vector of length 200 from a normal(0,1) distribution.
You can refer to each one via x[[1]] or x[["A1"]] as needed. Additionally, since they are all in the same object, you can operate on them easily as a group using tools like lapply.
Pretty much any time you find yourself wanting to create a sequence of objects with similar names, that should be a signal to you that you should be using a list instead.
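As a small illustration of operating on the whole list at once, here is a sketch with a shorter list than in the question (the names means and x_scaled are just for this example):

```r
set.seed(1)
x <- replicate(10, rnorm(200, 0, 1), simplify = FALSE)
x <- setNames(x, paste0("A", seq_along(x)))

# One summary per element; vapply checks that each result is a single number
means <- vapply(x, mean, numeric(1))

# Transform every element and keep the list structure and names
x_scaled <- lapply(x, function(v) (v - mean(v)) / sd(v))

means[["A3"]]          # mean of the third vector
names(x_scaled)[1:3]   # names carry over from x
```

Doing the same with 10 separately named variables would need get() or assign() in a loop, which is exactly the clutter the list avoids.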
There is no point in cluttering your environment with so many variables; try to store them in a named list instead.
l1 <- setNames(lapply(1:5, function(x) rnorm(5)), paste0("A", 1:5))
l1
#$A1
#[1] 0.4951453 -1.4278665 0.5680115 0.3537730 -0.7757363
#$A2
#[1] -0.11096037 0.05958700 0.02578168 1.00591996 0.54852030
#$A3
#[1] 0.1058318 0.6988443 -0.8213525 -0.1072289 0.8757669
#$A4
#[1] -0.6629634 0.8321713 -0.3073465 -0.2645550 -1.0064132
#$A5
#[1] 2.2191246 0.2054360 -0.1768357 1.6875302 -1.1495807
Now you can access individual list element as
l1[["A1"]]
#[1] 0.4951453 -1.4278665 0.5680115 0.3537730 -0.7757363
Another method is to generate all the numbers together and then split them into a list.
groups = 5
each = 5
setNames(split(rnorm(groups * each), rep(seq_len(groups), each = each)),
paste0("A", seq_len(groups)))
I agree with the others that this is not a good idea. Anyway, to answer your question, this is how you would do it:
k <- 1000 # number of variables
n <- 200 # sample size of each variable
for(i in 1:k){
assign(paste0("variable", i), rnorm(n, 0, 1))}
variable1
-0.012947062 0.728284959 -1.627796366 0.003471491 ...
However, personally I would prefer another solution. Both answers so far suggest using lists. I find lists quite cumbersome, especially if you are new to R. So I would suggest creating a matrix where every column contains one variable.
# creates a matrix
m <- matrix(rep(NA, n*k), ncol= k)
# generates rnorm() in each column
for(i in 1:k){
m[ , i] <- rnorm(n, 0, 1)
}
# now you can name the columns
colnames(m) <- paste0("variable", 1:k)
m
variable1 variable2 ...
[1,] 0.30950749 -2.07388046
[2,] -1.13232330 -0.55511476
...
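The loop is not strictly needed here: since every column comes from the same distribution, the matrix can be filled in a single call. A sketch with the same k and n, which also shows a column-wise summary (col_means is a name chosen for this example):

```r
k <- 1000   # number of variables
n <- 200    # sample size of each variable

# Fill the whole matrix at once and name the columns in the same call
m <- matrix(rnorm(n * k), nrow = n, ncol = k,
            dimnames = list(NULL, paste0("variable", seq_len(k))))

m[1:3, "variable2"]      # first three values of one variable
col_means <- colMeans(m) # one mean per variable
```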
I have two 3-D arrays, and I want to calculate some statistics on them. As long as I am working with only one variable, I know how to do it. For example, to calculate the mean over the first dimension, I use the following:
obs<-array(1:8,c(2,2,2));
mod<-array(9:2,c(2,2,2));
meanObs <- apply(obs,c(2,3),mean) # mean of observation
meanMod <- apply(mod,c(2,3),mean) # mean of model simulation/forecast
However, I do not know how to feed two sliced arrays into apply. For example, I am trying to calculate the correlation coefficient over the first dimension. I can do it with the following loops:
pearsonCor<-matrix(, nrow = dim(obs)[2], ncol = dim(obs)[3])
for (i in 1:dim(obs)[2]){
for (j in 1:dim(obs)[3]){
pearsonCor[i,j]<-tryCatch(suppressWarnings(cor(obs[,i,j], mod[,i,j], method = "pearson")),
error=function(cond) {return(NA)})
}
}
result:
> pearsonCor
[,1] [,2]
[1,] -1 -1
[2,] -1 -1
But I want to learn how to deal with this situation with apply. Any help would be very much appreciated.
Thanks,
You can use expand.grid to get the index combination as in your nested for loop. Then apply over the data.frame of indices.
pearsonCor[] <- apply(expand.grid(1:dim(obs)[2], 1:dim(obs)[3]), 1, function(x)
cor(obs[,x[[1]], x[[2]]], mod[,x[[1]], x[[2]]]))
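Note that expand.grid varies the first index (corresponding to i in your loops) fastest. Because R fills pearsonCor[] column by column, this matches the layout of the matrix, so pearsonCor[i, j] holds the same value as in your nested loop; only the order in which the cells are computed differs.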
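An alternative that needs no pre-existing result matrix is to nest two sapply calls over the index ranges. This is a sketch using the obs and mod arrays from the question; pearsonCor2 is a name chosen here to keep it apart from the loop result.

```r
obs <- array(1:8, c(2, 2, 2))
mod <- array(9:2, c(2, 2, 2))

# Outer sapply builds one column per j; inner sapply builds the rows (i)
pearsonCor2 <- sapply(seq_len(dim(obs)[3]), function(j) {
  sapply(seq_len(dim(obs)[2]), function(i) {
    cor(obs[, i, j], mod[, i, j], method = "pearson")
  })
})

pearsonCor2   # element [i, j] is the correlation of obs[, i, j] and mod[, i, j]
```

With these inputs every pair of slices is perfectly anti-correlated, so the result agrees with the all -1 matrix shown in the question.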
I have a list of matrices such that my_list[[1]] consists of a matrix and my_list[[2]] contains another matrix and so on. I want to embed this list inside a loop such that for every iteration of the loop I have a different my_list with different matrices, and want to be able to access them later. Is there any way I could do this in R? For example like creating an array (of size = number of iterations of the loop), and each index of the array would have a different list of matrices. Or something similar. And how can I access it. Could anyone please help me with this? I would greatly appreciate the help. I have looked around but cannot find a way to do this. Lists of lists seem to be an option, and I have tried to experiment with it for one iteration but it gives this error:
> nes <- list()
> nes[[1]] <- append(nes[[1]], my_list[[1]])
Error in nes[[1]] : subscript out of bounds
Would be great if anyone could help me with this.
EDIT:
Basically what I have is an initial list known as particles. Something like this:
for (k in 1:10)
{
# three centroids; k = 3
particle[[k]] <- rbind(features.dataf[sample(1:10, 1),2:4],
features.dataf[sample(1:10, 1),2:4],
features.dataf[sample(1:10, 1),2:4])
row.names(particle[[k]]) <- c(1,2,3)
}
Then I run this loop again. With an extra outer loop.
for (n in 1:30) {
for (k in 1:10) {
###some calculations
### create a vector f[k] with an f value for each k (calculated according to some formula)
pbestFitness[n,k] <- f[k] ##create a nXk dataframe that stores the f[k] value for every iteration of n
### over here I want to create a list of lists
}
}
In the code above, at the point marked for the list of lists, I want to store every particle[[k]] matrix for each iteration of the outer loop.
Any particle[[k]] is of the form:
[,1] [,2] [,3]
[1,] 0.96436532 0.8958297 0.6089338
[2,] 0.08555853 0.7762849 0.6647247
[3,] 0.30792817 0.8061227 0.5099790
The desired output is that when I access this new list of lists (nes), nes[[n]] holds a list of k matrices.
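One way this could be set up is sketched below. The error above arises because nes[[1]] does not exist yet in an empty list, so it cannot be read; assigning a whole list into a preallocated slot avoids that. The particle matrices and the update step here are placeholders, not the asker's actual calculations.

```r
set.seed(1)
n_iter      <- 30
n_particles <- 10

# Stand-in for the question's particle list: ten 3 x 3 matrices
particle <- replicate(n_particles, matrix(runif(9), nrow = 3), simplify = FALSE)

nes <- vector("list", n_iter)   # one slot per outer iteration
for (n in seq_len(n_iter)) {
  for (k in seq_len(n_particles)) {
    # Placeholder update; the real calculations would go here
    particle[[k]] <- particle[[k]] + 0.01
  }
  nes[[n]] <- particle          # snapshot of all matrices at iteration n
}

length(nes[[5]])   # number of matrices stored for iteration 5
nes[[5]][[2]]      # matrix of particle 2 after iteration 5
```

Each nes[[n]] is itself a list of k matrices, so nes[[n]][[k]] reaches a single matrix.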
I am having trouble optimising a piece of R code. The following example code should illustrate my optimisation problem:
Some initialisations and a function definition:
a <- c(10,20,30,40,50,60,70,80)
b <- c("a","b","c","d","z","g","h","r")
c <- c(1,2,3,4,5,6,7,8)
myframe <- data.frame(a,b,c)
values <- vector(length=columns)
solution <- matrix(nrow=nrow(myframe),ncol=columns+3)
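myfunction <- function(frame,columns){
# start from an empty vector so that value exists inside the function
value <- vector(length=columns)
athing = 0
if(columns == 5){
athing = 100
}
else{
athing = 1000
}
value[columns+1] = athing
return(value)}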
The problematic for-loop looks like this:
columns = 6
for(i in 1:nrow(myframe)){
values <- myfunction(as.matrix(myframe[i,]), columns)
values[columns+2] = i
values[columns+3] = myframe[i,3]
#more columns added with simple operations (i.e. sum)
solution <- rbind(solution,values)
#solution is a large matrix from outside the for-loop
}
The problem seems to be the rbind function. I frequently get error messages regarding the size of solution, which seems to be too large after a while (more than 50 MB).
I want to replace this loop and the rbind with a list and lapply and/or foreach. I have started with converting myframe to a list.
myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i,])
I have not really come further than this, although I tried applying this very good introduction to parallel processing.
How do I have to reconstruct the for-loop without having to change myfunction? Obviously I am open to different solutions...
Edit: This problem seems to be straight from the 2nd circle of hell from the R Inferno. Any suggestions?
The reason that using rbind in a loop like this is bad practice is that in each iteration you enlarge solution and then copy it to a new object, which is a very slow process and can also lead to memory problems. One way around this is to create a list whose ith component stores the output of the ith loop iteration. The final step is to call rbind on that list (just once, at the end). This will look something like
my.list <- vector("list", nrow(myframe))
for(i in 1:nrow(myframe)){
# Call all necessary commands to create values
my.list[[i]] <- values
}
solution <- rbind(solution, do.call(rbind, my.list))
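Filled in with concrete pieces, the pattern could look like the following sketch. The myfunction here is a simplified stand-in for the asker's function (an assumption, not the original code), and the data frame is cut down to four rows.

```r
myframe <- data.frame(a = c(10, 20, 30, 40),
                      b = c("a", "b", "c", "d"),
                      c = c(1, 2, 3, 4))
columns <- 6

# Simplified stand-in for myfunction (assumption, not the asker's full code)
myfunction <- function(frame, columns) {
  value <- numeric(columns + 1)
  value[columns + 1] <- if (columns == 5) 100 else 1000
  value
}

# Build one result vector per row; nothing is grown inside the loop
rows <- lapply(seq_len(nrow(myframe)), function(i) {
  values <- myfunction(as.matrix(myframe[i, ]), columns)
  values[columns + 2] <- i
  values[columns + 3] <- myframe[i, 3]
  values
})

solution <- do.call(rbind, rows)   # a single rbind at the end
dim(solution)
```

Because each row is computed independently, the lapply call is also the natural place to swap in a parallel variant later if needed.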
A bit too long for a comment, so I put it here:
If columns is known in advance:
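myfunction <- function(frame){
# columns is taken from the enclosing environment here
value <- vector(length=columns)
athing = 0
if(columns == 5){
athing = 100
}
else{
athing = 1000
}
value[columns+1] = athing
return(value)}
# margin 1 applies the function to each row, like the original loop
apply(myframe, 1, myfunction)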
If columns is not given via environment, you can use:
apply(myframe, 1, myfunction, columns) with your original myfunction definition.