I have two 3-D arrays, and I want to calculate some statistics on them. As long as I am working with only one variable, I know how to do it. For example, to calculate the mean over the first dimension, I use the following:
obs<-array(1:8,c(2,2,2));
mod<-array(9:2,c(2,2,2));
meanObs <- apply(obs,c(2,3),mean) # mean of observation
meanMod <- apply(mod,c(2,3),mean) # mean of model simulation/forecast
However, I do not know how to feed two sliced arrays into apply. For example, I am trying to calculate the correlation coefficient over the first dimension. I can do it with the following nested loops:
pearsonCor<-matrix(, nrow = dim(obs)[2], ncol = dim(obs)[3])
for (i in 1:dim(obs)[2]){
for (j in 1:dim(obs)[3]){
pearsonCor[i,j]<-tryCatch(suppressWarnings(cor(obs[,i,j], mod[,i,j], method = "pearson")),
error=function(cond) {return(NA)})
}
}
result:
> pearsonCor
[,1] [,2]
[1,] -1 -1
[2,] -1 -1
But I want to learn how to deal with this situation with apply. Any help would be very much appreciated.
Thanks,
You can use expand.grid to get the index combination as in your nested for loop. Then apply over the data.frame of indices.
pearsonCor[] <- apply(expand.grid(1:dim(obs)[2], 1:dim(obs)[3]), 1, function(x)
cor(obs[,x[[1]], x[[2]]], mod[,x[[1]], x[[2]]]))
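Note that expand.grid varies its first variable (corresponding to i in the loops) fastest, whereas your nested loops vary j fastest. Because R fills a matrix column by column, assigning into pearsonCor[] still places each value at position [i, j], so the result has the same layout as the matrix in your question.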
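A possible variation on the same idea, sketched with the obs and mod arrays from the question: mapply can walk the two index columns in parallel, which avoids indexing into x inside the function. The names idx and vals are only illustrative.

```r
obs <- array(1:8, c(2, 2, 2))
mod <- array(9:2, c(2, 2, 2))

# All (i, j) combinations; the first column varies fastest,
# which matches R's column-major matrix layout.
idx <- expand.grid(i = seq_len(dim(obs)[2]), j = seq_len(dim(obs)[3]))

# Correlate the matching slices of the two arrays for each (i, j) pair.
vals <- mapply(function(i, j) cor(obs[, i, j], mod[, i, j]), idx$i, idx$j)

# Reshape the flat vector back into an i-by-j matrix.
pearsonCor <- matrix(vals, nrow = dim(obs)[2], ncol = dim(obs)[3])
```

The tryCatch and suppressWarnings wrappers from the question can be put around cor inside the anonymous function if some slices may be constant.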
I need to briefly explain the context before letting you know my question.
I am trying to process a large graph, namely the Social circles: Google+ dataset here. The file gplus_combined.txt downloaded from that site is read using the data.table package:
library(data.table)
data = fread('gplus_combined.txt',stringsAsFactors = TRUE)
Variable data is of dimensions dim(data) = c(30494865,2) and here is an example of a row of data:
>data[1,]
>1: 112188647432305746617 107727150903234299458
The two long integer strings are ids of nodes of the graph, and each row of data corresponds to an edge between the first and second node ids. Since working with node ids like those is not very convenient, I'd like to convert them to numbers using the R function strtoi. Here is what I have tried:
M = matrix(0,2,2)
for (i in 1:2) {
for (j in 1:2) {
M[i,j] = strtoi(data[i,j,with = FALSE])
}
}
print(M)
[,1] [,2]
[1,] 47826 45374
[2,] 65616 2462
This works well for just two rows of data, but it is too slow for processing about 30 million rows. So I want to use the R function apply to speed up the calculation. The problem is that if I just use
apply(data[1:2,], 1:2, strtoi)
[1,] NA NA
[2,] NA NA
then it returns a 2x2 matrix with NA entries. Note that to get the matrix M above, I need to include the parameter with = FALSE,
strtoi(data[i,j,with = FALSE])
otherwise M would also be a matrix of NA entries. Is there a way to pass the option with = FALSE to the apply function? Or is there any other faster way to get the same result as matrix M? Any suggestions/comments are greatly appreciated!
Thank you for spending your time reading this long post!
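One possible direction, offered only as a sketch: strtoi returns NA for values that do not fit in a 32-bit integer, and 21-digit ids are far beyond that range, so a literal conversion cannot preserve them. If all that is needed is a compact integer label per node, match against the set of unique ids is vectorised and needs no loop. The tiny edges data frame and the third id below are made up for illustration; they stand in for the two columns read by fread.

```r
# Hypothetical stand-in for the two id columns, kept as character strings.
edges <- data.frame(
  from = c("112188647432305746617", "107727150903234299458", "112188647432305746617"),
  to   = c("107727150903234299458", "100000000000000000001", "100000000000000000001"),
  stringsAsFactors = FALSE
)

# Every distinct node id, in order of first appearance.
ids <- unique(c(edges$from, edges$to))

# Replace each id by its position in ids (1, 2, 3, ...).
M <- cbind(match(edges$from, ids), match(edges$to, ids))
```

The original id of node number n can be recovered later as ids[n].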
I'd like to populate a matrix using information from two other matrices
I have managed to do this with a given dataset, but I need to integrate this within a larger script, and the size of the two matrices I'm using to populate the larger matrix may differ each time.
Example data:
days = 150
block <- matrix(c(50,120,150), nrow=3, ncol=1)
[,1]
[1,] 50
[2,] 120
[3,] 150
e1 <- matrix(c(0.1,0.5,0.7), nrow=3, ncol=1)
[,1]
[1,] 0.1
[2,] 0.5
[3,] 0.7
result <- matrix(0, nrow = 150, ncol=1)
I need to create a vector of numbers (taken from e1) that repeat themselves depending on each number in 'block'.
The code below demonstrates the desired outcome in this instance; however, I'm trying to write a more flexible script that can cope with fewer or more than 3 'blocks'.
I appreciate there is probably a much easier way of doing this, but my head is stuck in loop mode and I can't seem to get out of it!
for (v1 in 1:days){
if(v1 <= block[1,1]){
result[v1,1] <- e1[1,1]
}
else if (v1 > block[1,1] & v1 <= block[2,1]){
result[v1,1] <- e1[2,1]
}
else if (v1 > block[2,1] & v1 <= block[3,1]){
result[v1,1] <- e1[3,1]
}
}
Any help would be much appreciated!
You can get this by using a nice feature of rep:
result <- rep(e1, c(block[1], diff(block)))
# cast the vector as a column matrix
result <- matrix(result, length(result))
This works because rep will accept a vector in its second argument that tells it how many times to repeat each element of its first argument.
If you know the length ahead of time, you can combine the lines, like
result <- matrix(rep(e1, c(block[1], diff(block))), days)
for example.
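As a quick check of the rep approach against the loop from the question, here is a small sketch using the example data. The names run_lengths and expected are only illustrative.

```r
days  <- 150
block <- matrix(c(50, 120, 150), nrow = 3, ncol = 1)
e1    <- matrix(c(0.1, 0.5, 0.7), nrow = 3, ncol = 1)

# Length of each block: the first boundary, then the gaps between boundaries.
run_lengths <- c(block[1], diff(block))

# Repeat each value of e1 as many times as its block is long.
result <- matrix(rep(e1, times = run_lengths), ncol = 1)

# Loop-based reference: each day takes the value of the first block that contains it.
expected <- matrix(0, nrow = days, ncol = 1)
for (v1 in seq_len(days)) {
  expected[v1, 1] <- e1[which(v1 <= block)[1], 1]
}
```

Because run_lengths is derived from block, the same lines work for any number of blocks, as long as block is increasing and e1 has one value per block.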
I'm new to programming in R, so I don't even know if this is feasible. I have 50 matrices (50,000 rows by 10 columns) that I'm trying to populate for a Monte Carlo simulation. I created all the matrices in a loop, and they're called mCMatrix1, mCMatrix2, etc.
I want to populate the matrices in a loop, something to this effect:
for (i in 1:50){
for (j in 1:50000){
num <- mu + tR %*% rnorm(10) # returns a 10 row, 1 column matrix
mCMatrixC"i"[]= num[,1] # basically rotates the matrix to fill in the first row
}
}
where I can somehow tell the program that it needs to populate mCMatrix1, then mCMatrix2, all the way to the 50th matrix. For STATA users, I remember you could loop through variables with v = forval(range of values), mCMatrix`v'. (It's been a while since I've used STATA, so the syntax probably isn't right, but it was something to that effect.)
You can build a list of matrices for easier access and access it using the following. I am not sure about the matrix operation you do in the loop so I have chosen a random matrix as an example.
> list_matrices = c()
> for (i in 1:10) { list_matrices[[i]] = matrix(rnorm(9), nrow=3)}
> list_matrices[[1]]
[,1] [,2] [,3]
[1,] -0.09855292 0.2665513 0.72873888
[2,] -0.03005994 -0.4834303 -1.12356622
[3,] 0.98443875 0.5895932 0.07072777
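Applied to the loop in the question, the list approach might look like the sketch below. Note that mu and tR are not given in the question, so a zero vector and an identity matrix are used here purely as placeholders, and the sizes are shrunk from 50 and 50000 so the example runs instantly.

```r
set.seed(1)
n_mat  <- 3    # the question uses 50
n_rows <- 5    # the question uses 50000
n_cols <- 10

mu <- rep(0, n_cols)   # placeholder, not from the question
tR <- diag(n_cols)     # placeholder, not from the question

# One list slot per matrix instead of mCMatrix1, mCMatrix2, ...
mc_list <- vector("list", n_mat)

for (i in seq_len(n_mat)) {
  m <- matrix(NA_real_, nrow = n_rows, ncol = n_cols)
  for (j in seq_len(n_rows)) {
    num <- mu + tR %*% rnorm(n_cols)  # a 10-row, 1-column matrix
    m[j, ] <- num[, 1]                # store the draw as row j
  }
  mc_list[[i]] <- m
}
```

mc_list[[2]] then plays the role of mCMatrix2, and the index can itself be a loop variable.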
If the core issue is to generate new (numbered) variable names and assign values to them, then I think you can use this approach:
for(i in 1:3)
{
n<- sprintf("matr%d",i)
print(n)
assign(x=n,value = i)
}
matr1
matr2
matr3
R runs on lists and data.frames, which is a little bit different from other methods. Your easiest method is to create a list of matrix names and iterate through the list.
Rawr's approach is the simplest and probably most effective.
Then you simply access it by mlist[[n]], n being the matrix you want.
If you want a complete data frame approach, it's a little more complicated, but it gives a data table with indices rather than a list of matrices:
library(dplyr)
yourData <- data.frame()
for (k in 1:50) {
yourData <- yourData %>%
rbind((as.data.frame(matrix(rnorm(50000 * 10), nrow=50000, ncol=10))) %>%
mutate(Run = k))
}
That way you could access it as
yourData %>% filter(Run == n)
I have a list of matrices: my_list[[1]] holds one matrix, my_list[[2]] holds another, and so on. I want to build this list inside a loop so that every iteration produces a different my_list with different matrices, and I want to be able to access all of them later. Is there a way to do this in R? For example, I could create an array whose length is the number of loop iterations, with each element holding a different list of matrices, or something similar. How would I then access it? I have looked around but cannot find a way to do this. A list of lists seems to be an option, and I tried it for one iteration, but it gives this error:
> nes <- list()
> nes[[1]] <- append(nes[[1]], my_list[[1]])
Error in nes[[1]] : subscript out of bounds
Would be great if anyone could help me with this.
EDIT:
Basically, what I have is an initial list called particle. Something like this:
for (k in 1:10)
{
# three centroids; k = 3
particle[[k]] <- rbind(features.dataf[sample(1:10, 1),2:4],
features.dataf[sample(1:10, 1),2:4],
features.dataf[sample(1:10, 1),2:4])
row.names(particle[[k]]) <- c(1,2,3)
}
Then I run this loop again, with an extra outer loop:
for (n in 1:30) {
for (k in 1:10) {
###some calculations
### create a vector f[k] with an f value for each k (calculated according to some formula)
pbestFitness[n,k] <- f[k] ##create a nXk dataframe that stores the f[k] value for every iteration of n
### over here I want to create a list of lists
}
}
In the code above, at the point where I want to create the list of lists, I would like to store every particle[[k]] matrix for each iteration of the outer loop.
Any particle[[k]] is of the form:
[,1] [,2] [,3]
[1,] 0.96436532 0.8958297 0.6089338
[2,] 0.08555853 0.7762849 0.6647247
[3,] 0.30792817 0.8061227 0.5099790
The desired output is a new list of lists (nes) such that nes[[n]] is a list containing k matrices.
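A minimal sketch of one way to get that structure. The error above arises because nes is empty, so nes[[1]] does not exist yet when append tries to read it; assigning to nes[[n]] directly avoids that. The features matrix below is a made-up stand-in for features.dataf[, 2:4], and the loop sizes are reduced from 30 and 10 to keep the example small.

```r
set.seed(42)
n_iter      <- 3   # the question uses 30
n_particles <- 4   # the question uses 10

# Hypothetical stand-in for features.dataf[, 2:4]: 10 rows, 3 columns.
features <- matrix(runif(10 * 3), nrow = 10, ncol = 3)

# Outer list: one slot per iteration of the outer loop.
nes <- vector("list", n_iter)

for (n in seq_len(n_iter)) {
  # Inner list: one 3 x 3 matrix per particle.
  particle <- vector("list", n_particles)
  for (k in seq_len(n_particles)) {
    particle[[k]] <- features[sample(1:10, 3, replace = TRUE), ]
  }
  nes[[n]] <- particle   # store the whole inner list for iteration n
}
```

nes[[2]] is then the list of matrices from the second outer iteration, and nes[[2]][[3]] is its third matrix.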
I am doing an R assignment and I have to write a function that does what dist.xyz does.
dist.xyz(a, b = NULL, all.pairs=FALSE)
a and b are matrices of numbers, and the function computes the distances between corresponding rows of ‘a’ and ‘b’.
I tried a for loop (as below) but it takes too long and "apply" only allows us to do operation on 1 matrix at a time.
dis = vector()
for (i in 1:nrow(a)) {
dis <- append(dis,sqrt(sum((a[i,] - b[i,]) ^ 2))) # append returns a new vector, so the result must be assigned back
}
Is there some way to "apply" to two matrices?
Thanks in advance
Would be easier if you had example data. But here's my take. This isn't a general solution for '"apply" for 2 matrices'. However, in your case, you only need apply for a single matrix a-b, since the element-wise difference of each row is the first thing you take. Then apply square, sum, and square root to each row to obtain your result.
set.seed(7) # just to ensure reproducible results
rowDist<-function(a,b) {
apply(a-b,1,function(x)sqrt(sum(x^2)))
}
a<-matrix(rnorm(25),5,5)
b<-matrix(rnorm(25),5,5)
rowDist(a,b)
#[1] 2.716251 2.685056 3.699462 2.125998 3.437412
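If speed matters, a fully vectorised variant is also possible, since rowSums does the per-row summation without calling an R function for each row. This is a sketch on the same kind of random data; the names row_dist_apply and row_dist_vec are only illustrative.

```r
set.seed(7)
a <- matrix(rnorm(25), 5, 5)
b <- matrix(rnorm(25), 5, 5)

# apply version, as in the answer above
row_dist_apply <- apply(a - b, 1, function(x) sqrt(sum(x^2)))

# vectorised version: square the differences, sum across each row, take the root
row_dist_vec <- sqrt(rowSums((a - b)^2))
```

Both vectors hold one Euclidean distance per row, and all.equal(row_dist_apply, row_dist_vec) can be used to confirm that they agree.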