I'm using the apply function to calculate migration transition probabilities from eight states at time t1 to the same eight states at time t2. My data is saved in matrix format and named tmp (shown below). The states at time t1 are the rows, and the states at time t2 are the columns; e.g. 228 persons stayed in state 1 between t1 and t2, and 3 persons moved from state 2 to state 1 between t1 and t2.
> class(tmp)
"matrix"
> tmp
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 228 0 0 0 0 0 0 0
[2,] 3 92 0 0 0 0 0 0
[3,] 0 0 30 0 0 0 0 0
[4,] 0 0 0 20 0 0 0 0
[5,] 0 0 0 0 19 0 0 0
[6,] 0 0 0 0 0 0 0 0
[7,] 0 0 0 3 0 0 0 0
[8,] 0 0 0 0 0 0 0 3
I used the following code to calculate the probabilities of moving to or staying in a certain state, i.e. the proportions of the cells in each row. The results are saved in tmp1.
> tmp1=apply(tmp,1,function(X){if (sum(X)!=0) {X/sum(X)} else {numeric(length(X))}})
The problem is: I expected tmp1[4,7] to be 0 (because tmp[4,7] is 0), but the code returns 1 (bolded). Is there a problem with my apply function?
> tmp1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 0.03157895 0 0 0 0 0 0
[2,] 0 0.96842105 0 0 0 0 0 0
[3,] 0 0.00000000 1 0 0 0 0 0
[4,] 0 0.00000000 0 1 0 0 **1** 0
[5,] 0 0.00000000 0 0 1 0 0 0
[6,] 0 0.00000000 0 0 0 0 0 0
[7,] 0 0.00000000 0 0 0 0 0 0
[8,] 0 0.00000000 0 0 0 0 0 1
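A likely explanation for the unexpected 1: when the function passed to apply() returns a vector longer than one, apply() stores each result as a column of the output. With MARGIN = 1 the result is therefore the transpose of the row-wise layout, and tmp1[4,7] is really the 4th entry of the normalised 7th row (3/3 = 1). The sketch below rebuilds tmp from the listing above and shows two ways to keep rows as rows; the names normalise and row_props are illustrative only.

```r
# Rebuild the 8 x 8 counts shown above (rows = state at t1, cols = state at t2)
tmp <- matrix(0, 8, 8)
tmp[cbind(c(1, 2, 2, 3, 4, 5, 7, 8), c(1, 1, 2, 3, 4, 5, 4, 8))] <-
  c(228, 3, 92, 30, 20, 19, 3, 3)

# Same per-row function as in the question
normalise <- function(X) if (sum(X) != 0) X / sum(X) else numeric(length(X))

# apply(..., 1, f) returns one COLUMN per input row, so transpose it back
tmp1 <- t(apply(tmp, 1, normalise))

# Vectorised alternative: divide each row by its row sum, guarding empty rows.
# The length-8 divisor recycles down each column, so element [i, j] is divided by rs[i].
rs <- rowSums(tmp)
row_props <- tmp / ifelse(rs == 0, 1, rs)

tmp1[4, 7]  # 0, as expected
tmp1[7, 4]  # 1: all 3 persons in state 7 moved to state 4
```

Either version should give each state at t1 as a row whose entries sum to 1 (or to 0 for an empty state such as state 6).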
Related
Say that I have a 10 x 5 matrix of zeros called m:
m <- matrix(0,10,5)
which looks like this:
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 0 0 0 0 0
[3,] 0 0 0 0 0
[4,] 0 0 0 0 0
[5,] 0 0 0 0 0
[6,] 0 0 0 0 0
[7,] 0 0 0 0 0
[8,] 0 0 0 0 0
[9,] 0 0 0 0 0
[10,] 0 0 0 0 0
Now I have a list of coordinates in a matrix called xy:
x y
[1,] 3 1
[2,] 7 3
[3,] 8 1
[4,] 9 4
and I want to update the matrix by taking each row of coordinates above and adding 1 to the cell in matrix m that it refers to, so the output would then look like this:
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 0 0 0 0 0
[3,] 1 0 0 0 0
[4,] 0 0 0 0 0
[5,] 0 0 0 0 0
[6,] 0 0 0 0 0
[7,] 0 0 1 0 0
[8,] 1 0 0 0 0
[9,] 0 0 0 1 0
[10,] 0 0 0 0 0
Your help is appreciated!
As long as you provide the coordinates as a matrix, with the 1st column specifying the row and the 2nd column specifying the column, you can do:
xy = cbind(c(3,7,8,9),c(1,3,1,4))
m[xy] = 1
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 0
[2,] 0 0 0 0 0
[3,] 1 0 0 0 0
[4,] 0 0 0 0 0
[5,] 0 0 0 0 0
[6,] 0 0 0 0 0
[7,] 0 0 1 0 0
[8,] 1 0 0 0 0
[9,] 0 0 0 1 0
[10,] 0 0 0 0 0
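Note that m[xy] = 1 sets the cells to 1 rather than adding 1. For a genuine increment, m[xy] <- m[xy] + 1 works when every coordinate pair is unique, but a repeated pair is only counted once, because each duplicate writes the same value. The sketch below uses a hypothetical xy in which (3, 1) appears twice, and counts repeats by converting each pair to a column-major linear index and tabulating it.

```r
m <- matrix(0, 10, 5)
xy <- cbind(c(3, 7, 8, 9, 3), c(1, 3, 1, 4, 1))  # hypothetical: (3, 1) appears twice

# Simple increment: fine when every coordinate pair is unique
m1 <- m
m1[xy] <- m1[xy] + 1
m1[3, 1]  # 1, not 2: the repeated pair is written twice with the same value

# Count repeats: linear index = (col - 1) * nrow + row, then tabulate over all cells
lin <- (xy[, 2] - 1) * nrow(m) + xy[, 1]
m2 <- m + tabulate(lin, nbins = length(m))  # vector has length(m) elements, so dims are kept
m2[3, 1]  # 2
```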
How to do a row-wise replacement of values using R?
I have a matrix and I would like to replace some of its values using an index vector. The problem is that R automatically interprets the indices column-wise rather than row-wise.
You will find my code and results below:
Matrix=matrix(rep(0,42),nrow=6,ncol=7,byrow=TRUE)
v=c(1,7,11,16,18)
Matrix[v]=1
Matrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 1 0 0 0 0 0
[2,] 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0
[4,] 0 0 1 0 0 0 0
[5,] 0 1 0 0 0 0 0
[6,] 0 0 1 0 0 0 0
What I actually want is the row-wise version, meaning:
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 0 0 0 0 0 1
[2,] 0 0 0 1 0 0 0
[3,] 0 1 0 1 0 0 0
[4,] 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0
Apparently R does a column-wise replacement of values by default.
What is the best way to obtain a row-wise replacement of the values?
Thanks!
You could convert the one-dimensional indices to row and column indices. If the row indices are in the first column of a matrix Ind and the column indices are in the second column of Ind, you can do Matrix[Ind] <- 1:
Matrix <- matrix(rep(0,42),nrow=6,ncol=7,byrow=TRUE)
v <- c(1,7,11,16,18)
Row <- (v-1) %/% ncol(Matrix) +1
Col <- (v-1) %% ncol(Matrix) +1
Matrix[cbind(Row,Col)] <- 1
Matrix
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 1 0 0 0 0 0 1
# [2,] 0 0 0 1 0 0 0
# [3,] 0 1 0 1 0 0 0
# [4,] 0 0 0 0 0 0 0
# [5,] 0 0 0 0 0 0 0
# [6,] 0 0 0 0 0 0 0
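The same conversion can probably be delegated to base R's arrayInd(), as a sketch: treat v as column-major positions in the transposed (7 x 6) matrix, then swap the two index columns so they become (row, column) positions in Matrix. The drop = FALSE keeps a matrix even when v has a single element.

```r
Matrix <- matrix(0, nrow = 6, ncol = 7)
v <- c(1, 7, 11, 16, 18)

# Positions in the transposed matrix, then swap to (row, col) of the original
rc <- arrayInd(v, c(ncol(Matrix), nrow(Matrix)))[, 2:1, drop = FALSE]
Matrix[rc] <- 1
Matrix
```

This is the same arithmetic as the Row/Col computation above, so the choice is mainly about readability.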
We can do
+(matrix(seq_along(Matrix) %in% v, ncol=ncol(Matrix), nrow=nrow(Matrix), byrow=TRUE))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] 1 0 0 0 0 0 1
#[2,] 0 0 0 1 0 0 0
#[3,] 0 1 0 1 0 0 0
#[4,] 0 0 0 0 0 0 0
#[5,] 0 0 0 0 0 0 0
#[6,] 0 0 0 0 0 0 0
You could recompute your indices so that they are row-wise, or you can transpose, assign, and transpose back:
Matrix=matrix(rep(0,42),nrow=6,ncol=7,byrow=TRUE)
v=c(1,7,11,16,18)
Matrix<-t(Matrix)
Matrix[v]=1
Matrix<-t(Matrix)
Matrix
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 0 0 0 0 0 1
[2,] 0 0 0 1 0 0 0
[3,] 0 1 0 1 0 0 0
[4,] 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0
I have a question. I am trying to create a 10x10 matrix using the code below, where column j contains 10 values from a normal distribution with a standard deviation of 0.5 and a mean equal to j, for j in 1:10. My code produces the matrix shown below, where only the final column is filled with values. What am I doing wrong? Thank you.
for(j in 1:10){
y<-matrix(0,ncol=10,nrow=10)
y[,j]<-rnorm(n=10,mean=j,sd=.5)
}
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0 0 0 0 0 0 10.857520
[2,] 0 0 0 0 0 0 0 0 0 10.490549
[3,] 0 0 0 0 0 0 0 0 0 9.888620
[4,] 0 0 0 0 0 0 0 0 0 9.495205
[5,] 0 0 0 0 0 0 0 0 0 9.674356
[6,] 0 0 0 0 0 0 0 0 0 10.810197
[7,] 0 0 0 0 0 0 0 0 0 10.337517
[8,] 0 0 0 0 0 0 0 0 0 9.715229
[9,] 0 0 0 0 0 0 0 0 0 9.902603
[10,] 0 0 0 0 0 0 0 0 0 8.972656
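The loop recreates y as an all-zero matrix on every iteration, so each pass discards the column written by the previous one and only the last assignment (column 10) survives. A minimal sketch of the fix moves the initialisation outside the loop; a loop-free alternative with sapply() is also shown. The set.seed() call is an addition, only there to make the draws reproducible.

```r
set.seed(1)  # reproducible draws

# Fix: create the matrix ONCE, then fill column j inside the loop
y <- matrix(0, ncol = 10, nrow = 10)
for (j in 1:10) {
  y[, j] <- rnorm(n = 10, mean = j, sd = 0.5)
}

# Loop-free alternative: sapply() binds each length-10 draw as a column
y2 <- sapply(1:10, function(j) rnorm(n = 10, mean = j, sd = 0.5))
```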
I have a matrix (initialized to zeros) and a set of indices. If the i-th value in indices is j, then I want to set the (j,i)-th entry of the matrix to 1.
For example:
> m = matrix(0, 10, 7)
> indices
[1] 2 9 3 4 5 1 10
And the result should be
> result
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0 0 0 0 0 1 0
[2,] 1 0 0 0 0 0 0
[3,] 0 0 1 0 0 0 0
[4,] 0 0 0 1 0 0 0
[5,] 0 0 0 0 1 0 0
[6,] 0 0 0 0 0 0 0
[7,] 0 0 0 0 0 0 0
[8,] 0 0 0 0 0 0 0
[9,] 0 1 0 0 0 0 0
[10,] 0 0 0 0 0 0 1
I asked a somewhat related question a little while back, which used a vector instead of a matrix. Is there a similar simple solution to this problem?
## OP's example data
m = matrix(0, 10, 7)
j <- c(2, 9, 3, 4, 5, 1, 10)
## Construct a two column matrix of indices (1st column w. rows & 2nd w. columns)
ij <- cbind(j, seq_along(j))
## Use it to subassign into the matrix
m[ij] <- 1
m
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 0 0 0 0 0 1 0
# [2,] 1 0 0 0 0 0 0
# [3,] 0 0 1 0 0 0 0
# [4,] 0 0 0 1 0 0 0
# [5,] 0 0 0 0 1 0 0
# [6,] 0 0 0 0 0 0 0
# [7,] 0 0 0 0 0 0 0
# [8,] 0 0 0 0 0 0 0
# [9,] 0 1 0 0 0 0 0
# [10,] 0 0 0 0 0 0 1
For the record, the answer in your linked question can easily be adapted to suit this scenario too by using sapply:
indices <- c(2, 9, 3, 4, 5, 1, 10)
sapply(indices, tabulate, nbins = 10)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 0 0 0 0 0 1 0
# [2,] 1 0 0 0 0 0 0
# [3,] 0 0 1 0 0 0 0
# [4,] 0 0 0 1 0 0 0
# [5,] 0 0 0 0 1 0 0
# [6,] 0 0 0 0 0 0 0
# [7,] 0 0 0 0 0 0 0
# [8,] 0 0 0 0 0 0 0
# [9,] 0 1 0 0 0 0 0
# [10,] 0 0 0 0 0 0 1
For small datasets you might not notice the performance difference, but Josh's answer, which uses matrix indexing, would definitely be much faster, even if you changed my answer here to use vapply instead of sapply.
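The vapply() change mentioned above might look like the following sketch. FUN.VALUE = integer(10) declares that every call must return an integer vector of length 10, which makes the result type predictable, but tabulate() is still called once per index, which is presumably why matrix indexing remains the faster option for large inputs.

```r
indices <- c(2, 9, 3, 4, 5, 1, 10)

# vapply() variant of the tabulate() answer; nbins = 10 is passed through to tabulate()
res_vapply <- vapply(indices, tabulate, integer(10), nbins = 10)

# Matrix-indexing answer, for comparison
m <- matrix(0, 10, 7)
m[cbind(indices, seq_along(indices))] <- 1

all(res_vapply == m)  # TRUE: both give the same 0/1 pattern
```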
I'm trying to apply a function to a list using sapply, but I'm having trouble doing so. I'm trying to calculate the earth mover's distance using the emdist package. Every element of the list has two sub-elements, and I want to calculate the earth mover's distance between these two sub-elements for each element in turn (the real list has thousands of elements). The problem is that RStudio crashes each time I try to run the code on a test dataset. An example of the test dataset:
set.seed(42)
output1 <- list(list(matrix(0,8,11),matrix(0,8,11)), list(matrix(rnorm(80),8,10),matrix(rnorm(80),8,10)))
[[1]]
[[1]][[1]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 0 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0 0 0 0 0
[7,] 0 0 0 0 0 0 0 0 0 0 0
[8,] 0 0 0 0 0 0 0 0 0 0 0
[[1]][[2]]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 0 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0 0 0 0 0
[7,] 0 0 0 0 0 0 0 0 0 0 0
[8,] 0 0 0 0 0 0 0 0 0 0 0
Now when I do this:
library(emdist)
sapply(output1,function(x) {emd2d(x[[seq_along(x)[1]]],x[[seq_along(x)[2]]]) })
RStudio simply crashes. I have also tried:
mapply(emd2d,sapply(output1,`[`,1),sapply(output1,`[`,2))
But to no avail. Any ideas? I'm running this on a 2013 MacBook Air with 2 GB of RAM.
This works fine:
> emd2d(output1[[2]][[1]],output1[[2]][[2]])
[1] -6.089909
This does not:
emd2d(output1[[1]][[1]],output1[[1]][[2]])
It seems emd2d() fails when you compare two all-zero matrices, at least for me on OS X, as this succeeds for me:
set.seed(666)
output2 <- list(list(matrix(5,8,11),matrix(5,8,11)),
list(matrix(rnorm(80),8,10),matrix(rnorm(80),8,10)))
sapply(output2,function(x) {emd2d(x[[1]],x[[2]]) })
#[1] 0.000000 -7.995288
# note: I removed your seq_along because I don't think you really want it.
As does this:
> set.seed(666)
> output2 <- list(list(matrix(0,8,11),matrix(5,8,11)), list(matrix(rnorm(80),8,10),matrix(rnorm(80),8,10)))
> sapply(output2,function(x) {emd2d(x[[1]],x[[2]]) })
[1] NaN -7.995288
Maybe you need to contact the package creator about this. In the meantime, you could create a function that checks whether both matrices are all zeros, e.g.
foo <- function(z) {
  # run emd2d only if at least one of the two matrices has a non-zero cell
  if (sum(length(z[[1]][z[[1]] != 0]),
          length(z[[2]][z[[2]] != 0])) > 0) {
    emd2d(z[[1]], z[[2]])
  } else {
    0
  }
}
# I use length and subsetting, not just sum(), in case the two
# matrices somehow sum to zero because they contain negative values
> sapply(output1, foo)
[1] 0.000000 -6.089909
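A variant of foo, as a sketch: safe_pair_distance and its dist_fun argument are hypothetical names, and the commented emdist line assumes the package is installed. Passing the distance function in as an argument lets the guard be tried without emdist, and vapply() insists on exactly one numeric result per pair. Like foo, this only guards the case where both matrices are all zeros; per the NaN output above, a single all-zero matrix is not handled.

```r
# Hypothetical helper: skip the distance call when both matrices are all zeros
safe_pair_distance <- function(pair, dist_fun) {
  a <- pair[[1]]
  b <- pair[[2]]
  # any(x != 0) is TRUE if at least one cell is non-zero, so negative
  # values cannot cancel out the way they could with sum()
  if (!any(a != 0) && !any(b != 0)) {
    return(0)
  }
  dist_fun(a, b)
}

# Usage with emdist (assumes the package is installed):
# vapply(output1, safe_pair_distance, numeric(1), dist_fun = emdist::emd2d)
```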