Converting a matrix into a Markov transition matrix in R

I have a matrix mat with values between 0 and 1 (so they can be treated as probabilities), built as follows:
> t <- c(22, 65, 37, 84, 36, 14, 9, 19, 5, 49)
> x <- t/max(t)
> mat <- x%*%t(x)
I now want to convert this matrix mat into a Markov transition matrix, i.e. have the elements of each row add up to 1. I achieve this by dividing the matrix by rowSums:
> y <- mat/rowSums(mat)
> z <- y/rowSums(y)
> rowSums(z)
[1] 1 1 1 1 1 1 1 1 1 1
However, this causes the elements in each column to have the same value:
[,1] [,2] [,3] [,4] [,5]
[1,] 0.06470588 0.1911765 0.1088235 0.2470588 0.1058824
[2,] 0.06470588 0.1911765 0.1088235 0.2470588 0.1058824
[3,] 0.06470588 0.1911765 0.1088235 0.2470588 0.1058824
[4,] 0.06470588 0.1911765 0.1088235 0.2470588 0.1058824
This is not what I want. I require each element to have different values but I'm not sure how to do this. Any suggestions appreciated!

Why are you first making matrix y and then z? The rows of y already sum to 1, so the second division changes nothing.
Your idea of dividing by rowSums is right, but I think the problem is in your original matrix, as this works:
mat<-matrix(runif(100),10,10)
mat2<-mat/rowSums(mat)
rowSums(mat2)
[1] 1 1 1 1 1 1 1 1 1 1
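edit: The unwanted behaviour comes from the line mat <- x%*%t(x), not from dividing by max(t), which only rescales the values. x%*%t(x) is an outer product, so every row of mat is a multiple of x (the matrix is singular, with rank 1) and every row normalises to the same vector x/sum(x).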
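A quick way to see why all rows come out identical is to check the rank of the outer product and compare the normalised rows with x/sum(x). This is only a sketch using the vector from the question; the name tt is my own choice, used so that the vector does not share a name with the function t().

```r
# Vector from the question (called tt here, a hypothetical name)
tt <- c(22, 65, 37, 84, 36, 14, 9, 19, 5, 49)
x <- tt / max(tt)

# Outer product: row i equals x[i] * x, so all rows are proportional
mat <- x %*% t(x)
qr(mat)$rank

# Row-normalising cancels the factor x[i]; every row becomes x / sum(x)
P <- mat / rowSums(mat)
all.equal(P[1, ], x / sum(x))
all.equal(P[1, ], P[7, ])
```

Any matrix built as an outer product of one vector with itself will behave this way, whatever scaling is applied to the vector beforehand.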

You can't have a symmetric matrix in mat. Try this:
n <- 10
mat <- matrix(runif(n**2),n)
mat <- mat/rowSums(mat)

You could fill the matrix with runif values and then iterate the Sinkhorn-Knopp algorithm (described, e.g., at the top of page 3 of http://www.cerfacs.fr/algor/reports/2006/TR_PA_06_42.ps.gz), which will converge to a doubly stochastic (Markov) matrix.
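The alternating scaling can be sketched roughly as follows. This is a minimal illustration, not taken from the linked report: the function name sinkhorn_knopp and the arguments tol and max_iter are made up for this example, and it assumes a square matrix with strictly positive entries.

```r
# Sketch of Sinkhorn-Knopp scaling: alternately normalise rows and columns
# of a strictly positive square matrix until the row sums are close to 1.
# sinkhorn_knopp, tol and max_iter are illustrative names, not from a package.
sinkhorn_knopp <- function(A, tol = 1e-10, max_iter = 1000) {
  stopifnot(is.matrix(A), nrow(A) == ncol(A), all(A > 0))
  for (iter in seq_len(max_iter)) {
    A <- A / rowSums(A)                 # make every row sum to 1
    A <- sweep(A, 2, colSums(A), "/")   # make every column sum to 1
    if (max(abs(rowSums(A) - 1)) < tol) break
  }
  A
}

set.seed(1)
n <- 10
P <- sinkhorn_knopp(matrix(runif(n^2), n))
range(rowSums(P))
range(colSums(P))
```

Both ranges should be very close to 1. If only the rows need to sum to 1, the single division mat/rowSums(mat) is enough and no iteration is needed.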

Related

Matrix multiplication (row by row)

I have a matrix of dimension 1000x100. I want to compute the inner product of each row with itself, row by row, so that I end up with a vector of dimension 1000x1. For example:
A<-matrix(c(1,2,3,4),nrow=2,ncol=2,byrow=TRUE)
[,1] [,2]
[1,] 1 2
[2,] 3 4
I want to get a vector that looks like this:
[,1]
[1,] c(1,2) %*% t(c(1,2))
[2,] c(3,4) %*% t(c(3,4))
I tried doing a loop, but an error occurs:
U<-matrix(nrow=1000,ncol=1)
U
k=0
for(i in 1:nrow(U_hat)){
for(j in 1:nrow(U_hat)){
k=k+1
U[k,1]=U_hat[i,]%*%t(U_hat[j,])
}
}
where U_hat is the matrix of dimension 1000x100.
I would appreciate the help to know how to do this multiplication. Thank you.
Multiply A by itself and take the row sums:
rowSums(A*A)
## [1] 5 25
This would also work:
apply(A, 1, crossprod)
## [1] 5 25
This would work too:
diag(tcrossprod(A))
## [1] 5 25
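To check that the three approaches agree at the size mentioned in the question, here is a small sketch with random stand-in data (the real U_hat is not shown in the question, so the values are made up):

```r
set.seed(42)
U_hat <- matrix(rnorm(1000 * 100), nrow = 1000, ncol = 100)  # stand-in data

r1 <- rowSums(U_hat * U_hat)      # elementwise square, then sum each row
r2 <- apply(U_hat, 1, crossprod)  # crossprod of one row is its inner product
r3 <- diag(tcrossprod(U_hat))     # diagonal of U_hat %*% t(U_hat)

all.equal(r1, r2)
all.equal(r1, r3)
length(r1)
```

Note that the tcrossprod version builds a full 1000x1000 matrix only to keep its diagonal, so rowSums(U_hat * U_hat) is likely the cheapest of the three for large inputs.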

Is it possible to have a matrix of matrices in R?

Is it possible to have a matrix of matrices in R? If yes, how should I define such a matrix?
For example, a 10 x 10 matrix in which each element is itself a matrix.
1) list/matrix Yes, create a list and give it dimensions using matrix:
m <- matrix(1:4, 2)
M <- matrix(list(m, 2*m, 3*m, 4*m), 2)
so element 1,1 of M is m:
> M[[1,1]]
[,1] [,2]
[1,] 1 3
[2,] 2 4
2) list/dim<- This also works:
M <- list(m, 2*m, 3*m, 4*m)
dim(M) <- c(2, 2)
3) array This is not quite what you asked for but depending on your purpose it might satisfy your need:
A <- array(c(m, 2*m, 3*m, 4*m), c(2, 2, 2, 2)) # 2x2x2x2 array
so element 1,1 is the slice A[,,1,1] (array() fills the first two dimensions first):
> A[,,1,1]
[,1] [,2]
[1,] 1 3
[2,] 2 4
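For the 10 x 10 case from the question, the list/matrix approach might look like the sketch below. The contents of each cell are arbitrary placeholders chosen for illustration.

```r
# Pre-allocate a 10 x 10 list-matrix; every cell starts as NULL
M <- matrix(vector("list", 100), nrow = 10, ncol = 10)

# Fill each cell with its own 2 x 2 matrix (the contents are arbitrary)
for (i in 1:10) {
  for (j in 1:10) {
    M[[i, j]] <- matrix(i * j, 2, 2)
  }
}

dim(M)          # dimensions of the outer matrix
M[[3, 4]]       # the 2 x 2 matrix stored in row 3, column 4
class(M[3, 4])  # single brackets return a list of length 1
```

The main thing to remember is to use double brackets, M[[i, j]], to get at the stored matrix itself; M[i, j] returns a one-element list wrapping it.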

How to generate matrices where A) each row has a single value of one; B) rows sum to one

This is a two-part problem: the first is to create an N x N square matrix in which exactly one randomly chosen element in each row is 1 and all other elements are zero (i.e. the sum of the elements in each row is 1).
The second is to create an N x N square matrix in which the elements of each row sum to 1, but each element follows a distribution, e.g. a normal distribution.
Related questions include (Create a matrix with conditional sum in each row -R)
Matlab seems to do what I want automatically (Why this thing happens with random matrix such that all rows sum up to 1?), but I am looking for a solution in R.
Here is what I tried:
# PART 1
N <- 50
x <- matrix(0,N,N)
lapply(1:N, function(y){
x[y,sample(N,1)]<- 1
})
(I get zeroes still)
# PART 2
N <- 50
x <- matrix(0,N,N)
lapply(1:N, function(y){
x[y,]<- rnorm(N)
})
(It needs scaling)
Here's another loop-free solution that uses two-column matrix indexing with the "[<-" function. It builds a two-column index matrix whose first column is simply an ascending sequence giving the row positions, and whose second column (the one responsible for picking the column positions) is a random integer. (It's a vectorized version of Matthew's "easiest method", and I suspect it would be faster since there is only one call to sample.):
M <- matrix(0,N,N)
M[cbind(1:N, sample(1:N, N, replace=TRUE))] <- 1
> rowSums(M)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
If you didn't specify replace=TRUE, then colSums(M) would all have been ones as well, but that was not what you requested. Sampling with replacement does mean the rank of the resulting matrix may be less than N. If you left out replace=TRUE, the result would be a permutation matrix and therefore full rank.
Here you see why lapply doesn't always replace a loop. You're trying to iterate through the rows of x and modify the matrix, but what you're modifying is a copy of the x from the global environment.
The easiest fix is to use a for loop:
for (y in 1:N) {
x[y,sample(N,1)]<- 1
}
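The difference between the two can be seen in a small sketch (N = 5 is chosen here only to keep the output short, and after_lapply is a name introduced for the example):

```r
set.seed(1)
N <- 5
x <- matrix(0, N, N)

# The assignment inside the anonymous function changes a local copy of x only
invisible(lapply(1:N, function(y) {
  x[y, sample(N, 1)] <- 1
}))
after_lapply <- sum(x)
after_lapply   # 0: the outer x was never touched

# A for loop runs in the calling environment, so x itself is modified
for (y in 1:N) {
  x[y, sample(N, 1)] <- 1
}
rowSums(x)     # one 1 per row
```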
The apply family should be used for its return value, rather than for calling functions with side effects.
A way to do this is to return the rows, then rbind them into a matrix. The second example is shown here, as this more closely resembles an apply:
do.call(rbind, lapply((1:N), function(i) rnorm(N)))
However, this is more readable:
matrix(rnorm(N*N), N, N)
Now to scale this to have row sums equal to 1. You use the fact that a matrix is column-oriented and that vectors are recycled, meaning that you can divide a matrix M by rowSums(M). Using a more reasonable N=5:
m <- matrix(rnorm(N*N), N, N)
m/rowSums(m)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.1788692 0.5398464 0.24980924 -0.01282655 0.04430168
## [2,] 0.4176512 0.2564463 0.11553143 0.35432975 -0.14395871
## [3,] 0.3480568 0.7634421 -0.38433940 0.34175983 -0.06891932
## [4,] 1.1807180 -0.0192272 0.16500179 -0.31201400 -0.01447859
## [5,] 1.1601173 -0.1279919 -0.07447043 0.20865963 -0.16631458
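The recycling trick can be checked against sweep(), which spells out the same row-wise division. This is just a sketch with a fixed seed; the names p1 and p2 are made up here. Note that because normal draws can be negative, the rows sum to 1 but the entries are not probabilities, as the negative values in the output above show.

```r
set.seed(1)
N <- 5
m <- matrix(rnorm(N * N), N, N)

# rowSums(m) has length N; it is recycled down each column, so element [i, j]
# is divided by the i-th row sum
p1 <- m / rowSums(m)

# sweep() states the same operation explicitly: divide along margin 1 (rows)
p2 <- sweep(m, 1, rowSums(m), "/")

all.equal(p1, p2)
rowSums(p1)

# Dividing by colSums(m) in the same way would NOT normalise the columns,
# because recycling still runs down the columns; for that case use
# sweep(m, 2, colSums(m), "/")
```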
No-loop solution :)
n <- 5
# on which column in each row insert 1s
s <- sample(n,n,TRUE)
# indexes for each row
w <- seq(1,n*n,by=n)-1
index <- s+w
# vector of 0s
vec <- integer(n*n)
# put 1s
vec[index] <- 1
# voila :)
matrix(vec,n,byrow = TRUE)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 0 0 0
[2,] 0 0 0 1 0
[3,] 0 0 0 0 1
[4,] 1 0 0 0 0
[5,] 1 0 0 0 0

R get highest x cells and their rownames/colnames in a symmetric matrix?

I have a symmetric matrix mat:
A B C
A 1 . .
B . 1 .
C . . 1
And I want to calculate the two highest elements of it. Now since it's a symmetric matrix I thought of using upper.tri like so:
mat.tri<-upper.tri(mat) # convert to upper tri
mat.ord<-order(mat.tri,na.last=TRUE,decreasing=TRUE)[1:2] # order by largest
a.ind<-which(mat%in%mat.tri[mat.ord]) # get absolute indices
r.ind<-arrayInd(a.ind,dim(mat)) # get relative indices
# get row/colnames using these indices
So the above is a roundabout way of doing things, and even then the output has 'duplicate' rows, in that they are just transposed.
Anyone got a more intuitive way of doing this?
Thanks.
Liberally borrowing from the excellent ideas of @SimonO'Hanlon and @lukeA, you can construct a two-liner function to do what you want. I use:
arrayInd() to return the array index
order() to order the upper triangular elements
and the additional trick of setting the lower triangular matrix to NA, using m[lower.tri(m)] <- NA
Try this:
whichArrayMax <- function(m, n=2){
m[lower.tri(m)] <- NA
arrayInd(order(m, decreasing=TRUE)[seq(n)], .dim=dim(m))
}
mat <- matrix( c(1,2,3,2,1,5,3,5,1) , 3 , byrow = TRUE )
mat
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 1 5
[3,] 3 5 1
whichArrayMax(mat, 2)
[,1] [,2]
[1,] 2 3
[2,] 1 3
For just the single largest element, use:
arrayInd(which.max(mat), .dim=dim(mat))
which is basically the same as which( mat == max(mat) , arr.ind = TRUE )[1,] from @SimonO'Hanlon, but more efficient.
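Since the question also asks for the row and column names, here is one possible extension of the same idea. It is a sketch only: top_pairs is a hypothetical helper, and unlike whichArrayMax above it also masks the diagonal, on the assumption that the self-pairs (the 1s in the question's matrix) are not wanted.

```r
# Hypothetical helper: top n off-diagonal pairs of a symmetric matrix, by name
top_pairs <- function(m, n = 2) {
  m[lower.tri(m, diag = TRUE)] <- NA   # keep the strict upper triangle only
  idx <- arrayInd(order(m, decreasing = TRUE)[seq_len(n)], .dim = dim(m))
  data.frame(row = rownames(m)[idx[, 1]],
             col = colnames(m)[idx[, 2]],
             value = m[idx],
             stringsAsFactors = FALSE)
}

mat <- matrix(c(1, 2, 3, 2, 1, 5, 3, 5, 1), 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
res <- top_pairs(mat, 2)
res
```

With this example matrix the two rows returned are the pairs (B, C) with value 5 and (A, C) with value 3, matching the indices found by whichArrayMax.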

Get rank of matrix entries?

Assume a matrix:
> a <- matrix(c(100, 90, 80, 20), 2, 2)
> a
[,1] [,2]
[1,] 100 80
[2,] 90 20
Suppose I want to convert the elements of the matrix to ranks:
> rank.a <- rank(a)
> rank.a
[1] 4 3 2 1
This returns a vector, i.e. the matrix structure is lost. Is it possible to rank a matrix such that the output will be of the form:
[,1] [,2]
[1,] 4 2
[2,] 3 1
An alternative to @EDi's answer is to copy a and then assign the output of rank(a) directly into the elements of the copy of a:
> a <- matrix(c(100, 90, 80, 20), 2, 2)
> rank.a <- a
> rank.a[] <- rank(a)
> rank.a
[,1] [,2]
[1,] 4 2
[2,] 3 1
That saves you from rebuilding a matrix by interrogating the dimensions of the input matrix.
Note that (as @Andrie mentions in the comments) the copying of a is only required if one wants to keep the original a. The main point to note is that because a is already of the appropriate dimensions, we can treat it like a vector and replace the contents of a with the vector of ranks of a.
Why not convert the vector back to a matrix, with the dimensions of the original matrix?
> a <- matrix(c(100, 90, 80, 20, 10, 5), 2, 3)
> a
[,1] [,2] [,3]
[1,] 100 80 10
[2,] 90 20 5
> rank(a)
[1] 6 5 4 3 2 1
> rmat <- matrix(rank(a), nrow = dim(a)[1], ncol = dim(a)[2])
> rmat
[,1] [,2] [,3]
[1,] 6 4 2
[2,] 5 3 1
@Gavin Simpson has a very nice and elegant solution! There is one caveat, though:
The type of the matrix will stay the same or be widened. Mostly you wouldn't notice, but consider the following:
a <- matrix( sample(letters, 4), 2, 2)
rank.a <- a
rank.a[] <- rank(a)
typeof(rank.a) # character
Since the matrix was character to start with, the rank values (which are doubles) got coerced into character strings!
Here's a safer way that simply copies all the attributes:
a <- matrix( sample(letters, 4), 2, 2)
rank.a <- rank(a)
attributes(rank.a) <- attributes(a)
typeof(rank.a) # double
Or, as a one-liner using structure to copy only the relevant attributes (but more typing):
a <- matrix( sample(letters, 4), 2, 2)
rank.a <- structure(rank(a), dim=dim(a), dimnames=dimnames(a))
Of course, dimnames could be left out in this particular case.
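If this is needed in several places, the structure() version could be wrapped in a small function. This is a sketch; rank_matrix is a made-up name, and the ... simply passes extra arguments such as ties.method on to rank().

```r
# Hypothetical helper wrapping the structure() approach
rank_matrix <- function(a, ...) {
  structure(rank(a, ...), dim = dim(a), dimnames = dimnames(a))
}

a <- matrix(c(100, 90, 80, 20), 2, 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))
r <- rank_matrix(a)
r

b <- matrix(c("d", "a", "c", "b"), 2, 2)
typeof(rank_matrix(b))   # "double"
```

Because the result is built from the output of rank() rather than from a copy of the input, a character matrix still yields numeric ranks.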
