stochastic matrix stationary distribution - r

I have a large right stochastic matrix (each row sums to 1), of size ~20000x20000. How can I find its stationary distribution?
I tried to calculate the eigenvalues and eigenvectors, and I get complex eigenvalues, e.g. 1+0i (more than one of them).
I also tried to use the following method:
pi = u[I - P + U]^-1
but when I do the inversion with solve() I get the error message: Error in solve.default(A) : system is computationally singular: reciprocal condition number = 3.16663e-19
As far as I understand, the Perron–Frobenius theorem ensures that every stochastic matrix has a stationary probability vector pi, and that the largest absolute value of an eigenvalue is always 1, so pi = pi P. Since my matrix has all positive entries, I should get a unique pi. Am I correct?
Or is there another method I can use to calculate the row vector pi?

Every stochastic matrix indeed has a stationary distribution. Since P has all row sums = 1,
(P - I) has row sums = 0, so (P - I)*(1, ..., 1) is always zero. Hence rank(P - I) <= n-1, and so is the rank of its transpose. Therefore there exists q such that (t(P) - I)*q = 0, i.e. t(P) q = q.
The complex value 1+0i is in fact real (its imaginary part is 0). But if you get only values whose coefficient of i is not 0, then the algorithm has gone wrong somewhere: it solves the problem numerically and does not have to be exact all the time. It also does not matter how many eigenvalues and eigenvectors it produces; what matters is that it finds the right eigenvector for eigenvalue 1, and that is what you need.
Make sure that your stationary distribution is indeed your limit distribution, otherwise there is no point in computing it. You could try to check this by multiplying different start vectors with your matrix^1000, but I don't know how much time that will take in your case.
Last but not least, here is an example:
# first we need a function that calculates matrix^n
mpot = function(A, p) {
  # calculates A^p (matrix multiplied p times with itself)
  # inputs: A - real-valued square matrix, p - natural number
  # output: A^p
  B = A
  if (p > 1)
    for (i in 2:p)
      B = B %*% A
  return(B)
}
# example matrix
P = matrix(nrow = 3, ncol = 3, byrow = T,
           data = c(0.1, 0.9, 0,
                    0.4, 0,   0.6,
                    0,   1,   0))
# this converges to stationary distribution independent of start distribution
t(mpot(P,1000)) %*% c(1/3, 1/3, 1/3)
t(mpot(P,1000)) %*% c(1, 0, 0)
# is it stationary?
xx = t(mpot(P,1000)) %*% c(1, 0, 0)
t(P) %*% xx
# find stationary distribution using eigenvalues
eigen(t(P)) # here it is!
eigen_vect = eigen(t(P))$vectors[,1]
stat_dist = eigen_vect/sum(eigen_vect) # the eigenvector is only determined up to scale,
                                       # so normalise it to the one with sum = 1
stat_dist
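For a 20000x20000 matrix a full eigen() call may be too slow or memory-hungry. Here is a rough power-iteration sketch along the lines of the matrix^1000 idea above (the tolerance and iteration cap are arbitrary choices, adjust them to your problem):
# power iteration: repeatedly apply pi <- pi %*% P until the vector stops changing
stationary_by_iteration = function(P, tol = 1e-12, max_iter = 10000) {
  pi_vec = rep(1/nrow(P), nrow(P))         # start from the uniform distribution
  for (i in 1:max_iter) {
    pi_new = as.vector(pi_vec %*% P)       # one step of the chain
    done = max(abs(pi_new - pi_vec)) < tol
    pi_vec = pi_new
    if (done) break
  }
  pi_vec / sum(pi_vec)                     # renormalise against rounding drift
}
stationary_by_iteration(P)   # with the 3x3 example above; compare with stat_dist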

Related

R - Standardize matrix to have unit diagonals

I am seeking to generate the below matrix:
Θ = B + δI_p ∈ R^{p×p}, where I_p is the identity matrix and each off-diagonal entry in B (a symmetric matrix) is generated independently and equals 0.5 with probability 0.1 or 0 with probability 0.9. The parameter δ > 0 is chosen such that Θ is positive definite. The matrix is standardized to have unit diagonals (transforming it from a covariance matrix to a correlation matrix).
I think that I have most of the code, but I'm unsure of how to standardize the matrix to have unit diagonals syntactically in R (and theoretically, why that is a useful feature of a matrix).
# set number of cols/rows
p <- 5
set.seed(123)
# generate matrix B with values of 0.5 given probabilities
B <- matrix(sample(c(0,0.5), p^2, replace=TRUE, prob=c(0.9,0.1)), p)
# mirror the upper triangle into the lower triangle so that B is symmetric
i <- lower.tri(B)
B[i] <- t(B)[i]
diag(B) <- rep(0, p)
# finding parameter delta, such that Θ is positive definite.
(delta <- -min(eigen(B, symmetric=TRUE, only.values=TRUE)$values))
# set theta (delta is 2.8802)
theta <- B + 2.89*(diag(p))
# now to standardize the matrix to have unit diagonals ?
There are many ways to do this, but the following is very fast in timing experiments:
v <- 1/sqrt(diag(theta))
B <- theta * outer(v, v)
This divides all rows and columns by their standard deviations, which are the square roots of the diagonal elements.
It will fail whenever any diagonal entry is zero (or negative), but in that case such standardization isn't possible anyway. Computing the square roots and their reciprocals first lets you learn as soon as possible, with minimal computation, whether the procedure will succeed.
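A quick cross-check, if you want one: base R's cov2cor() performs the same covariance-to-correlation standardization, so the two results should agree up to rounding:
# compare the manual standardization with stats::cov2cor()
v  <- 1/sqrt(diag(theta))
B2 <- theta * outer(v, v)
all.equal(B2, cov2cor(theta))   # should be TRUE (up to numerical tolerance)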
BTW, a direct way to compute B in the first part of your code (which has a zero diagonal) is
B <- as.matrix(structure(sample(c(0, 1/2), p*(p-1)/2, replace=TRUE, prob=c(.9, .1)),
                         Size=p, Diag=TRUE, class="dist"))
This eliminates the superfluous sampling.

How do I minimize a linear least squares function in R?

I'm reading Deep Learning by Goodfellow et al. and am trying to implement gradient descent as shown in Section 4.5 Example: Linear Least Squares. This is page 92 in the hard copy of the book.
The algorithm can be viewed in detail at https://www.deeplearningbook.org/contents/numerical.html with R implementation of linear least squares on page 94.
I've tried implementing it in R, and the algorithm as implemented converges on a vector, but this vector does not seem to minimize the least squares function as required. Adding epsilon to the vector in question frequently produces a "minimum" less than the minimum output by my program.
options(digits = 15)
dim_square = 2 ### set dimension of square matrix
# generate a random matrix and a random vector
set.seed(1234)
A = matrix(nrow = dim_square, ncol = dim_square, byrow = T, rlnorm(dim_square ^ 2)/10)
b = rep(rnorm(1), dim_square)
# having fixed A & B, select X randomly
x = rnorm(dim_square) # vector length of dim_square--supposed to be arbitrary
f = function(x, A, b){
  total_vector = A %*% x + b # this is the function that we want to minimize
  total = 0.5 * sum(abs(total_vector) ^ 2) # L2 norm squared
  return(total)
}
f(x,A,b)
# how close do we want to get?
epsilon = 0.1
delta = 0.01
value = (t(A) %*% A) %*% x - t(A) %*% b
L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
steps = vector()
while(L2_norm > delta){
  x = x - epsilon * value
  value = (t(A) %*% A) %*% x - t(A) %*% b
  L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
  print(L2_norm)
}
minimum = f(x, A, b)
minimum
minimum_minus = f(x - 0.5*epsilon, A, b)
minimum_minus # less than the minimum found by gradient descent! Why?
On page 94 of the pdf appearing at https://www.deeplearningbook.org/contents/numerical.html
I am trying to find the values of the vector x such that f(x) is minimized. However, as demonstrated by minimum and minimum_minus in my code, minimum is not the actual minimum, as it exceeds minimum_minus.
Any idea what the problem might be?
Original Problem
Finding the value of x that minimizes the quantity Ax - b (in the L2 sense) is equivalent to finding the value of x such that Ax - b = 0, i.e. x = A^(-1) b when A is invertible. This is because the L2 norm is the Euclidean norm, more commonly known as the distance formula. By definition, a distance cannot be negative, so its minimum is identically zero.
This algorithm, as implemented, actually comes quite close to estimating x. However, because of repeated subtraction and rounding, one quickly runs into the problem of underflow, resulting in massive oscillation, shown below:
[Plot: value of the L2 norm as a function of step size]
[Output: the above algorithm vs. the solve function in R]
Above we have the results of A %*% x followed by A %*% min_x, with x estimated by the implemented algorithm and min_x estimated by the solve function in R.
The problem of underflow, well known to those familiar with numerical analysis, is probably best left to the programmers of lower-level libraries, who are best equipped to tackle it.
To summarize, the algorithm appears to work as implemented. It is important to note, however, that not every function has a minimum (think of a straight line), and also that this algorithm can only be expected to find a local, as opposed to a global, minimum.
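As a quick check against the closed-form answer (a sketch reusing A, b and the final x from the code above, and taking the objective to be 0.5*||Ax - b||^2, which is what the gradient in the loop corresponds to):
# closed-form minimiser via the normal equations t(A) A x = t(A) b
min_x = solve(t(A) %*% A, t(A) %*% b)
g = function(x, A, b) 0.5 * sum((A %*% x - b)^2)
g(x, A, b)       # objective at the gradient-descent iterate
g(min_x, A, b)   # objective at the closed-form solution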

R: what is the vector of quantiles in density function dmvnorm

library(mvtnorm)
dmvnorm(x, mean = rep(0, p), sigma = diag(p), log = FALSE)
The dmvnorm provides the density function for a multivariate normal distribution. What exactly does the first parameter, x represent? The documentation says "vector or matrix of quantiles. If x is a matrix, each row is taken to be a quantile."
> dmvnorm(x=c(0,0), mean=c(1,1))
[1] 0.0585
Here is the sample code from the help page. In that case, are you generating the probability of having quantile 0 under a normal distribution with mean 1 and sd 1 (assuming that's the default)? Since this is a multivariate normal density function, and a vector of quantiles (0, 0) was passed in, why isn't the output a vector of probabilities?
Just taking a bivariate normal (X1, X2) as an example: by passing in x = (0, 0), you get the density at (X1 = 0, X2 = 0), which is a single value. Why do you expect to get a vector?
If you want a vector, you need to pass in a matrix. For example, x = cbind(c(0,1), c(0,1)) gives the densities at
(X1 = 0, X2 = 0)
(X1 = 1, X2 = 1)
In this situation, each row of the matrix is treated as one quantile vector, and you get one density value per row.
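For instance (values approximate):
library(mvtnorm)
x <- cbind(c(0, 1), c(0, 1))    # rows are the quantile vectors (0, 0) and (1, 1)
dmvnorm(x, mean = c(1, 1))      # two densities, roughly 0.0585 and 0.1592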

Constrained optimization of a vector

I have a (non-symmetric) probability matrix, and an observed vector of integer outcomes. I would like to find a vector that maximises the probability of the outcomes, given the transition matrix. Simply, I am trying to estimate a distribution of particles at sea given their ultimate distribution on land, and a matrix of probabilities of a particle released from a given point in the ocean ending up at a given point on the land.
The vector that I want to find is subject to the constraint that all components must be between 0-1, and the sum of the components must equal 1. I am trying to figure out the best optimisation approach for the problem.
My transition matrix and data set are quite large, but I have created a smaller one here:
I used a simulated known at-sea distribution,
msim <- c(.3, .2, .1, .3, .1, 0)
and a simulated probability matrix (t) to come up with a simulated coastal data set (Datasim2), as follows:
t <- matrix(c(0, .1, .1, .1, .1, .2,
              0, .1,  0,  0, .3,  0,
              0,  0,  0, .4, .1, .3,
              0, .1,  0, .1, .4,  0,
              0,  0, .1,  0, .1, .1),
            nrow = 5, ncol = 6, byrow = T)
rownames(t)<-c("C1","C2","C3","C4","C5") ### locations on land
colnames(t)<-c("S1","S2","S3","S4","S5","S6") ### locations at sea
Datasim<-as.numeric (round((t %*% msim)*500))
Datasim2<-c(rep("C1",95), rep("C2",35), rep("C3",90),rep("C4",15),rep("C5",30))
M <-c(0.1,0.1,0.1,0.1,0.1,0.1) ## starting M
I started with a straightforward function as follows:
EstimateSource3 <- function(M, Data, T){
  EstEndProbsall <- M %*% T
  TotalLkhd <- rep(NA, times = dim(Data)[1])
  for (j in 1:dim(Data)[1]){
    ObsEstEndLkhd <- 0
    ObsEstEndLkhd <- 1 - EstEndProbsall[1, ] ## likelihood of particle NOT ending up at locations other than the location of interest
    IndexC <- which(colnames(EstEndProbsall) == Data$LocationCode[j], arr.ind = T) ## likelihood of ending up at location of interest
    ObsEstEndLkhd[IndexC] <- EstEndProbsall[IndexC]
    # total likelihood
    TotalLkhd[j] <- sum(log(ObsEstEndLkhd))
  }
  SumTotalLkhd <- sum(TotalLkhd)
  return(SumTotalLkhd)
}
DistributionEstimate <- optim(par = M, fn = EstimateSource3, Data = Datasim2, T=t,
control = list(fnscale = -1, trace=5, maxit=500), lower = 0, upper = 1)
To constrain the sum to 1, I tried using a few of the suggestions posted here: How to set parameters' sum to 1 in constrained optimization,
e.g. adding M <- M/sum(M) or SumTotalLkhd <- SumTotalLkhd - (10*pwr) to the body of the function, but neither yielded anything like msim, and in fact the second solution came up with the error "L-BFGS-B needs finite values of 'fn'".
I thought perhaps the quadprog package might be of some help, but I don’t think I have a symmetric positive definite matrix…
Thanks in advance for your help!
What about this: let D = the distribution on land, M = the distribution at sea, and T the transition matrix. You know D and T, and you want to calculate M. You have
D' = M' T,
hence D' T' = M' (T T'),
and accordingly M' = D' T' (T T')^(-1).
Basically you solve it as in linear regression (SO does not seem to support math notation: ' is transpose, ^(-1) is the ordinary matrix inverse).
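A rough sketch with the question's objects t (the 5x6 land-by-sea matrix) and Datasim (counts at the five land sites): since there are more sea sites (6) than land sites (5), T T' above is singular and the plain inverse does not exist, so this sketch substitutes the Moore-Penrose pseudoinverse (MASS::ginv), which gives the minimum-norm least-squares solution:
library(MASS)                   # for ginv()
D <- Datasim / sum(Datasim)     # observed land distribution as proportions
M_hat <- ginv(t) %*% D          # pseudoinverse in place of (T T')^(-1)
M_hat                           # compare with msim; entries need not lie in [0, 1] or sum to 1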
Alternatively, D may be counts of particles, and then you can ask questions like: what is the most likely distribution of particles at sea? That needs a different approach, though.
Well, I have never done such models, but think along the following lines. Let M be of length 3 and D of length 2, so T is 3x2. We know T, and we observe D_1 particles at location 1 and D_2 particles at location 2.
What is the likelihood that a particle is observed at location 1? It is Pr(D = 1) = M_1 T_11 + M_2 T_21 + M_3 T_31. Analogously, Pr(D = 2) = M_1 T_12 + M_2 T_22 + M_3 T_32. Now you can easily write the log-likelihood of observing D_1 and D_2 particles at locations 1 and 2. The code might look like this:
loglik <- function(M) {
  if (M[1] < 0 | M[1] > 1)
    return(NA)
  if (M[2] < 0 | M[2] > 1)
    return(NA)
  M3 <- 1 - M[1] - M[2]
  if (M3 < 0 | M3 > 1)
    return(NA)
  D[1]*log(T[1,1]*M[1] + T[2,1]*M[2] + T[3,1]*M3) +
    D[2]*log(T[1,2]*M[1] + T[2,2]*M[2] + T[3,2]*M3)
}
T <- matrix(c(0.1,0.2,0.3,0.9,0.8,0.7), 3, 2)
D <- c(100,200)
library(maxLik)
m <- maxLik(loglik, start=c(0.4,0.4), method="BFGS")
summary(m)
I get the answer (0, 0.2, 0.8) when I estimate it, but the standard errors are very large.
As I said, I have never done this, so I don't know if it makes sense.

Square Root of a Singular Matrix in R

I need to compute the matrix A raised to the power of -1/2, which basically means the square root of the initial matrix's inverse.
If A is singular then the Moore-Penrose generalized inverse is computed with the ginv function from the MASS package, otherwise the regular inverse is computed using the solve function.
Matrix A is defined below:
A <- structure(c(604135780529.807, 0, 58508487574887.2, 67671936726183.9,
0, 0, 0, 1, 0, 0, 0, 0, 58508487574887.2, 0, 10663900590720128,
10874631465443760, 0, 0, 67671936726183.9, 0, 10874631465443760,
11315986615387788, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1), .Dim = c(6L,
6L))
I check singularity by comparing the rank (rankMatrix from the Matrix package) with the dimension.
rankMatrix(A) == nrow(A)
The above code returns FALSE, so I have to use ginv to get the inverse. The inverse of A is as follows:
A_inv <- ginv(A)
The square-root of the inverse matrix is computed with the sqrtm function from the expm package.
library(expm)
sqrtm(A_inv)
The function returns the following error:
Error in solve.default(X[ii, ii] + X[ij, ij], S[ii, ij] - sumU) :
Lapack routine zgesv: system is exactly singular
So how can we compute the square root in this case? Please note that matrix A is not always singular so we have to provide a general solution for the problem.
Your question relates to two distinct problems:
Inverse of a matrix
Square root of a matrix
Inverse
The inverse does not exist for singular matrices. In some applications, the Moore-Penrose or some other generalised inverse may be taken as a suitable substitute for the inverse. However, note that computer numerics will incur rounding errors in most cases, and these errors may make a singular matrix appear regular (non-singular) to the computer, or vice versa.
If A always exhibits the block structure of the matrix you give, I suggest considering only its non-trivial 3x3 block
A3 = A[ c( 1, 3, 4 ), c( 1, 3, 4 ) ]
A3
[,1] [,2] [,3]
[1,] 6.041358e+11 5.850849e+13 6.767194e+13
[2,] 5.850849e+13 1.066390e+16 1.087463e+16
[3,] 6.767194e+13 1.087463e+16 1.131599e+16
instead of all of A, for better efficiency and fewer rounding issues. The remaining diagonal entries of A (which are 1) would remain 1 in the inverse of the square root, so there is no need to clutter the calculation with them. To get an impression of the impact of this simplification, note that R can calculate
A3inv = solve(A3)
while it could not calculate
Ainv = solve(A)
But we will not need A3inv, as will become evident below.
Square root
As a general rule, the square root of a matrix A will only exist if the matrix has a diagonal Jordan normal form (https://en.wikipedia.org/wiki/Jordan_normal_form). Hence, there is no truly general solution of the problem as you require.
Fortunately, just as "most" (real or complex) matrices are invertible, "most" (real or complex) matrices have a diagonal complex Jordan normal form. In this case, the Jordan normal form
A3 = T·J·T⁻¹
can be calculated in R as such:
library(Matrix)   # provides Diagonal()
X = eigen(A3)
T = X$vectors
J = Diagonal( x=X$values )
To test this recipe, compare
Tinv = solve(T)
T %*% J %*% Tinv
with A3. They should match (up to rounding errors) if A3 has a diagonal Jordan normal form.
Since J is diagonal, its square root is simply the diagonal matrix of the square roots
Jsqrt = Diagonal( x=sqrt( X$values ) )
so that Jsqrt·Jsqrt = J. Moreover, this implies
(T·Jsqrt·T⁻¹)² = T·Jsqrt·T⁻¹·T·Jsqrt·T⁻¹ = T·Jsqrt·Jsqrt·T⁻¹ = T·J·T⁻¹ = A3
so that in fact we obtain
√A3 = T·Jsqrt·T⁻¹
or in R code
A3sqrt = T %*% Jsqrt %*% Tinv
To test this, calculate
A3sqrt %*% A3sqrt
and compare with A3.
Square root of the inverse
The square root of the inverse (or, equally, the inverse of the square root) can be calculated easily once a diagonal Jordan normal form has been calculated. Instead of J use
Jinvsqrt = Diagonal( x=1/sqrt( X$values ) )
and calculate, analogously to above,
A3invsqrt = T %*% Jinvsqrt %*% Tinv
and observe
A3·A3invsqrt·A3invsqrt = … = T·(J·Jinvsqrt·Jinvsqrt)·T⁻¹ = 1,
the unit matrix, so that A3invsqrt is the desired result.
In case A3 is not invertible, a generalised inverse (not necessarily the Moore-Penrose one) can be calculated by replacing all undefined entries in Jinvsqrt by 0, but as I said above, this should be done with suitable care in the light of the overall application and its stability against rounding errors.
In case A3 does not have a diagonal Jordan normal form, there is no square root, so the above formulas will yield some other result. In order not to run into this case at times of bad luck, it is best to implement a test of whether
A3invsqrt %*% A3 %*% A3invsqrt
is close enough to what you would consider an identity matrix (this only applies if A3 was invertible in the first place).
PS: Note that you can prefix a sign ± for each diagonal entry of Jinvsqrt to your liking.
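If you want to wrap the recipe up for the "A is not always singular" requirement in the question, here is a minimal helper along the lines above. It assumes a diagonalizable A with real, non-negative eigenvalues (as for the symmetric block A3), and it zeroes the entries belonging to numerically-zero eigenvalues in the spirit of the generalised-inverse remark; the tolerance is an arbitrary choice that should be tuned to the application:
inv_sqrt = function(A, tol = 1e-12) {
  X = eigen(A)
  lam = X$values
  d = ifelse(abs(lam) > tol * max(abs(lam)), 1 / sqrt(lam), 0)  # drop numerically-zero eigenvalues
  X$vectors %*% diag(d) %*% solve(X$vectors)
}
A3invsqrt = inv_sqrt(A3)
A3invsqrt %*% A3 %*% A3invsqrt   # should be close to the identity when A3 is invertible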
