I need to raise the matrix A to the power of -1/2, which basically means computing the square root of the inverse of the initial matrix.
If A is singular, the Moore-Penrose generalized inverse is computed with the ginv function from the MASS package; otherwise the regular inverse is computed using the solve function.
Matrix A is defined below:
A <- structure(c(604135780529.807, 0, 58508487574887.2, 67671936726183.9,
0, 0, 0, 1, 0, 0, 0, 0, 58508487574887.2, 0, 10663900590720128,
10874631465443760, 0, 0, 67671936726183.9, 0, 10874631465443760,
11315986615387788, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1), .Dim = c(6L,
6L))
I check for singularity by comparing the rank with the dimension:
library(Matrix)  # for rankMatrix()
rankMatrix(A) == nrow(A)
The above code returns FALSE, so I have to use ginv to get the inverse. The inverse of A is computed as follows:
library(MASS)  # for ginv()
A_inv <- ginv(A)
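Since A is not always singular, the branching described above can be written in one place. A minimal sketch (assuming the Matrix and MASS packages are loaded as above):
# fall back to the Moore-Penrose pseudoinverse only when the rank test flags A as singular
A_inv <- if (rankMatrix(A) < nrow(A)) ginv(A) else solve(A)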
The square-root of the inverse matrix is computed with the sqrtm function from the expm package.
library(expm)
sqrtm(A_inv)
The function returns the following error:
Error in solve.default(X[ii, ii] + X[ij, ij], S[ii, ij] - sumU) :
Lapack routine zgesv: system is exactly singular
So how can we compute the square root in this case? Please note that matrix A is not always singular, so we have to provide a general solution for the problem.
Your question relates to two distinct problems:
Inverse of a matrix
Square root of a matrix
Inverse
The inverse does not exist for singular matrices. In some applications, the Moore-Penrose or some other generalised inverse may be taken as a suitable substitute for the inverse. However, note that computer numerics will incur rounding errors in most cases, and these errors may make a singular matrix appear regular to the computer, or vice versa.
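One way to make that caveat concrete is to look at the reciprocal condition number instead of an exact rank test. A small sketch (the threshold is an arbitrary choice and should be tuned to the application):
# rcond() is base R; values near zero indicate a (numerically) singular matrix
rc <- rcond(A)
if (rc < .Machine$double.eps) {
  A_inv <- MASS::ginv(A)   # near-singular: use a generalised inverse
} else {
  A_inv <- solve(A)        # well-conditioned: the regular inverse is fine
}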
If A always exhibits the block structure of the matrix you give, I suggest considering only its non-diagonal block
A3 = A[ c( 1, 3, 4 ), c( 1, 3, 4 ) ]
A3
[,1] [,2] [,3]
[1,] 6.041358e+11 5.850849e+13 6.767194e+13
[2,] 5.850849e+13 1.066390e+16 1.087463e+16
[3,] 6.767194e+13 1.087463e+16 1.131599e+16
instead of all of A, for better efficiency and fewer rounding issues. The remaining diagonal entries of 1 would remain 1 in the inverse of the square root, so there is no need to clutter the calculation with them. To get an impression of the impact of this simplification, note that R can calculate
A3inv = solve(A3)
while it could not calculate
Ainv = solve(A)
But we will not need A3inv, as will become evident below.
Square root
As a general rule, the square root of a matrix A will only exist if the matrix has a diagonal Jordan normal form (https://en.wikipedia.org/wiki/Jordan_normal_form). Hence, there is no truly general solution of the problem as you require.
Fortunately, just as “most” (real or complex) matrices are invertible, “most” (real or complex) matrices have a diagonal complex Jordan normal form. In this case, the Jordan normal form
A3 = T·J·T⁻¹
can be calculated in R as such:
library(Matrix)   # Diagonal() comes from the Matrix package

X = eigen(A3)
T = X$vectors
J = Diagonal( x=X$values )
To test this recipe, compare
Tinv = solve(T)
T %*% J %*% Tinv
with A3. They should match (up to rounding errors) if A3 has a diagonal Jordan normal form.
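One way to make the comparison concrete (the tolerance is a judgment call):
# should be TRUE (within tolerance) when A3 has a diagonal Jordan normal form
isTRUE(all.equal(as.matrix(T %*% J %*% Tinv), A3, tolerance = 1e-8))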
Since J is diagonal, its square root is simply the diagonal matrix of the square roots
Jsqrt = Diagonal( x=sqrt( X$values ) )
so that Jsqrt·Jsqrt = J. Moreover, this implies
(T·Jsqrt·T⁻¹)² = T·Jsqrt·T⁻¹·T·Jsqrt·T⁻¹ = T·Jsqrt·Jsqrt·T⁻¹ = T·J·T⁻¹ = A3
so that in fact we obtain
√A3 = T·Jsqrt·T⁻¹
or in R code
A3sqrt = T %*% Jsqrt %*% Tinv
To test this, calculate
A3sqrt %*% A3sqrt
and compare with A3.
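Spelled out as a check (again with an arbitrary tolerance):
isTRUE(all.equal(as.matrix(A3sqrt %*% A3sqrt), A3, tolerance = 1e-8))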
Square root of the inverse
The square root of the inverse (or, equally, the inverse of the square root) can be calculated easily once a diagonal Jordan normal form has been calculated. Instead of J use
Jinvsqrt = Diagonal( x=1/sqrt( X$values ) )
and calculate, analogously to above,
A3invsqrt = T %*% Jinvsqrt %*% Tinv
and observe
A3·A3invsqrt·A3invsqrt = T·J·T⁻¹·T·Jinvsqrt·T⁻¹·T·Jinvsqrt·T⁻¹ = T·J·Jinvsqrt·Jinvsqrt·T⁻¹ = T·T⁻¹ = 1,
the unit matrix, so that A3invsqrt is the desired result.
In case A3 is not invertible, a generalised inverse (not necessarily the Moore-Penrose one) can be calculated by replacing all undefined entries in Jinvsqrt by 0, but as I said above, this should be done with suitable care in the light of the overall application and its stability against rounding errors.
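A sketch of that idea (the tolerance below is an arbitrary choice and should be adapted to the application):
# treat eigenvalues that are numerically zero as exactly zero and put 0 in Jinvsqrt there
tol  <- 1e-10 * max(abs(X$values))
keep <- abs(X$values) > tol            # eigenvalues regarded as nonzero
vals <- numeric(length(X$values))
vals[keep] <- 1 / sqrt(X$values[keep])
Jinvsqrt_gen  <- Diagonal(x = vals)
A3invsqrt_gen <- T %*% Jinvsqrt_gen %*% Tinv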
In case A3 does not have a diagonal Jordan normal form, there is no square root, so the above formulas will yield some other result. To avoid running into this case unnoticed, it is best to implement a test of whether
A3invsqrt %*% A3 %*% A3invsqrt
is close enough to what you would consider a unit matrix (this only applies if A3 was invertible in the first place).
PS: Note that you can choose the sign ± of each diagonal entry of Jinvsqrt to your liking.
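Putting the pieces together for the 6×6 matrix from the question, here is a sketch; it assumes A has the block structure shown (1s on the remaining diagonal entries) and that the eigenvalues of A3 are real and positive:
library(Matrix)                       # for Diagonal()

idx <- c(1, 3, 4)                     # the non-trivial block of A
A3  <- A[idx, idx]

X         <- eigen(A3)
T         <- X$vectors
Tinv      <- solve(T)
Jinvsqrt  <- Diagonal(x = 1/sqrt(X$values))
A3invsqrt <- T %*% Jinvsqrt %*% Tinv

# embed the 3x3 result back into a 6x6 matrix; the diagonal 1s of A stay 1 in A^(-1/2)
Ainvsqrt <- diag(nrow(A))
Ainvsqrt[idx, idx] <- as.matrix(A3invsqrt)

# sanity check as described above: this should be close to the unit matrix
round(Ainvsqrt %*% A %*% Ainvsqrt, 3)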
Related
I want to solve the following simultaneous equations using R:
# -0.9x+0.01y+0.001z+0.001n=0
# 0.9x-0.82y+0.027z+0.027n=0
# 0x+0.81y+(0.243-1)z+0.243n=0
# 0x+0y+0.729z+(0.729-1)n=0
# x+y+z+n=1
I tried:
A <- matrix(data=c(-0.9,0.01,0.001,0.001,0.9,-0.82,0.027,0.027,0,0.81,0.243-1,0.243,0,0,0.729,0.729-1,1,1,1,1), nrow=5, ncol=4, byrow=TRUE)
b <- matrix(data=c(0, 0, 0, 0, 1), nrow=5, ncol=1, byrow=FALSE)
round(solve(A, b), 4)
and it caught error:
Error in solve.default(A, b) : 'a' (5 x 4) must be square
What does the error mean? Is this function only applicable to square matrices?
Edit:
I removed the last equation and tried:
A <- matrix(data=c(-0.9,0.01,0.001,0.001,0.9,-0.82,0.027,0.027,0,0.81,0.243-1,0.243,0,0,0.729,0.729-1), nrow=4, ncol=4, byrow=TRUE)
b <- matrix(data=c(0, 0, 0, 0), nrow=4, ncol=1, byrow=FALSE)
A;b;
round(solve(A, b), 4)
which caught error:
Error in solve.default(A, b) : Lapack routine dgesv: system is exactly singular: U[4,4] = 0
Since you have 5 linear equations (rows of your matrix) and only 4 columns (variables), your problem is overdetermined and in general it can't be solved. Consider a smaller example: x+y=1; x-2*y=2; 3*x-5=0. Once you use the first two equations:
(1) x+y=1 → y=1-x
(2) x-2*(1-x)=2 → 3*x-2=2 → x=4/3 ## substitute (1) into second eq.
(3) → y=-1/3
you have a solution for x and y (x=4/3, y=-1/3), so the last equation is either redundant or makes your solution impossible (in this case, the latter: 3*x-5 = -1 != 0).
On the other hand, if you have fewer rows than columns, then your system is underdetermined and you can't get a unique answer.
So the number of matrix rows must equal the number of columns → you can only do this with a square matrix. This is not a limitation of the R function; it's about the underlying math of linear equations ...
In your specific case the problem is that the first four rows of your matrix add up to zero, so they are redundant. As you found out, just dropping the last equation doesn't help. But if you drop one of the redundant rows you can get an answer.
solve(A[-4,],b[-4])
Note you get the same answer whether you drop row 1, 2, 3, or 4.
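A quick way to check that claim, as a sketch using the 5x4 A and b defined above:
# solve after dropping each of the redundant rows in turn; the columns should all agree
sols <- sapply(1:4, function(i) solve(A[-i, ], b[-i]))
round(sols, 4)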
Is there a way of extracting the functional form of the likelihood function that gets formed by the msm function in R?
How can I extract the likelihood function that gets formed in the example below? I want to try and implement my own version of the quasi-Newton maximisation algorithm to improve my understanding.
library(msm)
# look at transition counts
statetable.msm(state, PTNUM, data = cav)
# define transition intensity matrix
# 1's mean a transition can occur
# 0's mean a transition should not occur
# any number can be placed on the diagonal as R overwrites the diagonals
# prior to maximising
q <- rbind(
  c(0, 1, 0, 1),
  c(1, 0, 1, 1),
  c(0, 1, 0, 1),
  c(0, 0, 0, 0)
)
# fit msm to the data
# the fnscale rescales the likelihood to prevent overflow
msm.fit <- msm(state ~ years, PTNUM, data = cav, qmatrix = q, control=list(fnscale=4000))
I am trying to grasp the basic concept of invertible and non-invertible matrices.
I created a random non-singular square matrix
S <- matrix(rnorm(100, 0, 1), ncol = 10, nrow = 10)
I know that this matrix is positive definite (thus invertible) because when I decompose the matrix S into its eigenvalues, their product is positive.
eig_S <- eigen(S)
eig_S$values
[1] 3.0883683+0.000000i -2.0577317+1.558181i -2.0577317-1.558181i 1.6884120+1.353997i 1.6884120-1.353997i
[6] -2.1295086+0.000000i 0.1805059+1.942696i 0.1805059-1.942696i -0.8874465+0.000000i 0.8528495+0.000000i
solve(S)
According to this paper, we can compute the inverse of a non-singular matrix by its SVD too.
The decomposition is S = U·D·Vᵀ (where U and V are eigenvectors and D eigenvalues, please do correct me if I am wrong).
The inverse then is S⁻¹ = V·D⁻¹·Uᵀ.
Indeed, I can run the formula in R:
s <- svd(S)
s$v%*%solve(diag(s$d))%*%t(s$u)
Which produces exactly the same result as solve(S).
My first question is:
1) Do s$d indeed represent the eigenvalues of S? Because s$d and eig_S$values are quite different.
Now the second part,
If I create a singular matrix
I <- matrix(rnorm(100, 0, 1), ncol = 5, nrow = 20)
I <- I%*%t(I)
eig_I <- eigen(I)
eig_I$values
[1] 3.750029e+01 2.489995e+01 1.554184e+01 1.120580e+01 8.674039e+00 3.082593e-15 5.529794e-16 3.227684e-16
[9] 2.834454e-16 5.876634e-17 -1.139421e-18 -2.304783e-17 -6.636508e-17 -7.309336e-17 -1.744084e-16 -2.561197e-16
[17] -3.075499e-16 -4.150320e-16 -7.164553e-16 -3.727682e-15
The solve function will produce an error
solve(I)
system is computationally singular: reciprocal condition number =
1.61045e-19
So, again according to the same paper we can use the SVD
i <- svd(I)
solve(i$u %*% diag(i$d) %*% t(i$v))
which produces the same error.
Then I tried to use the Cholesky decomposition for matrix inversion
Conj(t(I))%*%solve(I%*%Conj(t(I)))
and again I get the same error.
Could someone please explain where I am using the equations wrong?
I know that for matrix I%*%Conj(t(I)), the determinant of the eigenvalue matrix is positive, but the matrix is not of full rank due to the initial multiplication that I did.
j <- eigen(I%*%Conj(t(I)))
det(diag(j$values))
[1] 3.17708e-196
qr(I %*% Conj(t(I)))$rank
[1] 5
UPDATE 1: Following the comments below, and after going through the paper/Wikipedia page again, I used the following two pieces of code, which produce some results, but I am not sure about their validity. The first example seems more believable. The SVD solution
i$v%*%diag(1/i$d)%*%t(i$u)
and the Cholesky
Conj(t(I))%*%(I%*%Conj(t(I)))^(-1)
I am not sure if I interpreted the two sources correctly though.
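One check I can think of is to rebuild the pseudoinverse from the SVD with an explicit tolerance and compare it with MASS::ginv (a sketch; the tolerance is my own choice and may not match ginv's internal one exactly):
library(MASS)

# pseudoinverse via SVD: invert only the singular values that are clearly nonzero
tol    <- max(dim(I)) * max(i$d) * .Machine$double.eps
d_inv  <- ifelse(i$d > tol, 1/i$d, 0)
I_pinv <- i$v %*% diag(d_inv) %*% t(i$u)

# compare with the packaged generalised inverse
isTRUE(all.equal(I_pinv, ginv(I)))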
Custom contrasts are very widely used in analyses, e.g.: "Do DV values at level 1 and level 3 of this three-level factor differ significantly?"
Intuitively, this contrast is expressed in terms of cell means as:
c(1,0,-1)
One or more of these contrasts, bound as columns, form a contrast coefficient matrix, e.g.
mat = matrix(ncol = 2, byrow = TRUE, data = c(
   1,  0,
   0,  1,
  -1, -1)
)
[,1] [,2]
[1,] 1 0
[2,] 0 1
[3,] -1 -1
However, when it comes to running these contrasts specified by the coefficient matrix, there is a lot of (apparently contradictory) information on the web and in books. My question is which information is correct?
Claim 1: contrasts(factor) takes a coefficient matrix
In some examples, the user is shown that the intuitive contrast coefficient matrix can be used directly via the contrasts() or C() functions. So it's as simple as:
contrasts(myFactor) <- mat
Claim 2: Transform coefficients to create a coding scheme
Elsewhere (e.g. UCLA stats) we are told the coefficient matrix (or basis matrix) must be transformed from a coefficient matrix into a contrast matrix before use. This involves taking the inverse of the transpose of the coefficient matrix, (mat′)⁻¹, or, in Rish:
contrasts(myFactor) = solve(t(mat))
This method requires padding the matrix with an initial column of means for the intercept. To avoid this, some sites recommend using a generalized inverse function which can cope with non-square matrices, i.e., MASS::ginv()
contrasts(myFactor) = ginv(t(mat))
Third option: premultiply by the transpose, take the inverse, and postmultiply by the transpose
Elsewhere again (e.g. a note from SPSS support), we learn the correct algebra is: (mat′mat)⁻¹mat′
Implying to me that the correct way to create the contrasts matrix should be:
x = solve(t(mat)%*% mat)%*% t(mat)
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 1 0 -1
[3,] 0 1 -1
contrasts(myFactor) = x
My question is: which is right (if I am interpreting and describing each piece of advice accurately)? How does one specify custom contrasts in R for lm, lme, etc.?
Claim 2 is correct (see the answers here and here) and sometimes claim 1, too. This is because there are cases in which the generalized inverse of the (transposed) coefficient matrix is equal to the matrix itself.
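To make that concrete, a small sketch (my own example) for checking, for a given coefficient matrix mat, whether Claim 1 happens to coincide with Claim 2:
library(MASS)   # for ginv()

mat <- matrix(c( 1,  0,
                 0,  1,
                -1, -1), ncol = 2, byrow = TRUE)
isTRUE(all.equal(ginv(t(mat)), mat))   # FALSE: the two claims differ for this mat

# with orthonormal columns the generalised inverse of the transpose is the matrix itself
orth <- cbind(c(1, -1, 0)/sqrt(2), c(0.5, 0.5, -1)/sqrt(1.5))
isTRUE(all.equal(ginv(t(orth)), orth)) # TRUE (up to rounding)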
For what it's worth....
If you have a factor with 3 levels (levels A, B, and C) and you want to test the following orthogonal contrasts: A vs B, and the avg. of A and B vs C, your contrast codes would be:
Cont1<- c(1,-1, 0)
Cont2<- c(.5,.5, -1)
If you do as directed on the UCLA site (transform coefficients to make a coding scheme), as such:
contrasts(Variable) <- solve(t(cbind(c(1,1,1), Cont1, Cont2)))[,2:3]
then your results are IDENTICAL to what you would get if you had created two dummy variables (e.g.:
Dummy1<- ifelse(Variable=="A", 1, ifelse(Variable=="B", -1, 0))
Dummy2<- ifelse(Variable=="A", .5, ifelse(Variable=="B", .5, -1))
and entered them both into the regression equation instead of your factor, which makes me inclined to think that this is the correct way.
PS I don't write the most elegant R code, but it gets the job done. Sorry, I'm sure there are easier ways to recode variables, but you get the gist.
I'm probably missing something, but in each of your three examples, you specify the contrast matrix in the same way, i.e.
## Note: it should be the plural, contrasts
contrasts(myFactor) = x
The only thing that differs is the value of x.
Using the data from the UCLA website as an example
hsb2 = read.table('http://www.ats.ucla.edu/stat/data/hsb2.csv', header=T, sep=",")
#creating the factor variable race.f
hsb2$race.f = factor(hsb2$race, labels=c("Hispanic", "Asian", "African-Am", "Caucasian"))
We can specify either the treatment version of the contrasts
contrasts(hsb2$race.f) = contr.treatment(4)
summary(lm(write ~ race.f, hsb2))
or the sum version
contrasts(hsb2$race.f) = contr.sum(4)
summary(lm(write ~ race.f, hsb2))
Alternatively, we can specify a bespoke contrast matrix.
See ?contr.sum for other standard contrasts.
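For example, here is a sketch of a bespoke coding built from a coefficient matrix along the lines of Claim 2 in the question (the particular contrasts are my own illustration: each group versus Caucasian):
library(MASS)   # for ginv()

# columns are contrasts of the four race.f cell means
coef_mat <- cbind(c(1, 0, 0, -1),    # Hispanic   vs Caucasian
                  c(0, 1, 0, -1),    # Asian      vs Caucasian
                  c(0, 0, 1, -1))    # African-Am vs Caucasian

# turn the coefficient matrix into a coding matrix via the generalised inverse
contrasts(hsb2$race.f) = ginv(t(coef_mat))
summary(lm(write ~ race.f, hsb2))
The estimated slopes should then correspond to the specified contrasts of the four group means of write.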
I have a large right stochastic matrix (rows sum to 1) of size ~20000x20000. How can I find its stationary distribution?
I tried to calculate the eigenvalues and eigenvectors, and I get complex eigenvalues, e.g. 1+0i (more than one).
I also tried to use the following method:
pi = u[I - P + U]^(-1)
but when I do the inversion with solve() I get the error message: Error in solve.default(A): system is computationally singular: reciprocal condition number = 3.16663e-19
As far as I understand, the Perron–Frobenius theorem ensures that every stochastic matrix has a stationary probability vector pi and that the largest absolute value of an eigenvalue is always 1, so pi = pi P. Since my matrix has all positive entries, I should get a unique pi. Am I correct?
Or is there any other method by which I can calculate the row vector pi?
Every stochastic matrix indeed has a stationary distribution. Since P has all row sums = 1,
(P-I) has row sums = 0 => (P-I)*(1, ..., 1)ᵀ always gives you zero. So rank(P-I) <= n-1, and the same holds for the rank of the transpose of P-I. Hence, there exists q such that (t(P)-I)*q = 0 => t(P)q = q.
The complex value 1+0i seems quite real to me. But if you get only genuinely complex values, i.e. the coefficient of i is not 0, then the algorithm has gone wrong somewhere -- it solves the problem numerically and does not have to be exact all the time. Also, it does not matter how many eigenvalues and vectors it produces; what matters is that it finds the right eigenvector for eigenvalue 1, and that's what you need.
Make sure that your stationary distribution is indeed your limit distribution, otherwise there is no point in computing it. You could try to find it out by multiplying different vectors with your matrix^1000, but I don't know how much time it will take in your case.
Last but not least, here is an example:
# first we need a function that calculates matrix^n
mpot = function (A, p) {
  # calculates A^p (matrix multiplied p times with itself)
  # inputs: A - real-valued square matrix, p - natural number
  # output: A^p
  B = A
  if (p > 1)
    for (i in 2:p)
      B = B %*% A
  return (B)
}
# example matrix
P = matrix( nrow = 3, ncol = 3, byrow = T,
            data = c(
              0.1, 0.9, 0,
              0.4, 0,   0.6,
              0,   1,   0
            )
)
# this converges to stationary distribution independent of start distribution
t(mpot(P,1000)) %*% c(1/3, 1/3, 1/3)
t(mpot(P,1000)) %*% c(1, 0, 0)
# is it stationary?
xx = t(mpot(P,1000)) %*% c(1, 0, 0)
t(P) %*% xx
# find stationary distribution using eigenvalues
eigen(t(P)) # here it is!
eigen_vect = eigen(t(P))$vectors[,1]
stat_dist = eigen_vect/sum(eigen_vect) # as there is a whole subspace of eigenvectors,
                                       # but we need the one with sum = 1
stat_dist
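For a matrix of size ~20000x20000 a full eigen decomposition may be too slow or memory-hungry. As an alternative (my own addition, not part of the example above), a simple power iteration can be used; it assumes the chain is irreducible and aperiodic, and the tolerance/iteration cap below are arbitrary choices:
stationary_power = function (P, tol = 1e-12, max_iter = 10000) {
  # iterate pi <- pi %*% P until the change falls below tol
  pi_vec = rep(1/nrow(P), nrow(P))      # start from the uniform distribution
  for (i in 1:max_iter) {
    pi_new = as.vector(pi_vec %*% P)
    pi_new = pi_new / sum(pi_new)       # guard against numerical drift
    if (max(abs(pi_new - pi_vec)) < tol) break
    pi_vec = pi_new
  }
  pi_new
}

# with the example matrix above it should agree with stat_dist
stationary_power(P)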