How can I compress a Markov transition probability matrix?

I'm performing a simple experiment. I have a Markov transition matrix and I need to know how to compute a transition similarity matrix using Spears' algorithm. I'm following this paper (pages 8 and 9).
I understand how to get the graph and the original probability matrix, but I don't know how the original matrix gets converted to the transition similarity matrix.
Can anyone explain?


Applying PCA to a covariance matrix

I am having some difficulty understanding some steps in a procedure. In short, they take coordinate data, find the covariance matrix, apply PCA, then obtain the standard deviations as the square root of each eigenvalue. I am trying to reproduce this process, but I am stuck on the steps.
The Steps Taken
The data set consists of one matrix, R, that contains coordinate pairs (x(i),y(i)) with i=1,...,N, where N is the total number of instances recorded. We applied PCA to the covariance matrix of the R input data set, and the following variables were obtained:
a) the principal components of the new coordinate system, the eigenvectors u and v, and
b) the eigenvalues (λ1 and λ2) corresponding to the total variability explained by each principal component.
With these variables, a graphical representation was created for each item. Two orthogonal segments were centred on the mean of the coordinate data. The segments’ directions were driven by the eigenvectors of the PCA, and the length of each segment was defined as one standard deviation (σ1 and σ2) around the mean, which was calculated by extracting the square root of each eigenvalue, λ1 and λ2.
My Steps
#reproducible data
# Note my data is not perfectly distributed in this fashion
df<-data.frame(x,y) # this is my R matrix
covar.df<-cov(df,use="all.obs",method='pearson') # this is my covariance matrix
pca.results<-prcomp(covar.df) # this applies PCA to the covariance matrix
pca.results$sdev # these are the standard deviations of the principal components
# which is what I believe I am looking for.
This is where I am stuck, because I am not sure whether I should take the sdev output from prcomp() as it is or scale my data first. The variables are all on the same scale, so I do not see a need for scaling.
My second question is: how do I extract the standard deviation in the x and y directions?
You don't apply prcomp to the covariance matrix, you do it on the data itself.
result= prcomp(df)
If by scaling you mean normalizing or standardizing, that happens before you do prcomp(). For more information, see this introductory link: pca on R. It can walk you through the basics. To get the sdev, call summary() on the result object.
You don't apply prcomp to the covariance matrix. Setting scale=TRUE bases the PCA on the correlation matrix, and scale=FALSE bases it on the covariance matrix.
df.cor = prcomp(df, scale=TRUE)
df.cov = prcomp(df, scale=FALSE)
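The relationship between the two routes can be checked directly. The following is a minimal sketch, not the paper's code: the simulated x and y are hypothetical stand-ins for the real coordinate data, and the names sigma, centre, seg1, seg2 and sd_xy are made up for illustration. It shows that the sdev reported by prcomp() on the data equals the square root of the eigenvalues of the covariance matrix, and that the standard deviations along the original x and y axes come from the diagonal of the covariance matrix.

```r
set.seed(42)
# Hypothetical stand-in for the R matrix: 200 correlated (x, y) pairs
x <- rnorm(200, mean = 5, sd = 2)
y <- 0.5 * x + rnorm(200, sd = 1)
df <- data.frame(x, y)

# PCA on the data itself (centred, not rescaled)
pca <- prcomp(df, center = TRUE, scale. = FALSE)

# The same quantities from the eigendecomposition of the covariance matrix
eig <- eigen(cov(df))
sigma <- sqrt(eig$values)   # sigma1 and sigma2
centre <- colMeans(df)      # the segments are centred on the mean

# Endpoints of the two orthogonal segments:
# mean -/+ one standard deviation along each eigenvector
seg1 <- rbind(centre - sigma[1] * eig$vectors[, 1],
              centre + sigma[1] * eig$vectors[, 1])
seg2 <- rbind(centre - sigma[2] * eig$vectors[, 2],
              centre + sigma[2] * eig$vectors[, 2])

# Standard deviations along the original x and y axes
sd_xy <- sqrt(diag(cov(df)))
```

Because the variables share a scale, no standardization is applied here; with scale. = TRUE the comparison would have to use cor(df) instead of cov(df).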

Find vector in which points are more sparse

(1) I have n points in 3D space
(2) I have a random vector
(3) I project all n points onto the vector
Then I find the average distance between all points
How could I find the vector for which, after projecting the points onto it, the average distance between points is the greatest?
Can this be done in O(n)?
There is one method which you can use from machine learning, specifically dimensionality reduction. (This is based on PCA which was mentioned in one of the comments.)
Compute the covariance matrix.
Find the eigenvalues and the eigenvectors.
The eigenvector with the largest eigenvalue will correspond to the direction of the most variance, so the direction in which the points are most spread out.
Map the points onto the line defined by the vector.
Centring the points around 0 before the projection, and then moving them back afterwards, may be needed as well. In general PCA can be quite expensive in terms of time; for more details look at this question: How is the complexity of PCA O(min(p^3,n^3))? With only three dimensions, however, the covariance matrix is 3x3, so the cost is dominated by the single O(n) pass over the points.
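The steps above can be sketched in R as follows. This is an illustration under assumptions: the point cloud pts and the direction dir_true are made up, and PCA maximizes the variance of the projections (equivalently the mean squared pairwise distance), which is not exactly the same objective as the mean absolute distance asked about.

```r
set.seed(1)
# Hypothetical point cloud: n points in 3D, stretched along (1, 2, 2) / 3
n <- 1000
dir_true <- c(1, 2, 2) / 3
pts <- outer(rnorm(n, sd = 5), dir_true) +
  matrix(rnorm(3 * n, sd = 0.5), n, 3)

# Centre the points and build the 3x3 covariance matrix: one O(n) pass each
centred <- sweep(pts, 2, colMeans(pts))
S <- crossprod(centred) / (n - 1)

# Eigendecomposition of a 3x3 matrix: constant cost, independent of n
eig <- eigen(S, symmetric = TRUE)
v <- eig$vectors[, 1]   # direction of largest variance

# Coordinates of the (centred) points along v
proj <- as.vector(centred %*% v)
```

The sign of v is arbitrary, so v and -v describe the same line.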

How to calculate "compound" Markov transition matrix in Stata or R?

By "compound" I mean the transition matrix satisfies the Markov property. Namely, I have two columns s_t and s_t+k that represent the state of each individual in the two periods t and t+k respectively.
What I want is to find the matrix M that
s_t+k = M^k * s_t
so that matrix M satisfies the Markov property.
My default working language is Stata, in which commands like tab, svy:tab or xttran can generate one-period transition matrices, but these matrices do not necessarily satisfy the Markov property. So I wonder how to achieve my goal in Stata or another common language like R or Python.
PS: This problem arises from a paper that studies the transition dynamics of many countries' GDP per capita from 1960 to 2010. Say, at the beginning of each decade, we group all countries into 5 groups (from 1: extremely poor country to 5: high-income country), so we have a distribution of countries over 5 states. It's easy if I simply estimate the decade-to-decade transition matrix using the markovchain class. However, the author claims that (page 11, footnote 4)
“The decade average transition matrix is estimated based on
the 5-decade transition matrices from 1960 to 2010 by employing
a numerical optimization program. Instead of taking the simple average
for the five transition matrices (which suffers from Jensen’s
Inequality), we estimate a transition matrix that can give us an exact
5 decade duration transition matrix (entry in 1960 and exit in 2010)
by taking its power 5.”
In R you can use the markovchain package to get a transition matrix that satisfies the Markov property. You can use the following example code:
myFit <- markovchainFit(data = mysequence, method = "bootstrap", nboot = 5, name = "Bootstrap Mc")
myFit$estimate holds your estimated transition matrix. This example uses the Alofi rainfall dataset.
Matrix multiplication in R is not * but %*%.
I wrote a simple function in R to solve the problem.
trans_mat = function(k, s_t, M){
  for(i in seq_len(k)){
    s_t = M %*% s_t  # advance the state by one period, so the result is M^k %*% s_t
  }
  s_t
}
Now, what you need to do is supply k (the number of periods), s_t (the original state), and M (the one-period transition matrix).
s_tk = trans_mat(k, s_t, M)
The markovchain package directly implements the power for any markovchain object:
#creating the MC
#5th power of the MC
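For the footnote's requirement, a single matrix whose fifth power reproduces the 1960 to 2010 matrix, one possible base-R route is a matrix k-th root through the eigendecomposition. This is only a sketch under assumptions, not the paper's numerical optimization: M_true is a made-up one-decade matrix used to build a test case, and mat_pow and mat_root are hypothetical helper names. For an arbitrary empirical matrix the root can have negative or complex entries, so it is not guaranteed to be a valid transition matrix, which may be why the authors resort to optimization.

```r
# Hypothetical one-decade matrix, used only to construct a test case
M_true <- matrix(c(0.9, 0.1, 0.0,
                   0.1, 0.8, 0.1,
                   0.0, 0.1, 0.9), nrow = 3, byrow = TRUE)

# k-th matrix power by repeated multiplication
mat_pow <- function(M, k) Reduce(`%*%`, replicate(k, M, simplify = FALSE))

# Plays the role of the observed 1960 -> 2010 transition matrix
P5 <- mat_pow(M_true, 5)

# k-th root via P = V diag(lambda) V^{-1}  =>  V diag(lambda^(1/k)) V^{-1}
mat_root <- function(P, k) {
  e <- eigen(P)
  root_vals <- as.complex(e$values)^(1 / k)   # principal k-th roots
  Re(e$vectors %*% diag(root_vals) %*% solve(e$vectors))
}

M_hat <- mat_root(P5, 5)   # candidate "decade average" matrix
```

Before using such a root, it is worth checking that every entry of M_hat lies in [0, 1] and that each row sums to 1.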

PCA analysis using Correlation Matrix as input in R

I have a 7000*7000 correlation matrix and I have to do PCA on it in R.
I used
CorPCA <- princomp(covmat=xCor)
where xCor is the correlation matrix,
but it fails with
"covariance matrix is not non-negative definite"
I think this is because I have some negative correlations in that matrix.
I am wondering which built-in function in R I can use to get the result of the PCA.
One method to do the PCA is to perform an eigenvalue decomposition of the covariance matrix, see Wikipedia.
The advantage of the eigenvalue decomposition is that you see which directions (eigenvectors) are significant, i.e. have a noticeable variation expressed by the associated eigenvalues. Moreover, you can detect whether the covariance matrix is positive definite (all eigenvalues greater than zero), non-negative definite but singular (some eigenvalues equal to zero, which is okay), or indefinite (some eigenvalues negative, which is not okay). Sometimes numerical inaccuracies make a non-negative definite matrix appear indefinite. In that case you would observe negative eigenvalues which are almost zero, and you can set these eigenvalues to zero to retain the non-negative definiteness of the covariance matrix. Furthermore, you can still interpret the result: eigenvectors contributing the significant information are associated with the biggest eigenvalues. If the list of sorted eigenvalues declines quickly, there are a lot of directions which do not contribute significantly and can therefore be dropped.
The built-in R function is eigen
If your covariance matrix is A then
eigen_res <- eigen(A)
# sorted list of eigenvalues
# slightly negative eigenvalues, set them to small positive value
eigen_res$values[eigen_res$values<0] <- 1e-10
# and produce regularized covariance matrix
Areg <- eigen_res$vectors %*% diag(eigen_res$values) %*% t(eigen_res$vectors)
"Not non-negative definite" does not mean the covariance matrix has negative correlations. It's the linear algebra equivalent of trying to take the square root of a negative number! You can't tell whether a matrix is positive definite by looking at a few of its values.
Try adjusting some default values, like the tolerance, in the princomp call. Check this thread for example: How to use princomp () function in R when covariance matrix has zero's?
An alternative is to write some code of your own to perform what is called a NIPALS analysis. Take a look at this thread on the R-mailing list:
I'd even go as far as asking where you obtained the correlation matrix. Did you construct it yourself? Does it have NAs? If you constructed xCor from your own data, do you think you can sample the data and construct a smaller xCor matrix (say 1000x1000)? All these alternatives try to drive your PCA algorithm through the 'happy path', i.e. all matrix operations can be carried out internally without difficulties in diagonalization and so on, with no more 'non-negative definite' error messages.
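A small illustration of that point, using two made-up matrices (good and bad are hypothetical names): the first contains a negative correlation and is perfectly valid, while the second has only positive entries and is still not non-negative definite, because x is strongly correlated with y, y with z, but x hardly at all with z.

```r
# Contains a negative correlation, yet is positive definite
good <- matrix(c( 1.0, -0.5,
                 -0.5,  1.0), nrow = 2, byrow = TRUE)

# All entries positive, yet not non-negative definite
bad <- matrix(c(1.0, 0.9, 0.1,
                0.9, 1.0, 0.9,
                0.1, 0.9, 1.0), nrow = 3, byrow = TRUE)

# Definiteness is a property of the eigenvalues, not of individual entries
ev_good <- eigen(good, symmetric = TRUE)$values
ev_bad  <- eigen(bad,  symmetric = TRUE)$values

min(ev_good)   # positive: usable as a correlation matrix
min(ev_bad)    # negative: no data set can produce these correlations
```

A minimum eigenvalue that is only slightly below zero (for example on the order of 1e-12) usually points to rounding error, whereas a clearly negative one means the matrix itself is inconsistent.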

The user wants to impose a unique, non-trivial, upper/lower bound on the correlation between every pair of variables in a var/covar matrix.
For example: I want a variance matrix in which all variables have 0.9 > |rho(x_i,x_j)| > 0.6, rho(x_i,x_j) being the correlation between variables x_i and x_j.
There are MANY issues here.
First of all, are the pseudo-random deviates assumed to be normally distributed? I'll assume they are, as any discussion of correlation matrices gets nasty if we diverge into non-normal distributions.
Next, it is rather simple to generate pseudo-random normal deviates, given a covariance matrix. Generate standard normal (independent) deviates, and then transform by multiplying by the Cholesky factor of the covariance matrix. Add in the mean at the end if the mean was not zero.
And, a covariance matrix is also rather simple to generate given a correlation matrix. Just pre and post multiply the correlation matrix by a diagonal matrix composed of the standard deviations. This scales a correlation matrix into a covariance matrix.
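The two steps just described (scaling a correlation matrix into a covariance matrix, then transforming independent standard normal deviates with its Cholesky factor) can be sketched in R. All numbers here are hypothetical: C, sds, mu and n are chosen only for illustration.

```r
set.seed(123)
# Hypothetical correlation matrix (positive definite), standard deviations, means
C <- matrix(c(1.00, 0.70, 0.80,
              0.70, 1.00, 0.75,
              0.80, 0.75, 1.00), nrow = 3, byrow = TRUE)
sds <- c(1, 2, 0.5)
mu  <- c(10, 0, -3)

# Pre- and post-multiply by the diagonal matrix of standard deviations
S <- diag(sds) %*% C %*% diag(sds)

# chol() returns the upper triangular factor U with t(U) %*% U == S
U <- chol(S)

# Independent standard normal deviates, transformed and shifted by the mean
n <- 5000
Z <- matrix(rnorm(n * 3), n, 3)
X <- sweep(Z %*% U, 2, mu, "+")

cor(X)   # close to C for large n, but never exactly equal
```

The sample correlations only approach C as n grows, which is the asymptotic caveat discussed in this answer.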
I'm still not sure where the problem lies in this question, since it would seem easy enough to generate a "random" correlation matrix, with elements uniformly distributed in the desired range.
So all of the above is rather trivial by any reasonable standards, and there are many tools out there to generate pseudo-random normal deviates given the above information.
Perhaps the issue is the user insists that the resulting random matrix of deviates must have correlations in the specified range. You must recognize that a set of random numbers will only have the desired distribution parameters in an asymptotic sense. Thus, as the sample size goes to infinity, you should expect to see the specified distribution parameters. But any small sample set will not necessarily have the desired parameters, in the desired ranges.
For example, (in MATLAB) here is a simple positive definite 3x3 matrix. As such, it makes a very nice covariance matrix.
S = randn(3);
S = S'*S
S =
0.78863 0.01123 -0.27879
0.01123 4.9316 3.5732
-0.27879 3.5732 2.7872
I'll convert S into a correlation matrix.
s = sqrt(diag(S));
C = diag(1./s)*S*diag(1./s)
C =
1 0.0056945 -0.18804
0.0056945 1 0.96377
-0.18804 0.96377 1
Now, I can sample from a normal distribution using the statistics toolbox (mvnrnd should do the trick). It is just as easy to use a Cholesky factor.
L = chol(S)
L =
0.88805 0.012646 -0.31394
0 2.2207 1.6108
0 0 0.30643
Now, generate pseudo-random deviates, then transform them as desired.
X = randn(20,3)*L;
cov(X)
ans =
0.79069 -0.14297 -0.45032
-0.14297 6.0607 4.5459
-0.45032 4.5459 3.6549
corrcoef(X)
ans =
1 -0.06531 -0.2649
-0.06531 1 0.96587
-0.2649 0.96587 1
If your desire was that the correlations must ALWAYS be greater than -0.188, then this sampling technique has failed, since the numbers are pseudo-random. In fact, that goal will be a difficult one to achieve unless your sample size is large enough.
You might employ a simple rejection scheme, whereby you do the sampling, then redo it repeatedly until the sample has the desired properties, with the correlations in the desired ranges. This may get tiring.
An approach that might work (but one that I've not totally thought out at this point) is to use the standard scheme as above to generate a random sample. Compute the correlations. If they fail to lie in the proper ranges, then identify the perturbation one would need to make to the actual (measured) covariance matrix of your data, so that the correlations would be as desired. Now, find a zero mean random perturbation to your sampled data that would move the sample covariance matrix in the desired direction.
This might work, but unless I knew that this is actually the question at hand, I won't bother to go any more deeply into it. (Edit: I've thought some more about this problem, and it appears to be a quadratic programming problem, with quadratic constraints, to find the smallest perturbation to a matrix X, such that the resulting covariance (or correlation) matrix has the desired properties.)
This is not a complete answer, but a suggestion of a possible constructive method:
Looking at the characterizations of positive definite matrices, I think one of the most affordable approaches could be using Sylvester's criterion.
You can start with a trivial 1x1 random matrix with positive determinant and expand it in one row and column step by step while ensuring that the new matrix has also a positive determinant (how to achieve that is up to you ^_^).
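As a rough sketch of the check involved, Sylvester's criterion says a symmetric matrix is positive definite exactly when all of its leading principal minors are positive. The helper name is_pd_sylvester and the two test matrices are hypothetical; this only tests a candidate matrix and does not solve the harder problem of choosing the new row and column.

```r
# TRUE if every leading principal minor of the symmetric matrix A is positive
is_pd_sylvester <- function(A) {
  minors <- vapply(seq_len(nrow(A)),
                   function(k) det(A[1:k, 1:k, drop = FALSE]),
                   numeric(1))
  all(minors > 0)
}

# Hypothetical candidates with all correlations inside or near (0.6, 0.9)
ok <- matrix(c(1.00, 0.70, 0.80,
               0.70, 1.00, 0.75,
               0.80, 0.75, 1.00), nrow = 3, byrow = TRUE)

# Entries look like correlations, but the matrix is indefinite
not_ok <- matrix(c(1.0, 0.9, 0.1,
                   0.9, 1.0, 0.9,
                   0.1, 0.9, 1.0), nrow = 3, byrow = TRUE)

is_pd_sylvester(ok)
is_pd_sylvester(not_ok)
```

In a step-by-step construction, the same function could be applied after each expansion, rejecting or adjusting a new row and column whenever the check fails.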
"First of all, are the pseudo-random deviates assumed to be normally distributed?"
"Perhaps the issue is the user insists that the resulting random matrix of deviates must have correlations in the specified range."
Yes, that's the whole difficulty
"You must recognize that a set of random numbers will only have the desired distribution parameters in an asymptotic sense."
True, but this is not the problem here: your strategy works for p=2, but fails for p>2, regardless of sample size.
"If your desire was that the correlations must ALWAYS be greater than -0.188, then this sampling technique has failed, since the numbers are pseudo-random. In fact, that goal will be a difficult one to achieve unless your sample size is large enough."
It is not a sample size issue, because with p>2 you do not even observe convergence to the right range for the correlations as the sample size grows: I tried the technique you suggest before posting here, and it is obviously flawed.
"You might employ a simple rejection scheme, whereby you do the sampling, then redo it repeatedly until the sample has the desired properties, with the correlations in the desired ranges. This may get tiring."
Not an option, for p large (say larger than 10) this option is intractable.
"Compute the correlations. If they fail to lie in the proper ranges, then identify the perturbation one would need to make to the actual (measured) covariance matrix of your data, so that the correlations would be as desired."
As for the QP, I understand the constraints, but I'm not sure about the way you define the objective function. By using the "smallest perturbation" off some initial matrix, you will always end up with the same solution matrix: all the off-diagonal entries will be exactly equal to one of the two bounds (i.e. not pseudo-random). Plus, it is kind of overkill, isn't it?
Come on people, there must be something simpler
