Compute eigenvector using a dominant eigenvalue

I want to ask a question about eigenvector centrality.
I have to compute an eigenvalue using power iteration. This is my code to compute the eigenvalue:
v = rand(165,1);          % random starting vector
for k = 1:5
    w = data_table*v;     % multiply by the matrix
    lamda = norm(w);      % current estimate of the dominant eigenvalue
    v = w/lamda;          % normalise the iterate
end
When I have obtained a single eigenvalue, I am confused about how to compute the eigenvector score from it. For example, my power iteration code gives the dominant eigenvalue = 78.50, and with this eigenvalue I want to compute the eigenvector score. Usually we compute eigenvalues and eigenvectors together with a call such as: [U,V] = eig(data_matrix);
But the result from that code is:
v =
-167.59 0 0
0 -117.51 0
0 0 -112.0
V =
0.0404505 0.04835455 -0.01170
0.0099050 -0.0035217 -0.05561
0.0319591 -0.0272589 0.018426
From that result we get the eigenvectors for all three eigenvalues. My question is: how do I compute the eigenvector score using only the single eigenvalue that I get from the power iteration code?

Power iteration finds the dominant eigenvector, that is, the eigenvector associated with the eigenvalue of largest magnitude.
If you start with
v=ones(165,1)/165; % initialisation
for i=1:5 % 5 iterations
w=data_table*v; % assuming size(data_table) = [165 165]
v=w/norm(w);
end
and your algorithm converges in 5 iterations, then v is your dominant eigenvector.
Also, I would start with a smaller example to test your code. Your MATLAB call [U,V] = eig(data_matrix); is confusing because V should be a diagonal matrix of size [165 165], not a full matrix of size [3 3].
Try this:
X=[1 1 1;1 1 2;1 2 2]
[U,V]=eig(X)
X*U(:,3)
U(:,3)*V(3,3)
to see where the largest eigenvalue appears in the MATLAB output, i.e. V(3,3), and the corresponding eigenvector U(:,3).
You can use power iteration to find this eigenvector:
v=ones(1,3)
w=v*X;v=w/norm(w)
w=v*X;v=w/norm(w)
w=v*X;v=w/norm(w)
w=v*X;v=w/norm(w)
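The key point is that power iteration gives you the eigenvector directly: the normalised iterate v is the eigenvector, and the eigenvalue can be recovered from it with the Rayleigh quotient, so you never need to derive the eigenvector from the eigenvalue afterwards. Here is a minimal Python/NumPy sketch of the same idea (the function name power_iteration is just for illustration):
import numpy as np

def power_iteration(A, num_iters=100):
    # returns (dominant eigenvalue, dominant eigenvector) of A
    v = np.random.rand(A.shape[0])
    for _ in range(num_iters):
        w = A @ v
        v = w / np.linalg.norm(w)      # normalised iterate -> eigenvector
    lam = v @ (A @ v)                  # Rayleigh quotient -> eigenvalue
    return lam, v

X = np.array([[1., 1., 1.],
              [1., 1., 2.],
              [1., 2., 2.]])
lam, v = power_iteration(X)
vals, vecs = np.linalg.eig(X)
k = np.argmax(np.abs(vals))
print(lam, vals[k])                    # same dominant eigenvalue
print(v, vecs[:, k])                   # same eigenvector, up to sign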

Is eigenvector centrality in igraph wrong?

I am trying to improve my understanding of eigenvector centrality. This overview from the University of Washington was very helpful, especially when read in conjunction with this R code. However, when I use evcent(graph_from_adjacency_matrix(A)), the result differs.
The code below demonstrates this:
library(matrixcalc)
library(igraph)
# specify the adjacency matrix
A <- matrix(c(0,1,0,0,0,0,
1,0,1,0,0,0,
0,1,0,1,1,1,
0,0,1,0,1,0,
0,0,1,1,0,1,
0,0,1,0,1,0 ),6,6, byrow= TRUE)
EV <- eigen(A) # compute eigenvalues and eigenvectors
max(EV$values) # find the maximum eigenvalue
centrality <- data.frame(EV$vectors[,1])
names(centrality) <- "Centrality"
print(centrality)
B <- A + diag(6) # Add self loops
EVB <- eigen(B) # compute eigenvalues and eigenvectors
# they are the same as EV(A)
c <- matrix(c(2,3,5,3,4,3)) # Degree of each node + self loop
ck <- function(k){
n <- (k-2)
B_K <- B # B is the original adjacency matrix, w/ self-loops
for (i in 1:n){
B_K <- B_K%*%B #
#print(B_K)
}
c_k <- B_K%*%c
return(c_k)
}
# derive EV centrality as k -> infinity
# k = 100
ck(100)/frobenius.norm(ck(100)) # .09195198, .2487806, .58115487, .40478177, .51401731, .40478177
# Does igraph match?
evcent(graph_from_adjacency_matrix(A))$vector # No: 0.1582229 0.4280856 1.0000000 0.6965127 0.8844756 0.6965127
The rank correlation is the same, but it is still bothersome that the values are not the same. What is going on?
The result returned by igraph is not wrong, but note that there are subtleties to defining eigenvector centrality, and not all implementations handle self-loops in the same way.
Please see what I wrote here.
One way to define eigenvector centrality is simply as "the leading eigenvector of the adjacency matrix". But this is imprecise without specifying what the adjacency matrix is, especially what its diagonal elements should be when there are self-loops present. Depending on application, diagonal entries of the adjacency matrix of an undirected graph are sometimes defined as the number of self-loops, and sometimes as twice the number of self-loops. igraph uses the second definition when computing eigenvector centrality. This is the source of the difference you see.
A more intuitive definition of eigenvector centrality is that the centrality of each vertex is proportional to the sum of its neighbours' centralities. Thus the details of the computation hinge on who the neighbours are. Consider a single vertex with a self-loop. It is its own neighbour, but how many times? We can traverse the self-loop in both directions, so it is reasonable to say that it is its own neighbour twice. Indeed, its degree is conventionally taken to be 2, not 1.
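In symbols (the standard formulation, not anything igraph-specific): x_i = (1/lambda) * sum_j A_ij x_j, i.e. A x = lambda x, where A_ij counts how many times j appears among i's neighbours, so under igraph's convention a self-loop contributes 2 to A_ii.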
You will find that different software packages treat self-loops differently when computing eigenvector centrality. In igraph, we made a choice by looking at the intuitive interpretation of eigenvector centrality rather than rigidly following a formal definition with no regard for the motivation behind that definition.
Note: what I wrote above refers to how the eigenvector centrality computation works internally, not to what as_adjacency_matrix() returns. as_adjacency_matrix() adds one (not two) to the diagonal for each self-loop.
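To see how the two diagonal conventions can produce different scores, here is a small NumPy sketch (purely illustrative: the 3-node toy graph and the helper function are made up, and this is not igraph's internal code):
import numpy as np

def leading_eigenvector(M):
    # leading eigenvector of a symmetric matrix, scaled so its maximum entry is 1
    vals, vecs = np.linalg.eigh(M)
    v = vecs[:, np.argmax(vals)]
    v = v * np.sign(v[np.argmax(np.abs(v))])   # make the Perron vector positive
    return v / v.max()

# path graph 1-2-3 with a single self-loop on node 1
off_diag = np.array([[0., 1., 0.],
                     [1., 0., 1.],
                     [0., 1., 0.]])
loops = np.array([1, 0, 0])                    # number of self-loops per node

A_once  = off_diag + np.diag(loops)            # diagonal = number of self-loops
A_twice = off_diag + np.diag(2 * loops)        # diagonal = twice the number (igraph's choice)

print(leading_eigenvector(A_once))
print(leading_eigenvector(A_twice))            # the centrality scores differ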

Matrix dimensions do not match in regression formula

I'm trying to calculate this regression formula, but I have a problem with the dimensions; they do not work out:
Where:
X - a matrix with dimensions 200x20, n = 200 samples, p = 20 predictors,
y - a matrix with dimensions 200x1,
beta^(k) - a sequence of coefficient vectors, each with dimensions 20x1, for k = 1, 2, 3, ...,
X^T - the transpose of X, with dimensions 20x200,
j - an index running over 1...p, so over 1...20.
The problem appears when I calculate the update. For example, for k = 20 (so k - 1 = 19), the dimensions do not match for the subtraction: 200x1 - (200x20 times 1x1) = 200x1 - 200x20 will not work.
If I take the whole beta vector then the dimensions are correct. Does the superscript (k - 1) mean to take the 19th value of beta and to multiply it with the matrix X?
Source of the formula:
You should be using the entire beta vector at each stage of the calculation.
(Tibshirani has been a bit permissive with his use of notation, perhaps...)
The k is just a counter for which step of the algorithm we are on. Right at the start (k = 0, or "step 0") we initialise the entire beta vector to have all elements equal to zero: beta^(0) = (0, 0, ..., 0)^T, a 20x1 vector.
At each step of the algorithm (steps k = 1, 2, 3, ... and so on) we use our previous estimate of the vector beta (beta^(k-1), calculated in step k - 1) to calculate a new, improved estimate of the vector beta (beta^(k)). The superscript is not an index into the vector; rather, it is a label telling us at which step of the algorithm that beta vector was produced.
I hope this makes sense. The important point is that each of the beta^(k) values is a different 20x1 vector.
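Since the formula images did not survive here, the following is only a generic sketch of the pattern being described, with a plain gradient step standing in for the actual update from the source; the point is just that every iteration consumes and produces a full 20x1 beta vector, never a single component.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))                # 200x20 design matrix
y = rng.normal(size=(n, 1))                # 200x1 response

beta = np.zeros((p, 1))                    # beta^(0): the whole 20x1 vector starts at zero
step = 1e-3                                # illustrative step size, not from the source

for k in range(1, 101):                    # steps k = 1, 2, ..., 100
    residual = y - X @ beta                # 200x1 - (200x20)(20x1) = 200x1, dimensions match
    beta = beta + step * (X.T @ residual)  # beta^(k) computed from the full beta^(k-1)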

Calculate Rao's quadratic entropy

Rao's QE is a weighted sum over a Euclidean distance matrix. I have the vectors for the elements of the d_ij's in a data table dt, one column per element (say there are x of them). p is the final column. nrow = S. The double sum runs over the lower-left (or, by symmetry, upper-right) elements of the distance matrix.
If I only needed an unweighted distance matrix I could simply run dist() over the x columns. How do I weight the d_ij's by the product of p_i and p_j?
An example data set is at https://github.com/GeraldCNelson/nutmod/blob/master/RaoD_example.csv with the p values in the column called foodQ.ratio.
You still start with dist for the raw Euclidean distance matrix. Let it be D. As you will read from R - How to get row & column subscripts of matched elements from a distance matrix, a "dist" object is not a real matrix, but a 1D array. So first do D <- as.matrix(D) or D <- dist2mat(D) to convert it to a complete matrix before the following.
Now, let p be the vector of weights; Rao's QE is just the quadratic form p'Dp / 2:
c(crossprod(p, D %*% p)) / 2
Note that I am not doing everything in the most efficient way. I have performed the symmetric matrix-vector multiplication D %*% p using the full D rather than just its lower triangular part, because R does not have a routine for triangular matrix-vector multiplication. So I compute the full quadratic form and then divide by 2.
This doubles the amount of computation necessary; making D a full matrix also doubles the memory cost. But if your problem is of small to medium size this is absolutely fine. For a large problem, if you are an R and C wizard, call the BLAS routine dtrmv or even dtpmv for the triangular matrix-vector computation.
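As a sanity check on the formula (a NumPy sketch with made-up toy data, not the questioner's CSV): the explicit double sum over the lower triangle, i.e. the sum over i > j of p_i p_j d_ij, equals the quadratic form p'Dp / 2, because the diagonal of D is zero and D is symmetric.
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
S, x = 8, 4                          # S rows, x trait columns (toy sizes)
traits = rng.normal(size=(S, x))
p = rng.random(S)
p = p / p.sum()                      # weights

D = squareform(pdist(traits))        # full S x S Euclidean distance matrix

double_sum = sum(p[i] * p[j] * D[i, j] for i in range(S) for j in range(i))
quad_form = (p @ D @ p) / 2

print(double_sum, quad_form)         # the two agree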
Update
I just found this short paper, Rao's quadratic entropy as a measure of functional diversity based on multiple traits, for the definition and use of Rao's QE. It mentions that we can replace the Euclidean distance with the Mahalanobis distance. In case we want to do this, use my code in Mahalanobis distance of each pair of observations for fast computation of the Mahalanobis distance matrix.

How to construct the POE ensemble in Julia

I'm having trouble building the POE ensemble in Julia. I am following this paper and part of this other paper.
In Julia, I calculate:
X = randn(dim, dim)                          # real Ginibre matrix
Q, R = qr(X)
Q = Q*diagm(sign.(diag(R)))                   # fix the signs so Q is Haar-distributed
ij = (irealiz-1)*dim
phases_ens[1+ij:ij+dim] = angle.(eigvals(Q))  # store the eigenvalue phases
where dim is the matrix dimension and irealiz is just an index over the total number of realizations.
I am interested in the eigenvalue phases of Q, since I want Q to be an orthogonal matrix drawn from the appropriate Haar measure. With dim = 50 and 100000 realizations in total, and since I am correcting Q, I would expect a flat phases_ens distribution. However, I obtain a flat distribution except for a peak at zero and one at pi. Is there something wrong with the code?
The code is actually correct; you just have the wrong field.
The eigenvalue result is true for unitary matrices (complex entries); based on the code from section 4.6 of the Edelman and Rao paper, if you replace the first line by
X = randn(dim, dim) + im*randn(dim, dim)
you get the result you want.
Orthogonal matrices (real entries) behave slightly differently (see remark 1, in section 3 of this paper):
when dim is odd, one eigenvalue will be +1 or -1 (each with probability 1/2), and all others will occur in conjugate pairs;
when dim is even, both +1 and -1 will be eigenvalues with probability 1/2, otherwise there are no real eigenvalues.
(Thanks for the links by the way: I wasn't aware of the Stewart paper)
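For reference, here is the same recipe in Python/NumPy (an illustrative sketch, not the original Julia, cf. section 4.6 of the Edelman and Rao paper): sample a complex Ginibre matrix, QR-factor it, rescale the columns of Q by the phases of diag(R), and the eigenvalue phases of the resulting Haar-distributed unitary matrix come out flat; with a real Ginibre matrix instead, the peaks at 0 and pi reappear because of the forced real eigenvalues described above.
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(dim):
    # Haar-distributed unitary via QR of a complex Ginibre matrix
    X = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    Q, R = np.linalg.qr(X)
    d = np.diag(R)
    return Q * (d / np.abs(d))         # rescale column j of Q by the phase of R[j, j]

dim, n_realiz = 50, 2000
phases = np.concatenate([np.angle(np.linalg.eigvals(haar_unitary(dim)))
                         for _ in range(n_realiz)])
# a histogram of `phases` is flat on (-pi, pi]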

Efficient computation of trace(AB^{-1}) given A and B

I have two square matrices A and B. A is symmetric, B is symmetric positive definite. I would like to compute $trace(A.B^{-1})$. For now, I compute the Cholesky decomposition of B, solve for C in the equation $A=C.B$, and sum up the diagonal elements of C.
Is there a more efficient way of proceeding?
I plan on using Eigen. Could you provide an implementation if the matrices are sparse (A can often be diagonal, B is often band-diagonal)?
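For concreteness, here is the baseline described above as a NumPy/SciPy sketch (not Eigen code; the matrices are random toy data): since trace(A B^{-1}) = trace(B^{-1} A), solving B C = A through the Cholesky factor and summing the diagonal of C gives the desired value.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
n = 100
A = rng.normal(size=(n, n)); A = (A + A.T) / 2             # symmetric A
M = rng.normal(size=(n, n)); B = M @ M.T + n * np.eye(n)   # symmetric positive definite B

cf = cho_factor(B)               # Cholesky decomposition of B
C = cho_solve(cf, A)             # C = B^{-1} A, one triangular solve pair per column
print(np.trace(C))               # trace(B^{-1} A) = trace(A B^{-1})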
If B is sparse, it may be efficient (i.e., O(n), assuming good condition number of B) to solve for x_i in
B x_i = a_i
(sample Conjugate Gradient code is given on Wikipedia). Taking a_i to be the column vectors of A, you get the matrix B^{-1} A in O(n^2). Then you can sum the diagonal elements to get the trace. Generally, it's easier to do this sparse inverse multiplication than to get the full set of eigenvalues. For comparison, Cholesky decomposition is O(n^3). (see Darren Engwirda's comment below about Cholesky).
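Here is a sketch of that column-by-column approach with SciPy's conjugate gradient (illustrative toy matrices; assumes B is sparse, SPD and well conditioned):
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
n = 200
B = sp.diags([np.full(n - 1, -1.0), np.full(n, 4.0), np.full(n - 1, -1.0)],
             offsets=[-1, 0, 1], format="csc")             # band-diagonal SPD B
A = np.diag(rng.random(n))                                 # A taken diagonal here

trace = 0.0
for i in range(n):
    x_i, info = cg(B, A[:, i])   # solve B x_i = a_i for the i-th column of A
    assert info == 0
    trace += x_i[i]              # accumulate the i-th diagonal entry of B^{-1} A

print(trace)                     # equals trace(A B^{-1})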
If you only need an approximation to the trace, you can actually reduce the cost to O(q n) by averaging
r^T (A B^{-1}) r
over q random vectors r. Usually q << n. This is an unbiased estimate provided that the components of the random vector r satisfy
< r_i r_j > = \delta_{ij}
where < ... > indicates an average over the distribution of r. For example, the components r_i could be independent Gaussian variables with unit variance, or they could be drawn uniformly from +-1. Typically the trace scales like O(n) and the error in the trace estimate scales like O(sqrt(n/q)), so the relative error scales as O(sqrt(1/(n q))).
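A sketch of that stochastic estimator with +-1 probe vectors (toy dense matrices again, and a Cholesky solve standing in for whatever sparse solver you would actually use to get B^{-1} r):
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
n, q = 300, 50
A = rng.normal(size=(n, n)); A = (A + A.T) / 2
M = rng.normal(size=(n, n)); B = M @ M.T + n * np.eye(n)

cf = cho_factor(B)
est = 0.0
for _ in range(q):
    r = rng.choice([-1.0, 1.0], size=n)    # probe vector with <r_i r_j> = delta_ij
    est += r @ (A @ cho_solve(cf, r))      # r^T A (B^{-1} r)
est /= q

print(est, np.trace(A @ cho_solve(cf, np.eye(n))))   # estimate vs exact value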
If generalized eigenvalues are more efficient to compute, you can solve the generalized eigenvalue problem A*v = lambda*B*v and then sum up all the lambdas: the generalized eigenvalues of (A, B) are the eigenvalues of B^{-1}A, and their sum is trace(B^{-1}A) = trace(A B^{-1}).
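A short sketch of that route with SciPy's symmetric-definite generalized eigensolver (toy matrices again):
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n = 200
A = rng.normal(size=(n, n)); A = (A + A.T) / 2
M = rng.normal(size=(n, n)); B = M @ M.T + n * np.eye(n)

lambdas = eigh(A, B, eigvals_only=True)                  # solves A v = lambda B v
print(lambdas.sum(), np.trace(A @ np.linalg.inv(B)))     # both equal trace(A B^{-1})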
