PageRank in R. Issue with vectors and how to iterate through adjacency matrix - r

I have a 500x500 adjacency matrix of 1 and 0, and I need to calculate pagerank for each page. I have a code here, where R is the matrix and T=0.15 is a constant:
n = ncol(R)
B = matrix(1/n, n, n) # the teleportation matrix
A = 0.85 * R + 0.15 * B
ranks = eigen(A)$vectors[1] # my PageRanks
print(ranks)
[1] -0.5317519+0i
I don't have much experience with R, but I assume that the given output is a general pagerank, and I need a pagerank for each page.
Is there a way to construct a table of pageranks with relation to the matrix? I didn't find anything related to my particular case in the web.

Few points:
(1) You need to convert the binary adjacency matrix (R in your case) to a column-stochastic transition matrix to start with (representing probability of transitions between the pages).
(2) A needs to remain as column stochastic as well, then only the dominant eigenvector corresponding to the eigenvalue 1 will be the page rank vector.
(3) To find the first eigenvector of the matrix A, you need use eigen(A)$vectors[,1]
Example with a small 5x5 adjacency matrix R:
set.seed(12345)
R = matrix(sample(0:1, 25, replace=TRUE), nrow=5) # random binary adjacency matrix
R = t(t(R) / rowSums(t(R))) # convert the adjacency matrix R to a column-stochastic transition matrix
n = ncol(R)
B = matrix(1/n, n, n) # the teleportation matrix
A = 0.85 * R + 0.15 * B
A <- t(t(A) / rowSums(t(A))) # make A column-stochastic
ranks = eigen(A)$vectors[,1] # my PageRanks
print(ranks)
# [1] 0.05564937 0.05564937 0.95364105 0.14304616 0.25280990
print(ranks / sum(ranks)) # normalized ranks
[1] 0.03809524 0.03809524 0.65282295 0.09792344 0.17306313

Related

R - Standardize matrix to have unit diagonals

I am seeking to generate the below matrix:
Θ = B + δIp ∈ Rp×p, where Ip is the identity matrix, each off-diagonal entry
in B (symmetric matrix) is generated independently and equals 0.5 with probability
0.1 or 0 with probability 0.9. The parameter δ > 0 is chosen such that Θ is positive definite. The matrix is standardized to have unit diagonals (transforming from covariance matrix to correlation matrix).
I think that I have most of the code, but I'm unsure of how to standardize the matrix to have unit diagonals syntactically in R (and theoretically, why that is a useful feature of a matrix).
# set number of cols/rows
p <- 5
set.seed(123)
# generate matrix B with values of 0.5 given probabilities
B <- matrix(sample(c(0,0.5), p^2, replace=TRUE, prob=c(0.9,0.1)), p)
# call the matrix lower triangle, need a symmetric matrix
i <- lower.tri(B)
B[i] <- t(B)[i]
diag(B) <- rep(0, p)
# finding parameter delta, such that Θ is positive definite.
(delta <- -min(eigen(B, symmetric=TRUE, only.values=TRUE)$values))
# set theta (delta is 2.8802)
theta <- B + 2.89*(diag(p))
# now to standardize the matrix to have unit diagonals ?
There are many ways to do this, but the following is very fast in timing experiments:
v <- 1/sqrt(diag(theta))
B <- theta * outer(v, v)
This divides all rows and columns by their standard deviations, which are the square roots of the diagonal elements.
It will fail whenever any diagonal is zero (or negative): but in that case such standardization isn't possible. Computing the square roots and their reciprocals first allows you to learn as soon as possible--with minimal computation--whether the procedure will succeed.
BTW, a direct way to compute B in the first part of your code (which has a zero diagonal) is
B <- as.matrix(structure(sample(c(0,1/2), p*(p-1)/2, replace=TRUE, prob=c(.9,.1),
Size=p, Diag=TRUE, class="dist"))
This eliminates the superfluous sampling.

Find limiting distribution of transition matrix and plot in R

may I know how I can find and plot the results of the limiting distribution or a unique stationary distribution of a transition matrix in R? (my goal is to have a unique and constant result instead of a random result)
This is the P matrix used:
P=matrix(c(0.2,0.3,0.5,0.1,0.8,0.1,0.4,0.2,0.4),nrow=3,ncol=3,byrow=TRUE)
I misspoke in my earlier answer. Either the sums of the rows or the column need to all be 1 for a transition matrix. It depends on whether you are using v'P or Pv to transition to the next step.
I'll use Pv.
For the limiting distribution to be stable, we must have:
Pv = v, or (P - I)v = 0. So the limiting distribution is an eigenvector with eigenvalue 1. Then to be sure it's a distribution sum(v) == 1.
Since your matrix has rows that sum to 1, not columns, we need to use the transpose of the matrix to calculate the eigenvalues:
e <- eigen(t(P))$vectors[, 1]
e <- e / sum(e)
Gives:
e
[1] 0.1960784 0.5490196 0.2549020
To check this:
P=matrix(c(0.2,0.3,0.5,0.1,0.8,0.1,0.4,0.2,0.4),nrow=3,ncol=3, byrow = TRUE)
ans <- e
for (i in 1:1000) {
ans <- ans %*% P
}
ans
ans
[,1] [,2] [,3]
[1,] 0.1960784 0.5490196 0.254902
Same, so it's stable.
I'm not clear as to what you wanted to plot.

Multiplicating a matrix with a vector results in a matrix

I have a document-term matrix:
document_term_matrix <- as.matrix(DocumentTermMatrix(corpus, control = list(stemming = FALSE, stopwords=FALSE, minWordLength=3, removeNumbers=TRUE, removePunctuation=TRUE )))
For this document-term matrix, I've calculated the local term- and global term weighing as follows:
lw_tf <- lw_tf(document_term_matrix)
gw_idf <- gw_idf(document_term_matrix)
lw_tf is a matrix with the same dimensionality as the document-term-matrix (nxm) and gw_idf is a vector of size n. However, when I run:
tf_idf <- lw_tf * gw_idf
The dimensionality of tf_idf is again nxm.
Originally, I would not expect this multiplication to work, as the dimensionalities are not conformable. However, given this output I now expect the dimensionality of gw_idf to be mxm. Is this indeed the case? And if so: what happened to the gw_idf vector of size n?
Matrix multiplication is done in R by using %*%, not * (the latter is just element-wise multiplication). Your reasoning is partially correct, you were just using the wrong symbols.
About the matrix multiplication, a matrix multiplication is only possible if the second dimension of the first matrix is the same as the first dimensions of the second matrix. The resulting dimensions is the dim1 of first matrix by the dim2 of the second matrix.
In your case, you're telling us you have a 1 x n matrix multiplied by a n x m matrix, which should result in a 1 x m matrix. You can check such case in this example:
a <- matrix(runif(100, 0 , 1), nrow = 1, ncol = 100)
b <- matrix(runif(100 * 200, 0, 1), nrow = 100, ncol = 200)
c <- a %*% b
dim(c)
[1] 1 200
Now, about your specific case, I don't really have this package that makes term-documents (would be nice of you to provide an easily reproducible example!), but if you're multiplying a nxm matrix element-wise (you're using *, like I said in the beginning) by a nx1 array, the result does not make sense. Either your variable gw_idf is not an array at all (maybe it's just a scalar) or you're simply making a wrong conclusion.

Pointwise multiplication and right matrix division

I'm currently trying to recreate this Matlab function in R:
function X = uniform_sphere_points(n,d)
% X = uniform_sphere_points(n,d)
%
%function generates n points unformly within the unit sphere in d dimensions
z= randn(n,d);
r1 = sqrt(sum(z.^2,2));
X=z./repmat(r1,1,d);
r=rand(n,1).^(1/d);
X = X.*repmat(r,1,d);
Regarding the the right matrix division I installed the pracma package. My R code right now is:
uniform_sphere_points <- function(n,d){
# function generates n points uniformly within the unit sphere in d dimensions
z = rnorm(n, d)
r1 = sqrt(sum(z^2,2))
X = mrdivide(z, repmat(r1,1,d))
r = rnorm(1)^(1/d)
X = X * matrix(r,1,d)
return(X)
}
But it is not really working since I always end with a non-conformable arrays error in R.
This operation for sampling n random points from the d-dimensional unit sphere could be stated in words as:
Construct a n x d matrix with entries drawn from the standard normal distribution
Normalize each row so it has (2-norm) magnitude 1
For each row, compute a random value by taking a draw from the uniform distribution (between 0 and 1) and raise that value to the 1/d power. Multiply all elements in the row by that value.
The following R code does these operations:
unif.samp <- function(n, d) {
z <- matrix(rnorm(n*d), nrow=n, ncol=d)
z * (runif(n)^(1/d) / sqrt(rowSums(z^2)))
}
Note that in the second line of code I have taken advantage of the fact that multiplying a n x d matrix in R by a vector of length n will multiply each row by the corresponding value in that vector. This saves us the work of using repmat to construct matrices of exactly the same size as our original matrix for these sorts of row-specific operations.

Find K nearest neighbors, starting from a distance matrix

I'm looking for a well-optimized function that accepts an n X n distance matrix and returns an n X k matrix with the indices of the k nearest neighbors of the ith datapoint in the ith row.
I find a gazillion different R packages that let you do KNN, but they all seem to include the distance computations along with the sorting algorithm within the same function. In particular, for most routines the main argument is the original data matrix, not a distance matrix. In my case, I'm using a nonstandard distance on mixed variable types, so I need to separate the sorting problem from the distance computations.
This is not exactly a daunting problem -- I obviously could just use the order function inside a loop to get what I want (see my solution below), but this is far from optimal. For example, the sort function with partial = 1:k when k is small (less than 11) goes much faster, but unfortunately returns only sorted values rather than the desired indices.
Try to use FastKNN CRAN package (although it is not well documented). It offers k.nearest.neighbors function where an arbitrary distance matrix can be given. Below you have an example that computes the matrix you need.
# arbitrary data
train <- matrix(sample(c("a","b","c"),12,replace=TRUE), ncol=2) # n x 2
n = dim(train)[1]
distMatrix <- matrix(runif(n^2,0,1),ncol=n) # n x n
# matrix of neighbours
k=3
nn = matrix(0,n,k) # n x k
for (i in 1:n)
nn[i,] = k.nearest.neighbors(i, distMatrix, k = k)
Notice: You can always check Cran packages list for Ctrl+F='knn'
related functions:
https://cran.r-project.org/web/packages/available_packages_by_name.html
For the record (I won't mark this as the answer), here is a quick-and-dirty solution. Suppose sd.dist is the special distance matrix. Suppose k.for.nn is the number of nearest neighbors.
n = nrow(sd.dist)
knn.mat = matrix(0, ncol = k.for.nn, nrow = n)
knd.mat = knn.mat
for(i in 1:n){
knn.mat[i,] = order(sd.dist[i,])[1:k.for.nn]
knd.mat[i,] = sd.dist[i,knn.mat[i,]]
}
Now knn.mat is the matrix with the indices of the k nearest neighbors in each row, and for convenience knd.mat stores the corresponding distances.

Resources