R - Standardize matrix to have unit diagonals

I am seeking to generate the below matrix:
Θ = B + δI_p ∈ R^(p×p), where I_p is the p×p identity matrix and each off-diagonal entry of B (a symmetric matrix) is generated independently and equals 0.5 with probability 0.1 or 0 with probability 0.9. The parameter δ > 0 is chosen such that Θ is positive definite. The matrix is then standardized to have unit diagonals (transforming it from a covariance matrix to a correlation matrix).
I think I have most of the code, but I'm unsure how to standardize the matrix to have unit diagonals in R (and, theoretically, why that is a useful property of a matrix).
# set number of cols/rows
p <- 5
set.seed(123)
# generate matrix B with values of 0.5 given probabilities
B <- matrix(sample(c(0,0.5), p^2, replace=TRUE, prob=c(0.9,0.1)), p)
# mirror the upper triangle into the lower triangle so that B is symmetric
i <- lower.tri(B)
B[i] <- t(B)[i]
diag(B) <- rep(0, p)
# finding parameter delta, such that Θ is positive definite.
(delta <- -min(eigen(B, symmetric=TRUE, only.values=TRUE)$values))
# set theta (delta is 2.8802)
theta <- B + 2.89*(diag(p))
# now to standardize the matrix to have unit diagonals ?

There are many ways to do this, but the following is very fast in timing experiments:
v <- 1/sqrt(diag(theta))
B <- theta * outer(v, v)
This divides all rows and columns by their standard deviations, which are the square roots of the diagonal elements.
It will fail whenever any diagonal entry is zero (or negative), but in that case such a standardization isn't possible anyway. Computing the square roots and their reciprocals first lets you learn as soon as possible--with minimal computation--whether the procedure will succeed.
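Incidentally, base R already provides this standardization as cov2cor() in the stats package; assuming theta and v are as defined above, a quick consistency check is:
# cov2cor() rescales a covariance matrix to a correlation matrix,
# i.e. it divides rows and columns by the square roots of the diagonal entries
all.equal(cov2cor(theta), B)  # TRUE: same result as theta * outer(v, v)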
BTW, a direct way to compute B in the first part of your code (which has a zero diagonal) is
B <- as.matrix(structure(sample(c(0, 1/2), p*(p-1)/2, replace=TRUE, prob=c(.9, .1)),
                         Size=p, Diag=TRUE, class="dist"))
This eliminates the superfluous sampling.
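As a quick sanity check (not part of the original answer), the result has the same two properties the longer construction produced:
# the "dist" trick stores only the p*(p-1)/2 lower-triangle values;
# as.matrix() then mirrors them and fills the diagonal with zeros
isSymmetric(B)     # TRUE
all(diag(B) == 0)  # TRUE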

Related

chol2inv(chol(x)) and solve(x)

I assumed that chol2inv(chol(x)) and solve(x) are two different methods that arrive at the same result in all cases. Consider, for instance, a matrix S
A <- matrix(rnorm(3*3), 3, 3)
S <- t(A) %*% A
where the following two commands will give equivalent results:
solve(S)
chol2inv(chol(S))
Now consider the transpose of the Cholesky decomposition of S:
L <- t(chol(S))
where now the results of the following two commands do not give equivalent results anymore:
solve(L)
chol2inv(chol(L))
This surprised me a bit. Is this expected behavior?
chol expects (without checking) that its first argument x is a symmetric positive definite matrix, and it operates only on the upper triangular part of x. Thus, if L is a lower triangular matrix and D = diag(diag(L)) is its diagonal part, then chol(L) is actually equivalent to chol(D), and chol2inv(chol(L)) is actually equivalent to solve(D).
set.seed(141339L)
n <- 3L
S <- crossprod(matrix(rnorm(n * n), n, n))
L <- t(chol(S))
D <- diag(diag(L))
all.equal(chol(L), chol(D)) # TRUE
all.equal(chol2inv(chol(L)), solve(D)) # TRUE
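If the inverse of the triangular factor L itself is what is wanted, a triangular solver is the right tool rather than chol2inv(); a minimal sketch, continuing the example above:
# backsolve() with upper.tri = FALSE solves L %*% X = I for lower triangular L
Linv <- backsolve(L, diag(n), upper.tri = FALSE)
all.equal(Linv, solve(L))               # TRUE
all.equal(chol2inv(chol(S)), solve(S))  # TRUE: the equivalence does hold for S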

R Derivatives of an Inverse

I have an expression that contains several parts. However, for simplicity, consider only the following part as an MWE:
Let's assume we have the inverse of a matrix Y that I want to differentiate w.r.t. x.
Y is given as I - (x * b * t(b)), where I is the identity matrix, x is a scalar, and b is a vector.
According to The Matrix Cookbook, Eq. 59, the partial derivative of an inverse is ∂(Y^-1)/∂x = -Y^-1 (∂Y/∂x) Y^-1.
Normally I would use the function D from the stats package to calculate the derivatives. But that is not possible in this case, because e.g. solve (used to express the inverse of Y) and t() are not in the table of derivatives that D knows about.
What is the best workaround to circumvent this problem? Are there any other recommended packages that can handle such input?
Example that doesn't work:
f0 <- expression(solve(I - (x * b %*% t(b))))
D(f0, "x")
Example that works:
f0 <- expression(x^3)
D(f0, "x")
3 * x^2
I assume that the question is how to get an explicit expression for the derivative of the inverse of Y with respect to x. In the first section we compute it and in the second section we double check it by computing it numerically and show that the two approaches give the same result.
The span of b and the null space of b (the vectors orthogonal to b) are both eigenspaces of Y, which we can readily verify by noting that Yb = (1 - x(b'b))b and that if z is orthogonal to b then Yz = z. This also shows that the corresponding eigenvalues are 1 - x(b'b) with multiplicity 1 and 1 with multiplicity n-1 (since the null space of b has that dimension).
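This claim is easy to verify numerically; here is a small illustration with hypothetical values of b and x (not taken from the post):
b <- c(1, 2, 3); x <- 0.1
Y <- diag(3) - x * outer(b, b)
all.equal(as.vector(Y %*% b), (1 - x * sum(b * b)) * b)  # TRUE: eigenvalue 1 - x(b'b)
z <- c(2, -1, 0)                  # orthogonal to b, i.e. b'z = 0
all.equal(as.vector(Y %*% z), z)  # TRUE: eigenvalue 1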
Because such a matrix can be expanded as the sum, over its eigenvalues, of each eigenvalue times the projection onto its eigenspace, we can express Y as follows, where bb'/b'b is the projection onto the eigenspace spanned by b and the factor multiplying it is the eigenvalue. The remaining terms do not involve x because their eigenvalue is 1 regardless of x, and the null space of b is independent of x as well.
Y = (1-x(b'b))(bb')/(b'b) + terms not involving x
The inverse of Y is formed by taking the reciprocals of the eigenvalues so:
Yinv = 1/(1-x(b'b)) * (bb')/(b'b) + terms not involving x
and the derivative of that wrt x is:
(b'b) / (1 - x(b'b))^2 * (bb')/(b'b)
Cancelling the b'b and writing the derivative in terms of R code:
1/(1 - x*sum(b*b))^2*outer(b, b)
Double check
Using specific values for b and x we can verify it against the numeric derivative as follows:
library(numDeriv)
x <- 1
b <- 1:3
# Y inverse as a function of x
Yinv <- function(x) solve(diag(3) - x * outer(b, b))
all.equal(matrix(jacobian(Yinv, x = 1), 3),
          1/(1 - x*sum(b*b))^2 * outer(b, b))
## [1] TRUE

How do I minimize a linear least squares function in R?

I'm reading Deep Learning by Goodfellow et al. and am trying to implement gradient descent as shown in Section 4.5 Example: Linear Least Squares. This is page 92 in the hard copy of the book.
The algorithm can be viewed in detail at https://www.deeplearningbook.org/contents/numerical.html, where linear least squares is worked through on page 94 of the PDF.
I've tried implementing it in R, and the algorithm as implemented converges to a vector, but this vector does not seem to minimize the least squares function as required. Perturbing the vector in question by epsilon frequently produces a "minimum" less than the minimum output by my program.
options(digits = 15)
dim_square = 2 ### set dimension of square matrix
# generate a random matrix A and a random vector b
set.seed(1234)
A = matrix(nrow = dim_square, ncol = dim_square, byrow = T, rlnorm(dim_square ^ 2)/10)
b = rep(rnorm(1), dim_square)
# having fixed A & b, select x randomly
x = rnorm(dim_square) # vector length of dim_square--supposed to be arbitrary
f = function(x, A, b){
  total_vector = A %*% x + b # residual vector; f is half its squared L2 norm
  total = 0.5 * sum(abs(total_vector) ^ 2) # 0.5 * squared L2 norm
  return(total)
}
f(x,A,b)
# how close do we want to get?
epsilon = 0.1
delta = 0.01
value = (t(A) %*% A) %*% x - t(A) %*% b
L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
steps = vector()
while(L2_norm > delta){
  x = x - epsilon * value
  value = (t(A) %*% A) %*% x - t(A) %*% b
  L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
  print(L2_norm)
}
minimum = f(x, A, b)
minimum
minimum_minus = f(x - 0.5*epsilon, A, b)
minimum_minus # less than the minimum found by gradient descent! Why?
On page 94 of the PDF at https://www.deeplearningbook.org/contents/numerical.html, the function to minimize is f(x) = 0.5 * ||Ax - b||_2^2. I am trying to find the values of the vector x such that f(x) is minimized. However, as demonstrated by minimum and minimum_minus in my code, minimum is not the actual minimum, as it exceeds minimum_minus.
Any idea what the problem might be?
Original Problem
Finding the value of x that minimizes the L2 norm of Ax - b is equivalent (when A is invertible) to finding the value of x such that Ax - b = 0, i.e. x = A^-1 * b. This is because the L2 norm is the Euclidean norm, more commonly known as the distance formula; by definition a distance cannot be negative, so its minimum here is identically zero.
This algorithm, as implemented, actually comes quite close to estimating x. However, because of the repeated subtraction and rounding, one quickly runs into underflow, resulting in massive oscillation, shown below:
[Figure: value of the L2 norm as a function of step size]
[Output: above algorithm vs. the solve function in R]
Above we have the results of A %*% x followed by A %*% min_x, with x estimated by the implemented algorithm and min_x estimated by the solve function in R.
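The numerical output itself is not reproduced here, but a sketch of how that comparison can be made (with min_x, as described, obtained from R's solve()):
min_x = solve(A, b)  # closed-form solution x = A^-1 * b, as discussed above
A %*% x              # using the x found by the gradient-descent loop
A %*% min_x          # using the direct solution; the two should be close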
The problem of underflow, well known to those familiar with numerical analysis, is probably best handled by the authors of the lower-level libraries that are best equipped to deal with it.
To summarize, the algorithm appears to work as implemented. It is important to note, however, that not every function has a minimum (think of a straight line), and that this algorithm can only be expected to find a local, as opposed to a global, minimum.

PageRank in R. Issue with vectors and how to iterate through adjacency matrix

I have a 500x500 adjacency matrix of 1s and 0s, and I need to calculate the PageRank for each page. I have the code below, where R is the matrix and T = 0.15 is a constant:
n = ncol(R)
B = matrix(1/n, n, n) # the teleportation matrix
A = 0.85 * R + 0.15 * B
ranks = eigen(A)$vectors[1] # my PageRanks
print(ranks)
[1] -0.5317519+0i
I don't have much experience with R, but I assume that the given output is some overall PageRank value, whereas I need a PageRank for each page.
Is there a way to construct a table of PageRanks with relation to the matrix? I didn't find anything related to my particular case on the web.
A few points:
(1) You need to convert the binary adjacency matrix (R in your case) to a column-stochastic transition matrix to start with (representing the probability of transitioning between pages).
(2) A needs to remain column-stochastic as well; only then will the dominant eigenvector, corresponding to the eigenvalue 1, be the PageRank vector.
(3) To find the first eigenvector of the matrix A, you need to use eigen(A)$vectors[,1]
Example with a small 5x5 adjacency matrix R:
set.seed(12345)
R = matrix(sample(0:1, 25, replace=TRUE), nrow=5) # random binary adjacency matrix
R = t(t(R) / rowSums(t(R))) # convert the adjacency matrix R to a column-stochastic transition matrix
n = ncol(R)
B = matrix(1/n, n, n) # the teleportation matrix
A = 0.85 * R + 0.15 * B
A <- t(t(A) / rowSums(t(A))) # make A column-stochastic
ranks = eigen(A)$vectors[,1] # my PageRanks
print(ranks)
# [1] 0.05564937 0.05564937 0.95364105 0.14304616 0.25280990
print(ranks / sum(ranks)) # normalized ranks
# [1] 0.03809524 0.03809524 0.65282295 0.09792344 0.17306313
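Not part of the original answer, but two quick checks confirm that this is the PageRank vector: the normalized ranks are a fixed point of A (eigenvalue 1), and power iteration, which is how PageRank is usually computed for large matrices, converges to the same vector:
pr <- Re(ranks / sum(ranks))        # drop any zero imaginary part
all.equal(as.vector(A %*% pr), pr)  # TRUE: pr is the eigenvector for eigenvalue 1
v <- rep(1/n, n)                    # start from the uniform distribution
for (i in 1:200) v <- as.vector(A %*% v)
all.equal(v, pr)                    # TRUE, up to numerical tolerance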

Pointwise multiplication and right matrix division

I'm currently trying to recreate this Matlab function in R:
function X = uniform_sphere_points(n,d)
% X = uniform_sphere_points(n,d)
%
%function generates n points uniformly within the unit sphere in d dimensions
z= randn(n,d);
r1 = sqrt(sum(z.^2,2));
X=z./repmat(r1,1,d);
r=rand(n,1).^(1/d);
X = X.*repmat(r,1,d);
Regarding the right matrix division, I installed the pracma package. My R code right now is:
uniform_sphere_points <- function(n,d){
  # function generates n points uniformly within the unit sphere in d dimensions
  z = rnorm(n, d)
  r1 = sqrt(sum(z^2,2))
  X = mrdivide(z, repmat(r1,1,d))
  r = rnorm(1)^(1/d)
  X = X * matrix(r,1,d)
  return(X)
}
But it is not really working, since I always end up with a non-conformable arrays error in R.
This operation for sampling n random points from the d-dimensional unit sphere could be stated in words as:
(1) Construct an n x d matrix with entries drawn from the standard normal distribution.
(2) Normalize each row so it has (2-norm) magnitude 1.
(3) For each row, compute a random value by taking a draw from the uniform distribution (between 0 and 1) and raising it to the 1/d power, then multiply all elements in the row by that value.
The following R code does these operations:
unif.samp <- function(n, d) {
  z <- matrix(rnorm(n*d), nrow=n, ncol=d)
  z * (runif(n)^(1/d) / sqrt(rowSums(z^2)))
}
Note that in the second line of code I have taken advantage of the fact that multiplying an n x d matrix in R by a vector of length n will multiply each row by the corresponding value in that vector. This saves us the work of using repmat to construct matrices of exactly the same size as the original matrix for these sorts of row-specific operations.
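As a tiny illustration of that recycling rule (hypothetical values, just to show the behavior):
m <- matrix(1, nrow = 3, ncol = 2)
v <- c(1, 10, 100)
m * v  # each row of m is scaled by the corresponding element of v
#      [,1] [,2]
# [1,]    1    1
# [2,]   10   10
# [3,]  100  100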

Resources