R Derivatives of an Inverse - r

I have an expression that contains several parts. However, for simplicity, consider only the following part as MWE:
Let's assume we have the inverse of a matrix Y that I want to differentiate w.r.t. x.
Y is given as I - (x * b * t(b)), where I is the identity matrix, x is a scalar, and b is a vector.
According to The Matrix Cookbook, Eq. 59, the partial derivative of an inverse is:
d(Y^-1)/dx = -Y^-1 (dY/dx) Y^-1
Normally I would use the function D from the stats package to calculate derivatives. That is not possible in this case because, for example, solve (needed to express the inverse of Y) and t() are not in D's table of derivatives.
What is the best workaround to circumvent this problem? Are there any other recommended packages that can handle such input?
Example that doesn't work:
f0 <- expression(solve(I - (x * b %*% t(b))))
D(f0, "x")
Example that works:
f0 <- expression(x^3)
D(f0, "x")
3 * x^2

I assume that the question is how to get an explicit expression for the derivative of the inverse of Y with respect to x. In the first section we compute it and in the second section we double check it by computing it numerically and show that the two approaches give the same result.
The span of b and the null space of b (the vectors z with b'z = 0) are both eigenspaces of Y, which we can readily verify by noting that Yb = (1-(b'b)x)b and that if z belongs to the nullspace of b then Yz = z. This also shows that the corresponding eigenvalues are 1 - x(b'b) with multiplicity 1 and 1 with multiplicity n-1 (since the nullspace of b has that dimension).
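The eigenvalue claim can be checked numerically. The following is a minimal sketch in base R; the values of b and x and the test vector z are assumptions picked only for illustration.

```r
# Assumed illustrative values
b <- 1:3
x <- 0.01
Y <- diag(3) - x * outer(b, b)

# b is an eigenvector with eigenvalue 1 - x * (b'b)
lambda_b <- 1 - x * sum(b * b)
print(all.equal(drop(Y %*% b), lambda_b * b))

# A vector orthogonal to b is left unchanged by Y (eigenvalue 1)
z <- c(3, 0, -1)                      # sum(z * b) is 0
print(all.equal(drop(Y %*% z), z))

# Eigenvalues of Y: 1 - x * (b'b) once, and 1 with multiplicity n - 1
ev <- eigen(Y, symmetric = TRUE)$values
print(sort(ev))
```

With these values b'b = 14, so the smallest eigenvalue is 1 - 0.14 = 0.86 and the other two are 1.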
As a result of the fact that we can expand such a matrix into the sum of each eigenvalue times the projection onto its eigenspace we can express Y as the following where bb'/b'b is the projection onto the eigenspace spanned by b and the part pre-multiplying it is the eigenvalue. The remaining terms do not involve x because they involve an eigenvalue of 1 independently of x and the nullspace of b is independent of x as well.
Y = (1-x(b'b))(bb')/(b'b) + terms not involving x
The inverse of Y is formed by taking the reciprocals of the eigenvalues so:
Yinv = 1/(1-x(b'b)) * (bb')/(b'b) + terms not involving x
and the derivative of that wrt x is:
(b'b) / (1 - x(b'b))^2 * (bb')/(b'b)
Cancelling the b'b and writing the derivative in terms of R code:
1/(1 - x*sum(b*b))^2*outer(b, b)
Double check
Using specific values for b and x we can verify it against the numeric derivative as follows:
library(numDeriv)
x <- 1
b <- 1:3
# Y inverse as a function of x
Yinv <- function(x) solve(diag(3) - x * outer(b, b))
all.equal(matrix(jacobian(Yinv, x = 1), 3),
1/(1 - x*sum(b*b))^2*outer(b, b))
## [1] TRUE
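As a further cross-check that needs no extra package, the identity d(Y^-1)/dx = -Y^-1 (dY/dx) Y^-1 can be evaluated directly, since dY/dx = -bb' here. This is a sketch under the same example values; the step size h for the central difference is an assumption.

```r
b <- 1:3
x <- 1
Y <- diag(3) - x * outer(b, b)
Yinv <- solve(Y)

# dY/dx for Y = I - x * b b'
dY <- -outer(b, b)

# Derivative of the inverse via the identity -Yinv %*% dY %*% Yinv
d_identity <- -Yinv %*% dY %*% Yinv

# Closed form derived above
d_closed <- 1 / (1 - x * sum(b * b))^2 * outer(b, b)

# Central finite difference as an independent numerical check
Yinv_fun <- function(x) solve(diag(3) - x * outer(b, b))
h <- 1e-6
d_numeric <- (Yinv_fun(x + h) - Yinv_fun(x - h)) / (2 * h)

print(all.equal(d_identity, d_closed))
print(all.equal(d_numeric, d_closed, tolerance = 1e-6))
```

All three routes should agree up to floating-point and finite-difference error.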

Related

How to code quadratic form both naively and efficiently

I'm trying to code a quadratic form Z'(S)^{-1} Z
The code is as below
z <- matrix(rnorm(200 * 100), 200, 100)
S <- cov(z)
quad.naive <- function(z, S) {
Sinv <- solve(S)
rowSums((z %*% Sinv) * z)
}
However, I'm not sure I understand thoroughly the last line of the function
rowSums((z %*% Sinv) * z)
Because naively, we should just type exactly the same as the mathematical formula which is
t(Z) %*% Sinv %*% Z
So, can anyone explain why the row-sums form is the same as the naive mathematical form, and especially why, after multiplying the two matrices (z and Sinv), it uses the element-wise multiplication operator * with z rather than %*%?
(z %*% Sinv) * z
The following is a bit too long for a comment.
"I'm trying to code a quadratic form Z'(S)^{-1} Z" I don't think the quadratic form is correct.
Assume Z is a m x n matrix. Then:
S = cov(Z) is a n x n matrix
S^-1 is a n x n matrix
t(Z) is a n x m matrix
So Z' S^-1 Z (in R: t(Z) %*% solve(S) %*% Z) would mean multiplying matrices with the following dimensions
(n x m) (n x n) (m x n)
which obviously won't work.
Perhaps you meant Z %*% solve(S) %*% t(Z) which returns a m x m matrix, the diagonal of which is the same as rowSums(Z %*% Sinv * Z).
More fundamentally: Shouldn't the quadratic form be a scalar? Or are you talking about a different quadratic form?
Ok, following our exchange in the comments and the link you gave to the relevant section in the book Advanced Statistical Computing I think I understand what the issue is.
I post this as a separate (and real) answer, to avoid confusing future readers who may want to read through the train of thought in the comments.
Let's return to the code given in your post (which is copied from section 1.3.3 Multivariate Normal Distribution)
set.seed(2017-07-13)
z <- matrix(rnorm(200 * 100), 200, 100)
S <- cov(z)
quad.naive <- function(z, S) {
Sinv <- solve(S)
rowSums((z %*% Sinv) * z)
}
Considering that the quadratic form is defined as the scalar quantity z' Sigma^-1 z (or in R language t(z) %*% solve(Sigma) %*% z) for a random p × 1 column vector, two questions may arise:
1. Why is z given as a matrix (instead of a p-dimensional column vector, as stated in the book), and
2. what is the reason for using rowSums in quad.naive?
First off, keep in mind that the quadratic form is a scalar quantity for a single random multivariate sample. What quad.naive is actually returning is the distribution of the quadratic form in multivariate samples (plural!). z here contains 200 samples from a p = 100-dimensional normal.
Then S is the 100 x 100 sample covariance matrix, and solve(S) returns the inverse matrix of S. The quantity z %*% Sinv * z (the additional brackets are not necessary due to R's operator precedence) is a 200 x 100 matrix whose i-th row holds the individual summands of the quadratic form t(z_i) %*% Sinv %*% z_i, where z_i is the i-th sample (the i-th row of z). Taking the rowSums adds up these summands, so the quadratic form returns one scalar for every sample. Also note that you get the same result with diag(z %*% Sinv %*% t(z)), but quad.naive avoids forming the full 200 x 200 product and the additional transposition.
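The equivalence can be seen on a small example. The sketch below uses assumed sizes (5 samples, 3 dimensions) and an arbitrary seed, and compares the rowSums form with a per-sample loop, with the diagonal of the full product, and with mahalanobis from the stats package using a zero centre.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
z <- matrix(rnorm(5 * 3), 5, 3)    # 5 samples of a 3-dimensional vector
S <- cov(z)
Sinv <- solve(S)

# Form used in quad.naive: one quadratic form per row of z
q_rowsums <- rowSums((z %*% Sinv) * z)

# Textbook form t(z_i) %*% Sinv %*% z_i, evaluated sample by sample
q_loop <- apply(z, 1, function(zi) drop(t(zi) %*% Sinv %*% zi))

# Diagonal of the full 5 x 5 product
q_diag <- diag(z %*% Sinv %*% t(z))

# Squared Mahalanobis distance from the origin
q_mahal <- mahalanobis(z, center = rep(0, ncol(z)), cov = S)

print(cbind(q_rowsums, q_loop, q_diag, q_mahal))
```

The four columns should be identical up to rounding; only the rowSums and mahalanobis forms avoid building the n x n matrix.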
A more fundamental question remains: Why look at the distribution of quadratic forms? It can be shown that the distribution of certain quadratic forms in standard normal variables follows a chi-square distribution (see e.g. Mathai and Provost, Quadratic Forms in Random Variables: Theory and Applications and Normal distribution - Quadratic forms)
Specifically, we can show that the quadratic form (x - μ)' Σ^-1 (x - μ) for a p × 1 normal random vector x with mean μ and covariance Σ is chi-square distributed with p degrees of freedom.
To illustrate this, let's draw 100 samples from a bivariate standard normal, and calculate the quadratic forms for every sample.
set.seed(2020)
nSamples <- 100
z <- matrix(rnorm(nSamples * 2), nSamples, 2)
S <- cov(z)
Sinv <- solve(S)
dquadform <- rowSums(z %*% Sinv * z)
We can visualise the distribution as a histogram and overlay the theoretical chi-square density for 2 degrees of freedom.
library(ggplot2)
bw = 0.2
ggplot(data.frame(x = dquadform), aes(x)) +
geom_histogram(binwidth = bw) +
stat_function(fun = function(x) dchisq(x, df = 2) * nSamples * bw)
Finally, results from a Kolmogorov-Smirnov test comparing the distribution of the quadratic forms with the cumulative chi-square distribution with 2 degrees of freedom lead us to fail to reject the null hypothesis (of the equality of both distributions).
ks.test(dquadform, pchisq, df = 2)
#
# One-sample Kolmogorov-Smirnov test
#
#data: dquadform
#D = 0.063395, p-value = 0.8164
#alternative hypothesis: two-sided

How do I minimize a linear least squares function in R?

I'm reading Deep Learning by Goodfellow et al. and am trying to implement gradient descent as shown in Section 4.5 Example: Linear Least Squares. This is page 92 in the hard copy of the book.
The algorithm can be viewed in detail at https://www.deeplearningbook.org/contents/numerical.html with R implementation of linear least squares on page 94.
I've tried implementing in R, and the algorithm as implemented converges on a vector, but this vector does not seem to minimize the least squares function as required. Adding epsilon to the vector in question frequently produces a "minimum" less than the minimum outputted by my program.
options(digits = 15)
dim_square = 2 ### set dimension of square matrix
# Generate random vector, random matrix, and
set.seed(1234)
A = matrix(nrow = dim_square, ncol = dim_square, byrow = T, rlnorm(dim_square ^ 2)/10)
b = rep(rnorm(1), dim_square)
# having fixed A & B, select X randomly
x = rnorm(dim_square) # vector length of dim_square--supposed to be arbitrary
f = function(x, A, b){
total_vector = A %*% x + b # this is the function that we want to minimize
total = 0.5 * sum(abs(total_vector) ^ 2) # L2 norm squared
return(total)
}
f(x,A,b)
# how close do we want to get?
epsilon = 0.1
delta = 0.01
value = (t(A) %*% A) %*% x - t(A) %*% b
L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
steps = vector()
while(L2_norm > delta){
x = x - epsilon * value
value = (t(A) %*% A) %*% x - t(A) %*% b
L2_norm = (sum(abs(value) ^ 2)) ^ 0.5
print(L2_norm)
}
minimum = f(x, A, b)
minimum
minimum_minus = f(x - 0.5*epsilon, A, b)
minimum_minus # less than the minimum found by gradient descent! Why?
Following page 94 of the PDF at https://www.deeplearningbook.org/contents/numerical.html, I am trying to find the vector x such that f(x) is minimized. However, as the values of minimum and minimum_minus in my code show, minimum is not the actual minimum, since it exceeds minimum_minus.
Any idea what the problem might be?
Original Problem
Finding the value of x that minimizes the L2 norm of Ax - b is, for an invertible A, equivalent to finding the value of x such that Ax - b = 0, or x = (A^-1)*b. This is because the L2 norm is the Euclidean norm, more commonly known as the distance formula. By definition, a distance cannot be negative, so its smallest possible value is zero.
This algorithm, as implemented, actually comes quite close to estimating x. However, because of recursive subtraction and rounding one quickly runs into the problem of underflow, resulting in massive oscillation, below:
[Figure: Value of L2 norm as a function of step size]
[Figure: Above algorithm vs. solve function in R]
Above we have the results of A %*% x followed by A %*% min_x, with x estimated by the implemented algorithm and min_x estimated by the solve function in R.
The problem of underflow, well known to those familiar with numerical analysis, is probably best tackled by the programmers of lower-level libraries best equipped to tackle it.
To summarize, the algorithm appears to work as implemented. Important to note, however, is that not every function will have a minimum (think of a straight line), and also be aware that this algorithm should only be able to find a local, as opposed to a global minimum.
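As a rough check of that summary, here is a minimal sketch of the same gradient descent on a small made-up system. The matrix A, the vector b, the step size and the tolerance are illustrative assumptions, not the values from the question. The sketch uses the objective 0.5 * ||Ax - b||^2, whose gradient t(A) %*% (A %*% x - b) is the update used in the question's loop. Note that the question's f evaluates A %*% x + b instead, so there f and the update do not describe the same objective.

```r
# Illustrative 2 x 2 system (assumed values, chosen to be well conditioned)
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)

# Objective 0.5 * ||Ax - b||^2 and its gradient t(A) %*% (A %*% x - b)
f <- function(x) 0.5 * sum((A %*% x - b)^2)
grad <- function(x) drop(crossprod(A, A %*% x - b))

x <- c(0, 0)     # arbitrary starting point
step <- 0.05     # must stay below 2 / (largest eigenvalue of t(A) %*% A)
tol <- 1e-8      # stop once the gradient norm falls below this

for (i in 1:10000) {
  g <- grad(x)
  if (sqrt(sum(g^2)) < tol) break
  x <- x - step * g
}

x_direct <- solve(A, b)   # direct solution of Ax = b for comparison
print(rbind(gradient_descent = x, solve = x_direct))
print(f(x))
```

With a consistent objective and gradient, the iterate should land on the solve() result and f(x) should be numerically zero.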

Julia error using convex package with diagind function

I'm trying to solve the problem
d = 0.5 * ||X - \Sigma||_{Frobenius Norm} + 0.01 * ||XX||_{1},
where X is a symmetric positive definite matrix whose diagonal elements should all be 1. XX is the same as X except that its diagonal is 0. \Sigma is known, and I want to find the X that minimizes d.
My code is as following:
using Convex
m = 5;
A = randn(m, m);
x = Semidefinite(5);
xx=x;
xx[diagind(xx)].=0;
obj=vecnorm(A-x,2)+sumabs(xx)*0.01;
pro= minimize(obj, [x >= 0]);
pro.constraints+=[x[diagind(x)].=1];
solve!(pro)
MethodError: no method matching diagind(::Convex.Variable)
I tried to set up the optimization problem by constraining the diagonal elements of the matrix, but it seems the diagind function does not work here. How can I solve this problem?
I think the following does what you want:
using Convex
m = 5
Σ = randn(m, m)
X = Semidefinite(m)
XX = X - diagm(diag(X))
obj = 0.5 * vecnorm(X - Σ, 2) + 0.01 * sum(abs(XX))
constraints = [X >= 0, diag(X) == 1]
pro = minimize(obj, constraints)
solve!(pro)
For the types of operations:
diag extracts the diagonal of a matrix, as a vector
diagm constructs a diagonal matrix out of a vector
So, to have XX be X with zero diagonal, we subtract the diagonal of X from it. And to constrain X having diagonal 1, we compare its diagonal with 1, using ==.
It is a good idea to keep immutable values as far as possible, instead of trying to modify things. I don't know whether Convex even supports that.

Pointwise multiplication and right matrix division

I'm currently trying to recreate this Matlab function in R:
function X = uniform_sphere_points(n,d)
% X = uniform_sphere_points(n,d)
%
%function generates n points uniformly within the unit sphere in d dimensions
z= randn(n,d);
r1 = sqrt(sum(z.^2,2));
X=z./repmat(r1,1,d);
r=rand(n,1).^(1/d);
X = X.*repmat(r,1,d);
Regarding the right matrix division, I installed the pracma package. My R code right now is:
uniform_sphere_points <- function(n,d){
# function generates n points uniformly within the unit sphere in d dimensions
z = rnorm(n, d)
r1 = sqrt(sum(z^2,2))
X = mrdivide(z, repmat(r1,1,d))
r = rnorm(1)^(1/d)
X = X * matrix(r,1,d)
return(X)
}
But it is not really working since I always end with a non-conformable arrays error in R.
This operation for sampling n random points from within the d-dimensional unit sphere can be stated in words as:
1. Construct an n x d matrix with entries drawn from the standard normal distribution.
2. Normalize each row so it has (2-norm) magnitude 1.
3. For each row, compute a random value by taking a draw from the uniform distribution (between 0 and 1) and raising that value to the 1/d power. Multiply all elements in the row by that value.
The following R code does these operations:
unif.samp <- function(n, d) {
z <- matrix(rnorm(n*d), nrow=n, ncol=d)
z * (runif(n)^(1/d) / sqrt(rowSums(z^2)))
}
Note that in the second line of code I have taken advantage of the fact that multiplying a n x d matrix in R by a vector of length n will multiply each row by the corresponding value in that vector. This saves us the work of using repmat to construct matrices of exactly the same size as our original matrix for these sorts of row-specific operations.
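A short usage sketch follows, with the function repeated so the block runs on its own. The seed, sample size and dimension are arbitrary assumptions. It checks that every sampled point has norm at most 1 and shows the row-wise recycling on a tiny matrix.

```r
unif.samp <- function(n, d) {
  z <- matrix(rnorm(n * d), nrow = n, ncol = d)
  z * (runif(n)^(1/d) / sqrt(rowSums(z^2)))
}

set.seed(42)                     # arbitrary seed
pts <- unif.samp(1000, 3)        # 1000 points in 3 dimensions
radii <- sqrt(rowSums(pts^2))    # distance of each point from the origin
print(dim(pts))
print(range(radii))              # all radii should lie in [0, 1]

# Recycling: a length-n vector scales the rows of an n x d matrix
m <- matrix(1, nrow = 2, ncol = 3)
print(m * c(10, 20))             # first row becomes 10s, second row 20s
```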

plotting matrix equation in R

I'm new to R and I need to plot the quadratic matrix equation:
x^T A x + b^T x + c = 0
in R^2, with A being a 2x2, b a 2x1, and c a constant. The equation is for a boundary that defines classes of points. I need to plot that boundary for x0 = -6...6, x1 = -4...6. My first thought was to generate a bunch of points and see where they are zero, but it depends on the increment between the numbers (most likely I'm not going to guess which points are zero).
Is there a better way than just generating a bunch of points and seeing where it is zero or multiplying it out? Any help would be much appreciated,
Thank you.
Assuming you have a symmetric matrix A,
eg
# A = | a b/2 |
# | b/2 c |
and your equation represents a conic section, you can use the conics package
What you need is a vector of coefficients c(a,b,c,d,e,f) representing
a*x^2 + b*x*y + c*y^2 + d*x + e*y + f
In your case, say you have
library(conics)
A <- matrix(c(2,1,1,2), nrow = 2)
B <- c(-20,-28)
C <- 10
# create the vector c(a,b,c,d,e,f)
v <- append(c(diag(A),B,C), A[lower.tri(A)]*2, 1)
conicPlot(v)
You could easily wrap this expansion into a simple function
# note this does no checking for symmetry or validity of arguments
expand.conic <- function(A, B, C){
append(c(diag(A),B,C), A[lower.tri(A)]*2, 1)
}
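If installing a package is not an option, a different technique is to evaluate the quadratic on a grid and let base R trace the zero level set with contour and contourLines. The sketch below is not the conics approach: it uses the example A, B and C from this answer, and the grid ranges and resolution are assumptions chosen so the whole curve is visible.

```r
A <- matrix(c(2, 1, 1, 2), nrow = 2)
B <- c(-20, -28)
C <- 10

# x' A x + B' x + C written out so it is vectorised over x0 and x1
quad <- function(x0, x1) {
  A[1, 1] * x0^2 + (A[1, 2] + A[2, 1]) * x0 * x1 + A[2, 2] * x1^2 +
    B[1] * x0 + B[2] * x1 + C
}

# Assumed grid, wide enough to contain the whole curve for these values
x0 <- seq(-8, 12, length.out = 201)
x1 <- seq(-4, 16, length.out = 201)
z <- outer(x0, x1, quad)

# Coordinates of the curve where the quadratic equals zero
boundary <- contourLines(x0, x1, z, levels = 0)

# Draw the boundary when running interactively
if (interactive()) {
  contour(x0, x1, z, levels = 0, drawlabels = FALSE,
          xlab = "x0", ylab = "x1")
}
```

Because the contouring interpolates between grid points, the boundary is found even though no grid point is exactly zero; a finer grid gives a smoother curve.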
