How to code quadratic form both naively and efficiently - r

I'm trying to code a quadratic form Z'(S)^{-1} Z.
The code is as below:
z <- matrix(rnorm(200 * 100), 200, 100)
S <- cov(z)
quad.naive <- function(z, S) {
Sinv <- solve(S)
rowSums((z %*% Sinv) * z)
}
However, I'm not sure I understand thoroughly the last line of the function
rowSums((z %*% Sinv) * z)
Because naively, we should just type exactly the same as the mathematical formula which is
t(Z) %*% Sinv %*% Z
So, can anyone explain why the row-sums form is the same as the naive mathematical form, and especially why, after multiplying the two matrices (z and Sinv), it uses the element-wise multiplication operator * with z rather than %*%?
(z %*% Sinv) * z

The following is a bit too long for a comment.
"I'm trying to code a quadratic form Z'(S)^{-1} Z" I don't think the quadratic form is correct.
Assume Z is an m x n matrix. Then:
S = cov(Z) is an n x n matrix
S^-1 is an n x n matrix
t(Z) is an n x m matrix
So Z' S^-1 Z (in R: t(Z) %*% solve(S) %*% Z) would mean multiplying matrices with the following dimensions
(n x m) (n x n) (m x n)
which obviously won't work.
Perhaps you meant Z %*% solve(S) %*% t(Z), which returns an m x m matrix, the diagonal of which is the same as rowSums(Z %*% Sinv * Z).
More fundamentally: Shouldn't the quadratic form be a scalar? Or are you talking about a different quadratic form?
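The claim about the diagonal can be checked numerically on a small example. This is only a minimal sketch; the dimensions (m = 6, n = 3), the seed and the names full and by_rows are arbitrary choices, not part of the question.

```r
set.seed(1)
m <- 6; n <- 3                      # arbitrary small dimensions (assumption)
Z <- matrix(rnorm(m * n), m, n)     # m samples in rows, n variables in columns
Sinv <- solve(cov(Z))               # inverse of the n x n covariance matrix

full <- Z %*% Sinv %*% t(Z)         # m x m matrix
by_rows <- rowSums(Z %*% Sinv * Z)  # length-m vector

dim(full)
all.equal(diag(full), by_rows)
```

Only the diagonal of the m x m product is needed, which is why the row-sums form can skip most of the work.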

Ok, following our exchange in the comments and the link you gave to the relevant section in the book Advanced Statistical Computing, I think I understand what the issue is.
I post this as a separate (and real) answer, to avoid confusing future readers who may want to read through the train of thoughts in the comments.
Let's return to the code given in your post (which is copied from section 1.3.3 Multivariate Normal Distribution)
set.seed(2017-07-13)
z <- matrix(rnorm(200 * 100), 200, 100)
S <- cov(z)
quad.naive <- function(z, S) {
Sinv <- solve(S)
rowSums((z %*% Sinv) * z)
}
Considering that the quadratic form is defined as the scalar quantity z' Sigma^-1 z (or in R language t(z) %*% solve(Sigma) %*% z) for a random p × 1 column vector, two questions may arise:
Why is z given as a matrix (instead of a p-dimensional column vector, as stated in the book), and
what is the reason for using rowSums in quad.naive?
First off, keep in mind that the quadratic form is a scalar quantity for a single random multivariate sample. What quad.naive is actually returning is the distribution of the quadratic form in multivariate samples (plural!). z here contains 200 samples from a p = 100-dimensional normal.
Then S is the 100 x 100 covariance matrix, and solve(S) returns the inverse matrix of S. The quantity z %*% Sinv * z (the additional brackets are not necessary due to R's operator precedence) is a 200 x 100 matrix whose i-th row holds the individual terms of the quadratic form for the i-th sample z_i, i.e. the element-wise product of the row vector t(z_i) %*% Sinv with z_i. Taking the rowSums then adds up these terms, so that we get the scalar t(z_i) %*% Sinv %*% z_i for every sample. Also note that you get the same result with diag(z %*% solve(S) %*% t(z)), but quad.naive avoids the second matrix multiplication and the additional transposition, and it never forms the 200 x 200 matrix.
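To make the row-by-row reading concrete, the sketch below compares quad.naive with an explicit loop that evaluates the scalar quadratic form one sample at a time, with stats::mahalanobis (centering switched off), and with a Cholesky-based variant. The names quad.loop and quad.chol and the seed are illustrative assumptions, and the Cholesky route is just one common way of avoiding the explicit inverse, not necessarily the one the book has in mind.

```r
set.seed(1)
z <- matrix(rnorm(200 * 100), 200, 100)
S <- cov(z)

quad.naive <- function(z, S) {
  Sinv <- solve(S)
  rowSums((z %*% Sinv) * z)
}

# One scalar t(z_i) %*% Sinv %*% z_i per sample (row) of z
quad.loop <- function(z, S) {
  Sinv <- solve(S)
  vapply(seq_len(nrow(z)), function(i) {
    zi <- z[i, ]
    drop(t(zi) %*% Sinv %*% zi)
  }, numeric(1))
}

# S = R'R with R upper triangular, so z' S^-1 z = ||R'^-1 z||^2
quad.chol <- function(z, S) {
  R <- chol(S)
  v <- backsolve(R, t(z), transpose = TRUE)  # solves t(R) %*% v = t(z)
  colSums(v * v)
}

q1 <- quad.naive(z, S)
all.equal(q1, quad.loop(z, S))
all.equal(q1, quad.chol(z, S))
all.equal(q1, mahalanobis(z, center = FALSE, cov = S))
```

All four routes return one number per sample, i.e. a vector of length 200.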
A more fundamental question remains: Why look at the distribution of quadratic forms? It can be shown that the distribution of certain quadratic forms in standard normal variables follows a chi-square distribution (see e.g. Mathai and Provost, Quadratic Forms in Random Variables: Theory and Applications and Normal distribution - Quadratic forms)
Specifically, we can show that for a p × 1 column vector x drawn from a multivariate normal distribution with mean μ and covariance matrix Σ, the quadratic form (x - μ)' Σ^-1 (x - μ) is chi-square distributed with p degrees of freedom.
To illustrate this, let's draw 100 samples from a bivariate standard normal, and calculate the quadratic forms for every sample.
set.seed(2020)
nSamples <- 100
z <- matrix(rnorm(nSamples * 2), nSamples, 2)
S <- cov(z)
Sinv <- solve(S)
dquadform <- rowSums(z %*% Sinv * z)
We can visualise the distribution as a histogram and overlay the theoretical chi-square density for 2 degrees of freedom.
library(ggplot2)
bw = 0.2
ggplot(data.frame(x = dquadform), aes(x)) +
geom_histogram(binwidth = bw) +
stat_function(fun = function(x) dchisq(x, df = 2) * nSamples * bw)
Finally, results from a Kolmogorov-Smirnov test comparing the distribution of the quadratic forms with the cumulative chi-square distribution with 2 degrees of freedom lead us to fail to reject the null hypothesis (of the equality of both distributions).
ks.test(dquadform, pchisq, df = 2)
#
# One-sample Kolmogorov-Smirnov test
#
#data: dquadform
#D = 0.063395, p-value = 0.8164
#alternative hypothesis: two-sided

Related

R Derivatives of an Inverse

I have an expression that contains several parts. However, for simplicity, consider only the following part as MWE:
Let's assume we have the inverse of a matrix Y that I want to differentiate w.r.t. x.
Y is given as I - (x * b * t(b)), where I is the identity matrix, x is a scalar, and b is a vector.
According to The Matrix Cookbook Equ. 59, the partial derivative of an inverse is:
∂Y^-1/∂x = -Y^-1 (∂Y/∂x) Y^-1
Normally I would use the function D from the package stats to calculate the derivatives. But that is not possible in this case, because e.g. solve (needed to specify the inverse of Y) and t() are not in the table of derivatives.
What is the best workaround to circumvent this problem? Are there any other recommended packages that can handle such input?
Example that doesn't work:
f0 <- expression(solve(I - (x * b %*% t(b))))
D(f0, "x")
Example that works:
f0 <- expression(x^3)
D(f0, "x")
3 * x^2
I assume that the question is how to get an explicit expression for the derivative of the inverse of Y with respect to x. In the first section we compute it and in the second section we double check it by computing it numerically and show that the two approaches give the same result.
The span of b and the null space of b' (the vectors orthogonal to b) are both eigenspaces of Y, which we can readily verify by noting that Yb = (1-(b'b)x)b and that if z belongs to this null space then Yz = z. This also shows that the corresponding eigenvalues are 1 - x(b'b) with multiplicity 1 and 1 with multiplicity n-1 (since the null space has that dimension).
Because such a matrix can be expanded into the sum of each eigenvalue times the projection onto its eigenspace, we can express Y as follows, where bb'/b'b is the projection onto the eigenspace spanned by b and the factor multiplying it is the eigenvalue. The remaining terms do not involve x, because their eigenvalue is 1 independently of x and the null space is independent of x as well.
Y = (1-x(b'b))(bb')/(b'b) + terms not involving x
The inverse of Y is formed by taking the reciprocals of the eigenvalues so:
Yinv = 1/(1-x(b'b)) * (bb')/(b'b) + terms not involving x
and the derivative of that wrt x is:
(b'b) / (1 - x(b'b))^2 * (bb')/(b'b)
Cancelling the b'b and writing the derivative in terms of R code:
1/(1 - x*sum(b*b))^2*outer(b, b)
Double check
Using specific values for b and x we can verify it against the numeric derivative as follows:
library(numDeriv)
x <- 1
b <- 1:3
# Y inverse as a function of x
Yinv <- function(x) solve(diag(3) - x * outer(b, b))
all.equal(matrix(jacobian(Yinv, x = 1), 3),
1/(1 - x*sum(b*b))^2*outer(b, b))
## [1] TRUE
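The closed form can also be cross-checked against the general identity for the derivative of an inverse, dY^-1/dx = -Y^-1 (dY/dx) Y^-1, using base R only. This is a sketch under the same example values as above; the names dY, cookbook, closed and central, and the step size h, are illustrative assumptions.

```r
x <- 1
b <- 1:3
Yinv <- function(x) solve(diag(3) - x * outer(b, b))

dY <- -outer(b, b)                               # dY/dx for Y = I - x * b b'
cookbook <- -Yinv(x) %*% dY %*% Yinv(x)          # -Y^-1 (dY/dx) Y^-1
closed <- outer(b, b) / (1 - x * sum(b * b))^2   # closed form derived above

# Central finite difference as an independent numerical check
h <- 1e-6
central <- (Yinv(x + h) - Yinv(x - h)) / (2 * h)

all.equal(cookbook, closed)
all.equal(central, closed, tolerance = 1e-6)
```

Note that x must not equal 1/sum(b*b), since Y is singular there.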

Faster alternative to R car::Anova for sum of square crossproduct matrix calculation for subsets of predictors

I need to compute the sum of squares crossproduct matrix (in fact, just the trace of this matrix) in a multivariate linear model, with Y (n x q) and X (n x p). Standard R code for doing that is:
require(MASS)
require(car)
# Example data
q <- 10
n <- 1000
p <- 10
Y <- mvrnorm(n, mu = rep(0, q), Sigma = diag(q))
X <- as.data.frame(mvrnorm(n, mu = rnorm(p), Sigma = diag(p)))
# Fit lm
fit <- lm( Y ~ ., data = X )
# Type I sums of squares
summary(manova(fit))$SS
# Type III sums of squares
type = 3 # could be also 2 (II)
car::Anova(fit, type = type)$SSP
This has to be done thousands of times and, unfortunately, it gets slow when the number of predictors is relatively large. As I am often interested in only a subset of s predictors, I tried to re-implement this calculation. My implementation, which directly translates the linear algebra for s = 1 (below), is faster for small sample sizes (n):
# Hat matrix (X here stands for the actual design matrix)
H <- tcrossprod(tcrossprod(X, solve(crossprod(X))), X)
# Remove predictor of interest (e.g. 2)
X.r <- X[, -2]
H1 <- tcrossprod(tcrossprod(X.r, solve(crossprod(X.r))), X.r)
# Compute e.g. type III sum of squares
SS <- crossprod(Y, H - H1) %*% Y
car is still faster for large n.
I already tried an Rcpp implementation without much success, as these matrix products in R already use very efficient code.
Any hint on how to do this faster?
UPDATE
After reading the answers, I tried the solution proposed in this post which relies on QR/SVD/Cholesky factorization for hat matrix calculation. However it seems that car::Anova is still faster to compute all p = 30 matrices than me computing just one (s = 1)!! for e.g. n = 5000, q = 10:
Unit: milliseconds
 expr       min        lq      mean    median        uq       max neval
   ME 1137.5692 1202.9888 1257.8979 1251.6834 1318.9282 1398.9343    10
   QR 1005.9082 1031.9911 1084.5594 1037.5659 1095.7449 1364.9508    10
  SVD 1026.8815 1065.4629 1152.6631 1087.9585 1241.4977 1446.8318    10
 Chol  969.9089 1056.3093 1115.9608 1102.1169 1210.7782 1267.1274    10
  CAR  205.1665  211.8523  218.6195  214.6761  222.0973  242.4617    10
UPDATE 2
The best solution for now was to go over the car::Anova code (i.e. functions car:::Anova.III.mlm and subsequently car:::linearHypothesis.mlm) and re-implement them to account for a subset of predictors, instead of all of them.
The relevant code by car is as follows (I skipped checks, and simplified a bit):
B <- coef(fit) # Model coefficients
M <- model.matrix(fit) # Model matrix M
V <- solve(crossprod(M)) # (M'M)^-1
p <- ncol(M) # Number of predictors in M
I.p <- diag(p) # Identity (p x p)
terms <- labels(terms(fit)) # terms (add intercept)
terms <- c("(Intercept)", terms)
n.terms <- length(terms)
assign <- fit$assign # assignation terms <-> p variables
SSP <- as.list(rep(0, n.terms)) # Initialize empty list for sums of squares cross-product matrices
names(SSP) <- terms
for (term in 1:n.terms){
subs <- which(assign == term - 1)
L <- I.p[subs, , drop = FALSE]
SSP[[term]] <- t(L %*% B) %*% solve(L %*% V %*% t(L)) %*% (L %*% B)
}
Then it is just a matter of selecting the subset of terms.
This line and the similar one below it for H1 could probably be improved:
H <- tcrossprod(tcrossprod(X, solve(crossprod(X))), X)
The general idea is that you should rarely use solve(Y) %*% Z, because it is the same as solve(Y, Z) but slower. I haven't fully expanded your tcrossprod calls to see what the best equivalent formulation of the expressions for H and H1 would be.
You could also look at this question https://stats.stackexchange.com/questions/139969/speeding-up-hat-matrices-like-xxx-1x-projection-matrices-and-other-as for a description of doing it via QR decomposition.
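As a rough sketch of the QR idea: since H = QQ' for the thin QR factor Q of the design matrix, Y'HY = (Q'Y)'(Q'Y), so the n x n hat matrices never need to be formed and the trace reduces to a difference of sums of squares. The simulated data, the helper names hat and QtY, and the choice of dropping column 2 are assumptions for illustration; this has not been benchmarked against car::Anova.

```r
set.seed(1)
n <- 200; p <- 5; q <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # design matrix with intercept
Y <- matrix(rnorm(n * q), n, q)

# Explicit n x n hat matrices, using solve(A, B) instead of solve(A) %*% B
hat <- function(M) M %*% solve(crossprod(M), t(M))
SS_explicit <- crossprod(Y, hat(X) - hat(X[, -2])) %*% Y

# QR route: H = Q Q', so Y' H Y = (Q'Y)' (Q'Y)
QtY <- function(M) crossprod(qr.Q(qr(M)), Y)
SS_qr <- crossprod(QtY(X)) - crossprod(QtY(X[, -2]))

# If only the trace is needed, it is a difference of sums of squares
tr_qr <- sum(QtY(X)^2) - sum(QtY(X[, -2])^2)

all.equal(SS_explicit, SS_qr)
all.equal(sum(diag(SS_explicit)), tr_qr)
```

The QR route works with p x q matrices instead of n x n ones, which matters most when n is large.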

How can I find LGCP random field Lambda values in overall area?

There is an rLGCP model example in the spatstat package (the example requires the RandomFields package).
if(require(RandomFields)) {
# homogeneous LGCP with exponential covariance function
X <- rLGCP("exp", 3, var=0.2, scale=.1)
# inhomogeneous LGCP with Gaussian covariance function
m <- as.im(function(x, y){5 - 1.5 * (x - 0.5)^2 + 2 * (y - 0.5)^2}, W=owin())
X <- rLGCP("gauss", m, var=0.15, scale =0.5)
plot(attr(X, "Lambda"))
points(X)
}
I think that the Lambda attribute of X does not show the values over the whole two-dimensional area.
How can I find the Lambda values over the whole area?
I'm not entirely sure if this is what you are looking for, but the matrix of Lambda values at each point of the plotted grid is stored in the Lambda attribute of the point pattern created by spatstat::rLGCP.
You can access them like this:
m <- as.im(function(x, y){5 - 1.5 * (x - 0.5)^2 + 2 * (y - 0.5)^2}, W=owin())
X <- rLGCP("gauss", m, var=0.15, scale = 0.5)
lambda_matrix <- attr(X, "Lambda")$v
Now lambda_matrix is a 128 x 128 matrix containing the value of Lambda at each point on the grid.

Solve indeterminate equation system in R

I have an equation system that I want to solve using numerical methods. I want to get a close solution given a starting seed. Let me explain.
I have a vector of constants, X, with values:
X <- (c(1,-2,3,4))
and a vector W of weights:
W <- (c(0.25,0.25,0.25,0.25))
I want the components of W to sum to 1 (sum(W)=1), and the sum of the element-by-element product of X and W to be a given number N (sum(W*X)=N).
Is there an easy way to do this in R? I have it in Excel, using Solver, but I need to automate it.
Here is your constant and your target value:
x <- c(1, -2, 3, 4)
n <- 10
You need a function to minimize. The first line contains each of your conditions, and the second line provides a measure of how to combine the errors into a single score. You may want to change the second line. For example, you could make one error term be more heavily weighted than the other using sum(c(1, 5) * errs ^ 2).
fn <- function(w)
{
errs <- c(sum(w) - 1, sum(x * w) - n)
sum(errs ^ 2)
}
The simplest thing is to start with all the weights the same value.
init_w <- rep.int(1 / length(x), length(x))
Use optim to optimize.
optim(init_w, fn)
## $par
## [1] 0.1204827 -1.2438883 1.1023338 1.0212406
##
## $value
## [1] 7.807847e-08
##
## $counts
## function gradient
## 111 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
The par element contains your weights.
There is no unique solution for this problem. If you try other initial values for w you will most likely get different results from optim.
The problem can be formulated as solving an underdetermined system of linear equations.
A <- matrix(c(rep(1,4),x), nrow=2,byrow=TRUE)
b <- matrix(c(1,n), nrow=2)
We seek a solution that satisfies A %*% w = b but which one? Minimum norm solution? Or maybe some other one? There are infinitely many solutions. Solutions can be given using the pseudo-inverse of the matrix A. Use package MASS for this.
library(MASS)
Ag <- ginv(A)
The minimum norm solution is
wmnorm <- Ag %*% b
You can check it with A %*% wmnorm - b and fn(wmnorm).
See the section Matrix solutions of the Wikipedia page System of linear equations.
The solutions are given by
Az <- diag(nrow=nrow(Ag)) - Ag %*% A
w <- wmnorm + Az %*% z
where z is an arbitrary vector of ncol(Az) elements.
And now generate some solutions and check
xb <- wmnorm
z <- runif(4)
wsol.2 <- xb + Az %*% z
wsol.2
A %*% wsol.2 - b
fn(wsol.2)
z <- runif(4)
wsol.3 <- xb + Az %*% z
wsol.3
A %*% wsol.3 - b
fn(wsol.3)
You'll see that both are valid solutions when passed to fn, and that they are quite different from the solution found by optim. You could also check the non-uniqueness with optim itself by choosing a different starting point init_w, for example init_w1 <- runif(4)/4.
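The same construction can be sketched in base R without the pseudo-inverse, under the assumption that A has full row rank (true here, since A %*% t(A) is invertible): the minimum norm solution is then t(A) %*% solve(A %*% t(A), b). The names w_min, P_null and w_other and the seed are illustrative, not part of the answer above.

```r
x <- c(1, -2, 3, 4)
n <- 10
A <- matrix(c(rep(1, 4), x), nrow = 2, byrow = TRUE)
b <- c(1, n)

# Minimum norm solution, assuming A has full row rank
w_min <- drop(t(A) %*% solve(A %*% t(A), b))

# Projector onto the null space of A: adding P_null %*% z keeps A %*% w = b
P_null <- diag(length(x)) - t(A) %*% solve(A %*% t(A), A)
set.seed(42)
w_other <- w_min + drop(P_null %*% runif(length(x)))

A %*% w_min - b            # residuals, numerically zero
A %*% w_other - b          # residuals, numerically zero
c(sum(w_min^2), sum(w_other^2))
```

Both vectors satisfy the two constraints; w_min is simply the one with the smallest sum of squares.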

Applying conjugate gradient method to find optimal weight vector for portfolios

I have sample data of 10 stocks.
a<-replicate(10,rnorm(100))
cov.Rt<-cov(a)
I am trying to find a weight vector that minimizes the variance of returns.
To do this I have a starting weight vector W1 and I am trying to find the weight vector that induces the maximum marginal rate of decrease in variance.
Using the conjugate gradient method, I am trying to find a vector W2 that is conjugate to W1 with respect to cov.Rt, i.e.
W1 %*% cov.Rt %*% W2 = 0.
I tried the following:
First I find the first conjugate portfolio wcg1:
N<-ncol(a) # number of assets (10 here)
mat1<-rep(-1,N-1)
mat2<-diag(1,N-1,N-1)
mat<-rbind(mat1,mat2)
rownames(mat)<-NULL
Z<- mat %*% solve(t(mat) %*% mat) %*% t(mat)
e<-rep(1,10)
cov.Rt<-cov(a)
wcg1<- as.vector(-Z%*% t(Z) %*% cov.Rt %*% e/N)
then using optim()
n.iters=1
fn<-function(x,cov.Rt){as.numeric(t(x)%*%cov.Rt%*%x)}
wc<-matrix(NA,ncol=10,nrow=n.iters+1)
wc[1,]<-wcg1
for(i in seq_len(n.iters)) wc[i+1,]<-optim(wc[i,],fn, cov.Rt=cov.Rt)$par
DeMiguel et al. (http://faculty.london.edu/avmiguel/DeMiguelGarlappiNogalesUppalMS.pdf) write on page ec4 (http://pubsonline.informs.org/doi/suppl/10.1287/mnsc.1080.0986/suppl_file/mnsc.1080.0986-sm-ec.pdf) that if we iterate this process N-1 times, we obtain a portfolio weight vector that is identical to the minimum variance portfolio, where N is the length of the weight vector.
However, I do not get the same results as the minimum variance portfolio.
What is wrong with my approach?
Does the method I am using fulfil the condition that the weight vector output by optim() is conjugate with respect to the starting vector wcg1 (i.e. W1 %*% cov.Rt %*% W2 = 0)?
First, from a purely financial perspective, the risk you want to minimize on the portfolio is the total variance, which can be written as the function:
fn<-function(x, cov.Rt) as.numeric(t(x) %*% cov.Rt %*% x)
Back to the code: the first argument of fn must be the parameter that optim varies, i.e. x.
You then pass cov.Rt as an additional named argument to optim (w1 being your starting weight vector):
optim(w1, fn, cov.Rt=cov.Rt)
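For reference when comparing results: under the single constraint that the weights sum to one, the minimum variance portfolio has the well-known closed form w = Σ^-1 e / (e' Σ^-1 e), with e a vector of ones. The sketch below computes it for simulated data like that in the question; the seed and the names u, w_mv and w_eq are illustrative assumptions. Note that fn by itself carries no such constraint, so this is only a benchmark under that assumption, not a reproduction of the DeMiguel et al. procedure.

```r
set.seed(123)
a <- replicate(10, rnorm(100))
cov.Rt <- cov(a)
fn <- function(x, cov.Rt) as.numeric(t(x) %*% cov.Rt %*% x)

e <- rep(1, ncol(a))
u <- solve(cov.Rt, e)        # Sigma^-1 e without forming the inverse
w_mv <- u / sum(u)           # closed-form minimum variance weights
w_eq <- e / length(e)        # equal weights, for comparison

sum(w_mv)                    # weights sum to one
c(fn(w_mv, cov.Rt), fn(w_eq, cov.Rt))
range(cov.Rt %*% w_mv)       # first-order condition: all entries equal
```

Any candidate weight vector that sums to one can be compared against w_mv by evaluating fn on both.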
