Regarding the Cholesky decomposition of a crossproduct of a matrix - R

With respect to the following R implementation
Y %*% solve(chol(crossprod(Y)))
I can see that it performs a Cholesky decomposition of Y'Y and then multiplies Y by the inverse of the resulting triangular factor.
What is this used for in data processing? I do not quite understand the underlying mechanism.
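
For concreteness, a minimal sketch of the mechanism (illustrative data only): chol(crossprod(Y)) returns the upper-triangular factor R with t(R) %*% R equal to Y'Y, so Y %*% solve(R) has orthonormal columns; it equals the Q factor of a QR decomposition of Y up to column signs. In other words, the expression orthonormalizes (whitens) the columns of Y:

set.seed(1)
Y <- matrix(rnorm(20), nrow = 5, ncol = 4)

R <- chol(crossprod(Y))   # upper-triangular R with t(R) %*% R == t(Y) %*% Y
Q <- Y %*% solve(R)       # columns of Q are orthonormal

round(crossprod(Q), 10)               # (numerically) the identity matrix
all.equal(abs(Q), abs(qr.Q(qr(Y))))   # same as QR's Q, up to column signs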

Related

Does the Eigen library perform Gaussian reduction in order to solve a homogeneous system?

I used to perform Cholesky/LU decomposition to solve linear problems with Eigen, but now I have to solve a linear homogeneous system (the right-hand side of my linear system is the null vector). I have to do a Gaussian reduction on my square matrix in order to find the solution space, but I cannot find any Gaussian reduction algorithm in Eigen's documentation. So is there any Gaussian reduction algorithm in Eigen?
If you have the eigendecomposition of the coefficient matrix A, the solution of the homogeneous system is any vector that can be written as a linear combination of the eigenvectors associated with the eigenvalue 0.
The function np.linalg.eig gives you the eigendecomposition of a matrix. Numerical error means no computed eigenvalue will be exactly zero, so you simply choose the eigenvector associated with the eigenvalue of smallest magnitude; this solves the problem in a least-squares sense.
So your problem boils down to
import numpy as np

w, v = np.linalg.eig(A)      # eigenvalues in w, eigenvectors in the columns of v
x = v[:, np.argmin(abs(w))]  # eigenvector whose eigenvalue is closest to zero

Then A @ x is approximately the null vector.

How to generate a zero-mean Gaussian random vector whose correlation matrix has exponentially distributed eigenvalues

1. This is a question from the paper "Fast Generalized Eigenvector Tracking Based on the Power Method".
2. The author wrote: "We generate two zero-mean Gaussian random vectors, which have correlation matrices A and B whose eigenvalues are exponentially distributed."
3. But how to generate a zero-mean Gaussian random vector whose correlation matrix has exponentially distributed eigenvalues has confused me for almost a week.
4. It seems that we can only use randn in MATLAB to generate random vectors, so the problem is how to make sure that the correlation matrices have exponentially distributed eigenvalues at the same time.
Let S be a positive definite matrix. Then S has a Cholesky decomposition L.L' = S, where L is a lower-triangular matrix, ' denotes the matrix transpose, and . denotes matrix multiplication. Let x be drawn from a Gaussian distribution with mean zero and covariance equal to the identity matrix. Then y = L.x has a Gaussian distribution with mean zero and covariance S.
So if you can find suitable covariance matrices A and B, you can use their Cholesky decompositions to generate samples. Now, about constructing a matrix whose eigenvalues follow a given distribution: my advice is to start with a list of samples from an exponential distribution; these will be your eigenvalues. Let E be the matrix with the exponential samples on the diagonal and zeros elsewhere. Let U be any unitary matrix (i.e., its columns are orthonormal). Then U.E.U' is a positive definite matrix with the specified eigenvalues.
U can be any unitary matrix. In particular U can be the identity matrix. That might make everything else simpler; you'll have to verify whether U = identity is workable for the problem you're working on.
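
A minimal R sketch of this recipe (the paper presumably used MATLAB's randn, but the construction is the same; the size 5 and all names here are arbitrary):

# eigenvalues sampled from an exponential distribution
lambda <- rexp(5)

# a random orthogonal (unitary) U from the QR decomposition of a Gaussian matrix
U <- qr.Q(qr(matrix(rnorm(25), 5, 5)))

# covariance matrix S = U.E.U' with the prescribed eigenvalues
S <- U %*% diag(lambda) %*% t(U)

# R's chol() returns the upper-triangular factor, so the lower-triangular L is its transpose
L <- t(chol(S))

# zero-mean Gaussian vector with covariance S
y <- L %*% rnorm(5)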

Generalized linear model fit with constraints and poisson error

I am currently trying to fit a linear model to count data, where the errors follow a Poisson distribution. Namely, I would like to minimize the Poisson negative log-likelihood
sum_i ( x_i'β - y_i log(x_i'β) ),
where I have i samples, β is a vector of m coefficients, and each x_i consists of m independent (explanatory) variables. β should sum up to 1 and each coefficient should be larger than 0.
I am using R and I tried the package glmc without much success. The only example in the documentation just confuses me, as I don't see how the constraint matrix Amat enforces a constraint on the coefficients. Is there any other example I could look at, or another package?
I also tried solving this analytically, with moderate success.
Any help is appreciated!
Kind regards, Lena
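
One way to attack this (a sketch, not the glmc interface): enforce the constraints by a softmax reparameterization, so that β > 0 and sums to 1 by construction, and minimize the Poisson negative log-likelihood with optim(). This assumes an identity link (mean = x_i'β), matching the objective above; X and y below are simulated placeholders for the real data:

# Poisson negative log-likelihood with beta parameterized as a softmax
nll <- function(theta, X, y) {
  beta <- exp(theta) / sum(exp(theta))  # beta > 0, sum(beta) == 1
  mu <- as.vector(X %*% beta)
  sum(mu - y * log(mu))
}

# simulated stand-in data
set.seed(1)
X <- matrix(runif(300, 1, 10), ncol = 3)
beta_true <- c(0.2, 0.5, 0.3)
y <- rpois(100, as.vector(X %*% beta_true))

fit <- optim(rep(0, ncol(X)), nll, X = X, y = y, method = "BFGS")
beta_hat <- exp(fit$par) / sum(exp(fit$par))  # satisfies both constraints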

Extracting the (i) estimated variance-covariance matrix of random effects and/or (ii) mixed model equation solutions from lme4 as matrices?

As the title says, I am trying to extract matrices from an lme4 (or other packages'?) object. To make clear precisely what I want, I think it is easiest to refer to the SAS documentation: https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect022.htm
Variance-covariance matrix of random effects
In SAS notation this matrix is called G and is the variance-covariance matrix of the random effect parameter gamma. By using the option "G" in PROC MIXED and the Output Delivery System you obtain G as a matrix.
I am aware that it is relatively simple to construct this matrix manually once I have the variance components and dimensions of gamma. I nevertheless expected there to be an even simpler way.
Mixed model equations solution
In SAS notation these are called C.
By using the option "MMEQSOL" in PROC MIXED and the Output Delivery System you request that a solution to the mixed model equations be produced, as well as the inverted coefficients matrix. It is the latter that I am interested in.
Thanks in advance!
Not a very sensible model (see ?lme4::cake), but reasonable for illustration:
library(lme4)
fm1 <- lmer(angle ~ temperature + (1 | recipe) + (1 | replicate), cake)
The VarCorr() method gives a list of variance-covariance matrices for each term (in this case each one is 1x1), with its own print method:
v <- VarCorr(fm1)
You can combine these into a single matrix by using the bdiag() (block-diagonal) function from the Matrix package (as.matrix() converts from a sparse matrix to a standard (dense) R matrix object).
as.matrix(Matrix::bdiag(v))
##          [,1]      [,2]
## [1,] 39.21541 0.0000000
## [2,]  0.00000 0.4949681
The C matrix is unfortunately not so easy to get. As discussed in vignette("lmer",package="lme4"), lme4 doesn't use the Henderson-equation formulation. The upper block of C (variance-covariance matrix of fixed effects) is accessible via vcov(), but the variance-covariance matrix of the variances is not so easy: see e.g. here.
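
For the fixed-effect block specifically, continuing the example above, this is directly available:

as.matrix(vcov(fm1))  # variance-covariance matrix of the fixed-effect estimates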

Kernel SVM, select important features in R

I have a 1000x700 matrix with a binary column as the classification. I have used ksvm from kernlab to fit a polynomial-kernel model to my data. ROC and precision-recall curves showed that the polynomial model works better than the others. I am interested in extracting the most important features, the ones that determine the model's sensitivity and specificity. Generally speaking, what are the most important features in the model?
So I am using the following commands to extract the features, but I am not sure if this is right.
library(kernlab)

svp <- ksvm(as.matrix(xtrain), as.factor(ytrain), type = "C-svc", kernel = "poly")

# sum of (alpha_i * y_i) * x_i over the support vectors
weight <- colSums(coef(svp)[[1]] * xtrain[SVindex(svp), ])
w <- sort(weight, decreasing = TRUE)
w <- data.frame(w)
So the features with the largest weights would be the most important ones.
Please let me know if I am right or wrong.
Thanks,
Morteza
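
One caveat worth noting: the quantity computed above, w = sum over support vectors of (alpha_i * y_i) * x_i, is the exact primal weight vector only for a linear kernel; with a polynomial kernel the decision function lives in the expanded feature space, so this computation is at best a heuristic. A minimal sketch of the case where it is valid, using kernlab's linear kernel (vanilladot) with the same xtrain/ytrain as above, and ranking by absolute weight so that large negative weights also count:

library(kernlab)

# linear-kernel SVM, where the primal weight vector is well defined
svp_lin <- ksvm(as.matrix(xtrain), as.factor(ytrain),
                type = "C-svc", kernel = "vanilladot")

# w = sum over support vectors of (alpha_i * y_i) * x_i
w_lin <- colSums(coef(svp_lin)[[1]] * xtrain[SVindex(svp_lin), ])

# features sorted by absolute weight; the sign only gives direction
head(sort(abs(w_lin), decreasing = TRUE))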
