Inaccuracies with prcomp? R PCA for eigenfaces

My question is: in the case of having a matrix we want to do PCA on, where the number of features greatly outnumbers the number of trials, why doesn't prcomp behave as expected (or am I missing something)?
Below is a summary of the issue; the full code is here, the compressed 7MB data source is here (55MB uncompressed), and the target image is here.
My exact situation is that I have a p by n matrix X (p = features, n = trials) where the trials are photos taken of faces, and the features are the pixels in the photos (so a 32256 by 148 matrix). What I want to do is find the principal component score vectors of that matrix. Since finding the covariance matrix XX^T is too expensive, an easy solution is to find the eigenvectors v_i of X^TX and transform them by X (i.e. Xv_i); more info here.
XTX <- t(X) %*% X  # omitting the 1/(n - 1) factor of the covariance matrix because we normalize later anyway
eig.XTX <- eigen(XTX)
eigenvectors.XTX.col <- eig.XTX$vectors
principal.component.scores <- apply(eigenvectors.XTX.col, 2, function(c) {
  normalize.vector(X %*% matrix(c, ncol = 1))
})
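(For reference, normalize.vector comes from the full code linked above; a minimal version consistent with how it is used here would be something like the following, assuming it simply rescales a vector to unit Euclidean length.)
normalize.vector <- function(v) v / sqrt(sum(v^2))  # assumed: scale to unit length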
The principal component scores are eigenfaces in my case, and can be used to successfully reconstruct the target face as seen here: http://cl.ly/image/260w0N0u0Z3y (refer to my full code for how)
Passing X to prcomp should do something equivalent, but it gives a different result from the homegrown approach above:
pca <- prcomp(X)
pca$x # right size, but wrong pc scores
The result of using pca$x in reconstructing the face is not total crap, but much worse: http://cl.ly/image/2p19360u2P43
I also checked that running prcomp on t(X) yielded a different rotation matrix, so prcomp is doing something fancy but mysterious under the hood. I know from here that prcomp uses SVD to calculate the principal component loading vectors instead of an eigen decomposition, but that should not lead to any errors here (or so I think...).
What is the correct way of using the built-in prcomp function? There must be a way, right?

Wow, the answer is not a fun one at all; it has to do with the default parameters of prcomp:
To solve this issue, I first looked at the R source of prcomp and saw that the rotation matrix should equal svd(X)$v. Checking this on the R command line proved that, with my X (data here), it did not. This is because even though prcomp defaults to scale. = FALSE, it will still run R's scale function, if only to center the matrix, since center defaults to TRUE as seen here. In my case this is bad because I passed in data that was already centered (I had subtracted the mean image).
So, rerunning with prcomp(X, center = F) yields a rotation matrix equal to svd(X)$v, as expected. From this point forward, the only "mistake" prcomp makes when constructing prcomp(X, center = F)$x is that it does not normalize the columns, so each one is off by only a scalar multiple from the principal.component.scores matrix I build above in my code. Without normalizing prcomp(X, center = F)$x, the results are better, but still not quite right, as seen here:
http://cl.ly/image/3u2y3m1h2S0o
But after normalizing via pca.x.norm <- apply(pca$x, 2, normalize.vector), the result of using prcomp to reconstruct the face is identical:
http://cl.ly/image/24390O3x0A0x
tl;dr - prcomp unexpectedly centers the data even with the param scale. = FALSE; additionally, for the purposes of eigenfaces you will need to normalize the columns of prcomp(X, center = F)$x. Then everything will work as desired!
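Putting it together, a minimal sketch of the corrected pipeline (with X already centered and normalize.vector as defined above) would be:
pca <- prcomp(X, center = FALSE)                 # skip the implicit centering
pca.x.norm <- apply(pca$x, 2, normalize.vector)  # columns now match principal.component.scores (up to sign)
# pca.x.norm can then be used in place of principal.component.scores for the reconstruction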

Related

PCA : eigen values vs eigen vectors vs loadings in python vs R?

I am trying to calculate PCA loadings of a dataset. The more I read about it, the more I get confused because "loadings" is used differently at many places.
I am using sklearn.decomposition in Python for the PCA analysis as well as R (using the FactoMineR and factoextra libraries), as R provides easy visualization techniques. The following is my understanding:
pca.components_ gives us the eigenvectors. They give us the directions of maximum variation.
pca.explained_variance_ gives us the eigenvalues associated with the eigenvectors.
eigenvectors * sqrt(eigenvalues) = loadings, which tell us how the principal components (PCs) load the variables.
Now, what I am confused by is:
Many forums say that the eigenvectors are the loadings, and that multiplying the eigenvectors by sqrt(eigenvalues) just gives the strength of association. Others say eigenvectors * sqrt(eigenvalues) = loadings.
Do the squared eigenvectors tell us the contribution of a variable to a PC? I believe this is equivalent to var$contrib in R.
The squared loading (of the eigenvector or of eigenvector*sqrt(eigenvalue), I don't know which one) shows how well a PC captures a variable (closer to 1 = variable better explained by that PC). Is this the equivalent of var$cos2 in R? If not, what is cos2 in R?
Basically I want to know how to understand how well a principal component captures a variable and what is the contribution of a variable to a pc. I think they both are different.
What is pca.singular_values_? It is not clear from the documentation.
Here are the first and second links that I referred to, which contain R code with explanations, and the Stats Exchange thread that confused me.
Okay, after much research and going through many papers, I have the following:
1. pca.components_ = eigenvectors. Take the transpose so that the PCs are columns and the variables are rows.
1.a: eigenvector**2 = contribution of each variable to the principal components. If it's close to 1, then that particular PC is largely driven by that variable.
In Python -> pow(pca.components_.T, 2) [multiply by 100 if you want percentages and not proportions] [R equivalent -> var$contrib]
2. pca.explained_variance_ = eigenvalues
3. pca.singular_values_ = the singular values obtained from the SVD.
(singular values)**2 / (n - 1) = eigenvalues
4. eigenvectors * sqrt(eigenvalues) = loadings matrix
4.a: the vertical (column-wise) sum of the squared loadings matrix = eigenvalues. (Given you have taken the transpose as explained in step 1.)
4.b: the horizontal (row-wise) sum of the squared loadings matrix = the variance of a variable explained by all principal components, i.e. how much of a variable's variance all PCs retain after the transformation. (Given you have taken the transpose as explained in step 1.)
In Python -> loadings matrix = pca.components_.T * sqrt(pca.explained_variance_)
For questions pertaining to R:
var$cos2 = var$cor^2 (the squared correlations). Given the coordinates of the variables on the factor map, it measures how well a variable is represented by a particular principal component; it is essentially the squared correlation between the variable and that principal component.
var$contrib = summarized by point 1.a above. In R: (var$cos2 * 100) / (total cos2 of the component); see the PCA analysis in R link.
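To make the mapping concrete, here is a small R sketch of points 1-4 on random data using only base prcomp; the cos2 and contrib values computed this way should correspond to FactoMineR's var$cos2 and var$contrib up to sign and ordering:
set.seed(1)
X <- matrix(rnorm(100 * 5), ncol = 5)
pca <- prcomp(X, center = TRUE, scale. = TRUE)

eigenvalues  <- pca$sdev^2                            # = (singular values)^2 / (n - 1)
eigenvectors <- pca$rotation                          # variables in rows, PCs in columns
loadings     <- sweep(eigenvectors, 2, sqrt(eigenvalues), "*")

colSums(loadings^2)                                   # 4.a: equals the eigenvalues
rowSums(loadings^2)                                   # 4.b: variance of each variable explained by all PCs
cos2    <- loadings^2                                 # how well each PC captures each variable
contrib <- sweep(cos2, 2, colSums(cos2), "/") * 100   # contribution of each variable to each PC, in %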
Hope it helps others who are confused by PCA analysis.
Huge thanks to -- https://stats.stackexchange.com/questions/143905/loadings-vs-eigenvectors-in-pca-when-to-use-one-or-another

Apply a transformation matrix over time

I have an initial frame and a bounding box around some information. I have a transformation matrix T which I want to use to transform this bounding box.
I could easily apply the transformation and draw it in the output frame, but I would like to apply the transformation over a sequence of x frames. Can anyone suggest a way to do this?
Aly
Building on @egor-n's comment, you could compute R = T^{1/x} and compute your bounding box at frame i+1 from the one at frame i by
B_{i+1} = R * B_{i}
with B_{0} your initial bounding box. Depending on the precise form of T, we could discuss how to compute R.
There are methods for affine transforms: decompose the affine transform matrix into a product of translation, rotation, scaling and shear matrices, then linearly interpolate the parameters of each matrix (for example, the rotation angle for R, and so on); see the linked example.
But for a homography matrix there is no single solution, as described here, so one can only find some "good" approximation (see the fairly involved math in that article). Possibly, some restrictions on the allowed transforms could simplify the problem.
Here's something a little different you could try. Let M be the matrix representing the final transformation. You could try interpolating between I (the identity matrix, with 1's on the diagonal and 0's elsewhere) and M using the formula
M(t) = exp(t * ln(M))
where t is time from 0 to 1, M(0) = I, M(1) = M, exp is the exponential function for matrices given by the usual infinite series, and ln is the similar natural logarithm function for matrices given by the usual infinite series.
The correctness of the formula depends on the type of transformation represented by M and the type of transformations allowed in intermediate steps. The formula should work for rigid motions. For other types of transformations, various bad things might happen, including divergence of the logarithm series. Other formulas can be used in other cases; let me know if you're using transformations other than rigid motions and I can give some other formulas.
The exponential and logarithm functions may be available in a matrix library. If not, they can be easily implemented as partial sums of infinite series.
The above method should give the same result as some quaternion methods in the case of rotations. The quaternion methods are probably faster when they're available.
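As an illustration, here is a minimal R sketch of this interpolation for a rigid 2D motion in homogeneous coordinates, using the expm package for the matrix exponential and logarithm (the example matrix is made up):
library(expm)

theta <- pi / 3
M <- rbind(c(cos(theta), -sin(theta), 2),    # rigid motion: rotation by theta plus a translation
           c(sin(theta),  cos(theta), 5),
           c(0,           0,          1))

logM <- logm(M)                              # matrix logarithm (well defined for this rigid motion)
M_at <- function(t) expm(t * logM)           # M(0) = I, M(1) = M

frames <- lapply(seq(0, 1, length.out = 10), M_at)   # one interpolated transform per frame
The same two calls also give the x-th root suggested earlier, since expm(logm(M) / x) = M^{1/x}.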
UPDATE
I see you mention elsewhere that your transformation is a homography (perspectivity), so the method I suggested above for rigid motions won't work. Instead you could use a different, but related method outlined in ftp://ftp.cs.huji.ac.il/users/aristo/papers/SYGRAPH2005/sig05.pdf. It goes as follows: represent your transformation by a matrix in one higher dimension. Scale the matrix so that its determinant is equal to 1. Call the resulting matrix G. You want to interpolate from the identity matrix I to G, going through perspectivities.
In what follows, let M^T be the transpose of M. Let the function expp be defined by
expp(M) = exp(-M^T) * exp(M+M^T)
You need to find the inverse of that function at G; in other words you need to solve the equation
expp(M) = G
where G is your transformation matrix with determinant 1. Call the result M = logp(G). That equation can be solved by standard numerical techniques, or you can use Matlab or other math software. It's somewhat time-consuming and complicated to do, but you only have to do it once.
Then you calculate the series of transformations by
G(t) = expp(t * logp(G))
where t varies from 0 to 1 in steps of 1/k, where k is the number of frames you want.
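Here is one way this could be set up in R, as a heavily hedged sketch: expp follows the definition above (via the expm package), and logp is approximated by numerically minimizing the Frobenius distance between expp(M) and G with optim. Convergence from the zero starting point is not guaranteed for every G, so treat this purely as an outline of the approach.
library(expm)

expp <- function(M) expm(-t(M)) %*% expm(M + t(M))

# Numerically invert expp at G by minimizing the Frobenius distance with optim
logp <- function(G) {
  n   <- nrow(G)
  obj <- function(m) sum((expp(matrix(m, n, n)) - G)^2)
  fit <- optim(rep(0, n * n), obj, method = "BFGS", control = list(maxit = 5000))
  matrix(fit$par, n, n)
}

# L <- logp(G)           # do this expensive step once
# G_t <- expp(t * L)     # then evaluate per frame, with t in steps of 1/k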
You could parameterize the transform over some number of frames by adding a variable that runs from 0 to 1 across the frames.
Let t be the frame number
Let T be the total number of frames
Let P be the original location and orientation of the object
Let theta be the total rotation angle
and translation be the vector [x,y]'
The transform in 2D becomes:
T(P|t) = R(t)*P + (t*[x,y]')/T
where R(t) = {{cos(theta*t/T), -sin(theta*t/T)}, {sin(theta*t/T), cos(theta*t/T)}}
So at frame t_n you apply the transform T(P|t_n) to the position of the object at time t_0 = 0 (which is equivalent to applying no transform).
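For what it's worth, a small R sketch of this frame-parameterized transform (the names and values are purely illustrative):
interpolate_pose <- function(P, theta, translation, t, Tframes) {
  a <- theta * t / Tframes
  R <- rbind(c(cos(a), -sin(a)),
             c(sin(a),  cos(a)))
  R %*% P + (t / Tframes) * translation        # T(P | t)
}

P <- c(2, 1)                                                        # original location
path <- sapply(0:30, function(t) interpolate_pose(P, pi / 2, c(3, 1), t, 30))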

covariance formula: multiplying just the weights "in couple" in R

OK, basically: if you look at the covariance formula when weights are involved (look at this picture so everything is clear: http://postimg.org/image/sjr2tnk85/), I just want to calculate the sum of the products of all the different pairs of weights, as highlighted in the picture I uploaded.
I absolutely need that specific quantity highlighted in the picture. I have no use for functions like cor() [I tried, but it was useless].
I have tried to use "for" loops following the mathematical formula, but came up empty-handed.
I am sorry if this post lacks the specificity required for this forum, but it was the best way I could think of to explain my problem.
sum(outer(w,w), -crossprod(w)) / 2
Z <- outer(a, b) creates a matrix where Z[i,j] = a[i]*b[j]. Plugging in w for both a and b gives a symmetric matrix.
crossprod(w) calculates the sum of squares of w, which is the sum of the diagonal of the matrix above.
Take the difference, then divide by two, because you only want the upper half of the matrix (each off-diagonal product appears twice).
Alternatively, you could try sum( apply(combn(w,2), 2, prod) ) to explicitly form each pair, multiply them, and sum them up.
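A quick sanity check on a small made-up weight vector, comparing both approaches against the hand-computed sum of pairwise products:
w <- c(0.2, 0.3, 0.5)
(sum(outer(w, w)) - crossprod(w)) / 2      # 0.31
sum(apply(combn(w, 2), 2, prod))           # 0.31  ( = 0.2*0.3 + 0.2*0.5 + 0.3*0.5 )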

How to compute the inverse of a close to singular matrix in R?

I want to minimize the function FLogV (working with a multivariate normal distribution; Z is an N x C data matrix, SIGMA is a C x C variance-covariance matrix of the data, and R is a vector of length C):
FLogV <- function(P){
  # (here I define the parameters, P, within R and SIGMA)
  logC <- (C/2)*N*log(2*pi) + (1/2)*N*log(det(SIGMA))
  SOMA.t <- 0
  for (j in 1:N){
    SOMA.t <- SOMA.t + sum(t(Z[j,] - R) %*% solve(SIGMA) %*% (Z[j,] - R))
  }
  MlogV <- logC + (1/2)*SOMA.t
  return(MlogV)
}
minLogV <- optim(P, FLogV)
All of this is part of a larger piece of code which was already tested and works well, except for the most important thing: I can't optimize, because I get this error:
“Error in solve.default(SIGMA) :
system is computationally singular: reciprocal condition number = 3.57726e-55”
If I use ginv() or pseudoinverse() or qr.solve() I get:
“Error in svd(X) : infinite or missing values in 'x'”
The thing is: if I take the SIGMA matrix right after the error message, I can run solve(SIGMA); the eigenvalues are all positive and the determinant is very small but positive:
det(SIGMA)
[1] 3.384674e-76
eigen(SIGMA)$values
[1] 0.066490265 0.024034173 0.018738777 0.015718562 0.013568884 0.013086845
….
[31] 0.002414433 0.002061556 0.001795105 0.001607811
I have already read several papers about matrices like SIGMA (which are close to singular) and did several transformations of the data's scale and form, but I realized that, for a 34x34 matrix like this example, once det(SIGMA) drops below roughly 1e-40, R treats it as 0 and the calculation fails. I also can't reduce the matrix dimensions, and I can't put correction algorithms for singular matrices into my function because R can't evaluate them while working with optimization functions like optim. I really appreciate any suggestion for this problem.
Thanks in advance,
Maria D.
It isn't clear from your post whether the failure is coming from det() or solve().
If it's just the solve() in the quadratic term, you may want to try the two-argument version of solve(); it can be a bit more stable. solve(X, Y) is the same as solve(X) %*% Y.
If you can factor SIGMA using chol(), you will get a triangular factor (in R, chol() returns an upper-triangular U such that t(U) %*% U == SIGMA). The determinant of SIGMA is then the squared product of the diagonal of U, and you might try this for the quadratic term:
crossprod(backsolve(U, Z[j,] - R, transpose = TRUE))
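Putting that together with the log-determinant (which also sidesteps the det() underflow seen above), here is a minimal sketch using the SIGMA, Z, R, C and N names from the question:
U <- chol(SIGMA)                              # upper triangular, t(U) %*% U == SIGMA
logdetSIGMA <- 2 * sum(log(diag(U)))          # log(det(SIGMA)), computed without underflow
logC <- (C/2)*N*log(2*pi) + (1/2)*N*logdetSIGMA
quad <- function(z) {                         # z' %*% solve(SIGMA) %*% z via the Cholesky factor
  v <- backsolve(U, z, transpose = TRUE)      # solves t(U) %*% v = z
  sum(v^2)
}
SOMA.t <- sum(sapply(1:N, function(j) quad(Z[j, ] - R)))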

Is it impossible to do PCA on data whose # of variables is bigger than the # of individuals?

I am a new user of R and I am trying to do PCA on my data set using R. The dimensions of the data are 20x10000, i.e. the # of features is 10000 and the # of individuals is 20. It seems that prcomp() cannot handle the data as I expect, because the dimensions of the calculated eigenvectors and new data are 20x20 and 10000x20 instead of 10000x10000 and 20x10000. I tried the FactoMineR library as well, but the results looked like it loses some dimensions too. Is there any way of doing PCA on data like this? :(
Reading the manual, it looks like no components are omitted by default, but check the tol argument. The real problem is with negative eigenvalues that may be there (and often are) when you have fewer cases than variables. (I think with 20 individuals and 10000 variables you will always have many zero or negative eigenvalues.) Below is a simplified version of PCA I sometimes use that computes "PC loadings" the way they're usually used in psychology.
PCA <- function(X, cut = NULL, USE = "complete.obs") {
  if (is.null(cut)) cut <- ncol(X)
  E <- eigen(cor(X, use = USE))                 # eigen decomposition of the correlation matrix
  vec <- E$vectors
  val <- E$values
  P <- sweep(vec, 2, sqrt(val), "*")[, 1:cut]   # "loadings": eigenvectors scaled by sqrt(eigenvalues)
  P
}
The "loadings" are, basically, eigenvectors multiplied by the square root of eigenvalues -- but there's a problem here if you have negative eigenvalues. Something similar may happen with prcomp.
If you just want to reconstruct your data matrix exactly (for whatever reason), you can easily use svd() or eigen() directly. (My example used the correlation matrix, but the logic is not confined to that case.)
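For example, with 20 individuals and 10000 features you can run the SVD directly; here is a sketch on random data (all names are illustrative):
X  <- matrix(rnorm(20 * 10000), nrow = 20)     # 20 individuals x 10000 features
Xc <- scale(X, center = TRUE, scale = FALSE)   # center the columns
s  <- svd(Xc)                                  # at most 20 non-trivial components

scores   <- s$u %*% diag(s$d)                  # 20 x 20 principal component scores
rotation <- s$v                                # 10000 x 20 principal directions
Xc_hat   <- scores %*% t(rotation)             # reconstructs Xc exactly (up to numerical error)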
