Applying PCA to a user-defined covariance matrix - r

I am trying to apply principal component analysis to a covariance matrix estimated from the relationships among all individuals; see Mm in the example below.
I would appreciate it if anyone could show me how to do this.
Example:
library(BLR)     # provides the wheat data set
library(rrBLUP)  # provides A.mat()
data(wheat)      # loads the marker matrix X, among other objects
# Mm is the covariance (additive relationship) matrix that I want to run PCA on
Mm <- A.mat(X)

I'm not sure if this is what you're after, but try the following with your dataset:
pca <- prcomp(Mm, scale. = TRUE)
# Check out what's in the result
str(pca)
# Print the variance summary for all principal components
summary(pca)
# Access a subset of components
summary(pca)$importance[, 1:20]
# Pairwise plots of the first four PCs (ggpairs() expects a data frame)
library(GGally)
ggpairs(as.data.frame(pca$x[, 1:4]))
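Since Mm is itself a covariance matrix, note that prcomp(Mm) treats the rows of Mm as observations. A hedged alternative (not from the original answer) is to eigendecompose Mm directly; the PC coordinates of the individuals are then the eigenvectors scaled by the square roots of the eigenvalues:
eig <- eigen(Mm, symmetric = TRUE)
# PC coordinates of the individuals: eigenvectors scaled by sqrt(eigenvalues)
pc_scores <- eig$vectors %*% diag(sqrt(pmax(eig$values, 0)))
# proportion of variance captured by each component
prop_var <- eig$values / sum(eig$values)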

Related

Fitting a negative binomial model and covariance structure syntax in glmmTMB

I read the glmmTMB package vignettes (https://cran.r-project.org/web/packages/glmmTMB/vignettes/covstruct.html) and have the following questions:
In the vignettes, they fit the model using
glmmTMB(y ~ ar1(times + 0 | group), data = dat0)
and mention that times + 0 corresponds to "a design matrix Z linking the observation vector y (rows) with a random-effects vector u (columns)".
What is the meaning of + 0? Is there any difference from (times | group), (times + 1 | group) and (1 | group)?
Is there any comprehensive summary of the syntax for covariance structures?
If I want to fit a negative binomial model where the outcome y_ij is generated from the R function rnbinom(mu = x_ij * beta + b_i + e_ij, size = 1), with i the group index, j the individual index, b_i ~ N(0, 1) and e_ij ~ N(0, 1), would the following code correctly specify the model?
dat <- data.frame(y, x, group)
glmmTMB(y ~ x + (1 | group), data = dat, family = nbinom2)
Any suggestions and help are appreciated. Thanks in advance!
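A minimal sketch (not from the original post) of what + 0 does: inside ar1(), the random-effects design matrix Z is built from the factor times, and + 0 suppresses its intercept column, so each level of times gets its own AR(1)-correlated effect; with times | group an extra intercept column would change the dimension of the covariance structure. The data below are simulated purely for illustration:
library(glmmTMB)
# simulated toy data: 10 groups observed at 10 time points each
dat0 <- data.frame(y     = rnorm(100),
                   times = factor(rep(1:10, times = 10)),
                   group = factor(rep(1:10, each = 10)))
# + 0 drops the intercept column from Z, leaving one column per level of times
fit <- glmmTMB(y ~ ar1(times + 0 | group), data = dat0)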

User-specified variance-covariance matrix in car::Anova not working

I am trying to use the car::Anova function to carry out joint Wald chi-squared tests for interaction terms involving categorical variables.
I would like to compare the results when using a bootstrapped variance-covariance matrix for the model coefficients. I have some concerns about the normality of the residuals and am doing this as a first step before considering permutation tests as an alternative to joint Wald chi-squared tests.
I have computed the variance-covariance matrix from the model fitted to 1000 bootstrap resamples of the data. The problem is that the car::Anova.merMod function does not seem to use the user-specified variance-covariance matrix: I get the same results whether or not I specify vcov..
I have made a very simple example below in which I try to use the identity matrix in Anova(). I have tried this with the more realistic bootstrapped variance-covariance matrix as well.
I looked at the code on GitHub, and it looks like there is a line where vcov. is overwritten with vcov(mod), so that might be a bug. However, I thought I'd see whether anyone here had come across this issue or could spot a mistake on my part.
Any help would be great!
library(lme4)  # lmer()
library(car)   # Anova()
df1 = data.frame(y = rbeta(180, 2, 5), x = rnorm(180), group = letters[1:30])
mod1 = lmer(y ~ x + (1 | group), data = df1)
# Default: uses the variance-covariance matrix from the model
Anova(mod1)
# Should use the user-specified vcov matrix but does not - same results as above
# (using the identity matrix here rather than the bootstrapped vcov to save space/time)
Anova(mod1, vcov. = diag(2))
P.S. Using car::linearHypothesis works with a user-specified vcov., but it does not give results using type 3 sums of squares, and it is more laborious to use for more than one interaction term. Therefore I'd prefer to use car::Anova if possible.
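For reference, a minimal sketch (reusing the toy model above) of the linearHypothesis() route mentioned in the P.S.; unlike Anova(), it does appear to honor the vcov. argument:
# joint test of the x coefficient with a user-supplied vcov (identity, for illustration)
linearHypothesis(mod1, "x = 0", vcov. = diag(2))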

Can I use a covariance matrix to specify the correlation structure in the nlme function gls?

I wish to use the function gls in the R package nlme to analyse a set of nested spatial samples, in which many samples overlap in at least some spatial coordinates. I want to account for non-independence in the response variable (the thing I'm measuring in each spatial sample) using either a corStruct or pdMat object, but I'm confused about how to do this.
I have generated a covariance matrix that should encode all the information about non-independence between spatial samples. Each row/column is a distinct spatial sample, the diagonal contains the total number of sampling units captured by each spatial sample, and the off-diagonal elements contain counts of sampling units shared between spatial samples.
I think I should use the nlme function gls while specifying a correlation structure, possibly using a corSymm or pdMat object. But I've only seen examples where the correlation structure in gls is specified via a formula. How can I use the covariance matrix that I've created?
I discovered that you can pass the nlme function gls a positive-definite correlation matrix by using the general correlation structure provided by corSymm.
library(nlme)    # gls(), corSymm()
library(Matrix)  # nearPD()
# convert your variance-covariance matrix into a correlation matrix
CM <- cov2cor(vcv_matrix)
# if your correlation matrix contains zeros, as mine did, you need to convert it
# to a positive-definite matrix that substitutes very small numbers for those zeros
# (nearPD(CM, corr = TRUE) would additionally preserve the unit diagonal)
CM <- as.matrix(nearPD(CM)$mat)
# convert into a corStruct object using the general correlation structure corSymm
C <- corSymm(CM[lower.tri(CM)], fixed = TRUE)
# the correlation structure can now be included in a gls model
gls(y ~ x, correlation = C, method = "ML")
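One caveat (not from the original answer): corSymm's default form, ~ 1, treats all observations as a single group, so the dimension of the correlation matrix must match the number of rows of the data passed to gls, and in the same order.
# illustrative sanity check; dat is a hypothetical placeholder for your data frame
stopifnot(nrow(CM) == nrow(dat))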

std values of principal component object differs in prcomp and caret

I was trying principal component analysis on the following data set. I tried it with the prcomp function and with caret's preProcess function.
library(caret)
library(AppliedPredictiveModeling)
set.seed(3433)
data(AlzheimerDisease)
adData = data.frame(diagnosis, predictors)
inTrain = createDataPartition(adData$diagnosis, p = 3/4)[[1]]
training = adData[inTrain, ]
testing = adData[-inTrain, ]
# from prcomp
names1 <- names(training)[substr(names(training), 1, 2) == "IL"]
prcomp.data <- prcomp(training[, names1], center = TRUE, scale. = TRUE)
prcomp.data$sdev
# from the caret package
preProcess(training[, names1], method = c("center", "scale", "pca"))$std
I was wondering why the sdev values differ between the two approaches.
Thanks
The first method is giving you the standard deviations of 12 principal components (you can see the components themselves with prcomp.data$rotation).
Also, this is mentioned in the documentation for the sdev value: "the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix)."
The second is giving you the standard deviations of the pre-processed input data (hence the variable names associated with each standard deviation).
A small side note: caret PCAs are automatically scaled and centered unless otherwise specified.
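A hedged check, reusing the objects from the question above (and assuming the preProcess object stores the per-variable standard deviations in its $std element): caret's values should match the column-wise standard deviations of the raw predictors, which is a different quantity from prcomp's per-component sdev.
pp <- preProcess(training[, names1], method = c("center", "scale", "pca"))
# per-variable standard deviations of the input data
all.equal(unname(pp$std), unname(apply(training[, names1], 2, sd)))
# per-component standard deviations - not comparable to the above
prcomp.data$sdev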

Cross validation of PCA+lm

I'm a chemist, and about a year ago I decided to learn more about chemometrics.
I'm working on a problem that I don't know how to solve:
I performed an experimental design (Doehlert type with 3 factors) recording several analyte concentrations as Y.
Then I performed a PCA on Y and used the scores on the first PC (87% of total variance) as the new y for a linear regression model, with my experimental coded settings as X.
Now I need to perform a leave-one-out cross-validation: remove each object before performing the PCA on the new "training set", create the regression model on the scores as I did before, predict the score for the held-out observation from the regression model, and calculate the prediction error by comparing that prediction with the score obtained by projecting the held-out object into the space of the previous PCA. This is repeated n times (with n the number of points in my experimental design).
I'd like to know how can I do it with R.
Do the calculations e.g. with prcomp and then lm. For that you need to apply the PCA model returned by prcomp to new data, which takes two (or three) steps:
1. Center the new data with the same center that was calculated by prcomp.
2. Scale the new data with the same scaling vector that was calculated by prcomp.
3. Apply the rotation calculated by prcomp.
The first two steps are done by scale(), using the $center and $scale elements of the prcomp object. You then matrix-multiply your data by $rotation[, components.to.use].
You can easily check your reconstruction of the PCA scores calculation by computing scores for the data you fed into prcomp and comparing the results with the $x element of the PCA model returned by prcomp.
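A minimal sketch of these steps, where train_Y and new_Y are illustrative placeholders for your training data and the held-out observation:
pca <- prcomp(train_Y, center = TRUE, scale. = TRUE)
# steps 1 and 2: center and scale the new data with the training parameters
new_scaled <- scale(new_Y, center = pca$center, scale = pca$scale)
# step 3: rotate into PC space, here keeping only the first component
new_scores <- new_scaled %*% pca$rotation[, 1, drop = FALSE]
# sanity check: recomputing the training scores reproduces pca$x
all.equal(unname(scale(train_Y, pca$center, pca$scale) %*% pca$rotation),
          unname(pca$x))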
Edit, in light of the comment:
If the purpose of the CV is calculating some kind of error, then you can choose between calculating the error of the predicted scores y (which is how I understand you) and calculating the error of the Y: the PCA also lets you go backwards and predict the original variates from the scores. This is easy because the loadings ($rotation) are orthogonal, so the inverse is just the transpose.
Thus, the prediction in the original Y space is scores %*% t(pca$rotation), which is computed faster by tcrossprod(scores, pca$rotation).
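Continuing the illustrative objects from the sketch above, the back-projection could look like this (note that the scaling and centering must be undone to return to the original units):
# back-project the (rank-1) scores into the original variable space
Y_hat <- tcrossprod(new_scores, pca$rotation[, 1, drop = FALSE])
# undo the scaling and centering applied by prcomp
Y_hat <- sweep(Y_hat, 2, pca$scale, "*")
Y_hat <- sweep(Y_hat, 2, pca$center, "+")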
There is also the R package pls (partial least squares), which has tools for PCR (principal component regression).
