Simulating data with a Gaussian copula when the correlation matrix is not positive definite - r

I am simulating data with a Gaussian copula, which requires a correlation matrix. I built the matrix from correlation coefficients reported in the literature and past studies. How do you deal with a non-positive definite matrix when simulating non-normal data with a Gaussian copula, while ensuring that the correlations in the simulated data stay close to the values in the matrix used to generate them?
I am looking for approaches to this problem in R.
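One possible approach, sketched here under assumptions: replace the literature matrix with a nearby positive definite correlation matrix (below by clipping negative eigenvalues and rescaling to a unit diagonal), then push correlated normals through pnorm and the target quantile functions. The matrix R_lit, the helper make_pd_cor and the chosen marginals are hypothetical and only for illustration; Matrix::nearPD(x, corr = TRUE) is an existing alternative for the repair step. Monotone marginal transforms preserve rank correlations, while Pearson correlations of the non-normal output can differ somewhat from the input.

```r
set.seed(1)

# Hypothetical "literature" correlations that are jointly inconsistent
R_lit <- matrix(c( 1.0,  0.9, -0.9,
                   0.9,  1.0,  0.9,
                  -0.9,  0.9,  1.0), nrow = 3, byrow = TRUE)
min(eigen(R_lit, symmetric = TRUE)$values)   # negative, so not positive definite

# Hypothetical helper: clip eigenvalues at eps, rebuild, rescale to unit diagonal
make_pd_cor <- function(R, eps = 1e-4) {
  e <- eigen((R + t(R)) / 2, symmetric = TRUE)
  vals <- pmax(e$values, eps)
  S <- e$vectors %*% diag(vals, nrow = length(vals)) %*% t(e$vectors)
  cov2cor(S)
}

R_pd <- make_pd_cor(R_lit)

# Gaussian copula: correlated normals -> uniforms -> non-normal marginals
n <- 5000
Z <- matrix(rnorm(n * 3), n, 3) %*% chol(R_pd)
U <- pnorm(Z)
X <- cbind(exp_var   = qexp(U[, 1], rate = 1),
           gamma_var = qgamma(U[, 2], shape = 2),
           beta_var  = qbeta(U[, 3], 2, 5))

round(R_pd, 2)                          # repaired input matrix
round(cor(X, method = "spearman"), 2)   # rank correlations of the simulated data
round(cor(X), 2)                        # Pearson correlations, for comparison
```

How far R_pd moves away from R_lit is worth inspecting before simulating; if the change is large, the published coefficients may simply not be compatible with each other.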

Related

Is there an R function to compare two variance-covariance matrices via fit indicators?

I obtained two variance-covariance matrices from two different samples. Both contain data on the same variables. I would like to estimate their similarity according to fit indices, i.e., I am interested in whether the pattern of covariances between the variables is similar or different in the two samples. I am familiar with fit indices from structural equation modeling (e.g., Chi-square, GFI, CFI, RMSEA, SRMR), which compare an empirical variance-covariance matrix with a model-implied variance-covariance matrix. Is there a way to obtain these fit indicators for the comparison of two empirical variance-covariance matrices?
I tried compareCov, which only gives a visual comparison.
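As a rough illustration of what such an index could look like, here is a minimal sketch of an SRMR-style number: the root mean square of the differences between the two correlation matrices, treating one sample matrix as if it were the "model-implied" one. That treatment is an assumption, the function name srmr_between is hypothetical, the data are simulated, and the result is a descriptive index rather than a significance test.

```r
set.seed(42)
p <- 4

# Two hypothetical samples measured on the same four variables
A <- matrix(rnorm(200 * p), 200, p)
B <- matrix(rnorm(150 * p), 150, p)
S1 <- cov(A)
S2 <- cov(B)

# Hypothetical helper: RMS difference between standardized (correlation) matrices
srmr_between <- function(S1, S2) {
  stopifnot(all(dim(S1) == dim(S2)))
  D <- cov2cor(S1) - cov2cor(S2)
  idx <- lower.tri(D)   # off-diagonal elements; the diagonal differences are 0
  sqrt(mean(D[idx]^2))
}

srmr_between(S1, S2)   # larger values mean less similar correlation patterns
srmr_between(S1, S1)   # identical matrices give 0
```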

How do I extract the principal components' values of all observations using the psych package

I'm performing dimensionality reduction using the psych package. After analyzing the scree plot I decided to use the 9 most important PCs (out of 15 variables) to build a linear model.
My question is, how do I extract the values of the 9 most important PCs for each of the 500 observations I have? Is there any built in function for that, or do I have to manually compute it using the loadings matrix?
Returns eigen values, loadings, and degree of fit for a specified number of components after performing an eigen value decomposition. Essentially, it involves doing a principal components analysis (PCA) on n principal components of a correlation or covariance matrix. Can also display residual correlations. By comparing residual correlations to original correlations, the quality of the reduction in squared correlations is reported. In contrast to princomp, this only returns a subset of the best nfactors. To obtain component loadings more characteristic of factor analysis, the eigen vectors are rescaled by the sqrt of the eigen values.
principal(r, nfactors = 1, residuals = FALSE, rotate = "varimax", n.obs = NA, covar = FALSE,
          scores = TRUE, missing = FALSE, impute = "median", oblique.scores = TRUE,
          method = "regression", ...)
So I think so: note the scores = TRUE argument in the signature above.
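A minimal sketch of pulling out the first 9 component scores for 500 observations, on simulated data. The runnable part uses base R's prcomp and checks it against the manual loadings computation; the psych::principal call is only run if psych happens to be installed, and rotate = "none" is my assumption for getting unrotated components. The scales of the two sets of scores are not expected to match.

```r
set.seed(123)

# Hypothetical data: 500 observations on 15 variables
X <- matrix(rnorm(500 * 15), nrow = 500, ncol = 15)
colnames(X) <- paste0("v", 1:15)

# Base R: the score matrix is stored in $x
pca <- prcomp(X, center = TRUE, scale. = TRUE)
scores9 <- pca$x[, 1:9]

# Manual computation from the loadings gives the same numbers
manual9 <- scale(X) %*% pca$rotation[, 1:9]
all.equal(manual9, scores9, check.attributes = FALSE)

# psych: scores are stored in $scores when raw data are supplied
if (requireNamespace("psych", quietly = TRUE)) {
  pc <- psych::principal(X, nfactors = 9, rotate = "none", scores = TRUE)
  print(dim(pc$scores))
}
```

Either score matrix can then be put into a data frame together with the response and passed to lm.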

Why is the calculation of eigenvectors and eigenvalues so effective when performing PCA?

The core of Principal Component Analysis (PCA) lies in calculating eigenvalues and eigenvectors from the variance-covariance matrix corresponding to some dataset (for example, a matrix of multivariate data coming from a set of individuals). The textbook knowledge I have is that:
a) by multiplying such eigenvectors with the original data matrix one can calculate "scores" (as many as the original set of variables) which are independent from each other
b) the eigenvalues summarize the amount of variance of each score.
These two properties make this process a very effective data transformation technique to simplify the analysis of multivariate data.
My question is: why is that so? Why does calculating eigenvalues and eigenvectors from a variance-covariance matrix result in such unique properties of the scores?
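A small numerical check of properties a) and b) may help, on simulated data (the mixing matrix is arbitrary). If S = V Λ V' is the eigendecomposition of the covariance matrix and the scores are the centered data times V, then the covariance of the scores is V' S V = Λ, which is diagonal. So the scores are mutually uncorrelated and their variances are exactly the eigenvalues.

```r
set.seed(7)

# Hypothetical correlated data: 200 individuals, 3 variables
Z <- matrix(rnorm(200 * 3), 200, 3)
X <- Z %*% matrix(c(2, 1, 0,
                    0, 1, 1,
                    0, 0, 0.5), 3, 3, byrow = TRUE)

S <- cov(X)
e <- eigen(S, symmetric = TRUE)

# Scores: centered data projected onto the eigenvectors
Xc <- scale(X, center = TRUE, scale = FALSE)
scores <- Xc %*% e$vectors

round(cov(scores), 10)   # diagonal matrix: off-diagonal covariances vanish
e$values                 # same numbers as the diagonal above
sum(diag(S))             # total variance is preserved ...
sum(e$values)            # ... and merely redistributed over the scores
```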

Use the correlation matrix in robust PCA functions in R

I want to perform robust principal component analysis (PCA) on the correlation matrix, namely with rrcov::PcaHubert.
I know that if I pass cor=TRUE, rrcov::CovMcd calculates the robust covariance and correlation matrices. How can I force the PCA to use the correlation matrix instead of the covariance matrix?
Thanks!

Cross validation of PCA+lm

I'm a chemist, and about a year ago I decided to learn more about chemometrics.
I'm working on a problem that I don't know how to solve:
I performed an experimental design (Doehlert type with 3 factors), recording several analyte concentrations as Y.
Then I performed a PCA on Y and used the scores on the first PC (87% of total variance) as the new y for a linear regression model with my coded experimental settings as X.
Now I need to perform a leave-one-out cross validation: remove one object, perform the PCA on the remaining "training set", build the regression model on the scores as I did before, predict the score for the observation in the "test set", and calculate the prediction error by comparing the predicted score with the score obtained by projecting the test object into the space of that PCA. This is repeated n times (with n the number of points in my experimental design).
I'd like to know how I can do it with R.
Do the calculations e.g. with prcomp and then lm. For that you need to apply the PCA model returned by prcomp to new data. This needs two (or three) steps:
1. Center the new data with the same center that was calculated by prcomp
2. Scale the new data with the same scaling vector that was calculated by prcomp
3. Apply the rotation calculated by prcomp
The first two steps are done by scale, using the $center and $scale elements of the prcomp object. You then matrix multiply your data by $rotation[, components.to.use]
You can easily check whether your reconstruction of the PCA scores calculation is correct by calculating scores for the data you input to prcomp and comparing the results with the $x element of the PCA model returned by prcomp.
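The steps above can be sketched as follows, on simulated data (the matrices train and new and the choice of two components are only for illustration). The sketch also compares the manual result with predict on the prcomp object, which performs the same centering, scaling and rotation.

```r
set.seed(3)

# Hypothetical data: 15 training rows and 5 new rows on 4 variables
Y <- matrix(rnorm(20 * 4), 20, 4, dimnames = list(NULL, paste0("y", 1:4)))
train <- Y[1:15, ]
new   <- Y[16:20, ]

pca <- prcomp(train, center = TRUE, scale. = TRUE)
components.to.use <- 1:2

# Steps 1 and 2: center and scale with the values stored in the PCA model
new_scaled <- scale(new, center = pca$center, scale = pca$scale)
# Step 3: apply the rotation
new_scores <- new_scaled %*% pca$rotation[, components.to.use, drop = FALSE]

# Sanity check on the training data: should reproduce pca$x
train_scores <- scale(train, center = pca$center, scale = pca$scale) %*% pca$rotation
all.equal(train_scores, pca$x, check.attributes = FALSE)

# predict() on a prcomp object gives the same scores for new data
all.equal(new_scores,
          predict(pca, newdata = new)[, components.to.use, drop = FALSE],
          check.attributes = FALSE)
```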
Edit in the light of the comment:
If the purpose of the CV is calculating some kind of error, then you can choose between calculating the error of the predicted scores y (which is how I understand you) and calculating the error of the Y: the PCA also lets you go backwards and predict the original variates from the scores. This is easy because the loadings ($rotation) are orthogonal, so the inverse is just the transpose.
Thus, the prediction in original Y space is scores %*% t (pca$rotation), which is computed faster by tcrossprod (scores, pca$rotation). If the PCA was centered and scaled, undo the scaling and centering afterwards to get back to the original units.
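Putting the pieces together, a leave-one-out loop could look like the sketch below. Everything about the data is hypothetical (13 random design points instead of a real Doehlert design, 5 simulated analytes), and a model with linear terms only is an assumption. The sign of PC1 may flip between folds, but that does not matter here because the predicted and the projected score are compared within the same fold.

```r
set.seed(11)
n <- 13

# Hypothetical coded settings (3 factors) and responses (5 analytes)
X <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1), x3 = runif(n, -1, 1))
latent <- 2 * X$x1 - X$x2 + 0.5 * X$x3 + rnorm(n, sd = 0.2)
Y <- sapply(1:5, function(j) j * latent + rnorm(n, sd = 0.3))
colnames(Y) <- paste0("analyte", 1:5)

err_score <- numeric(n)
err_Y <- matrix(NA_real_, n, ncol(Y), dimnames = list(NULL, colnames(Y)))

for (i in seq_len(n)) {
  # PCA on the training set only
  pca <- prcomp(Y[-i, , drop = FALSE], center = TRUE, scale. = TRUE)

  # Regress the PC1 scores on the coded settings
  train <- cbind(X[-i, , drop = FALSE], pc1 = pca$x[, 1])
  fit <- lm(pc1 ~ x1 + x2 + x3, data = train)

  # Predicted score vs. score from projecting the left-out object
  pred_score <- predict(fit, newdata = X[i, , drop = FALSE])
  proj_score <- predict(pca, newdata = Y[i, , drop = FALSE])[, 1]
  err_score[i] <- pred_score - proj_score

  # Back to Y space: rank-1 reconstruction, then undo scaling and centering
  Y_hat <- tcrossprod(matrix(pred_score, 1, 1), pca$rotation[, 1, drop = FALSE])
  Y_hat <- sweep(sweep(Y_hat, 2, pca$scale, "*"), 2, pca$center, "+")
  err_Y[i, ] <- Y_hat - Y[i, ]
}

rmsecv_score <- sqrt(mean(err_score^2))   # error in score space
rmsecv_Y <- sqrt(colMeans(err_Y^2))       # error per analyte, original units
rmsecv_score
rmsecv_Y
```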
There is also the R package pls (Partial Least Squares), which has tools for PCR (Principal Component Regression).

Resources