Comparing svd and princomp in R - r

I want to get the singular values of a matrix in R to obtain its principal components, and then run princomp(x) as well to compare the results.
I know princomp() gives the principal components.
Question
How do I get the principal components from $d, $u, and $v (the output of s = svd(x))?

One way or another, you should probably look into prcomp, which calculates PCA using svd instead of eigen (as in princomp). That way, if all you want is the PCA output, but calculated using svd, you're golden.
Also, if you type stats:::prcomp.default at the command line, you can see how it's using the output of svd yourself.
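To connect this back to the $d, $u and $v in the question, here is a minimal sketch in base R. It assumes the built-in iris measurements as a stand-in for x (the question does not say what x is) and centers the columns before calling svd(), as prcomp does by default. The sign of each component is arbitrary, so the comparison uses absolute values.

```r
# Assumption: iris[, 1:4] stands in for the asker's matrix x.
x  <- as.matrix(iris[, 1:4])
xc <- scale(x, center = TRUE, scale = FALSE)   # center columns, no rescaling
n  <- nrow(xc)

s <- svd(xc)

# Principal component scores: U %*% D (the same as xc %*% V).
scores <- s$u %*% diag(s$d)

# Loadings (directions) are the columns of V.
loadings <- s$v

# Standard deviations of the components.
sdev_prcomp   <- s$d / sqrt(n - 1)   # prcomp divides by n - 1
sdev_princomp <- s$d / sqrt(n)       # princomp divides by n

p  <- prcomp(x)
pc <- princomp(x)

# Compare with the built-in functions (signs may differ, hence abs()).
print(all.equal(abs(scores), abs(unname(p$x))))
print(all.equal(abs(loadings), abs(unname(p$rotation))))
print(all.equal(sdev_prcomp, unname(p$sdev)))
print(all.equal(sdev_princomp, unname(pc$sdev)))
```

The different divisors (n - 1 versus n) are why the standard deviations reported by prcomp and princomp differ slightly on the same data.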

Related

Area under ROC in R

Is there a way of calculating or estimating the area under the curve as an external metric, using base R, from confusion matrices alone?
If not, how would I do it, given the clustering object?
e.g. we can start from
cutree(hclust(dist(iris[,1:4]), method="average"), 3)
or, from a diagonal-maximized version of
table(iris$Species, cutree(hclust(dist(iris[,1:4]), method="average"), 3))
the latter being the confusion matrix. I would much, much prefer a solution that goes from the confusion matrix, but if that is impossible we can use the clustering object itself.
I read the comments here: Calculate AUC in R? -- the top solution looks good, but it's unclear to me how to generalise it for multi-class data like iris.
(No packages, obviously, I want to find out how to do it by hand in base R)

Which methods can I use to calculate correlation among words in quanteda?

My question is a continuation of this.
After cleaning my text data and visualizing it using a wordcloud, I want to see which words are correlated with each other. Here comes the problem:
quanteda has the function textstat_simil, but its documentation speaks of similarity. So, are "similarity" and "correlation" the same thing in this case? (Is distance also related?)
Moreover, my dfm looks like a binary matrix. In this case, is the phi correlation (from the chi-squared statistic) more appropriate? Can I calculate this with quanteda?
Do you have any material other than the source code on GitHub that explains in more detail the methods used to calculate similarity or distance measures? (I couldn't understand it from this code, sorry.)
Thanks for your patience!
To compute Pearson’s product-moment correlations among features, you would use:
textstat_simil(x, method = "correlation", margin = "features")
The documentation makes this pretty clear, and the correlation method is the default.
Pearson’s correlation would not be the most appropriate for binary data, and we currently do not implement Spearman’s or other correlation methods more appropriate for categorical or ordinal data. However you can always coerce the dfm to an ordinary matrix (use as.matrix()) and then use the stats::cor() methods, which include Spearman’s.
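The coercion route can be sketched in base R as follows. The small count matrix below is a made-up stand-in for what as.matrix() would return for a real dfm (documents in rows, features in columns); the object and feature names are hypothetical.

```r
# Hypothetical stand-in for as.matrix(my_dfm): 5 documents x 4 features.
m <- matrix(
  c(2, 0, 1, 0,
    0, 1, 0, 3,
    1, 1, 0, 0,
    0, 2, 1, 1,
    3, 0, 2, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    docs     = paste0("doc", 1:5),
    features = c("alpha", "beta", "gamma", "delta")
  )
)

# cor() works column-wise, so this gives feature-by-feature correlations.
spearman <- cor(m, method = "spearman")
print(round(spearman, 2))
```

With a real dfm, m would be replaced by as.matrix() applied to the dfm; for a large vocabulary the dense matrix can get big, so it may be worth trimming features first.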
As for the last question, we use the standard implementation of these measures. If you want more clarity on what they mean, I suggest asking on Cross-Validated.

r corrplot with clustering: default dissimilarity measure for correlation matrix

I used the R package corrplot to visualize the correlation matrix of my data. I enabled clustering of the variables using the built-in option order="hclust".
The invocation of the command was like this (plus various arrangements of titles, axes etc):
corrplot(Rbas,type="upper",order="hclust",method="ellipse")
But now I am running further analyses and visualizations with other packages, and the question of whether the results are compatible has come up. In particular, I have to repeat the clustering of the correlation matrix manually. The corrplot documentation leaves one point obscure: which dissimilarity measure does corrplot use behind its defaults? Is it 1-|corr|, sqrt(1-corr^2), or something else? The literature offers multiple choices, for example, as described in this article
Update to answer my own question. I tried a guess, using a dissimilarity measure of the form 1-corr. That is, I coded (Rbas is the correlation matrix):
dissim1<-1-Rbas
dist1<-as.dist(dissim1)
plot(hclust(dist1))
and recovered an ordering of the variables that coincides with the one produced by corrplot with order="hclust". But it is not clear whether this is indeed the mechanism it uses, and whether this will hold for any other matrix.
The function used by corrplot to reorder variables is corrMatOrder (try ?corrMatOrder). It returns a single permutation vector.
When order= "hclust" is selected in corrplot, corrMatOrder invokes the corrplot:::reorder_using_hclust function:
function (corr, hclust.method)
{
    hc <- hclust(as.dist(1 - corr), method = hclust.method)
    order.dendrogram(as.dendrogram(hc))
}
This function uses 1-corr as dissimilarity measure.
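The same ordering can be reproduced without corrplot, which is useful when the result has to be shared with other packages. The sketch below uses only base R; cor(mtcars) is an assumed stand-in for Rbas, and "complete" is assumed to be the linkage corrplot uses when hclust.method is not set, so check that against your installed version.

```r
# Assumption: cor(mtcars) stands in for the asker's correlation matrix Rbas.
Rbas <- cor(mtcars)

# Same steps as the function body quoted above, with 1 - corr as dissimilarity.
# Assumption: "complete" linkage matches corrplot's default hclust.method.
hc  <- hclust(as.dist(1 - Rbas), method = "complete")
ord <- order.dendrogram(as.dendrogram(hc))

# Reordered correlation matrix, ready to pass to other plotting code.
Rbas_ordered <- Rbas[ord, ord]
print(colnames(Rbas_ordered))
```

Note that 1 - corr treats strongly negative correlations as the most dissimilar, so variables that are strongly anti-correlated end up far apart rather than together.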

PLS coefficients with r

I'm building a PLS model using the packages "pls" and "ChemometricswithR". I'm able to fit the model, but I have a problem. I did a leave-one-out validation, and if I ask for the coefficients I can see only one equation (I suppose it is the average of all the equations developed in the leave-one-out validation).
Is there a way to see all the "n" equations (where n is the number of observations in my matrix) with all the slope coefficients?
This is the model I used: mod2<-plsr(SH_uve~matrix_uve,ncomp=11, data=dataset_uve, validation="LOO",jackknife = TRUE)
This would be easier to answer if you gave more information, such as how you are calling the functions. Based on what you said, I'm assuming you are using the functions crossval() and PCA() from the packages "pls" and "ChemometricswithR" respectively. I'm not familiar with these functions, but the documentation states for coefficients: "(only if jackknife is TRUE) an array with the jackknifed regression coefficients. The dimensions correspond to the predictors, responses, number of components, and segments, respectively". So I would say make sure jackknife=TRUE and that you are specifying the correct number of segments in crossval(). If you are using different functions, you should edit your question and add the relevant information.
OK, I found the solution.
The model I used is:
mod2<-plsr(SH_uve~matrix_uve,ncomp=11,data=dataset_uve,validation="LOO",jackknife = TRUE)
The coefficients are stored inside the mod2 object. I extracted them with the command:
coefficients<-mod2$validation$coefficients[,,11,]
and obtained the matrix of coefficients for all the equations used in the leave-one-out cross-validation.
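For readers without the original data, the same extraction can be sketched on the yarn example data that ships with the pls package. This is an illustration under assumptions, not the asker's model: the data set, the response density, the predictor block NIR and the choice of 3 components are all stand-ins. The code is skipped if pls is not installed.

```r
coef_dims <- NULL   # stays NULL if the pls package is unavailable

if (requireNamespace("pls", quietly = TRUE)) {
  library(pls)
  data(yarn, package = "pls")   # assumed stand-in for dataset_uve

  # Leave-one-out cross-validation with jackknifed coefficients kept.
  mod <- plsr(density ~ NIR, ncomp = 3, data = yarn,
              validation = "LOO", jackknife = TRUE)

  # Array: predictors x responses x components x segments.
  jack <- mod$validation$coefficients
  coef_dims <- dim(jack)
  print(coef_dims)

  # Coefficients of the 3-component model: one column per left-out observation.
  loo_coefs <- jack[, , 3, ]
  print(dim(loo_coefs))
}
```

Each column of loo_coefs corresponds to the model fitted with one observation left out, which is the set of "n" equations asked about.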

Generalized Eigen Values and Vectors in Eigen Library

How do I find generalized eigenvalues and eigenvectors using the Eigen3 library?
In Octave and MATLAB, the eigenvalue function has the form: [V, lambda] = eig (A, B).
I could only find this class in the Eigen3 library, but it was not helpful in validating the results from the MATLAB/Octave code above.
You'll want to use the EigenSolver class, which is located in the Eigen/Eigenvalues header. Either use the EigenSolver constructor that takes a matrix parameter, or call the compute method with a matrix, and it will solve for that matrix's eigenvalues and eigenvectors. Then you can use the eigenvalues() and eigenvectors() methods to retrieve the eigenvalues and eigenvectors.
This question is old. Anyway, if someone here is looking for it, they should consider the GeneralizedEigenSolver (http://eigen.tuxfamily.org/dox-devel/classEigen_1_1GeneralizedEigenSolver.html) that is available in the Eigen library. Although, at this time, as far as I know, it is not completely ready.
