How to down-project with PCA in R?
When I use the princomp function on my data, it creates as many principal components as there are dimensions in the original data. But how can I down-project? Say I have 10-dimensional data and I want to down-project it to 2 dimensions.
If you mean doing PCA and keeping just a few of the components (dimensions), then one way is to use principal in the psych package (using the argument nfactors).
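For illustration, a minimal sketch of both routes on simulated data: keep the first two columns of the scores returned by princomp, or ask psych::principal for two components directly (the exact defaults of principal may differ between psych versions, so treat that part as an assumption):
# simulated 10-dimensional data, 100 observations
set.seed(1)
X <- matrix(rnorm(100 * 10), nrow = 100)
# base R: princomp computes all components; keep the first two score columns
pc <- princomp(X)
X2 <- pc$scores[, 1:2]            # 100 x 2 down-projected data
# psych: request only two components directly
library(psych)
pc2 <- principal(X, nfactors = 2)
head(pc2$scores)                  # scores on the two retained components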
Related
I am attempting to use the sarorderedprobit function (within the "spatialprobit" package) to perform a SAR Ordered Probit estimation using panel data.
I have imported my spatial weight matrix (representing the 50 states of the US) using the following script:
library(spdep)   # read.gal, nb2listw (listw2mat may live in spatialreg in newer versions)
Weight_GAL  <- read.gal(File, override.id = TRUE)
Weight_List <- nb2listw(Weight_GAL, style = "W", zero.policy = TRUE)
W           <- listw2mat(Weight_List)
which successfully imports the 50x50 sparse matrix.
The following sarorderedprobit call is run:
sarorderedprobit(formula, W=W, showProgress=TRUE)
When using cross-sectional data with 50 observations, the script successfully estimates the sarorderedprobit model. However, when panel data is used with 3 years (i.e., 150 observations), the script returns the following error:
Error: Matrices must have same dimensions in .Arith.Csparse(e1, e2, .Generic, class. = "dgCMatrix")
The issue here seems to be related to the use of a 50x50 weight matrix with 150 observations. Unfortunately, I have not found any references to using the sarorderedprobit function with panel data. Can anyone provide guidance on whether the sarorderedprobit function supports estimation with panel or time-series datasets?
EDIT:
I have calculated the Kronecker product of a sparse identity matrix and W to prepare a 150x150 weight matrix, using the following script:
library(Matrix)                   # Diagonal(), sparse kronecker()
tperiods <- 3                     # number of time periods
t_diag   <- Diagonal(tperiods)    # 3x3 sparse identity matrix
bigWmat  <- kronecker(t_diag, W)  # 150x150 block-diagonal weight matrix
Running the sarorderedprobit function with the bigWmat matrix succeeds with no errors. However, I am concerned that this does not correctly handle the temporal nature of the panel data estimation. Do I need to add dummy variables for the time periods (t=1, t=2, t=3 for the 3 years of panel data)?
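Something like the following sketch, with placeholder variable names (y, x1, x2 stand in for my actual formula terms), is what I have in mind:
# period factor matching the stacked 150-observation panel
# (assumes observations are stacked period by period, matching kronecker(t_diag, W))
year <- factor(rep(1:tperiods, each = 50))
# include the period dummies alongside the other regressors
formula <- y ~ x1 + x2 + year
sarorderedprobit(formula, W = bigWmat, showProgress = TRUE)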
I want to perform a k-means analysis in R. For that I need numeric data. I tried the following:
unlist(pca)
as.numeric(pca)
lapply(pca,as.numeric(pca))
pca is just "normal" principal component analysis data, shown in a plot (with the fviz_pca_ind() function).
By the way, when I try to run the k-means analysis, it gives me "list object cannot be coerced to type double". That is why I thought of converting everything to numeric.
How can I convert the PCA data to numeric?
Thank you ;)
You're almost correct
lapply(pca,as.numeric)
as.numeric is a function and therefore an object. You need to pass the function itself to lapply(), without calling it.
Most PCA functions return a list, and you should show which package or function you used to perform the PCA so we can see what's in that list.
For example, if you use prcomp, it returns a list containing the eigenvectors / loadings ($rotation) and the principal components ($x). I suppose you are trying to run k-means on the principal components, and you can do it like this:
# perform pca
pca = prcomp(USArrests,scale=TRUE)
# we call out the PCs using pca$x
# and kmeans
kmeans_clus = kmeans(pca$x,3)
## plot
# define colors
COLS = c("#65587f","#f18867","#e85f99")
plot(pca$x[,1:2],col=COLS[kmeans_clus$cluster],pch=20)
legend("topright",fill=COLS,legend=1:3,horiz=TRUE)
My question is about functional principal component analysis in R.
I am working with a multi-dimensional time series looking something like this:
My goal is to reduce the dimensions by applying functional PCA and then plot the first principal component like this:
I have already used the FPCA function of the fdapace package on the dataset. Unfortunately, I don't understand how to interpret the resulting matrix of FPCA estimates (xiEst).
In my understanding, the values of the principal components are stored in the columns of the matrix.
Unfortunately, the number of columns doesn't match the number of time intervals of my multi-dimensional time series.
I don't know how the values in the matrix correspond to the values of the original data, or how to plot the first principal component as a dimensional reduction of the original data.
If you need some code to reproduce the situation, you can use the medfly dataset from the package:
library(fdapace)
data(medfly25)
Flies <- MakeFPCAInputs(medfly25$ID, medfly25$Days, medfly25$nEggs)
fpcaObjFlies <- FPCA(Flies$Ly, Flies$Lt)
When I plot the first principal component via
plot(fpcaObjFlies$xiEst[,1], type = "o")
the graph doesn't really fit my expectations:
I would have expected a graph with 25 observations similar to the graphs of the medfly dataset.
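For reference, this is roughly how I understand the pieces fit together (a sketch; the field names $mu, $phi and $workGrid are taken from the fdapace documentation), though I'm not sure this is the intended way: each row of xiEst holds one fly's scores, so a rank-1 reconstruction is the mean function plus the first score times the first eigenfunction.
# rank-1 reconstruction: mu(t) + xi_i1 * phi_1(t) for each fly i
recon1 <- sweep(outer(fpcaObjFlies$xiEst[, 1], fpcaObjFlies$phi[, 1]),
                2, fpcaObjFlies$mu, "+")
# one reconstructed trajectory per fly, evaluated on the working grid
matplot(fpcaObjFlies$workGrid, t(recon1), type = "l",
        xlab = "Days", ylab = "nEggs (rank-1 reconstruction)")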
Looking at the parameters to the Rtsne function:
https://cran.r-project.org/web/packages/Rtsne/Rtsne.pdf
There is a parameter called "pca" defined as "logical; Whether an initial PCA step should be performed (default: TRUE)"
Let's say you have a 10 dimensional feature set and you run TSNE. I was thinking you would scale the 10-D matrix and then pass it to Rtsne().
What does the PCA step indicated by the pca parameter do?
Would it take the 10-D matrix and run PCA on that? If so, would it pass all 10 dimensions in the PCA space to Rtsne?
Is there any info anywhere else about what this initial PCA step is?
Thank you.
The original t-SNE paper used PCA to reduce the dimensionality of the MNIST data prior to running t-SNE.
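Roughly, pca = TRUE just runs that PCA step internally before t-SNE; with a 10-D input and the default initial_dims = 50 it keeps all 10 components, so it is close to a no-op. A sketch of the two (roughly equivalent) routes, with argument names taken from the Rtsne documentation:
library(Rtsne)
set.seed(42)
X <- scale(matrix(rnorm(500 * 10), ncol = 10))   # toy 10-D feature set
# let Rtsne run the initial PCA internally (initial_dims = 50 by default
# caps the number of PCs kept; here all 10 survive)
out1 <- Rtsne(X, dims = 2, pca = TRUE, perplexity = 30)
# roughly equivalent manual route: compute the PCs yourself, skip internal PCA
pcs  <- prcomp(X)$x
out2 <- Rtsne(pcs, dims = 2, pca = FALSE, perplexity = 30)
plot(out1$Y, pch = 20)                           # 2-D embedding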
I have performed PCA in R (using the prcomp() function) on returns data of 5 variables.
This is my code:
pca1 = prcomp(~df.var1+df.var2+df.var3+df.var4+df.var5, data = ccy)
I would like to move to the interpretation stage... The "rotation" matrix in the "pca1" object comprises, to my understanding, the coefficients assigned to each of the original 5 variables in the equation describing each principal component (PC). This link suggests calculating the correlations between the PCs and each of the variables. Is the "x" object within the "pca1" object (accessed using:
pcs = pca1$x
) a matrix of the observations' values on the PCs (the scores)? If I calculated correlations between these values and the original variables, would that represent the correlation between the PCs and the variables? Is there perhaps a "built-in" method for this?
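To make the question concrete, this is the sort of calculation I have in mind (a sketch reusing the names from my code above; column names follow the formula):
# scores of the observations on each principal component
pcs <- pca1$x
# correlation of each original variable with each PC
# (assumes no rows were dropped for missing values by the formula interface)
vars <- ccy[, c("df.var1", "df.var2", "df.var3", "df.var4", "df.var5")]
cor(vars, pcs)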