My question is about functional principal component analysis in R.
I am working with a multi-dimensional time series looking something like this:
My goal is to reduce the dimensions by applying functional PCA and then plot the first principal component like this:
I have already used the FPCA function of the fdapace package on the dataset. Unfortunately, I don't understand how to interpret the resulting matrix of the FPCA estimates (xiEst).
In my understanding, the values of the principal components are stored in the columns of the matrix.
Unfortunately, the number of columns doesn't match the number of time intervals of my multi-dimensional time series.
I don't know how the values in the matrix correspond to the values of the original data, or how to plot the first principal component as a dimensional reduction of the original data.
If you need some code to reproduce the situation you can use the medfly dataset of the package:
library(fdapace)
data(medfly25)
Flies <- MakeFPCAInputs(medfly25$ID, medfly25$Days, medfly25$nEggs)
fpcaObjFlies <- FPCA(Flies$Ly, Flies$Lt)
When I plot the first principal component via
plot(fpcaObjFlies$xiEst[,1], type = "o")
the graph doesn't really fit my expectations:
I would have expected a graph with 25 observations similar to the graphs of the medfly dataset.
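For reference, here is a minimal sketch of how I read the FPCA output, based on the documented fields of the returned object (xiEst, phi, workGrid, mu); the exact field names and shapes may differ between fdapace versions, so treat this as an assumption rather than a definitive answer:

# Continuing from the snippet above (fpcaObjFlies).
dim(fpcaObjFlies$xiEst)    # subjects x components: one score per fly per component
dim(fpcaObjFlies$phi)      # grid points x components: eigenfunctions on workGrid

# The first eigenfunction over the observation grid (roughly the 25 days):
plot(fpcaObjFlies$workGrid, fpcaObjFlies$phi[, 1], type = "l",
     xlab = "Days", ylab = "First eigenfunction")

# Rank-1 reconstruction of every fly: mean function + score * first eigenfunction.
recon1 <- sweep(fpcaObjFlies$xiEst[, 1] %o% fpcaObjFlies$phi[, 1], 2,
                fpcaObjFlies$mu, "+")
matplot(fpcaObjFlies$workGrid, t(recon1), type = "l", lty = 1,
        xlab = "Days", ylab = "nEggs (rank-1 reconstruction)")

In this reading, xiEst has one row per fly and one column per retained component (the scores), which is why its number of columns does not match the 25 time intervals; the time dimension lives in phi and workGrid.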
I am attempting to use the sarorderedprobit function (within the "spatialprobit" package) to perform a SAR Ordered Probit estimation using panel data.
I have imported my spatial weight matrix (representing the 50 states of the US) using the following script:
library(spdep)  # read.gal(), nb2listw(), listw2mat()
Weight_GAL <- read.gal(File, override.id = TRUE)
Weight_List <- nb2listw(Weight_GAL, style = "W", zero.policy = TRUE)
W <- listw2mat(Weight_List)
which successfully imports the 50x50 sparse matrix.
The following sarorderedprobit is run:
sarorderedprobit(formula, W=W, showProgress=TRUE)
When using cross-sectional data with 50 observations, the script successfully estimates the sarorderedprobit model. However, when panel data is used with 3 years (i.e., 150 observations), the script returns the following error:
Error: Matrices must have same dimensions in .Arith.Csparse(e1, e2, .Generic, class. = "dgCMatrix")
The issue here seems to be related to the use of a 50x50 weight matrix with 150 observations. Unfortunately, I have not found any references to using the sarorderedprobit function with panel data. Can anyone provide guidance on whether the sarorderedprobit function supports estimation with panel or time-series datasets?
EDIT:
I have calculated the Kronecker product using a sparse matrix to prepare a 150x150 weight matrix using the following script:
library(Matrix)                  # Diagonal() and sparse kronecker()
tperiods <- 3                    # number of time periods (years)
t_diag <- Diagonal(tperiods)     # 3x3 sparse identity matrix
bigWmat <- kronecker(t_diag, W)  # block-diagonal 150x150 weight matrix
Running the sarorderedprobit function using the bigWmat matrix is successful with no errors. However, I am concerned that this is not correctly handling the temporal nature of the panel data estimation. Do I need to add response dummy variables with the time periods (t=1, t=2, t=3 for the 3 years of panel data)?
I use prcomp to run PCA in R. When I output the summary (standard deviation, proportion of variance, cumulative proportion), the results are always ordered and the actual column names are replaced by PC1, PC2, and so on. Thus, I cannot tell the exact proportion of variance for each column.
Can anyone show me, or give me a hint on, how to display the column names when outputting the summary results? Two result pics are attached here:
It is not clear that you understand what principal components analysis does. It reduces the dimensionality of the data. Assuming the rows are observations and the columns are variables, imagine plotting your rows in 35 dimensions (the columns). Most people have trouble visualizing more than 3 dimensions. Principal components creates a smaller set of axes that explains most of the variation in the data. The axes are orthogonal, meaning they are at right angles to one another. Your plot and the results of the summary(res.pca5) and plot(res.pca5) functions show that the first dimension explains 28% of the variation in the 35 variables. Adding a second dimension gives you almost 38%, and three gives you 44%. These new variables are combinations of your original variables, not the original variables themselves. The first two components explain more of the variability than any other pair of combinations.
For some reason you did not try res.pca5 as a command (or the equivalent print(res.pca5)), which would show you the coefficients that PCA used to create the components from the original variables, or biplot(res.pca5), which plots the rows and columns in the new two-dimensional space.
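For illustration, here is a small example on a built-in dataset; mtcars simply stands in for the asker's data, and res.pca5 is reused only as an object name:

# Hypothetical example using mtcars in place of the asker's data frame.
res.pca5 <- prcomp(mtcars, scale. = TRUE)

summary(res.pca5)   # proportion of variance explained by PC1, PC2, ...
print(res.pca5)     # the rotation (loading) matrix: how each original
                    # variable contributes to each principal component
biplot(res.pca5)    # rows and columns plotted in the first two components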
I have performed PCA in R (using the prcomp() function) on returns data of 5 variables.
This is my code:
pca1 = prcomp(~df.var1+df.var2+df.var3+df.var4+df.var5, data = ccy)
I would like to move on to the interpretation stage. The "rotation" matrix in the "pca1" object contains, to my understanding, the coefficients assigned to each of the original 5 variables in the linear combination describing each principal component (PC). This link suggests calculating the correlations between the PCs and each of the variables. Is the "x" object within the "pca1" object (accessed using:
pcs = pca1$x
) a matrix of values for the PCs? If I calculated correlations between these values and the original variables, would that represent the correlation between the PCs and the variables? Is there perhaps a "built-in" method for this?
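As a rough sketch of the correlation approach, assuming the five return columns really are named df.var1 through df.var5 inside ccy and that no rows were dropped for missing values (both assumptions on my part):

# Sketch only: column names and the data frame ccy are taken from the question above.
pca1 <- prcomp(~ df.var1 + df.var2 + df.var3 + df.var4 + df.var5, data = ccy)

pcs  <- pca1$x   # scores: one row per observation, one column per PC
vars <- ccy[, c("df.var1", "df.var2", "df.var3", "df.var4", "df.var5")]

cor(vars, pcs)   # correlations between the original variables and the PCs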
How to down-project with PCA in R?
When I use the princomp function on my data, it creates as many principal components as there are dimensions in the original data. But how can I down-project, let's say if I have 10-dimensional data and I want to down-project to 2 dimensions?
If you mean doing PCA and keeping just a few of the components (dimensions), then one way is to use principal in the psych package (using the nfactors argument).
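A minimal sketch of both routes, using a made-up 10-column matrix in place of the real data (psych::principal returns component scores by default when given raw data, as far as I know):

# Hypothetical 10-dimensional data set standing in for the asker's data.
set.seed(1)
X <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)

# Base R: keep only the first two score columns to get a 2-D projection.
p <- prcomp(X, scale. = TRUE)
X2 <- p$x[, 1:2]          # 100 x 2 down-projected data

# psych: ask for two components directly via nfactors.
library(psych)
fit <- principal(X, nfactors = 2, rotate = "none")
X2b <- fit$scores         # also 100 x 2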
I have a file containing 2,500 random numbers. Is it possible to rearrange these saved numbers in such a way that a specific autocorrelation is created? Let's say, an autocorrelation at lag 1 of 0.2, an autocorrelation at lag 2 of 0.4, and so on.
Any help is greatly appreciated!
To be more specific:
The time series of a daily return in percent of an asset has the following characteristics that I am trying to recreate:
Leptokurtic, symmetric distribution, let's say centered at a daily return of zero
No significant autocorrelations (because the sign of a daily return is not predictable)
Significant autocorrelations if the time series is squared
The aim is to produce a random time series which satisfies all these three characteristics. The only two inputs should be the leptokurtic distribution (this I have already created) and the specific autocorrelation of the squared resulting time series (e.g. the final squared time series should have an autocorrelation at lag 1 of 0.2).
I only know how to produce random numbers out of my own mixed distribution. Naturally, if I squared this resulting time series, there would be no autocorrelation. I would like to find a way that takes this into account.
Generally the most straightforward way to create autocorrelated data is to generate the data so that it is autocorrelated. For example, you could create an autocorrelated path by always using the value at time p-1 as the mean for the random draw at time period p.
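A minimal sketch of that idea (purely illustrative values, not the asker's distribution):

# Draw each value with the previous value as its mean; this is essentially a
# random-walk / AR(1)-style construction, shown only to illustrate the idea above.
set.seed(42)
n <- 2500
x <- numeric(n)
x[1] <- rnorm(1)
for (p in 2:n) {
  x[p] <- rnorm(1, mean = x[p - 1], sd = 1)   # mean is the previous value
}
acf(x, lag.max = 5)   # strongly autocorrelated by construction
# Using mean = 0.2 * x[p - 1] instead would give an AR(1) series whose
# lag-1 autocorrelation is approximately 0.2.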
Rearranging is not only hard, but sort of odd conceptually. What are you really trying to do in the end? Giving some context might allow better answers.
There are functions for simulating autocorrelated data: arima.sim() from the stats package and simulate.Arima() from the forecast package.
simulate.Arima() has the advantages that (1) it can simulate seasonal ARIMA models (sometimes called "SARIMA") and (2) it can simulate a continuation of an existing time series to which you have already fit an ARIMA model. To use simulate.Arima(), you do need to already have an Arima object.
UPDATE:
type ?arima.sim then scroll down to "examples".
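For example, with arbitrary AR/MA coefficients that are not tuned to the asker's targets:

# Arbitrary example coefficients, purely for illustration.
set.seed(1)
sim <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 200)
acf(sim)    # shows the autocorrelation induced by the AR/MA structure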
Alternatively:
install.packages("forecast")
library(forecast)
fit <- auto.arima(USAccDeaths)
plot(USAccDeaths,xlim=c(1973,1982))
lines(simulate(fit, 36),col="red")