I am doing some modelling work in which I am trying to parametrise an effect that varies with season and time of day. The time of day effect differs between seasons in a complex way so it seems the most general approach is to model the effect in a periodic [time of day, day of year] space.
The effect being described has a non-linear relationship to the actual predictor and predicted quantities, so I need an explicit parametrisation that I can tune using non-linear optimisation.
So, the most obvious option would be a 2D Fourier basis. Can anyone recommend an R package for generating this? I found the package fda which has the function 'create.fourier.basis' but this appears to only apply to 1D.
Beyond a Fourier approach, the sampling of the data I have is highly irregular in the [time of day, day of year] plane, so ideally I would prefer a more localised approach, such as a periodic cubic spline in which I can place more knots in the data-rich parts of the plane. Does anyone know of an R package that creates a 2D basis for this kind of representation?
The mgcv package can create tensor product bases from two or more marginal bases. It also provides cyclic cubic regression splines (bs = "cc") and cyclic P-splines (bs = "cp"), either of which can serve as the marginal basis for the periodic variables you mention.
As mgcv comes with R I would start with that. Look at ?te and ?smooth.terms for starters.
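As a minimal sketch of the tensor product idea, the block below builds an explicit periodic 2D basis with smoothCon() and evaluates it at new points with PredictMat(). The simulated data, the variable names tod and doy, the periods (24 hours, 365 days) and the basis sizes are all assumptions chosen for illustration, not something taken from the question.

```r
library(mgcv)

set.seed(1)
n <- 400
# Assumed covariates: time of day in hours and day of year
dat <- data.frame(tod = runif(n, 0, 24), doy = runif(n, 0, 365))

# End points at which each cyclic margin wraps around
knots <- list(tod = c(0, 24), doy = c(0, 365))

# Tensor product of two cyclic cubic regression spline bases
sm <- smoothCon(te(tod, doy, bs = c("cc", "cc"), k = c(6, 8)),
                data = dat, knots = knots)[[1]]

B <- sm$X  # basis matrix: one row per observation, one column per basis function

# Evaluate the same basis at new points, here on both sides of the daily wrap
edge <- data.frame(tod = c(0, 24), doy = c(100, 100))
B_edge <- PredictMat(sm, edge)
```

The columns of B can then be multiplied by a coefficient vector inside a non-linear optimiser. Full knot vectors can be passed instead of just the end points if more knots are wanted in data-rich regions.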
The fda package can handle multivariate functional data. Have a look at, e.g.,
?fd
The help for fd states that supplying a three-dimensional array of coefficients, together with your basis object, gives you a multivariate functional data object. In their book, Ramsay, Hooker and Graves (2009) use multivariate functional data objects to capture handwriting data with a 2D definition of the pen location plus the time dimension.
Maybe I am wrong, but couldn't you just apply the same framework for your data which is defined over season, daytime, and effect?
I want to conduct a convergence study for my Dymos optimization results where I vary the number of nodes and compare the simulated solution to the optimization solution. From what I understand, Dymos fits polynomials to the system dynamics to represent the timeseries solution. What is the best way to compare the polynomial trajectory of the optimization solution to the trajectory of the simulated solution? I specifically want to evaluate the difference between the two trajectories away from the collocation/control nodes... to show that the polynomial fitting actually represents the simulated solution. How would I access the polynomial fitting data?
Thanks in advance.
For some of the testing we have an assert_timeseries_near_equal function that treats the more dense time series as the truth and tests that the less dense timeseries (usually the discrete solution) is reasonably close to it.
We're actually working on making this method a bit more explicit right now so that it's a little easier for users to apply in general cases, such as comparing discrete solutions from two different cases.
In general, there are a few different ways you can test your implicit (collocation) results against an explicit integration. You could just verify that the final states of the two solutions are reasonably close. Since the error tends to increase over the course of the trajectory, this is often good enough for a quick check. The downside of this approach is that it doesn't test that both solutions took the same path to the final condition.
To test the solution away from the nodes I'd recommend the following: add a second timeseries output to the relevant phase that contains more segments or higher-order segments. This timeseries will have more nodes. Dymos will interpolate from the solution's collocation grid onto this denser timeseries output grid. When you compare this against the explicit simulation, the times, controls, and parameters should still match exactly, but you'll better capture the difference between the interpolating state polynomials and the explicitly simulated results.
There are other statistical methods out there for comparing timeseries that you can bring to bear, but visualizing the explicit trajectory plus some error bound as a "tube" into which we want to fit the discrete solution is usually how I handle it.
I am trying to use FPCA on my time series and I know that I should do some smoothing before using FPCA. However, I don't know what smoothing method is good?
Any resource is much appreciated!
Thanks!
Smoothing will depend on the data you have. In the FDA basis-expansion approach (Ramsay & Silverman, 2005), the choice of basis functions is yours: in general, it is common to use the "fourier" basis for periodic data and the "bspline" basis for non-recurrent data. B-splines have very good local behaviour.
You can find more info about the implementation of different basis functions in "Functional data analysis with R and MATLAB" (Ramsay et al. 2009)
There is no specific rule for choosing the dimension of the basis, as it depends on several factors. I strongly recommend studying the least-squares error across a range of candidate dimensions and then choosing the dimension from the region that offers a convenient trade-off. Some packages implement functions for this, e.g. fda.usc::min.basis() (best minimum number of basis functions), and selection by cross-validation is also available, e.g. fda.usc::CV.S().
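A minimal sketch of scanning the basis dimension, using only the fda package: it fits an unpenalised B-spline expansion for each candidate number of basis functions and compares the generalised cross-validation (GCV) score returned by smooth.basis(). The simulated curve, the candidate range 4:20 and the use of GCV as the criterion are assumptions for illustration.

```r
library(fda)

set.seed(42)
# Assumed toy data: one noisy curve observed at 101 equally spaced points
tt <- seq(0, 1, length.out = 101)
y  <- sin(2 * pi * tt) + rnorm(length(tt), sd = 0.1)

# Candidate numbers of basis functions (a cubic B-spline needs at least 4)
ks <- 4:20

# GCV score of the least-squares fit for each candidate dimension
gcv <- sapply(ks, function(k) {
  basis <- create.bspline.basis(rangeval = c(0, 1), nbasis = k, norder = 4)
  sum(smooth.basis(argvals = tt, y = y, fdParobj = basis)$gcv)
})

best_k <- ks[which.min(gcv)]
```

Plotting gcv against ks usually makes it easier to see where the curve flattens than relying on the single minimiser.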
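P-splines tend to give low approximation errors, are computationally easy to implement, and are quite insensitive to the choice of knots. You can try to smooth your functional object like this:
library(fda)
# Assumes df is a numeric matrix (rows = time points, columns = curves) and k is the number of basis functions
argvals <- seq(0, 1, length.out = nrow(df))
basis <- create.bspline.basis(rangeval = range(argvals), nbasis = k, norder = 4)
fdobj <- smooth.basis(argvals, as.matrix(df), basis)$fd  # initial least-squares fit
fdsmooth <- smooth.fdPar(fdobj, Lfdobj = NULL, lambda = 0,
                         estimate = TRUE, penmat = NULL)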
Functions in spatstat are mainly made for 2-3-dim data analysis. Is there a good possibility to apply them to one-dim data?
There is huge capability for class ppp in 2-dim.
There is a very general class ppx for arbitrary dimensions - but this is the problem - only very few functions are available.
Can I take a sledgehammer to crack a nut in inflating one-dim data to two-dim one and in the end projecting back to one-dim?
Or should I better rewrite functions for one-dim (rpoispp, rmpoispp, ...)?
It all depends on which analysis you are doing. In general I would not recommend inflating one-dim data to two-dim data.
As you state the class ppx is general, but doesn't have many functions implemented for it yet. If you just need to simulate an unmarked Poisson point process in 1-dim you can use rpoisppx.
To have more functions available one solution could be to represent your data as a point pattern on a linear network (class lpp). Here is a crude example representing a section of 1-dim space as a line in the unit square and simulating a Poisson process on the line with intensity 10:
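library(spatstat)
X <- ppp(x=c(0,1), y=c(.5,.5), window=square(1))  # the two end points of the line
L <- linnet(X, edges=matrix(1:2,1,2))             # network with a single edge joining them
Y <- rpoislpp(10, L)                              # Poisson process on the line, intensity 10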
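If all you need is simulation, a homogeneous Poisson process on an interval is also short to write directly in base R. The sketch below is only an illustration of that option: the function name rpois1d and its arguments are hypothetical and not part of spatstat.

```r
# Hypothetical helper: homogeneous Poisson process on the interval [from, to]
rpois1d <- function(lambda, from = 0, to = 1) {
  # The number of points is Poisson with mean lambda * (interval length)
  n <- rpois(1, lambda * (to - from))
  # Given the count, the points are independent and uniform on the interval
  sort(runif(n, min = from, max = to))
}

set.seed(1)
pts <- rpois1d(10)  # intensity 10 on [0, 1]
```

This gives a plain numeric vector, so it does not plug into the spatstat summary functions the way an lpp object does.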
I'm using the fourier() and fourierf() functions in Rob Hyndman's excellent forecast package in R. To verify whether the same terms are selected and used in fourier() and fourierf(), I plotted a few of the output terms.
Below is the original data using ts.plot(data). There's a frequency of 364 in the time series, FYI.
Below is the plot of the terms using fourier(data,3). Basically, it looks like mirror images of the existing data.
Looking at just the sin1 term of the output, again, we get some variation that shows similar 364-day seasonality in line with the data above.
However, when I plot the results of the Fourier forecast using fourierf(data,3, 410) I see the below data. It appears far more smooth than the terms provided by the original fourier function.
So, I wonder how the results of fourier() and fourierf() are related. Is it possible to just see one consolidated Fourier result, so that you can see the sin or cosine result moving through existing data and then through the forecasting period? If not, how can I confirm that the terms created by fourierf() fit the in-sample data?
I want to use it in an auto.arima or glm function with other external regressors like this:
trainFourier<-fourier(data,3)
trainFourier<-as.data.frame(trainFourier)
trainFourier$exogenous<-exogenousData
arima.object<-auto.arima(data, xreg=trainFourier)
futureFourier<-fourierf(data,3, 410)
fourierForecast<-forecast(arima.object, xreg=futureFourier, h=410)
and I want to be completely sure that auto.arima is fitted (using the terms from fourier()) consistently with what I'll pass as xreg to forecast (which has terms from a different function, i.e. fourierf()).
Figured out the problem. I was using both the fda and forecast packages. fda, which is for functional data analysis and regression, has its own fourier() function. If I detach fda, my S1 term from fourier(data,3) looks like this:
which lines up nicely with the Fourier forecast if I use ts.plot(c(trainFourier$S1,futureFourier$S1))
Moral of the story -- watch what your packages mask, folks!
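One way to sidestep masking is to call the function through its namespace. The sketch below assumes a recent version of forecast, in which fourier() takes an h argument for the future terms (superseding fourierf()); the monthly toy series and K = 3 are assumptions for illustration. It checks that the in-sample and future sine terms join into one continuous sinusoid.

```r
library(forecast)

set.seed(1)
# Assumed toy series: 10 years of monthly data
y <- ts(rnorm(120), frequency = 12)

# Namespace-qualified calls cannot be masked by another package's fourier()
train_terms  <- forecast::fourier(y, K = 3)          # in-sample terms
future_terms <- forecast::fourier(y, K = 3, h = 24)  # terms for the next 24 periods

# The first column is the first sine harmonic; join in-sample and future values
s1 <- c(train_terms[, 1], future_terms[, 1])

# A single sinusoid with period 12 running through both ranges
expected <- sin(2 * pi * seq_along(s1) / 12)
max_diff <- max(abs(s1 - expected))
```

Running conflicts(detail = TRUE) after loading your packages lists which names are defined in more than one attached package.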
We have to fit about 2000-odd time series every month.
They have very idiosyncratic behaviour: some are ARMA/ARIMA, some are EWMA, some are ARCH/GARCH, with or without seasonality and/or trend (the only thing in common is the time series aspect).
One can in theory build an ensemble model and use the AIC or BIC to choose the best-fitting model, but is the community aware of any library which attempts to solve this problem?
Google made me aware of the below one by Rob J Hyndman
link
but are there any other alternatives?
There are two automatic methods in the forecast package: auto.arima() which will handle automatic modelling using ARIMA models, and ets() which will automatically select the best model from the exponential smoothing family (including trend and seasonality where appropriate). The AIC is used in both cases for model selection. Neither handles ARCH/GARCH models though. The package is described in some detail in this JSS article: http://www.jstatsoft.org/v27/i03
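A minimal sketch of applying both automatic methods across a collection of series and keeping whichever forecasts a holdout better. The toy series, the 12-step holdout, the mean absolute error criterion and the helper name pick_model are assumptions for illustration. A holdout comparison is used here because likelihood-based criteria are not directly comparable between the ARIMA and exponential smoothing classes.

```r
library(forecast)

set.seed(1)
# Assumed toy collection: in practice this would hold the ~2000 series
series <- list(
  a = ts(cumsum(rnorm(72)), frequency = 12),
  b = ts(10 + rnorm(72), frequency = 12)
)

# Hypothetical helper: fit both model classes on a training window and
# return the name of the one with the smaller holdout MAE
pick_model <- function(y, h = 12) {
  n <- length(y)
  train <- subset(y, end = n - h)
  test  <- subset(y, start = n - h + 1)
  fits <- list(arima = auto.arima(train), ets = ets(train))
  mae <- sapply(fits, function(f) mean(abs(forecast(f, h = h)$mean - test)))
  names(which.min(mae))
}

chosen <- sapply(series, pick_model)
```

The chosen model class would then normally be refitted on the full series before producing the real forecasts.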
Further to your question:
"When will it be possible to use forecast package functions, especially ets function, with high dimensional data (weekly data, for example)?"
Probably early next year. The paper is written (see robjhyndman.com/working-papers/complex-seasonality) and we are working on the code now.
Thanks useRs. I have tried the forecast package, including as a composite of ARIMA and ets, but without much success according to AIC or BIC (SBC). So I am now tempted to fit each time series with its own SVM (support vector machine), because of its better generalization ability and because it lets me add other variables apart from lags, as well as non-linear kernel functions.
Any premonitions?