fourier() vs fourierf() function in R

I'm using the fourier() and fourierf() functions in Rob Hyndman's excellent forecast package in R. To verify whether the same terms are selected and used in fourier() and fourierf(), I plotted a few of the output terms.
Below is the original data using ts.plot(data). There's a frequency of 364 in the time series, FYI.
Below is the plot of the terms using fourier(data,3). Basically, it looks like mirror images of the existing data.
Looking at just the sin1 term of the output, again, we get some variation that shows similar 364-day seasonality in line with the data above.
However, when I plot the results of the Fourier forecast using fourierf(data,3, 410), I see the data below. It appears far smoother than the terms provided by the original fourier function.
So, I wonder how the results of fourier() and fourierf() are related. Is it possible to just see one consolidated Fourier result, so that you can see the sin or cosine result moving through existing data and then through the forecasting period? If not, how can I confirm that the terms created by fourierf() fit the in-sample data?
I want to use it in an auto.arima or glm function with other external regressors like this:
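trainFourier <- fourier(data, 3)   # in-sample Fourier terms
arima.object <- auto.arima(data, xreg = trainFourier)
futureFourier <- fourierf(data, 3, 410)   # Fourier terms for the 410 forecast periods
fourierForecast <- forecast(arima.object, xreg = futureFourier, h = 410)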
and want to be completely sure that the auto.arima fit (using the terms from fourier()) is consistent with what I'll put in under xreg for forecast (which has terms from a different function, i.e. fourierf()).

Figured out the problem. I was using both the fda and forecast packages. fda, which is for functional data analysis and regression, has its own fourier() function. If I detach fda, my S1 term from fourier(data,3) looks like this:
which lines up nicely with the Fourier forecast if I use ts.plot(c(trainFourier$S1,futureFourier$S1))
Moral of the story -- watch what your packages mask, folks!
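To illustrate what "lines up nicely" means, here is a minimal base-R sketch. It assumes the textbook definition of Fourier regressors, sin(2*pi*k*t/m) and cos(2*pi*k*t/m); fourier_terms is a hypothetical helper written for this example, not the forecast package's implementation, and the series length is made up. Generating the terms for the in-sample indices 1..n and then for n+1..n+h gives one continuous wave.

```r
# Hypothetical helper: K sine/cosine pairs for time indices `t`
# and seasonal period `m` (assumed textbook definition).
fourier_terms <- function(t, m, K) {
  out <- do.call(cbind, lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / m), cos(2 * pi * k * t / m))
  }))
  colnames(out) <- paste0(c("S", "C"), rep(seq_len(K), each = 2))
  out
}

m <- 364        # seasonal period from the question
n <- 3 * m      # assumed length of the training data
h <- 410        # forecast horizon from the question

train_terms  <- fourier_terms(seq_len(n), m, K = 3)       # in-sample terms
future_terms <- fourier_terms(n + seq_len(h), m, K = 3)   # forecast-period terms

# One consolidated S1 series running through the data and the forecast period
s1 <- c(train_terms[, "S1"], future_terms[, "S1"])
```

Plotting s1 with ts.plot(s1) should show a single smooth sine wave with no break at index n. When two attached packages export the same name, calling the function with an explicit namespace, e.g. forecast::fourier(), sidesteps the masking problem.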


Ensemble machine learning model with NNETAR and BRNN

I used the forecast package to forecast the daily time series of variable Y using its lag values and a time series of an external parameter X. I found that the nnetar model (a NARX model) was the best in terms of overall performance. However, I was not able to predict the peaks of the time series well, despite various attempts at parameter tuning.
I then extracted the peak values (above a threshold) of Y (of course, this is not a regular time series anymore) and the corresponding X values, and tried to fit a regression model (note: not an autoregression model) using various models in the caret package. I found that the prediction of peak values using the brnn (Bayesian regularized neural network) model with just the X values is better than that of nnetar, which uses both lag values and X values.
Now my question is: how do I go from here to create an ensemble of these two models (i.e. whenever the prediction from the brnn regression model, or any other regression model, is better, I want it to replace the nnetar prediction and move forward; I am mostly concerned about the peaks)? Is this a commonly used approach?
Instead of trying to pick the one model that would be superior at any given time, it's typically better to average the models, in order to include as many individual views as possible.
In the experiments I've been involved in, where we tried to pick the one model that would outperform based on historical performance, a simple average typically turned out to be as good or better. This is in line with the typical results on this problem:
So, before you go for something more advanced, such as picking a specific model based on previous performance or using a weighted average, consider doing a simple average of the two models.
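As a rough sketch of the simple average, assuming you already have point forecasts from both models for the same dates (the numbers and the names pred_nnetar and pred_brnn below are made up for illustration):

```r
# Hypothetical point forecasts from the two models for the same four days
pred_nnetar <- c(10.2, 11.0, 18.5, 12.1)
pred_brnn   <- c(10.8, 11.4, 24.0, 12.5)

# Simple (equal-weight) average of the two forecasts
ens_mean <- (pred_nnetar + pred_brnn) / 2

# A weighted average works the same way; these weights are arbitrary
w <- c(nnetar = 0.3, brnn = 0.7)
ens_weighted <- w[["nnetar"]] * pred_nnetar + w[["brnn"]] * pred_brnn
```

Comparing ens_mean against each individual forecast on a hold-out period gives a baseline before trying anything more elaborate.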
If you want to continue with a sort of selection/weighted averaging, try to have a look at the FFORMA package in R:
I've not tried the specific package (yet), but have seen promising results in my test using the original m4metalearning package.

Multivariate ARIMA (MARIMA) modelling in R

I am currently using the marima package for R, written by Henrik Spliid, to forecast multivariate time series with ARIMA.
Overview can be found here:
When using the marima function, you first have to define the orders of both the AR(p) and MA(q) parts.
My question is, how can I determine appropriate values for p and q?
I know that for univariate ARIMA analysis, auto.arima gives a good suggestion for p and q. However, when I use auto.arima on every single univariate time series I want to analyze, I get (slightly) different suggestions for each one (for example (2,2,1) for the first, (1,1,1) for the second, and so on).
Since I want to analyze all of the time series combined in the multivariate ARIMA model, and I can only choose one value each for p and q (if I understood it correctly), I wonder how I can choose those values in the most accurate way.
Could I just try to run the model a couple times and see what values for p and q work best (e.g. by testing the residuals of the forecast)?
What are your suggestions?
I would appreciate any help!
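One way to make the "try a few orders and compare" idea concrete is an information criterion plus a residual check. The sketch below does not use marima. As a stand-in it uses stats::ar() from base R on simulated bivariate data, so it only selects a common AR order (no MA part); the data and the coefficient matrix A are made up.

```r
set.seed(1)
n <- 300
e <- matrix(rnorm(2 * n), ncol = 2)            # white-noise innovations
A <- matrix(c(0.5, 0.1, 0.2, 0.4), 2, 2)       # made-up stationary VAR(1) coefficients
x <- matrix(0, n, 2)
for (t in 2:n) x[t, ] <- as.numeric(A %*% x[t - 1, ]) + e[t, ]

# Fit vector AR models of order 0..5 and keep the order with the lowest AIC
fit <- ar(x, order.max = 5, aic = TRUE)
fit$order   # selected common AR order
fit$aic     # AIC differences relative to the best order

# Residual check on the first series: Ljung-Box test for remaining autocorrelation
res1 <- as.numeric(na.omit(fit$resid[, 1]))
lb <- Box.test(res1, lag = 10, type = "Ljung-Box")
```

The same loop-and-compare logic would apply to candidate (p, q) pairs in a multivariate ARMA fit, with the residual test repeated for each series.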

Quantile Regression with Time-Series Models (ARIMA-ARCH) in R

I am working on quantile forecasting with time-series data. The model I am using is ARIMA(1,1,2)-ARCH(2) and I am trying to get quantile regression estimates of my data.
So far, I have found the "quantreg" package for quantile regression, but I have no idea how to specify an ARIMA-ARCH model as the model formula in the rq function.
The rq function seems to work for regressions with dependent and independent variables, but not for time series.
Is there some other R package that accepts time-series models and performs quantile regression? Any advice is welcome. Thanks.
I just put an answer on the Data Science forum.
It basically says that most of the ready-made packages use so-called exact tests, which rest on an assumption about the distribution (independent, identically distributed normal errors, or something broader).
There is also a family of resampling methods, in which you simulate a sample with a distribution similar to that of your observed sample, fit your ARIMA(1,1,2)-ARCH(2), and repeat the process a large number of times. You then analyze this large set of forecasts and measure (as opposed to compute) your confidence intervals.
The resampling methods differ in how they generate the simulated samples. The most used are:
The Jackknife: you "forget" one point at a time, that is, you generate n samples of size n-1 (where n is the size of the observed sample).
The Bootstrap: you simulate a sample by drawing n values from the original sample with replacement: some will be drawn once, some twice or more, some never.
It is a (not easy) theorem that the expectation of the confidence intervals, like that of most of the usual statistical estimators, is the same on the simulated samples as on the original sample. The difference is that you can measure them with a large number of simulations.
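A minimal base-R sketch of the bootstrap idea, under simplifying assumptions: an AR(1) fitted with stats::arima() stands in for the ARIMA(1,1,2)-ARCH(2), the data are simulated, and only the residuals are resampled (a residual bootstrap of the one-step-ahead forecast) rather than refitting the model on every resample.

```r
set.seed(42)
y <- arima.sim(list(ar = 0.6), n = 200)     # simulated series, stand-in for real data

fit   <- arima(y, order = c(1, 0, 0))       # simplified model (assumption: AR(1))
res   <- as.numeric(residuals(fit))         # in-sample residuals
point <- as.numeric(predict(fit, n.ahead = 1)$pred)   # one-step-ahead point forecast

# Bootstrap: draw residuals with replacement and add them to the point forecast
B <- 2000
boot_fc <- point + sample(res, B, replace = TRUE)

# "Measure" the forecast quantiles from the simulated forecasts
q <- quantile(boot_fc, probs = c(0.05, 0.5, 0.95))
```

The quantiles in q are read off the simulated forecasts and do not rely on a normality assumption for the errors.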
I can try to address your question, although this is hard since you don't provide any code or data. Also, I guess that by "put ARIMA-ARCH models" you actually mean that you want to make an integrated series stationary using an ARIMA(1,1,2) filter plus an ARCH(2) filter.
For an overview of R's time-series capabilities, you can refer to the CRAN Task View on time series.
You can easily apply these filters in R with the appropriate functions.
For instance, you could use the Arima() function from the forecast package, then compute the residuals with residuals() from the stats package. Next, you can use this filtered series as input for the garch() function from the tseries package. Other approaches are of course possible. Finally, you can apply quantile regression to this filtered series. For instance, you can check out the dynrq() function from the quantreg package, which allows time-series objects in the data argument.
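The steps above could be sketched roughly as follows. This assumes the forecast, tseries and quantreg packages are installed, uses simulated data in place of yours, and, to keep the example simple, uses rq() on an explicitly lagged data frame instead of dynrq(). The regression of the filtered series on its own first lag is only an illustrative choice of model.

```r
library(forecast)  # Arima()
library(tseries)   # garch()
library(quantreg)  # rq()

set.seed(123)
# Simulated ARIMA(1,1,2) series, a stand-in for real data
y <- arima.sim(list(order = c(1, 1, 2), ar = 0.5, ma = c(0.3, 0.2)), n = 500)

# Step 1: ARIMA(1,1,2) filter, keep the residuals
arima_fit <- Arima(y, order = c(1, 1, 2))
arima_res <- as.numeric(residuals(arima_fit))

# Step 2: ARCH(2) filter; in garch(), order = c(GARCH order, ARCH order)
arch_fit <- garch(arima_res, order = c(0, 2),
                  control = garch.control(trace = FALSE))
z <- as.numeric(residuals(arch_fit))   # standardized residuals, leading values are NA

# Step 3: quantile regression of the filtered series on its first lag
d <- na.omit(data.frame(z = z[-1], z_lag1 = z[-length(z)]))
q_fit <- rq(z ~ z_lag1, tau = c(0.05, 0.5, 0.95), data = d)
coef(q_fit)   # one column of coefficients per quantile
```

Each column of coef(q_fit) holds the intercept and lag coefficient for one of the requested quantiles.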

Forecasting time series with R forecast package

I'm relatively new to R programming, but I've been reading your blogs and posts in order to get up-to-date with the forecast package. However, I have been struggling with the effect of seasonality.
Take for example the simplest signal possible:
train <- ts(sin((2*pi)*seq(from=0, to=10, by=0.01)))
If I just try to forecast this signal with brute force, I get irrelevant results:
However, if I manually detect the seasonality as 100, and do the following:
train <- ts(sin((2*pi)*seq(from=0, to=10, by=0.01)),frequency=100)
I get excellent forecasting results.
I'm honestly very puzzled by these results, which obviously happen for more complex signals.
If I remember correctly, when you create the time series object you have to specify its frequency. That way, the forecast method will be able to detect the seasonal pattern. There are other ways to detect seasonality besides visual exploration, such as running auto.arima() and checking whether it selects a seasonal model.
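If the period is not known in advance, one option is to estimate it from the data before building the ts object. Below is a base-R sketch using the periodogram on the sine signal from the question; taking the period from the largest periodogram peak is an assumption that suits a clean signal like this one and may be too naive for noisy data.

```r
x <- sin((2 * pi) * seq(from = 0, to = 10, by = 0.01))

frequency(ts(x))   # 1: without a frequency, the series carries no seasonal information

# Raw periodogram; fast = FALSE avoids zero-padding, which would shift the peak slightly
sp <- spec.pgram(x, fast = FALSE, plot = FALSE)

# Dominant period = 1 / frequency at the largest peak, rounded to whole observations
period <- round(1 / sp$freq[which.max(sp$spec)])

# Rebuild the series with the detected seasonal period
train <- ts(x, frequency = period)
```

Here period comes out at 100, matching the value detected manually in the question, and train can then be passed to the forecasting functions as before.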

R package for creating multidimensional, periodic basis function?

I am doing some modelling work in which I am trying to parametrise an effect that varies with season and time of day. The time of day effect differs between seasons in a complex way so it seems the most general approach is to model the effect in a periodic [time of day, day of year] space.
The effect being described has a non-linear relationship to the actual predictor and predicted quantities, so I need an explicit parametrisation that I can tune using non-linear optimisation.
So, the most obvious option would be a 2D Fourier basis. Can anyone recommend an R package for generating this? I found the package fda which has the function 'create.fourier.basis' but this appears to only apply to 1D.
Beyond a Fourier approach, the sampling of the data I have is highly irregular in the [time of day, day of year] plane so ideally a more localised approach such as a periodic cubic spline in which I can place more knots in the data rich parts of the plane would be preferable. Does anyone know of an R package that creates a 2D basis for this kind of representation?
The mgcv package can create tensor product bases from two or more underlying (marginal) bases. It also provides cyclic cubic regression splines and cyclic P-splines, which suit the variables you mention and can serve as the marginal bases for the tensor product.
As mgcv comes with R I would start with that. Look at ?te and ?smooth.terms for starters.
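A small sketch of what that could look like, on simulated data. The variable names tod and doy, the basis dimensions k and the boundary knots are assumptions made for this example.

```r
library(mgcv)

set.seed(1)
n <- 500
d <- data.frame(tod = runif(n, 0, 24),     # time of day in hours
                doy = runif(n, 0, 365))    # day of year
d$y <- sin(2 * pi * d$tod / 24) * cos(2 * pi * d$doy / 365) + rnorm(n, sd = 0.2)

# Tensor product of two cyclic cubic regression splines ("cc");
# the two knots per variable mark where each cycle wraps around
fit <- gam(y ~ te(tod, doy, bs = c("cc", "cc"), k = c(6, 6)),
           knots = list(tod = c(0, 24), doy = c(0, 365)),
           data = d)

# Basis functions evaluated at the data: one column per coefficient
X <- model.matrix(fit)

# Periodicity check: predictions at tod = 0 and tod = 24 should coincide
p <- as.numeric(predict(fit, newdata = data.frame(tod = c(0, 24), doy = c(100, 100))))
```

The matrix X is the explicit basis evaluated at the observations, which is the piece you would need if you want to tune the coefficients yourself with a non-linear optimiser.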
The fda package is suited to handling multivariate functional data. Have a look at, e.g.,
The help for fd states that supplying a three-dimensional coefficient array together with your basis object gives you a multivariate functional data object. In their book, Ramsay, Hooker and Graves (2009) use multivariate functional data objects to capture handwriting data, with a 2D definition of the pen location plus the time dimension.
Maybe I am wrong, but couldn't you just apply the same framework for your data which is defined over season, daytime, and effect?
