Smoothing before FPCA - R

I am trying to use FPCA on my time series, and I know that I should do some smoothing before applying FPCA. However, I don't know which smoothing method is a good choice.
Any resource is much appreciated!
Thanks!

Smoothing will depend on the data you have. In the parametric FDA approach (Ramsay & Silverman, 2005), the basis system is your choice: in general, it is common to use a Fourier basis for periodic data and a B-spline basis for non-recurrent data. B-splines have very good local behaviour.
You can find more info about the implementation of different basis functions in "Functional data analysis with R and MATLAB" (Ramsay et al. 2009)
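For instance, a minimal sketch of the two constructors in the fda package (the range and basis size here are placeholders):

library(fda)
# Fourier basis for periodic data, defined over one full period
fbasis <- create.fourier.basis(rangeval = c(0, 1), nbasis = 11)
# cubic B-spline basis (norder = 4) for non-recurrent data
bbasis <- create.bspline.basis(rangeval = c(0, 1), nbasis = 11, norder = 4)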
There's no specific rule for choosing the dimension of the basis, as it depends on several factors. I strongly recommend studying the least-squares error of the fit over a range of dimensions, and then choosing the smallest dimension in the region where the error levels off. Some packages have implemented functions to calculate this, e.g. fda.usc::min.basis() (optimal minimum number of basis functions), or by cross-validation, e.g. fda.usc::CV.S().
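As a hedged sketch of the first of those (y and argvals are placeholder data; the returned field name is taken from the fda.usc documentation):

library(fda.usc)
# y: matrix of curves (one row per curve) observed at the points in argvals
fdataobj <- fdata(y, argvals)
# evaluate a grid of candidate basis sizes and pick the GCV-optimal one
opt <- min.basis(fdataobj, numbasis = seq(5, 31, by = 2))
opt$numbasis.opt  # optimal number of basis functions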
P-splines tend to give low approximation errors, are computationally straightforward, and are quite insensitive to the choice of knots. You can smooth your functional object like this:
library(fda)
# cubic B-spline basis (norder = 4) over the range of the argument values
basisobj <- create.bspline.basis(rangeval = range(argvals), nbasis = k, norder = 4)
# penalise the second derivative; lambda = 0 gives plain least squares
fdParobj <- fdPar(basisobj, Lfdobj = int2Lfd(2), lambda = 0)
# y: matrix of observations (rows = argument values, columns = curves)
fdobj <- smooth.basis(argvals, y, fdParobj)$fd

How do I numerically compare the Dymos solution to the simulated solution?

I want to conduct a convergence study for my Dymos optimization results where I vary the number of nodes and compare the simulated solution to the optimization solution. From what I understand, Dymos fits polynomials to the system dynamics to represent the timeseries solution. What is the best way to compare the polynomial trajectory of the optimization solution to the trajectory of the simulated solution? I specifically want to evaluate the difference between the two trajectories away from the collocation/control nodes... to show that the polynomial fitting actually represents the simulated solution. How would I access the polynomial fitting data?
Thanks in advance.
For some of the testing we have an assert_timeseries_near_equal function that treats the more dense time series as the truth and tests that the less dense timeseries (usually the discrete solution) is reasonably close to it.
We're actually working on making this method a bit more explicit right now, so that it's a little easier for users to apply in general cases, such as comparing discrete solutions from two different cases.
In general, there are a few different ways you can test your implicit results against an explicit integration. You could just verify that the final states of the two solutions are reasonably close. Since the error tends to increase over the course of the trajectory, this is often good enough for a quick check. The downside of this approach is that it doesn't test that both solutions took the same path to the final condition.
To test the solution away from the nodes, I'd recommend the following: add a second timeseries output to the relevant phase that contains more segments or higher-order segments. This timeseries will have more nodes, and Dymos will interpolate from the solution's collocation grid onto this denser timeseries output grid. When comparing this against the explicit simulation, the times, controls, and parameters should still match exactly, but you'll better capture the interpolating state polynomials versus the explicitly simulated results.
There are other statistical methods out there for comparing timeseries that you can bring to bear, but visualizing the explicit trajectory plus some error bound as a "tube" into which we want to fit the discrete solution is usually how I handle it.
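As a tool-agnostic sketch of that tube check in R (t_sim/y_sim for the dense simulated trajectory and t_col/y_col for the sparse collocation solution are hypothetical vectors):

# interpolate the sparse solution onto the dense grid of the simulation
y_hat <- spline(t_col, y_col, xout = t_sim)$y
# tube half-width, e.g. 1% of the trajectory's scale (an arbitrary choice)
tol <- 0.01 * max(abs(y_sim))
# TRUE if the discrete solution stays inside the error tube everywhere
all(abs(y_hat - y_sim) <= tol)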

Can I use automatic differentiation for non-differentiable functions?

I am testing the performance of different solvers on minimizing an objective function derived from the simulated method of moments. Given that my objective function is not differentiable, I wonder whether automatic differentiation would work in this case. I tried my best to read some introductions to the method, but I couldn't figure it out.
I am actually trying to use Ipopt+JuMP in Julia for this test. Previously, I tested it using BlackBoxOptim in Julia. I would also appreciate any insights on optimizing non-differentiable functions in Julia.
It seems that I was not clear on "non-differentiable". Let me give an example. Consider the following objective function: X is the dataset, B are unobserved random errors that will be integrated out, and \theta are the parameters. However, A is discrete and therefore not differentiable.
I'm not exactly an expert on optimization, but: it depends on what you mean by "nondifferentiable".
For many mathematical functions that are used, "nondifferentiable" just means "not everywhere differentiable" -- but that still leaves "differentiable almost everywhere, except on countably many points" (e.g., abs, relu). These functions are not a problem at all -- you can just choose any subgradient and apply a normal gradient method; the points where the subgradient is not unique are hit with low probability anyway. That's what basically all AD systems for machine learning do. An alternative for certain forms of convex objectives are proximal gradient methods, which "smooth" the objective in an efficient way that preserves optima (cf. ProximalOperators.jl).
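A minimal sketch of that subgradient idea (in R rather than Julia, on a made-up objective): sign() returns 0 at the kink of abs, which is one valid subgradient, and plain fixed-step subgradient descent still converges:

# f(x) = |x - 2| + 0.1 * x^2 has its minimum at the kink x = 2
subgrad <- function(x) sign(x - 2) + 0.2 * x  # sign(0) = 0 picks one valid subgradient
x <- 10
for (i in 1:500) x <- x - 0.01 * subgrad(x)  # fixed-step subgradient descent
x  # ends up close to 2, oscillating within about one step of it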
Then there are those functions that seem like they can't be differentiated at all, since they appear "combinatoric" or discrete, but are in fact piecewise differentiable (if seen from the correct point of view). This includes sorting and ranking. But you have to identify them, and describing and implementing the derivative is rather complicated. Whether such functions are supported by an AD system depends on how sophisticated its "standard library" is. Some variants of this, like "permute", just fall out of AD over control structures, while more complex ones require the primitive adjoints to be defined manually.
For certain kinds of problems, though, we just work in an intrinsically discrete space -- like integer parameters of some probability distributions. In these cases, differentiation makes no sense, and hence AD libraries define their primitives not to work on these parameters. Possible alternatives are (mixed) integer programming, approximations, search, and model selection. This case also occurs for problems where the optimized space itself depends on the parameter in question, like the second argument of fill. We also have things like the ℓ0 "norm" or the rank of a matrix, for which well-known continuous relaxations exist, but those are outside the scope of AD.
(In the specific case of MCMC for discrete or dimensional parameters, there's other ways to deal with that, like combining HMC with other MC methods in a Gibbs sampler, or using a nonparametric model instead. Other tricks are possible for VI.)
That being said, you will rarely encounter complicated nowhere-differentiable continuous functions in optimization. They are complicated even to describe, and are unlikely to arise in the kind of math we use for modelling.

K-Means Distance Measure - Large Data and mixed Scales

I've a question regarding k-means clustering. We have a dataset with 120,000 observations and need to compute a k-means cluster solution in R. The problem is that k-means usually uses Euclidean distance. Our dataset consists of 3 continuous variables, 11 ordinal variables (Likert 0-5; I think it would be okay to handle them as continuous) and 5 binary variables. Do you have any suggestion for a distance measure that we can use for our k-means approach, given the "large" dataset? We want to stick to k-means, so I really hope one of you has a good idea.
Cheers,
Martin
One approach would be to normalize the features and then just use the Euclidean distance in the full 19-dimensional feature space. Cast the binary values to 0/1 (well, it's R, so it does that anyway) and go from there.
I don't see an immediate problem with this method, other than that a k-means solution in 19 dimensions will definitely be hard to interpret. You could apply a dimensionality-reduction technique to make the k-means output easier to read, but you know far more about the data set than we ever could, so our ability to help you is limited.
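A minimal sketch of that approach, assuming the 19 variables sit in a data frame df (factor columns are coerced to numeric codes first; k = 4 is arbitrary):

x <- scale(data.frame(lapply(df, as.numeric)))  # z-score standardise every column
km <- kmeans(x, centers = 4, nstart = 25)       # several random starts for stability
table(km$cluster)                               # cluster sizes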
You can certainly encode the binary variables as 0/1 too.
It is best practice in statistics not to treat Likert-scale variables as numeric, because the spacing between levels is not necessarily even.
But I don't think you will get meaningful k-means clusters. That algorithm is all about computing means, which makes sense for continuous variables. Discrete variables usually lack the "resolution" for this to work well. The mean then degrades to a "frequency", and such data should be handled very differently.
Do not choose the problem to fit the hammer. Maybe your data is not a nail; and even if you'd like to do it with k-means, it won't solve your problem... Instead, formulate your problem, then choose the right tool. So, given your data, what is a good cluster? Until you have an equation that measures this, hammering on the data won't solve anything.
Encoding the variables as binary will not solve the underlying problem; it will only increase the data dimensionality, an added burden. It is best practice in statistics not to alter the original data into another form, such as continuous to categorical or vice versa. If you do convert the data, the conversion must be in sync with the question being solved, and you must provide a valid justification.
Continuing further, as others have stated, try to reduce the dimensionality of the dataset first. Check for issues like missing values, outliers, and zero-variance variables, and consider principal component analysis (for continuous variables), correspondence analysis (for categorical variables), etc. This can help you reduce the dimensionality. After all, data preprocessing tasks constitute about 80% of an analysis.
Regarding the distance measure for mixed data types: the mean in k-means works only for continuous variables, so I do not see the logic of using k-means for mixed data types.
Consider choosing another algorithm, like k-modes. k-modes is an extension of k-means: instead of distances it uses dissimilarities (that is, a quantification of the total mismatches between two objects: the smaller this number, the more similar the two objects), and instead of means it uses modes. A mode is a vector of elements that minimizes the dissimilarities between the vector itself and each object of the data.
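A hedged sketch using the kmodes implementation in the klaR package (df_cat is a hypothetical data frame holding the ordinal and binary variables as factors; note that k-modes handles categorical data only):

library(klaR)
km <- kmodes(df_cat, modes = 4, iter.max = 10)  # 4 clusters via matching dissimilarity
km$cluster                                      # hard cluster assignments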
Mixture models can be used to cluster mixed data.
You can use the R package VarSelLCM, which models, within each cluster, the continuous variables by Gaussian distributions and the ordinal/binary variables by multinomial distributions.
Moreover, missing values can be managed by the model at hand.
A tutorial is available at: http://varsellcm.r-forge.r-project.org/
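Following that tutorial, a minimal sketch (df is a hypothetical data frame mixing numeric, integer, and factor columns):

library(VarSelLCM)
# fit mixtures with 1 to 5 clusters, selecting relevant variables by BIC
res <- VarSelCluster(df, gvals = 1:5, vbleSelec = TRUE, crit.varsel = "BIC")
summary(res)       # chosen number of clusters and selected variables
head(fitted(res))  # hard cluster memberships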

How to consider different costs for different types of errors in SVM using R

Let Y be a binary variable.
If we use logistic regression for modeling, then we can use cv.glm for cross-validation, where we can specify the cost function in the cost argument. By specifying the cost function, we can assign different unit costs to different types of errors: predicted Yes when the reference is No, or predicted No when the reference is Yes.
I am wondering if I could achieve the same in SVM. In other words, is there a way for me to specify a cost (loss) function instead of using the built-in loss function?
Besides the answer by Yueguoguo, there are three more solutions: the standard wrapper approach, shifting the hyperplane threshold, and the class-weight parameter in e1071.
The wrapper approach (available out of the box, for example, in Weka) is applicable to almost all classifiers. The idea is to over- or undersample the data in accordance with the misclassification costs. The learned model, if trained to optimise accuracy, is then optimal under the costs.
The second idea is frequently used in text mining. In SVMs, the classification is derived from the signed distance to the hyperplane. For linearly separable problems this distance is {1, -1} for the support vectors. A new example is classified basically by whether its distance is positive or negative. However, one can also shift this threshold: instead of making the decision at 0, move it, for example, towards 0.8. That way the classifications are shifted in one or the other direction, while the general shape of the data is not altered.
Finally, some machine learning toolkits have a built-in parameter for class-specific costs, like class.weights in the e1071 implementation. The name is due to the fact that the term "cost" is already taken (by the regularisation parameter C).
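A minimal sketch with e1071 (the data frame d and the class labels Yes/No are placeholders):

library(e1071)
# weight errors on class "No" five times as heavily as errors on class "Yes"
fit <- svm(y ~ ., data = d, kernel = "radial", class.weights = c(Yes = 1, No = 5))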
The SVM's hyperplane parameters are fitted automatically, thanks to the solid theoretical foundation of the algorithm; what is tuned by cross-validation are the hyperparameters. Say an RBF kernel is used: cross-validation selects the optimal combination of C (cost) and gamma (kernel parameter) for the best performance, measured by some metric (e.g., mean squared error). In e1071, this can be done with the tune method, where the range of hyperparameters as well as the attributes of the cross-validation (i.e., 5-, 10- or more-fold) can be specified.
To obtain comparable cross-validation results using an area-under-curve type of error measurement, one can train models with different hyperparameter configurations and then validate each model against sets of pre-labelled data.
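For instance, a hedged sketch of that grid search with e1071::tune (d is again a placeholder data frame with binary response y):

library(e1071)
tuned <- tune(svm, y ~ ., data = d,
              ranges = list(cost = 2^(0:4), gamma = 2^(-3:0)),
              tunecontrol = tune.control(cross = 10))  # 10-fold cross-validation
summary(tuned)    # cross-validated performance over the whole grid
tuned$best.model  # model refit with the best cost/gamma pair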
Hope the answer helps.

R package for creating multidimensional, periodic basis function?

I am doing some modelling work in which I am trying to parametrise an effect that varies with season and time of day. The time of day effect differs between seasons in a complex way so it seems the most general approach is to model the effect in a periodic [time of day, day of year] space.
The effect being described has a non-linear relationship to the actual predictor and predicted quantities, so I need an explicit parametrisation that I can tune using non-linear optimisation.
So, the most obvious option would be a 2D Fourier basis. Can anyone recommend an R package for generating this? I found the package fda which has the function 'create.fourier.basis' but this appears to only apply to 1D.
Beyond a Fourier approach, the sampling of the data I have is highly irregular in the [time of day, day of year] plane so ideally a more localised approach such as a periodic cubic spline in which I can place more knots in the data rich parts of the plane would be preferable. Does anyone know of an R package that creates a 2D basis for this kind of representation?
The mgcv package can create tensor-product smooths from two or more underlying basis functions. It also allows cyclic cubic splines and cyclic P-splines, which can be used as the underlying (marginal) basis functions of the tensor product for the variables you mention.
As mgcv comes with R I would start with that. Look at ?te and ?smooth.terms for starters.
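A minimal sketch (the variable names and knot placements are placeholders; bs = "cc" requests cyclic cubic marginal splines):

library(mgcv)
# d: hypothetical data frame with response 'effect', time of day 'tod' (hours)
# and day of year 'doy'; te() builds the tensor product of the cyclic marginals
fit <- gam(effect ~ te(tod, doy, bs = c("cc", "cc"), k = c(10, 10)),
           data = d,
           knots = list(tod = c(0, 24), doy = c(0.5, 365.5)))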
The fda package is suited to handling multivariate functional data. Have a look at, e.g.,
?fd
The help for fd states that assigning a three-dimensional array to your basis-function object gives you a multivariate functional data object. In their book, Ramsay, Hooker and Graves (2009) use multivariate functional data objects to capture handwriting data, with a 2D definition of the pen location plus the time dimension.
Maybe I am wrong, but couldn't you just apply the same framework to your data, which is defined over season, time of day, and effect?
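A minimal sketch of that multivariate construction (the sizes here are arbitrary):

library(fda)
basisobj <- create.fourier.basis(rangeval = c(0, 365), nbasis = 9)
# coefficient array of dimension nbasis x ncurves x nvariables
coefs <- array(rnorm(9 * 20 * 2), dim = c(9, 20, 2))
mfd <- fd(coefs, basisobj)  # multivariate functional data object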
