Nonlinear regression / Curve fitting with L-infinity norm - R

I am looking into time series data compression at the moment.
The idea is to fit a curve on a time series of n points so that the maximum deviation on any of the points is not greater than a given threshold. In other words, none of the values that the curve takes at the points where the time series is defined, should be "further away" than a certain threshold from the actual values.
So far I have found out how to do nonlinear regression using the least squares estimation method in R (the nls function) and in other languages, but I haven't found any packages that implement nonlinear regression with the L-infinity norm.
I have found literature on the subject:
http://www.jstor.org/discover/10.2307/2006101?uid=3737864&uid=2&uid=4&sid=21100693651721
or
http://www.dtic.mil/dtic/tr/fulltext/u2/a080454.pdf
I could try to implement this in R, for instance, but I am first looking to see whether this has already been done so that I could reuse it.
I have found a solution that I don't believe to be very "scientific": I use nonlinear least squares regression to find starting values for the parameters, which I then pass to R's optim function to minimize the maximum deviation of the curve from the actual points.
Any help would be appreciated. The idea is to be able to find out if this type of curve-fitting is possible on a given time series sequence and to determine the parameters that allow it.
I hope other people out there have already encountered this problem and can help me ;)
Thank you.
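For what it's worth, the two-step approach described in the question can be sketched in a few lines of R. The exponential model and the data below are made up purely for illustration:

```r
# Step 1: ordinary least squares (nls) to get starting values;
# Step 2: optim minimizing the maximum absolute deviation (L-infinity norm).
# Hypothetical model y = a * exp(b * x) with synthetic data.
set.seed(42)
x <- seq(0, 5, length.out = 50)
y <- 2.0 * exp(-0.7 * x) + rnorm(50, sd = 0.02)

# Step 1: nonlinear least squares for starting values
fit_ls <- nls(y ~ a * exp(b * x), start = list(a = 1, b = -1))
start_par <- coef(fit_ls)

# Step 2: minimax refinement with optim (Nelder-Mead by default)
linf_loss <- function(par) max(abs(y - par[1] * exp(par[2] * x)))
fit_linf <- optim(start_par, linf_loss)

# The compression test: is the minimax deviation below the threshold?
threshold <- 0.1
fit_linf$value <= threshold
```

This is exactly the "not very scientific" recipe from the question, just written out; fit_linf$value is the smallest maximum deviation found, so comparing it to the threshold answers whether this curve family can represent the series within tolerance.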

Related

Constrained Polynomial Regression - Fixed Maximum

I am trying to fit some kind of polynomial regression to a car engine performance curve. I know that the relationship between the two variables studied is not linear and should follow a quadratic function (performance vs. output power).
Power vs Performance:
performance = -14e-05 * x^2 + 0.009 * x + 0.31545
Also, I know that the derivative of the function relating these two variables should be 0 (an absolute maximum) when the engine is delivering its maximum power.
The problem is that after fitting my curve and taking the derivative of the polynomial obtained through the regression, the maximum falls beyond the real maximum power output of the engine (outside the safety limits).
I have been looking for topics on the same problem, but I have only found issues where the sum of the coefficients must stay under a certain value.
Any ideas to implement this in R?
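One possible approach, sketched below with made-up data: if the location of the maximum is known, reparameterize the quadratic as y = c - a*(x - x_max)^2, so its derivative is zero at x_max by construction, and the constrained fit reduces to an ordinary lm:

```r
# Force the fitted quadratic to peak exactly at a known x_max by fitting
# y = c - a * (x - x_max)^2. The x_max and data here are hypothetical.
x_max <- 32                              # assumed maximum-power point
x <- seq(5, 60, by = 5)
set.seed(1)
y <- 0.46 - 14e-05 * (x - x_max)^2 + rnorm(length(x), sd = 0.003)

fit <- lm(y ~ I((x - x_max)^2))
a    <- -unname(coef(fit)[2])            # curvature; should come out positive
peak <-  unname(coef(fit)[1])            # fitted performance at x_max
```

If x_max is only an upper bound rather than a known value, the same reparameterization could be fitted with nls instead, with x_max as a parameter transformed to stay within the engine's safe operating range.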

How to fit a curve to data with sd in R?

I'm completely new to R, so apologies for asking something I'm sure must be basic. I just wonder if I can use the nls() command in R to fit a non-linear curve to a data structure where I have means and sd's, but not the actual replicates. I understand how to fit a curve to single data points or to replicates, but I can't see how to proceed when I have a mean+sd for each data point and I want R to consider variation in my data when fitting.
One possible way to go would be to simulate data using your means and standard deviations and do the regression with the simulated data. Doing this a number of times would give you a good impression of the range of plausible values for your regression coefficients.
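The simulate-and-refit idea might look like this in R. The exponential model and the mean/sd values are invented for the sketch; in practice you would plug in your own model formula and observed summaries:

```r
# Simulate replicate datasets from the per-point means and sds,
# refit the nls model each time, and look at the spread of coefficients.
set.seed(7)
x   <- 1:8
mu  <- 2 * exp(-0.4 * x)        # observed means (made up)
sdv <- rep(0.03, 8)             # observed sds (made up)

n_sim <- 200
coefs <- t(replicate(n_sim, {
  y_sim <- rnorm(length(mu), mean = mu, sd = sdv)
  coef(nls(y_sim ~ a * exp(b * x), start = list(a = 2, b = -0.4)))
}))

# Plausible ranges for the coefficients across simulations
apply(coefs, 2, quantile, probs = c(0.025, 0.975))
```

The starting values here are chosen near the truth to keep the sketch's nls calls converging; with real data you would take them from a fit to the means themselves.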

Understand Regression results

I have a set of numerical features that describe a phenomenon at different time points. In order to evaluate the individual performance of each feature, I perform a linear regression with a leave one out validation, and I compute the correlations and errors to evaluate the results.
So for a single feature, it would be something like:
Input: Feature F = {F_t1, F_t2, ... F_tn}
Input: Phenomenom P = {P_t1, P_t2, ... P_tn}
Linear Regression of P according to F, plus leave one out.
Evaluation: Compute correlations (linear and spearman) and errors (mean absolute and root mean squared)
For some of the variables, both correlations are really good (> 0.9), but when I look at the predictions, I realize that they are all really close to the average of the values to predict, so the errors are big.
How is that possible?
Is there a way to fix it?
For some technical details: I use the Weka linear regression with the option "-S 1" in order to avoid feature selection.
It seems to be because the relationship you want to regress is not linear while you are using a linear approach. In that case it is possible to have good correlations and poor errors at the same time. It does not mean that the regression is wrong or really poor, but you have to be careful and investigate further.
In any case, a nonlinear approach that minimizes the errors and maximizes the correlation is the way to go.
Moreover, outliers can also cause this problem.
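A tiny toy example makes the effect concrete: predictions that are a shrunken version of the targets can correlate perfectly with them while still having large errors. The numbers below are invented just to show this:

```r
# Predictions that hug the average: perfect correlation, large errors.
set.seed(3)
target <- rnorm(100, mean = 50, sd = 10)
pred   <- mean(target) + 0.1 * (target - mean(target))  # compressed toward the mean

# cor is exactly 1 here, because pred is a linear transform of target,
# yet the mean absolute error is large.
cor(target, pred)
mean(abs(target - pred))
```

Correlation only measures how well the predictions track the targets up to a linear rescaling, which is why it can look excellent while the absolute errors stay big.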

Bootstrapping standard errors of cluster point process model (kppm)

I'd like to report the standard error of the clustering parameters (kappa, sigma) of an inhomogeneous Thomas point process model that I've fitted in spatstat. Yue and Loh (2015) reported doing this by a parametric bootstrap. I'm not very experienced in this concept, or applying it to point process models. How would I do this?
My first guess is to simulate my kppm a number of times and re-fit the resulting simulated points with the same covariates. Then, calculate the standard errors from the clustering parameters of each subsequent fitting. Is this correct? If so, how many simulations would be considered acceptable in this case? Thanks in advance for any pointers!
Basically your own description is completely correct.
"My first guess is to simulate my kppm a number of times and re-fit the resulting simulated points with the same covariates. Then, calculate the standard errors from the clustering parameters of each subsequent fitting."
The only question left is how many simulations to do. Basically the answer is: as many as you have time for! It is common to see people do 1000 simulations, so why not start there?
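To make the recipe concrete, here it is in miniature with a toy Poisson model rather than a kppm, so the sketch runs in base R; with spatstat you would call simulate() on the fitted kppm object and refit each simulated point pattern with kppm() using the same trend formula:

```r
# Parametric bootstrap in miniature: fit, simulate from the fit,
# refit each simulated dataset, take the sd of the refitted estimates.
set.seed(11)
counts     <- rpois(50, lambda = 4)   # "observed" data (made up)
lambda_hat <- mean(counts)            # fitted parameter

n_boot   <- 1000
boot_est <- replicate(n_boot, mean(rpois(50, lambda = lambda_hat)))
se_boot  <- sd(boot_est)              # bootstrap standard error of lambda_hat
```

For the Thomas model the "fitted parameter" step would instead collect kappa and the cluster scale from each refit, and the sd across refits is the standard error you report.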

Histogram matching - image processing - c/c++

I have two histograms.
int Hist1[10] = {1,4,3,5,2,5,4,6,3,2};
int Hist2[10] = {1,4,3,15,12,15,4,6,3,2};
Hist1's distribution is multi-modal;
Hist2's distribution is uni-modal with a single prominent peak.
My questions are
Is there any way that I could determine the type of distribution programmatically?
How to quantify whether these two histograms are similar/dissimilar?
Thanks
Raj,
I posted a C function in your other question (automatically compare two series - Dissimilarity test) that will compute the divergence between two sets of similar data. It's actually intended to tell you how closely real data matches predicted data, but I suspect you could use it for your purpose.
Basically, the smaller the error, the more similar the two sets are.
These are just guesses, but I would try fitting each distribution as a Gaussian and using something like the R-squared value of the fit to determine whether the distribution is uni-modal or not.
As to the similarity between the two distributions, I would try doing an autocorrelation and using the peak positive value in the autocorrelation as a similarity measure. These ideas are pretty rough, but hopefully they give you some ideas.
For #2, you could calculate their cross-correlation (as long as the buckets themselves can be sorted). That would give you a rough estimate of their similarity.
Comparison of Histograms (For Use in Cloud Modeling).
(That's an MS .doc file.)
There are a variety of software packages that will "fit" your distributions to known discrete distributions for you - Minitab, STATA, R, etc. A reference to fitting distributions in R is here. I wouldn't advise programming this from scratch.
Regarding distribution comparisons, if neither distribution fits a known distribution (Poisson, Binomial, etc.), then you need to use non-parametric methods described here.
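Since R is one of the packages mentioned, here is a quick sketch of a non-parametric comparison of the two histograms from the question, treating them as count vectors. The choice of a chi-squared test of homogeneity is one option among several:

```r
# The two histograms from the question as count vectors
hist1 <- c(1, 4, 3, 5, 2, 5, 4, 6, 3, 2)
hist2 <- c(1, 4, 3, 15, 12, 15, 4, 6, 3, 2)

# Chi-squared test of homogeneity: p-value for whether both rows of counts
# could come from the same underlying distribution (note: some expected
# counts are small here, so R will warn that the approximation is rough)
chisq.test(rbind(hist1, hist2))

# A simple similarity score: correlation between the normalized histograms
cor(hist1 / sum(hist1), hist2 / sum(hist2))
```

The correlation gives a single number in [-1, 1] to rank similarity, while the test gives a p-value; which is more useful depends on whether you want a score or a yes/no decision.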
