Predict distributions parameters - algebraic solver - r

I'm wondering if there is an R package which can help me to get the correct parameters for a distribution of my choice and for intervals of my choice.
For Instance, here Betancourt is looking at inverse gamma and he wants to learn which set of parameters will give >1% below 2 and >1% above 20 (like the graph below). Stan's solver returns the parameters for inv-gamma which results the intervals of interest. Is there any solution applied directly on R?
Or in other words,
I have the distribution
I have the intervals
Can I learn the correct parameters?
Thanks

Related

Quantile Regression with Time-Series Models (ARIMA-ARCH) in R

I am working on quantile forecasting with time-series data. The model I am using is ARIMA(1,1,2)-ARCH(2) and I am trying to get quantile regression estimates of my data.
So far, I have found "quantreg" package to perform quantile regression, but I have no idea how to put ARIMA-ARCH models as the model formula in function rq.
rq function seems to work for regressions with dependent and independent variables but not for time-series.
Is there some other package that I can put time-series models and do quantile regression in R? Any advice is welcome. Thanks.
I just put an answer on the Data Science forum.
It basically says that most of the ready made packages are using so called exact test based on assumption on the distribution (independent identical normal-Gauss distribution, or wider).
You also have a family of resampling methods in which you simulate a sample with a similar distribution of your observed sample, perform your ARIMA(1,1,2)-ARCH(2) and repeat the process a great number of times. Then you analyze this great number of forecast and measure (as opposed to compute) your confidence intervals.
The resampling methods differs in the way to generate the simulated samples. The most used are:
The Jackknife: in which you "forget" one point, that is you simulate a n samples of size n-1 (if n is the size of the observed sample).
The Bootstrap: in which you simulate a sample by taking n values of the original sample with replacements: some will be taken once, some twice or more, some never,...
It is a (not easy) theorem that the expectation of the confidence intervals, as most of the usual statistical estimators, are the same on the simulated sample than on the original sample. With the difference that you can measure them with a great number of simulations.
Hello and welcome to StackOverflow. Please take some time to read the help page, especially the sections named "What topics can I ask about here?" and "What types of questions should I avoid asking?". And more importantly, please read the Stack Overflow question checklist. You might also want to learn about Minimal, Complete, and Verifiable Examples.
I can try to address your question, although this is hard since you don't provide any code/data. Also, I guess by "put ARIMA-ARCH models" you actually mean that you want to make an integrated series stationary using an ARIMA(1,1,2) plus an ARCH(2) filters.
For an overview of the R time-series capabilities you can refer to the CRAN task list.
You can easily apply these filters in R with an appropriate function.
For instance, you could use the Arima() function from the forecast package, then compute the residuals with residuals() from the stats package. Next, you can use this filtered series as input for the garch() function from the tseries package. Other possibilities are of course possible. Finally, you can apply quantile regression on this filtered series. For instance, you can check out the dynrq() function from the quantreg package, which allows time-series objects in the data argument.

R identifying type of frequency distribution

I am interested in frequency distributions that are not normally distributed.
If I have a frequency distributions table which is not normally distributed.
Is there a function or package that will identify the type of distribution for me?
You can use the fitdistr function (library MASS i think) and check for yourself if you find a 'fitting' distribution. However i suggest that you plot the function first and see how it looks like. This approach is generally not recommended as you always can use different parameters to fit a distribution and thus confuse one distribution with another. If you have found a suited distribution you should test it against data.
Edit: For instance a normal distribution may look like a poisson distribution. Fitting is in my oppinion only useful if you have enough random variables. Otherwise just draw variables from your data if you need to
You can always try to test whether a distribution is adequate for your data with QQ plot. If you have data that is dynamic, I would suggest that you use ECDF (Empirical Cumulative Distribution Function) which will give you more precise distributions as your data grows. You can use ECDF in R with the ecdf() function.

Nonlinear regression / Curve fitting with L-infinity norm

I am looking into time series data compression at the moment.
The idea is to fit a curve on a time series of n points so that the maximum deviation on any of the points is not greater than a given threshold. In other words, none of the values that the curve takes at the points where the time series is defined, should be "further away" than a certain threshold from the actual values.
Till now I have found out how to do nonlinear regression using the least squares estimation method in R (nls function) and other languages, but I haven't found any packages that implement nonlinear regression with the L-infinity norm.
I have found literature on the subject:
http://www.jstor.org/discover/10.2307/2006101?uid=3737864&uid=2&uid=4&sid=21100693651721
or
http://www.dtic.mil/dtic/tr/fulltext/u2/a080454.pdf
I could try to implement this in R for instance, but I first looking to see if this hasn't already been done and that I could maybe reuse it.
I have found a solution that I don't believe to be "very scientific": I use nonlinear least squares regression to find the starting values of the parameters which I subsequently use as starting points in the R "optim" function that minimizes the maximum deviation of the curve from the actual points.
Any help would be appreciated. The idea is to be able to find out if this type of curve-fitting is possible on a given time series sequence and to determine the parameters that allow it.
I hope there are other people that have already encountered this problem out there and that could help me ;)
Thank you.

Comparing Kernel Density Estimation plots

I am actually a novice to R and stats.. Could something like this be done in R
Determining the density estimates of two samples ( 2 Vectors )..??
I have done this Using R and obtained 2 density curves for the 2 samples using kernel density estimation ..
Is there anyway to quantitatively compare how similar/Dissimilar the density estimates of 2 samples are..?
I am trying to find out which data sample exhibits has a similar distribution to a particular distribution..
I am using R Language... Can somebody please help..??
You can use Kolmogorov-Smirnov test (ks.test) to compare two distributions. Cramer-von-Mises test is another one. There is this PDF Fitting Distributions with R where they also list other tests that are available (although the nortest package that he uses only tests for normality).
Apprentice Queue is right about using the Kolmogorov-Smirnoff test, but I wanted to add a warning: don't use it on its own. You should visually compare the distributions as well, either with two kernel density plots or histograms, or with a qqplot. Human brains are very good at playing spot-the-difference.
You can try calculating the Earth mover's distance

Histogram matching - image processing - c/c++

I have two histograms.
int Hist1[10] = {1,4,3,5,2,5,4,6,3,2};
int Hist1[10] = {1,4,3,15,12,15,4,6,3,2};
Hist1's distribution is of type multi-modal;
Hist2's distribution is of type uni-modal with single prominent peak.
My questions are
Is there any way that i could determine the type of distribution programmatically?
How to quantify whether these two histograms are similar/dissimilar?
Thanks
Raj,
I posted a C function in your other question ( automatically compare two series -Dissimilarity test ) that will compute divergence between two sets of similar data. It's actually intended to tell you how closely real data matches predicted data but I suspect you could use it for your purpose.
Basically, the smaller the error, the more similar the two sets are.
These are just guesses, but I would try fitting each distribution as a gaussian distribution and use something like the R-squared value to determine if the distribution is uni-modal or not.
As to the similarity between the two distributions, I would try doing an autocorrelation and using the peak positive value in the autocorrelation as a similarity measure. These ideas are pretty rough, but hopefully they give you some ideas.
For #2, you could calculate their cross-correlation (so long as the buckets themselves can be sorted). That would give you a rough estimation of what "similarity".
Comparison of Histograms (For Use in Cloud Modeling).
(That's an MS .doc file.)
There are a variety of software packages that will "fit" your distributions to known discrete distributions for you - Minitab, STATA, R, etc. A reference to fitting distributions in R is here. I wouldn't advise programming this from scratch.
Regarding distribution comparisons, if neither distribution fits a known distribution (Poisson, Binomial, etc.), then you need to use non-parametric methods described here.

Resources