Comparing Kernel Density Estimation plots - r

I am a novice in both R and statistics. I have two samples (two vectors) and have used kernel density estimation in R to obtain a density curve for each.
Is there any way to quantitatively compare how similar or dissimilar the density estimates of the two samples are?
I am trying to find out which data sample has a distribution similar to a particular reference distribution.
I am using R. Can somebody please help?

You can use the Kolmogorov-Smirnov test (ks.test) to compare two distributions. The Cramér-von Mises test is another option. The PDF Fitting Distributions with R also lists other available tests (although the nortest package that its author uses only tests for normality).
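A minimal sketch of the two-sample test, assuming the two samples are plain numeric vectors. The vectors x and y below are simulated purely for illustration; substitute your own data.

```r
# Two illustrative samples (simulated here; replace with your own vectors)
set.seed(1)
x <- rnorm(200, mean = 0,   sd = 1)
y <- rnorm(200, mean = 0.5, sd = 1)

# Two-sample Kolmogorov-Smirnov test: the statistic D is the largest
# vertical gap between the two empirical CDFs; a small p-value suggests
# the samples come from different distributions
res <- ks.test(x, y)
print(res$statistic)
print(res$p.value)
```

Note that the test compares the raw samples, not the kernel density curves, so it does not depend on the bandwidth used for plotting.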

Apprentice Queue is right about using the Kolmogorov-Smirnov test, but I wanted to add a warning: don't use it on its own. You should also compare the distributions visually, either with two kernel density plots or histograms, or with a qqplot. Human brains are very good at playing spot-the-difference.
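The visual checks could look something like this in base R. The data is simulated for illustration, and the plot titles and line styles are arbitrary choices.

```r
set.seed(1)
x <- rnorm(200)
y <- rnorm(200, mean = 0.5)

dx <- density(x)
dy <- density(y)

# Overlay both kernel density estimates on shared axes
plot(dx, xlim = range(dx$x, dy$x), ylim = range(dx$y, dy$y),
     main = "Kernel density estimates", xlab = "value")
lines(dy, lty = 2)
legend("topright", legend = c("x", "y"), lty = c(1, 2))

# Quantile-quantile plot: points close to the y = x line indicate
# similar distributions
qq <- qqplot(x, y, main = "QQ plot of x against y")
abline(0, 1, lty = 3)
```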

You can try calculating the Earth mover's distance (also known as the Wasserstein distance).
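For one-dimensional samples this distance can be approximated in base R by comparing quantiles. The helper emd_1d and its n_grid parameter are hypothetical names introduced for this sketch; they do not come from any package.

```r
# Hypothetical helper: 1-D earth mover's (Wasserstein-1) distance between
# two samples, approximated by averaging |Q_x(p) - Q_y(p)| over a grid of
# probabilities p
emd_1d <- function(x, y, n_grid = 1000) {
  p <- ppoints(n_grid)  # evenly spaced probabilities in (0, 1)
  mean(abs(quantile(x, p, names = FALSE) - quantile(y, p, names = FALSE)))
}

# Simulated samples of different sizes, for illustration only
set.seed(1)
x <- rnorm(300)
y <- rnorm(400, mean = 1)
print(emd_1d(x, y))
```

The result is in the same units as the data, so a pure shift of one sample by a constant c gives a distance of about c.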

Related

Generate data from an arbitrary multivariate continuous density function

I am trying to sample from a multivariate distribution given by a (quite complex, but continuous) density function in R. For the univariate case I used AbscontDistribution from the distr package, but I cannot make it work for the multivariate case.
I tried finding an appropriate package for this problem online, but cannot find one.
Any ideas?
Thanks! :)

Predict distributions parameters - algebraic solver

I'm wondering if there is an R package that can help me find the correct parameters for a distribution of my choice and for intervals of my choice.
For instance, here Betancourt is looking at the inverse gamma and wants to learn which set of parameters will give >1% below 2 and >1% above 20 (like the graph below). Stan's solver returns the inv-gamma parameters that produce the intervals of interest. Is there a solution that works directly in R?
Or in other words,
I have the distribution
I have the intervals
Can I learn the correct parameters?
Thanks
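One possible approach for the inverse gamma case in base R, without Stan. This is a sketch that assumes exactly 1% of the mass in each tail; the variable names (tail_mass, ratio_gap) are made up here. It relies on the fact that the reciprocal of an inverse gamma variable is gamma distributed, which reduces the problem to a one-dimensional root search with uniroot.

```r
lower <- 2
upper <- 20
tail_mass <- 0.01   # assumption: exactly 1% of the mass in each tail

# If X ~ inv-gamma(alpha, beta) then G = 1/X ~ gamma(shape = alpha, rate = beta),
# so the targets become P(G > 1/lower) = tail_mass and P(G < 1/upper) = tail_mass.
# The rate only rescales G, so the ratio of the two gamma quantiles depends
# on alpha alone and must equal upper / lower.
ratio_gap <- function(alpha) {
  qgamma(1 - tail_mass, shape = alpha) / qgamma(tail_mass, shape = alpha) -
    upper / lower
}
alpha <- uniroot(ratio_gap, interval = c(0.5, 50), tol = 1e-10)$root

# With alpha fixed, choose the rate that puts the upper gamma quantile at 1/lower
beta <- qgamma(1 - tail_mass, shape = alpha) * lower

# Check the tail probabilities of the resulting inverse gamma
p_below <- pgamma(1 / lower, shape = alpha, rate = beta, lower.tail = FALSE)
p_above <- pgamma(1 / upper, shape = alpha, rate = beta)
print(c(alpha = alpha, beta = beta, p_below = p_below, p_above = p_above))
```

For families without such a scaling trick, a two-parameter search (for example with optim on the squared differences between the achieved and target tail probabilities) would be the more general route.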

Detect multimodal distribution and split the data in R

I have a data set with more than 10,000 distributions that look like the ones in red. I want to compare each of them with a reference distribution like the one in blue. Because some are unimodal and some are multimodal, I cannot use a t-test for all of them. So I am trying to detect multimodal distributions in order to apply a conditional test (t-test for a normal distribution, Mann-Whitney for a multimodal distribution; if you have any other idea, please let me know). Is there any way to detect a multimodal distribution?
I am also thinking about splitting the modes when I have a multimodal distribution and comparing each mode to the reference. Is this possible? I found this SO link, Calculate the modes in a multimodal distribution in R, but didn't find anything more recent.
I tried mclust to find how many modes there are, but it doesn't work well, as it will find 2 modes when the distribution looks unimodal.
library(mclust)
clust <- Mclust(data$sample_frequency)
I also tried dip.test
library(diptest)
dip.test(b$sample_frequency)
but again the p-value will not always be correct (for example, plot 77 is significant at p=0.001, whereas plot 79 gives p=0.076).
Any help/thought is welcome!
Thanks!
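A crude alternative to try alongside Mclust and dip.test is to count the peaks of a smoothed kernel density estimate. The helper count_modes below is hypothetical, and its adjust and min_height arguments are tuning assumptions rather than recommended values; the result depends heavily on the amount of smoothing.

```r
# Hypothetical helper: count the peaks of a kernel density estimate.
# `adjust` (> 1 smooths more) and `min_height` (fraction of the tallest
# peak) are tuning assumptions, not recommended defaults.
count_modes <- function(x, adjust = 1.5, min_height = 0.1) {
  d <- density(x, adjust = adjust)
  y <- d$y
  n <- length(y)
  mid <- y[2:(n - 1)]
  # A grid point is a peak if it is no lower than its left neighbour
  # and strictly higher than its right neighbour
  is_peak <- c(FALSE, mid >= y[1:(n - 2)] & mid > y[3:n], FALSE)
  # Ignore small bumps, e.g. in the tails
  is_peak <- is_peak & y >= min_height * max(y)
  list(n_modes = sum(is_peak), locations = d$x[is_peak])
}

# Deterministic illustrative data: normal quantiles, shifted to form two groups
z <- qnorm(ppoints(250))
unimodal <- z
bimodal  <- c(z - 3, z + 3)

print(count_modes(unimodal)$n_modes)
print(count_modes(bimodal)$n_modes)
print(count_modes(bimodal)$locations)
```

The returned locations could also serve as starting points for splitting the data, for example by cutting at the density minimum between two neighbouring peaks.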

R identifying type of frequency distribution

I am interested in frequency distributions that are not normally distributed.
If I have a frequency distribution table that is not normally distributed, is there a function or package that will identify the type of distribution for me?
You can use the fitdistr function (in the MASS package, I think) and check for yourself whether you find a fitting distribution. However, I suggest that you plot the data first and see what it looks like. This approach is generally not recommended, as you can always use different parameters to fit a distribution and thus confuse one distribution with another. If you have found a suitable distribution, you should test it against the data.
Edit: For instance, a normal distribution may look like a Poisson distribution. Fitting is, in my opinion, only useful if you have enough observations. Otherwise, just draw values from your data if you need to.
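A sketch of that fit-then-test workflow, assuming the MASS package is installed (it ships with standard R distributions). The log-normal data is simulated for illustration, and the candidate family is an assumption you would choose after plotting your own data.

```r
library(MASS)  # provides fitdistr()

# Illustrative data, simulated from a log-normal distribution
set.seed(1)
x <- rlnorm(500, meanlog = 1, sdlog = 0.5)

# Maximum-likelihood fit of a candidate family
fit <- fitdistr(x, "lognormal")
print(fit$estimate)

# Test the fitted distribution against the data. Because the parameters
# were estimated from the same sample, this p-value is optimistic.
gof <- ks.test(x, "plnorm",
               meanlog = fit$estimate[["meanlog"]],
               sdlog   = fit$estimate[["sdlog"]])
print(gof$p.value)
```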
You can always check whether a distribution is adequate for your data with a QQ plot. If your data is dynamic, I would suggest using the ECDF (Empirical Cumulative Distribution Function), which will give you a more precise picture of the distribution as your data grows. You can compute the ECDF in R with the ecdf() function.
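Roughly, in base R. The exponential data and the choice of an exponential candidate are assumptions made for illustration only.

```r
set.seed(1)
x <- rexp(300, rate = 2)   # illustrative data

# QQ plot against a candidate distribution (here: exponential, with the
# rate estimated as 1 / mean); points near the line suggest a good fit
p <- ppoints(length(x))
theo <- qexp(p, rate = 1 / mean(x))
plot(theo, sort(x), xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(0, 1, lty = 2)

# ecdf() returns a step function; rebuilding it as more data arrives
# refines the estimate
Fn_small <- ecdf(x[1:50])
Fn_all   <- ecdf(x)
print(Fn_all(0.5))         # estimated P(X <= 0.5)

plot(Fn_all, main = "Empirical CDF")
plot(Fn_small, add = TRUE, col = "grey")
```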

Histogram matching - image processing - c/c++

I have two histograms.
int Hist1[10] = {1,4,3,5,2,5,4,6,3,2};
int Hist2[10] = {1,4,3,15,12,15,4,6,3,2};
Hist1's distribution is of type multi-modal;
Hist2's distribution is of type uni-modal with single prominent peak.
My questions are:
1. Is there any way that I could determine the type of distribution programmatically?
2. How can I quantify whether these two histograms are similar or dissimilar?
Thanks
Raj,
I posted a C function in your other question (automatically compare two series - Dissimilarity test) that will compute the divergence between two sets of similar data. It's actually intended to tell you how closely real data matches predicted data, but I suspect you could use it for your purpose.
Basically, the smaller the error, the more similar the two sets are.
These are just guesses, but I would try fitting each distribution as a Gaussian distribution and use something like the R-squared value to determine whether the distribution is uni-modal or not.
As to the similarity between the two distributions, I would try doing an autocorrelation and using the peak positive value in the autocorrelation as a similarity measure. These ideas are pretty rough, but hopefully they give you some ideas.
For #2, you could calculate their cross-correlation (so long as the buckets themselves can be sorted). That would give you a rough estimate of their similarity.
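The cross-correlation idea, sketched in R to match the rest of this page (the arithmetic ports directly to C). The counts are the ones from the question; normalizing them to proportions first is an assumption made here, not part of the answer above.

```r
hist1 <- c(1, 4, 3, 5, 2, 5, 4, 6, 3, 2)
hist2 <- c(1, 4, 3, 15, 12, 15, 4, 6, 3, 2)

# Normalize counts to proportions so the totals do not dominate
p1 <- hist1 / sum(hist1)
p2 <- hist2 / sum(hist2)

# Pearson correlation of the bin proportions, which is the
# cross-correlation at lag 0
similarity <- cor(p1, p2)
print(similarity)

# Cross-correlation at other lags shows whether one histogram looks
# like a shifted copy of the other
xc <- ccf(p1, p2, lag.max = 3, plot = FALSE)
print(xc)
```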
Comparison of Histograms (For Use in Cloud Modeling).
(That's an MS .doc file.)
There are a variety of software packages that will "fit" your distributions to known discrete distributions for you - Minitab, STATA, R, etc. A reference to fitting distributions in R is here. I wouldn't advise programming this from scratch.
Regarding distribution comparisons, if neither distribution fits a known distribution (Poisson, Binomial, etc.), then you need to use non-parametric methods described here.

Resources