I have multivariate data and want to measure how far the complete data set is from a multivariate normal distribution, using R. I have seen functions such as the Shapiro-Wilk test, but from those I can only conclude that the data do not follow a normal distribution when the p-value is below 0.05. I want to know how far the data are from the normal distribution. Can anyone point me to functions I could use?
Use the mqqnorm function from the RVAideMemoire package. It shows, among others, Mahalanobis distances. From the function example:
library(RVAideMemoire)
x <- 1:30+rnorm(30)
y <- 1:30+rnorm(30,1,3)
mqqnorm(cbind(x,y))
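To get a number rather than a plot, the Mahalanobis distances can also be computed directly in base R. The following is a minimal sketch, not part of RVAideMemoire: it compares the squared distances with the chi-square distribution they approximately follow under multivariate normality. The summary `gap` is an ad hoc choice made for illustration, not a standard test statistic.

```r
# Simulated bivariate data, mirroring the mqqnorm example
set.seed(1)
x <- 1:30 + rnorm(30)
y <- 1:30 + rnorm(30, 1, 3)
X <- cbind(x, y)

# Squared Mahalanobis distance of each row from the sample mean
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Under multivariate normality, d2 is approximately chi-square
# with ncol(X) degrees of freedom
p <- ncol(X)
theo <- qchisq(ppoints(nrow(X)), df = p)

# Ad hoc "how far" summary (an assumption of this sketch): the largest gap
# between the empirical CDF of d2 and the chi-square CDF at the observed points
d2_sorted <- sort(d2)
gap <- max(abs(ecdf(d2)(d2_sorted) - pchisq(d2_sorted, df = p)))

# Chi-square Q-Q plot: points near the line suggest approximate normality
plot(theo, d2_sorted,
     xlab = "Chi-square quantiles", ylab = "Squared Mahalanobis distance")
abline(0, 1)
```

A larger `gap`, or points bending away from the line, indicates a stronger departure from multivariate normality.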
I have a data set where observations come from highly distinct groups. Each group may have a wildly different distribution, so I am trying to find the best distribution using fitdist from fitdistrplus, then use gamlssML from the gamlss package to find the best parameters.
My issue is with transforming the data after this step. For some of the distributions, like the Box-Cox t, I can find the equation for normalizing the data using the BCT coefficients, but for many of these distributions I cannot.
Does gamlss have a function that normalizes the data after fitting? Their documentation only provides the transformations for a small number of distributions https://www.gamlss.com/wp-content/uploads/2018/01/DistributionsForModellingLocationScaleandShape.pdf
Thanks a lot
The normalised data values (for any distribution) are exactly equal to the residuals from a gamlss fit,
m1 <- gamlss()
which can be accessed by
residuals(m1) or
m1$residuals
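For a continuous response, these residuals are normalized quantile residuals: the fitted cumulative distribution function is applied to each observation, and the result is passed through the standard normal quantile function. The sketch below shows that idea in base R only. It assumes an exponential model fitted by maximum likelihood in place of an actual gamlss fit, so the variable names and the choice of distribution are illustrative.

```r
set.seed(42)
y <- rexp(200, rate = 2)        # skewed, clearly non-normal data

# Maximum-likelihood estimate of the exponential rate
rate_hat <- 1 / mean(y)

# Step 1: fitted CDF maps the data to (0, 1), roughly uniform if the model fits
u <- pexp(y, rate = rate_hat)

# Step 2: standard normal quantile function maps (0, 1) to the normal scale
z <- qnorm(u)

# If the fitted distribution is adequate, z should look standard normal
c(mean = mean(z), sd = sd(z))
```

The same two steps apply to any continuous distribution with a `p` function, which is why no distribution-specific normalizing formula is needed.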
I am using the rcorr function from the Hmisc package in R to compute Pearson correlation coefficients and the corresponding p-values when analyzing the correlation of several fishery landings time series. The data isn't really important here; what I would like to know is how the p-values are calculated. The documentation states that the asymptotic P-values are approximated using the t or F distributions, but I am wondering if someone could help me find more information on this, or an equation that describes exactly how these values are calculated.
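The usual t-based p-value for a Pearson correlation r from n pairs uses t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below computes it in base R and checks it against cor.test(). That rcorr uses this same formula is an assumption based on the documentation's wording; the package source is the place to confirm it.

```r
set.seed(7)
n <- 25
a <- rnorm(n)
b <- 0.5 * a + rnorm(n)

# Pearson correlation and its t statistic
r <- cor(a, b)
t_stat <- r * sqrt((n - 2) / (1 - r^2))

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = n - 2)

# Reference value: cor.test() in base R uses the same t statistic
p_ref <- cor.test(a, b)$p.value

c(manual = p_val, cor.test = p_ref)
```

For a single pair of variables, the F statistic with 1 and n - 2 degrees of freedom is the square of this t statistic, so both give the same p-value.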
I am trying to use Julia to estimate a continuous univariate distribution using N observed data points (stored as an array of Float64 numbers), and then sample from this estimated distribution. I have no prior knowledge restricting attention to some family of distributions.
I was thinking of using the KernelDensity package to estimate the distribution, but I'm not sure how to sample from the resulting output.
Any help/tips would be much appreciated.
Without any restrictions on the estimated distribution, a natural candidate is the empirical distribution function (see Wikipedia). For this distribution there are strong theorems about convergence to the actual distribution (see the Dvoretzky–Kiefer–Wolfowitz inequality).
With this choice, sampling is especially simple. If dataset is a list of current samples, then dataset[rand(1:length(dataset),sample_size)] is a set of new samples from the empirical distribution. With the Distributions package, it could be more readable, like so:
using Distributions
new_sample = sample(dataset,sample_size)
Finally, kernel density estimation is also good, but it requires choosing a kernel and its width, which expresses a preference for a certain family of distributions. Sampling from a kernel density estimate is surprisingly similar to sampling from the empirical distribution: 1. draw samples from the empirical distribution; 2. perturb each sample by adding a draw from the kernel function.
For example, if the kernel function is a Normal distribution of width w, then the perturbed sample could be calculated as:
new_sample = dataset[rand(1:length(dataset),sample_size)]+w*randn(sample_size)
I have a vector of data. I need to build the density / distribution function from it and then draw a random sample from that, i.e. I need a function similar to rnorm(), rpois(), rbinom(), etc., but with a distribution built from a vector of data. All in R. Thank you so much.
It has nothing to do with generating stochastic random deviates.
I know the function sample() does something similar, but not exactly. If I use sample() I only get elements from my original data, as from a discrete distribution, and I need a continuous distribution.
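One common way to get continuous draws is a smoothed bootstrap: resample the data with replacement, then add noise drawn from the kernel. This is equivalent to sampling from a Gaussian kernel density estimate. The sketch below is a minimal version under stated assumptions: `rkde` is a hypothetical helper name, the kernel is Gaussian, the bandwidth defaults to `bw.nrd0()` (the default rule used by `density()`), and `x` has more than one element.

```r
# Hypothetical helper: draw n values from a Gaussian kernel density
# estimate built on the data vector x (assumes length(x) > 1)
rkde <- function(n, x, bw = bw.nrd0(x)) {
  # Step 1: resample the observed values with replacement
  centers <- sample(x, size = n, replace = TRUE)
  # Step 2: perturb each value with Gaussian noise of standard deviation bw
  centers + rnorm(n, mean = 0, sd = bw)
}

set.seed(123)
x <- rgamma(500, shape = 2, rate = 1)   # example data
draws <- rkde(1000, x)

# The draws are continuous: they are not restricted to the original values
summary(draws)
```

Because the Gaussian kernel has unbounded support, draws can fall outside the range of the data (here, slightly below zero); a different kernel or a transformation would be needed if that matters.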
I am trying to train a nearest shrunken centroid classifier on my data using the pamr.train() function in the pamr package of R. However, in addition to the training data, I also have a vector of sample weights. Is there any way to use this function while taking these sample weights into account?
Or, is there a way to obtain the source code of this function? If so, I can write the code for weighted means and weighted variances in place of the unweighted ones.
Thank you,