Extract random sample from a unknown distribution (no generate stochastic random deviates) - r

I have a vector of data. I need build the density / distribution function and from that, extract a random sample, i.e. I need obtain the result that give us a function similar to rnorm(), rpois(), rbinom(), etc, but with a distribution built from a vector of data. All in R. Thank you so much.
It has nothing to do with generate stochastic random deviates.
I know the function sample() do something similar, but not exactly. If I use sample() I obtain only elements from my original data, as a discrete distribution and I need as a continuous distribution.

Related

How to transform data after fitting a distribution with gamlss?

I have a data set where observations come from highly distinct groups. Each group may have a wildly different distribution, so I am trying to find the best distribution using fitdist from fitdistrplus, then use gamlssML from the gamlss package to find the best parameters.
My issue is with transforming the data after this step. For some of the distributions, like the Box-Cox t, I can find the equation for normalizing the data using the BCT coefficients, but for many of these distributions I cannot.
Does gamlss have a function that normalizes the data after fitting? Their documentation only provides the transformations for a small number of distributions https://www.gamlss.com/wp-content/uploads/2018/01/DistributionsForModellingLocationScaleandShape.pdf
Thanks a lot
The normalised data values (for any distribution) are exactly equal to the residuals from a gamlss fit,
m1 <- gamlss()
which can be accessed by
residuals(m1) or
m1$residuals

I am doing a risk aggregation of losses in R using poisson distribution as frequency of losses and ecdf as severity of losses

I am really new to this and I have no idea how to use the ecdf function in R. Below I have mention everything step by step:
Frequency of losses is defined using a Poisson distribution
Generate an ecdf function that is going to be used for the severity of losses.
Linearly interpolate the ecdf function.
Take inverse transform of the linearly interpolated ecdf function.
For example,
I can use code freq <- rpois(10,5) to generate the random number of loss frequency but further I have to use this vector to do steps 2-4 and I have no idea how to do that. For step 2 I am facing the problem that how can I use that Poisson distribution as an input and then use to compute severity using the ecdf function. If anybody knows this please help me.

How to figure out the parameters from mppm in R

I am working using the spatstat library in R.
I have several point pattern objects built from my own dataset. The point patterns contain only the x and y coordinates of the points in them. I wanted to fit the point patterns to a Gibbs process with Strauss interaction to build a model and simulate similar point patterns. I was able to use ppm function for that purpose if I work with one point pattern at a time. I used rmhmodel function on the ppm object returned from the ppm function. The rmhmodel function gave me the parameters beta, gamma and r, which I needed to use in rStrauss function further to simulate new point patterns. FYI, I am not using the simulate function directly as I want the new simulated point pattern to have flexible number of points that simulate does not give me.
Now, if I want to work with all the point patterns I have, I can build a hyperframe of point patterns as described in the replicated point pattern chapter of the Baddeley textbook, but it requires mppm function instead of ppm function to fit the model and mppm is not working with rmhmodel when I am trying to figure out the model parameters beta, gamma and r.
How can I extract the fitted beta, gamma and r from a mppm object?
There are several ways to do this.
If you print a fitted model (obtained from ppm or from mppm) simply by typing the name of the object, the printed output contains a description of the fitted model including the model parameters.
If you apply the function parameters to a fitted model obtained from ppm you will obtain a list of the parameter values with appropriate names.
fit <- ppm(cells ~ 1, Strauss(0.12))
fit
parameters(fit)
For a model obtained from mppm, there could be different parameter values applying to each row of the hyperframe of data, so you would have to do lapply(subfits(model), parameters) and the result is a list with one entry for each row of the hyperframe, containing the parameters relevant to each row.
A <- hyperframe(Bugs=waterstriders)
mfit <- mppm(Bugs ~ 1, data=A, Strauss(5))
lapply(subfits(mfit), parameters)
Alternatively you can extract the canonical parameters by coef and transform them to the natural parameters.
You wrote:
I am not using the simulate function directly as I want the new simulated point pattern to have flexible number of points that simulate does not give me.
This cannot be right. The function simulate.mppm generates simulated realisations with a variable number of points. Try simulate(mfit).

R: functions to determine distance of multivariate data to normal distribution

I have a multivariate data and I am interested to compute the distance of complete data to multivariate normal distribution. I want to use R. I have seen some functions like shapiro-wilk test etc. But from them I can only understand if p-value is less <0.05 it does not follow normal distribution. But I want to know how much it is far from the normal distribution. Can anyone please refer me to some functions that I can refer to for use.
Use the mqqnorm function from the RVAideMemoire package. It shows, among others, Mahalanobis distances. From the function example:
x <- 1:30+rnorm(30)
y <- 1:30+rnorm(30,1,3)
mqqnorm(cbind(x,y))

Estimating a probability distribution and sampling from it in Julia

I am trying to use Julia to estimate a continuous univariate distribution using N observed data points (stored as an array of Float64 numbers), and then sample from this estimated distribution. I have no prior knowledge restricting attention to some family of distributions.
I was thinking of using the KernelDensity package to estimate the distribution, but I'm not sure how to sample from the resulting output.
Any help/tips would be much appreciated.
Without any restrictions on the estimated distribution, a natural candidate would be the empirical distribution function (see Wikipedia). For this distribution there are very nice theorems about convergence to actual distribution (see Dvoretzky–Kiefer–Wolfowitz inequality).
With this choice, sampling is especially simple. If dataset is a list of current samples, then dataset[rand(1:length(dataset),sample_size)] is a set of new samples from the empirical distribution. With the Distributions package, it could be more readable, like so:
using Distributions
new_sample = sample(dataset,sample_size)
Finally, Kernel density estimation is also good, but might need a parameter to be chosen (the kernel and its width). This shows a preference for a certain family of distributions. Sampling from a kernel distribution is surprisingly similar to sampling from the empirical distribution: 1. choose a sample from the empirical distributions; 2. perturb each sample using a sample from the kernal function.
For example, if the kernel function is a Normal distribution of width w, then the perturbed sample could be calculated as:
new_sample = dataset[rand(1:length(dataset),sample_size)]+w*randn(sample_size)

Resources