How to get Inverse CDF (kernel) in R?

Is there any function in R that calculates the inverse CDF of a kernel density estimate (I am considering a normal kernel) for a particular alpha in (0, 1)?
I have found quantile, but I am not sure how it works.
Thanks

We can integrate to get the cdf and we can use a root finding algorithm to invert the cdf. First we'll want to interpolate the output from density.
set.seed(10000)
x <- rnorm(1000, 10, 13)
pdf <- density(x)
# Interpolate the density (zero outside the grid returned by density)
f <- approxfun(pdf$x, pdf$y, yleft = 0, yright = 0)
# Get the cdf by numeric integration
cdf <- function(x) {
  integrate(f, -Inf, x)$value
}
# Use a root finding function to invert the cdf
# (range(x) here is the range of the data, not the argument of cdf)
invcdf <- function(q) {
  uniroot(function(x) cdf(x) - q, range(x))$root
}
which gives
med <- invcdf(.5)
cdf(med)
#[1] 0.5000007
This could definitely be improved upon. One issue is that I don't guarantee that the cdf is always less than or equal to 1: if you check the cdf for values larger than max(x), you might get something like 1.00097. But I'm too tired to fix that now. This should give a decent start.
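One way to avoid a cdf that exceeds 1 is sketched below, assuming the default Gaussian kernel and the default bandwidth that density() uses (bw.nrd0). Under those assumptions the kernel estimate is an equal-weight mixture of normals centred at the data points with standard deviation bw, so its CDF is exactly the average of pnorm((t - x_i) / bw), which can never leave [0, 1]. The names kde_cdf and kde_quantile are hypothetical, and the bracketing interval of 10 bandwidths beyond the data is an assumption chosen so that uniroot always sees a sign change for alpha in (0, 1).

```r
set.seed(10000)
x <- rnorm(1000, 10, 13)
bw <- bw.nrd0(x)  # the default bandwidth used by density(x)

# Exact CDF of a Gaussian-kernel estimate: the average of the kernel CDFs
kde_cdf <- function(t) {
  vapply(t, function(ti) mean(pnorm((ti - x) / bw)), numeric(1))
}

# Invert by root finding on an interval that brackets every alpha in (0, 1)
kde_quantile <- function(alpha) {
  lower <- min(x) - 10 * bw
  upper <- max(x) + 10 * bw
  vapply(alpha, function(a) {
    uniroot(function(t) kde_cdf(t) - a, c(lower, upper), tol = 1e-10)$root
  }, numeric(1))
}

med <- kde_quantile(0.5)
kde_cdf(med)
kde_cdf(max(x) + 100 * bw)  # an average of pnorm values, so never above 1
```

Because the CDF is evaluated in closed form here, the only approximation left is the uniroot tolerance; there is no interpolation or numerical integration step.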

An alternative approach would be to use log-spline density estimation rather than kernel density estimation; look at the logspline package. With a logspline density estimate you get CDF (plogspline) and inverse CDF (qlogspline) functions.

Related

Empirical CDF vs Theoretical CDF in R

I want to check the "probability integral transform" theorem using R.
Let's suppose X is an exponential random variable with lambda = 5.
I want to check that the random variable U = F_X(X) = 1 - exp(-5*X) has a uniform (0,1) distribution.
How would you do it?
I would start in this way:
nsample <- 1000
lambda <- 5
x <- rexp(nsample, lambda)  # 1000 exponential observations
u <- 1 - exp(-lambda * x)   # CDF of X evaluated at each observation
Then I need to find the CDF of u and compare it with the CDF of a Uniform (0,1).
For the empirical CDF of u I could use the ecdf function:
ECDF_u <- ecdf(u) #empirical CDF of U
Now I should create the theoretical CDF of Uniform (0,1) and plot it on the same graph as the ECDF in order to compare the two.
Can you help with the code?
You are almost there. You don't need to compute the ECDF yourself: qqplot takes care of this. All you need is your sample (u) and data from the distribution you want to check against. The lazy (and not quite correct) approach would be to check against a random sample drawn from a uniform distribution:
qqplot(runif(nsample), u)
But of course, it is better to plot against the theoretical quantiles:
# the actual plot
qqplot( qunif(ppoints(length(u))), u )
# add a line
qqline(u, distribution=qunif, col='red', lwd=2)
Looks pretty good to me.
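The question also asked how to draw the theoretical Uniform(0, 1) CDF on the same graph as the ECDF. A minimal base-R sketch of that overlay follows; the ks.test() call is an addition not in the answer above, included as one common numeric summary of the largest vertical gap between the two curves. The seed is arbitrary.

```r
set.seed(1)
nsample <- 1000
lambda <- 5
x <- rexp(nsample, lambda)
u <- 1 - exp(-lambda * x)  # probability integral transform (same as pexp(x, lambda))

ECDF_u <- ecdf(u)
# Step function: empirical CDF of u
plot(ECDF_u, main = "ECDF of u vs. Uniform(0, 1) CDF", xlab = "u", ylab = "F(u)")
# Theoretical Uniform(0, 1) CDF overlaid in red
curve(punif(x), from = 0, to = 1, add = TRUE, col = "red", lwd = 2)

# One-sample Kolmogorov-Smirnov test: the statistic is the largest vertical gap
ks <- ks.test(u, "punif")
ks$statistic
ks$p.value
```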

Simulating a draw from the distribution of $X$ (in R)

I have a pdf $f(x)=4x^3$ on $[0,1]$ for a random variable $X$, and I need to simulate a draw from its distribution.
My solution consists of finding the cdf from the pdf (1st issue):
> pdf <- function(x){4*x^3}
> cdf <- integrate(pdf,lower=0,upper=x)
Error in integrate(pdf, lower = 0, upper = x) : object 'x' not found
Once I have the cdf $F$, I will set $X=F^{-1}(U)$ with $U \sim \mathrm{Uniform}(0,1)$. I notice that the pdf follows a Beta distribution with $\alpha=4$ and $\beta=1$.
Is it best to find $F^{-1}$ via an inverse beta function? Is there a quick way to find the inverse of the beta CDF in R?
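The integrate() error arises because integrate() is called immediately with upper = x while x does not exist yet. A minimal sketch of the numeric route from the question, assuming the support is [0, 1]: wrap the integral in a function of its upper limit, then invert it with uniroot(). The names cdf and inv_cdf are illustrative, and this is far slower than the closed-form answers.

```r
pdf <- function(x) 4 * x^3  # density on [0, 1]

# Wrap integrate() in a function so the upper limit is supplied at call time
cdf <- function(x) integrate(pdf, lower = 0, upper = x)$value

# Invert numerically: find t in [0, 1] with cdf(t) = u
inv_cdf <- function(u) {
  uniroot(function(t) cdf(t) - u, interval = c(0, 1), tol = 1e-10)$root
}

set.seed(1)
draws <- vapply(runif(1000), inv_cdf, numeric(1))  # X = F^{-1}(U)
```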
Since you have identified your pdf as beta, just use rbeta to sample.
s1 <- rbeta(5000,4,1)
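To answer the inverse-beta part of the question directly: qbeta() is R's inverse Beta CDF, and for this particular pdf the inverse also has a closed form, since F(x) = x^4 on [0, 1]. A short sketch of inverse-transform sampling both ways, using an arbitrary seed:

```r
set.seed(1)
n <- 5000
u <- runif(n)
# For f(x) = 4x^3 on [0, 1], F(x) = x^4, so F^{-1}(u) = u^(1/4)
s_inv <- u^(1/4)
# qbeta() is the inverse Beta CDF; Beta(4, 1) has the same F
s_qbeta <- qbeta(u, 4, 1)
```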
In the case where the distribution is non-standard and you cannot solve analytically, you can use rejection sampling. Let's pretend we don't know your pdf is beta and we don't know how to integrate or invert it.
pdf <- function(x) 4*x^3 # on [0,1]
First we draw from our proposal distribution
p <- runif(50000)
Calculate the density values under our pdf
dp <- pdf(p)
Then randomly accept or reject each draw with probability proportional to its density value
s2 <- p[runif(50000) < dp/max(dp)]
You should find the distributions of s1 and s2 comparable, using histograms or, preferably, a qqplot.
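A sketch of that comparison, regenerating s1 and s2 as above with an arbitrary seed. The ks.test() call and the acceptance-rate check are additions: with a uniform proposal and the envelope max(dp), which is close to 4, only about a quarter of the 50000 proposals are expected to be kept.

```r
set.seed(1)
pdf <- function(x) 4 * x^3  # on [0, 1]
s1 <- rbeta(5000, 4, 1)

p <- runif(50000)
dp <- pdf(p)
s2 <- p[runif(50000) < dp / max(dp)]

# Points near the red line mean the two samples have similar quantiles
qqplot(s1, s2)
abline(0, 1, col = "red")

accept_rate <- length(s2) / 50000  # expected to be about 1/4
ks.test(s1, s2)
```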

Simulate from an (arbitrary) continuous probability distribution [duplicate]

This question already has answers here: How do I best simulate an arbitrary univariate random variate using its probability function? Closed 8 years ago.
For a normalized probability density function defined on the real line, for example
p(x) = (2/pi) * (1/(exp(x)+exp(-x)))
(this is just an example; the solution should apply for any continuous PDF we can define) is there a package in R to simulate from the distribution? I am aware of R's built-in simulators for many distributions.
I could numerically compute the inverse cumulative distribution function at a set of quantiles, store them in a table, and use the table to map from uniform variates to variates from the desired distribution. Is there already a package that does this?
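The table-based approach described in the question can be sketched in a few lines of base R. The helper make_sampler is hypothetical, and the sketch assumes the density is effectively zero outside a finite interval ([-20, 20] here) so it can be tabulated on a fine grid; the cumulative integral uses the trapezoid rule and the inverse is read off by linear interpolation with approx().

```r
# Hypothetical helper: tabulate the inverse CDF of a density on a grid
make_sampler <- function(dens, lower, upper, n_grid = 10001) {
  grid <- seq(lower, upper, length.out = n_grid)
  d <- dens(grid)
  # Cumulative probability by the trapezoid rule
  cum <- c(0, cumsum((d[-1] + d[-n_grid]) / 2 * diff(grid)))
  cum <- cum / cum[n_grid]  # rescale so the table ends at exactly 1
  # Map uniforms through the tabulated inverse CDF by linear interpolation
  function(n) approx(cum, grid, xout = runif(n))$y
}

p <- function(x) (2/pi) * (1/(exp(x)+exp(-x)))
rp <- make_sampler(p, lower = -20, upper = 20)
set.seed(1)
X <- rp(10000)
```

The grid size trades memory for accuracy; mass outside [lower, upper] is simply dropped, so the interval must be chosen wide enough for the density at hand.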
Here is a way using the distr package, which is designed for this.
library(distr)
p <- function(x) (2/pi) * (1/(exp(x)+exp(-x))) # probability density function
dist <-AbscontDistribution(d=p) # signature for a dist with pdf ~ p
rdist <- r(dist) # function to create random variates from p
set.seed(1) # for a reproducible example
X <- rdist(1000) # sample from X ~ p
x <- seq(-10,10, .01)
hist(X, freq=FALSE, breaks=50, xlim=c(-5,5))
lines(x,p(x),lty=2, col="red")
You can of course also do this in base R using the methodology described in any one of the links in the comments.
If this is the function that you're dealing with, you could just take the integral (or, if you're rusty on your integration rules like me, you could use a tool like Wolfram Alpha to do it for you).
In the case of the function provided, you can simulate with:
draw.val <- function(numdraw) log(tan(pi*runif(numdraw)/2))
A histogram confirms that we're sampling correctly:
hist(draw.val(10000), breaks=100, probability=TRUE)
x <- seq(-10, 10, .001)
lines(x, (2/pi) * (1/(exp(x)+exp(-x))), col="red")
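The inverse used in draw.val comes from integrating the density in closed form, substituting s = e^t. A short derivation, added as a check on the answer:

```latex
F(x) = \int_{-\infty}^{x} \frac{2}{\pi}\,\frac{dt}{e^{t}+e^{-t}}
     = \frac{2}{\pi}\int_{0}^{e^{x}} \frac{ds}{1+s^{2}}
     = \frac{2}{\pi}\arctan\!\left(e^{x}\right),
\qquad
F^{-1}(u) = \log\!\left(\tan\frac{\pi u}{2}\right).
```

So if U ~ Uniform(0, 1), log(tan(pi * U / 2)) has density p, which is what draw.val computes.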

R: empirical version of pnorm() and qnorm()?

I have a normalization method that uses the normal distribution functions pnorm() and qnorm(). I want to alter my logic so that I can use empirical distributions instead of assuming normality. I've used ecdf() to calculate the empirical cumulative distributions, but then realized I was beginning to write a function that was basically the p and q versions of the empirical distribution. Is there a simpler way to do this? Maybe a package with pecdf() and qecdf()? I hate reinventing the wheel.
You can use the quantile and ecdf functions to get qecdf and pecdf, respectively:
x <- rnorm(20)
quantile(x, 0.3, type=1) #30th percentile
Fx <- ecdf(x)
Fx(0.1) # cdf at 0.1
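Wrapping those two calls gives functions with the same shape as pnorm() and qnorm(). The names pecdf and qecdf are the hypothetical ones from the question, not an existing package API; type = 1 is chosen here because R documents it as the inverse of the empirical distribution function.

```r
# Hypothetical helpers mirroring pnorm()/qnorm() for an empirical sample
pecdf <- function(q, data) ecdf(data)(q)
qecdf <- function(p, data) quantile(data, probs = p, type = 1, names = FALSE)

d <- c(3, 1, 2, 5, 4)
pecdf(3, d)    # 0.6: three of the five values are <= 3
qecdf(0.5, d)  # 3: the smallest value whose ECDF reaches 0.5
```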
'emulating' pnorm for an empirical distribution with ecdf:
> set.seed(42)
> x <- ecdf(rnorm(1000))
> x(0)
[1] 0.515
> pnorm(0)
[1] 0.5
Isn't that exactly what bootstrap p-values do?
If so, keep a vector, sort it, and read out the value at the appropriate position (e.g. position 500 for 5% on 10k repetitions). There are some subtle issues with which positions to pick, as help(quantile) discusses under 'Types'.
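A sketch of reading a percentile out by position, assuming 10000 bootstrap replicates of a sample mean (the data and statistic here are hypothetical). With type = 1, quantile() returns the same order statistic as indexing the sorted vector at position 500:

```r
set.seed(1)
dat <- rnorm(30)  # hypothetical data
reps <- 10000
# Bootstrap distribution of the sample mean
boot_means <- replicate(reps, mean(sample(dat, replace = TRUE)))

sorted <- sort(boot_means)
lower_by_position <- sorted[500]  # 5% point: 500th of 10000
lower_by_quantile <- quantile(boot_means, 0.05, type = 1, names = FALSE)
```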

Generate stochastic random deviates from a density object with R

I have a density object dd created like this:
x1 <- rnorm(1000)
x2 <- rnorm(1000, 3, 2)
x <- c(x1, x2)
dd <- density(x)
plot(dd)
Which produces this very non-Gaussian distribution:
[Image: plot(dd), a bimodal, non-Gaussian density (http://www.cerebralmastication.com/wp-content/uploads/2009/09/nongaus.png)]
I would ultimately like to get random deviates from this distribution similar to how rnorm gets deviates from a normal distribution.
The way I am trying to crack this is to get the CDF of my kernel and then get it to tell me the variate if I pass it a cumulative probability (inverse CDF). That way I can turn a vector of uniform random variates into draws from the density.
It seems like what I am trying to do should be something basic that others have done before me. Is there a simple way or a simple function to do this? I hate reinventing the wheel.
FWIW I found this R Help article but I can't grok what they are doing and the final output does not seem to produce what I am after. But it could be a step along the way that I just don't understand.
I've considered just going with a Johnson distribution from the SuppDists package, but Johnson won't give me the nice bimodal hump which my data has.
Alternative approach:
sample(x, n, replace = TRUE)
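sample(x, n, replace = TRUE) draws from the empirical distribution, so every draw is one of the original data points. To draw from the smoothed density dd itself, a common technique (the smoothed bootstrap) adds kernel noise to each resampled point. The sketch below assumes density()'s default Gaussian kernel, whose standard deviation is dd$bw; rkde is a hypothetical helper name.

```r
set.seed(1)
x1 <- rnorm(1000)
x2 <- rnorm(1000, 3, 2)
x <- c(x1, x2)
dd <- density(x)  # default Gaussian kernel with standard deviation dd$bw

# Draw from the kernel density estimate: resample data points,
# then add Gaussian noise with the kernel's standard deviation
rkde <- function(n, data, bw) {
  sample(data, n, replace = TRUE) + rnorm(n, mean = 0, sd = bw)
}

draws <- rkde(10000, x, dd$bw)

# The density of the draws should track dd (dashed red over black)
plot(dd)
lines(density(draws), col = "red", lty = 2)
```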
This is just a mixture of normals. So why not something like:
rmnorm <- function(n, mean, sd, prob) {
  nmix <- length(mean)
  if (length(sd) != nmix) stop("lengths should be the same.")
  # Pick a mixture component for each draw, then draw from that normal
  y <- sample(1:nmix, n, prob = prob, replace = TRUE)
  mean.mix <- mean[y]
  sd.mix <- sd[y]
  rnorm(n, mean.mix, sd.mix)
}
plot(density(rmnorm(10000,mean=c(0,3), sd=c(1,2), prob=c(.5,.5))))
This should be fine if all you need are samples from this mixture distribution.
