Generate random variables from a distribution function using inverse sampling - r

I have a specific density function and I want to generate random variables knowing the expression of the density function.
For example, the density function is :
df=function(x) { - ((-a1/a2)*exp((x-a3)/a2))/(1+exp((x-a3)/a2))^2 }
From this expression I want to generate 1000 random elements with the same distribution.
I know I should use the inverse sampling method. For this, I use the CDF function of my PDF which is calculated as follows:
cdf=function(x) { 1 - a1/(1+exp((x-a3)/a2))
The idea is to generate uniformly distributed samples and then map them with my CDF functions to get an inverse mapping. Something like this:
random.generator<-function(n) sapply(runif(n),cdf)
and then call it with the desired number of random variables to generate.
random.generator(1000)
Is this approach correct?

The first step is to take the inverse of your cdf function, which in this case can be done with simple arithmetic:
invcdf <- function(y) a2 * log(a1/(1-y) - 1) + a3
Now you want to call the inverse cdf with standard uniformly distributed random variables to sample:
set.seed(144)
a1 <- 1 ; a2 <- 2 ; a3 <- 3
invcdf(runif(10))
# [1] -2.913663 4.761196 4.955712 3.007925 1.472119 4.138772 -3.568288
# [8] 4.973643 -1.949684 6.061130
This is a histogram of 10000 simulated values:
hist(invcdf(runif(10000)))
And here is the plot of the pdf:
x <- seq(-20, 20, by=.01)
plot(x, df(x))

Related

Convert uniform draws to normal distributions with known mean and std in R

I apply the sensitivity package in R. In particular, I want to use sobolroalhs as it uses a sampling procedure for inputs that allow for evaluations of models with a large number of parameters. The function samples uniformly [0,1] for all inputs. It is stated that desired distributions need to be obtained as follows
####################
# Test case: dealing with non-uniform distributions
x <- sobolroalhs(model = NULL, factors = 3, N = 1000, order =1, nboot=0)
# X1 follows a log-normal distribution:
x$X[,1] <- qlnorm(x$X[,1])
# X2 follows a standard normal distribution:
x$X[,2] <- qnorm(x$X[,2])
# X3 follows a gamma distribution:
x$X[,3] <- qgamma(x$X[,3],shape=0.5)
# toy example
toy <- function(x){rowSums(x)}
y <- toy(x$X)
tell(x, y)
print(x)
plot(x)
I have non-zero mean and standard deviations for some input parameter that I want to sample out of a normal distribution. For others, I want to uniformly sample between a defined range (e.g. [0.03,0.07] instead [0,1]). I tried using built in R functions such as
SA$X[,1] <- rnorm(1000, mean = 579, sd = 21)
but I am afraid this procedure messes up the sampling design of the package and resulted in odd results for the sensitivity indices. Hence, I think I need to adhere for the uniform draw of the sobolroalhs function in which and use the sampled value between [0, 1] when drawing out of the desired distribution (I think as density draw?). Does this make sense to anyone and/or does anyone know how I could sample out of the right distributions following the syntax from the package description?
You can specify mean and sd in qnorm. So modify lines like this:
x$X[,2] <- qnorm(x$X[,2])
to something like this:
x$X[,2] <- qnorm(x$X[,2], mean = 579, sd = 21)
Similarly, you could use the min and max parameters of qunif to get values in a given range.
Of course, it's also possible to transform standard normals or uniforms to the ones you want using things like X <- 579 + 21*Z or Y <- 0.03 + 0.04*U, where Z is a standard normal and U is standard uniform, but for some distributions those transformations aren't so simple and using the q* functions can be easier.

Sample from a custom likelihood function

I have the following likelihood function which I used in a rather complex model (in practice on a log scale):
library(plyr)
dcustom=function(x,sd,L,R){
R. = (log(R) - log(x))/sd
L. = (log(L) - log(x))/sd
ll = pnorm(R.) - pnorm(L.)
return(ll)
}
df=data.frame(Range=seq(100,500),sd=rep(0.1,401),L=200,U=400)
df=mutate(df, Likelihood = dcustom(Range, sd,L,U))
with(df,plot(Range,Likelihood,type='l'))
abline(v=200)
abline(v=400)
In this function, the sd is predetermined and L and R are "observations" (very much like the endpoints of a uniform distribution), so all 3 of them are given. The above function provides a large likelihood (1) if the model estimate x (derived parameter) is in between the L-R range, a smooth likelihood decrease (between 0 and 1) near the bounds (of which the sharpness is dependent on the sd), and 0 if it is too much outside.
This function works very well to obtain estimates of x, but now I would like to do the inverse: draw a random x from the above function. If I would do this many times, I would generate a histogram that follows the shape of the curve plotted above.
The ultimate goal is to do this in C++, but I think it would be easier for me if I could first figure out how to do this in R.
There's some useful information online that helps me start (http://matlabtricks.com/post-44/generate-random-numbers-with-a-given-distribution, https://stats.stackexchange.com/questions/88697/sample-from-a-custom-continuous-distribution-in-r) but I'm still not entirely sure how to do it and how to code it.
I presume (not sure at all!) the steps are:
transform likelihood function into probability distribution
calculate the cumulative distribution function
inverse transform sampling
Is this correct and if so, how do I code this? Thank you.
One idea might be to use the Metropolis Hasting Algorithm to obtain a sample from the distribution given all the other parameters and your likelihood.
# metropolis hasting algorithm
set.seed(2018)
n_sample <- 100000
posterior_sample <- rep(NA, n_sample)
x <- 300 # starting value: I chose 300 based on your likelihood plot
for (i in 1:n_sample){
lik <- dcustom(x = x, sd = 0.1, L = 200, R =400)
# propose a value for x (you can adjust the stepsize with the sd)
x.proposed <- x + rnorm(1, 0, sd = 20)
lik.proposed <- dcustom(x = x.proposed, sd = 0.1, L = 200, R = 400)
r <- lik.proposed/lik # this is the acceptance ratio
# accept new value with probablity of ratio
if (runif(1) < r) {
x <- x.proposed
posterior_sample[i] <- x
}
}
# plotting the density
approximate_distr <- na.omit(posterior_sample)
d <- density(approximate_distr)
plot(d, main = "Sample from distribution")
abline(v=200)
abline(v=400)
# If you now want to sample just a few values (for example, 5) you could use
sample(approximate_distr,5)
#[1] 281.7310 371.2317 378.0504 342.5199 412.3302

Generating random sample from the quantiles of unknown density in R [duplicate]

This question already has answers here:
How do I best simulate an arbitrary univariate random variate using its probability function?
(4 answers)
Closed 9 years ago.
How can I generate random sample data from the quantiles of the unknown density f(x) for x between 0 and 4 in R?
f = function(x) ((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))
If I understand you correctly (??) you want to generate random samples with the distribution whose density function is given by f(x). One way to do this is to generate a random sample from a uniform distribution, U[0,1], and then transform this sample to your density. This is done using the inverse cdf of f, a methodology which has been described before, here.
So, let
f(x) = your density function,
F(x) = cdf of f(x), and
F.inv(y) = inverse cdf of f(x).
In R code:
f <- function(x) {((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))}
F <- function(x) {integrate(f,0,x)$value}
F <- Vectorize(F)
F.inv <- function(y){uniroot(function(x){F(x)-y},interval=c(0,10))$root}
F.inv <- Vectorize(F.inv)
x <- seq(0,5,length.out=1000)
y <- seq(0,1,length.out=1000)
par(mfrow=c(1,3))
plot(x,f(x),type="l",main="f(x)")
plot(x,F(x),type="l",main="CDF of f(x)")
plot(y,F.inv(y),type="l",main="Inverse CDF of f(x)")
In the code above, since f(x) is only defined on [0,Inf], we calculate F(x) as the integral of f(x) from 0 to x. Then we invert that using the uniroot(...) function on F-y. The use of Vectorize(...) is needed because, unlike almost all R functions, integrate(...) and uniroot(...) do not operate on vectors. You should look up the help files on these functions for more information.
Now we just generate a random sample X drawn from U[0,1] and transform it with Z = F.inv(X)
X <- runif(1000,0,1) # random sample from U[0,1]
Z <- F.inv(X)
Finally, we demonstrate that Z is indeed distributed as f(x).
par(mfrow=c(1,2))
plot(x,f(x),type="l",main="Density function")
hist(Z, breaks=20, xlim=c(0,5))
Rejection sampling is easy enough:
drawF <- function(n) {
f <- function(x) ((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))
x <- runif(n, 0 ,4)
z <- runif(n)
subset(x, z < f(x)) # Rejection
}
Not the most efficient but it gets the job done.
Use sample . Generate a vector of probablities from your existing function f, normalized properly. From the help page:
sample(x, size, replace = FALSE, prob = NULL)
Arguments
x Either a vector of one or more elements from which to choose, or a positive integer. See ‘Details.’
n a positive number, the number of items to choose from. See ‘Details.’
size a non-negative integer giving the number of items to choose.
replace Should sampling be with replacement?
prob A vector of probability weights for obtaining the elements of the vector being sampled.

Generate a random number from a density object (or more broadly from a set of numbers)

Let's say I have a set of numbers that I suspect come from the same distribution.
set.seed(20130613)
x <- rcauchy(10)
I would like a function that randomly generates a number from that same unknown distribution. One approach I have thought of is to create a density object and then get the CDF from that and take the inverse CDF of a random uniform variable (see Wikipedia).
den <- density(x)
#' Generate n random numbers from density() object
#'
#' #param n The total random numbers to generate
#' #param den The density object from which to generate random numbers
rden <- function(n, den)
{
diffs <- diff(den$x)
# Making sure we have equal increments
stopifnot(all(abs(diff(den$x) - mean(diff(den$x))) < 1e-9))
total <- sum(den$y)
den$y <- den$y / total
ydistr <- cumsum(den$y)
yunif <- runif(n)
indices <- sapply(yunif, function(y) min(which(ydistr > y)))
x <- den$x[indices]
return(x)
}
rden(1, den)
## [1] -0.1854121
My questions are the following:
Is there a better (or built into R) way to generate a random number from a density object?
Are there any other ideas on how to generate a random number from a set of numbers (besides sample)?
To generate data from a density estimate you just randomly choose one of the original data points and add a random "error" piece based on the kernel from the density estimate, for the default of "Gaussian" this just means choose a random element from the original vector and add a random normal with mean 0 and sd equal to the bandwidth used:
den <- density(x)
N <- 1000
newx <- sample(x, N, replace=TRUE) + rnorm(N, 0, den$bw)
Another option is to fit a density using the logspline function from the logspline package (uses a different method of estimating a density), then use the rlogspline function in that package to generate new data from the estimated density.
If all you need is to draw values from your existing pool of numbers, then sample is the way to go.
If you want to draw from the presumed underlying distribution, then use density , and fit that to your presumed distribution to get the necessary coefficients (mean, sd, etc.), and use the appropriate R distribution function.
Beyond that, I'd take a look at Chapter7.3 ("rejection method") of Numerical Recipes in C for ways to "selectively" sample according to any distribution. The code is simple enough to be easily translated into R .
My bet is someone already has done so and will post a better answer than this.
Greg Snow's answer was helpful to me, and I realized that the output of the density function has all the data needed to create random numbers from the input distribution. Building on his example, you can do the following to get random values using the density output.
x <- rnorm(100) # or any numeric starting vector you desire
dens <- density(x)
N <- 1000
newx <- sample(x = dens$x, N, prob = dens$y, replace=TRUE) + rnorm(N, 0, dens$bw)
You can even create a simple random number generating function
rdensity <- function(n, dens) {
return(sample(x = dens$x, n, prob = dens$y, replace=TRUE) + rnorm(n, 0, dens$bw))
}

Superimposing gamma distribution curve to a plot

I have to superimpose the gamma distribution curve to a plot of others powerlaw-like curve.
I first plot dot points of the histogram in log-log scale
plot(log(pp$mids),log(pp$density))
then i would like to superimpose my gamma distribution curve calling an external function gamma()
gamma <- function(X)
{
n <- length(X)
theta<-var(hh2$V1)/mean(hh2$V1)
kappa<-mean(hh2$V1)/theta
y<-rgamma(n,kappa,theta)
xx<-hist(y,plot=F)
curve(log(xx$density),add=T,col='violet',type='l')
return( c(kappa) )
}
but this return me an error because curve() needs a true curve to plot. How can i do this?
Here is a somewhat-working variant of your code:
Generate an example structured as (I guess) your data are:
library(rmutil) ## for rpareto
set.seed(101)
hh2 <- data.frame(V1=rpareto(1000, m=1, s=1.5))
Initial histogram calculation:
pp <- hist(hh2$V1,plot=FALSE)
Function (better not to call it gamma as that masks a built-in function):
ghistfun <- function(x) {
n <- length(x)
scalepar <- var(x)/mean(x)
shapepar <- mean(x)^2/var(x)
y <- rgamma(n,shape=shapepar,scale=scalepar)
xx <- hist(y,plot=FALSE)
lines(log(xx$mids),log(xx$density),col="red")
curve(dgamma(exp(x),shape=shapepar,scale=scalepar,log=TRUE),
add=TRUE,col="blue")
shapepar
}
It might be better to use a very large number for n rather than just using the length of the data, unless you are particularly interested in seeing the random fluctuations in a data set of precisely the same size. Alternatively you could just use curve(dgamma(x,...)), as shown (I initially thought that you would have to allow for scaling from a density of x to a density of log(x), but because of the way you have computed a histogram on the original (unlogged) scale and then transformed the bin midpoints, you don't have to ...)
plot(log(pp$mids),log(pp$density))
ghistfun(hh2$V1)

Resources