What exactly does the rexp() function in R do?

I am trying to generate random values from an Exp(0.5) distribution.
I have the following code:
y <- rexp(10, rate=1/2)
y
This gives me:
[1] 4.5582556 2.3285161 4.2466828 0.9995597 3.6326827 0.1016917 0.2518124
[8] 0.3189424 0.8553740 0.8277078
The problem I have here is that I don't know what these values mean. They can't be values of the density function of Exp(1/2), which is f(x) = (1/2)·exp(-x/2) for x ≥ 0, because the density function is defined as 0 for x < 0, and f(x) = 4.55 would require x < 0.
What do these values mean?

For people coming to R from statistics, for any distribution:
d - PDF
p - CDF
q - inverse CDF
r - random sampling (generates random draws from the distribution; this is what rexp() does)
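As a quick sketch of the four functions for the Exp(1/2) case in the question (rate = 1/2, so the mean is 2): the r function returns random draws, not density values, so the numbers printed by rexp() are simply ten simulated observations, all of them non-negative.

```r
# d/p/q/r functions for the exponential distribution with rate 1/2
rate <- 1/2

dexp(0, rate = rate)      # density at x = 0: rate * exp(-rate * 0) = 0.5
pexp(2, rate = rate)      # P(X <= 2) = 1 - exp(-1)
qexp(0.5, rate = rate)    # median: log(2) / rate = 2 * log(2)

set.seed(1)
y <- rexp(10, rate = rate)  # ten random draws from Exp(1/2)
y

# With many draws, the sample mean approaches the true mean 1/rate = 2
big <- rexp(1e5, rate = rate)
mean(big)
```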

Related

Compute the integral in R

I am computing a cumulative distribution function, whose value should lie in [0, 1]. The equation for the CDF is:
\begin{align}
F(x) = \int_{\hat{a}}^{x} \frac{2}{\hat{b}-\hat{a}} \sideset{}{'}\sum_{k=0}^{N-1} C_{k} \cos\left( \big(y - \hat{a}\big) \frac{k \pi}{\hat{b} - \hat{a}} \right) dy
\end{align}
where
C_k is a vector,
the cos term is a vector, and
length(Ck) = length(cos term) = N.
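Integrating term by term may help check the code. This is a sketch, assuming (as in the COS method) that the prime on the sum means the k = 0 term is weighted by one half. For k ≥ 1 each cosine integrates to a sine, and for k = 0 the integrand is constant, so

\begin{align}
F(x) &= \frac{2}{\hat{b}-\hat{a}} \left[ \frac{C_0}{2}\,(x-\hat{a}) + \sum_{k=1}^{N-1} C_k\,\frac{\hat{b}-\hat{a}}{k\pi} \sin\left( (x-\hat{a}) \frac{k\pi}{\hat{b}-\hat{a}} \right) \right] \\
&= C_0\,\frac{x-\hat{a}}{\hat{b}-\hat{a}} + \sum_{k=1}^{N-1} \frac{2 C_k}{k\pi} \sin\left( (x-\hat{a}) \frac{k\pi}{\hat{b}-\hat{a}} \right)
\end{align}

At x = b̂ every sine term vanishes, so F(b̂) = C_0, which should equal 1 if the series represents a density that integrates to 1 on [â, b̂]. If the k = 0 term is not halved, the same calculation gives F(b̂) = 2C_0, which is one way a result larger than 1 can arise.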
I am sure the equation is correct, but I am afraid my code is incorrect.
Here is my code:
integrand<-function(x,myCk)
{
(2/(b-a))*(t(myCk)%*%as.matrix(cos((x-hat.a)*uk)))
}
f <- function(x){integrand(x,myCk)}
# define a vectorized version of this function
fv <- Vectorize(f,"x")
res<-integrate(fv,upper = r,lower = hat.a, subdivisions = 2000)$value
res returns the cumulative distribution function, and the result can be larger than 1.
myCk is a vector generated by another function.
hat.a is the lower bound for integration, and it is negative.
uk is a vector generated by a function. The length of uk equals the length of myCk.
I appreciate your advice!
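A minimal sketch of evaluating the integral in closed form instead of with integrate(), assuming the primed sum halves the k = 0 term and that uk in the question corresponds to kπ/(b̂ - â). The function name cos_series_cdf and the test case (a standard normal density expanded on [-10, 10], whose coefficients are approximately exp(-w²/2)·cos(w·â)) are illustrative assumptions, not the poster's code.

```r
# Closed-form CDF of a cosine-series density on [a, b] (a sketch).
# Assumes f(y) = 2/(b - a) * sum'_{k=0}^{N-1} Ck[k+1] * cos((y - a) * k * pi / (b - a)),
# where the primed sum weights the k = 0 term by 1/2.
cos_series_cdf <- function(x, Ck, a, b) {
  k  <- seq_along(Ck) - 1            # k = 0, ..., N - 1
  uk <- k * pi / (b - a)             # frequencies (assumed to match the question's uk)
  sapply(x, function(xi) {
    k0   <- 0.5 * Ck[1] * (xi - a)                        # halved k = 0 term
    rest <- sum(Ck[-1] * sin((xi - a) * uk[-1]) / uk[-1]) # integrated cosine terms
    (2 / (b - a)) * (k0 + rest)
  })
}

# Example: standard normal density expanded on [a, b] = [-10, 10]
a <- -10; b <- 10; N <- 64
w  <- (0:(N - 1)) * pi / (b - a)
# Approximates the integral of dnorm(y) * cos(w * (y - a)) over [a, b]
# (exact over the whole real line; the mass outside [a, b] is negligible)
Ck <- exp(-w^2 / 2) * cos(w * a)

cos_series_cdf(c(0, 1, b), Ck, a, b)  # compare with pnorm(c(0, 1)) and 1
```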

Generate random variables from a distribution function using inverse sampling

I have a specific density function, and I want to generate random variables from it using its expression.
For example, the density function is:
df=function(x) { - ((-a1/a2)*exp((x-a3)/a2))/(1+exp((x-a3)/a2))^2 }
From this expression I want to generate 1000 random elements with the same distribution.
I know I should use the inverse sampling method. For this, I use the CDF of my PDF, which is:
cdf=function(x) { 1 - a1/(1+exp((x-a3)/a2)) }
The idea is to generate uniformly distributed samples and then map them with my CDF function to get an inverse mapping. Something like this:
random.generator<-function(n) sapply(runif(n),cdf)
and then call it with the desired number of random variables to generate.
random.generator(1000)
Is this approach correct?
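Not quite: inverse transform sampling applies the inverse of the CDF to uniform random numbers, not the CDF itself. The first step is to take the inverse of your cdf function, which in this case can be done with simple arithmetic: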
invcdf <- function(y) a2 * log(a1/(1-y) - 1) + a3
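The arithmetic behind invcdf: set y equal to the CDF and solve for x. This requires a1/(1 - y) > 1, which holds for every y in (0, 1) when a1 = 1.

\begin{align}
y = 1 - \frac{a_1}{1 + e^{(x-a_3)/a_2}}
\;\Longleftrightarrow\; 1 + e^{(x-a_3)/a_2} = \frac{a_1}{1-y}
\;\Longleftrightarrow\; x = a_2 \log\left( \frac{a_1}{1-y} - 1 \right) + a_3
\end{align}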
Now you want to call the inverse cdf with standard uniformly distributed random variables to sample:
set.seed(144)
a1 <- 1 ; a2 <- 2 ; a3 <- 3
invcdf(runif(10))
# [1] -2.913663 4.761196 4.955712 3.007925 1.472119 4.138772 -3.568288
# [8] 4.973643 -1.949684 6.061130
This is a histogram of 10000 simulated values:
hist(invcdf(runif(10000)))
And here is the plot of the pdf:
x <- seq(-20, 20, by=.01)
plot(x, df(x))
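A side note that is easy to check numerically: with a1 = 1 (the value used above; cdf() only tends to 0 as x goes to -Inf when a1 = 1), this is the CDF of a logistic distribution with location a3 and scale a2, because a1/(1 - y) - 1 = y/(1 - y). So invcdf() coincides with base R's qlogis(), and rlogis() could be used directly. The sketch below compares the two and checks the simulated sample against cdf() through its empirical CDF.

```r
a1 <- 1; a2 <- 2; a3 <- 3
cdf    <- function(x) 1 - a1 / (1 + exp((x - a3) / a2))
invcdf <- function(y) a2 * log(a1 / (1 - y) - 1) + a3

u <- c(0.1, 0.25, 0.5, 0.9)
invcdf(u)                               # inverse-CDF values
qlogis(u, location = a3, scale = a2)    # the same values from base R

set.seed(144)
z <- invcdf(runif(10000))               # inverse transform sampling
# The empirical CDF of the sample should be close to cdf() at any point
ecdf(z)(c(0, 3, 6))
cdf(c(0, 3, 6))
```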

Using antiD function for variance of gamma distribution

This is my first post here and I hope I'll follow all the rules of the community.
I'm trying to calculate the variance of a gamma distribution with shape parameter 2 and scale parameter 3 in R, using the function antiD from the mosaic package. The R code I use is the following:
stopifnot(require(mosaic))
f <- function(y) {
dgamma(y, shape = 2, scale = 3)
}
mean_integral <- antiD( z*f(z) ~ z )
mn <- mean_integral(10^4)
g <- function(y) {
(y - mn)^2
}
variance <- antiD(f(x)*g(x) ~ x)
variance(10^5)
## [1] 7.115334e-09
The problem is that the number I get doesn't make sense, as the variance of a Gamma distribution with those parameters should be shape·scale² = 2*3^2 = 18 (Wikipedia page on the Gamma distribution). Moreover, if I use 10^4 as the upper bound (the default lower bound is 0) for variance(), it returns the following:
variance(10^4)
## [1] 18
And the integral from 10^4 to 10^5 will be:
variance(10^5) - variance(10^4)
## [1] -18
Does anyone know why variance(10^5) produces nonsensical results in this case? I would also be grateful for any additional comments on the style of the post.
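A sketch of a workaround, assuming the trouble comes from numerically integrating over a very wide finite interval: over most of [0, 10^5] the integrand is essentially zero, so an adaptive rule can miss the region near the origin where the density has its mass. That diagnosis is a guess (antiD appears to fall back to numerical integration here), not something the post confirms. Base R's integrate() accepts Inf as an upper limit, and the result can be checked against the closed form shape·scale² = 18. This uses integrate() rather than mosaic::antiD.

```r
# Gamma(shape = 2, scale = 3): mean = shape * scale = 6, variance = shape * scale^2 = 18
f <- function(y) dgamma(y, shape = 2, scale = 3)

mn <- integrate(function(y) y * f(y), lower = 0, upper = Inf)$value
variance <- integrate(function(y) (y - mn)^2 * f(y), lower = 0, upper = Inf)$value

mn        # close to 6
variance  # close to 18
```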

Random Pareto distribution in R with 30% of values being <= specified amount

Let me begin by saying this is a class assignment for an intro to R course.
First, in VGAM why are there dparetoI, ParetoI, pparetoI, qparetoI & rparetoI?
Are they not the same things?
My problem:
I would like to generate 50 random numbers from a Pareto distribution.
I would like the range to be 1 – 60 but I also need to have 30% of the values <= 4.
Using VGAM, I have tried a variety of functions and combinations of Pareto functions from what I could find in the documentation, as well as a few things online.
I experimented with fit, quantiles and forcing a sequence from examples I found but I'm new and didn't make much sense of it.
I’ve been using this:
alpha <- 1 # location
k <- 2 # shape
mySteps <- rpareto(50,alpha,k)
range(mySteps)
str(mySteps[mySteps <= 4])
After enough iterations, the range will be acceptable, but the share of entries <= 4 is never close to 30%.
So my questions are:
Am I using the right Pareto function?
If not, can you point me in the right direction?
If so, do I just keep running it until the “right” data comes up?
Thanks for the guidance.
So reading the Wikipedia entry for Pareto Distribution, you can see that the CDF of the Pareto distribution is given by:
F_X(x) = 1 - (x_m/x)^α, for x ≥ x_m
The CDF gives the probability that X (your random variable) < x (a given value). You want Pareto distributions where
Prob(X < 4) ≡ F_X(4) = 0.3
or
0.3 = 1 - (x_m/4)^α
This defines a relation between x_m and α:
x_m = 4 * (0.7)^(1/α)
In R code:
library(VGAM)
set.seed(1)
alpha <- 1
k <- 4 * (0.7)^(1/alpha)
X <- rpareto(50,k,alpha)
quantile(X,0.3) # confirm that 30% are < 4
# 30%
# 3.891941
Plot the histogram and the density:
hist(X, breaks=c(1:60,Inf),xlim=c(1,60))
x <- 1:60
lines(x,dpareto(x,k,alpha), col="red")
If you repeat this process for different alpha, you will get different distribution functions, but in all cases ~30% of the sample will be < 4. The reason it is only approximately 30% is that you have a finite sample size (50).
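The same relation can be checked without VGAM. A minimal sketch that samples the Pareto distribution by inverse transform, using x = x_m(1 - u)^(-1/α), which comes from solving u = F_X(x); the helper names ppareto_base and rpareto_base are hypothetical, chosen to avoid clashing with VGAM's functions.

```r
# Pareto CDF and inverse-transform sampler in base R (hypothetical helper names)
ppareto_base <- function(x, xm, alpha) ifelse(x < xm, 0, 1 - (xm / x)^alpha)
rpareto_base <- function(n, xm, alpha) xm * (1 - runif(n))^(-1 / alpha)

alpha <- 1
xm <- 4 * 0.7^(1 / alpha)        # chosen so that F_X(4) = 0.3

ppareto_base(4, xm, alpha)       # 0.3, up to rounding

set.seed(1)
X <- rpareto_base(1e5, xm, alpha)
mean(X <= 4)                     # close to 0.3 for a large sample
```

With only 50 draws, as in the question, the observed share will scatter around 30%, as the answer above notes.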

Generating random sample from the quantiles of unknown density in R [duplicate]

This question already has answers here:
How do I best simulate an arbitrary univariate random variate using its probability function?
(4 answers)
Closed 9 years ago.
How can I generate random sample data from the quantiles of the unknown density f(x) for x between 0 and 4 in R?
f = function(x) ((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))
If I understand you correctly (??) you want to generate random samples with the distribution whose density function is given by f(x). One way to do this is to generate a random sample from a uniform distribution, U[0,1], and then transform this sample to your density. This is done using the inverse cdf of f, a methodology which has been described before, here.
So, let
f(x) = your density function,
F(x) = cdf of f(x), and
F.inv(y) = inverse cdf of f(x).
In R code:
f <- function(x) {((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))}
F <- function(x) {integrate(f,0,x)$value}
F <- Vectorize(F)
F.inv <- function(y){uniroot(function(x){F(x)-y},interval=c(0,10))$root}
F.inv <- Vectorize(F.inv)
x <- seq(0,5,length.out=1000)
y <- seq(0,1,length.out=1000)
par(mfrow=c(1,3))
plot(x,f(x),type="l",main="f(x)")
plot(x,F(x),type="l",main="CDF of f(x)")
plot(y,F.inv(y),type="l",main="Inverse CDF of f(x)")
In the code above, since f(x) is only defined on [0, Inf), we calculate F(x) as the integral of f(x) from 0 to x. Then we invert that using the uniroot(...) function on F(x) - y. The use of Vectorize(...) is needed because, unlike most R functions, integrate(...) and uniroot(...) do not operate on vectors. You should look up the help files on these functions for more information.
Now we just generate a random sample X drawn from U[0,1] and transform it with Z = F.inv(X)
X <- runif(1000,0,1) # random sample from U[0,1]
Z <- F.inv(X)
Finally, we demonstrate that Z is indeed distributed as f(x).
par(mfrow=c(1,2))
plot(x,f(x),type="l",main="Density function")
hist(Z, breaks=20, xlim=c(0,5))
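For this particular f, the integral and its inverse have closed forms, which avoids integrate() and uniroot() entirely. Writing g(x) = x³/3 - x² + x, we have g'(x) = (x - 1)² and f(x) = g'(x)·exp(-g(x)), so F(x) = 1 - exp(-g(x)) for x ≥ 0 (since g(0) = 0), and f integrates to 1 on [0, Inf). Because g(x) = ((x - 1)³ + 1)/3, solving F(x) = y gives x = 1 + ∛(-3·log(1 - y) - 1). A sketch (F.closed, F.inv.closed and cbrt are illustrative names):

```r
f <- function(x) ((x - 1)^2) * exp(-(x^3/3 - 2*x^2/2 + x))
F.closed <- function(x) 1 - exp(-(x^3/3 - x^2 + x))   # valid for x >= 0

# Real cube root that also handles negative arguments
cbrt <- function(z) sign(z) * abs(z)^(1/3)
F.inv.closed <- function(y) 1 + cbrt(-3 * log(1 - y) - 1)

# Check against numerical integration at a few points
sapply(c(0.5, 1, 2, 3), function(x) integrate(f, 0, x)$value)
F.closed(c(0.5, 1, 2, 3))

# Inverse transform sampling; already vectorized, no Vectorize() needed
set.seed(1)
Z <- F.inv.closed(runif(1000))
```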
Rejection sampling is easy enough:
drawF <- function(n) {
f <- function(x) ((x-1)^2) * exp(-(x^3/3-2*x^2/2+x))
x <- runif(n, 0 ,4)
z <- runif(n)
subset(x, z < f(x)) # Rejection
}
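Not the most efficient (it keeps only the proposals that fall under f, about a quarter of the n draws, since f ≤ 1 on [0, 4] and f integrates to almost exactly 1 there), but it gets the job done.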
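Because subset() keeps only the accepted proposals, drawF(n) returns fewer than n values. A minimal variant (the name drawF_exact is hypothetical) keeps proposing in batches until n draws are accepted. It relies on the same fact as the original: f(x) ≤ 1 on [0, 4] (its maximum there is f(0) = 1), so a uniform height on [0, 1] is a valid envelope.

```r
drawF_exact <- function(n) {
  f <- function(x) ((x - 1)^2) * exp(-(x^3/3 - 2*x^2/2 + x))
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, 0, 4)          # proposals, uniform on [0, 4]
    z <- runif(n)                # uniform heights under the envelope y = 1
    out <- c(out, x[z < f(x)])   # keep proposals that fall under f
  }
  out[seq_len(n)]                # return exactly n accepted draws
}

set.seed(2)
s <- drawF_exact(20000)
length(s)
```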
Use sample(). Generate a vector of probabilities by evaluating your existing function f on a grid of x values, normalized properly. From the help page:
sample(x, size, replace = FALSE, prob = NULL)
Arguments
x Either a vector of one or more elements from which to choose, or a positive integer. See ‘Details.’
n a positive number, the number of items to choose from. See ‘Details.’
size a non-negative integer giving the number of items to choose.
replace Should sampling be with replacement?
prob A vector of probability weights for obtaining the elements of the vector being sampled.
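A sketch of the sample() approach described above. Sampling from a finite grid only approximates the continuous distribution, since every draw is one of the grid points; the grid spacing of 0.001 on [0, 4] is an arbitrary choice.

```r
f <- function(x) ((x - 1)^2) * exp(-(x^3/3 - 2*x^2/2 + x))

grid  <- seq(0, 4, by = 0.001)     # candidate values (grid spacing is an assumption)
probs <- f(grid) / sum(f(grid))    # normalized probability weights

set.seed(3)
draws <- sample(grid, size = 1000, replace = TRUE, prob = probs)
```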
