I have a pdf $f(x)=4x^3$ of a random variable $X$ in which I need to simulate a draw from the distribution.
My solution consists of finding the cdf from the pdf (1st issue):
> pdf <- function(x){4*x^3}
> cdf <- integrate(pdf,lower=0,upper=x)
Error in integrate(pdf, lower = 0, upper = x) : object 'x' not found
Once I get the cdf $U$, I will set $X=F^-1(U)$. I notice that the pdf follows a Beta distribution with $\alpha=4$ and $\beta=1$.
Is it best to find the $F^-1$ via a inverse beta function? Is there a quick way to find the inverse of a beta function in R?
Since you have identified your pdf as beta, just use rbeta to sample.
s1 <- rbeta(5000,4,1)
In the case where the distribution is non-standard and you cannot solve analytically, you can use rejection sampling. Let's pretend we don't know your pdf is beta and we don't know how to integrate/inverse.
pdf <- function(x) 4*x^3 # on [0,1]
First we draw from our proposal distribution
p <- runif(50000)
Calculate the density values under our pdf
dp <- pdf(p)
And randomly accept/reject in proportion
s2 <- p[runif(50000) < dp/max(dp)]
You should find the distributions of s1 and s2 comparable, using histograms or, preferably, a qqplot.
Related
My question is how to generate a sample in R from a logistic CDF with the inverse CDF method. The logistic density is p(θ) = exp(θ)/(1 + exp(θ))^2
Here is the algorithm for that method:
1: for t = 1 to T do
2: sample q(t) ∼ Unif(0, 1)
3: θ(t) ← F^−1(q(t))
4: end for
Here is my code but it just generates a vector of the same number. The result should be log-concave but obviously it would not be that if I put it in the histogram, so what is the problem?:
First define T as the number of draws you're taking from uniform distribution
T<-100000
sample_q<-runif(T,0,1)
It seems like plogis will give you the cumulative distribution function, so I suppose I can just take its inverse:
generate_samples_from_logistic_CDF <- function(p) {
for(t in length(T))
cdf<-plogis((1+exp(p)/(exp(p))))
inverse_cdf<-(1/cdf)
return(inverse_cdf)
}
should generate_samples_from_logistic_CDF(sample_q)
but instead it only gives me the same value for everything
Since the inverse CDF is already coded in R as qlogis(), this should work:
qlogis(runif(100000))
or if you want to do it "by hand" rather than using the built-in qlogis(), you can use R <- runif(100000); log(R/(1-R))
Note that rlogis(100000) should be more efficient.
One of your confusions is that "inverse" in the algorithm description above doesn't mean the multiplicative inverse or reciprocal (i.e. 1/x), but rather the function inverse (which in this case is log(q/(1-q)))
I want to check the "probability integral transform" theorem using R.
Let's suppose X is an exponential random variable with lambda = 5.
I want to check that the random variable U = F_X = 1 - exp(-5*X) has a uniform (0,1) distribution.
How would you do it?
I would start in this way:
nsample <- 1000
lambda <- 5
x <- rexp(nsample, lambda) #1000 exponential observation
u <- 1- exp(-lambda*x) #CDF of x
Then I need to find the CDF of u and compare it with the CDF of a Uniform (0,1).
For the empirical CDF of u I could use the ECDF function:
ECDF_u <- ecdf(u) #empirical CDF of U
Now I should create the theoretical CDF of Uniform (0,1) and plot it on the same graph of the ECDF in order to compare the two graphs.
Can you help with the code?
You are almost there. You don't need to compute the ECDF yourself – qqplot will take care of this. All you need is your sample (u) and data from the distribution you want to check against. The lazy (and not quite correct) approach would be to check against a random sample drawn from a uniform distribution:
qqplot(runif(nsample), u)
But of course, it is better to plot against the theoretical quantiles:
# the actual plot
qqplot( qunif(ppoints(length(u))), u )
# add a line
qqline(u, distribution=qunif, col='red', lwd=2)
Looks pretty good to me.
Trying to fit a chi_square distribution using fitdistr() in R. Documentation on this is here (and not very useful to me): https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html
Question 1: chi_df below has the following output: 3.85546875 (0.07695236). What is the second number? The Variance or standard deviation?
Question 2: fitdistr generates 'k' defined by the Chi-SQ distribution. How do I fit the data so I get the scaling constant 'A'? I am dumbly using lines 14-17 below. Obviously not good.
Question 3: Is the Chi-SQ distribution only defined for a certain x-range? (Variance is defined as 2K, while mean = k. This must require some constrained x-range... Stats question not programming...)
nnn = 1000;
## Generating a chi-sq distribution
chii <- rchisq(nnn,4, ncp = 0);
## Plotting Histogram
chi_hist <- hist(chii);
## Fitting. Gives probability density which must be scaled.
chi_df <- fitdistr(chii,"chi-squared",start=list(df=3));
chi_k <- chi_df[[1]][1];
## Plotting a fitted line:
## Spanning x-length of chi-sq data
x_chi_fit <- 1:nnn*((max(chi_hist[[1]][])-min(chi_hist[[1]][]))/nnn);
## Y data using eqn for probability function
y_chi_fit <- (1/(2^(chi_k/2)*gamma(chi_k/2)) * x_chi_fit^(chi_k/2-1) * exp(-x_chi_fit/2));
## Normalizing to the peak of the histogram
y_chi_fit <- y_chi_fit*(max(chi_hist[[2]][]/max(y_chi_fit)));
## Plotting the line
lines(x_chi_fit,y_chi_fit,lwd=2,col="green");
Thanks for your help!
As commented above, ?fitdistr says
An object of class ‘"fitdistr"’, a list with four components,
...
sd: the estimated standard errors,
... so that parenthesized number is the standard error of the parameter.
The scale parameter doesn't need to be estimated; you need either to scale by the width of your histogram bins or just use freq=FALSE when drawing your histogram. See code below.
The chi-squared distribution is defined on the non-negative reals, which makes sense since it's the distribution of a squared standard Normal (this is a statistical, not a programming question).
Set up data:
nnn <- 1000
## ensure reproducibility; not a big deal in this case,
## but good practice
set.seed(101)
## Generating a chi-sq distribution
chii <- rchisq(nnn,4, ncp = 0)
Fitting.
library(MASS)
## use method="Brent" based on warning
chi_df <- fitdistr(chii,"chi-squared",start=list(df=3),
method="Brent",lower=0.1,upper=100)
chi_k <- chi_df[[1]][1]
(For what it's worth, it looks like there might be a bug in the print method for fitdistr when method="Brent" is used. You could also use method="BFGS" and wouldn't need to specify bounds ...)
Histograms
chi_hist <- hist(chii,breaks=50,col="gray")
## scale by N and width of histogram bins
curve(dchisq(x,df=chi_k)*nnn*diff(chi_hist$breaks)[1],
add=TRUE,col="green")
## or plot histogram already scaled to a density
chi_hist <- hist(chii,breaks=50,col="gray",freq=FALSE)
curve(dchisq(x,df=chi_k),add=TRUE,col="green")
I have the pdf of a distribution. This distribution is not a standard distribution and no functions exist in R to sample from it. How to I sample from this pdf using R?
This is more of a statistics question, as it requires sampling, but in general, you can take this approach to the problem:
Find a distribution f, whose pdf, when multiplied by any given constant k, is always greater than the pdf of the distribution in question, g.
For each sample, do the following steps:
Sample a random number x from the distribution f.
Calculate C = f(x)*k/g(x). This should be equal to or less than 1.
Draw a random number u from a uniform distribution U(0,1). If C < u, then go back to step 3. Otherwise keep x as the number and continue sampling if desired.
This process is known as rejection sampling, and is often used in random number generators that are not uniform.
The normal distribution and the uniform distribution are some of the more common distributions to sample from, but you can do other ones. Generally you want the shapes of k*f(x) and g(x) to be very close, so you don't have to reject a lot of samples.
Here's an example implementation:
#n is sample size
#g is pdf you want to sample from
#rf is sampling function for f
#df is density function for f
#k is multiplicative constant
#... is any necessary parameters for f
function.sample <- function(n,g,rf,df,k,...){
results = numeric(n)
counter = 0
while(counter < n){
x = rf(1,...)
x.pdf = df(x,...)
if (runif(0,1) >= x.pdf * k/g(x)){
results[counter+1] = x
counter = counter + 1
}
}
}
There are other methods to do random sampling, but this is usually the easiest, and it works well for most functions (unless their PDF is hard to calculate but their CDF isn't).
Is there any function in R which will calculate the inverse kernel(i am considering normal) CDF for a particular alpha(0,1).
I have found quantile but I am not sure how it works.
Thanks
We can integrate to get the cdf and we can use a root finding algorithm to invert the cdf. First we'll want to interpolate the output from density.
set.seed(10000)
x <- rnorm(1000, 10, 13)
pdf <- density(x)
# Interpolate the density
f <- approxfun(pdf$x, pdf$y, yleft=0, yright=0)
# Get the cdf by numeric integration
cdf <- function(x){
integrate(f, -Inf, x)$value
}
# Use a root finding function to invert the cdf
invcdf <- function(q){
uniroot(function(x){cdf(x) - q}, range(x))$root
}
which gives
med <- invcdf(.5)
cdf(med)
#[1] 0.5000007
This could definitely be improved upon. One issue is that I don't guarantee that the cdf is always less than or equal to 1 (and if you check the cdf for values larger than max(x) you might get something like 1.00097. But I'm too tired to fix that now. This should give a decent start.
An alternative approach would be to use log-spline density estimation rather than kernel density estimation. Look at the 'logspline' package. With logspline density estimations you get CDF (plogspline) and inverse CDF (qlogspline) functions.