Plotting the CDF of Generalized Pareto Distribution - r

I need to plot the CDF of a generalized pareto distribution when x is greater than 100,000,000 with location parameter = 100,000,000, scale parameter = 49,761,000 and shape parameter = 0.10. The CDF starts at prob. 0.946844, the values below 100,000,000 are modeled by a uniform distribution. I only need to plot the CDF of the GPD.
library(DescTools)
x <- c(100000001:210580000)
pareto_distribution <- dGenPareto(x, 100000000, 49761000, 0.10)
graph <- data.frame(loss = x, probability = pareto_distribution)
plot(graph)
When I try the code above, the probabilities start at 0. I know that dGenPareto gives the PDF rather than the CDF; I was starting with the PDF and planned to calculate the CDF from it. How do I shift the CDF of the GPD so that it starts at 0.946844 rather than zero?
I am expecting the CDF of GPD to start at 0.946844 when x = 100,000,000. The x values are discrete.
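One way to read the setup above: the uniform part carries probability 0.946844 below the threshold, so the GPD only distributes the remaining 1 - 0.946844 over the tail. Under that assumption the unconditional CDF for x >= 100,000,000 would be p0 + (1 - p0) * G(x), where G is the GPD CDF. The sketch below writes G out in base R so it does not depend on any package; the names gpd_cdf, tail_cdf and p0 are illustrative and not from the question. If your DescTools version provides pGenPareto with the same argument order as dGenPareto, it should give the same values as gpd_cdf.

```r
# Parameters taken from the question
u     <- 100000000   # location (threshold)
sigma <- 49761000    # scale
xi    <- 0.10        # shape
p0    <- 0.946844    # probability mass below the threshold (from the uniform part)

# GPD CDF written out in base R; valid for shape != 0 and x >= loc
gpd_cdf <- function(x, loc, scale, shape) {
  1 - (1 + shape * (x - loc) / scale)^(-1 / shape)
}

# Unconditional CDF on the tail: equals p0 at x == u and tends to 1
tail_cdf <- function(x) p0 + (1 - p0) * gpd_cdf(x, u, sigma, xi)

# A coarse grid is enough for plotting; 110 million integer points are not needed
x <- seq(u, 210580000, length.out = 1000)
graph <- data.frame(loss = x, probability = tail_cdf(x))
plot(graph, type = "l")
```

Evaluating the CDF on a grid of 1,000 points instead of every integer keeps the vector small while producing the same curve at plotting resolution.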

Related

MASS:: fitdistr negative binomial with weights in R

We are carrying out an Operational Risk study. In particular, we are fitting the frequency distribution with a negative binomial as follows:
# Negative Binomial Fitting
fit = MASS::fitdistr(datosf$Freq,"negative binomial")[[1]]
BN_s <- fit[1]
BN_mu <- fit[2]
# fitdistr parametrises the BN with size and mu, we calculate the parameter p as size/(size+mu)
BN_prob<-fit[1]/(fit[1]+fit[2])
# scale size to model annual frequency
BN_size= BN_s*f_escala
# goodness-of-fit test
chi_2_test = chisq.test(datosf$Freq,rnbinom(n=l,size=BN_s,prob=BN_prob))
# goodness-of-fit plot
nbinom = function(x)dnbinom(x, size = BN_s, mu = BN_mu)
hist(datosf$Freq, freq=FALSE, nclass=50)
curve(nbinom, from=0, to=max(datosf$Freq), n=max(datosf$Freq)+1, add=TRUE, col="blue")
In the data frame datosf$Freq we have the frequency (of the historical series) grouped monthly.
Currently, we have the objective of weighting these years according to the time horizon using the function:
w(t) = 1.05 - t/20 where t is the number of years and t=1,....,10
i.e. the objective is to maximise the following likelihood function:
L(x_i,\theta) = \prod_{i} w_i f(x_i,\theta)
Where x_i is the frequency and f(x_i) is the negative binomial density function.
How can we readapt the code to include the weights w_i?
Thank you very much!
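A possible starting point, offered as a sketch rather than a drop-in replacement: MASS::fitdistr does not take observation weights as far as I know, so the weighted fit can be done directly with optim. Note that multiplying each density by a constant w_i does not change the maximiser, so the sketch uses the usual weighted log-likelihood, sum of w_i * log f(x_i, theta), which corresponds to raising each density to the power w_i. The data below are simulated placeholders, and the assignment of t to years (one value of t per block of 12 months) is an assumption; freq, year and neg_wll are illustrative names.

```r
# Hypothetical example data: 120 monthly counts over 10 years (not the real datosf$Freq)
set.seed(1)
freq <- rnbinom(120, size = 3, mu = 8)
year <- rep(1:10, each = 12)     # assumed mapping of months to t = 1, ..., 10
w    <- 1.05 - year / 20         # weight function from the question

# Weighted negative log-likelihood; parameters are on the log scale to keep them positive
neg_wll <- function(par) {
  -sum(w * dnbinom(freq, size = exp(par[1]), mu = exp(par[2]), log = TRUE))
}

# Minimise the weighted negative log-likelihood
fit_w <- optim(c(log(1), log(mean(freq))), neg_wll, method = "BFGS")

# Back-transform to the same quantities used in the original code
BN_s    <- exp(fit_w$par[1])
BN_mu   <- exp(fit_w$par[2])
BN_prob <- BN_s / (BN_s + BN_mu)
```

With this parametrisation the fitted mean should coincide with the weighted sample mean, weighted.mean(freq, w), which gives a quick check that the weights are being applied.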

how to generate data from a distribution whose cdf is not in closed form?

I am working on a distribution whose PDF and CDF are
Clearly, the CDF is not in a neat form. So how do I generate data from this distribution in R, given that the inverse CDF method can't be applied directly?
I am using the following R program to generate a sample from this dist but I am not able to justify every step. Here is my code:
alpha=1.5; beta=3.2;
pdf=function(x)((beta/alpha)*((x/alpha)^beta)*sin(pi/beta))/(((1+(x/alpha)^beta)^2)*(pi/beta))
rand_smplfunc <- function(func, lower, upper){
x_values <- seq(lower,upper,by = 10^-3)
sample(x = x_values, size = 1, replace = TRUE,prob = func(x_values))
}
replicate(10,rand_smplfunc(pdf,0,10))
You can do it the following way, using the inverse cdf method with a numerically approximated cdf (you don't need the analytically derived cdf):
alpha = 1.5
beta = 3.2
pdf = function(x)((beta/alpha)*((x/alpha)^beta)*sin(pi/beta))/(((1+(x/alpha)^beta)^2)*(pi/beta))
cdf = function(x) cumsum(pdf(x)) / sum(pdf(x)) # approximate the cdf
x <- seq(0.1, 50, 0.1)
y <- pdf(x)
plot(x, pdf(x), type='l', main='pdf', ylab='f(x)') # plot the pdf
plot(x, cdf(x), type='l', main='cdf', ylab='F(x)') # plot the cdf
# draw n samples by applying the probability integral transform to uniform random
# samples, computing the inverse cdf approximately by interpolation
n <- 5000
random_samples <- approx(x = cdf(x), y = x, runif(n))$y
# plot histogram of the samples
hist(random_samples, 200)
From the above plot, you can see that the distribution of the generated samples indeed resembles the original pdf.
You could use the empirical cdf function ecdf as an approximation. E.g.
set.seed(7)
pdf <- rnorm(10000)
cdf <- ecdf(pdf)
summary(cdf)
Empirical CDF: 10000 unique values with summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.930902 -0.671235 -0.001501 0.001928 0.669820 3.791824
range <- -3:3
cdf(range)
[1] 0.0013 0.0219 0.1601 0.5007 0.8420 0.9748 0.9988
So similarly generate a sufficient number of realizations of your pdf and then use the ecdf function to approximate your cdf.
Note that you can also plot your empirical cdf: plot(cdf)
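As a cross-check on the numerical approaches above, the PDF coded in the question can be related to a Beta distribution. Substituting u = (x/alpha)^beta, the variable u/(1+u) appears to follow Beta(1 + 1/beta, 1 - 1/beta), which needs beta > 1. This is my own derivation from the coded pdf, not something stated in the thread, so treat it as an assumption to verify. The names pdf_x, cdf_x and r_x are illustrative.

```r
alpha <- 1.5; beta <- 3.2   # values from the question; the construction needs beta > 1

# PDF exactly as coded in the question
pdf_x <- function(x) ((beta / alpha) * (x / alpha)^beta * sin(pi / beta)) /
  ((1 + (x / alpha)^beta)^2 * (pi / beta))

# Shape parameters of the assumed Beta representation
a <- 1 + 1 / beta
b <- 1 - 1 / beta

# CDF through the regularised incomplete beta function
cdf_x <- function(x) {
  u <- (x / alpha)^beta
  pbeta(u / (1 + u), a, b)
}

# Exact sampler: draw z ~ Beta(a, b), then invert u/(1+u) = z and u = (x/alpha)^beta
r_x <- function(n) {
  z <- rbeta(n, a, b)
  alpha * (z / (1 - z))^(1 / beta)
}

set.seed(1)
samples <- r_x(5000)

# Compare the closed form against numerical integration of the PDF
c(closed_form = cdf_x(2), numeric = integrate(pdf_x, 0, 2)$value)
```

If the two numbers printed at the end agree, the Beta representation can replace the grid-based inverse, which avoids choosing a grid range for the heavy right tail.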

determine equal-tail credible interval

I have obtained the posterior density for part d: $2\theta^{-1}(1-\theta)^{-1}$. How do I plot the distribution in R to find the l and u such that $F_{\theta \mid x}(l) = 0.025$ and $F_{\theta \mid x}(u) = 0.975$ (the equal-tail interval)?
Your result is erroneous. By Bayes' theorem, the posterior density is proportional to p(theta)P(X=2|theta) = 1-theta. So we recognize the Beta distribution Beta(1,2). To graph it in R, you can do:
curve(dbeta(x, 1, 2), from = 0, to = 1)
Now the posterior equi-tailed credible interval is given by the quantiles of this distribution. In R:
qbeta(0.025, 1, 2) # lower bound
qbeta(0.975, 1, 2) # upper bound
If you don't know the Beta distribution, you can get these quantiles by elementary calculations. The integral of 1-theta on [0,1] is 1/2. So the posterior density is 2(1-theta) (it must integrate to one). So the posterior cumulative distribution function is 2(theta - theta²/2) = -theta² + 2theta. To get the p-quantile (with p=0.025 and p=0.975), you have to solve the equation -theta² + 2theta = p in theta. This is a second-degree polynomial equation, which is easy to solve.
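The quadratic above can be solved explicitly: theta² - 2theta + p = 0 has roots 1 ± sqrt(1 - p), and only 1 - sqrt(1 - p) lies in [0, 1]. A short sketch comparing that root with qbeta (q_post is an illustrative name):

```r
# p-quantile of the posterior with CDF F(theta) = 2*theta - theta^2 on [0, 1]:
# the root of theta^2 - 2*theta + p = 0 that lies in [0, 1]
q_post <- function(p) 1 - sqrt(1 - p)

l <- q_post(0.025)   # lower bound of the equal-tail interval
u <- q_post(0.975)   # upper bound of the equal-tail interval

# Both rows should agree
rbind(closed_form = c(l, u),
      qbeta       = c(qbeta(0.025, 1, 2), qbeta(0.975, 1, 2)))
```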
Finding the central 95% CI is actually easier than finding the 95% HPD. As you have the density (PDF), you also know the CDF. The lower and upper limits of the central 95% CI are given by CDF(l) = 0.025 and CDF(u) = 0.975.

Generate random numbers with bivariate gamma distribution in R

How can I generate random numbers from a bivariate gamma distribution? The density is:
$f_{(X,Y)}(x, y) = \frac{\alpha^{p+q}\, x^{p-1} (y-x)^{q-1} e^{-\alpha y}}{\Gamma(p)\,\Gamma(q)}\, \mathbb{1}_{0 \le x \le y}$
With y>x>0, α>0, p>0 and q>0.
I did not find any R package that does this, and nothing in the literature.
This is straightforward:
Generate X~ Gamma(p,alpha) (alpha being the rate parameter in your formulation)
Generate W~ Gamma(q,alpha), independent of X
Calculate Y=X+W
(X,Y) have the required bivariate distribution.
In R (assuming p, q, alpha and n are already defined):
x <- rgamma(n,p,alpha)
y <- x + rgamma(n,q,alpha)
This generates n values from the bivariate distribution with parameters p, q and alpha.
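A quick simulation check of the construction, with arbitrary illustrative parameter values that are not from the question. Since Y = X + W with independent gamma terms sharing the rate alpha, Y should be Gamma(p + q, alpha), so its mean should be close to (p + q) / alpha, and every draw should satisfy y >= x.

```r
# Illustrative parameter values (assumptions, chosen only for the check)
p <- 2; q <- 3; alpha <- 1.5; n <- 100000

set.seed(42)
x <- rgamma(n, shape = p, rate = alpha)       # X ~ Gamma(p, rate = alpha)
y <- x + rgamma(n, shape = q, rate = alpha)   # Y = X + W, with W ~ Gamma(q, rate = alpha)

# Support constraint 0 <= x <= y and the marginal means
all(y >= x)
c(mean_x = mean(x), expected_x = p / alpha)
c(mean_y = mean(y), expected_y = (p + q) / alpha)
```

Naming the shape and rate arguments explicitly avoids confusing rgamma's rate with its scale parameter.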

compute the density of a multivariate Dirichlet and Gamma distribution in R

I'd like to compute the density of a multivariate Dirichlet distribution and to generate random realizations from such a distribution, like what the function dmvnorm does for the multivariate normal distribution. I found this for the normal distribution, and I would like to know if there is a function that could do the same for the Dirichlet and Gamma distributions:
g <- expand.grid(x = seq(-2,2,0.05), y = seq(-2,2,0.05)) ## x and y are the 2 normal distributions.
g$z <- dmvnorm(x=cbind(g$x,g$y),mean = c(0,0),sigma = diag(2),log = FALSE)
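For the Dirichlet part, a minimal base-R sketch is possible because the density has a simple closed form and draws can be obtained by normalising independent gamma variates. The function names ddirichlet_sketch and rdirichlet_sketch are hypothetical and used only for illustration. Packages such as gtools and MCMCpack provide ddirichlet and rdirichlet, which would normally be preferable to hand-written versions.

```r
# Dirichlet density at a single point x (components positive, summing to 1)
ddirichlet_sketch <- function(x, alpha, log = FALSE) {
  stopifnot(length(x) == length(alpha), all(x > 0), abs(sum(x) - 1) < 1e-8)
  logd <- lgamma(sum(alpha)) - sum(lgamma(alpha)) + sum((alpha - 1) * log(x))
  if (log) logd else exp(logd)
}

# n random draws: independent Gamma(alpha_j, 1) variates, normalised by row
rdirichlet_sketch <- function(n, alpha) {
  k <- length(alpha)
  # matrix() fills column-wise, so column j uses shape alpha[j]
  g <- matrix(rgamma(n * k, shape = rep(alpha, each = n)), nrow = n, ncol = k)
  g / rowSums(g)
}

set.seed(123)
draws <- rdirichlet_sketch(1000, c(2, 3, 5))
head(draws, 3)

# Density of the flat Dirichlet(1, 1, 1) is constant on the simplex
ddirichlet_sketch(c(1/3, 1/3, 1/3), c(1, 1, 1))
```

Each row of draws is one realization on the simplex, so the rows sum to one and the column means should be close to alpha / sum(alpha).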
