I have obtained the posterior density for part d: $2\theta^{-1}(1-\theta)^{-1}$. How do I plot the distribution in R to find the $l$ and $u$ such that $F_{\theta\mid x}(l) = 0.025$ and $F_{\theta\mid x}(u) = 0.975$ (the equal-tail interval)?
Your result is erroneous. By Bayes' theorem, the posterior density is proportional to $p(\theta)P(X=2\mid\theta) = 1-\theta$, so we recognize the Beta distribution $\mathrm{Beta}(1,2)$. To graph it in R, you can do:
curve(dbeta(x, 1, 2), from = 0, to = 1)
Now the posterior equi-tailed credible interval is given by the quantiles of this distribution. In R:
qbeta(0.025, 1, 2) # lower bound
qbeta(0.975, 1, 2) # upper bound
If you don't know the Beta distribution, you can get these quantiles by elementary calculations. The integral of $1-\theta$ on $[0,1]$ is $1/2$, so the posterior density is $2(1-\theta)$ (it must integrate to one). The posterior cumulative distribution function is therefore $2(\theta - \theta^2/2) = 2\theta - \theta^2$. To get the $p$-quantile (with $p=0.025$ and $p=0.975$), solve the equation $2\theta - \theta^2 = p$ for $\theta$. This is a second-degree polynomial equation, easy to solve: the root lying in $[0,1]$ is $\theta = 1 - \sqrt{1-p}$.
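For instance, a quick check in R that the closed-form roots match qbeta:
p <- c(0.025, 0.975)
1 - sqrt(1 - p)   # closed-form roots: 0.01257912 0.84188612
qbeta(p, 1, 2)    # same values from the Beta(1, 2) quantile function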
Finding the central 95% CI is actually easier than finding the 95% HPD. Since you have the density (PDF), you also know the CDF. The lower and upper limits of the central 95% CI are given by CDF(l) = 0.025 and CDF(u) = 0.975.
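If no closed form were available, a minimal numerical sketch: build the CDF by integrating the PDF and invert it with uniroot.
pdf <- function(t) 2 * (1 - t)                      # posterior density from above
cdf <- function(t) integrate(pdf, 0, t)$value       # CDF by numerical integration
uniroot(function(t) cdf(t) - 0.025, c(0, 1))$root   # lower limit l
uniroot(function(t) cdf(t) - 0.975, c(0, 1))$root   # upper limit u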
I need to plot the CDF of a generalized Pareto distribution for x greater than 100,000,000, with location parameter = 100,000,000, scale parameter = 49,761,000, and shape parameter = 0.10. The CDF starts at probability 0.946844; the values below 100,000,000 are modeled by a uniform distribution. I only need to plot the CDF of the GPD.
library(DescTools)
x <- 100000001:210580000
pareto_distribution <- dGenPareto(x, 100000000, 49761000, 0.10)  # density, not CDF
graph <- data.frame(loss = x, probability = pareto_distribution)
plot(graph)
When I try the code above, the probabilities start at 0. I know that dGenPareto is not the CDF; I was starting from the PDF and intended to compute the CDF from it. How do I restrict the probability of the GPD so that it starts at 0.946844 rather than zero?
I am expecting the CDF of the GPD to start at 0.946844 when x = 100,000,000. The x values are discrete.
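One way to get that behaviour (a sketch, assuming DescTools' pGenPareto is the CDF counterpart to dGenPareto): since the uniform part already accounts for probability 0.946844 below the threshold, rescale the GPD's CDF so it covers only the remaining mass.
library(DescTools)
p0 <- 0.946844                                # probability mass below 1e8
x  <- seq(100000000, 210580000, by = 100000)  # thinned grid, enough for plotting
# overall CDF on the tail: jump to p0 at the threshold, then spread the
# remaining (1 - p0) mass according to the GPD's own CDF
cdf <- p0 + (1 - p0) * pGenPareto(x, 100000000, 49761000, 0.10)
plot(x, cdf, type = "l", xlab = "loss", ylab = "probability")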
I fitted a four-parameter logistic curve using R's nls function with the following equation:
y = alpha + lambda/(1 + exp(-beta*(x - mu)))
I would like to determine the maximum slope of this curve, which means computing the derivative of the function. How can I find the derivative of this function and use it to determine the maximum slope (the maximum derivative value)?
I can find the regular sigmoid equation y = 1/(1 + e^(-x)) and its derivative, but not the version with the parameters.
I am expecting some help with the derivative of my equation and a piece of script to find its maximum value.
Let us say that the parameters you have calculated are alpha = 1, lambda = 2, beta = 3 and mu = 4. Then create the derivative function fder and use optimize to find its maximum. The maximum slope occurs at x = mu and equals 1.5; equivalently, substituting x = mu into the derivative fder shows that the maximum derivative equals lambda * beta / 4.
# empty function whose body we fill in with the symbolic derivative
fder <- function(x, alpha = 1, lambda = 2, beta = 3, mu = 4) {}
body(fder) <- D(expression(alpha + lambda/(1 + exp(-beta*(x - mu)))), "x")
# maximize the derivative over a wide interval
optimize(fder, c(-10, 10), maximum = TRUE)
## $maximum
## [1] 3.99999
##
## $objective
## [1] 1.5
Starting from @G.Grothendieck's answer, here's a step-by-step explanation of why the maximum derivative is lambda*beta/4 (numerically checked in the sketch after this list):
- The maximum derivative of the unscaled logistic function 1/(1 + exp(-x)) is 1/4, at x = 0.
- The maximum derivative of 1/(1 + exp(-beta*x)) is beta/4, at x = 0 (you can look this up on Wikipedia).
- Adjusting the midpoint (e.g. 1/(1 + exp(-beta*(x - mu)))) shifts the location of the maximum derivative to x = mu but doesn't change its value.
- Shifting the curve up by adding alpha (alpha + 1/(1 + exp(-beta*(x - mu)))) changes neither the max slope nor its location.
- Scaling the curve by lambda (alpha + lambda/(1 + exp(-beta*(x - mu)))) scales the max derivative by lambda (beta/4 → lambda*beta/4).
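Each step can be checked numerically; a minimal sketch, using the same example parameter values as above:
# max slope of the unscaled logistic 1/(1 + exp(-x)): ~0.25, at x ~ 0
d0 <- function(x) exp(-x) / (1 + exp(-x))^2
optimize(d0, c(-10, 10), maximum = TRUE)
# with slope parameter beta = 3: max derivative beta/4 = 0.75, still at x ~ 0
d1 <- function(x, beta = 3) beta * exp(-beta*x) / (1 + exp(-beta*x))^2
optimize(d1, c(-10, 10), maximum = TRUE)
# adding alpha and scaling by lambda = 2 then gives lambda*beta/4 = 1.5, as found above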
Using the statmod library, I can evaluate a univariate normal distribution with Gauss-Hermite quadrature. But how can I evaluate a bivariate normal distribution using Gauss-Hermite quadrature?
Below is the code I used for evaluating the univariate normal with a 5-point Gauss-Hermite rule. How can I do the same for a bivariate normal distribution?
library(statmod)
## generate Gauss-Hermite quadrature points and weights
q <- gauss.quad(n = 5, kind = "hermite")
## define the univariate normal density (mean mu, standard deviation sigma)
mu <- 0
sigma <- 2
norm <- function(b) {
  M <- (2*pi*sigma^2)^(-1/2) * exp(-(b - mu)^2/(2*sigma^2))
  return(M)
}
## approximate the integral of norm(b) over the real line:
## Gauss-Hermite integrates f(x)*exp(-x^2), so multiply by exp(nodes^2)
sum(q$weights * norm(q$nodes) * exp(q$nodes^2))
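For the bivariate case, one common approach (a sketch, not the only option) is a tensor-product rule: take all pairs of univariate nodes, multiply the corresponding weights, and apply the same exp(x^2) correction in each dimension. Here dmvnorm from the mvtnorm package supplies the bivariate density, and the covariance matrix is just an illustrative choice:
library(statmod)
library(mvtnorm)
q <- gauss.quad(n = 20, kind = "hermite")                  # more nodes for accuracy
nodes <- as.matrix(expand.grid(x = q$nodes, y = q$nodes))  # all node pairs
w <- outer(q$weights, q$weights)                           # product weights
sigma <- matrix(c(1, 0.2, 0.2, 1), ncol = 2)               # illustrative covariance
f <- dmvnorm(nodes, sigma = sigma)                         # density at the node pairs
sum(w * f * exp(nodes[, 1]^2 + nodes[, 2]^2))              # should be close to 1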
Here is a minimal worked-through example, based on a bivariate normal (probability) density with a given covariance matrix sigma and the quadrature rules implemented in the mvQuad library.
Note that we use the Gauss-Legendre quadrature rule, which allows integration over an arbitrary bounded domain; in Gauss-Hermite quadrature, by contrast, the domain is the unbounded interval (-∞, +∞). Since we're working with a bivariate normal probability density, the integral over the unbounded domain trivially equals 1. Generally, mvQuad::createNIGrid allows the implementation of various quadrature rules, including Gauss-Hermite quadrature (see ?createNIGrid for details).
Define a covariance matrix for the bivariate normal probability density:
library(mvtnorm)
sigma <- matrix(c(1, 0.2, 0.2, 1), ncol = 2)
dens <- function(x) dmvnorm(x, sigma = sigma)
We are interested in the integral of dens over the domain x ∈ [-1, 2] and y ∈ [-1, 2].
We follow the instructions from the mvQuad vignette to create a grid and rescale it to the domain of interest:
library(mvQuad)
grid <- createNIGrid(dim = 2, type = "GLe", level = 6)
rescale(grid, domain = rbind(c(-1, 2), c(-1, 2)))
Calculate the integral of the bivariate normal over the domain x ∈ [-1, 2] and y ∈ [-1, 2]:
quadrature(function(x) dmvnorm(x, sigma = sigma), grid = grid)
#[1] 0.6796583
This value is in good agreement with the value from pmvnorm (which computes the distribution function of the multivariate normal for arbitrary limits and covariance matrices)
pmvnorm(lower = c(-1, -1), upper = c(2, 2), sigma = sigma)
#[1] 0.6796584
#attr(,"error")
#[1] 1e-15
#attr(,"msg")
#[1] "Normal Completion"
In a coin flip experiment, we would like to compute p(theta|Data), where theta is the underlying parameter (the probability of heads).
The prior follows a beta distribution with parameters a and b.
The likelihood follows a Bernoulli distribution, which gives the probability of each flip coming up heads.
Here is the code implementation:
a <- 1       # a and b are the beta distribution's parameters
b <- 1
num <- 1e5   # number of candidate theta values
z <- 17220   # number of heads
N <- 143293  # total number of flips
Theta <- seq(0.07, 0.12, length.out = num)
prior <- dbeta(Theta, a, b)  # compute the prior at each value
likelihood <- Theta^z * (1 - Theta)^(N - z)
pData <- likelihood * prior / sum(likelihood * prior)  # compute evidence
posterior <- likelihood * prior / pData
I would like to verify that the posterior equals the analytical solution Beta(a+z, N-z+b). However, since the likelihood underflows to 0 (the theta values are small and the exponents are large), the evidence is NaN and so is the posterior.
I have tried computing the log-likelihood instead, but it gives a large negative number whose exponential evaluates to 0.
Theta <- seq(0.07, 0.12, length.out = num_steps)
lprior <- log(dbeta(Theta, a, b))                       # log prior at each value
llikelihood <- log(Theta)*z + log(1 - Theta)*(N - z)    # log likelihood
lpData <- llikelihood + lprior - sum(llikelihood + lprior)  # compute evidence
lposterior <- llikelihood + lprior - lpData
posterior <- exp(lposterior)
plot(Theta, posterior, type="l")
lines(Theta, exp(llikelihood), type="l")
lines(Theta, exp(lprior), type="l")
My ultimate goal is a nice graph that shows the posterior, likelihood and prior together. How should I be computing each value?
This answer was provided in the comment section by @JosephClarkMcIntyre.
Here is a summary:
In a Bernoulli trial, when N (the total number of trials) and z (the total number of successes) are large and the underlying parameter theta is small, it is better to operate only in log space and never take the exponential.
Moreover, since the log function is increasing, comparing the log posteriors of two distributions is equivalent to comparing the posteriors themselves.
The above implementation was wrong because the formula for computing the evidence was not correct: p(evidence) = sum(likelihood*prior). (Note, however, that log p(evidence) is then log(sum(likelihood*prior)), which is not the same as sum(log_likelihood + log_prior); the code below uses the latter, which matters for the mismatch discussed at the end.)
This is the final code, where the prior, likelihood and evidence are in log space:
a <- 1       # a and b are the beta distribution's parameters
b <- 1
num_steps <- 1e5
z <- 17220   # number of heads
N <- 143293  # total number of flips
Theta <- seq(from = 0.07, to = 0.12, length.out = num_steps)
lprior <- dbeta(Theta, a, b, log = TRUE)              # log prior at each value
llikelihood <- log(Theta)*z + log1p(-Theta)*(N - z)   # log likelihood
lpData <- sum(llikelihood + lprior)                   # (intended) log of the evidence
lposterior <- llikelihood + lprior - lpData
plot(Theta,log(dbeta(Theta,a+z,N-z+b)))
plot(Theta, lposterior, type="l")
However, the analytical and the computed log posterior are not the same, as shown in the graph.
Feel free to comment if you think there is a flaw in this answer or can explain why the analytical and computed log posteriors differ. ^^
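One likely explanation, with a hedged corrective sketch reusing the variables defined above: sum(llikelihood + lprior) is not the log of sum(likelihood * prior); the log evidence needs a log-sum-exp. Including the grid spacing, and widening the grid so it covers essentially all of the posterior mass (the posterior concentrates near z/N ≈ 0.120, right at the old grid's upper edge), should make the computed log posterior match the analytical one:
Theta <- seq(0.07, 0.13, length.out = num_steps)  # widened to cover the posterior mass
dTheta <- Theta[2] - Theta[1]                     # grid spacing
lprior <- dbeta(Theta, a, b, log = TRUE)
llikelihood <- log(Theta)*z + log1p(-Theta)*(N - z)
ljoint <- llikelihood + lprior
# log evidence via the log-sum-exp trick: log(sum(exp(ljoint)) * dTheta)
m <- max(ljoint)
lpData <- m + log(sum(exp(ljoint - m))) + log(dTheta)
lposterior <- ljoint - lpData
plot(Theta, lposterior, type = "l")
lines(Theta, dbeta(Theta, a + z, N - z + b, log = TRUE), col = 2, lty = 2)
# the two curves should now coincide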
For a brief background, I am interested in describing a distribution of fire sizes, which is presumed to follow a lognormal distribution (many small fires and few large fires). For my specific application I am only interested in the fires that fall within a certain range of sizes (> min, < max). So, I am attempting to fit a lognormal distribution to a data set that has been censored on both ends. In essence, I want to find the parameters of the lognormal distribution (mu and sigma) that best fit the full distribution prior to censoring. Can I fit the distribution taking into account that I know I am only looking at a portion of the distribution?
I have done some experimentation, but have become stumped. Here's an example:
# Generate data #
D <- rlnorm(1000,meanlog = -0.75, sdlog = 1.5)
# Censor data #
min <- 0.10
max <- 20
Dt <- D[D > min]
Dt <- Dt[Dt <= max]
If I fit the non-censored data (D) using either fitdistr (MASS) or fitdist (fitdistrplus), I obviously get approximately the same parameter values as I entered. But if I fit the censored data (Dt), then the parameter values do not match, as expected. The question is how to incorporate the known censoring. I have seen some references elsewhere to using upper and lower within fitdistr, but I encounter an error that I'm not sure how to resolve:
> fitt <- fitdist(Dt, "lognormal", lower = min, upper = max)
Error in fitdist(Dt, "lognormal", lower = min, upper = max) :
The dlognormal function must be defined
I would appreciate any advice: first, on whether this is the appropriate way to fit a censored distribution, and if so, on how to go about defining the dlognormal function so that I can make this work. Thanks!
Your data is not censored (that would mean that observations outside the interval are there, but you do not know their exact value) but truncated (those observations have been discarded). You just have to provide fitdist with the density and the cumulative distribution function of your truncated distribution.
library(truncdist)
# density and CDF of a lognormal truncated to [0.10, 20]
dtruncated_log_normal <- function(x, meanlog, sdlog)
  dtrunc(x, "lnorm", a = 0.10, b = 20, meanlog = meanlog, sdlog = sdlog)
ptruncated_log_normal <- function(q, meanlog, sdlog)
  ptrunc(q, "lnorm", a = 0.10, b = 20, meanlog = meanlog, sdlog = sdlog)
library(fitdistrplus)
# fitdist finds d/ptruncated_log_normal by their common name suffix
fitdist(Dt, "truncated_log_normal", start = list(meanlog = 0, sdlog = 1))
# Fitting of the distribution ' truncated_log_normal ' by maximum likelihood
# Parameters:
# estimate Std. Error
# meanlog -0.7482085 0.08390333
# sdlog 1.4232373 0.0668787
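As a quick visual check (a sketch using the estimates printed above), one can overlay the fitted truncated density on a histogram of the truncated sample:
hist(Dt, breaks = 50, freq = FALSE, main = "Truncated sample vs fitted density")
curve(dtruncated_log_normal(x, meanlog = -0.748, sdlog = 1.423),
      from = 0.10, to = 20, add = TRUE, col = 2, lwd = 2)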