How do I use the pgamma() function in R to compute the CDF of a gamma distribution?

I want to compute the cumulative distribution function in R for data that follows a gamma distribution. I understood how to do this with a lognormal distribution using the equation from Wikipedia; however, the gamma equation seems more complicated and I decided to use the pgamma() function.
I'm new to this and don't understand the following:
Why do I get three different values out of pgamma, and how does it make sense that they are negative?
Am I supposed to take the log of all the quantiles, just as I used log(mean) and log(standard deviation) when doing calculations with a lognorm distribution?
How do I conceptually understand the CDF calculated by pgamma? It made sense for lognorm that I was calculating the probability that X would take a value <= x, but there is no "x" in this pgamma function.
Really appreciate the help in understanding this.
shape <- 1.35721347
scale <- 1/0.01395087
quantiles <- c(3.376354, 3.929347, 4.462594)
pgamma(quantiles, shape = shape, scale = scale, log.p = TRUE)
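As an added note (not part of the original question), a quick check of what log.p = TRUE does: pgamma() returns P(X <= x) for each quantile x, and with log.p = TRUE those probabilities come back on the log scale, which is why they are negative.
shape <- 1.35721347
scale <- 1/0.01395087
quantiles <- c(3.376354, 3.929347, 4.462594)
pgamma(quantiles, shape = shape, scale = scale)                      # CDF values P(X <= x), one per quantile
exp(pgamma(quantiles, shape = shape, scale = scale, log.p = TRUE))   # identical values, recovered from the log scale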

Related

Is there an R function / package that can perform inverse normal distribution? [duplicate]

To plot a normal distribution curve in R we can use:
(x = seq(-4,4, length=100))
y = dnorm(x)
plot(x, y)
If dnorm calculates y as a function of x, does R have a function that calculates x as a function of y? If not, what is the best way to approach this?
What dnorm() is doing is giving you a probability density function. If you integrate over that, you would have a cumulative distribution function (which is given by pnorm() in R). The inverse of the CDF is given by qnorm(); that is the standard way these things are conceptualized in statistics.
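For instance (a small illustration of my own), qnorm() undoes pnorm():
pnorm(1.96)          # CDF: P(X <= 1.96), about 0.975
qnorm(pnorm(1.96))   # the inverse CDF recovers 1.96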
I'm not sure if the inverse of the density function is built in -- it's not used nearly as often as the inverse of the cumulative distribution function. I can't think offhand of too many situations where the inverse density function is useful. Of course, that doesn't mean there aren't any, so if you are sure this is the function you need, you could just do:
dnorminv<-function(y) sqrt(-2*log(sqrt(2*pi)*y))
plot(x, y)
points(dnorminv(y),y,pch=3)
The derivation of the inverse of the standard normal pdf: starting from y = exp(-x^2/2) / sqrt(2*pi), multiply both sides by sqrt(2*pi) and take logs to get log(sqrt(2*pi)*y) = -x^2/2, so x = sqrt(-2*log(sqrt(2*pi)*y)). (This returns only the non-negative branch; since the density is symmetric, -x is an equally valid solution.)

What does it mean to put an `rnorm` as an argument of another `rnorm` in R?

I have difficulty understanding what it means when an rnorm() is used as one of the arguments of another rnorm(). (I'll explain more below.)
For example, in the first line of my R code below, I use an rnorm() and call the result mu.
mu consists of 10,000 values of x.
Now, let me put mu itself as the mean argument of a new rnorm() called "distribution".
My question is: how can mu, which itself contains 10,000 values, be used as the mean argument of this new rnorm() called distribution?
P.S.: The mean argument of any normal distribution can be a single number, and with only ONE single mean we get a single, complete normal. Now how come using 10,000 mu values still results in a single normal?
mu <- rnorm( 1e4 , 178 , 20 ) ; plot( density(mu) )
distribution <- rnorm( 1e4 , mu , 1 ) ; plot( density(distribution) )
Your distribution is drawn from a conditional density, while the density you draw with plot(density(distribution)) is a marginal density.
Statistically speaking, you first have a normal random variable mu ~ N(178, 20), then another random variable y | mu ~ N(mu, 1). The plot you produce is the marginal density of y.
P(y) is mathematically the integral of the joint distribution P(y | mu) * P(mu), i.e. p(y) = ∫ p(y | mu) p(mu) dmu, where mu is integrated out.
@李哲源ZheyuanLi, ahhh! So when we use a vector as the mean argument or sd argument of an rnorm, the single, final plot is the result of the integral, right?
It means you are sampling from the marginal distribution. The density estimate built from those samples approximates that integral by Monte Carlo.
This kind of thing is often seen in Bayesian computation. Toy R code on Bayesian inference for mean of a normal distribution [data of snowfall amount] gives a full example, but there the integral is computed by numerical integration.
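As a quick check I added (not in the original answer): since distribution = mu + independent N(0, 1) noise, the marginal is itself normal with mean 178 and sd sqrt(20^2 + 1^2), and the sampled density should sit on that curve.
set.seed(1)
mu <- rnorm(1e4, 178, 20)
distribution <- rnorm(1e4, mu, 1)    # rnorm recycles mu: one draw of y for each draw of mu
plot(density(distribution))          # Monte Carlo estimate of the marginal density of y
curve(dnorm(x, mean = 178, sd = sqrt(20^2 + 1^2)), add = TRUE, lty = 2)  # analytic marginal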

how to define a GEV (generalized extreme value) distribution to a copula?

I am trying to fit a copula for two variables which have an extreme value distribution. For the "mvdc" class, I need to define margins and paramMargins. Since the GEV is not included in the default distribution functions of Rcopula, I got these two values by using the "evd" package, with these two functions:
# pgev gives the Generalized Extreme Value distribution function
GEVmarginU1<-pgev(U1, loc=0, scale=1, shape=0, lower.tail = TRUE)
GEVmarginV2<-pgev(V2, loc=0, scale=1, shape=0, lower.tail = TRUE)
#fit a generalised extreme value distribution to my data
MU1 <- fgev(U1, scale = 1, shape = 0)
MV2 <- fgev(V2, scale = 1, shape = 0)
but when I give these values to "mvdc" function, I get an error
myMvd <- mvdc(copula = ellipCopula(family = "Frank", param = 0), margins = c(pgev, pgev),
paramMargins = list(list(MU1), list(MV2)))
Most importantly, I want to be sure whether I am on the right track. Since the two variables are obtained from a discrete choice model, I have an extreme value distribution. Also, the marginals have a GEV distribution, right? So I need to define GEV for "mvdc", otherwise my fitted copula will not work well.
(1) Ui = β1Xi1 + β2Xi2 + β3Xi3 + εi
(2) Vi = γ1Yj1 + γ2Yj2 + γ3Yj3 + ηi
in summary:
(1) Ui = β'Xi' + εi
(2) Vi = γ'Yj' + ηi
Since these models come from a discrete choice modelling approach, the distribution function follows an extreme value distribution. First step: I estimate the coefficients β1, β2, β3, γ1, γ2, γ3 separately for Ui and Vj using a multinomial logit model in the Biogeme software. But intuitively I know that they are dependent, so I try to fit a copula and estimate the coefficients again, taking the dependency into account. So, the joint probability that Ui and Vi are chosen by decision-maker n is:
These marginals are transformed to continuous, but they still have an extreme value distribution, am I right?
1) How can I define GEV when using “mvdc” copula class in Rcopula?
Second, assume I used "fitcopula" instead of "mvdc" and got param (the dependency parameter of the copula). If I understood correctly, "fitcopula" is for the parametric case, and mine is non-parametric, am I right?
2) Now, how should I update coefficients by using a joint distribution and dependency parameter???
For the first question, I found out that my marginals are logistically distributed, since they are the difference between two error terms in the utility model: the error terms follow a type 1 extreme value (Gumbel) distribution, and the difference between two Gumbel-distributed variables follows a logistic distribution, according to Wikipedia.
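A quick simulation I added to back this up (not in the original answer); it only uses base R, generating standard Gumbel draws by inverting the CDF:
set.seed(42)
n  <- 1e5
g1 <- -log(-log(runif(n)))   # standard Gumbel (type 1 extreme value) draws via the inverse CDF
g2 <- -log(-log(runif(n)))
d  <- g1 - g2                # difference of the two error terms
plot(density(d), main = "Gumbel difference vs standard logistic")
curve(dlogis(x), add = TRUE, lty = 2)   # standard logistic density for comparison
If the margins really are logistic, mvdc() should then accept the built-in distribution name, e.g. margins = c("logis", "logis") with paramMargins = list(list(location = 0, scale = 1), list(location = 0, scale = 1)); treat this as a sketch rather than a tested call.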

How does one extract hat values and Cook's Distance from an `nlsLM` model object in R?

I'm using the nlsLM function to fit a nonlinear regression. How does one extract the hat values and Cook's Distance from an nlsLM model object?
With objects created using the nls or nlreg functions, I know how to extract the hat values and the Cook's Distance of the observations, but I can't figure out how to get them using nlsLM.
Can anyone help me out on this? Thanks!
So, it's not Cook's Distance or based on hat values, but you can use the function nlsJack in the nlstools package to jackknife your nls model: it removes each observation in turn, refits the model, and shows, roughly speaking, how much the model coefficients change with or without a given observation in there.
Reproducible example:
xs = rep(1:10, times = 10)
ys = 3 + 2*exp(-0.5*xs)        # exact responses from the true curve a + b*exp(d*x)
for (i in 1:100) {
  xs[i] = rnorm(1, xs[i], 2)   # add noise to each predictor value
}
df1 = data.frame(xs, ys)
nls1 = nls(ys ~ a + b*exp(d*xs), data=df1, start=c(a=3, b=2, d=-0.5))
require(nlstools)
plot(nlsJack(nls1))
The plot shows the percentage change in each model coefficient as each individual observation is removed, and it marks influential points above a certain threshold as "influential" in the resulting plot. The documentation for nlsJack describes how this threshold is determined:
An observation is empirically defined as influential for one parameter if the difference between the estimate of this parameter with and without the observation exceeds twice the standard error of the estimate divided by sqrt(n). This empirical method assumes a small curvature of the nonlinear model.
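For concreteness, here is a rough sketch of that cutoff computed by hand from the example above; this is my own illustration, not the internals of nlsJack:
se <- summary(nls1)$coefficients[, "Std. Error"]   # standard error of each coefficient estimate
n  <- nrow(df1)                                    # number of observations
2 * se / sqrt(n)   # an observation counts as influential for a parameter if dropping it
                   # shifts that parameter's estimate by more than this amount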
My impression so far is that this is a fairly liberal criterion--it tends to mark a lot of points as influential.
nlstools is a pretty useful package overall for diagnosing nls model fits though.

R : How to obtain the fitting values from distribution fit?

I fit a gamma distribution to my data using the fitdist function (from the fitdistrplus package):
fit = fitdist(data=empdistr, distr="gamma")
I then use the denscomp function to compare the data to the fitted values:
dc = denscomp(fit)
But I would like to extract from fit or from dc the actual fitted values, i.e. the points of the gamma density (with the fitted parameters) which are displayed by denscomp.
Does anybody have an idea of how I can do that?
Thanks in advance!
Use dgamma to compute the fitted density at a given quantile (for a gamma fit, coef(fit) returns the estimated shape and rate, in that order):
dgamma(x, coef(fit)[1], coef(fit)[2])
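A slightly fuller sketch of my own (the simulated empdistr below just stands in for the question's data): rebuild the gamma curve that denscomp(fit) overlays by evaluating dgamma on a grid with the fitted parameters.
library(fitdistrplus)
set.seed(7)
empdistr <- rgamma(500, shape = 2, rate = 0.5)    # stand-in for the question's data
fit <- fitdist(data = empdistr, distr = "gamma")
xg <- seq(min(empdistr), max(empdistr), length.out = 200)              # grid of quantiles
yg <- dgamma(xg, shape = coef(fit)["shape"], rate = coef(fit)["rate"]) # fitted density values
hist(empdistr, freq = FALSE)   # empirical density scale
lines(xg, yg)                  # the fitted gamma curve, as drawn by denscomp(fit)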
