In a coin flip, we would like to compute p(theta|Data), where theta is the underlying parameter.
The prior follows a beta distribution with parameters a and b.
The likelihood follows a Bernoulli distribution which gives us the probability of coming up heads.
Here is the code implementation:
a = 1 # a and b are the beta distribution's parameters
b= 1
num = 1e5 #Number of candidate theta values
z= 17220 #Number of heads
N= 143293 #Total number of flips
Theta = seq(0.07,0.12, length.out= num)
prior = dbeta(Theta, a,b) #Compute the prior at each value
likelihood = Theta^z *(1-Theta)^(N-z)
pData = likelihood * prior /sum(likelihood * prior) #Compute evidence
posterior = likelihood*prior / pData
I would like to verify that the posterior is equal to the analytical solution beta(a+z, N-z+b). However, with z and N this large the likelihood underflows to 0 for every theta value, so the evidence is NaN (0/0) and so is the posterior.
I have tried computing the log likelihood, but it gives me a large negative number, which becomes 0 again when I take the exponential.
Theta = seq(0.07,0.12, by= num_steps)
lprior = log(dbeta(Theta, a,b)) #Compute the log prior at each value
llikelihood = log(Theta)*z + log(1-Theta)*(N-z) #log likelihood
lpData = llikelihood + lprior - sum(llikelihood + lprior) #compute evidence
lposterior = llikelihood+lprior - lpData
posterior = exp(lposterior)
plot(Theta, posterior, type="l")
lines(Theta, exp(llikelihood), type="l")
lines(Theta, exp(lprior), type="l")
If my ultimate goal is a nice graph that shows the posterior, likelihood and prior together, how should I be computing each value?
This answer was provided in the comment section by #JosephClarkMcIntyre.
Here is a summary:
In a Bernoulli trial, when N (the total number of trials) and z (the total number of successes) are large and the underlying parameter theta is small, it is better to operate only in log space and never take the exponential.
Moreover, since the log function is strictly increasing, comparing the log posteriors at two values of theta gives the same ordering as comparing the posteriors.
The above implementation was wrong because the formula for computing the evidence is not correct. p(evidence) = sum(likelihood*prior), p(log_evidence)= sum(log_likelihood +log_prior)
This is the final code, where the prior, likelihood and evidence are in the log space:
a = 1 # a and b are the beta distribution's parameters
b= 1
num_steps = 1e5
z= 17220 #Number of heads
N= 143293 #Total number of flips
Theta = seq(from=0.07,to=0.12, length.out= num_steps)
lprior = dbeta(Theta, a,b,log=TRUE) #Compute the log prior at each value
llikelihood = log(Theta)*z + log1p(-Theta)*(N-z) #log likelihood
lpData = sum(llikelihood + lprior) #compute log of the evidence
lposterior = llikelihood+lprior - lpData
plot(Theta,log(dbeta(Theta,a+z,N-z+b)))
plot(Theta, lposterior, type="l")
However, the analytical and the computed log posteriors are not the same, as shown in the graph.
Feel free to comment if you think there is a flaw in this answer, or if you can explain why the analytical and computed log posteriors are not the same.
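One possible explanation for the mismatch, offered as a sketch and not as a definitive diagnosis: the log of the evidence is the log of a sum, log(sum(exp(llikelihood + lprior))), not the sum of the logs, so it has to be computed with the log-sum-exp trick. A grid posterior is also a probability mass per grid point, so it must be divided by the grid spacing before it is compared with dbeta(). Finally, z/N is about 0.1202, so a grid that stops at 0.12 cuts off a large part of the posterior. The grid limits 0.11 and 0.13 and the names lunnorm, delta, ldensity and lanalytic below are my own assumptions.

```r
a <- 1; b <- 1
z <- 17220      # number of heads
N <- 143293     # total number of flips

# Assumed grid: wide enough to cover the posterior around z/N (about 0.1202)
Theta <- seq(from = 0.11, to = 0.13, length.out = 1e4)
delta <- Theta[2] - Theta[1]          # grid spacing

lprior <- dbeta(Theta, a, b, log = TRUE)
llikelihood <- z * log(Theta) + (N - z) * log1p(-Theta)
lunnorm <- lprior + llikelihood       # unnormalised log posterior

# log evidence = log(sum(exp(lunnorm))), computed stably (log-sum-exp)
m <- max(lunnorm)
lpData <- m + log(sum(exp(lunnorm - m)))

lposterior <- lunnorm - lpData        # log probability mass per grid point
ldensity <- lposterior - log(delta)   # log density, comparable with dbeta()

lanalytic <- dbeta(Theta, a + z, N - z + b, log = TRUE)
max(abs(ldensity - lanalytic))        # should be very close to 0

plot(Theta, exp(ldensity), type = "l")
lines(Theta, exp(lanalytic), col = "red", lty = 2)
```

For plotting the likelihood on the same axes, exp(llikelihood - max(llikelihood)) rescales it to a peak of 1 without underflow.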
Related
I'm going to estimate some parameters of linear equation systems with repeated measures. My equations will look like this:
Variant1:
Variant2:
At least 10 values (repeated measures; technical replicates) are known for every and . I want to estimate the values for and resp. .
Additionally I'd like to know the standard error of these estimates, if possible.
In R, my data set would look like this (in reality I have :
i <- rep(1:3, each = 30)
j <- rep(rep(1:3, each = 10), 3)
K.i <- rep(c(6, 5, 10), each = 30) + rnorm(90)
K.ij <- K.i + rnorm(90)
# X_i, X_ij and x_ij should be 0 (since I assumed K_j being K_ij + normal noise)
data <- cbind(i, j, K.i, K.ij)
How can I estimate the expected parameter values (minimizing the sums of squares) and the standard errors of these estimates in R?
Thanks a lot in advance for your help!
I'm building my own maximum likelihood estimator that estimates the parameters associated with the mean and standard deviation. On simulated data my function works when the true mean is a linear function and the standard deviation is constant. However, if the mean structure is polynomial my function cannot recover the true parameters. Can anybody point me to a solution?
I'm aware there are plenty of existing functions for estimating means and SDs. I'm not interested in them, I'm interested in why my function is not working.
Below is a reproducible example where my model does not recover the true standard deviation (true sd = 1.648, mysd = 4.184123)
*Edit: added library()
library(tidyverse)
my_poly_loglik <- function(pars, #parameters
outcome, #outcome variable
poly_mean){ #data frame of polynomials
#modelling the mean - adding intercept column
mean_mdl = cbind(1, poly_mean) %*% pars[1:(ncol(poly_mean) + 1)]
#modelling the standard deviation on exponential scale
sd_mdl = exp(pars[length(pars)])
#computing log likelihood
sum_log_likelihood <- sum(dnorm(outcome,
mean = mean_mdl,
sd = sd_mdl,
log = TRUE),
na.rm = TRUE)
#since optim() is minimizing we want the -log likelihood
return(-sum_log_likelihood)
}
#Generate data
set.seed(103)
n <- 100000 #100k obs
z <- runif(n, min = 0.1, max = 40) #independent variable sampled uniformly
mean <- 10 + 0.2 * z + 0.4 * z^2 #mean structure
sd = exp(0.5) #constant SD
y <- rnorm(n,mean, sd)
#Visualizing simulated data
#plot(z,mean)
#plot(z,sd)
#plot(z,y)
mydf = data.frame(z,y)
#Defining polynomials
polymean = cbind(z, z^2)
#Initial values. 2 extra for mean_intercept and SD
pars = rep(0, ncol(polymean) + 2)
#Optimising my likelihood function
optim_res <- optim(pars,
fn = my_poly_loglik,
outcome = mydf$y,
poly_mean = polymean)
if (optim_res$convergence != 0) stop("optim_res value is not 0!")
#comparing my function to the real parameter
plot_df = data.frame("mymean" = optim_res$par[1] + (polymean %*% optim_res$par[2:3]),
"truemean" = mean,
"z" = z)
#my mean (black) and true mean (red)
plot_df %>%
ggplot(aes(x = z, y = mymean)) +
geom_line() +
geom_line(aes(y = truemean), color = "red")
#Works!
#my SD and true SD - PROBLEM!
sd #true sd
exp(optim_res$par[length(optim_res$par)]) #my sd
This is not a complete solution, but it might help others find the correct answer.
The code looks good overall, and the issue emerges only with a wide range of z values. In fact, scaling them, or generating data from a considerably narrower range, leads to the correct solution. Furthermore, checking the Hessian shows that the covariance matrix of the estimates is not positive semidefinite, and slightly reducing the range results in correlations between the mean parameters close to 1. (This is a bit puzzling, since a normal linear model with the same parametrization does not suffer from the same issue. I know it does not optimize the likelihood directly, but it is still a bit unintuitive to me.)
So a temporary solution might be rescaling the predictors or using an orthogonal parametrization. But that does not really explain the core of the issue.
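To make the collinearity point concrete, here is a small sketch. It assumes a smaller sample than the question (10,000 instead of 100,000 observations, for speed) and uses poly() as one possible orthogonal parametrization; the names cor_raw, cor_orth and sd_mle are mine. It also computes the closed-form Gaussian MLE of the standard deviation, sqrt(RSS/n), which gives a reference value that the result of optim() can be checked against.

```r
set.seed(103)
n <- 10000                              # assumption: smaller than the question's 100k
z <- runif(n, min = 0.1, max = 40)
y <- rnorm(n, mean = 10 + 0.2 * z + 0.4 * z^2, sd = exp(0.5))

# The raw polynomial terms z and z^2 are almost collinear over this range
cor_raw <- cor(z, z^2)

# Orthogonal polynomial terms from poly() are uncorrelated by construction
P <- poly(z, degree = 2)
cor_orth <- cor(P[, 1], P[, 2])

# Closed-form Gaussian MLE: least-squares coefficients and sqrt(RSS / n)
fit <- lm(y ~ P)
sd_mle <- sqrt(mean(residuals(fit)^2))

c(cor_raw = cor_raw, cor_orth = cor_orth, sd_mle = sd_mle, sd_true = exp(0.5))
```

If the sd returned by optim() is far from sd_mle, the optimizer has stopped short of the maximum, whatever its convergence code says.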
I am trying to find a way to test residuals against a Poisson distribution, the same way residuals are tested for normality after aov(). In my hypothetical example:
# For normal distribution
x <- rep(seq(from=10, to=50, by=0.5),6)
y1 <- rnorm(length(x), mean=10, sd=1.5)
#Normality test in aov residuals
y1.av<-aov(y1 ~ x)
shapiro.test(y1.av$res)
# Shapiro-Wilk normality test
#
#data: y1.av$res
#W = 0.99782, p-value = 0.7885
Sounds silly, OK!!
Now I'd like to take the same approach, but for a Poisson distribution:
# For Poisson distribution
x <- rep(seq(from=10, to=50, by=0.5),6)
y2 <- rpois(x, lambda=10)
#Normality test in aov residuals
y2.av<-aov(y2 ~ x)
poisson.test(y2.av$res)
Error in poisson.test(y2.av$res) :
'x' must be finite, nonnegative, and integer
Is there a statistical approach for doing this?
Thanks!
You could analyse your data in a counting context. Discrete data, such as Poisson variables, can be analysed through their observed frequencies, and you can formulate a hypothesis test for this task. With your data y, you test the null hypothesis that y follows a Poisson distribution with some parameter lambda against the alternative hypothesis that y does not come from a Poisson distribution. Let's sketch the test with your data:
#Data
set.seed(123)
# For Poisson distribution
x <- rep(seq(from=10, to=50, by=0.5),6)
y2 <- rpois(x, lambda=10)
Now we obtain the counts, which are elemental for the test:
#Values
df <- as.data.frame(table(y2),stringsAsFactors = F)
df$y2 <- as.integer(df$y2)
After that we separate the observed frequencies O from the groups (classes) they belong to. Together, both elements describe the y variable:
#Observed values
O <- df$Freq
#Groups
classes <- df$y2
As we are testing a Poisson distribution, we must estimate the lambda parameter. This can be done with Maximum Likelihood Estimation (MLE). The MLE of lambda is the sample mean (computed here from the counts and their groups), so we obtain it with the following code:
#MLE
meanval <- sum(O*classes)/sum(O)
Now, we have to get the probabilities of each class:
#Probs
prob <- dpois(classes,meanval)
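The Poisson distribution has unbounded support, so the probabilities of the observed classes do not sum to one. We therefore compute the leftover probability (note that this remainder also includes any mass on values that do not appear among the observed classes, not only values above the last group):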
prhs <- 1-sum(prob)
This probability can simply be added to the last class, so that the class covers all values greater than or equal to it (for example, instead of only the probability that y equals 20, we get the probability that y is greater than or equal to 20):
#Add probability
prob[length(prob)]<-prob[length(prob)]+prhs
With this we can conduct a goodness-of-fit test using the chisq.test() function in R. It requires the observed values O and the probabilities prob that we have computed. Note that chisq.test() does not know that a parameter was estimated from the data, so it reports k-1 degrees of freedom; the appropriate number is k-q-1, where k is the number of groups and q is the number of estimated parameters (here one, estimated by MLE). The test:
chisq.test(O,p=prob)
The output:
Chi-squared test for given probabilities
data: O
X-squared = 7.6692, df = 17, p-value = 0.9731
The key value from the test is X-squared, the test statistic. We can reuse it to obtain the corrected p-value (in our example k=18, so the degrees of freedom are 18-1-1=16).
The p-value can be obtained with the following code:
p.value <- 1-pchisq(7.6692, 16)
The output:
[1] 0.9581098
As this p-value is greater than the usual significance levels, we do not reject the null hypothesis: the data are consistent with y coming from a Poisson distribution.
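The steps above can be collected into one short sketch. This is one possible variant, not the exact procedure of the answer: it assumes that every integer between the smallest and largest observed value gets its own class, and that the two end classes absorb the lower and upper tails, so the probabilities sum to one by construction. The names lambda_hat, E, x2 and p_value are mine.

```r
set.seed(123)
y <- rpois(486, lambda = 10)            # same sample size as y2 above

lambda_hat <- mean(y)                   # Poisson MLE of lambda
classes <- min(y):max(y)                # every integer between the extremes
O <- as.vector(table(factor(y, levels = classes)))   # observed counts, zeros kept

k <- length(classes)
prob <- dpois(classes, lambda_hat)
prob[1] <- ppois(classes[1], lambda_hat)                           # P(Y <= smallest class)
prob[k] <- ppois(classes[k] - 1, lambda_hat, lower.tail = FALSE)   # P(Y >= largest class)

E <- sum(O) * prob                      # expected counts under the fitted Poisson
x2 <- sum((O - E)^2 / E)                # chi-squared statistic
df <- k - 1 - 1                         # k classes, one estimated parameter
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

c(statistic = x2, df = df, p.value = p_value)
```

Classes with very small expected counts weaken the chi-squared approximation, so pooling sparse tail classes before testing may be worth considering.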
I've spent 2 months wondering if this question is StackOverflow worthy, and I've concluded it is.
I'm volunteering on a team for a year to forecast a number of interesting things. A few months ago it was forecasting the probability of the number of earthquakes worldwide over mag 5 during the month of March. Really interesting problem. Thought I was reasonably good with R and then I hit this problem like a brick wall. It is a count problem, so I wanted to use a Poisson distribution, but it won't work: the mean and variance aren't equal. The data are overdispersed.
The goal is to estimate the probability of:
<100 earthquakes
100-140 earthquakes
140-170 earthquakes
170-210 earthquakes
>210 earthquakes
But I wrote some code here:
#(load data and libraries blah blah blah)
quakes_this_month<-10
days_left=31-1
days_left
month_left<- days_left/31
month_left
earthq5<- earthq4
earthq5$mag<-earthq5$mag*month_left
mu <- mean(earthq5$mag)
sigma <- sd(earthq5$mag)
paste("mean is ",mu, " and sigma is ", sigma)
pnorm((99-quakes_this_month) , mu, sigma, lower.tail = T)
lower.bound<- 100 -quakes_this_month
upper.bound<- 140.5-quakes_this_month
(pnorm(upper.bound, mu, sigma, lower.tail = T) - pnorm(lower.bound, mu, sigma))
lower.bound<- 140.5-quakes_this_month
upper.bound<- 170.5-quakes_this_month
(pnorm(upper.bound, mu, sigma) - pnorm(lower.bound, mu, sigma))
lower.bound<- 170.5-quakes_this_month
upper.bound<- 210.5-quakes_this_month
(pnorm(upper.bound, mu, sigma) - pnorm(lower.bound, mu, sigma))
(pnorm(210.5-quakes_this_month, mu, sigma, lower.tail = F))
So the idea here is that as the month progresses and some earthquakes have happened, I can estimate the probability of hitting those thresholds. However, this isn't a Gaussian distribution, so I can't use pnorm. I should use pnbinom(q, size, prob, mu, lower.tail = TRUE, log.p = FALSE), but I don't know how to get 'size' and 'prob' out of a count problem. This isn't taking 15 balls out of a jar 4 times. So I'm reaching out on this one, as it's been haunting me for weeks.
How can I use pnbinom() in place of pnorm() given this is about earthquake counts per month?
So I found the answer, and for anyone else, here is how I did it. The data I was using was from USGS about earthquakes. There are quite a few other libraries I use in R. I think only MASS is needed for this example.
Load library and data
library(MASS)
quakeSim <- earthq4$count # this was my real data
quakeSim <- rnbinom(n = 12000, mu = 145, size =18) # you can use this for the example
Test for distribution fit checking 3 likely distributions, Gaussian, Poisson, and Negative Binomial
quakeDistNB <- MASS::fitdistr(quakeSim, densfun = "negative binomial")
quakeDistPois <- MASS::fitdistr(quakeSim, densfun = "poisson")
quakeDistGaus<-MASS::fitdistr(quakeSim, densfun = "normal")
Compare negative binomial, Poisson, and Gaussian. Lower AIC is better, so pick the distribution with the lowest AIC.
AIC(quakeDistNB)
AIC(quakeDistPois)
AIC(quakeDistGaus)
Quick check on normality with the Shapiro test (if Gaussian has the lowest AIC).
shapiro.test(earthq4$count)
Use the 5% rule. But it is NB, and not Gaussian so ignore all the CI tests below
summary(earthq4)
t.test(earthq4$count ) #default 0.95
So my data show a negative binomial distribution. Now let's look at it as a histogram with enough bins to show the shape of an NB.
Visualize the empirical distribution
hist(quakeSim, breaks=80)
Fit a negative binomial model and get the two critical values, sizeHat and muHat, from the output of the model 'quakeDistNB'.
This part really drove me nuts until a friend showed me.
quakeDistNB <- MASS::fitdistr(earthq4$count , densfun = "negative binomial")
quakeDistNB
sizeHat <- quakeDistNB$estimate[1]
sizeHat
muHat <- quakeDistNB$estimate[2]
Now then, my problem was to predict the probability of fewer than 100 earthquakes, and of between 100 and 150 earthquakes, of magnitude 5 or greater.
Probability of fewer than 100 (pnbinom() returns P(X <= q), so use q = 99):
p100 <- pnbinom(q = 99, size = sizeHat, mu = muHat)
p100
Probability of 150 or fewer:
p150 <- pnbinom(q = 150, size = sizeHat, mu = muHat)
p150
Probability of 100 to 150:
p150 - p100
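As a sketch of how the same idea extends to all five bins from the question, the block below fits the negative binomial to simulated counts (a stand-in for the real USGS data, which is not available here) and differences the CDF at the bin edges. The bin labels assume integer counts with each edge belonging to the lower bin, which is my reading of the thresholds. It also shows how 'prob' relates to the fitted values: prob = size / (size + mu), so either parametrization can be passed to pnbinom().

```r
library(MASS)

set.seed(1)
counts <- rnbinom(n = 2000, mu = 145, size = 18)   # simulated stand-in data

fit <- MASS::fitdistr(counts, densfun = "negative binomial")
size_hat <- unname(fit$estimate["size"])
mu_hat <- unname(fit$estimate["mu"])
prob_hat <- size_hat / (size_hat + mu_hat)         # equivalent 'prob' parametrization

# pnbinom() gives P(X <= q); difference the CDF at the upper edge of each bin
upper <- c(99, 140, 170, 210)
cdf <- c(pnbinom(upper, size = size_hat, mu = mu_hat), 1)
bin_prob <- diff(c(0, cdf))
names(bin_prob) <- c("<100", "100-140", "141-170", "171-210", ">210")

round(bin_prob, 4)
```

Because the bins are built from one CDF, they are mutually exclusive and their probabilities sum to one.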
In GAM (and GLM, for that matter), we're fitting a conditional likelihood model. So after fitting the model, for a new input x and response y, I should be able to compute the predictive probability or density of a specific value of y given x. I might want to do this to compare the fit of various models on validation data, for example. Is there a convenient way to do this with a fitted GAM in mgcv? Otherwise, how do I figure out the exact form of the density that is used so I can plug in the parameters appropriately?
As a specific example, consider a negative binomial GAM :
## From ?negbin
library(mgcv)
set.seed(3)
n<-400
dat <- gamSim(1,n=n)
g <- exp(dat$f/5)
## negative binomial data...
dat$y <- rnbinom(g,size=3,mu=g)
## fit with theta estimation...
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=nb(),data=dat)
And now I want to compute the predictive probability of, say, y=7, given x=(.1,.2,.3,.4).
Yes. mgcv is doing (empirical) Bayesian estimation, so you can obtain a predictive distribution. For your example, here is how.
# prediction on the link (with standard error)
l <- predict(b, newdata = data.frame(x0 = 0.1, x1 = 0.2, x2 = 0.3, x3 = 0.4), se.fit = TRUE)
# By the asymptotic normality of GLM estimates, the link value is approximately normally distributed;
# for negative binomial with `log` link, the mean `mu` is therefore log-normal
p.mu <- function (mu) dlnorm(mu, l[[1]], l[[2]])
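# joint density of `y` and `mu`
# (size = 3 is the value used to simulate the data; the fitted estimate is b$family$getTheta(TRUE))
p.y.mu <- function (y, mu) dnbinom(y, size = 3, mu = mu) * p.mu(mu)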
# marginal probability (not density as negative binomial is discrete) of `y` (integrating out `mu`)
# I have carefully written this function so it can take vector input
p.y <- function (y) {
scalar.p.y <- function (scalar.y) integrate(p.y.mu, lower = 0, upper = Inf, y = scalar.y)[[1]]
sapply(y, scalar.p.y)
}
Now since you want probability of y = 7, conditional on specified new data, use
p.y(7)
# 0.07810065
In general, this approach by numerical integration is not easy. For example, if another link function such as sqrt() is used for the negative binomial, the distribution of the mean is not as straightforward (though also not difficult to derive).
Now I offer a sampling-based, or Monte Carlo, approach. This is most similar to a Bayesian procedure.
N <- 1000 # samples size
set.seed(0)
## draw N samples from posterior of `mu`
sample.mu <- b$family$linkinv(rnorm(N, l[[1]], l[[2]]))
## draw N samples from likelihood `Pr(y|mu)`
sample.y <- rnbinom(N, size = 3, mu = sample.mu)
## Monte Carlo estimation for `Pr(y = 7)`
mean(sample.y == 7)
# 0.076
Remark 1
Note that, this being empirical Bayes, all of the above methods are conditional on the estimated smoothing parameters. If you want something closer to "full Bayes", set unconditional = TRUE in predict().
Remark 2
Perhaps some people assume the solution is as simple as this:
mu <- predict(b, newdata = data.frame(x0 = 0.1, x1 = 0.2, x2 = 0.3, x3 = 0.4), type = "response")
dnbinom(7, size = 3, mu = mu)
Such a result is conditional on the regression coefficients (assumed fixed, without uncertainty), so mu becomes fixed and not random. This is not a predictive distribution. A predictive distribution integrates out the uncertainty of model estimation.
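To see the difference between the three quantities without fitting a model, here is a self-contained sketch. The link-scale prediction eta_hat and its standard error eta_se are hypothetical numbers standing in for the output of predict(..., se.fit = TRUE), and theta = 3 is assumed known. The integral is taken over the link value, on a finite range of 8 standard errors either side, which is a choice of mine to keep integrate() well behaved.

```r
eta_hat <- 1.0   # hypothetical link-scale (log mean) prediction
eta_se  <- 0.1   # hypothetical standard error of that prediction
theta   <- 3     # negative binomial size, assumed known here

# Predictive probability by numerical integration over the link value
pred_prob <- function(y) {
  integrand <- function(eta) {
    dnbinom(y, size = theta, mu = exp(eta)) * dnorm(eta, eta_hat, eta_se)
  }
  integrate(integrand,
            lower = eta_hat - 8 * eta_se,
            upper = eta_hat + 8 * eta_se)$value
}
p_int <- pred_prob(7)

# The same quantity by Monte Carlo
set.seed(0)
N <- 1e5
mu_draws <- exp(rnorm(N, eta_hat, eta_se))      # draws of the mean
y_draws <- rnbinom(N, size = theta, mu = mu_draws)
p_mc <- mean(y_draws == 7)

# Plug-in probability that ignores the uncertainty in the mean
p_plugin <- dnbinom(7, size = theta, mu = exp(eta_hat))

c(integration = p_int, monte_carlo = p_mc, plug_in = p_plugin)
```

The first two numbers estimate the same predictive probability and should agree up to Monte Carlo error; the plug-in value moves closer to them as eta_se shrinks.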