Computing the marginal likelihood of a Gaussian model in R with integrate()

I am trying to compute the marginal likelihood of a Gaussian model in R. More precisely, I am trying to integrate the likelihood over both a Gaussian prior on mu and a Gaussian prior on sigma, given some observations y_i.
In other words, I am trying to compute:
p(y) = ∫∫ [ ∏_i N(y_i | mu, sigma) ] * N(mu | 110, 1) * N(sigma | 10, 1) dmu dsigma,
with mu ranging over the whole real line and sigma over (0, Inf).
I tried to write this in R using the following function (following a similar question here: Quadrature to approximate a transformed beta distribution in R):
marglik <- function(data) {
  integrand <- Vectorize(function(data, mu, sigma) {
    prod(dnorm(data, mu, sigma)) * dnorm(mu, 110, 1) * dnorm(sigma, 10, 1)
  })
  integrate(integrand, lower = 0, upper = Inf, mu = 100, sigma = 10)$value
}
Using this function, I can compute the marginal likelihood of the above model for a set of observations:
set.seed(666)
d <- rnorm(100, mean = 107.5, sd = 2.5)
marglik(data = d)
[1] 9.704133e-24
However, the results I obtain with this procedure are quite different from the results I obtain with grid approximation or with other packages/software.
My question is then: is it possible to do this double integration with integrate? If so, how would you do it?

integrate() only takes in univariate functions. That is, the function you put in must be one-dimensional.
In general, such a problem is better tackled with specialised tools: bridge sampling via the bridgesampling package if you have MCMC output, or the cubature package for more general multivariate integration problems.
However, if we absolutely must do this with integrate() twice, we can make it work, but a couple of errors need to be taken out of the code first. Something like the following works, although numerically the result comes out as zero most of the time, which is why you would generally work with the log marginal likelihood instead.
marglik <- function(data) {
  # Function that integrates over mu for given sigma.
  mu_integrand <- Vectorize(function(sigma) {
    mu_given_sigma_fun <- Vectorize(function(mu) {
      prod(dnorm(data, mu, sigma)) * dnorm(mu, 110, 1) * dnorm(sigma, 10, 1)
    })
    integrate(mu_given_sigma_fun, lower = -Inf, upper = Inf)$value
  })
  integrate(mu_integrand, lower = 0, upper = Inf)$value
}
set.seed(666)
d <- rnorm(100, mean = 110, sd = 10)
marglik(data = d)
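If you would rather avoid nesting integrate() calls, here is a hedged sketch of the same computation with the cubature package mentioned earlier: hcubature() integrates over mu and sigma jointly. The finite integration box and the function name marglik_cub are illustrative choices for this sketch, not part of the original answer; the box is simply chosen wide enough to cover essentially all of the prior mass.
library(cubature)

marglik_cub <- function(data) {
  integrand <- function(par) {
    mu    <- par[1]
    sigma <- par[2]
    prod(dnorm(data, mu, sigma)) * dnorm(mu, 110, 1) * dnorm(sigma, 10, 1)
  }
  # wide finite box covering essentially all prior mass for mu and sigma
  hcubature(integrand, lowerLimit = c(80, 0.001), upperLimit = c(140, 40))$integral
}

marglik_cub(data = d)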

Related

my location-scale estimator function not working with polynomial mean

I'm building my own maximum likelihood estimator that estimates the parameters associated with the mean and standard deviation. On simulated data my function works when the true mean is a linear function and the standard deviation is constant. However, if the mean structure is polynomial my function cannot recover the true parameters. Can anybody point me to a solution?
I'm aware there are plenty of existing functions for estimating means and SDs. I'm not interested in them, I'm interested in why my function is not working.
Below is a reproducible example where my model does not recover the true standard deviation (true sd = 1.648, mysd = 4.184123)
*Edit: added library()
library(tidyverse)
my_poly_loglik <- function(pars,       #parameters
                           outcome,    #outcome variable
                           poly_mean){ #data frame of polynomials
  #modelling the mean - adding intercept column
  mean_mdl = cbind(1, poly_mean) %*% pars[1:(ncol(poly_mean) + 1)]
  #modelling the standard deviation on exponential scale
  sd_mdl = exp(pars[length(pars)])
  #computing log likelihood
  sum_log_likelihood <- sum(dnorm(outcome,
                                  mean = mean_mdl,
                                  sd = sd_mdl,
                                  log = TRUE),
                            na.rm = TRUE)
  #since optim() is minimizing we want the -log likelihood
  return(-sum_log_likelihood)
}
#Generate data
set.seed(103)
n <- 100000 #100k obs
z <- runif(n, min = 0.1, max = 40) #independent variable sampled uniformly
mean <- 10 + 0.2 * z + 0.4 * z^2 #mean structure
sd = exp(0.5) #constant SD
y <- rnorm(n,mean, sd)
#Visualizing simulated data
#plot(z,mean)
#plot(z,sd)
#plot(z,y)
mydf = data.frame(z,y)
#Defining polynomials
polymean = cbind(z, z^2)
#Initial values. 2 extra for mean_intercept and SD
pars = rep(0, ncol(polymean) + 2)
#Optimising my likelihood function
optim_res <- optim(pars,
                   fn = my_poly_loglik,
                   outcome = mydf$y,
                   poly_mean = polymean)
if (optim_res$convergence != 0) stop("optim_res value is not 0!")
#comparing my function to the real parameter
plot_df = data.frame("mymean" = optim_res$par[1] + (polymean %*% optim_res$par[2:3]),
                     "truemean" = mean,
                     "z" = z)
#my mean (black) and true mean (red)
plot_df %>%
  ggplot(aes(x = z, y = mymean)) +
  geom_line() +
  geom_line(aes(y = truemean), color = "red")
#Works!
#my SD and true SD - PROBLEM!
sd #true sd
exp(optim_res$par[length(optim_res$par)]) #my sd
This is not a complete solution, but it might help others find the correct answer.
The code looks good overall, and the issue emerges only for a wide range of z values: scaling the predictor, or generating data from a considerably narrower range, leads to the correct solution. Furthermore, checking the Hessian shows that the covariance matrix of the estimates is not positive semidefinite, and even a slightly reduced range results in correlations between the mean parameters close to 1. (This is a bit puzzling, since a normal linear model with the same parametrization does not suffer from the same issue. I know it does not optimize the likelihood directly, but it is still a bit unintuitive to me.)
So a temporary fix might be to rescale the predictors or to use an orthogonal parametrization, but that does not really explain the core of the issue.
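If you want to reproduce the Hessian check mentioned above, here is a minimal sketch using the objects from the question (the name optim_h is just illustrative): ask optim() for the Hessian of the negative log-likelihood and inspect its eigenvalues.
optim_h <- optim(pars,
                 fn = my_poly_loglik,
                 outcome = mydf$y,
                 poly_mean = polymean,
                 hessian = TRUE)

#Non-positive (or nearly zero) eigenvalues mean the Hessian of the negative
#log-likelihood is not positive definite, so the implied covariance matrix of
#the estimates is ill-behaved.
eigen(optim_h$hessian, symmetric = TRUE)$values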

How to fit normal distribution with respect to frequency and intensity in R?

I have a list of data: frequencies x1, x2, ..., xn (i.e. 10, 20, ..., 5000) and intensities y1, y2, ..., yn (i.e. 0, 0, ..., 50, 60, 50, ..., 0), and I want to fit a normal distribution to these data.
I found some websites online, such as http://www.di.fc.ul.pt/~jpn/r/distributions/fitting.html, which fit a distribution with a procedure like:
library(MASS)  # for fitdistr()
my_data <- rnorm(250, mean = 1, sd = 0.45)  # unknown distribution parameters
fit <- fitdistr(my_data, densfun = "normal")
but obviously those methods won't work here, since my data are intensities on a grid rather than raw samples.
How can I fit the above data to a normal distribution?
You can use the maximum-likelihood function mle() from the stats4 package to solve this problem. Here is how you would do that:
library(stats4)  # provides mle()

my_data <- rnorm(250, mean = 1, sd = 0.45)  # unknown distribution parameters

logLik <- function(sigma, mu){
  ll <- vapply(my_data,
               function(x) dnorm(x, mean = mu, sd = sigma),
               FUN.VALUE = numeric(1))
  -sum(log(ll))
}

mle(logLik, start = list(sigma = 1, mu = 1))
mle requires a negative log-likelihood function that it uses to determine the optimal parameters (which in the case of a normal distribution are mu (mean) and sigma (st. dev.)). It minimizes the negative sum of the log-likelihood, -sum(log(ll)), as part of a numerical procedure to find the best parameters for the distribution. It then returns the estimated parameters:
Call:
mle(minuslogl = logLik, start = list(sigma = 1, mu = 1))
Coefficients:
sigma mu
0.4595003 0.9724402
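The question's data are intensities on a grid rather than raw samples, which the answer above does not address directly. One hedged way to adapt the same idea, assuming the intensities can be treated as unnormalised frequency weights, is to weight each point's log-density in the negative log-likelihood. The grid, intensities, and function name below are made up purely for illustration.
library(stats4)

x <- seq(10, 5000, by = 10)                 # hypothetical frequency grid
y <- dnorm(x, mean = 2500, sd = 300) * 1e4  # hypothetical intensities

# intensity-weighted negative log-likelihood of a normal distribution
negLogLik_w <- function(mu, sigma) {
  -sum(y * dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

mle(negLogLik_w, start = list(mu = mean(x), sigma = sd(x)),
    method = "L-BFGS-B", lower = c(-Inf, 1e-3))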

mle failed to estimate the parameters with the error code 7

I'm trying to estimate the Weibull-Gamma Distribution parameters, but I'm encountering the following error:
"the function mle failed to estimate the parameters, with the error
code 7"
What do I do?
The Weibull-Gamma Distribution
Density Function
dWeibullGamma <- function(x, alpha, beta, lambda)
{
((alpha*beta)/(lambda))*(x^(alpha-1))*(1+(1/lambda)*x^(alpha))^(-(beta+1))
}
Cumulative Distribution Function
pWeibullGamma <- function(x, alpha, beta, lambda)
{
1-(1+(1/lambda)*x^(alpha))^(-(beta))
}
Hazard Function
hWeibullGamma <- function(x, alpha, beta, lambda)
{
((alpha*beta)/(lambda))*(x^(alpha-1))*(1+(1/lambda)*x^(alpha))^(-(beta+1))/(1+(1/lambda)*x^(alpha))^(-(beta))
}
Survival Function
sWeibullGamma <- function(x,alpha,beta,lambda)
{
(1+(1/lambda)*x^(alpha))^(-(beta))
}
Estimation
paramWG = fitdist(data = dadosp, distr = 'WeibullGamma', start = c(alpha=1.5,beta=1,lambda=1.5), lower= c(0, 0))
summary(paramWG)
Sample:
dadosp = c(240.3,71.9,271.3, 186.3,241,253,287.4,138.3,206.9,176,270.4,73.3,118.9,203.1,139.7,31,269.6,140.2,205.1,133.2,107,354.6,277,27.6,186,260.9,350.4,242.6,292.5, 112.3,242.8,310.7,309.9,53.1,326.5,145.7,271.5, 117.5,264.7,243.9,182,136.7,103.8,188.3,236,419.8,338.6,357.7)
For your sample, the algorithm does not converge when maximising the likelihood: fitting a Weibull-Gamma distribution to these data requires an extremely large lambda. You can work around this by estimating log10(lambda) instead of lambda.
You can add lambda <- 10^lambda inside your 4 functions, e.g.
dWeibullGamma <- function(x, alpha, beta, lambda)
{
lambda <- 10^lambda
((alpha*beta)/(lambda))*(x^(alpha-1))*(1+(1/lambda)*x^(alpha))^(-(beta+1))
}
Then, the algorithm seems to converge:
library(fitdistrplus)
paramWG = fitdist(data = dadosp, distr = 'WeibullGamma',
                  start = list(alpha = 1, beta = 1, lambda = 1),
                  lower = c(0, 0, 0))
summary(paramWG)$estimate
Output:
alpha beta lambda
2.432939 799.631852 8.680802
We see that the estimate of lambda is 10^8.68, hence the convergence problem when not taking the log.
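To report lambda on its original scale, just undo the log10 transformation of the fitted value:
10^summary(paramWG)$estimate["lambda"]  # roughly 10^8.68, i.e. about 4.8e8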
You can also have a look at the fit as follows:
newx <- 0:500
pars <- summary(paramWG)$estimate
pred <- dWeibullGamma(newx, pars["alpha"], pars["beta"], pars["lambda"])
hist(dadosp, freq = FALSE)
lines(newx, pred, lwd = 2)
Note: maybe fitting another distribution would make more sense?

mgcv: obtain predictive distribution of response given new data (negative binomial example)

In GAM (and GLM, for that matter), we're fitting a conditional likelihood model. So after fitting the model, for a new input x and response y, I should be able to compute the predictive probability or density of a specific value of y given x. I might want to do this to compare the fit of various models on validation data, for example. Is there a convenient way to do this with a fitted GAM in mgcv? Otherwise, how do I figure out the exact form of the density that is used so I can plug in the parameters appropriately?
As a specific example, consider a negative binomial GAM:
## From ?negbin
library(mgcv)
set.seed(3)
n<-400
dat <- gamSim(1,n=n)
g <- exp(dat$f/5)
## negative binomial data...
dat$y <- rnbinom(g,size=3,mu=g)
## fit with theta estimation...
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=nb(),data=dat)
And now I want to compute the predictive probability of, say, y=7, given x=(.1,.2,.3,.4).
Yes. mgcv does (empirical) Bayesian estimation, so you can obtain a predictive distribution. For your example, here is how.
# prediction on the link (with standard error)
l <- predict(b, newdata = data.frame(x0 = 0.1, x1 = 0.2, x2 = 0.3, x3 = 0.4), se.fit = TRUE)
# By large-sample theory, the estimated link value is approximately normal;
# for a negative binomial with `log` link, the fitted mean `mu` is therefore log-normal
p.mu <- function (mu) dlnorm(mu, l[[1]], l[[2]])
# joint density of `y` and `mu`
p.y.mu <- function (y, mu) dnbinom(y, size = 3, mu = mu) * p.mu(mu)
# marginal probability (not density as negative binomial is discrete) of `y` (integrating out `mu`)
# I have carefully written this function so it can take vector input
p.y <- function (y) {
scalar.p.y <- function (scalar.y) integrate(p.y.mu, lower = 0, upper = Inf, y = scalar.y)[[1]]
sapply(y, scalar.p.y)
}
Now since you want probability of y = 7, conditional on specified new data, use
p.y(7)
# 0.07810065
In general, this numerical-integration approach is not easy. For example, if another link function like sqrt() is used for the negative binomial, the distribution of the fitted mean is not as straightforward (though also not difficult to derive).
Now I offer a sampling-based, or Monte Carlo, approach. This is the most similar to a Bayesian procedure.
N <- 1000 # sample size
set.seed(0)
## draw N samples from posterior of `mu`
sample.mu <- b$family$linkinv(rnorm(N, l[[1]], l[[2]]))
## draw N samples from likelihood `Pr(y|mu)`
sample.y <- rnbinom(N, size = 3, mu = sample.mu)
## Monte Carlo estimation for `Pr(y = 7)`
mean(sample.y == 7)
# 0.076
Remark 1
Note that, being empirical Bayes, all of the above methods are conditional on the estimated smoothing parameters. If you want something closer to "full Bayes", set unconditional = TRUE in predict().
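A minimal sketch of that option, reusing the new data from above (everything downstream stays the same; l2 and p.mu2 are just illustrative names):
# standard errors that also account for smoothing parameter uncertainty
l2 <- predict(b, newdata = data.frame(x0 = 0.1, x1 = 0.2, x2 = 0.3, x3 = 0.4),
              se.fit = TRUE, unconditional = TRUE)

# then plug l2 into the same machinery, e.g.
p.mu2 <- function (mu) dlnorm(mu, l2[[1]], l2[[2]])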
Remark 2
Perhaps some people assume the solution is as simple as this:
mu <- predict(b, newdata = data.frame(x0 = 0.1, x1 = 0.2, x2 = 0.3, x3 = 0.4), type = "response")
dnbinom(7, size = 3, mu = mu)
Such a result is conditional on the regression coefficients (treated as fixed, without uncertainty), so mu is fixed rather than random. This is not a predictive distribution; a predictive distribution integrates out the uncertainty of model estimation.

R: How to fit a large dataset with a combination of distributions?

To fit a dataset of real-valued numbers (x) with a single distribution, we can use MASS's fitdistr(), e.g. with the gamma or Student's t distribution:
fitdistr(x, "gamma")
or
fitdistr(x, "t")
What if I believe my dataset should be fitted by the sum of a gamma and a t distribution?
P(X) = Gamma(x) + t(x)
Can I fit the parameters of mixtures of probability distributions using Maximum Likelihood fitting in R?
There are analytic maximum-likelihood estimators for some parameters, such as the mean of a normal distribution or the rate of an exponential distribution. For other parameters, there is no analytic estimator, but you can use numerical analysis to find reasonable parameter estimates.
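For example, the analytic maximum-likelihood estimators for a normal distribution are just the sample mean and the square root of the mean squared deviation (note the division by n rather than n - 1); a quick sketch with made-up data:
x_demo <- rnorm(1000, mean = 3, sd = 2)
mu_hat    <- mean(x_demo)                          # analytic MLE of the mean
sigma_hat <- sqrt(mean((x_demo - mu_hat)^2))       # analytic MLE of the sd (divides by n)
c(mu_hat, sigma_hat)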
The fitdistr() function in R numerically optimizes the log-likelihood by calling optim(). If you think that your data are a mixture of a gamma and a t distribution, simply write a likelihood function that describes such a mixture and pass it, along with starting parameter values, to optim() for optimization. Here is an example of this approach applied to fitting a single normal distribution:
library(MASS)

vals = rnorm(n = 10000, mean = 0, sd = 1)
print(summary(vals))

ll_func = function(params) {
  log_probs = log(dnorm(x = vals, mean = params[1], sd = params[2]))
  tot = sum(log_probs)
  return(-1 * tot)
}

params = c(0.5, 10)
print(ll_func(params))

res = optim(params, ll_func)
print(paste("mean:", res$par[1]))
print(paste("sd:", res$par[2]))
Running this program in R produces this output:
[1] "mean: 0.0223766157516646"
[1] "sd: 0.991566611447471"
That's fairly close to the true values of mean = 0 and sd = 1 used to generate the data.
Don't forget that with a mixture of two distributions you have one extra parameter specifying the relative weight of the two components, as sketched below. Also, be careful about fitting lots of parameters at once; with many free parameters you need to worry about overfitting.
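As a hedged sketch of that advice (the component choices, parameter transformations, simulated data, and starting values are all illustrative, not from the original answer), a gamma / Student-t mixture with an explicit weight parameter could be fitted like this:
set.seed(1)
x_mix <- c(rgamma(500, shape = 2, rate = 1), rt(500, df = 5) + 8)

mix_nll <- function(par) {
  w     <- plogis(par[1])   # mixing weight, mapped into (0, 1)
  shape <- exp(par[2])      # gamma shape > 0
  rate  <- exp(par[3])      # gamma rate  > 0
  df    <- exp(par[4])      # t degrees of freedom > 0
  shift <- par[5]           # location shift of the t component
  dens  <- w * dgamma(x_mix, shape = shape, rate = rate) +
           (1 - w) * dt(x_mix - shift, df = df)
  -sum(log(dens))
}

fit <- optim(c(0, 0, 0, 1, median(x_mix)), mix_nll, control = list(maxit = 2000))
plogis(fit$par[1])  # estimated mixing weight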
Try mixdist. Here's an example of a mixture of three distributions:
https://stats.stackexchange.com/questions/10062/which-r-package-to-use-to-calculate-component-parameters-for-a-mixture-model
