How to plot the log-likelihood of binomial distribution - r

In order to solve part (c), I think I need a plot of the log-likelihood of the binomial distribution. Can anyone please help me do it in R? [The question's data and an image of the desired plot were attached; the data reappear in the answer below.]

Something like this should work:
F <- c(18,31,34,33,27,33,28,23,33,12,19,25,14,4,22,7)
M <- c(11,22,27,29,24,29,25,26,38,14,23,31,20,6,34,12)
Y <- F      # successes
N <- F + M  # trials
#a) per-observation proportions
Y / N
#b) pooled proportion
sum(Y) / sum(N)
#c) binomial log-likelihood as a function of p
logL <- function(p) sum(log(dbinom(Y, N, p)))
#plot logL:
p.seq <- seq(0.01, 0.99, 0.01)
plot(p.seq, sapply(p.seq, logL), type="l")
#optimum:
optimize(logL, lower=0, upper=1, maximum=TRUE)
As noted by Ben (see comments), numerical accuracy is improved by using logL <- function(p) sum(dbinom(Y, N, p, log=TRUE)) instead; in particular, it can "rescue" you in cases where dbinom() underflows to 0 even though the likelihood is actually just close to 0.
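A quick sketch of the difference, reusing Y and N from above (the exact p at which underflow kicks in depends on the data). As a separate sanity check, the maximum that optimize() reports above should agree with the pooled estimate sum(Y) / sum(N) from (b).
logL.naive <- function(p) sum(log(dbinom(Y, N, p)))
logL.log   <- function(p) sum(dbinom(Y, N, p, log = TRUE))
logL.naive(1e-20)  # -Inf: some dbinom() terms underflow to exactly 0
logL.log(1e-20)    # large negative, but finite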

Related

constrained optimisation using maxLik

I have read the maxLik documentation on how to do constrained optimization. However, I do not understand how to apply it. I have a custom likelihood function, shown below. The value of rho should be between 0 and 1 (making that two constraints). Now, how exactly do I impose those constraints? I have 3 parameters.
I have seen an almost identical question here, with 3 constraints and 3 parameters, but I am really a novice and do not understand the proposed hints on how to include the constraints: how to use maxLik() to do the constrained optimization in R
require(maxLik)
data <- matrix(rnorm(3600, 5, 1), ncol = 20)
Y <- data[, 1:20]
Y <- as.matrix(Y, ncol = 20)
p <- 4
T <- nrow(Y)
X <- Y[p:(T - 1), 1:4]
unos <- rep(1, T)
X <- cbind(unos, X)
set.seed(101)
loglik <- function(theta) {
  eta <- theta[1]
  n <- theta[2]
  rho <- theta[3]
  coefis <- as.matrix(c(mu0 = 0.0112, mu1 = 0.0115, mu2 = 0.009, mu3 = 0.021,
                        mu4 = 0.01237), ncol = 1) #coefficients for the intercept and four lags
  resi <- Y - X %*% coefis
  y <- Custom_lik(resi, eta, n, rho, T) #my custom likelihood function
  return(-y[[1]])
}
m <- maxLik(loglik, start = c(eta = 1.1, n = 1.5, rho = 0.5))
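For what it's worth, here is a sketch of one way to impose 0 < rho < 1, based on maxLik's constraints argument, which accepts list(ineqA, ineqB) encoding the inequality constraints ineqA %*% theta + ineqB > 0. Treat this as a starting point rather than a verified answer for the custom likelihood above:
A <- rbind(c(0, 0, 1),    # rho > 0
           c(0, 0, -1))   # -rho + 1 > 0, i.e. rho < 1
B <- c(0, 1)
m <- maxLik(loglik, start = c(eta = 1.1, n = 1.5, rho = 0.5),
            constraints = list(ineqA = A, ineqB = B))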

Maximum likelihood function - plot

I'm doing some maximum likelihood estimation in R. I have a dataset, c(3,33,12,22,23), which corresponds to how long a battery will last. The time a battery lasts is Exp(theta) distributed. I'm supposed to plot the maximum likelihood function. I have this:
liklihood <- function(theta, n){
  rpois(theta, n)
}
ac <- c(3,33,12,22,23)
theta <- seq(from=0, to=30, length=10)
plot(theta, liklihood(theta, n=length(ac)), type="l", xlab=expression(theta),
     ylab=expression(L(theta)), col="blue",
     main="1 (b)")
but it does not look right. Can anyone please help me if you see what I am doing incorrectly?
You should check the definition of the likelihood. The likelihood is the product of the densities of the random sample, viewed as a function of the parameter. Your function liklihood draws random samples (which is wrong) from a Poisson distribution, probably with the wrong parameters as well; check ?rpois (the first argument is the sample size and the second is lambda).
For Poisson distribution:
likelihood <- function(theta, x) prod(dpois(x, theta))
log_likelihood <- function(theta, x) sum(log(dpois(x, theta)))
Now you want to calculate the likelihood for different values of theta, keeping the vector x the same (in your case, the vector ac):
theta <- seq(0, 30, by = 0.5)
plot(theta, unlist(lapply(theta, likelihood, x = ac)))
For the exponential distribution, change dpois to dexp, but be aware that the mean of the exponential is 1/rate.
liklihood1 <- function(theta, x){
  prod(dexp(x, 1/theta))  # was dexp(n, ...); x is the data vector
}
ac <- c(3,33,12,22,23)
theta <- seq(1, 100, by = 1)  # start at 1: theta = 0 gives an infinite rate
plot(theta, unlist(lapply(theta, liklihood1, x = ac)),
     type="l", xlab=expression(theta), ylab=expression(L(theta)),
     col="blue", main="1 (b)")
Is this correct for an exponential distribution?
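A quick sanity check (a sketch using the same ac as above): when theta parameterizes the mean, the exponential log-likelihood is maximized at the sample mean, so the curve should peak near mean(ac) = 18.6.
loglik1 <- function(theta, x) sum(dexp(x, 1/theta, log = TRUE))
optimize(loglik1, lower = 1, upper = 100, maximum = TRUE, x = ac)$maximum
mean(ac)  # 18.6; the two should agree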

R: Confidence intervals on non-linear fit with a non-analytic model

I need to fit x-y data with a model which is non-analytic. I have a function f(x) that calculates the model for each x numerically, but there is no analytical equation. For the fit, I use optim in R. I minimise RMS between the model and the data. It works well and returns reasonable parameters.
I would like to find confidence intervals (or at least standard errors) on the best-fitting parameters. I found on the internet that this can be done from the Hessian matrix, but only when maximising a log-likelihood function. I don't know how to do this; all I have is x, y, and f(x), from which I find the RMS. Alas, I have no good way of estimating the errors on y.
How can I find confidence intervals on my fit parameters?
Edit: perhaps an example in R might help explain what I'm asking for. This example uses a simple analytic function to fit the data; in my real case the function is non-analytic, so I cannot use, e.g., nls.
set.seed(666)
# generate data
x <- seq(100) / 100
y <- 0.5 * x + rnorm(100, sd = 0.03) + 0.2
# function to fit
f <- function(x, a, b) {
  a * x + b
}
# error function to minimise: RMS
errfun <- function(par, x, y) {
  a <- par[1]
  b <- par[2]
  err <- sqrt(sum((f(x, a, b) - y)^2))
}
# use optim to fit the model to the data
par <- c(1, 0)
res <- optim(par, errfun, gr=NULL, x, y)
# best-fitting parameters
best_a <- res$par[1]
best_b <- res$par[2]
The best-fitting parameters are a = 0.50 and b = 0.20. I need to find 95% confidence intervals for these.
This is a job for the bootstrap:
(1) Create a large number of synthetic datasets x*. These are created by sampling from x with replacement, drawing the same number of data points as were in x. For example, if your data are (1,2,3,4,5,6), an x* might be (5,2,4,4,2,3). (Note that values might appear multiple times, or not at all, because we are sampling with replacement.)
(2) For each x*, calculate f(x*). If there are other parameters which don't depend on the data, don't change them (so f(x,a,b,c) becomes f(x*,a,b,c), as long as a, b, c don't depend on x). Call these quantities f*.
(3) You can estimate anything you want from these f*. If you want the standard deviation of f(x), take the standard deviation of f*. If you want the 95% confidence interval, take the range from the 2.5th to the 97.5th percentiles of f*. More formally, if you want to estimate g(f(x)), you estimate it as g(f(x*)).
I should say this is a very practically-oriented explanation of the bootstrap. I have glossed over many theoretical details, but the bootstrap is near-universally applicable (basically, as long as the thing you are trying to estimate actually exists, you are usually okay).
To apply this to the example you have given in your code:
x <- seq(100) / 100
y <- 0.5 * x + rnorm(100, sd = 0.03) + 0.2
# function to fit
f <- function(x, a, b) {
  a * x + b
}
# error function to minimise: RMS
errfun <- function(par, x, y) {
  a <- par[1]
  b <- par[2]
  err <- sqrt(sum((f(x, a, b) - y)^2))
}
# this is the part where we bootstrap
# use optim to fit the model to each resampled dataset
best_a <- best_b <- numeric(10000)
for (i in 1:10000) {
  j <- sample(100, replace = TRUE)  # resample row indices with replacement
  x.boot <- x[j]; y.boot <- y[j]
  par <- c(1, 0)
  res <- optim(par, errfun, gr = NULL, x.boot, y.boot)
  # best-fitting parameters for this bootstrap replicate
  best_a[i] <- res$par[1]
  best_b[i] <- res$par[2]
}
# now, we look at the *vector* best_a
# for example, if you want the standard deviation of a,
sd(best_a)
# or a 95% confidence interval for b,
quantile(best_b,c(0.025,0.975))
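As an aside on the Hessian route mentioned in the question: minimizing the sum of squared residuals is equivalent to maximizing a Gaussian log-likelihood, so approximate standard errors can also be read off the Hessian that optim can return. The following is only a sketch, and it assumes independent Gaussian errors with constant (estimated) variance:
# minimize the SSE (not its square root), so that the Hessian has a clean
# relationship to the log-likelihood: cov(par) ~ 2 * sigma^2 * solve(H)
sse <- function(par, x, y) sum((f(x, par[1], par[2]) - y)^2)
fit <- optim(c(1, 0), sse, gr = NULL, x, y, hessian = TRUE)
sigma2_hat <- fit$value / (length(x) - length(fit$par))  # residual variance
se <- sqrt(diag(2 * sigma2_hat * solve(fit$hessian)))
cbind(lower = fit$par - 1.96 * se, upper = fit$par + 1.96 * se)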

How to create a confidence interval for a weighted average of areas under the ROC curve in r?

What I have is a vector with different areas under the ROC curve (from different studies), e.g.,
a <- c(.91, .85, .76, .89)
I also have the absolute number of participants in each study, e.g.,
n <- c(50, 34, 26, 47)
I calculated the weighted average for the areas with
weighted.mean(a, n)
Is there a way in R to also calculate the 95% confidence interval of the weighted mean, based on the information I have? I looked into pROC, but as far as I understood it, you need the raw data for each ROC curve (which I don't have). I would be very thankful for any suggestions!
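The function below builds a t-based confidence interval for the weighted mean, treating the study-level AUCs as weighted observations; it uses wtd.var from the Hmisc package for the weighted variance: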
weighted.ttest.ci <- function(x, weights, conf.level = 0.95) {
  require(Hmisc)
  nx <- length(x)
  df <- nx - 1
  vx <- wtd.var(x, weights, normwt = TRUE) ## From Hmisc
  mx <- weighted.mean(x, weights)
  stderr <- sqrt(vx / nx)
  tstat <- mx / stderr ## not mx - mu
  alpha <- 1 - conf.level
  cint <- qt(1 - alpha / 2, df)
  cint <- tstat + c(-cint, cint)
  cint * stderr
}
> weighted.ttest.ci(a,n)
[1] 0.7696 0.9627

Why does this 3D curve in R show a clear maximum for sigma^2 but not for mu?

So I used the rgl package and created my own likelihood function to output the log-likelihood of a sample from a normal distribution. I was doing this really just to learn how to program this myself, so I could better understand how likelihood works and also how MLE works. Anyway, I noticed something particularly strange and wanted to know if someone knew the answer here.

When I plot the graph, it comes out in a folded-curve shape, but I suppose I was expecting more of a cone-type shape. Basically, what I'm curious about is why, when the plot peaks at the sigma^2 value (on this axis there is a good decline on both sides of the peak), the mu value stays roughly the same. It's as if, once the sigma^2 parameter has reached its optimal level, the differences in likelihood between mu values are pretty small. For example, when I check the variance of the likelihoods at the maximum point of sigma (keeping it constant), it's 11.5. In contrast, when I check the variance across the mu's at that same point, the variance is 23402.

Since I can't yet post images (I don't have enough reputation), I will just post my R code that produces the graph.
#Define LL function
LL <- function(X, theta) {
  mu <- theta[1]
  sigma2 <- theta[2]
  log.likelihood <- 0
  n <- length(X)
  for (i in 1:length(X)) {
    log.likelihood <- log.likelihood - (((X[i] - mu)^2) / (2 * sigma2)) -
      log(sqrt(2 * pi * sigma2))
  }
  return(log.likelihood)
}
#Parameters
Mu <- 100
Sigma2 <- 50
#Sample
N <- 100
set.seed(1)
IQs <- rnorm(N, mean = Mu, sd = sqrt(Sigma2))
#Possible values to test
x <- posMu <- seq(80, 120, length.out = 200)
y <- posSig <- seq(20, 60, length.out = 200)
#Produce LLs for plotting
LLlist <- NULL
for (m in 1:length(posMu)) {
  LLs <- NULL
  for (s in 1:length(posSig)) {
    posTheta <- cbind(posMu[m], posSig[s])
    LLs <- c(LLs, LL(IQs, posTheta))
  }
  #each column of LLlist holds one mu value; rows run over sigma2 values
  LLlist <- cbind(LLlist, LLs, deparse.level = 0)
}
z <- LLlist
#Find the approximate MLE
mLL <- which(LLlist == max(LLlist), arr.ind = TRUE)
cbind(posMu[mLL[2]], posSig[mLL[1]], LLlist[mLL])
#Graph the LLs
library(rgl)
open3d()
plot3d(mean(x), mean(y), mean(z), xlab = "Mu", ylab = "Sigma2", zlab = "log L",
       xlim = c(min(x), max(x)), ylim = c(min(y), max(y)),
       zlim = c(min(z), max(z)))
surface3d(x, y, z, color = rainbow(length(x)))
So, is my code just wrong? Or is this what an LL surface should look like? If so, why does sigma^2 seem to show a clear curve and peak, whereas mu hardly differs at the maximum? Thanks in advance!
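One way to sanity-check the shape is to compare the analytic curvatures of the normal log-likelihood at its peak; the sketch below uses standard results for the normal model and the settings above (n = 100, sigma^2 = 50). It suggests the surface should actually be far sharper along mu than along sigma^2, so it is worth double-checking the orientation of z: surface3d(x, y, z) pairs the rows of z with x, whereas LLlist's rows index sigma2, so surface3d(x, y, t(z)) may be the intended call.
# analytic curvatures of the normal log-likelihood at the maximum
# (standard results: n/sigma2 in the mu direction, n/(2*sigma2^2) in the
# sigma2 direction), evaluated at the parameter values used above
n <- 100; sigma2 <- 50
n / sigma2          # mu direction:     2    per unit^2
n / (2 * sigma2^2)  # sigma2 direction: 0.02 per unit^2
# the log-likelihood should therefore drop about 100 times faster per
# unit moved in mu than per unit moved in sigma2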
